We created Nevermined to help solve big problems related to digital ecosystems
Nevermined was created to offer its users the ability to build digital ecosystems where untrusted parties can share and monetize their data in a way that is efficient, secure and privacy-preserving. The goal is to help solve hard problems in environments where collaboration between participants is necessary.
From 30,000 feet, Nevermined provides the following capabilities:
- Data Sharing — enabling the sharing and access of digital assets between untrusted parties in the data ecosystem
- Data In-Situ Computation — allowing the execution of models and algorithms without moving the data
- Marketplace and Data Catalog — the user interfaces gluing it all together and facilitating user interactions with the rest of the data ecosystem
- Data Monetization and Incentives — facilitating the monetization of existing organization assets and the different mechanisms to incentivize the users of an ecosystem
- Data Governance — enabling the creation of agreements that govern an ecosystem with multiple independent participants
I am sure you know many products and tools related to these topics. Nevermined provides complementary value to all of them. But before speaking about that, let’s take a closer look at what is already there.
Data catalogs: helping to better understand the organization
Data catalogs have been around for a long time, and understanding data is crucial to maintaining an organization’s value chain. It doesn’t matter what business you are in; nowadays everybody knows data is essential for making informed decisions. Data catalogs are good tools for managing data within the organization. They help you understand what data is available, what its quality is, how fresh it is, who owns it and how it’s being used.
Some of the most important existing catalog solutions include Collibra, Ab Initio, Informatica and Amundsen. These are good solutions that play a critical role within an organization’s data strategy, but they tend to fall short in the following areas:
- Managing data that resides outside of the organization or across multiple regions or subsidiaries.
When the data you need is not in your data warehouse or data lake, data catalogs struggle to maintain the same level of information and access control that they provide locally.
- Managing access to groups and individuals outside of the organization.
If you need to share a report with someone external to the company, this typically requires exporting the data and sharing it through a different tool. This adds complexity and means losing visibility and control over your data.
- Incentivizing data catalog usage.
Let’s be honest, keeping a data catalog up to date consumes a lot of time and effort. It’s also a boring task, so it’s common to see organizations whose catalogs go unused because they are out of date and no longer reflect the reality of the data.
Data marketplaces: creating value from your existing data
Data marketplaces are good solutions for making data available, allowing the sharing and monetization of data that typically sits within the organization. At the same time, organizations looking for data on a specific topic can get access to it after paying the data publisher.
Some of the main inconveniences of data marketplaces are:
- Existing access control mechanisms are pretty rudimentary and typically involve a basic payment condition. Bring me the money and I will give you my data. However, if you need something more sophisticated like: “I will give you access if you pay X and you belong to the legal department of Acme Corp”, things start to become more complicated. Currently the only way to handle this type of scenario is by cobbling together physical contracts, authorization tools and reconciliation reporting.
- Typically they require moving your data to the marketplace’s separate infrastructure. This means the data leaves your control, and it is up to the marketplace to provide and manage access. Nevermined’s ethos is to avoid moving your data if you can.
- They are not directly connected to the internal reality of the organization. There is no direct integration between the organization data catalog and the marketplace where the data is made available.
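The sophisticated condition mentioned above (“pay X and belong to the legal department of Acme Corp”) can be thought of as a composition of simple predicates. Here is a minimal sketch in plain Python of that idea; the condition names and request fields are hypothetical illustrations, not any marketplace’s actual API:

```python
# Hypothetical sketch of composable access conditions: access is granted
# only when every predicate in the policy holds for the request.
from typing import Callable

Condition = Callable[[dict], bool]

def paid_at_least(amount: float) -> Condition:
    """Holds if the requester has paid at least `amount`."""
    return lambda req: req.get("paid", 0) >= amount

def member_of(org: str, dept: str) -> Condition:
    """Holds if the requester belongs to the given department of `org`."""
    return lambda req: (req.get("org"), req.get("dept")) == (org, dept)

def all_of(*conds: Condition) -> Condition:
    """Combine conditions: every one must hold."""
    return lambda req: all(c(req) for c in conds)

# "I will give you access if you pay X and you belong to the
# legal department of Acme Corp"
policy = all_of(paid_at_least(100.0), member_of("Acme Corp", "legal"))

print(policy({"paid": 150.0, "org": "Acme Corp", "dept": "legal"}))  # True
print(policy({"paid": 150.0, "org": "Acme Corp", "dept": "sales"}))  # False
```

The point is that such a policy is a single, machine-checkable object rather than a patchwork of physical contracts, authorization tools and reconciliation reports.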
Federated learning and analytics frameworks
When we speak about large amounts of data and how to get value from it, nowadays there are many powerful tools and frameworks for processing data in a centralized-distributed (Spark, Flink, etc.) or federated way (TensorFlow Federated, FATE, PySyft, etc.). They provide APIs, SDKs and some of them even infrastructure management, facilitating the execution of complex compute tasks. But they do have some shortfalls, like:
- They are good for processing structured or unstructured data but are disconnected from the other data management tools of the organization, like data catalogs and marketplaces.
- They need bespoke adaptation in order to orchestrate the execution of compute jobs. If you only use Spark for data analytics this is okay, but what if you need to do some Federated Learning too? The integration and usage of these analytics tools is totally different, which means a lot of overhead for managing the different technologies.
- It’s not possible to directly support scenarios where you facilitate a third party to execute a job on top of your data in a privacy-preserving manner. For example, if you are a medical center collaborating with a research group, is there a way its data scientists can train a model on top of the medical data you host in different datacenters in a privacy-preserving manner? Probably not…
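To make the federated idea concrete, here is a toy sketch of federated averaging in plain Python — an illustration of the pattern, not any framework’s actual API. Each site computes a gradient step on its own private data; only the model parameter is shared and averaged, never the raw records:

```python
# Toy federated averaging (FedAvg): fit y = w * x by gradient descent,
# where each site's (x, y) pairs never leave that site.

def local_step(w, data, lr=0.01):
    """One gradient step on a site's private data; raw data stays put."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

# Two hypothetical sites, e.g. datacenters whose data cannot be pooled.
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(3.0, 6.0), (4.0, 8.0)]   # true relation: y = 2x

w = 0.0
for _ in range(200):                 # federated rounds
    updates = [local_step(w, site) for site in (site_a, site_b)]
    w = sum(updates) / len(updates)  # coordinator averages parameters only

print(round(w, 2))  # converges toward 2.0
```

Real frameworks like TensorFlow Federated or FATE add secure aggregation, infrastructure and orchestration on top of this loop, which is exactly where the integration overhead described above comes from.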
What is missing here?
All the solutions listed above are good, and after integration and configuration they will most likely do the job, more or less. But in order to create a new class of data ecosystems, what’s missing is the following:
- A more holistic approach to making use of data, within or outside the organization. The aforementioned tools assume a situation where everybody is on the same side of the table (same company, etc.), but they don’t provide any interoperability support to interact outside of the organization’s boundaries.
- There is either no integration between all of these solutions or the existing integration is quite rudimentary and bespoke. You need to integrate all the pieces yourself. Because of that, orchestration and interoperability become a challenge.
- Remote computation where the data is located, independently of whether it is federated or not, is complex and requires governance, access control and infrastructure orchestration.
- There is no direct, native way to facilitate the tokenization and sale of data assets generated within the organization.
Why is Nevermined different?
We created Nevermined to facilitate the interaction of organizations and individuals around their data. It’s what we call data ecosystems. Here are some examples of how Nevermined can be used to support some use cases that simply can’t be addressed with the aforementioned solutions:
- Banks operating in the same region are natural competitors, but they share common problems, such as credit card fraud, which could be resolved more efficiently and effectively through collaboration. It’s also not really a competitive advantage for banks to withhold collaboration in this type of activity. Fraud-detection models trained in a privacy-preserving manner on the combined data of all the banks in the ecosystem would be more accurate, reducing fraud and its associated cost.
- A telco and an insurance company are looking to create a partnership for offering a new line of products. These new products would be the result of the insights generated via the micro-segmentation of their individual customers. The telco has mobility data and the insurance company has the profiles and needs of their customers. All of this data includes personal information and can’t be shared. In this case, Nevermined can help run the AI models in a privacy-preserving manner, providing the insights needed to offer precisely targeted products to their common customers.
- A company operating a network of IoT devices generates massive amounts of data on a daily basis. This data is anonymized and doesn’t include any private information, but has value for some other partner companies. For example, wind speed data from wind turbines can help municipalities with long term scenario planning by modelling instances of wind based damage against insurance premiums. Nevermined can help to automate the publishing and tokenization of this data. This means the data owners can generate additional revenue and the data consumers get access to relevant data for their operations.
- Within the context of pharmaceutical transportation, effective data sharing and transparency are the way forward to improve collaboration within the supply chain. If you want to be fast while mitigating risks, you need to be a team player. To streamline handoffs or reconcile issues, you cannot afford to wait a week to release your data, even to a competitor. Nevermined can help connect all of the entities participating in a supply chain whilst providing transparency, high-fidelity provenance records, integrity, security, access control and data sharing across the board.
With all the above in mind, we created Nevermined to spark the creation of true data ecosystems, connecting technologies and helping to solve problems that are difficult to address with the current batch of independent technologies.
If you would like to know more about the commercial opportunities Nevermined makes available for your organization, or are interested in having a demo, please reach out to us at firstname.lastname@example.org, drop us a line on Discord or visit our website. Thanks to Rodolphe Marques, Jesse Steele and Don Gossen for the editorial support.
Originally posted on 2021-01-14 on Medium.