This is the third in a series of articles called “Nevermined: Big Data. Small effort”, meant to outline the current challenges companies face in handling data and the simple but highly effective solutions offered by Nevermined, a cutting-edge data sharing blockchain technology (read parts #1 and #2).
What makes Nevermined actually possible? How does it work under the hood? We’ve been getting these questions since we published the first two articles in our series on the Why, What and How of Nevermined.
In this article we’ll give some insight into the technical components. However, since this is a “simplified” series, we’ll try to keep the technicalities at a comfortable level for our diverse audience. If you’re a developer or need more in-depth information, feel free to explore our GitHub and our documentation, or find us on our Discord.
To summarize our previous articles, Nevermined’s mission is to make it easy to share data in a secure and governed way. This is a crucial capability for organisations, because sharing data is what will enable them to create 3 new value streams.
In this article we’ll cover 2 key components of Nevermined: one has to do with Federated Learning and AI, the other with how we leverage the advantages of blockchain.
BRING AI TO THE DATA
First of all, it is crucial to understand our technical design principle: insights happen wherever the data resides.
We’ve mentioned before that a major bottleneck for sharing data is that moving data from its existing premises is a liability: it can be leaked in transit, and because many types of data are private, moving it raises serious regulatory challenges.
At Nevermined, we understand that data doesn’t want to be moved. So we devised a way to bring computing and machine learning to the data and not the other way round. The magic component for that is called DISC, which stands for Data In-Situ Computation.
Let’s explore this component by introducing two stakeholders, a Data Owner and a Data Consumer.
Data Owners have data that they want to share, either because it will help with developing new products, or because it can be monetised, or simply because they know their data can be useful for an internal team or a third party.
On the other side of the ‘transaction’ are Data Consumers. They need data to train their machine learning models and may be willing to pay for it, either because it can help enrich their own data, create new insights, or help with developing new products.
The DISC component of Nevermined allows you, as a Data Owner, to offer computation services to a Data Consumer. This means you allow them to execute algorithms or train models (TensorFlow, scikit-learn, etc.) in the infrastructure where your data is stored. This can be on premises, in the cloud, anywhere, really. Remember, insights happen wherever the data is.
Let’s unpack that. The simplified version of this process goes as follows:
- the Data Consumer provides the algorithm to execute
- this is moved to the Data Owner’s infrastructure where the data is being stored
- the Data Owner executes the algorithm on behalf of the Data Consumer
- finally, the Data Consumer receives the result of the execution of the algorithm, without having had access to the source data
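To make the simplified flow concrete, here is a minimal in-process sketch in Python. It is not Nevermined code: the DataOwner class and its execute method are hypothetical names, and in a real deployment the algorithm would travel to the Data Owner’s infrastructure rather than run in the same Python process. What it shows is the shape of the contract: the consumer supplies a function, the owner runs it against data it keeps to itself, and only the return value goes back.

```python
from typing import Callable, Sequence


class DataOwner:
    """Hypothetical stand-in for a Data Owner whose records stay on its own infrastructure."""

    def __init__(self, records: Sequence[float]) -> None:
        # Leading underscore: by convention, not meant to be read from outside.
        self._records = list(records)

    def execute(self, algorithm: Callable[[Sequence[float]], float]) -> float:
        """Run the consumer's algorithm next to the data and hand back only the result."""
        return algorithm(self._records)


def average(values: Sequence[float]) -> float:
    """The Data Consumer's algorithm: a simple mean."""
    return sum(values) / len(values)


owner = DataOwner([12.0, 15.0, 9.0])
result = owner.execute(average)  # the consumer sees 12.0, never the three records
print(result)
```

The design choice worth noting is that the owner’s interface accepts code and returns results; it never exposes a method that returns the records themselves.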
For readers who know a thing or two about Orchestration, this version gives you a bit more technical detail:
- the Data Consumer provides a workflow and sets up a service agreement to execute it on top of the Data Owner’s data
- the Gateway (more about this later) hands over this workflow to a Compute Service, run by the Data Owner
- the Compute Service communicates with a Kubernetes cluster (using Argo) to prepare the execution environment (wherever the data is), including a Configuration Pod, a Compute Pod and a Publishing Pod
- the Configuration Pod performs any configuration needed to execute the Data Consumer’s workflow (setting up the access to the data, …)
- the Compute Pod starts and runs the Data Consumer’s algorithm. This can happen across multiple Data Owners simultaneously, using a Federated Learning framework
- the Publishing Pod creates a new asset containing the result of the execution of the workflow and transfers ownership of the new asset to the Data Consumer. In the case of machine learning, this would be a trained model.
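The three pods can be pictured as three steps that pass state along. The Python sketch below simulates them as plain functions; it does not use Kubernetes or Argo, and every name in it (ExecutionContext, run_workflow, fit_slope, the did:example identifiers) is a hypothetical illustration. The consumer’s “model” is a least-squares slope for y = w * x, and the last function shows a FedAvg-style weighted average of models trained at two different Data Owners. That aggregation rule is a common federated learning choice picked for illustration; the article does not say which algorithm or framework Nevermined uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]   # (x, y) pair held by a Data Owner
Model = Dict[str, float]       # trained parameters plus sample count


@dataclass
class Asset:
    """A newly published asset holding the workflow's result."""
    did: str
    owner: str
    payload: Model


@dataclass
class ExecutionContext:
    """State handed from one (simulated) pod to the next."""
    data: List[Sample] = field(default_factory=list)
    result: Model = field(default_factory=dict)


def configuration_step(ctx: ExecutionContext, storage: Dict[str, List[Sample]], data_ref: str) -> None:
    # Stand-in for the Configuration Pod: make the owner's data reachable by the job.
    ctx.data = storage[data_ref]


def compute_step(ctx: ExecutionContext, algorithm: Callable[[List[Sample]], Model]) -> None:
    # Stand-in for the Compute Pod: run the consumer's algorithm next to the data.
    ctx.result = algorithm(ctx.data)


def publishing_step(ctx: ExecutionContext, consumer: str, registry: Dict[str, Asset]) -> Asset:
    # Stand-in for the Publishing Pod: register the result as a new asset owned by the consumer.
    did = f"did:example:{len(registry) + 1}"  # placeholder identifier, not a real DID method
    asset = Asset(did=did, owner=consumer, payload=ctx.result)
    registry[did] = asset
    return asset


def run_workflow(storage: Dict[str, List[Sample]], data_ref: str,
                 algorithm: Callable[[List[Sample]], Model],
                 consumer: str, registry: Dict[str, Asset]) -> Asset:
    ctx = ExecutionContext()
    configuration_step(ctx, storage, data_ref)
    compute_step(ctx, algorithm)
    return publishing_step(ctx, consumer, registry)


def fit_slope(samples: List[Sample]) -> Model:
    """Consumer's model: least-squares slope of y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in samples)
    sxx = sum(x * x for x, _ in samples)
    return {"w": sxy / sxx, "n": float(len(samples))}


def federated_average(local_models: List[Model]) -> float:
    """FedAvg-style aggregation: weight each local slope by its sample count."""
    total = sum(m["n"] for m in local_models)
    return sum(m["w"] * m["n"] for m in local_models) / total


registry: Dict[str, Asset] = {}
owner_a = {"sales": [(1.0, 2.0), (2.0, 4.0)]}
owner_b = {"sales": [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]}
asset_a = run_workflow(owner_a, "sales", fit_slope, "consumer-1", registry)
asset_b = run_workflow(owner_b, "sales", fit_slope, "consumer-1", registry)
print(asset_a.did, asset_a.payload)                           # did:example:1 {'w': 2.0, 'n': 2.0}
print(federated_average([asset_a.payload, asset_b.payload]))  # 2.6
```

In this sketch the consumer ends up owning two published models and one aggregate, but never the raw (x, y) pairs of either owner.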
If you want to find out even more, or you want to have your technical teams explore this, there is much more detail in our DISC specs.
But the key point to remember is:
Nevermined makes it easy to share data, because we bring the algorithm to the data, so the data doesn’t have to be moved.
IF THIS, THEN ABSOLUTELY THAT
The second element of our technical set-up is about creating business logic, or more precisely, immutable and automated business logic.
Typically, Data Owners & Consumers don’t trust each other. So, in a traditional (read: defensive) data culture, any attempt to share or access data from another party will result in lengthy legal processes and approval workflows, which are manual and expensive.
Nevermined circumvents this problem with digital, automated, high-fidelity processes. And for that we rely on blockchains, because they have unique characteristics like traceability, programmability and immutability. In our set-up, we have developed the following building blocks.
1- Access control
As a Data Owner, our Access Control allows you to ‘publish’ a data set and define the sharing conditions attached to the asset: who can access it, at what price, … This can be integrated with existing access control solutions like LDAP, so that you can use the same fine-grained access control rules your organization already uses.
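As a rough illustration of “sharing conditions attached to the asset”, the Python sketch below attaches an allowed-groups rule and a price to an asset and checks a request against both. It is an assumption-laden toy: SharingConditions, may_access and the DIRECTORY dictionary are hypothetical, the dictionary only stands in for a real directory such as LDAP (no LDAP library is used), and the price unit is left abstract.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class SharingConditions:
    """Conditions a Data Owner attaches to an asset when publishing it (illustrative fields)."""
    allowed_groups: FrozenSet[str]
    price: int  # in the smallest unit of some payment token (assumption)


# Hypothetical stand-in for an existing directory such as LDAP: user -> groups.
DIRECTORY: Dict[str, Set[str]] = {
    "alice": {"data-science", "employees"},
    "bob": {"employees"},
}


def may_access(user: str, conditions: SharingConditions, amount_paid: int) -> bool:
    """Grant access only if the user is in an allowed group AND the price has been paid."""
    groups = DIRECTORY.get(user, set())
    in_allowed_group = bool(groups & conditions.allowed_groups)
    return in_allowed_group and amount_paid >= conditions.price


conditions = SharingConditions(allowed_groups=frozenset({"data-science"}), price=10)
print(may_access("alice", conditions, amount_paid=10))  # True
print(may_access("bob", conditions, amount_paid=10))    # False: not in an allowed group
print(may_access("alice", conditions, amount_paid=5))   # False: price not met
```

Reusing the groups an organisation already maintains, rather than inventing a second permission model, is the point the paragraph above makes about LDAP integration.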
2- Tokenization
When a data set is published, Nevermined effectively turns it into a digital asset, registered on a blockchain. Nevermined uses the W3C DID standard. This means that every asset will get a unique and verifiable Decentralized Identifier stored on-chain.
Every asset will also get a DDO, a Digital Identity Document, which contains more contextual information and metadata related to the asset. The asset’s metadata is stored off-chain, either in a centralized or decentralized storage solution.
These two elements, the on-chain DID and the off-chain DDO, are intrinsically linked and together make up the digital asset.
Note that an asset doesn’t have to be limited to data. In more advanced scenarios, Nevermined enables you to tokenize algorithms, turn your assets into NFTs, set more complex reward distribution schemes for when a service is consumed, add secondary-market sale royalties to NFTs, …
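One way to picture how an on-chain identifier and an off-chain document stay linked is sketched below in Python. Two dictionaries simulate the blockchain and the off-chain storage; the identifier is derived by hashing the metadata with SHA-256 and uses the did:example method that W3C reserves for examples. Both the hash choice and the identifier scheme are assumptions for illustration, not Nevermined’s actual DID format. The sketch stores only a checksum of the DDO on-chain, so any change to the off-chain copy can be detected.

```python
import hashlib
import json
from typing import Any, Dict

# Simulated storage: one dict stands in for the blockchain, another for off-chain storage.
ON_CHAIN_REGISTRY: Dict[str, str] = {}            # DID -> checksum of its DDO
OFF_CHAIN_STORE: Dict[str, Dict[str, Any]] = {}   # DID -> DDO (metadata document)


def canonical_bytes(document: Dict[str, Any]) -> bytes:
    """Deterministic JSON encoding, so the same document always hashes the same way."""
    return json.dumps(document, sort_keys=True, separators=(",", ":")).encode("utf-8")


def checksum(document: Dict[str, Any]) -> str:
    return hashlib.sha256(canonical_bytes(document)).hexdigest()


def make_did(metadata: Dict[str, Any], publisher: str) -> str:
    # "did:example" is the placeholder method used in W3C examples, not Nevermined's own.
    return "did:example:" + checksum({"metadata": metadata, "publisher": publisher})


def publish(metadata: Dict[str, Any], publisher: str) -> str:
    did = make_did(metadata, publisher)
    ddo = {"id": did, "publisher": publisher, "metadata": metadata}
    OFF_CHAIN_STORE[did] = ddo               # the full document lives off-chain
    ON_CHAIN_REGISTRY[did] = checksum(ddo)   # only the DID and a checksum go on-chain
    return did


def resolve_and_verify(did: str) -> Dict[str, Any]:
    """Fetch the off-chain DDO and check it against the on-chain checksum."""
    ddo = OFF_CHAIN_STORE[did]
    if checksum(ddo) != ON_CHAIN_REGISTRY[did]:
        raise ValueError("off-chain DDO does not match the on-chain checksum")
    return ddo


did = publish({"name": "Sales 2021", "license": "internal"}, publisher="0xOwner")
ddo = resolve_and_verify(did)

# Tampering with the off-chain copy is detected against the on-chain checksum.
OFF_CHAIN_STORE[did] = dict(ddo, metadata={"name": "Sales 2021", "license": "public"})
try:
    resolve_and_verify(did)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)  # True
```

Keeping the bulky metadata off-chain and only a small fingerprint on-chain is what lets the document be stored anywhere while remaining verifiable.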
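To give a feel for what a reward distribution scheme and a resale royalty compute, here is a small Python sketch. It is a hypothetical illustration, not Nevermined’s contract logic (which would live in Solidity): shares are expressed in basis points, amounts are integers to avoid rounding drift, and giving any integer-division remainder to the first listed party is an arbitrary choice made for this example.

```python
from typing import Dict


def distribute(price: int, shares_bps: Dict[str, int]) -> Dict[str, int]:
    """Split a payment by shares expressed in basis points (10,000 bps = 100%)."""
    if sum(shares_bps.values()) != 10_000:
        raise ValueError("shares must add up to 10,000 basis points")
    payouts = {party: price * bps // 10_000 for party, bps in shares_bps.items()}
    # Integer division may leave a small remainder; this sketch gives it to the first party.
    first = next(iter(shares_bps))
    payouts[first] += price - sum(payouts.values())
    return payouts


def secondary_sale(price: int, seller: str, creator: str, royalty_bps: int) -> Dict[str, int]:
    """Resale of an NFT: the original creator keeps a royalty on every later sale."""
    if seller == creator:
        return {seller: price}
    return distribute(price, {seller: 10_000 - royalty_bps, creator: royalty_bps})


primary = distribute(1_000, {"owner": 9_000, "curator": 500, "marketplace": 500})
resale = secondary_sale(2_000, seller="bob", creator="alice", royalty_bps=1_000)
print(primary)  # {'owner': 900, 'curator': 50, 'marketplace': 50}
print(resale)   # {'bob': 1800, 'alice': 200}
```

Because every payout is derived from the same integer price, the parts always add up to exactly what the buyer paid.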
3- Blockchain Smart Contracts
Nevermined Smart Contracts are crucial building blocks, providing the immutability and automated business logic that makes data sharing with Nevermined easy, secure and trusted.
Our contracts are EVM-compatible, implemented in Solidity and provide the following functionality:
- Asset Registry
As mentioned under ‘Tokenization’, Nevermined uses W3C Decentralized Identifiers (DID) to register assets on-chain.
- Service Execution Agreements (SEAs)
Aka the core engine of the platform. Using smart contracts allows Data Owners to add conditions to their assets, on-chain, i.e. in an immutable way. It’s the SEAs that orchestrate the execution of the Data Access and Data Computation.
- NFTs (via ERC721 and ERC1155)
This allows for the tokenization of assets and their use as part of service agreements between users.
Note that we have developed the Nevermined Zeppelin OS contract management framework and Contract Tools. These tools make it easier to deploy and upgrade Smart Contracts across multiple networks, whether in production or on a testnet, on public or private blockchains.
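The idea behind Service Execution Agreements, conditions that must be fulfilled in a fixed order before value moves, can be simulated in a few lines of Python. This is a toy model under stated assumptions: the condition names (lock_payment, grant_access, release_payment) are illustrative and not taken from the Nevermined contracts, balances are plain integers, and real agreements would also handle timeouts and refunds, which are omitted here.

```python
from enum import Enum, auto
from typing import Dict


class State(Enum):
    UNFULFILLED = auto()
    FULFILLED = auto()


class ServiceAgreement:
    """Toy model of an agreement made of ordered conditions (names are illustrative)."""

    ORDER = ["lock_payment", "grant_access", "release_payment"]

    def __init__(self, did: str, owner: str, consumer: str, price: int) -> None:
        self.did, self.owner, self.consumer, self.price = did, owner, consumer, price
        self.conditions: Dict[str, State] = {name: State.UNFULFILLED for name in self.ORDER}
        self.escrow = 0
        self.balances: Dict[str, int] = {owner: 0}

    def _require(self, name: str) -> None:
        # A condition can only be fulfilled once, and only after all earlier ones.
        for previous in self.ORDER[: self.ORDER.index(name)]:
            if self.conditions[previous] is not State.FULFILLED:
                raise RuntimeError(f"{name} requires {previous} first")
        if self.conditions[name] is State.FULFILLED:
            raise RuntimeError(f"{name} already fulfilled")

    def lock_payment(self, payer: str, amount: int) -> None:
        self._require("lock_payment")
        if payer != self.consumer or amount != self.price:
            raise RuntimeError("wrong payer or amount")
        self.escrow = amount  # funds are held, not yet paid to the owner
        self.conditions["lock_payment"] = State.FULFILLED

    def grant_access(self, caller: str) -> None:
        self._require("grant_access")
        if caller != self.owner:
            raise RuntimeError("only the owner (or its gateway) can grant access")
        self.conditions["grant_access"] = State.FULFILLED

    def release_payment(self) -> None:
        self._require("release_payment")
        self.balances[self.owner] += self.escrow  # escrow moves to the owner
        self.escrow = 0
        self.conditions["release_payment"] = State.FULFILLED


sea = ServiceAgreement("did:example:123", owner="0xOwner", consumer="0xConsumer", price=10)
try:
    sea.grant_access("0xOwner")  # out of order: payment not locked yet
    out_of_order_rejected = False
except RuntimeError:
    out_of_order_rejected = True
sea.lock_payment("0xConsumer", 10)
sea.grant_access("0xOwner")
sea.release_payment()
print(sea.balances["0xOwner"], sea.escrow)  # 10 0
```

On a blockchain the same ordering rules would be enforced by the contract itself, so neither party has to trust the other to follow them.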
4- The Gateway
Tying it all together is the Gateway. The Gateway simply does what it says on the tin: it acts as a control mechanism. When a Data Consumer tries to access a data set, for instance via a marketplace, the Gateway checks on-chain whether the conditions are met: is the Data Consumer whitelisted? Has the correct price been paid? If so, the Data Owner automatically grants compute access to the data and the DISC component described above can be started.
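The Gateway’s role can be reduced to “read the chain, then decide”. The Python sketch below shows that decision under explicit assumptions: OnChainView is a hypothetical snapshot standing in for on-chain queries, the agreement and address strings are made up, and the compute service is a fake callback rather than a real DISC job.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class OnChainView:
    """Hypothetical read-only snapshot of what the Gateway would query on-chain."""
    whitelist: Dict[str, Set[str]]          # asset DID -> allowed consumer addresses
    payments_locked: Set[Tuple[str, str]]   # (agreement id, consumer) with payment locked


class Gateway:
    def __init__(self, chain: OnChainView, start_compute: Callable[[str, str], str]) -> None:
        self.chain = chain
        self.start_compute = start_compute

    def request_compute(self, did: str, agreement_id: str, consumer: str) -> str:
        if consumer not in self.chain.whitelist.get(did, set()):
            raise PermissionError("consumer is not whitelisted for this asset")
        if (agreement_id, consumer) not in self.chain.payments_locked:
            raise PermissionError("payment for this agreement has not been locked")
        # Both conditions hold: hand the job over to the Data Owner's compute service.
        return self.start_compute(did, consumer)


started: List[Tuple[str, str]] = []


def fake_compute_service(did: str, consumer: str) -> str:
    started.append((did, consumer))
    return f"job-{len(started)}"


chain = OnChainView(whitelist={"did:example:42": {"0xConsumer"}},
                    payments_locked={("agr-1", "0xConsumer")})
gateway = Gateway(chain, fake_compute_service)
job = gateway.request_compute("did:example:42", "agr-1", "0xConsumer")
print(job)  # job-1

try:
    gateway.request_compute("did:example:42", "agr-2", "0xConsumer")  # no payment locked
    denied = False
except PermissionError:
    denied = True
```

Only requests that pass both checks ever reach the compute service, which is what lets the Data Owner automate the grant.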
Again, feel free to dig into the documentation.
But the key point to remember is:
Nevermined makes it easy to share data, because we make the transaction process more reliable, verifiable and automated, so you avoid getting stuck in lengthy agreement workflows.
PLUG AND PLAY
While these 2 technical pillars (Bring AI to Data + Blockchain integration) are in themselves fairly novel for enterprise environments, we also want to point out that Nevermined is easy to integrate with existing systems: we see ourselves as a middleware solution.
This means you don’t need to train your organisation to use a new tool. Nevermined can be integrated directly into your existing tools via our Software Development Kits (SDKs). These are libraries that encapsulate the Nevermined business logic and can interact with all the components and APIs of the system.
To be precise, we have 3 SDKs, including:
- The Nevermined SDK PY is a Python version, to be integrated with back-end applications. The primary users are Data Consumers/Scientists.
- The Nevermined SDK JAVA is a Java version, to be integrated with JVM applications. This is primarily developed for data engineers and enterprise environments.
We hope that this has given you an insight into the technical USP of Nevermined. Compared to other data sharing ‘solutions’, our machinery doesn’t just make it possible to share data: it makes it easy. And that’s crucial to unlock data’s real potential and create new value streams.
If you have more questions or would like to set up a demo/call with our tech team, do get in touch.
Originally posted on 2021-11-05 on Medium.