Essential building blocks of decentralized digital ecosystems

Data Provenance


Adding DID + NFT + Access Control + Provenance + Integrity + Remote Computation to enable your Digital Ecosystem use cases

As this year ends, we’ve made massive advances on Nevermined, the world’s first decentralized data sharing solution, as we aim to provide a solid foundation for scalable Digital Ecosystems. The following provides a description of these Digital Ecosystem building blocks.

What’s a Digital Ecosystem?

At Nevermined, we define a “Digital Ecosystem” as a virtual environment where different entities interact for a common purpose. This can mean a supply chain process where a pharmaceutical organization produces drugs that are then delivered, through the cooperation of different organizations, to hospitals. In this situation, the current freight forwarder may want access to the previous handler’s temperature control information to ensure that the drug’s temperature is adequately maintained.

Or it could mean a place where artists make their works available, which requires controlling access to those works while also providing full consumption provenance and possibly creation attribution within the community.

In these situations where we represent the interaction between different entities digitally, several essential technological building blocks can now come together to make these digital ecosystems possible.

These building blocks are:

1 — Decentralized Identifiers (DID)

A Decentralized Identifier, or DID, is a unique identifier that can be resolved to a standard resource describing the entity: a DID Document, aka DDO. If we apply this to Nevermined, the DID would be the unique identifier of an object represented in Nevermined (e.g. a dataset, an algorithm, an artwork, a skillset, etc.). The corresponding DDO includes the metadata describing this object. The DID is recorded on-chain and is owned by the creator (but the ownership can be transferred). Coupled in this way, the on-chain DID resolves to the off-chain metadata in the shape of a DDO.

For example, in the drug shipping use case, each handler could publish a reference to the temperature data of each of their shipments through a DID/DDO pair, which would then be discoverable by other freight forwarders in the supply chain. Depending on the access constraints on this temperature data, upstream or downstream freight forwarders and/or manufacturers could request access to the data throughout the shipment’s supply chain life cycle.
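To make the DID/DDO coupling more concrete, here is a minimal sketch in Python. It is not Nevermined's actual API: the `DidRegistry` class, the in-memory `metadata_store`, the `did:example:` prefix and the idea of storing a DDO checksum next to the DID are all assumptions made for illustration.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass


def checksum(ddo):
    """Hash a DDO deterministically (sorted keys) so it can be verified later."""
    payload = json.dumps(ddo, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class DidRecord:
    """What this hypothetical on-chain registry keeps for each DID."""
    owner: str
    ddo_url: str       # where the off-chain DDO can be fetched
    ddo_checksum: str  # assumption: lets a resolver detect a modified DDO


class DidRegistry:
    """In-memory stand-in for an on-chain DID registry."""

    def __init__(self):
        self._records = {}

    def register(self, owner, ddo_url, ddo):
        did = f"did:example:{uuid.uuid4().hex}"
        self._records[did] = DidRecord(owner, ddo_url, checksum(ddo))
        return did

    def owner_of(self, did):
        return self._records[did].owner

    def transfer(self, did, current_owner, new_owner):
        record = self._records[did]
        if record.owner != current_owner:
            raise PermissionError("only the current owner can transfer a DID")
        record.owner = new_owner

    def resolve(self, did, metadata_store):
        """Follow the on-chain record to the off-chain DDO and verify it."""
        record = self._records[did]
        ddo = metadata_store[record.ddo_url]
        if checksum(ddo) != record.ddo_checksum:
            raise ValueError("DDO does not match the checksum registered for this DID")
        return ddo


# Usage: a handler publishes a reference to a shipment's temperature log.
metadata_store = {}  # stands in for off-chain metadata storage
registry = DidRegistry()
ddo = {"name": "Shipment 42 temperature log", "type": "dataset"}
url = "https://metadata.example/ddo/shipment-42"
metadata_store[url] = ddo
did = registry.register("handler-a", url, ddo)
print(did, registry.resolve(did, metadata_store)["name"])
```

The registry only holds the identifier, the owner and a pointer, while the descriptive metadata lives elsewhere, which mirrors the on-chain/off-chain split described above.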

2 — Integrity Proofs

As part of the process of registering information on-chain or off-chain, some integrity information, such as file checksums and cryptographic signatures, needs to be provided and recorded. This means that when you register an asset associated with a set of files, the md5 checksum of each file is recorded. When someone uses an asset of the network, for example by getting access or triggering a computation, the user’s signature and action are recorded on-chain too. As a result, if files are modified afterwards and their checksums no longer match, or if someone claims that they never got access to an asset, there’s an integrity mechanism in place to prove what actually occurred.

For example, in the drug shipping use case, if the manufacturer recorded a shipment’s temperature data and made that data available for the freight forwarders to access, that asset would have a unique identifier on-chain. If the manufacturer then modified that data asset for some reason, the checksum of the new data asset would differ from the originally published asset and could be flagged for anomalous behavior.
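The checksum comparison can be sketched with Python's standard `hashlib` module. The helper names (`file_checksum`, `register_files`, `modified_files`) are hypothetical, and the registry is just a dictionary rather than anything on-chain. The text mentions md5; this sketch defaults to SHA-256 because MD5 is not collision-resistant, and `algorithm="md5"` can be passed to match the text. Signatures are left out to keep the example short.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_files(paths, algorithm="sha256"):
    """Return {file name: checksum}, i.e. what would be recorded at registration."""
    return {Path(p).name: file_checksum(p, algorithm) for p in paths}


def modified_files(registered, base_dir, algorithm="sha256"):
    """List the registered files whose current checksum no longer matches."""
    base = Path(base_dir)
    return sorted(
        name
        for name, expected in registered.items()
        if file_checksum(base / name, algorithm) != expected
    )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "temperature.csv"
        log.write_text("minute,celsius\n0,4.1\n1,4.3\n")
        registered = register_files([log])
        before = modified_files(registered, tmp)  # nothing changed yet
        log.write_text("minute,celsius\n0,4.1\n1,3.9\n")  # data edited afterwards
        after = modified_files(registered, tmp)
        return before, after


print(demo())  # ([], ['temperature.csv'])
```

Once the file is edited after registration, its checksum no longer matches the recorded one, which is the signal that would be flagged.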

3 — Decentralized Access Control

Most of the interactions between users of a Nevermined data ecosystem require the management of access control to digital assets. Simply put, this means that for something I own, I want to give you some permissions to do something under some circumstances. These are all parameters that I control.

This conceptually is a little abstract but is extremely flexible at the same time. It means that depending on the problem I want to articulate and manage, different access control possibilities can be supported across any use case, including the following:

I have data, and I want to allow anyone paying me X amount to get access to my data.
I’m an artist, and for all my customers buying my art, I’m going to issue a Non-Fungible Token, or NFT, that allows them to get access to new and exclusive content.

For example, in the drug shipping use case, the manufacturer may allow certain freight forwarders to access the manufacturer’s temperature gauge data embedded in the drug shipment. In this case, access control would allow for accepted freight forwarders to access the data remotely to determine if the shipment’s temperature has gone above a certain threshold during the course of shipping.
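One way to picture "permissions under some circumstances" is as a set of conditions attached to an asset, all of which must hold before access is granted. The sketch below covers the two example policies listed above plus an allow-list for the freight forwarder case. Everything in it (`Ledger`, `AccessPolicy`, the condition helpers) is hypothetical and runs in memory; it is not how Nevermined's smart contracts are written.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy stand-in for on-chain state."""
    payments: dict = field(default_factory=dict)      # (payer, asset DID) -> amount paid
    nft_balances: dict = field(default_factory=dict)  # (holder, asset DID) -> tokens held


def paid_at_least(price):
    """'Anyone paying me X amount gets access to my data.'"""
    def condition(ledger, user, asset_did):
        return ledger.payments.get((user, asset_did), 0) >= price
    return condition


def holds_nft(min_tokens=1):
    """'Customers holding my NFT get access to exclusive content.'"""
    def condition(ledger, user, asset_did):
        return ledger.nft_balances.get((user, asset_did), 0) >= min_tokens
    return condition


def allow_listed(users):
    """'Only accepted freight forwarders may read the temperature data.'"""
    allowed = frozenset(users)
    def condition(ledger, user, asset_did):
        return user in allowed
    return condition


class AccessPolicy:
    def __init__(self):
        self._conditions = {}  # asset DID -> tuple of condition functions

    def set_conditions(self, asset_did, *conditions):
        self._conditions[asset_did] = conditions

    def can_access(self, ledger, user, asset_did):
        conditions = self._conditions.get(asset_did)
        if not conditions:
            return False  # no policy registered means no access
        return all(check(ledger, user, asset_did) for check in conditions)


ledger = Ledger()
policy = AccessPolicy()
policy.set_conditions("did:example:dataset", paid_at_least(10))
policy.set_conditions("did:example:temperature", allow_listed({"forwarder-a"}))

ledger.payments[("bob", "did:example:dataset")] = 10
print(policy.can_access(ledger, "bob", "did:example:dataset"))              # True
print(policy.can_access(ledger, "forwarder-b", "did:example:temperature"))  # False
```

Because conditions are plain functions that can be combined, the owner decides which circumstances apply to each asset without changing the access check itself.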

4 — Identity Management

Corporate environments utilize complex identity management and access control mechanisms via Domain Controllers (e.g. Active Directory, LDAP, etc.). These solutions authenticate and authorize corporate users of a specific domain or network. Implementations like Active Directory enable the management of individual or group permissions within the organization by assigning security policies.

Correspondingly, to complement the flexibility of existing access control mechanisms, Nevermined facilitates integration with these domain controllers. The result is the ability to combine existing permission policies with Nevermined’s access control smart contracts. This enables use cases such as:

As a marketing department manager, I’m going to give access control privileges to the sales department so that they can view last year’s Monthly Active User reports.
I work in the IT department of Acme, and I’m going to allow John Doe to get access to the log files of my CMS servers.

For example, in the drug shipping use case, the manufacturer would plug Nevermined’s access control module into their existing domain controller(s). They could then simply add certain freight forwarders to an existing group of users, giving each freight forwarder the same access rights as that pre-existing user group. This capability drastically simplifies how external users are authenticated and authorized.
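The group-based approach can be illustrated with a small sketch. A real integration would query the domain controller (for instance over LDAP); here the directory is a plain dictionary, and the `GroupAccessBridge` class and the group and user names are assumptions made purely for illustration.

```python
class GroupAccessBridge:
    """Maps directory groups to asset permissions (hypothetical, in-memory)."""

    def __init__(self, directory_groups):
        # group name -> set of user names, as a domain controller might report it
        self._groups = {group: set(members) for group, members in directory_groups.items()}
        self._grants = {}  # asset DID -> set of group names allowed to access it

    def grant_group(self, asset_did, group):
        if group not in self._groups:
            raise KeyError(f"unknown directory group: {group}")
        self._grants.setdefault(asset_did, set()).add(group)

    def add_member(self, group, user):
        self._groups[group].add(user)

    def is_authorized(self, user, asset_did):
        allowed_groups = self._grants.get(asset_did, set())
        return any(user in self._groups[group] for group in allowed_groups)


bridge = GroupAccessBridge({"sales": {"alice"}, "logistics-partners": set()})
bridge.grant_group("did:example:temperature-feed", "logistics-partners")

# Onboarding an external freight forwarder is a group change, not a new policy.
bridge.add_member("logistics-partners", "freight-forwarder-7")
print(bridge.is_authorized("freight-forwarder-7", "did:example:temperature-feed"))  # True
print(bridge.is_authorized("alice", "did:example:temperature-feed"))                # False
```

The permission is attached to the group once, so adding or removing members in the directory is enough to change who can reach the asset.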

5 — Non-Fungible Tokens (NFT)

A Decentralized Identifier (DID) that digitally represents a physical item, like a drug shipment, aligns quite well with the concept of a Non-Fungible Token (NFT). The implication is that, if you are a data owner or an artist registering your artwork, you can mint NFTs associated with your DID (i.e. the digital proxy of your artwork) and distribute them amongst your customers or users. As an extension to this, NFT owners can trade their NFTs or use them to get access to additional and exclusive content.

For example, in the drug shipping use case, possession of the drug would be represented as an NFT, with handoffs required during each phase of the shipment’s journey. This results in high-fidelity possession tracking, where handoffs between participants like the ground and air shipper are clearly delineated.
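Possession tracking through handoffs can be sketched as a token with a single holder and an append-only transfer history. The `CustodyToken` class below is a hypothetical in-memory model, not an ERC-721 implementation or Nevermined code.

```python
class CustodyToken:
    """A single token representing possession of one shipment."""

    def __init__(self, did, first_holder):
        self.did = did
        self.holder = first_holder
        self.handoffs = []  # ordered (sender, receiver) pairs

    def hand_off(self, sender, receiver):
        # Only the current holder can pass the shipment on.
        if sender != self.holder:
            raise PermissionError(f"{sender} does not hold the token for {self.did}")
        self.handoffs.append((sender, receiver))
        self.holder = receiver

    def chain_of_custody(self):
        """Every participant that has held the shipment, in order."""
        if not self.handoffs:
            return [self.holder]
        return [self.handoffs[0][0]] + [receiver for _, receiver in self.handoffs]


token = CustodyToken("did:example:shipment-42", "manufacturer")
token.hand_off("manufacturer", "ground-shipper")
token.hand_off("ground-shipper", "air-shipper")
print(token.chain_of_custody())  # ['manufacturer', 'ground-shipper', 'air-shipper']
```

Since a handoff fails unless the sender is the current holder, the history cannot contain a gap between two participants.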

6 — Provenance

Provenance allows us to understand the context in which “something” was created, how it is used and by whom, and how ownership is transferred or delegated. The W3C Provenance specification defines, in a use case-independent way, how provenance can be registered and used. This, combined with the utilization of a blockchain network, provides a transparent and unique source of truth for data ecosystems.

For example, in the drug shipping use case, combining NFTs with integrity proofs and the W3C Provenance standard creates an unparalleled level of transparency in tracking the shipment through the supply chain.
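As a rough illustration, provenance can be stored as a list of statements that link entities (things), activities (what happened) and agents (who was responsible). The relation names below (`wasGeneratedBy`, `used`, `wasAssociatedWith`, `wasAttributedTo`) come from the W3C PROV data model; the `ProvenanceLog` class, its methods and the identifiers are assumptions for this sketch and say nothing about how Nevermined stores provenance on-chain.

```python
from collections import namedtuple

Statement = namedtuple("Statement", "relation subject object")


class ProvenanceLog:
    """Append-only list of PROV-style statements (hypothetical, in-memory)."""

    def __init__(self):
        self.statements = []

    def record(self, relation, subject, obj):
        self.statements.append(Statement(relation, subject, obj))

    def register_creation(self, entity, activity, agent):
        self.record("wasGeneratedBy", entity, activity)
        self.record("wasAssociatedWith", activity, agent)
        self.record("wasAttributedTo", entity, agent)

    def register_usage(self, entity, activity, agent):
        self.record("used", activity, entity)
        self.record("wasAssociatedWith", activity, agent)

    def agents_for(self, entity):
        """Everyone associated with an activity that generated or used the entity."""
        activities = {
            s.object for s in self.statements
            if s.relation == "wasGeneratedBy" and s.subject == entity
        }
        activities |= {
            s.subject for s in self.statements
            if s.relation == "used" and s.object == entity
        }
        return sorted(
            {
                s.object for s in self.statements
                if s.relation == "wasAssociatedWith" and s.subject in activities
            }
        )


log = ProvenanceLog()
log.register_creation("did:example:temperature-log", "publish-1", "manufacturer")
log.register_usage("did:example:temperature-log", "access-1", "ground-shipper")
print(log.agents_for("did:example:temperature-log"))  # ['ground-shipper', 'manufacturer']
```

Because the statements are generic, the same log can answer "who created this?" and "who used this?" for a dataset, an artwork or a shipment alike.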

7 — Privacy Enhancing Computation

When sharing data is not an option because of privacy constraints, permitting an algorithm to compute on a data set you can’t see is a rational alternative. There are many different possibilities and technical solutions for this depending on the computation being executed, like a simple aggregation or a fully-fledged analytics process using Spark or similar, as well as the privacy constraints of the data.

Delivering this type of sophisticated solution involves orchestrating computation techniques, Federated Learning and on-chain computation. Which pattern is used depends entirely on the use case.

What is important is to understand the use case, what limitations or requirements are implied, and how to enable computation in a frictionless manner.

The main intention here is to support use cases such as the following:

I work in the payments department of a bank. I have a model that can detect 50% of the fraud occurring in the bank’s credit card transactions. If I could train my model in a privacy-preserving manner on top of other banks’ credit card transactions, I could improve the accuracy of my model, detect more fraudulent transactions and save a pile of money. I could also sell my model to the banks that share their data!
I work in a distribution center where COVID vaccines are received before local distribution. I would like to run a query against the carrier’s temperature sensors to check that the temperature always stayed within the appropriate range, and flag for further analysis any shipment that fails the acceptance criteria.

For example, in the drug shipping use case, being able to orchestrate temperature threshold calculations across numerous providers’ datasets could help flag a problem with the shipment before it becomes a problem. However, knowing which dataset provided the flagging information could be kept private through aggregation techniques, etc. This type of capability could limit the risk exposure of any given data provider within the supply chain.
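The "move the algorithm to the data" pattern can be sketched as follows: each provider runs a small function over its own readings and returns only the result, and the orchestrator adds the results up. The `DataProvider` class, the function names and the 2 to 8 °C range are illustrative assumptions. Note the limits of this sketch: the orchestrator still sees each provider's individual count, and nothing stops a malicious algorithm from returning raw data, so a real deployment would need vetted algorithms and a technique such as secure aggregation to hide per-provider results.

```python
class DataProvider:
    def __init__(self, name, readings):
        self.name = name
        self._readings = list(readings)  # raw data stays inside the provider

    def run(self, algorithm):
        """Execute the visiting algorithm locally and return only its result."""
        return algorithm(self._readings)


def make_excursion_counter(low, high):
    """Build the algorithm that travels to each provider."""
    def count_excursions(readings):
        return sum(1 for celsius in readings if not (low <= celsius <= high))
    return count_excursions


def total_excursions(providers, low, high):
    """Orchestrator: send the same algorithm to every provider and sum the counts."""
    algorithm = make_excursion_counter(low, high)
    return sum(provider.run(algorithm) for provider in providers)


providers = [
    DataProvider("ground-shipper", [4.0, 4.5, 5.1]),
    DataProvider("air-shipper", [4.2, 9.3, 8.4, 5.0]),
]
excursions = total_excursions(providers, low=2.0, high=8.0)
print("flag shipment" if excursions else "shipment ok", excursions)  # flag shipment 2
```

The shipment is flagged from the combined count alone, which is the kind of aggregate answer that limits how much any single data provider has to reveal.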

Decentralized pieces that enable cooperation

Coming back to the Digital Ecosystem concept, all the previous use cases describe situations where independent entities need to cooperate with each other (even when they are competitors) to fulfil a common goal. All the components described above help to compose and provide solutions to these complex scenarios in a decentralized manner.

The original goal of Nevermined was to allow data sharing and privacy-preserving computation. During the development journey, the above building blocks emerged as design principles. After speaking with various organizations, we realized these components are actually foundational elements, which allow us to solve more complex problems beyond classical data center issues.

Please, tell me more …

Most of this is part of the foundations of Nevermined, so you can find more information in the following links:

How do we use Decentralized Identifiers (DID)?
Why Provenance is Important and how we integrated it?
How did we design the integration with corporate identity management systems?
What’s decentralized access control?
How do we orchestrate privacy-preserving computation?

If you want to follow the conversation, feel free to join our community on Discord.

If you would like to know more about the commercial opportunities Nevermined makes available for your organization, or are interested in having a demo, please reach out to us at info@nevermined.io or visit our website.

Thanks to Don Gossen and Jesse Steele for the editorial support.

Originally posted on 2020-12-29 on Medium.