A Conversation on Collaborative Computing with a Senior Expert on Data Privacy
Part of our series: Collaborative Computing — How Web3 can be rocket fuel for the future of Data and AI
When Lawrence Lundy-Bryan, Research Partner at Lunar Ventures and a Nevermined collaborator, wrote ‘The Age of Collaborative Computing’, we were thrilled. We fully subscribe to the idea that we are at the beginning of a fundamental reshaping of how data and AI are used in the economy. And we certainly won’t argue with his prediction that Collaborative Computing is the next trillion-dollar market.
So, to help us understand the challenge of establishing a new product category, we worked with Lawrence to create a series of interviews with experts in the data collaboration space who are buying, selling or investing in this vision.
We previously published 5 of Lawrence’s conversations (links at the bottom). For the final episode, he provided us with insights from his discussion with a Senior Data Privacy Expert at a large European software company who prefers to remain anonymous.
Conversation with a Senior Data Privacy Expert
By Lawrence Lundy-Bryan, Research Partner at Lunar Ventures
Collaborative computing is the next trillion-dollar market. We are at the beginning of a fundamental reshaping of how data is used in the economy. When data can be shared internally and externally without barriers, the value of all data assets can be maximized for private and public value.
To explore this vision in more depth, I spoke with a Senior Expert on Data Privacy, who works for a large European software company. A few key points are:
- Why data utilization is just as much an organizational as technical challenge
- How internal data sharing will drive the use of PETs
- Why we are unlikely to see globally traded data markets anytime soon
How do you help customers inside and outside of your organization make the most of their data assets?
It’s a journey that you go on with different people. The important thing to remember is that every function inside an organization has a different objective. The security folks see data assets as a way to make systems more secure. That’s how they will approach data assets. It’s not that they don’t care or understand the importance of customer privacy or the value of data for analytics, it’s just not their primary concern.
It’s the same for the compliance team: they want to reduce risk and make sure processes around data follow the letter of the law. They want the analytics teams to have access to data, but within the guardrails they have set up.
And the business folks just want to access and use the data they need to make better decisions. None of this is inherently in conflict, but you can see how there will be trade-offs to be made about how to use data assets.
Now, on a day-to-day basis, what this means is that data ends up siloed. The business people do not want to tell compliance exactly how they are using data. Not because they are doing anything wrong; they just don’t want to jump through a load of hoops and slow the team down. Everyone has deadlines to meet, and the reality is that orchestration and communication with all the relevant teams can be slow.
So the inevitable outcome is underutilized data assets. The systems aren’t as secure as they could be. The company is taking more compliance risk than it should. Everyone is worse off. How do we solve this? Well, it’s easy to say and much harder to implement. You want clear business objectives to be prioritized over business-unit objectives. Much of this is cultural, so you want to encourage communication, collaboration and transparency. You want clear lines of communication between teams operating with good-faith intentions.
The term data governance gets used a lot, and I get the impression it’s a catch-all for a lot of different activities. What do you mean when you use the term?
Exactly. There is a clear definition, but people have taken the term and extended it to mean lots of different things. There are maybe three ways of thinking about this. One is company size. Small companies need to survive first and foremost and are building for a small group of customers to start. They aren’t, and shouldn’t be, thinking about 3 years down the line when they have more customers with more data and need to meet all these different rules and regulations. That’s not to say they shouldn’t be thinking about data governance; it makes sense to put in place processes around consent, provenance et cetera to avoid technical debt. But the job is to move fast, and sometimes that means cutting corners. On the other hand, you have big organizations with big legal, compliance, security and data infrastructure teams who are responsible for making sure corners aren’t cut. The big guys have internal policies and frameworks.
The second factor is industry. How important data governance is depends on whether you are in financial services or logistics, for example. Not only are there different regulations, but the costs of poor data governance are higher in some industries, like finance and healthcare. I think industry is maybe more important than size when it comes to data governance. A startup in financial services will be thinking more about data governance than an SME in manufacturing.
The third factor is really just industry at a more granular level: what is the business unit? Or, to take it further, what is the objective of the business? An HR division will have a very different data governance strategy to a multi-cloud IT team: they have different needs and objectives. The importance of this point is that vendors can’t just come in and claim to handle data governance for an entire organization, let alone an entire market. Data processes, culture and objectives are just too different to address in a single solution.
What are your main challenges when it comes to data sharing internally?
If we think specifically about making the most of data assets, usually the conversation turns to sharing the data internally. People may not even think that they are “making the most of the data” when they email over a customer record to a colleague or share the status of a project in procurement over instant messaging. But that’s what we’re talking about: people sharing information constantly across the organization.
There is an odd thing here, though: organizations might talk about data minimisation and think carefully about how they interact with third-party vendors, but, internally, data is all over the place. People know the challenges of moving data from the EU to the US, for example, so there are lots of rules around that externally. But internally, how often do people think about where their colleagues are based and what can and can’t be shared? Within the same organization there is no contractual relationship, so people think very differently about what can and can’t be shared.
Now, when we think about finding ways to make data sharing more secure, confidential computing and other PET tools have an important role. It used to be the case that you didn’t have to worry about data in use, only data in transit. Now, organizations are forced by regulation to think about third-party vendors and how they use the data you send them. So this is where you can imagine a really useful tool that limits what third parties can do with the data you send them. That could mean keeping it totally encrypted, or verifying the integrity of the processing. Different problems will have different requirements, and, again, the level of regulatory requirement matters here too.
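To make the “keeping it totally encrypted” idea a little more concrete, here is a minimal, purely illustrative sketch of one PET building block, additive secret sharing, in plain Python. The party names, values and modulus are our own assumptions for illustration; this is not a description of any specific product or technique mentioned in the interview.

```python
# Illustrative sketch only: additive secret sharing, one of the building
# blocks behind PETs. A value is split into random shares so that no
# single party can read it, yet the parties can still compute a sum.
import secrets

MODULUS = 2**64  # all arithmetic is done modulo a fixed public value


def share(value, n_parties=2):
    """Split `value` into random shares that only reveal it when recombined."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares to recover the original value."""
    return sum(shares) % MODULUS


# Two organizations each hold a private number (say, a revenue figure).
a_shares = share(1200)  # organization A splits its value
b_shares = share(3400)  # organization B splits its value

# Each computing party adds only the shares it holds; neither ever sees
# the other organization's raw input.
party_0 = (a_shares[0] + b_shares[0]) % MODULUS
party_1 = (a_shares[1] + b_shares[1]) % MODULUS

print(reconstruct([party_0, party_1]))  # 4600, computed without exposing either input
```

Real deployments layer attestation, key management and policy on top of primitives like this, which is where the organizational and regulatory questions above come back in.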
Then, how big is the leap from tools that make it easier to share data internally to sharing it externally? It feels like the same cryptography that allows for internal data sharing can easily be opened up.
Yes, that sounds right. It isn’t a big leap once you protect data in use for internal sharing. Then it’s easy to jump externally. Ultimately, it’s the same tools. The big challenge is getting organizational buy-in and changing processes. But once you have legal, compliance, tech and business all aligned on using data sharing tools that use encryption, then bringing in suppliers is the obvious next step. Once you’ve gone beyond the organizational boundary to suppliers, really, you are talking about an ecosystem rather than internal data and external data, so you are in a different world.
How important is geography when thinking about the future of data sharing?
This is arguably the most important point related to data governance or data sharing. Data localisation is already one of the most important considerations for the management of data assets. Governments like China and Russia, and blocs like the EU, all want data to sit where they have some control over it. There are also no global standards on encryption, for example, and countries have different export control requirements which will increasingly apply to data. So basically you have countries that want to control data. But data isn’t static; it flows in and out of pipelines, so you have a very complex environment with unintended consequences.
Then you have questions related to confidential computing, like: can you still move data to a cloud computing company in the US if the data processor can’t read the data? What about federated learning techniques? You might never move the data from the endpoint, but the value is in the trained model, which sits somewhere else. So do we actually want model localisation?
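To see why “model localisation” even comes up, here is a minimal sketch of federated averaging under assumed toy conditions: a simple linear model, three synthetic “sites”, and plain gradient steps. Raw data never leaves each site, but the averaged weights do, and those weights are the artefact a localisation rule would have to address. All names and numbers here are illustrative, not from the interview.

```python
# Illustrative sketch only: federated averaging (FedAvg) on a toy linear
# model. Each site trains on data that never leaves it; only the model
# weights travel to the coordinator and get averaged.
import numpy as np

rng = np.random.default_rng(0)


def local_update(weights, X, y, lr=0.1, steps=10):
    """One site's training pass on its private data (least-squares gradient steps)."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


# Three "sites", each holding private data drawn from the same true model.
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

# Federated rounds: only weight vectors cross the site boundary.
global_w = np.zeros(2)
for _ in range(20):
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    global_w = np.mean(local_ws, axis=0)  # the coordinator never sees raw data

print("learned weights:", global_w)  # approaches [2.0, -1.0]
```

The open question the expert raises is about where the trained weights (here, `global_w`), rather than the raw data, are allowed to live.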
There are lots of open questions here, and it’s certainly not the case that we will end up with global data sharing, because people have different values. In China, people trust the government, so they want the government to be able to see data and protect them, for example. In the EU, the focus is on the individual as the locus of control, so it’s about using technology to protect the freedoms of the individual. Those are just two extreme examples, but every country will have its own culture and set of values that it will apply to data.
Right, so this vision we outlined in the collaborative computing paper, of these global liquid data markets, is just not realistic?
I see a pathway, sure, but it’s quite difficult to get there. I have two major concerns. One is philosophical and the other is practical.
On the philosophical side, the idea of putting a price on data is problematic. First, you have the issue of actually determining who gets to sell the data. Who does it actually belong to, and who has the right to sell it? It works for some data, where the owner collects the data directly, but most data has a long trail, from whoever collected it, to the people who move it, to the service providers. This isn’t a technical problem, it’s a legal one, and it’s hard even putting aside the regional challenges.
The second philosophical issue is that the rich will get richer. What would stop the big companies just going out and buying up all the data they can? What about nation states? Companies and states go to great lengths to get hold of data today. If we have markets where they can just buy it, it’s hard to imagine a world in which the market for data isn’t extremely unequal. You can also imagine private features of these markets that make it hard to know who the buyers and sellers of the data are. We need legal frameworks to even think about these issues before we create markets.
But let’s put aside the philosophical concerns. Practically, this is much harder than it seems, because you would need globally agreed standards. Standardization is hard and takes a long time anyway, but data traded this way is just too broad a thing to standardize. It would be almost impossible to agree on a standard that supports the needs of everyone who uses data. So you would want to narrow down the requirements, and as you do, you will find that today’s big players want to bend the standards towards their needs. And the big players are incentivised to maintain the status quo, because they do pretty well from the data value chain as it is. So it’s hard to see a short-term or long-term pathway for data markets.
It feels like the data consolidation model that has been at the forefront of data utilization strategies has perhaps reached the limits of its efficacy. With the emergence of “Data Mesh”, Collaborative Computing and, more generally, customer centricity, do you see a horizon where a data federation model plays a more significant role in the lifecycle of your data estate?
This is driven by business needs more than anything else. The reality is that we want to do more stuff at the edge, so latency and performance matter more now than they did before. Car companies and manufacturing companies want to process data locally and make decisions quickly. Sending data back and forth across the Internet just takes time and in many cases isn’t necessary.
But that isn’t true for all use cases. Many, if not most, data use cases don’t need immediate action. Data can be sent back to a data warehouse for processing and analytics. We will probably end up talking about “edge computing” a lot, but the reality will be that the vast majority of data will still be consolidated, and use cases that need to run at the edge will run at the edge.
So yes, I guess a data federation model will be more important in the future, but as part of a tech stack designed around business needs rather than a binary edge-versus-cloud dichotomy.
This interview has been edited for clarity and anonymity. It was the last of the 6 interviews that Lawrence did. You can find the other 5 here:
- Jordan Brandt (Inpher)
- Stijn Christiaens (Collibra)
- Rick Hao (SpeedInvest)
- Dr Hyoduk Shin (UC San Diego)
- Flavio Bergamaschi (Intel)
And we’d love to hear from you. Got any questions, suggestions or want to chat about your project? Contact us or join our Discord.