A Conversation on Collaborative Computing with Dr Hyoduk Shin from UC San Diego
Part of our series: Collaborative Computing — How Web3 can be rocket fuel for enterprises’ data strategy.
At Nevermined we are interested in applying Web3 and NFT technology to tackle real business problems in innovative ways. A particular use case we’ve been working on for years is using tokenization and access control to make the sharing of datasets easier.
Regulators around the world have rightly put up barriers to limit the commercial exploitation of personal data. But as privacy-enhancing technologies mature, we are moving towards a situation where we can create secure, multi-stakeholder ecosystems around data.
When Lawrence Lundy-Bryan, Research Partner at Lunar Ventures and a Nevermined collaborator, wrote ‘The Age of Collaborative Computing’, we were thrilled. We fully subscribe to the idea that we are at the beginning of a fundamental reshaping of how data is used in the economy. And we certainly won’t argue with his prediction that Collaborative Computing is the next trillion-dollar market.
So, to help us understand the challenge of establishing a new product category, we worked with Lawrence to create a series of interviews with experts in the data collaboration space who are buying, selling or investing in this vision.
We previously published Lawrence’s conversation with Rick Hao from Speedinvest and with Jordan Brandt, CEO of Inpher. This week, he dives into the academic world.
A conversation with Dr Hyoduk Shin, Associate Professor at UC San Diego
By Lawrence Lundy-Bryan, Research Partner at Lunar Ventures
Collaborative computing is the next trillion dollar market. We are at the beginning of a fundamental reshaping of how data is used in the economy. When data can be shared internally and externally without barriers, the value of all data assets can be maximized for private and public value.
To explore this vision in more depth, I spoke with Dr Hyoduk Shin, an Associate Professor of Innovation, Information Technology and Operations at the Rady School of Management at UC San Diego. Shin’s research interests include forecast information sharing and investment in supply chain management, competitive strategies under operational constraints, and the economics of information technology.
Highlights include:
- How to realign incentives towards data sharing;
- Why, besides technology, culture will be the most important driver;
- How we will end up with a globally fragmented data economy
Let’s start at the macro level: the dominant data strategy is obviously to hoard rather than collaborate. Some of this is just because firms lack tools to share, but a lot of it is plain old business strategy: if data is valuable, then you need to collect and keep more of it, right? Is this changing? And if so, why?
I do see that it’s getting more collaborative, yes. But at the same time, as more firms consider collaborative approaches, it raises issues that maybe were not fully recognised at first. Privacy is of course one that is top of mind. Privacy concerns and regulation are definitely a constraint to collaboration. Just keeping the data stored somewhere, rather than figuring out how to share it, is often the easiest thing to do. In many cases, even if a person or team really sees value in a particular dataset being shared with another team or pooled with other datasets, the process of actually getting sign-off might not be worth the hassle. That said, just storing the data somewhere is no longer necessarily the easy option. With more data being collected and stored, the risks of holding and using that data have also increased. So you have all these questions related to data governance that companies are still trying to figure out.
The biggest challenge isn’t privacy, however; it’s cultural, or more specifically about incentives. For companies there is not a strong enough incentive to share data. The interesting question to think through is how to realign incentives to share. I think there are three main ways to do that:
- Build up a long-term relationship between the data platform and the data supplier. Things like data unions and data trusts have a role to play here.
- The second is to work on culture. Firms can create a non-exploitative data sharing environment. Leadership can encourage staff to find ways to get the maximum value from their data assets, first through culture, but maybe even through performance indicators.
- Finally, you can use time as a way to segment data. Data is likely to be more valuable closer to when it was captured, so we could think about sharing older data as routine practice while keeping fresher data in-house.
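To make that last point concrete, here is a minimal sketch of time-based data segmentation, where records older than a cutoff are flagged as shareable and fresher records stay internal. The 90-day window and the record format are purely our own illustrative assumptions, not something Shin specified.

```python
# Illustrative sketch only: the 90-day cutoff and record format are assumptions.
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(days=90)  # assumed threshold for "fresh" data

def split_by_age(records, now=None):
    """Partition records into (keep_internal, safe_to_share) by capture time."""
    now = now or datetime.now(timezone.utc)
    keep, share = [], []
    for record in records:
        age = now - record["captured_at"]
        (share if age > FRESHNESS_WINDOW else keep).append(record)
    return keep, share

# Example: one fresh record stays internal, one older record can be shared.
records = [
    {"id": 1, "captured_at": datetime.now(timezone.utc) - timedelta(days=5)},
    {"id": 2, "captured_at": datetime.now(timezone.utc) - timedelta(days=400)},
]
keep, share = split_by_age(records)
print(len(keep), "kept internally,", len(share), "shareable")
```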
Are there use cases where you have already seen data sharing happen and is there anything we can learn from those early adopters?
I can talk about two specific areas where we have seen tools used: the semiconductor industry and marketing, each with its own commercial challenges. My work has explored how suppliers can share information with vendors and customers. The semiconductor industry had a problem. Customers typically have to place soft orders well in advance of committing to a firm order, so they have become very good at forecasting. But it is still only forecasting, and because it’s a soft order, it’s rational for them to over-order. And that’s what we observed in the industry: a customer puts in a soft order for 5k units and then actually orders 3k units, so the vendor is left with excess inventory. This is an obvious problem for which some form of data sharing makes sense. It makes sense for the vendor to encourage customers to share real-time demand so they can match supply and demand more efficiently and cut costs.
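As an editorial aside, the soft-order dynamic is easy to put into numbers. The sketch below reuses the 5k soft order and 3k firm order from Shin’s example; the holding cost and the accuracy of a shared real-time demand signal are assumptions we made up purely for illustration.

```python
# Illustrative sketch only: the 5k/3k figures come from the interview example,
# while the holding cost and shared-demand estimate are made-up assumptions.

SOFT_ORDER = 5_000        # units the customer "soft orders" in advance
ACTUAL_ORDER = 3_000      # units the customer actually commits to later
UNIT_HOLDING_COST = 12.0  # assumed cost of carrying one unsold unit

def vendor_exposure(planned_supply: int, actual_demand: int,
                    holding_cost: float) -> float:
    """Cost of inventory left over when supply is planned against a forecast."""
    excess_units = max(planned_supply - actual_demand, 0)
    return excess_units * holding_cost

# Without demand sharing, the vendor builds to the inflated soft order...
cost_without_sharing = vendor_exposure(SOFT_ORDER, ACTUAL_ORDER, UNIT_HOLDING_COST)

# ...with shared real-time demand it can plan much closer to the true signal.
SHARED_DEMAND_SIGNAL = 3_200  # assumed imperfect but far better estimate
cost_with_sharing = vendor_exposure(SHARED_DEMAND_SIGNAL, ACTUAL_ORDER, UNIT_HOLDING_COST)

print(f"Excess-inventory cost without sharing: {cost_without_sharing:,.0f}")
print(f"Excess-inventory cost with sharing:    {cost_with_sharing:,.0f}")
```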
Marketing is another use case. We have seen examples of retailers tracking customer demand and sharing that data with brands, who then resell it on to hedge funds. That’s less an example of data collaboration than of the creation of data products, and of the value those products ultimately have for a customer. There are challenges with this supply chain, in that the buyer at the retail end has no idea how their transaction data is being used. But it speaks to the value of data and the opportunities when firms can package it up and sell it as products.
I’ve found financial services and healthcare to be relatively early adopters of data collaboration tools mainly because of the regulation around privacy and data security. Have you found the same and what other verticals do you expect to be the next adopters?
Confidentiality is the driver here. These industries have highly confidential information that they need to protect. In the case of financial services, there are huge financial gains to be had by sharing data for things like KYC and fraud. Healthcare is less about financial gains, although it is with pharma, and more about the public health gains from aggregating data. A good example of organizations that attempt to strike a balance between confidentiality and the benefits of aggregation are trade associations. They collect confidential information from multiple parties, and those parties enter into an agreement with the association, knowing they are providing some value in the expectation that they will get more back. So the model is not new; it’s just that we don’t have the structures or organizations to share data yet.
When thinking about helping companies utilize their data, a sensible framework is: governance, sharing and monetization. It feels like 95% of companies investing in their data infrastructure are still on data governance, maybe 5% are finding ways to share internally, and <1% are even thinking about monetization yet. Does this sound right to you?
Yes, so far it is rare to see much monetization. Many companies are investing in governance, as I mentioned earlier. This is sort of stage 1, before you can even think of sharing or monetization. It is a known problem and there are lots of companies addressing this need, so we will see it solved relatively soon. How quickly firms then move to data sharing depends on a host of factors. Yes, technology matters and we are indeed lacking infrastructure, but culture will be the driver. Culture is typically the driver for adoption of new technologies in large organizations, where inertia is a particularly strong dynamic. But with data collaboration you have capability and talent issues, as well as the need to develop, teach and learn new processes and workflows. I think we will see start-ups at the data sharing and monetization stage long before larger organizations, for these cultural reasons.
It feels like the data consolidation model that has been at the forefront of data utilization strategies has perhaps reached its limitations in terms of efficacy. With the emergence of “Data Mesh”, Collaborative Computing, and, more generally, customer centricity, do you see a horizon where a data federation model plays a more significant role in the lifecycle of data estates?
Honestly, it will take time. My feeling is that for many companies the capability just isn’t there for any federation work. There is definitely a gap between thought leadership and market adoption when it comes to data collaboration. Large organizations will lack the capacity to do this. A cloud-based end-to-end solution that integrates nicely with existing software is probably how we will see adoption, but as mentioned, that only solves the technology; the cultural part is crucial too.
What cultural, technical or social change would be required for demand in data collaboration to increase 10/100x?
This is a policy question. I don’t see anything technically changing inertia. But regulation is good at forcing change as we have seen with GDPR. The difficulty is that it can’t be country by country, so we would want some agreed standard on data structure or data governance. Maybe done at G8 or OECD level, something like the WTO for data.
As countries are throwing up regulatory barriers to data storage and sharing, PETs like federated learning, fully homomorphic encryption and others are making it easier to process data without moving or even reading it. How do you think about a future regulatory landscape when data can be shared without being moved or read?
If we look to the future, it is indeed possible that technology renders the regulation useless, or at least that it enables firms to do programmatically what regulation intended. Users can then choose between firms in the market. The reality is that different regions will use regulation for different purposes, and so we will see different data outcomes. A likely scenario is regional variation with a different balance of power: South East Asia could design regulation to empower state capacity, while the EU would focus on individuals and the US likely on corporations. That is certainly too simplistic, but the point is that tech won’t render regulation useless.
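For readers new to the privacy-enhancing technologies mentioned in the question, here is a minimal sketch of federated averaging, one way data can be processed without being moved: each organization trains on its own data locally and only model parameters are exchanged with a coordinator. The sites, datasets and hyperparameters below are all invented for illustration; this is a sketch of the general pattern, not of any specific vendor’s implementation.

```python
# Minimal federated-averaging sketch: raw data never leaves each "site";
# only model parameters are exchanged. All data and sites are made up.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=50):
    """Run a few steps of linear-regression gradient descent on local data only."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Two hypothetical organizations with private datasets that are never pooled.
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(2):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    sites.append((X, y))

global_w = np.zeros(2)
for _round in range(5):
    # Each site trains locally and shares only its updated weights.
    local_weights = [local_update(global_w, X, y) for X, y in sites]
    # The coordinator averages the weights; it never sees the raw rows.
    global_w = np.mean(local_weights, axis=0)

print("Learned weights:", np.round(global_w, 2))
```

The point of the pattern is that the coordinator only ever sees aggregated parameters, which is why this family of techniques is often framed as sharing insight rather than sharing data.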
And finally, if we play out our collaborative computing vision, what are some of the implications of a global marketplace of data? Do you imagine companies managing data as an asset and selling more or less of it to manage budgets, for example?
Everything will be priced and will therefore become a financial asset. This is a challenge for lots of reasons. Turning everything into software and pricing everything will have a tendency toward monopoly, and even with data there will be economies of scale. Tons of power will come from that. The marketplaces themselves will be uniquely powerful in determining what can be traded, and if all data is a tradable asset you could have a censorship challenge even bigger than with social media firms today. We can speculate on interesting consequences of data assets; for example, we will need to adapt financial reporting to include them, and there might need to be rules around how and when data can be sold as it relates to the financial year. We aren’t even beginning to think about the consequences of these sorts of things.
As a parting thought, if we think about putting data on the balance sheet, it’s not impossible to imagine the FAANGs increasing their value by 5–10x. If we think about valuations in terms of data, one of Apple, Microsoft, Amazon or Google might already be a $10 trillion company.
In the next part of this series, we’ll publish Lawrence’s conversation with Stan Christiaens, CEO of Collibra, the market-leading data governance platform.
That will cover:
- How competing against non-consumption changes how you sell;
- Why Data Asset Systems of Record are like CRM 20 years ago; and
- Why the Chief Data Officer is becoming one of the most important C-suite roles
Make sure to subscribe to our Medium channel for the next interviews in this series.
And we’d love to hear from you. Got any questions, suggestions or want to chat about your project? Contact us via the website or join our Discord.
Originally posted on 2022-12-13 on Medium.