A Conversation on Collaborative Computing with Rick Hao from Speedinvest
Part 2 of our series: Collaborative Computing — How Web3 can be rocket fuel for enterprises’ data strategy
At Nevermined we are interested in applying Web3 and NFT technology to tackle real business problems in innovative ways. A particular use case we’ve been working on for years is using tokenization and access control to make the sharing of datasets easier.
Barriers have rightly been put up by regulators around the world to limit the commercial exploitation of personal data. But as privacy-enhancing technologies mature, we are moving towards a situation where we can build secure, multi-stakeholder ecosystems around data.
When Lawrence Lundy-Bryan, Research Partner at Lunar Ventures and a Nevermined collaborator, wrote ‘The Age of Collaborative Computing’, we were thrilled. We fully subscribe to the idea that we are at the beginning of a fundamental reshaping of how data is used in the economy. And we certainly won’t argue with his prediction that Collaborative Computing is the next trillion-dollar market.
So, to help us understand the challenge of establishing a new product category, we worked with Lawrence to create a series of interviews with experts in the data collaboration space who are buying, selling or investing in this vision.
Last week, we published Lawrence’s conversation with Jordan Brandt, CEO of Inpher. This week, we follow it up with a view from the investment side.
A conversation with Rick Hao, Partner at Speedinvest
By Lawrence Lundy-Bryan, Research Partner at Lunar Ventures
The premise of Collaborative Computing is that when data can be shared internally and externally without barriers, the value of all data assets can be maximized for both private and public benefit.
To explore this vision in more depth, I spoke with Rick Hao, deep tech investor at Speedinvest, a venture capital fund with more than €450m to invest in early-stage tech startups across Europe. Highlights include:
- Why getting more data will continue to be a business driver for the foreseeable future, despite the trend towards more cost-efficient and data-efficient algorithms at the frontier;
- Why healthcare is likely to need different data infrastructure than other markets; and
- Why machine learning will be the main driver of data sharing tools.
Let’s start at the top: how do you think about investing in the data sharing space? Is the governance, sharing and monetization framework useful?
Broadly, yes, it’s a useful way to think about data management in the enterprise. The use of data within the enterprise continues to grow dramatically. I would say regulations and privacy concerns mean data governance is far and away the most important of those categories today. I think there is either a missing piece or it fits under sharing, but I think a lot about enriching the data pipeline for machine learning algorithms. The reality is that, today, more data means more accurate algorithms, which makes companies want to collect or get access to more data, even if that is third-party data. Companies that are able to use more data securely will be well-positioned to build better AI applications. So I think of data sharing as more of a data acquisition strategy. If data sharing or data aggregation platforms are framed in that way, then there is a massive market.
Financial services and healthcare have a huge need for this. In healthcare, though, we are a little further behind because there just isn’t enough data infrastructure to work with. Even if healthcare providers want to put up data to share with researchers, or pool datasets and share the model, it’s hard for them to do that today. Most use cases in healthcare, like genomic sequencing and drug discovery, can’t just take off-the-shelf tools designed for manufacturing or commerce markets. The workflow processes, data formats, and size of the datasets all require a vertical-specific approach. Speedinvest, as an investor, looks at vertical-specific approaches.
The idea of external data sharing is laughable for many in the enterprise, in the same way open-source server software was laughable in the 1990s, though we are starting to see a shift. What might it take to make the idea of data sharing more acceptable?
I think the two can’t simply be compared, because open-source is more like a type of software business model and data sharing is a type of technology. Open-source server software had lots of measurable benefits. With data sharing, the benefits are more abstract for enterprises right now; there are just fewer concrete ROI examples and less of a clear business case. The benefits seem less clear for business decision makers while the risks are obvious, and that makes it hard. Take synthetic data, for example. In theory there are lots of benefits around reducing the risk of working with personally-identifiable information (PII) and the ability to generate more of it easily. But when you apply those benefits to specific use cases, the ROI is harder to quantify. The way to do it with data sharing is really to tie it to business performance, which means a revenue increase or a cost reduction. You can then say: “enlarging the training dataset by X will improve the algorithm by Y, and therefore increase revenues by Z”. For AI use cases like structured data in financial services, this benefit can be strong.
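To make that X-to-Y-to-Z chain concrete, here is a minimal back-of-the-envelope sketch in Python. The diminishing-returns learning curve, record counts and monetary figures are all illustrative assumptions for the sake of the example, not numbers from the interview.

```python
def accuracy(n_samples, a=0.95, b=0.6, c=0.3):
    # Assumed diminishing-returns learning curve: accuracy saturates towards `a`
    return a - b * n_samples ** -c

current_n = 100_000
acquired_n = 50_000                      # X: extra records obtained via data sharing
lift = accuracy(current_n + acquired_n) - accuracy(current_n)  # Y: model improvement

revenue_per_point = 400_000              # assumed revenue per percentage point of accuracy
acquisition_cost = 60_000                # assumed cost of sourcing and integrating the data

net_value = lift * 100 * revenue_per_point - acquisition_cost  # Z: net revenue impact
print(f"accuracy lift: {lift:.4f}, net value: {net_value:,.0f}")
```

The point of a sketch like this is not the specific numbers but forcing the business case into a single chain from data acquired to revenue gained, which is the framing Rick describes.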
When you think about investing in the enterprise data infrastructure space, how much weight do you place on the technology versus the go-to-market plan?
As a true deep tech investor, I’m always looking for some innovative tech. There has to be a real, hard engineering or scientific challenge being addressed that unlocks a huge market, so the technical risk is worth taking as an investor. I’m very hands-on and will want to at least try the product and take a look at the code. But the reality is I’m looking for some early sign of product-market fit, especially with privacy-preserving technologies. At least some PETs, like federated learning and multi-party computation, don’t require huge technical breakthroughs to scale, unlike, say, fully-homomorphic encryption. So it depends on the specific technology, but, generally, we are at the stage now where I want to see that there is market pull for the solution. The key for data infrastructure is usability. It is not a straightforward deployment, so integration is crucial. Of course, the go-to-market might change, but you can see very early on whether a team understands product and solving customer pain points rather than just building technology.
What are your thoughts on distributed computation or federated learning for enterprise? The pitch of running analytics locally without sending data back to a remote server feels like a compelling proposition. Do you think it will catch on?
Yes, it will be one of the hottest topics in machine learning in the next few years. Privacy-preserving machine learning is not just a buzzword; it will start to reach the C-suite as a way of addressing privacy concerns. It’s still a relatively early topic for enterprise: many firms are still in the early stages of the data management journey and haven’t reached the stage of doing large-scale machine learning in production yet. So, for them, this topic may still take some time to be considered.
I strongly believe that federated learning is going to be very important, and the market timing is right for investors to make investments in this area. I expect federated learning or distributed computation projects to take a community-based business model to start gaining adoption with developers. The focus must be on making this new distributed workflow as easy to use as possible.
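For readers unfamiliar with the mechanics behind that pitch, here is a minimal sketch of the federated averaging idea: each client trains on its own data locally and only model parameters are sent to a coordinator. It assumes a simple linear model and synthetic data; the names and numbers are illustrative, not taken from any specific framework.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

def make_client_data(n=200):
    # Synthetic private dataset held by one client
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

clients = [make_client_data() for _ in range(5)]  # raw data stays on each client

def local_update(w, X, y, lr=0.05, epochs=20):
    # Each client runs gradient descent on its own data only
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(3)
for _ in range(10):
    # Only model parameters travel to the coordinator; it never sees raw records
    local_weights = [local_update(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_weights, axis=0)  # average the client models

print(w_global)  # approaches true_w without centralizing any data
```

Production systems add secure aggregation, weighting by client dataset size, and handling of stragglers, but the workflow above is the core of the "analytics without moving the data" proposition.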
The crypto market has been raising the bar in private and collaborative data sharing with things like zero-knowledge proofs and MPC. Do you think this work is being ignored or underestimated because it is happening in the crypto world?
My personal view, beyond the technology, is that zero-knowledge proof technology has value. It is just so useful for so many use cases, but today we only see it applied successfully to blockchain technology. That seems to be a strong use case that needs a scalable way to validate transactions, but we haven’t seen many other really clear industry use cases. I don’t think it’s being underestimated as such; it’s just that either companies aren’t taking the technology and applying it to industry use cases yet, or the pain points just aren’t strong enough yet. The question I would ask is: is there an alternative and easier way to solve the same problem without using complicated cryptography?
For data markets to be realized, we will need strong privacy tools, but also a way to track, pay and exchange the data quickly. Can you imagine widespread data markets not built on blockchain technology? If not on open infrastructure, how do we keep the markets themselves open? Or should we be comfortable repeating the same mistakes as Facebook, when it comes to data rights?
Right now you can’t look past the privacy problems. We are still working on protecting privacy, but even if you suppose data assets are private, you still have the challenge of the ownership or rights attached to the assets being sold. Most data has a long value chain, and it’s rarely as simple as a single creator and owner. So how can we attribute ownership or property rights to data assets that might be traded hundreds or thousands of times in a market? Data exchange markets are part of the future, but they will need good data sharing infrastructure to be available. Blockchain could be a solution, but for enterprise I think a federated learning approach could be a better fit, since it is easier to implement and integrate with existing systems.
Finally, the idea of ‘collaborative computing’ is a computing environment in which strong privacy tools are built into software and data infrastructure, letting anyone, anywhere, package up and trade data on a global market. Much in the same way encryption with SSL allowed anybody, anywhere, to securely transact with someone else on the web. How much does this idea resonate?
It resonates, but it’s hard to see a pathway for this in the short term. There are just so many moving pieces that go into creating a marketplace for an asset that isn’t traded today. I think the big open question is how to resolve the ownership piece and then share the data securely. I think confidential computing would be a fundamental layer for collaborative computing.
In the next part of this series, we’ll publish Lawrence’s conversation with Dr Hyoduk Shin, an Associate Professor of Innovation, Information Technology and Operations at the Rady School of Management at UC San Diego.
That will cover:
- How to realign incentives towards data sharing;
- Why, besides technology, culture will be the most important driver;
- How we will end up with a globally fragmented data economy.
Make sure to subscribe to our Medium channel for the next installments of this series.
And we’d love to hear from you. Got any questions, suggestions or want to chat about your project? Contact us via the website or join our Discord.
Originally posted on Medium on 2022-11-23.