Data Is Rarely A Moat: Three Cases Where It’s Not (And One Where It Is)
BY: Sandra Carrico, Vice President Engineering & Chief Data Scientist at GLYNT.AI
According to Gartner, corporate spending on artificial intelligence (AI) is expected to grow 70% over 2017 levels, and many companies are now developing strategies for AI and the data it needs. Too often, the centerpiece is a set of mantras: Our data is unique and valuable. More data is better. Data is the new gold. But there is more than one way to monetize data, and by thinking through the options, you may arrive at an AI strategy that is more profitable and less risky.
AI Is Data Hungry
The nature of AI models is driving much of the strategic thinking around data, and possibly much of the confusion, too. Today, best-in-class deep learning models tend to produce their first usable results after training on data sets of 15,000 examples or more. As the number of examples increases, model performance rises, and at 150,000 to 300,000 examples the model typically reaches accuracy above 95%, an acceptable performance level for most corporate uses.
Data strategy thinking frequently goes as follows: The need for huge piles of examples to train a powerful AI system creates a barrier to entry. Only companies that can amass large data sets, pay to have that data labeled with correct answers for AI training and build large AI models can move forward with an AI-based strategy.
But sustainable competitive advantage (a moat) comes not from the existence of data but from how that data is monetized. As the following examples show, it can be wise to let others have your data and spearhead the AI. Sound crazy? Read on!
The Four Quadrants Of AI And Data
An AI and data strategy is defined by two questions: Is your data unique, and must your AI models be proprietary, or can another organization build them? Consider the four combinations:
1. Unique in-house data that an AI-powered service or application exploits.
This is the case most current strategies focus on, but it occurs far less often than expected. Consider YKK, the company that sells nearly 50% of the world’s zippers. Its depth and breadth of zipper data could enable important zipper innovations, customized products and so on. Unique data plus AI equals an even more compelling product, and building the AI capability in-house keeps the moat around the data.
On the other hand, consider Uber or Lyft. Each has data on the supply of and demand for local ride services, acquired through its app. But a new market entrant, a hypothetical Ride-X, could start offering the same service and collect the same kind of data. Uber and Lyft’s data does not prevent Ride-X from entering. What prevents entry is the difficulty of attracting drivers and customers. Uber and Lyft have used marketing investment and customer stickiness to build a moat around their business models.
Try the same thought experiment on your own business by asking, “Would a venture capitalist fund an AI-driven startup to compete against me? Does my data give me a moat?” If you can’t see how an entrant could compete, congratulations! You have a very rare, protected business model. If you see that disruptive startup coming at you, read on.
2. Unique data used by another organization in their AI models.
Medical record data is hugely valuable because it feeds AI models that detect diseases. Hospitals are sitting on millions of patient records but typically don’t have in-house AI teams. Not surprisingly, hospitals are forming partnerships with pharmaceutical companies and other research organizations.
This strategy monetizes unique data by transferring it to outside AI researchers. The hospital’s core product is patient services, while the research company’s core product is disease detection. The hospital still gains value from its data without having to build its own AI team.
3. Non-unique data used by in-house AI models to gain competitive advantage.
Non-unique data can be monetized, too, and this has been happening for years in the lending industry. Credit bureaus and other data aggregators have amassed rich databases on potential borrowers. They sell copies of this data to lenders, who use it in their in-house models.
Lenders can add their own applicant and borrower data, and the result is a better lending model, whether powered by AI or not. The credit bureaus resell the data millions of times per year, so it is not unique. Still, the purchased data improves the lender’s core product, and so there is demand for non-unique data.
4. Non-unique data used by another organization in their AI models.
Let’s continue with lending but shift where the AI modeling happens. Suppose a new company, Lend-X, emerges that uses AI to build a better lending model: one that can identify the loan applicants most likely to default.
A bank might consider the following strategy: Instead of building its own AI team, it provides its applicant and borrower data to Lend-X. Lend-X makes similar arrangements with others and amasses a large data set, leading to a powerful AI model of loan defaults. The bank can use its data as a bargaining chip, gaining access to the lending model at low cost. Or it can take equity in Lend-X, capturing value from its own data and from that of others.
Standard data strategies tend to overestimate the value of a company’s own data. Broader thinking not only reveals new ways to monetize data but can also help a company avoid the considerable expense and risk of building a strong in-house AI capability. Data can be monetized without being unique. There is a big payoff in double-checking your data and AI strategy.