Zetta Venture Partners | First Believers in AI Startups

Data rights are the new IP rights

October 2017

As more sophisticated resources for developers become widely available, copycat products can now be launched in a matter of hours. Software patents have provided some limited protection, but feature wars rage on. Software without data is now a commodity.

These pressures make AI and the data that feed it more valuable than ever. There is no usable AI without data; AIs need data to train to minimum algorithmic performance (MAP) before they can demonstrate value to potential users and attract customers. New customers bring in more data, which is used to improve the algorithm’s performance, which attracts new customers, and so on. Each iteration of this feedback loop — which Zetta calls the virtuous loop — digs a deeper competitive moat.

Continued access to usable data is crucial to keeping this feedback loop moving. As a result, data rights have become the new IP rights in today’s Intelligence Era of startups. This presents opportunities and challenges for emerging startups.

Startups have a ‘clean start’ advantage

Customers were hesitant to entrust their data to an outside party at the dawn of the previous era of startups (the Cloud Era). To assuage these concerns, cloud-era startups would explicitly forgo all rights to the customer data they managed. Many of those agreements are still in place today, hampering cloud-era startups in their attempts to apply intelligence to their products. These startups must now either have the challenging conversation of renegotiating data rights with their existing customer base or go on an acquisition spree to get data.

Startups in the Intelligence Era are approaching customers who are more comfortable letting third parties manage their data, which enables a different conversation about that data. While cloud storage and computing have grown exponentially, intelligence-era applications require more infrastructure and high-touch data handling to capture the relevant data, then clean, label, query, and analyze it to provide actionable predictions and automation. Many of the enterprise customers that grew comfortable entrusting their data to cloud vendors remain wary of giving these vendors deeper access to it, because the vendors may become competitors. Startups present less of a competitive threat and are better positioned to negotiate the rights to use this data.

Bootstrapping data to jumpstart the virtuous loop

Demonstrating value to gain leverage in negotiating data rights with early customers presents a chicken-and-egg problem for intelligence-era startups: the product needs data before it can show value, but customers want to see value before they share data. For many applications, startups can jumpstart the virtuous loop by finding alternative sources of data to train the learning algorithm. Here are some possible approaches:

Target SMB and mid-market customers because they tend to have more liberal attitudes toward data rights, especially when data is exchanged for useful products at reduced prices. These smaller early customers can also serve as references for larger customers to see the value of contributing data to the training pool;
Hire people to train the algorithms, either as full-time employees or via Mechanical Turk;
Find an external source of data, such as publicly available datasets from government agencies, data purchased from third-party vendors such as Clearbit, or data scraped from relevant websites and social media;
Provide a freemium version of the flagship product to capture user engagement data; and
Sell a desirable side product at cost in order to capture the data, a strategy Tesla has employed to build a massive dataset for training self-driving cars.

Many of these external data sources can be sufficient to train a learning algorithm to a level of performance high enough to demonstrate value and attract enterprise customers. It is imperative that intelligence-era startups build proprietary data pipelines in order to benefit from the compounding effects of pooled learnings across a customer network, which make it difficult for new entrants and emerging copycats to catch up.

Strategies to structure data rights

While startups have an advantage over large incumbents in obtaining data rights, the negotiation is rarely easy. The following is an all-too-common story: a startup approaches a large enterprise with an incredible demo of a new, AI-powered workflow that promises to save the enterprise thousands of employee-hours by automating a tedious and time-consuming task. The product ingests the company’s historical sales data, using it to qualify new sales leads and suggest the optimal time to call. The flashy demo blows the enterprise away and a limited, sandboxed pilot seals the deal. The enterprise is ready to buy and roll out the solution company-wide as soon as possible. Unfortunately, the discussions get stuck in limbo as the deal goes to the enterprise’s chief compliance officer and lawyers for review: they will not allow the startup to access the enterprise’s data, lest it fall into the hands of the competition, but the startup’s product is less valuable to the enterprise without the relevant data to train it.

Startups can get in front of these concerns by making it clear from the outset of the negotiation that their main interest is in learning from the data and the data exhaust (such as user engagement and interaction data, metadata, and data flow information). As Zetta’s partner companies report, the first data rights negotiations are the most difficult. Over time, as the pool of data grows, it becomes easier to demonstrate the value of the product and the network effects of fellow customers. Startups gain more leverage in negotiating data rights after securing the initial wave of customers and their data.

A profound gamechanger

In the Cloud Era, companies competed by releasing new features, which are easy to copy. Consequently, absolute market dominance was harder to achieve, and second-place players exist in many categories. The virtuous loop presents an opportunity for companies to achieve ‘winner takes most’ status, which was previously limited to consumer categories. Strategies to achieve this lead could include obtaining exclusive rights to data, accumulating customer data, and forming partnerships.

Once a startup reaches a critical mass of data, incumbents and upstart rivals can no longer outspend the market leader to close the gap. For the first time in history, technology companies have an opportunity to establish robust protection against legacy incumbents and emerging copycats, far beyond what traditional intellectual property strategies have been able to offer.