What is minimum viable data?


As many focus on the possibilities and limitations of AI tools and their output, Ovetta Sampson reminds us to pay attention to the power of the input.


Hero illustration by Thomas Merceron
After two decades in journalism and the last five years in tech leadership roles, Ovetta Sampson knows a thing or two about navigating the human experience. As Director of User Experience Machine Learning at Google, she leads a team aiming to make ML and AI accessible and useful to more people. While many focus on how to build more powerful models and AI features, Ovetta continues to champion the importance of what she calls minimum viable data, data that is fully representative of the end user as well as the business. We sat down with Ovetta to learn more about how quality data is essential for building quality AI.
At Config 2023, Ovetta made the case that there were many creative things people could do that generative AI wasn’t capable of yet.

You helped create the design industry’s first set of AI ethics principles, which sheds light on how people influence data. What are some of your biggest takeaways from that work?

There’s an idea in technology that we measure what matters. If we’re not measured, do we exist? And if we don’t exist, how can we ensure we’re not harmed by the model’s outcomes? Sometimes, product builders either leave out the people, or they over-index on the data—they forget that those data points are connected to human beings. The result is traumatized data sets—data infused with the trauma of social, cultural, and economic practices.

This article is part of The Prompt, an online and print magazine by Figma and designed by Chloe Scheffe.
Consider the pool of mortgage data that makes up creditworthiness models in the U.S. The math formula that underlies FICO scores was written in 1958, but women couldn’t take out mortgages or sign for credit cards until the 1970s. Similarly, the oldest data set that we have is the U.S. Census, which began in 1790 and didn’t recognize LGBTQ individuals until 2021. So what happened? Did they exist before that? The worst thing you can do with ML and AI is be careless about the data, and therefore the people, you omit.
Minimum viable data is a call to product builders to pay attention to the quality of the data a product needs to make its business case, and to consider if ML and AI are the right tools for the challenge. There is no AI and ML without data, and there is no data without people. That data was generated, created, engineered, and transformed by a human.

What challenges do you see between people and data in the current state of ML and AI?

The quality of an AI’s output depends 99.9% on the input, namely the data. The power is in the input. I could give you a prediction of how your model is going to behave based on the data set that you train it with. The input is what creates these model outcomes. There’s nothing else to talk about, to be quite frank.
Who gets to decide what’s good or bad? Who gets to decide what gets into the input? Who gets to decide how much data they need to be able to solve a problem? Those are the fundamental questions.


How do you teach future product builders the importance of quality data?

Ovetta’s recommended reading:
- “Weapons of Math Destruction” by Cathy O’Neil
- “Ghost Work” by Mary L. Gray and Siddharth Suri
- “Everyone wants to do the model work, not the data work” by Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Kumar Paritosh, and Lora Mois Aroyo
So when we think about minimum viable data, what is the problem you’re trying to solve? What is the applicability and desirability of both ML and AI? Just because you can use AI or ML, does it mean you should? And is the data that’s going to be used to solve this problem indeed equitable and of high quality? Is it the minimum viable solution you need to solve this problem and not raise the human engagement risks? Those are the things you need to consider when you say, “I’m going to use ML and AI to solve a problem.” Well, who are you solving it for? That goes back to the data that you’re collecting.

How can we put ourselves back in the driver’s seat as AI technology continues to accelerate?

For so long, the public has been silent on ML and AI, but now you and I have access to it. We can control the software. With generative AI, we can build anything we want just by talking. Now that more of us have been exposed to AI, we’ll realize our place in it. The box is open, and now that people are aware, next comes advocacy. One great start is just to be aware of how important you are to these industries. You have a role in this technology, not just as a consumer.





