Surveys have been used to gain insights on populations, products and public opinion since time immemorial. And while methodologies might have changed through the millennia, one thing has remained constant: the need for people, lots of people.
But what if you can’t find enough people to build a large enough sample group to generate meaningful results? Or, what if you could potentially find enough people, but budget constraints limit the number of people you can source and interview?
This is where Fairgen wants to help. The Israeli startup today launched a platform that uses “statistical AI” to generate synthetic data that it says is as good as the real thing. The company is also announcing a fresh $5.5 million fundraise from Maverick Ventures Israel, The Creator Fund, Tal Ventures, Ignia and a handful of angel investors, taking its total cash raised since inception to $8 million.
“Fake data”
Data might be the lifeblood of AI, but it has also been the cornerstone of market research since forever. So when the two worlds collide, as they do in Fairgen’s world, the need for quality data becomes a little bit more pronounced.
Founded in Tel Aviv, Israel, in 2021, Fairgen was previously focused on tackling bias in AI. But in late 2022, the company pivoted to a new product, Fairboost, which it is now launching out of beta.
Fairboost promises to “boost” a smaller dataset by up to three times, enabling more granular insights into niches that may otherwise be too difficult or expensive to reach. Using this, companies can train a deep machine learning model for each dataset they upload to the Fairgen platform, with statistical AI learning patterns across the different survey segments.
The concept of “synthetic data” (data created artificially rather than from real-world events) isn’t novel. Its roots go back to the early days of computing, when it was used to test software and algorithms, and simulate processes. But synthetic data, as we understand it today, has taken on a life of its own, particularly with the advent of machine learning, where it is increasingly used to train models. Artificially generated data that contains no sensitive information can address both data scarcity issues and data privacy concerns.
Fairgen is the latest startup to put synthetic data to the test, and it has market research as its primary target. It’s worth noting that Fairgen doesn’t produce data out of thin air, or throw millions of historical surveys into an AI-powered melting pot: market researchers need to run a survey for a small sample of their target market, and from that, Fairgen establishes patterns to expand the sample. The company says it can guarantee at least a two-fold boost on the original sample, but on average, it can achieve a three-fold boost.
In this way, Fairgen might be able to establish that someone of a particular age bracket and/or income level is more inclined to answer a question in a certain way. Or, it might combine any number of data points to extrapolate from the original dataset. It’s basically about generating what Fairgen co-founder and CEO Samuel Cohen says are “stronger, more robust segments of data, with a lower margin of error.”
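To make the “lower margin of error” claim concrete, here is a minimal sketch using the textbook normal-approximation formula for a survey proportion. It assumes, purely for illustration, that a three-fold boost behaves like a three-fold increase in effective sample size; that equivalence is an assumption of this example, not something the company’s description confirms, and the segment size of 40 is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    n: number of respondents, p: observed proportion (0.5 is the worst case),
    z: critical value (1.96 corresponds to roughly 95% confidence).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


real_n = 40             # hypothetical size of a hard-to-reach segment
boosted_n = 3 * real_n  # ASSUMPTION: a 3x boost acts like 3x the effective sample size

moe_real = margin_of_error(real_n)
moe_boosted = margin_of_error(boosted_n)

print(f"real segment:    +/- {moe_real:.1%}")
print(f"boosted segment: +/- {moe_boosted:.1%}")
```

Under that assumption the margin of error shrinks by a factor of the square root of three, not by three, which is why sample size gains show diminishing returns.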
“The main realization was that people are becoming increasingly diverse. Brands need to adapt to that, and they need to understand their customer segments,” Cohen explained to TechCrunch. “Segments are very different: Gen Zs think differently from older people. And in order to be able to have this market understanding at the segment level, it costs a lot of money, takes a lot of time and operational resources. And that’s where I realized the pain point was. We knew that synthetic data had a role to play there.”
An obvious criticism, one that the company concedes it has contended with, is that this all sounds like a massive shortcut to having to go out into the field, interview real people and collect real opinions.
Surely any under-represented group ought to be concerned that their real voices are being replaced by, well, fake voices?
“Every single customer we talked to in the research space has huge blind spots: totally hard-to-reach audiences,” Fairgen’s head of growth, Fernando Zatz, told TechCrunch. “They actually don’t sell projects because there are not enough people available, especially in an increasingly diverse world where you have a lot of market segmentation. Sometimes they cannot go into specific countries; they cannot go into specific demographics, so they actually lose on projects because they cannot reach their quotas. They have a minimum number [of respondents], and if they don’t reach that number, they don’t sell the insights.”
Fairgen isn’t the only company applying generative AI to the field of market research. Qualtrics last year said it was investing $500 million over four years to bring generative AI to its platform, though with a substantive focus on qualitative research. However, it is further evidence that synthetic data is here, and here to stay.
But validating results will play an important part in convincing people that this is the real deal and not some cost-cutting measure that will produce suboptimal results. Fairgen does this by comparing a “real” sample boost with a “synthetic” sample boost: it takes a small sample of the dataset, extrapolates it and puts it side-by-side with the real thing.
“With every single customer we sign up, we do this exact same kind of test,” Cohen said.
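The side-by-side comparison described above can be sketched as a small evaluation harness. This is not Fairgen’s method: the error metric (mean absolute difference in answer shares) is one reasonable choice among many, and every response list below is made-up toy data, with `boosted` standing in for whatever a synthetic boost of the small sample would return. The numbers only show how the comparison is scored; they are not evidence about how well boosting works.

```python
from collections import Counter


def answer_shares(responses):
    """Share of each answer option in a list of responses."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return {option: count / len(responses) for option, count in counts.items()}


def mean_abs_error(estimate, reference):
    """Mean absolute difference in shares over all options seen in either sample."""
    options = set(estimate) | set(reference)
    total = sum(abs(estimate.get(o, 0.0) - reference.get(o, 0.0)) for o in options)
    return total / len(options)


# Hypothetical answers to one question within one segment.
full_real = ["yes"] * 60 + ["no"] * 40   # the "real thing": a large real sample
small_real = ["yes"] * 7 + ["no"] * 3    # small subsample taken from it
boosted = ["yes"] * 19 + ["no"] * 11     # stand-in for a synthetic boost of small_real

reference = answer_shares(full_real)
err_small = mean_abs_error(answer_shares(small_real), reference)
err_boosted = mean_abs_error(answer_shares(boosted), reference)

print(f"error of small real sample: {err_small:.3f}")
print(f"error of boosted sample:    {err_boosted:.3f}")
```

A boost would pass this kind of check only if the boosted sample’s error against the large real sample is consistently lower than the small sample’s error across many questions and segments.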
Statistically speaking
Cohen has an MSc in statistical science from the University of Oxford, and a PhD in machine learning from London’s UCL, part of which involved a nine-month stint as a research scientist at Meta.
One of the company’s co-founders is chairman Benny Schnaider, who was previously in the enterprise software space, with four exits to his name: Ravello to Oracle for a reported $500 million in 2016; Qumranet to Red Hat for $107 million in 2008; P-Cube to Cisco for $200 million in 2004; and Pentacom to Cisco for $118 million in 2000.
And then there’s Emmanuel Candès, professor of statistics and electrical engineering at Stanford University, who serves as Fairgen’s lead scientific advisor.
This business and mathematical backbone is a major selling point for a company trying to convince the world that fake data can be every bit as good as real data, if applied correctly. It is also how the company is able to clearly explain the thresholds and limitations of its technology: how big the samples need to be to achieve the optimum boosts.
According to Cohen, the company ideally needs at least 300 real respondents for a survey, and from that Fairboost can boost a segment constituting no more than 15% of the broader survey.
“Below 15%, we can guarantee an average 3x boost after validating it with hundreds of parallel tests,” Cohen said. “Statistically, the gains are less dramatic above 15%. The data already presents good confidence levels, and our synthetic respondents can only potentially match them or bring a marginal uplift. Business-wise, there is also no pain point above 15%: brands can already take learnings from these groups; they are only stuck at the niche level.”
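The two thresholds Cohen cites can be read as a simple rule of thumb. The sketch below encodes them as a check; the function name and the hard cutoffs are assumptions for illustration, since Cohen describes 300 respondents as an ideal minimum, and nothing here reflects how the platform itself decides.

```python
MIN_RESPONDENTS = 300      # "at least 300 real respondents" for the survey, per Cohen
MAX_SEGMENT_SHARE = 0.15   # segment should be "no more than 15%" of the survey


def is_boost_candidate(total_respondents, segment_respondents):
    """Return True if a segment falls inside the range Cohen describes (hypothetical helper)."""
    if total_respondents <= 0:
        raise ValueError("total_respondents must be positive")
    if not 0 <= segment_respondents <= total_respondents:
        raise ValueError("segment_respondents must be between 0 and total_respondents")
    share = segment_respondents / total_respondents
    return total_respondents >= MIN_RESPONDENTS and share <= MAX_SEGMENT_SHARE


print(is_boost_candidate(1000, 100))  # 10% niche segment in a 1,000-person survey
print(is_boost_candidate(1000, 300))  # 30% segment: already well represented
print(is_boost_candidate(200, 20))    # survey below the 300-respondent guideline
```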
The no-LLM factor
It’s worth noting that Fairgen doesn’t use large language models (LLMs), and its platform doesn’t generate “plain English” responses à la ChatGPT. The reason for this is that an LLM will use learnings from myriad other data sources outside the parameters of the study, which increases the chances of introducing bias that is incompatible with quantitative research.
Fairgen is all about statistical models and tabular data, and its training relies solely on the data contained within the uploaded dataset. That effectively allows market researchers to generate new and synthetic respondents by extrapolating from adjacent segments in the survey.
“We don’t use any LLMs for a very simple reason, which is that if we were to pre-train on a lot of [other] surveys, it would just convey misinformation,” Cohen said. “Because you’d have cases where it’s learned something in another survey, and we don’t want that. It’s all about reliability.”
In terms of business model, Fairgen is sold as a SaaS, with companies uploading their surveys in whatever structured format (.CSV, or .SAV) to Fairgen’s cloud-based platform. According to Cohen, it takes up to 20 minutes to train the model on the survey data it’s given, depending on the number of questions. The user then selects a “segment” (a subset of respondents that share certain characteristics), e.g. “Gen Z working in industry x,” and then Fairgen delivers a new file structured identically to the original training file, with the exact same questions, just new rows.
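For readers who think in terms of files, the idea of a “segment” and of an output “structured identically” to the input can be illustrated with a few lines of standard-library Python. The column names, the tiny inline CSV and both helper functions are hypothetical; this only shows what selecting a segment and checking for an identical column layout mean, not how Fairgen’s platform handles uploads.

```python
import csv
import io

# Hypothetical survey export: one row per respondent, one column per question.
survey_csv = """respondent_id,generation,industry,q1
1,Gen Z,retail,yes
2,Millennial,retail,no
3,Gen Z,finance,yes
4,Gen Z,retail,no
"""


def read_rows(text):
    """Parse CSV text into (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    return reader.fieldnames, list(reader)


def select_segment(rows, **criteria):
    """Keep respondents whose columns match every given characteristic."""
    return [row for row in rows if all(row[k] == v for k, v in criteria.items())]


def same_structure(original_columns, new_columns):
    """True if a delivered file has exactly the same columns in the same order."""
    return list(original_columns) == list(new_columns)


columns, rows = read_rows(survey_csv)
segment = select_segment(rows, generation="Gen Z", industry="retail")

print(len(segment), "real respondents in the segment")
print(same_structure(columns, ["respondent_id", "generation", "industry", "q1"]))
```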
Fairgen is being used by BVA and French polling and market research firm IFOP, which have already integrated the startup’s tech into their services. IFOP, which is a little like Gallup in the U.S., is using Fairgen for polling purposes in the European elections, though Cohen thinks it might end up being used for the U.S. elections later this year, too.
“IFOP are basically our stamp of approval, because they have been around for like 100 years,” Cohen said. “They validated the technology and were our original design partner. We’re also testing or already integrating with some of the largest market research companies in the world, which I’m not allowed to talk about yet.”