Generative AI has captured the public imagination with a leap into creating elaborate, plausibly real text and imagery out of verbal prompts. But the catch (and there is usually a catch) is that the results are often far from perfect when you look a bit closer.
People point out strange fingers, floor tiles slip away, and math problems are precisely that: problematically, sometimes they don’t add up.
Now, Synthesia, one of the ambitious AI startups working in video (specifically, custom avatars designed for business users to create promotional, training and other enterprise video content), is releasing an update that it hopes will help it leapfrog over some of the challenges in its particular field. Its latest version features avatars, built based on actual humans captured in its studio, which offer more emotion, better lip tracking and what it says are more expressive, natural and human movements when they are fed text to generate videos.
The release is coming on the heels of some impressive progress for the company to date. Unlike other generative AI players like OpenAI, which has built a two-pronged strategy (raising huge public awareness with consumer tools like ChatGPT while also building out a B2B offering, with its APIs used by independent developers as well as giant enterprises), Synthesia is leaning into the approach that some other prominent AI startups are taking.
Much like how Perplexity’s focus is on really nailing generative AI search, Synthesia is focused on really nailing how to build the most humanlike generative video avatars possible. More specifically, it’s trying to do this only for the business market and use cases like training and marketing.
That focus has helped Synthesia stand out in what’s become a very crowded market in AI that runs the risk of getting commoditized when hype settles down into more long-term concerns like ARR, unit economics and operational costs attached to AI implementations.
Synthesia describes its new Expressive Avatars, the version being released today, as a first of their kind: “The world’s first avatars fully generated with AI.” Built on large, pre-trained models, Synthesia says its breakthrough has been in how they’re combined to achieve multimodal distributions that more closely mimic how actual humans speak.
These are generated on the fly, Synthesia says, which is meant to be closer to the experience we go through when we speak or react in life, and stands in contrast to how a lot of AI video tools based around avatars work today: typically these are actually many pieces of video that get quickly stitched together to create facial responses that line up, more or less, with the scripts that are fed into them. The aim is to appear less robotic, and more lifelike.
Previous version:
New version:
As you can see in the two examples here, one from Synthesia’s older version and the one being released today, there is still a way to go in development, something CEO Victor Riparbelli himself also admits.
“Of course it’s not 100% there yet, but it’s going to be very, very soon, by the end of the year. It’ll be so mind blowing,” he told TechCrunch. “I think you can also see that the AI part of this is very subtle. With humans there’s so much information in the tiniest details, the tiniest like movements of our facial muscles. I think we could never sit down and describe, ‘yes you smile like this when you’re happy but that’s fake right?’ That’s such a complex thing to ever describe for humans, but it can be [captured in] deep learning networks. They’re actually able to figure out the pattern and then replicate it in a predictable way.” The next thing it’s working on, he added, is hands.
“Hands are like, super hard,” he added.
The focus on B2B also helps Synthesia anchor its messaging and product more on “safe” AI usage. That is essential, especially with the huge concern today over deepfakes and the use of AI for malicious purposes like misinformation and fraud. Even so, Synthesia hasn’t managed to avoid controversy on that front altogether. As we’ve pointed out before, Synthesia’s tech has previously been misused to produce propaganda in Venezuela and false news reports promoted by pro-China social media accounts.
The company today noted that it has taken further steps to try to lock down that usage. Last month, it updated its policies, it said, “to restrict the type of content people can make, investing in the early detection of bad faith actors, increasing the teams that work on AI safety, and experimenting with content credentials technologies such as C2PA.”
Despite those challenges, the company has continued to grow.
Synthesia was last valued at $1 billion when it raised $90 million. Notably, that fundraise was almost a year ago, in June 2023.
Riparbelli (pictured above, right, with other co-founders Steffen Tjerrild, Professor Lourdes Agapito, Professor Matthias Niessner) said in an interview earlier this month that there are currently no plans to raise more, although that doesn’t really answer the question of whether Synthesia is getting proactively approached. (Note: we’re very excited to have the actual human Riparbelli speaking at an event of ours in London in May, where I’m definitely going to ask about this again. Please come if you’re in town.)
What we do know for sure is that AI costs a lot of money to build and run, and Synthesia has been building and running a lot.
Prior to the launch of today’s version, some 200,000 people have created more than 18 million video presentations across some 130 languages using Synthesia’s 225 legacy avatars, the company said. (It doesn’t break out how many users are on its paid tiers, but there are a lot of big-name customers including Zoom, the BBC, DuPont and more, and enterprises do pay.) The startup’s hope, of course, is that with the new version getting pushed out today those numbers will go up even more.