I started experimenting with text-to-video avatars in August 2024 with “Synthesia 2.0”, roughly two months after it was announced in June, and later tried HeyGen as well.
Personal avatars are a novelty item today, but I have high conviction that they will fundamentally transform how we communicate in a private, professional setting (e.g., internal corporate comms), as well as how we share ideas publicly (e.g., social media).
My Take: By 2030, a sizable portion of video podcasts will be multi-avatar text-to-video productions.
While the long-term transformative impact is real, producing high-quality output today still requires thoughtful human involvement.
Here are two examples I found:
Scripting is an art - like prompting or speechwriting. It’s easier than it used to be, but harder than it looks. For example, AI still struggles to pronounce culturally nuanced words like “meme.” I expect similar challenges in other fast-evolving domains with emerging language. For anything time-sensitive or trend-driven, human oversight is essential.
Knowing how to configure the tools and edit in post-production is a real skill. It’s learnable (I picked up the basics myself), but it takes time - and I’ll never be as good as someone who does it full-time.
For the hobbyist, “good enough” production value is often sufficient - but even that takes effort. Most will move on once the novelty wears off.
For solopreneurs, solo GPs, or founders - let alone established companies - developing high-quality video output requires real expertise. Even with significant editing and post-production work, it’s still clearly an avatar, not a live video recording. That’s acceptable for some use cases, but not for many. I suspect that for the foreseeable future, knowing that it is an avatar and not the real human speaking will lessen the audience's perception of how genuine the content is - and by extension, how trustworthy the individual or brand is. The psychological need for real human connection is deeply ingrained from millions of years of evolution. It’s not going to be overcome tomorrow.
Yes, production quality and usability will continue to improve, but for those hoping to scale content output while reducing costs and time - don’t count on that just yet. Even as it improves, I expect 90% of users will still prefer to outsource to professionals rather than manage production themselves. It simply won't be an optimal use of their time to DIY it.
For investors, based on what I've seen using the current tools, there’s an opportunity to focus on solutions that cater to high-context, niche domains - where the bar for clarity and credibility is especially high. Think: medical professionals, educators, pharmaceutical R&D, life sciences, and legal. These products may not become standalone category leaders, but they will likely be compelling acquisition targets as the ecosystem matures.