Best TTS for Video

lucian.harhata · Jan 31, 2025

Heya all

I've been testing the best TTS tools out there that I can use in videos:

Eleven Labs (proprietary https://elevenlabs.io/, cloud)
Fish Audio (open source https://fish.audio/, cloud)
XTTS-v2 (open source, runs locally)
other open source solutions (Tortoise TTS, etc.)

For Eleven Labs and Fish Audio I used the paid versions and was able to create a custom voice and convert text to speech.

My findings:

Eleven Labs

great voice, can change intonation and understands the text
very good end result
paid version
limited number of characters, e.g. 30k chars for $5; you end up using them up quite quickly
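
To put the character limit in perspective, here is a rough back-of-the-envelope sketch based on the "30k chars for $5" figure above. The script length and batch size are made-up illustration values, and it assumes a flat per-character rate, which real pricing tiers may not follow.

```python
# Rough cost estimate from the "30k chars for $5" figure quoted above.
# The script length and video count below are hypothetical examples.

PLAN_PRICE_USD = 5.0
PLAN_CHARS = 30_000


def cost_for_chars(num_chars: int) -> float:
    """Cost in USD at a flat per-character rate derived from the plan."""
    return num_chars * PLAN_PRICE_USD / PLAN_CHARS


def scripts_per_plan(chars_per_script: int) -> int:
    """How many whole scripts of a given length fit in one plan."""
    return PLAN_CHARS // chars_per_script


if __name__ == "__main__":
    script_chars = 1_500  # assumed length of one short video script
    num_videos = 1_000    # assumed batch size

    print(f"One script: ${cost_for_chars(script_chars):.2f}")
    print(f"Scripts per $5 plan: {scripts_per_plan(script_chars)}")
    print(f"{num_videos} scripts: ${cost_for_chars(script_chars * num_videos):.2f}")
```

Swap in your own average script length to see how quickly a batch of videos eats through the allowance.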

Fish Audio

good voice, it gets close to Eleven Labs
but it does not handle all text; if there are some characters in the text it goes wild
I got some strange "shush"-type sounds that were very weird (no voice, just a wind-like sound)
the paid version is listed on their pricing page as Unlimited usage
- so far I have not been charged more, and I've created heaps
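
One possible workaround for the character issue is to clean the script before sending it to any TTS service. The sketch below is only a guess at what might help: it isn't clear which characters actually trigger the problem, so the allowlist (letters, digits, basic punctuation) and the function name `clean_for_tts` are assumptions to adjust. Note that it also strips accents, which is lossy for non-English text.

```python
import re
import unicodedata

# Hypothetical allowlist: letters, digits, whitespace, and basic punctuation.
# Which characters actually upset a given TTS engine is NOT known here.
_DISALLOWED = re.compile(r"[^A-Za-z0-9\s.,!?;:'\"()\-]")
_MULTISPACE = re.compile(r"\s+")

# Common typographic characters mapped to plain ASCII equivalents.
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dashes
    "\u2026": "...",               # ellipsis
}


def clean_for_tts(text: str) -> str:
    """Return text reduced to a conservative character set."""
    for src, dst in _REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Decompose accented letters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything outside the allowlist with a space, then tidy spacing.
    text = _DISALLOWED.sub(" ", text)
    return _MULTISPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_for_tts("Hello\u2014world \u2026 caf\u00e9 #1 \u2605"))
```

If the strange sounds persist after cleaning, the cause is probably something other than the characters in the text.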

XTTS-v2 & other open source

quite time consuming to set up
sometimes you'll get errors about missing packages
takes a loooooong time to turn small amounts of text into voice, e.g. 2 sentences took about 2 minutes on a powerful PC
I gave up because of the wasted time
audio quality was mediocre compared to the other two above
the only one that gets close to Fish Audio is XTTS-v2, but personally I am not impressed
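
To compare engines on speed rather than by feel, one option is to time each synthesis call. This is a minimal sketch: `synth` is a placeholder for whatever function wraps your TTS engine (it is not a real library call), and `fake_engine` only simulates work by sleeping.

```python
import time
from typing import Callable


def time_synthesis(synth: Callable[[str], object], text: str) -> dict:
    """Time one call to `synth` and report seconds and characters per second."""
    start = time.perf_counter()
    synth(text)
    elapsed = time.perf_counter() - start
    return {
        "chars": len(text),
        "seconds": elapsed,
        "chars_per_second": len(text) / elapsed if elapsed > 0 else float("inf"),
    }


def fake_engine(text: str) -> bytes:
    """Stand-in for a real engine; sleeps briefly instead of generating audio."""
    time.sleep(0.01)
    return b""


if __name__ == "__main__":
    result = time_synthesis(fake_engine, "Two short sentences. Like this one.")
    print(result)
```

Running the same text through each engine gives a like-for-like number, which is easier to compare than "about 2 minutes for 2 sentences".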

I also used TTS Arena https://huggingface.co/spaces/TTS-AGI/TTS-Arena to check the current live leaderboard, and Fish Audio gets pretty close.

My conclusion:

Eleven Labs is good when you need high-quality sound; it is quite impressive.
- But at scale, I think costs might build up quite fast.
Fish Audio gets close in quality to Eleven Labs, but it needs more tweaking and looks to have some bugs.
- I'll report back as I get more insight into their bugs.

Has anybody else tested TTS so far? What are your findings / feedback?

Corey · Jan 31, 2025

Great post! We've been chatting about this privately, so it's great to make it all public. My current experience:

Eleven Labs: The best solution for mission critical audio and, when used well, delivers nearly undetectable results.
Audiosonic: By Writesonic. Has some really high-quality voices, and you can buy "minutes" relatively cheaply.
Natural Reader: Has some realistic voices, for $10/mo you can get 500K chars/day conversion with 1M/mo MP3 download.
OpenAI: OpenAI voices are cheaper and can do well on material they are suited to. They added 5 new voices in October. There is talk of a new generation being added to the API soon that will be much higher quality, like Eleven Labs.
GPT-4o: I see comments in chat about some amazing voices it can do, but I've never been able to replicate them.

There are probably better options on the horizon, but as of today I still use Eleven Labs for everything just because it has that slight quality edge. It's not cheap at scale yet, though.

Corey · Jan 31, 2025

P.S. @lucian.harhata I do voiceover. If you want to do an SEO test for "TTS vs human voice" just send me scripts and I'll send you human audio.

It would be interesting to test SoundCloud. Bradley Bennett just did a video on using it to get rankings by syndicating AI-generated podcast content. No one has ever definitively proved or disproved that TTS vs. human even matters. We should do that and release the results to the community.

Corey · Feb 6, 2025

Hey @lucian.harhata have you seen this one? https://appsumo.com/products/lazybird/

This is their normal pricing: https://www.lazybird.app/pricing

lucian.harhata · Feb 8, 2025

Corey said:
P.S. @lucian.harhata I do voiceover. If you want to do an SEO test for "TTS vs human voice" just send me scripts and I'll send you human audio.

It would be interesting to test SoundCloud. Bradley Bennett just did a video on using it to get rankings by syndicating AI-generated podcast content. No one has ever definitively proved or disproved that TTS vs. human even matters. We should do that and release the results to the community.

Yeah, TTS vs human audio would be a good test.

I'd say AI-generated podcast content can work. I've seen quite a few podcasts lately that have a good TTS voice, and users are listening in.

Corey · Feb 10, 2025

I'm having good luck with my TTS entity videos. The engagement is not spectacular but:

1) It solves what @Ted calls "Goose Egg SEO" overnight. I can get genuine organic topical traffic into any set of zero-traffic pages in one day. Even 1,000s of them. For very little cost.
2) In many sectors they instantly take over YT rankings and Google SERPs. This adds up.
3) It's also a high trust factor when customers google you and find 100 Q&A videos matching their exact needs.

So even without a lot of engagement, they do elevate your brand presence and get you traffic. They definitely work.

paulandre · Mar 6, 2025

Eleven Labs is what I use
