What is your favorite Text To Speech Solution?
I was wondering, for those who use DAZ to create original animations with a bit of dialogue between characters, which Text To Speech platform works best for you? I used the TextAloud reader from NextUp and purchased several of the voice bundles. It worked fine until I tried to install the voices in Windows 10; I couldn't really get it to function after that. The Ivona voices were very natural sounding back then, but they still couldn't use any expressive SSML tags that I know of.
I know the most obvious solution today is to use an online AI voice generator, but I really don't like using online platforms and paying for plans with limits of a few hundred words. Are there any TTS readers and voices that can be run on a local host and that allow SSML tags? I have tried Balabolka, but very few SSML tags work with it that I am aware of.
Any information would be appreciated.
Comments
I have tried Balabolka as well and it's way too robotic.
Honestly, there are no really good local install solutions IMHO, particularly any that can add emotional inflection.
I did buy a one year subscription to
https://voicemaker.in/
to make my last full length animated film, but I have since let it lapse until I have another major movie project planned.
Here are some of the “voicemaker” voices in the film
I have also used this service, which has nice free voices with a 250 character limit per generation, but you can just break up your dialog into paragraphs of 250 characters or less and connect the generated clips in Audacity or similar
https://app.artflow.ai/my-creations
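If breaking the dialog up by hand gets tedious, here is a minimal sketch of how it could be scripted. It is only an illustration: the function name `split_dialog` and the default limit are my own assumptions, not anything provided by the service, and it simply prefers to cut at sentence ends so the clips join cleanly afterwards.

```python
import re


def split_dialog(text, limit=250):
    """Split text into chunks of at most `limit` characters.

    Chunks are cut at sentence ends where possible. A sentence longer
    than the limit is cut at word boundaries instead. A single word
    longer than the limit is left whole (hypothetical edge case).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        pieces = [sentence]
        if len(sentence) > limit:
            # Fall back to word boundaries for an oversized sentence.
            pieces, line = [], ""
            for word in sentence.split():
                candidate = f"{line} {word}".strip()
                if len(candidate) > limit and line:
                    pieces.append(line)
                    line = word
                else:
                    line = candidate
            if line:
                pieces.append(line)
        for piece in pieces:
            # Keep adding pieces to the current chunk until it would overflow.
            candidate = f"{current} {piece}".strip()
            if len(candidate) > limit and current:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "This is the first line of dialog. Here is the second one! And a third?"
    for number, chunk in enumerate(split_dialog(script, limit=40), start=1):
        print(number, len(chunk), chunk)
```

Each chunk can then be pasted into the generator one at a time, and the resulting clips joined in order.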
Here is a sample of one of the free Artflow.ai voices
I have been using some AI based ones lately
had one I used the other day that I ran on my own PC and could train on a minute of voice recording, but I seem to have accidentally deleted it and forgot what it was called
I am now questioning if I am going demented
to be fair I don't think it had a logical sounding name and I haven't apparently saved the download zip either
I've used Replica a few times. It's not cheap but you do get an actual performance. https://www.replicastudios.com/voice-library
oh I am not going mad, it was one of the many Visions of Chaos machine learning options
not a standalone
XTTS
anyway I cloned my own voice as well as my late mother's
don't really think either sounds like us but as unique text to speech voices they are not bad
there is a Hugging Face Demo https://huggingface.co/spaces/coqui/xtts
and a GitHub repository https://github.com/coqui-ai/TTS
I like VOC (Visions of Chaos) as it downloads the files and models, creates the Python environment, user interface and everything for me, as I am a dummy
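For anyone who would rather set up their own Python environment from the GitHub repository, a rough sketch of driving XTTS from a script might look like the following. Treat it as an assumption-laden example rather than the official method: the helper name `synthesize_lines`, the output file naming, and the reference recording path are all hypothetical, and it assumes the Coqui `TTS` package is installed and that its `TTS.api.TTS` class and `tts_to_file` call behave as described in the repository README.

```python
from pathlib import Path

# Model name as listed in the Coqui TTS README (assumed still current).
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def synthesize_lines(lines, speaker_wav, out_dir, language="en", tts=None):
    """Render each line of dialog to its own WAV file using a cloned voice.

    `speaker_wav` is a short reference recording of the voice to clone.
    `tts` can be any object with a compatible `tts_to_file` method; when
    omitted, the Coqui XTTS model is loaded (large download on first use).
    """
    if tts is None:
        # Imported lazily so the rest of the script works without Coqui installed.
        from TTS.api import TTS
        tts = TTS(XTTS_MODEL)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, line in enumerate(lines, start=1):
        path = out_dir / f"line_{index:03d}.wav"  # hypothetical naming scheme
        tts.tts_to_file(
            text=line,
            speaker_wav=speaker_wav,
            language=language,
            file_path=str(path),
        )
        paths.append(path)
    return paths


if __name__ == "__main__":
    # "my_voice.wav" is a placeholder for your own reference recording.
    dialog = ["Hello, is anyone there?", "I thought I heard something."]
    print(synthesize_lines(dialog, "my_voice.wav", "clips"))
```

Everything runs on the local machine once the model has been downloaded, which is the same thing VOC sets up behind its interface.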
I came across EmotiVoice - Multi Voice TTS Engine on YouTube, and it looks like it is a local installation with many voices and emotive capabilities. It seems to require some coding skills, which I am behind the power curve on, so I have not dared give it a go. If anybody knows anything about it, I'd love to know if it works well. (I placed a link below.)
I am not really looking for perfect AI quality. I just want to find an engine that can use SSML tags with some of the mainstream voices like Ivona, Nuance, Acapela and the like: tags like pitch, speed and volume. In the past I have also seen the (emph tag), which allows you to emphasize parts of a word. Phonetic dictionaries generally let you define how a word is pronounced. It seems like a few years ago there were several TTS engines that could do this.
http://www.youtube.com/watch?v=jcj-KDPB_-o
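For reference, the pitch, speed and volume controls mentioned above correspond to the `prosody` element in standard SSML, and emphasis to the `emphasis` element. Below is a small sketch that builds such markup so it can be pasted into any reader that accepts SSML. The helper name `build_ssml` and its defaults are my own assumptions, and which tags and attribute values a particular engine or voice actually honors varies a lot (some engines also expect extra attributes on `speak`).

```python
import xml.etree.ElementTree as ET


def build_ssml(text, pitch="medium", rate="medium", volume="medium", emphasize=None):
    """Wrap text in SSML prosody markup, optionally emphasizing one word.

    pitch, rate and volume take SSML values such as "low", "slow", "loud"
    or relative values like "+10%". `emphasize` is a word or phrase in
    `text` to wrap in <emphasis level="strong"> (first occurrence only).
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(
        speak, "prosody", {"pitch": pitch, "rate": rate, "volume": volume}
    )
    if emphasize and emphasize in text:
        before, after = text.split(emphasize, 1)
        prosody.text = before
        emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
        emphasis.text = emphasize
        emphasis.tail = after
    else:
        prosody.text = text
    # ElementTree escapes characters such as & and < automatically.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("I never said that.", pitch="+10%", rate="slow",
                     volume="loud", emphasize="never"))
```

Building the markup with an XML library rather than by hand avoids the unbalanced or unescaped tags that make some readers silently ignore the whole passage.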
Eleven Labs is amazing. Does speech-to-speech as well.
Agreed, very good option. I also use play.ht quite a bit.
I just tried out Eleven Labs, and you're right, it's quite impressive.
Has anybody continued using the local host TTS platforms? The original question was about finding emotive TTS solutions that are installed and run on a computer, as opposed to an AI solution that requires a renewable service. I know that just a few years ago some companies were marketing realistic voices that could be installed on a computer and that had SSML tag capability. They were so expensive at the time that I largely ignored them, but I can't seem to find them anymore. That is why I was wondering if anybody else has come across that type of TTS genre.
It is likely that TTS companies no longer see selling expensive one-off licenses as a viable business model when they can just set up a subscription-based AI service online.
(and they are probably farming the actual generation processing out to a third party at that)
I know you say you do not personally care about high quality, but it is the utter lack of believable machine generated voices that has been the bane of us solo animators for decades.
In short order, AI voice creation has become far superior to any of the local install TTS software in terms of quality and multilingual vocal diversity, so I would not hold out any hope of new local TTS apps arriving on the scene in the future.
well the one I use with Visions of Chaos is local
I also use RVC to change wav and mp3 files to trained voices
it does singing voices too and isolates vocals and music
I use Eleven Labs. Both free and paid options.
Wendy, what was the name of the software you were talking about that does singing? I'm somewhat confused about what Visions of Chaos and RVC are. Could you explain a bit?
https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main
is the one that can swap voices
it's a self contained zip installer
VOC is a software for hosting apps like Automatic 1111 and various other machine learning GUIs including Audiocraft, Zeroscope, Large Language Models etc
its advantage for me is it will download and install all the prerequisites for you and also has disk management for removing stuff you no longer want
I linked the Hugging Face demo of the TTS software I was using earlier in this thread, and its GitHub repository if you want to set up your own Python environment
VOC does all that for me
https://softology.pro/voc.htm