Tuesday, December 31, 2013

Speech synthesis

So I just found out about something called Vocaloid, a closed-source Yamaha product for text-to-singing. There's nothing remotely like it in the open source world, but I suspect you could cobble one together (mostly) from parts already in existence, by feeding the melody and timing into an existing text-to-speech engine.
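
To make "melody and timing" a little more concrete, here is a minimal sketch in Python of what the input side might look like: each syllable of the lyrics gets a target pitch and a duration taken from the score. Everything in it is my own assumption for illustration (the names midi_to_hz, SungSyllable and align are made up, and it assumes one note per syllable with equal-tempered tuning at A4 = 440 Hz). It says nothing about how Vocaloid works, and getting these targets into a real engine is left open, since that differs per engine.

```python
from dataclasses import dataclass

# Assumption: equal temperament, with MIDI note 69 (A4) tuned to 440 Hz.
A4_MIDI = 69
A4_HZ = 440.0


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency in Hz."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


@dataclass
class SungSyllable:
    """Hypothetical container: one syllable with its pitch and length."""
    text: str
    pitch_hz: float
    duration_s: float


def align(syllables, notes, tempo_bpm):
    """Pair each syllable with one (midi_note, beats) entry from the melody.

    Assumes exactly one note per syllable; real songs often stretch a
    syllable over several notes, which this sketch does not handle.
    """
    if len(syllables) != len(notes):
        raise ValueError("need exactly one note per syllable")
    seconds_per_beat = 60.0 / tempo_bpm
    return [
        SungSyllable(text, midi_to_hz(note), beats * seconds_per_beat)
        for text, (note, beats) in zip(syllables, notes)
    ]


if __name__ == "__main__":
    # Two syllables: middle C for one beat, then G for two beats, at 120 bpm.
    for s in align(["twin", "kle"], [(60, 1), (67, 2)], tempo_bpm=120):
        print(f"{s.text}: {s.pitch_hz:.1f} Hz for {s.duration_s:.2f} s")
```

The hard part, of course, is the other half: convincing a speech engine to hold those pitches and durations while still sounding like a voice.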

Possible engines might be:
  • MARY (another project from the DFKI)
  • eSpeak - this is more or less the Linux default
  • flite, which is Festival lite.
  • And Festival, the full Edinburgh system, together with Festvox, CMU's tools for building Festival voices.
I can only imagine that Vocaloid is a unit-selection synthesizer with a large database of recorded samples; the output is pretty natural-sounding, in contrast to the state of the art in truly synthetic speech. It would be a lot of fun to play with this stuff, especially in the context of music.

