ewen.chou echo chamber

Chatting with Alexa

In a previous post I mentioned my idea of using different triggers for Alexa. In particular, I didn’t want to set up a hardware “push-to-talk” button. Now that I had code to interact with the Alexa Voice Service (AVS), I needed a way to generate the audio commands for the requests instead of recording my voice for each command.

So I ended up looking for a text-to-speech (TTS) tool that could run on Linux. After a bit of searching on the Internet, I found that Festival seemed to be the best option for TTS on Linux.

I use Ubuntu for development and testing, and Festival was very straightforward to install there.

sudo apt-get install festival

As usual, the Arch Linux wiki also has a great entry for Festival that offers more details.

What I needed was the text2wave command, which is installed along with the festival package. It takes a text file as input and writes the synthesized speech to a WAV file.

text2wave -o <output_wav_file> <input_text_file>

I wrote a simple Python wrapper to execute the command:

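import os
import subprocess
import uuid

# Directory for intermediate text files and default WAV output
TEMP_DIR = '/tmp/simple-tts'


def tts(text, save_to=None):
    """Converts text to speech (WAV) file.

    Args:
        text (str): Text to convert
        save_to (str): File path for saving the WAV file. If not
                       provided, saves to a file under `/tmp/simple-tts/`

    Returns:
        Path (str) where the WAV file is saved.
    """
    os.makedirs(TEMP_DIR, exist_ok=True)
    temp_file = os.path.join(TEMP_DIR, '{}.txt'.format(uuid.uuid4()))
    if not save_to:
        save_to = os.path.join(TEMP_DIR, '{}.wav'.format(uuid.uuid4()))
    # text2wave reads its input from a file, so write the text out first
    with open(temp_file, 'w') as f:
        f.write(text)
    # Pass arguments as a list so paths with spaces are handled safely
    subprocess.check_call(['text2wave', '-o', save_to, temp_file])
    os.remove(temp_file)
    return save_to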
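
To sanity-check what the wrapper produces, it can help to look at the format of the generated WAV file. Here is a minimal sketch using Python's standard `wave` module. The helper name `wav_info` is my own, for illustration only, and is not part of the TTS code.

```python
import wave


def wav_info(path):
    """Return basic format details for a WAV file."""
    with wave.open(path, 'rb') as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            'channels': w.getnchannels(),
            'sample_width_bytes': w.getsampwidth(),
            'sample_rate_hz': rate,
            # Duration in seconds = number of frames / frames per second
            'duration_s': frames / float(rate),
        }
```

With Festival installed, something like `wav_info(tts('What time is it?'))` should report the channel count, sample width and sample rate of the synthesized audio, which is useful to compare against whatever format the receiving service expects.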

After some testing, I found that Alexa had some trouble understanding the default voice that is used by Festival. After some more Google-Fu, I was able to find some information about installing additional voice packs for Festival. I tested several of the voice packs and found that the CMU Arctic clb (US English female voice) gave the best results.
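
One way to pick a voice per command is text2wave's `-eval` option, which evaluates a Scheme expression before synthesis. The sketch below only shows the idea and is not necessarily how the linked TTS code does it. The voice name `voice_cmu_us_clb_arctic_clunits` is an assumption based on the usual naming of the CMU Arctic packs, so check the name of the voice installed on your system. The helper names are also hypothetical.

```python
import subprocess

# Assumed voice name for the CMU Arctic clb pack; verify against your install.
CLB_VOICE = 'voice_cmu_us_clb_arctic_clunits'


def text2wave_cmd(in_fn, out_fn, voice=None):
    """Build the text2wave argument list, optionally selecting a voice."""
    cmd = ['text2wave']
    if voice:
        # -eval runs a Scheme expression before synthesis; calling the
        # voice function switches Festival to that voice.
        cmd += ['-eval', '({})'.format(voice)]
    cmd += ['-o', out_fn, in_fn]
    return cmd


def synthesize(in_fn, out_fn, voice=CLB_VOICE):
    """Run text2wave; raises CalledProcessError if it exits non-zero."""
    subprocess.check_call(text2wave_cmd(in_fn, out_fn, voice))
```

Keeping the command construction in its own function makes it easy to try several voices against Alexa without touching the rest of the wrapper.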

Here’s the TTS code, and a simple example that shows how to tie it together with alexa-client.

Please send feedback & follow me on Twitter @ewenchou

You can find code for my projects on GitHub