How to Convert Text to Speech in Python

Learn how to perform speech synthesis by converting text to speech both online and offline using the gTTS and pyttsx3 libraries in Python.
7 min read · Updated Dec 2020 · Machine Learning · Application Programming Interfaces


Speech synthesis (or Text to Speech) is the computer-generated simulation of human speech. It converts human language text into human-like speech audio. In this tutorial, you will learn how you can convert text to speech in Python.

We won't be building neural networks and training a model to achieve this, as that is complex and hard to do. Instead, we are going to use some APIs and engines that offer it. There are a lot of APIs out there that offer this service, and one of the most commonly used is Google Text to Speech. In this tutorial, we will play around with it along with another, offline library: pyttsx3.

To be clear, this tutorial is about converting text to speech and not the other way around. If you want to convert speech to text instead, check this tutorial.

To get started, let's install the required modules:

pip3 install gTTS pyttsx3 playsound

Online Text to Speech

As you may guess, gTTS stands for Google Text To Speech. It is a Python library that interfaces with Google Translate's text to speech API. It requires an Internet connection and is pretty easy to use.

Open up a new Python file and import:

import gtts
from playsound import playsound

Using this library is pretty straightforward: you just need to pass the text to the gTTS object, which is an interface to Google Translate's Text to Speech API:

# build the gTTS object for the text to synthesize
tts = gtts.gTTS("Hello world")

At this point, we have a gTTS object wrapping the text; the audio itself is requested from the API when it is written out. Let's save this audio to a file:

# save the audio file
tts.save("hello.mp3")

Awesome, you'll see a new file appear in the current directory. Let's play it using the playsound module installed previously:

# play the audio file
playsound("hello.mp3")

And that's it! You'll hear a robotic voice saying what you just told it to say!

It isn't limited to English; you can use other languages as well by passing the lang parameter:

# in spanish
tts = gtts.gTTS("Hola Mundo", lang="es")
tts.save("hola.mp3")
playsound("hola.mp3")

If you don't want to save it to a file and just want to play it directly, use tts.write_to_fp(), which accepts a file-like object such as io.BytesIO() to write into; check this link for more information.
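As a minimal sketch of this in-memory approach, the helper below writes the audio into an io.BytesIO buffer instead of a file. The name tts_to_bytes is hypothetical (it is not part of gTTS), and the function only assumes that the object passed in has a write_to_fp() method, as gTTS objects do:

```python
import io


def tts_to_bytes(tts):
    """Return the synthesized audio as bytes, without touching the disk.

    `tts` can be any object exposing write_to_fp(file_like),
    for example gtts.gTTS("Hello world").
    """
    buffer = io.BytesIO()
    tts.write_to_fp(buffer)  # for a gTTS object, the MP3 data is written here
    return buffer.getvalue()
```

With gTTS installed and an Internet connection, tts_to_bytes(gtts.gTTS("Hello world")) should return the MP3 data, which you can then hand to any audio library that is able to play from memory.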

To get the list of available languages, use this:

# all available languages along with their IETF tag
print(gtts.lang.tts_langs())

Here are the supported languages:

{'af': 'Afrikaans', 'sq': 'Albanian', 'ar': 'Arabic', 'hy': 'Armenian', 'bn': 'Bengali', 'bs': 'Bosnian', 'ca': 'Catalan', 'hr': 'Croatian', 'cs': 'Czech', 'da': 'Danish', 'nl': 'Dutch', 'en': 'English', 'eo': 'Esperanto', 'et': 'Estonian', 'tl': 'Filipino', 'fi': 'Finnish', 'fr': 'French', 'de': 'German', 'el': 'Greek', 'gu': 'Gujarati', 'hi': 'Hindi', 'hu': 'Hungarian', 'is': 'Icelandic', 'id': 'Indonesian', 'it': 'Italian', 'ja': 'Japanese', 'jw': 'Javanese', 'kn': 'Kannada', 'km': 'Khmer', 'ko': 'Korean', 'la': 'Latin', 'lv': 'Latvian', 'mk': 'Macedonian', 'ml': 'Malayalam', 'mr': 
'Marathi', 'my': 'Myanmar (Burmese)', 'ne': 'Nepali', 'no': 'Norwegian', 'pl': 'Polish', 'pt': 'Portuguese', 'ro': 'Romanian', 'ru': 'Russian', 'sr': 'Serbian', 'si': 'Sinhala', 'sk': 'Slovak', 'es': 'Spanish', 'su': 'Sundanese', 'sw': 'Swahili', 'sv': 'Swedish', 'ta': 'Tamil', 'te': 'Telugu', 'th': 'Thai', 'tr': 'Turkish', 'uk': 'Ukrainian', 'ur': 'Urdu', 'vi': 'Vietnamese', 'cy': 'Welsh', 'zh-cn': 'Chinese (Mandarin/China)', 'zh-tw': 'Chinese (Mandarin/Taiwan)', 'en-us': 'English (US)', 'en-ca': 'English (Canada)', 'en-uk': 'English (UK)', 'en-gb': 'English (UK)', 'en-au': 'English (Australia)', 'en-gh': 'English (Ghana)', 'en-in': 'English (India)', 'en-ie': 'English (Ireland)', 'en-nz': 'English (New Zealand)', 'en-ng': 'English (Nigeria)', 'en-ph': 'English (Philippines)', 'en-za': 'English (South Africa)', 'en-tz': 'English (Tanzania)', 'fr-ca': 'French (Canada)', 'fr-fr': 'French (France)', 'pt-br': 'Portuguese (Brazil)', 'pt-pt': 'Portuguese (Portugal)', 'es-es': 'Spanish (Spain)', 'es-us': 'Spanish (United States)'}
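
Since tts_langs() returns a plain dictionary that maps language tags to names, it can be searched like any other dict. Here is a small sketch that uses a hand-copied subset of the output above so it runs without a network connection; find_language_tags is a hypothetical helper name, not something gTTS provides:

```python
def find_language_tags(langs, name_part):
    """Return the tags whose language name contains name_part (case-insensitive)."""
    needle = name_part.lower()
    return sorted(tag for tag, name in langs.items() if needle in name.lower())


# a small subset of the dictionary printed above
sample_langs = {
    "fr": "French",
    "fr-ca": "French (Canada)",
    "es": "Spanish",
    "es-us": "Spanish (United States)",
    "pt-br": "Portuguese (Brazil)",
}

print(find_language_tags(sample_langs, "french"))   # ['fr', 'fr-ca']
print(find_language_tags(sample_langs, "spanish"))  # ['es', 'es-us']
```

In a real script, you would pass gtts.lang.tts_langs() instead of sample_langs.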

Offline Text to Speech

Now you know how to use Google's API, but what if you want to use text to speech technologies offline?

Well, the pyttsx3 library comes to the rescue. It is a text to speech conversion library in Python that looks for TTS engines pre-installed on your platform and uses them. Here are the text-to-speech synthesizers that this library uses:

  • SAPI5 on Windows XP, Windows Vista, 8, 8.1 and 10
  • NSSpeechSynthesizer on Mac OS X 10.5 and 10.6
  • espeak on Ubuntu Desktop Edition 8.10, 9.04 and 9.10

Here are the main features of pyttsx3 library:

  • It works fully offline
  • You can choose among different voices that are installed on your system
  • You can control the speed of speech
  • You can tweak the volume
  • You can save the speech audio into a file

Note: If you're on a Linux system and the voice output is not working with this library, then you should install espeak, ffmpeg and libespeak1:

$ sudo apt update && sudo apt install espeak ffmpeg libespeak1

To get started with this library, open up a new Python file and import it:

import pyttsx3

Now we need to initialize the TTS engine:

# initialize Text-to-speech engine
engine = pyttsx3.init()

Now, to convert some text, we need to use the say() and runAndWait() methods:

# convert this text to speech
text = "Python is a great programming language"
engine.say(text)
# play the speech
engine.runAndWait()

The say() method adds an utterance to the event queue, while the runAndWait() method runs the actual event loop until all queued commands have been processed. So you can call say() multiple times and then call runAndWait() once at the end to hear the synthesis. Try it out!
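
The queueing behavior can be sketched as a small helper. The name speak_all is hypothetical, and the engine argument is assumed to be the object returned by pyttsx3.init():

```python
def speak_all(engine, sentences):
    """Queue every sentence with say(), then process the queue once."""
    for sentence in sentences:
        engine.say(sentence)  # only adds the utterance to the queue
    engine.runAndWait()       # blocks until the whole queue has been processed
```

For example, speak_all(engine, ["First sentence.", "Second sentence."]) should speak both sentences one after the other with a single call to runAndWait().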

This library provides us with some properties that we can tweak based on our needs. For instance, let's get the details of speaking rate:

# get details of speaking rate
rate = engine.getProperty("rate")
print(rate)

Output:

200

Alright, let's change this to 300 (make the speaking rate much faster):

# setting new voice rate (faster)
engine.setProperty("rate", 300)
engine.say(text)
engine.runAndWait()

Or slower:

# slower
engine.setProperty("rate", 100)
engine.say(text)
engine.runAndWait()
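
Volume, listed among the features earlier, is exposed the same way through the "volume" property, which takes a floating point value between 0.0 and 1.0. Below is a minimal sketch with a hypothetical helper, set_volume, that clamps the value into that range before applying it; the engine argument is again assumed to come from pyttsx3.init():

```python
def set_volume(engine, level):
    """Set the engine volume, clamping level into the 0.0 to 1.0 range."""
    clamped = max(0.0, min(1.0, float(level)))
    engine.setProperty("volume", clamped)
    return clamped
```

Calling set_volume(engine, 0.5) followed by engine.say(text) and engine.runAndWait() should play the same sentence at half volume.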

Another useful property is voices, which allows us to get details of all the voices available on your machine:

# get details of all voices available
voices = engine.getProperty("voices")
print(voices)

Here is the output in my case:

[<pyttsx3.voice.Voice object at 0x000002D617F00A20>, <pyttsx3.voice.Voice object at 0x000002D617D7F898>, <pyttsx3.voice.Voice object at 0x000002D6182F8D30>]

As you can see, my machine has three voices installed. Let's use the second one, for example:

# set another voice
engine.setProperty("voice", voices[1].id)
engine.say(text)
engine.runAndWait()
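
Printing the list only shows object addresses, which makes it hard to pick a voice. Each voice object carries an id (used above) and a name, so a short sketch like the following can produce a more readable listing; describe_voices is a hypothetical helper name:

```python
def describe_voices(voices):
    """Return one 'index: name (id)' line per voice."""
    return [
        f"{index}: {voice.name} ({voice.id})"
        for index, voice in enumerate(voices)
    ]
```

You could then run print("\n".join(describe_voices(engine.getProperty("voices")))) and pass the id of the voice you prefer to engine.setProperty("voice", ...).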

You can also save the audio to a file using the save_to_file() method, instead of playing the sound with the say() method:

# saving speech audio into a file
engine.save_to_file(text, "python.mp3")
engine.runAndWait()

A new MP3 file will appear in the current directory, check it out!

Conclusion

Great, that's it for this tutorial. I hope it will help you build your application, or maybe your own virtual assistant in Python.

To conclude, if you want more reliable synthesis, the Google TTS API is your choice; if you want something that runs a lot faster and works without an Internet connection, use the pyttsx3 library.

For more details, see the official documentation of both libraries, gTTS and pyttsx3.

Related: How to Play and Record Audio in Python.

Happy Coding ♥
