How to Convert Text to Speech in Python

Learn how to perform speech synthesis by converting text to speech, both online and offline, using the gTTS and pyttsx3 libraries in Python.
Abdou Rockikz · 6 min read · Updated May 2020 · Machine Learning


Speech synthesis is the computer-generated simulation of human speech. It converts human language text into human-like speech. In this tutorial, you will learn how you can convert text to speech in Python.

In this tutorial, we won't be building neural networks and training a model to achieve results, as that is complex and hard to do. Instead, we are going to use some APIs and engines that offer speech synthesis. There are a lot of APIs out there that offer this service, and one of the most commonly used is Google Text-to-Speech. We will play around with it, along with an offline library: pyttsx3.

To make things clear, this tutorial is about converting text to speech and not the other way around. If you want to convert speech to text instead, check this tutorial.

To get started, let's install the required modules:

pip3 install gTTS pyttsx3 playsound

Online Synthesis

As you may guess, gTTS stands for Google Text-to-Speech. It is a Python library that wraps the original API to ease the work for us.

Open up a new Python file and import:

import gtts
from playsound import playsound

This library is pretty straightforward to use: you just need to pass the text to a gTTS object, which is an interface to Google Translate's Text-to-Speech API:

# create the gTTS object (the request to Google is sent when the audio is written out)
tts = gtts.gTTS("Hello world")

Up to this point, we have only described what we want synthesized. The audio is actually requested from Google when it is written out, so let's save it to a file:

# save the audio file
tts.save("hello.mp3")

Awesome, you'll see a new file appear in the current directory. Let's play it using the playsound module installed previously:

# play the audio file
playsound("hello.mp3")

And that's it! You'll hear a robotic voice saying what you just told it to say. Note that this isn't only available in English; you can use other languages as well by passing the lang parameter:

# in spanish
tts = gtts.gTTS("Hola Mundo", lang="es")
tts.save("hola.mp3")
playsound("hola.mp3")
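If the audio does not need to touch the disk, gTTS objects also expose write_to_fp(), which writes the MP3 data to any file-like object. The sketch below is one possible wrapper around it, not the only way to do this. The names synthesize_to_bytes and tts_factory are hypothetical and introduced here only so the helper can be tried without a network connection:

```python
import io


def synthesize_to_bytes(text, lang="en", tts_factory=None):
    """Return the synthesized MP3 data for `text` as bytes.

    `tts_factory` is a hypothetical hook (not part of gTTS): any callable
    that accepts (text, lang=...) and returns an object with write_to_fp().
    When omitted, gtts.gTTS is used, which needs an Internet connection.
    """
    if tts_factory is None:
        import gtts  # imported lazily so the helper loads without gTTS installed
        tts_factory = gtts.gTTS
    buffer = io.BytesIO()
    # write_to_fp() sends the request and writes the MP3 bytes into the buffer
    tts_factory(text, lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

Calling synthesize_to_bytes("Hola Mundo", lang="es") should give you the same audio data that tts.save() writes to hola.mp3, ready to be passed to another library or sent over the network.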

To get the list of available languages, use this:

# all available languages along with their IETF tag
print(gtts.lang.tts_langs())

Here are the supported languages at the time of writing:

{'af': 'Afrikaans', 'sq': 'Albanian', 'ar': 'Arabic', 'hy': 'Armenian', 'bn': 'Bengali', 'bs': 'Bosnian', 'ca': 'Catalan', 'hr': 'Croatian', 'cs': 'Czech', 'da': 'Danish', 'nl': 'Dutch', 'en': 'English', 'eo': 'Esperanto', 'et': 'Estonian', 'tl': 'Filipino', 'fi': 'Finnish', 'fr': 'French', 'de': 'German', 'el': 'Greek', 'gu': 'Gujarati', 'hi': 'Hindi', 'hu': 'Hungarian', 'is': 'Icelandic', 'id': 'Indonesian', 'it': 'Italian', 'ja': 'Japanese', 'jw': 'Javanese', 'kn': 'Kannada', 'km': 'Khmer', 'ko': 'Korean', 'la': 'Latin', 'lv': 'Latvian', 'mk': 'Macedonian', 'ml': 'Malayalam', 'mr': 
'Marathi', 'my': 'Myanmar (Burmese)', 'ne': 'Nepali', 'no': 'Norwegian', 'pl': 'Polish', 'pt': 'Portuguese', 'ro': 'Romanian', 'ru': 'Russian', 'sr': 'Serbian', 'si': 'Sinhala', 'sk': 'Slovak', 'es': 'Spanish', 'su': 'Sundanese', 'sw': 'Swahili', 'sv': 'Swedish', 'ta': 'Tamil', 'te': 'Telugu', 'th': 'Thai', 'tr': 'Turkish', 'uk': 'Ukrainian', 'ur': 'Urdu', 'vi': 'Vietnamese', 'cy': 'Welsh', 'zh-cn': 'Chinese (Mandarin/China)', 'zh-tw': 'Chinese (Mandarin/Taiwan)', 'en-us': 'English (US)', 'en-ca': 'English (Canada)', 'en-uk': 'English (UK)', 'en-gb': 'English (UK)', 'en-au': 'English (Australia)', 'en-gh': 'English (Ghana)', 'en-in': 'English (India)', 'en-ie': 'English (Ireland)', 'en-nz': 'English (New Zealand)', 'en-ng': 'English (Nigeria)', 'en-ph': 'English (Philippines)', 'en-za': 'English (South Africa)', 'en-tz': 'English (Tanzania)', 'fr-ca': 'French (Canada)', 'fr-fr': 'French (France)', 'pt-br': 'Portuguese (Brazil)', 'pt-pt': 'Portuguese (Portugal)', 'es-es': 'Spanish (Spain)', 'es-us': 'Spanish (United States)'}
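Because tts_langs() returns a plain dictionary, checking a language tag before making a request is a simple lookup. The helper below is a minimal sketch of that idea. The name resolve_lang is hypothetical, and falling back from a regional tag such as "pt-ao" to its base language "pt" is an assumption about what you may want, not something gTTS does for you:

```python
def resolve_lang(code, available, default="en"):
    """Pick a usable language tag from `available` (the dict from tts_langs())."""
    code = code.lower()
    if code in available:
        return code
    # Assumption: a regional tag can fall back to its base language
    base = code.split("-")[0]
    if base in available:
        return base
    return default
```

For example, resolve_lang("es-mx", gtts.lang.tts_langs()) would return "es" with the list shown above, since "es-mx" is not in it but "es" is.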

Offline Synthesis

Now you know how to use Google's API, but what if you want to use text-to-speech technologies offline? Well, the pyttsx3 library comes to the rescue. It looks for TTS engines pre-installed on your platform and uses them. Here are the text-to-speech synthesizers that this library uses:

  • SAPI5 on Windows XP, Windows Vista, 8, 8.1 and 10
  • NSSpeechSynthesizer on Mac OS X 10.5 and 10.6
  • espeak on Ubuntu Desktop Edition 8.10, 9.04 and 9.10

We'll see in a minute how to use different voices in this library.

To get started with this library, open up a new Python file and import it:

import pyttsx3

Now we need to initialize the TTS engine:

# initialize Text-to-speech engine
engine = pyttsx3.init()
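Called without arguments, pyttsx3.init() picks the driver that matches the current platform. It also accepts a driverName argument ("sapi5", "nsss" or "espeak") if you want to be explicit. The following is only a sketch of how that choice could be written out. The names DRIVERS, driver_for_platform and init_engine are hypothetical, and treating every platform other than Windows and macOS as espeak is a simplifying assumption:

```python
import sys

# Driver names that pyttsx3 documents for each platform
DRIVERS = {"win32": "sapi5", "darwin": "nsss"}


def driver_for_platform(platform=None):
    """Return the pyttsx3 driver name for a sys.platform value."""
    platform = sys.platform if platform is None else platform
    # Assumption: anything that is not Windows or macOS uses espeak
    return DRIVERS.get(platform, "espeak")


def init_engine():
    """Create an engine with an explicit driver (needs pyttsx3 installed)."""
    import pyttsx3  # imported lazily so the helpers above work without it
    return pyttsx3.init(driverName=driver_for_platform())
```

In most cases the plain pyttsx3.init() call is enough. Passing driverName mainly helps when you want the script to fail early if the expected engine is missing.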

Now, to convert some text, we need to use the say() and runAndWait() methods:

# convert this text to speech
text = "Python is a great programming language"
engine.say(text)
# play the speech
engine.runAndWait()

The say() method adds an utterance to the event queue, while the runAndWait() method runs the event loop until all queued commands have been processed. So you can call say() multiple times and then call runAndWait() once at the end to hear everything that was queued. Try it out!
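To make the queueing behavior concrete, here is a small sketch that queues several sentences and plays them with a single runAndWait() call. The function name speak_all is hypothetical, and it expects an engine like the one returned by pyttsx3.init():

```python
def speak_all(engine, sentences):
    """Queue every sentence with say(), then play them with one runAndWait()."""
    for sentence in sentences:
        engine.say(sentence)  # only adds the utterance to the queue
    engine.runAndWait()  # blocks until everything queued has been spoken
```

With the engine created earlier, speak_all(engine, ["Hello", "Python is a great programming language"]) should speak both sentences one after the other.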

This library provides us with some properties that we can tweak based on our needs. For instance, let's get the current speaking rate:

# get details of speaking rate
rate = engine.getProperty("rate")
print(rate)

Output:

200

Alright, let's change this to 300 (make the speaking rate much faster):

# setting new voice rate (faster)
engine.setProperty("rate", 300)
engine.say(text)
engine.runAndWait()

Or slower:

# slower
engine.setProperty("rate", 100)
engine.say(text)
engine.runAndWait()
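Instead of hard-coding 300 or 100, you may prefer to change the rate relative to whatever the engine currently reports. The sketch below does that with the same getProperty() and setProperty() calls used above. The name scale_rate is hypothetical, and rounding down to an integer is an assumption based on the rate being an integer number of words per minute:

```python
def scale_rate(engine, factor):
    """Multiply the current speaking rate by `factor` and return the new rate."""
    current = engine.getProperty("rate")
    new_rate = int(current * factor)  # the rate is an integer (words per minute)
    engine.setProperty("rate", new_rate)
    return new_rate
```

For example, scale_rate(engine, 1.5) takes a rate of 200 to 300, and scale_rate(engine, 0.5) halves whatever the current rate is.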

Another useful property is voices, which allows us to get details of all the voices available on your machine:

# get details of all voices available
voices = engine.getProperty("voices")
print(voices)

Here is the output in my case:

[<pyttsx3.voice.Voice object at 0x000002D617F00A20>, <pyttsx3.voice.Voice object at 0x000002D617D7F898>, <pyttsx3.voice.Voice object at 0x000002D6182F8D30>]

As you can see, my machine has three voices. Let's use the second one, for example:

# set another voice
engine.setProperty("voice", voices[1].id)
engine.say(text)
engine.runAndWait()
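The default printout of the voices list only shows object addresses, so picking a voice by index is a bit of a guess. Each voice object carries an id and a name, which the sketch below uses to list the voices and to look one up by name. The names describe_voices and find_voice are hypothetical, and matching on a case-insensitive substring of the name is just one possible convention:

```python
def describe_voices(voices):
    """Return (index, id, name) tuples that are easier to read than the default repr."""
    return [(i, v.id, v.name) for i, v in enumerate(voices)]


def find_voice(voices, keyword):
    """Return the first voice whose name contains `keyword`, or None."""
    keyword = keyword.lower()
    for voice in voices:
        # Some drivers may leave the name empty, so guard against None
        if keyword in (voice.name or "").lower():
            return voice
    return None
```

You could then print describe_voices(voices) to see what is installed, and pass find_voice(voices, "...").id to engine.setProperty("voice", ...) once you know which name you are after. The available names depend entirely on your machine.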

Conclusion

Great, that's it for this tutorial. I hope it will help you build your application, or maybe your own virtual assistant in Python.

To conclude, if you want more reliable synthesis, the Google TTS API is your choice. If you just want something that works a lot faster and without an Internet connection, you should use the pyttsx3 library.

For more details, check the official documentation of both libraries, gTTS and pyttsx3.

Related: How to Play and Record Audio in Python.

Happy Coding ♥
