How to Convert Speech to Text in Python

Learn how to use the SpeechRecognition library to convert spoken audio to text in Python.
Abdou Rockikz · 4 min read · Updated feb 2020 · Machine Learning


Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.

With it, we do not need to build any machine learning model from scratch: the library provides convenient wrappers for various well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, etc.).

Alright, let's get started by installing the library using pip:

pip3 install SpeechRecognition

Okay, open up a new Python file and import it:

import speech_recognition as sr

The nice thing about this library is that it supports several recognition engines, both online APIs and offline engines such as CMU Sphinx.

We are going to use Google Speech Recognition here, as it doesn't require you to supply an API key.

Reading from a File

Make sure you have an audio file in the current directory that contains English speech (if you want to follow along with me, get the audio file here):

filename = "16-122828-0002.wav"

This file was taken from the LibriSpeech dataset, but you can use any audio file you want; just change the filename. Let's initialize our speech recognizer:

# initialize the recognizer
r = sr.Recognizer()
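
Before handing a file to the recognizer, it can help to confirm that it really is a readable WAV file and to see how long it is. The sketch below is not part of the SpeechRecognition library; it uses only Python's standard wave module, and the helper name describe_wav is made up for illustration:

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }

# Example call, assuming the file from this tutorial is in the current directory:
# print(describe_wav("16-122828-0002.wav"))
```

If the file is not a valid WAV file, wave.open() raises wave.Error, which is a cheap way to catch a wrong format before uploading anything.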

The code below loads the audio file and converts the speech to text using Google Speech Recognition:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This will take a few seconds to finish, as it uploads the file to Google and fetches the output. Here is my result:

I believe you're just talking nonsense

Reading from the Microphone

This requires PyAudio to be installed on your machine. The installation process depends on your operating system:

Windows

You can just pip install it:

pip3 install pyaudio

Linux

You need to first install the dependencies:

sudo apt-get install python-pyaudio python3-pyaudio
pip3 install pyaudio

macOS

You need to first install portaudio, then you can just pip install it:

brew install portaudio
pip3 install pyaudio

Now let's use our microphone to convert our speech:

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

This will listen to your microphone for 5 seconds and then try to convert that speech into text!

It is pretty similar to the previous code, but here we use the Microphone() object to read audio from the default microphone. The duration parameter of the record() function stops reading after 5 seconds, and the audio data is then uploaded to Google to get the output text.

You can also use the offset parameter of the record() function to start recording after offset seconds.

You can also recognize different languages by passing the language parameter to the recognize_google() function. For instance, to recognize Spanish speech, you would use:

text = r.recognize_google(audio_data, language="es-ES")

Check out the supported languages in this Stack Overflow answer.

Conclusion

As you can see, it is pretty easy to use this library for converting speech to text. This library is widely used in the wild, so make sure you master it; check its official documentation.

Happy Coding ♥
