Speech Recognition and Text-to-Speech in Python

2023

This project shows how to recognize speech and generate a spoken response in Python. We use the SpeechRecognition library to transcribe audio captured from a microphone and the pyttsx3 library to convert text to speech. Each step of the code is explained in turn, and the complete working script appears at the end.

Speech Recognition

To begin, we need to import the necessary libraries:

import speech_recognition as sr
import pyttsx3

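speech_recognition (imported as sr) handles speech recognition, and pyttsx3 handles text-to-speech conversion. Both are third-party packages that can be installed with pip install SpeechRecognition pyttsx3. Microphone input additionally requires the PyAudio package.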

Next, we create an instance of the recognizer:

r = sr.Recognizer()

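The Recognizer instance r holds the recognition settings, such as the energy threshold used to tell speech from silence, and provides the listening and recognition methods used in the next step.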

Using the microphone as the audio source, we record the audio and perform speech recognition:

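with sr.Microphone() as source:
    print("Listening...")

    # Calibrate the energy threshold against the background noise
    r.adjust_for_ambient_noise(source)
    # Record until a pause in the speech is detected
    audio = r.listen(source)

    print("Recording complete.")

    try:
        # Send the recorded audio to the Google Web Speech API
        text = r.recognize_google(audio)
        print("Recognized text: " + text)
    except sr.UnknownValueError:
        print("Unable to recognize speech")
    except sr.RequestError as e:
        print("Error: {0}".format(e))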

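The with statement opens the microphone as the audio source and releases it when the block ends. adjust_for_ambient_noise() samples the background noise (for one second by default) to calibrate the recognizer, and listen() records until it detects a pause, returning the recording, which we store in the audio variable. recognize_google() then sends that audio to the Google Web Speech API, so it needs an internet connection. It raises sr.UnknownValueError when the speech is unintelligible and sr.RequestError when the API cannot be reached; the final code handles both.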
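
By default, listen() waits indefinitely for speech to begin. The following is a minimal sketch of bounding that wait. The helper name capture_phrase and its default values are assumptions chosen for illustration, not part of the original project; it relies on the duration argument of adjust_for_ambient_noise() and the timeout and phrase_time_limit arguments of listen().

```python
def capture_phrase(recognizer, source, calibrate_seconds=0.5,
                   timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record one phrase.

    `recognizer` is expected to behave like sr.Recognizer and `source`
    like an opened sr.Microphone. The helper name and the default
    values are assumptions picked for illustration.
    """
    # Sample the background for a short, fixed time to set the energy threshold
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # Give up if no speech starts within `timeout` seconds, and stop
    # recording after `phrase_time_limit` seconds of speech
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)
```

Inside the with sr.Microphone() as source: block, this would be called as audio = capture_phrase(r, source). If no speech starts within the timeout, listen() raises sr.WaitTimeoutError, which the caller would need to handle.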

Text-to-Speech Conversion

After recognizing the speech, we initialize the text-to-speech engine:

engine = pyttsx3.init()

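init() returns an engine backed by the platform's own speech driver (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux), so text-to-speech works without an internet connection.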

We can optionally set the speech rate:

engine.setProperty("rate", 150)

setProperty() sets the speech rate in words per minute. pyttsx3 defaults to 200, so 150 gives slower speech. This step is optional.
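
Instead of hard-coding a rate, it can be derived from whatever the engine currently reports. The sketch below assumes a hypothetical helper named scale_rate (not part of pyttsx3) and uses only the engine's getProperty() and setProperty() methods.

```python
def scale_rate(engine, factor):
    """Multiply the engine's current speech rate by `factor`.

    `engine` is expected to behave like the object returned by
    pyttsx3.init(). The helper itself is a hypothetical convenience.
    Returns the new rate in words per minute.
    """
    current = engine.getProperty("rate")      # current words per minute
    new_rate = max(1, int(current * factor))  # keep the rate positive
    engine.setProperty("rate", new_rate)
    return new_rate
```

For example, scale_rate(engine, 0.75) would slow an engine running at the default rate of 200 down to 150 words per minute.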

Next, we generate the spoken response:

engine.say("You said: " + text)
engine.runAndWait()

say() queues the text to be spoken; here we concatenate "You said: " with the recognized text. Nothing is heard until runAndWait() is called, which processes the queued commands and blocks until the speech has finished.
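
Because say() only queues text, several phrases can be queued and then spoken with a single runAndWait() call. A minimal sketch, assuming a hypothetical helper named speak_all that receives an engine created with pyttsx3.init():

```python
def speak_all(engine, phrases):
    """Queue every non-empty phrase, then speak them in one blocking call.

    `engine` is expected to behave like the object returned by
    pyttsx3.init(). The helper name is an assumption for illustration.
    Returns the number of phrases that were queued.
    """
    queued = 0
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue  # skip blank strings
        engine.say(phrase)  # adds to the queue; nothing is spoken yet
        queued += 1
    if queued:
        engine.runAndWait()  # speaks the queue in order, returns when done
    return queued
```

A call such as speak_all(engine, ["You said:", text]) would speak the two phrases one after the other.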

Final Code

Here's the complete Python code for performing speech recognition and text-to-speech conversion:

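import speech_recognition as sr
import pyttsx3

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Listening...")

    # Calibrate for background noise, then record until a pause is detected
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

    print("Recording complete.")

    try:
        # Transcribe the recording with the Google Web Speech API
        text = r.recognize_google(audio)
        print("Recognized text: " + text)

        engine = pyttsx3.init()

        # Optional: speech rate in words per minute
        engine.setProperty("rate", 150)

        # Queue the response, then speak it and wait until it finishes
        engine.say("You said: " + text)
        engine.runAndWait()

    except sr.UnknownValueError:
        # The audio was recorded but the speech was unintelligible
        print("Unable to recognize speech")
    except sr.RequestError as e:
        # The API could not be reached or rejected the request
        print("Error: {0}".format(e))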
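
The script above gives up after a single failed attempt. One possible extension, sketched here under stated assumptions, is to listen again when the speech could not be understood. The helper recognize_with_retries and its parameters are hypothetical; the capture and recognition steps are passed in as callables so the retry logic stays independent of any particular library.

```python
def recognize_with_retries(capture, recognize, retry_on, attempts=3):
    """Try to capture and transcribe speech up to `attempts` times.

    capture():        returns recorded audio
    recognize(audio): returns text, or raises one of `retry_on`
    retry_on:         exception class (or tuple of classes) meaning "try again"

    Returns the recognized text, or None if every attempt failed.
    All names here are assumptions made for illustration.
    """
    for attempt in range(1, attempts + 1):
        audio = capture()
        try:
            return recognize(audio)
        except retry_on:
            print(f"Attempt {attempt} of {attempts}: could not understand audio")
    return None
```

With the libraries used in this project, it could be called inside the with block as recognize_with_retries(lambda: r.listen(source), r.recognize_google, sr.UnknownValueError). In this sketch sr.RequestError is not retried and would propagate to the caller, since a network or API failure is unlikely to be fixed by listening again.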

That's it! You now have a short Python script that recognizes speech from the microphone and speaks it back. Feel free to modify the code or use it as a starting point for your own projects.
