python speech recognition response problems - python-3.x

I'm trying to make Python voice recognition detect whether the user said a specific word like "hello",
but it doesn't seem to work.
Here's my code:
import speech_recognition as s_r

r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)

while True:
    with my_mic as source:
        voice = r.listen(source)
        print(r.recognize_google(voice))
        voice = voice
        if voice == "hello":
            print('hi')
But all I get as output is what the user said; the comparison never matches. Can anyone help me out with this?
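For what it's worth, here is a minimal sketch of the idea (the device_index=1 value is simply carried over from the question): r.listen() returns an AudioData object, while r.recognize_google() returns the transcribed string, so it is that returned string that has to be stored and compared, not the audio.

import speech_recognition as s_r

r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)  # assumes index 1 is a valid microphone

while True:
    with my_mic as source:
        audio = r.listen(source)           # AudioData, not text
    try:
        text = r.recognize_google(audio)   # the transcribed string
    except s_r.UnknownValueError:
        continue                           # nothing intelligible was heard
    print(text)
    if text.lower() == "hello":
        print('hi')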

Related

How to convert an audio file in colab to text?

I am trying to convert an audio file I have in my Colab workspace into text using the speech_recognition module, but it doesn't work because the audio argument here needs to be an AudioData object. How do I load an audio file "audio.wav" into a variable to pass there, or can I simply pass the file?
import speech_recognition as sr
r = sr.Recognizer()
text = r.recognize_google(audio, language = 'en-IN')
print(text)
The speech_recognition library has a procedure to read in audio files. You can do:
inp = sr.AudioFile('path/to/audio/file')
with inp as file:
    audio = r.record(file)
After that, pass the audio as the first argument to r.recognize_google().
Here is a good article to understand this library.
pip3 install SpeechRecognition pydub
Make sure you have an audio file in the current directory that contains English speech:
import speech_recognition as sr
filename = "16-122828-0002.wav"
The code below loads the audio file and converts the speech into text using Google Speech Recognition:
# initialize the recognizer
r = sr.Recognizer()
# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)
This will take a few seconds to finish, as it uploads the file to Google and grabs the output.

How do I transfer the output from pyttsx3 to a variable for DSP

I'm running a Raspberry Pi4 with Python 3.7 and pyttsx3.
I'm planning to use pyttsx3 to verbally respond to "commands" I issue. I also plan to visualise the output speech on a neopixel strip (think "Close Encounters" on a miniature scale.) The visualisation is not the problem though.
My problem is, how do I get the output from pyttsx3 into a variable so I can pass it to my DSP?
I know that I can send the output to a file:
import pyttsx3
engine = pyttsx3.init() # object creation
"""Saving Voice to a file"""
engine.save_to_file('Hello World', 'text.mp3')
engine.runAndWait()
I know I can read the file back, but that introduces latency.
I want the speech and the twinkly lights to coincide, and while I know I can play the wav file, I'd like something more "real time".
Does anyone have any suggestions please?
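One file-based sketch, sticking with the save_to_file() route the question already mentions (so it still carries the write-then-read latency): render the speech to a file, then load the samples into a NumPy array for the DSP stage. The filename and the 16-bit PCM WAV assumption are illustrative and depend on which pyttsx3 backend the Pi is using.

import wave
import numpy as np
import pyttsx3

engine = pyttsx3.init()
engine.save_to_file('Hello World', 'speech.wav')  # hypothetical filename/format
engine.runAndWait()

# Load the rendered speech into a NumPy array for DSP / LED visualisation
with wave.open('speech.wav', 'rb') as wf:
    rate = wf.getframerate()
    frames = wf.readframes(wf.getnframes())
samples = np.frombuffer(frames, dtype=np.int16)  # assumes 16-bit PCM samples
print(len(samples), 'samples at', rate, 'Hz')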

How do I detect certain things that I said in a speech recognizer script

I am trying to make a voice-activated virtual assistant of sorts using Python, but I am not sure how to detect and distinguish between different voice commands. Currently it just repeats back to you, "You said [whatever I said]", but I want it to respond differently to different things that I say. I am quite new to Python and don't know what I should do. Does anyone know how I could do this?
You have to define what you want it to do. The last two lines of the code below tell the program to do something if the input is "hello": when you run it and say "hello", it gives a different response, and if it does not detect that you said "hello", it does nothing. I would also recommend finding a project on GitHub where someone has already built an assistant like this, trying to understand what they did, and editing it to the specifications you want.
import speech_recognition as sr

sample_rate = 48000
chunk_size = 2048
r = sr.Recognizer()
device_id = 1

with sr.Microphone(device_index=device_id, sample_rate=sample_rate, chunk_size=chunk_size) as source:
    print("Say something...")
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

text = r.recognize_google(audio)
if text.lower() == "hello":
    print("Hi, how are you?")

How can I get my Virtual Assistant to hear me?

I am trying to build a virtual assistant for exercise. When I attempt to get audio using the microphone live, 'Robin' (the V.A.) just stays running.
I updated SpeechRecognition and PyAudio, and also reinstalled Elasticsearch via Homebrew after having to install Java 1.8. I also tried addressing the exception_on_overflow error on shutdown by setting it to False (at this point I am well beyond my level of knowledge). On top of this, to make sure the transcription itself was working, I ran python -m speech_recognition in the terminal (OS: Mac) and it translated my speech pretty accurately. I'm stumped.
# take command from microphone
def takeCommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Absorbing...')
        audio = r.listen(source)
    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-US')
        print(f'user said:{query}\n')
    except KeyboardInterrupt as e:
        print('Im sorry, I didnt get that.')

# Begin tasking:
speak('Initializing, Robin...')
wishMe()
takeCommand()
I am hoping for the console to return what I said as text; the goal would then be to turn that text into an executable command, hence the takeCommand function. Yet if Robin cannot detect a sound, she just gives the output 'Im sorry'. If there's anything else I can provide, let me know. I really appreciate the feedback. Also, I'm new to Stack Overflow, so I apologize if I didn't format this correctly.
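For reference, here is a hedged sketch of a listen-and-return helper (not the asker's code): it returns the recognized string and catches sr.UnknownValueError and sr.RequestError, which are the exceptions recognize_google() raises when nothing is understood or the request fails.

import speech_recognition as sr

def take_command():
    """Listen on the default microphone and return the recognized text, or None."""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Absorbing...')
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    try:
        print('Recognizing...')
        return r.recognize_google(audio, language='en-US')
    except sr.UnknownValueError:
        print("I'm sorry, I didn't get that.")
    except sr.RequestError as e:
        print('Speech service error:', e)
    return None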

Can google speech API convert text to speech?

I used the Google Speech API to successfully convert speech to text using the following code.
import speech_recognition as sr
import os

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""{KEY}
"""
# INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE
try:
    speechOutput = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, language="si-LK")
except sr.UnknownValueError:
    speechOutput = "Google Cloud Speech could not understand audio"
except sr.RequestError as e:
    speechOutput = "Could not request results from Google Cloud Speech service; {0}".format(e)

print(speechOutput)
I want to know if I can convert text to speech using the same API. If not, which API should I use, and is there sample Python code for it?
Thank you!
For this you'll need to use the new Text-to-Speech API which is in Beta as of now. You can find a Python quickstart in the Client Library section of the docs. The sample is part of the python-docs-sample repo. Adding the relevant part of the example here for better visibility:
def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.types.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)

    response = client.synthesize_speech(input_text, voice, audio_config)

    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
Update: rate and pitch configuration
You can enclose the text elements in a <prosody> tag to modify the rate and pitch. For example:
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
The possible values for those follow the W3 specifications which can be found here. The SSML docs for Text-to-Speech API detail this and they also provide some samples.
Also, you can control the general audio playback rate with the speed option in <audio>, which currently accepts values from 50 to 200% (in 1% increments).
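As a hedged sketch of how that SSML could be passed to the same client used in the answer above (the wrapping <speak> element and the output filename are assumptions, and the types/enums style matches the beta-era client shown there):

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

ssml = ('<speak><prosody rate="slow" pitch="-2st">'
        'Can you hear me now?</prosody></speak>')
input_ssml = texttospeech.types.SynthesisInput(ssml=ssml)

voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

response = client.synthesize_speech(input_ssml, voice, audio_config)

# The response's audio_content is binary
with open('output_ssml.mp3', 'wb') as out:
    out.write(response.audio_content)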
