How to convert an audio file in colab to text? - python-3.x

I am trying to convert an audio file I have in colab workspace into text using the speech recognition module. But it doesn't work as the audio argument here needs to be audio, how do I load an audio file "audio.wav" into some variable to pass there or just simply pass that file.
import speech_recognition as sr
r = sr.Recognizer()
text = r.recognize_google(audio, language = 'en-IN')
print(text)

The speech_recognition library has a procedure to read in audio files. You can do:
inp = sr.AudioFile('path/to/audio/file')
with inp as file:
audio = r.record(file)
After that pass the audio as the first argument to r.recognize_google()
Here is a good article to understand this library.

pip3 install SpeechRecognition pydub
Make sure you have an audio file in the current directory that contains english speech
import speech_recognition as sr
filename = "16-122828-0002.wav"
The below code is responsible for loading the audio file, and converting the speech into text using Google Speech Recognition:
# initialize the recognizer
r = sr.Recognizer()
# open the file
with sr.AudioFile(filename) as source:
# listen for the data (load audio to memory)
audio_data = r.record(source)
# recognize (convert from speech to text)
text = r.recognize_google(audio_data)
print(text)
This will take few seconds to finish, as it uploads the file to Google and grabs the output

Related

How do I find the length of a sound generated by pyttsx3

I am making a program that needs the length of speech generated by pyttsx3
I did not find any way to do it using pyttsx3 so I am storing the speech in a file
and then trying to use mutagen to get the audio info
import pyttsx3
from mutagen.mp3 import MP3
# the engine
engine = pyttsx3.init()
# 'Hello World' is just an example
engine.save_to_file('Hello world', 'test.mp3')
engine.runAndWait()
# load the mp3 as an audio
audio = MP3('test.mp3')
# the line above gives an error
I get the following error mutagen.mp3.HeaderNotFoundError: can't sync to mpeg frame
Why am I getting this error?
and also is there any other way to get the length of a pyttsx generated speech?
If there is no specific need to use mutagen, I recommend using pydub instead. Code below which gives duration in seconds
Code:
import pyttsx3
from pydub import AudioSegment
# the engine
engine = pyttsx3.init()
# 'Hello World' is just an example
engine.save_to_file('Hello world', 'test.mp3')
engine.runAndWait()
# load the mp3 as an audio
audio = AudioSegment.from_file("test.mp3")
print(audio.duration_seconds)
Output:
0.9205442176870748

Python : How to use speech_recognition or other modules to convert base64 audio string to text?

I have base64 audio string like data:audio/mpeg;base64,//OAxAAAAANIAAAAABhqZ3f4StN3gOAaB4NAUBYZLv......, and I was trying to convert the base64 to wav file using base64 module in Python:
decode_bytes = base64.b64decode(encoding_str)
with open(file_name + '.wav', "wb") as wav_file:
wav_file.write(decode_bytes)
Then I was trying to convert the audio to text using speech_recognition module, and it gives error below:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
Is there a solution for this problem?
Seems like your audio file is mp3 from mime-type - audio/mpeg. You will need to save it as mp3
decode_bytes = base64.b64decode(encoding_str)
with open(file_name + '.mp3', "wb") as wav_file:
wav_file.write(decode_bytes)
And convert the mp3 to wav format either using pydub or FFmpeg and then give this wav file to speech_recognition module.

Can google speech API convert text to speech?

I used Google speech API ti successfully convert speech to text using following code.
import speech_recognition as sr
import os
#obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""{KEY}
"""
# INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE
try:
speechOutput = (r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, language="si-LK"))
except sr.UnknownValueError:
speechOutput = ("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
speechOutput = ("Could not request results from Google Cloud Speech service; {0}".format(e))
print(speechOutput)
I want to know if i can convert text to speech using the same API? If not what API to use and the sample python code for that.
Thank you!
For this you'll need to use the new Text-to-Speech API which is in Beta as of now. You can find a Python quickstart in the Client Library section of the docs. The sample is part of the python-docs-sample repo. Adding the relevant part of the example here for better visibility:
def synthesize_text(text):
"""Synthesizes speech from the input string of text."""
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()
input_text = texttospeech.types.SynthesisInput(text=text)
# Note: the voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
audio_config = texttospeech.types.AudioConfig(
audio_encoding=texttospeech.enums.AudioEncoding.MP3)
response = client.synthesize_speech(input_text, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
Update: rate and pitch configuration
You can enclose the text elements in a <prosody> tag to modify the rateand pitch. For example:
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
The possible values for those follow the W3 specifications which can be found here. The SSML docs for Text-to-Speech API detail this and they also provide some samples.
Also, you can control the general audio playback rate with the speed option in <audio>, which currently accepts values from 50 to 200% (in 1% increments).

Convert any audio file to mp3 with python

I want to convert any audio file (flac, wav,...) to mp3 with python
I am a noob , I tried pydub but I didn't found out how to make ffmpeg work with it, and If I'm right it can't convert flac file.
The idea of my project is to :
Make musicBee send the path of the 'now playing' track (by pressing the assigned shortcut) to my python file which would convert the music if it is not in mp3 and send it to a folder. (Everything in background so I don't have to leave what I'm doing to make the operation)
You can use the following the code:
from pydub import AudioSegment
wav_audio = AudioSegment.from_file("audio.wav", format="wav")
raw_audio = AudioSegment.from_file("audio.wav", format="raw",
frame_rate=44100, channels=2, sample_width=2)
wav_audio.export("audio1.mp3", format="mp3")
raw_audio.export("audio2.mp3", format="mp3")
You can also look here for more options.
flac_audio = AudioSegment.from_file("sample.flac", "flac")
flac_audio.export("sampleMp3.mp3", format="mp3")

Play wav file python 3

I want to play a .wav file in Python 3.4. Additonally, I want python to play the file rather than python open the file to play in VLC, media player etc..
As a follow up question, is there any way for me to combine the .wav file and the .py file into a standalone exe.
Ignore the second part of the question if it is stupid, I don't really know anything about compiling python.
Also, I know there have been other questions about .wav files, but I have not found one that works in python 3.4 in the way I described.
Using pyaudio you may get incorrect playback due to speed, consider instead:
sudo apt-get install python-pygame
Windows:
choco install python-pygame?
def playSound(filename):
pygame.mixer.music.load(filename)
pygame.mixer.music.play()
import pygame
pygame.init()
playSound('hellyeah.wav')
I fixed the problem by using the module pyaudio, and the module wave to read the file.
I will type example code to play a simple wave file.
import wave, sys, pyaudio
wf = wave.open('Sound1.wav')
p = pyaudio.PyAudio()
chunk = 1024
stream = p.open(format =
p.get_format_from_width(wf.getsampwidth()),
channels = wf.getnchannels(),
rate = wf.getframerate(),
output = True)
data = wf.readframes(chunk)
while data != '':
stream.write(data)
data = wf.readframes(chunk)
If you happen to be using linux a simple solution is to call aplay.
import os
wav_file = "./Hello.wav"
os.system(f'aplay {wav_file}')

Resources