Can google speech API convert text to speech? - python-3.x

I used the Google Speech API to successfully convert speech to text using the following code.
import speech_recognition as sr
import os
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""{KEY}
"""
# INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE
try:
    speechOutput = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, language="si-LK")
except sr.UnknownValueError:
    speechOutput = "Google Cloud Speech could not understand audio"
except sr.RequestError as e:
    speechOutput = "Could not request results from Google Cloud Speech service; {0}".format(e)
print(speechOutput)
I want to know whether I can convert text to speech using the same API. If not, which API should I use, and is there sample Python code for it?
Thank you!

For this you'll need to use the new Text-to-Speech API, which is in Beta as of now. You can find a Python quickstart in the Client Library section of the docs. The sample is part of the python-docs-samples repo. Adding the relevant part of the example here for better visibility:
def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.types.SynthesisInput(text=text)
    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)
    response = client.synthesize_speech(input_text, voice, audio_config)
    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
Update: rate and pitch configuration
You can enclose the text elements in a <prosody> tag to modify the rate and pitch. For example:
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
The possible values for those follow the W3 specifications which can be found here. The SSML docs for Text-to-Speech API detail this and they also provide some samples.
Also, you can control the general audio playback rate with the speed option in <audio>, which currently accepts values from 50 to 200% (in 1% increments).
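As a rough illustration of the <prosody> approach, the snippet below builds an SSML string with the standard library only. The helper name prosody_ssml and its default values are made up for this example; the escaping is there so that characters such as & in the text do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate="slow", pitch="-2st"):
    """Wrap plain text in <speak><prosody ...> tags (hypothetical helper)."""
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape handles &, < and > inside the spoken text.
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody></speak>"
    )


ssml = prosody_ssml("Can you hear me now?")
print(ssml)
```

The resulting string would presumably be passed as the ssml field of SynthesisInput in place of text; check the SSML docs for the values the service actually accepts.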

Related

python speech recognition response problems

I'm trying to make Python voice recognition detect whether the user said a specific word like "hello",
but it doesn't seem to work.
Here's my code:
import speech_recognition as s_r
r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)
while True:
    with my_mic as source:
        voice = r.listen(source)
    print(r.recognize_google(voice))
    voice = voice
    if voice =="hello":
        print('hi')
But I only get what the user said as output. Can anyone help me out with this?
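One likely cause: r.listen() returns recorded audio, not text, so comparing voice to "hello" never matches. The string returned by r.recognize_google() is what needs to be compared. A minimal sketch of that comparison follows; the helper said_word is a name invented here, and the microphone loop is left as comments because it needs hardware and network access.

```python
import re


def said_word(transcript, word):
    """Return True if `word` occurs as a whole word in `transcript`, ignoring case."""
    words = re.findall(r"[\w']+", transcript.lower())
    return word.lower() in words


# Inside the loop from the question, this might look like:
#     text = r.recognize_google(voice)   # text is a str; voice is audio data
#     print(text)
#     if said_word(text, "hello"):
#         print('hi')
```

Matching on whole words means a transcript such as "hello there" also triggers the reply, which a strict == "hello" comparison would miss.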

Audio to Text from Blob trigger

So I have a use-case where I want to upload audio files (.WAV) into a blob storage which triggers a Function and gets the text from the audio. At the moment, the only way possible is having the audio file locally. The audio config can't take the uri of the audio file. The code I'm using is this:
import azure.cognitiveservices.speech as speechsdk
speech_key, service_region = "sub-key", "westeurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_input = speechsdk.AudioConfig(filename="**BLOB URI**")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config, audio_input)
result = speech_recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
From my research, we can't have a uri as a filename (bold part of code). Solutions like downloading locally first won't work.
I tried reading the audio as a stream but I couldn't find a way to convert to an AudioInputStream.
Any help would be great. Thanks.
You can use the Batch transcription REST API operations that enables you to transcribe a large amount of audio in storage. You can point to audio files using a typical URI or a shared access signature (SAS) URI and asynchronously receive transcription results. With the v3.0 API, you can transcribe one or more audio files, or process a whole storage container.
Please see the following:
https://medium.com/#abhishekcskumar/logic-apps-large-audio-speech-to-text-batch-transcription-d71e93bbaeec
https://github.com/PanosPeriorellis/Speech_Service-BatchTranscriptionAPI/blob/master/CrisClient/Program.cs
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#sample-code
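For orientation, here is a sketch of what creating a batch transcription from a blob or SAS URI could look like with only the standard library. The endpoint path, the header name, and the body fields (contentUrls, locale, displayName) reflect my understanding of the v3.0 REST API and should be checked against the linked documentation; the function name and the example URI are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request


def build_batch_transcription_request(region, subscription_key, audio_urls, locale="en-US"):
    """Build (but do not send) a POST request that creates a batch transcription."""
    # Assumed v3.0 endpoint layout; verify against the Speech service docs.
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions"
    body = {
        "contentUrls": list(audio_urls),  # blob URIs or SAS URIs of the audio files
        "locale": locale,
        "displayName": "blob-triggered transcription",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_batch_transcription_request(
    "westeurope", "sub-key", ["https://example.blob.core.windows.net/audio/sample.wav"]
)
# urllib.request.urlopen(request) would submit the job; results arrive asynchronously.
```

Inside a blob-triggered Function, the triggering blob's URI (with a SAS token if the container is private) would take the place of the example URL.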

How to convert an audio file in colab to text?

I am trying to convert an audio file in my Colab workspace into text using the speech_recognition module. It doesn't work because the audio argument here needs to be audio data. How do I load the file "audio.wav" into a variable to pass there, or simply pass the file itself?
import speech_recognition as sr
r = sr.Recognizer()
text = r.recognize_google(audio, language = 'en-IN')
print(text)
The speech_recognition library has a procedure to read in audio files. You can do:
inp = sr.AudioFile('path/to/audio/file')
with inp as file:
    audio = r.record(file)
After that pass the audio as the first argument to r.recognize_google()
Here is a good article to understand this library.
pip3 install SpeechRecognition pydub
Make sure you have an audio file in the current directory that contains English speech:
import speech_recognition as sr
filename = "16-122828-0002.wav"
The below code is responsible for loading the audio file, and converting the speech into text using Google Speech Recognition:
# initialize the recognizer
r = sr.Recognizer()
# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)
This will take a few seconds to finish, as it uploads the file to Google and grabs the output.
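If loading the file fails, one thing worth ruling out is the file format. The sketch below uses only the standard wave module to confirm that a file opens as a PCM WAV and to report its basic parameters; the helper name describe_wav is invented for this example, and it is a sanity check rather than part of the speech_recognition API.

```python
import wave


def describe_wav(path):
    """Return basic parameters of a PCM WAV file; wave.Error is raised if it cannot be parsed."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Example (assumes the file from above exists in the current directory):
#     print(describe_wav("16-122828-0002.wav"))
```

A file that is really an MP3 or another format renamed to .wav will fail here, in which case converting it first (for instance with pydub, installed above) is a reasonable next step.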

Google Cloud authentication without JSON

I'm using a small program to get license plates from images. I do that by sending the image to Google Vision and searching the text I get back for license plates that match a regular expression.
# -*- coding: utf-8 -*-
"""
Created on Sat May 23 19:42:18 2020
@author: Odatas
"""
import io
import os
from google.cloud import vision_v1p3beta1 as vision
import cv2
import re
# Setup google authen client key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'client_key.json'
# Source path content all images
SOURCE_PATH = "F:/Radsteuereintreiber/Bilder Temp/"
def recognize_license_plate(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)
    # Get image size
    height, width = img.shape[:2]
    # Scale image
    img = cv2.resize(img, (800, int((height * 800) / width)))
    # Save the image to temp file
    cv2.imwrite(SOURCE_PATH + "output.jpg", img)
    # Create new img path for google vision
    img_path = SOURCE_PATH + "output.jpg"
    # Create google vision client
    client = vision.ImageAnnotatorClient()
    # Read image file
    with io.open(img_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    # Recognize text
    response = client.text_detection(image=image)
    texts = response.text_annotations
    return texts
path = SOURCE_PATH + 'IMG_20200513_173356.jpg'
plate = recognize_license_plate(path)
for text in plate:
    # read description
    license_plate = text.description
    # change all symbols to whitespace.
    license_plate = re.sub(r'[^a-zA-Z0-9\n\.]', ' ', license_plate)
    # see if some text matches pattern
    test = re.findall(r'[A-Z]{1,3}\s[A-Z]{1,2}\s\d{1,4}', str(license_plate))
    # stop if you found something (findall returns a list, never None)
    if test:
        break
try:
    print(test[0])
except Exception:
    print("No plate found")
As you can see, I set my environment variable to client_key.json at the start. When I distribute my program, I don't want to send out my key file to every user, so I would like to include the key inside the program directly.
I tried it by using Google's explicit credential method with a JSON string created inside the program, like this:
def explicit():
    # create json
    credentials={ REMOVED: INSIDE HERE WOULD BE ALL THE INFORMATION FROM THE JSON KEY FILE.
    }
    json_credentials=json.dumps(credentials)
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        json_credentials)
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
    # [END auth_cloud_explicit]
But I get this error:
[Errno 2] No such file or directory: content of my json again removed
So I'm not sure whether I have to switch to an API-based call, and if so, how I would call the same functionality. Because I obviously have to upload a picture, I don't even know if that's possible through an API call.
I'm kind of lost. Thanks for any help.
If you want the user to be able to make API calls against your Google Cloud project, then including your service account key, either as a JSON file or inline in your code, is basically equivalent, and either way the user would have access to your key.
This is generally not advised though: even a minimally scoped service account would be able to make requests and potentially incur charges against your account.
An alternative would be to deploy your own API inside your Google Cloud project which wraps the call to the Vision API. This would allow you to protect your service account key, and also to rate limit or even block calls to this API if you need to.
Your script or library would then make calls to this custom API instead of directly to the Vision API.
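To make the last point concrete, here is a sketch of how the distributed script might package an image for such a wrapper API. Everything about the wrapper is hypothetical: the URL, the JSON body shape, and the bearer-token scheme are placeholders for whatever your own service defines. The request is built but not sent.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint of your own wrapper service, not a Google URL.
WRAPPER_URL = "https://example.com/detect-text"


def build_wrapper_request(image_bytes, api_token):
    """Build a POST request carrying the image as base64-encoded JSON."""
    body = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        WRAPPER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # per-user token issued by your service
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_wrapper_request(b"raw image bytes", "user-token")
# urllib.request.urlopen(request) would send it; the wrapper then calls the Vision API
# with the service account key, which never leaves your Google Cloud project.
```

Because each user holds only a token for the wrapper, access can be revoked or rate limited per user without rotating the service account key.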

gTTS /python/ - can you text message the .mp3 file in your script?

The Twilio API says it only handles mp4 for audio. Does anyone have a solution to save Python text-to-speech output as .mp3 or .mp4 and then either text the audio file or place a call in which the mp3 audio plays?
Twilio developer evangelist here.
Twilio does support mp3 files for audio, as well as wav, aiff, gsm and ulaw.
It looks as though that's exactly what gTTS does for you. The example in the documentation is:
>>> from gtts import gTTS
>>> tts = gTTS('hello')
>>> tts.save('hello.mp3')
Alternatively, you could just use Twilio's text to speech as part of your call, with <Say>. If you are not satisfied with the basic voices you can now use voices from AWS Polly in Twilio Voice.
Let me know if that helps at all.
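For the phone-call route, Twilio is driven by TwiML, an XML document returned by your server. The sketch below generates two such documents with the standard library: one that plays a hosted mp3 with <Play> and one that uses <Say>. The function names and the example URL are made up, and the mp3 would need to be hosted somewhere Twilio can fetch it.

```python
import xml.etree.ElementTree as ET


def play_twiml(mp3_url):
    """Return TwiML that plays an audio file at a publicly reachable URL."""
    response = ET.Element("Response")
    ET.SubElement(response, "Play").text = mp3_url
    return ET.tostring(response, encoding="unicode")


def say_twiml(message):
    """Return TwiML that reads `message` aloud using Twilio's text to speech."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = message
    return ET.tostring(response, encoding="unicode")


print(play_twiml("https://example.com/hello.mp3"))
print(say_twiml("hello"))
```

Building the XML with ElementTree rather than string concatenation means special characters in the message are escaped automatically.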
