Google Cloud Speech API did not return all text - speech-to-text

I am using this API to transcribe a long audio file, but the result only contains part of the text.
https://cloud.google.com/speech-to-text/docs/async-recognize#speech-async-recognize-gcs-csharp
What did I do wrong?

Related

Unable to extract log from watson assistant

I have been struggling with consuming the Watson Assistant log API. We currently use it to fetch the daily conversation logs between our chatbot and customers, and we feed those logs into NLC to classify the intents for precision analysis. Recently IBM decommissioned NLC and advised that we migrate to NLU; since then we've been having issues getting the component running.
Right now the assistant log API returns an empty log for this filter:
import json
from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('********')
assistant = AssistantV2(
    version='2021-06-14',
    authenticator=authenticator
)
assistant.set_service_url('https://api.eu-gb.assistant.watson.cloud.ibm.com')
print("The assistant URL ")
response = assistant.list_logs(
    assistant_id='******', page_limit=100,
    filter='response_timestamp>2022-09-14T00:00:00.000Z',
    cursor=cursor).get_result()
response = assistant.list_logs(
    assistant_id='*******').get_result()
print(json.dumps(response, indent=2))
#print("the response is ", response)
This returns empty data:
{
  "logs": [],
  "pagination": {}
}
Here are some reasons why this may occur.
You are using the V2 API, which requires an Enterprise account. If you only have Plus, you can use the V1 API, but there is no guarantee it will continue to work as expected in the future.
You don't have any logs for that date range. Check in the Analytics page.
An issue in your code. Try running the sample code on the REST API docs. If this works then compare against your code.
Your filter looks OK, but try this filter (change the language and assistant ID):
language::en,request.context.system.assistant_id::<YOUR_ASSISTANT_ID>,response_timestamp>2022-09-14T00:00:00.000Z
Seeing as you have midnight as your time, you can also change that part to:
response_timestamp>2022-09-14
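With page_limit=100 and a wide date range, the matching logs can span several pages; the list_logs response then carries a next_cursor in its pagination object that must be fed back into the next call. A minimal, SDK-agnostic sketch of that loop (fetch_page below is a stand-in for your assistant.list_logs(...).get_result() call, and the fake pager only illustrates the cursor handshake):

```python
import json

def fetch_all_logs(fetch_page):
    """Follow Watson-style pagination: call fetch_page(cursor) until
    the response no longer contains a next_cursor."""
    cursor = None
    while True:
        response = fetch_page(cursor)
        for log in response.get("logs", []):
            yield log
        cursor = response.get("pagination", {}).get("next_cursor")
        if not cursor:
            break

# Fake pager standing in for assistant.list_logs(..., cursor=cursor).get_result()
pages = {
    None: {"logs": [{"id": 1}], "pagination": {"next_cursor": "abc"}},
    "abc": {"logs": [{"id": 2}], "pagination": {}},
}
logs = list(fetch_all_logs(lambda cursor: pages[cursor]))
print(json.dumps(logs))
```

Note that the second list_logs call in the question overwrites the first response, so even a successful filtered query would be discarded.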

Say "hello" when Twilio call connected

I have a WebSocket in Python Flask that listens to a Twilio call. When the call starts I want to say "hello". Here is the code:
if data['event'] == "start":
    # using Microsoft Cognitive Services to convert the text to bytes
    speakBytes = speaker.speak("Hello")
    convertedBytes = ap.lin2ulaw(speakBytes.audio_data, 1)
    ws.send(responseString.format(base64.b64encode(convertedBytes), str(data['streamSid'])))
But the above is not working. I checked: the Microsoft Cognitive Services speech synthesizer returns the bytes in WAV format, so I used lin2ulaw from Python's audioop module.
Need help. Thanks in advance.
Twilio developer evangelist here.
It looks like you are correctly creating the audio to send to the Twilio Media Stream; however, I don't think you are sending it in the correct format.
Twilio Media Streams expect a media message to be a JSON object with the following properties:
event: the value "media"
streamSid: the SID of the stream
media: an object with a "payload" property that then contains the base64 encoded mulaw/8000 audio
Something like this might work:
message = {
    "streamSid": data['streamSid'],
    "event": "media",
    "media": {
        # b64encode returns bytes; decode so json.dumps can serialize it
        "payload": base64.b64encode(convertedBytes).decode('utf-8')
    }
}

# Serializing json
json_object = json.dumps(message)
ws.send(json_object)
If you're using Twilio to connect the number then you'll need to reply with TwiML to the call:
from twilio.twiml.voice_response import VoiceResponse
response = VoiceResponse()
response.say('Hello')
return str(response)
See the docs for <Say>.
If you want to use the .wav you created then you would need to save it somewhere accessible (e.g. an Amazon S3 bucket) and then you can use TwiML <Play></Play>.
Thanks for the answers, everyone. The solution turned out to be a small change.
I had to change ap.lin2ulaw(speakBytes.audio_data, 1) to ap.lin2ulaw(speakBytes.audio_data, 4) and it worked fine. It seems to come down to the compatibility of the Microsoft text-to-speech and Twilio audio formats.
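For reference, the media message itself can be built with only the standard library; the main pitfall is that base64.b64encode returns bytes, which json.dumps cannot serialize. A minimal sketch (the stream SID and µ-law bytes below are placeholder values):

```python
import base64
import json

def build_media_message(mulaw_bytes, stream_sid):
    """Wrap raw mulaw/8000 audio bytes in the JSON media message
    that Twilio Media Streams expects."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            # b64encode returns bytes; decode so json.dumps can serialize it
            "payload": base64.b64encode(mulaw_bytes).decode("ascii"),
        },
    })

message = build_media_message(b"\x7f\xff\x00", "MZ00000000000000000000000000000000")
print(message)
```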

Voice recognition failed miserably: Wrong status code 401 in Bing Speech API / token

When trying to transcribe a sample audio file using the Azure Bing Speech to Text API, I get: Error: Voice recognition failed miserably: Wrong status code 401 in Bing Speech API / token.
I have tried increasing open_timeout to a higher value like 50000 (which was suggested for slow internet connections), hard-coded in bingspeech-api-client at line 110, but the error persists.
let audioStream = fs.createReadStream('hello.wav');
// Bing Speech Key (https://www.microsoft.com/cognitive-services/en-us/subscriptions)
let subscriptionKey = '******';
let client = new BingSpeechClient(subscriptionKey);
client.recognizeStream(audioStream).then(function (response) {
    console.log("response is ", response);
    console.log("-------------------------------------------------");
    console.log("response is ", response.results[0]);
}).catch(function (error) {
    console.log("error occurred is ", error);
});
This code should generate the text from that sample audio file.
Code 401 means unauthorized, i.e. a wrong key in your case. I suspect you followed an outdated version of some tutorial, since by now the service is no longer called Bing Speech API. See here for a current tutorial using the microsoft-cognitiveservices-speech-sdk SDK for Node.js:
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-js-node#use-the-speech-sdk

Dialogflow - Can't save response with only media content

I want to set up a response which plays an mp3 file in response to invocation.
The only way I managed to get this to work is if I add a simple response and suggestion chips.
When I try to save only the media content I get the error "Errors in 'Default Welcome Intent' intent: Google Assistant simple_response, suggestion_chips should be added to intent".
What is the correct way to set up a response that will play an mp3 file without a text response and suggestion_chips?
(The question included two screenshots: one intent configuration that does not work, with only the media content, and one that works, with a simple response added.)
A MediaResponse must also have a simple text response or SSML response associated with it.
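For illustration, in an Actions on Google v2 webhook response the media response is one item in a richResponse alongside the required simpleResponse; a sketch of that shape (the URL, names, and text here are placeholders):

```json
{
  "richResponse": {
    "items": [
      {
        "simpleResponse": {
          "textToSpeech": "Here is the audio."
        }
      },
      {
        "mediaResponse": {
          "mediaType": "AUDIO",
          "mediaObjects": [
            {
              "name": "Welcome audio",
              "contentUrl": "https://example.com/welcome.mp3"
            }
          ]
        }
      }
    ],
    "suggestions": [
      { "title": "Next" }
    ]
  }
}
```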

How can a bot receive a voice file from Facebook Messenger (MP4) and convert it to a format that is recognized by speech engines like Bing or Google?

I'm trying to make a bot for Facebook Messenger using Microsoft's Bot Framework that will do this:
Get a user's voice message sent via Facebook Messenger
Convert speech to text
Do something with it
There's no problem with getting the voice message from Messenger (the URL can be extracted from the message the bot receives), and there's also no problem with converting the speech in an audio file to text (using the Bing Speech API or Google's similar API).
However, these APIs require PCM (WAV) files, while Facebook Messenger gives you an MP4 file.
Is there a popular/standard way of converting one format into another that is used in writing the bots?
So far my best idea is to run vlc.exe as a console job on my server to convert the file, but that doesn't sound like the best solution.
Developed a solution that works as follows:
Receive the voice message from Facebook
Download the MP4 file to local disk using the link inside Activity.Attachments
Use MediaToolKit (wrapper for FFMPEG) to convert MP4/AAC to WAV on local server
Send the WAV to Bing Speech API
So the answer to my question is: use MediaToolKit+ffmpeg to convert the file format.
Sample implementation and code here: https://github.com/J3QQ4/Facebook-Messenger-Voice-Message-Converter
public string ConvertMP4ToWAV()
{
    var inputFile = new MediaFile { Filename = SourceFileNameAndPath };
    var outputFile = new MediaFile { Filename = ConvertedFileNameAndPath };

    using (var engine = new Engine(GetFFMPEGBinaryPath()))
    {
        engine.Convert(inputFile, outputFile);
    }

    return ConvertedFileNameAndPath;
}
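The same conversion can be sketched without MediaToolKit by invoking ffmpeg directly, which is what the wrapper does under the hood. A minimal Python version, assuming ffmpeg is on the PATH; the PCM parameters (16 kHz, 16-bit, mono) match what the old Bing Speech API accepted for WAV input, and the file names are placeholders:

```python
import subprocess

def ffmpeg_mp4_to_wav_args(source_path, target_path):
    """Build the ffmpeg command line to convert an MP4/AAC voice
    message to a 16 kHz, 16-bit, mono PCM WAV file."""
    return [
        "ffmpeg", "-y",          # overwrite the output if it exists
        "-i", source_path,       # input MP4 from Messenger
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        target_path,
    ]

def convert_mp4_to_wav(source_path, target_path):
    subprocess.run(ffmpeg_mp4_to_wav_args(source_path, target_path), check=True)
    return target_path

args = ffmpeg_mp4_to_wav_args("voice.mp4", "voice.wav")
print(args)
```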
