Get the audio data from Google Assistant - dialogflow-es

As of now (using api.ai), I only get the text transcription of what the user speaks.
I would like to access the raw audio of the user's speech when interacting with Google Assistant through the api.ai platform.
Is there a way to get the audio file?
[UPDATE]:
We are aiming to evaluate the quality of the user's speech, so we need to run our algorithms on the raw audio.

No, there is currently no way to get the audio content of what has been sent.
(However, the team is looking to understand the use cases of why you might want or need this feature, so you may want to elaborate on your question further.)

Related

Is it possible to record audio and play it afterwards on Google Assistant?

I'd like to know if it's possible to record an audio extract on Google Assistant (with Dialogflow and Firebase) and play it later? The idea is to:
Ask the user to say their name aloud.
Record it.
Play it back afterwards.
I read these answers. The answer was no, but maybe there's been an update, since we can now listen to what we said to Google Assistant in "My Activity", as seen here.
The answer is still no for developers. While users can access their own recordings, there's no way to retrieve them programmatically and no way to play them back through Dialogflow.
You can read this great article which explains a trick to record audio and play it back on the Google Assistant using a progressive web app.
The design consists of these two parts:
A web client for recording and uploading the audio files to Google Cloud Storage.
An Action that plays the audio file from Cloud Storage.
You'll also find the URL of the open-sourced code on GitHub at the end of the article.
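The playback half of that design is the simpler part: the Action returns SSML with an `<audio>` tag pointing at the file's URL. A minimal sketch of building such a response (the bucket name, object path, and helper function are hypothetical, and the object is assumed to be publicly readable):

```python
def build_play_response(audio_url: str, fallback_text: str) -> str:
    """Build an SSML response that plays an audio file on the Assistant.

    The <audio> tag plays the file at audio_url; the inner text is
    spoken instead if the file can't be fetched.
    """
    return (
        "<speak>"
        "Here is the recording. "
        f'<audio src="{audio_url}">{fallback_text}</audio>'
        "</speak>"
    )

# Hypothetical bucket and object names, for illustration only.
ssml = build_play_response(
    "https://storage.googleapis.com/my-bucket/recordings/name.mp3",
    "Sorry, the recording could not be played.",
)
print(ssml)
```

The SSML string goes into the speech field of your fulfillment response, whichever client library you use.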

Sending voice command automatically to google home

I want to test how Google Home transforms voice commands to text by sending voice commands and storing the returned results. I have already done the storage part, but I can't find anything in the documentation about sending voice commands to Google Home automatically; the only apparent way is to speak to it directly, which is not very practical if you want to test a long list of commands, 50 times each!
Edit: to make it clearer, I want to write a function that sends voice files (MP3, or any other format) to Google Assistant, instead of having to pronounce each command myself in a human way.
Do you know if it is possible to automate this process?
It sounds like you might want the Assistant SDK, which will let you send an audio stream/file or text to be processed by the Assistant and return the result.
It's unclear exactly what you're trying to do and how, but this table should help you understand which features are available for the various ways of using the Assistant SDK. In general, you'll be able to send an audio file or stream (using either the Python library or a gRPC library in your language of choice) and get a response back.
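With the gRPC service, audio is sent as a stream of requests, each carrying a small chunk of raw audio bytes. The chunking itself is plain Python; here is a sketch of just that part (the frame size is an assumption, and the actual request wrapping from the SDK's protobuf messages is deliberately omitted):

```python
def audio_chunks(data: bytes, chunk_size: int = 3200):
    """Yield successive fixed-size chunks of raw audio bytes.

    3200 bytes is roughly 100 ms of 16-bit, 16 kHz mono PCM,
    a typical frame size for streaming speech APIs.
    """
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

# Example: a fake 1-second buffer of 16-bit, 16 kHz mono audio.
fake_audio = bytes(32000)
chunks = list(audio_chunks(fake_audio))
print(len(chunks))  # 10 chunks of 3200 bytes each
```

Each chunk would then be placed into one streaming request, so your test harness can replay a recorded command file against the Assistant without anyone speaking.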

How do I use twiml.gather continuously in existing call?

I want to capture both parties' speech as text continuously during a call, send those strings off for real-time translation, and then use twiml.say to speak the translated text back. I haven't had much luck with this and am wondering how I should go about it.
One user makes a call from their phone to a support person who is at a web browser. I have the call set up and working fine, but I can't find any documentation aligned with what I want to do, and I wonder whether it's possible or whether I need to look down a different route.
Should anyone have any advice or has seen samples similar to this I would love to see them. Thanks!
Twilio developer evangelist here.
It's not currently possible to capture a two legged conversation with <Gather> and speech recognition. So you might need to look somewhere else for this functionality.

Audio hosting service that offers transcriptions of uploaded file?

Similar to how YouTube captions videos, is there any audio hosting service out there that will transcribe audio and provide a written transcription for accessibility purposes?
No.
You could upload the audio to YouTube as a video file and get its auto-captions, terrible as they are, then extract those.
You should know that YouTube's auto-captioning should never (never) be relied on. You can instead use it to generate a rough time-based set of captions that you can then download and correct.
The easiest way to do that is via No More Craptions, which will take a YouTube video with auto-captions and walk you through correcting them in a simple interface.
You may then download your completed work as a transcript as well. When you do that, remember to offer a plain text link near the audio file / player on the page with a clear indication of what the user will receive.
Let me reiterate: never rely on YouTube auto-captions. Always correct whatever YouTube provides. Always.

Is there an in-built voice in GStreamer?

I am building a gtk+3.0 application. When a user clicks a button or enters information, I want to give audio feedback to the user using gstreamer.
Is there an in-built voice which speaks a string passed to it?
If yes, are there culture-specific voices?
Are different languages supported by such an in-built voice?
Or should I just ask someone to record their voice for each piece of audio feedback? That would be inefficient if the application grows.
There is the 'festival' plugin; it uses the Festival library but doesn't expose any configuration properties. If you find that the Festival library has properties that would be useful to expose, please request them in the GStreamer Bugzilla: http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer
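As a rough sketch, the festival element takes text on its sink pad and produces WAV audio, so a pipeline along these lines should speak a string (this assumes the festival plugin and the Festival engine are installed on your system; element availability and names can vary between GStreamer versions):

```
echo "Hello there" | gst-launch-1.0 fdsrc fd=0 ! festival ! wavparse ! audioconvert ! autoaudiosink
```

Your GTK application would build the equivalent pipeline programmatically and push the feedback string into it whenever a button is clicked.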
