In Google's Speech-to-Text live streaming, does Google charge me if the user does not speak anything? - speech-to-text

I'm using Google's Speech-to-Text converter. I have to track each user's usage, so I'm wondering: does Google charge me if the user does not speak at all over the live stream?
Thanks

Yes, you still have to pay: streaming recognition is billed for all the audio you send, whether or not it contains speech. Run voice activity detection on your side and only forward audio that actually contains speech.
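A minimal sketch of that client-side voice activity detection, assuming little-endian 16-bit mono PCM frames and a hand-tuned energy threshold (production code would more likely use a purpose-built library such as WebRTC VAD):

```python
import math
import struct

def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a little-endian 16-bit PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)

def has_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Crude VAD: treat the frame as speech if its energy exceeds the threshold."""
    return frame_rms(frame) > threshold

def filter_stream(frames, threshold: float = 500.0):
    """Yield only the frames worth sending to the streaming recognizer."""
    for frame in frames:
        if has_speech(frame, threshold):
            yield frame  # forward this frame to the Speech-to-Text stream
        # silent frames are dropped here and never billed
```

The threshold of 500 is an arbitrary placeholder; tune it against your own microphone levels and background noise.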

Related

Record a short blob audio with Alexa using nodejs

I am trying to create a skill that will record 5 seconds of my voice and send it to my database as a blob with Alexa. (It's mainly for voice recognition.. I am using Azure Speaker API)
I have spent a lot of time trying to find out if there is a specific way to do it with Node.js and Alexa, but I didn't find anything.
I currently have this project in JavaScript, so I am wondering if I can reuse what I have. Since there is no browser to grant microphone access, I am not sure it would work with Alexa.
If someone has an idea or have worked with getting audio to db from Alexa, please help me.
Thanks!
What you're trying to do is capture the user's raw voice input, and as far as I know, Alexa (the Alexa Skills Kit) doesn't let you access raw user audio in any way.

Is it possible to record audio and play it afterwards on Google Assistant?

I'd like to know if it's possible to record an audio extract on Google Assistant (with Dialogflow and Firebase) and play it later? The idea is to:
Ask the user to say their name aloud.
Record it.
Play it afterwards.
I read these answers. The answer was no, but maybe there's an update as now we can listen to what we said on Google Assistant "myactivity" as seen here.
The answer is still no for developers: while users can access their own recordings, there is no way to retrieve them programmatically and no way to play them back through Dialogflow.
You can read this great article which explains a trick to record audio and play it back on the Google Assistant using a progressive web app.
The design consists of these two parts:
A web client for recording and uploading the audio files to Google Cloud Storage.
An Action that plays the audio file from Cloud Storage.
You'll also find the URL of the open-sourced code on GitHub at the end of the article.
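The Action's half of that design boils down to returning an SSML `<audio>` element pointing at the uploaded file. A hedged sketch of such a webhook response, assuming the Actions on Google v2 payload shape and a placeholder Cloud Storage URL:

```python
def playback_response(audio_url: str) -> dict:
    """Build a Dialogflow webhook response that plays a recording via SSML.

    The payload shape follows the Actions on Google v2 webhook format;
    audio_url is whatever public Cloud Storage URL the web client produced.
    """
    ssml = '<speak>Here is your recording: <audio src="%s"/></speak>' % audio_url
    return {
        "payload": {
            "google": {
                "expectUserResponse": False,
                "richResponse": {
                    "items": [{"simpleResponse": {"ssml": ssml}}]
                },
            }
        }
    }

# Hypothetical bucket/object name, for illustration only:
resp = playback_response("https://storage.googleapis.com/my-bucket/name.mp3")
```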

Using Cortana for dictation of documents

I'm currently doing research about Cortana as I'm interested in doing some development of custom skills for it. Currently I'm using Cortana to invoke Windows Speech Recognition where I can then use WSR as a means to dictate text into Word. I'm experimenting with this as a possibility to be used for recording and generating a transcript in real time for meetings.
I've found this to be quite a hassle, and I'm curious to know whether I can integrate a bot within Cortana for the same purpose. I've looked up and done some reading about the Azure Bot Framework, Cognitive Services, LUIS, etc.
Is it possible to develop such a solution using the above-mentioned services?
Thank you in advance!
Yes, it is possible.
You can feed the streams to the Speech to Text API and chunk the audio according to the returned Offset and Duration of each phrase. Then send those chunks to the Speaker Recognition API to identify each speaker by name, so every chunk has a speaker name to go with its transcribed phrase, and you can assemble a dialog out of that.
Since you're considering it mainly for meetings: the solution you've described was announced a while ago as a feature of Microsoft Teams and is going to be publicly available in the near future. You can also watch a demo that was presented at Build 2018 here.
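The chunking step could be sketched as follows, assuming 16 kHz, 16-bit mono PCM and the 100-nanosecond ticks the Speech service uses for Offset and Duration (the sample rate and format are assumptions; match them to your actual stream):

```python
TICKS_PER_SECOND = 10_000_000  # Offset/Duration are reported in 100-ns ticks
SAMPLE_RATE = 16_000           # assumed: 16 kHz mono
BYTES_PER_SAMPLE = 2           # assumed: 16-bit PCM

def ticks_to_byte_offset(ticks: int) -> int:
    """Convert a tick count into a byte offset within the raw PCM buffer."""
    samples = ticks * SAMPLE_RATE // TICKS_PER_SECOND
    return samples * BYTES_PER_SAMPLE

def extract_phrase(pcm: bytes, offset_ticks: int, duration_ticks: int) -> bytes:
    """Slice out the audio for one recognized phrase, ready to send on
    to the Speaker Recognition API."""
    start = ticks_to_byte_offset(offset_ticks)
    end = start + ticks_to_byte_offset(duration_ticks)
    return pcm[start:end]
```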

Get the audio data from Google Assistant

As of now (using api.ai), what I see is that I get the string of what the user speaks.
I would like to access the raw audio file the user speaks to interact with Google Assistant using the api.ai platform.
Is there a way to get the audio file ?
[UPDATE]:
We are aiming to evaluate the quality of the user's speech, hence we need to run our algorithms on the audio.
No, there is currently no way to get the audio content of what has been sent.
(However, the team is looking to understand the use cases of why you might want or need this feature, so you may want to elaborate on your question further.)

Is there an in-built voice in GStreamer?

I am building a gtk+3.0 application. When a user clicks a button or enters information, I want to give audio feedback to the user using gstreamer.
Is there an in-built voice which, when a string is passed to it, speaks the string?
If yes, are there culture-specific voices?
Are different languages supported in such an in-built voice?
Or should I just ask someone to record their voice for each piece of audio feedback that has to be provided? That would be inefficient if the application grows.
There is the 'festival' plugin; it uses the Festival library but doesn't expose any configuration properties. If you find that the Festival library has properties that would be useful to you, please request that they be added via the GStreamer Bugzilla: http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer
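If you want to try the festival element from code, here is a hedged sketch. The pipeline layout is an assumption (the element reads text on its sink pad and emits a WAV stream, hence the wavparse stage); verify it with `gst-inspect-1.0 festival` on your system.

```python
def festival_pipeline(sink: str = "autoaudiosink") -> str:
    """Build a gst-launch-style pipeline description that speaks text
    fed on stdin using the 'festival' element from gst-plugins-bad."""
    return ("fdsrc fd=0 ! festival ! wavparse ! "
            "audioconvert ! audioresample ! " + sink)

# From a shell, the equivalent would be roughly:
#   echo "Button clicked" | gst-launch-1.0 fdsrc fd=0 ! festival ! wavparse ! \
#       audioconvert ! audioresample ! autoaudiosink
# In a GTK application, you could pass the same description string to
# Gst.parse_launch() and set the pipeline to PLAYING for each feedback event.
```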
