Record a short audio blob with Alexa using Node.js

I am trying to create an Alexa skill that will record 5 seconds of my voice and send it to my database as a blob (it's mainly for voice recognition; I am using the Azure Speaker API).
I have spent a lot of time trying to find out whether there is a specific way to do this with Node.js and Alexa, but I didn't find anything.
I currently have this project in JavaScript, so I am wondering whether I can reuse what I have. Since there will be no browser to grant microphone access, I am not sure it will work with Alexa.
If someone has an idea or has worked on getting audio from Alexa into a database, please help me.
Thanks!

What you're trying to do is capture the user's raw voice input, and as far as I know, Alexa (the Alexa Skills Kit) does not give you access to raw user audio in any way.

Related

How to tell Alexa to punctuate user responses properly. Please see the use case

I am sorry if this looks like a stupid question!
My skill records user responses in the database. That part is working fine. But my concern is that Alexa is not punctuating the responses at all. Here is an example:
User: The loading speed of the website is very slow (a few milliseconds of pause) can't we make it faster (vocal tone was used in such a way that Alexa could understand this part is a question)
Recorded: the loading speed of the website is very slow can't we make it faster
Expected: the loading speed of the website is very slow. can't we make it faster?
Is there any way to accomplish this? It is very important that the stored responses are punctuated correctly, as this skill will be used for project management purposes.
I am afraid it's not possible with Alexa. However, you can use the Amazon Transcribe service: build a mobile/web app and send the recorded audio to the service. According to their docs:
Easy-to-Read Transcriptions
Amazon Transcribe automatically adds punctuation and formatting so that the output closely matches the quality of manual transcription at a fraction of the time and expense.
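If you go that route, a minimal sketch using the AWS SDK for JavaScript (v2) might look like the following. It assumes the recorded audio has already been uploaded to S3; the bucket, file, region, and job names below are placeholders.

```javascript
// Sketch: submit a recorded audio file (already in S3) to Amazon Transcribe.
const AWS = require('aws-sdk');

const transcribe = new AWS.TranscribeService({ region: 'us-east-1' });

const params = {
  TranscriptionJobName: 'user-response-demo',   // must be unique per job
  LanguageCode: 'en-US',
  MediaFormat: 'mp3',                           // must match the uploaded file
  Media: {
    MediaFileUri: 's3://my-example-bucket/user-response.mp3'  // placeholder
  }
};

transcribe.startTranscriptionJob(params, (err, data) => {
  if (err) return console.error(err);
  // The job runs asynchronously: poll getTranscriptionJob until
  // TranscriptionJobStatus is COMPLETED, then fetch the punctuated
  // transcript from the Transcript.TranscriptFileUri it returns.
  console.log('Started job:', data.TranscriptionJob.TranscriptionJobName);
});
```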

Is it better to incorporate a motion sensor and sound files in Google Assistant, or in a Python-based audio chatbot programme?

I want to incorporate a few new things in an audio chatbot. Can I please check the best way to do it?
- I want to record actors' voices to replace the chatbot's default computerised voice.
- I want to include sound files that play on demand (and with variety, so that which file plays depends on user choices). Is that possible, and if so, is there much delay before they start playing?
- I would also like to use a motion sensor to start the program, so that the chatbot automatically says hello and starts a conversation when a user enters the room, rather than the user having to say 'hello google, can I talk to...blah blah' to activate the chatbot.
So far I've been using Dialogflow to build natural language processing chatbots. Does Dialogflow have the capacity to do all this, or should I use another program linked to it as well? Or, for this sort of functionality, would it be better to build a chatbot in Python? Does anybody know of any open-source versions?
It is not possible to have the chatbot start a conversation without the user saying "Okay Google, talk to...". This is by design, so that the Google Assistant cannot be triggered without the user activating it themselves.
As for sound files: you can record parts of your conversation and play those recordings in your responses using SSML, which lets you control what your Assistant says with simple markup. The audio tag is what you need to play sound files.
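As a rough illustration, here is a minimal Dialogflow fulfillment sketch using the actions-on-google library. The intent name, audio URL, and deployment line are placeholders; the clip must be an HTTPS-hosted file in a supported format such as MP3.

```javascript
const { dialogflow } = require('actions-on-google');

const app = dialogflow();

app.intent('play_clip', (conv) => {
  // Strings wrapped in <speak> tags are sent as SSML. The <audio> element
  // plays the hosted recording; its inner text is the spoken fallback if
  // the file cannot be played.
  conv.ask(
    '<speak>' +
      '<audio src="https://example.com/audio/actor-greeting.mp3">Hello!</audio>' +
      ' What would you like to do next?' +
    '</speak>'
  );
});

// Deploy `app` as the Dialogflow fulfillment webhook, e.g. with Firebase
// Cloud Functions: exports.fulfillment = functions.https.onRequest(app);
module.exports = app;
```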

Is it possible to record audio and play it afterwards on Google Assistant?

I'd like to know if it's possible to record an audio extract on Google Assistant (with Dialogflow and Firebase) and play it back later. The idea is to:
Ask the user to say their name out loud.
Record it.
Play it back afterwards.
I read these answers. The answer was no, but maybe there's been an update, since we can now listen to what we said to Google Assistant in "My Activity", as seen here.
The answer is still no, as far as developers are concerned. While users can access their own recordings, there is no way to retrieve them programmatically and no way to play them back through Dialogflow.
You can read this great article, which explains a trick for recording audio and playing it back on the Google Assistant using a progressive web app.
The design consists of these two parts:
A web client for recording and uploading the audio files to Google Cloud Storage.
An Action that plays the audio file from Cloud Storage.
You'll also find the URL of the open-sourced code on GitHub at the end of the article.
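For reference, the recording half of that design might look roughly like this in the browser, using the standard MediaRecorder API. The /upload endpoint is a placeholder (the article's own code uploads to Google Cloud Storage instead); see the linked GitHub repository for the real implementation.

```javascript
// Sketch: record a short clip in the browser and upload the resulting blob.
async function recordAndUpload(seconds = 5) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);

  const stopped = new Promise((resolve) => {
    recorder.onstop = resolve;
  });

  recorder.start();
  setTimeout(() => recorder.stop(), seconds * 1000);
  await stopped;

  // Assemble the recorded chunks into a single blob and POST it.
  const blob = new Blob(chunks, { type: recorder.mimeType });
  await fetch('/upload', { method: 'POST', body: blob });  // placeholder URL

  // Release the microphone.
  stream.getTracks().forEach((track) => track.stop());
}
```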

How to listen to user audio in google actions / assistant console?

I am using Dialogflow to create a Google Assistant app. I want to hear what the user said, for error resolution. How can I do that? I know it is possible with Alexa, but I cannot find it for Google.
Developers do not get access to the user's original audio clips, only the transcriptions. If you are seeing a number of errors from your Action, it may be useful to study those transcriptions to get a better understanding of how users are conversing with your Actions in general.
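If transcriptions are enough for your error analysis, you can log them yourself from the webhook: the Dialogflow v2 request body carries the recognized text in queryResult.queryText. A minimal sketch, assuming an Express-based webhook (route name and port are placeholders):

```javascript
const express = require('express');
const app = express();
app.use(express.json());

app.post('/webhook', (req, res) => {
  // queryResult.queryText is the transcription of what the user said.
  const said = req.body.queryResult && req.body.queryResult.queryText;
  console.log('User said:', said);

  res.json({ fulfillmentText: `You said: ${said}` });
});

app.listen(8080);
```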

Get the audio data from Google Assistant

As of now (using api.ai), what I get is the text of what the user speaks.
I would like to access the raw audio of what the user says when interacting with the Google Assistant through the api.ai platform.
Is there a way to get the audio file?
[UPDATE]:
We are aiming to evaluate the quality of the user's speech, so we would need to run our algorithms on the audio.
No, there is currently no way to get the audio content of what has been sent.
(However, the team is looking to understand the use cases for why you might want or need this feature, so you may want to elaborate further in your question.)
