I want to test how Google Home transforms voice commands to text by sending it voice commands and storing the returned result. I have already done the storage part, but I can't find anything in the documentation about sending voice commands to Google Home automatically; the only apparent way is to speak to it directly, which is not very practical if you want to test a long list of commands, 50 times each!
Edit: to make it clearer, I want to write a function that sends voice files (MP3 or any other format) to the Google Assistant, instead of having to pronounce the command in a human voice.
Do you know if it is possible to automate this process?
It sounds like you might want the Assistant SDK, which will let you send an audio stream/file or text to be processed by the Assistant and return the result.
It's unclear exactly what you're trying to do and how, but this table should help you understand which features are available for the various methods of using the Assistant SDK. In general, you'll be able to send an audio file or stream (either using the Python library or a gRPC library for your language of choice) and get a response back.
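For example, here is a minimal sketch of sending a pre-recorded command through the Python gRPC bindings (the google-assistant-grpc package). It assumes you have already completed the SDK's device registration and OAuth flow; credentials.json, the device IDs, and command.raw are placeholders. Note that the gRPC API expects LINEAR16 PCM (or FLAC) audio, so MP3 files would need converting first.

```python
# Sketch: send one audio file to the Assistant and print the transcription.
import json

import google.auth.transport.grpc
import google.auth.transport.requests
import google.oauth2.credentials
from google.assistant.embedded.v1alpha2 import (
    embedded_assistant_pb2,
    embedded_assistant_pb2_grpc,
)

ASSISTANT_API = 'embeddedassistant.googleapis.com'

# Load stored OAuth2 credentials (produced by the SDK's authorization tool).
with open('credentials.json') as f:
    credentials = google.oauth2.credentials.Credentials(token=None, **json.load(f))
http_request = google.auth.transport.requests.Request()
credentials.refresh(http_request)

channel = google.auth.transport.grpc.secure_authorized_channel(
    credentials, http_request, ASSISTANT_API)
assistant = embedded_assistant_pb2_grpc.EmbeddedAssistantStub(channel)

def request_stream(audio_path):
    """Yield the config first, then the audio file in small chunks."""
    config = embedded_assistant_pb2.AssistConfig(
        audio_in_config=embedded_assistant_pb2.AudioInConfig(
            encoding='LINEAR16', sample_rate_hertz=16000),
        audio_out_config=embedded_assistant_pb2.AudioOutConfig(
            encoding='LINEAR16', sample_rate_hertz=16000, volume_percentage=50),
        dialog_state_in=embedded_assistant_pb2.DialogStateIn(language_code='en-US'),
        device_config=embedded_assistant_pb2.DeviceConfig(
            device_id='my-device-id', device_model_id='my-model-id'),
    )
    yield embedded_assistant_pb2.AssistRequest(config=config)
    with open(audio_path, 'rb') as audio:
        while True:
            chunk = audio.read(1024)
            if not chunk:
                break
            yield embedded_assistant_pb2.AssistRequest(audio_in=chunk)

# Stream the file up and print what the Assistant heard.
for resp in assistant.Assist(request_stream('command.raw')):
    for result in resp.speech_results:
        print(result.transcript, result.stability)
```

You could then loop this over your list of recorded commands, 50 runs each, and store each transcript alongside the expected text.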
I want to incorporate a few new things into an audio chatbot. Can I please check the best way to do it?
- I want to record actors' voices to replace the chatbot's default computerised voice
- I want to include sound files that play on demand (and with variety, so the file that plays depends on user choices). Is that possible, and if so, is there much delay before they start playing?
- I would also like to use a motion sensor to start the program, so that the chatbot automatically says hello and starts a conversation when a user enters the room, rather than the user having to say 'hello google, can I talk to...blah blah' to activate the chatbot.
So far I've been using Dialogflow to build natural language processing chatbots. Does Dialogflow have the capacity to do all this, or should I use another program linked to it as well? Or, for this sort of functionality, would it be better to build a chatbot in Python? And does anybody know of any open-source versions?
It is not possible to have the chatbot start a conversation without the user saying "Okay, Google. Talk to...". This is deliberate, so that the Google Assistant cannot be triggered without the user activating it themselves.
As for using sound files: you can record parts of your conversation and play those recordings back using SSML, which lets you control what your Assistant says with simple markup. The audio tag is what you need to play sound files.
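For example, a minimal sketch (the file URL is a placeholder; the source must be an HTTPS link to a supported audio format, and the text inside the audio tag is spoken as a fallback if the file cannot be played):

```xml
<speak>
  Welcome back!
  <audio src="https://example.com/audio/actor-greeting.mp3">
    Sorry, the recording could not be played.
  </audio>
</speak>
```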
I have a Google Home speaker, and I can issue commands like "what's the time" or "play some music", but I'd like to be able to define my own responses to certain commands, like
how many appointments do I have today
or
are there any cancellations
I would like the above commands to run a script that can either call a web service or pull information from my SmartThings hub (that bit is optional) and reply with an appropriate response.
I've done a bit of research, and it seems that IFTTT can do something similar, but I don't really want to be dependent on a third-party app if this can be done directly with Google.
I guess I'm looking for something similar to Groovy for SmartThings, where I can write SmartApps.
The API for developing your own commands is known as Actions on Google. Broadly speaking, Actions on Google will send JSON to a webhook that you control, and you can have it do whatever you wish at that point.
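As a minimal sketch of such a webhook, assuming you build the Action with Dialogflow and its v2 fulfillment format (the intent name, port, and appointment lookup are hypothetical placeholders):

```python
# Minimal Dialogflow v2 fulfillment webhook; Flask is just one way to
# receive the JSON POST that your Action sends.
from flask import Flask, jsonify, request

app = Flask(__name__)

def count_appointments_today():
    # Placeholder: query your calendar web service or SmartThings hub here.
    return 3

@app.route('/webhook', methods=['POST'])
def webhook():
    payload = request.get_json(force=True)
    # displayName identifies which intent Dialogflow matched.
    intent = payload['queryResult']['intent']['displayName']
    if intent == 'appointments.today':
        text = 'You have {} appointments today.'.format(count_appointments_today())
    else:
        text = "Sorry, I can't help with that yet."
    # fulfillmentText is what the Assistant speaks back to the user.
    return jsonify({'fulfillmentText': text})

if __name__ == '__main__':
    app.run(port=8080)
```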
I want to capture both parties' speech as text continuously during a call, send those strings off to be translated in real time, and then use twiml.say to speak the text back. I have not had much luck with this and am wondering how I should go about it.
One user makes a call from their phone to the other party, a support person at a web browser. I have the call set up and working fine; however, I cannot find any documentation that aligns with what I am trying to do, and I'm wondering whether it is possible or whether I need to look down a different route.
Should anyone have any advice or have seen samples similar to this, I would love to see them. Thanks!
Twilio developer evangelist here.
It's not currently possible to capture a two-legged conversation with <Gather> and speech recognition, so you might need to look elsewhere for this functionality.
As of now (using api.ai), all I get is the text transcript of what the user speaks.
I would like to access the raw audio of what the user says to the Google Assistant when using the api.ai platform.
Is there a way to get the audio file?
[UPDATE]:
We are aiming to evaluate the quality of the user's speech, hence we need to run our algorithms on the audio itself.
No, there is currently no way to get the audio content of what has been sent.
(However, the team is looking to understand the use cases of why you might want or need this feature, so you may want to elaborate on your question further.)
For some time now I've been trying to do something that I never thought would be this hard: audio streaming. My objective is simple: a web app through which a certain someone can click a button and live-stream his own voice to other people using the app. It's an online classroom of sorts. Here are the details:
A broadcast/lecture is scheduled for a certain date and time (done)
A user logs in as a teacher/instructor to a simple interface where he can click "start broadcasting" (done)
When the instructor clicks "broadcast", his voice is streamed to the other users. Student-type users can also log in and start listening to THE BROADCAST this teacher started. (and here is the trick!)
The broadcast itself should be automatically stored to a local file in the process, so that students can go back to it at any time.
Of course I have spent many hours googling and stackoverflow-ing this problem, and here is what I have understood so far:
If the starting point is the browser, I must use the getUserMedia API; the result is raw PCM data that I can download, send to a server, or stream to others. (simple)
Offering the broadcast to the listeners (students) will be done via HTML5's Audio API. (simple)
WebRTC cannot help me here, because it's a p2p thing; there cannot be a server in the middle of the process, and I NEED TO KEEP A COPY OF THE LECTURE LOCALLY. (Here's a working example)
I can use tools like Binary.js to stream the audio binary data to the students, but this requires a file to already be present on the disk.
I need to convert the PCM data to a format like MP3 or OGG in the process, and not use WAV, because it's much more expensive bandwidth-wise.
I feel like it should be straightforward, but I cannot get it to work; I cannot piece all of this together and offer a stable, good experience for the user.
So again, I would love to know how to do the following:
Break the getUserMedia raw data into packets, convert it to MP3, and stream it to the server, where a script (NodeJS probably) can store it locally and stream it in real time to whoever has tuned in.
I am open to whatever tool you recommend; I know that NodeJS will be part of the solution, and I am happy to use it. If the streaming can be done via a third-party tool, I have no problem with that.
Thank you in advance.
I see your comment about WebRTC, but I think you should investigate it more.
Like what you see in this (old) post: http://servicelab.org/2013/07/24/streaming-audio-between-browsers-with-webrtc-and-webaudio/
Otherwise, you might have to go for a third party solution, like https://www.crowdcast.io/
(Even if you find a video-only solution, you can use a static picture or so for the video)
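If you do end up building the relay yourself, the store-and-rebroadcast part the question describes is conceptually small. Here is a sketch in Python using the websockets library (standing in for the NodeJS server mentioned in the question, and using its older handler-with-path API); it assumes the broadcaster sends already-encoded audio chunks, e.g. MP3 frames, as binary WebSocket messages, with the host, port, paths, and file name as placeholders:

```python
# Relay-and-record sketch: the broadcaster pushes binary audio chunks to
# ws://localhost:8765/broadcast; listeners connect to ws://localhost:8765/listen.
import asyncio

import websockets

listeners = set()

async def handler(websocket, path):
    if path == '/broadcast':
        # Append every chunk to a local file while fanning it out live.
        with open('lecture.mp3', 'ab') as recording:
            async for chunk in websocket:  # assumed to be bytes
                recording.write(chunk)
                for listener in set(listeners):
                    try:
                        await listener.send(chunk)
                    except websockets.ConnectionClosed:
                        listeners.discard(listener)
    elif path == '/listen':
        listeners.add(websocket)
        try:
            await websocket.wait_closed()
        finally:
            listeners.discard(websocket)

async def main():
    async with websockets.serve(handler, 'localhost', 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())
```

Keeping the fan-out and the file append in one process is the simplest way to guarantee the recording matches what listeners heard; at larger scale you would hand the fan-out to a dedicated media server or CDN.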
Event broadcasting is a good business for many companies. If it were that easy, there wouldn't be only a few, well-known competitors in the market.