I am building a gtk+3.0 application. When a user clicks a button or enters information, I want to give audio feedback to the user using gstreamer.
Is there a built-in voice which, when passed a string, speaks it?
If yes, are there culture-specific voices?
Are different languages supported by such a built-in voice?
Or should I just ask someone to let me record their voice for each piece of audio feedback that has to be provided? That would be inefficient if the application grows.
There is the 'festival' plugin; it uses the festival library but doesn't have any configuration properties. If you find that the festival library has properties that would be useful to you, please request that they be exposed via the GStreamer Bugzilla: http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer
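If you want to try the festival element, here is a minimal sketch in Python/PyGObject. It assumes gst-plugins-bad is installed and a festival server is listening locally; the appsrc caps and the wavparse step reflect the element's text-in/WAV-out pads, so treat this as a starting point rather than a tested recipe:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

def speak(text):
    # The festival element consumes UTF-8 text and produces WAV data,
    # hence the appsrc caps and the wavparse step. It needs a festival
    # server running locally (start one with: festival --server).
    pipeline = Gst.parse_launch(
        'appsrc name=src caps="text/x-raw,format=utf8" '
        "! festival ! wavparse ! audioconvert ! autoaudiosink"
    )
    pipeline.set_state(Gst.State.PLAYING)
    src = pipeline.get_by_name("src")
    src.emit("push-buffer", Gst.Buffer.new_wrapped(text.encode("utf-8")))
    src.emit("end-of-stream")

# Inside a GTK+ 3 application the GLib main loop is already running,
# so calling speak() from a button handler is enough.
speak("Button clicked")
```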
Related
I want to incorporate a few new things into an audio chatbot, and I'd like to check the best way to do it:
- I want to record actors' voices to replace the chatbot's default computerised voice.
- I want to include sound files that play on demand (and with variety, so which file plays depends on user choices). Is that possible, and if so, is there much delay before they start playing?
- I would also like to use a motion sensor to start the program, so that the chatbot automatically says hello and starts a conversation when a user enters a room, rather than the user having to say 'hello google, can I talk to...blah blah' to activate the chatbot.
Thus far I've been using Dialogflow to build natural language processing chatbots. Does Dialogflow have the capacity to do all this, or should I use another programme linked to it as well? Or, for this sort of functionality, would it be better to build a chatbot in Python, and does anybody know any open-source versions?
It is not possible to have the chatbot start a conversation without the user saying "Okay, Google. Talk to..". This has been done so that Google Assistant cannot be triggered without the user activating it themselves.
As for sound files, you can record parts of your conversation and play those recordings back using SSML, which lets you control what your assistant says with simple markup. The <audio> tag is what you need to play sound files.
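As a hedged sketch (the file URL and fallback wording are placeholders, not anything prescribed by Dialogflow), the SSML you return from your fulfillment might look like this, built here as a Python string:

```python
# Sketch of an SSML response: the <audio> tag plays a recorded file and
# falls back to synthesized speech if the file can't be fetched.
# The URL and fallback text below are placeholders.
ssml = (
    "<speak>"
    '<audio src="https://example.com/audio/greeting.ogg">'
    "Hello! Welcome back."
    "</audio>"
    "</speak>"
)
```

Since you choose which recording to reference per response, this also covers your requirement that the file played depend on user choices.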
I want to test how Google Home transforms vocal commands to text by sending voice commands and storing the result returned. I have already done the storage part, but now I can't find anything in the documentation about how to send voice commands to Google Home automatically. The only apparent way is to speak to it directly, which is not very practical if you want to test a long list of commands, 50 times for each command!
Edited: To make it clearer, I want to write a function that sends voice files (MP3, or any other format) to the Google Assistant, instead of having to say/pronounce the command in a human way.
Do you know if it is possible to make this process automatic?
It sounds like you might want the Assistant SDK, which will let you send an audio stream/file or text to be processed by the Assistant and return the result.
It's not entirely clear exactly what you're trying to do and how, but this table should help you understand what features are available for the various methods of using the Assistant SDK. In general, you'll be able to send an audio file or stream (either using the Python library or a gRPC library for your language of choice) and get a response back.
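For example, to run a batch of pre-recorded commands through the Assistant in a loop, you could drive the SDK's pushtotalk sample from Python. This is a sketch: it assumes the google-assistant-sdk samples are installed and authorized, that the recordings are in the 16-bit, 16 kHz mono WAV format the sample expects, and the directory name is a placeholder:

```python
import subprocess
from pathlib import Path

# Batch-test recorded commands against the Assistant. The flag names
# come from the published pushtotalk sample; 'commands/' is a placeholder.
for wav in sorted(Path("commands").glob("*.wav")):
    out = wav.with_suffix(".response.wav")
    subprocess.run(
        [
            "googlesamples-assistant-pushtotalk",
            "--input-audio-file", str(wav),
            "--output-audio-file", str(out),
        ],
        check=True,
    )
    print(f"{wav.name} -> {out.name}")
```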
I want to capture both parties' speech as text continuously during a call, send those strings off to be translated in real time, and then use twiml.say to speak the translated text back. I have not had much luck with this and am wondering how I should go about it.
One user makes a call from their phone to a support person who is in a web browser. I have the call set up and working fine; however, I cannot find any documentation aligned with what I am trying to do, and I'm wondering whether it is possible or whether I need to look down a different route.
Should anyone have any advice or have seen samples similar to this, I would love to see them. Thanks!
Twilio developer evangelist here.
It's not currently possible to capture a two legged conversation with <Gather> and speech recognition. So you might need to look somewhere else for this functionality.
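What does work today is speech recognition on a single leg. Here is a minimal sketch with the Twilio Python helper library (the webhook route and language are placeholder values); Twilio posts the transcription to the action URL as SpeechResult, which you could then translate and hand to <Say>:

```python
from twilio.twiml.voice_response import VoiceResponse, Gather

# Sketch: gather speech on one call leg; Twilio will POST the transcript
# as 'SpeechResult' to the action URL. Route and language are placeholders.
response = VoiceResponse()
gather = Gather(input="speech", action="/handle-speech", language="en-US")
gather.say("Please speak after the tone.")
response.append(gather)
print(str(response))  # the TwiML to return from your voice webhook
```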
As of now (using api.ai), what I see is that I get the string form of what the user speaks.
I would like to access the raw audio of what the user speaks when interacting with Google Assistant via the api.ai platform.
Is there a way to get the audio file ?
[UPDATE]:
We are aiming to evaluate the quality of the user's speech, hence we need to run our algorithms on the audio.
No, there is currently no way to get the audio content of what has been sent.
(However, the team is looking to understand the use cases of why you might want or need this feature, so you may want to elaborate on your question further.)
I am thinking about how to build a Spotify app that does beat detection (extracts the BPM of a song).
For that I need to access the raw audio, the waveform, and analyze it.
I am new to building spotify apps.
I know that with "libspotify" you can access raw audio. Can you do the same through the spotify apps API? And how?
For the record, currently exist two spotify apps apis:
Current
Preview
Unless you're really keen on writing that beat detection code yourself, you should look at the APIs provided by the EchoNest, which include that (and many other awesome things).
See Getting the tempo, key signature, and other audio attributes of a song.
In a word: no. That isn't currently available in the Apps API.
There's now an endpoint for this, I believe. See this example: https://medium.com/swlh/creating-waveforms-out-of-spotify-tracks-b22030dd442b?source=linkShare-962ec94337a0-1616364513
That uses the endpoint https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/
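A minimal sketch of calling that endpoint with the requests library (the access token and track ID are placeholder values); note that the returned analysis already includes an estimated tempo, so you may not need to write your own beat detection at all:

```python
import requests

# Fetch Spotify's precomputed audio analysis for a track.
# The token and track ID below are placeholders.
token = "YOUR_ACCESS_TOKEN"
track_id = "3n3Ppam7vgaVa1iaRUc9Lp"
resp = requests.get(
    f"https://api.spotify.com/v1/audio-analysis/{track_id}",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
analysis = resp.json()
print(analysis["track"]["tempo"])  # estimated BPM of the whole track
print(len(analysis["segments"]))   # fine-grained per-segment data
```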
Edit: I agree with commenter #wizbcn that this does not answer the question, so it may be somewhat incorrect to leave it here. I found this SO post while searching for info about visualizing a track's waveform, as in the linked article. Maybe I should make this a comment instead?