This is my first question, new and fresh, hello guys.
As the title says, is there any workaround or way to add audio inside dialog-speech-template? Since it only supports WAV and not MP3, I'm finding it hard to implement.
The audio comes from an API, so it's not possible for me to download the MP3 file and convert it ahead of time (the audio may change).
Is there any programmatic way to convert the MP3 audio to WAV? I'm pretty new to Bixby, so I hope the more experienced folks here can help.
Unfortunately, Bixby SSML only supports certain WAV formats. Please refer to SSML#AudioClip for details; it also includes instructions on how to convert audio using the ffmpeg tool.
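If you need the conversion to happen programmatically, here is a minimal sketch of one way to do it, assuming you can run the conversion on a server you control (Bixby capsule code itself cannot shell out to ffmpeg) and that ffmpeg is installed there. The file paths and the sample-rate/channel flags are placeholder assumptions; match them to the exact WAV format listed in SSML#AudioClip.

import { execFile } from "child_process";

// Convert an MP3 fetched from the API into a WAV file by invoking ffmpeg.
// "-ar 16000 -ac 1 -acodec pcm_s16le" (16 kHz, mono, 16-bit PCM) is an assumed
// target format -- adjust it to whatever SSML#AudioClip actually requires.
function mp3ToWav(inputPath: string, outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    execFile(
      "ffmpeg",
      ["-y", "-i", inputPath, "-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", outputPath],
      (err) => (err ? reject(err) : resolve())
    );
  });
}

// Usage: mp3ToWav("clip.mp3", "clip.wav").then(() => console.log("converted"));

Your dialog SSML would then reference the converted WAV's URL instead of the original MP3.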
To support mp3 format, you can raise a Feature Request in our community. This forum is open to other Bixby developers who can upvote it, leading to more visibility within the community and with the Product Management team.
Related
I would like to create a skill to yell at someone, but I cannot find any reference in SSML to yelling or screaming.
Is it even possible?
Use an audio file for that. You can record one or download one from the internet and reference it with the SSML audio tag. You just have to put your audio URL in the src attribute, as in the code below.
<speak>
<audio src="soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01"/>
</speak>
There's currently no yelling supported. The closest expression you could achieve with SSML is using the custom tag for emotions:
<amazon:emotion name="excited" intensity="medium">Hey, I'm so excited!</amazon:emotion>
Support for emotions varies across locales, so I suggest keeping an eye on the dev blog posts to track new possibilities:
https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2020/11/alexa-speaking-styles-emotions-now-available-additional-languages
I recently filed an issue asking Google about this, but the answer is quite confusing.
The person who answered said that it is possible if you specify "MP3" as encoding.
I tried that and it did not work.
However, the person at Google closed the issue, so I really do not know how to proceed.
https://issuetracker.google.com/issues/166478543
My understanding is that the encoding in my .m4a file is not MP3 and that the person who answered got this a bit wrong.
(I also got some nice advice not to use .m4a.
But this is not an option in my case since I am not producing the files.
I have no influence whatsoever over that. Unfortunately.)
Is there someone here who can clarify if Google Speech to Text API can handle .m4a? (I have added some tags to clarify the environment.)
If I were in your position, I would use https://www.npmjs.com/package/audiobuffer-to-wav to convert the M4A to WAV, then use the WAV file, which Google Speech-to-Text accepts easily.
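For what it's worth, here is a rough sketch of that approach as it would run in a browser, assuming the browser's Web Audio API can decode the AAC audio inside the .m4a (most current browsers can); on a Node backend you would need a different decoder such as ffmpeg. The URL is a placeholder.

import toWav from "audiobuffer-to-wav";

// Fetch the .m4a, decode it to raw PCM with the Web Audio API, then re-encode
// the decoded AudioBuffer as a WAV file that can be sent to Speech-to-Text.
async function m4aToWav(url: string): Promise<ArrayBuffer> {
  const ctx = new AudioContext();
  const m4aBytes = await (await fetch(url)).arrayBuffer();
  const decoded = await ctx.decodeAudioData(m4aBytes); // AudioBuffer of PCM samples
  return toWav(decoded); // ArrayBuffer containing the WAV file
}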
I want to develop a Google Action (ideally using Dialogflow),
but the Action needs some features for which I couldn't find a solution, and I'm not sure whether it's even possible.
My use cases:
The Action starts an mp3. Someone stops and exits the Action, and when the user starts the Action again, I would like to resume the mp3.
But I couldn't find a way to determine the "offset" at which the user stopped the mp3.
And even if I had this offset, I didn't find a way to tell Google Assistant that I want to play the mp3 but start at, e.g., minute 51.
I would be really surprised if the Google Action capabilities are so extremely restricted.
Can someone confirm that these use cases are not possible, or give me a hint how to do it?
I only found this one, which is restricted to starting an mp3 from the beginning:
https://developers.google.com/actions/assistant/responses#media_responses
Kind Regards
Stefan
To start an mp3 file at a certain point you can try the SSML audio tag and its clipBegin attribute.
clipBegin - A TimeDesignation that is the offset from the audio source's beginning to start playback from. If this value is greater than or equal to the audio source's actual duration, then no audio is inserted.
https://developers.google.com/actions/reference/ssml
To use this, your mp3 file has to be hosted over HTTPS.
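As a rough example (the URL is a placeholder, and 3060s is roughly minute 51), the tag could look like this:

<speak>
  <audio src="https://example.com/episode.mp3" clipBegin="3060s">The audio could not be played.</audio>
</speak>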
Hope that this helps.
You could use Conversational Actions (instead of Dialogflow), where media responses allow using a start_offset:
....
"content": {
"media": {
"start_offset": "2.12345s",
...
For more details see
https://developers.google.com/assistant/conversational/prompts-media#MediaResponseProperties
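A minimal webhook sketch, assuming the @assistant/conversation Node.js fulfillment library (the handler name, URL, and saved offset below are placeholders), might look roughly like this; persisting the playback position when the user pauses or stops is left to your own media-status handling:

import { conversation, Media } from "@assistant/conversation";

const app = conversation();

// Hypothetical handler that resumes playback from a previously saved position.
app.handle("resume_episode", (conv) => {
  const savedOffset = "3060s"; // placeholder: roughly minute 51, read from your own storage
  conv.add(
    new Media({
      mediaType: "AUDIO",
      startOffset: savedOffset,
      mediaObjects: [
        {
          name: "Episode",
          description: "Resumed playback",
          url: "https://example.com/episode.mp3", // must be hosted over HTTPS
        },
      ],
    })
  );
});

// The app would then be wired up as your HTTPS fulfillment endpoint.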
Also, Conversational Actions seem to be the "newest" technology for Google Actions, or at least the most recently released.
I am currently working on something of a SoundCloud clone, and for now I'm simply rendering all the audio files as HTML5 audio tags. Are there any nice React equivalents I can npm install to make my audio display as a waveform or appear dynamic? I'm very thankful for any suggestions!
In searching online, I see a few options available to you.
Here:
https://www.npmjs.com/package/react-audio-player
https://www.npmjs.com/package/react-audio
https://www.npmjs.com/package/react-audio-recorder (might not be as helpful, but worth noting)
https://www.npmjs.com/package/hymn
What exactly are you trying to achieve? These libraries might be helpful depending on the exact details.
This StackOverflow answer shows how to use your own controls to stylize the player however you so desire.
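As a starting point, here is a minimal sketch using react-audio-player from the list above (the prop names follow its README; the track URL comes from your own data). It still renders a standard-looking player, so the waveform or "dynamic" look would come from custom controls as in the linked StackOverflow answer or from a dedicated waveform library.

import React from "react";
import ReactAudioPlayer from "react-audio-player";

// Drop-in replacement for a bare <audio> tag; styling and custom controls can be
// layered on top once basic playback works.
export function TrackPlayer({ src }: { src: string }) {
  return <ReactAudioPlayer src={src} controls />;
}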
I want to create a piece of software with:
- Input: an H.264 video stream (from another piece of software)
- Output: a webcam that my friends can watch in Skype, Yahoo, or something like that.
I know I need to create a DirectShow filter to do that, but I don't know what type of filter I must create.
And once I have a filter, I don't know how to use it from my application.
I need an example or a tutorial; please help me.
You need to create a virtual video source/camera filter. There have been a dozen questions like this on SO, so I will just link to some of them:
How to write an own capture filter?
Set byte stream as live source in Expression Encoder 4
"Fake" DirectShow video capture device
The Windows SDK has a PushSource sample which shows how to generate video from a filter. The VCam sample, which you can find online, shows what it takes to make a virtual device from a video source.
See also: How to implement a "source filter" for splitting camera video based on Vivek's vcam?.
NOTE: The latest versions of Skype are picky about video devices and ignore virtual devices for no apparent reason.
You should start here: Writing DirectShow Filters, or here: Introduction to DirectShow Filter Development.
I assume you already have the Windows SDK for such development; if not, check this.