Generating subtitles from an audio file using the Speech framework on iOS

In my app I play audio from a URL with the help of AVPlayer. Now I want to add subtitle support. iOS 10 introduced the Speech framework, which helps us recognize both real-time and recorded speech. According to Apple:
"You can perform speech transcription of both real-time and recorded audio. For example, you can get a speech recognizer and start simple speech recognition using code like this:"
let recognizer = SFSpeechRecognizer()
let request = SFSpeechURLRecognitionRequest(url: audioFileURL)
recognizer?.recognitionTask(with: request, resultHandler: { (result, error) in
print (result?.bestTranscription.formattedString)
})
Now I am looking for a way to get the subtitles of the currently playing audio as strings using the Speech framework. How can I tell which line of dialogue is currently playing, so that I can show exactly that string on the screen?

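Use the segments of the transcription, available as result.bestTranscription.segments. Each SFTranscriptionSegment carries its substring, timestamp, and duration, so you can pick out the subtitles you want by filtering the segments for the text that belongs to the current playback position.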
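One way to line up recognized text with playback is to use the timing that each transcription segment carries. The sketch below is only an illustration under stated assumptions: `Cue`, `merge`, `makeSubtitleLines`, and `subtitle(at:in:)` are hypothetical names, and `Cue` is a plain value type that mirrors the `substring`, `timestamp`, and `duration` properties of `SFTranscriptionSegment` so the lookup logic can run without the Speech framework. The `maxGap` and `maxWords` thresholds are arbitrary defaults, not values taken from Apple's documentation.

```swift
import Foundation

// Hypothetical value type mirroring SFTranscriptionSegment's
// substring, timestamp, and duration properties.
struct Cue {
    let text: String
    let start: TimeInterval
    let duration: TimeInterval
    var end: TimeInterval { return start + duration }
}

// Joins a non-empty run of word-level cues into one subtitle line.
func merge(_ words: [Cue]) -> Cue {
    let start = words[0].start
    let end = words[words.count - 1].end
    let text = words.map { $0.text }.joined(separator: " ")
    return Cue(text: text, start: start, duration: end - start)
}

// Groups word-level cues into lines. A new line starts when the pause
// before a word exceeds maxGap seconds or the line already has maxWords words.
func makeSubtitleLines(from words: [Cue],
                       maxGap: TimeInterval = 0.6,
                       maxWords: Int = 8) -> [Cue] {
    var lines: [Cue] = []
    var current: [Cue] = []
    for word in words {
        if let last = current.last,
           word.start - last.end > maxGap || current.count >= maxWords {
            lines.append(merge(current))
            current = []
        }
        current.append(word)
    }
    if !current.isEmpty {
        lines.append(merge(current))
    }
    return lines
}

// Returns the line that should be on screen at the given playback time,
// or nil when no line covers that time.
func subtitle(at time: TimeInterval, in lines: [Cue]) -> String? {
    return lines.first(where: { time >= $0.start && time < $0.end })?.text
}
```

In an app, the cues would presumably be built from `result.bestTranscription.segments` once `result.isFinal` is true, and `subtitle(at:in:)` would be called from an AVPlayer periodic time observer with the current time converted via `CMTimeGetSeconds`.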

Related

Azure Speech to Text: full breaks/filler words detection

I've been looking for a model that is capable of detecting what we call "full breaks" or "filler words", such as "eh", "uhmm", or "ahh", but Azure doesn't pick them up.
I've been playing with Azure's speech-to-text web UI, but it doesn't seem to catch these types of words or expressions.
I wonder whether there is some option in the API configuration to toggle the detection of full breaks or filler words.
Thank you in advance

How to shout out, scream, cry out, or yell at someone in an Alexa skill?

I would like to create a skill that yells at someone, but I cannot find any reference in SSML to yelling or screaming.
Is it even possible?
Use an audio file for that. You can record one or download one from the internet and play it with the SSML audio tag. You just have to put your audio URL in the src attribute, as in the code below.
<speak>
<audio src="soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01"/>
</speak>
Yelling is not currently supported. The closest expression you can achieve with SSML is the custom tag for emotions:
<amazon:emotion name="excited" intensity="medium">Hey, I'm so excited!</amazon:emotion>
Support for emotions varies across locales, and I suggest keeping an eye on the dev blog posts to keep track of new possibilities:
https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2020/11/alexa-speaking-styles-emotions-now-available-additional-languages

Add audio in dialog (Bixby)

This is my first question, new and fresh, hello guys.
As the title mentions, is there any workaround or way to add audio inside dialog-speech-template? It only supports wav and not mp3, so I found it hard to implement.
The audio I want to play originates from an API, so it's not possible for me to download the mp3 file and convert it ahead of time (the audio may change).
Is there any programmatic way to convert the mp3 audio to wav? I am pretty new to Bixby and hope the more experienced people here can help.
Unfortunately, Bixby SSML only supports certain wav formats. Please refer to SSML#AudioClip for details. There are also instructions there on how to convert audio using the ffmpeg tool.
To request support for the mp3 format, you can raise a Feature Request in our community. This forum is open to other Bixby developers who can upvote it, leading to more visibility within the community and with the Product Management team.

Google Home -> Dialogflow entity matching very bad for non-dictionary entities?

With Dialogflow (API.AI) I have the problem that vessel names are not matched well when the input comes from Google Home.
It seems that the speech-to-text engine completely ignores them and transcribes based only on a dictionary, so in the end Dialogflow can't match the resulting text at all.
Is it really like that or is there some way to improve?
Thanks and
Best regards
I'd recommend looking at Dialogflow's training feature to identify where the Google Assistant's speech recognition may not have worked the way you expect. In those cases, you'll see how Google's speech recognition detected words you may not have accounted for. Where you'd like to match these unrecognized words to an entity value, simply add them as synonyms.

How to create Directshow filter?

I want to create a piece of software with:
- Input: an H264 video stream (from another program)
- Output: a webcam that my friends can watch in Skype, Yahoo, or something like that.
I know I need to create a DirectShow filter to do that, but I don't know what type of filter I must create.
And once I have a filter, I don't know how to import it into my application.
I need an example or a tutorial, please help me.
You need to create a virtual video source/camera filter. There have been a dozen questions like this on SO, so I will just link to some of them:
How to write an own capture filter?
Set byte stream as live source in Expression Encoder 4
"Fake" DirectShow video capture device
The Windows SDK has a PushSource sample, which shows how to generate video from a filter. The VCam sample, which you can find online, shows what it takes to make a virtual device from a video source.
See also: How to implement a "source filter" for splitting camera video based on Vivek's vcam?.
NOTE: The latest versions of Skype are picky about video devices and ignore virtual devices for no apparent reason.
You should start here: Writing DirectShow Filters, or here: Introduction to DirectShow Filter Development
I assume you already have the Windows SDK for such development; if not, check this

Resources