Azure speech to text full breaks / filler words detection

I've been looking for a model capable of detecting what we call "full breaks" or "filler words", such as "eh", "uhmm", and "ahh", but Azure doesn't pick them up.
I've been playing with Azure's speech to text web UI, but it doesn't seem to catch these types of words/expressions.
I wonder if there is some option in the API configuration to toggle the detection of full breaks or filler words.
Thank you in advance.
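One thing worth trying while investigating this is to request the detailed output format from the Speech SDK and compare the lexical and display forms of the transcript. The sketch below uses the Python Speech SDK; the key, region, and audio file name are placeholders, and whether hesitation sounds survive in the lexical form is an assumption you would need to verify against your own audio.

```python
import json
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials and audio file.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
# Ask for the detailed result so the raw JSON includes the lexical form,
# not just the cleaned-up display text.
speech_config.output_format = speechsdk.OutputFormat.Detailed

audio_config = speechsdk.audio.AudioConfig(filename="sample.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    detailed = json.loads(result.json)
    for nbest in detailed.get("NBest", []):
        # Compare the two forms to see what the service actually recognized
        # versus what it chose to display.
        print("Lexical:", nbest.get("Lexical"))
        print("Display:", nbest.get("Display"))
```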

Related

How to change stress in words. Azure Speech

Please tell me how I can change the stress in some words in the Azure text-to-speech voice engine. I use Russian voices and am not working through SSML.
When I send text for processing, it puts the stress on the wrong syllable or letter in some words.
I know that some voice engines use special characters such as + or ' in front of a stressed vowel. I have not found such an option here.
To specify the stress for individual words you can use the SpeakSsmlAsync method and pass a lexicon URL, or you can specify it directly in the SSML by using the phoneme element. In both cases you can use IPA.
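As a rough illustration of the phoneme approach (the voice name, the example word, and the IPA transcription below are assumptions made for the example, not part of the original answer), the SSML can carry an explicit IPA pronunciation with the stress mark placed where you want it:

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# The <phoneme> element overrides the default pronunciation; here the IPA
# stress mark forces "замок" to be read as "замо́к" (lock) rather than
# "за́мок" (castle).
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ru-RU">
  <voice name="ru-RU-SvetlanaNeural">
    Старый <phoneme alphabet="ipa" ph="zɐˈmok">замок</phoneme> на двери.
  </voice>
</speak>
"""

result = synthesizer.speak_ssml_async(ssml).get()
```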

How to measure the characters I use with Azure Cognitive Services Speech Synthesis (TTS)?

Is there a way to see the quantity of characters I use in the Azure portal metrics? There is a "SynthesizedCharacters" metric, but I only see data when I use it from Speech Studio. I want to see this metric when I use the Cognitive Services SDK. Is that possible?
Thanks
Unfortunately, AFAIK there is no metric to track that from the Azure Portal. However, you can maintain the count locally at your end, or at a central location you can query yourself, by adding additional logic to your code to maintain the metric.
Characters are counted based on the conditions below (which can be found here):
- Text passed to the text-to-speech service in the SSML body of the request
- All markup within the text field of the request body in the SSML format, except for the <speak> and <voice> tags
- Letters, punctuation, spaces, tabs, markup, and all white-space characters
- Every code point defined in Unicode
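A minimal sketch of that local counting, assuming you send SSML and want to approximate the rules above (the class name and the regex-based tag stripping are my own illustration, not an official billing formula):

```python
import re


class TtsCharacterMeter:
    """Keeps a local running total of characters sent to the TTS service."""

    def __init__(self):
        self.total = 0

    def count_ssml(self, ssml: str) -> int:
        # Per the quoted rules, every character of the request body counts,
        # including markup and whitespace, except the <speak> and <voice>
        # tags themselves.
        billable = re.sub(r"</?(speak|voice)\b[^>]*>", "", ssml)
        self.total += len(billable)
        return len(billable)


meter = TtsCharacterMeter()
meter.count_ssml(
    '<speak version="1.0"><voice name="en-US-JennyNeural">Hello world</voice></speak>'
)
print(meter.total)  # approximate characters billed for this request
```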

How to disable disfluency removal for Google Cloud Speech to Text API

I am building an app that captures user audio and analyzes disfluency in a reader's speech, so it is important for me to know all forms of disfluency.
I noticed that Google's speech to text cloud API automatically removes disfluencies in speech. For example:
"so uhh, I will probably do that umm probably next week"
Gets transcribed to:
"so I will probably do that probably next week"
Is there a way to keep the uhhs and umms?

Text Split cognitive skill not visible in UI

I am adding Azure Search and trying to add skills for content enrichment.
I can see the Key Phrase Extraction and the Language Detection predefined skills but not the Text Split skill on the screen. Is there a reason why Text Split skill is not visible? Or is it something that can only be added via API?
The capabilities exposed through the portal focus on core scenarios that customers want to perform, so they do not include text splitting. If you want to split your text, you should do it by creating your own skillset programmatically through the API; that will allow you to define the language and the size of a page.
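As a sketch of what that programmatic skillset could look like (the search service name, skillset name, source field, and API key below are placeholders; the skill definition follows the documented #Microsoft.Skills.Text.SplitSkill shape):

```python
import requests

# Placeholder service endpoint and admin key.
endpoint = "https://<your-search-service>.search.windows.net"
api_key = "YOUR_ADMIN_KEY"

skillset = {
    "name": "my-skillset",
    "description": "Split content into pages before further enrichment",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",          # split by pages rather than sentences
            "maximumPageLength": 4000,         # page size in characters
            "defaultLanguageCode": "en",       # language used for splitting
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "pages"}],
        }
    ],
}

# Create or update the skillset through the REST API.
resp = requests.put(
    f"{endpoint}/skillsets/my-skillset?api-version=2020-06-30",
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=skillset,
)
resp.raise_for_status()
```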

Google Home -> Dialogflow entity matching very bad for non-dictionary entities?

With Dialogflow (API.AI) I find that vessel names are not matched well when the input comes from Google Home.
It seems as if the speech-to-text engine completely ignores them and just does speech-to-text based on a dictionary, so Dialogflow can't match the resulting text at the end.
Is it really like that, or is there some way to improve it?
Thanks and
Best regards
I'd recommend looking at Dialogflow's training feature to identify where the speech recognition of the Google Assistant may not have worked the way you expect. In those cases, you'll see how Google's speech recognition detected words you may not have accounted for. In cases where you'd like to match these unrecognized words to an entity value, simply add them as synonyms.
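If you prefer to add those synonyms programmatically rather than in the console, a rough sketch with the Dialogflow v2 Python client could look like this (the project ID, entity type ID, and the misrecognized spellings are placeholder assumptions):

```python
from google.cloud import dialogflow_v2 as dialogflow

client = dialogflow.EntityTypesClient()

# Placeholder project and entity type; in practice take these from your agent.
parent = client.entity_type_path("my-gcp-project", "my-vessel-entity-type-id")

# Map the spellings the Assistant actually produced (seen in the training
# feature) to the canonical vessel name.
entities = [
    dialogflow.EntityType.Entity(
        value="MV Aurora",
        synonyms=["MV Aurora", "em vee aurora", "m v aurora"],
    )
]

client.batch_create_entities(parent=parent, entities=entities)
```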
