I've been using the (preview) CRIS speech-to-text service in Azure. For some short wav files, I get a correct transcription, but it is followed by "non". Is this a keyword meaning "non-word", or is this a bug? It happens both when I use the base conversational model and when I use a custom language model built on the base conversational model, but it does not happen with the "search and dictation" model.
For example, I send a noisy wav file of someone saying "yes" and get back "yes non". If the wav file is not noisy this doesn't happen, and if the spoken text is two or more words it doesn't happen either. It only seems to happen for noisy one-word files. What does "non" mean?
After talking with the product group: this is apparently a bug in the current build of CRIS and will be fixed shortly. The "non" doesn't mean anything; it just appears when there are bursts of background noise.
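Until the fix ships, one possible client-side workaround is to drop a trailing "non" token from recognition results. This is a minimal sketch that assumes the spurious token always arrives as a separate final word; the function name strip_spurious_non is hypothetical, and it should not be used for languages (such as French) where "non" is a real word.

```python
import re

# Hypothetical workaround: matches whitespace followed by a final "non"
# (optionally followed by . ? or !) at the very end of the result.
_TRAILING_NON = re.compile(r"\s+non[.?!]?$", re.IGNORECASE)


def strip_spurious_non(text: str) -> str:
    """Remove a trailing ' non' token; a lone "non" is left untouched."""
    return _TRAILING_NON.sub("", text.strip())
```

A result that consists only of "non" is kept, because the pattern requires at least one preceding word.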
Related
We use the Azure Batch Transcription service to get transcripts of audio recordings.
We noticed that filler words like "uhm", "hm" or similar are sometimes included, but very rarely. We have also been using this service for a few months now, and we have the feeling that they have become less frequent (fewer "uhm"s in the transcripts).
Q1: Is there a way to get the filler words? We want to receive them within the transcript.
Also, since we sometimes record conversations, someone may mention a name or other personal information.
Q2: Is there a way to "filter" this personal information out of the transcript?
Sorry, I don't think there is a way to filter personal data or words during transcription. For batch transcription, we can only apply the profanity filter.
But I agree this feature would be very helpful. I will forward this feature request to the product group to see if we can have it in the future.
What I would suggest is post-processing the transcription as a last step to filter out the sensitive information.
Regards,
Yutong
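A minimal sketch of that post-processing step, assuming you already have the transcript text and a list of names to hide. The function redact, the [REDACTED] mask, and the 4-digit threshold for number runs are illustrative choices, not part of the Batch Transcription API.

```python
import re


def redact(transcript: str, sensitive_terms: list[str], mask: str = "[REDACTED]") -> str:
    """Replace known sensitive terms (e.g. names) and long digit runs with a mask."""
    redacted = transcript
    for term in sensitive_terms:
        # \b prevents "Anna" from matching inside "Annabelle";
        # re.escape makes punctuation in a term literal.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        redacted = pattern.sub(mask, redacted)
    # Mask runs of 4 or more digits (phone numbers, account numbers, ...).
    return re.sub(r"\d{4,}", mask, redacted)
```

A fixed term list only catches names you already know; anything else would need a proper entity-recognition step in front of it.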
I'm doing a POC with speech-to-text. I need to recognize specific words like "D-STUM" (daily stand-up meeting). The problem is, every time I say "D-STUM" to my program, I get "Destiny", "This theme", etc.
I already went to speech.microsoft.com/.../customspeech and recorded around 40 wav files of people saying "D-STUM". I also created a file named "trans.txt" that lists every wav file followed by the word "D-STUM", like this:
D_stum_1.wav D-STUM
D_stum_2.wav D-STUM
D_stum_3.wav D-STUM
D_stum_4.wav D-STUM
...
Then I uploaded a zip containing the wav files and the trans.txt file, trained a model with that data, and created an endpoint. I referenced this endpoint in my software and launched it.
I expect my custom speech-to-text model to recognize people saying "D-STUM" and display "D-STUM" as text, but "D-STUM" has never been displayed since I customized the model.
Did I do something wrong? Is it the right way to do a custom training?
Are 40 samples not enough for the model to be properly trained?
Thank you for your answers.
Custom Speech offers several ways to improve recognition of specific words:
By providing audio samples with their transcriptions, as you have done
By providing text samples (without audio)
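For the audio-plus-transcript approach, Custom Speech expects the transcription file to contain one line per audio file: the file name, a tab character, then the transcription. The following sketch builds trans.txt from a folder of wav files and zips it together with them, assuming every recording contains the same single label; build_training_zip and the folder layout are hypothetical.

```python
import zipfile
from pathlib import Path


def build_training_zip(wav_dir: str, label: str, zip_path: str) -> list[str]:
    """Write trans.txt ("<file name>\\t<transcription>" per line) and zip it with the wavs."""
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    lines = [f"{wav.name}\t{label}" for wav in wav_files]
    trans_path = Path(wav_dir) / "trans.txt"
    trans_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    with zipfile.ZipFile(zip_path, "w") as zf:
        for wav in wav_files:
            zf.write(wav, arcname=wav.name)  # flat layout, no sub-folders
        zf.write(trans_path, arcname="trans.txt")
    return lines
```

Using a real tab (rather than spaces) between the file name and the text is the main thing to check in a hand-written trans.txt.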
Based on my previous use cases, I would strongly suggest creating a training file with 5 to 10 sentences, each one using "D-STUM" in its usual context. Then duplicate those sentences about 10 to 20 times in the file.
It worked for us to understand specific words.
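The text-only approach can be sketched as follows. The sentences are hypothetical placeholders (use phrases your users actually say), and repeats=15 simply sits in the 10 to 20 range suggested above; the resulting file would be uploaded as plain-text training data.

```python
from pathlib import Path

# Hypothetical context sentences, each using the target term.
SENTENCES = [
    "Let's start the D-STUM now",
    "The D-STUM is at nine thirty",
    "Who is running today's D-STUM",
    "I missed the D-STUM this morning",
    "Please join the D-STUM on time",
]


def write_related_text(path: str, sentences: list[str], repeats: int = 15) -> int:
    """Write the whole sentence list `repeats` times, one sentence per line; return the line count."""
    lines = [s for _ in range(repeats) for s in sentences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```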
Additionally, if you are using "en-US" or "de-DE" as the target language, you can also use a pronunciation file (see here).
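A pronunciation file pairs a display form with a spoken form, separated by a tab, one entry per line. This sketch writes such a file; the spoken form "dee stum" is an assumption about how speakers say the term and should be adjusted, and write_pronunciation_file is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical display form -> spoken form mapping.
PRONUNCIATIONS = {
    "D-STUM": "dee stum",
}


def write_pronunciation_file(path: str, entries: dict[str, str]) -> list[str]:
    """Write one 'display form<TAB>spoken form' line per entry and return the lines."""
    lines = [f"{display}\t{spoken}" for display, spoken in entries.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```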
I just integrated Chatbase with my Dialogflow chatbot via REST API.
So far everything is working; however, I noticed that, especially for "Not handled" inputs, some words look censored and are replaced by a <*> sequence.
Is that normal? I can't find any reference in the documentation or any way to change this.
If it helps, my chatbot is in Italian and the replaced words are not offensive at all. In addition, this only happens with some inputs, not all.
Thanks,
Igor
I work in support for Chatbase; thank you for posting this question! We automatically mask any sequence of characters that may be considered SPII (sensitive personally identifiable information). Typically, names and string sequences containing consecutive numbers get replaced with the <*> mask.
You can reveal the full message by using the Transcripts feature available on the Messages Report.
Regards,
Sean
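To get a rough idea of which messages are likely to be masked, the digit-based part of that rule can be approximated as below. This is an illustrative approximation, not Chatbase's actual algorithm (which also masks likely names, and that may explain masked Italian words); mask_digit_runs and the "two or more consecutive digits" threshold are assumptions.

```python
import re

# Assumption: any whitespace-delimited token containing 2+ consecutive digits.
_DIGIT_RUN_TOKEN = re.compile(r"\S*\d{2,}\S*")


def mask_digit_runs(message: str, mask: str = "<*>") -> str:
    """Replace every token that contains a run of two or more digits with the mask."""
    return _DIGIT_RUN_TOKEN.sub(mask, message)
```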
Is it possible to use Dialogflow to simply parse some text and return the entities within that text?
I'm not interested in a conversation or bot-like behaviour, simply text in and a list of entities out.
The entity recognition seems to be better with Dialogflow than with Google Natural Language Processing, and the ability to train it might also be useful.
Cheers.
I've never considered this, but yeah, it should be possible. You would upload the entities with synonyms. Then remove the "Default Fallback Intent" and create a new intent called "catchall". Procedurally generate sentences with examples of every entity being mentioned, alone or in combination (in whatever way you expect to need to extract them). In "Settings", change the "ML Settings" so that the "ML Classification Threshold" is 0.
In theory, it should now classify every input as "catchall", and return all the entities it finds...
If you play around with tagging things as sys.any, this could be pretty effective...
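The "procedurally generate sentences" step can be sketched as below: every entity type is mentioned alone and in combination, with every value. The entity names, values, and the "tell me about" template are hypothetical; you would still add the phrases to the "catchall" intent and annotate the entities in Dialogflow.

```python
from itertools import combinations, product

# Hypothetical entity types and values (in Dialogflow: the uploaded entities).
ENTITIES = {
    "city": ["Paris", "Berlin"],
    "product": ["laptop", "phone"],
}


def generate_phrases(entities: dict[str, list[str]], max_combo: int = 2) -> list[str]:
    """Mention each entity type alone and in combinations of up to max_combo types."""
    phrases = []
    names = sorted(entities)
    for size in range(1, max_combo + 1):
        for combo in combinations(names, size):
            # Every choice of one value per entity type in this combination.
            for values in product(*(entities[name] for name in combo)):
                phrases.append(f"tell me about {' and '.join(values)}")
    return phrases
```

The number of phrases grows multiplicatively with the number of values, so for large entity lists you would likely sample rather than enumerate every combination.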
However, you may want to look into something that is built for this. I have made cool stuff with Aylien's NLP API. They have entity extraction, and the free tier gives you 1,000 hits per day.
EDIT: If you can run your own code instead of relying on SaaS, you could check out Rasa NLU for entity extraction. With a spaCy backend it would do well at recognizing pre-trained entities, and with a different backend you can use custom entities.
I'm working on a problem that at the very least seems to require named entity recognition (NER), but I'm not sure how to go further than the NER parse. What I'm trying to do is parse information (likely from tweets) about the scheduling of events. For example, I'd like to be able to automatically resolve the yes/no answer to the question "Are The Beatles playing tomorrow?" from short messages like:
"The Beatles cancelled their show tomorrow" or
"The Beatles' show is still on tomorrow"
I know NER will get me close as it will identify the band of interest and the time (if it's indicated), but there are many ways to express the concepts I'm interested in, for example:
"The Beatles are on for tomorrow" or
"The Beatles won't be playing tomorrow."
How can I go from an NER parsed representation to extracting the information of interest? Any suggestions would be much appreciated.
I suggest searching the literature on event detection (optionally in Twitter), and perhaps also on question answering systems if your yes/no example wasn't just an illustration: if you know the users' needs in advance, that information may improve the quality of the system.
To start, there are some papers about event detection in Twitter: here and here.
As a baseline, you can create a list of positive verbs for your domain (to be, to schedule) and negative verbs (to cancel, to delay): start from a manual list and expand it with synonyms from a dictionary, e.g. WordNet. Also check for negations, again by looking for pre-specified words ("not" in its different forms) in a tweet. If there is a negation, you simply invert the meaning.
Since you are working with Twitter and most likely only one event will be mentioned per tweet, this can work pretty well.
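A minimal sketch of that baseline, assuming the small seed lists below (illustrative, not expanded with WordNet). Besides verbs, "on" and "still" are included because they signal the event is happening in the examples above. Forms of "to be" are deliberately left out, since they also appear in negative sentences such as "the show is cancelled" and would cancel out the negative verb.

```python
import re
from typing import Optional

# Illustrative seed lists; extend them with synonyms (e.g. from WordNet).
POSITIVE = {"on", "play", "playing", "schedule", "scheduled", "still"}
NEGATIVE = {"cancel", "cancelled", "canceled", "delay", "delayed", "postponed"}
NEGATIONS = {"not", "no", "never", "won't", "isn't", "aren't", "don't", "can't"}


def event_is_on(tweet: str) -> Optional[bool]:
    """Return True (event on), False (event off), or None (no usable evidence)."""
    # Normalize curly apostrophes so "won’t" matches "won't".
    text = tweet.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos == neg:
        return None  # no cue words, or cues that cancel each other out
    verdict = pos > neg
    if any(token in NEGATIONS for token in tokens):
        verdict = not verdict  # a negation inverts the meaning
    return verdict
```

For example, "The Beatles won't be playing tomorrow." finds the positive cue "playing", then the negation "won't" flips the verdict to False.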