Phone number and Date of Birth from human speech

Is there an effective natural language processing tool that can fetch a phone number and date of birth from human speech? Each user has a different way of saying their phone number and date of birth, so simply converting speech to text and then parsing the text for a phone number does not work well.

You can use the Google Speech-to-Text API. I used it for letting blind people enter account numbers. I was working for a bank, so there were lots of numbers involved as input, e.g. account numbers, card numbers, etc.
With the Google STT engine you can define custom voice inputs.
I also created a feedback mechanism using the Text-to-Speech API, so the app can tell the user when their input is invalid and ask them to speak again.
You can see a code snippet on GitHub:
https://github.com/hiteshsahu/Android-TTS-STT
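If you prefer a server-side route, here is a minimal sketch (my own illustration, not taken from the repo above) of biasing Google Cloud Speech-to-Text toward digit sequences with speech contexts. It assumes the google-cloud-speech package, application default credentials, and a hypothetical 16 kHz WAV file:

    from google.cloud import speech

    client = speech.SpeechClient()

    with open("account_number.wav", "rb") as f:  # hypothetical recording
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        # Speech adaptation: bias recognition toward spoken digit sequences.
        speech_contexts=[speech.SpeechContext(phrases=["$OOV_CLASS_DIGIT_SEQUENCE"])],
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)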

The easiest way is to extract text from speech; there are plenty of tools for that, both proprietary (Nuance) and tinker-friendly open source (CMU Sphinx), and plenty of tools to extract dates and phone numbers however they are expressed. IBM Watson offers one, Smart Formatting (beta), to normalize dates and phone numbers in its own transcripts. To guess which dates are birthdays, you can detect related keywords ("birth", "born", and so on) nearby.
For a few free alternatives, check the following.
For phone numbers:
https://www.npmjs.com/package/phone-number-extractor
https://github.com/googlei18n/libphonenumber
For date extraction, check these previous questions:
Extracting dates from text in Java
Best way to identify and extract dates from text Python?
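To make this concrete, here is a minimal sketch that combines both steps on a finished transcript, using the phonenumbers package (the Python port of libphonenumber linked above) and dateutil. The keyword list and the 40-character window are arbitrary assumptions, not part of these libraries:

    # Assumes the transcript was already produced by an STT engine.
    import re
    import phonenumbers
    from dateutil import parser as dateparser

    transcript = ("You can reach me at 415 555 0123. "
                  "I was born on March 3rd 1985.")

    # Phone numbers: libphonenumber scans free text for candidate matches.
    for match in phonenumbers.PhoneNumberMatcher(transcript, "US"):
        print("phone:", phonenumbers.format_number(
            match.number, phonenumbers.PhoneNumberFormat.E164))

    # Dates of birth: look for a parseable date near a birth-related keyword.
    BIRTH_KEYWORDS = re.compile(r"\b(born|birth|birthday|dob)\b", re.I)
    for kw in BIRTH_KEYWORDS.finditer(transcript):
        window = transcript[kw.end():kw.end() + 40]  # text right after the keyword
        try:
            print("possible DOB:", dateparser.parse(window, fuzzy=True).date())
        except (ValueError, OverflowError):
            pass  # keyword with no parseable date nearby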
There is a patent covering the process you are asking about, but you might have to pay royalties or something:
http://www.freepatentsonline.com/8416928.html

If you want to fetch a phone number and date of birth from human speech, another option you can implement is:
https://cloud.google.com/speech/
This API is really useful for converting speech to text. I had this problem at one point as well, so you can try it too.
Another API, which is really good for extracting meaning from what the user says, is:
https://api.ai/
I hope this helps you.

Related

Cognitive Service show Fill words and hide personal data

We use the Azure Batch Transcription Service to get the transcript of an audio/speech recording.
Here we noticed that filler words like "uhm", "hm", or similar are sometimes included, but very rarely. We have also been using this service for a few months already, and we have the feeling that it "got less" (i.e., fewer "uhm"s in the transcript).
Q1: Is there a way to get the filler words? We want to receive them within the transcript.
Also, as we sometimes record conversations, it can happen that someone says a name or talks about other personal information.
Q2: Is there a way to "filter" that personal information / those words within the transcript?
Sorry, I don't think there is a way to filter personal data/words during transcription. We can only do a profanity filter for batch transcription.
But I agree this feature would be very helpful. I will forward the feature request to the product group to see if we can have it in the future.
What I would suggest is to post-process the transcription afterwards to filter out the sensitive information.
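For reference, here is a minimal sketch (my own, not an official sample) of enabling that profanity filter when creating a batch transcription through the v3.0 REST API; the key, region, and recording URL are placeholders:

    import requests

    endpoint = "https://<region>.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions"
    body = {
        "displayName": "call transcription",
        "locale": "en-US",
        "contentUrls": ["https://example.com/recording.wav"],  # placeholder
        "properties": {
            # Masks profane words; personal data is NOT covered by this filter.
            "profanityFilterMode": "Masked",
        },
    }
    resp = requests.post(endpoint,
                         headers={"Ocp-Apim-Subscription-Key": "<key>"},
                         json=body)
    resp.raise_for_status()
    print(resp.json()["self"])  # URL to poll for the finished transcription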
Regards,
Yutong

How can I do speaker identification (diarization) with microsoft speech to text without previous voice enrollment?

In my application, I need to record a conversation between people and there's no room in the physical workflow to take a 20 second sample of each person's voice for the purpose of training the recognizer, nor to ask each person to read a canned passphrase for training. But without doing that, as far as I can tell, there's no way to get speaker identification.
Is there any way to just record, say, 5 people speaking and have the recognizer automatically classify returned text as belonging to one of the 5 distinct people, without previous training?
(For what it's worth, IBM Watson can do this, although it doesn't do it very accurately, in my testing.)
If I understand your question right, then Conversation Transcription should be a solution for your scenario, as it will show the speakers as Speaker[x] and increment the number for each new speaker if you don't generate user profiles beforehand.
User voice samples are optional. Without this input, the transcription will show different speakers, but shown as "Speaker1", "Speaker2", etc. instead of recognizing them as pre-enrolled specific speaker names.
You can get started with the real-time conversation transcription quickstart.
Microsoft Conversation Transcription is in preview and currently targets microphone-array devices, so the input should be recorded by a microphone array. If your recordings come from a common microphone, it may not work and you may need special configuration. You can also try batch diarization, which supports offline transcription with diarization of 2 speakers for now; it will support more than 2 speakers very soon, probably this month.
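As a rough illustration, here is a minimal sketch of the no-enrollment flow with the Speech SDK for Python. The key, region, and file name are placeholders, and the exact API surface may differ between SDK versions, so treat the quickstart as authoritative:

    import time
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
    audio_config = speechsdk.audio.AudioConfig(filename="meeting.wav")  # placeholder

    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)

    def on_transcribed(evt):
        # Without enrollment, speaker_id is a generic label such as "Speaker1".
        print(f"{evt.result.speaker_id}: {evt.result.text}")

    transcriber.transcribed.connect(on_transcribed)
    transcriber.start_transcribing_async().get()
    time.sleep(30)  # crude wait; real code would block on a session-stopped event
    transcriber.stop_transcribing_async().get()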

Dialogflow doesn't recognize phone numbers

I'm trying to create a booking system for a restaurant, so the assistant has to ask the user for the number of guests, the time and day of the reservation, and finally the guest's name and phone number. But many times the phone number is confused with the guest count.
In the parameters I set #guest to @sys.number and $telephone to the entity @sys.phone-number, but the values are sometimes recognized incorrectly. How can I make it work?
The Dialogflow team has a really full-featured example on Github here (it's a bike shop, rather than a restaurant, but most of the functionality is the same). Give it a look for some inspiration.
Regarding the specifics of recognizing phone numbers: I'd recommend adding a bunch (like more than 10) of example training phrases to the appropriate Intent that include phone numbers. Often the problem with matching these things is just a matter of the number of examples the system gets to learn from.
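For example, hypothetical training phrases along those lines (annotate the guest counts as @sys.number mapped to #guest, and the long digit sequences as @sys.phone-number mapped to $telephone, in the Dialogflow console):

    Table for 4 people at 8pm, my phone number is 415 555 0123
    I'd like to book for 2 guests tomorrow, call me back on 415 555 0199
    Reservation for 6 on Friday, I'm John, my number is 415 555 0177

With enough examples like these, the model tends to learn that the small number after "for" is the party size and the long digit sequence is the phone number.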
Good luck!

How to get comments from social media and use them as your data?

I've proposed a title for our thesis: Movie Success Prediction through Social Media Comments using Sentiment Analysis. Is there a way to get the comments on social media (Twitter, Instagram, Facebook, etc.) and use them in your software, like an API or some other mechanism? Is it even possible for one piece of software to pull comments from several social networks for prediction, or should I change my title and stick to a single network like Facebook or Twitter?
What's a good algorithm for this?
What programming language and framework/IDE should I use?
I've done lots of research on Google and am still hoping for more info here. Thank you.
Edit: I'll only use YouTube and the YouTube API.
From the title of your question, it seems that the method you need is distant supervision: you retrieve data whose labels you believe are appropriate for your task. For instance, a tweet containing the #perfect hashtag would probably be a positive tweet. So you can define sets of hashtags for your task (negative, positive, or even neutral) and then retrieve tweets with those via the Twitter API. For your task, those tweets should be about movies, so your data should contain movie-related information in the first place.
Given that you will deal with text data and you'd like to create your own dataset, it is better to start with Twitter. Its API fits your needs and is very well documented. The language and frameworks are up to you, since the APIs support many popular languages as well. Personally, I'd start with Python or Java to solve future problems more easily with community support.
For a general survey of this area, you may dive into papers and resources from here:
https://scholar.google.com.tr/scholar?hl=en&q=distant+supervision+sentiment+analysis
Distant supervision can also be used to create a sentiment lexicon out of millions of English tweets by using sets of negative and positive hashtags. You may take a look at Chapter 5 of this thesis (https://spectrum.library.concordia.ca/980377/1/Ozdemir_MCompSc_F2015.pdf); it may give a good insight for your thesis, too.
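To make the idea concrete, here is a minimal sketch of hashtag-based weak labeling in Python; the hashtag sets are purely illustrative, and fetching the tweets through the Twitter API is assumed to happen elsewhere:

    POSITIVE_TAGS = {"#perfect", "#mustwatch", "#loveit"}
    NEGATIVE_TAGS = {"#boring", "#worstmovie", "#waste"}

    def weak_label(tweet_text):
        """Assign a noisy sentiment label from hashtags, or None if ambiguous."""
        tags = {t.lower().rstrip(".,!?") for t in tweet_text.split() if t.startswith("#")}
        pos, neg = bool(tags & POSITIVE_TAGS), bool(tags & NEGATIVE_TAGS)
        if pos != neg:                      # exactly one side matched
            return "positive" if pos else "negative"
        return None                         # no signal, or conflicting signals

    tweets = ["Just saw the trailer #mustwatch", "Two hours gone #boring #waste"]
    dataset = [(t, weak_label(t)) for t in tweets if weak_label(t) is not None]
    print(dataset)  # [('Just saw the trailer #mustwatch', 'positive'), ...]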
Hope this helps.
Cheers

Bing Speech to Text API returning very wrong text

I am trying the "Bing Speech to Text API" on audio files that contain real conversations between a person who answers customers in a call center and a customer who calls the call center to resolve their doubts. Thus, these audios have two people talking, and sometimes long silent periods while the customer waits for an answer from support. The audios are 5 to 10 minutes long.
My doubts are:
What is the best approach to transcribe audio like this to text, using Microsoft Cognitive Services?
Which APIs do I have to use, besides Bing Speech to Text?
Do I have to cut or convert the audios before sending them to Bing Speech to Text?
I am asking because the Bing Speech to Text API is returning text that is very, very different from the audio content. It is impossible to use or understand. But, of course, I think I am making some mistake.
Please, could you explain to me the best strategy for working with audio files like this?
I would be very glad for any help.
Best Regards,
I ran into this problem with conversations as well. Make sure that the transcription mode is set to "conversation" instead of "interactive".
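For anyone hitting this through the REST interface, the mode was part of the URL path. A minimal sketch with a placeholder key follows; note that the REST endpoint of the old Bing Speech service only accepted short clips (roughly 10-15 seconds), so 5-10 minute calls need the client SDK or batch transcription anyway:

    import requests

    # "conversation" replaces "interactive" in the path.
    url = ("https://speech.platform.bing.com/speech/recognition/"
           "conversation/cognitiveservices/v1?language=en-US")
    with open("call_snippet.wav", "rb") as f:  # placeholder file
        resp = requests.post(
            url,
            headers={
                "Ocp-Apim-Subscription-Key": "<key>",
                "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
            },
            data=f.read(),
        )
    print(resp.json())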
