Azure Cognitive Services Speech to Text recognition of numeric entities as text - azure

I was wondering whether it's possible for the C++ SDK of Cognitive Services Speech to Text to return numeric entities as text instead of numbers.
Current response 'I want to order 2 Cokes'
Expected response 'I want to order two Cokes'
Of course, I can implement the conversion myself, but I was wondering whether it's something the service already provides, particularly for Spanish.

Take a look at the sample repository at https://github.com/Azure-Samples/cognitive-services-speech-sdk, especially the file speech_recognition_samples.cpp and the function SpeechRecognitionWithLanguageAndUsingDetailedOutputFormat.
Enabling ‘detailed output’ will give you the result you want:
config->SetOutputFormat(OutputFormat::Detailed);
Then you need to look at the detailed output:
result->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult)
And that would create detailed output like this:
{"Duration":35500000,"NBest":[{"Confidence":0.7535948753356934,"Display":"I want to order 2 Cokes.","ITN":"I want to order 2 cokes","Lexical":"i want to order two cokes","MaskedITN":"I want to order 2 cokes"}],"Offset":17000000,"RecognitionStatus":"Success"}
The lexical output is probably what you want
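For illustration, here is a minimal sketch (Python is used here only to make the JSON shape concrete; the payload is the example above) of picking the lexical form out of that detailed result:
import json

# Detailed JSON result as returned via SpeechServiceResponse_JsonResult (example from above).
payload = '{"Duration":35500000,"NBest":[{"Confidence":0.7535948753356934,"Display":"I want to order 2 Cokes.","ITN":"I want to order 2 cokes","Lexical":"i want to order two cokes","MaskedITN":"I want to order 2 cokes"}],"Offset":17000000,"RecognitionStatus":"Success"}'

result = json.loads(payload)
# Take the top hypothesis from NBest and read its lexical form, which spells the numbers out.
lexical = result['NBest'][0]['Lexical']
print(lexical)  # i want to order two cokes
The same field access applies to whichever JSON library you use on the C++ side to parse that property.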
Wolfgang

Related

How to visualize a count of all values in an array field in Kibana

I am having trouble creating a particular type of visualization in Kibana. My events in Kibana are statistics on communications between two IP addresses. Two of the fields are lists of ports used by a particular IP address. An example of the fields would be:
ip1 = 192.168.101.2
ip2 = 192.168.101.3
ip2Ports = 80,443
ip1Ports = 80,57000,0
I would like to have a top count of all the values, such as:
port    count
80      2
57000   1
443     1
I have been able to parse ip2Ports into ip2Ports_List.column1, ip2Ports_List.column2, etc., but I can only choose one term with the terms aggregation in the visualization. I can split the chart, but that leads to separate counts for each field. If I go by the original ip2Ports field, it is just aggregated as the string, such as "80,443".
Is it even possible to create a top count visualization of fields with multiple values? If so, how would I do so? If not, is there a way to restructure my data so I can do it? Thank you!
My issue stemmed from the format of the values being sent in by Logstash. I had thought that the 'ip2Ports_List.column1' format, which was a result of using the csv filter, was part of an array. It wasn't. After analyzing it, 'ip2Ports_List.column1' didn't seem to be much different from a new field.
Elastic needed an array to give me the visualization I wanted. I wasn't sure what the best way to produce it was, so I just ended up using the ruby filter. This is what the code ended up looking like:
ruby {
  code => "fields = event.get('portsIp').split(',')
           event.set('portsIpArray', fields)"
}
Where 'portsIp' looked something like "80,443". Splitting it produced a Ruby array, and I set that array as the value of a new event field, 'portsIpArray'.
From there, when I tried to visualize the 'portsIpArray' field, it looked exactly how I wanted it to, treating each port as a separate value while still associating each port with the same event/field.
Extra:
Something else I discovered: if you write your code directly in the Logstash conf file, as I did, Logstash doesn't like it if you use double quotes within the double-quoted code. In hindsight it makes sense, but it doesn't give a clear error, so it's difficult to figure out.

Extract a particular string given its heading - regex

I am trying to extract a particular string from the data below.
I have to extract the observations mentioned in the data. The observations heading is written in various ways, for example: OBSERVATION, Observation, observed, OBSERVED, and so on.
Please let me know how to extract this.
Data :
Preconditions :1)One valid navigation map should be installed on the MGU.2)One route must be active.\nActions/steps : 1)Press PTT button.\n2)Give voice command "Navigate to Alibaba Restaurant" and observe system\' s behavior.\n\nExpected result/behaviour:\n1) Confirmation prompt of given spoken command should be played. \n2)User shall get the list of POI\'s which are spoken by user for e.g. \n\n\nObserved result/behavior:\n1) Confirmation prompt of given spoken command is not played.\n2) User not able to select POI via speech commands.\n3)User getting the list of POI destination but user not able to select those point via spoken commands for e.g.
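One possible starting point (a minimal sketch, assuming the observations always begin at a heading containing "observed" or "observation" followed by a colon) is a case-insensitive regex that captures everything from that heading up to the next blank line or the end of the text:
import re

# Trimmed sample of the data above, with real newlines instead of the escaped \n.
data = ("Expected result/behaviour:\n1) Confirmation prompt of given spoken command should be played.\n\n"
        "Observed result/behavior:\n1) Confirmation prompt of given spoken command is not played.\n"
        "2) User not able to select POI via speech commands.")

# Match "Observed ...:", "OBSERVATION ...:", etc., and capture the text that follows.
pattern = re.compile(r'observ(?:ed|ation)[^:]*:\s*(.*?)(?=\n\s*\n|\Z)',
                     re.IGNORECASE | re.DOTALL)

match = pattern.search(data)
if match:
    print(match.group(1))
If the headings vary more than that, the alternation inside the pattern can be extended accordingly.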

Combining phrases from list of words Python3

I'm doing my best to grab information out of a lot of PDF files. I have them in a dictionary format where the key is a given date and the values are a list of occupations.
It looks like this when proper:
'12/29/2014': [['COUNSELING',
                'NURSING',
                'NURSING',
                'NURSING',
                'NURSING',
                'NURSING']]
However, occasionally there are occupations with several words that cannot be reliably understood in single-word form, such as this:
'11/03/2014': [['DENTISTRY',
                'OSTEOPATHIC',
                'MEDICINE',
                'SURGERY',
                'SOCIAL',
                'SPEECH-LANGUAGE',
                'PATHOLOGY']]
Notice that "osteopathic medicine & surgery" and "speech-language pathology" are the full text for two of these entries. This gets hairier when we also have examples of just "osteopathic medicine" or even "medicine."
So my question is this: how should I go about testing combinations of these words to see if they match more complex occupational titles? I can rely on the order of the words, as I have maintained it from the source.
Thanks!
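One possible approach (a minimal sketch; the lookup of known multi-word titles below is something you would have to build yourself) is to walk each word list in order and greedily merge consecutive words that form a known title:
# Hypothetical lookup of known multi-word titles, stored as tuples of words.
MULTI_WORD_TITLES = {
    ('OSTEOPATHIC', 'MEDICINE', 'SURGERY'),
    ('OSTEOPATHIC', 'MEDICINE'),
    ('SPEECH-LANGUAGE', 'PATHOLOGY'),
}

def merge_titles(words):
    """Greedily merge consecutive words that form a known multi-word title."""
    longest = max(len(title) for title in MULTI_WORD_TITLES)
    merged, i = [], 0
    while i < len(words):
        # Try the longest possible span first so "osteopathic medicine surgery"
        # wins over plain "osteopathic medicine".
        for size in range(min(longest, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + size])
            if candidate in MULTI_WORD_TITLES:
                merged.append(' '.join(candidate))
                i += size
                break
        else:
            merged.append(words[i])
            i += 1
    return merged

print(merge_titles(['DENTISTRY', 'OSTEOPATHIC', 'MEDICINE', 'SURGERY',
                    'SOCIAL', 'SPEECH-LANGUAGE', 'PATHOLOGY']))
# ['DENTISTRY', 'OSTEOPATHIC MEDICINE SURGERY', 'SOCIAL', 'SPEECH-LANGUAGE PATHOLOGY']
Because the original word order is preserved, the only ambiguity left is between overlapping titles of different lengths, which the longest-match-first loop resolves.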

Automatic detection of types of logs in logstash

I am new to logstash, elasticsearch and kibana (ELK).
I know that I can create filters that parse specific logs and extract some fields from them. It looks like for each type of log I have to configure a specific filter. As I have around 20 different services, each writing around a hundred different types of log, this looks too difficult to me.
By type of log I mean logs that share a specific template with parameters that change.
This is an example of some logs:
Log1: User Peter has logged in
Log2: User John has logged in
Log3: Message "hello" sent by Peter
Log4: Message "bye" sent by John
I want ELK to discover automatically that we have two types of log here:
Type1: User %1 has logged in
Type2: Message "%1" sent by %2
Is that possible? Is there any example of how to do that? I don't want to write the template for each type of log manually; I want it to be discovered automatically.
It should then also extract the parameters. This is what I would like to see in the index:
Log1: Type1, params: Peter
Log2: Type1, params: John
Log3: Type2, params: hello, Peter
Log4: Type2, params: bye, John
After that I would like ELK to scan my index again and discover that param %1 of Type1 is usually param %2 of Type2 (the user name). It should also discover that Log1 and Log3 are related (same user).
The last thing it should do is find unusual sequences of actions (a login without the corresponding logout, for example).
Is any of this possible without having to manually configure all types of logs? If not, can you point me to some example of this multipass indexing even if it involves manual configuration?
Logstash has no discovery like this; you'll have to do the language parsing yourself. It's tedious and repetitive, but it gets the job done. You have a few options here, depending on your ability to influence other areas:
If the format of those logs is changeable, consider pushing for an authentication-logging standard. That way you only need one pattern.
Consider a modular approach to generating your filter pipeline. Log1 patterns go in one module, Log2 in another. It makes maintainability easier.
You have my sympathy with this problem. I've had to integrate Logstash with the authentication-logging of many systems by now, and each one describes what they're doing somewhat differently, all based on the whim of the developer who wrote it (which may have happened 25 years ago in some cases).
For the products we develop, I can at least influence how the logging looks. Moving away from a natural-language grok format to something else, such as kv or even json, goes a long way towards simplifying the parsing problem for me. The trick is convincing people that we only look at the logs through Kibana anyway, so why do we need:
User %{user} logged into application %{app} in zone %{zone}
When we can have
user="%{user}" app="%{app}" zone=%{zone}
Or even:
{ "user": %{user}, "app": %{app}, "zone": %{zone} }
Since that's what it'll be when Logstash is done with it anyway.
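To make the difference concrete, here's a minimal sketch (in Python, with a made-up log line following the key=value style above) of how little work that format needs compared with a per-message grok pattern:
import re

# Hypothetical log line in the key=value style suggested above.
line = 'user="peter" app="webmail" zone=dmz'

# One generic pattern handles every kv-style message, regardless of wording.
pairs = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
fields = {key: value.strip('"') for key, value in pairs.items()}
print(fields)  # {'user': 'peter', 'app': 'webmail', 'zone': 'dmz'}
The natural-language variants, by contrast, each need their own grok pattern.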

Finding Related Topics using Google Knowledge Graph API

I'm currently working on a behavioral targeting application and I need a considerably large keyword database/tool/provider that enables applications to reach similar keywords via a given keyword. I recently found Freebase, which had been providing a similar service before Google acquired it and integrated it into their Knowledge Graph. I was wondering if it's possible to get a list of related topics/keywords for a given entity.
import json
import urllib

api_key = 'API_KEY_HERE'
query = 'Yoga'
service_url = 'https://kgsearch.googleapis.com/v1/entities:search'
params = {
    'query': query,
    'limit': 10,
    'indent': True,
    'key': api_key,
}
url = service_url + '?' + urllib.urlencode(params)
response = json.loads(urllib.urlopen(url).read())
for element in response['itemListElement']:
    print element['result']['name'] + ' (' + str(element['resultScore']) + ')'
The script above returns the results below, though I'd like to receive topics related to yoga, such as health, fitness, gym and so on, rather than things that have the word "Yoga" in their name.
Yoga Sutras of Patanjali (71.245544)
Yōga, Tokyo (28.808222)
Sri Aurobindo (28.727333)
Yoga Vasistha (28.637642)
Yoga Hosers (28.253984)
Yoga Lin (27.524054)
Patanjali (27.061115)
Yoga Journal (26.635073)
Kripalu Center (26.074436)
Yōga Station (25.10318)
I'd really appreciate any suggestions, and I'm also open to using any other API if there is any that I could make use of. Cheers.
I see your point. :) So here's the script I use for that, using Serpstat's API. Here's how it works:
The script collects keywords from Serpstat's database
Then it collects search suggestions from Serpstat's database
Finally, it collects search suggestions from Google's suggestions
Note that to make the script work correctly, it's preferable to fill in all input boxes, though not all of them are required.
Keyword — required keyword
Search Engine — the search engine for which the analysis will be carried out. For example, for US Google, you need to set g_us. The entire list of available search engines can be found here.
Limit — the maximum number of phrases from the organic results that will participate in the analysis. You cannot set more than 1000 here.
Default keys — list of two-word keywords. You should give each of them some "weight" to receive some kind of result if something goes wrong.
Format: type; keyword; weight. Every keyword should be written on a new line.
Types:
w — one word
p — two words
Examples:
"w; bottle; 50" — initial weight of word bottle is 50.
"p; plastic bottle; 30" — initial weight of phrase plastic bottle is 30.
"w; plastic bottle; 20" — incorrect. You cannot use a two-word phrase for the "w" type.
Bad words — comma-separated list of words you want the script to exclude from the results.
Token — here you need to enter your token for API access. It can be found on your profile page.
You can download the source code for the script here.
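As a rough illustration of the "Default keys" format described above, a minimal sketch of a parser (the function name and validation rules are my own assumptions, not part of the script) could look like this:
# Parse "Default keys" lines of the form "type; keyword; weight", one per line.
def parse_default_keys(text):
    keys = []
    for line in text.strip().splitlines():
        kind, keyword, weight = (part.strip() for part in line.split(';'))
        # "w" entries must be a single word, "p" entries must be two words.
        if kind == 'w' and len(keyword.split()) != 1:
            raise ValueError('"w" type must be a single word: %r' % keyword)
        if kind == 'p' and len(keyword.split()) != 2:
            raise ValueError('"p" type must be two words: %r' % keyword)
        keys.append({'type': kind, 'keyword': keyword, 'weight': int(weight)})
    return keys

print(parse_default_keys('w; bottle; 50\np; plastic bottle; 30'))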
