Alexa Skill Not Distinguishing Letters & Numbers - node.js

We are working on an Alexa Skill that references information by a letter+number sequence. For example, we want to ask "Give me the status of item B1".
We have tried several ways to sort out the "B1" in our Alexa intent configuration, but we are getting very inconsistent (and usually wrong) results.
Our latest attempt was to create a "letters" slot and a "numbers" slot and build our sample utterance as: give me the status of {letter} {number}. But again, we usually get our slots back with no value in the JSON, or Alexa sends JSON containing values that are not in our slot definitions. If I say "B1", Alexa might return "viva" or some other random word.
The problem is in Alexa's interpretation of the voice. If we use the test text entry in the developer console, it (usually) works as expected.
Has anyone had success with letter/number combinations like this?

Ran into similar issues developing a "battleship"-like game: https://www.amazon.com/Josep-Valls-Vargas-Audio-Battleship/dp/B01KJ91K5U/ref=sr_1_3
You are doing it the right way, with separate slots for letter and number. Although numbers were usually recognized right, the letters were much trickier (especially the letter E for some reason). The only thing that worked was to allow users to say arbitrary words and/or a phonetic alphabet. Then in my skill I just slice off the first letter of the recognized word (a short sketch of that is shown after the slot values below).
"intent": "ShootIntent"
"slots": [
{
"name": "Column",
"type": "LIST_OF_COLUMNS"
},
{
"name": "Row",
"type": "AMAZON.NUMBER"
}
],
Then, LIST_OF_COLUMNS is something like: a Adams Alpha b Beta Boston Bravo c Charlie Chi Chicago d Delta Denver e Easy Echo Epsilon f Foxtrot Frank g Gamma George Golf h Henry Hotel i Ida India Iota j John Juliet k Kappa Kilo King l Lambda Lima Lincoln...
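A minimal sketch of that first-letter trick (shown in Python for brevity, even though the question mentions node.js; the function and slot names are made up for illustration):

def resolve_coordinate(column_slot: str, row_slot: str) -> str:
    # Keep only the first letter of whatever word was recognized for the
    # column ("Bravo", "Boston", "b" all map to "B"), and parse the row.
    column = column_slot.strip()[0].upper()
    row = int(row_slot)  # AMAZON.NUMBER arrives as text in the request
    return f"{column}{row}"

print(resolve_coordinate("Bravo", "1"))  # -> B1
print(resolve_coordinate("echo", "7"))   # -> E7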
In this other similar game I added several alternatives in the sample utterances and added words used within the game: https://www.amazon.com/Josep-Valls-Vargas-Easy-Hangman/dp/B06VY1TK8L/ref=sr_1_2

Try using the correct slot types; AMAZON.NUMBER and AMAZON.FirstName are two different slot types.
Also, you can use SSML for this, to make Alexa speak out numbers and words.
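For the response side, a small sketch of that SSML idea (the surrounding sentence is made up; spell-out is a standard say-as value supported by Alexa):

# Have Alexa spell out the item code character by character in the response.
speech = (
    '<speak>'
    'The status of item '
    '<say-as interpret-as="spell-out">B1</say-as>'
    ' is ready.'
    '</speak>'
)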

Related

Extracting sentences with Spacy POS/DEP : actor and action

Thank you for your assistance. I am using spaCy to parse through documents to find instances of certain words and extract the matching sentences into a new df column.
Here are some texts:
text = "Many people like Germany. It is a great country. Germany exports lots of technology. France is also a great country. France exports wine. Europeans like to travel. They spend lot of time of beaches. Spain is one of their travel locations. Spain appreciates tourists. Spain's economy is strengthened by tourism. Spain has asked and Germany is working to assist with the travel of tourists to Spanish beaches. Spain also like to import French wine. France would like to sell more wine to Spain."
My code works like this:
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load('en_core_web_sm')  # or whichever English model you have installed

def sent_matcher(text: str) -> list:
    doc = nlp(text)
    sent_list = []
    phrase_matcher = PhraseMatcher(nlp.vocab)
    phrases = ['Germany', 'France']
    patterns = [nlp(data) for data in phrases]
    phrase_matcher.add('EU entity', None, *patterns)
    for sent in doc.sents:
        for match_id, start, end in phrase_matcher(nlp(sent.text)):
            if nlp.vocab.strings[match_id] in ['EU entity']:
                sent_list.append(sent)
    return sent_list
This code works fine and pulls all the sentences that include the EU entity.
However, I wanted to take this to the next level and pull out sentences where the EU entity is the actor, and identify what type of action it is taking. I tried using POS/dependency parsing to pull out proper nouns combined with the verb, but the nsubj was not always correct, or the nsubj was linked to another word in a compound-noun structure. I tried extracting instances where the country was the first actor (if token == 'x'), but that always threw a string error even after I tokenized the word. I also tried using noun_chunks, but then I couldn't isolate the instance of the country or tie that chunk back to the verb.
I am pretty new to NLP, so any thoughts on how to code this and get the desired output would be greatly appreciated.
Thank you for your help!
It sounds like if you use merge_entities and follow the rule-based matching guide for the DependencyMatcher, you should be able to do this pretty easily. It won't be perfect, but you should be able to match many instances.
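A minimal sketch of that combination, assuming spaCy 3.x and the small English model (the pattern and label names here are invented for this question):

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('merge_entities')  # collapse multi-token entities into single tokens

matcher = DependencyMatcher(nlp.vocab)
pattern = [
    # Anchor on a verb ...
    {'RIGHT_ID': 'action', 'RIGHT_ATTRS': {'POS': 'VERB'}},
    # ... whose nominal subject is a geopolitical entity (country).
    {'LEFT_ID': 'action', 'REL_OP': '>', 'RIGHT_ID': 'actor',
     'RIGHT_ATTRS': {'DEP': 'nsubj', 'ENT_TYPE': 'GPE'}},
]
matcher.add('COUNTRY_ACTION', [pattern])

doc = nlp('Germany exports lots of technology. France would like to sell more wine to Spain.')
for match_id, token_ids in matcher(doc):
    action, actor = doc[token_ids[0]], doc[token_ids[1]]
    print(actor.text, '->', action.lemma_)  # e.g. Germany -> export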

How to capitalize every word in a string in cases where title() doesn't 100% work out in python

Hi,
I am relatively new to Python, and I was wondering why the code below doesn't pass all of the sample tests in the Codewars kata ("Jaden Casing Strings"), which is as follows:
Jaden Casing Strings:
Jaden Smith, the son of Will Smith, is the star of films such as The Karate Kid (2010) and After Earth (2013). Jaden is also known for some of his philosophy that he delivers via Twitter. When writing on Twitter, he is known for almost always capitalizing every word. For simplicity, you'll have to capitalize each word, check out how contractions are expected to be in the example below.
Your task is to convert strings to how they would be written by Jaden Smith. The strings are actual quotes from Jaden Smith, but they are not capitalized in the same way he originally typed them.
Example:
Not Jaden-Cased: "How can mirrors be real if our eyes aren't real"
Jaden-Cased: "How Can Mirrors Be Real If Our Eyes Aren't Real"
Link to Jaden's former Twitter account #officialjaden via archive.org
My code:
def to_jaden_case(string):
    for word in string:
        if "'" in word:
            word.capitalize()
        else:
            word.title()
    return string
I am also new to Python. I tried the method below, which seems to work:
def to_jaden_case(string):
    return ' '.join(i.capitalize() for i in string.split())
I was trying to use .title() in different ways but couldn't get a solution with it, so instead I split the string and capitalize every word. (Note that in the original code, iterating over a string yields characters rather than words, and str.capitalize()/str.title() return new strings instead of modifying the word in place, so the loop never changes anything.)
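For what it's worth, here is a quick comparison showing why title() alone trips over the contractions the kata checks for (purely illustrative):

quote = "how can mirrors be real if our eyes aren't real"

print(quote.title())
# How Can Mirrors Be Real If Our Eyes Aren'T Real   <- title() capitalizes after the apostrophe

print(' '.join(word.capitalize() for word in quote.split()))
# How Can Mirrors Be Real If Our Eyes Aren't Real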

When calculating the cooccurrence of two words, do we separate the sentences or link all the sentences?

For example, I have a document that contains 2 sentences: I am a person. He also likes apples.
Do we need to count the cooccurrence of "person" and "He" ?
Each document is separated by a line break. Context windows of cooccurrences are limited to each document.
Based on the implementation here.
A newline is taken as indicating a new document (contexts won't cross newline).
So, depending on how you prepare sentences, you may get different results:
Setting 1: ('He', 'person') cooccurred
...
I am a person. He also likes apples.
...
Setting 2: ('He', 'person') not cooccurred
...
I am a person.
He also likes apples.
...
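A toy sketch of that behavior (not the linked implementation itself; the tokenization here is deliberately naive):

from collections import Counter
from itertools import combinations

def cooccurrence_counts(corpus: str) -> Counter:
    # Each newline-separated line is treated as one document, and the
    # context window is the whole document, as described above.
    counts = Counter()
    for line in corpus.splitlines():
        tokens = line.replace('.', ' ').split()
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts

setting_1 = "I am a person. He also likes apples."
setting_2 = "I am a person.\nHe also likes apples."

print(cooccurrence_counts(setting_1)[('He', 'person')])  # 1 -> cooccurred
print(cooccurrence_counts(setting_2)[('He', 'person')])  # 0 -> not cooccurred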

How to resolve English sentence verbs semantically

I am trying to transform English statements into SQL queries.
e.g. How many products were created last year?
This should get transformed to select count(*) from products
where manufacturing date between 1/1/2015 and 31/12/2015
I am not able to understand how to map the verb "created" to the "manufacturing date" attribute in my table. I am using the Stanford CoreNLP suite to parse my statements. I am also using WordNet taxonomies with the JWI framework.
I have tried to map the verbs to the attributes by defining simple rules, but that is not a very generic approach, since I cannot know all the verbs in advance. Is there a better way to achieve this?
I would appreciate any help in this regard.
I know this would require a tool change, but I would recommend checking out Adapt by Mycroft AI.
It is a very straightforward intent parser which transforms user input into a json semantic representation.
For example:
Input: "Put on my Joan Jett Pandora station."
JSON:
{
  "confidence": 0.61,
  "target": null,
  "Artist": "joan jett",
  "intent_type": "MusicIntent",
  "MusicVerb": "put on",
  "MusicKeyword": "pandora"
}
It looks like the rules are very easy to specify and expand, so you would just need to build out your rules and then have whatever tool you want process the JSON and send the SQL query.
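A rough sketch of what that could look like for the products example, loosely following Adapt's published examples (all entity, intent, and keyword names below are invented for this question, and mapping the result to SQL is left as a comment):

import json
from adapt.intent import IntentBuilder
from adapt.engine import IntentDeterminationEngine

engine = IntentDeterminationEngine()

# Vocabulary for the hypothetical products domain.
for word in ['products', 'product']:
    engine.register_entity(word, 'ProductKeyword')
for word in ['created', 'made', 'manufactured']:
    engine.register_entity(word, 'CreateVerb')
for phrase in ['last year', 'this year']:
    engine.register_entity(phrase, 'TimeKeyword')

product_count_intent = IntentBuilder('ProductCountIntent')\
    .require('ProductKeyword')\
    .require('CreateVerb')\
    .optionally('TimeKeyword')\
    .build()
engine.register_intent_parser(product_count_intent)

for intent in engine.determine_intent('How many products were created last year?'):
    if intent.get('confidence', 0) > 0:
        print(json.dumps(intent, indent=2))
        # A downstream step would map CreateVerb -> manufacturing_date and
        # TimeKeyword -> a date range in order to build the SQL query.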

English word segmentation in NLP?

I am new to the NLP domain, but my current research needs some text parsing (also called keyword extraction) from URL addresses, e.g. a fake URL,
http://ads.goole.com/appid/heads
Two constraints are placed on my parsing:
The first "ads" and the last "heads" should be treated as distinct, because the "ads" inside "heads" is just a suffix rather than an advertisement.
The "appid" should be parsed into two parts, that is, 'app' and 'id', both of which carry semantic meaning on the Internet.
I have tried the Stanford NLP toolkit and the Google search engine. The former tries to classify each word grammatically, which is roughly what I expected. The Google engine shows more smartness about "appid": it suggests "app id" to me.
However, I cannot rely on Google's search history, which presumably only suggests "app id" because many people have searched for those words. Are there any offline methods to perform similar parsing?
UPDATE:
Please skip regex suggestions, because there is a potentially unknown number of word compositions like "appid" even in simple URLs.
Thanks,
Jamin
Rather than tokenization, what it sounds like you really want to do is called word segmentation. This is, for example, a way to make sense of asentencethathasnospaces.
I haven't gone through this entire tutorial, but it should get you started. They even give URLs as a potential use case.
http://jeremykun.com/2012/01/15/word-segmentation/
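As a toy illustration of what a dictionary-driven segmenter does (this is not the tutorial's code; the word list and scores below are invented, and a real segmenter would use unigram counts from a large corpus):

from functools import lru_cache

WORDS = {'ads': 3, 'app': 4, 'id': 4, 'appid': 2, 'heads': 5, 'head': 3, 's': 1}

def segment(text: str) -> list:
    @lru_cache(maxsize=None)
    def best(s: str):
        # Return (score, words): the highest-scoring split of s into known words.
        if not s:
            return (0, [])
        candidates = []
        for i in range(1, len(s) + 1):
            head, tail = s[:i], s[i:]
            if head in WORDS:
                score, rest = best(tail)
                candidates.append((score + WORDS[head], [head] + rest))
        return max(candidates) if candidates else (-1, [s])
    return best(text)[1]

print(segment('appid'))  # ['app', 'id'] with these toy scores
print(segment('heads'))  # ['heads']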
The Python wordsegment module can do this. It's an Apache2 licensed module for English word segmentation, written in pure-Python, and based on a trillion-word corpus.
Based on code from the chapter “Natural Language Corpus Data” by Peter Norvig from the book “Beautiful Data” (Segaran and Hammerbacher, 2009).
Data files are derived from the Google Web Trillion Word Corpus, as described by Thorsten Brants and Alex Franz, and distributed by the Linguistic Data Consortium. This module contains only a subset of that data. The unigram data includes only the most common 333,000 words. Similarly, bigram data includes only the most common 250,000 phrases. Every word and phrase is lowercased with punctuation removed.
Installation is easy with pip:
$ pip install wordsegment
Simply call segment to get a list of words:
>>> import wordsegment as ws
>>> ws.segment('http://ads.goole.com/appid/heads')
['http', 'ads', 'goole', 'com', 'appid', 'heads']
As you noticed, the old corpus doesn't rank "app id" very high. That's ok. We can easily teach it. Simply add it to the bigram_counts dictionary.
>>> ws.bigram_counts['app id'] = 10.2e6
>>> ws.segment('http://ads.goole.com/appid/heads')
['http', 'ads', 'goole', 'com', 'app', 'id', 'heads']
I chose the value 10.2e6 by doing a Google search for "app id" and noting the number of results.
