I'm testing NLP tools and right now I'm facing a problem with Rasa NLU.
With API.AI, Wit.ai and LUIS.AI I could find the entities I want with no more than 8-10 examples. With Rasa, on the other hand, I already have 18 examples and I could never find an entity. Even if my query matches exactly one of my examples, I still have an empty entities array as the result.
I'm using Rasa with the recommended Docker instance and my current pipeline is ["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf", "ner_synonyms", "intent_classifier_sklearn" and "ner_duckling"].
I specify my project and my model within my query, like this:
localhost:5000/parse?q=my_sentence&project=my_project&model=my_model
Any useful information is appreciated.
Thank you!
Update with examples
{
"text": "How can I make a carrot cake?",
"intent": "AskRecipe",
"entities": [
{
"start": 17,
"end": 27,
"value": "carrot cake",
"entity": "recipe"
}
]
},
{
"text": "What do I need to make a Lemon Pie?",
"intent": "AskRecipe",
"entities": [
{
"start": 25,
"end": 33,
"value": "Lemon Pie",
"entity": "recipe"
}
]
},
{
"text": "What do I need to make brownies?",
"intent": "AskRecipe",
"entities": [
{
"start": 23,
"end": 30,
"value": "brownies",
"entity": "recipe"
}
]
}
Then when I try, for instance, to extract information from "What do I need to make brownies?" (which is also listed as a example) this is the result:
{"entities": [], "intent": {"confidence": 0.8870822891508189, "name": "AskRecipe"}, "text": "What do I need to make brownies?", "intent_ranking": [{"confidence": 0.8870822891508189, "name": "AskRecipe"}, {"confidence": 0.11291771084918109, "name": "greet"}]}
I tried many other examples, but none of them worked.
I solved this problem.
In my config.json file, I updated my pipeline value to "scapy_sklearn" as opposed to ["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf", "ner_synonyms", "intent_classifier_sklearn" and "ner_duckling"].
Also, I restarted my docker instance after I trained a new model.
I must say, though, the docker instance with which I succeeded is not the same as the one I was using when I posted this issue, so, honestly, I can't be 100% sure I didn't break any configuration before - although I believe I didn't.
I hope this helps someone :)
I had the same issue of Rasa not recognizing entities. I see you've solved the issue in a different way, I'll just add what worked for me cos I see the same mistake I made, in the examples posted -
The end value must be ('start' index) + (length of 'value') so it should be every 'end' value shown here +1.
I know it's very simple, but it worked for me.
Related
I am currently grabbing the entire swagger.json contents:
swagger_link = "https://<SWAGGER_URL>/swagger/v1/swagger.json"
swagger_link_contents = requests.get(swagger_link)
#write the returned swagger to a file
swagger_return = swagger_link_contents.json()
with open ("swagger_raw.json", "w") as f:
f.write(json.dumps(swagger_return, indent=4))
The file is filled with beautiful JSON now. But there are a ton of objects in there, quite a few I don't need. Say I wanted to grab the command "UpdateCookieCommand" from my document:
"parameters": [
{
"name": "command",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/UpdateCookieCommand"
},
"x-nullable": false
}
],
but it also has additional mentions later in the document...
UpdateCookieCommand": {
"allOf": [
{
"$ref": "#/definitions/BaseCommand"
},
{
"type": "object",
"required": [
"command",
"id"
],
The latter object is really what I want to take from the document. It tells me what's required for a nominal API command to Update a Cookie. The hope is I could have a script that will look at every release and be able to absorb changes of the swagger and build new commands.
What methodology would you all recommend to accomplish this? I considered making subprocess Linux commands to just awk, sed, grep it to what I want but maybe there's a more elegant way? Also I won't know the amount of lines or location of anything as a new Swagger will be produced each release and my script will need to automatically run with a CI/CD pipeline.
I tried every code available in internet for scraping google playstore. The url "https://play.google.com/store/getreviews" worked for 1 day. It is not working now. I'm getting 404 error.
Please help
You could try a third-party solution like SerpApi. We handle proxies, solve captchas, and parse all rich structured data for you.
With this API you can retrieve up to 199 reviews per page, and then use next_page_token to get the rest.
Example python code for retrieving YouTube reviews (available in other libraries also):
from serpapi import GoogleSearch
params = {
"api_key": "SECRET_API_KEY",
"engine": "google_play_product",
"store": "apps",
"gl": "us",
"product_id": "com.google.android.youtube",
"all_reviews": "true"
}
search = GoogleSearch(params)
results = search.get_dict()
Example JSON output:
"reviews": [
{
"title": "Qwerty Jones",
"avatar": "https://play-lh.googleusercontent.com/a/AATXAJwSQC_a0OIQqkAkzuw8nAxt4vrVBgvkmwoSiEZ3=mo",
"rating": 3,
"snippet": "Overall a great app. Lots of videos to see, look at shorts, learn hacks, etc. However, every time I want to go on the app, it says I need to update the game and that it's \"not the current version\". I've done it about 3 times now, and it's starting to get ridiculous. It could just be my device, but try to update me if you have any clue how to fix this. Thanks :)",
"likes": 586,
"date": "November 26, 2021"
},
{
"title": "matthew baxter",
"avatar": "https://play-lh.googleusercontent.com/a/AATXAJy9NbOSrGscHXhJu8wmwBvR4iD-BiApImKfD2RN=mo",
"rating": 1,
"snippet": "App is broken, every video shows no dislikes even after I hit the button. I've tested this with multiple videos and now my recommended is all messed up because of it. The ads are longer than the videos that I'm trying to watch and there is always a second ad after the first one. This app seriously sucks. I would not recommend this app to anyone.",
"likes": 352,
"date": "November 28, 2021"
},
{
"title": "Operation Blackout",
"avatar": "https://play-lh.googleusercontent.com/a-/AOh14GjMRxVZafTAmwYA5xtamcfQbp0-rUWFRx_JzQML",
"rating": 2,
"snippet": "YouTube used to be great, but now theyve made questionable and arguably stupid decisions that have effectively ruined the platform. For instance, you now have the grand chance of getting 30 seconds of unskipable ad time before the start of a video (or even in the middle of it)! This happens so frequently that its actually a feasible option to buy an ad blocker just for YouTube itself... In correlation with this, YouTube is so sensitive twords the public they decided to remove dislikes. Why????",
"likes": 370,
"date": "November 24, 2021"
},
...
],
"serpapi_pagination": {
"next": "https://serpapi.com/search.json?all_reviews=true&engine=google_play_product&gl=us&hl=en&next_page_token=CpEBCo4BKmgKR_8AwEEujFG0VLQA___-9zuazVT_jmsbmJ6WnsXPz8_Pz8_PxsfJx5vJns3Gxc7FiZLFxsrLysnHx8rIx87Mx8nNzsnLyv_-ECghlTCOpBLShpdQAFoLCZiJujt_EovhEANgmOjCATIiCiAKHmFuZHJvaWRfaGVscGZ1bG5lc3NfcXNjb3JlX3YyYQ&product_id=com.google.android.youtube&store=apps",
"next_page_token": "CpEBCo4BKmgKR_8AwEEujFG0VLQA___-9zuazVT_jmsbmJ6WnsXPz8_Pz8_PxsfJx5vJns3Gxc7FiZLFxsrLysnHx8rIx87Mx8nNzsnLyv_-ECghlTCOpBLShpdQAFoLCZiJujt_EovhEANgmOjCATIiCiAKHmFuZHJvaWRfaGVscGZ1bG5lc3NfcXNjb3JlX3YyYQ"
}
Check out the documentation for more details.
Test the search live on the playground.
Disclaimer: I work at SerpApi.
I was trying out Phoenetic search using Azure Search without much luck. My objective is to work out an Index configuration that can handle typos and accomodate phonetic search for end users.
With the below configuration and sample data, I was trying to search for intentionally misspelled words like 'softvare' or 'alek'. I got results for 'alek' thanks for Phonetic analyzer; but didn't get any results for 'softvare'.
Looks like for this requirement phonetic search will not do the trick.
Only option that I found was to use synonyms map. The major pitfall is that I'm unable to use the Phonetics / Custom analyzer along with Synonyms :(
What are the various strategies that you would recommend for taking care of typos?
search query used
?api-version=2017-11-11&search=alec
?api-version=2017-11-11&search=softvare
Here is the index configuration
"name": "phonetichotels",
"fields": [
{"name": "hotelId", "type": "Edm.String", "key":true, "searchable": false},
{"name": "baseRate", "type": "Edm.Double"},
{"name": "description", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "analyzer":"my_standard"},
{"name": "hotelName", "type": "Edm.String", "analyzer":"my_standard"},
{"name": "category", "type": "Edm.String", "analyzer":"my_standard"},
{"name": "tags", "type": "Collection(Edm.String)", "analyzer":"my_standard"},
{"name": "parkingIncluded", "type": "Edm.Boolean"},
{"name": "smokingAllowed", "type": "Edm.Boolean"},
{"name": "lastRenovationDate", "type": "Edm.DateTimeOffset"},
{"name": "rating", "type": "Edm.Int32"},
{"name": "location", "type": "Edm.GeographyPoint"}
],
Analyzer (part of the index creation)
"analyzers":[
{
"name":"my_standard",
"#odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding", "phonetic" ]
}
]
Analyze API Input and Output for 'software'
{
"analyzer":"my_standard",
"text": "software"
}
{
"#odata.context": "https://ctsazuresearchpoc.search.windows.net/$metadata#Microsoft.Azure.Search.V2017_11_11.AnalyzeResult",
"tokens": [
{
"token": "SFTW",
"startOffset": 0,
"endOffset": 8,
"position": 0
}
]
}
Analyze API Input and Output for 'softvare'
{
"analyzer":"my_standard",
"text": "softvare"
}
{
"#odata.context": "https://ctsazuresearchpoc.search.windows.net/$metadata#Microsoft.Azure.Search.V2017_11_11.AnalyzeResult",
"tokens": [
{
"token": "SFTF",
"startOffset": 0,
"endOffset": 8,
"position": 0
}
]
}
Sample data that I loaded
{
"#search.action": "upload",
"hotelId": "5",
"baseRate": 199.0,
"description": "Best hotel in town for software people",
"hotelName": "Fancy Stay",
"category": "Luxury",
"tags": ["pool", "view", "wifi", "concierge"],
"parkingIncluded": false,
"smokingAllowed": false,
"lastRenovationDate": "2010-06-27T00:00:00Z",
"rating": 5,
"location": { "type": "Point", "coordinates": [-122.131577, 47.678581] }
},
{
"#search.action": "upload",
"hotelId": "6",
"baseRate": 79.99,
"description": "Cheapest hotel in town ",
"hotelName": " Alec Baldwin Motel",
"category": "Budget",
"tags": ["motel", "budget"],
"parkingIncluded": true,
"smokingAllowed": true,
"lastRenovationDate": "1982-04-28T00:00:00Z",
"rating": 1,
"location": { "type": "Point", "coordinates": [-122.131577, 49.678581] }
},
With the right configuration, I should have got results even with the misspelled words.
I work on Azure Search. Before I suggest approaches to handle misspelled words, it would be helpful to look at your custom analyzer (my_standard) configuration. It might tell us why it's not able to handle the case for 'softvare'. As a DIY, you can use the Analyze API to see the tokens created using your custom analyzer and it should contain 'software' to actually match the docs.
Now then, here are a few ways that can be used independently or in conjunction to handle misspelled words. The best approach varies depending on the use-case and I strongly suggest you experiment with these to figure out the best one in your case.
You are already familiar with phonetic filters which is a common approach to handle similarly pronounced terms. If you haven't already, try different encoders for the filter to evaluate which configuration gives you the best results. Check out the list of encoders here.
Use fuzzy queries supported as part of the Lucene query syntax in Azure Search which returns terms that are near the original query term based on a distance metric. The limitation here is that it works on a single term. Check the docs for more details. Sample query would look like - search=softvare~1 You can also use term boosting to give the original term more boost in cases where the original term is also a valid term.
You also alluded to synonyms which is also used to query with misspelled terms. This approach gives you the most control over the process of handling typos but also require you to have prior knowledge of different typos for terms. You can use these docs if you want to experiment with synonyms.
As you could read in my post; my Objective was to handle the typos.
The only easy option is to use the inbuilt Lucene functionality - Fuzzy Search. I'm yet to check on the response times as the querytype has to be set to 'full' for using fuzzy search. Otherwise, the results were satisfactory.
Example:
search=softvare~&fuzzy=true&querytype=full
will return all documents with the 'Software' in it.
For further reading please go through Documentation
I have an intent where I might say 'Transfer 4 to Bob' and it identifies this as 'Transfer for to Bob'
Also I might say 'Transfer 10 to Bob and it identifies this as 'Transfer 102 Bob' treating to word to as 2 on the end of the previous number.
What is the best way to get API.AI to recognise these parts correctly so 4 is not for and to is not 2?
You mentioned that you're using the Actions on Google platform. This means that speech recognition - the process of translating what the user says into text - is happening before the data gets to API.AI.
The problem you're experiencing is that Actions on Google is misrecognizing some numbers as words, e.g. four becomes for.
Because this happens before - and separately from - API.AI, you won't be able to fix the misrecognition.
Below, I'll explain how you can work around this issue in API.AI. However, it's also worth thinking about how you could make your conversation design as robust as possible so that issues like this are less likely to cause problems.
One way you could increase robustness would be to mark the number as a required parameter in API.AI so the user is prompted if it isn't detected due to a recognition error. In that case, the dialog would go like this:
User: Give me four lattes.
App: Sure, four lattes coming up.
User: Give me for lattes.
App: How many do you want?
User: Four.
App: Sure, four lattes coming up.
Regardless, here's a workaround you can use to help recover from misrecognition:
In your intent, provide examples of these commonly misrecognized values. Highlight and mark them as numbers.
Test out your intent out in the console and you'll see that "for" is now matched as a "number" entity with value "for".
In your fulfillment webhook, check the parameter for this value and convert it to the appropriate number using a dictionary. Here's the JSON for the above query:
{
"id": "994c4e39-be49-4eae-94b0-077700ef87a3",
"timestamp": "2017-08-03T19:50:26.314Z",
"lang": "en",
"result": {
"source": "agent",
"resolvedQuery": "Get me for lattes",
"action": "",
"actionIncomplete": false,
"parameters": {
"drink": "lattes",
"number": "for" // NOTE: Convert this to "4" in your webhook
},
"contexts": [],
"metadata": {
"intentId": "0e1b0e72-78ba-4c61-a4fd-a73788034de1",
"webhookUsed": "false",
"webhookForSlotFillingUsed": "false",
"intentName": "get drink"
},
"fulfillment": {
"speech": "",
"messages": [
{
"type": 0,
"speech": ""
}
]
},
"score": 1
},
"status": {
"code": 200,
"errorType": "success"
},
"sessionId": "8b0891c1-50c8-43c6-99c4-8f77261acf86"
}
I want to build a cloud based solution in which I would give a pool of images; and then ask for "find similar image to a particular image from this pool of images" !! Pool of images can be like "all t-shirt" images. Hence, similar images mean "t-shirt with similar design/color/sleeves" etc.
Tagging solution won't work as they are at very high level.
AWS Rekognition gives "facial similarities" .. but not "product similarities" .. it does not work like for images of dresses..
I am open to use any cloud providers; but all are providing "tags" of the image which won't help me.
One solution could be that I use some ML framework like MXNet/Tensorflow, create my own models, train them and then use.. But is there any other ready made solution on any of cloud providers ?
ibm-bluemix has an api to find similar images https://www.ibm.com/watson/developercloud/visual-recognition/api/v3/#find_similar
Using the Azure Cognitive Services (Computer Vision to be more precise) you can get categories, tags, a caption and even more info for images. Processing all your images would provide tags for your image pool. And that enables you to get similar images based on (multiple) identical tags.
This feature returns information about visual content found in an image. Use tagging, descriptions, and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and color schemes in pictures.
An example of (part of) a result of the Computer Vision API:
Description{
"Tags": [
"train",
"platform",
"station",
"building",
"indoor",
"subway",
"track",
"walking",
"waiting",
"pulling",
"board",
"people",
"man",
"luggage",
"standing",
"holding",
"large",
"woman",
"yellow",
"suitcase"
],
"Captions": [
{
"Text": "people waiting at a train station",
"Confidence": 0.8331026
}
]
}
Tags[
{
"Name": "train",
"Confidence": 0.9975446
},
{
"Name": "platform",
"Confidence": 0.995543063
},
{
"Name": "station",
"Confidence": 0.9798007
},
{
"Name": "indoor",
"Confidence": 0.927719653
},
{
"Name": "subway",
"Confidence": 0.838939846
},
{
"Name": "pulling",
"Confidence": 0.431715637
}
]
You could also use the Bing Image Search API (https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) which allows you to do an image search based on specified criteria within your solution...
You could use a combination of things here. Use an image tagging service aws rekognition, or any of those listed above) then create some training data with like images, and upload that into something like aws machine learning. This is a bit similar to what has been brought up earlier, however I am trying to elucidate that while tagging might not be the end all be all of your solution, it will likely play a role as a first to step to a more complex process.
pls check the site http://cloudsight.ai/api, and try the demo. the sample result would be
{
"token": "BBKA0lW9O-B2eamXUysdXA",
"url": "http://assets.cloudsight.ai/uploads/image_request/image/314/314978/314978186/79379_86cb4e2611d6b0a3287a926a1ca1fe51_image1_zoom.jpg",
"ttl": 54,
"status": "completed",
"name": "men's red and black checkered button-up shirt"
}
{
"token": "bjX7nWGs0toajIDwyvXxlw",
"url": "http://assets.cloudsight.ai/uploads/image_request/image/314/314987/314987168/11.jpg",
"ttl": 54,
"status": "completed",
"name": "blue, gray and navy blue stripe crew-neck T-shirt"
}