Which FaceAPI to Verify a person? - azure

I'm new to Cognitive Services and, even after going through the docs, I'm still a little confused about which methods are needed to verify someone who appears in two different images. Basically, I want to make sure the person is the same in both images.

You can use Face verification.
This checks the likelihood that two faces belong to the same person. The API returns a confidence score indicating how likely it is that the two faces belong to one person. If the faces in the two images match, the confidence score returned by the method will be close to 1.
You can refer to the Face - Verify documentation for details.
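A minimal sketch of how the result might be consumed, assuming the Verify response body is a JSON object with `isIdentical` and `confidence` fields. The helper name `same_person`, the sample response, and the 0.5 threshold are illustrative assumptions, not values taken from the documentation; pick a threshold that suits your own tolerance for false matches.

```python
import json


def same_person(verify_response: str, threshold: float = 0.5) -> bool:
    """Decide whether two faces match, given a Verify-style JSON response body."""
    result = json.loads(verify_response)
    # The confidence is a score between 0 and 1; closer to 1 means
    # the two faces are more likely to belong to the same person.
    return result["confidence"] >= threshold


# Hypothetical response body, used only for illustration.
sample = '{"isIdentical": true, "confidence": 0.92}'
print(same_person(sample))
```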


Is an order something transient or not

In my company (a train company) there is a sort of battle going on between two viewpoints. Before going too deep into the problem, I'll first explain the different domains we have in our landscape now.
Product: All product master data and their characteristics.
Think of their names, their possible lists of choices...
Location: All location master data that can be chosen, like stations, stops, etc.
Quote: Used to get a price for a specific choice of a product with its attributes.
Order: The order domain, where you can make a positive order but also a negative one for reimbursements.
Ticket: This is essentially what you get from paying the order. It's the product, but in the state it is in when the customer receives it.
The problem
Viewpoint PURPLE (I don't want to create bias)
When an order is transformed into "tickets", we copy the order details, such as price, into the ticket model, so that Order becomes something we can throw away. Order is seen as something transient, kind of like the bag you carry in a supermarket: it's the goods inside the bag that matter, not the bag itself.
When a reimbursement flow starts, you do not need to go to the order; you have everything in the Ticket domain. This means data from the order is duplicated to Ticket.
But not all of it, only the things that are relevant, like price.
Viewpoint YELLOW (I don't want to create bias)
You do the same as above, but you do not store the price in the Ticket domain. The Ticket domain only consists of details that are relevant for the "ticket" to work. Price is not allowed in there because it belongs to the order. When a reimbursement flow starts, it is allowed to fetch those details from the order. That makes Order something you cannot throw away, as it holds crucial data.
The benefit here is that Order is not "polluting" the Ticket with unnecessary data. But this is debatable, and price is a good example of that.
I wish to know your ideas about these two viewpoints.
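To make the difference concrete, here is a rough sketch of the two viewpoints. All class, field, and function names are hypothetical, and it assumes one ticket per order for brevity; it is only meant to show where the price lives and what a reimbursement has to look up.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict


@dataclass(frozen=True)
class Order:
    order_id: str
    price: Decimal


# Viewpoint PURPLE: the ticket carries its own copy of the price,
# so the order can be discarded once the ticket exists.
@dataclass(frozen=True)
class PurpleTicket:
    ticket_id: str
    price: Decimal  # duplicated from the order when the ticket is issued


def reimburse_purple(ticket: PurpleTicket) -> Decimal:
    return -ticket.price


# Viewpoint YELLOW: the ticket only references the order,
# so the order has to be kept for as long as a reimbursement is possible.
@dataclass(frozen=True)
class YellowTicket:
    ticket_id: str
    order_id: str


def reimburse_yellow(ticket: YellowTicket, orders: Dict[str, Order]) -> Decimal:
    # Raises KeyError if the order was thrown away.
    return -orders[ticket.order_id].price


order = Order("o-1", Decimal("12.50"))
print(reimburse_purple(PurpleTicket("t-1", order.price)))
print(reimburse_yellow(YellowTicket("t-1", order.order_id), {order.order_id: order}))
```

Both calls produce the same negative amount; the difference is that the YELLOW version fails once the order is gone, which is exactly the dependency the two camps disagree about.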
There is no "Don't repeat yourself" when it comes to the business domain. The only thing that dictates the business domain is the business requirements. If the requirements state that the ticket should work independently of changes to the order, then you have to duplicate things.
But in this case, the requirements are ambiguous. There is no correct design using the currently specified requirements. Building code based on assumptions is the #1 way of getting bad code, since you most likely will have to do a redesign down the road.
You need to go back to the product owner and ask him about the difference between the Order and the Ticket.
For instance:
What should happen to the ticket if the order is deleted?
What happens to the order and/or ticket if the product price changes?
What happens to a ticket if the order is reimbursed?
Go back, get better requirements and then start to design the application.

Dialogflow: Not able to identify simple phrases

Dialogflow is not able to handle simple phrases like "my name is not Harry, it's Sam".
It gives me the name as Harry and the company name as Sam, since both name and company name are required in the same sentence.
It should have taken the name as Sam and prompted the user again for the company name, or it should have fallen back completely.
Hi and welcome to Stack Overflow.
Dude. This is not a simple phrase.
Negated sentences are always very difficult for Dialogflow to catch.
Suppose I have a question like,
I want to check *google* revenue for the year *2017*
As you can see, google and 2017 are the entities.
But now in the same way if you say,
I don't want to check *google* revenue for the year *2017*
The chance of hitting the old intent is very high, as Dialogflow matches almost 90% of this sentence with your old sentence. So it might fail.
I hope you are asking about something similar to this.
Anyhow, coming to your point, if company name and name are different entities, then
Two things you should avoid:
As everyone mentioned, check your entities. The same value should not be present in both entities. Otherwise matching will fail, because Dialogflow will not know whether to treat 'Sam' as your name or as the company name.
If you are not using values from an entity and are instead using @sys.any, it has a very high chance of failing. Dialogflow's system entity @sys.given-name is also not preferred, as it does not catch all names. So avoid these entities.
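The first check can be done offline before uploading entities. The sketch below is one possible way to do it, assuming you have your entity values available as plain lists; the entity names `name` and `company` and the helper `overlapping_values` are made up for illustration and are not part of Dialogflow.

```python
def overlapping_values(entities):
    """Return values that appear in more than one entity, with the entities that contain them."""
    seen = {}
    for entity, values in entities.items():
        for value in values:
            # Compare case-insensitively, since 'Sam' and 'sam' would collide in practice.
            seen.setdefault(value.lower(), set()).add(entity)
    return {value: sorted(names) for value, names in seen.items() if len(names) > 1}


entities = {
    "name": ["Harry", "Sam"],
    "company": ["Sam", "Acme"],
}
print(overlapping_values(entities))
```

Any value reported here is ambiguous between the listed entities and is worth removing from one of them.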
Things you can try:
Train, train, and train. As you may be aware, the training section in Dialogflow is pretty good. Train it a few times and it will automatically learn and improve.
But please note: wrong training will produce wrong results. It should be 100% accurate. Always check before you approve a training.
And try using webhooks, actions, and/or events to work out the right values through an external API.

suggest list of how-to articles based on text content

I have 20,000 messages (combination of email and live chat) between my customer and my support staff. I also have a knowledge base for my product.
Oftentimes, the questions customers ask are quite simple and my support staff simply point them to the right knowledge base article.
What I would like to do, in order to save my support staff time, is to show them a list of articles that are likely to be relevant based on the user's initial support request. This way they can just copy and paste the link to the help article instead of loading up the knowledge base and searching for the article manually.
I'm wondering what solutions I should investigate.
My current line of thinking is to run analysis on existing data and use a text classification approach:
For each message, see if there is a response with a link to a how-to article
If yes, extract key phrases (Microsoft Cognitive Services)
TF-IDF?
Treat each how-to as a 'classification' that belongs to sets of key phrases
Use some supervised machine learning, maybe support vector machines, to predict which 'classification' (i.e. how-to article) matches the key phrases determined from a new support ticket.
Feed new responses back into the set to make the system smarter.
Not sure if I'm overcomplicating things. Any advice on how this is done would be appreciated.
PS: naive approach of just dumping 'key phrases' into search query of our knowledge base yielded poor results since the content of the help article is often different than how a person phrases their question in an email or live chat.
A simple classifier along the lines of a "spam" classifier might work, except that each FAQ would be its own feature, as opposed to the single spam/not-spam feature of a spam classifier.
Most spam classifiers start off with a dictionary of words/phrases. You already have a start on this with your naive approach. However, unlike your approach, a spam classifier does much more than a text search. Essentially, in a spam classifier, each word in the customer's email is given a weight, and the sum of the weights indicates whether the message is spam or not-spam. Now extend this to as many features as there are FAQs, that is, features like FAQ1 or not-FAQ1, FAQ2 or not-FAQ2, etc.
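As a rough illustration of "each word gets a weight and the weights are summed", here is a small sketch using Naive-Bayes-style log-odds weights, one set of weights per FAQ. The weighting formula, the function names, the FAQ ids, and the tiny training set are all assumptions made for the example; a real system would need proper tokenization and far more data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(examples):
    """examples: list of (message, faq_id) pairs. Returns {faq_id: {word: weight}}."""
    per_faq = defaultdict(Counter)
    for message, faq in examples:
        per_faq[faq].update(tokenize(message))
    total = Counter()
    for counts in per_faq.values():
        total.update(counts)
    vocab = len(total)
    all_words = sum(total.values())
    weights = {}
    for faq, counts in per_faq.items():
        in_faq = sum(counts.values())
        out_faq = all_words - in_faq
        # Weight = smoothed log-odds of the word appearing in this FAQ's
        # messages versus in the messages of every other FAQ.
        weights[faq] = {
            word: math.log((counts[word] + 1) / (in_faq + vocab))
            - math.log((total[word] - counts[word] + 1) / (out_faq + vocab))
            for word in total
        }
    return weights


def suggest(message, weights, top_n=3):
    """Rank FAQs by the sum of the weights of the words in the message."""
    scores = {
        faq: sum(word_weights.get(token, 0.0) for token in tokenize(message))
        for faq, word_weights in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


examples = [
    ("how do I reset my password", "faq-password"),
    ("forgot password cannot log in", "faq-password"),
    ("where is my invoice", "faq-billing"),
    ("need a copy of my invoice for billing", "faq-billing"),
]
weights = train(examples)
print(suggest("I forgot my password", weights))
```

Each time a support person corrects a suggestion, the corrected pair can simply be appended to `examples` and the weights retrained.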
Since your support people can easily identify which FAQ an e-mail requires, a supervised learning algorithm would be appropriate. To reduce the impact of any misclassification errors, consider having the application present the support person with the customer's email followed by the computer-generated response, so that all the support person has to do is approve the response or modify it. Modifying a response should result in a new entry in the training set.
Support Vector Machines are one way to implement machine learning. However, you are probably proposing this solution too early in the process: first identify the problem, then get a simple method to work as well as possible, before moving to more sophisticated methods. After all, if a multi-feature spam classifier works, why invest more time and money in something else that also works?
Finally, depending on your system, this is something I would like to work on.

Spark Item Similarity Interpretation (Cross-Similarity and Similarity)

I've been using Spark Item Similarity through mahout by following the steps in this article:
https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
I was able to clean my data, set up a local-only Spark/Hadoop node and all that.
Now, my question lies more in the interpretation of the matrices. I've tried some Google queries with limited success.
I'm creating a multi-modal recommender - and one of my datasets is very similar to the Mahout example.
Example input:
Customer ActionName Product
11064612 view 241505
11086047 purchase 110915
11121878 view CERT_DL
11149030 purchase CERT_FS
11104130 view 111401
The output of Mahout is 2 matrices: a similarity matrix and a cross-similarity matrix.
This is my similarity matrix (I assume Mahout uses my "filter1", purchases)
**791207-WP** 791520-WP:11.350536461453885 791520:9.547158147208393 76130142:7.938639976084232 711215:7.0641921646893024 751309:6.805891904514283
So how would I interpret this? If someone purchased 791207-WP they could be interested in 791520-WP? (so I'd use the left part against purchases of a customer and rank products in the right part?).
The row for 791520-WP looks like this:
791520-WP 76151220:18.954662238247693 791604-WP:13.951210170984268
So, in theory, I'd recommend 76151220 to someone who bought 791520-WP, correct?
Part 2 of the question is interpreting the cross-similarity matrix. Remember my filter2 is "views".
How would I interpret this:
**790907** 76120956:14.2824428207241 791500-LXQ2:13.864741460885853 190907:10.735807818360627
I take this matrix as "someone who visited the 76120956 web page ended up purchasing 790907". So I should promote 790907 to customers who bought 76120956 and maybe even add a link between these 2 products on our site, for example.
Or is it "people who visited the webpage of 790907 ended up buying 76120956"?
My plan is not to use these as-is. I'll still use RowSimilarity and different sources to rank products - but I'm missing the basic interpretation of the outputs from mahout.
If you know of any documentation that clarifies this, that would be a great asset to have.
Thank you.
In both cases the matrix is telling you that the item-id key is similar to the listed items, with the strength given by the LLR (log-likelihood ratio) value attached to each similar item. Similar in the sense that similar users purchased the items. In the second case it is saying that similar people viewed the items and that this view also appears to have led to a purchase of the same item.
Cooccurrence works for purchases alone, cross-occurrence adds the check to make sure the view also correlated with a purchase. This allows you to use both for recommendations.
The output is generally meant to be used with a search engine: you would use a user's history of purchases and views as a two-field query against the matrices, one in each field.
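A simplified stand-in for that two-field query, using the rows from the question. It treats each row key as a candidate item and adds up the LLR values of the listed items that appear in the user's history; a real search engine would apply its own relevance scoring, so summing raw LLR values is only an assumption made to keep the sketch short. The function names and the sample user history are hypothetical.

```python
from collections import defaultdict


def parse_rows(lines):
    """Parse 'item other:score other:score ...' rows into {item: {other: score}}."""
    matrix = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        key, pairs = parts[0], parts[1:]
        matrix[key] = {}
        for pair in pairs:
            other, score = pair.rsplit(":", 1)
            matrix[key][other] = float(score)
    return matrix


def recommend(purchases, views, similarity, cross_similarity, top_n=5):
    """Score each row key by the indicators that match the user's history."""
    scores = defaultdict(float)
    # Purchases are matched against the similarity rows, views against the cross-similarity rows.
    for matrix, history in ((similarity, purchases), (cross_similarity, views)):
        for item, indicators in matrix.items():
            for seen in history:
                if seen in indicators:
                    scores[item] += indicators[seen]
    # Do not recommend what the user already bought.
    for item in purchases:
        scores.pop(item, None)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


similarity = parse_rows([
    "791207-WP 791520-WP:11.350536461453885 791520:9.547158147208393",
    "791520-WP 76151220:18.954662238247693 791604-WP:13.951210170984268",
])
cross_similarity = parse_rows([
    "790907 76120956:14.2824428207241 791500-LXQ2:13.864741460885853 190907:10.735807818360627",
])
print(recommend(["791520-WP"], ["76120956"], similarity, cross_similarity))
```

Here a user who bought 791520-WP and viewed 76120956 gets 790907 (through the view) ranked ahead of 791207-WP (through the purchase), because its matching LLR value is higher.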
There are analogous methods to find item-based recommendations.
Better yet, use something like the Universal Recommender here: actionml.com/docs/ur with PredictionIO for an end-to-end system.

Azure ML Recommendations

I want to use Azure ML to find related products using information from receipts from a store.
I have a file of receipts:
44366,136778
79619,88975
78861,78864
53395,78129,78786,79295,79353,79406,79408,79417,85829,136712
32340,33973
31897,32905
32476,32697,33202,33344,33879,34237,34422,48175,55486,55490,55498
17800
32476,32697,33202,33344,33879,34237,34422,48175,55490,55497,55498,55503
47098
136974
85832
Each row represents one receipt and each number is a product id.
Given a product id, I want to get a list of similar products, i.e. products that were bought together with it by other customers.
Can anyone point me in the right direction on how to do this?
This seems a good fit for their frequently bought together service (https://datamarket.azure.com/dataset/amla/mba). You may have to preprocess the dataset to get it in the required format. This service has a web UI as well: https://marketbasket.cloudapp.net/
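Independent of any hosted service, "bought together" can also be counted directly from the receipts as a baseline. The sketch below is a plain co-occurrence count over a few rows from the question, not what the service above does internally; the function names are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(receipts):
    """Count, for every product, how often each other product shares a receipt with it."""
    co = defaultdict(Counter)
    for receipt in receipts:
        items = set(receipt)  # ignore duplicate ids within one receipt
        for a, b in permutations(items, 2):
            co[a][b] += 1
    return co


def bought_with(product, co, top_n=5):
    """Products most often bought together with the given product."""
    return [other for other, _ in co.get(product, Counter()).most_common(top_n)]


rows = [
    "44366,136778",
    "32476,32697,33202,33344,33879,34237,34422,48175,55486,55490,55498",
    "17800",
    "32476,32697,33202,33344,33879,34237,34422,48175,55490,55497,55498,55503",
]
receipts = [row.split(",") for row in rows]
co = build_cooccurrence(receipts)
print(bought_with("44366", co))
print(bought_with("32476", co))
```

Raw counts favor popular products, so on a real dataset you would probably want to normalize them (for example by how often each product is bought overall) before ranking.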
This is a typical recommender problem; you can use a model called the Matchbox recommender to cover it.
Recommenders typically use the scores users gave to items and then apply some tricky calculation to predict scores for items a user has not scored yet (here a score would typically be 1 if the user bought the item and 0 if they did not).
If you need more details, let me know. (You have access to a free version of Azure ML where you can try all this.)
Regards
