Azure Search: using a prefix with a Scoring Profile

When using a prefix like "tru*", the scores of the results no longer seem to be calculated against the Scoring Profile.
I'm looking for a way to search for part of a word while still ordering the results.
(Two screenshots compared the results of a search with '*' and without.)

It turns out the Scoring Profile does work, but with '*' the base score is so low that the profile makes no visible difference after it is applied.
The best solution for me was using Order By in the search request, as in the sketch below.
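As a minimal sketch of that workaround (the service name, index name, api-key, and sortable "name" field here are all hypothetical), a prefix query combined with an explicit $orderby against the Azure Search REST API looks roughly like this:

    import requests

    url = "https://myservice.search.windows.net/indexes/products/docs"
    params = {
        "api-version": "2016-09-01",
        "search": "tru*",        # prefix query; scores come back nearly flat
        "$orderby": "name asc",  # so impose an explicit order instead
    }
    headers = {"api-key": "<YOUR-QUERY-KEY>"}

    results = requests.get(url, params=params, headers=headers).json()
    for doc in results["value"]:
        print(doc["name"])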


Scoring relevancy in Azure Search

Is it possible to apply a scoring profile so that a document has greater relevance for one country than for others?
How can I manage scoring like that?
You can add Scoring Profiles to an Azure Search index.
The example in the documentation shows boosting the score based on geo location; a sketch of a country-based profile follows.
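As a rough sketch of such a profile (the "countries" field and parameter name are hypothetical, and this is the general shape of the REST schema rather than a verified index definition), a tag scoring function can boost documents tagged with the searching user's country:

    # Part of an index definition; "countries" is assumed to be a
    # Collection(Edm.String) field listing the countries a document targets.
    scoring_profile = {
        "name": "boostByCountry",
        "functions": [
            {
                "type": "tag",            # boost when field tags match the parameter
                "fieldName": "countries",
                "boost": 5,
                "interpolation": "linear",
                "tag": {"tagsParameter": "userCountry"},
            }
        ],
    }
    # At query time, select the profile (scoringProfile=boostByCountry) and
    # pass the user's country via the scoringParameter query option; check
    # the REST docs for the exact parameter syntax in your API version.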

Best practices for wit.ai training

I am developing an app that uses wit.ai as a service. Right now, I am having problems training it. In my app I have 3 intents:
to call
to text
to send picture
Here are my training examples:
Call this number 072839485 and text this number 0623744758 and send picture to this number 0834952849.
Call this number 072839485, 0834952849 and 0623744758
In my first training example I labelled that sentence with all 3 intents: 072839485 as phone_number with the role to_call_phone_number, 0623744758 as phone_number with the role to_text_phone_number, and 0834952849 as phone_number with the role to_send_pic_phone_number.
In my second training example I labelled all 3 numbers as phone_number with the to_call_phone_number role.
After many training examples, wit still outputs the wrong labels. For a sentence like this:
Call this number 072637464, 07263485 and 0273847584
wit says 072637464 is to_call_phone_number, but 07263485 and 0273847584 are to_send_pic_phone_number.
Am I not training it correctly? Can someone give me some suggestions about best practices for training wit?
There aren't many best practices out there for wit.ai training at the moment, but with this particular example in mind I would recommend the following:
Pay attention to the type of entity in addition to just the value. If you choose free-text or keyword, you'll get different responses from the wit engine. For example: in your training, if the number is a keyword, wit will associate that particular number with the intent/role rather than with its position. This is probably why your training isn't working correctly.
One good practice is to train your bot with specific examples first, which provide the bot with more information (such as the user providing the keyword 'photograph' along with a number), and then with general examples that apply to more cases (such as your second example).
Think about the user's perspective and what would seem natural to them, and work with those training examples first. Generate a list of possible training examples, label them from general to specific, and then train intents/roles/entities based on those examples rather than thinking about intents and roles first.
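One practical way to check each round of training is to send the problem sentence to the /message endpoint and inspect which role wit assigned to each number. A minimal sketch (the API version date and token are placeholders):

    import requests

    resp = requests.get(
        "https://api.wit.ai/message",
        params={"v": "20160526",
                "q": "Call this number 072637464, 07263485 and 0273847584"},
        headers={"Authorization": "Bearer <SERVER-ACCESS-TOKEN>"},
    )
    # The returned entities include the role assigned to each phone_number,
    # which makes mislabelled training examples easy to spot.
    print(resp.json())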

Can I predict a price based on a survey with Azure Machine Learning?

I want to predict an input price based on a list of questions/answers using Azure Machine Learning.
I built an experiment using "Bayesian Linear Regression", but it seems to predict the price based on the prices I have in my dataset and not based on the Q/A.
Am I on the wrong path, or am I missing something?
Any suggestion would be helpful.
Check that the Q/As you are using have no missing values. If there are missing values, use data preprocessing techniques to fill them.
What kind of answers do you have as inputs (yes/no, numeric values, different textual answers, etc.)? In my opinion, numerical values and yes/no inputs make your model more accurate.
Try different regression algorithms (https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/) and check their accuracy.
You need to set the features and the label properly; a sketch of that split is below. If you publish your experiment to the Gallery using unlisted mode and paste the link here, we can take a look.
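Azure ML Studio handles this through its modules (marking one column as the label), but as a plain-Python illustration of the same features-vs-label split, here is a sketch using scikit-learn with made-up survey data:

    import pandas as pd
    from sklearn.linear_model import BayesianRidge

    # Toy survey data (invented for illustration): categorical answers are
    # one-hot encoded as features, and "price" is the label to predict.
    df = pd.DataFrame({
        "q1_color": ["red", "blue", "red", "blue"],
        "q2_size":  ["large", "small", "small", "large"],
        "price":    [120.0, 45.0, 60.0, 110.0],
    })

    X = pd.get_dummies(df[["q1_color", "q2_size"]])  # features: answers only
    y = df["price"]                                  # label: the price

    model = BayesianRidge().fit(X, y)
    print(model.predict(X[:1]))  # predicted price for the first response

If the price column is accidentally included among the features, the model will appear to "predict from the prices" exactly as described in the question.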

How can I convert a probability into a score?

I am working on a document recommendation program and I am kind of stuck here.
For each document, I have a score assigned according to the user's actions. When a new document comes in, I need to predict how much the user will like it and rerank all the documents according to their scores. My solution is to use a threshold to divide those scores into "recommend" and "not recommend". Then Naive Bayes or another classification model can give me either a label or the probability of that label (I am using the NLTK package for the text analytics).
Am I on the right track? My question is: when I get that probability, how can I convert it into the score I use for ranking? Or should I use logistic regression in scikit-learn instead?
Thanks!
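For reference, NLTK's NaiveBayesClassifier exposes the per-label probability directly through prob_classify, which is the value in question; a toy sketch with invented feature sets:

    from nltk.classify import NaiveBayesClassifier

    # Made-up feature sets: each document is a dict of features plus a
    # label derived from the score threshold.
    train = [
        ({"has_python": True,  "is_long": False}, "recommend"),
        ({"has_python": False, "is_long": True},  "not_recommend"),
    ]
    classifier = NaiveBayesClassifier.train(train)

    # prob_classify returns a distribution over labels; the probability of
    # "recommend" can itself serve as a ranking score.
    dist = classifier.prob_classify({"has_python": True, "is_long": True})
    print(dist.prob("recommend"))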
It sounds like you are trying to force a ranking problem into a classification problem. What you really want to do is learn how to rank the documents given a "query".
I would suggest trying something like the SVM-Rank algorithm. It takes as input sets of "recommended" and "not recommended" vectors and learns how to rank them so that the recommended ones come first. There is also a simple Python tool in dlib you can use to do it; see here for an example: http://dlib.net/svm_rank.py.html
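A minimal sketch modeled on that dlib demo (the two-dimensional vectors are invented; real document features would replace them):

    import dlib

    # "relevant" holds vectors for recommended documents,
    # "nonrelevant" for the rest.
    data = dlib.ranking_pair()
    data.relevant.append(dlib.vector([1, 0]))
    data.nonrelevant.append(dlib.vector([0, 1]))

    trainer = dlib.svm_rank_trainer()
    trainer.c = 10  # trade-off between fitting the data and a large margin

    rank = trainer.train(data)

    # The learned function returns a real-valued score; higher means the
    # document should be ranked earlier, so it can be used directly for
    # reranking.
    print(rank(data.relevant[0]))     # scores above...
    print(rank(data.nonrelevant[0]))  # ...scores below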

Apache Lucene inverted index

Does the Lucene index use tf-idf as weights? Is it possible to define your own statistics and weights for each document and "plug" them into Lucene?
Yes, the default scoring algorithm incorporates tf-idf and is fully documented in the TFIDFSimilarity documentation.
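Roughly, and treating this as a sketch of the Lucene 4.x documentation rather than the exact implementation, the practical scoring function has the shape

    \mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
        \sum_{t \in q} \Big( \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t,d) \Big)

where tf and idf supply the tf-idf weighting and the remaining factors are query- and field-level normalizations and boosts.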
There are a number of ways to customize the scoring of documents.
The simplest and most common is to incorporate a boost, either on a field at index time, or on a query term when querying.
Many query types modify the scoring used for that query. Examples include ConstantScoreQuery and DisjunctionMaxQuery.
The Similarity you use defines the scoring algorithm; you could select a different one (e.g. BM25Similarity).
You can implement your own Similarity, usually by extending a higher-level implementation such as DefaultSimilarity, TFIDFSimilarity, or SimilarityBase.
Just go through this example; it may help you see how to make custom changes to the indexing process:
http://lucene.apache.org/core/4_3_1/demo/src-html/org/apache/lucene/demo/IndexFiles.html
