Difference between Matched Rewards and Observed Rewards in Azure Personalizer? - azure

Question in title.
I've been searching around and can't seem to find much of an explanation about the two.
Say you have a model that uses 20% of the rank calls for exploration. I suspect matched rewards are how many times out of the 80% it was rewarded.
Can anyone confirm this?
By recording locally, I can confirm that learned events and observed rewards match up, but I'm struggling to explain why matched events are so low. Here's an example graph:

Related

If I interrupt sklearn grid_search.fit() before completion can I access the current .best_score_, .best_params_?

If I interrupt grid_search.fit() before completion, will I lose everything it's done so far?
I got a little carried away with my grid search and provided an obscenely large search space. I can see scores that I'm happy with already, but my stdout doesn't display which params led to those scores.
I've searched the docs: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html
And there is a discussion from a couple of years ago about adding a feature for parallel search here: https://sourceforge.net/p/scikit-learn/mailman/message/31036457/
But nothing definitive. My search has been running for ~48hrs, so I don't want to lose what's been discovered, but I also don't want to continue.
Thanks!
welcome to SO!
To my understanding, there aren't any intermediate variables returned from the grid_search function, only the resulting grid and its scores (see grid_search.py for more information).
So if you cancel it, you will likely lose the work that's been done so far.
But a bit of advice: 48 hours is a long time (obviously this depends on the rows, columns, and number of hyperparameters being tuned). You might want to start with a broader grid search first and then refine your parameter search from there; a rough sketch of this two-pass approach follows the list below.
That will benefit you two ways:
Run time might end up being much shorter (see caveats above), meaning you don't have to wait so long and risk losing results.
You might find that your model's prediction score is only impacted by one or two hyperparameters, letting you keep the other searches broad and focus your efforts on the parameters that influence your prediction accuracy most.
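As a rough illustration of the coarse-then-fine idea, here is a minimal sketch using the newer sklearn.model_selection API (the question references the older sklearn.grid_search module); the estimator, dataset, and parameter ranges are made up for the example, not taken from the question.

```python
# Coarse-then-fine grid search sketch; SVC, the dataset, and the grids are
# illustrative assumptions only.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# Pass 1: a coarse, logarithmically spaced grid to find the right order of magnitude.
coarse = GridSearchCV(SVC(), {"C": [0.01, 1, 100], "gamma": [0.001, 0.1, 10]}, cv=3)
coarse.fit(X, y)

# Pass 2: a finer grid centred on the best coarse values.
best_C = coarse.best_params_["C"]
best_gamma = coarse.best_params_["gamma"]
fine = GridSearchCV(
    SVC(),
    {"C": [best_C / 3, best_C, best_C * 3],
     "gamma": [best_gamma / 3, best_gamma, best_gamma * 3]},
    cv=3,
)
fine.fit(X, y)
print(fine.best_params_, fine.best_score_)
```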
Hopefully by the time I've written this response your grid search has completed!!

Effect of randomness on search results

I am currently working on a search ranking algorithm which will be applied to elastic search queries (domain: e-commerce). It assigns scores on several entities returned and finally sorts them based on the score assigned.
My question is: has anyone ever tried introducing a certain level of randomness into a search algorithm and experienced a positive effect from it? I am thinking it might be useful to reduce bias and promote lower-ranking items, giving them a chance to be seen more easily and become popular if they deserve it. I know that some machine learning algorithms introduce some randomization to reduce bias, so I thought it might apply to search as well.
The closest I can find here is this, but it's not exactly what I'm hoping to get answers for:
Randomness in Artificial Intelligence & Machine Learning
I don't see this mentioned in your post... Elasticsearch offers a random scoring feature: https://www.elastic.co/guide/en/elasticsearch/guide/master/random-scoring.html
As the owner of the website, you want to give your advertisers as much exposure as possible. With the current query, results with the same _score would be returned in the same order every time. It would be good to introduce some randomness here, to ensure that all documents in a single score level get a similar amount of exposure.
We want every user to see a different random order, but we want the same user to see the same order when clicking on page 2, 3, and so forth. This is what is meant by consistently random.
The random_score function, which outputs a number between 0 and 1, will produce consistently random results when it is provided with the same seed value, such as a user's session ID.
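For concreteness, here is a sketch of the kind of query body the quoted docs describe, built as a Python dict (the field name, match clause, and index are placeholders I've invented; exact random_score parameters vary between Elasticsearch versions, so treat this as an outline rather than a copy-paste query).

```python
# Sketch of a "consistently random" function_score query; the same seed
# (e.g. a session ID) reproduces the same ordering for that user.
import json

session_id = "user-session-1234"  # placeholder seed value
query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "sushi"}},        # placeholder relevance query
            "functions": [{"random_score": {"seed": session_id}}],
            "boost_mode": "sum",  # add the random component to the relevance score
        }
    }
}
print(json.dumps(query, indent=2))  # body you would POST to the _search endpoint
```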
Your intuition is right: randomization can help surface results that get a lower score than they deserve due to uncertainty in the estimation. Empirically, Google search ads seem to have sometimes been randomized; e.g., this paper hints at it (see Section 6).
This problem is an instance of a class of problems called explore/exploit algorithms, or multi-armed bandit problems; see e.g. http://en.wikipedia.org/wiki/Multi-armed_bandit. There is a large body of mathematical theory and algorithmic approaches. A general idea is to not always order by the expected "best" utility, but by an optimistic estimate that takes the degree of uncertainty into account. A readable, motivating blog post can be found here.
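To make the "optimistic estimate" idea concrete, here is a minimal UCB1 sketch, one standard bandit algorithm of this kind; the click-through rates below are simulated numbers invented for the example, not real search data.

```python
# UCB1: rank items by empirical mean reward plus an uncertainty bonus, so
# rarely shown items still get exposure until their value is better estimated.
import math
import random

class UCB1:
    def __init__(self, n_items):
        self.counts = [0] * n_items
        self.values = [0.0] * n_items

    def select(self):
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # show every item at least once
        total = sum(self.counts)
        # optimistic score = empirical mean + exploration bonus
        scores = [self.values[i] + math.sqrt(2 * math.log(total) / self.counts[i])
                  for i in range(len(self.counts))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i, reward):
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]  # running mean

# Simulated click-through rates for three result slots (illustrative only).
true_ctr = [0.05, 0.08, 0.02]
bandit = UCB1(len(true_ctr))
for _ in range(10_000):
    i = bandit.select()
    bandit.update(i, 1.0 if random.random() < true_ctr[i] else 0.0)
print(bandit.counts)  # the 0.08 item should end up with the most impressions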

What's the best way to tune my Foursquare API search queries?

I'm getting some erratic results from Foursquare's venue search API and I'm wondering if anyone has any tips on how to process my input parameters for the most "intuitive" results.
For example, suppose I am searching for a venue called "Ise Sushi", around "New York, NY", which is equivalent to (lat: 40.7143528, lon: -74.00597309999999) using Google Maps API. Plugging into the Foursquare Venue API, we get:
https://api.foursquare.com/v2/venues/search?query=ise%20sushi&ll=40.7143528%2C-74.00597309999999
This yields pretty underwhelming results: the venue I'm looking for ends up rather far down the list, at 11th place. What's interesting is that reducing the precision of the coordinates appears to produce much better results. For example, suppose we were to round the coordinates to 3 significant digits:
https://api.foursquare.com/v2/venues/search?query=ise%20sushi&ll=40.7%2C-74.0
This time, the venue I'm looking for ends up in 2nd place, even though it is actually farther from the center of the search (1072 meters, vs. 833 meters using the first query).
Another modification that appears to help improve the quality of search is substituting underscores for spaces to separate our search terms. For example, here's the original query with underscores:
https://api.foursquare.com/v2/venues/search?query=ise_sushi&ll=40.7143528%2C-74.00597309999999
This produces the most intuitive-seeming results: the venue I'm looking for appears first, and is accompanied by just one other result, "Ise Restaurant" (which is tagged as a "sushi restaurant"). For what it's worth, this actually seems to be the result set of the same search conducted on Foursquare's own website.
I'm curious what lessons I should be learning from this. Should I be reducing the precision of my coordinates? Should I be connecting my search terms with underscores, and if so, does that limit how a user can order their search terms?
Although there are ranking improvements we can make on our end to find this distant exact match, it generally also helps to specify intent=browse (although it looks like in this case, for now, it may give you worse results). By default, /venues/search uses intent=checkin, which tries really hard to find close-by matches for checking in to, at the expense of other ways a venue might match your search. Learn more at https://developer.foursquare.com/docs/venues/search
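As a rough sketch of the two request styles being compared, here is how the default intent=checkin call and the suggested intent=browse call might look with Python requests; the client_id, client_secret, v, and radius values are placeholders I've added (the v2 API requires credentials and a version date, and browse searches typically expect a radius or bounding box), so treat this as an outline rather than the exact calls.

```python
# Sketch of the two Foursquare /v2/venues/search query styles discussed above.
import requests

BASE = "https://api.foursquare.com/v2/venues/search"
auth = {"client_id": "YOUR_CLIENT_ID",        # placeholder credentials
        "client_secret": "YOUR_CLIENT_SECRET",
        "v": "20140101"}                      # placeholder API version date

# Default behaviour: intent=checkin, strongly biased toward nearby check-in candidates.
checkin = requests.get(BASE, params={**auth, "query": "ise sushi",
                                     "ll": "40.7143528,-74.0059731"})

# Suggested alternative: intent=browse, which matches the area more broadly
# (assumed to need a radius alongside ll; value here is arbitrary).
browse = requests.get(BASE, params={**auth, "query": "ise sushi",
                                    "ll": "40.7143528,-74.0059731",
                                    "intent": "browse", "radius": 2000})

for name, resp in [("checkin", checkin), ("browse", browse)]:
    venues = resp.json().get("response", {}).get("venues", [])
    print(name, [v.get("name") for v in venues[:5]])
```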

Google Web Optimizer (A/B Testing) Why no clear winner?

I've previously asked how long it takes for a winning combination to appear on Google's Web Optimizer, but now I have another weird problem during an A/B test:
For the past two days, Google has announced that there was a "High Confidence Winner" that had a 98.5% chance of beating the original variation by 27.4%. Great!
I decided to leave it running to make absolutely sure, but something weird happened: Today Google is saying that they "haven't collected enough data yet to show any significant results" (as shown below). Sure, the figures have changed slightly, but they're still very high: 96.6% chance of beating the original by 22%.
So, why is Google not so sure now?
How could it have gone from having a statistically significant "High Confidence" winner, to not having enough data to calculate one? Are my numbers too tiny for Google to be absolutely sure or something?
Thanks for any insights!
How could it have gone from having a statistically significant "High Confidence" winner, to not having enough data to calculate one?
With all statistical tests there is what's called a p-value, which is the probability of obtaining the observed result by random chance, assuming that there is no difference between the things being tested. So when you run a test, you want a small p-value so that you can be confident in your results.
So GWO must use a p-value threshold somewhere between 1.5% and 3.4% (I'm guessing it's 2.5%, at least in this case; it might depend on the number of combinations).
So when (100% - chance to beat %) is greater than that threshold, GWO will say that it has not collected enough information, and when a combination's (100% - chance to beat %) falls below the threshold, a winner is declared. Obviously, if that line has only just been crossed, it can easily cross back with a little more data.
To summarize, you shouldn't be checking the results frequently; you should set up a test, ignore it for a long while, and only then check the results.
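GWO's exact algorithm isn't public, so as a stand-in, here is a sketch of that threshold check using a standard one-sided two-proportion z-test; the conversion counts and the 2.5% cutoff are made-up values for illustration.

```python
# Rough sketch of a "chance to beat original" check; counts and threshold are
# hypothetical, and the z-test is an assumed stand-in for GWO's real method.
from scipy.stats import norm

def chance_to_beat(conv_a, n_a, conv_b, n_b):
    """One-sided two-proportion z-test: P(variation B's true rate beats A's)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = (p_b - p_a) / se
    return norm.cdf(z)

ctb = chance_to_beat(100, 2000, 135, 2000)  # hypothetical conversion counts
threshold = 0.975                           # guessed cutoff (~2.5% p-value)
print(f"chance to beat original: {ctb:.1%}")
print("winner declared" if ctb > threshold else "not enough data yet")
```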
Are my numbers too tiny for Google to be absolutely sure or something?
No

Google Web Optimizer -- How long until winning combination?

I've had an A/B Test running in Google Web Optimizer for six weeks now, and there's still no end in sight. Google is still saying: "We have not gathered enough data yet to show any significant results. When we collect more data we should be able to show you a winning combination."
Is there any way of telling how close Google is to making up its mind? (Does anyone know what algorithm it uses to decide whether there's been a "high confidence winner"?)
According to the Google help documentation:
Sometimes we simply need more data to be able to reach a level of high confidence. A tested combination typically needs around 200 conversions for us to judge its performance with certainty.
But all of our combinations have over 200 conversions at the moment:
230 / 4061 (Original)
223 / 3937 (Variation 1)
205 / 3984 (Variation 2)
205 / 4007 (Variation 3)
How much longer is it going to have to run??
Thanks for any help.
Is there any way of telling how close Google is to making up its mind?
You can use the GWO calculator to help determine how long a test will take, based on a number of assumptions that you provide. Keep in mind, though, that it is possible there is no significant difference between your test combinations, in which case a test to see which is best would take an infinite amount of time, because it is not possible to find a winner.
(Does anyone know what algorithm it uses to decide whether there's been a "high confidence winner"?)
That is a mystery, but with most, if not all, statistical tests there is what's called a p-value, which is the probability of obtaining a result as extreme as the one observed by chance alone. GWO tests run until the p-value passes some threshold, probably 5%. To be clearer, GWO tests run until a combination is significantly better than the original combination, such that the result has only a 5% chance of occurring by chance alone.
For your test there appears to be no significant winner; it's effectively a tie.
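You can sanity-check this with the conversion counts quoted in the question. A chi-square test of independence is not GWO's actual method, but it is a standard way to ask whether the differences between the four combinations are bigger than chance would explain; a sketch:

```python
# Significance check on the conversions / visitors quoted in the question.
from scipy.stats import chi2_contingency

combos = {
    "Original":    (230, 4061),
    "Variation 1": (223, 3937),
    "Variation 2": (205, 3984),
    "Variation 3": (205, 4007),
}

# Build a 4x2 table of (conversions, non-conversions) per combination.
table = [(conv, total - conv) for conv, total in combos.values()]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}")  # p is well above 0.05: no significant winner
```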
