What is the best way to measure search quality on site?

I have a site which relies on search, and I want to select a single metric to use for search ranking experiments. What is the state-of-the-art metric for a real live site?
"Real life" means I can't create a golden standard and ask judges to rank results per query.
I'm thinking about "clicks per search" as the main measure.
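For illustration, here is a rough sketch (in Python) of what I mean by "clicks per search"; the log format is made up.

# Rough sketch of the proposed "clicks per search" metric.
# Made-up log format: one row per event, (search_id, clicked_result),
# with clicked_result = None for searches without a click.
log = [
    ("s1", "doc42"),
    ("s1", "doc7"),    # a second click on the same search
    ("s2", None),      # abandoned search, no click
    ("s3", "doc13"),
]

searches = {search_id for search_id, _ in log}
clicks = sum(1 for _, result in log if result is not None)

clicks_per_search = clicks / len(searches)
print(f"clicks per search = {clicks_per_search:.2f}")  # 3 clicks / 3 searches = 1.00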

Related

Azure search solr index definition for supporting multiple markets

I am building a product catalog for an e-commerce website and need to build an Azure Search/Solr/Elasticsearch based index. The problem is saving the market-specific attributes. The website supports 109 markets, and there is market-specific data like ratings, price, views, wish-list counts, etc. that I need to save in the index. For example, Product1 will have 109 ratings (the rating is different in each market) and 109 prices (the price might be different in each market), corresponding to the 109 markets. I will also have to use these attributes in a boosting function, so that products with higher views/ratings surface up when people are searching.
How do I design the index definition to support this? Can I achieve this with 1 index doc per product, or do I have to create 1 index doc per market? Some pointers would be very helpful. I have spent a couple of days on this and could not reach a conclusion that is optimized for this use case. Thank you!
My proposed index definition:
-id
-mktUSA
--mktId
--rating
--views
--price
...
-mktCanada
--mktId
--rating
--views
--price
...
-locales
--En
--Fr
--Zh
...
...other properties
The problem with this approach is configuring the magnitude scoring functions inside a scoring profile to boost products based on the market.
For example, if the user is from Canada, only the Canada-based ratings/views should be considered, and not the other markets' ratings, while Cognitive Search is calculating the search relevance score.
Is there any workaround for this? Elasticsearch has a neat solution, the function score query, which can be used to configure the scoring function dynamically.
From what I understand, your problem is that you want to have a single index with products that supports 109 different markets, so many of the properties on your Product model are market-specific. Your concern is that the model gets too big, or that it's not a scalable design. It is. You can have 1000+ properties without a problem.
I have built a similar search solution for e-commerce for multiple markets.
For price, I specify one price per market. I have about 80 or so markets, so that's 80 prices. There is no way around it. I would probably do the same for ratings and views too. One per market.
In our application we use separate dimensions for market, language and country. A market can be Scandinavia, Benelux or Asia-Pacific. You need to clearly define what a market is in your case, and agree with the business on which markets you have and how you handle changes. Countries can map directly to markets, but the mapping may also differ. Finally, language is usually shared across markets/countries, and you usually only have to support 20-25 languages.
Suggested data model
Id
TitleEnGb
TitleDeDe
TitleFrFr
...
PriceGb
PriceUs
PriceNo
PriceDe
...
RatingsGb
RatingsUs
RatingsNo
RatingsDe
...
DescriptionEnGb
DescriptionDeDe
DescriptionFrFr
...
This illustrates that Title and Description are language-specific, while Price and Ratings are market-specific.
For the 20-25 language-specific properties, you have to think about which analyzers to use. You want language-specific analyzers, preferably the Microsoft analyzers, since they have much better linguistics support, with full lemmatization and so on.
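As a rough sketch, a few of those fields could be declared like this through the azure-search-documents Python SDK. The field names follow the model above; treat the exact SDK classes and parameter names as assumptions to verify against your installed SDK version.

# Sketch only: language-specific Title fields with Microsoft analyzers,
# plus market-specific price/ratings fields. Verify class and parameter
# names against your azure-search-documents version.
from azure.search.documents.indexes.models import (
    SearchableField,
    SimpleField,
    SearchFieldDataType,
)

fields = [
    SimpleField(name="Id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="TitleEnGb", type=SearchFieldDataType.String,
                    analyzer_name="en.microsoft"),
    SearchableField(name="TitleDeDe", type=SearchFieldDataType.String,
                    analyzer_name="de.microsoft"),
    SearchableField(name="TitleFrFr", type=SearchFieldDataType.String,
                    analyzer_name="fr.microsoft"),
    SimpleField(name="PriceGb", type=SearchFieldDataType.Double,
                filterable=True, sortable=True),
    SimpleField(name="RatingsGb", type=SearchFieldDataType.Double,
                filterable=True, sortable=True),
]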
When you develop your frontend application, you have to keep track of which market, country and language the user is in, and then refer to the corresponding properties. This is the easiest way to support boosting and so on.
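As a sketch (plain Python, with field-name suffixes assumed to match the model above), the frontend lookup can be as simple as mapping the user's market and language codes to the flattened field names:

# Minimal sketch: resolve which flattened fields to search, boost and sort on
# for a given user. The market/language suffixes are assumptions and must
# match how the index fields were actually named.
def resolve_fields(market: str, language: str) -> dict:
    return {
        "search_fields": [f"Title{language}", f"Description{language}"],
        "boost_fields": [f"Ratings{market}", f"Views{market}"],
        "price_field": f"Price{market}",
    }

print(resolve_fields("Gb", "EnGb"))
# {'search_fields': ['TitleEnGb', 'DescriptionEnGb'],
#  'boost_fields': ['RatingsGb', 'ViewsGb'], 'price_field': 'PriceGb'}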
Per-market index is not recommended
You could create one index per market. I have gone down this route before, and I would not recommend it. It means you have to update 109 indexes every time you add, change or delete an item, and Azure Search supports at most 50 indexes per service anyway.

How to make a score from multiple KPIs

I am wondering whether it is possible to create a global score using multiple KPIs with different scales.
Example:
I would like to combine all these KPIs into one score that could tell me which version is better. Is that possible? (I consider the 3 KPIs to have the same weight in the score.)
There is quite some theory on (credit) rating methods which provides a sound mathematical basis for what you are after. You might start by reading about score cards in general. A common way of combining different scores uses the logit.
The short answer to your question: there is no single best way to combine three KPIs. You have to try different formulas and decide on one of them based on statistical tests in a validation step.
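For example, one of the simplest formulas you could try is to rescale each KPI to [0, 1] across the versions and average them with equal weights. A rough sketch in Python; all the numbers are made up:

# Sketch: min-max scale each KPI across versions, flip KPIs where lower is
# better, then average with equal weights. Numbers are illustrative only.
versions = ["A", "B", "C"]
kpis = {
    "conversion_rate": ([0.021, 0.025, 0.019], True),    # higher is better
    "avg_order_value": ([54.0, 51.0, 58.0], True),
    "bounce_rate":     ([0.43, 0.40, 0.47], False),       # lower is better
}

def minmax(values, higher_is_better=True):
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) if hi > lo else 0.5 for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]

scores = [0.0] * len(versions)
for values, higher in kpis.values():
    for i, s in enumerate(minmax(values, higher)):
        scores[i] += s / len(kpis)          # equal weights, as in the question

print(dict(zip(versions, (round(s, 3) for s in scores))))
print("best version:", max(zip(scores, versions))[1])

In practice you would replace the equal-weight average with whatever formula (weighted sum, logit, score card) validates best against an outcome you trust.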
Further reading
Using a Balanced Scorecard to Measure Your Key Performance Indicators - a brief primer on the topic
Chapter on the logit from the book Stefan Trueck, Svetlozar T. Rachev: Rating Based Modeling of Credit Risk: Theory and Application of Migration Matrices
Guidelines on Credit Risk Management - OeNB as PDF

Measurements to evaluate a web search engine

I'm currently developing a small web search engine, but I'm not sure how I am going to evaluate it. I understand that a search engine can be evaluated by its precision and recall. In a more "localized" information retrieval system, e.g. an e-library, I can calculate both of them, because I know which items are relevant to my query. But in a web-based information retrieval system, e.g. Google, it would be impossible to calculate the recall, because I do not know how many web pages are relevant. This means that the F-measure and other measurements that require the number of relevant pages cannot be computed.
Is everything I wrote correct? Is web search engine evaluation limited to precision only? Are there any other measurements I could use to evaluate a web search engine (other than P@k)?
You're correct that precision and recall, along with the F score / F measure, are commonly used metrics for evaluating (unranked) retrieval sets in search engine performance.
And you're also correct about the difficult or impossible nature of determining recall and precision scores for a huge corpus of data such as all the web pages on the entire internet. For all search engines, small or large, I would argue that it's important to consider the role of human interaction in information retrieval: are the users of the search engine more interested in having a (ranked) list of relevant results that answers their information need, or would one "top" relevant result be enough to satisfy them? Check out the concept of "satisficing" as it pertains to information seeking for more on how users assess when their information needs are met.
Whether you use precision, recall, mean average precision, mean reciprocal rank, or any of the numerous other relevance and retrieval metrics really depends on what you're trying to assess about the quality of your search engine's results. I'd first try to figure out what sort of information needs the users of your small search engine might have: will they be looking for a selection of relevant documents, or would it be more helpful if they had one "best" document to satisfy their information needs? If you can better understand how your users will be using your search engine, you can use that information to help decide which relevance model(s) will give your users the results they deem most useful for their information-seeking needs.
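If you do collect even partial relevance judgments for a sample of queries, the ranked-list metrics are straightforward to compute. A small sketch in Python for precision@k and mean reciprocal rank; the 0/1 judgments below are made up:

# Sketch: each inner list holds 0/1 relevance judgments for one query,
# in ranked order (made-up data).
def precision_at_k(relevances, k):
    return sum(relevances[:k]) / k

def reciprocal_rank(relevances):
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

queries = [
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],   # nothing relevant retrieved for this query
]

p_at_3 = sum(precision_at_k(q, 3) for q in queries) / len(queries)
mrr = sum(reciprocal_rank(q) for q in queries) / len(queries)
print(f"P@3 = {p_at_3:.3f}  MRR = {mrr:.3f}")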
You might be interested in the free, online version of the Manning, Raghavan and Schütze "Introduction to Information Retrieval" text available from Stanford's NLP department, which covers relevance and retrieval models, scoring and much more.
Google's Search Quality Evaluator training guide, which lists a few hundred dimensions on which Google's search results are judged, might be of interest to you too as you try to suss out your users' information-seeking goals. It's pretty neat to see all of the various factors that go into determining how a web page is ranked!

Solr: how to manage irrelevant results when not sorting by relevance?

Case in point: say we have a search query that returns 2000 results, ranging from very relevant to hardly relevant at all. When the results are sorted by relevance this is fine, as the most relevant ones are listed on the first page.
However, when sorting by another field (e.g. user rating), the first page is full of hardly-relevant results, which is a problem for our client. Somehow we need to show only the 'relevant' results with the highest ratings.
I can only think of a few solutions, all of which have problems:
1 - Filter out listings on the Solr side if the relevancy score is under a threshold. I'm not sure how to do this, and from what I've read it isn't a good idea anyway; e.g. if a query returns only 10 listings I would want to display them all rather than filter any out. It seems impossible to determine a threshold that would work across the board. If anyone can show me otherwise, please do!
2 - Filter out listings on the application side based on score. This I can do without a problem, except that now I can't implement pagination, because I have no way to determine the total number of filtered results without returning the whole set, which would affect performance/bandwidth etc. It also has the same problems as the first point.
3 - Create a 'combined' sort that aggregates a score from relevancy and user rating, which the results are then sorted on. Firstly, I'm not sure if this is even possible, and secondly it would be confusing for users if the results aren't actually listed in order of rating.
How has this been solved before? I'm open to any ideas!
Thanks
If they're not relevant, they should be excluded from the result set. Since you want to order by a dedicated field (i.e. user rating), you'll have to tweak how you decide which documents to include in the result at all.
In any case you'll have to define "what is relevant enough", since scores aren't really comparable between queries and don't say anything like "this result was xyz relevant".
You'll have to decide why the documents that are included aren't relevant, and exclude them based on those criteria. Then you can either use the review score as a way to boost the remaining documents further up (if you want the search to appear organic / ordered by relevance), or just exclude them and sort by user score. But remember that user score, as an experience for the user, is usually a harder thing to make relevant than simply ordering by the average of the votes.
Usually the client can choose different ordering options, by relevance or by ratings for example. But you are right that ordering by rating alone is probably not useful enough. What you could do is take the rating into account in the relevance scoring, for example by multiplying an "organic" score with the rating transformed into a small boost. In Solr you can do this with function queries, as sketched below. It is not hard science, and some magic is involved; much is common sense. It requires some very good evaluation and testing to see what works best.
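As a rough sketch of that idea, an edismax query with a multiplicative boost function could look like this, here built with Python and requests. The host, core name and the user_rating field are assumptions; the concrete boost formula is just one option to experiment with.

import requests

# Sketch: multiply the organic text-relevance score by a small rating-based
# boost via Solr's edismax "boost" parameter (a multiplicative function query).
params = {
    "q": "red running shoes",
    "defType": "edismax",
    "qf": "title description",
    # 1 + 0.2 * user_rating: unrated docs keep their organic score, while
    # highly rated docs get a modest lift instead of dominating the ranking.
    "boost": "sum(1,product(0.2,user_rating))",
    "fl": "id,title,user_rating,score",
    "rows": 20,
}
response = requests.get("http://localhost:8983/solr/products/select", params=params)
print(response.json()["response"]["docs"])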
Alternatively, if you do not want to treat it as a retrieval problem, you can apply faceting and let users filter the results by rating themselves. Let users help themselves. But I can imagine this does not work in all domains.
Engineers can define what relevancy is. Content similarity scoring is not the only thing that constitutes relevancy; many information retrieval researchers and engineers agree that contextual information should be used in addition to content similarity. This opens up a plethora of possibilities for defining a retrieval model. For example, Learning to Rank (LTR) approaches have become popular, where different features are learned from search logs in order to deliver more relevant documents to users given their user profiles and prior search behaviour. Solr offers this as a module.

How to perform website benchmarking?

I am trying to do a competitive analysis of online trends prevailing in the real estate domain at the state level in a country. I have to create a report which is not biased towards any particular company, but compares or just shows how the companies are performing against a list of trends. I will use clickstream analysis parameters to show statistics on how the companies' websites perform. The trend-specific performance can be depicted by sentiment analysis, in my opinion. If there is some other way to do this effectively, I am open to any such approach.
Now, I am not able to find any trends that the companies have in common.
How can I find general trends which will be common to all real estate companies?
I tried using Google Trends. It provides graphical and demographic information regarding a particular search term, and lists terms related to the search, which I am clueless how to use. And as I drill down from country to state, there is very little data.
Once I have the trends, I have to find out how people are reacting to them. Sentiment analysis is what will provide me with this info.
But even if I get the trends, how will I get trend-specific data from which I can calculate polarity?
Twitter and other social media sites can provide some data on which sentiment analysis can be performed. I used a site which gives the positive, negative and neutral behaviour related to a term on Twitter. I need something analogous to this, but the dataset on which the analysis is performed should not be limited to social media only.
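For what it's worth, here is how I imagine the polarity step could look on arbitrary scraped text (not just tweets), sketched with the vaderSentiment package; the example sentences are made up.

# Sketch: score polarity of arbitrary scraped text (reviews, forum posts,
# news snippets). Requires: pip install vaderSentiment.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
texts = [
    "Great experience buying our first home through this site.",
    "Listings were outdated and the agent never called back.",
]
for text in texts:
    scores = analyzer.polarity_scores(text)   # keys: neg, neu, pos, compound
    if scores["compound"] > 0.05:
        label = "positive"
    elif scores["compound"] < -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(label, scores["compound"], "-", text)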
Are there any other entities I can add to this competitive analysis report?
The report will be generated on a monthly basis, and I want as much automation as possible in the above tasks. I am also thinking of using web scraping to collect data of a similar format. I would also like to know what data I should scrape and what data I should extract manually.
