Searching with Solr, Sphinx or Lucene - ranking search results by clicks - search

I want to implement a search that will rank results higher, if previous, similar searches have led users to click on a result.
Is that possible with either Solr (Lucene) or Sphinx?

I think tracking user clicks is necessary if the higher ranking is to depend on them.
To rank results higher, Solr's query elevation component may be helpful for your needs: http://wiki.apache.org/solr/QueryElevationComponent
In your case, elevation is probably more helpful than the Lucene boost function: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Boosting%20a%20Term
In the end, I think it depends on the kind of implementation.

It's certainly possible with Solr (Lucene), but not really feasible. What you'd have to do is:
Track the clicks of users
Normalize the search query to group similar queries and store them
Reindex that into Solr
If you ask me, that sounds like a lot of work with a lot of pitfalls.
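The first two steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not a Solr feature: `normalize_query` and `ClickLog` are hypothetical names, and "similar" is assumed to mean "same set of words regardless of case, order and punctuation". The resulting per-query click counts would still have to be reindexed into Solr.

```python
import re
from collections import defaultdict


def normalize_query(query):
    """Group similar queries: lowercase, drop punctuation, sort unique words."""
    words = re.findall(r"\w+", query.lower())
    return " ".join(sorted(set(words)))


class ClickLog:
    """Hypothetical in-memory store: normalized query -> doc id -> clicks."""

    def __init__(self):
        self._counts = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, doc_id):
        # Count one click on doc_id for the group this query belongs to.
        self._counts[normalize_query(query)][doc_id] += 1

    def clicks_for(self, query):
        # Return a plain dict so callers cannot mutate the log by accident.
        return dict(self._counts.get(normalize_query(query), {}))
```

A real implementation would need persistent storage and a better notion of query similarity (stemming, synonyms), which is where most of the pitfalls are.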

With Sphinx you could add an attribute clicks_count and use a query like this to rank clicked documents higher:
SELECT *, clicks_count*1000 AS cc
FROM your_index
WHERE MATCH ("words to match") ORDER BY cc DESC;
to take only the click weight into account,
or
SELECT *, weight() + clicks_count*10000 AS cc
FROM your_index
WHERE MATCH ("words to match") ORDER BY cc DESC;
to take both the match weight and the click weight into account.
Of course, you have to keep the 'clicks_count' counter updated yourself.

Related

Sphinx "reverse" search

We have a website where users put up ads for things they want to sell, with parameters such as price, location, title and description. These can then be searched using Sphinx, with users specifying a min and max price, a location with a search radius (using Google Maps), etc. Users can choose to save these searches and get emails when new ads appear that fit their search.
Herein lies the problem: we want to perform a reverse search every time an ad is posted. With the price, location, title and description as parameters, we want to search through all the saved searches and get the ones that would have found the ad. The min and max price can just be handled in a query, I suppose, along with some quorum syntax to get all ads with at least 2, or maybe just 1, occurrence in the title/description.
Our problem lies mostly in the geo-search. How do we find all saved searches whose "search circles" would include the newly posted location, without performing a search for every saved search?
That is the main question; any comment on our suggested solution to the other problems is also very welcome. Thank you in advance / Jenny
The standard geo-search support in Sphinx should work just as well on a prospective index as on a normal retrospective search.
First build a Sphinx index of all the saved searches.
Then run a query using the ad as the search query. Rather than filtering on a fixed radius, use the radius from the attribute (i.e. the radius stored with each saved search). With the API you can't use setFilterRange directly; you need setSelect to create a new virtual attribute:
$cl->setSelect("*, IF(@geodist<radius,1,0) AS myfilter");
$cl->setFilter('myfilter',array(1));
(and yes, the min/maxprice can just be done with normal filters too - just inverting the logic to that you would use in a retrospective search)
... the complication is in the 'full-text' query, if the saved search is anything more than a single keyword, but you appear to have already figured out that part.
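The inverted logic can be illustrated outside Sphinx with a small Python sketch. It is only a model of the idea, not how Sphinx evaluates the query: the field names (`lat`, `lon`, `radius_km`, `min_price`, `max_price`) are assumptions, and distances are in kilometres here, whereas the stored radius in Sphinx has to use the same unit as the computed geodist.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matching_searches(ad, saved_searches):
    """Return ids of saved searches that would have found the ad.

    Each saved search carries its own radius and price range, so the
    comparison uses per-row values instead of one fixed filter.
    """
    hits = []
    for s in saved_searches:
        dist = haversine_km(ad["lat"], ad["lon"], s["lat"], s["lon"])
        in_circle = dist < s["radius_km"]
        in_price = s["min_price"] <= ad["price"] <= s["max_price"]
        if in_circle and in_price:
            hits.append(s["id"])
    return hits
```

This loop touches every saved search, which is exactly what an index-side filter avoids; it is meant only to make the per-row radius comparison concrete.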

magento search issue

I have an issue that I badly need help with. I am on Magento ver. 1.6.1.0. Whenever I search for a phrase like "baby's cute shoes", the results are not accurate, but when I search for a single word like "cute" or "shoes", I get the expected results.
I have a feeling that Magento cannot search for a phrase but can search for products by individual words. Is there anything I can do to improve the search in Magento?
The search options can be found in the backend under System > Configuration > Catalog > Catalog Search. You probably have the search type set to LIKE; you will potentially get better results using FULLTEXT mode.
Magento does not search the entered string as a full sentence. Instead it splits (tokenizes) your search string into words and will search for products containing ANY of these words (implementing "OR" logic). So if you are searching for "red shoes", it will find everything containing words "red" OR containing words "shoes". Obviously it is not very useful in most cases as it will produce a lot of totally irrelevant results.
You can check this free extension to refine your search: Catalog Search Refinement FREE. This extension modifies the search behavior to only find the products that have ALL keywords ("AND" logic in other words). This will find only products that have both "red" and "shoes" keywords. There is also Advanced Search version of that extension that also looks up for similar words based on phonetic distance among other things as well as weighted search attributes, allowing to bubble up the most relevant products.
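The difference between the two behaviours can be shown with a short Python sketch. It is a simplification under stated assumptions: it matches whole words in product names, whereas Magento's LIKE mode matches substrings in the search index, and `tokenize` and `search` are hypothetical helpers, not Magento functions.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, products, mode="OR"):
    """Return product names matching ANY (OR) or ALL (AND) query words."""
    terms = tokenize(query)
    if not terms:
        return []
    combine = all if mode == "AND" else any
    return [name for name in products
            if combine(term in tokenize(name) for term in terms)]
```

With the products "Red Shoes", "Red Hat" and "Blue Shoes", the query "red shoes" returns all three in OR mode but only "Red Shoes" in AND mode.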
I resolved my issue with the help of this link: https://stackoverflow.com/questions/1953715/magento-search-not-returning-expected-results
I went to this line in app/code/core/Mage/CatalogSearch/Model/Resource/Fulltext.php
and made the change below.
Copy app/code/core/Mage/CatalogSearch/Model/Mysql4/Fulltext.php to app/code/local/Mage/CatalogSearch/Model/Mysql4/Fulltext.php
At lines 341-343 of app/code/local/Mage/CatalogSearch/Model/Mysql4/Fulltext.php, change
if ($like) {
    $likeCond = '(' . join(' OR ', $like) . ')';
}
into
if ($like) {
    $likeCond = '(' . join(' AND ', $like) . ')';
}
Also make sure to change the order in which the results are shown. By default, Magento serves them in reverse order.
Add the following to /app/design/frontend/default/default/layout/catalogsearch.xml
<reference name="search_result_list">
<action method="setDefaultDirection"><string>asc</string></action>
<action method="setDefaultOrder"><string>relevance</string></action>
</reference>
Between the following:
<catalogsearch_result_index translate="label">
...
</catalogsearch_result_index>
Stock Magento search needs a few tweaks to make it functional. The LIKE search was changed from AND logic to OR logic in 1.5/1.6 and gives better results when reverted to AND logic. This has been solved in several threads in the forums on Magento's website. Another fix is to chop the "s" off plurals, which is also addressed over there.
The reason for cutting the trailing "s" is that most people don't search for "an oil pressure gauge" but for "oil pressure gauges", which misses completely when you're selling a "0-100 psi Oil Pressure Gauge". Also alias all words ending in "ies" to their singular. You rarely sell "rc aircraft batteries"; the item will be something specific like "1200aH aircraft battery", so your less savvy customers' searches never match.
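A naive version of that plural handling might look like the Python sketch below. It is an illustration of the idea rather than the fix posted in the Magento forums; `singularize` and `normalize_plurals` are hypothetical names, and the rules are deliberately crude (a word like "series" would be mangled), so a real stemmer may be a better choice.

```python
def singularize(word):
    """Crudely map a plural to its singular: 'ies' -> 'y', drop trailing 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("ies"):
        return w[:-3] + "y"
    # Leave short words and words ending in 'ss' (e.g. 'glass') alone.
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def normalize_plurals(query):
    """Apply singularize to every whitespace-separated word of the query."""
    return " ".join(singularize(w) for w in query.split())
```

Applying the same normalization to both the indexed text and the customer's query keeps the two sides comparable.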
"Baby's cute shoes" will never register a hit unless it shows up in the items you use to populate the Fulltext search index. Who sells an item called "baby's cute shoes" anyway? I usually synonym these types of searches to hit a specific category where the items are listed. Some customer searches are just too subjective to match the objective nature of product search (actual items vs. nebulous idea).

How do I modify the Magento Search to check child skus?

Currently, the site search will search all of the skus of the items marked as being visible in search. This is all well and good.
The problem arises when the customer knows a sku of the individual child item. So, let's say a product comes in both a 20 foot and 25 foot variation. We would put those into a configurable product and have a single product page where a customer could then choose which of those two lengths.
What happens is, a customer invariably knows that the sku of the 20 ft variation is RDB-20, while the other is RDB-25. A search for RDB-25 then, comes back with no results since the simple product is not visible in search - it doesn't realize there is a match.
How do I get the search to include an item with visibility "Not Visible Individually" when its parent is visible in search?
The desired effect is that, if a child SKU is searched for, the parent should show up in the results.
There really is no good way of doing it without extending the default search, but at that point you might as well look for other options.
Here's a workaround that might be doable depending on how you manage your products and it worked for me until I moved on from the default search.
Rather than altering the search, try adding a hidden, searchable text attribute to all products and concatenating all the child SKUs into that field. The search should then match the text attribute and show the configurable product.
It's a bit of a workaround, but it works for me.
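How the value of such a hidden attribute could be built is sketched below in Python. This only shows the concatenation step; `build_search_skus` is a hypothetical helper, and writing the value to the product (for example during an import or a save hook) is left out because it depends on how the catalog is managed.

```python
def build_search_skus(parent_sku, child_skus):
    """Join the parent SKU and its child SKUs into one searchable string.

    Duplicates and empty values are dropped; the original order is kept.
    """
    seen = []
    for sku in [parent_sku, *child_skus]:
        sku = sku.strip()
        if sku and sku not in seen:
            seen.append(sku)
    return " ".join(seen)
```

For the example from the question, a configurable product "RDB" with children "RDB-20" and "RDB-25" would get the value "RDB RDB-20 RDB-25", so a search for "RDB-25" can match the parent.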
This is untested, but I did a bit of perusing in our attributes and I think I found something that might help.
Currently since our child products don't show up in our search, we have the parent populate with the children product's attributes.
However, things like brand, taxable amount, description, populate for every child product while our SKU does not.
The only difference I can see between the two attributes is under manage attributes -> click on attribute -> and then under properties go to frontend properties and select
Use In Search Results Layered Navigation: YES
Used in Product Listing: YES
Use In Layered Navigation: Filterable (with results)
I'm not sure which of these do what, but in the population of the fulltext search data table, somewhere it is being told to populate for the children and I believe that the admin panel is where.
I hope this helps!

Solr Facetting - Showing First 10 results and Other

I am implementing a solution in Solr where I have a lot of values in my facet.
Rather than displaying a long list of facet values down the side of my page, I want to display the top 10, plus one entry for "Other".
For instance, I would be faceting on nationality.
I do not want a list of every nationality, nor do I want a "see all" button.
What I require is the top 10 nationalities and then "Other".
When a user clicks on "Other", can it facet on that?
This is quite easy in Solr. All you need to do is add
&facet.limit=10
to your request, e.g.
http://solrserver:8080/solr/select?version=2.2&q=solr&start=0&rows=0&indent=on&facet=on&facet.field=nationality&facet.limit=10
and the facet list will be limited to the top 10 values.
For more information you can check out my blog post on faceting in solr:
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
or the solr wiki here:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
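The answer above covers the top 10 but not the "Other" entry. One possible way to derive it on the client side is sketched below in Python. The sketch assumes a single-valued field where every document has a value, so that "Other" is simply the total hit count minus the top counts, and it assumes the flat `[value, count, value, count, ...]` list that Solr's JSON response uses for facet fields. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def facet_url(base, query, field, limit=10):
    """Build a select URL that returns only the top `limit` facet values."""
    params = [("q", query), ("rows", 0), ("facet", "on"),
              ("facet.field", field), ("facet.limit", limit), ("wt", "json")]
    return base + "?" + urlencode(params)


def top_with_other(flat_counts, num_found):
    """Turn Solr's flat facet list into pairs and append an 'Other' bucket."""
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    other = num_found - sum(count for _, count in pairs)
    return pairs + [("Other", other)]


def other_filter(field, top_values):
    """Filter query selecting documents NOT in any of the top values."""
    quoted = " OR ".join('"%s"' % v for v in top_values)
    return "-%s:(%s)" % (field, quoted)
```

When the user clicks "Other", the string from `other_filter` could be sent as an `fq` parameter to restrict results to the remaining nationalities.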

Designing a one EVERYTHING search box (date+address+keywords)

I'm storing information about local "events". They are described by 3 things: address, date, and keywords (tags). I want to have only one search box for at least address and keywords. The date might go in a separate field. I'm assuming that most people will search for events that are taking place "today", so this filter won't get much traffic.
I need the addresses to be correct (because I'm geocoding them afterwards), so I need to validate them before submitting the form and display a "did you mean" list if the user made a typo. I can't do a live search here. I can do a live search on keywords. Keep in mind that a user can make a typo there too, and I want to catch that.
Is there a clever way to design the input parser in this case so it guesses which part is the address and which part is keywords?
OR
Is there a way to actually parse the query as the user is entering it? Maybe I should show autocomplete hints for keywords after the first 3 characters are entered, and if the user declines to use them, assume that what they are typing is part of an address.
What do you think?
Take a look at Document Cloud's Visual search
http://documentcloud.github.com/visualsearch/#demo
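For the parser idea raised in the question, one rough heuristic is sketched below in Python: words that match a known tag are treated as keywords, near-misses become "did you mean" suggestions, and everything else is assumed to be part of the address. This is unrelated to the Visual Search library linked above; `split_query`, the tag set and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import difflib


def split_query(query, known_tags, cutoff=0.8):
    """Split a free-text query into keywords, typo suggestions and an address.

    Returns (keywords, suggestions, address), where suggestions maps a
    mistyped word to the closest known tag.
    """
    tags = sorted(known_tags)
    keywords, suggestions, address_parts = [], {}, []
    for token in query.split():
        word = token.lower()
        if word in known_tags:
            keywords.append(word)
            continue
        # Fuzzy match against the tag vocabulary to catch typos.
        close = difflib.get_close_matches(word, tags, n=1, cutoff=cutoff)
        if close:
            suggestions[token] = close[0]
        else:
            address_parts.append(token)
    return keywords, suggestions, " ".join(address_parts)
```

The leftover address string would still need to go through the geocoder for validation, since this heuristic cannot tell a mistyped street name from a correct one.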

Resources