I have a problem with location queries returning erroneous results in ElasticSearch.
In our system, a business search engine, every search takes two inputs: a location, and a query-string, e.g.
q=sushi
location=Greenwich Village, New York, New York
I want the search to show me sushi in Greenwich Village first, then sushi outside of Greenwich Village, but to never show me non-sushi results.
The problem is that, because of the location query, anything in Greenwich Village gets matched -- lawyers, doctors, whatever. I'd like to say the following to ElasticSearch:
If q matches, then location doesn't have to (it's OK to return sushi outside of Greenwich Village), but if location matches, don't return it unless q matches also (not OK to return non-sushi businesses in Greenwich Village).
Anyone have any thoughts on how to do this?
It sounds like you want to search for "sushi" (you don't want non-sushi results) but sort your results by location (you want Greenwich Village results first).
If you are storing locations as geo points, you can simply use distance to sort your results.
If location is just a field, and you can only know if the business is inside or outside of a location, you can use Custom Filters Score query to boost relevancy of the results in the desired location. The query part should contain the search for "sushi" and the filters part should contain the search for location.
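A minimal sketch of that split between a required text match and an optional location boost, written as a request body built in Python. It uses a plain bool query (must plus should) instead of the Custom Filters Score query named above, and the field names "name" and "location" are assumptions, not part of the original setup.

```python
def build_search_body(q, location):
    """Require a match on q; use the location only to raise the score."""
    return {
        "query": {
            "bool": {
                # must: a business that does not match q is never returned
                "must": [{"match": {"name": q}}],        # "name" is a hypothetical field
                # should: optional clause, adds to the score when it matches
                "should": [{"match": {"location": location}}],  # hypothetical field
            }
        }
    }


body = build_search_body("sushi", "Greenwich Village")
```

Because the bool query has a must clause, the should clause is optional: sushi outside Greenwich Village still matches, while a lawyer in Greenwich Village does not.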
I incorporated the information in this post and here to come up with the following solution.
1. Index every 'place' (neighborhood, city, etc) with a center-point, and also index the coordinates of every business.
2. Index the place ids attached to the businesses that contain them.
3. Use a sub-search to convert the text entered into the location bar to a place record.
4. Use a CustomScoreQuery to modify every result's score by the following formula, which was worked out by trial and error:
   new_score = old_score / (1 + distance_between_place_centerpoint_and_result)^3
5. Also query the place id that results from 3 against the place_ids field as a 'should' boolean query. This gives a flat boost to everything that actually falls within the confines of the specified place.
A side effect of this strategy is that businesses near the center point of the place are considered more relevant -- it is arguable, in my opinion, whether this is correct or not. But other than that it has worked quite well.
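The scoring formula can be tried out on its own with a small Python sketch. The distance unit is not stated above, so kilometres and the haversine formula are assumptions here, and the function names are made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed unit) between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rescore(old_score, distance):
    """new_score = old_score / (1 + distance)^3, as in the formula above."""
    return old_score / (1.0 + distance) ** 3
```

A result exactly at the center point keeps its score, and the score falls off with the cube of (1 + distance), which is why businesses near the center point rank higher.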
Thanks to imitov for his insight that helped me come up with this solution.
Here are two search examples.
The first search, for "yael0079", returns an object whose email field is yael0079#gmail.com with the top score.
The second search, for "yael0079#gmail.com", returns the same object far down the list.
I know the '#' character is treated as a space, but I would still expect the same object to get the highest score.
In the second case, since the # sign is considered punctuation, your query becomes yael0079 OR gmail.com. The term gmail.com also matches in other fields of the returned documents, which adds to their overall relevance score. To learn more about query processing and scoring in Azure Search, please read: How full text search works in Azure Search.
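The effect can be illustrated with a toy model in Python. This is not Azure Search's analyzer or scoring: the tokenizer only lowercases and splits on '#' and whitespace, the score is a bare term count that ignores term rarity and length normalization, and both documents are invented for the example.

```python
import re


def tokenize(text):
    """Toy analyzer: lowercase, then split on '#' and whitespace only."""
    return [t for t in re.split(r"[#\s]+", text.lower()) if t]


def toy_score(query, doc):
    """OR semantics: every occurrence of any query term in any field adds 1."""
    terms = tokenize(query)
    return sum(tokenize(value).count(term) for value in doc.values() for term in terms)


# Hypothetical documents, made up for illustration
target = {"email": "yael0079#gmail.com", "notes": ""}
other = {"email": "someone#gmail.com", "notes": "gmail.com gmail.com support"}
```

With the query "yael0079" only the target document scores at all. With "yael0079#gmail.com" the second term matches three times in the other document, so in this toy model it overtakes the target.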
I am indexing the title field for a few products in Solr.
But when I search, I do not get those titles in the response.
For example, I am storing the following as a title: Baboons Typing Tshirt
But when I search for any of the following, I do not get any results:
1)title:Baboons
2)title:(Baboons Typing Tshirt)
3)title:(Baboons*)
On the other hand, if I search like this, I get a lot of results:
1)title:(Tshirt)
I have indexed many titles containing the word Tshirt, but searching for one specific title fails.
I don't know whether Solr is ignoring the first words or doing something random.
My question is basically: if I have a search title with lots of words, I would like to match it against the title that shares the most terms with it.
How to do it?
Thanks
Solr works like that by itself. You don't have to change anything.
You have to be careful how you set up your fields in schema.xml, i.e. how analysis is done.
You can use Solr's admin > Analysis interface to see exactly how your title field (when indexing) and your query (when searching) are processed (tokenized, transformed).
Remember that a match only occurs when the term is identical (case and everything) on both sides (index and query).
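A toy Python illustration of that rule, with two made-up analyzers standing in for whatever schema.xml configures. It only shows how an index/query mismatch produces zero hits; it does not claim this is the cause in the question above.

```python
def keep_case(text):
    """Hypothetical analyzer: whitespace tokenization, case preserved."""
    return text.split()


def lowercase(text):
    """Hypothetical analyzer: whitespace tokenization plus lowercasing."""
    return [token.lower() for token in text.split()]


def matches(title, query, index_analyzer, query_analyzer):
    """A hit needs at least one query term identical to an indexed term."""
    indexed_terms = set(index_analyzer(title))
    return any(term in indexed_terms for term in query_analyzer(query))


title = "Baboons Typing Tshirt"
mismatch = matches(title, "Baboons", keep_case, lowercase)   # "baboons" vs "Baboons"
aligned = matches(title, "Baboons", lowercase, lowercase)    # "baboons" vs "baboons"
```

When the two sides are analyzed differently the lookup fails even though the word is in the title; with the same analysis on both sides it succeeds.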
To open your index and see how Solr has actually indexed your data, use Luke.
When Magento searches with the fulltext search engine and the LIKE method, it stores the searchable values in the "data_index" field of the catalogsearch_fulltext table.
Each searchable attribute is separated by |, e.g.:
3003|Enabled|None||Product name|1.99|yellow|0
Here it stores the SKU, status, tax class, product name, price, color, and so on.
It stores the values of all searchable attributes.
The issue is that for a configurable product, it also stores the name, price, and status of the associated products in the same field, like this:
3003|Enabled|Enabled|Enabled|Enabled|None|None|None|None|Product name|Product name|associted Product name1|associted Product name2|associted Product name3|1.99|2.00|2.99|3.99|yellow|black|yellow|green|0|0|0|0
As a result, if I search for any word from an associated product, the main configurable product is also listed, because the word appears in its "data_index" field.
I need suggestions on how to keep associated products out of data_index, so that I get accurate search results.
Thanks
We are looking into our search as well and it has been surprising to see the inefficiencies included in the fulltext table. We have some configurable products as well that have MANY variations and their population in the fulltext search is downright horrendous.
As for solutions, I can only offer my own approach to the problem (not completed yet, still in progress).
I am extending Magento to include an event listener to the process of indexing the products (Because catalog search indexing is when the fulltext database is populated). Once that process occurs, I am writing my own module to remove duplicate entries from the associated products and also to add the functionality of adding additional search keyword terms as populated from a CSV file.
This should increase search speed dramatically and also return more relevant search results, because as of now configurable products are getting "search bias" in the search results.
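The duplicate-removal step could look roughly like this. It is a standalone Python sketch of the string handling only, not Magento module code, and the function name is made up. Note that it only collapses repeated values; it does not strip the associated products' names.

```python
def dedupe_data_index(data_index, sep="|"):
    """Drop empty and repeated values from a data_index string, keeping order."""
    seen = set()
    kept = []
    for value in data_index.split(sep):
        if value and value not in seen:
            seen.add(value)
            kept.append(value)
    return sep.join(kept)


sample = "3003|Enabled|Enabled|None|None|Product name|Product name|yellow|black|yellow|0|0"
cleaned = dedupe_data_index(sample)
```

For the sample string this leaves one copy each of Enabled, None, Product name, yellow and 0, which shortens the text the fulltext search has to scan.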
This is more of a comment than an answer, but it was too long to fit in the comments, and I thought it might be useful to you. Once I get my module working, I can give you directions on how to implement a similar module yourself, if you would like.
Hope that helped (if only for moral support in magento's search struggle)
Some popular words, like "food," are used all over the world as loan words.
I am trying to use flickr.photos.search to get photos from one specific language or region.
I didn't find a setting for this in http://www.flickr.com/services/api/flickr.photos.search.html
I tried these two ways, but neither worked:
http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=XXXXXXXXXX&tags=food&format=json&location=japan
//lang=jp
I searched on Google and only found that YQL can search by location. (I will be using YQL for something else, and too many calls would run into the API limit.)
I also found that in flickr.photos.search one can set a lat, lon, and radius, but the range is a circle, so this will not limit a search to a specific country.
None of these are good choices for me. Can anyone help?
There are actually a few interesting ways to do this.
The way I would do it is to first find the place you are looking for by using the place API:
flickr.places.find: http://www.flickr.com/services/api/flickr.places.find.html This will return a list of WOE (Where on Earth) ids for a given query. Your query can be anything from a street address to a country.
Once you have the WOE id, you could then submit a flickr.photos.search query including the optional place_id or WOE id.
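A rough Python sketch of the two requests, building the URLs only and making no network calls. The endpoint is the one from the question; the api_key and the WOE id are placeholders, and in practice the WOE id would come from parsing the flickr.places.find response.

```python
from urllib.parse import urlencode

REST_ENDPOINT = "http://api.flickr.com/services/rest/"  # endpoint used in the question


def places_find_url(api_key, query):
    """Step 1: look up candidate places (and their WOE ids) for a free-text query."""
    params = {
        "method": "flickr.places.find",
        "api_key": api_key,
        "query": query,
        "format": "json",
        "nojsoncallback": 1,
    }
    return REST_ENDPOINT + "?" + urlencode(params)


def photos_search_url(api_key, tags, woe_id):
    """Step 2: search photos by tag, restricted to the place with this WOE id."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": tags,
        "woe_id": woe_id,
        "format": "json",
        "nojsoncallback": 1,
    }
    return REST_ENDPOINT + "?" + urlencode(params)


find_url = places_find_url("XXXXXXXXXX", "japan")
search_url = photos_search_url("XXXXXXXXXX", "food", "12345")  # "12345" is a placeholder WOE id
```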
Another fun way to do this would be to call the flickr.places.tagsForPlace method once you have a WOE id, and then search for your photos by these tags. This might produce more interesting results and also weed out the users who didn't specify a place, but did specify tags.
I am trying to figure out how I can accomplish the following, and none of the answers I have found so far seem to fit:
I have a fairly static and large set of resources I need to have indexed and searchable. Solr seems to be a perfect fit for that. In addition I need to have the ability for my users to add resources from the main data set to a 'Favourites' folder (which can include a few more tags added by them). The Favourites needs to be searchable in the same manner as the main data set, across all the same fields plus the additional ones.
My first thought was to have two separate schemas
- the first for the main data set and its metadata
- the second for the Favourites folder with all of the metadata from the main set copied over and then adding the additional fields.
Then I thought that would probably waste quite a bit of space (the number of users is much larger than the number of main resources).
So then I thought I could have the main data set with its metadata (Core0), same as above, with the resourceId as the unique identifier. Then there would be a second one (Core1) for the Favourites folder, with a unique id made of the resourceId, userId, grade, and folder all concatenated. The resourceId would also be a separate field. In addition, I would create another schema/core (Core3) with all the fields from the other two and have a request handler defined on it that searches across the other 2 cores and returns the results through this core.
This third core would have searches run against it where the results would expect to only be returned for a single user. For example, a user searches their Favourites folder for all the items with Foo. The result is only those items the user has added to their Favourites with Foo somewhere in their main data set metadata. I guess the result handler from Core3 would break the search up into a search for all documents with Foo in Core0, a search across Core1 for userId and folder and then match up the resourceIds from both of them and eliminate those not in both. Or run a search on Core1 with the userId and folder and then having gotten that result set back, extract all the resourceIds and append an AND onto the search query to Core0 like: AND (resourceId = 1232232312 OR resourceId = 838388383 OR resourceId = 8637626491).
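The second variant (fetch the resourceIds from Core1, then restrict the Core0 search to them) could be sketched in Python as below. It only builds the request parameters. It assumes numeric ids that need no escaping, uses Solr's field:(a OR b) syntax in a filter query (fq) rather than appending an AND clause to q, and the function names are made up.

```python
def favourites_filter(resource_ids, field="resourceId"):
    """Build a filter such as resourceId:(1 OR 2 OR 3); None if there are no ids."""
    if not resource_ids:
        return None
    return "{}:({})".format(field, " OR ".join(str(rid) for rid in resource_ids))


def core0_params(user_query, resource_ids):
    """Parameters for the Core0 search, limited to one user's favourites."""
    fq = favourites_filter(resource_ids)
    if fq is None:
        return None  # the user has no favourites, so there is nothing to search
    return {"q": user_query, "fq": fq}


params = core0_params("Foo", [1232232312, 838388383, 8637626491])
```

The id list grows with the number of favourites, so a user with many favourites produces a very long request.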
Could this be made to work? Or is there some simpler mechanism in Solr to merge 2 searches across 2 cores and only return the results that match on a (not necessarily unique) field in both?
Thanks.
The problem looks like a database join of two tables with the resource id as the foreign key.
Ignore this post if I have misunderstood.
First, I would probably do it with a single core, with a userid field (indexed, but not stored), and reindex a document every time a new user favorites it, appending that user's id (delimited by something the analyzer ignores).
That makes searching easier (userId:"kaka's id" will fetch all my favorites).
I think it takes some work to do this, and if the number of users who can like a document grows, the userid field gets really long.
In that case, I would move on to my next idea, which is similar to yours: have a second core with (userid, resource id). Write a wrapper that first searches this core for all the favorites, then searches the other core for those resources in a where condition. But again, if a user favorites many resources, the query might exceed the GET method's size limit.
If neither works, it's time to think of something more scalable, which leaves us with the same space-wasting option.
Am I missing something?