Solr Merge results - search

I use Solr for product filtering on our website. For example, you can have a product filter where you can filter a database of televisions by size, price, company, etc. I found Solr + FilterQuery to be very efficient for such functionality. I have a separate core that has the product info of all TVs in our DB.
I have another core for product reviews. A review can be about a specific product type or a company, so someone can write a review of a Samsung TV or of Samsung customer service. When someone searches for a text (for example "Samsung TV review" or "Samsung customer service"), I search this core.
Now I want to merge the results from the above cores. So when someone searches for 'samsung 46 lcd contrast ratio review', I essentially want to filter the TVs by company (Samsung), then by size (46"), and then find reviews that have the text "contrast ratio review". I have no clue how to do this. Basically, I want to merge the results by document ID and add the additional columns from result 2 into result 1.
I have seen suggestions to flatten the data, but I want to use the reviews index with a lot of other filters, so I am not sure that's a good idea. Moreover, when new reviews come in I don't want to reindex all the product cores (even delta reindexing would touch a lot of products).
Any ideas on how to achieve this?

If I understand your question correctly, what you are looking for is JOIN functionality.
http://www.slideshare.net/lucenerevolution/grouping-and-joining-in-lucenesolr
http://wiki.apache.org/solr/Join
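A minimal sketch of what such a request could look like, assuming both cores live on the same Solr instance (cross-core joins with `fromIndex` have that restriction). The field names (`company`, `size`, `product_id`, `text`) and core names (`products`, `reviews`) are hypothetical. Note that a join behaves like a subquery: it returns product documents only, so review fields would still need a second query against the reviews core.

```python
from urllib.parse import urlencode


def product_review_query(company, size, review_text):
    """Build query parameters for the (hypothetical) products core."""
    # Escape embedded double quotes so the phrase query stays well-formed.
    phrase = review_text.replace('"', '\\"')
    params = [
        # The join runs text:"..." against the reviews core and keeps product
        # documents whose id equals the product_id of a matching review.
        ("q", f'{{!join from=product_id to=id fromIndex=reviews}}text:"{phrase}"'),
        # Ordinary filter queries still narrow the products by company and size.
        ("fq", f"company:{company}"),
        ("fq", f"size:{size}"),
        ("wt", "json"),
    ]
    return urlencode(params)


print("/solr/products/select?" + product_review_query("Samsung", 46, "contrast ratio review"))
```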

Related

Azure search solr index definition for supporting multiple markets

I am building a product catalog for an e-commerce website, and I need to build an Azure Search/Solr/Elasticsearch-based index. The problem is saving the market-specific attributes. The website supports 109 markets, and there is market-specific data such as ratings, price, views and wish-listed that I need to save in the index. For example, Product1 will have 109 ratings (the rating differs per market) and 109 prices (the price might differ per market), corresponding to the 109 markets. I will also have to use these attributes in a boosting function, so that products with higher views/ratings surface when people search. How do I design the index definition to support this? Can I achieve this with 1 index document per product, or do I have to create 1 index document per market? Some pointers would be very helpful. I have spent a couple of days on this and could not reach a conclusion that is optimized for this use case. Thank you!
My proposed index definition:
-id
-mktUSA
--mktId
--rating
--views
--price
...
-mktCanada
--mktId
--rating
--views
--price
...
-locales
--En
--Fr
--Zh
...
...other properties
The problem with this approach is configuring magnitude scoring functions inside a scoring profile to boost products based on the market.
For example, if the user is from Canada, only the Canada-based ratings/views should be considered, not the other markets' ratings, when Cognitive Search calculates the search relevance score.
Is there any possible workaround for this? Elasticsearch has a neat solution, the function score query, which can be used to configure the scoring function dynamically.
From what I understand, your problem is that you want a single index with products that support 109 different markets, where many of the properties of your Product model can be market-specific. Your concern is whether the model gets too big, or whether it's a scalable design. It is. You can have 1000+ properties without a problem.
I have built a similar search solution for e-commerce for multiple markets.
For price, I specify one price per market. I have about 80 or so markets, so that's 80 prices. There is no way around it. I would probably do the same for ratings and views too. One per market.
In our application we use separate dimensions for market, language and country. A market can be Scandinavia, BeNeLux or Asia-Pacific. You need to clearly define what a market is in your case, and agree with the business on which markets you have and how you handle changes. Countries can map directly to markets, but the mapping may also differ. Finally, languages are usually shared across markets/countries, and you usually only have to support 20-25 languages.
Suggested data model
Id
TitleEnGb
TitleDeDe
TitleFrFr
...
PriceGb
PriceUs
PriceNo
PriceDe
...
RatingsGb
RatingsUs
RatingsNo
RatingsDe
...
DescriptionEnGb
DescriptionDeDe
DescriptionFrFr
...
I try to illustrate that the Title and Description are language-specific. The price and ratings are market-specific.
For the 20-25 language-specific properties, you have to think about what analyzers to use. You want to use language-specific analyzers, and preferably the Microsoft analyzers since they have much better linguistics support with full lemmatization and so on.
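One way to keep this data model manageable is to generate the field list instead of writing it by hand. The sketch below produces field definitions in the shape of the Azure Search REST index schema (`name`, `type`, `searchable`, `analyzer`, ...); the locale/market lists are hypothetical examples, and the choice of `Edm.Double` for ratings is an assumption.

```python
# Hypothetical example lists; use the locales and markets agreed with the business.
LOCALES = {"EnGb": "en.microsoft", "DeDe": "de.microsoft", "FrFr": "fr.microsoft"}
MARKETS = ["Gb", "Us", "No", "De"]


def build_fields(locales, markets):
    """Generate a flat field list: text fields per language, numbers per market."""
    fields = [{"name": "Id", "type": "Edm.String", "key": True}]
    for locale, analyzer in locales.items():
        for prop in ("Title", "Description"):
            fields.append({
                "name": f"{prop}{locale}",
                "type": "Edm.String",
                "searchable": True,
                "analyzer": analyzer,  # language-specific Microsoft analyzer
            })
    for market in markets:
        for prop in ("Price", "Ratings"):
            fields.append({
                "name": f"{prop}{market}",
                "type": "Edm.Double",
                "filterable": True,
                "sortable": True,
            })
    return fields


print([f["name"] for f in build_fields(LOCALES, MARKETS)][:3])
# prints ['Id', 'TitleEnGb', 'DescriptionEnGb']
```

With 109 markets and, say, four market-specific properties, this generates 436 numeric fields, which is the kind of field count the answer says is fine.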
When you develop your frontend application, you have to keep track of which market, country and language the user is in, and then refer to the specific properties. This is the easiest way to support boosting and so on.
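A minimal sketch of that frontend step: given the user's locale and market, pick the matching fields for the request. The keys used (`search`, `searchFields`, `select`, `scoringProfile`, `top`) are Azure Search query parameters, but the `boost` + market profile name is a hypothetical convention; you would have to define those scoring profiles yourself.

```python
def search_body(text, locale, market, top=10):
    """Build a request body for one user context.

    locale is a field suffix such as "EnGb" and market one such as "Gb",
    matching the suggested data model; "boost" + market is an assumed
    naming scheme for per-market scoring profiles.
    """
    return {
        "search": text,
        # Only search the text fields for the user's language.
        "searchFields": f"Title{locale},Description{locale}",
        # Only return the price and ratings for the user's market.
        "select": f"Id,Title{locale},Price{market},Ratings{market}",
        "scoringProfile": f"boost{market}",
        "top": top,
    }


print(search_body("oled tv", "EnGb", "Gb")["select"])
# prints Id,TitleEnGb,PriceGb,RatingsGb
```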
Per-market index is not recommended
You could create one index per market. I have gone down this route before, and I would not recommend it. It means you have to update 109 indexes every time you add, change or delete an item. And Azure Search supports at most 50 indexes per service anyway.

Spark Item Similarity Interpretation (Cross-Similarity and Similarity)

I've been using Spark Item Similarity through Mahout by following the steps in this article:
https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
I was able to clean my data, setup a local-only spark/hadoop node and all that.
Now, my question is more about the interpretation of the matrices. I've tried some Google searches with limited success.
I'm creating a multi-modal recommender - and one of my datasets is very similar to the Mahout example.
Example input:
Customer ActionName Product
11064612 view 241505
11086047 purchase 110915
11121878 view CERT_DL
11149030 purchase CERT_FS
11104130 view 111401
The output of Mahout is 2 sets of matrices: a similarity matrix and a co-occurrence matrix.
This is my similarity matrix (I assume Mahout uses my "filter1" purchases):
**791207-WP** 791520-WP:11.350536461453885 791520:9.547158147208393 76130142:7.938639976084232 711215:7.0641921646893024 751309:6.805891904514283
So how would I interpret this? If someone purchased 791207-WP, could they be interested in 791520-WP? (So I'd match the left part against a customer's purchases and rank the products in the right part?)
The row for 791520-WP looks like this:
791520-WP 76151220:18.954662238247693 791604-WP:13.951210170984268
So, in theory, I'd recommend 76151220 to someone who bought 791520-WP, correct?
Part 2 of the question is interpreting the cross-similarity matrix. Remember my filter2 is "views".
How would I interpret this:
**790907** 76120956:14.2824428207241 791500-LXQ2:13.864741460885853 190907:10.735807818360627
I take this matrix as "someone who visited the 76120956 web page ended up purchasing 790907". So I should promote 790907 to customers who bought 76120956 and maybe even add a link between these 2 products on our site, for example.
Or is it "people who visited the webpage of 790907 ended up buying 76120956"?
My plan is not to use these as-is. I'll still use RowSimilarity and different sources to rank products, but I'm missing the basic interpretation of the outputs from Mahout.
If you know of any documentation that clarifies this, that would be a great asset to have.
Thank you.
In both cases, the matrix is telling you that the item-id key is similar to each listed item, with the strength given by the LLR value attached to that item. Similar in the sense that similar users purchased the items. In the second case, it is saying that similar people viewed the items and that this view also appears to have led to a purchase of the same item.
Co-occurrence works on purchases alone; cross-occurrence adds a check to make sure the view also correlated with a purchase. This allows you to use both for recommendations.
The output is generally meant to be used with a search engine: you would use a user's history of purchases and views as a 2-field query against the matrices, one in each field.
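Reading a row that way can be sketched as a small parser. It assumes the default spark-itemsimilarity text layout shown in the question: a row key, then whitespace-separated `itemID:strength` pairs. It returns the key and its similar items, strongest first.

```python
def parse_row(line):
    """Parse one output row into (key, [(similar_item, strength), ...])."""
    key, *pairs = line.split()
    similar = []
    for pair in pairs:
        # rpartition splits on the last ':' so the strength is always the tail.
        item, _, strength = pair.rpartition(":")
        similar.append((item, float(strength)))
    similar.sort(key=lambda p: p[1], reverse=True)
    return key, similar


key, similar = parse_row("791520-WP\t76151220:18.954662238247693 791604-WP:13.951210170984268")
print(key, similar[0][0])
# prints 791520-WP 76151220
```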
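The LLR strengths in both matrices come from a 2x2 contingency table of user counts. Below is a sketch of the log-likelihood ratio written to mirror the entropy formulation in Mahout's `LogLikelihood` helper; the labelling of the cells for the cross-occurrence case is illustrative (the ratio is symmetric in `k12`/`k21`, so swapping them gives the same value).

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table, e.g. "purchased A" vs "viewed B".

    k11: users who purchased A and viewed B
    k12: users who purchased A but did not view B
    k21: users who viewed B but did not purchase A
    k22: users who did neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + column_entropy < matrix_entropy:
        return 0.0  # guard against tiny negative values from rounding
    return 2.0 * (row_entropy + column_entropy - matrix_entropy)


print(round(llr(10, 10, 10, 10), 6))  # independent events, prints 0.0
```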
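The 2-field query idea can be illustrated with a toy in-memory stand-in for a search engine. Each item document gets a `purchase` field (its row from the similarity matrix) and a `view` field (its row from the cross-similarity matrix); the user's purchases are matched against the first and their views against the second. Counting overlaps is a crude assumption standing in for a real engine's relevance scoring.

```python
from collections import defaultdict


def build_index(similarity_rows, cross_rows):
    """Rows are (key, [(item, strength), ...]) pairs from the two matrices."""
    docs = defaultdict(lambda: {"purchase": set(), "view": set()})
    for key, similar in similarity_rows:
        docs[key]["purchase"].update(item for item, _ in similar)
    for key, similar in cross_rows:
        docs[key]["view"].update(item for item, _ in similar)
    return docs


def recommend(docs, purchases, views, top_n=10):
    """Score each item doc by matches in both fields; skip items already bought."""
    bought, viewed = set(purchases), set(views)
    scores = {}
    for doc_id, fields in docs.items():
        score = len(fields["purchase"] & bought) + len(fields["view"] & viewed)
        if score and doc_id not in bought:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


docs = build_index([("A", [("B", 5.0)]), ("C", [("B", 3.0), ("D", 1.0)])],
                   [("C", [("V1", 2.0)])])
print(recommend(docs, purchases=["B"], views=["V1"]))
# prints ['C', 'A']
```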
There are analogous methods to find item-based recommendations.
Better yet, use something like the Universal Recommender here: actionml.com/docs/ur with PredictionIO for an end-to-end system.

How can we sort "Reviews" by date returned from Product Advertising API?

I'm using the Product Advertising API to fetch product reviews by ASIN. It works well, but can we sort product reviews by date?
I want to display the latest review at the top. Is that possible?
Can anybody help me? I've been stuck on this for the last few days.
Thanks,
Surinder
This type of call will get the top reviews, including some recent ones.
https://www.amazon.com/reviews/iframe?akid=[AWSAccessKeyId]&alinkCode=xm2&asin=[ASIN]&atag=[ASSOCIATE_TAG]&exp=[DATE]&summary=1&truncate=1000&v=2&sig=[SIGNATURE]
You can experiment with different flags for possible sorting; however, Amazon's API documentation does not describe any sort-specific parameters:
http://docs.aws.amazon.com/AWSECommerceService/latest/DG/EX_RetrievingCustomerReviews.html
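If the API cannot sort for you, one fallback is to sort whatever reviews you retrieve on your own side before displaying them. A minimal sketch, assuming you have already extracted each review into a dict with an ISO-8601 `date` string (a hypothetical structure, not what the API returns directly):

```python
from datetime import date


def newest_first(reviews):
    """Sort review records by date, latest first."""
    return sorted(reviews, key=lambda r: date.fromisoformat(r["date"]), reverse=True)


reviews = [{"id": 1, "date": "2015-01-02"}, {"id": 2, "date": "2016-03-04"}]
print([r["id"] for r in newest_first(reviews)])
# prints [2, 1]
```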

Azure ML Recommendations

I want to use Azure ML to find related products using information from receipts from a store.
I have a file of receipts:
44366,136778
79619,88975
78861,78864
53395,78129,78786,79295,79353,79406,79408,79417,85829,136712
32340,33973
31897,32905
32476,32697,33202,33344,33879,34237,34422,48175,55486,55490,55498
17800
32476,32697,33202,33344,33879,34237,34422,48175,55490,55497,55498,55503
47098
136974
85832
Each row represent one receipt and each number is a product id.
Given a product id, I want to get a list of similar products, i.e. products that were bought together by other customers.
Can anyone point me in the right direction on how to do this?
This seems a good fit for their frequently bought together service (https://datamarket.azure.com/dataset/amla/mba). You may have to preprocess the dataset to get it in the required format. This service has a web UI as well: https://marketbasket.cloudapp.net/
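The idea behind "frequently bought together" can be sketched locally as plain co-occurrence counting over the receipts: for each product, count how often every other product appears on the same receipt. This is an illustration of the concept, not how the linked service works internally.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bought_together(receipts):
    """Count, for every product, how often each other product shares a receipt."""
    pairs = defaultdict(Counter)
    for receipt in receipts:
        # set() so a product listed twice on one receipt is counted once.
        for a, b in combinations(set(receipt), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def similar_products(pairs, product_id, top_n=5):
    """Products most often bought together with product_id, most frequent first."""
    return [item for item, _ in pairs.get(product_id, Counter()).most_common(top_n)]


text = """32476,32697,33202,33344,33879,34237,34422,48175,55486,55490,55498
17800
32476,32697,33202,33344,33879,34237,34422,48175,55490,55497,55498,55503"""
receipts = [line.split(",") for line in text.splitlines()]
pairs = bought_together(receipts)
print(pairs["32476"]["33202"], pairs["32476"]["55486"])
# prints 2 1
```

Raw counts favour popular products; a significance test such as LLR is a common way to correct for that.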
This is a typical problem for a recommender; you can use the Matchbox recommender model to cover such a problem.
Recommenders typically use scores on items to propose, and then use some tricky calculations to predict scores for items users have not scored yet (typically a score would be 1 if the user bought the item, 0 if they did not).
If you need more details, let me know (you have access to a free version of Azure ML where you can try all this).
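Getting the receipts into that user/item/score shape is mostly a preprocessing step. A minimal sketch, assuming the file has no customer ids, so each receipt is treated as a pseudo-user; only purchases (score 1) are emitted, and products not on a receipt are implicitly 0.

```python
def receipts_to_triples(receipts):
    """Turn receipts into (user, item, score) triples."""
    triples = []
    for index, receipt in enumerate(receipts):
        # dict.fromkeys drops duplicate items while keeping the original order.
        for item in dict.fromkeys(receipt):
            triples.append((f"receipt{index}", item, 1))
    return triples


print(receipts_to_triples([["44366", "136778"], ["17800"]]))
# prints [('receipt0', '44366', 1), ('receipt0', '136778', 1), ('receipt1', '17800', 1)]
```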
Regards

Product search engines, and filtering on attributes ala eBay - how is it done?

Apologies in advance if this is a common question...I think I'm having trouble finding answers because I'm not sure what the problem is actually called!
The background to the problem is this: if you look at a service like eBay, when you make a query, you can select a category to drill down in your results. And then when you drill down into a leaf category, you can start using filters. So if you select televisions, you might get a variety of filters, like panel technology (oled, lcd, crt), screen size (22", 32", 40" etc.), brand (sony, samsung, lg etc.). The different filters show you the number of results each filter will produce.
Key point: as you select filters, the filters available update. So if you select sony and oled, the screensize filter (and the others) will update to match results available within the constraints of the previously chosen filters.
My question is: how would you implement this kind of filter system in a search engine? Or specifically, how would you calculate the number of results available for a given combination of filters? How do you work out and update the 'filter histogram' as the user makes filter choices?
It seems like a complex problem. Does ebay precalculate the number of results for every possible combination of filters under a leaf category?
Or is there some other smarter way of handling this?
I hope my question makes sense :) Thanks for ANY help! :)
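The behaviour described in the question (counts that update as filters are chosen) can be illustrated with a naive in-memory sketch: the counts are computed at query time over the current result set, rather than precomputed for every combination. Search engines expose this as faceting (Solr) or aggregations (Elasticsearch); the sketch makes no claim about how eBay implements it, and the product fields are hypothetical.

```python
from collections import Counter


def facet_counts(products, selected, facet_fields):
    """Return the matching products and, per facet, the value counts among them.

    selected maps a field to the chosen value, e.g. {"brand": "sony"}.
    """
    matching = [p for p in products if all(p.get(f) == v for f, v in selected.items())]
    counts = {f: Counter(p[f] for p in matching if f in p) for f in facet_fields}
    return matching, counts


tvs = [
    {"brand": "sony", "panel": "oled", "size": 40},
    {"brand": "sony", "panel": "lcd", "size": 32},
    {"brand": "samsung", "panel": "oled", "size": 40},
    {"brand": "sony", "panel": "oled", "size": 22},
]
matching, counts = facet_counts(tvs, {"brand": "sony", "panel": "oled"}, ["size", "brand", "panel"])
print(len(matching), dict(counts["size"]))
# prints 2 {40: 1, 22: 1}
```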

Resources