Misconeptions about search indexing? (Haystack/Whoosh) - search

I'm using haystack with whoosh for development purposes.
I want search results based on django models to be filtered by the user that created them.
Please see my other post Filter haystack result with SearchQuerySet for details.
Basically I had to add User to my search index. But I noticed, when I manually change the user_id of a record, search is broken. After thinking about it this even makes sense. But, this means I have to rebuild the index after each field update in each model? Surely that doesn't scale at all?
I thought the engine would find the object by id, then look it up in the database, and return a current instance for further processing like filtering. It seems like everything is cached in the index so must be synchronized in realtime for search results to show up? Am I missing something here?

This documentation helped shed some light:
http://docs.haystacksearch.org/dev/searchindex_api.html

Related

Honoring previous searches in the next search result in solr

I am using solr for searching. I wants to improve my search result quality based on previously searched terms. Suppose, I have two products in my index with names 'Jewelry Crystal'(say it belongs to Group 1) & 'Compound Crystal'(say it belongs to Group 2). Now, if we query for 'Crystal', then both the products will come.
Let say, if I had previously searched for 'Jewelry Ornament', then I searches for 'Crystal', then I expects that only one result ('Jewelry Crystal') should come. There is no point of showing 'Compound Crystal' product to any person looking for jewelry type product.
Is there any way in SOLR to honour this kind of behavior or is there any other method to achieve this.
First of all, there's nothing built-in in Solr to achive this. What you need for this is some kind of user session, which is not supported by Solr, or a client side storage like a cookie or something for the preceding query.
But to achive the upvote you can use a runtime Boost Query.
Assuming you're using the edismax QueryParser, you can add the following to your Solr query:
q=Crystal&boost=bq:(Jewelry Ornament)
See http://wiki.apache.org/solr/ExtendedDisMax#bq_.28Boost_Query.29

Can solr return function values (not solr score or document fields)?

We are making a solr query where we are giving a custom function (which is pretty complex) and sorting the results by value of that function. The query looks something like:
solr/select?customFunc=complexFunction(querySpecificValue1,querySpecificValue2)&sort_by=$customFunc&fq=......
Our understanding is that we can only get back fields on the document and solr score back from solr. Can someone tell us if and how we can fetch the computed value of customFunc for each document. For some reasons we cannot set solr score to be customFunc.
You should use the fl parameter to select pseudo fields, functions and so on, but this is supported only on trunk, which will be released with the 4.0 version of Solr. Have a look at the CommonQueryParameters wiki. The SOLR-2444 issue might be interesting too.
A brief example:
solr/select?q=*:*&fl=*,customFunc:complexFunction(querySpecificValue1,querySpecificValue2)
This helped me :
/solr/auction-En/select/?q=*:*_val_:"sum(x,y)"&debugQuery=true&version=2.2&start=0&rows=10&indent=on&fl=*,score
You will see the values of the function in the debug part.

How can I configure Sitecore search to retrieve custom values from the search index

I am using the AdvancedDatabaseCrawler as a base for my search page. I have configured it so that I can search for what I want and it is very fast. The problem is that as soon as you want to do anything with the search results that requires accessing field values the performance goes through the roof.
The main search results part is fine as even if there are 1000 results returned from the search I am only showing 10 or 20 results per page which means I only have to retrieve 10 or 20 items. However in the sidebar I am listing out various filtering options with the number or results associated with each filtering option (eBay style). In order to retrieve these filter options I perform a relationship search based on the search results. Since the search results only contain SkinnyItems it has to call GetItem() on every single result to get the actual item in order to get the value that I'm filtering by. In other words it will call Database.GetItem(id) 1000 times! Obviously that is not terribly efficient.
Am I missing something here? Is there any way to configure Sitecore search to retrieve custom values from the search index? If I can search for the values in the index why can't I also retrieve them? If I can't, how else can I process the results without getting each individual item from the database?
Here is an idea of the functionality that I’m after: http://cameras.shop.ebay.com.au/Digital-Cameras-/31388/i.html
Klaus answered on SDN: use facetting with Apache Solr or similar.
http://sdn.sitecore.net/SDN5/Forum/ShowPost.aspx?PostID=35618
I've currently resolved this by defining dynamic fields for every field that I will need to filter by or return in the search result collection. That way I can achieve the facetted searching that is required without needing to grab field values from the database. I'm assuming that by adding the dynamic fields we are taking a performance hit when rebuilding the index. But I can live with that.
In the future we'll probably look at utilizing a product like Apache Solr.

CouchDB views - Multiple join... Can it be done?

I have three document types MainCategory, Category, SubCategory... each have a parentid which relates to the id of their parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which QueryString parameters would I use to get say the rows which have a key that starts with ["22945".... When all I have (at query time) is the id "11810" (at query time I don't have knowledge of the id "22945").
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.

Query and/or Search for SharePoint Document ID

We have the sharepoint 2010 environment with Document ID's enabled.
Given (part of) a Doc ID, we want to programmatically retrieve the document(s) matching that ID. The problem seems to be that this column is rather special, in that it might need special handling.
Using an SPSiteDataQuery, fetching the _dlc_DocId field as part of the viewfields works fine. However, including it as part of the where query never results in any documents being fetched.
Using the Search API has gotten us nowhere at all.
Has anyone pulled this off, or any suggestions on how to tackle this problem?
[Update] Turns out we were fooled by subtle errors in the XML and bad debugging misinterpretations. This stuff just works fine.
I don't normally contribute to these sorts of things because cleverer people than I always get there before me, but as this is an old one with no proper answer I think I'll add my thoughts for those who find this page.
I was struggling with this but after a little digging around and learning a bit of Caml I got this working.
I am using the SharePoint Client Object Model against SharePoint 2010 and Office365 beta.
Start off your query by looking at the all list items query:
Microsoft.SharePoint.Client.CamlQuery.CreateAllItemsQuery().ViewXml
"<View Scope=\"RecursiveAll\">\r\n <Query>\r\n </Query>\r\n</View>"
Stick a where child inside the query
Then add in
<Eq><FieldRef Name="_dlc_DocId" /><Value Type="Text">MDXC2KE55ASN-3-80</Value></Eq>
replacing MDXC2KE55ASN-3-80 with the doc ID you are looking for inside the where.
Also don't forget you might want to make use of these too:
<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>
<RowLimit>1</RowLimit>
Then use List.GetItems() method to bring back the ListItemCollection.
Just in case nobody comes with a slick solutions from the depths of the Sharepoint infrastructure:
What would Google Do?
Slice is, Dice it and dump it in a reverse index.
Solr and Lucene offer supreme tools for this. The idea is to cut the DocId's in small pieces and add the location of the document to the bucket for that piece.
Say We have "A real nice document" with Id ABCD123. You would add it to the buckets
ABCD, BCD1, CD12, D123
When searching for a partial ID (+ other data like dates, types, ...) you (well the search engine) creates the union of the buckets + applies additonal constraints.
To make this happen you need to write a spider for the sharepoint server and a routine which makes a record of data elements to be indexed.
Put a nice REST interface in frnt of it (actually SOLR already has that), integrate it in the main sharepoint server, and nobody needs to know there is something else running behind it.
These products can also incrementally update the indexes, so they can be kept up to date.
you could use the following to get the Document ID.
SPFile file = MethodToUploadFileToServer(web, filepath);
SPListItem item = file.Item;
string DocID = item.Properties["_dlc_DocId"].ToString();

Resources