Solr index time boost on document date - search

I'm trying to index a wiki (with direct access to the wiki DB) and to negatively boost by document date, so that old documents appear further down in the results. There is a great Solr wiki page on boosting and related topics:
http://wiki.apache.org/solr/SolrRelevancyFAQ
It simply says to do the following:
"Use an index-time boost that is larger for newer documents"
But how and where? Which part of the Solr configuration do I have to change to use an index-time boost? Do I have to adapt the DataImportHandler?

IMO you should not use an index-time boost for dates.
With an index-time boost, the boost value is fixed and stored in the index, so changing it requires re-indexing.
A query-time boost gives you the flexibility to apply the boost and to change it at runtime without re-indexing.
A query-time boost is also always computed relative to the current date.
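
As a rough illustration of the query-time approach, the sketch below mirrors Solr's recip(x,m,a,b) = a / (m*x + b) function in plain Python and builds a boosted query of the form {!boost b=recip(ms(NOW,<datefield>),3.16e-11,1,1)}, the pattern given in the Solr relevancy FAQ. The field name last_modified and the helper names are assumptions made for this example, not something from the question.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # roughly 3.16e10 milliseconds


def recip(x, m, a, b):
    """Plain-Python mirror of Solr's recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)


def freshness_boost(age_ms):
    """Multiplier for a document that is age_ms milliseconds old.

    m = 3.16e-11 is about 1 / (milliseconds per year), so a brand-new
    document gets 1.0 and a one-year-old document gets about 0.5.
    """
    return recip(age_ms, 3.16e-11, 1, 1)


def boosted_query_params(user_query, date_field="last_modified"):
    """Wrap a user query in Solr's boost query parser.

    date_field is a hypothetical field name; use the date field
    defined in your own schema.
    """
    boost = f"recip(ms(NOW,{date_field}),3.16e-11,1,1)"
    return {"q": f"{{!boost b={boost}}}{user_query}"}


if __name__ == "__main__":
    for years in (0, 1, 5):
        print(years, "year(s):", round(freshness_boost(years * MS_PER_YEAR), 3))
    print(boosted_query_params("wiki")["q"])
```

Because the multiplier is computed from NOW at query time, the same index keeps ranking newer pages higher as time passes, and the constants can be tuned without re-indexing.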

From Solr relevancy FAQ:
Index-time boosts are assigned with the optional attribute "boost" in
the <doc> section of the XML updating messages. See UpdateXmlMessages
for more information.
Following the UpdateXmlMessages link you can find this:
Optional attributes on "doc"
boost = <float>: default is 1.0 (See Lucene docs for definition of
boost.) NOTE: make sure norms are enabled (omitNorms="false" in the
schema.xml) for any fields where the index-time boost should be
stored.
Optional attributes for "field"
boost = <float>: default is 1.0 (See Lucene docs for definition of
boost.) NOTE: make sure norms are enabled (omitNorms="false" in the
schema.xml) for any fields where the index-time boost should be
stored.
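
To make the quoted attributes concrete, here is a minimal sketch that builds such an XML update message with Python's standard library. The field names (id, title), the boost values and the helper name build_add_message are hypothetical. Index-time boosts were dropped in later Solr releases, so check whether your version still accepts the attribute.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_fields, doc_boost=1.0, field_boosts=None):
    """Build an <add><doc boost="..."> update message as a string.

    doc_fields   -- mapping of field name to value (hypothetical names)
    doc_boost    -- value for the optional "boost" attribute on <doc>
    field_boosts -- optional mapping of field name to a per-field boost
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", {"boost": str(doc_boost)})
    for name, value in doc_fields.items():
        attrs = {"name": name}
        if name in field_boosts:
            # Per-field boost; norms must be enabled on this field in schema.xml.
            attrs["boost"] = str(field_boosts[name])
        field = ET.SubElement(doc, "field", attrs)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message(
        {"id": "wiki-42", "title": "Release notes"},
        doc_boost=2.5,
        field_boosts={"title": 2.0},
    ))
```

A newer wiki page would be sent with a larger doc_boost than an older one; how the boost is derived from the page date is left open here.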

Related

Cloudant Lucene boost factor during indexing

At indexing time it is possible to set a boost factor, which then changes the position of a specific record in the array of returned documents.
Example:
index("default", doc.my_field, {"index": "analyzed", "boost": doc.boostFactor});
When I apply this boost factor I can see that the sorting changes. However, it appears to be rather random.
I would expect a number greater than 1 to sort the document higher.
Has anybody managed to get the boost factor to work correctly with Cloudant?
Yes, the Cloudant boost factor should work correctly. Setting a boost on a field of a specific doc modifies the score of that doc when searching on this field: Score = OriginalScore * boost.
Do you search on the same field you boost? What does your query look like? Does the field my_field consist of multiple tokens? This may also influence scoring (e.g. longer fields get scored lower).
You can observe the scores of docs in the order fields of the results, and then see how the scores change as you modify the boost.
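
A small sketch of the Score = OriginalScore * boost relationship, using made-up scores and boosts (none of these numbers come from Cloudant), shows why a boost greater than 1 does not by itself guarantee a higher position:

```python
def boosted_score(original_score, boost):
    """Score = OriginalScore * boost, as described in the answer above."""
    return original_score * boost


# Hypothetical documents: a larger boost on a weak match can still
# lose to an unboosted strong match.
docs = [
    {"id": "a", "original_score": 0.9, "boost": 1.0},
    {"id": "b", "original_score": 0.4, "boost": 2.0},
    {"id": "c", "original_score": 0.2, "boost": 3.0},
]

ranked = sorted(
    docs,
    key=lambda d: boosted_score(d["original_score"], d["boost"]),
    reverse=True,
)

if __name__ == "__main__":
    for d in ranked:
        print(d["id"], boosted_score(d["original_score"], d["boost"]))
```

Here document "c" has the largest boost but still ranks last, because its unboosted score is much lower than the others.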

How to Boost freshness documents with Lucene

I can set a boost at indexing time, but I don't know how to boost documents by freshness (the index is built from database fields).
Code for the boost:
titleq.setBoost(0.8f);
Please help me boost the score by document freshness. Thanks.
There is one workaround for your problem:
If you want the most recently updated documents to come up first,
add one field with the same value to every document and search on that field/value pair.
Your latest documents should then come up automatically.

Orchard - Database indexes

Does anyone know if there's a recommended set of database indexes for Orchard's core modules? I can't seem to find much info around this, and while I appreciate that the code uses NHibernate to abstract the underlying database, I suspect 99% of users will simply be using SQL server/Express as the default DB and would require suitable indexes to be added. For example on: Orchard_Framework_ContentItemVersionRecord ([Published], [ContentItemRecord_id])
If there isn't already, would it be a good idea for the core modules to have a recommended set of indexes documented somewhere, as they're clearly going to be required for any serious deployments based on an RDBMS?
You're right that some indices would help queries. However, with indices it really depends on the usage pattern, so there are not many built in; you can always add them yourself depending on your specific usage (you can also use SQL Server's tools to recommend indices for you). You can even add indices from migrations to other modules' tables.
In the latest source of Orchard (not yet released) there are also some more default indices, including ones for ContentItemVersionRecord. You can see them in FrameworkDataMigration in the 1.x branch.

Optimal Indexing strategy for Multilingual requirement using solr

We use IBM WCS v7 for one of our e-commerce requirements, in which Apache Solr is embedded for the search implementation.
As per a new requirement, a website will support multiple languages, e.g. the France version of the site can support English, French etc. (en_FR, fr_FR etc.). To configure Solr for this, what would be the optimal indexing strategy using a single Solr core?
I have some ideas: 1) using multiple fields in schema.xml for multiple languages, 2) using different Solr cores for different languages.
But these approaches don't seem to fit the current requirement well, as the e-commerce website will support 18 languages. Using different fields for every language will be very complicated, and using different Solr cores is not a good approach either, as we would need to apply any configuration change to all the Solr cores whenever a requirement calls for one.
Are there any other approaches, or is there any way I can associate the localeId with the indexed data and process the search results with respect to the detected language?
Any help on this topic will be highly appreciated.
Thanks and Regards,
Jitendriya Dash
This post has already been answered by the original poster and others; I am just summarizing that as an answer:
The recommended solution is to create one index core per locale/language. This is especially important if either the catalog or the content (such as product name, description, keywords) will differ and the business prefers to manage it separately for each locale. This gives the added benefit that Solr can perform stemming and tokenization specific to that locale, if applicable.
I have been part of solutions where this approach was preferred over maintaining multiple fields or documents in the same core for each locale/language. The largest number of index cores I have worked with is 6.
One must also remember that index core addition will require updates to supporting processes (Product Information Management system updates to catalog load to workspace management to stage-propagation to reindexing to cache invalidation).

Solr: Boosting documents based on a numeric 'popularity' field - do it at index time or query time?

I'm reading the Solr cookbook and it suggests using a boost function parameter, bf=product(popularity), to boost certain documents based on their "popularity" score.
This could also be implemented using an index-time boost on the document, right?
So which is the better option? Is there a difference in terms of:
Functionality?
Performance?
This depends on how often your popularity changes. If it is pre-baked and changes infrequently, then you can boost at index time. If it changes frequently (e.g. based on the live searches), then you probably want to store it externally to specific records, using (for example) ExternalFileField.
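
A minimal sketch of the two query-time pieces mentioned in this thread, under stated assumptions: the first helper builds request parameters with the bf=product(popularity) function from the question, and the second renders key=value lines of the kind an ExternalFileField reads. The edismax parser choice, the document ids and the helper names are assumptions for illustration only.

```python
from urllib.parse import urlencode


def popularity_query_string(user_query):
    """URL-encode a query that adds the question's boost function."""
    params = {
        "defType": "edismax",          # assumption: a (e)dismax parser that honours bf
        "q": user_query,
        "bf": "product(popularity)",   # boost function quoted in the question
    }
    return urlencode(params)


def external_file_lines(popularity_by_id):
    """Render 'docid=value' lines for an external popularity file.

    popularity_by_id maps hypothetical document ids to popularity values
    that are maintained outside the index and can change without re-indexing.
    """
    return "\n".join(f"{doc_id}={value}" for doc_id, value in sorted(popularity_by_id.items()))


if __name__ == "__main__":
    print(popularity_query_string("laptop"))
    print(external_file_lines({"doc2": 3.5, "doc1": 10.0}))
```

With the external file, only the file has to be rewritten when popularity changes; the indexed documents themselves stay untouched.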

Resources