How to boost documents by freshness with Lucene

I can set a boost at indexing time, but I don't know how to boost documents by freshness (the documents are indexed from database fields).
Code for the boost:
titleq.setBoost(0.8f);
Please help me boost the score by document freshness. Thanks.

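There is one workaround for your problem:
If you want the most recently updated documents to come up first,
add one field with the same value to every document and search on that field and value pair.
Every document should then get the same score, so the results fall back to index order. Lucene breaks score ties by internal document ID, which generally follows insertion order, so check that this really gives you newest-first; sorting explicitly on an indexed date field is more reliable.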
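An alternative to the workaround above is to multiply each relevance score by a freshness factor that shrinks as the document ages. The sketch below is plain Python, not Lucene API code; the function names, the 30-day half-life, and the sample hits are assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0  # assumption: freshness halves every 30 days


def freshness_factor(updated_at, now, half_life_days=HALF_LIFE_DAYS):
    """Return a multiplier in (0, 1]: 1.0 for a brand-new document,
    0.5 after one half-life, 0.25 after two, and so on."""
    age_days = max((now - updated_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rerank(hits, now):
    """hits is a list of (doc_id, relevance_score, updated_at) tuples.
    Returns (doc_id, adjusted_score) pairs, best first."""
    adjusted = [
        (doc_id, score * freshness_factor(updated_at, now))
        for doc_id, score, updated_at in hits
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
hits = [
    ("old-but-relevant", 1.0, now - timedelta(days=60)),
    ("new-less-relevant", 0.8, now),
]
ranking = rerank(hits, now)
```

With these sample values the newer document ranks first even though its raw relevance score is lower. In a real index the same idea would be applied inside the engine (for example through a function query) rather than by re-sorting hits afterwards.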

Related

Cloudant Lucene boost factor during indexing

At indexing time it is possible to set a boost factor, which then changes the position of a specific record in the array of returned documents.
Example:
index("default", doc.my_field, {"index": "analyzed", "boost": doc.boostFactor});
When I apply this boost factor I can see that the sorting changes. However, the result appears to be rather random.
I would expect a boost greater than 1 to sort the document higher.
Has anybody managed to get the boost factor to work correctly with Cloudant?
Yes, the Cloudant boost factor should work correctly. Setting a boost on a field of a specific doc modifies the score of that doc when you search on this field: Score = OriginalScore * boost.
Do you search on the same field that you boost? What does your query look like? Does the field my_field consist of multiple tokens? This can also influence scoring (e.g. longer fields get lower scores).
You can observe the score of each doc in the order field of the results, and then watch how the scores change as you modify the boost.
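The formula in the answer also explains why a boost greater than 1 does not guarantee a higher position: the boost multiplies whatever score the document already had. Here is a toy calculation of that effect. The document ids, scores, and boost values are invented for illustration and do not come from Cloudant.

```python
def boosted_ranking(docs):
    """docs maps doc_id -> (original_score, boost).
    Returns doc ids ordered by original_score * boost, highest first."""
    final = {doc_id: score * boost for doc_id, (score, boost) in docs.items()}
    return sorted(final, key=final.get, reverse=True)


# Hypothetical values: "c" has the largest boost but the weakest match.
docs = {
    "a": (0.9, 1.0),  # final score about 0.9
    "b": (0.4, 2.0),  # final score about 0.8
    "c": (0.2, 3.0),  # final score about 0.6
}
ranking = boosted_ranking(docs)
```

Here the document with the largest boost still ends up last, because its original score was much lower. Comparing the scores reported in the results before and after changing the boost is the way to confirm whether something similar is happening in a real index.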

If I drop the same document into ElasticSearch again, is it going to reindex it?

This is obviously a question about ES internals.
What I have is a custom search engine built on top of ES, feeding it with data from multiple vendors. During periodic re-pulls of the documents there is no way to ask some vendors "give me only documents changed since that date", so to find out whether a particular document has changed since the last indexing I would have to check it for modification myself and drop it into ES for indexing iff it changed.
Question: does ES keep track of document checksums internally to see whether it actually needs to re-index a document? (Of course I'm presuming that it's not some HTML where some fields, timestamps, etc. are updated dynamically on each GET.)
If it did (that is, re-indexing identical documents has negligible amortized cost), that would simplify updates for me, obviously.
If you use the Update API, you can detect no-op updates: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#_detecting_noop_updates. You can see the source code for the no-op check here: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/update/UpdateRequestBuilder. Note the "extra work" comment; that cost is definitely something to consider.
Keep in mind the update API tends to be a lot slower than plain vanilla bulk inserts. Regular inserts in which you let ES increment the _version number when you index a document in the same index with the same id will be faster... but they'll also create GC and indexing pressure.
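If neither option fits, the change check can also be done on the client before anything is sent to ES. The following is a minimal sketch of that idea, assuming the vendor documents are JSON-serializable dicts and that a mapping from document id to the last indexed hash is kept somewhere; the function names and the in-memory dict are hypothetical.

```python
import hashlib
import json


def fingerprint(doc):
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(docs, known_hashes):
    """docs maps doc_id -> document; known_hashes maps doc_id -> last hash.
    Returns {doc_id: new_hash} for documents that are new or modified."""
    changed = {}
    for doc_id, doc in docs.items():
        digest = fingerprint(doc)
        if known_hashes.get(doc_id) != digest:
            changed[doc_id] = digest
    return changed


known = {}  # assumption: persisted between runs in a real system
pulled = {"v1-42": {"title": "Widget", "price": 10}}

first_pass = select_changed(pulled, known)
known.update(first_pass)  # record hashes only after indexing succeeded
second_pass = select_changed(pulled, known)
```

Only the ids returned by select_changed need to be sent to the index. Fields that change on every pull, such as fetch timestamps, would have to be removed before hashing, otherwise every document looks modified.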

Solr: Boosting documents based on a numeric 'popularity' field - do it at index time or query time?

I'm reading the Solr cookbook, and it suggests using a boost function parameter, bf=product(popularity), to boost certain documents based on their "popularity" score.
This could also be implemented using an index-time boost on the document, right?
So which is the better option? Is there a difference in terms of:
Functionality?
Performance?
This depends on how often your popularity changes. If it is pre-baked and changes infrequently, then you can boost at index time. If it changes frequently (e.g. based on the live searches), then you probably want to store it externally to specific records, using (for example) ExternalFileField.
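For the frequently changing case, ExternalFileField reads its values from a plain text file of key=value lines kept outside the index, so the values can be regenerated without re-indexing. Below is a rough sketch of producing that file content. The function name and sample ids are hypothetical, and the file name and location Solr expects (commonly external_ followed by the field name, in the index data directory) should be checked against the documentation for your version.

```python
def render_external_file(popularity):
    """popularity maps the document key to a numeric score.
    Returns file content with one key=value line per document,
    sorted by key (sorted keys are reported to load faster)."""
    lines = [f"{key}={float(value)}" for key, value in sorted(popularity.items())]
    return "\n".join(lines) + "\n"


# Hypothetical scores, e.g. derived from live search click counts.
content = render_external_file({"doc2": 10, "doc1": 2.5})
```

The resulting text would be written to the external file, after which the new values take effect once Solr reloads it.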

Solr index time boost on document date

I'm trying to index a wiki (using direct access to the wiki DB) and want to negatively boost on document date, so that old documents appear further down in the results. There is a great Solr wiki page on boosting and related topics:
http://wiki.apache.org/solr/SolrRelevancyFAQ
It simply says to do the following:
"Use an index-time boost that is larger for newer documents"
But how and where? Which part of the Solr configuration do I have to change to use an index-time boost? Do I have to adapt the DataImportHandler?
IMO you should not use an index-time boost for dates.
If you use an index-time boost, the boost value is fixed and stored in the index.
A query-time boost gives you the flexibility to change the boost at runtime without re-indexing,
and it is always computed relative to the current date.
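The same Solr relevancy FAQ suggests a query-time date boost of the form recip(ms(NOW,mydatefield),3.16e-11,1,1), where recip(x,m,a,b) computes a/(m*x+b). The sketch below reproduces that arithmetic and builds the q parameter; the field name last_modified and the helper names are assumptions, so substitute the date field from your own schema.

```python
MS_PER_YEAR = 365 * 24 * 3600 * 1000  # about 3.15e10 milliseconds


def recip(x, m, a, b):
    """Same arithmetic as Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost_params(user_query, date_field="last_modified"):
    """Wrap the user query in a multiplicative boost that decays with age.
    date_field is a hypothetical field name."""
    boost_fn = f"recip(ms(NOW,{date_field}),3.16e-11,1,1)"
    return {"q": f"{{!boost b={boost_fn}}}{user_query}"}


boost_today = recip(0, 3.16e-11, 1, 1)
boost_one_year = recip(MS_PER_YEAR, 3.16e-11, 1, 1)
params = date_boost_params("wiki")
```

With these constants a document dated now keeps its full score, and a one-year-old document is multiplied by roughly one half. Because the function is evaluated at query time, the decay always tracks the current date.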
From Solr relevancy FAQ:
Index-time boosts are assigned with the optional attribute "boost" in
the <doc> section of the XML updating messages. See UpdateXmlMessages
for more information.
Following the UpdateXmlMessages link you can find this:
Optional attributes on "doc"
boost = <float> - default is 1.0 (See Lucene docs for definition of
boost.) NOTE: make sure norms are enabled (omitNorms="false" in the
schema.xml) for any fields where the index-time boost should be
stored.
Optional attributes for "field"
boost = <float> - default is 1.0 (See Lucene docs for definition of
boost.) NOTE: make sure norms are enabled (omitNorms="false" in the
schema.xml) for any fields where the index-time boost should be
stored.
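To make the quoted attributes concrete, here is a small sketch that builds an XML update message carrying a document-level boost. The helper name, field names, and boost value are hypothetical. It also assumes a Solr version that still accepts index-time boosts, since newer releases dropped them.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields, doc_boost=1.0):
    """Return an <add> update message whose <doc> carries a boost attribute.
    fields maps field names to values (hypothetical schema)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", {"boost": str(doc_boost)})
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# A newer wiki page gets a larger boost than the default of 1.0.
message = build_add_message({"id": "page-1", "title": "Hello"}, doc_boost=2.5)
```

The message would be posted to the update handler. A per-field boost would be set the same way, by adding a boost attribute to an individual field element.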

Does CouchDB support unique key constraints?

I come from an RDBMS background, and I have an application that requires good scalability and low latency. I want to give CouchDB a try. However, I need to detect when a particular INSERT operation fails due to a unique key constraint. Does CouchDB support this? I took a look at the docs, but I could not find anything relevant.
The _id for each document is unique (within the same database), but there are no constraints for other fields in the document.
Particularly, there are no constraints that run across two or more documents.
You can add validation functions (validate_doc_update in a design document) to enforce rules on documents, but again they work on a document-by-document basis.
As the above poster says, there are no constraints for other fields than the document _id. The _id can be automatically generated by couchdb or you can create your own. (for my purposes I have created my own as I knew I could guarantee the key's uniqueness).
At the lowest API level, if you attempt a PUT request for an existing document id, it will be rejected with an HTTP 409 (Conflict) error, unless you supply the correct revision (_rev property) of the existing document.
I wouldn't run anything mission-critical with couchdb but the code is out of Apache incubation and quite functional. A number of people are running websites with it.
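The 409 behaviour can serve as a unique-key check if the unique value is used as the document _id. The following is a minimal sketch using only the Python standard library; the base URL, database name, and function names are assumptions, and authentication is left out.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_put(base_url, db, doc_id, doc):
    """Build a PUT request for /{db}/{doc_id}, using the unique key as _id."""
    url = f"{base_url}/{db}/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps(doc).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def insert_unique(base_url, db, doc_id, doc):
    """Return True if the document was created, False if the id already
    exists (CouchDB answers 409 when no matching _rev is supplied)."""
    request = build_put(base_url, db, doc_id, doc)
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:
            return False
        raise  # any other HTTP error is not a uniqueness violation


# Hypothetical server and database; no request is sent by building it.
request = build_put(
    "http://localhost:5984", "users", "alice@example.com", {"name": "Alice"}
)
```

Calling insert_unique against a running server would then report a duplicate key as False instead of raising. This only covers uniqueness of the _id itself; a second unique field would need its own document or a different scheme.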
