My question is simple but I can't find the answer: is there a way to configure Lucene to retrieve more than 100 results for a query?
I'm using Lucene 2.4.0.
Thanks all.
Functionality for controlling the search result size and iterating over larger result sets has been vastly improved in later versions of Lucene. If you have the option, consider upgrading to 2.9 or even 3.0.
That being said, I can't tell from your post how exactly you get your search results. Perhaps you use a Hits object? In that case, consider using TopDocCollector instead. Its TopDocCollector(int) constructor lets you specify the maximum number of hits you would like returned from your search.
In 2.4.0, you can use the Searcher.search(Query query, int n) method to retrieve the desired number of results. The method returns a TopDocs object.
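For example, a minimal sketch (the index path, field name, and query text here are all hypothetical, and error handling is omitted):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Open a searcher over an existing index (path is hypothetical).
IndexSearcher searcher = new IndexSearcher("/path/to/index");
Query query = new QueryParser("contents", new StandardAnalyzer()).parse("foo");

// Ask for up to 500 hits instead of the old Hits-based default.
TopDocs topDocs = searcher.search(query, 500);
for (ScoreDoc sd : topDocs.scoreDocs) {
    System.out.println(searcher.doc(sd.doc).get("contents"));
}
searcher.close();
```

topDocs.totalHits still reports the total number of matching documents, even when you collect fewer.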
In a commercial application it is not uncommon to have hundreds of facets. Of course, not all products are flagged with all of them.
But when searching, I need to add a facet query-string parameter that lists all the facets I want back. Since I don't know the list of relevant ones in advance, I have to pass all of them in the query.
This is not practical with more than a few facets.
Is there a way to solve this issue or is it a limitation of the product?
The Azure Search doc:
https://msdn.microsoft.com/fr-fr/library/azure/dn798927.aspx
You are correct that this is a current limitation of Azure Search: you need to pass all the facets in the query string. Please know that we are aware of this; in fact, it can be an even bigger issue for customers who have so many parameters or facets in their query string that it exceeds the maximum URL length. For this reason, we are investigating what can be done to accommodate this.
I apologize that I do not yet have a date for when this will be available, other than to say it is on our short-term roadmap.
Liam
It looks like Azure Search now supports both GET and POST methods, and recommends using POST when the length of the URL would exceed the maximum limit of 2048 characters (1024 for just the query string).
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
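With POST, the facet list moves out of the URL and into the request body, so it is no longer subject to the URL length limit. A sketch (the index name, facet fields, and api-version are illustrative, not taken from the question):

```
POST /indexes/products/docs/search?api-version=2015-02-28
Content-Type: application/json
api-key: [query key]

{
  "search": "*",
  "facets": [ "color", "brand", "category" ],
  "top": 10
}
```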
We are new to Elasticsearch and NEST.
We are trying to do a case-sensitive search using the C# client, NEST.
We have read lots of posts but could not figure it out. Can someone please help us with detailed step-by-step instructions?
Any help will be highly appreciated.
Thanks,
VB.
I know this is an older question, but I ran across it in my research. So, here's my answer.
First, switching to a TERM query did not help. Upon learning more about how ElasticSearch works by default, I understand why.
By default, ElasticSearch is case-insensitive. When documents are indexed, the default analyzer lowercases all of the string values and keeps the lowercase values for future searches. This does not affect the values stored in the documents themselves, but the lowercasing does affect searches.
If you are using the default analyzer, then your search terms for string values should be all lowercase.
Before I learned how this worked, I spent a fair amount of time looking at a mixed-case field value in an indexed document, then searching with a query term that used the same mixed-case value. Zero results. It wasn't until I forced the value my query used to all lowercase that I started getting results.
You can read more about ElasticSearch analyzers here: ElasticSearch - Analysis
Try a term query; the values passed to a term query are not analyzed, so ES does not lowercase your input.
Here: http://www.elasticsearch.org/guide/reference/query-dsl/term-query/
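One common way to get fully case-sensitive matching is to index the field without analysis, so its original casing is preserved, and then query it with a term query. A sketch using the pre-2.x string-mapping syntax the question's era implies (the index, type, and field names are hypothetical):

```
PUT /myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "name": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

POST /myindex/mytype/_search
{
  "query": { "term": { "name": "MixedCaseValue" } }
}
```

Because the field is not analyzed, "MixedCaseValue" matches only documents whose stored value has exactly that casing.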
I'm using Solr to search for a long list of IDs like so:
ID:("4d0dbdd9-d6e1-b3a4-490a-6a9d98e276be"
"4954d037-f2ee-8c54-c14e-fa705af9a316"
"0795e3d5-1676-a3d4-2103-45ce37a4fb2c"
"3e4c790f-5924-37b4-9d41-bca2781892ec"
"ae30e57e-1012-d354-15fb-5f77834f23a9"
"7bdf6790-de0c-ae04-3539-4cce5c3fa1ff"
"b350840f-6e53-9da4-f5c2-dc5029fa4b64"
"fd01eb56-bc4c-a444-89aa-dc92fdfd3242"
"4afb2c66-cec9-8b84-8988-dc52964795c2"
"73882c65-1c5b-b3c4-0ded-cf561be07021"
"5712422c-12f8-ece4-0510-8f9d25055dd9"...etc
This works up to a point, but above a certain size fails with the message: too many boolean clauses. You can increase the limit in solrconfig.xml, but this will only take it so far - and I expect the limit is there for a reason:
<maxBooleanClauses>1024</maxBooleanClauses>
I could split the query into several little ones, but that would prevent me then sorting the results. There must be a more appropriate way of doing this?
You should use a Lucene filter instead of building up a huge boolean query. Try FieldCacheTermsFilter and pass that filter into your Searcher. FieldCacheTermsFilter translates your UIDs to a Lucene DocIdSet, and it does so quickly because it works through the FieldCache.
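A minimal sketch of that approach (it assumes an already-open `searcher` and a sort field of your choosing; both are placeholders):

```java
import org.apache.lucene.search.FieldCacheTermsFilter;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

// One filter over all IDs, instead of one boolean clause per ID.
String[] ids = {
    "4d0dbdd9-d6e1-b3a4-490a-6a9d98e276be",
    "4954d037-f2ee-8c54-c14e-fa705af9a316"
    // ... the rest of the list
};
FieldCacheTermsFilter idFilter = new FieldCacheTermsFilter("ID", ids);

// The filter restricts the result set but does not affect scoring,
// so sorting still works normally.
TopDocs hits = searcher.search(new MatchAllDocsQuery(), idFilter, 1000,
                               new Sort(new SortField("myField", SortField.STRING)));
```

Since the filter is a single clause, maxBooleanClauses never comes into play, and you keep a single result set that can be sorted as one.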
I'm using Solr and I want to facet over a field "group".
Since "group" is created by users, potentially there can be a huge number of values for "group".
Would Solr be able to handle a use case like this? Or is Solr not really appropriate for facet fields with a large number of values?
I understand that I can set facet.limit to restrict the number of values returned for a facet field. Would this help in my case?
Say there are 100,000 matching values for "group" in a search, if I set facet.limit to 50. would that speed up the query, or would the query still be slow because Solr still needs to process and sort through all the facet values and return the top 50 ones?
Any tips on how to tune Solr for large number of facet values?
Thanks.
Since 1.4, Solr handles facets with a large number of values pretty well, as it uses a simple facet count by default (facet.method is 'fc' by default).
Prior to 1.4, Solr used a filter-based facet method (enum), which is definitely faster for faceting on attributes with a small number of values. That method requires one filter per facet value.
As for facet.limit, think of it as a way to navigate through the facet space (in conjunction with facet.offset), just as you navigate through the result space with rows/offset. So a value of 10 to 50 is sensible.
As with rows/offset, and due to the nature of Solr, you can expect the performance of facet.limit/facet.offset to degrade when the offset gets bigger, but it should be perfectly fine if you stay within reasonable boundaries.
By default, Solr outputs the most frequent facets first.
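Concretely, paging through the facet values looks like this (the host, core, and field names are illustrative):

```
# First 50 facet values for "group", most frequent first:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=group&facet.limit=50&facet.offset=0

# Next 50 values:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=group&facet.limit=50&facet.offset=50
```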
To sum up:
Use Solr 1.4
Make sure facet.method is 'fc' (well, that's the default anyway).
Navigate through your facet space with facet.limit/facet.offset.
Don't forget to enable the faceting-related cache parameters (try different cache sizes to choose values that fit your system well):
<filterCache class="solr.FastLRUCache" size="4096" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="5000" initialSize="5000" autowarmCount="5000"/>
I'm using the following code to execute a query in Lucene.Net
var collector = new GroupingHitCollector(searcher.GetIndexReader());
searcher.Search(myQuery, collector);
resultsCount = collector.Hits.Count;
How do I sort these search results based on a field?
Update
Thanks for your answer. I had tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument. Please suggest a valid value to pass.
The search.Searcher.search method will accept a search.Sort parameter, which can be constructed as simply as:
new Sort("my_sort_field")
However, there are some limitations on which fields can be sorted on - they need to be indexed but not tokenized, and the values convertible to Strings, Floats or Integers.
Lucene in Action covers all of the details, as well as sorting by multiple fields and so on.
What you're looking for is probably TopFieldDocCollector. Use it instead of the GroupingHitCollector (what is that?), or inside it.
Comment on this if you need more info. I'll be happy to help.
In the original (Java) version of Lucene, there is no hard restriction on the size of the TopFieldDocCollector results. Any number greater than zero is accepted. Although memory constraints and performance degradation create a practical limit that depends on your environment, 5000 hits is trivial and shouldn't pose a problem outside of a mobile device.
Perhaps in porting Lucene, TopFieldDocCollector was modified to use something other than Lucene's "heap" implementation (called PriorityQueue, extended by FieldSortedHitQueue)—something that imposes an unreasonably small limit on the results size. If so, you might want to look at the source code for TopFieldDocCollector, and implement your own similar hit collector using a better heap implementation.
I have to ask, however, why are you trying to collect 5000 results? No user in an interactive application is going to want to see that many. I figure that users willing to look at 200 results are rare, but double it to 400 just as factor of safety. Depending on the application, limiting the result size can hamper malicious screen scrapers and mitigate denial-of-service attacks too.
The constructor for Sort that accepts only the string field name has been deprecated. Now you have to create a Sort object and pass it as the last parameter of searcher.Search():
// Sorting by a field of type long called "size" from greatest to smallest
// (signified by passing true for the last isReversed parameter).
Sort sorter = new Sort(new SortField("size", SortField.Type.LONG, true));
searcher.Search(myQuery, collector, sorter);