I am new to CouchDB. I have looked at the docs and SO posts, but for some reason this simple query is still eluding me.
SELECT TOP 10 * FROM x WHERE DATE BETWEEN startdate AND enddate ORDER BY score
UPDATE: It cannot be done. This is unfortunate, since to get this type of data you have to pull back potentially millions of records (a few fields) from Couch and then do the filtering, sorting or limiting yourself to get the desired results. I am now going back to my original solution of using _changes to capture the data I need and store it elsewhere, so I can run that query there.
Here is my updated view (thanks to Dominic):
emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), score], doc.name);
What I need to do is:
Always sort by score descending
Optionally filter by date range (for instance, TODAY only)
Limit by x
Update: Thanks to Dominic I am much closer - but still having an
issue.
?startkey=[2017,1,13,{}]&endkey=[2017,1,10]&descending=true&limit=10&include_docs=true
This brings back documents between the dates sorted by score
However, if I want the top 10 regardless of date, then I only get back the top 10 sorted by date (and not score).
For starters, when using complex keys in CouchDB, you can only sort from left to right. Expecting otherwise is a common misconception; read up on Views Collation for a more in-depth explanation. (While you're at it, read the entire Guide to Views as well, since you're getting started.)
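To make the left-to-right rule concrete, here is a rough sketch of how array keys compare. It is a simplification that assumes the keys contain numbers only (real CouchDB collation also defines an order across null, booleans, strings, arrays and objects), and `compareKeys` is just a name made up for this illustration:

```javascript
// Simplified comparison of two array keys made up of numbers only.
// Real CouchDB collation also orders null, booleans, strings, arrays and objects.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    // The first differing element decides; later elements are never looked at.
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  // If one key is a prefix of the other, the shorter key sorts first.
  return a.length - b.length;
}

// Keys shaped like [year, month, day, score].
const keys = [
  [2017, 1, 11, 5],
  [2017, 1, 10, 95],
  [2017, 1, 10, 40],
];
const sortedKeys = keys.slice().sort(compareKeys);
console.log(JSON.stringify(sortedKeys));
```

The score only decides the order among keys that share the same date, which is why a query spanning several days comes back ordered by date first.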
If you want to be able to sort by score, but filter by date only, you can accomplish this by breaking down your timestamp to only show the degree you care about.
function (doc) {
var d = new Date(doc.date)
emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.score ])
}
You'll end up outputting a more complex key than what you currently have, but you query it like so:
startkey=[2017,1,1]&endkey=[2017,1,1,{}]
This will pick out all the documents on 1-1-2017, and it'll be sorted by score already! (In ascending order; to get descending order, swap startkey and endkey and add descending=true. No change to the view is needed.)
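If it helps, here is a small sketch of building that query string for the "top N by score on one day" case. It assumes the view key is [year, month, day, score] as above; `dayTopScoresQuery` is a made-up helper name, not part of any CouchDB client library:

```javascript
// Build the query string for "top N by score on one UTC day".
// Assumes the view key is [year, month, day, score].
function dayTopScoresQuery(year, month, day, limit) {
  const params = new URLSearchParams({
    // With descending=true the range is walked from high to low,
    // so startkey has to be the upper bound ({} sorts after any number).
    startkey: JSON.stringify([year, month, day, {}]),
    endkey: JSON.stringify([year, month, day]),
    descending: 'true',
    limit: String(limit),
    include_docs: 'true',
  });
  return params.toString();
}

const queryString = dayTopScoresQuery(2017, 1, 1, 10);
console.log(queryString);
```

Going through JSON.stringify and URLSearchParams keeps the keys valid JSON and URL-encoded, which is easy to get wrong when writing the brackets by hand.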
As an aside, avoid emitting the entire doc as the value in your view. It is likely more efficient to use the include_docs=true parameter and leave the value of your emit empty. (Please refer to this SO question for more information.)
With this exact setup, you'd need separate views in order to query by different precisions. For example, to query by month you just use the year/month and so on.
However, if you are willing/able to sort your scores in your application, you can use a single view to get all the date precision you want. For example:
function (doc) {
var d = new Date(doc.date)
emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds(), d.getUTCMilliseconds() ])
}
With this view and the group_level parameter (which only applies when the view has a reduce function, such as the built-in _count), you can group the results by year, month, date, hour, etc. As I mentioned, in this case it won't be sorted by score yet, but maybe this opens up other queries to you. (e.g. what users participated this month?)
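For the "sort in your application" part, a minimal sketch could look like the following. It assumes the rows were fetched with include_docs=true and that each document has a numeric `score` field; `topByScore` and the sample rows are invented for illustration:

```javascript
// Sort view rows by score in the application and keep the best n.
// Assumes each row carries its document (include_docs=true) with a numeric `score`.
function topByScore(rows, n) {
  return rows
    .slice() // copy so the caller's array is not mutated
    .sort((a, b) => b.doc.score - a.doc.score) // highest score first
    .slice(0, n);
}

// Sample rows in the order the view would return them (by date, not by score).
const rows = [
  { id: 'a', key: [2017, 1, 10, 8, 0, 0, 0], doc: { name: 'a', score: 40 } },
  { id: 'b', key: [2017, 1, 11, 9, 30, 0, 0], doc: { name: 'b', score: 12 } },
  { id: 'c', key: [2017, 1, 12, 7, 15, 0, 0], doc: { name: 'c', score: 95 } },
];
const top2 = topByScore(rows, 2);
console.log(top2.map((row) => row.id));
```

This still means pulling every row in the date range over the wire before sorting, so it only makes sense when the range is reasonably small.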
I have a Solr field which has a set of values. Is it possible in Solr to return results that are varied based on that field?
E.g.: My field contains "ValueA", "ValueB" and "ValueC". So if rows is set to 3, then instead of returning all results from "ValueA" it should give me one from each field value (assuming they have the same scores).
You might want to use Result Grouping / Field Collapsing or the CollapsingQParserPlugin.
The CollapsingQParserPlugin is newer (since Solr 4.6), faster and, I guess, more appropriate for your problem, as it does not affect the structure of the results.
Just add this to your solrconfig.xml:
<queryParser name="collapse" class="org.apache.solr.search.CollapsingQParserPlugin"/>
You can then collapse your result by adding the following parameter to your query:
fq={!collapse field=my_field}
or in Solrj:
solrQuery.addFilterQuery("{!collapse field=my_field}");
Collapsing means: For each value in my_field it only retains the document with the highest score in the result set.
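To illustrate what that means for a result list, here is a small in-memory sketch. It only mimics the described behaviour and is not how Solr implements it; `collapseByField` and the sample documents are made up:

```javascript
// Illustration only: mimic what collapsing does to a scored result list.
function collapseByField(docs, field) {
  const best = new Map();
  for (const doc of docs) {
    const current = best.get(doc[field]);
    // Keep only the highest-scoring document seen so far for this field value.
    if (current === undefined || doc.score > current.score) {
      best.set(doc[field], doc);
    }
  }
  // Return the survivors ordered by score, highest first.
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}

const results = [
  { id: 1, my_field: 'ValueA', score: 3.0 },
  { id: 2, my_field: 'ValueA', score: 2.5 },
  { id: 3, my_field: 'ValueB', score: 2.0 },
  { id: 4, my_field: 'ValueC', score: 1.5 },
  { id: 5, my_field: 'ValueB', score: 1.0 },
];
const collapsed = collapseByField(results, 'my_field');
console.log(collapsed.map((doc) => doc.id));
```

With rows=3 the uncollapsed list would be dominated by "ValueA", while the collapsed list has one document per value.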
My goal is to round score to group similar items and then sort by another field (let's use price as an example).
I'm able to accomplish this with the following query:
/select?defType=func&q=rint(product(query({!v=the search term}),100))&fl=score,price&sort=score%20desc,price
However, this query returns every document indexed in Solr.
How can I filter this query so that items with a score of 0 are excluded?
I've tried adding {!frange l=1} to the query which kind of worked... but it made all of the scores equal to 1. This obviously isn't good because I need to show the most relevant results first.
Thanks in advance for any help.
Alex
I spent hours trying to filter out values with a relevance score of 0. I couldn't find any straightforward way to do this. I ended up accomplishing it with a workaround that assigns the query function to a local param. I reference this local param in both the query ("q=") and the filter query ("fq=").
Example
Let's say you have a query like:
q={!func}sum(*your arguments*)
First, make the function component its own parameter:
q={!func}$localParam
&localParam={!func}sum(*your arguments*)
Now, to only return results with scores between 1 and 10, simply add a filter query on that localParam:
q={!func}$localParam
&localParam={!func}sum(*your arguments*)
&fq={!frange l=1 u=10 inclusive=true}$localParam
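If you assemble the request in code, a sketch along these lines keeps the three parameters in sync. The parameter values are copied from the example above; `scoredAndFilteredParams` is a hypothetical helper, and `sum(popularity,1)` stands in for your own function (`popularity` is an invented field name):

```javascript
// Build q, localParam and fq so that the same function drives both
// the score and the filter. Parameter values mirror the example above.
function scoredAndFilteredParams(funcExpr, lower, upper) {
  const params = new URLSearchParams();
  params.set('q', '{!func}$localParam');
  params.set('localParam', '{!func}' + funcExpr);
  params.set('fq', `{!frange l=${lower} u=${upper} inclusive=true}$localParam`);
  return params;
}

// Hypothetical function expression; replace it with your own.
const solrParams = scoredAndFilteredParams('sum(popularity,1)', 1, 10);
console.log('/select?' + solrParams.toString());
```

Because the function is defined once and referenced twice, changing it cannot leave the query and the filter out of step.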
I am using Solr - Lucene 4.0. I am trying to run a query to search a field called Names.
An example of a query would be:
Names:George
When I execute the search with the number of rows to return set to 1000, it returns 1000 results. I expect it to return way fewer than that. The last results aren't similar at all. Is there a way to set a threshold for my results so that it only returns matches of a certain similarity?
Actually, you cannot set a meaningful minimum matching score, because the score is relative and depends on a lot of things (e.g. the overall number of documents and the number of matching terms found).
I do not know your exact case, but you may consider using paging: fetch the results 20 documents at a time, check the score of the last document, and stop once it is lower than a threshold that you specify.
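A rough sketch of that paging loop, under a few assumptions: `fetchPage(start, rows)` is a hypothetical callback you would write around your Solr client, it resolves to an array of `{ id, score }` objects, and results arrive sorted by score descending. The in-memory `fakePage` below only exists so the sketch can run on its own:

```javascript
// Page through results until a document scores below the threshold.
// `fetchPage(start, rows)` is a hypothetical callback resolving to an array
// of { id, score } objects sorted by score, highest first.
async function collectAboveThreshold(fetchPage, threshold, pageSize = 20) {
  const kept = [];
  for (let start = 0; ; start += pageSize) {
    const page = await fetchPage(start, pageSize);
    for (const doc of page) {
      if (doc.score < threshold) return kept; // everything after this scores lower
      kept.push(doc);
    }
    if (page.length < pageSize) return kept; // a short page means no more results
  }
}

// Stand-in for a real search request, for demonstration only.
const fakeIndex = [9.1, 7.4, 3.2, 0.8, 0.5].map((score, i) => ({ id: 'doc' + i, score }));
const fakePage = async (start, rows) => fakeIndex.slice(start, start + rows);

collectAboveThreshold(fakePage, 1.0, 2).then((docs) => {
  console.log(docs.map((doc) => doc.id));
});
```

Since scores are relative, a threshold expressed as a fraction of the first document's score may travel better between queries than a fixed number, but that is a design choice for your application.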
I need to know how Lucene orders the records in a result set if I use composite queries.
It looks like it sorts by "score" for exact queries and lexicographically for range queries. But what if I have a query which looks like this:
q = type:TAG OR type:POST AND date:[111 to 999]
You are mixing together logical search and scoring. When you pass a query like date:[111 to 999], Lucene searches for all documents with the date in the specified range. But you give it no advice on how to sort them: is date 111 preferable to 555? Or is 701 better than 398? Lucene has no idea, so the score is the same for all found documents. Just to impose some order, Lucene sorts the results lexicographically, but that's mostly an implementation detail, not a key idea.
On the other hand, if you pass some other parameters with a query, be it keywords or tags, Lucene can apply its similarity algorithm and assign different scores to different docs in the results. You may find more on Lucene's scoring here.
So, to give you a short answer: Lucene sorts results by score, and only if the score for two documents is the same does it fall back on other sorting options such as lexicographical order.