Recently I have been torn between two schools of thought about cursor design for paging:
1. The cursor contains only its position (such as the last item id or the last created-at timestamp), so the server can serve any combination of cursor and query parameters.
For example:
First query: ?queryParam=X&cursor=, server responds with cursor=C1
Second query: ?queryParam=Y&cursor=C1, server can still handle this query with the new query parameter (even though cursor=C1 was issued for query parameter X)
2. The cursor contains the original query parameters. When a cursor is specified, other query parameters are disregarded. That is, if a request carries an incompatible pair of query parameters and cursor, the server may ignore the query parameters or even respond with an error.
For example:
First query: ?queryParam=X&cursor=, server responds with cursor=C1 which encodes queryParam=X
Second query: ?queryParam=Y&cursor=C1, server extracts queryParam=X from cursor=C1 and ignores queryParam=Y from the request.
So which of the two approaches above is preferred when designing a cursor?
Last time I checked, the Google API (specifically the Gmail API's list messages method) uses the first approach.
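To make the second approach concrete, here is a minimal Python sketch of a cursor that carries its position plus a fingerprint of the query that produced it. The names encode_cursor, decode_cursor and the last_id field are hypothetical and not taken from any real API. A production cursor would normally also be signed or encrypted so clients cannot tamper with it.

```python
import base64
import hashlib
import json


def _fingerprint(query_params: dict) -> str:
    """Stable, order-independent hash of the query parameters."""
    canonical = json.dumps(query_params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def encode_cursor(last_id: str, query_params: dict) -> str:
    """Pack the position and the query fingerprint into an opaque token."""
    payload = {"last_id": last_id, "q": _fingerprint(query_params)}
    raw = json.dumps(payload).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str, query_params: dict) -> str:
    """Return the stored position, or raise if the query has changed."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    if payload["q"] != _fingerprint(query_params):
        raise ValueError("cursor does not match the query parameters")
    return payload["last_id"]
```

This sketch rejects a mismatched pair with an error instead of silently ignoring the new query parameters. Which of the two behaviours is friendlier to clients is a design choice.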
I want to fetch all documents in a CouchDB database where the document ID starts with a given prefix.
I did some searching and found that, according to the CouchDB documentation, the best way to accomplish this is by using a startkey and endkey, where the startkey is the prefix, and the endkey is the prefix with a high-value Unicode character appended at the end.
So, as I understand it, a call to "http://server:5984/some_db/_all_docs?startkey=2018&endkey=2018\ufff0&include_docs=true" should fetch all docs from some_db with an ID starting with '2018'.
That url is being encoded by the web browser as follows:
http://server:5984/some_db/_all_docs?startkey=2018&endkey=2018%EF%BF%B0&include_docs=true
And the response I get back is {"error":"bad_request","reason":"invalid UTF-8 JSON"}
So I tried just sticking to pure ASCII and used ~ instead of \ufff0. Same response. Also got the same response using a z.
If I do something like _all_docs?startkey=2018&endkey=2019&include_docs=true&inclusive_end=false everything works fine and I get the expected results. However, I can't guarantee the prefix will always be a number, and I get the impression that implementing it like that programmatically will cause me issues somewhere or somehow. Any thoughts?
I'm using Dart running in the web browser to make the request, if it makes a difference.
Update
So, I've realized in actuality _all_docs does not support the endkey and startkey parameters. The request I originally thought was working was actually just returning the entire database.
I had assumed _all_docs supports startkey and endkey because I have used PouchDB in the past, which does support those parameters in the allDocs() function.
Still looking for a solution, since this project is not using PouchDB, but at least now I know what the problem is.
Update 2
The previous update was wrong. Although the documentation of _all_docs doesn't list these parameters, there is a note, which I had missed, indicating that it also supports the parameters for views; see my answer below.
Okay, I figured it out.
I was wrong in my update, startkey and endkey are supported by _all_docs because it's just a built-in view, so all the parameters for views apply. However, it expects the passed values to be JSON values, not just a bare string as a key. The solution is just to put quotation marks around the keys.
That is, encoded quotation marks, e.g. startkey=%222018%22&endkey=%222018%EF%BF%B0%22
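As a rough illustration of the fix, the following Python sketch builds the same query string by JSON-encoding each key before URL-encoding it. The helper name prefix_query is made up for this example; the host and database name are the ones from the question.

```python
import json
from urllib.parse import urlencode


def prefix_query(prefix: str) -> str:
    """Build the query string for an _all_docs prefix scan."""
    params = {
        # Keys must be JSON values, so json.dumps adds the quotation marks.
        "startkey": json.dumps(prefix, ensure_ascii=False),
        "endkey": json.dumps(prefix + "\ufff0", ensure_ascii=False),
        "include_docs": "true",
    }
    # urlencode percent-encodes the quotes and the UTF-8 bytes of \ufff0.
    return urlencode(params)


query = prefix_query("2018")
url = "http://server:5984/some_db/_all_docs?" + query
```

Letting a JSON encoder produce the key also handles prefixes that contain quotes or backslashes, which hand-written quotation marks would not.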
I have a certain query that I am using to get results that correspond to a particular search:
response = gmail_service.users().messages().list(userId=user_id, q='from:"digital-no-reply@amazon.com"', pageToken='').execute()
To get the next page of results, is this the right query:
response = gmail_service.users().messages().list(userId=user_id, q='from:"digital-no-reply@amazon.com"', pageToken=next_page_token).execute()
I tried not giving the query param, thinking that the next_page_token should contain a reference to the query that generated the previous page, but the results I got did not come from the query parameter. Hence wondering what is the correct way of getting all pages of results corresponding to the query?
Your suspicion is correct. Just supply the same query on your next page fetch and repeat until there is no nextPageToken in the response. Then you know you have gotten all the results of that particular query.
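A minimal sketch of that loop in Python, assuming service is an already authorized Gmail API client like gmail_service in the question. The function name list_all_messages is hypothetical.

```python
def list_all_messages(service, user_id, query):
    """Collect every message matching `query`, following page tokens."""
    messages = []
    page_token = None
    while True:
        # Send the same query on every request; only the token changes.
        kwargs = {"userId": user_id, "q": query}
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # "messages" can be absent when a page has no results.
        messages.extend(response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return messages
```

It could be called as list_all_messages(gmail_service, user_id, 'from:"digital-no-reply@amazon.com"').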
So I have been playing with paginating and am trying to resolve an issue where a result on page 64 will sometimes contain a hit on page 65.
If I execute this query
http://host:9200/index/_search?q=field:searchterm&size=1&from=100
I discover that every second query result is identical.
But if the pagination parameter has a lower value, all results are identical.
I've played with sorting, but the behavior is consistent.
Try adding a preference param to the request parameter.
I'm guessing this could be due to the bouncing result issue.
To keep load balancing, you could use the preference parameter with a custom string, such as the username, for the initial request.
Use the same custom string for subsequent pagination requests.
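As a sketch of the suggestion above, the following Python helper adds the same preference value to every page request for one user. The function name search_url and the session_key argument are made up for illustration; host, index and query come from the question.

```python
from urllib.parse import urlencode


def search_url(base, query, offset, size, session_key):
    """Build a _search URL that sends one user to the same shard copies."""
    params = {
        "q": query,
        "size": size,
        "from": offset,
        # Any stable custom string works, e.g. a username or session id.
        "preference": session_key,
    }
    return base + "/_search?" + urlencode(params)


page_a = search_url("http://host:9200/index", "field:searchterm", 100, 1, "alice")
page_b = search_url("http://host:9200/index", "field:searchterm", 101, 1, "alice")
```

Both URLs carry preference=alice, so consecutive pages for that user should be answered by the same shard copies.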
I want to create pagination on application level using the CouchDB view API. The pagination uses cursors, so given a cursor, I will query the view for the n+1 documents starting with the given cursor as start key and output the n results as page and provide the n+1 result row as the cursor for the next page.
This works well as long as the view keys are also the keys for my view rows. Now this time all my docs have a date field and I emit them as map keys, because I want to sort via date. However, I can't use my cursors anymore like before.
I thought that was the reason the view API also provides startkey_docid for submitting such a cursor doc id, but this is obviously not the case. It seems this value is only applied if there are several rows with the same key.
So, in short: I want a date-ordered view, but cursors based on the document ids. How can I do this?
Thanks in advance
Simplified view
function map(doc)
{
    emit(doc.date, {_id: doc._id});
}
Simplified view result:
{
    "rows":[
        {"id":"123","key":"2010-06-26T01:28:13.555Z", "value":{...}},
        {"id":"234","key":"2010-06-22T12:21:23.123Z", "value":{...}},
        {"id":"987","key":"2010-06-16T13:48:43.321Z", "value":{...}},
        {"id":"103","key":"2010-05-01T17:38:31.123Z", "value":{...}},
        {"id":"645","key":"2009-07-21T21:21:13.345Z", "value":{...}}
    ]
}
Application-level query with cursor 234, page size 3 should return:
234, 987, 103
So how can I map this to a view?
Why do you want cursors based on docid?
Map Reduce creates single dimensional indexes, so any non-key traversal will be expensive. However, I think you can do what you want without requiring traversing 2 indexes at the same time.
See for instance here how I paginate through posts with a certain tag:
Sofa's CouchApp tag pagination
aka
http://jchris.couchone.com/sofa/_design/sofa/_list/index/tags?descending=true&reduce=false&limit=10&startkey=[%22life%22%2C{}]&endkey=[%22life%22]
The key in that view looks like ["tag","2008/10/25 04:49:10 +0000"] so you can paginate through by tag and, within tags, by time.
Edited
Ha! I just realized what you are trying to do. It is so very simple.
Forget all about docids, they should be random anyway and not related to anything so just forget docs even have ids for a second.
You say "Application-level query with cursor 234, page size 3 should return:
234, 987, 103"
Your cursor should not be 234. It should be the key "2010-06-22T12:21:23.123Z".
So in essence you use the key of the last row of results as the startkey for the next query, e.g. startkey="2010-06-22T12:21:23.123Z"&limit=3. Then for each page you render, link to a query where the new startkey is the last returned key.
Bonus: with what I've just described, you will have the bottom row of page 2 be the top row of page 3. To fix this, add skip=1 to your query.
Bonus bonus: OK, what about when I have more than 3 docs that emitted to the same key in the view? Then the last key will always be the same as the first key, so you can't progress in pagination without expanding the limit parameter. Unless... you use startkey_docid (and set it to the id of the last row). That is the only time you should use startkey_docid.
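Putting the pieces of this answer together, a rough Python sketch of the parameters for the next page might look like this. The helper name next_page_params is made up. The descending flag is an assumption based on the newest-first order of the sample result in the question.

```python
import json


def next_page_params(last_row, page_size, descending=True):
    """Query parameters for the page that follows `last_row`."""
    return {
        # View keys are JSON values, so the date string needs quotes.
        "startkey": json.dumps(last_row["key"]),
        # Disambiguates rows sharing one key; a plain string, not JSON.
        "startkey_docid": last_row["id"],
        # Skip the row already shown at the bottom of the previous page.
        "skip": 1,
        "limit": page_size,
        "descending": "true" if descending else "false",
    }


last_row = {"id": "234", "key": "2010-06-22T12:21:23.123Z"}
params = next_page_params(last_row, 3)
```

Sending startkey_docid on every request is harmless when keys are unique, and it keeps pagination moving when many docs share one key.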
I use solr to search for documents and when trying to search for documents using this query "id:*", I get this query parser exception telling that it cannot parse the query with * or ? as the first character.
HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
type Status report
message org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
description The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery).
Is there any patch for getting this to work with just * ? Or is it very costly to do such a query?
If you want all documents, do a query on *:*
If you want all documents with a certain field (e.g. id) try id:[* TO *]
Lucene doesn't allow you to start WildcardQueries with an asterisk by default, because those are incredibly expensive queries and will be very, very, very slow on large indexes.
If you're using the Lucene QueryParser, call setAllowLeadingWildcard(true) on it to enable it.
If you want all of the documents with a certain field set, you are much better off querying or walking the index programmatically than using QueryParser. You should really only use QueryParser to parse user input.
id:[a* TO z*] id:[0* TO 9*] etc.
I just did this in lukeall on my index and it worked, therefore it should work in Solr which uses the standard query parser. I don't actually use Solr.
In base Lucene there's a fine reason for why you'd never query for every document, it's because to query for a document you must use a new indexReader("DirectoryName") and apply a query to it. Therefore you could totally skip applying a query to it and use the indexReader methods numDocs() to get a count of all the documents, and document(int n) to retrieve any of the documents.
If you are just trying to get all documents, Solr does support the *:* query. It's the only time I know of that Solr will let you begin a query with an *. I'm sure you've probably seen this as the default query in the Solr admin page.
If you are trying to do a more specific query with an * as the first character, like say id:*456, then one of the best ways I've seen is to index that field twice: once normally (field name: id), and once with all the characters reversed (field name: reverse_id). Then you could essentially do the query id:*456 by sending the query reverse_id:654* instead. Hope that makes sense.
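The query rewrite behind the reversed-field trick can be sketched in a few lines of Python. The function name reversed_field_query is hypothetical, and the reverse_ prefix follows the field name used above. The reversed suffix becomes a prefix, so the wildcard ends up at the end of the term, where it is allowed.

```python
def reversed_field_query(field, pattern):
    """Rewrite a leading-wildcard query against a reversed copy of the field."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("expected a pattern with a single leading '*'")
    suffix = pattern[1:]
    # Reverse the suffix and move the wildcard to the end of the term.
    return f"reverse_{field}:{suffix[::-1]}*"


rewritten = reversed_field_query("id", "*456")
```

This only helps if reverse_id was populated with the reversed value at index time.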
You can also search the Solr user group mailing list at http://www.mail-archive.com/solr-user@lucene.apache.org/ where questions like this come up quite often.
The following Solr issue is a request to be able to configure the default lucene query parser.
https://issues.apache.org/jira/browse/SOLR-218
In this issue you can find the following description how to 'patch' Solr. This modification would allow you to start queries with a *.
Jonas Salk: I've basically updated only one Java file: SolrQueryParser.java.
public SolrQueryParser(IndexSchema schema, String defaultField) {
    ...
    setAllowLeadingWildcard(true);
    setLowercaseExpandedTerms(true);
    ...
}
...
public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
    ...
    setAllowLeadingWildcard(true);
    setLowercaseExpandedTerms(true);
    ...
}
I'm not sure if setLowercaseExpandedTerms is needed...
I'm assuming with id:* you're just trying to match all documents, right?
I've never used solr before, but in my Lucene experience, when ingesting data, we've added a hidden field to every document, then when we need to return every record we do a search for the string constant in that field that's the same for every record.
If you can't add a field like that in your situation, you could use a RegexQuery with a regex that would match anything that could be found in the id field.
Edit: actually answering the question. I've never heard of a patch to get that to work, but I would be surprised if it could even be made to work reasonably well. See this question for a reason why unconstrained PrefixQuery's can cause a problem.
Actually, I have been using a workaround for this. I prepend a character to the id, e.g. A1, A2, etc.
With such values in the field, it is possible to search using the query id:A*
But I would love to find out whether a true solution exists.