CouchDB fetch all documents where ID starts with prefix

I want to fetch all documents in a CouchDB database where the document ID starts with a given prefix.
After some searching I found that, according to the CouchDB documentation, the best way to accomplish this is to use a startkey and an endkey, where the startkey is the prefix and the endkey is the prefix with a high-value Unicode character appended.
So, as I understand it, a call to "http://server:5984/some_db/_all_docs?startkey=2018&endkey=2018\ufff0&include_docs=true" should fetch all docs from some_db with an ID starting with '2018'.
That url is being encoded by the web browser as follows:
http://server:5984/some_db/_all_docs?startkey=2018&endkey=2018%EF%BF%B0&include_docs=true
And the response I get back is {"error":"bad_request","reason":"invalid UTF-8 JSON"}
So I tried just sticking to pure ASCII and used ~ instead of \ufff0. Same response. Also got the same response using a z.
If I do something like _all_docs?startkey=2018&endkey=2019&include_docs=true&inclusive_end=false everything works fine and I get the expected results. However, I can't guarantee the prefix will always be a number, and I get the impression that implementing it like that programmatically will cause me issues somewhere or somehow. Any thoughts?
I'm using Dart running in the web browser to make the request, if it makes a difference.
Update
So, I've realized in actuality _all_docs does not support the endkey and startkey parameters. The request I originally thought was working was actually just returning the entire database.
I had assumed _all_docs supports startkey and endkey because I have used PouchDB in the past, which does support those parameters in the allDocs() function.
Still looking for a solution, since this project is not using PouchDB, but at least now I know what the problem is.
Update 2
The previous update was wrong. Although the documentation of _all_docs doesn't list these parameters, there is a note, which I had missed, indicating that it also supports the parameters for views. See my answer below.

Okay, I figured it out.
I was wrong in my update, startkey and endkey are supported by _all_docs because it's just a built-in view, so all the parameters for views apply. However, it expects the passed values to be JSON values, not just a bare string as a key. The solution is just to put quotation marks around the keys.
That is, encoded quotation marks, e.g. startkey=%222018%22&endkey=%222018%EF%BF%B0%22
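As a rough illustration of the fix above, the following Python sketch builds the same request URL by JSON-encoding each key before percent-encoding it. The helper name prefix_query_url is made up for this example; the server and database names are the ones from the question.

```python
import json
from urllib.parse import urlencode

def prefix_query_url(base_url: str, db: str, prefix: str) -> str:
    """Build an _all_docs URL selecting IDs that start with `prefix` (hypothetical helper)."""
    params = {
        # json.dumps adds the surrounding quotation marks that CouchDB expects.
        # ensure_ascii=False keeps \ufff0 as a real character, so urlencode
        # percent-encodes its UTF-8 bytes (%EF%BF%B0).
        "startkey": json.dumps(prefix, ensure_ascii=False),
        "endkey": json.dumps(prefix + "\ufff0", ensure_ascii=False),
        "include_docs": "true",
    }
    return f"{base_url}/{db}/_all_docs?{urlencode(params)}"

url = prefix_query_url("http://server:5984", "some_db", "2018")
print(url)
```

Because the keys go through a JSON encoder, the same helper should also cope with prefixes that are not numeric or that contain characters needing escapes.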

Related

Converting the logic for cts:search to search:search

I have been working with cts:search in my project, but the queries seem to take a bit longer than expected. Can search:search help? If so, how?
For example I have the query as
let $query := cts:and-query((cts:element-value-query(xs:QName("Applicability"),"Yes")))
and I want to fetch the document URIs. I was using:
cts:search(collection("abc"), $query)
and it returned the URIs, but how can this be extracted using search:search?
Or is there something other than search:search that can help improve the execution time?
Are you interested in retrieving the documents, or just the URIs?
If you only want the URIs of the documents that have an element with that value, use cts:uris() instead of cts:search(). The cts:uris() function runs unfiltered and returns URIs straight from the URI lexicon instead of retrieving the documents themselves. Retrieving the documents, as cts:search does, can be a lot more expensive when you don't need the content.
cts:uris("", (), cts:and-query((cts:collection-query("abc"), $query)))
When using cts:search, the first thing that I would try is to add the unfiltered option to your search and see if that helps.
By default cts:search executes filtered:
A filtered search (the default). Filtered searches eliminate any false-positive matches and properly resolve cases where there are multiple candidate matches within the same fragment. Filtered search results fully satisfy the specified cts:query.
So, try executing the same query with the "unfiltered" option:
cts:search(collection("abc"), $query, "unfiltered")
You could also look to create an index on that Applicability element, with either an element-range-index or a field-range-index, and then use the appropriate range-query instead of a value-query.

Difference between operators in a URL

What's the difference between using : and ? in a URL, for example /products/:id versus /products?id=1? I am trying to get the value from the URL like this: Product.findById(req.params.id), and I was wondering which form is more suitable. I know that with : I have to use req.params and with ? I have to use req.query, but I don't understand the difference between them. Are they the same?
In my view, the two are quite different if you are following a RESTful API pattern.
/products/:id uses what are called path parameters.
The path parameters determine the resource you’re requesting. Think of it like an automatic answering machine that asks you to press 1 for one service, press 2 for another service, 3 for yet another service and so on.
Path parameters are part of the endpoint itself and are not optional.
Query parameters, as in /products?id=1, are different.
Technically, query parameters are not part of the REST architecture, but they are commonly used. Query parameters give you the option to modify your request with key-value pairs.
Parameters in the query string are conceptually optional to the router. Query parameters are more like properties and descriptions of the request itself. Take GET /users?sort=asc: here the sort attribute describes the request, and the request could complete the fetch without it. That might not always be the case, but a query parameter still describes its request even when it is mandatory.
URL (path) parameters, on the other hand, are part of the request itself, and the URL doesn't make sense without them. Take GET /users/:userID: not supplying userID will return unexpected data (a list of users, for example) if it doesn't break the router completely. URL parameters play a part in defining the request rather than just describing it, and they can't be optional.
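To make the distinction concrete, here is a small Python sketch of how a router might treat the two kinds of parameter. The match_route function is invented for this illustration and is not how Express or any particular framework is implemented; it only shows that path parameters decide whether a route matches at all, while query parameters are read afterwards as optional extras.

```python
import re
from urllib.parse import urlsplit, parse_qs

def match_route(pattern: str, url: str):
    """Return (path_params, query_params) if the URL path matches, else None."""
    parts = urlsplit(url)
    # Turn "/products/:id" into the regex "/products/(?P<id>[^/]+)".
    regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
    m = re.fullmatch(regex, parts.path)
    if m is None:
        return None  # a missing path parameter means the route does not match
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return m.groupdict(), query

print(match_route("/products/:id", "/products/42"))    # id arrives as a path parameter
print(match_route("/products", "/products?id=1"))      # id arrives as a query parameter
print(match_route("/products/:id", "/products"))       # no match without the id
```

In Express terms, the first dictionary corresponds roughly to req.params and the second to req.query.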

sails blueprints query in url not working

I have various GET http calls to my api with the following format:
/api/posts?userId=3
However, it is not filtering posts by its userId column, and just returns all posts, regardless of the posts' userId.
This syntax has worked in past projects I've had also, and is documented here. (The example they give is GET /purchase?amount=99.99).
The questions I've seen do not address the query language via default routes in this way, so I'm having trouble finding help. Any guesses on what could be going wrong?
UPDATE:
What does work as expected
req.query is getting set and read by policies (eg, ?userId=3 is found by req.param("userId"))
/api/posts?userId=3&populate=userId populates the userId field, (but still returns all posts for all users)
filtering by primary key (eg ?id=5) filters and returns only one record as expected
When working in the api, or in sails console, (eg Posts.find({userId: 3})) works
What does not work as expected
filtering by foreign keys (eg ?userId=6)
filtering by non-foreign keys (eg ?name=test)
filtering using where (eg &where={"userId":1})
It turns out I was adding a "where" clause to every query in a policy (e.g. where: {"status": "active"}). Having a "where" in the query string automatically overrides the other params, so nothing else was getting seen. (To be precise, if you have a where clause, other criteria never get seen.)
For some reason, the "where" was also not working to set the search criteria in Sails 0.12.13, so I ended up hacking the parseCriteria function in actionUtils in Sails, replacing it with the one from v1, and that worked for me.
Hope that helps anyone in the future
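The override behaviour described above can be modelled with a short Python sketch. This is a simplified model written for illustration, not the actual Sails source: the parse_criteria function and the RESERVED set are assumptions, chosen only to show why a where clause injected by a policy hides ?userId=3.

```python
import json

# Hypothetical set of parameters treated as options rather than filters.
RESERVED = {"limit", "skip", "sort", "populate", "where"}

def parse_criteria(params: dict) -> dict:
    """Simplified model: an explicit `where` wins over all other params."""
    where = params.get("where")
    if isinstance(where, str):
        where = json.loads(where)  # e.g. ?where={"userId":1}
    if where:
        return dict(where)  # the other criteria are never looked at
    return {k: v for k, v in params.items() if k not in RESERVED}

print(parse_criteria({"userId": "3"}))
print(parse_criteria({"userId": "3", "where": {"status": "active"}}))
```

Under this model, one way for a policy to avoid the problem would be to merge its extra condition into the existing criteria instead of setting where on its own.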

PouchDB get documents by ID with certain string in them

I would like to get all documents whose IDs contain a certain string, but I can't seem to find a solution for it.
for example I have the following doc ids
vw_10
vw_11
bmw_12
vw_13
bmw_14
volvo_15
vw_16
How can I get allDocs with the string vw_ in the ID?
Use batch fetch API:
db.allDocs({startkey: "vw_", endkey: "vw_\ufff0"})
Note: \ufff0 is a very high Unicode character, used here as a sentinel to mark the upper end of the range of ordered strings that start with the prefix.
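The range trick can be checked outside the database with a short Python sketch over the document IDs from the question. It assumes plain code-point ordering of strings, which is a simplification of the collation a real database applies, and ids_with_prefix is a name made up for this example.

```python
from bisect import bisect_left, bisect_right

def ids_with_prefix(sorted_ids, prefix):
    """Return the slice of sorted_ids between prefix and prefix + sentinel."""
    lo = bisect_left(sorted_ids, prefix)                # plays the role of startkey
    hi = bisect_right(sorted_ids, prefix + "\ufff0")    # plays the role of endkey
    return sorted_ids[lo:hi]

ids = sorted(["vw_10", "vw_11", "bmw_12", "vw_13", "bmw_14", "volvo_15", "vw_16"])
print(ids_with_prefix(ids, "vw_"))  # ['vw_10', 'vw_11', 'vw_13', 'vw_16']
```

Note that this only finds IDs that start with the string. An ID that merely contains vw_ somewhere in the middle would not fall inside the range.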
You can use the PouchDB find plugin API, which in my opinion is far more sophisticated than allDocs for querying. With the PouchDB find plugin there is a regex search operator that lets you do exactly this.
db.find({selector: {_id: {$regex: 'vw_'}}});
It's in beta at the time of writing, but we are about to ship a production app with it. That's how stable it has been so far. See https://github.com/nolanlawson/pouchdb-find for more on PouchDB Find.
You had better have a view with the key you want to search on. This ensures that the key is indexed. Otherwise, the search might be too slow.

WildcardQuery error in Solr

I use Solr to search for documents, and when I search with the query "id:*" I get a query parser exception saying that it cannot parse a query with * or ? as the first character.
HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
type Status report
message org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
description The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery).
Is there any patch for getting this to work with just * ? Or is it very costly to do such a query?
If you want all documents, do a query on *:*
If you want all documents with a certain field (e.g. id) try id:[* TO *]
Lucene doesn't allow you to start WildcardQueries with an asterisk by default, because those are incredibly expensive queries and will be very, very, very slow on large indexes.
If you're using the Lucene QueryParser, call setAllowLeadingWildcard(true) on it to enable it.
If you want all of the documents with a certain field set, you are much better off querying or walking the index programmatically than using QueryParser. You should really only use QueryParser to parse user input.
id:[a* TO z*] id:[0* TO 9*] etc.
I just did this in lukeall on my index and it worked, therefore it should work in Solr which uses the standard query parser. I don't actually use Solr.
In base Lucene there's a good reason why you'd never need to query for every document: to query for a document you must use a new indexReader("DirectoryName") and apply a query to it. You can therefore skip applying a query altogether and use the indexReader methods numDocs() to get a count of all the documents and document(int n) to retrieve any of the documents.
If you are just trying to get all documents, Solr does support the *:* query. It's the only time I know of that Solr will let you begin a query with an *. I'm sure you've probably seen this as the default query in the Solr admin page.
If you are trying to do a more specific query with an * as the first character, say id:*456, then one of the best ways I've seen is to index that field twice: once normally (field name: id), and once with all the characters reversed (field name: reverse_id). Then you can essentially do the query id:*456 by sending the query reverse_id:654* instead. Hope that makes sense.
You can also search the Solr user group mailing list at http://www.mail-archive.com/solr-user@lucene.apache.org/ where questions like this come up quite often.
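The reversed-field idea can be sketched in a few lines of Python. This is only an in-memory illustration of the technique, not Solr or Lucene code, and the function names and sample IDs are invented for the example: a leading-wildcard (suffix) search on id becomes a cheap prefix scan over the reversed values.

```python
from bisect import bisect_left

def build_reverse_index(ids):
    """Sorted list of (reversed id, original id), standing in for reverse_id."""
    return sorted((doc_id[::-1], doc_id) for doc_id in ids)

def ids_ending_with(reverse_index, suffix):
    """Answer id:*suffix as a prefix scan for the reversed suffix."""
    prefix = suffix[::-1]
    # A one-element tuple sorts before every (prefix..., id) pair sharing that prefix.
    start = bisect_left(reverse_index, (prefix,))
    matches = []
    for reversed_id, doc_id in reverse_index[start:]:
        if not reversed_id.startswith(prefix):
            break  # sorted order means no later entry can match
        matches.append(doc_id)
    return matches

index = build_reverse_index(["123456", "999456", "456789", "000"])
print(ids_ending_with(index, "456"))  # ['123456', '999456']
```

The cost is a second copy of the field in the index, which is the trade-off the answer above accepts in exchange for avoiding a scan of every term.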
The following Solr issue is a request to be able to configure the default lucene query parser.
https://issues.apache.org/jira/browse/SOLR-218
In this issue you can find the following description how to 'patch' Solr. This modification would allow you to start queries with a *.
Jonas Salk: I've basically updated only one Java file: SolrQueryParser.java.
public SolrQueryParser(IndexSchema schema, String defaultField) {
...
setAllowLeadingWildcard(true);
setLowercaseExpandedTerms(true);
...
}
...
public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
...
setAllowLeadingWildcard(true);
setLowercaseExpandedTerms(true);
...
}
I'm not sure if setLowercaseExpandedTerms is needed...
I'm assuming with id:* you're just trying to match all documents, right?
I've never used Solr before, but in my Lucene experience we added a hidden field to every document when ingesting data. The field holds a string constant that is the same for every record, so when we need to return every record we search for that constant in that field.
If you can't add a field like that in your situation, you could use a RegexQuery with a regex that would match anything that could be found in the id field.
Edit: to actually answer the question, I've never heard of a patch to get that to work, and I would be surprised if it could be made to work reasonably well. See this question for a reason why unconstrained PrefixQuery instances can cause a problem.
Actually, I have been using a workaround for this. I append a character to the id, eg: A1, A2, etc.
With such values in the field, it is possible to search using the query id:A*
But would love to find whether a true solution exists.
