Does it always return the document list in the same order when using the query method in CouchDB - couchdb

I use CouchDB to store records as JSON documents.
My question is: does it always return the document list in the same order?

By default, any view endpoint will return the documents stored in a database in ascending key order. This includes the _all_docs endpoint, meaning that your documents are sorted by _id by default for that endpoint.
If you do not require the documents to be sorted, you can set the sorted parameter to false to improve performance, but in that case no order is guaranteed.
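
As a minimal sketch of how the sorted parameter might be passed, the snippet below only builds the request URL for a view and does not contact a server. The host, database, design document, and view names are hypothetical placeholders.

```python
from urllib.parse import urlencode

def view_url(base_url: str, db: str, ddoc: str, view: str, sorted_rows: bool = True) -> str:
    """Build a CouchDB view URL; sorted=false is added only when order does not matter."""
    url = f"{base_url}/{db}/_design/{ddoc}/_view/{view}"
    if not sorted_rows:
        # The parameter value is sent as the literal text "false".
        url += "?" + urlencode({"sorted": "false"})
    return url

# Hypothetical names, for illustration only.
print(view_url("http://localhost:5984", "records", "app", "by_key"))
print(view_url("http://localhost:5984", "records", "app", "by_key", sorted_rows=False))
```

With the default call the rows come back in ascending key order; with sorted_rows=False the caller should not rely on any order.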

Related

Firebase Firestore query by subcollection length

I am trying to query my Firestore collection (in Node.js for my Flutter app) to get the 10 documents that have the most objects in their subcollection called Purchases (to get the best sellers).
Is it possible in Firestore? Or do I have to keep an int field outside of my subcollection to represent its length?
Thank you!
I thought this was answered recently, but can't find it right now, so...
Firestore queries (and other read operations) work on a single collection, or a group of collections with the same name. They don't consider any data in other (nested or otherwise) collections, nor can they query based on aggregates (such as the number of documents), unless those aggregates are stored in a document in the queried collection.
So the solution is indeed to keep a counter in a document in the collection you are querying against, and updating that counter with every add/delete to the subcollection.
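
The counter idea can be illustrated with a small in-memory sketch. This is not Firestore code: the Shop class and its method names are hypothetical stand-ins for a parent collection whose documents carry a purchase counter next to the Purchases subcollection. In a real application the counter update would presumably happen atomically together with the subcollection write.

```python
from collections import defaultdict

class Shop:
    """In-memory stand-in for products that each have a Purchases subcollection."""

    def __init__(self):
        self.purchases = defaultdict(list)      # product_id -> purchase records
        self.purchase_count = defaultdict(int)  # counter kept on the parent document

    def add_purchase(self, product_id, purchase):
        # Every add to the subcollection also bumps the counter.
        self.purchases[product_id].append(purchase)
        self.purchase_count[product_id] += 1

    def delete_purchase(self, product_id, purchase):
        # Every delete from the subcollection also decrements the counter.
        self.purchases[product_id].remove(purchase)
        self.purchase_count[product_id] -= 1

    def best_sellers(self, limit=10):
        # Equivalent to ordering the parent collection by the counter, descending.
        ranked = sorted(self.purchase_count.items(), key=lambda kv: kv[1], reverse=True)
        return [product_id for product_id, _ in ranked[:limit]]

shop = Shop()
for product_id, n in [("tea", 3), ("mug", 5), ("pen", 1)]:
    for i in range(n):
        shop.add_purchase(product_id, {"order": i})
print(shop.best_sellers(limit=2))
```

Because the count lives on the document being queried, the "top 10" query only needs to sort and limit on a single field.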

When retrieving documents from a Cloudant database via the web interface, in what order do they arrive?

I want to retrieve a large collection of documents from Cloudant via the Web interface, with no sort directive, in 200-document pages (using bookmarks). If I retrieve all of the pages, then retrieve them all again, will the documents in the two collections be in the same order?
For views, native UTF-8 sorting is applied to the key defined in the view function.
For search indexes, the default sort order is by relevance, with the highest-scoring matches first.
As long as nothing changes that would affect the default sort order, subsequent retrievals should be in the same order.

Get the last documents?

CouchDB has a special _all_docs view, which returns documents sorted by ID. But as IDs are random by default, the sorting makes no sense.
I always need to sort by 'date added'. Now I have two options:
Generate my own IDs and make sure they start with a timestamp
Use standard GUIDs, but add a timestamp to the JSON and sort on that
Now the second solution is less hackish, but I suspect the first solution to be much more efficient and faster, because all queries will be done on the real row id, which is indexed.
Is it true that both solutions differ in performance? And if it's true, which one is likely to be faster or preferred?
Is it true that both solutions differ in performance?
Your two examples describe the primary and secondary index approaches in CouchDB.
_all_docs is the only primary index and is always up to date. Secondary indexes (views), as in your second solution, are updated when they are requested.
That's the reason why, from the requester's point of view, _all_docs might be "faster". In practice there is no difference when requesting indexes that are already up to date. Two workarounds for potentially outdated views (secondary indexes) are the query parameter stale=ok, which returns the current index contents without waiting for an update (stale=update_after additionally triggers the update after the response has been sent), and so-called "view heaters" (send a simple HTTP GET to the view to trigger the update process).
And if it's true, which one is [...] preferred?
The capabilities for building a useful index and response payload are significantly higher on the side of secondary indexes.
If you want to use the primary index, you have to "design" your ID as you have described. As you can imagine, that is a big up-front decision that constrains what else can be done with the document and its ID.
My recommendation would be to use secondary indexes (views). Consider the primary index only if you need to read data immediately after it is stored, or in high-concurrency scenarios.
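
For readers who still want to try the first option (IDs that start with a timestamp), here is a minimal sketch of one way to build such an ID. The function name make_doc_id and the exact ID layout are assumptions for illustration; the only requirement is a fixed-width timestamp prefix, so that sorting by ID is the same as sorting by creation time.

```python
import uuid
from datetime import datetime, timezone

def make_doc_id(created_at: datetime) -> str:
    """Return an ID whose lexicographic order follows creation time."""
    # Fixed-width UTC timestamp: string comparison then matches chronological order.
    stamp = created_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    # Random suffix keeps IDs unique when two documents share a timestamp.
    return f"{stamp}-{uuid.uuid4().hex}"

older = make_doc_id(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
newer = make_doc_id(datetime(2024, 1, 2, 8, 30, tzinfo=timezone.utc))
print(sorted([newer, older]) == [older, newer])
```

Reading _all_docs in descending order would then return the most recently created documents first, at the cost of committing the ID format to this one use.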

Range-based, chronological pagination queries across multiple collections with MongoDB?

Is there an efficient way to do a range-based query across multiple collections, sorted by an index on timestamps? I basically need to pull in the latest 30 documents from 3 collections and the obvious way would be to query each of the collections for the latest 30 docs and then filter and merge the result. However that's somewhat inefficient.
Even if I were to select only the timestamp field in the query and then do a second batch of queries for the latest 30 docs, I'm not sure that would be a better approach. That would be 90 documents (whole or single field) per pagination request.
Essentially the client can be subscribed to articles and each category of article differs by 0 - 2 fields. I just picked 3 since that is the average number of articles that users are subscribed to so far in the beta. Because of the possible field differences, I didn't think it would be very consistent to put all of the articles of different types in a single collection.
MongoDB operations operate on one and only one collection at a time. Thus you need to structure your schema with collections that match your query needs.
Option A: Get Ids from supporting collection, load full docs, sort in memory
You would need a supporting collection that combines the IDs, main collection names, and timestamps of the 3 collections. Query that to get your 30 ID/collection pairs, then load the corresponding full documents with 3 additional queries (1 to each main collection). Remember that those won't come back in the correct combined order, so you need to sort that page of results manually in memory before returning it to your client.
{
_id: ObjectId,
updated: Date,
type: String
}
This approach lets MongoDB do the pagination for you.
Option B: 3 Queries, Union, Sort, Limit
Or as you said load 30 documents from each collection, sort the union set in memory, drop the extra 60, and return the combined result. This avoids the extra collection overhead and synchronization maintenance.
So I would think your current approach (Option B as I call it) is the lesser of those 2 not-so-great options.
If your query is really to get the most recent articles based on a selection of categories, then I'd suggest you:
A) Store all of the documents in a single collection so you can use a single query to fetch a combined, paged result. Otherwise, unless you have a very consistent date range across collections, you'll need to track date ranges and counts so that you can reasonably fetch a set of documents that can be merged: 30 from one collection may all be older than every document from another. With a single collection, you can add an index on timestamp and category and then limit the results.
B) Cache everything aggressively so that you rarely need to do the merges
You could use the same idea I explained here, although this post is related to MongoDB text search it applies to any kind of query
MongoDB Index optimization when using text-search in the aggregation framework
The idea is to query all your collections ordering them by date and id, then sort/mix the results in order to return the first page. Subsequent pages are retrieved by using last document's date and id from the previous page.
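
The merge-and-paginate idea can be sketched without a database. In this hypothetical example each collection is a plain list of dicts already sorted newest first, standing in for one sorted, limited query per collection. The function name latest_page and the field names updated and _id are assumptions.

```python
import heapq
from itertools import islice

def sort_key(doc):
    # Date first, ID as tie-breaker, so the ordering is total and stable across pages.
    return (doc["updated"], doc["_id"])

def latest_page(collections, page_size=30, after=None):
    """Return the newest page_size docs across several collections.

    collections: lists of docs, each sorted newest first by (updated, _id).
    after: (updated, _id) of the last doc on the previous page, or None for page one.
    """
    per_collection = []
    for docs in collections:
        if after is not None:
            # Equivalent to a range condition in the per-collection query.
            docs = (d for d in docs if sort_key(d) < after)
        # Never need more than one page from any single collection.
        per_collection.append(islice(docs, page_size))
    # Inputs are in descending order, so merge in descending order too.
    merged = heapq.merge(*per_collection, key=sort_key, reverse=True)
    return list(islice(merged, page_size))

articles = [{"_id": "a3", "updated": 9}, {"_id": "a2", "updated": 5}, {"_id": "a1", "updated": 1}]
videos = [{"_id": "v2", "updated": 7}, {"_id": "v1", "updated": 2}]

page1 = latest_page([articles, videos], page_size=2)
page2 = latest_page([articles, videos], page_size=2, after=sort_key(page1[-1]))
print([d["_id"] for d in page1], [d["_id"] for d in page2])
```

Each page still costs one query per collection, but the cursor (date plus ID of the last document) avoids skip-based pagination and keeps pages consistent when timestamps collide.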

CouchDB view query with multiple key values

I am currently trying to create a view and query to fit this SQL query:
SELECT * FROM articles
WHERE articles.location="NY" OR articles.location="CA"
ORDER BY articles.release_date DESC
I tried to create a view with a complex key:
function(doc) {
  if (doc.type == "Article") {
    emit([doc.location, doc.release_date], doc);
  }
}
And then using startkey and endkey to retrieve one location and ordering the result on the release date.
.../_view/articles?startkey=["NY", {}]&endkey=["NY"]&limit=5&descending=true
This works fine.
However, how can I send multiple startkeys and endkeys to my view in order to mimic
WHERE articles.location="NY" OR articles.location="CA" ?
My arch nemesis, Dominic, is right.
Furthermore, it is never possible to query by criterion A and then sort by criterion B in CouchDB. In exchange for that inconvenience, CouchDB guarantees scalable, dependable, logarithmic query times. You have a choice:
Store the view output in its own database, and make a new view to sort by criterion B,
or sort the rows afterward, which can be done in either of two ways:
Sort client-side, once you receive the rows.
Sort server-side, in a _list function. This is great, but remember it's not ultimately scalable. If you have millions of rows, the _list function will probably crash.
The short answer is, you currently cannot use multiple startkey/endkey combinations.
You'll either have to make 2 separate queries, or you could always add on the lucene search engine to get much more robust searching capabilities.
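
A rough sketch of the two-query approach: build one query string per location, then merge and re-sort the returned rows client-side. No HTTP request is made here, and the sample rows are invented. The sketch assumes release_date is stored as a string that sorts chronologically (for example an ISO 8601 date), and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

def location_query(location: str, limit: int = 5) -> str:
    """Query string for one location, newest release_date first."""
    params = {
        # With descending=true the range is walked backwards, so startkey is the high end.
        "startkey": json.dumps([location, {}]),
        "endkey": json.dumps([location]),
        "descending": "true",
        "limit": limit,
    }
    return urlencode(params)

def merge_rows(*row_lists, limit=5):
    """Combine rows from several view responses and re-sort by release_date (key[1])."""
    rows = [row for rows in row_lists for row in rows]
    rows.sort(key=lambda row: row["key"][1], reverse=True)
    return rows[:limit]

# Invented rows in the shape a view response uses: {"id": ..., "key": [location, date]}.
ny_rows = [{"id": "n1", "key": ["NY", "2012-03-01"]}, {"id": "n2", "key": ["NY", "2012-01-01"]}]
ca_rows = [{"id": "c1", "key": ["CA", "2012-02-01"]}]

print(location_query("NY"))
print([row["id"] for row in merge_rows(ny_rows, ca_rows)])
```

Fetching limit rows per location and trimming after the merge guarantees the combined top results are correct, since no location can contribute more than limit rows to the final page.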
It is possible to use multiple key parameters in a query. See the Couchbase CouchDB documentation on multi-document fetching.
