Sorting CouchDB results by unix timestamp and paginating - couchdb

I have been struggling to get pagination working for a few days now. I have a database with docs and a view with the timestamp as key to sort descending. But I can't seem to get the next set of docs ...
If I run it, I get the top 5 docs. When I try to use startkey or startkey_docid I only seem to get the same lines again.
Going through the CouchDB documentation, I am not sure what I need to make it work.
The CouchDB design document looks like this:
{
  "_id": "_design/filters",
  "views": {
    "blog": {
      "map": "function (doc) { if (doc.published && doc.type == 'post') emit(doc.header.date); }"
    }
  }
}
... header.date is generated with +new Date()
On the Node.js side, with nano (github/nano), I use something similar to:
import nano from 'nano';

let db = nano(SERVICE_URL).use('blog_main');

let lastDocId = ctx.query.lastDocId;
let lastSkip = ctx.query.lastSkip ? +ctx.query.lastSkip + 5 : null;

let query = {
  limit: 1 + 4, // limit to 5
  descending: true, // reverse order: newest to top
  include_docs: true,
};

if (lastDocId) { // initially off
  query.startkey = lastDocId;
}

if (lastSkip) { // other method for tests
  query.skip = lastSkip; // ----> this results in some previous and some new items
}

let itemRows = await db.view('filters', 'blog', query);
let items = itemRows.rows;
// each doc is in items[].doc
I have seen "sort by value", and sorting works for me - but I can't seem to get pagination to work.

I'm uncertain regarding the statement "I get the same lines again". That is reproducible if startkey is the first rather than the last key of the prior result - and that would be the first problem.
Regardless, assuming startkey is correct, the parameters skip and startkey conflict. Initially skip should be 0, and afterwards it should be 1 in order to skip over startkey in successive queries.
This technique is clearly outlined in the CouchDB pagination documentation¹.
Details
Assume the complete view (where key is a unix timestamp) is
{
  "total_rows": 7,
  "offset": 0,
  "rows": [
    {"id":"821985c5140ca583e108653fb6091ac8","key":1580050872331,"value":null},
    {"id":"821985c5140ca583e108653fb6092c3b","key":1580050872332,"value":null},
    {"id":"821985c5140ca583e108653fb6093f47","key":1580050872333,"value":null},
    {"id":"821985c5140ca583e108653fb6094309","key":1580050872334,"value":null},
    {"id":"821985c5140ca583e108653fb6094463","key":1580050872335,"value":null},
    {"id":"821985c5140ca583e108653fb60945f4","key":1580050872336,"value":null},
    {"id":"821985c5140ca583e108653fb60949f3","key":1580050872339,"value":null}
  ]
}
Given the initial query conditions
{
  limit: 5,
  descending: true,
  include_docs: false // for brevity
}
the query indeed produces the expected result: 5 rows with the most recent first
{
  "total_rows": 7,
  "offset": 0,
  "rows": [
    {"id":"821985c5140ca583e108653fb60949f3","key":1580050872339,"value":null},
    {"id":"821985c5140ca583e108653fb60945f4","key":1580050872336,"value":null},
    {"id":"821985c5140ca583e108653fb6094463","key":1580050872335,"value":null},
    {"id":"821985c5140ca583e108653fb6094309","key":1580050872334,"value":null},
    {"id":"821985c5140ca583e108653fb6093f47","key":1580050872333,"value":null}
  ]
}
Now assume the second query is as follows:
{
  limit: 5,
  descending: true,
  include_docs: false, // for brevity
  startkey: 1580050872333,
  skip: 5
}
startkey (the key of the last row of the prior result) is correct but the skip parameter is literally skipping past the next (logical) set of rows. Specifically with those parameters and the example view above, the query would blow past the remaining keys resulting in an empty row set.
This is what is desired:
{
  limit: 5,
  descending: true,
  include_docs: false, // for brevity
  startkey: 1580050872333,
  skip: 1 // just skip the last doc (startkey)
}
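Translated back to the question's nano code, a minimal sketch of that recipe might look like this (fetchPage, lastKey and lastDocId are illustrative names; startkey_docid is added as a tie-breaker in case two docs share a timestamp):
// Fetch one page of 5 posts; pass the key and id of the last row
// of the previous page, or null/undefined for the first page.
async function fetchPage(lastKey, lastDocId) {
  let query = {
    limit: 5,
    descending: true,
    include_docs: true,
  };
  if (lastKey != null) {
    query.startkey = lastKey;          // key (timestamp) of the last row shown
    query.startkey_docid = lastDocId;  // doc id of that row, for duplicate keys
    query.skip = 1;                    // skip that row itself
  }
  let result = await db.view('filters', 'blog', query);
  return result.rows; // each row: { id, key, doc }
}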
¹ CouchDB Pagination Recipes, 3.2.5.5. Paging (Alternate Method)

Using startkey or skip returned results that included some of the skipped results, or all previous ones (strangely mixed up).
I solved it by extending the result keys with a second part.
Since the key was based on a date without time, the entries were rearranged on each request due to identical date timestamps. Adding a second part that was also sortable (I used the created timestamp) fixed it.
The key is now [datetimestamp, createdtimestamp] - both can be sorted descending.
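For reference, a sketch of what that change might look like (doc.created stands in for whatever field holds the creation timestamp; the variable names in the query are illustrative):
// Map function: the composite key breaks ties on the date part
function (doc) {
  if (doc.published && doc.type == 'post') {
    emit([doc.header.date, doc.created]);
  }
}

// Paging then passes the full array key of the last row of the prior page:
let query = {
  limit: 5,
  descending: true,
  include_docs: true,
  startkey: [lastDateTimestamp, lastCreatedTimestamp],
  skip: 1 // skip the row that startkey points at
};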

Related

Nodejs Elasticsearch query default behaviour

On a daily basis, I'm pushing data (time_series) to Elasticsearch. I created an index pattern, and my indices have the name myindex_*, where * is today's date (an index pattern has been set up). Thus after a week, I have: myindex_2022-06-20, myindex_2022-06-21 ... myindex_2022-06-27.
Let's assume my index is indexing products' prices. Thus inside each myindex_*, I have got:
myindex_2022-06-26 includes many product prices like this:
{
  "reference_code": "123456789",
  "price": 10.00
},
...
myindex_2022-06-27:
{
  "reference_code": "123456789",
  "price": 12.00
},
I'm using this query to get the reference code and the corresponding prices, and it works great.
const data = await elasticClient.search({
  index: 'myindex_2022-06-27',
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    }
  }
});
But I would like a query that, if there is no data in the index for the date 2022-06-27, checks the previous index 2022-06-26, and so on (up to e.g. 10 indices back).
Not sure, but it seems it's doing this when I replace myindex_2022-06-27 with myindex_* (not sure whether it's the default behaviour).
The issue is that when I query this way, I get prices from another index, but it seems to use the oldest one. I would like to get the newest one instead, thus the opposite way.
How should I proceed?
If you query with an index wildcard, it returns a list of documents, where every document includes meta fields such as _index and _id.
You can sort by _index to make Elasticsearch return the latest document at position [0] in your list.
const data = await elasticClient.search({
  index: 'myindex_2022-*',
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    },
    sort: { "_index": "desc" }
  }
});
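With that sort in place, the newest matching document should come back first; a small sketch of reading it (field names as in the question):
// hits are ordered newest index first, so position [0] is the latest price
const newest = data.hits.hits[0];
console.log(newest._index);         // e.g. the most recent myindex_* date
console.log(newest._source.price);  // 12.00 in the example above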

How do I query a complex key in pouchDB?

If I was to make a get request, I'd do something like:
https://myserver.com/sometestdb/_design/sortJob/_view/index?limit=100&reduce=false&startkey=["job_price"]&endkey=["job_price", {}]
For a map query like:
function (doc) {
  if (doc.data.type === "job") {
    emit(["job_ref", doc.data.ref], null);
    emit(["job_price", doc.data.price], null);
  }
}
How would I replicate the query using PouchDB's query()? I've tried a few things around the start and end keys but no luck:
{
  include_docs: true,
  startkey: 'job_price',
  endkey: 'job_price,{}'
}

{
  include_docs: true,
  startkey: 'job_price',
  endkey: 'job_price\uffff'
}
Both of these return 0 results whereas the link I use produces the expected results.
Note: I can confirm the data is present in my pouchDB as I've queried it using the pouch-find plugin but am trying various techniques to see which is faster.
EDIT: According to the complex keys section in the docs, I should be able to do the following:
{
  include_docs: true,
  startkey: '[\'job_price\']',
  endkey: '[\'job_price\',{}]'
}
But that results in:
No rows can match your key range, reverse your start_key and end_key
or set {descending : true}
But I should be able to get results like this, and I don't want descending: true.
Ok, so it was my reading of the documentation that was off.
When building the start/end key, you need to pass the actual array, not the array as a string (which I thought PouchDB then eval'd).
This is the working query:
{
  include_docs: true,
  startkey: ['job_price'],
  endkey: ['job_price', {}]
}
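For completeness, a minimal sketch of the full call, assuming the design document and view are named sortJob/index as in the URL above:
const result = await db.query('sortJob/index', {
  include_docs: true,
  reduce: false,
  startkey: ['job_price'],
  endkey: ['job_price', {}]
});
// each matching document is in result.rows[i].doc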
Posting this answer rather than deleting the question as it might help someone else.

Couchbase & nodejs: View query with range, order and limited results

I am new to couchbase and I'm trying to understand how filtering, ordering and limiting results in a view work together.
Couchbase version: 3.0.1
I'm using the Node.js SDK.
I have a map function like this
function (doc, meta) {
  if (doc.type !== 'item' || !doc.category) {
    return;
  }
  emit([doc.orderId, doc.category.id, doc.number], null);
}
And an item document that looks like this
{
  "id": 1,
  "type": "item",
  "number": 1203,
  "orderId": 2,
  "category": {
    "id": 10,
    "title": "Carpet"
  }
}
I would like to filter only items with orderId = 2 and category.id = 10, all this ordered by number descending. Because I have a paginator, I would like to display 20 items per page. I have thousands of items in the database.
With the query below, I get an error because of the order call. If I comment it out, I get the results, filtered, limited, and ordered by default by number ascending.
var order_id = 2,
    category_id = 10,
    limit = 20,
    skip = 0,
    range = [order_id, category_id],
    // suppose we have a valid couchbase connection and a viewQuery object
    query = viewQuery.from('items', 'myView')
      .limit(limit)
      .skip(skip)
      .order(2) // 2 = DESC. This line doesn't work
      .include_docs(true)
      .range(range, range.concat([{}]), true);

bucket.query(query, function (err, docs) {
  console.log(err);
  console.log(docs);
});
The error says:
Error: query_parse_error: No rows can match your key range, reverse your start_key and end_key or set descending=false
Note that if I order ASC, the error occurs too. I have to remove the call to the .order() function to have my view behave properly.
Does anyone know why?
Thanks
When you order your query in descending order, you have to swap the start and end keys as well (the parameters to the range method).
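Applied to the question's code, that would be something like the following sketch (keeping the question's assumption that 2 maps to descending order in this SDK version):
var range = [order_id, category_id];

var query = viewQuery.from('items', 'myView')
  .limit(limit)
  .skip(skip)
  .order(2) // descending
  .include_docs(true)
  // start and end keys swapped relative to the ascending query
  .range(range.concat([{}]), range, true);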

Mongoose query returning repeated results

The query receives a pair of coordinates, a maximum distance radius, a "skip" integer and a "limit" integer. The function should return the closest and newest locations for the given position. There is no visible error in my code; however, when I call the query again, it returns repeated results. The "skip" variable is updated according to the results returned.
Example:
1) I make a query with skip = 0, limit = 10. I receive 10 non-repeated locations.
2) The query is called again, now with skip = 10, limit = 10. I receive another 10 locations, with repeated results from the first query.
QUERY
Locations.find({
  coordinates: {
    $near: [x, y],
    $maxDistance: maxDistance
  }
})
  .sort('date_created')
  .skip(skip)
  .limit(limit)
  .exec(function (err, locations) {
    console.log("[+]Found Locations");
    callback(locations);
  });
SCHEMA
var locationSchema = new Schema({
  date_created: { type: Date },
  coordinates: [],
  text: { type: String }
});
I have tried looking everywhere for a solution. My only remaining guess is the versions of Mongo? I use mongoose 4.x.x and mongodb 2.5.6, I believe. Any ideas?
There are a couple of things to consider here in the sort of results that you want, with the first consideration being that you have a "secondary" sort criteria in the "date_created" to deal with.
The basic problem there is that the $near operator and similar operators in MongoDB do not at present "project" any field to indicate the "distance" from the queried location, and simply "default sort" the data. So in order to do that "secondary" sort, a field with the "distance" needs to be present. There are therefore other options for this.
The second case is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. So it's better to select data based on a "range" where it occurs, rather than "skip" through all the results you have previously displayed.
The first thing to do here is use a command that can "project" the distance into the document along with the other information. The aggregation command $geoNear is good for this, especially since we want to do other sorting:
var seenIds = [],
    lastDistance = null,
    lastDate = null;

Locations.aggregate(
  [
    { "$geoNear": {
      "near": [x, y],
      "maxDistance": maxDistance,
      "distanceField": "dist",
      "limit": 10
    }},
    { "$sort": { "dist": 1, "date_created": -1 } }
  ],
  function (err, results) {
    results.forEach(function (result) {
      if ( ( result.dist != lastDistance ) || ( result.date_created != lastDate ) ) {
        seenIds = [];
        lastDistance = result.dist;
        lastDate = result.date_created;
      }
      seenIds.push(result._id);
    });
    // save those variables to session or other persistence
    // do something with results
  }
);
That is the first iteration of your results, where you fetch the first 10. Note the logic inside the loop: each document in the results is inspected for a change in either "date_created" or the projected "dist" field now present in the document; where this occurs, the "seenIds" array is wiped of all current entries. The general action is that all the variables are tested and possibly updated on each iteration, and where there is no change, items are added to the list of "seenIds".
All three variables being worked on need to be stored somewhere awaiting the next request. For web applications the session store is ideal, but approaches vary. You just want those values to be recalled when we start the next request, as on the next and subsequent iterations we alter the query a bit:
Locations.aggregate(
  [
    { "$geoNear": {
      "near": [x, y],
      "maxDistance": maxDistance,
      "minDistance": lastDistance,
      "distanceField": "dist",
      "limit": 10,
      "query": {
        "_id": { "$nin": seenIds },
        "date_created": { "$lt": lastDate }
      }
    }},
    { "$sort": { "dist": 1, "date_created": -1 } }
  ],
  function (err, results) {
    results.forEach(function (result) {
      if ( ( result.dist != lastDistance ) || ( result.date_created != lastDate ) ) {
        seenIds = [];
        lastDistance = result.dist;
        lastDate = result.date_created;
      }
      seenIds.push(result._id);
    });
    // save those variables to session or other persistence
    // do something with results
  }
);
So there the "minDistance" parameter is entered, as you want to exclude any of the "nearer" results that have already been seen. Additional checks are placed in the query: "date_created" needs to be "less than" the lastDate recorded, since we are in descending order of sort, with the final "sure" filter excluding any "_id" values that were recorded in the list because the values had not changed.
Now with geospatial data that "seenIds" list is not likely to grow as generally you are not going to find things all at the same distance, but it is a general process of paging a sorted list of data like this, so it is worth understanding the concept.
So if you want to use a secondary sort field with geospatial data while also considering the "near" distance, this is the general approach: project a distance value into the document results, and store the last seen values so that already-seen documents can be excluded.
The general concept is "advancing the minimum distance" to enable each page of results to get gradually "further away" from the source point of origin used in the query.
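As a rough consolidation of the above (names and state handling are illustrative), the pipeline for either the first or a subsequent page could be built like this:
// state = { seenIds, lastDistance, lastDate } saved from the previous
// page (e.g. in the session), or null for the first page.
function buildPipeline(x, y, maxDistance, state) {
  var geoNear = {
    "near": [x, y],
    "maxDistance": maxDistance,
    "distanceField": "dist",
    "limit": 10
  };
  if (state) {
    geoNear.minDistance = state.lastDistance;
    geoNear.query = {
      "_id": { "$nin": state.seenIds },
      "date_created": { "$lt": state.lastDate }
    };
  }
  return [
    { "$geoNear": geoNear },
    { "$sort": { "dist": 1, "date_created": -1 } }
  ];
}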

MongoDB update/insert document and Increment the matched array element

I use Node.js and MongoDB with monk.js, and I want to do the logging in a minimal way, with one document per hour, like:
final doc:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1 }, {action: action2, count: 27 }, {action: action3, count: 5 } ] }
The complete document should be created by incrementing one value.
E.g. someone visits a webpage first this hour, and the increment of action1 should create the following document with a query:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1} ] }
Another user visits another webpage in this hour, and the document should be extended to:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1}, {action: action2, count: 1} ] }
and the values in count should be incremented on visiting the different webpages.
At the moment I create a doc for each action:
tracking.update({
  time: moment().format('YYYY-MM-DD_HH'),
  action: action,
  info: info
}, { $inc: { count: 1 } }, { upsert: true }, function (err) {});
Is this possible with monk.js / mongodb?
EDIT:
Thank you. Your solution looks clean and elegant, but it looks like my server can't handle it, or I am too much of a noob to make it work.
I wrote an extremely dirty solution with the action name as key:
tracking.update({ time: time, ts: ts }, JSON.parse('{ "$inc": { "' + action + '": 1 } }'),
  { upsert: true }, function (err) {});
Yes, it is very possible, and a well considered question. The only variation I would make on the approach is to calculate the "time" value as a real Date object (quite useful in MongoDB, and easy to manipulate as well), simply "rounding" the value with basic date math. You could use "moment.js" for the same result, but I find the math simple.
The other main consideration here is that mixing array "push" actions with possible "upsert" document actions can be a real problem, so it is best to handle this with "multiple" update statements, where only the condition you want is going to change anything.
The best way to do that is with MongoDB Bulk Operations.
Consider that your data comes in something like this:
{ "timestamp": 1439381722531, "action": "action1" }
Where the "timestamp" is an epoch timestamp value accurate to the millisecond. So the handling of this looks like:
// Just adding for the listing, assuming already defined otherwise
var payload = { "timestamp": 1439381722531, "action": "action1" };

// Round to hour
var hour = new Date(
  payload.timestamp - ( payload.timestamp % ( 1000 * 60 * 60 ) )
);

// Init transaction
var bulk = db.collection.initializeOrderedBulkOp();

// Try to increment where array element exists in document
bulk.find({
  "time": hour,
  "log.action": payload.action
}).updateOne({
  "$inc": { "log.$.count": 1 }
});

// Try to upsert where document does not exist
bulk.find({ "time": hour }).upsert().updateOne({
  "$setOnInsert": {
    "log": [{ "action": payload.action, "count": 1 }]
  }
});

// Try to "push" where array element does not exist in matched document
bulk.find({
  "time": hour,
  "log.action": { "$ne": payload.action }
}).updateOne({
  "$push": { "log": { "action": payload.action, "count": 1 } }
});

bulk.execute();
So if you look through the logic there, you will see that it is only ever possible for "one" of those statements to be true for any given state of the document, whether it exists or not. Technically speaking, the statement with the "upsert" can actually match a document when it exists; however, the $setOnInsert operation used makes sure that no changes are made unless the action actually "inserts" a new document.
Since all operations are fired in "Bulk", the only time the server is contacted is on the .execute() call. So there is only "one" request to the server and only "one" response, despite the multiple operations.
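As an aside, the same three-statement pattern maps onto the bulkWrite() method of newer MongoDB Node.js drivers; a sketch under that assumption (the collection name 'tracking' is taken from the question):
// One ordered bulkWrite(), again a single request to the server
db.collection('tracking').bulkWrite([
  { updateOne: {
      filter: { "time": hour, "log.action": payload.action },
      update: { "$inc": { "log.$.count": 1 } }
  }},
  { updateOne: {
      filter: { "time": hour },
      update: { "$setOnInsert": { "log": [{ "action": payload.action, "count": 1 }] } },
      upsert: true
  }},
  { updateOne: {
      filter: { "time": hour, "log.action": { "$ne": payload.action } },
      update: { "$push": { "log": { "action": payload.action, "count": 1 } } }
  }}
]);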
In this way the conditions are all met:
Create a new document for the current period where one does not exist and insert initial data to the array.
Add a new item to the array where the current "action" classification does not exist and add an initial count.
Increment the count property of the specified action within the array upon execution of the statement.
All in all, yes, possible, and also a great idea for storage, as long as the action classifications do not grow too large within a period (500 array elements should be used as a maximum guide); the updating is very efficient and self-contained within a single document for each time sample.
The structure is also nice and well suited to other query and possible additional aggregation purposes as well.
