Couchbase & nodejs: View query with range, order and limited results - node.js

I am new to Couchbase and I'm trying to understand how filtering, ordering, and limiting results in a view work together.
Couchbase version: 3.0.1
I'm using the Node.js SDK.
I have a map function like this:
function (doc, meta) {
    if (doc.type !== 'item' || !doc.category) {
        return;
    }
    emit([doc.orderId, doc.category.id, doc.number], null);
}
And an item document that looks like this:
{
    "id": 1,
    "type": "item",
    "number": 1203,
    "orderId": 2,
    "category": {
        "id": 10,
        "title": "Carpet"
    }
}
I would like to fetch only the items with orderId = 2 and category.id = 10, ordered by number descending. Because I have a paginator, I would like to display 20 items per page. I have thousands of items in the database.
With the query below, I get an error because of the order call. If I comment it out, I get the results, filtered, limited, and ordered by the default (number ascending).
var order_id = 2,
    category_id = 10,
    limit = 20,
    skip = 0,
    range = [order_id, category_id],
    // suppose we have a valid Couchbase connection and a viewQuery object
    query = viewQuery.from('items', 'myView')
        .limit(limit)
        .skip(skip)
        .order(2) // 2 = DESC. This line doesn't work
        .include_docs(true)
        .range(range, range.concat([{}]), true);

bucket.query(query, function (err, docs) {
    console.log(err);
    console.log(docs);
});
The error says:
Error: query_parse_error: No rows can match your key range, reverse your start_key and end_key or set descending=false
Note that if I order ASC, the error occurs too. I have to remove the call to .order() entirely to make the view behave properly.
Does anyone know why?
Thanks

When you order your query in descending order, you also have to swap the start and end keys (the parameters to the range method).
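For example, here is a minimal sketch of the corrected query, reusing the variables from the question; the only change from the failing version is that the start and end keys passed to range are swapped:

var query = viewQuery.from('items', 'myView')
    .limit(limit)
    .skip(skip)
    .order(2) // 2 = descending
    .include_docs(true)
    // with descending order, the "high" end of the range becomes the start key
    .range(range.concat([{}]), range, true);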

Related

DynamoDB client doesn't fulfil the limit

Client lib: "@aws-sdk/client-dynamodb": "3.188.0"
I have a DynamoDB pagination implementation.
My user count is 98 and my page size is 20, so I'm expecting 5 pages with 20, 20, 20, 20, and 18 users respectively.
But I'm actually getting more than 5 pages, each with a variable number of users (10, 12, 11, etc.).
How can I get users with a proper page limit of 20, 20, 20, 20, and 18?
public async pagedList(usersPerPage: number, lastEvaluatedKey?: string): Promise<PagedUser> {
    const params = {
        TableName: tableName,
        Limit: usersPerPage,
        FilterExpression: '#type = :type',
        ExpressionAttributeValues: {
            ':type': { S: type },
        },
        ExpressionAttributeNames: {
            '#type': 'type',
        },
    } as ScanCommandInput;
    if (lastEvaluatedKey) {
        params.ExclusiveStartKey = { 'oid': { S: lastEvaluatedKey } };
    }
    const command = new ScanCommand(params);
    const data = await client.send(command);
    const users: User[] = [];
    if (data.Items !== undefined) {
        data.Items.forEach((item) => {
            if (item !== undefined) {
                users.push(this.makeUser(item));
            }
        });
    }
    let lastKey;
    if (data.LastEvaluatedKey !== undefined) {
        lastKey = data.LastEvaluatedKey.oid.S?.valueOf();
    }
    return {
        users: users,
        lastEvaluatedKey: lastKey
    };
}
The Scan command documentation (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.Pagination) gives a few reasons why a page may contain fewer results than the limit:
The result set must fit in 1 MB.
If a filter is applied, the data is filtered after the scan. You have a filter in your query.
From the docs:
A filter expression is applied after a Scan finishes but before the results are returned. Therefore, a Scan consumes the same amount of read capacity, regardless of whether a filter expression is present.
...
Now suppose that you add a filter expression to the Scan. In this case, DynamoDB applies the filter expression to the six items that were returned, discarding those that do not match. The final Scan result contains six items or fewer, depending on the number of items that were filtered.
The next section explains how you can verify that this is what is happening in your case:
Counting the items in the results
In addition to the items that match your criteria, the Scan response contains the following elements:
ScannedCount — The number of items evaluated, before any ScanFilter is applied. A high ScannedCount value with few, or no, Count results indicates an inefficient Scan operation. If you did not use a filter in the request, ScannedCount is the same as Count.
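A common workaround is to keep scanning in a loop until a full page has been collected or the table is exhausted. The sketch below only illustrates that idea; pagedScan, the 'user' type value, and the key shape are assumptions, not the poster's code:

const { DynamoDBClient, ScanCommand } = require('@aws-sdk/client-dynamodb');

const client = new DynamoDBClient({});

// Collect items across as many Scan pages as needed to fill one UI page,
// because FilterExpression discards items after Limit is applied per request.
async function pagedScan(tableName, pageSize, startKey) {
    const users = [];
    let lastKey = startKey;
    do {
        const data = await client.send(new ScanCommand({
            TableName: tableName,
            Limit: pageSize,
            FilterExpression: '#type = :type',
            ExpressionAttributeNames: { '#type': 'type' },
            ExpressionAttributeValues: { ':type': { S: 'user' } },
            ...(lastKey ? { ExclusiveStartKey: lastKey } : {}),
        }));
        users.push(...(data.Items || []));
        lastKey = data.LastEvaluatedKey; // undefined once the scan is complete
    } while (users.length < pageSize && lastKey);
    // If the loop overshoots pageSize, a production version should derive the
    // next ExclusiveStartKey from the last item actually returned to the caller.
    return { users: users.slice(0, pageSize), lastEvaluatedKey: lastKey };
}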

CouchDB Count Reduce with timestamp filtering

Let's say I have documents like so:
{
    _id: "a98798978s978dd98d",
    type: "signature",
    uid: "u12345",
    category: "cat_1",
    timestamp: UNIX_TIMESTAMP
}
My goal is to be able to count all signatures created by a certain uid while being able to filter by timestamp.
Thanks to Alexis, I've gotten this far with the _count reduce and this map function:
function (doc) {
    if (doc.type === "signature") {
        emit([doc.uid, doc.timestamp], 1);
    }
}
With the following queries:
start_key=[null,lowerTimestamp]
end_key=[{},higherTimestamp]
reduce=true
group_level=1
Response:
{
    "rows": [
        { "key": [ "u11111" ], "value": 3 },
        { "key": [ "u12345" ], "value": 26 }
    ]
}
It counts the uid correctly but the filter doesn't work properly. At first I thought it might be a CouchDB 2.2 bug, but I tried on Cloudant and I got the same response.
Does anyone have any ideas on how I could get this to work while being able to filter by timestamp?
When using compound keys in MapReduce (i.e. when the key is an array of things), you cannot query a range of keys with a "leading" array element missing: you can query a range of uids and get the results ordered by timestamp, but your use case is the other way round, since you want to query uids by time.
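To see why, consider how CouchDB sorts compound keys: element by element, left to right. With hypothetical keys:

["u11111", 900] < ["u12345", 100] < ["u12345", 500]

The uid decides the order first, so a range from start_key=[null, lowerTimestamp] to end_key=[{}, higherTimestamp] spans every uid, and the timestamp component only breaks ties between rows with the same uid; it never acts as an independent filter.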
I'd be tempted to put time first in the array, but Unix timestamps are not so good for grouping ;). I don't know the ins and outs of your application, but if you were to index a date instead of a timestamp, like so:
function (doc) {
    if (doc.type === "signature") {
        // multiply by 1000 first if doc.timestamp is in seconds
        var date = new Date(doc.timestamp);
        var datestr = date.toISOString().split('T')[0];
        emit([datestr, doc.uid], 1);
    }
}
This would allow you to query a range of dates (to the resolution of a whole day):
?startkey=["2018-01-01"]&endkey=["2018-02-01"]&group_level=2
albeit with your uids grouped by day.
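With group_level=2 on the date-first index, a response would look something like this (hypothetical values):

{
    "rows": [
        { "key": [ "2018-01-01", "u11111" ], "value": 2 },
        { "key": [ "2018-01-01", "u12345" ], "value": 5 },
        { "key": [ "2018-01-02", "u12345" ], "value": 3 }
    ]
}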

Mongoose query returning repeated results

The query receives a pair of coordinates, a maximum distance radius, a "skip" integer, and a "limit" integer. The function should return the closest and newest locations for the given position. There is no visible error in my code; however, when I call the query again, it returns repeated results. The "skip" variable is updated according to the results already returned.
Example:
1) I make the query with skip = 0 and limit = 10. I receive 10 non-repeated locations.
2) The query is called again with skip = 10 and limit = 10. I receive another 10 locations, with repeated results from the first query.
QUERY
Locations.find({
        coordinates: {
            $near: [x, y],
            $maxDistance: maxDistance
        }
    })
    .sort('date_created')
    .skip(skip)
    .limit(limit)
    .exec(function (err, locations) {
        console.log("[+]Found Locations");
        callback(locations);
    });
SCHEMA
var locationSchema = new Schema({
    date_created: { type: Date },
    coordinates: [],
    text: { type: String }
});
I have tried looking everywhere for a solution. My only remaining suspicion is the versions of Mongo: I use Mongoose 4.x.x, and MongoDB is 2.5.6, I believe. Any ideas?
There are a couple of things to consider in the sort of results that you want, the first being that you have a "secondary" sort criterion, "date_created", to deal with.
The basic problem is that the $near operator and similar operators in MongoDB do not at present "project" any field to indicate the "distance" from the queried location; they simply "default sort" the data by it. So in order to do that "secondary" sort, a field containing the "distance" needs to be present. There are therefore other options for this.
The second consideration is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. It is better to select data based on the "range" where it occurs rather than "skip" through all the results you have previously displayed.
The first thing to do here is use a command that can "project" the distance into the document along with the other information. The $geoNear aggregation stage is good for this, especially since we want to do other sorting:
var seenIds = [],
    lastDistance = null,
    lastDate = null;

Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x, y],
            "maxDistance": maxDistance,
            "distanceField": "dist",
            "limit": 10
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function (err, results) {
        results.forEach(function (result) {
            if ((result.dist != lastDistance) || (result.date_created != lastDate)) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
);
That is the first iteration of your results, where you fetch the first 10. Note the logic inside the loop: each document in the results is inspected for a change in either "date_created" or the projected "dist" field now present in the document, and where a change occurs the "seenIds" array is wiped of all current entries. In general, all the variables are tested and possibly updated on each iteration, and where there is no change the items are added to the list of "seenIds".
All three of those variables need to be stored somewhere awaiting the next request. For web applications the session store is ideal, but approaches vary. You just want those values to be recalled when the next request starts, as on the next and subsequent iterations we alter the query a bit:
Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x, y],
            "maxDistance": maxDistance,
            "minDistance": lastDistance,
            "distanceField": "dist",
            "limit": 10,
            "query": {
                "_id": { "$nin": seenIds },
                "date_created": { "$lt": lastDate }
            }
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function (err, results) {
        results.forEach(function (result) {
            if ((result.dist != lastDistance) || (result.date_created != lastDate)) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
);
So there the "minDistance" parameter is added because you want to exclude any of the "nearer" results that have already been seen. Additional checks are placed in the "query": "date_created" needs to be "less than" the "lastDate" recorded, since we are in descending order of sort, and the final "sure" filter excludes any "_id" values that were recorded in the list because the values had not changed.
Now with geospatial data that "seenIds" list is not likely to grow much, as you are generally not going to find many things at exactly the same distance; but this is the general process for paging a sorted list of data like this, so it is worth understanding the concept.
So if you want to use a secondary sort field with geospatial data while also considering the "near" distance, this is the general approach: project a distance value into the document results, and store the last-seen values before any change that would make them non-unique.
The general concept is "advancing the minimum distance" to enable each page of results to get gradually "further away" from the source point of origin used in the query.

couchdb map-reduce and grouping

I am attempting to get a count of unique events for an object (let's say a video):
Here are my documents:
{
    "type": "View",
    "video_id": "12300",
    "user_id": 3
}
{
    "type": "View",
    "video_id": "12300",
    "user_id": 1
}
{
    "type": "View",
    "video_id": "45600",
    "user_id": 3
}
I'm trying to get a unique (by user_id) count of views for each video
I assume I want to map my data like so:
function (doc) {
    if (doc.type === 'View') {
        emit([doc.video_id, doc.user_id], 1);
    }
}
But I don't understand how to reduce it down to unique users per video. Or am I going about this wrong?
You should look at the group_level view parameter. It allows you to change which field(s) the grouping occurs on.
In this case, group_level=1 will group by video_id, while group_level=2 will group on both video_id and user_id.
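For example (the design document and view names here are hypothetical):

# total views per video: one row per video_id
GET /db/_design/stats/_view/views_by_user?group_level=1

# views per (video_id, user_id) pair: the number of rows returned per
# video is that video's count of unique viewers
GET /db/_design/stats/_view/views_by_user?group_level=2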
Add ?group=true after the request URL. That groups identical keys together as input for the reduce function:
function (keys, values, rereduce) {
    return sum(values);
}
That should do it.
Note that keys and values are unzipped lists of the keys and their values. With grouping on, the keys are all identical for each call of the reduce.

CouchDB, MapReduce: query a time slice

To monitor an application with CouchDB, I need to sum up a field of my data (for example, the time needed to execute a method that has been logged).
That's no problem for me with map-reduce, but I need to sum up only the data recorded within a particular time slice.
Example records:
{_id: 1, methodID:1, recorded: 100, timeneeded: 10},
{_id: 2, methodID:1, recorded: 200, timeneeded: 11},
{_id: 3, methodID:2, recorded: 200, timeneeded: 2},
{_id: 4, methodID:1, recorded: 300, timeneeded: 6},
{_id: 5, methodID:2, recorded: 310, timeneeded: 3},
{_id: 6, methodID:1, recorded: 400, timeneeded: 9}
Now I would like to get just the sum of timeneeded of all records that have been recorded in the range of 200 to 350 and grouped by methodID. (That would be 17 for methodID:1 and 5 for methodID:2.)
How can I do that?
I now tried it with a list function that's using WickedGrey's idea. See my functions here:
map function:
function (doc) {
    emit([doc.recorded], { methodID: doc.methodID, timeneeded: doc.timeneeded });
}
list function:
"function(head, req) {
var combined_values = {};
var row;
while (row = getRow()) {
if( row.values.methodID in combined_values) {
combined_values[ row.values.methodID] +=row.values.timeneeded;
}
else {
combined_values[ row.values.methodID] = row.values.timeneeded;
}
}
for(var methodID in combined_values){
send( toJSON({method: methodID, timeneeded:combined_values[methodID]}) );
}
}"
Now I have two problems:
1. I always get the results as a file, and Firefox asks me if I want to download it instead of viewing it in the browser like the response of a classic view query.
2. As I understand it, the results are now calculated on the fly in the list function. I expect this to be not very fast with hundreds of millions of records... Any ideas how to make it faster?
Thank you for your help!
andy
You can't use a map key to filter by one set of criteria but group by another in CouchDB. However, you can filter the keys by time range and group with a reduce function. Try something like this:
function map(doc) {
    var value = {};
    value[doc.methodID] = doc.timeneeded; // { methodID: timeneeded }
    emit(doc.recorded, value);
}
function reduce(key, values, rereduce) {
    var combined_values = {};
    for (var i in values) {
        var totals = values[i];
        for (var methodID in totals) {
            if (methodID in combined_values) {
                combined_values[methodID] += totals[methodID];
            }
            else {
                combined_values[methodID] = totals[methodID];
            }
        }
    }
    return combined_values;
}
That should allow you to specify a start/end key, and with group_level=0 it should get you a value containing the dictionary that you're looking for.
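For example, a request for the 200-to-350 slice of the records above could look like this (the design document path is hypothetical), and per the numbers in the question it should return 17 for methodID 1 and 5 for methodID 2:

GET /db/_design/stats/_view/by_time?startkey=200&endkey=350&group_level=0

{ "rows": [ { "key": null, "value": { "1": 17, "2": 5 } } ] }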
Edit: Also, this thread might be of interest:
http://couchdb-development.1959287.n2.nabble.com/reduce-limit-error-td2789734.html
It discusses an option for turning off the "reduce must shrink" message, and further down the list it provides other ways of achieving the same goal: using a list function. That might be a better approach than what I've outlined here. :(
// map
function map(doc) {
    if (doc.methodID && doc.recorded && doc.timeneeded) {
        emit([doc.methodID, doc.recorded], doc.timeneeded);
    }
}

// reduce
_sum
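With that [methodID, recorded] compound key, the same slice is then queried per method, one request per methodID (hypothetical parameters shown; reduce=true is the default, and the expected sums come from the records in the question):

?startkey=[1,200]&endkey=[1,350]   ->  { "rows": [ { "key": null, "value": 17 } ] }
?startkey=[2,200]&endkey=[2,350]   ->  { "rows": [ { "key": null, "value": 5 } ] }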
