CouchDB Count Reduce with timestamp filtering

Let's say I have documents like so:
{
  _id: "a98798978s978dd98d",
  type: "signature",
  uid: "u12345",
  category: "cat_1",
  timestamp: UNIX_TIMESTAMP
}
My goal is to be able to count all signatures created by a certain uid while also filtering by timestamp.
Thanks to Alexis, I've gotten this far with a map function paired with the built-in _count reduce:
function (doc) {
  if (doc.type === "signature") {
    emit([doc.uid, doc.timestamp], 1);
  }
}
With the following queries:
start_key=[null,lowerTimestamp]
end_key=[{},higherTimestamp]
reduce=true
group_level=1
Response:
{
  "rows": [
    {
      "key": [ "u11111" ],
      "value": 3
    },
    {
      "key": [ "u12345" ],
      "value": 26
    }
  ]
}
It counts per uid correctly, but the timestamp filter doesn't work properly. At first I thought it might be a CouchDB 2.2 bug, but I tried on Cloudant and got the same response.
Does anyone have any ideas on how I could get this to work while being able to filter by timestamps?

When using compound keys in MapReduce (i.e. the key is an array of things), you cannot query a range of keys with a "leading" array element missing. That is, you can query a range of uids and get the results ordered by timestamp within each uid, but your use case is the other way round: you want to query uids by time.
I'd be tempted to put time first in the array, but Unix timestamps are not so good for grouping ;). I don't know the ins and outs of your application, but if you were to index a date string instead of a timestamp, like so:
function (doc) {
  if (doc.type === "signature") {
    // if doc.timestamp is a Unix timestamp in seconds, multiply by 1000
    // first - JavaScript's Date constructor expects milliseconds
    var date = new Date(doc.timestamp);
    var datestr = date.toISOString().split('T')[0];
    emit([datestr, doc.uid], 1);
  }
}
This would allow you to query a range of dates (to the resolution of a whole day):
?startkey=["2018-01-01"]&endkey=["2018-02-01"]&group_level=2
albeit with your uids grouped by day.
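With the date-first index, a response to that query would look something like this (the values here are illustrative): one row per day/uid pair, so per-uid totals can be summed on the client:
{
  "rows": [
    { "key": [ "2018-01-01", "u11111" ], "value": 2 },
    { "key": [ "2018-01-01", "u12345" ], "value": 5 },
    { "key": [ "2018-01-02", "u12345" ], "value": 3 }
  ]
}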

Related

Nodejs Elasticsearch query default behaviour

On a daily basis, I'm pushing data (a time series) to Elasticsearch. I created an index pattern, and my indices have names of the form myindex_*, where * is that day's date. Thus after a week, I have: myindex_2022-06-20, myindex_2022-06-21... myindex_2022-06-27.
Let's assume my indices store products' prices. Inside each myindex_*, I have documents like these:
myindex_2022-06-26 includes many product prices like this:
{
  "reference_code": "123456789",
  "price": 10.00
},
...
myindex_2022-06-27:
{
  "reference_code": "123456789",
  "price": 12.00
},
I'm using this query to get the reference code and the corresponding prices. And it works great.
const data = await elasticClient.search({
  index: 'myindex_2022-06-27',
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    }
  }
});
But I would like a query that, if there is no data in the index for the date 2022-06-27, checks the previous index 2022-06-26, and so on (up to e.g. 10 times).
Not sure, but it seems it does this when I replace myindex_2022-06-27 with myindex_* (I'm not sure whether that's the default behaviour).
The issue is that when I query this way, I get prices from other indices, but it seems to use the oldest one. I would like to get the newest one instead, i.e. the opposite order.
How should I proceed?
If you query with an index wildcard, Elasticsearch returns a list of documents where every document includes meta fields such as _index and _id.
You can sort by _index in descending order to make Elasticsearch return the latest index's documents first, i.e. at position [0] in your list.
const data = await elasticClient.search({
  index: 'myindex_2022-*',
  body: {
    query: {
      match: {
        "reference_code": "123456789"
      }
    },
    sort: [
      { "_index": "desc" }
    ]
  }
});
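The newest matching document is then the first hit. A minimal sketch of reading it back (depending on the client version, the response body may be nested under data.body instead):
// newest index sorts first, so its document is at position 0
const newest = data.hits.hits[0];
console.log(newest._index, newest._source.price);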

Node.js/MongoDB - querying dates

I'm having a bit of an issue understanding how to query dates; I think the issue might be with how my data is structured. Here is a sample document on my database.
{
  "phone_num": 12553,
  "facilities": [
    "flat-screen",
    "parking"
  ],
  "surroundings": [
    "ping-pong",
    "pool"
  ],
  "rooms": [
    {
      "room_name": "Standard Suite",
      "capacity": 2,
      "bed_num": 1,
      "price": 50,
      "floor": 1,
      "reservations": [
        {
          "checkIn": {
            "$date": "2019-01-10T23:23:50.000Z"
          },
          "checkOut": {
            "$date": "2019-01-20T23:23:50.000Z"
          }
        }
      ]
    }
  ]
}
I'm trying to query the dates to check whether a specific room is available in a certain date range, but no matter what I do I can't seem to get a proper result: either my query 404s or it returns an empty array.
I really tried everything. Right now, for simplicity, I'm just trying to get the query to work with checkIn so I can figure out what I'm doing wrong. I tried 100 variants of the code below but I couldn't get it to work at all.
.find({"rooms": { "reservations": { "checkIn" : {"$gte": { "$date": "2019-01-09T00:00:00.000Z"}}}}})
Am I misunderstanding how the .find method works, or is something wrong with how I'm storing my dates? (I keep seeing people mention ISODate, but I'm not too sure what that is or how to use it.)
Thanks in advance.
I think the query you posted is not correct. For example, if you want to query for the rooms with check-in times in a certain range, then the query should be like this (note the dotted path and real Date values rather than nested {"$date": ...} objects):
.find({"rooms.reservations.checkIn": {$gte: new Date("2019-01-06T13:11:50+06:00"), $lt: new Date("2019-01-06T14:12:50+06:00")}})
Now you can do the same with the checkout time to get the proper filtering to find the rooms available within a date range.
A word of advice though, the way you've designed your collection is not sustainable in the long run. For example, the date query you're trying to run will give you the correct documents, but not the rooms inside each document that satisfy your date range. You'll have to do it yourself on the server side (assuming you're not using aggregation). This will block your server from handling other pending requests which is not desirable. I suggest you break the collection down and have rooms and reservations in separate collections for easier querying.
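That said, for the availability check you're ultimately after, here is a sketch against the current schema (desiredCheckIn and desiredCheckOut are hypothetical Date variables, and 'hotels' is a placeholder collection name, neither from the question). It asks for documents containing at least one room whose reservations array has no booking overlapping the requested range:
// find documents with at least one room that has no reservation
// overlapping [desiredCheckIn, desiredCheckOut)
db.collection('hotels').find({
  "rooms": {
    "$elemMatch": {
      "reservations": {
        "$not": {
          "$elemMatch": {
            "checkIn": { "$lt": desiredCheckOut },
            "checkOut": { "$gt": desiredCheckIn }
          }
        }
      }
    }
  }
})
An existing booking overlaps the requested range exactly when it starts before the range ends and ends after the range starts, which is why the inner $elemMatch tests those two inequalities and $not inverts the result.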
Recently I was working on a date query. First of all we need to understand how we store dates in the MongoDB database. Say I have stored data using the UTC time format, like 2020-07-21T09:45:06.567Z, and my JSON structure is:
[
  {
    "dateOut": "2020-07-21T09:45:06.567Z",
    "_id": "5f1416378210c50bddd093b9",
    "customer": {
      "isGold": true,
      "_id": "5f0c1e0d1688c60b95360565",
      "name": "pavel_1",
      "phone": 123456789
    },
    "movie": {
      "_id": "5f0e15412065a90fac22309a",
      "title": "hello world",
      "dailyRentalRate": 20
    }
  }
]
Now I want to perform a query that returns all data for this date (2020-07-21) only. How can we do that? We need to understand the basics first.
let result = await Rental.find({
  dateOut: {
    $gte: new Date('2020-07-21').toISOString(),
    $lt: new Date('2020-07-22').toISOString()
  }
})
We want the 21st's data, so our query asks for values on or after the start of the 21st and before the 22nd, because times such as 2020-07-21T00:45:06.567Z, 2020-07-21T01:45:06.567Z, etc. all fall within the 21st but before the 22nd. (This comparison works here because dateOut is stored as an ISO 8601 string, which sorts lexicographically in date order.)
var mydate1 = new Date();
var mydate2 = new Date().getTime();
ObjectId.getTimestamp() returns the timestamp portion of the ObjectId() as a Date.
Example
The following example calls the getTimestamp() method on an ObjectId():
ObjectId("507c7f79bcf86cd7994f6c0e").getTimestamp()
This will return the following output:
ISODate("2012-10-15T21:26:17Z")
If you're using timestamp data to query,
e.g. "createdAt": "2021-07-12T16:06:34.949Z"
const start = req.params.id; // 2021-07-12
const data = await Model.find({
  "createdAt": {
    '$gte': `${start}T00:00:00.000Z`,
    '$lt': `${start}T23:59:59.999Z`
  }
});
console.log(data);
It will show the data for that particular date, i.e. "2021-07-12" in this case.

Mongoose query returning repeated results

The query receives a pair of coordinates, a maximum distance radius, a "skip" integer and a "limit" integer. The function should return the closest and newest locations for the given position. There is no visible error in my code; however, when I call the query again, it returns repeated results. The "skip" variable is updated according to the number of results already returned.
Example:
1) I make query with skip = 0, limit = 10. I receive 10 non-repeated locations.
2) Query is called again now, skip = 10, limit = 10. I receive another 10 locations with repeated results from the first query.
QUERY
Locations.find({
  coordinates: {
    $near: [x, y],
    $maxDistance: maxDistance
  }
})
.sort('date_created')
.skip(skip)
.limit(limit)
.exec(function(err, locations) {
  console.log("[+]Found Locations");
  callback(locations);
});
SCHEMA
var locationSchema = new Schema({
  date_created: { type: Date },
  coordinates: [],
  text: { type: String }
});
I have tried looking everywhere for a solution. Could it be down to the versions of Mongo? I use Mongoose 4.x.x, and MongoDB is 2.5.6, I believe. Any ideas?
There are a couple of things to consider here in the sort of results that you want, the first being that you have a "secondary" sort criterion in "date_created" to deal with.
The basic problem there is that the $near operator and similar operators in MongoDB do not at present "project" any field indicating the "distance" from the queried location; they simply "default sort" the data by it. So in order to do that "secondary" sort, a field with the "distance" needs to be present. There are therefore other options for this.
The second case is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. It is better to select data based on the "range" in which it occurs rather than "skip" through all the results you have previously displayed.
The first thing to do here is use a command that can "project" the distance into the document along with the other information. The $geoNear aggregation stage is good for this, especially since we also want to do other sorting:
var seenIds = [],
    lastDistance = null,
    lastDate = null;

Locations.aggregate(
  [
    { "$geoNear": {
      "near": [x, y],
      "maxDistance": maxDistance,
      "distanceField": "dist",
      "limit": 10
    }},
    { "$sort": { "dist": 1, "date_created": -1 } }
  ],
  function(err, results) {
    results.forEach(function(result) {
      // compare dates numerically - Date objects compared with != are
      // compared by reference, which would always look like a change
      if ( ( result.dist != lastDistance ) ||
           ( +result.date_created != +lastDate ) ) {
        seenIds = [];
        lastDistance = result.dist;
        lastDate = result.date_created;
      }
      seenIds.push(result._id);
    });
    // save those variables to session or other persistence
    // do something with results
  }
)
That is the first iteration of your results, where you fetch the first 10. Note the logic inside the loop: each document in the results is inspected for a change in either "date_created" or the projected "dist" field now present in the document, and where a change occurs the "seenIds" array is wiped of all current entries. The current document's "_id" is then appended on every iteration, so "seenIds" always holds the ids seen at the latest distinct distance/date pair.
All three variables being worked on need to be stored somewhere awaiting the next request. For web applications the session store is ideal, but approaches vary. You just want those values to be recalled when we start the next request.
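For example, with an Express-style session (a sketch; req.session and the geoPage key are assumptions, not from the original question):
// persist the paging state at the end of a request
req.session.geoPage = {
  seenIds: seenIds,
  lastDistance: lastDistance,
  lastDate: lastDate
};

// ...and restore it at the start of the next request; note that Date
// values may deserialize as ISO strings and need reviving via new Date()
var page = req.session.geoPage || { seenIds: [], lastDistance: null, lastDate: null };
On the next and subsequent iterations we then alter the query a bit: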
Locations.aggregate(
  [
    { "$geoNear": {
      "near": [x, y],
      "maxDistance": maxDistance,
      "minDistance": lastDistance,
      "distanceField": "dist",
      "limit": 10,
      "query": {
        "_id": { "$nin": seenIds },
        "date_created": { "$lt": lastDate }
      }
    }},
    { "$sort": { "dist": 1, "date_created": -1 } }
  ],
  function(err, results) {
    results.forEach(function(result) {
      if ( ( result.dist != lastDistance ) ||
           ( +result.date_created != +lastDate ) ) {
        seenIds = [];
        lastDistance = result.dist;
        lastDate = result.date_created;
      }
      seenIds.push(result._id);
    });
    // save those variables to session or other persistence
    // do something with results
  }
)
So there the "minDistance" parameter is entered as you want to exclude any of the "nearer" results that have already been seen, and the additional checks are placed in the query with the "date_created" needing to be "less than" the "lastDistance" recorded as well since we are in descending order of sort, with the final "sure" filter in excluding any "_id" values that were recorded within the list because the values had not changed.
Now with geospatial data that "seenIds" list is not likely to grow much, as you are generally not going to find many things at exactly the same distance, but the same process applies to paging any sorted list of data, so the concept is worth understanding.
So if you want to use a secondary field to sort geospatial data while also considering the "near" distance, this is the general approach: project a distance value into the document results, and store the last seen values before any change that would make them non-unique.
The general concept is "advancing the minimum distance" to enable each page of results to get gradually "further away" from the source point of origin used in the query.

MongoDB update/insert document and Increment the matched array element

I use Node.js and MongoDB with monk.js, and I want to do the logging in a minimal way, with one document per hour, like:
final doc:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1 }, {action: action2, count: 27 }, {action: action3, count: 5 } ] }
The complete document should be created by incrementing one value.
E.g. someone visits a webpage first in this hour, and the incrementing of action1 should create the following document with a query:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1} ] }
Another user visits another webpage in this hour, and the document should be extended to:
{ time: YYYY-MM-DD-HH, log: [ {action: action1, count: 1}, {action: action2, count: 1} ] }
and the count values should be incremented as the different webpages are visited.
At the moment I create a doc for each action:
tracking.update({
  time: moment().format('YYYY-MM-DD_HH'),
  action: action,
  info: info
}, { $inc: { count: 1 } }, { upsert: true }, function (err) {});
Is this possible with monk.js / mongodb?
EDIT:
Thank you. Your solution looks clean and elegant, but it looks like my server can't handle it, or I'm too much of a noob to make it work.
I wrote an extremely dirty solution with the action name as key:
tracking.update({ time: time, ts: ts },
  JSON.parse('{ "$inc": { "' + action + '": 1 } }'),
  { upsert: true }, function (err) {});
Yes, it is very possible, and a well considered question. The only variation I would make on the approach is to calculate the "time" value as a real Date object (quite useful in MongoDB, and easy to manipulate as well) and simply "round" the value with basic date math. You could use "moment.js" for the same result, but I find the math simple.
The other main consideration here is that mixing array "push" actions with possible "upsert" document actions can be a real problem, so it is best to handle this with "multiple" update statements, where only the condition you want is going to change anything.
The best way to do that, is with MongoDB Bulk Operations.
Consider that your data comes in something like this:
{ "timestamp": 1439381722531, "action": "action1" }
Where the "timestamp" is an epoch timestamp value acurate to the millisecond. So the handling of this looks like:
// Just adding for the listing, assuming already defined otherwise
var payload = { "timestamp": 1439381722531, "action": "action1" };

// Round down to the hour
var hour = new Date(
  payload.timestamp - ( payload.timestamp % ( 1000 * 60 * 60 ) )
);

// Init transaction (collection name taken from the question's code)
var bulk = db.collection('tracking').initializeOrderedBulkOp();

// Try to increment where the array element exists in the document
bulk.find({
  "time": hour,
  "log.action": payload.action
}).updateOne({
  "$inc": { "log.$.count": 1 }
});

// Try to upsert where the document does not exist
bulk.find({ "time": hour }).upsert().updateOne({
  "$setOnInsert": {
    "log": [{ "action": payload.action, "count": 1 }]
  }
});

// Try to "push" where the array element does not exist in the matched document
bulk.find({
  "time": hour,
  "log.action": { "$ne": payload.action }
}).updateOne({
  "$push": { "log": { "action": payload.action, "count": 1 } }
});

bulk.execute();
So if you look through the logic there, you will see that it is only ever possible for "one" of those statements to be true for any given state of the document, whether it exists or not. Technically speaking, the statement with the "upsert" can actually match a document when it exists; however, the $setOnInsert operation used makes sure that no changes are made unless the action actually "inserts" a new document.
Since all operations are fired in "Bulk", the only time the server is contacted is on the .execute() call. So there is only "one" request to the server and only "one" response, despite the multiple operations.
In this way the conditions are all met:
Create a new document for the current period where one does not exist and insert initial data to the array.
Add a new item to the array where the current "action" classification does not exist and add an initial count.
Increment the count property of the specified action within the array upon execution of the statement.
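For reference, on newer MongoDB drivers the same three conditional statements can be expressed with collection.bulkWrite; this is a sketch of the equivalent call, not part of the original answer:
// same three conditional updates as above, sent as one ordered batch
db.collection('tracking').bulkWrite([
  { updateOne: {
    filter: { "time": hour, "log.action": payload.action },
    update: { "$inc": { "log.$.count": 1 } }
  }},
  { updateOne: {
    filter: { "time": hour },
    update: { "$setOnInsert": { "log": [{ "action": payload.action, "count": 1 }] } },
    upsert: true
  }},
  { updateOne: {
    filter: { "time": hour, "log.action": { "$ne": payload.action } },
    update: { "$push": { "log": { "action": payload.action, "count": 1 } } }
  }}
], { ordered: true });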
All in all, yes, possible, and also a great idea for storage as long as the action classifications do not grow too large within a period (500 array elements should be used as a maximum guide), and the updating is very efficient and self-contained within a single document for each time sample.
The structure is also nice and well suited to other queries and possible additional aggregation purposes as well.
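As a sketch of that last point (the pipeline is an assumption, not from the original answer), per-action totals across all hourly documents could be read back with an aggregation like:
db.collection('tracking').aggregate([
  // one document per (hour, action) pair
  { "$unwind": "$log" },
  // sum the hourly counters per action
  { "$group": {
    "_id": "$log.action",
    "total": { "$sum": "$log.count" }
  }},
  { "$sort": { "total": -1 } }
]).toArray(function(err, totals) {
  // totals: one entry per action with its overall count
});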

CouchDB: getting number of keys in given key range

In my CouchDB database, all keys have the form "A_xxxxxxxx", where xxxxxxxx is a zero-padded decimal number (e.g. "A_00000001" or "A_12345678").
I want to get only the number of keys in a given key range.
For example, to get the keys from A_10000000 to A_30000000, I can query something like:
GET DATABASE/_all_docs?startkey="A_10000000"&endkey="A_30000000"&include_docs=false
But the result contains all the keys, and I would have to count the elements of the "rows" field of the output myself. The number of keys in my query range will be huge, and all I want to know is the count, not the actual list of keys.
The range start and end values vary; they are not fixed.
Is it possible to get only the number of keys in the given range, without retrieving the actual key list?
Thanks,
You cannot get the number of keys in a given key range using the built-in _all_docs view. But you can get the desired result using a custom map reduce view such as this one described in the CouchDB Definitive Guide
map.js
function(doc) {
  emit(doc._id, 1);
}
reduce.js
function(keys, values, rereduce) {
  return sum(values);
}
You can add these views to your CouchDB database using the Futon admin utility by creating a new document with these contents:
{
  "_id": "_design/test",
  "views": {
    "count": {
      "map": "function(doc) {\n emit(doc._id, 1);\n}",
      "reduce": "function(keys, values, rereduce) {\n return sum(values)\n}"
    }
  }
}
The view can then be queried at _design/test/_view/count instead of _all_docs, and will return the number of documents between the start and end keys.
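For example, using the key range from the question:
GET DATABASE/_design/test/_view/count?startkey="A_10000000"&endkey="A_30000000"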
When I run this query against my database without a start and end key, I get this result:
{
  "rows": [
    {
      "key": null,
      "value": 185
    }
  ]
}
Running the query again with the start and end keys populated I get this result:
{
  "rows": [
    {
      "key": null,
      "value": 11
    }
  ]
}
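As an aside, the same count is available without writing the reduce in JavaScript: CouchDB ships built-in reduce functions, and since the map emits a 1 per document, "_count" (or equivalently "_sum") behaves identically:
{
  "_id": "_design/test",
  "views": {
    "count": {
      "map": "function(doc) {\n emit(doc._id, 1);\n}",
      "reduce": "_count"
    }
  }
}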
