How to manage large/complex NoSQL queries in code, can we make it generic? - node.js

I've been using Mongoose in Node.js to interact with MongoDB. For every search/update/delete operation I have to write a query object such as:
Model.find({
  "$and": [
    { "name": name },
    { "$or": [
      { "start": { "$exists": false } },
      { "start": { "$gte": start } }
    ]},
    { "$or": [
      { "stop": { "$exists": false } },
      { "stop": { "$gte": stop } }
    ]}
  ]
}).exec(callback);
Now, every time I need to fetch such objects I have to write a method that builds this query, and even a generic method will have a limited set of input parameters unless I pass in a JSON object, which again takes almost the same amount of code.
So what is the best practice for managing such queries in code?
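One common way to tame this, sketched here only as an illustration (the missingOrGte helper below is hypothetical, not a Mongoose API), is to extract the repeated pattern into a small builder function and compose queries from it:

// Hypothetical helper: builds the recurring "field is missing or >= value" clause.
function missingOrGte(field, value) {
  return {
    "$or": [
      { [field]: { "$exists": false } }, // field absent
      { [field]: { "$gte": value } }     // or at least the given value
    ]
  };
}

// Compose clauses instead of repeating the literal query object each time.
Model.find({
  "$and": [
    { "name": name },
    missingOrGte("start", start),
    missingOrGte("stop", stop)
  ]
}).exec(callback);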

Related

CouchDB Replication does not replicate old revision documents

I'm working on a "one database per user" system using CouchDB replication with a selector to filter my data based on the user configuration.
It worked pretty well, until the day I noticed an issue with the replication. It is difficult to describe, so I will do it with an example:
I have my main database "mainDB" which I'm using as the "source" database for the replication, and I decide to create a sub-database "subDB" for a user, which will be the "target" for the replication.
I create my replication doc with my selector to filter the data from my "mainDB" and nothing happens: my "subDB" stays empty, the replication state is marked as "Running" but with 0 pending changes.
As soon as I update a doc in "mainDB" (a doc that is supposed to be replicated to my "subDB"), the "_rev" of this doc changes, and the replication really starts and replicates my doc to the "subDB".
In brief, CouchDB filtered replication based on a selector will not replicate any doc until we update the "_rev" of each doc that is supposed to be replicated.
App version: Apache CouchDB v3.2.2
EDIT 1
The selector looks like this:
{
  "selector": {
    "$or": [
      {
        "date_debut": { "$lte": "#end_date#" },
        "typedoc": "ActiviteDocument",
        "date_fin": { "$gte": "#start_date#" },
        "id": { "$in": [#array_of_integer_A#] }
      },
      {
        "typedoc": "IndividuDocument",
        "id": { "$in": [#array_of_integer_B#] }
      },
      (JSON too long to paste in full here, but the other parts of the $or use the same logic)
      ...
    ]
  }
}
EDIT 2: I changed the selector logic to use $or and $and:
"selector": {
  "$or": [
    {
      "$and": [
        { "typedoc": "ActiviteDocument" },
        { "date_debut": { "$lte": "#end_date#" } },
        { "date_fin": { "$gte": "#start_date#" } },
        { "id": { "$in": [#array_of_integer_A#] } }
      ]
    },
    {
      "$and": [
        { "typedoc": "IndividuDocument" },
        { "id": { "$in": [#array_of_integer_B#] } }
      ]
    },
EDIT 3: I changed my replication doc, removing the selector and using "doc_ids"; the replication still will not replicate my docs unless I update one of them so that its "_rev" changes, at which point the replication detects it and starts working:
{
  "_id": "replicationmaster-1123",
  "source": "mysource",
  "target": "mytarget",
  "doc_ids": [
    "ActiviteDocument_335765",
    "ActiviteDocument_351882",
    "ActiviteDocument_421350",
    "ActiviteDocument_423684",
    "ActiviteDocument_428304",
    "ActiviteDocument_440523",
    "ActiviteDocument_442048",
    "ActiviteDocument_443727"
  ],
  "continuous": true,
  "create_target": false,
  "owner": "admin"
}
EDIT 4: demo: https://youtu.be/OqJA0fDQqy8
The problem is that in your selector JSON the $or parameter needs to be an array of objects, each one being an individual condition. The way you have it, the parameter is an array with a single object that contains all the conditions.
Here is a complete replicator document based on your conditions, with the correct syntax:
{
  "_id": "abc12357",
  "source": "https://username:password@mycouchdb.com/db1",
  "target": "https://username:password@mycouchdb.com/db2",
  "selector": {
    "$or": [
      { "start": { "$lte": "2022-10-27" } },
      { "typedoc": "ActiviteDocument" },
      { "end": { "$gte": "2022-09-29" } },
      { "id": { "$in": [65993, 63938, 87265, 312112, 64885, 64277] } }
    ]
  }
}

Mongoose + Aggregate Date Format Issue

When I perform the following query in mongo, it works fine:
db.getCollection('patients').aggregate({
  "$match": {
    "demographics.dob": new Date("2018-01-17T00:00:00.000Z")
  }
})
But when I remove new Date(), it does not work:
db.getCollection('patients').aggregate({
  "$match": {
    "demographics.dob": "2018-01-17T00:00:00.000Z"
  }
})
The reason I want to remove it is that when I send a date from Express.js, it arrives in string format. Following is my code:
filter["demographics.dob"] = new Date(filter["demographics.dob"]).toISOString();
I know that in aggregate, type casting is not done internally, so please suggest another way to do it :)
Following is the output of the mongoose debug log:
patients.aggregate([
  { "$project": {
    "demographics.legalFirstName": "$demographics.legalFirstName",
    "demographics.lastName": "$demographics.lastName",
    "updatedAt": "$updatedAt",
    "_id": 1
  }},
  { "$match": {
    "$and": [
      { "demographics.dob": "2018-01-17T00:00:00.000Z" }
    ]
  }},
  { "$sort": { "updatedAt": -1 } },
  { "$skip": 0 },
  { "$limit": 2 }
], {})
Use $dateFromString and change $match to use $expr (available since version 3.6). Something like:
{ "$match": {
  "$expr": {
    "$eq": [
      "$demographics.dob",
      { "$dateFromString": { "dateString": "2018-01-17T00:00:00.000Z" } }
    ]
  }
}}
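As a minimal sketch of how that stage slots into the asker's pipeline (the Patient model name and the hard-coded date string are assumptions for illustration, taken from the debug log above):

// Sketch only: "Patient" is an assumed model name.
const dobString = "2018-01-17T00:00:00.000Z"; // arrives as a string from Express

Patient.aggregate([
  { "$match": {
    "$expr": {
      "$eq": [
        "$demographics.dob",
        { "$dateFromString": { "dateString": dobString } }
      ]
    }
  }},
  { "$sort": { "updatedAt": -1 } },
  { "$skip": 0 },
  { "$limit": 2 }
]).exec(function(err, docs) {
  // docs match the date without any client-side casting
});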

Getting data from elasticsearch when field is undefined or missing from object

I'm trying to get data from Elasticsearch by querying a field which indicates whether the object was already handled; let's call it "isHandled".
Some objects were indexed without this field.
Is there any way to get the data where "isHandled" is just not "true" (false or even missing)?
Thanks
You can use the exists query combined with a term query inside a bool/should to achieve that. The query below returns all documents where isHandled is either false or does not exist:
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "isHandled": "false"
          }
        },
        {
          "bool": {
            "must_not": {
              "exists": {
                "field": "isHandled"
              }
            }
          }
        }
      ]
    }
  }
}
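For completeness, a minimal sketch of running this query from Node.js, assuming the official @elastic/elasticsearch client (v8-style API) and a placeholder index name:

// Sketch only: "my-index" is a placeholder index name.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function findUnhandled() {
  const result = await client.search({
    index: 'my-index',
    query: {
      bool: {
        should: [
          { term: { isHandled: 'false' } },
          { bool: { must_not: { exists: { field: 'isHandled' } } } }
        ]
      }
    }
  });
  return result.hits.hits; // documents where isHandled is false or missing
}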

How to pull one instance of an item in an array in MongoDB?

According to the documents:
The $pull operator removes from an existing array all instances of a value or values that match a specified condition.
Is there an option to remove only the first instance of a value? For example:
var array = ["bird","tiger","bird","horse"]
How can the first "bird" be removed directly in an update call?
So you are correct in that the $pull operator does exactly what the documentation says: its arguments are in fact a "query" used to match the elements that are to be removed.
If your array content always happened to have the element in the "first" position, as you show, then the $pop operator does in fact remove that first element.
With the basic node driver:
collection.findOneAndUpdate(
  { "array.0": "bird" }, // "array.0" matches on the value of the "first" element
  { "$pop": { "array": -1 } },
  { "returnOriginal": false },
  function(err, doc) {
  }
);
With mongoose the argument to return the modified document is different:
MyModel.findOneAndUpdate(
  { "array.0": "bird" },
  { "$pop": { "array": -1 } },
  { "new": true },
  function(err, doc) {
  }
);
But neither is of much use if the array position of the "first" item to remove is not known.
For the general approach you need "two" updates: one to match the first item and replace it with something unique that can be removed, and a second to actually remove that modified item.
This is a lot simpler when applying plain updates and not asking for the returned document, and it can also be done in bulk across documents. It also helps to use something like async.series in order to avoid nesting your calls:
async.series(
  [
    function(callback) {
      collection.update(
        { "array": "bird" },
        { "$unset": { "array.$": "" } },
        { "multi": true },
        callback
      );
    },
    function(callback) {
      collection.update(
        { "array": null },
        { "$pull": { "array": null } },
        { "multi": true },
        callback
      );
    }
  ],
  function(err) {
    // comes here when finished or on error
  }
);
So using $unset here with the positional $ operator allows the "first" item to be changed to null. The subsequent query with $pull then removes any null entry from the array.
That is how you safely remove the "first" occurrence of a value from an array. Determining whether the array contains more than one of the same value, though, is another question.
It's worth noting that whilst the other answer here is indeed correct that the general approach is to $unset the matched array element in order to create a null value and then $pull just the null values from the array, there are better ways to implement this in modern MongoDB versions.
Using bulkWrite()
As an alternative to submitting the two update operations in sequence as separate requests, modern MongoDB releases support bulk operations via the recommended bulkWrite() method, which allows those multiple updates to be submitted as a single request with a single response:
collection.bulkWrite(
  [
    { "updateOne": {
      "filter": { "array": "bird" },
      "update": {
        "$unset": { "array.$": "" }
      }
    }},
    { "updateOne": {
      "filter": { "array": null },
      "update": {
        "$pull": { "array": null }
      }
    }}
  ]
);
This does the same thing as the answer showing two requests, but here it's just one. That can save a lot of overhead in server communication, so it's generally the better approach.
Using Aggregation Expressions
With the release of MongoDB 4.2, aggregation expressions are now allowed in the various "update" operations of MongoDB. This takes the form of a single pipeline stage of either $addFields, $set (an alias of $addFields meant to make these "update" statements read more logically), $project, or $replaceRoot and its own alias $replaceWith. The $redact pipeline stage also applies here to some degree. Basically, any pipeline stage which returns a "reshaped" document is allowed.
collection.updateOne(
  { "array": "horse" },
  [
    { "$set": {
      "array": {
        "$concatArrays": [
          { "$slice": [ "$array", 0, { "$indexOfArray": [ "$array", "horse" ] } ] },
          { "$slice": [
            "$array",
            { "$add": [{ "$indexOfArray": [ "$array", "horse" ] }, 1] },
            { "$size": "$array" }
          ]}
        ]
      }
    }}
  ]
);
In this case the manipulation uses the $slice and $indexOfArray operators to piece together a new array which "skips" over the first matched array element. These pieces are joined via the $concatArrays operator, returning a new array absent the first matched element.
This is probably more effective now, since the operation, which is still a single request, is also a single operation, and so incurs a little less server overhead.
Of course, the only catch is that this is not supported in any release of MongoDB prior to 4.2. The bulkWrite() on the other hand may be a newer API implementation, but the actual underlying calls to the server apply all the way back to MongoDB 2.6 via the actual "Bulk API" calls, and even regress to earlier versions by the way all core drivers actually implement the method.
Demonstration
As a demonstration, here is a listing of both approaches:
const { Schema } = mongoose = require('mongoose');
const uri = 'mongodb://localhost:27017/test';
const opts = { useNewUrlParser: true, useUnifiedTopology: true };
mongoose.Promise = global.Promise;
mongoose.set('debug', true);
mongoose.set('useCreateIndex', true);
mongoose.set('useFindAndModify', false);
const arrayTestSchema = new Schema({
  array: [String]
});
const ArrayTest = mongoose.model('ArrayTest', arrayTestSchema);
const array = ["bird", "tiger", "horse", "bird", "horse"];
const log = data => console.log(JSON.stringify(data, undefined, 2));
(async function() {
  try {
    const conn = await mongoose.connect(uri, opts);

    await Promise.all(
      Object.values(conn.models).map(m => m.deleteMany())
    );

    await ArrayTest.create({ array });

    // Use bulkWrite update
    await ArrayTest.bulkWrite(
      [
        { "updateOne": {
          "filter": { "array": "bird" },
          "update": {
            "$unset": { "array.$": "" }
          }
        }},
        { "updateOne": {
          "filter": { "array": null },
          "update": {
            "$pull": { "array": null }
          }
        }}
      ]
    );

    log({ bulkWriteResult: (await ArrayTest.findOne()) });

    // Use aggregation expression
    await ArrayTest.collection.updateOne(
      { "array": "horse" },
      [
        { "$set": {
          "array": {
            "$concatArrays": [
              { "$slice": [ "$array", 0, { "$indexOfArray": [ "$array", "horse" ] } ] },
              { "$slice": [
                "$array",
                { "$add": [{ "$indexOfArray": [ "$array", "horse" ] }, 1] },
                { "$size": "$array" }
              ]}
            ]
          }
        }}
      ]
    );

    log({ aggregateWriteResult: (await ArrayTest.findOne()) });

  } catch (e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }
})();
And the output:
Mongoose: arraytests.deleteMany({}, {})
Mongoose: arraytests.insertOne({ array: [ 'bird', 'tiger', 'horse', 'bird', 'horse' ], _id: ObjectId("5d8f509114b61a30519e81ab"), __v: 0 }, { session: null })
Mongoose: arraytests.bulkWrite([ { updateOne: { filter: { array: 'bird' }, update: { '$unset': { 'array.$': '' } } } }, { updateOne: { filter: { array: null }, update: { '$pull': { array: null } } } } ], {})
Mongoose: arraytests.findOne({}, { projection: {} })
{
  "bulkWriteResult": {
    "array": [
      "tiger",
      "horse",
      "bird",
      "horse"
    ],
    "_id": "5d8f509114b61a30519e81ab",
    "__v": 0
  }
}
Mongoose: arraytests.updateOne({ array: 'horse' }, [ { '$set': { array: { '$concatArrays': [ { '$slice': [ '$array', 0, { '$indexOfArray': [ '$array', 'horse' ] } ] }, { '$slice': [ '$array', { '$add': [ { '$indexOfArray': [ '$array', 'horse' ] }, 1 ] }, { '$size': '$array' } ] } ] } } } ])
Mongoose: arraytests.findOne({}, { projection: {} })
{
  "aggregateWriteResult": {
    "array": [
      "tiger",
      "bird",
      "horse"
    ],
    "_id": "5d8f509114b61a30519e81ab",
    "__v": 0
  }
}
NOTE: The example listing uses mongoose, partly because it was referenced in the other answer given and partly to demonstrate an important point with the aggregate syntax example. Note that the code uses ArrayTest.collection.updateOne(), since at the present release of Mongoose (5.7.1 at the time of writing) the aggregation pipeline syntax for such updates is stripped out by the standard mongoose Model methods.
As such, the .collection accessor can be used in order to get the underlying Collection object from the core MongoDB Node driver. This would be required until a fix is made to mongoose which allows this expression to be included.
As mentioned in this Jira, this feature will never exist properly.
The approach I recommend would be the aggregation pipeline update syntax proposed in a different answer; however, that answer has some edge cases where it fails, for example if the element does not exist in the array. Here is a working version that handles all the edge cases:
ArrayTest.updateOne({},
  [
    { "$set": {
      "array": {
        "$concatArrays": [
          { "$cond": [
            { "$gt": [{ "$indexOfArray": [ "$array", "horse" ] }, 0] },
            { "$slice": [ "$array", 0, { "$indexOfArray": [ "$array", "horse" ] } ] },
            []
          ]},
          { "$slice": [
            "$array",
            { "$add": [{ "$indexOfArray": [ "$array", "horse" ] }, 1] },
            { "$size": "$array" }
          ]}
        ]
      }
    }}
  ])
Mongo Playground

Mongoose (MongoDB): Exclude Properties in a Dictionary Like Schema Type

I have a Schema of the following structure:
var schema = mongoose.Schema({
  answers: { type: mongoose.Schema.Types.Mixed }
});
I use the answers field as an object (an associative array, to implement something like a dictionary). Here is an example:
{
  "__v": 0,
  "_id": {
    "$oid": "53a0251c50d0536c1bfc6006"
  },
  "answers": {
    "fea": { "viewed": false },
    "3d2": { "viewed": true, "value": true },
    "4fr": { "viewed": true, "value": true },
    "84h": { "viewed": false },
    ...
  }
}
In a query I want to select only the "value" field of each entry. How is that possible through the select syntax? This of course doesn't work:
XY.find(...)
  .select({ 'answers': true, 'answers.*.value': false })
  .exec(...);
Maybe I have to design the data in another fashion?
Best regards,
Kersten
You should never model with "explicit values" as the "key" names. This is very bad practice. Consider what you would do in a SQL database: would you create "fields/columns" for the different "names" of the things you want?
No, you would not. You would have a generic field that specifies a "type" and then others that hold the data. Nothing changes here:
{
  "_id": {
    "$oid": "53a0251c50d0536c1bfc6006"
  },
  "answers": [
    { "type": "fea", "viewed": false },
    { "type": "3d2", "viewed": true, "value": true },
    { "type": "4fr", "viewed": true, "value": true },
    { "type": "84h", "viewed": false },
    ...
  ]
}
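For what it's worth, a minimal sketch of a Mongoose schema matching this reshaped structure might look like the following (the schema names are illustrative, not from the original question):

// Illustrative only: schema for the array-based structure above.
var answerSchema = mongoose.Schema({
  type: String,
  viewed: Boolean,
  value: Boolean
}, { _id: false });

var schema = mongoose.Schema({
  answers: [answerSchema]
});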
Now it is easy to use something like the aggregation framework to project the content the way you want.
With modern MongoDB 2.6 and onwards, you can use $map and $setDifference to filter the array without using $unwind:
Model.aggregate(
  [
    { "$project": {
      "answers": {
        "$setDifference": [
          { "$map": {
            "input": "$answers",
            "as": "el",
            "in": {
              "$cond": [
                1,
                {
                  "type": "$$el.type",
                  "value": { "$ifNull": [ "$$el.value", false ] }
                },
                false
              ]
            }
          }},
          [false]
        ]
      }
    }}
  ],
  function(err, result) {
  }
);
Or with older versions pre 2.6:
Model.aggregate(
  [
    { "$unwind": "$answers" },
    { "$group": {
      "_id": "$_id",
      "answers": {
        "$push": {
          "type": "$answers.type",
          "value": { "$ifNull": [ "$answers.value", false ] }
        }
      }
    }}
  ],
  function(err, result) {
  }
);
Of course you can "filter" the array results to certain conditions, either by adding a logical evaluation as the first argument to the $cond operator in the $map implementation, or by using a $match pipeline stage between the $unwind and $group stages, as sketched below.
Either form allows you to reshape the result without problem, and it is the fastest way to process this, which is a strong advantage of using arrays as opposed to embedded objects whose keys are actually a "data" item.
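As a sketch of that second option (the "fea" filter value is illustrative), the pre-2.6 form with a $match stage between $unwind and $group would look like:

Model.aggregate(
  [
    { "$unwind": "$answers" },
    { "$match": { "answers.type": "fea" } },
    { "$group": {
      "_id": "$_id",
      "answers": {
        "$push": {
          "type": "$answers.type",
          "value": { "$ifNull": [ "$answers.value", false ] }
        }
      }
    }}
  ],
  function(err, result) {
    // result contains only the matching answers per document
  }
);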
If you are stuck with the current structure, then you need to process it with JavaScript evaluation such as mapReduce. This runs much slower than the aggregation framework due to the need to invoke and run a JavaScript interpreter process:
Model.mapReduce(
  {
    "map": function() {
      for (var k in this.answers) {
        this.answers[k] = this.answers[k].hasOwnProperty("value")
          ? this.answers[k].value : false;
      }
      var id = this._id;
      delete this._id;
      emit(id, this);
    },
    "reduce": function() {}
  },
  function(err, docs) {
  }
);
But really, consider changing your structure as it makes things much more flexible for queries and other operations.
