MongoDB merge two collections with unmatched documents

I am trying to compare two collections and find the documents that differ between them.
Samples are below. MongoDB version: 4.0, ORM: Mongoose
**col1: Has one new document**
{ "id" : 200001, "mobileNo" : #######001 }
{ "id" : 200002, "mobileNo" : #######002 } //mobileNo may not be unique.
{ "id" : 200003, "mobileNo" : #######002 }
{ "id" : 200004, "mobileNo" : #######004 }
**col2:**
{ "id" : 200001, "mobileNo" : #######001 }
{ "id" : 200002, "mobileNo" : #######002 }
{ "id" : 200003, "mobileNo" : #######003 }
Now I want to insert the document { "id" : 200004, "mobileNo" : #######004 } from col1 into col2,
i.e. the documents that don't match.
This is what I've tried so far :
const col1= await Col1.find({}, { mobileNo: 1,id: 1, _id: 0 })
col1.forEach(async function (col1docs) {
let col2doc = await Col2.find({ mobileNo: { $ne: col1docs.mobileNo},
id:{$ne:col1docs.id} }, { mobileNo: 1, _id: 0, id: 1 })
if (!(col2doc)) {
Col2.insertMany(col1docs);
}
});
I have also tried $eq instead of $ne, but I neither get the unmatched documents nor do they get inserted. Any suggestions? The combination of id + mobileNo is unique.

I would say instead of doing two .find() calls plus iteration & then third call to write data, try this query :
db.col1.aggregate([
{
$lookup: {
from: "col2",
let: { id: "$id", mobileNo: "$mobileNo" },
pipeline: [
{
$match: { $expr: { $and: [ { $eq: [ "$id", "$$id" ] }, { $eq: [ "$mobileNo", "$$mobileNo" ] } ] } }
},
{ $project: { _id: 1 } } // limiting to `_id` as we don't need entire doc of `col2` - just need to see whether a ref exists or not
],
as: "data"
}
},
{ $match: { data: [] } // Filtering leaves docs in `col1` which has no match in `col2`
},
{ $project: { data: 0, _id: 0 } }
])
Test : mongoplayground
Details : The query uses conditions inside $lookup to find, for each doc in col1, any doc in col2 that matches it. $lookup runs once per document of col1: if the combination of id & mobileNo from the current col1 document has a match in col2, then that col2 doc's _id is pushed into the data array. The docs that come out with data: [] are the col1 docs for which no match was found. You can then write all the returned docs to col2 using .insertMany(). On MongoDB 4.2 or later you can do this entire thing with $merge, without any need for the second write call (.insertMany()).
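To make the matching rule concrete, here is a rough plain-JavaScript sketch of the same anti-join over in-memory arrays. It is only an illustration of the logic, not driver code; the unmatched() helper and the shortened mobileNo strings are made up for the example.

```javascript
// In-memory stand-ins for the two collections (mobileNo values are shortened placeholders).
const col1 = [
  { id: 200001, mobileNo: "001" },
  { id: 200002, mobileNo: "002" },
  { id: 200003, mobileNo: "002" },
  { id: 200004, mobileNo: "004" },
];
const col2 = [
  { id: 200001, mobileNo: "001" },
  { id: 200002, mobileNo: "002" },
  { id: 200003, mobileNo: "003" },
];

// Hypothetical helper: return the docs in `source` whose (id, mobileNo)
// pair has no equal pair in `target`. This mirrors $lookup + { data: [] }.
function unmatched(source, target) {
  const seen = new Set(target.map((d) => `${d.id}|${d.mobileNo}`));
  return source.filter((d) => !seen.has(`${d.id}|${d.mobileNo}`));
}

console.log(unmatched(col1, col2));
// [ { id: 200003, mobileNo: '002' }, { id: 200004, mobileNo: '004' } ]
```

Note that with the sample data as posted, two docs come back: 200004 is missing from col2 entirely, and 200003 exists there with a different mobileNo, so the id + mobileNo pair does not match.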
For your scenario, on MongoDB 4.2 or later, adding a stage like this will merge the docs into the second collection :
{ $merge: 'col2' } // Has to be final stage in aggregation
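As a sketch of how the whole pipeline might look with the $merge stage appended (MongoDB 4.2 or later only, so not the 4.0 from the question): buildMergePipeline is a hypothetical helper name, and the whenMatched / whenNotMatched values are just one reasonable choice.

```javascript
// Hypothetical helper: build the anti-join pipeline and finish with $merge.
// `target` is the collection to compare against and to write into.
function buildMergePipeline(target) {
  return [
    {
      $lookup: {
        from: target,
        let: { id: "$id", mobileNo: "$mobileNo" },
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  { $eq: ["$id", "$$id"] },
                  { $eq: ["$mobileNo", "$$mobileNo"] },
                ],
              },
            },
          },
          { $project: { _id: 1 } }, // only need to know whether a match exists
        ],
        as: "data",
      },
    },
    { $match: { data: [] } }, // keep docs with no match in `target`
    { $project: { data: 0, _id: 0 } }, // dropping _id lets the target generate a new one
    // Must be the last stage; requires MongoDB 4.2+.
    { $merge: { into: target, whenMatched: "keepExisting", whenNotMatched: "insert" } },
  ];
}

const pipeline = buildMergePipeline("col2");
console.log(JSON.stringify(pipeline[pipeline.length - 1]));
// With Mongoose this would be run as something like: await Col1.aggregate(pipeline)
```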
Note : If this has to be done periodically - no matter how you do this, try to minimize data that you're operating on, maybe maintain a time field & you can use that field to filter docs first & do this job, or you can also take advantage of _id to say we've done for all these docs in last run & we need to start from this docs - which helps you a lot to reduce data to be worked on. Additionally don't forget to maintain indexes.

Related

How do I query for the value of a key in an object? [duplicate]

So I'm attempting to find all records that have a field set and not null.
I tried using $exists; however, according to the MongoDB documentation, this query will also return documents where the field equals null.
$exists does match documents that contain the field that stores the null value.
So I'm now assuming I'll have to do something like this:
db.collection.find({ "fieldToCheck" : { $exists : true, $not : null } })
Whenever I try this however, I get the error [invalid use of $not] Anyone have an idea of how to query for this?

Use $ne (for "not equal")
db.collection.find({ "fieldToCheck": { $ne: null } })
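To see why $ne: null is enough here, this is a small plain-JavaScript emulation of how the three filters treat a set field, a null field and a missing field. It only imitates the matching rules on sample docs made up for the illustration; it is not driver code.

```javascript
// Sample docs: field set, field null, field missing.
const docs = [
  { _id: 1, fieldToCheck: "x" },
  { _id: 2, fieldToCheck: null },
  { _id: 3 },
];

// { fieldToCheck: { $exists: true } } : field is present, even if null
const exists = (d) => "fieldToCheck" in d;
// { fieldToCheck: { $ne: null } } : field is present and not null
const neNull = (d) => d.fieldToCheck !== null && d.fieldToCheck !== undefined;
// { fieldToCheck: { $eq: null } } : field is null or missing
const eqNull = (d) => !neNull(d);

const ids = (pred) => docs.filter(pred).map((d) => d._id);
console.log(ids(exists)); // [ 1, 2 ]
console.log(ids(neNull)); // [ 1 ]
console.log(ids(eqNull)); // [ 2, 3 ]
```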

Suppose we have a collection like below:
{
"_id":"1234",
"open":"Yes",
"things":{
"paper":1234,
"bottle":"Available",
"bottle_count":40
}
}
We want to know if the bottle field is present or not?
Ans:
db.products.find({"things.bottle":{"$exists":true}})

i find that this works for me
db.getCollection('collectionName').findOne({"fieldName" : {$ne: null}})

This comment is written in 2021 and applies for MongoDB 5.X and earlier versions.
If you value query performance, never use $exists, or use it only when you have a sparse index over the field that is queried. The sparse index should match the criteria of the query: if you are searching for $exists:true, the sparse index should be over field:{$exists:true}; if you are querying for $exists:false, the sparse index should be over field:{$exists:false}.
Instead use :
db.collection.find({ "fieldToCheck": { $ne: null } })
or
db.collection.find({ "fieldToCheck": { $eq: null } })
this will require that you include the fieldToCheck in every document of the collection, however - the performance will be vastly improved.

db.<COLLECTION NAME>.find({ "<FIELD NAME>": { $exists: true, $ne: null } })

In my case, I added a new field isDeleted : true only to records that are deleted.
All other records have no isDeleted field, so I wanted to get all the records where isDeleted either does not exist or is false. So the query is
.find({ isDeleted: { $ne: true } });

I tried to convert it into a boolean condition: if a document with the
table name already exists, then append to that same document,
otherwise create one.
table_name is the variable I use to find the document.
query = {table_name: {"$exists": True}}  # boolean True, not the string "True"
result = collection.find(query)
flag = 0
for doc in result:
    # update the document that was found, not whichever one {} matches first
    collection.update_one({"_id": doc["_id"]}, {"$push": {table_name: {"name": "hello"}}})
    flag = 1
if flag == 0:
    # store a list so that a later $push on this field works
    collection.insert_one({table_name: [{"roll no": "20"}]})

aggregate example
https://mongoplayground.net/p/edbKil4Zvwc
db.collection.aggregate([
{
"$match": {
"finishedAt": {
"$exists": true
}
}
},
{
"$unwind": "$tags"
},
{
"$match": {
"$or": [
{
"tags.name": "Singapore"
},
{
"tags.name": "ABC"
}
]
}
},
{
"$group": {
"_id": null,
"count": {
"$sum": 1
}
}
}
])

Formatting the returned object from MongoDB/Mongoose group by

I have a MongoDB with documents of the form:
{
...
"template" : "templates/Template1.html",
...
}
where template is either "templates/Template1.html", "templates/Template2.html" or "templates/Template3.html".
I'm using this query to group by template and count how many times each template is used:
var group = {
key:{'template':1},
reduce: function(curr, result){ result.count++ },
initial: { count: 0 }
};
messageModel.collection.group(group.key, null, group.initial, group.reduce, null, true, cb);
I'm getting back the correct result, but it's formatted like this:
{
"0" : {
"template" : "templates/Template1.html",
"count" : 2 },
"1" : {
"template" : "templates/Template2.html",
"count" : 2 },
"2" : {
"template" : "templates/Template3.html",
"count" : 1 }
}
I was wondering if it's possible to change the query so that it returns something like:
{
"templates/Template1.html" : { "count" : 2 },
"templates/Template2.html" : { "count" : 2 },
"templates/Template3.html" : { "count" : 1 }
}
or even:
{
"templates/Template1.html" : 2 ,
"templates/Template2.html" : 2 ,
"templates/Template3.html" : 1
}
I would rather change the query and not parse the returned object from the original query.

As mentioned by Blakes Seven in the comments you could use aggregate() instead of group() to achieve nearly your desired result.
messageModel.collection.aggregate([
{ // Group the collection by `template` and count the occurrences
$group: {
_id: "$template",
count: { $sum: 1 }
}
},
{ // Format the output
$project: {
_id: 0,
template: "$_id",
count: 1
}
},
{ // Sort the formatted output
$sort: { template: 1 }
}
]);
The output would look like this:
[
{
"template" : "templates/Template1.html",
"count" : 2 },
{
"template" : "templates/Template2.html",
"count" : 2 },
{
"template" : "templates/Template3.html",
"count" : 1 }
]
Again, as stated by Blakes in the comments the database can only output an array of objects rather than a solitary object. That would be a transformation that you would need to do outside of the database.
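If you do want the keyed shape anyway, the transformation outside the database is short. A minimal sketch in plain JavaScript, assuming rows holds the array returned by the aggregate() call above (the variable names are made up):

```javascript
// `rows` stands in for the result of the aggregate() call.
const rows = [
  { template: "templates/Template1.html", count: 2 },
  { template: "templates/Template2.html", count: 2 },
  { template: "templates/Template3.html", count: 1 },
];

// { "<template>": <count>, ... }
const countByTemplate = Object.fromEntries(rows.map((r) => [r.template, r.count]));

// { "<template>": { count: <count> }, ... }
const nestedByTemplate = Object.fromEntries(
  rows.map((r) => [r.template, { count: r.count }])
);

console.log(countByTemplate);
console.log(nestedByTemplate);
```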
I think it deserves to be restated that this transformation produces an anti-pattern and should be avoided. An object key name provides the context or description for the value. Using a file location as a key name would be a fairly vague description whereas 'template' provides a bit more information about what that value represents.

Mongoose aggregation "$sum" of rows in sub document

I'm fairly good with sql queries, but I can't seem to get my head around grouping and getting sum of mongo db documents,
With this in mind, I have a job model with schema like below :
{
name: {
type: String,
required: true
},
info: String,
active: {
type: Boolean,
default: true
},
all_service: [{
price: {
type: Number,
min: 0,
required: true
},
all_sub_item: [{
name: String,
price:{ // << -- this is the price I want to calculate
type: Number,
min: 0
},
owner: {
user_id: { // <<-- here is the filter I want to put
type: Schema.Types.ObjectId,
required: true
},
name: String,
...
}
}]
}],
date_create: {
type: Date,
default : Date.now
},
date_update: {
type: Date,
default : Date.now
}
}
I would like to have a sum of price column, where owner is present, I tried below but no luck
Job.aggregate(
[
{
$group: {
_id: {}, // not sure what to put here
amount: { $sum: '$all_service.all_sub_item.price' }
},
$match: {'not sure how to limit the user': given_user_id}
}
],
//{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
function(err, summary) {
console.log(err);
console.log(summary);
}
);
Could someone guide me in the right direction. thank you in advance

Primer
As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | operator from Unix and other system shells. One "stage" feeds input to the "next" stage and so on.
The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful.
Your documents consist of an "all_service" array at the top level. Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". Then of course "all_sub_item" is an array in itself, also containing many items of its own.
You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. That much you should already be familiar with.
However, when you want to "aggregate" across documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". This is to "transform" the data into a de-normalized state that is suitable for aggregation.
So the same visualization applies. A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. In a "nutshell", this:
{
"a": 1,
"b": [
{
"c": 1,
"d": [
{ "e": 1 }, { "e": 2 }
]
},
{
"c": 2,
"d": [
{ "e": 1 }, { "e": 2 }
]
}
]
}
Becomes this:
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }
And the operation to do this is $unwind, and since there are multiple arrays then you need to $unwind both of them before continuing any processing:
db.collection.aggregate([
{ "$unwind": "$b" },
{ "$unwind": "$b.d" }
])
So the first stage in the "pipe" unwinds the array from "$b" like so:
{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
Which leaves a second array referenced by "$b.d" to be further de-normalized into the final result "without any arrays". This allows other operations to process.
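If it helps to see the mechanics, here is a rough plain-JavaScript emulation of what $unwind does to the sample document above. The unwind() function is a made-up helper for illustration and only covers the basic behaviour (no preserveNullAndEmptyArrays or includeArrayIndex).

```javascript
// Hypothetical helper emulating a basic $unwind on a dotted path.
function unwind(docs, path) {
  const keys = path.split(".");
  return docs.flatMap((doc) => {
    const value = keys.reduce((o, k) => (o == null ? undefined : o[k]), doc);
    if (value === undefined || value === null) return []; // missing/null: doc is dropped
    const items = Array.isArray(value) ? value : [value]; // non-array acts as one element
    return items.map((el) => {
      const copy = JSON.parse(JSON.stringify(doc)); // one copy of the parent per element
      let cursor = copy;
      for (const k of keys.slice(0, -1)) cursor = cursor[k];
      cursor[keys[keys.length - 1]] = el; // replace the array with the single element
      return copy;
    });
  });
}

const doc = {
  a: 1,
  b: [
    { c: 1, d: [{ e: 1 }, { e: 2 }] },
    { c: 2, d: [{ e: 1 }, { e: 2 }] },
  ],
};

const flat = unwind(unwind([doc], "b"), "b.d");
flat.forEach((d) => console.log(JSON.stringify(d)));
// {"a":1,"b":{"c":1,"d":{"e":1}}}
// {"a":1,"b":{"c":1,"d":{"e":2}}}
// {"a":1,"b":{"c":2,"d":{"e":1}}}
// {"a":1,"b":{"c":2,"d":{"e":2}}}
```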
Solving
With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents to only those that contain your results. This is a good idea, as especially when doing operations such as $unwind, then you don't want to be doing that on documents that do not even match your target data.
So you need to match your "user_id" at the array depth. But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array.
Of course, the "whole" document is still returned, because this is what you really asked for. The data is already "joined" and we haven't asked to "un-join" it in any way. You look at this just as a "first" document selection does, but then when "de-normalized", every array element now actually represents a "document" in itself.
So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match.
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_sub_item" },
// Match again to filter the array elements
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
Alternately, modern MongoDB releases since 2.6 also support the $redact operator. This could be used in this case to "pre-filter" the array content before processing with $unwind:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Filter arrays for matches in document
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$ifNull": [ "$owner", given_user_id ] },
given_user_id
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_sub_item" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind. This can speed things up a bit since items that do not match would not need to be "un-wound". However there is a "catch" in that if for some reason the "owner" did not exist on an array element at all, then the logic required here would count that as another "match". You can always $match again to be sure, but there is still a more efficient way to do this:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Filter arrays for matches in document
{ "$project": {
"all_items": {
"$setDifference": [
{ "$map": {
"input": "$all_service",
"as": "A",
"in": {
"$setDifference": [
{ "$map": {
"input": "$$A.all_sub_item",
"as": "B",
"in": {
"$cond": {
"if": { "$eq": [ "$$B.owner", given_user_id ] },
"then": "$$B",
"else": false
}
}
}},
[false]
]
}
}},
[[]]
]
}
}},
// De-normalize the "two" level array. "Double" $unwind
{ "$unwind": "$all_items" },
{ "$unwind": "$all_items" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_items.price" }
}}
],
function(err,results) {
}
)
That process cuts down the size of the items in both arrays "drastically" compared to $redact. The $map operator processes each element of an array through the statement given in "in". In this case, each "outer" array element is sent to another $map to process the "inner" elements.
A logical test is performed here with $cond whereby if the "condition" is met then the "inner" array element is returned, otherwise the false value is returned.
The $setDifference is used to filter down any false values that are returned. Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. This leaves just the matching items, encased in a "double" array, e.g:
[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]
As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values.
Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation.
So those are the things you need to know. As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals.
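For reference, the nested $map / $setDifference filtering and the "double" $unwind can be imitated in plain JavaScript like this. It is only a sketch on made-up data to show the shape of the intermediate result; it does not reproduce the de-duplication that set operators perform, and matchingItems is a hypothetical helper.

```javascript
// Made-up document following the question's structure (owner simplified to a string).
const job = {
  all_service: [
    { price: 10, all_sub_item: [
      { _id: 1, price: 1, owner: "b" },
      { _id: 2, price: 2, owner: "c" },
    ] },
    { price: 20, all_sub_item: [{ _id: 3, price: 4, owner: "c" }] },
    { price: 30, all_sub_item: [{ _id: 4, price: 8, owner: "b" }] },
  ],
};

// Hypothetical helper returning the "double" array of matching sub items.
function matchingItems(doc, owner) {
  return doc.all_service
    // inner $map + $setDifference [false]: keep only sub items for this owner
    .map((service) => service.all_sub_item.filter((item) => item.owner === owner))
    // outer $setDifference [[]]: drop services left with no matching sub items
    .filter((inner) => inner.length > 0);
}

const allItems = matchingItems(job, "b");
console.log(JSON.stringify(allItems));
// [[{"_id":1,"price":1,"owner":"b"}],[{"_id":4,"price":8,"owner":"b"}]]

// Two $unwind stages followed by $group/$sum, done in memory.
const total = allItems.flat().reduce((sum, item) => sum + item.price, 0);
console.log(total); // 9
```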

It sounds like you want to, in SQL equivalent, do "sum (prices) WHERE owner IS NOT NULL".
On that assumption, you'll want to do your $match first, to reduce the input set to your sum. So your first stage should be something like
{ $match: { "all_service.all_sub_item.owner" : { $exists: true } } }
Think of this as then passing all matching documents to your second stage.
Now, because you are summing an array, you have to do another step. Aggregation operators work on documents - there isn't really a way to sum an array. So we want to expand your array so that each element in the array gets pulled out to represent the array field as a value, in its own document. Think of this as a cross join. This will be $unwind.
{ $unwind: "$all_service" }, // the outer array has to be unwound first
{ $unwind: "$all_service.all_sub_item" }
Now you've just made a much larger number of documents, but in a form where we can sum them. Now we can perform the $group. In your $group, you specify a transformation. The line:
_id: {}, // not sure what to put here
is creating a field in the output document, which is not the same documents as the input documents. So you can make the _id here anything you'd like, but think of this as the equivalent to your "GROUP BY" in sql. The $sum operator will essentially be creating a sum for each group of documents you create here that match that _id - so essentially we'll be "re-collapsing" what you just did with $unwind, by using the $group. But this will allow $sum to work.
I think you're looking for grouping on just your main document id, so I think your $sum statement in your question is correct.
{ $group : { _id : "$_id", totalAmount : { $sum : "$all_service.all_sub_item.price" } } }
This will output documents with an _id field equivalent to your original document ID, and your sum.
I'll let you put it together, I'm not super familiar with node. You were close but I think moving your $match to the front and using an $unwind stage will get you where you need to be. Good luck!
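Put together, the stages described above might look roughly like this. This is a sketch, not tested against the asker's data: buildOwnerTotalPipeline is a made-up helper, and it assumes the filter goes on owner.user_id as shown in the question's schema.

```javascript
// Hypothetical helper: total sub-item price per job for one owner.
function buildOwnerTotalPipeline(userId) {
  // Assumption: the owner id lives at owner.user_id, per the schema in the question.
  const ownerPath = "all_service.all_sub_item.owner.user_id";
  return [
    { $match: { [ownerPath]: userId } }, // cut down the input documents first
    { $unwind: "$all_service" },
    { $unwind: "$all_service.all_sub_item" },
    { $match: { [ownerPath]: userId } }, // now filter the individual sub items
    {
      $group: {
        _id: "$_id", // one total per original job document
        totalAmount: { $sum: "$all_service.all_sub_item.price" },
      },
    },
  ];
}

// Mongoose does not cast values inside aggregation pipelines, so in real use
// userId would need to be an ObjectId already, e.g. Job.aggregate(buildOwnerTotalPipeline(id)).
console.log(JSON.stringify(buildOwnerTotalPipeline("some-user-id"), null, 2));
```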

Aggregate results in Mongoose

I have a database with 800+ different bars, clubs and restaurants across Australia.
I want to build a list of links for my website counting the number of different venues across different suburbs and primary categories.
Like this:
Restaurants, Bowen Hills (15)
Restaurants, Dawes Point (6)
Clubs, Sydney (138)
I could do it the hard way by first getting all venues. Then run a Venue.distinct('details.location.suburb') to get all the unique suburbs.
From here I could run subsequent queries to get the count for the number of venues in that particular suburb and category.
It will be a lot of calls though. There's got to be better way?
Can the Mongo aggregation framework help here?
It seems to be impossible to do this in a single query.
Here's the Venue model:
{
"name" : "Johnny's Bar & Grill",
"meta" : {
"category" : {
"all" : [
"restaurant",
"bar"
],
"primary" : "restaurant"
}
},
"details" : {
"location" : {
"streetNumber" : "180",
"streetName" : "abbotsford road",
"suburb" : "bowen hills",
"city" : "brisbane",
"postcode" : "4006",
"state" : "qld",
"country" : "australia"
},
"contact" : {
"phone" : [
"(07) 5555 5555"
]
}
}
}
Here's the prettified solution from BatScream that I ended up using:
Venue.aggregate([
{
$group: {
_id: {
primary: '$meta.category.primary',
suburb: '$details.location.suburb',
country: '$details.location.country',
state: '$details.location.state',
city: '$details.location.city'
},
count: {
$sum: 1
},
type: {
$first: '$meta.category.primary'
}
}
},
{
$sort: {
count: -1
}
},
{
$limit: 50
},
// Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
{
$project: {
_id: 0,
type : '$type',
location : '$_id.suburb',
count: 1
}
}
],
function(err, res){
next(err, res);
});

You can get a very useful and easily transformable output using the following aggregation operation.
Group the records based on their country, category, state, city and
suburb.
Get the count of the records in each group.
Obtain the type of the group from the first record of the group.
Project the necessary fields.
Code:
db.collection.aggregate([
{$group:{"_id":{"primary":"$meta.category.primary",
"suburb":"$details.location.suburb",
"country":"$details.location.country",
"state":"$details.location.state",
"city":"$details.location.city"},
"count":{$sum:1},
"type":{$first:"$meta.category.primary"}}},
{$sort:{"count":-1}},
{$project:{"_id":0,
"type":"$type",
"location":"$_id.suburb",
"count":1}}
])
sample o/p:
{ "count" : 1, "type" : "restaurant", "location" : "bowen hills" }
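Turning that output into the link labels from the question is then a small mapping step. A rough sketch in plain JavaScript; the sample rows reuse the numbers from the question, and the naive "add an s" pluralisation is an assumption that only works for simple category names.

```javascript
// `rows` stands in for the aggregation result.
const rows = [
  { count: 15, type: "restaurant", location: "bowen hills" },
  { count: 6, type: "restaurant", location: "dawes point" },
  { count: 138, type: "club", location: "sydney" },
];

// Capitalise the first letter of each word.
const titleCase = (s) => s.replace(/\b\w/g, (c) => c.toUpperCase());

// Assumption: category names pluralise by appending "s".
const toLabel = (r) => `${titleCase(r.type)}s, ${titleCase(r.location)} (${r.count})`;

const labels = rows.map(toLabel);
console.log(labels.join("\n"));
// Restaurants, Bowen Hills (15)
// Restaurants, Dawes Point (6)
// Clubs, Sydney (138)
```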

Compare two date fields in MongoDB

in my collection each document has 2 dates, modified and sync. I would like to find those which modified > sync, or sync does not exist.
I tried
{'modified': { $gt : 'sync' }}
but it's not showing what I expected. Any ideas?
Thanks

You can not compare a field with the value of another field with the normal query matching. However, you can do this with the aggregation framework:
db.so.aggregate( [
{ $match: …your normal other query… },
{ $match: { $expr: { $gt: [ '$modified', '$sync' ] } } } // field-to-field comparison inside $match needs $expr (MongoDB 3.6+)
] );
I put …your normal other query… in there as you can make that bit use the index. So if you want to do this for only documents where the name field is charles you can do:
db.so.ensureIndex( { name: 1 } );
db.so.aggregate( [
{ $match: { name: 'charles' } },
{ $project: {
modified: 1,
sync: 1,
name: 1,
eq: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] }
} },
{ $match: { eq: 1 } }
] );
With the input:
{ "_id" : ObjectId("520276459bf0f0f3a6e4589c"), "modified" : 73845345, "sync" : 73234 }
{ "_id" : ObjectId("5202764f9bf0f0f3a6e4589d"), "modified" : 4, "sync" : 4 }
{ "_id" : ObjectId("5202765b9bf0f0f3a6e4589e"), "modified" : 4, "sync" : 4, "name" : "charles" }
{ "_id" : ObjectId("5202765e9bf0f0f3a6e4589f"), "modified" : 4, "sync" : 45, "name" : "charles" }
{ "_id" : ObjectId("520276949bf0f0f3a6e458a1"), "modified" : 46, "sync" : 45, "name" : "charles" }
This returns:
{
"result" : [
{
"_id" : ObjectId("520276949bf0f0f3a6e458a1"),
"modified" : 46,
"sync" : 45,
"name" : "charles",
"eq" : 1
}
],
"ok" : 1
}
If you want any more fields, you need to add them in the $project.

For MongoDB 3.6 and newer:
The $expr operator allows the use of aggregation expressions within the query language, thus you can do the following:
db.test.find({ "$expr": { "$gt": ["$modified", "$sync"] } })
or using aggregation framework with $match pipeline
db.test.aggregate([
{ "$match": { "$expr": { "$gt": ["$modified", "$sync"] } } }
])
For MongoDB 3.0+:
You can also use the aggregation framework with the $redact pipeline stage, which lets you evaluate a logical condition with the $cond operator and uses the system variables $$KEEP to "keep" the document where the condition is true or $$PRUNE to "remove" the document where the condition is false.
Consider running the following aggregate operation which demonstrates the above concept:
db.test.aggregate([
{ "$redact": {
"$cond": [
{ "$gt": ["$modified", "$sync"] },
"$$KEEP",
"$$PRUNE"
]
} }
])
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query and then a subsequent $match, except that $redact uses a single pipeline stage, which is more efficient.
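The question also asks for documents where sync does not exist. One way to spell that out explicitly with the $expr form (MongoDB 3.6+) is an $or, sketched below together with a plain-JavaScript predicate that mirrors it on made-up sample docs. The names filter and needsSync are only for illustration.

```javascript
// Filter object that could be passed to find(): modified > sync, or sync is missing.
const filter = {
  $or: [
    { sync: { $exists: false } },
    { $expr: { $gt: ["$modified", "$sync"] } },
  ],
};

// Plain-JS predicate mirroring the filter, for illustration only.
const needsSync = (d) => !("sync" in d) || d.modified > d.sync;

// Made-up sample docs.
const docs = [
  { _id: 1, modified: new Date("2020-01-02"), sync: new Date("2020-01-01") }, // modified later
  { _id: 2, modified: new Date("2020-01-01"), sync: new Date("2020-01-02") }, // already synced
  { _id: 3, modified: new Date("2020-01-01") }, // never synced
];

console.log(docs.filter(needsSync).map((d) => d._id)); // [ 1, 3 ]
// In the shell this would be used as: db.test.find(filter)
```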

Simply
db.collection.find({$where:"this.modified>this.sync"})
Example
Kobkrits-MacBook-Pro-2:~ kobkrit$ mongo
MongoDB shell version: 3.2.3
connecting to: test
> db.time.insert({d1:new Date(), d2: new Date(new Date().getTime()+10000)})
WriteResult({ "nInserted" : 1 })
> db.time.find()
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1<this.d2"})
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1>this.d2"})
> db.time.find({$where:"this.d1==this.d2"})
>

Use JavaScript: iterate with forEach and compare the dates after converting them with toDateString()
db.ledgers.find({}).forEach(function(item){
if(item.fromdate.toDateString() == item.todate.toDateString())
{
printjson(item)
}
})

Right now your query is trying to return all results such that the modified field is greater than the word 'sync'. Try getting rid of the quotes around sync and see if that fixes anything. Otherwise, I did a little research and found this question. What you're trying to do just might not be possible in a single query, but you should be able to manipulate your data once you pull everything from the database.

To fix this issue without aggregation change your query to this:
{'modified': { $gt : ISODate(this.sync) }}
