Compare two Collections in MongoDB and show the differences - node.js

I'm trying to compare two collections in MongoDB. I have Collection A and Collection B, and I only want to show the differences. How is this done? I thought it could be done with the Aggregation Framework, but I did not get the expected values. I just want to see which documents in Collection A are not the same as in Collection B.
Collection: A
{
"_id" : ObjectId("x"),
"p" : [
{
"t" : 1,
"p" : 123
},
{
"t" : 2,
"p" : 123
}
]
},
{
"_id" : ObjectId("y"),
"p" : [
{
"t" : 1,
"p" : 234
},
{
"t" : 2,
"p" : 234
}
]
}
Collection: B
{
"_id" : ObjectId("x"),
"p" : [
{
"t" : 1,
"p" : 123
},
{
"t" : 2,
"p" : 538458 // OTHER VALUE HERE
}
]
},
{
"_id" : ObjectId("y"),
"p" : [
{
"t" : 1,
"p" : 234
},
{
"t" : 2,
"p" : 234
}
]
}

You could export each collection with mongoexport, which creates a file containing all the documents. Make sure you omit the _id, though (documents may be identical but have different ids):
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
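Then you can compare the two files using diff. The export order is not guaranteed to match between the two collections, so sorting both files first (for example with sort) makes the diff easier to read.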
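If both collections are small enough to load into memory, a diff keyed on _id avoids the export step. The sketch below is a plain-Python illustration, not code from the thread. It assumes the documents have already been fetched (for example with a driver's find()) into lists of dicts, and diff_collections is a hypothetical helper name. Unlike the file-based approach, it keeps _id, because in the question both collections share the same ids.

```python
def diff_collections(coll_a, coll_b, key="_id"):
    """Return (doc_in_a, doc_in_b_or_None) pairs for every document in coll_a
    that is missing from coll_b or differs from coll_b's document with the same key.

    Hypothetical helper: documents are plain dicts already loaded into memory.
    Python dict equality ignores field order, whereas MongoDB compares
    embedded documents field by field in order.
    """
    b_by_key = {doc[key]: doc for doc in coll_b}
    differences = []
    for doc in coll_a:
        other = b_by_key.get(doc[key])  # None if the key is absent from coll_b
        if other != doc:
            differences.append((doc, other))
    return differences


# Sample data from the question, with the ObjectIds replaced by plain strings.
collection_a = [
    {"_id": "x", "p": [{"t": 1, "p": 123}, {"t": 2, "p": 123}]},
    {"_id": "y", "p": [{"t": 1, "p": 234}, {"t": 2, "p": 234}]},
]
collection_b = [
    {"_id": "x", "p": [{"t": 1, "p": 123}, {"t": 2, "p": 538458}]},
    {"_id": "y", "p": [{"t": 1, "p": 234}, {"t": 2, "p": 234}]},
]

for doc_a, doc_b in diff_collections(collection_a, collection_b):
    print("A:", doc_a)
    print("B:", doc_b)
```

Only the document with _id "x" is reported, because its second "p" value differs; a document present in A but absent from B is reported with None as its B side.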

Related

How to search value exist in array of objects of array

I have a dataset like this:
{
"_id" : ObjectId("5ede1b6c317aca326c2f18d7"),
"createdate" : ISODate("2020-06-11T18:30:00.000Z"),
"userHolder" : [
{
"time" : "12:00",
"user" : [
"5ede1ff42b3e633edc0ba10e"
]
},
{
"time" : "16:30",
"user" : []
}
],
},
{
"_id" : ObjectId("5ede1b6c317aca326c2f18d8"),
"createdate" : ISODate("2020-06-12T18:30:00.000Z"),
"userHolder" : [
{
"time" : "12:30",
"user" : [
"5ede1ff42b3e633edc0ba10f"
]
},
{
"time" : "13:00",
"user" : [
"5ede1ff42b3e633edc0ba10e"
]
},
{
"time" : "12:00",
"user" : [
"5ede1ff42b3e633edc0ba10f"
]
},
{
"time" : "16:30",
"user" : []
}
],
}
I split the day into half-hour slots, i.e. a full day has 48 entries in the userHolder array (12:30, 13:00, 13:30, and so on). If a user does not have an entry, that slot is not created.
So if I want to search the whole collection for the id 5ede1ff42b3e633edc0ba10e, how do I write the query?
I tried the $all operator, but it does not work on this nested structure.
There is $elemMatch, but the query would be too large because I would have to write 48 conditions, one per time slot. The expected result is that the query returns the _id of each matching entry, so it is clear which entries contain this id. I want the data, not a count.
Any help is really appreciated.
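No answer is quoted here, but the matching being asked for can be stated precisely: return the _id of every document in which any userHolder slot lists the id. The plain-Python sketch below spells that out on the sample data (createdate omitted, ObjectIds as strings; find_ids_with_user is a hypothetical name). In MongoDB itself, a dot-notation query such as db.collection.find({"userHolder.user": "5ede1ff42b3e633edc0ba10e"}, {_id: 1}) traverses the nested arrays without listing the 48 slots; treat that as a pointer to verify against your data rather than a tested answer.

```python
def find_ids_with_user(documents, user_id):
    """Return the _id of every document where any userHolder slot lists user_id.

    Intended to mirror what a MongoDB equality query on the dotted path
    "userHolder.user" matches: the value may sit in any slot's "user" array.
    """
    return [
        doc["_id"]
        for doc in documents
        if any(user_id in slot.get("user", []) for slot in doc.get("userHolder", []))
    ]


# Sample data from the question (ObjectIds as strings, createdate omitted).
documents = [
    {
        "_id": "5ede1b6c317aca326c2f18d7",
        "userHolder": [
            {"time": "12:00", "user": ["5ede1ff42b3e633edc0ba10e"]},
            {"time": "16:30", "user": []},
        ],
    },
    {
        "_id": "5ede1b6c317aca326c2f18d8",
        "userHolder": [
            {"time": "12:30", "user": ["5ede1ff42b3e633edc0ba10f"]},
            {"time": "13:00", "user": ["5ede1ff42b3e633edc0ba10e"]},
            {"time": "12:00", "user": ["5ede1ff42b3e633edc0ba10f"]},
            {"time": "16:30", "user": []},
        ],
    },
]

print(find_ids_with_user(documents, "5ede1ff42b3e633edc0ba10e"))
```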

Sort JSON document by values embedded in an array of objects

I have a document in the format below. The goal is to group the document by student name and sort it by rank in ascending order. Once that is done, iterate through the ranks (within a student), and if a subsequent rank is greater than the previous one, the Version field needs to be incremented. As part of a pipeline, student_name will be passed to me, so matching by student name should be enough instead of grouping.
NOTE: I tried it with Python and it works to some extent. A Python solution would also be great!
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 2,
"scores" : [
{
"value" : "50",
"rank" : 2
},
{
"value" : "70",
"rank" : 1
}
]
},
{
"student_name" : "BBB",
"Version" : 5,
"scores" : [
{
"value" : 80,
"rank" : 2
},
{
"value" : 100,
"rank" : 1
},
{
"value" : 100,
"rank" : 1
}
]
}
]
}
I tried this piece of code to sort
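def version(db, student_name):
    # db is assumed to be a pymongo Database; the collection is named "column".
    # Unwind students and their scores, keep only the requested student,
    # and sort that student's scores by rank in ascending order.
    pipeline = [
        {"$unwind": "$students"},
        {"$match": {"students.student_name": student_name}},
        {"$unwind": "$students.scores"},
        {"$sort": {"students.scores.rank": 1}},
        {"$group": {
            "_id": {"doc_id": "$_id", "student_name": "$students.student_name"},
            "scores": {"$push": "$students.scores"},
        }},
    ]
    for group in db.column.aggregate(pipeline):
        # Re-sort in Python so the count does not depend on $push ordering.
        scores = sorted(group["scores"], key=lambda s: s["rank"])
        # +1 for every rank that is greater than the one before it.
        increases = sum(1 for prev, cur in zip(scores, scores[1:]) if cur["rank"] > prev["rank"])
        # The positional $ targets the students element matched in the filter.
        db.column.update_one(
            {"_id": group["_id"]["doc_id"], "students.student_name": student_name},
            {"$inc": {"students.$.Version": increases},
             "$set": {"students.$.scores": scores}},
        )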
The expected output for student AAA should be
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 3, #version incremented
"scores" : [
{
"value" : "70",
"rank" : 1
},
{
"value" : "50",
"rank" : 2
}
]
}
]
}
I was able to sort the document.
import pprint

pipeline = [
    {"$unwind": "$properties"},
    {"$unwind": "$properties.values"},
    # Sort keys are plain field paths (no leading "$"); 1 means ascending rank.
    {"$sort": {"properties.values.rank": 1}},
    {"$group": {"_id": "$properties.property_name", "values": {"$push": "$properties.values"}}},
]
# db is assumed to be a pymongo Database.
pprint.pprint(list(db.column.aggregate(pipeline)))
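The sort-and-bump rule itself can be checked without a database. This plain-Python sketch (bump_version is a hypothetical helper working on a document already loaded into memory) sorts the requested student's scores by ascending rank and adds one to Version for each strictly increasing rank step. Writing the result back with a driver's update call is left out.

```python
import copy


def bump_version(document, student_name):
    """Return a copy of document in which the named student's scores are sorted
    by rank (ascending) and Version is incremented once for every rank that is
    greater than the rank before it. Other students are left untouched.
    """
    updated = copy.deepcopy(document)
    for student in updated["students"]:
        if student["student_name"] != student_name:
            continue
        scores = sorted(student["scores"], key=lambda s: s["rank"])
        increases = sum(
            1 for prev, cur in zip(scores, scores[1:]) if cur["rank"] > prev["rank"]
        )
        student["scores"] = scores
        student["Version"] += increases
    return updated


# Sample document from the question (ObjectId shown as a string).
doc = {
    "_id": "5d389c7907bf860f5cd11220",
    "class": "I",
    "students": [
        {"student_name": "AAA", "Version": 2,
         "scores": [{"value": "50", "rank": 2}, {"value": "70", "rank": 1}]},
        {"student_name": "BBB", "Version": 5,
         "scores": [{"value": 80, "rank": 2}, {"value": 100, "rank": 1},
                    {"value": 100, "rank": 1}]},
    ],
}

aaa = next(s for s in bump_version(doc, "AAA")["students"] if s["student_name"] == "AAA")
print(aaa["Version"], aaa["scores"])
```

For AAA this reproduces the expected output above: the scores come back as rank 1 then rank 2, and Version goes from 2 to 3. Equal ranks (as in BBB) do not count as an increase.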

Querying MongoDB collection by array intersection

What I have:
An array of strings that I wish to query with. ['a', 'b', 'c']
Data I'm querying against:
A collection of objects of type foo, all with a bar field. The bar field is an array of strings, the same type as the one I'm querying with, potentially with some of the same elements.
foo1 = { bar: ['a'] }
foo2 = { bar: ['d'] }
foo3 = { bar: ['a', 'c']}
What I need:
A query that returns all foo objects whose entire bar array is contained within the query array. In the example above, I'd want foo1 and foo3 to come back.
using aggregation
You can use $setIsSubset in an aggregation pipeline:
db.col.aggregate(
[
{$project : { bar : 1 , isSubset: { $setIsSubset : [ "$bar" , ['a','b','c'] ] }}},
{$match : { isSubset : true}}
]
)
collection
> db.col.find()
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767b"), "bar" : [ "a" ] }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767c"), "bar" : [ "a", "c" ] }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767d"), "bar" : [ "d" ] }
aggregate
> db.col.aggregate([{$project : { bar : 1 , isSubset: { $setIsSubset : [ "$bar" , ['a','b','c'] ] }}}, {$match : {isSubset : true}}])
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767b"), "bar" : [ "a" ], "isSubset" : true }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767c"), "bar" : [ "a", "c" ], "isSubset" : true }
>
EDIT
using find with $expr
db.col.find({$expr : { $setIsSubset : [ "$bar" , ['a','b','c'] ] }})
result
> db.col.find({$expr : { $setIsSubset : [ "$bar" , ['a','b','c'] ] }})
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767b"), "bar" : [ "a" ] }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767c"), "bar" : [ "a", "c" ] }
>
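To make the semantics of $setIsSubset concrete, here is a plain-Python equivalent of the filter on the question's sample data (the name field and the subset_matches helper are illustrative additions, not from the thread). Like $setIsSubset, it ignores duplicates and element order, and a document with an empty bar array would also match, which is worth checking against your data.

```python
def subset_matches(docs, allowed):
    """Return the docs whose whole bar array is contained in allowed.

    Same test as {$setIsSubset: ["$bar", allowed]}: duplicates and order are
    ignored, and an empty bar counts as a subset of anything.
    """
    allowed_set = set(allowed)
    return [doc for doc in docs if set(doc["bar"]) <= allowed_set]


foos = [
    {"name": "foo1", "bar": ["a"]},
    {"name": "foo2", "bar": ["d"]},
    {"name": "foo3", "bar": ["a", "c"]},
]

print([doc["name"] for doc in subset_matches(foos, ["a", "b", "c"])])
```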

How to merge multiple fields in a collection?

Example entry:
{ "_id" : "00-01#mail.ru", " pass" : 123654, "field2" : 235689, "field3" : "cccp123654", "field4" : "lhfrjy" }
Desired result:
{ "_id" : "00-01#mail.ru", "pass" : [ 123654, 235689, "cccp123654", "lhfrjy" ] }
I want to have two final fields (_id and pass).
I have attempted the following:
db.emails.aggregate([
{ "$project": {
"pass": { "$setUnion": [ "$field2", "$field3" ] }
}}
])
However, this results in the following error:
2018-01-22T03:01:26.074+0000 E QUERY [thread1] Error: command failed: {
"ok" : 0,
"errmsg" : "All operands of $setUnion must be arrays. One argument is of type: string",
"code" : 17043,
"codeName" : "Location17043"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:16:14
assert.commandWorked@src/mongo/shell/assert.js:370:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1319:5
@(shell):1:1
Can someone assist?
We can convert the document to an array of key/value pairs with $objectToArray, then $slice the values starting after the first element (the _id):
> db.io.aggregate(
[
{$addFields : {arr : {$objectToArray : "$$ROOT"}}},
{$project : { pass : {$slice : ["$arr.v", 1, 20 ] }}}
]
).pretty()
result
{
"_id" : "00-01#mail.ru",
"pass" : [
123654,
235689,
"cccp123654",
"lhfrjy"
]
}
>
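For checking the transformation on a sample without running the pipeline, this plain-Python sketch does the same thing to a single document: keep _id and collect every other value in field order. The helper name is hypothetical. It filters by field name rather than by position, and unlike the pipeline's $slice with a length of 20 it does not cap the number of values.

```python
def merge_into_pass(document):
    """Return {_id, pass} where pass lists every non-_id value in field order.

    Plain-Python analogue of the $objectToArray + $slice pipeline; it relies on
    dicts preserving insertion order (Python 3.7+), just as the pipeline relies
    on _id being the first field of the document.
    """
    values = [value for field, value in document.items() if field != "_id"]
    return {"_id": document["_id"], "pass": values}


# Example entry from the question (note the leading space in " pass").
entry = {"_id": "00-01#mail.ru", " pass": 123654, "field2": 235689,
         "field3": "cccp123654", "field4": "lhfrjy"}

print(merge_into_pass(entry))
```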

All fields search [duplicate]

This question already has answers here:
MongoDB Query Help - query on values of any key in a sub-object
(3 answers)
Closed 6 years ago.
This is my dataset, which is part of a larger JSON document. I want to write a query that matches against all fields inside value_chain.
Dataset:
"value_chain" : {
"category" : "Source, Make & Deliver",
"hpe_level0" : "gift Chain Planning",
"hpe_level1" : "nodemand to Plan",
"hpe_level2" : "nodemand Planning",
"hpe_level3" : "nodemand Sensing"
},
Example:
If someone searches for "gift", the query should scan through all fields, and if there is a match, return the document.
This is something I tried, but it didn't work:
db.sw_api.find({
value_chain: { $elemMatch: { "Source, Make & Deliver" } }
})
It sounds like you need to create a text index on all the text fields first, since $text performs a text search on the content of the fields indexed with a text index:
db.sw_api.createIndex({
"value_chain.category" : "text",
"value_chain.hpe_level0" : "text",
"value_chain.hpe_level1" : "text",
"value_chain.hpe_level2" : "text",
"value_chain.hpe_level3" : "text"
}, { "name": "value_chain_text_idx"});
The index you create is a compound index over 5 fields, and MongoDB automatically generates an index name (which becomes part of the index namespace) unless you override it. If you don't specify the index name, as in
db.sw_api.createIndex({
"value_chain.category" : "text",
"value_chain.hpe_level0" : "text",
"value_chain.hpe_level1" : "text",
"value_chain.hpe_level2" : "text",
"value_chain.hpe_level3" : "text"
});
there is a potential error "ns name is too long (127 byte max)", because the generated index namespace would look like this:
"your_db_name.sw_api.$value_chain.category_text_value_chain.hpe_level0_text_value_chain.hpe_level1_text_value_chain.hpe_level2_text_value_chain.hpe_level3_text"
Hence the need to give the index an explicit name that is shorter than the one MongoDB would generate.
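The length problem can be checked by rebuilding the default name. The sketch below assumes MongoDB's default naming convention (each field and its index type or direction joined by underscores, as the namespace string above shows); default_index_name is a hypothetical helper and the database name is a placeholder. The 127-byte namespace limit belongs to older MongoDB releases (4.2 removed the index name length limit), so check it against your server version.

```python
def default_index_name(keys):
    """Build the name MongoDB generates when none is given: each field and its
    index type/direction joined by underscores, e.g. {"a": 1, "b": "text"} -> "a_1_b_text".
    """
    return "_".join(f"{field}_{kind}" for field, kind in keys.items())


keys = {
    "value_chain.category": "text",
    "value_chain.hpe_level0": "text",
    "value_chain.hpe_level1": "text",
    "value_chain.hpe_level2": "text",
    "value_chain.hpe_level3": "text",
}

name = default_index_name(keys)
namespace = f"your_db_name.sw_api.${name}"  # database name is a placeholder
print(len(name), len(namespace))  # 137 158
```

The generated name alone is already longer than 127 bytes, while a short explicit name such as "value_chain_text_idx" keeps the namespace well under the limit.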
Once the index is created, a db.sw_api.getIndexes() query will show you the indexes present:
/* 1 */
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "dbname.sw_api"
},
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "value_chain_text_idx",
"ns" : "dbname.sw_api",
"weights" : {
"value_chain.category" : 1,
"value_chain.hpe_level0" : 1,
"value_chain.hpe_level1" : 1,
"value_chain.hpe_level2" : 1,
"value_chain.hpe_level3" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 3
}
]
Once you create the index, you can then do a $text search:
db.sw_api.find({ "$text": { "$search": "gift" } })
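To illustrate what this $text query matches, here is a rough plain-Python stand-in (the text_search helper and its default field list are hypothetical). It looks for the search term as a whole word, case-insensitively, in any of the indexed value_chain fields. Real $text search also applies language-specific stemming and stop words and ranks results, so this is only an approximation; one consequence that carries over is that a word-based search for "demand" does not match "nodemand".

```python
import re


def text_search(docs, term, fields=("category", "hpe_level0", "hpe_level1",
                                    "hpe_level2", "hpe_level3")):
    """Return docs where term appears as a whole word (case-insensitive) in any
    value_chain field. A rough stand-in for $text: no stemming, stop words, or scoring.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [
        doc for doc in docs
        if any(pattern.search(doc["value_chain"].get(f, "")) for f in fields)
    ]


docs = [{
    "_id": 1,
    "value_chain": {
        "category": "Source, Make & Deliver",
        "hpe_level0": "gift Chain Planning",
        "hpe_level1": "nodemand to Plan",
        "hpe_level2": "nodemand Planning",
        "hpe_level3": "nodemand Sensing",
    },
}]

print(len(text_search(docs, "gift")), len(text_search(docs, "demand")))
```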
