Querying MongoDB collection by array intersection

What I have:
An array of strings that I wish to query with. ['a', 'b', 'c']
Data I'm querying against:
A collection of objects of type foo, all with a bar field. The bar field is an array of strings, the same type as the one I'm querying with, potentially with some of the same elements.
foo1 = { bar: ['a'] }
foo2 = { bar: ['d'] }
foo3 = { bar: ['a', 'c']}
What I need:
A query that returns all foo objects whose entire bar array is contained within the query array. In the example above, I'd want foo1 and foo3 to come back.

Using aggregation
You can use $setIsSubset in an aggregation pipeline: project a boolean flag that is true when bar is a subset of the query array, then match on that flag.
db.col.aggregate(
[
{$project : { bar : 1 , isSubset: { $setIsSubset : [ "$bar" , ['a','b','c'] ] }}},
{$match : { isSubset : true}}
]
)
collection
> db.col.find()
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767b"), "bar" : [ "a" ] }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767c"), "bar" : [ "a", "c" ] }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767d"), "bar" : [ "d" ] }
aggregate
> db.col.aggregate([{$project : { bar : 1 , isSubset: { $setIsSubset : [ "$bar" , ['a','b','c'] ] }}}, {$match : {isSubset : true}}])
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767b"), "bar" : [ "a" ], "isSubset" : true }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767c"), "bar" : [ "a", "c" ], "isSubset" : true }
>
EDIT
Using find with $expr (available since MongoDB 3.6):
db.col.find({$expr : { $setIsSubset : [ "$bar" , ['a','b','c'] ] }})
result
> db.col.find({$expr : { $setIsSubset : [ "$bar" , ['a','b','c'] ] }})
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767b"), "bar" : [ "a" ] }
{ "_id" : ObjectId("5a6420d984eeec7b0b2f767c"), "bar" : [ "a", "c" ] }
>
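For intuition, the condition that $setIsSubset evaluates can be sketched in plain Python, outside MongoDB. This is only an illustration of the set semantics on the sample documents from the question; the "name" labels and the helper bar_is_subset are hypothetical and not part of any MongoDB API.

```python
# Sample documents from the question as plain dicts (no database involved).
# The "name" field is a hypothetical label added only to identify them.
docs = [
    {"name": "foo1", "bar": ["a"]},
    {"name": "foo2", "bar": ["d"]},
    {"name": "foo3", "bar": ["a", "c"]},
]
query = ["a", "b", "c"]


def bar_is_subset(doc, allowed):
    """Return True when every element of doc['bar'] appears in allowed."""
    return set(doc["bar"]) <= set(allowed)


matches = [d["name"] for d in docs if bar_is_subset(d, query)]
print(matches)  # ['foo1', 'foo3']
```

Note that an empty bar array is a subset of any array, so such documents would match as well.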

Related

Parse and modify JSON

I have JSON with the following structure and data:
[ {
"id" : 716612,
"type" : "ad",
"stats" : [ {
"day" : "2020-06-01",
"impressions" : 1956,
"clicks" : 1,
"reach" : 1782
},
{
"day" : "2020-06-13",
"spent" : "73.32",
"reach" : 1059
} ]
}, {
"id" : 414290,
"type" : "campaign",
"stats" : [ {
"day" : "2020-05-21",
"effective_cost_per_click" : "31.200",
"effective_cost_per_mille" : "108.337"
},
{
"day" : "2020-05-17",
"impressions" : 1,
"reach" : 1,
"ctr" : "0.000",
"uniq_views_count" : 1
} ]
} ]
I need to combine id and type from the top level with each entry inside stats to get a result like this:
[ {
"id" : 716612,
"type" : "ad",
"day" : "2020-06-01",
"impressions" : 1956,
"clicks" : 1,
"reach" : 1782
},
{
"id" : 716612,
"type" : "ad",
"day" : "2020-06-13",
"spent" : "73.32",
"reach" : 1059
},
...
I tried with:
import groovy.json.JsonSlurper
def json = new JsonSlurper().parseText(text)
def result = json.collectMany { a ->
    a["stats"].collectMany { b ->
        b.collect {
            [id: a.id,
             type: a.type
            ]
        }
    }
}
But it returns only the id and type fields, without the stats. I thought I was looping through each stat and just adding the needed fields from the level above. I guess I don't understand the difference between collectMany and collect?

You were close 😁
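Your inner b.collect iterates over the entries of each stat map and replaces every entry with just id and type, so the stat data is lost. You want to collect the stat plus the id and type, so you need:
def result = json.collectMany { a ->
    a.stats.collect { b ->
        [ id: a.id, type: a.type ] + b
    }
}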
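The same flattening can be sketched in Python for comparison. This is a minimal illustration on a shortened copy of the sample data, not a translation of the Groovy API; the names records and flat are made up for the example.

```python
# Shortened copy of the sample data from the question.
records = [
    {"id": 716612, "type": "ad", "stats": [
        {"day": "2020-06-01", "impressions": 1956, "clicks": 1, "reach": 1782},
        {"day": "2020-06-13", "spent": "73.32", "reach": 1059},
    ]},
    {"id": 414290, "type": "campaign", "stats": [
        {"day": "2020-05-21", "effective_cost_per_click": "31.200"},
    ]},
]

# One output dict per stat: the parent's id and type merged with the stat.
flat = [
    {"id": rec["id"], "type": rec["type"], **stat}
    for rec in records
    for stat in rec["stats"]
]

for row in flat:
    print(row)
```

The nested comprehension plays the role of collectMany (flattening across records), while the dict merge corresponds to `[ id: a.id, type: a.type ] + b`.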

Sort JSON document by values embedded in an array of objects

I have a document in the format below. The goal is to group the document by student name and sort the scores by rank in ascending order. Once that is done, I need to iterate through the ranks (within a student) and, whenever a subsequent rank is greater than the previous one, increment the Version field. As part of a pipeline, student_name will be passed to me, so matching by student name should be enough instead of grouping.
NOTE: I tried it with Python and it works to some extent. A Python solution would also be great!
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 2,
"scores" : [
{
"value" : "50",
"rank" : 2
},
{
"value" : "70",
"rank" : 1
}
]
},
{
"student_name" : "BBB",
"Version" : 5,
"scores" : [
{
"value" : 80,
"rank" : 2
},
{
"value" : 100,
"rank" : 1
},
{
"value" : 100,
"rank" : 1
}
]
}
]
}
I tried this piece of code to sort
def version(student_name):
db.column.aggregate(
[
{"$unwind": "$students"},
{"$unwind": "$students.scores"},
{"$sort" : {"students.scores.rank" : 1}},
{"$group" : {"students.student_name}
]
)
for i in range(0,(len(students.scores)-1)):
if students.scores[i].rank < students.scores[i+1].rank:
tag.update_many(
{"$inc" : {"students.Version":1}}
)
The expected output for student AAA should be
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 3, #version incremented
"scores" : [
{
"value" : "70",
"rank" : 1
},
{
"value" : "50",
"rank" : 2
}
]
}

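I was able to sort the document. Note that the key of a $sort stage is a plain field path without a leading "$".
import pprint
# Field names here (properties/values) differ from the sample document (students/scores).
pipeline = [
    {"$unwind": "$properties"},
    {"$unwind": "$properties.values"},
    # -1 sorts in descending order; use 1 for ascending order.
    {"$sort": {"properties.values.rank": -1}},
    {"$group": {"_id": "$properties.property_name", "values": {"$push": "$properties.values"}}},
]
pprint.pprint(list(db.column.aggregate(pipeline)))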
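The version-increment step from the question can be sketched in plain Python on a single student sub-document, without touching the database. This assumes, following the loop in the question, that Version goes up by one for every adjacent pair of ranks that strictly increases after sorting; the function name bump_version is hypothetical.

```python
import copy


def bump_version(student):
    """Return a copy with scores sorted by rank and Version incremented.

    Assumption: one increment per adjacent pair whose rank strictly increases.
    """
    updated = copy.deepcopy(student)
    updated["scores"].sort(key=lambda score: score["rank"])
    ranks = [score["rank"] for score in updated["scores"]]
    increases = sum(1 for prev, cur in zip(ranks, ranks[1:]) if cur > prev)
    updated["Version"] += increases
    return updated


student_aaa = {
    "student_name": "AAA",
    "Version": 2,
    "scores": [
        {"value": "50", "rank": 2},
        {"value": "70", "rank": 1},
    ],
}

result = bump_version(student_aaa)
print(result["Version"])  # 3
print(result["scores"])   # rank 1 first, then rank 2
```

Writing the result back to MongoDB is left out here, since the exact update depends on how the student is matched inside the students array.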

Compare two Collections in MongoDB and show the differences

I'm trying to compare two collections in MongoDB. I have Collection A and Collection B and I only want to show the differences. How is this done? I thought it could be done with the aggregation framework, but I did not get the expected values. I just want to see which documents in Collection A are not the same as in Collection B.
Collection: A
{
"_id" : ObjectId("x"),
"p" : [
{
"t" : 1,
"p" : 123
},
{
"t" : 2,
"p" : 123
}
]
},
{
"_id" : ObjectId("y"),
"p" : [
{
"t" : 1,
"p" : 234
},
{
"t" : 2,
"p" : 234
}
]
}
Collection: B
{
"_id" : ObjectId("x"),
"p" : [
{
"t" : 1,
"p" : 123
},
{
"t" : 2,
"p" : 538458 // OTHER VALUE HERE
}
]
},
{
"_id" : ObjectId("y"),
"p" : [
{
"t" : 1,
"p" : 234
},
{
"t" : 2,
"p" : 234
}
]
}

You could export each collection with mongoexport, which creates a file containing all the documents. Make sure you omit the _id field (documents may be identical but have different ids):
mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
Then you can compare the two files using diff.
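If the documents share _id values across both collections, as in the sample data, the comparison can also be sketched in Python once the documents are loaded as dicts. This is a minimal illustration on in-memory lists; the function name differing_ids is hypothetical, and fetching the documents from MongoDB is left out.

```python
def differing_ids(coll_a, coll_b):
    """Return the _id of every document in coll_a that is missing from,
    or not equal to, its counterpart in coll_b."""
    by_id_b = {doc["_id"]: doc for doc in coll_b}
    return [doc["_id"] for doc in coll_a if by_id_b.get(doc["_id"]) != doc]


collection_a = [
    {"_id": "x", "p": [{"t": 1, "p": 123}, {"t": 2, "p": 123}]},
    {"_id": "y", "p": [{"t": 1, "p": 234}, {"t": 2, "p": 234}]},
]
collection_b = [
    {"_id": "x", "p": [{"t": 1, "p": 123}, {"t": 2, "p": 538458}]},
    {"_id": "y", "p": [{"t": 1, "p": 234}, {"t": 2, "p": 234}]},
]

changed = differing_ids(collection_a, collection_b)
print(changed)  # ['x']
```

Documents that exist only in the second list are not reported by this sketch; swapping the arguments covers that direction.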

How to merge multiple fields in a collection?

Example entry:
{ "_id" : "00-01#mail.ru", "pass" : 123654, "field2" : 235689, "field3" : "cccp123654", "field4" : "lhfrjy" }
Desired result:
{ "_id" : "00-01#mail.ru", "pass" : [ 123654, 235689, "cccp123654", "lhfrjy" ] }
I want to end up with only two fields (_id and pass).
I have attempted the following:
db.emails.aggregate([
{ "$project": {
"pass": { "$setUnion": [ "$field2", "$field3" ] }
}}
])
However, this results in the following error:
2018-01-22T03:01:26.074+0000 E QUERY [thread1] Error: command failed: {
"ok" : 0,
"errmsg" : "All operands of $setUnion must be arrays. One argument is of type: string",
"code" : 17043,
"codeName" : "Location17043"
} : aggregate failed :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:16:14
assert.commandWorked#src/mongo/shell/assert.js:370:5
DBCollection.prototype.aggregate#src/mongo/shell/collection.js:1319:5
#(shell):1:1
Can someone assist?

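We can convert the document to an array of key/value pairs with $objectToArray, then use $slice on the values to skip the first element (the _id) and keep up to the next 20: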
> db.io.aggregate(
[
{$addFields : {arr : {$objectToArray : "$$ROOT"}}},
{$project : { pass : {$slice : ["$arr.v", 1, 20 ] }}}
]
).pretty()
result
{
"_id" : "00-01#mail.ru",
"pass" : [
123654,
235689,
"cccp123654",
"lhfrjy"
]
}
>
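What the two stages do to a single document can be sketched in plain Python. This is only an illustration of the idea on a dict (Python 3.7+ keeps insertion order); it assumes the field is named pass without a leading space, and the variable names are made up for the example.

```python
doc = {
    "_id": "00-01#mail.ru",
    "pass": 123654,
    "field2": 235689,
    "field3": "cccp123654",
    "field4": "lhfrjy",
}

# Rough equivalent of $objectToArray followed by taking "$arr.v":
values = list(doc.values())

# Rough equivalent of {$slice: ["$arr.v", 1, 20]}: skip the first value
# (the _id) and keep at most the next 20.
merged = {"_id": doc["_id"], "pass": values[1:21]}
print(merged)
```

Because the result depends on field order, any field stored before pass would end up inside the array too.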

Mongoose mapReduce : reduce returns object or array?

I have the following collection:
/* 0 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b4f1cb3d2eacb1300002b"),
"answers" : [],
"questions" : []
}
/* 1 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6b9eb3d2eacb1300002c"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 2 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6baeb3d2eacb1300002d"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 3 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("533b828146ca43634000002d"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 4 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be327b539a4d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 5 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be5ec89d717d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
I am running the following code in order to find how many times the (questions,answers) combination appears in the collection:
o.map= function(){
emit({"questions" : this.questions, "answers" :this.answers },this.clientID)
};
o.reduce = function(answers, collection){
return collection.length;
};
logSearchDB.mapReduce(o,function (err, results) {
results.sort(function(a, b){return b.value-a.value});
for (var i = 0; i < results.length; i++) {
console.log(JSON.stringify(results[i]))
};
})
The output is:
{"_id":{"questions":[],"answers":[]},"value":"51b9c10d91d1a3a52b0000b8"}
{"_id":{"questions":["Color"],"answers":["ORANGE"]},"value":3}
{"_id":{"questions":["1","2","3"],"answers":["1","8"]},"value":2}
I expected the first row to have "value" : 1.
I guess the 'reduce' function received a single 'collection' value, "51b9c10d91d1a3a52b0000b8", instead of an array, ["51b9c10d91d1a3a52b0000b8"].
Why doesn't mapReduce collect everything into an array?

The first row holds a plain value because that key occurred only once. This is generally how mapReduce works, at least in the way it was specified in the original papers.
The reduce function is not called for a key that has only a single emitted value; that value is passed through unchanged. To work around this, use the finalize function in your mapReduce call:
var finalize = function(key,value) {
if ( typeof(value) != "number" )
value = 1;
return value;
};
db.collection.mapReduce(
mapper,
reducer,
{
"finalize": finalize,
"out": { "inline": 1 }
}
);
The finalize function runs over every output row. When the value is not a number (that is, it is still the clientID you emitted), it is set to 1, because that is how many documents are in the grouping.
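The behaviour can be imitated with a small in-memory sketch in Python. This is not how MongoDB implements mapReduce; it is a simplified model under the assumption that reduce is skipped for single-value keys and otherwise called once per key (the real server may call reduce several times for one key). All names here are hypothetical.

```python
from collections import defaultdict


def map_reduce(docs, mapper, reducer, finalize=None):
    """Tiny in-memory imitation of mapReduce (illustration only)."""
    groups = defaultdict(list)
    for doc in docs:
        key, value = mapper(doc)
        groups[key].append(value)
    results = {}
    for key, values in groups.items():
        # As in MongoDB, reduce is skipped when a key has a single value.
        value = values[0] if len(values) == 1 else reducer(key, values)
        if finalize is not None:
            value = finalize(key, value)
        results[key] = value
    return results


CLIENT_A = "51b9c10d91d1a3a52b0000b8"
CLIENT_B = "5335f9d864e2b1290c00012e"
docs = (
    [{"clientID": CLIENT_A, "questions": [], "answers": []}]
    + [{"clientID": CLIENT_A, "questions": ["1", "2", "3"], "answers": ["1", "8"]}] * 2
    + [{"clientID": CLIENT_B, "questions": ["Color"], "answers": ["ORANGE"]}] * 3
)


def mapper(doc):
    # Tuples stand in for the {questions, answers} key so it is hashable.
    return (tuple(doc["questions"]), tuple(doc["answers"])), doc["clientID"]


def reducer(key, values):
    return len(values)


def finalize(key, value):
    # Anything that is not a count is an unreduced clientID, so count it as 1.
    return value if isinstance(value, int) else 1


raw = map_reduce(docs, mapper, reducer)
fixed = map_reduce(docs, mapper, reducer, finalize)
print(raw[((), ())])    # the bare clientID string: reduce never ran
print(fixed[((), ())])  # 1
```

The other two keys come out as 2 and 3 in both runs, matching the output shown in the question.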
Your query is really better suited to the aggregation framework than to mapReduce. The aggregation framework is implemented in native code rather than running in a JavaScript interpreter, so it runs much faster than mapReduce:
db.collection.aggregate([
{ "$group": {
"_id": {
"questions": "$questions",
"answers": "$answers"
},
"count": { "$sum": 1 }
}}
])
So it is the better option to use. It was introduced to MongoDB later, so people still tend to think in terms of mapReduce, or there is legacy code from earlier versions of MongoDB. But it has been around for quite a while now.
Also see the operator reference for the aggregation framework.
