How to aggregate fields from embedded documents in Mongoose - node.js

Coverage Model.
var CoverageSchema = new Schema({
module : String,
source: String,
namespaces: [{
name: String,
types: [{
name: String,
functions: [{
name: String,
coveredBlocks: Number,
notCoveredBlocks: Number
}]
}]
}]
});
I need coveredBlocks aggregations at every level:
*Module: {moduleBlocksCovered}, // SUM(blocksCovered) GROUP BY module, source
**Namespaces: [{nsBlocksCovered}] // SUM(blocksCovered) GROUP BY module, source, ns
****Types: [{typeBlocksCovered}] // SUM(blocksCovered) GROUP BY module, source, ns, type
How do I get this result with Coverage.aggregate in Mongoose?
{
module: 'module1',
source: 'source1',
coveredBlocks: 7, // SUM of all functions in module
namespaces: [
{
name: 'ns1',
nsBlocksCovered: 7, // SUM of all functions in namespace
types: [
{
name: 'type1',
typeBlocksCovered: 7, // SUM(3, 4) of all functions in type
functions: [
{name: 'func1', blocksCovered: 3},
{name: 'func2', blocksCovered: 4}]
}
]
}
]
}

My idea is to deconstruct everything using $unwind, then reconstruct the document using $group and $project.
Aggregate flow:
// deconstruct functions
unwind(namespaces)
unwind(namespaces.types)
unwind(namespaces.types.functions)
// calculate typeBlocksCovered
group by module & source, ns, type to sum the functions' blocksCovered -> typeBlocksCovered, and push functions back into types
project to reshape the fields for the next group
// calculate nsBlocksCovered
group by module & source, ns to sum typeBlocksCovered -> nsBlocksCovered, and push types back into ns
project to reshape the fields for the next group
// calculate coveredBlocks
group by module & source to sum nsBlocksCovered -> coveredBlocks
project to reshape the fields to match your Mongoose docs
Here is my sample query in mongo shell syntax, which seems to work; I am guessing your collection is named "Coverage".
db.Coverage.aggregate([
{"$unwind":("$namespaces")}
,{"$unwind":("$namespaces.types")}
,{"$unwind":("$namespaces.types.functions")}
,{"$group": {
_id: {module:"$module", source:"$source", nsName: "$namespaces.name", typeName : "$namespaces.types.name"}
, typeBlocksCovered : { $sum : "$namespaces.types.functions.blocksCovered"}
, functions:{ "$push": "$namespaces.types.functions"}}}
,{"$project" :{module:"$_id.module", source:"$_id.source"
,namespaces:{
name:"$_id.nsName"
,types : { name: "$_id.typeName",typeBlocksCovered : "$typeBlocksCovered" ,functions: "$functions"}
}
,_id:0}}
,{"$group": {
_id: {module:"$module", source:"$source", nsName: "$namespaces.name"}
, nsBlocksCovered : { $sum : "$namespaces.types.typeBlocksCovered"}
, types:{ "$push": "$namespaces.types"}}}
,{"$project" :{module:"$_id.module", source:"$_id.source"
,namespaces:{
name:"$_id.nsName"
,nsBlocksCovered:"$nsBlocksCovered"
,types : "$types"
}
,_id:0}}
,{"$group": {
_id: {module:"$module", source:"$source"}
, coveredBlocks : { $sum : "$namespaces.nsBlocksCovered"}
, namespaces:{ "$push": "$namespaces"}}}
,{"$project" :{module:"$_id.module", source:"$_id.source", coveredBlocks : "$coveredBlocks", namespaces: "$namespaces",_id:0}}
])
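To check what the pipeline should return, the same three-level rollup can be sketched in plain JavaScript for a single document. This is only an in-memory illustration, not part of the original answer: the function name rollupCoverage and its field parameter are hypothetical. Note that the schema declares the counter as coveredBlocks while the sample output and the pipeline read blocksCovered, so the field name is passed in rather than assumed.

```javascript
// In-memory equivalent of the unwind/group/project rollup for ONE document.
// `field` is the per-function counter name (assumption: 'blocksCovered',
// as in the sample output; pass 'coveredBlocks' to follow the schema).
function rollupCoverage(doc, field = 'blocksCovered') {
  const namespaces = doc.namespaces.map((ns) => {
    const types = ns.types.map((type) => ({
      name: type.name,
      // sum over the functions of one type
      typeBlocksCovered: type.functions.reduce(
        (sum, fn) => sum + (fn[field] || 0), 0),
      functions: type.functions,
    }));
    return {
      name: ns.name,
      // sum over the types of one namespace
      nsBlocksCovered: types.reduce((sum, t) => sum + t.typeBlocksCovered, 0),
      types,
    };
  });
  return {
    module: doc.module,
    source: doc.source,
    // sum over the namespaces of the module
    coveredBlocks: namespaces.reduce((sum, ns) => sum + ns.nsBlocksCovered, 0),
    namespaces,
  };
}

const sample = {
  module: 'module1',
  source: 'source1',
  namespaces: [{
    name: 'ns1',
    types: [{
      name: 'type1',
      functions: [
        { name: 'func1', blocksCovered: 3 },
        { name: 'func2', blocksCovered: 4 },
      ],
    }],
  }],
};

const result = rollupCoverage(sample);
console.log(JSON.stringify(result, null, 2));
```

Unlike the pipeline, which groups by module and source across all documents in the collection, this sketch only rolls up the namespaces inside one document.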

Related

MongoDB aggregation $group stage by already created values / variable from outside

Imagine I have an array of objects, available before the aggregate query:
const groupBy = [
{
realm: 1,
latest_timestamp: 1318874398, //Date.now() values, usually different to each other
item_id: 1234, //always the same
},
{
realm: 2,
latest_timestamp: 1312467986, //actually it's $max timestamp field from the collection
item_id: 1234,
},
{
realm: ..., //there are many of them
latest_timestamp: ...,
item_id: 1234,
},
{
realm: 10,
latest_timestamp: 1318874398, //but sometimes they can be the same
item_id: 1234,
},
]
And collection (example set available on MongoPlayground) with the following schema:
{
realm: Number,
timestamp: Number,
item_id: Number,
field: Number, //any other useless fields in this case
}
My problem is: how do I $group the values from the collection via the aggregation framework, using the already available set of data (from groupBy)?
What has been tried already:
Okay, let's skip the bad ideas, like:
for (const element of groupBy) {
//array of `find` queries
}
My current working aggregation query is something like this:
//first stage
{
$match: {
item_id: 1234,
realm: { $in: [1, 2, 3, 4, /* ... */ 10] },
},
},
{
$group: {
_id: {
realm: '$realm',
},
latest_timestamp: {
$max: '$timestamp',
},
data: {
$push: '$$ROOT',
},
},
},
{
$unwind: '$data',
},
{
$addFields: {
'data.latest_timestamp': {
$cond: {
if: {
$eq: ['$data.timestamp', '$latest_timestamp'],
},
then: '$latest_timestamp',
else: '$$REMOVE',
},
},
},
},
{
$replaceRoot: {
newRoot: '$data',
},
},
//At last, after these stages I can do the useful work
but I find it rather clumsy, and I have heard that using .mapReduce could solve my problem a bit faster than this query. (But the official docs don't sound promising about it.) Is that true?
As it stands, I am using 4 or 5 stages before I start working with documents that are useful to me.
Recent update:
I have looked at the $facet stage and found it interesting for this particular case. It will probably help me out.
For what it's worth:
After the necessary stages return their documents, I build a representative cluster chart, which you may also know as a heatmap.
After that I iterate over each document (or array of objects) one by one to find its correct x and y coordinates, which should look like:
[
{
x: x (number, actual $price),
y: y (number, actual $realm),
value: price * quantity,
quantity: sum_of_quantity_on_price_level
}
]
As of now, it's old, awful code with for loops nested inside each other, but in the future I will use the $facet => $bucket operators for that kind of job.
So, I have found an answer to my question in another, but related, way.
I was thinking about using the $facet operator and, to be honest, it's still an option, but using it as shown below is bad practice.
//building $facet query before aggregation
const ObjectQuery = {}
for (const realm of realms) {
Object.assign(ObjectQuery, { [realm.name]: [ /* ... */ ] })
}
//mongoose query here
aggregation([{
$facet: ObjectQuery
},
...
])
So, I have chosen a $project stage with the $switch operator to filter results, much as $group does.
MapReduce could also solve this problem, but the official MongoDB docs recommend avoiding it and using aggregation with the $group and $merge operators instead.
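For reference, one way to reuse the precomputed groupBy array directly is to turn it into a single $match stage. This is not the $project/$switch approach described above, only a sketch of an alternative; it assumes the array is small enough to inline into the query, and the helper name buildLatestMatchStage is hypothetical.

```javascript
// Build a $match stage that keeps only the documents identified by the
// precomputed (realm, latest_timestamp, item_id) triples.
// Assumption: `groupBy` has the shape shown in the question.
function buildLatestMatchStage(groupBy) {
  if (groupBy.length === 0) {
    // MongoDB rejects an empty $or array, so fail early instead.
    throw new Error('groupBy must contain at least one entry');
  }
  return {
    $match: {
      $or: groupBy.map(({ realm, latest_timestamp, item_id }) => ({
        item_id,
        realm,
        timestamp: latest_timestamp, // collection field is `timestamp`
      })),
    },
  };
}

const groupBy = [
  { realm: 1, latest_timestamp: 1318874398, item_id: 1234 },
  { realm: 2, latest_timestamp: 1312467986, item_id: 1234 },
];

const stage = buildLatestMatchStage(groupBy);
console.log(JSON.stringify(stage, null, 2));
```

The resulting stage could be placed first in the pipeline, so later stages only see the latest document (or documents) per realm.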

MongoDB merge two collections with unmatched documents

I am trying to compare two collections and find the documents that differ.
Below are the samples. MongoDB version: 4.0, ORM: Mongoose
**col1: Has one new document**
{ "id" : 200001, "mobileNo" : #######001 }
{ "id" : 200002, "mobileNo" : #######002 } //mobileNo may not be unique.
{ "id" : 200003, "mobileNo" : #######002 }
{ "id" : 200004, "mobileNo" : #######004 }
**col2:**
{ "id" : 200001, "mobileNo" : #######001 }
{ "id" : 200002, "mobileNo" : #######002 }
{ "id" : 200003, "mobileNo" : #######003 }
Now I want to insert the document { "id" : 200004, "mobileNo" : #######004 } from col1 into col2,
i.e. the documents which don't match.
This is what I've tried so far :
const col1= await Col1.find({}, { mobileNo: 1,id: 1, _id: 0 })
col1.forEach(async function (col1docs) {
let col2doc = await Col2.find({ mobileNo: { $ne: col1docs.mobileNo},
id:{$ne:col1docs.id} }, { mobileNo: 1, _id: 0, id: 1 })
if (!(col2doc)) {
Col2.insertMany(col1docs);
}
});
I have also tried $eq instead of $ne, but I neither get the unmatched documents nor do they get inserted. Any suggestions? The combination of id + mobileNo is unique.
Instead of doing two .find() calls plus iteration and then a third call to write the data, I would try this query:
db.col1.aggregate([
{
$lookup: {
from: "col2",
let: { id: "$id", mobileNo: "$mobileNo" },
pipeline: [
{
$match: { $expr: { $and: [ { $eq: [ "$id", "$$id" ] }, { $eq: [ "$mobileNo", "$$mobileNo" ] } ] } }
},
{ $project: { _id: 1 } } // limiting to `_id` as we don't need entire doc of `col2` - just need to see whether a ref exists or not
],
as: "data"
}
},
{ $match: { data: [] } // Filtering leaves docs in `col1` which has no match in `col2`
},
{ $project: { data: 0, _id: 0 } }
])
Test : mongoplayground
Details: The query above takes advantage of specifying conditions in $lookup to find docs from col1 that have a counterpart in col2. $lookup runs for each document of col1: if the combination of id & mobileNo from the current col1 document has a match in col2, the _id of that col2 doc is pushed into the data array. What we keep at the end are the col1 docs with data: [], meaning no matching docs were found for them. Now you can write all the returned docs to col2 using .insertMany(). On MongoDB 4.2 or later you can do the entire thing with $merge, without needing the second write call (.insertMany()).
For your scenario, on MongoDB 4.2 or later something like this will merge the docs into the second collection:
{ $merge: 'col2' } // Has to be final stage in aggregation
Note: If this has to be done periodically, try to minimize the data you operate on, however you implement it. For example, maintain a time field and filter docs on it first, or use _id to record which docs were processed in the last run and where the next run should start; either way greatly reduces the data to be worked on. Also, don't forget to maintain indexes.
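The matching rule used by the $lookup stage (a col1 doc is "unmatched" when no col2 doc has the same id and the same mobileNo) can be illustrated in memory. This is a sketch for reasoning about the expected result, not a replacement for the aggregation; the function name findUnmatched is hypothetical, and the mobile numbers are placeholder strings because the real values are masked in the question.

```javascript
// Return the docs of col1Docs that have no counterpart in col2Docs
// with the same (id, mobileNo) pair.
function findUnmatched(col1Docs, col2Docs) {
  // Composite key; JSON.stringify keeps 1 and "1" distinct.
  const key = (doc) => JSON.stringify([doc.id, doc.mobileNo]);
  const seen = new Set(col2Docs.map(key));
  return col1Docs.filter((doc) => !seen.has(key(doc)));
}

// Placeholder data mirroring the shape of the question's samples.
const col1 = [
  { id: 200001, mobileNo: 'm001' },
  { id: 200002, mobileNo: 'm002' },
  { id: 200003, mobileNo: 'm002' },
  { id: 200004, mobileNo: 'm004' },
];
const col2 = [
  { id: 200001, mobileNo: 'm001' },
  { id: 200002, mobileNo: 'm002' },
  { id: 200003, mobileNo: 'm003' },
];

const unmatched = findUnmatched(col1, col2);
console.log(unmatched.map((doc) => doc.id));
```

With data shaped like the samples, id 200003 is also reported, because its mobileNo differs between the two collections and the comparison is on the combined key.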

Using $concat with $project is giving error : 'MongoError: $concat only supports strings, not double'?

I have a mongoose model in which some fields are like :
var AssociateSchema = new Schema({
personalInformation: {
familyName: { type: String },
givenName: { type: String }
}
})
I want to perform a '$regex' on the concatenation of familyName and givenName (something like familyName + " " + givenName). For this purpose I'm using the aggregation framework with $concat inside $project to produce a 'fullName' field, and then '$regex' inside $match to search on that field. The Mongoose code for my query is:
Associate.aggregate([
{ $project: {fullName: { $concat: [
'personalInformation.givenName','personalInformation.familyName']}}},
{ $match: { fullName: { 'active': true, $regex: param, $options: 'i' } }}
])
But it's giving me the error:
MongoError: $concat only supports strings, not double
on the first stage of my aggregate pipeline, i.e. the $project stage.
Can anyone point out what I'm doing wrong?
I also got this error and then discovered that one of the documents in the collection was indeed to blame. The way I fished it out was by filtering by field type, as explained in the docs:
db.addressBook.find( { "zipCode" : { $type : "double" } } )
I found the field had the value NaN, which to my eyes wouldn't be a number, but mongodb interprets it as such.
Looking at your code, I'm not sure why $concat isn't working for you unless you've had some integers sneak into some of your document fields. Have you tried having a $-sign in front of your concatenated values? as in, '$personalInformation.givenName'? Are you sure every single familyName and givenName is a string, not a double, in your collection? All it takes is one double for your $concat to fold.
In any case, I had a similar type mismatch problem with actual doubles. $concat indeed supports only strings, and usually, all you'd do is cast any non-strings to strings.. but alas, at the time of this writing MongoDB 3.6.2 does not yet support integer/double => string casting, only date => string casting. Sad face.
That said, try adding this projection hack at the top of your query. This worked for me as a typecast. Just make sure you provide a long enough byte length (128-byte name is pretty long so you should be okay).
{
$project: {
castedGivenName: {
$substrBytes: [ '$personalInformation.givenName', 0, 128 ]
},
castedFamilyName: {
$substrBytes: [ '$personalInformation.familyName', 0, 128 ]
}
}
},
{
$project: {
fullName: {
$concat: [
'$castedGivenName',
'$castedFamilyName'
]
}
}
},
{
$match: { fullName: { 'active': true, $regex: param, $options: 'i' } }
}
I managed to make it work by using the $substr operator, so the $project stage of my aggregate pipeline is now:
{
$project: {
fullName: {
$concat: [
{ $substr: ['$personalInformation.givenName', 0, -1] }, ' ', { $substr: ['$personalInformation.familyName', 0, -1] }
]
}
}
}
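On MongoDB 4.0 and later, the $toString operator offers a more direct cast than the $substr workaround. The following is a sketch of how the pipeline could be assembled under that assumption; the helper names buildFullNamePipeline and escapeRegex are hypothetical, and escaping the search text is an added precaution rather than something the original posts do.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an aggregation pipeline that adds `fullName` and filters on it.
// Assumes MongoDB >= 4.0 ($toString) and the field paths from the question.
function buildFullNamePipeline(param) {
  return [
    {
      // $addFields keeps the other fields of the document available
      $addFields: {
        fullName: {
          $concat: [
            { $toString: '$personalInformation.givenName' },
            ' ',
            { $toString: '$personalInformation.familyName' },
          ],
        },
      },
    },
    { $match: { fullName: { $regex: escapeRegex(param), $options: 'i' } } },
  ];
}

const pipeline = buildFullNamePipeline('john s.');
console.log(JSON.stringify(pipeline, null, 2));
```

The array could then be passed to Associate.aggregate(pipeline). Note the leading $ on both field paths; without it, the operators receive the literal strings instead of the field values.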

find records where field type is string

I have these records:
{id : 1 , price : 5}
{id : 2 , price : "6"}
{id : 3 , price : 13}
{id : 4 , price : "75"}
I want to build a query that returns only the records whose price has type "string",
so it will return:
{id : 2 , price : "6"}
{id : 4 , price : "75"}
You can use the $type query operator to do this:
db.test.find({price: {$type: 2}})
If you're using MongoDB 3.2+, you can also use the string alias for the type:
db.test.find({price: {$type: 'string'}})
While @JohnnyHK's answer is absolutely correct in most cases, MongoDB also returns documents where the field is an array and any element of that array has that type (see the $type docs). So for example, the document
{
_id: 1,
tags: ['123']
}
is returned for the query Books.find({ tags: { $type: "string" } }) as well. To prevent this, you can adjust the query to be
Books.find({
tags: {
$type: "string",
$not: {
$type: "array"
}
}
})
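The difference between the two queries can be mimicked with plain predicates. This is only an in-memory model of the matching behaviour described above, not how MongoDB implements $type; the function names are hypothetical.

```javascript
// Models { field: { $type: "string" } }:
// true for a string, or for an array that contains at least one string.
function matchesTypeString(value) {
  if (Array.isArray(value)) {
    return value.some((element) => typeof element === 'string');
  }
  return typeof value === 'string';
}

// Models { field: { $type: "string", $not: { $type: "array" } } }:
// true only when the field itself is a string.
function matchesStringOnly(value) {
  return typeof value === 'string';
}

const records = [
  { id: 1, price: 5 },
  { id: 2, price: '6' },
  { id: 3, price: 13 },
  { id: 4, price: '75' },
];

const stringPriced = records.filter((r) => matchesStringOnly(r.price));
console.log(stringPriced.map((r) => r.id));
console.log(matchesTypeString(['123']), matchesStringOnly(['123']));
```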

MongoDB: Point not in interval when using $near operator with $maxDistance

When I try to find all members within 50km of Salt Lake City, Utah from the Mongo shell I get the error:
error: {
"$err" : "point not in interval of [ -180, 180 ] :: caused by :: { 0: 0.0, 1: 50000.0 }",
"code" : 16433
}
Here is the query I am running:
db.members.find(
{ 'geo.point' :
{ $near :
{
$geometry : {
type : "Point" ,
coordinates : [ 111.000 , 40.000 ]
},
$maxDistance : 50000
}
}
}
)
Member schema is like this:
var memberSchema = mongoose.Schema({
name: {
first: {type:String, default:''},
last: {type:String, default:''},
},
geo: {
latitude: {type:String, default:''},
longitude: {type:String, default:''},
country: {type:String, default:''},
state: {type:String, default:''},
place: {type:String, default:''},
zip: {type:String, default:''},
point: {type: [Number], index: '2d'}
}
});
Member object in DB looks like this:
{
"_id" : ObjectId("xxxxxxxxxxxxxxxxxxx"),
"name": {
"first": "Thom",
"last": "Allen"
},
"geo" : {
"point" : [ -111.8833, 40.7500 ],
"zip" : "84115",
"state" : "UT",
"country" : "US",
"longitude" : "-111.8833",
"latitude" : "40.7500"
}
}
Is it possible that my fields are not stored in the correct format? If I change 50000 to anything below 180 it will work, but that is not how it should function according to the docs here:
http://docs.mongodb.org/manual/reference/operator/query/near/
** Just a heads up, the proper mongo location array IS in fact [longitude, latitude].
A few things. First, I think your query is off: you are querying for coordinates : [ 111.000 , 40.000 ] and it should be coordinates : [ -111.000 , 40.000 ]
Second, the example data point you provide, [ -111.8833, 40.7500 ], is more than 50 km from your corrected query point; it's actually about 113 km away (test it here: http://andrew.hedges.name/experiments/haversine/ )
So, correcting for those two issues, if I store the data in MongoDB as you have stored it, I can do the following:
1) create the correct index:
db.members.ensureIndex({ "geo.point": "2dsphere" })
2) run this query:
db.members.find({ 'geo.point':
{$geoWithin:
{$centerSphere: [[ -111.000 , 40.000 ], 113/6371]}
}
} )
Note that I've divided 113 km by 6371 km (the Earth's mean radius), which gives the radius in radians, as this query requires.
Try it yourself. In general you will be better off if you can store things in the future using GeoJSON but with your existing schema and the above index and query I'm able to get the correct results.
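The distance check and the kilometres-to-radians conversion can be reproduced with a short haversine sketch. It assumes a spherical Earth with a mean radius of 6371 km, the same figure used in the query above; the function names are illustrative.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, as used in the query

// Great-circle distance between two [longitude, latitude] pairs, in km.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// $centerSphere expects its radius in radians: distance / sphere radius.
const kmToRadians = (km) => km / EARTH_RADIUS_KM;

const queryPoint = [-111.0, 40.0];       // corrected query point
const memberPoint = [-111.8833, 40.75];  // stored geo.point from the question

const distanceKm = haversineKm(queryPoint, memberPoint);
console.log(distanceKm.toFixed(1), 'km');
console.log(kmToRadians(113), 'radians');
```

The computed distance is well over 50 km and just under 113 km, which is why a 50 km search misses the sample member while the 113 km $centerSphere radius includes it.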
What you have in your data is the format for legacy co-ordinate pairs but you are trying to query using the GeoJSON syntax.
The only valid index form for legacy co-ordinate pairs is a "2d" index, so if you have created a "2dsphere" index that will not work. So you need to remove any "2dsphere" index and create a "2d" index as follows:
db.members.ensureIndex({ "geo.point": "2d" })
If you actually intend to use the GeoJSON form and "2dsphere" index type, then you need the data to support it, for example:
{
"loc" : {
"type" : "Point",
"coordinates" : [ 3, 6 ]
}
}
So it needs that underlying structure of "type" and "coordinates" in order to use this index type and query form.
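If the data is migrated to the GeoJSON form, a legacy pair can be wrapped with a small helper like the one below. This is a sketch only: the function name toGeoJSONPoint is hypothetical, and the range checks simply reflect that GeoJSON coordinates are ordered [longitude, latitude].

```javascript
// Wrap a legacy [longitude, latitude] pair in a GeoJSON Point.
function toGeoJSONPoint([longitude, latitude]) {
  if (!(longitude >= -180 && longitude <= 180)) {
    throw new RangeError('longitude must be within [-180, 180]');
  }
  if (!(latitude >= -90 && latitude <= 90)) {
    throw new RangeError('latitude must be within [-90, 90]');
  }
  return { type: 'Point', coordinates: [longitude, latitude] };
}

// The stored geo.point from the question, converted.
const loc = toGeoJSONPoint([-111.8833, 40.75]);
console.log(JSON.stringify(loc));
```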
