I have two collections. A 'users' collection and an 'events' collection. There is a primary key on the events collection which indicates which user the event belongs to.
I would like to count how many events a user has matching a certain condition.
Currently, I am performing this like:
db.users.find({ usersMatchingACondition }).forEach(user => {
const eventCount = db.events.find({
title: 'An event title that I want to find',
userId: user._id
}).count();
print(`This user has ${eventCount} events`);
});
Ideally what I would like returned is an array or object with the UserID and how many events that user has.
With 10,000 users - this is obviously producing 10,000 queries and I think it could be made a lot more efficient!
I presume this is easy with some kind of aggregate query - but I'm not familiar with the syntax and am struggling to wrap my head around it.
Any help would be greatly appreciated!
You need $lookup to fetch each user's events, matched on the field that stores the user's ID. Then you can use $filter to apply your event-level condition and the $size operator to get the count:
db.users.aggregate([
{
$match: { /* users matching condition */ }
},
{
$lookup:
{
from: 'events',
localField: '_id', //your "primary key"
foreignField: 'userId', // the field on events that references the user
as: 'user_events'
}
},
{
$addFields: {
user_events: {
$filter: {
input: "$user_events",
cond: {
$eq: [
'$$this.title', 'An event title that I want to find'
]
}
}
}
}
},
{
$project: {
_id: 1,
// other fields you want to retrieve: 1,
totalEvents: { $size: "$user_events" }
}
}
])
There isn't much optimization that can be done without aggregate, but since you specifically said you are not familiar with it, here are two smaller changes.
First, instead of
const eventCount = db.events.find({
title: 'An event title that I want to find',
userId: user._id
}).count();
Do
const eventCount = db.events.count({
title: 'An event title that I want to find',
userId: user._id
});
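This is more direct, but do not expect a large speedup from it: in the mongo shell, find(query).count() is documented as equivalent to count(query), and neither form fetches the matching documents. The dominant cost is still one round trip to the server per user.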
For returning an array you can just initialize an array at the start and push {userid: id, count: eventCount} objects to it.
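A minimal sketch of the array-building step, using in-memory sample data in place of the two collections. The `users` and `events` arrays and the `countEventsPerUser` helper are assumptions for illustration, not part of the original answer. Instead of one count per user, it tallies the matching events in a single pass and then emits one `{ userId, count }` object per user:

```javascript
// Hypothetical sample data standing in for the two collections.
const users = [{ _id: 'u1' }, { _id: 'u2' }, { _id: 'u3' }];
const events = [
  { userId: 'u1', title: 'An event title that I want to find' },
  { userId: 'u1', title: 'An event title that I want to find' },
  { userId: 'u2', title: 'Some other title' },
  { userId: 'u3', title: 'An event title that I want to find' },
];

// Tally matching events per user in one pass, then build the result array.
function countEventsPerUser(users, events, title) {
  const counts = new Map();
  for (const event of events) {
    if (event.title === title) {
      counts.set(event.userId, (counts.get(event.userId) || 0) + 1);
    }
  }
  // Users without any matching event get a count of 0.
  return users.map(user => ({ userId: user._id, count: counts.get(user._id) || 0 }));
}

const result = countEventsPerUser(users, events, 'An event title that I want to find');
console.log(result);
```

With real collections, the `events` array would come from a single query for the matching title, so the number of round trips no longer grows with the number of users.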
Currently, I have a job in the database that I request with a GET route:
When I populate the candidates, I get this response format:
My problem is that I don't need that "id" object; I just need a "selected_candidates" array with the users inside as objects. Right now each user is an object inside another object, which is itself in an array.
Here is the code from my controller (the populate is in the jobsService):
If I change the data format of the job like this:
...it works as expected (with path: "candidates_selected"), but I no longer have the "status" string. That makes sense, because it is no longer stored in the database once the array holds plain ObjectIds:
I would like a solution that gives me both, but maybe this is a limit of NoSQL?
A solution without populate but with a loop (I don't think it's a good idea):
I don't think there is a convenient way to achieve this with populate. However, you can try the aggregation framework from the native MongoDB driver.
Let your Mongoose models be ASchema and BSchema:
const result = await ASchema.aggregate([
{$addFields: {org_doc: '$$ROOT'}}, // save original document to retrieve later
{$unwind: '$candidates_selected'},
{
$lookup: {
from: BSchema.collection.name,
let: {
selected_id: '$candidates_selected.id',
status: '$candidates_selected.status',
},
pipeline: [
{
$match: {$expr: {$eq: ['$$selected_id', '$_id']}}, // find candidate by id
},
{
$addFields: {status: '$$status'} // attach status
}
],
as: 'found_candidate'
}
},
{
$group: { // regroup the very first $unwind stage
_id: '$_id',
org_doc: {$first: '$org_doc'},
found_candidates: {
$push: {$arrayElemAt: ['$found_candidate', 0]} // result of $lookup is an array, concat them to reform the original array
}
}
},
{
$addFields: {'org_doc.candidates_selected': '$found_candidates'} // attach found_candidates to the saved original document
},
{
$replaceRoot: {newRoot: '$org_doc'} // recover the original document
}
])
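To make the intent of the pipeline easier to follow, here is a rough in-memory equivalent in plain JavaScript. The sample `job` and `users` objects and the `attachCandidates` helper are assumptions for illustration; only the `candidates_selected`, `id` and `status` names come from the pipeline above. Skipping candidates that have no matching user is a choice made in this sketch:

```javascript
// Hypothetical sample data: one job and the users it references.
const job = {
  _id: 'job1',
  title: 'Developer',
  candidates_selected: [
    { id: 'u1', status: 'pending' },
    { id: 'u2', status: 'accepted' },
  ],
};
const users = [
  { _id: 'u1', name: 'Ann' },
  { _id: 'u2', name: 'Bob' },
];

// Replace each { id, status } entry with the full user plus its status.
function attachCandidates(job, users) {
  const byId = new Map(users.map(user => [user._id, user]));
  return {
    ...job,
    candidates_selected: job.candidates_selected
      .filter(candidate => byId.has(candidate.id)) // skip unknown users (sketch assumption)
      .map(candidate => ({ ...byId.get(candidate.id), status: candidate.status })),
  };
}

const merged = attachCandidates(job, users);
console.log(JSON.stringify(merged, null, 2));
```

The aggregation does the same merge on the server, which avoids loading every referenced user into the application first.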
I'm trying to use CASL for authorization checks on nested items.
It uses Mongoose to query data and check access.
My domain is:
A "User" can have multiple "Vehicles"
A "Document" must have a Vehicle
Schema:
vehicle { users: [ {type: objectId, ref: 'user'} ] }
document { vehicle: {type: objectId, ref: 'vehicle' }}
To find the vehicle "by user" I do:
db.getCollection('vehicle').find(
{ users: {$in: [ ObjectId("5ae1a957d67500018efa2c9d") ]} }
)
That works.
In the documents collection, the data has records such as this:
{
"_id": ObjectId("5aeaad1277e8a6009842564d"),
"vehicle": ObjectId("5aea338b82d8170096b52ce9"),
"company": "Allianz",
"price": 500,
"date_start": ISODate("2018-05-02T22:00:00.000Z"),
"date_end": ISODate("2019-05-02T22:00:00.000Z"),
"createdAt": ISODate("2018-05-03T06:32:50.590Z"),
"updatedAt": ISODate("2018-05-03T06:32:50.590Z"),
"__v": 0
}
To find the document "by user" I do:
db.getCollection('document').find(
{ "vehicle.users": {$in: [ ObjectId("5ae1a957d67500018efa2c9d") ]} }
)
It doesn't work. Is it possible to do that in one single "find" query?
You can't do it in a simple MongoDB find() query, because the data about vehicle users exists in the vehicle collection, not the documents collection.
However, it is possible with an aggregation pipeline using the $lookup operator to link the data in two different collections. The aggregation would be something like this:
db.document.aggregate([
{$lookup: {
"from": "vehicle",
"localField": "vehicle",
"foreignField": "_id",
"as": "vehicleDetails",
}},
{$match: {"vehicleDetails.users" : ObjectId("5ae1a957d67500018efa2c9d")}}
])
You will probably need to add more stages to reshape the data the way you need it, but the key is to use $lookup to link the data from the two collections, then use $match to filter the set of results.
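As one possible example of such a reshaping stage, the sketch below builds the same pipeline as plain data and appends a `$project` that drops the joined `vehicleDetails` array from the output. The `documentsForUserPipeline` helper name is an assumption for illustration, and a string stands in for the ObjectId so the snippet runs without a database:

```javascript
// Builds the pipeline as plain data so it can be inspected before being
// passed to db.document.aggregate(...).
// In real use, pass an ObjectId for userId; a string is used here only
// so the sketch runs on its own.
function documentsForUserPipeline(userId) {
  return [
    { $lookup: { from: 'vehicle', localField: 'vehicle', foreignField: '_id', as: 'vehicleDetails' } },
    { $match: { 'vehicleDetails.users': userId } },
    // Extra reshaping stage: remove the joined vehicle data from the result.
    { $project: { vehicleDetails: 0 } },
  ];
}

const pipeline = documentsForUserPipeline('5ae1a957d67500018efa2c9d');
console.log(JSON.stringify(pipeline, null, 2));
```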
For this query to work, you need to store the array of user IDs in the vehicle document. Neither MongoDB nor CASL manages external references automatically.
Alternative solutions:
I see a few options:
Retrieve the IDs of all vehicles when you define rules. This works well if the number of vehicles is not large (<= 1000)
const vehicleIds = await getVehicleIds(user)
can(['read', 'update'], 'document', { vehicle: { $in: vehicleIds } })
Denormalize your schema. For example, add an additional user_id field to the vehicle document
Consider whether you can embed document as a subdocument of vehicle, something like this:
vehicle {
documents: [Document],
users: [ {type: objectId, ref: 'user'} ]
}
Don't define rules per document at all, and enforce access in the routes instead (REST or GraphQL, it doesn't matter).
app.get('/vehicle/:id/documents', async (req, res) => {
const vehicle = await Vehicle.findById(req.params.id)
req.ability.throwUnlessCan('read', vehicle)
const documents = await Document.find({ vehicle: vehicle.id })
res.send({ documents })
})
Say I have a collection of documents, each one managing a discussion between a teacher and a student:
{
_id,
teacherId,
studentId,
teacherLastMessage,
studentLastMessage
}
I will get queries with 3 parameters: an _id, a userId and a message.
I'm looking for a way to update the teacherLastMessage field or studentLastMessage field depending on which one the user is.
At the moment, I have this:
return Promise.all([
// if user is teacher, set teacherLastMessage
db.collection('discussions').findOneAndUpdate({
teacherId: userId,
_id
}, {
$set: {
teacherLastMessage: message
}
}, {
returnOriginal: false
}),
// if user is student, set studentLastMessage
db.collection('discussions').findOneAndUpdate({
studentId: userId,
_id
}, {
$set: {
studentLastMessage: message
}
}, {
returnOriginal: false
})
]).then((results) => {
results = results.filter((result) => result.value);
if (!results.length) {
throw new Error('No matching document');
}
return results[0].value;
});
Is there a way to tell mongo to make a conditional update, based on the field matched? Something like this:
db.collection('discussions').findOneAndUpdate({
$or: [{
teacherId: userId
}, {
studentId: userId
}],
_id
}, {
$set: {
// if field matched was studentId, set studentLastMessage
// if field matched was teacherId, set teacherLastMessage
}
});
Surely it must be possible with mongo 3.2?
What you want would require referencing other fields inside of $set. This is currently impossible. Refer to this ticket as an example.
First of all, your current approach with two update queries looks just fine to me. You can continue using that, just make sure that you have the right indexes in place. Namely, to get the best performance for these updates, you should have two compound indexes:
{ _id: 1, teacherId: 1 }
{ _id: 1, studentId: 1 }
To look at this from another perspective, you should probably restructure your data. For example:
{
_id: '...',
users: [
{
userId: '...',
userType: 'student',
lastMessage: 'lorem ipsum'
},
{
userId: '...',
userType: 'teacher',
lastMessage: 'dolor sit amet'
}
]
}
This would allow you to perform your update with a single query.
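A minimal sketch of what that single query could look like with the restructured document, assuming the `users` array shown above. It relies on the positional `$` operator, which targets the first array element matched by the query condition. The `buildLastMessageUpdate` helper and the sample IDs are assumptions for illustration:

```javascript
// Builds the filter and update documents for the restructured schema.
// The filter matches the discussion and the array element for this user;
// 'users.$' in the update then refers to that matched element.
function buildLastMessageUpdate(discussionId, userId, message) {
  return {
    filter: { _id: discussionId, 'users.userId': userId },
    update: { $set: { 'users.$.lastMessage': message } },
  };
}

const { filter, update } = buildLastMessageUpdate('d1', 'u42', 'hello');
console.log(filter, update);
// Possible use with the driver call from the question:
// db.collection('discussions').findOneAndUpdate(filter, update, { returnOriginal: false })
```

The same call works whether the user is the teacher or the student, because the role is no longer encoded in the field name.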
Your data structure is a bit unusual. Unless you have a specific business case that requires the data to be molded that way, I would suggest adding a user type; if a user can be both a teacher and a student, keep your current structure.
The $set operator takes an object, so my suggestion is to do your business logic beforehand. You should already know before the update whether it is for a teacher or a student: some variable or authentication level should distinguish teachers from students. Perhaps on a successful login you could set a cookie or a local storage entry in the callback. Either way, if you know the current type of user, you can build the update object ahead of time as an object literal with the properties you need for that user type.
So
// `student` is assumed to be a boolean you determined earlier (e.g. at login)
var updateObj;
if (student) {
  updateObj = { studentLastMessage: message };
} else {
  updateObj = { teacherLastMessage: message };
}
Then pass it as the value of $set in your update, i.e. { $set: updateObj }.
I have 3 schemas like below:
User
var UserSchema = new Schema({
name: String
});
Actor
var ActorSchema = new Schema({
name: String
});
Rating
var RatingSchema = new Schema({
actor: {
type: mongoose.Schema.Types.ObjectId,
ref: 'Actor'
},
user: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
},
userRating: Number
});
I want to send all actors info to the front end like [actor1, actor2 ...].
Each actor contain actor details and 'userRating' which is given by the user who is currently logged in.
A user can give ratings to multiple actors and an actor can receive ratings from multiple users. These will be stored in Ratings table.
I wrote something like this
Actor
.find({}) // get all actors and populate userRating into each actor
.populate({
path: 'userRating',
model: 'Rating',
match: { actor: {$eq: req.actor}, user: {$eq: req.user}},
select: 'userRating'
})
.exec(function(error, actors){
if(error)
res.status(501).json({error: error});
else
res.json(actors);
});
I only get actors in the result. The actor objects don't contain 'userRating'. Can someone correct my query?
It depends on what you are actually sending as input for the query parameters here. Also the main thing that you need to understand is that this is not a "JOIN", but in fact separate queries being issued by the mongoose software layer, so there are distinct differences in handling.
In the basic case where the "values" being supplied as parameters are actually the ObjectId values of the references, then you actually just want these directly in the main "query" rather than arguments to the .populate() action ( which is actually where the "additional queries" are happening ).
Furthermore your "relations/references" are in the Rating model, so that is where your query is issued instead:
Rating.find({
"actor": req.actor,
"user": req.user
}).populate("actor user").exec(function(err,ratings) {
// Matched ratings by actor and user supplied
})
If your parameters are instead the "name" data of each object, then since that information is not present in the Rating model until populated the only way mongoose can do this is to retrieve "all" of the Rating objects, then do the "population" with the "match" criteria, and finally filter out any results where the population was null due to un-matched items:
Rating.find().populate([
{ "path": "actor", "match": { "name": req.actor } },
{ "path": "user", "match": { "name": req.user } }
]).exec(function(err,ratings) {
// Now filter out the null results
ratings = ratings.filter(function(rating) {
return ( rating.actor != null && rating.user != null )
});
// Then work with filtered data
})
Of course that is highly inefficient since this is a "client" side operation and you are pulling in all of the Rating content "first". So what you really mean to do in this case is to actually do the "three" query operations yourself, and by getting the ObjectId values from both User and Actor models in order to apply the match to the Rating model instead:
async.parallel(
{
"user": function(callback) {
User.findOne({ "name": req.user },callback)
},
"actor": function(callback) {
Actor.findOne({ "name": req.actor },callback)
}
},
function(err,data) {
// Use returned _id values in query
Rating.find({
"actor": data.actor._id,
"user": data.user._id
}).populate("actor user").exec(function(err,ratings) {
// populated Rating results
});
}
)
Then the queries resolve the "only" ObjectId values you actually require and the final query on Rating only retrieves those results that actually match the conditions, rather than everything and doing a "post filter" operation.
As a final approach, if you have MongoDB 3.2 available, then you could alternately use the $lookup operation instead to perform the "JOINS" on the "server" instead:
Rating.aggregate(
[
{ "$lookup": {
"from": "users",
"localField": "user",
"foreignField": "_id",
"as": "user"
}},
{ "$unwind": "$user" },
{ "$match": { "user.name": req.user } },
{ "$lookup": {
"from": "actors",
"localField": "actor",
"foreignField": "_id",
"as": "actor"
}},
{ "$unwind": "$actor" },
{ "$match": { "actor.name": req.actor } }
],
function(err,ratings) {
// populated on the server in one request
}
)
From the "client" point of view, this is just "one" request and response as opposed to what .populate() does. But it really is not more than a "server" side rendition of the "client" logic presented before.
So if looking up by values of "name", you should instead do the "three" query approach for optimal performance, since the aggregation version is still really working with a lot more data than it needs to.
Of course the "best" perspective is to simply use the ObjectId values to begin with.
Of course the main thing here is that information like "userRating" belongs to the Rating model, and that is therefore where you provide the "query" in all cases in order to retrieve that data. These are not "JOIN" operations like in SQL, so the "server" is not looking at the combined results then selecting the fields.
As a bit of self education turn on "debugging" to see how mongoose is actually issuing statements to the server. Then you will see how .populate() is actually applied:
mongoose.set("debug",true)
Let's say I have some Schema which has a virtual field like this
var schema = new mongoose.Schema(
{
name: { type: String }
},
{
toObject: { virtuals: true },
toJSON: { virtuals: true }
});
schema.virtual("name_length").get(function(){
return this.name.length;
});
In a query is it possible to sort the results by the virtual field? Something like
schema.find().sort("name_length").limit(5).exec(function(docs){ ... });
When I try this, the results are simply not sorted...
You won't be able to sort by a virtual field because virtuals are not stored in the database.
Virtual attributes are attributes that are convenient to have around
but that do not get persisted to mongodb.
http://mongoosejs.com/docs/2.7.x/docs/virtuals.html
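If the result set is small enough to load completely, one workaround is to sort in application code after the documents have been fetched. The sketch below uses plain objects in place of fetched Mongoose documents (an assumption so that it runs without a database) and repeats the `name_length` computation by hand:

```javascript
// Plain objects standing in for documents already fetched from MongoDB.
const docs = [{ name: 'Christina' }, { name: 'Al' }, { name: 'Bob' }];

// Same computation as the name_length virtual, applied by hand.
const nameLength = doc => doc.name.length;

// Copy before sorting so the fetched array is left untouched,
// then apply the limit only after the sort.
const sorted = [...docs]
  .sort((a, b) => nameLength(a) - nameLength(b))
  .slice(0, 5);

console.log(sorted.map(doc => doc.name)); // [ 'Al', 'Bob', 'Christina' ]
```

Note that the limit has to be applied after sorting, so every candidate document must be loaded first; this does not scale to large collections.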
Virtuals defined in the Schema are not injected into the generated MongoDB queries. The functions defined are simply run for each document at the appropriate moments, once they have already been retrieved from the database.
In order to reach what you're trying to achieve, you'll also need to define the virtual field within the MongoDB query. For example, in the $project stage of an aggregation.
There are, however, a few things to keep in mind when sorting by virtual fields:
projected documents exist only in memory, so simply adding a field and keeping every full document of the result set in memory before sorting carries a large performance cost
because of the above, indexes will not be used at all when sorting
Here's a general example on how to sort by virtual fields while keeping a relatively good performance:
Imagine you have a collection of teams and each team contains an array of players directly stored into the document. Now, the requirement asks for us to sort those teams by the ranking of the favoredPlayer where the favoredPlayer is basically a virtual property containing the most relevant player of the team under certain criteria (in this example we only want to consider offense and defense players). Also, the aforementioned criteria depend on the users' choices and can, therefore, not be persisted into the document.
To top it off, our "team" document is pretty large, so in order to mitigate the performance hit of sorting in-memory, we project only the fields we need for sorting and then restore the original document after limiting the results.
The query:
[
// find all teams from germany
{ '$match': { country: 'de' } },
// project only the sort-relevant fields
// and add the virtual favoredPlayer field to each team
{ '$project': {
rank: 1,
'favoredPlayer': {
'$arrayElemAt': [
{
// keep only players that match our criteria
$filter: {
input: '$players',
as: 'p',
cond: { $in: ['$$p.position', ['offense', 'defense']] },
},
},
// take first of the filtered players since players are already sorted by relevance in our db
0,
],
},
}},
// sort teams by the ranking of the favoredPlayer
{ '$sort': { 'favoredPlayer.ranking': -1, rank: -1 } },
{ '$limit': 10 },
// $lookup, $unwind, and $replaceRoot are in order to restore the original database document
{ '$lookup': { from: 'teams', localField: '_id', foreignField: '_id', as: 'subdoc' } },
{ '$unwind': { path: '$subdoc' } },
{ '$replaceRoot': { newRoot: '$subdoc' } },
];
For the example you gave above, the code could look something like the following:
var schema = new mongoose.Schema(
{ name: { type: String } },
{
toObject: { virtuals: true },
toJSON: { virtuals: true },
});
schema.virtual('name_length').get(function () {
return this.name.length;
});
const MyModel = mongoose.model('Thing', schema);
MyModel
.aggregate()
.project({
'name_length': {
'$strLenCP': '$name',
},
})
.sort({ 'name_length': -1 })
.exec(function(err, docs) {
console.log(docs);
});