Mongoose collection statistics / manipulations queries - node.js

First, a note: the collection described here is simplified for this question. I'm interested in understanding how to query a MongoDB collection and get statistics about my data.
Let's say I have a collection with test results. The schema is:
Results {
_id: ObjectId
TestNumber: int
result: String // this contains "pass" or "fail"
// additional data
}
Each test can have many reports, so most likely each TestNumber appears in more than one document.
How can I perform a query which returns this info on the entire collection:
TestNumber | count of result == "pass" | count of result == "fail"

You can use the following aggregation stages, pipelined together. The code assumes the field is stored as testNumber, as in the sample data below; change "$testNumber" if your documents use TestNumber.
1. Group all the documents by testNumber and result, so that every testNumber has up to two groups, one for pass and one for fail, each with the count of documents in that group.
2. Project a "pass" field for the group whose result is pass, and a "fail" field for the other group.
3. Group the documents again by testNumber, and push the pass and fail documents into an array.
4. Project the fields as required.
The Code:
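Results.aggregate([
  // 1. One group per (testNumber, result) pair, with its document count
  {$group: {"_id": {"testNumber": "$testNumber", "result": "$result"},
            "count": {$sum: 1}}},
  // 2. Turn the count into {"pass": n} or {"fail": n}
  {$project: {"_id": 0,
              "testNumber": "$_id.testNumber",
              "result": {$cond: [{$eq: ["$_id.result", "pass"]},
                                 {"pass": "$count"},
                                 {"fail": "$count"}]}}},
  // 3. Collect both counts per testNumber
  {$group: {"_id": "$testNumber",
            "result": {$push: "$result"}}},
  // 4. Rename _id back to testNumber
  {$project: {"testNumber": "$_id", "result": 1, "_id": 0}}
], function(err, docs) {
  // post process docs here
})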
Sample Data:
db.collection.insert([
{
"_id":1,
"testNumber":1,
"result":"pass"
},
{
"_id":2,
"testNumber":1,
"result":"pass"
},
{
"_id":3,
"testNumber":1,
"result":"fail"
},
{
"_id":4,
"testNumber":2,
"result":"pass"
}])
Sample output:
{ "result" : [ { "pass" : 1 } ], "testNumber" : 2 }
{ "result" : [ { "fail" : 1 }, { "pass" : 2 } ], "testNumber" : 1 }
Iterating over doc.result gives you the pass count and the fail count for each testNumber.
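The iteration over doc.result could look like the sketch below. It assumes the result shape shown in the sample output; flattenCounts is a hypothetical helper name, not part of Mongoose, and a missing group is reported as 0.

```javascript
// Hypothetical helper: turns each { testNumber, result: [...] } document
// into one flat row { testNumber, pass, fail }.
function flattenCounts(docs) {
  return docs.map(function (doc) {
    var row = { testNumber: doc.testNumber, pass: 0, fail: 0 };
    doc.result.forEach(function (entry) {
      // Each entry is either { pass: n } or { fail: n }
      if (typeof entry.pass === 'number') row.pass = entry.pass;
      if (typeof entry.fail === 'number') row.fail = entry.fail;
    });
    return row;
  });
}

// Documents copied from the sample output
var sample = [
  { result: [{ pass: 1 }], testNumber: 2 },
  { result: [{ fail: 1 }, { pass: 2 }], testNumber: 1 }
];
var rows = flattenCounts(sample);
console.log(rows);
```

Inside the aggregate callback you would call flattenCounts(docs) instead of using the hard-coded sample.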

Related

how to select random documents with some conditions fulfilled in MongoDB

I have documents with a field called "Difficulty Level", whose value is between 1 and 10 in each document.
I have to select 10 or 20 random documents so that the selection contains at least one document for each difficulty level from 1 to 10, i.e. at least one document with "Difficulty level" : 1, one with "Difficulty level" : 2, and so on up to "Difficulty level" : 10.
How can I select documents randomly while satisfying this condition?
Thanks
I tried the $rand operator for selecting random documents, but I can't find a solution that satisfies this condition.
If I've understood correctly, you can try something like this.
The goal is to build a query like the following example.
It gets two random documents using $sample, one for level 1 and another for level 2. Using $facet, you can get multiple result sets from a single query.
db.collection.aggregate([
  {
    "$facet": {
      "difficulty_level_1": [
        { "$match": { "difficulty_level": 1 } },
        { "$sample": { "size": 1 } }
      ],
      "difficulty_level_2": [
        { "$match": { "difficulty_level": 2 } },
        { "$sample": { "size": 1 } }
      ]
    }
  }
])
The point is to build this query dynamically. You can use JS to create the query object and pass it to the Mongo call.
const random = Math.floor((Math.random()*10)+1) // Or whatever you use to get the number of levels
let query = {"$facet":{}}
for(let i = 1 ; i <= random; i++){
  const difficulty_level = `difficulty_level_${i}`
  query["$facet"][difficulty_level] = [
    { $match: { difficulty_level: i }},
    { $sample: { size: 1 }}
  ]
}
console.log(query) // This output can be used in mongoplayground and it works!
// To use the query you can use something like this (or however you call the DB)
this.db.aggregate([query])
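Since the question asks for at least one document per level from 1 to 10, a variant that always covers the full range may be closer to what is needed. This is a minimal sketch under that assumption; buildFacetStage and its parameters are hypothetical names, and the field name difficulty_level is taken from the example above.

```javascript
// Hypothetical helper: builds one $facet branch per difficulty level,
// each sampling `perLevel` random documents of that level.
function buildFacetStage(minLevel, maxLevel, perLevel) {
  var facet = {};
  for (var level = minLevel; level <= maxLevel; level++) {
    facet['difficulty_level_' + level] = [
      { $match: { difficulty_level: level } },
      { $sample: { size: perLevel } }
    ];
  }
  return { $facet: facet };
}

var stage = buildFacetStage(1, 10, 1);
console.log(Object.keys(stage.$facet).length); // 10
```

Passing [stage] to aggregate on the collection should then return up to one document per level; calling buildFacetStage(1, 10, 2) would return up to 20 documents, two per level.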

MongoDB query to find most recently added object in an array within a document then making a further query based on that result

I have two collections in MongoDB; users & challenges.
The structure of the users collection looks like this:
name: "John Doe"
email: "john#doe.com"
progress: [
{
_id : ObjectId("610be25ae20ce4872b814b24")
challenge: ObjectId("60f9629edd16a8943d2cab9b")
completed: true
date_completed: 2021-08-06T12:15:32.129+00:00
}
{
_id : ObjectId("611be24ae32ce4772b814b32")
challenge: ObjectId("60g6723efd44a6941l2cab81")
completed: true
date_completed: 2021-08-07T12:15:32.129+00:00
}
]
date: 2021-08-05T13:06:34.129+00:00
The structure of the challenges collection looks like this:
_id: ObjectId("610be25ae20ce4872b814b24")
section_no: 1
section_name: "Print Statements"
challenge_no: 1
challenge_name: "Hello World!"
default_code: "public class Main {public static void main(String[] args) {}}"
solution: "Hello World!"
What I want to be able to do is find the most recent entry in a particular user's 'progress' array within the users collection and based on that result I want to query the challenges collection to find the next challenge for that user.
So say the most recent challenge entry in that user's 'progress' array is...
{
_id : ObjectId("611be24ae32ce4772b814b32")
challenge: ObjectId("60g6723efd44a6941l2cab81")
completed: true
date_completed: 2021-08-07T12:15:32.129+00:00
}
...which is Section 1 Challenge 2. I want to be able to query the challenges collection to return Section 1 Challenge 3, and if that doesn't exist then return Section 2 Challenge 1.
Apologies if this is worded poorly, I am fairly new to MongoDb and unsure of how to create complex queries in it.
Thanks in advance!
One approach:
[
  { // Unwind the progress array: one document per array element
    "$unwind": "$progress"
  },
  { // Sort all documents in descending order of completion date
    "$sort": {
      "progress.date_completed": -1
    }
  },
  { // Group them together again but pick only the most recent array element
    "$group": {
      "_id": "$_id",
      "latestProgress": {
        "$first": "$progress"
      }
    }
  },
  { // Join with the challenges collection; progress.challenge references challenges._id
    "$lookup": {
      "from": "challenges",
      "localField": "latestProgress.challenge",
      "foreignField": "_id",
      "as": "Progress"
    }
  },
  { // Only pick the first array element (since there will be just one); $first on arrays needs MongoDB 4.4+
    "$set": {
      "Progress": {
        "$first": "$Progress"
      }
    }
  }
]
I have added a comment to each stage to make the idea easier to follow. I'm not sure it's the best approach, but it works in my tests.
Note that the Progress field can be missing from the result. In that case no matching challenge document exists.
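The pipeline stops at the most recent challenge; the "next challenge" step from the question could then be done in application code. The sketch below assumes the challenges have already been loaded into an array and that section_no and challenge_no define the order. findNextChallenge is a hypothetical helper, not a MongoDB or Mongoose function.

```javascript
// Hypothetical helper: returns the first challenge that comes after
// `current` when ordering by section_no, then challenge_no, or null
// if `current` is the last challenge.
function findNextChallenge(challenges, current) {
  var sorted = challenges.slice().sort(function (a, b) {
    return (a.section_no - b.section_no) || (a.challenge_no - b.challenge_no);
  });
  for (var i = 0; i < sorted.length; i++) {
    var c = sorted[i];
    var isLater = c.section_no > current.section_no ||
      (c.section_no === current.section_no && c.challenge_no > current.challenge_no);
    if (isLater) return c;
  }
  return null;
}

// Section 1 has only two challenges here, so the next one is Section 2 Challenge 1
var challenges = [
  { section_no: 2, challenge_no: 1 },
  { section_no: 1, challenge_no: 1 },
  { section_no: 1, challenge_no: 2 }
];
var next = findNextChallenge(challenges, { section_no: 1, challenge_no: 2 });
console.log(next);
```

If Section 1 Challenge 3 existed it would be returned instead, which matches the fallback order described in the question as long as the numbering has no gaps.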

How can I do multiple queries to mongo at one request

Let's say I have a collection of Person{email: 'actual email', ..other data} and want to check whether a Person exists with a given email, retrieving its data if so or getting null if not.
If I want to do that once, no problem: just run a query through mongoose using Person.findOne() or whatever.
But what if I have to check 25-100 given emails? Of course I can just send tons of requests to mongodb and retrieve the data, but that seems like a waste of network.
Is there a good and performant way to query mongodb with multiple clauses in a single batch, like findBatch([{email: 'email1'}, {email: 'email2'}...{email: 'emailN'} ]), and get as a result [document1,null,document3,null, documentN], where null stands for find criteria that did not match?
Currently I see only one option:
A huge find with a single {email: {$in: []}} query, then matching the results to the search criteria in application logic on the server side. Cons: quite cumbersome and error prone if you have more than one search criterion.
Is there a better way to implement this?
Try this:
Replace the arrayOfEmails with your query array
Replace emailField with the actual name in your db documents
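db.collName.aggregate([
  { // Keep only documents whose email is in the query array
    "$match" : {
      "emailField" : {
        "$in" : arrayOfEmails
      }
    }
  },
  { // Collect the matching documents into a single array
    "$group" : {
      "_id" : null,
      "docs" : {
        "$push" : {
          "$cond" : [
            { // The aggregation $in takes [<value>, <array>]
              "$in" : [
                "$emailField",
                arrayOfEmails
              ]
            },
            "$$ROOT",
            null
          ]
        }
      }
    }
  }
])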
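Because the $match stage drops unmatched emails, the result has no null placeholders and its order is not guaranteed to follow the query array. The [document1, null, document3, ...] shape from the question can be rebuilt in application code, as in this sketch. alignByEmail is a hypothetical helper, and the email field name is passed in as an assumption.

```javascript
// Hypothetical helper: returns one entry per requested email, in the
// same order, using null where no document was found.
function alignByEmail(emails, docs, field) {
  var byEmail = new Map();
  docs.forEach(function (doc) {
    byEmail.set(doc[field], doc);
  });
  return emails.map(function (email) {
    return byEmail.has(email) ? byEmail.get(email) : null;
  });
}

var requested = ['a@example.com', 'b@example.com', 'c@example.com'];
var found = [
  { email: 'c@example.com', name: 'C' },
  { email: 'a@example.com', name: 'A' }
];
var aligned = alignByEmail(requested, found, 'email');
console.log(aligned);
```

The same helper works with the result of a plain find with $in, so the aggregation is optional if all you need is the aligned array.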

Find after aggregate in MongoDB

{
"_id" : ObjectId("5852725660632d916c8b9a38"),
"response_log" : [
{
"campaignId" : "AA",
"created_at" : ISODate("2016-12-20T11:53:55.727Z")
},
{
"campaignId" : "AB",
"created_at" : ISODate("2016-12-20T11:55:55.727Z")
}]
}
I have a document which contains an array. I want to select all those documents that do not have a response_log.created_at in the last 2 hours from the current time and whose count of response_log.created_at in the last 24 hours is less than 3.
I am unable to figure out how to go about it. Please help
You can use the aggregation framework to filter the documents. A pipeline with $match and $redact steps will do the filtering.
Consider running the following aggregate operation, where $redact lets you process the logical condition with the $cond operator and uses the system variable $$KEEP to "keep" the document where the logical condition is true, or $$PRUNE to "remove" the document where the condition is false.
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query and then a subsequent $match, except that $redact uses a single pipeline stage which is more efficient:
var moment = require('moment'),
    MongoClient = require('mongodb').MongoClient,
    last2hours = moment().subtract(2, 'hours').toDate(),
    last24hours = moment().subtract(24, 'hours').toDate();
// config.database is assumed to hold the connection string
MongoClient.connect(config.database)
    .then(function(db) {
        return db.collection('MyCollection')
    })
    .then(function (collection) {
        return collection.aggregate([
            // No response_log entry newer than 2 hours ago
            { '$match': { 'response_log.created_at': { '$not': { '$gt': last2hours } } } },
            {
                '$redact': {
                    '$cond': [
                        {
                            '$lt': [
                                {
                                    '$size': {
                                        '$filter': {
                                            'input': '$response_log',
                                            'as': 'res',
                                            // Keep only entries from the last 24 hours
                                            'cond': {
                                                '$gt': [
                                                    '$$res.created_at',
                                                    last24hours
                                                ]
                                            }
                                        }
                                    }
                                },
                                3
                            ]
                        },
                        '$$KEEP',
                        '$$PRUNE'
                    ]
                }
            }
        ]).toArray();
    })
    .then(function(docs) {
        console.log(docs)
    })
    .catch(function(err) {
        throw err;
    });
Explanations
In the above aggregate operation, if you execute only the first $match pipeline step
collection.aggregate([
    { '$match': { 'response_log.created_at': { '$not': { '$gt': last2hours } } } }
])
the documents returned will be the ones that do not have a "response_log.created_at" in the last 2 hours from the current time, where the variable last2hours is created with the momentjs library using the subtract API.
The subsequent $redact stage then further filters those documents with the $cond ternary operator, which evaluates the logical expression below. It uses $filter to return only the array elements created in the last 24 hours and $size to count them
{
    '$lt': [
        {
            '$size': {
                '$filter': {
                    'input': '$response_log',
                    'as': 'res',
                    'cond': { '$gt': ['$$res.created_at', last24hours] }
                }
            }
        },
        3
    ]
}
to $$KEEP the document if this condition is true or $$PRUNE to "remove" the document where the evaluated condition is false.
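To check the selection criteria from the question against a few sample documents without a database, the same logic can be written as a plain JavaScript predicate. This is only an illustrative sketch: matchesCriteria is a hypothetical function, and the current time is passed in so the result is reproducible.

```javascript
// Hypothetical predicate: true when the document has no response_log
// entry in the last 2 hours and fewer than 3 entries in the last 24 hours.
function matchesCriteria(doc, now) {
  var twoHoursAgo = new Date(now.getTime() - 2 * 60 * 60 * 1000);
  var dayAgo = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  var log = doc.response_log || [];
  var hasRecent = log.some(function (entry) {
    return entry.created_at > twoHoursAgo;
  });
  var lastDayCount = log.filter(function (entry) {
    return entry.created_at > dayAgo;
  }).length;
  return !hasRecent && lastDayCount < 3;
}

// The document from the question, checked a little over 3 hours later
var now = new Date('2016-12-20T15:00:00.000Z');
var doc = {
  response_log: [
    { campaignId: 'AA', created_at: new Date('2016-12-20T11:53:55.727Z') },
    { campaignId: 'AB', created_at: new Date('2016-12-20T11:55:55.727Z') }
  ]
};
console.log(matchesCriteria(doc, now)); // true
```

Running the predicate over a handful of hand-written documents is a cheap way to confirm that the comparison directions in a pipeline match the intended criteria.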
I know that this is probably not the answer that you're looking for but this may not be the best use case for Mongo. It's easy to do that in a relational database, it's easy to do that in a database that supports map/reduce but it will not be straightforward in Mongo.
If your data looked different and you kept each log entry as a separate document that references the object (with id 5852725660632d916c8b9a38 in this case) instead of being a part of it, then you could make a simple query for the latest log entry that has that id. This is what I would do in your case if I were to use Mongo for that (which I wouldn't).
What you can also do is keep a separate collection in Mongo, or add a new property to the object that you have here which would store the latest date of campaign added. Then it would be very easy to search for what you need.
When you are working with a database like Mongo, the way your data looks must reflect what you need to do with it, as in this case. Adding a last campaign date and updating it on every campaign added would let you search for the campaigns that you need very easily.
If you want to be able to make any searches and aggregates possible then you may be better off using a relational database.

Mongoose count by subobjects

I am trying to count the number of models in a collection based on a property:
I have an upvote model, that has: post (objectId) and a few other properties.
First, is this good design? Posts could get many upvotes, so I didn’t want to store them in the Post model.
Regardless, I want to count the number of upvotes on posts with a specific property with the following and it’s not working. Any suggestions?
upvote.count({'post.specialProperty': mongoose.Types.ObjectId("id")}, function (err, count) {
    console.log(count);
});
Post Schema Design
Regarding design, I would structure the documents in the posts collection as follows:
{
    "_id" : ObjectId(),
    "property1" : "some value",
    "property2" : "some value",
    "voteCount" : 1,
    "votes": [
        {
            "voter": ObjectId() // voter id
            // other properties...
        }
    ]
}
You will have an array that will hold objects that can contain info such as voter id and other properties.
Updating
When a post is updated, you can simply increment or decrement voteCount accordingly. You can increment by 1 like this:
db.posts.update(
    {"_id" : postId},
    {
        $inc: { voteCount: 1},
        $push : {
            "votes" : {"voter":ObjectId, "otherproperty": "some value"}
        }
    }
)
The $inc modifier can be used to change the value of an existing key or to create a new key if it does not already exist. It's very useful for updating votes.
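The decrement case mirrors the increment: $inc with a negative value, plus $pull to remove the voter's entry from the votes array. The sketch below only builds the update documents. buildVoteUpdate is a hypothetical helper, and it assumes each vote is identified by its voter field as in the schema above.

```javascript
// Hypothetical helper: returns the update document for adding or
// removing one vote by the given voter.
function buildVoteUpdate(voterId, add) {
  if (add) {
    return {
      $inc: { voteCount: 1 },
      $push: { votes: { voter: voterId } }
    };
  }
  return {
    $inc: { voteCount: -1 },
    // $pull removes every array element matching the condition
    $pull: { votes: { voter: voterId } }
  };
}

var upvote = buildVoteUpdate('voter-1', true);
var removeVote = buildVoteUpdate('voter-1', false);
console.log(JSON.stringify(removeVote));
```

Either object can be passed as the second argument to db.posts.update, with {"_id" : postId} as the filter. The string 'voter-1' stands in for a real ObjectId.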
Totaling votes of particular Post Criteria
If you want to total the votes for posts matching certain criteria, you can use the Aggregation Framework.
You can get the total like this:
db.posts.aggregate(
    [
        {
            $match : {property1: "some value"}
        },
        {
            $group : {
                _id : null,
                totalNumberOfVotes : {$sum : "$voteCount" }
            }
        }
    ]
)
