MongoDB Schema to support concurrent update on a document - node.js

We are working on a project with about 300 documents in a main collection, each with a currentValue field. To track the history of each of those documents, we created another collection named history, which holds approximately 6.5 million documents.
For each input to the system we have to add around 30 history items and update the currentValue field in the main collection. We tried the computed field design pattern for currentValue, which led to WriteConflict errors under concurrency (around 1000 concurrent requests).
Then we tried computing currentValue with a sum (over the amount field) grouped by mainId on the history collection, which takes too long (> 3 s).
Main collection docs:
{
"_id" : ObjectId(...),
"stock" : [
{
"currentAmount" : -313430.0,
"lastPrice" : -10.0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
{
"currentAmount" : 30,
"lastPrice" : 0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
.
.
.
],
"name" : "name",
}
History collection docs:
{
"_id" : ObjectId("..."),
"mainId" : ObjectId("..."),
"amount" : 5,
}
If you have any other idea for handling this situation (at the application or database level), I would be thankful.
UPDATE 1
The update query, if I use the computed pattern, would be:
mainCollection.findOneAndUpdate(
{
$and: [
{ _id: id },
{ "stock.storage": fromId },
{ "stock.deletedAt": null }
],
},
{
$inc: {
"stock.$.currentAmount": -1 * amount,
}
},
{
session
}
)
And the aggregation pipeline, if I want to calculate currentAmount every time:
mainCollection.aggregate([
{
$match: {
branch: new ObjectId("...")
}
},
{
$group: {
_id: "$ingredient",
currentAmount: {
$sum: "$amount"
}
}
}])

To keep a computed field, the MongoDB design patterns suggest the Computed Pattern:
"The Computed Pattern is utilized when we have data that needs to be computed repeatedly in our application."
For example:
// your main collection will look like this
{
"_id" : ObjectId(...),
"stock" : [
{
"currentAmount" : -313430.0,
"lastPrice" : -10.0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
},
{
"currentAmount" : 30,
"lastPrice" : 0,
"storage" : ObjectId("..."),
"alarmCapacity" : 12
}
],
"totalAmount": 20000 // for example
}
For concurrent updates, though, there is a better way to solve this problem: cumulative summation. In this approach, each new history document stores the sum of the previous documents' amounts plus its own amount:
{
"_id" : ObjectId("..."),
"mainId" : ObjectId("..."),
"amount" : 5,
"cumulative": 15 // sum of the previous amounts plus this one
}
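As a rough illustration of the cumulative idea, the following in-memory sketch derives each new history entry from the previous one. The helper name nextHistoryEntry is hypothetical and is not part of the original answer. In a real deployment, reading the latest entry and inserting the new one would still have to be made atomic per mainId, otherwise two writers could compute from the same previous value.

```javascript
// Hypothetical helper: builds the next history entry from the previous one.
// `previous` is the latest history entry for this mainId, or null if none exists.
function nextHistoryEntry(previous, mainId, amount) {
  const previousTotal = previous ? previous.cumulative : 0;
  return { mainId, amount, cumulative: previousTotal + amount };
}

const first = nextHistoryEntry(null, "main-1", 10);
const second = nextHistoryEntry(first, "main-1", 5);
console.log(second.cumulative); // 15
```

With this layout the current value is read from the most recent history entry instead of being summed over the whole collection.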

Related

How to do mongoose aggregation with nested array documents

I have a Mongodb collection, Polls with following schema
{
"options" : [
{
"_id" : Object Id,
"option" : String,
"votes" : [ Object Id ] // object ids of users who voted
},.....
]
}
Assume I have the userId of the user (in Node.js) to whom I want to send this info.
My task is to:
(1) Include an extra field in the above JSON object (which I get using Mongoose):
"myVote" : option._id
where option._id is the _id of the option for which
options[someIndex].votes contains userId.
(2) Change the existing "votes" field in each option to the number of votes on that option, as shown in the example below.
Example:
{
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : [ 1,2,3 ]
},
{
"_id" : 2,
"option" : "B",
"votes" : [ 5 ]
},
{
"_id" : 3,
"option" : "C",
"votes" : [ ]
}
]
}
So if a user with user id = 5 wants to see the poll, I need to send the following info:
Expected Result :
{
"my_vote" : 2, // user with id 5 voted on option with id 2
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : 3 //num of votes on option "A"
},
{
"_id" : 2,
"option" : "B",
"votes" : 1 //num of votes on option "B"
},
{
"_id" : 3,
"option" : "C",
"votes" : 0 //num of votes on option "C"
}
]
}
Since the question you actually asked is not really addressed by the currently accepted answer, which also does some unnecessary things, here is another approach:
var userId = 5; // A variable to work into the submitted pipeline
db.sample.aggregate([
{ "$unwind": "$options" },
{ "$group": {
"_id": "$_id",
"my_vote": { "$min": {
"$cond": [
{ "$setIsSubset": [ [userId], "$options.votes" ] },
"$options._id",
false
]
}},
"options": { "$push": {
"_id": "$options._id",
"option": "$options.option",
"votes": { "$size": "$options.votes" }
}}
}}
])
Which of course will give you output per document like this:
{
"_id" : ObjectId("5573a0a8b67e246aba2b4b6e"),
"my_vote" : 2,
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : 3
},
{
"_id" : 2,
"option" : "B",
"votes" : 1
},
{
"_id" : 3,
"option" : "C",
"votes" : 0
}
]
}
So what you are doing here is using $unwind in order to break down the array for inspection first. The following $group stage ( and the only other stage you need ) makes use of the $min and $push operators for re-construction.
Inside each of those operations, the $cond operation tests the array content via $setIsSubset and either returns the matched _id value or false. When reconstructing the inner array element, specify all elements rather than just the top level document in arguments to $push and make use of the $size operator to count the elements in the array.
You also make mention with a link to another question about dealing with an empty array with $unwind. The $size operator here will do the right thing, so it is not required to $unwind and project a "dummy" value where the array is empty in this case.
As a general note, unless you are actually "aggregating" across documents, it is generally advisable to do this operation in client code rather than in the aggregation framework. Using $unwind effectively creates a new document in the aggregation pipeline for each element of the array contained in each document, which produces significant overhead.
For such an operation acting on distinct documents only, it is more efficient for client code to process each document individually.
If you really must insist that server processing is the way to do this, then this is probably most efficient using $map instead:
db.sample.aggregate([
{ "$project": {
"my_vote": {
"$setDifference": [
{ "$map": {
"input": "$options",
"as": "o",
"in": { "$cond": [
{ "$setIsSubset": [ [userId], "$$o.votes" ] },
"$$o._id",
false
]}
}},
[false]
]
},
"options": { "$map": {
"input": "$options",
"as": "o",
"in": {
"_id": "$$o._id",
"option": "$$o.option",
"votes": { "$size": "$$o.votes" }
}
}}
}}
])
So this just "projects" the re-worked results for each document. The my_vote is not the same though, since it is a single element array ( or possible multiple matches ) that the aggregation framework lacks the operators to reduce to a non array element without further overhead:
{
"_id" : ObjectId("5573a0a8b67e246aba2b4b6e"),
"options" : [
{
"_id" : 1,
"option" : "A",
"votes" : 3
},
{
"_id" : 2,
"option" : "B",
"votes" : 1
},
{
"_id" : 3,
"option" : "C",
"votes" : 0
}
],
"my_vote" : [
2
]
}
Check out this question.
It's not asking the same thing, but there's no way to do what you're asking without multiple queries anyway. I would modify the JSON you get back directly, as you're just displaying extra info that is already contained in the result of the query.
1) Save the userID you're querying for.
2) Take the results of your query (options array in an object), search through the votes of each element in the array.
3) When you've found the right vote, attach the _id (perhaps add 'n/a' if you don't find a vote).
4) Write a function that does 2 and 3, and you can just pass it a userID, and get back a new object with myVote attached.
I don't think doing it like this will be slower than doing another query in Mongoose.
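A minimal sketch of that client-side approach, assuming the poll has already been fetched as a plain object (for example through a lean Mongoose query). The helper name withMyVote is hypothetical, and null is used here instead of 'n/a' when the user has not voted. Ids are compared as strings so the same code would work for ObjectId values.

```javascript
// Hypothetical helper: reshapes one poll document for a given user.
function withMyVote(poll, userId) {
  // Find the option whose votes array contains this user.
  const voted = poll.options.find((o) =>
    o.votes.some((v) => String(v) === String(userId))
  );
  return {
    my_vote: voted ? voted._id : null,
    // Replace each votes array with its length.
    options: poll.options.map((o) => ({
      _id: o._id,
      option: o.option,
      votes: o.votes.length,
    })),
  };
}

const poll = {
  options: [
    { _id: 1, option: "A", votes: [1, 2, 3] },
    { _id: 2, option: "B", votes: [5] },
    { _id: 3, option: "C", votes: [] },
  ],
};
console.log(withMyVote(poll, 5).my_vote); // 2
```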

MongoDB use position in sorted query result to compute field

I have a Mongoose Model for users. Each user has a certain amount of points. I'd like to create a field that is the users rank where:
rank = user position sorted by points / total users
Let's suppose the user model looks like this:
{
'name': 'bob',
'points': 15,
'rank': 9/15,
}
(I realize that the fraction would really be a decimal when stored).
Is there a way that I can update all of these users by:
1) Sorting them by points
2) Get a user's index in this sorted list
3) Divide that index by the total number of items in the list
I'm not sure what kind of mongo operators are out there for finding a doc's position in query results and for finding the total size of the query results.
Using the other answer's approach is not a good idea: it requires recalculating rank after each update of points values.
MongoDB 5.0 introduced the $rank operator for use in the $setWindowFields stage:
db.users.aggregate([
{
$setWindowFields: {
sortBy: { points: 1 },
output: {
rank: {
$rank: {}
}
}
}
}
])
will output
{ "points": 140, "rank": 1 },
{ "points": 160, "rank": 2 },
{ "points": 170, "rank": 3 },
{ "points": 180, "rank": 4 },
{ "points": 220, "rank": 5 }
You can do this using a couple of queries and a bit of JavaScript. Expanding on the steps you outlined, what you need to do is:
1) Find all of the user documents, sort them by points in descending order and assign the results to a cursor. You might want to ensure that you have an index on this field to make this query run faster.
2) Get the count of the documents returned.
3) Keep track of the position of each document within the results using an index.
4) Iterate through the documents, calculating the rank using the count and the index, and updating the corresponding user's rank with the result of that calculation.
In the mongo shell, the code would look something like the following.
var c = db.user.find().sort({ "points": -1 });
var count = c.count();
var i = 1;
while (c.hasNext()) {
var rank = i / count;
var user = c.next();
db.user.update(
{ "_id": user._id },
{ "$set": { "rank": rank } }
);
i++;
}
So if you had the following three users in your collection:
{
"_id" : ObjectId("54f0af63cfb269d664de0b4e"),
"name" : "bob",
"points" : 15,
"rank" : 0
}
{
"_id" : ObjectId("54f0af7fcfb269d664de0b4f"),
"name" : "arnold",
"points" : 20,
"rank" : 0
}
{
"_id" : ObjectId("54f0af95cfb269d664de0b50"),
"name" : "claus",
"points" : 10,
"rank" : 0
}
After the update their documents would look like this:
{
"_id" : ObjectId("54f0af63cfb269d664de0b4e"),
"name" : "bob",
"points" : 15,
"rank" : 0.6666666666666666
}
{
"_id" : ObjectId("54f0af7fcfb269d664de0b4f"),
"name" : "arnold",
"points" : 20,
"rank" : 0.3333333333333333
}
{
"_id" : ObjectId("54f0af95cfb269d664de0b50"),
"name" : "claus",
"points" : 10,
"rank" : 1
}
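The loop above issues one update per user. As a possible variation, the same rank arithmetic can be collected into a list of updateOne operations and sent in one batch. This is only a sketch: the helper name buildRankUpdates is hypothetical, and the string _id values stand in for real ObjectIds.

```javascript
// Hypothetical helper: turns a list of users into bulkWrite-style operations.
function buildRankUpdates(users) {
  // Sort a copy by points, highest first, without mutating the input.
  const sorted = [...users].sort((a, b) => b.points - a.points);
  return sorted.map((user, index) => ({
    updateOne: {
      filter: { _id: user._id },
      // Position is 1-based, as in the shell loop above.
      update: { $set: { rank: (index + 1) / sorted.length } },
    },
  }));
}

const ops = buildRankUpdates([
  { _id: "bob", points: 15 },
  { _id: "arnold", points: 20 },
  { _id: "claus", points: 10 },
]);
// Ranks for arnold, bob and claus, in that order.
console.log(ops.map((op) => op.updateOne.update.$set.rank));
```

The resulting array could then be passed to db.user.bulkWrite(ops), which reduces the number of round trips to the server.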

MongoDB-Query Optimization

I have a collection with a sub-document consisting of more than 40K records.
My aggregate query takes about 300 seconds. I have tried optimizing it using compound as well as multikey indexes, which brings it down to 180 seconds.
I still need to reduce the query execution time.
Here is my collection:
{
"_id" : ObjectId("545b32cc7e9b99112e7ddd97"),
"grp_id" : 654,
"user_id" : 2,
"mod_on" : ISODate("2014-11-06T08:35:40.857Z"),
"crtd_on" : ISODate("2014-11-06T08:35:24.791Z"),
"uploadTp" : 0,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"total_cnt" : 25,
"succ_cnt" : 25,
"fail_cnt" : 0
}
and the following is my query:
db.member_id_transactions.aggregate([ { '$match':
{ id_url: { '$elemMatch': { mid: 'xyz12794' } } } },
{ '$unwind': '$id_url' },
{ '$match': { grp_id: 654, 'id_url.mid': 'xyz12794' } } ])
Has anyone faced the same issue?
Here is the output of the aggregate query with the explain option:
{
"result" : [
{
"_id" : ObjectId("546342467e6d1f4951b56285"),
"grp_id" : 685,
"user_id" : 2,
"mod_on" : ISODate("2014-11-12T11:24:01.336Z"),
"crtd_on" : ISODate("2014-11-12T11:19:34.682Z"),
"uploadTp" : 1,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"__v" : 0,
"total_cnt" : 21406,
"succ_cnt" : 21402,
"fail_cnt" : 4
}
],
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("545c8d37ab9cc679383a1b1b")
}
}
One way to further reduce the number of records being filtered is to include the grp_id field in the first $match stage.
db.member_id_transactions.aggregate([
{$match:{ "id_url.mid": 'xyz12794',"grp_id": 654 } },
{$unwind: "$id_url" },
{$match: { "id_url.mid": "xyz12794" } }
])
See how the performance is now. Add grp_id to the index to get a better response time.
The above aggregation query, though it works, is unnecessary. Since you are not altering the structure of the document, and you expect only one element in the array to match the filter condition, you could just use a simple find with a projection.
db.member_id_transactions.find(
{ "id_url.mid": "xyz12794","grp_id": 654 },
{"_id":0,"grp_id":1,"id_url":{$elemMatch:{"mid":"xyz12794"}},
"user_id":1,"mod_on":1,"crtd_on":1,"uploadTp":1,
"tp":1,"status":1,"incl":1,"total_cnt":1,
"succ_cnt":1,"fail_cnt":1
}
)

Aggregate results in Mongoose

I have a database with 800+ different bars, clubs and restaurants across Australia.
I want to build a list of links for my website counting the number of different venues across different suburbs and primary categories.
Like this:
Restaurants, Bowen Hills (15)
Restaurants, Dawes Point (6)
Clubs, Sydney (138)
I could do it the hard way by first getting all venues. Then run a Venue.distinct('details.location.suburb') to get all the unique suburbs.
From here I could run subsequent queries to get the count for the number of venues in that particular suburb and category.
It will be a lot of calls though. There's got to be a better way?
Can the Mongo aggregation framework help here?
It seems to be impossible to do this in a single query.
Here's the Venue model:
{
"name" : "Johnny's Bar & Grill",
"meta" : {
"category" : {
"all" : [
"restaurant",
"bar"
],
"primary" : "restaurant"
}
},
"details" : {
"location" : {
"streetNumber" : "180",
"streetName" : "abbotsford road",
"suburb" : "bowen hills",
"city" : "brisbane",
"postcode" : "4006",
"state" : "qld",
"country" : "australia"
},
"contact" : {
"phone" : [
"(07) 5555 5555"
]
}
}
}
Here's the prettified solution from BatScream that I ended up using:
Venue.aggregate([
{
$group: {
_id: {
primary: '$meta.category.primary',
suburb: '$details.location.suburb',
country: '$details.location.country',
state: '$details.location.state',
city: '$details.location.city'
},
count: {
$sum: 1
},
type: {
$first: '$meta.category.primary'
}
}
},
{
$sort: {
count: -1
}
},
{
$limit: 50
},
// Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
{
$project: {
_id: 0,
type : '$type',
location : '$_id.suburb',
count: 1
}
}
],
function(err, res){
next(err, res);
});
}
You can get a very useful and easily transformable output using the following aggregation operation.
Group the records based on their country, category, state, city and suburb.
Get the count of the records in each group.
Obtain the type of the group from the first record of the group.
Project the necessary fields.
Code:
db.collection.aggregate([
{$group:{"_id":{"primary":"$meta.category.primary",
"suburb":"$details.location.suburb",
"country":"$details.location.country",
"state":"$details.location.state",
"city":"$details.location.city"},
"count":{$sum:1},
"type":{$first:"$meta.category.primary"}}},
{$sort:{"count":-1}},
{$project:{"_id":0,
"type":"$type",
"location":"$_id.suburb",
"count":1}}
])
Sample output:
{ "count" : 1, "type" : "restaurant", "location" : "bowen hills" }
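To get from rows like the one above to the link text asked for in the question, a small formatting step in application code may be enough. The sketch below assumes rows shaped like the sample output. The helper names titleCase and formatVenueLink are hypothetical, and the plural is formed by simply appending "s", which would not suit every category name.

```javascript
// Hypothetical helper: capitalises the first letter of each whitespace-separated word.
function titleCase(text) {
  return text.replace(/(^|\s)[a-z]/g, (match) => match.toUpperCase());
}

// Hypothetical helper: formats one aggregation result row as link text.
function formatVenueLink(row) {
  return `${titleCase(row.type)}s, ${titleCase(row.location)} (${row.count})`;
}

console.log(
  formatVenueLink({ count: 15, type: "restaurant", location: "bowen hills" })
); // Restaurants, Bowen Hills (15)
```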

Compare two date fields in MongoDB

In my collection each document has two dates, modified and sync. I would like to find those where modified > sync, or where sync does not exist.
I tried
{'modified': { $gt : 'sync' }}
but it's not showing what I expected. Any ideas?
Thanks
You cannot compare a field with the value of another field using normal query matching. However, you can do this with the aggregation framework:
db.so.aggregate( [
{ $match: …your normal other query… },
// $expr (MongoDB 3.6+) lets $match evaluate an aggregation expression
{ $match: { $expr: { $gt: [ '$modified', '$sync' ] } } }
] );
I put …your normal other query… in there as you can make that bit use the index. So if you want to do this for only documents where the name field is charles you can do:
db.so.createIndex( { name: 1 } );
db.so.aggregate( [
{ $match: { name: 'charles' } },
{ $project: {
modified: 1,
sync: 1,
name: 1,
eq: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] }
} },
{ $match: { eq: 1 } }
] );
With the input:
{ "_id" : ObjectId("520276459bf0f0f3a6e4589c"), "modified" : 73845345, "sync" : 73234 }
{ "_id" : ObjectId("5202764f9bf0f0f3a6e4589d"), "modified" : 4, "sync" : 4 }
{ "_id" : ObjectId("5202765b9bf0f0f3a6e4589e"), "modified" : 4, "sync" : 4, "name" : "charles" }
{ "_id" : ObjectId("5202765e9bf0f0f3a6e4589f"), "modified" : 4, "sync" : 45, "name" : "charles" }
{ "_id" : ObjectId("520276949bf0f0f3a6e458a1"), "modified" : 46, "sync" : 45, "name" : "charles" }
This returns:
{
"result" : [
{
"_id" : ObjectId("520276949bf0f0f3a6e458a1"),
"modified" : 46,
"sync" : 45,
"name" : "charles",
"eq" : 1
}
],
"ok" : 1
}
If you want any more fields, you need to add them in the $project.
For MongoDB 3.6 and newer:
The $expr operator allows the use of aggregation expressions within the query language, thus you can do the following:
db.test.find({ "$expr": { "$gt": ["$modified", "$sync"] } })
or using aggregation framework with $match pipeline
db.test.aggregate([
{ "$match": { "$expr": { "$gt": ["$modified", "$sync"] } } }
])
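The question also asks for documents where sync does not exist. One way to spell that out explicitly is to combine $exists with the $expr comparison in an $or. This is a sketch rather than part of the original answer: the names needsSyncFilter and needsSync are hypothetical, and the plain-JavaScript predicate is only there to check the intended rule locally without a database.

```javascript
// Filter for "modified is later than sync, or sync is missing".
// $expr requires MongoDB 3.6 or newer.
const needsSyncFilter = {
  $or: [
    { sync: { $exists: false } },
    { $expr: { $gt: ["$modified", "$sync"] } },
  ],
};

// Plain-JavaScript mirror of the same rule, for local checking only.
function needsSync(doc) {
  return !("sync" in doc) || doc.modified > doc.sync;
}

const docs = [
  { modified: 46, sync: 45 },
  { modified: 4, sync: 45 },
  { modified: 7 },
];
console.log(docs.filter(needsSync).length); // 2
```

The filter object could be passed as db.test.find(needsSyncFilter).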
For MongoDB 3.0+:
You can also use the aggregation framework with the $redact pipeline stage, which lets you evaluate the logical condition with the $cond operator and use the system variables $$KEEP to "keep" the document where the condition is true or $$PRUNE to "remove" the document where the condition is false.
Consider running the following aggregate operation which demonstrates the above concept:
db.test.aggregate([
{ "$redact": {
"$cond": [
{ "$gt": ["$modified", "$sync"] },
"$$KEEP",
"$$PRUNE"
]
} }
])
This operation is similar to having a $project stage that selects the fields in the collection and creates a new field holding the result of the logical condition, followed by a $match, except that $redact uses a single pipeline stage, which is more efficient.
Simply
db.collection.find({$where:"this.modified>this.sync"})
Example
Kobkrits-MacBook-Pro-2:~ kobkrit$ mongo
MongoDB shell version: 3.2.3
connecting to: test
> db.time.insert({d1:new Date(), d2: new Date(new Date().getTime()+10000)})
WriteResult({ "nInserted" : 1 })
> db.time.find()
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1<this.d2"})
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1>this.d2"})
> db.time.find({$where:"this.d1==this.d2"})
>
Use JavaScript: iterate with forEach and convert each date with toDateString():
db.ledgers.find({}).forEach(function(item){
if(item.fromdate.toDateString() == item.todate.toDateString())
{
printjson(item)
}
})
Right now your query is trying to return all results such that the modified field is greater than the word 'sync'. Try getting rid of the quotes around sync and see if that fixes anything. Otherwise, I did a little research and found this question. What you're trying to do just might not be possible in a single query, but you should be able to manipulate your data once you pull everything from the database.
To fix this issue without aggregation change your query to this:
{'modified': { $gt : ISODate(this.sync) }}
