mongoose update millions of records while extracting information - node.js

We have a production database with over 5 million customer records. Each customer document has an embedded array of the licenses they have applied for. An example customer document is as follows:
{
_id: ObjectId('...'),
phoneNumber: 'xxxx',
// Other customer fields
licenses: [
{
_id: ObjectId('...'),
state: 'PENDING',
expired: false,
createdAt: ISODate(''),
// Other license fields
},
// More Licenses for this customer
]
}
I have been tasked with changing the state of every PENDING license applied for during the month of September to REJECTED and sending an SMS to every customer whose pending permit just got rejected.
Using model.where(condition).countDocuments() I have found that there are over 3 million customers (not licenses) matching the aforementioned criteria. Each customer has an average of 9 licenses.
I need assistance coming up with a strategy that won't slow down the system when performing this action. Furthermore, this is around 17GB of data.
Sending SMS is fine, I can queue details for SMS service. My challenge is processing the licenses while extracting relevant information for SMS.

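First of all, create an index on the collection:
db.collection.createIndex( { "licenses.state": 1 } )
Then you should do something like this:
model.updateMany({ 'licenses.state': 'PENDING' }, { // the filter lets the update use the index above
  $set: {
    'licenses.$[elem].state': 'REJECTED'
  }
}, {
  // only touch array elements that are still pending and were created in the period
  arrayFilters: [{
    'elem.state': 'PENDING',
    'elem.createdAt': { $gte: ISODate(....), $lt: ISODate(....) } // start and end of the period (new Date(...) in Node.js)
  }]
}).then(function (result) {
  // result reports how many documents were matched and modified
})
updateMany already updates every matching document, so the multi option is not needed.
If you have a replica set and your updates run on the primary, reads served from the secondaries should be largely unaffected.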
If you want to split the update on many batches you can use the _id (already indexed). Of course it depends on your _id format.
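For the SMS part of the question, one possible approach is to read each batch of customers before updating it and collect the phone number and the ids of the licenses that are about to be rejected. The following is only a sketch of that extraction step as a pure function; `collectRejections`, the sample document and the date window (including the year) are hypothetical and not taken from the question.

```javascript
// Hypothetical helper: given one customer document and a date window,
// return the SMS details for the licenses that should be rejected,
// or null when the customer has no matching license.
function collectRejections(customer, from, to) {
  const licenseIds = (customer.licenses || [])
    .filter((lic) =>
      lic.state === 'PENDING' &&
      lic.createdAt >= from &&
      lic.createdAt < to)
    .map((lic) => lic._id);

  if (licenseIds.length === 0) return null;
  return {
    customerId: customer._id,
    phoneNumber: customer.phoneNumber,
    licenseIds,
  };
}

// Plain objects stand in for Mongoose documents here.
const from = new Date('2020-09-01T00:00:00Z'); // assumed start of the window
const to = new Date('2020-10-01T00:00:00Z');   // assumed end of the window (exclusive)

const sampleCustomer = {
  _id: 'c1',
  phoneNumber: 'xxxx',
  licenses: [
    { _id: 'l1', state: 'PENDING', createdAt: new Date('2020-09-15T10:00:00Z') },
    { _id: 'l2', state: 'APPROVED', createdAt: new Date('2020-09-20T10:00:00Z') },
    { _id: 'l3', state: 'PENDING', createdAt: new Date('2020-08-02T10:00:00Z') },
  ],
};

const smsJob = collectRejections(sampleCustomer, from, to);
console.log(smsJob);
```

Each non-null result could be pushed to the SMS queue, and the collected license ids make it possible to update exactly the licenses that were announced to the customer.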

Related

Adding value to already declared MongoDB object with mongoose Schema

I am new to MongoDB and mongoose. I am trying to create a Node & MongoDB auction app. So since it is actually an online auction, users should be able to bid for items. I successfully completed the user registration, sign in page and authentication process, however, I am a bit stuck in the bidding page.
I created a Schema using mongoose and each item for auction is saved in the database. I want to add name and price of each user who bid for the item in the same object in MongoDB, like this:
{
name: "valuable vase from 1700s",
owner: "John Doe",
itemId: 100029,
bids: {
100032: 30000,
100084: 34000
}
}
So each user will have ids like 100032: 30000, and when they bid, their "account id: price" will be added under bids in the database object of the item.
I did some research and found some ways to solve the problem, but I want to know whether what I want to do is possible and whether it is the right approach.
Thanks for giving me your time!
There are indeed a couple of ways to achieve what you want.
In my opinion, a collection called ItemBids, where each document has the following structure, will benefit you the most.
{
  itemId: ObjectId, // reference to the item document
  accountId: ObjectId, // reference to the account
  bid: Number // the bid value
}
This pattern is suitable for your case because you can easily query the bids by whatever you want -
You can get all the account bids, you can get all the item bids, and you can sort them with native Mongo by the bid price.
Every time there's a bid, you insert a new document to this collection.
Another option is embedding an array of bid objects in the item document.
Each bid object should include:
bids: [{
  account: ObjectId(""), // this is the account
  price: Number
}]
The downside is that querying and accessing the bids will require more complex queries.
You can read more about the considerations here:
https://docs.mongodb.com/manual/core/data-model-design
https://coderwall.com/p/px3c7g/mongodb-schema-design-embedded-vs-references
The way you decided to implement your functionality is a little bit complicated.
It is not impossible to do that, but the better way is to use an array of objects instead of a single object, like this:
{
name: '',
..
..
bids: [{
user: 100032,
price: 30000
}, {
user: 100084,
price: 34000
}]
}
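To illustrate why the array form is easier to work with, here is a small in-memory sketch. `addBid` and `highestBid` are hypothetical helper names; against the database, appending a bid would typically be done with a `$push` update rather than by rewriting the whole document.

```javascript
// Hypothetical helpers operating on a plain item object.

// Return a copy of the item with one more bid appended.
function addBid(item, user, price) {
  return { ...item, bids: [...(item.bids || []), { user, price }] };
}

// Return the bid with the highest price, or null when there are no bids.
function highestBid(item) {
  return (item.bids || []).reduce(
    (best, bid) => (best === null || bid.price > best.price ? bid : best),
    null
  );
}

let vase = { name: 'valuable vase from 1700s', owner: 'John Doe', itemId: 100029, bids: [] };
vase = addBid(vase, 100032, 30000);
vase = addBid(vase, 100084, 34000);

console.log(highestBid(vase));
```

With bids stored as array elements, each bid is a uniform object, so sorting, filtering and finding the top bid need no knowledge of the account ids in advance.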

How to properly have one-to-one relation in MongoDB without any cons?

I am working on a project and have these two MongoDB collections, team (holding the details of teams) and payment (holding the payment of the teams) (strictly 1-1 relationship).
Payment Schema
{
...
team: { type: Schema.Types.ObjectId, ref: 'Team', unique: true },
...
}
For Team, I have two alternatives:
Team1 Schema
{
user_id: { type: Schema.Types.ObjectId, ref: 'User' }
}
...
Team2 Schema
{
user_id: { type: Schema.Types.ObjectId, ref: 'User' }
payment: { type: Schema.Types.ObjectId, ref: 'Payment', unique: true }
}
NEED: I have a component "My Teams" where I need to show logged-in user's all teams and his payment status (yes/no).
ISSUE WITH Team1 Schema: Since I do not have reference to Payment so I need to make another call to backend with team's _id to get Payment object for every team. If a user has 10 teams then it will be 11 backend calls (1 for teams, next 10 for their payment statuses).
ISSUE WITH Team2 Schema: Since I now have the Payment _id inside the Team2 Schema, I can simply check whether that field exists to determine if the team has paid. But now the issue is that when a payment is made, I need to update both collections and use transactions (to roll back in case either update fails), which increases complexity and is also not supported unless I have a replica set set up.
Can you please help me figure out the best way to do this?
Thanks in advance.
The simplest solution is just to have team_id in payment schema (which you already have).
You neither need user_id nor payment_id in team schema to get payments with team.
You could just have an aggregate query with a lookup on the payments collection to get the team along with its payment.
So, considering you have the IDs of the teams and you need the team data along with the payment data, you could write an aggregation query, something like this:
Team.aggregate([
  {
    $match: { _id: { $in: list_of_user_ids } } // this will get the teams which match the array of ids
  },
  {
    $lookup: { // this will search data from a different collection
      from: 'payments', // the collection to search from
      localField: '_id', // the matching field in the team collection
      foreignField: 'team', // the matching field in the payment collection
      as: 'payment' // the name you want to give to the resulting payment array
    }
  }
])
Edit 1:
The lookup I've written does exactly what you need. Just that I assumed you had an array of user Ids. If you have a single user ID, just change the match operation to what you've written
$match: { user_id: currently_loggedin_userId }
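Since `$lookup` always writes its matches into an array, the yes/no payment status for the "My Teams" component can be derived from whether that array is empty. A minimal sketch, assuming the aggregation result has the shape shown in the sample data; `withPaymentStatus` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn the $lookup output into a simple paid flag per team.
function withPaymentStatus(teams) {
  return teams.map((team) => ({
    _id: team._id,
    user_id: team.user_id,
    paid: Array.isArray(team.payment) && team.payment.length > 0,
  }));
}

// Sample data shaped like the aggregation result (ids shortened for readability).
const aggregated = [
  { _id: 't1', user_id: 'u1', payment: [{ _id: 'p1', team: 't1' }] },
  { _id: 't2', user_id: 'u1', payment: [] },
];

const myTeams = withPaymentStatus(aggregated);
console.log(myTeams);
```

This keeps the reference on the payment side only, so a new payment needs a single insert and no transaction across two collections.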

MongoDB update multiple items with multiple changes

Is there any recommended way to update multiple items in MongoDB with one query ? I know that this is possible:
db.collection('mycollection').update({active: 1}, {$set: {active:0}}, {multi: true});
But in my case I want to update several documents with "unique" changes.
e.g. I want to combine these two queries into one:
db.collection('mycollection').update({
id: 'my id'
}, {
$set: {
name: "new name"
}
});
db.collection('mycollection').update({
id: 'my second id'
}, {
$set: {
name: "new name two"
}
});
Why? I have a system which imports daily updates. The imports are mostly large, around 200,000 updates a day, so currently I am executing the update query 200,000 times, which takes a long time.
If it's necessary to know: I am using Mongo 3 and Node.js.

Get last created object for each user?

I have a collection, say, "Things":
{ id: 1
creator: 1
created: Today }
{ id: 2
creator: 2
created: Today }
{ id: 3
creator: 2
created: Yesterday }
I'd like to create a query that'll return each Thing created by a set of users, but only their most recently created thing.
What would this look like? I can search my collection with an array of creators and it works just fine, but how can I also get only the most recently created object per user?
Thing.find({ _creator: { "$in": creatorArray } })...
You cannot find, sort and pick the most recent in just a single find() query. But you can do it using aggregation:
Match all the records whose creator is among those we are looking for.
Sort the records in descending order based on the created field.
Group the documents based on the creator.
Pick each creator's first document from the group, which will also be their latest.
Project the required fields.
Snippet:
Thing.aggregate([
{$match:{"creator":{$in:[1,2]}}},
{$sort:{"created":-1}},
{$group:{"_id":"$creator","record":{$first:"$$ROOT"}}},
{$project:{"_id":0,
"id":"$record.id",
"creator":"$record.creator",
"created":"$record.created"}}
], function(err,data){
})
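To make the stages easier to follow, here is an in-memory sketch of what the pipeline computes, using plain arrays instead of MongoDB. `latestPerCreator` and the sample dates are hypothetical; the real work should still be done by the aggregation on the server.

```javascript
// Hypothetical in-memory equivalent of the $match / $sort / $group / $project stages.
function latestPerCreator(things, creators) {
  const wanted = new Set(creators);
  const latest = new Map();

  things
    .filter((t) => wanted.has(t.creator))     // $match: keep the requested creators
    .sort((a, b) => b.created - a.created)    // $sort: newest first
    .forEach((t) => {
      // $group with $first: the first document seen per creator is the newest
      if (!latest.has(t.creator)) latest.set(t.creator, t);
    });

  // $project: return only the required fields
  return [...latest.values()].map(({ id, creator, created }) => ({ id, creator, created }));
}

const things = [
  { id: 1, creator: 1, created: new Date('2020-01-02T09:00:00Z') },
  { id: 2, creator: 2, created: new Date('2020-01-02T10:00:00Z') },
  { id: 3, creator: 2, created: new Date('2020-01-01T10:00:00Z') },
];

const newest = latestPerCreator(things, [1, 2]);
console.log(newest);
```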

How should I model my MongoDB collection for nested documents?

I'm managing a MongoDB database for a building products store. The most immediate collection is products, right?
There are quite a few products; however, they all belong to one of a set of 5-8 categories and then to one subcategory among a small set of subcategories.
For example:
-Electrical
*Wires
p1
p2
..
*Tools
p5
pn
..
*Sockets
p11
p23
..
-Plumber
*Pipes
..
*Tools
..
*PVC
..
I will use Angular at web site client side to show whole products catalog, I think about AJAX for querying the right subset of products I want.
Then, I wonder whether I should manage a single collection like:
{
MainCategory1: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
},
MainCategory2: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
},
MainCategoryn: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
}
}
Or one collection per category. The number of documents will probably not exceed 500. However, I care about a balance between:
quick DB answer,
easy server side DB querying, and
client-side Angular code for rendering results to html.
I'm using mongodb node.js module, not Mongoose now.
What CRUD operations will I do?
Inserts of products. I'd also like to have a way to obtain autogenerated ids (maybe sequential) for each new record. However, as might seem natural, I wouldn't expose the _id to the user.
Querying the whole documents set of a subcategory. Maybe just obtaining a few attributes at first.
Querying whole or a specific subset of attributes of a document (product) in particular.
Modifying a product's attributes values.
I agree client side should get the easiest result to render. However, to nest categories into products is still a bad idea. The trade off is once you want to change, for example, the name of a category, it will be a disaster. And if you think about the possible usecases, for example:
list all categories
find all subcategories of a certain category
find all products in a certain category
You'll find it hard to do these things with your data structure.
I had the same situation in my current project, so here's what I do, for your reference.
First, categories should be in a separate collection. DON'T nest categories into each other, as it will complicate the procedure to find all subcategories. The traditional way for finding all subcategories is to maintain an idPath property. For example, your categories are divided into 3 levels:
{
  _id: 100,
  name: "level1 category",
  parentId: 0, // means it's the top category
  idPath: "0-100"
}
{
  _id: 101,
  name: "level2 category",
  parentId: 100,
  idPath: "0-100-101"
}
{
  _id: 102,
  name: "level3 category",
  parentId: 101,
  idPath: "0-100-101-102"
}
Note with idPath, parentId is not necessary anymore. It's for you to understand the structure easier.
Once you need to find all subcategories of category 100, simply query on idPath:
db.collection("category").find({idPath: /^0-100-/}, function(err, doc) {
  // whatever you want to do
})
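The prefix match only works if every idPath is built consistently and the pattern ends with the separator. A small sketch of both sides, using hypothetical helpers `buildIdPath` and `isDescendant` on plain objects:

```javascript
// Hypothetical helper: derive a category's idPath from its parent's idPath.
// A top-level category has no parent and starts from the root marker "0".
function buildIdPath(parent, id) {
  return (parent ? parent.idPath : '0') + '-' + id;
}

// Hypothetical helper: same test as the /^0-100-/ prefix query, done in memory.
// The trailing "-" prevents category 1000 from matching as a child of 100.
function isDescendant(category, ancestor) {
  return category.idPath.startsWith(ancestor.idPath + '-');
}

const level1 = { _id: 100, idPath: buildIdPath(null, 100) };
const level2 = { _id: 101, idPath: buildIdPath(level1, 101) };
const level3 = { _id: 102, idPath: buildIdPath(level2, 102) };
const other = { _id: 1000, idPath: buildIdPath(null, 1000) };

console.log(level3.idPath);
console.log(isDescendant(level3, level1), isDescendant(other, level1));
```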
With category stored in a separate collection, in your product you'll need to reference them by _id, just like when we use RDBMS. For example:
{
... // other fields of product
categories: [100, 101, 102, ...]
}
Now if you want to find all products in a certain category:
db.collection("category").find({idPath: new RegExp("^" + idPath + "-")}).toArray(function(err, categories) {
  var cateIds = _.pluck(categories, "_id"); // I'm using underscore to pluck category ids
  db.collection("product").find({categories: { $in: cateIds }}).toArray(function(err, products) {
    // products are here
  });
})
Fortunately, the category collection is usually very small, with only hundreds (or thousands) of records, and it doesn't change often. So you can always keep a live copy of the categories in memory, constructed as nested objects like:
[{
id: 100,
name: "level 1 category",
... // other fields
subcategories: [{
id: 101,
... // other fields
subcategories: [...]
}, {
id: 103,
... // other fields
subcategories: [...]
},
...]
}, {
// another top1 category
}, ...]
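One way to construct such a nested copy from the flat category documents is to index them by _id and attach each one to its parent. This is only a sketch: `buildCategoryTree` is a hypothetical helper, and it relies on parentId, which the flat documents above still carry.

```javascript
// Hypothetical helper: turn flat category documents into the nested structure.
function buildCategoryTree(categories) {
  // First pass: create one node per category, indexed by _id.
  const byId = new Map();
  for (const c of categories) {
    byId.set(c._id, { id: c._id, name: c.name, subcategories: [] });
  }

  // Second pass: attach each node to its parent, or treat it as a root.
  const roots = [];
  for (const c of categories) {
    const node = byId.get(c._id);
    const parent = byId.get(c.parentId);
    if (parent) parent.subcategories.push(node);
    else roots.push(node); // parentId 0 (or an unknown parent) marks a top-level category
  }
  return roots;
}

const flat = [
  { _id: 100, name: 'level1 category', parentId: 0 },
  { _id: 101, name: 'level2 category', parentId: 100 },
  { _id: 102, name: 'level3 category', parentId: 101 },
  { _id: 103, name: 'another level2 category', parentId: 100 },
];

const tree = buildCategoryTree(flat);
console.log(JSON.stringify(tree, null, 2));
```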
You may want to refresh this copy periodically, for example every hour:
setInterval(function() {
  // refresh your memory copy of categories.
}, 3600000);
That's all I have in mind right now. Hope it helps.
EDIT:
To provide an integer ID for each user, $inc and findAndModify are very useful. You may have an idSeed collection:
{
_id: ...,
seedValue: 1,
forCollection: "user"
}
When you want to get a unique ID:
db.collection("idSeed").findAndModify({forCollection: "user"}, {}, {$inc: {seedValue: 1}}, {}, function(err, doc) {
var newId = doc.seedValue;
});
findAndModify is an atomic operation provided by MongoDB: the find and the modify happen as a single step on the document, so two concurrent callers never receive the same seed value.
The 2nd question is already covered in my answer.
Querying subsets of properties is described in the MongoDB manual; the Node.js API is almost the same. Read the documentation for the projection parameter.
Updating a subset of properties is supported by MongoDB's $set operator.
