We have a production database with over 5 million customer records. Each customer document has an embedded array of the licenses that customer has applied for. An example customer document is as follows:
{
  _id: ObjectId('...'),
  phoneNumber: 'xxxx',
  // Other customer fields
  licenses: [
    {
      _id: ObjectId('...'),
      state: 'PENDING',
      expired: false,
      createdAt: ISODate(''),
      // Other license fields
    },
    // More Licenses for this customer
  ]
}
I have been tasked with changing the state of every PENDING license applied for during the month of September to REJECTED and sending an SMS to every customer whose pending permit just got rejected.
Using model.where(condition).countDocuments(), I have found that there are over 3 million customers (not licenses) matching these criteria. Each customer has an average of 9 licenses.
I need assistance coming up with a strategy that won't slow down the system when performing this action. Furthermore, this is around 17GB of data.
Sending the SMS is fine; I can queue the details for the SMS service. My challenge is processing the licenses while extracting the relevant information for the SMS.
First of all you have to create an index on the collection:
db.collection.createIndex( { "licenses.state": 1 } )
Then you should do something like this:
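// Match only customers that have at least one PENDING license in the date range,
// so the update does not have to touch every document in the collection.
model.updateMany({
  licenses: { $elemMatch: { state: 'PENDING', createdAt: { $gte: ISODate(....), $lt: ISODate(....) } } }
}, {
  $set: { 'licenses.$[elem].state': 'REJECTED' }
}, {
  // arrayFilters decides which array elements $[elem] refers to
  arrayFilters: [{
    'elem.state': 'PENDING',
    'elem.createdAt': { $gte: ISODate(....), $lt: ISODate(....) } // start and end of the month
  }]
}).then(function (result) {
  // result reports how many documents were matched and modified
})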
If you have a replica set and the updates run on the primary instance, reads served from the secondary instances should not be affected much.
If you want to split the update into many batches, you can use the _id (already indexed). Of course, it depends on your _id format.
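One way to batch on _id is sketched below. It is only an illustration under some assumptions: `customers` is taken to be a MongoDB Node.js driver Collection, `onRejected` is a hypothetical callback that queues one SMS job, and `from`/`to` are the start and end dates of the month. Each pass reads a small page of matching customers in _id order, updates only those, and hands their phone numbers to the queue.

```javascript
// Sketch only: `customers` is assumed to be a MongoDB Node.js driver Collection,
// and `onRejected` is a hypothetical callback that queues one SMS job.
async function rejectPendingInBatches(customers, { from, to, batchSize = 1000, onRejected }) {
  const inRange = { state: 'PENDING', createdAt: { $gte: from, $lt: to } };
  let lastId = null;
  let processed = 0;

  for (;;) {
    const filter = { licenses: { $elemMatch: inRange } };
    if (lastId !== null) filter._id = { $gt: lastId };

    // Read one page of matching customers in _id order; fetch only what the SMS needs.
    const batch = await customers
      .find(filter, { projection: { phoneNumber: 1 } })
      .sort({ _id: 1 })
      .limit(batchSize)
      .toArray();
    if (batch.length === 0) break;

    // Update only the customers in this page, and only their matching licenses.
    const ids = batch.map((c) => c._id);
    await customers.updateMany(
      { _id: { $in: ids } },
      { $set: { 'licenses.$[elem].state': 'REJECTED' } },
      { arrayFilters: [{ 'elem.state': 'PENDING', 'elem.createdAt': { $gte: from, $lt: to } }] }
    );

    for (const c of batch) await onRejected(c);
    processed += batch.length;
    lastId = ids[ids.length - 1];
  }
  return processed;
}
```

Keeping the batches small, and possibly sleeping briefly between them, is what limits the load on the primary; the right batch size is something to measure on your own system.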
I have a Booking model that has a hasMany relation with hotels, and each hotel has a one-to-one relation with suppliers.
What I need is to get all bookings where supplier_id = 33333.
I am trying this:
BOOKINGS.findAll({
  where: {
    'hotels.supplier.supplier_id' : '32',
  },
  include: [
    {
      model: HOTELS,
      include: [
        {
          model: SUPPLIERS,
        }
      ],
    }
  ],
  limit : 30,
  offset: 0
})
It throws an error like "hotels.supplier... column not found". I have tried everything, because the Sequelize docs only give the solution of adding a where inside the include, which I can't use as it adds subqueries.
I don't want to add a where check along with the supplier model inside the include array, because it adds subqueries. If I have 1000 bookings, it will add a subquery for every booking, which crashes my APIs.
I need a solution in Sequelize that is equivalent to this query:
Select col1,col2,col3 from BOOKINGS left join HOTELS on BOOKINGS.booking_id = HOTELS.booking_id inner join SUPPLIERS on BOOKINGS.supplier_id = SUPPLIERS.supplier_id
Adding a where in the include object will not add a subquery. It will just add a where clause to the JOIN that is applied to the supplier model. It will not crash your API in any way. You can test it on your local machine as many times as you like to make sure.
BOOKINGS.findAll({
  include: [
    {
      model: HOTELS,
      include: [
        {
          model: SUPPLIERS,
          where: { supplier_id: 32 }
        }
      ]
    }
  ],
  limit: 30,
  offset: 0
})
If you still want to put the condition at the top level, you can use sequelize.where + sequelize.literal, but you will need to use the table aliases that Sequelize assigns. For example, hotels.supplier.supplier_id will not work as an alias for the supplier table. Sequelize assigns table aliases like the one in the example below:
BOOKINGS.findAll({
  where: sequelize.where(sequelize.literal("`hotels->suppliers`.supplier_id = 32")),
  include: [
    {
      model: HOTELS,
      include: [SUPPLIERS]
    }
  ],
  limit: 30,
  offset: 0
})
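As a variation, Sequelize also documents a `$nested.column$` syntax for referring to columns of included models from the top-level where, and a `subQuery` option on findAll. The sketch below only builds the options object; the association aliases `hotels` and `supplier` and the helper name `bookingsBySupplierOptions` are assumptions for illustration, so adjust them to whatever your associations are actually called.

```javascript
// Sketch: build findAll options that filter bookings by a nested supplier column.
// Assumes the association aliases are `hotels` and `supplier`; adjust to your models.
function bookingsBySupplierOptions(HOTELS, SUPPLIERS, supplierId, { limit = 30, offset = 0 } = {}) {
  return {
    // `$a.b.column$` refers to a column of an included (joined) model.
    where: { '$hotels.supplier.supplier_id$': supplierId },
    include: [
      {
        model: HOTELS,
        required: true, // INNER JOIN instead of LEFT JOIN
        include: [{ model: SUPPLIERS, required: true }],
      },
    ],
    // Ask Sequelize not to wrap the main query in a subquery, which it can do
    // when limit is combined with a hasMany include.
    subQuery: false,
    limit,
    offset,
  };
}

// Usage (inside your own code): BOOKINGS.findAll(bookingsBySupplierOptions(HOTELS, SUPPLIERS, 32))
```

Note that with `subQuery: false` the limit applies to the joined rows rather than to distinct bookings, so a booking with several hotels can use up more than one slot of the page.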
When I associate two models, how do I prevent name collisions?
// Find all projects with at least one task where task.state === project.state
Project.findAll({
  include: [{
    model: Task,
    where: { state: Sequelize.col('project.state') }
  }]
})
In this example, what if both project and task had a name property?
I'm currently working on a project where I have 3 models with a child and parent structure.
A City has multiple Locations, each of which has multiple Stones. Each Stone has only one Location as its parent, and each Location has only one City as its parent.
Now, I want to create a list of all Stones that 'belong' (through a Location) to a specific City. How would I retrieve all these associations, without having to do the following:
City.find({
  where: {
    id: 1337
  },
  include: [{
    model: Location,
    include: [{
      model: Stone
    }]
  }]
})
.then((city) => {
  city.stones = [].concat.apply([], city.locations.map(location => location.stones));
});
I'm trying to find out if there's a "SQL only" solution, so not retrieving data / having to execute JavaScript to generate this array.
I'm managing a MongoDB database for a building products store. The most immediate collection is products, right?
There are quite a few products. However, each one belongs to one of a set of 5-8 categories, and then to one subcategory among a small set of subcategories.
For example:
-Electrical
  *Wires
    p1
    p2
    ..
  *Tools
    p5
    pn
    ..
  *Sockets
    p11
    p23
    ..
-Plumber
  *Pipes
    ..
  *Tools
    ..
  *PVC
    ..
I will use Angular on the client side of the web site to show the whole product catalog, and I am thinking of using AJAX to query the subset of products I want.
So I wonder whether I should manage a single collection like:
{
MainCategory1: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
},
MainCategory2: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
},
MainCategoryn: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
}
}
Or one collection per category. The number of documents is unlikely to exceed 500. However, I care about a balance between:
quick DB answer,
easy server side DB querying, and
client-side Angular code for rendering results to html.
I'm using the mongodb Node.js module, not Mongoose, for now.
What CRUD operations will I do?
Inserts of products. I'd also like a way to obtain autogenerated ids (maybe sequential) for each new record, since, naturally, I wouldn't expose the _id to the user.
Querying the whole documents set of a subcategory. Maybe just obtaining a few attributes at first.
Querying whole or a specific subset of attributes of a document (product) in particular.
Modifying a product's attributes values.
I agree that the client side should get the result that is easiest to render. However, nesting products under categories in a single document is still a bad idea. The trade-off is that once you want to change something, for example the name of a category, it will be a disaster. And think about the possible use cases, for example:
list all categories
find all subcategories of a certain category
find all products in a certain category
You'll find it hard to do these things with your data structure.
I had the same situation in my current project, so here's what I do, for your reference.
First, categories should be in a separate collection. DON'T nest categories into each other, as it will complicate the procedure to find all subcategories. The traditional way for finding all subcategories is to maintain an idPath property. For example, your categories are divided into 3 levels:
{
  _id: 100,
  name: "level1 category",
  parentId: 0, // means it's the top category
  idPath: "0-100"
}
{
  _id: 101,
  name: "level2 category",
  parentId: 100,
  idPath: "0-100-101"
}
{
  _id: 102,
  name: "level3 category",
  parentId: 101,
  idPath: "0-100-101-102"
}
Note that with idPath, parentId is not strictly necessary anymore. It is kept here to make the structure easier to understand.
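To make the idPath convention concrete, here is a minimal sketch of how a new category document could derive its idPath from its parent. The helper names `buildIdPath` and `makeCategory` are made up for this illustration; the only rule taken from the example above is "parent's idPath, a dash, then the new _id", with "0" standing for the root.

```javascript
// Hypothetical helpers illustrating the idPath convention from the documents above.
function buildIdPath(parent, id) {
  // A top-level category has no parent, so its path starts at the root marker "0".
  const parentPath = parent ? parent.idPath : '0';
  return parentPath + '-' + id;
}

function makeCategory(id, name, parent) {
  return {
    _id: id,
    name: name,
    parentId: parent ? parent._id : 0,
    idPath: buildIdPath(parent, id),
  };
}

const level1 = makeCategory(100, 'level1 category', null);
const level2 = makeCategory(101, 'level2 category', level1);
const level3 = makeCategory(102, 'level3 category', level2);
console.log(level3.idPath); // prints "0-100-101-102"
```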
Once you need to find all subcategories of category 100, simply do the query:
db.collection("category").find({idPath: /^0-100-/}).toArray().then(function(docs) {
  // whatever you want to do
})
With categories stored in a separate collection, your products will need to reference them by _id, just as you would in an RDBMS. For example:
{
... // other fields of product
categories: [100, 101, 102, ...]
}
Now if you want to find all products in a certain category:
db.collection("category").find({idPath: new RegExp("^" + idPath + "-")}).toArray().then(function(categories) {
  var cateIds = _.pluck(categories, "_id"); // I'm using underscore to pluck category ids
  return db.collection("product").find({categories: { $in: cateIds }}).toArray();
}).then(function(products) {
  // products are here
})
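The prefix pattern is easy to get subtly wrong, so here is a small sketch of building it safely. `escapeRegExp` and `descendantsPattern` are hypothetical helper names; the point is that the trailing dash keeps "0-100" from also matching "0-1000", and that the pattern matches descendants only, not the category itself.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern matching the idPath of every descendant of the given category.
function descendantsPattern(idPath) {
  return new RegExp('^' + escapeRegExp(idPath) + '-');
}

const pattern = descendantsPattern('0-100');
console.log(pattern.test('0-100-101'));     // true: direct child
console.log(pattern.test('0-100-101-102')); // true: grandchild
console.log(pattern.test('0-1000-7'));      // false: different category
console.log(pattern.test('0-100'));         // false: the category itself
```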
Fortunately, the category collection is usually very small, with only hundreds (or perhaps thousands) of records, and it doesn't change much. So you can always keep a live copy of the categories in memory, constructed as nested objects like:
[{
  id: 100,
  name: "level 1 category",
  ... // other fields
  subcategories: [{
    id: 101,
    ... // other fields
    subcategories: [...]
  }, {
    id: 103,
    ... // other fields
    subcategories: [...]
  },
  ...]
}, {
  // another top1 category
}, ...]
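A minimal sketch of turning the flat category documents into that nested shape is shown below. It assumes each document still carries `parentId` as in the earlier examples, and `buildCategoryTree` is a name invented for this illustration.

```javascript
// Build the nested in-memory structure from flat category documents.
// Assumes every document has _id, name and parentId (0 or unknown parent = top level).
function buildCategoryTree(categories) {
  const nodes = new Map();
  for (const c of categories) {
    nodes.set(c._id, { id: c._id, name: c.name, subcategories: [] });
  }

  const roots = [];
  for (const c of categories) {
    const node = nodes.get(c._id);
    const parent = nodes.get(c.parentId);
    if (parent) {
      parent.subcategories.push(node);
    } else {
      roots.push(node); // no known parent, so treat it as a top-level category
    }
  }
  return roots;
}

const tree = buildCategoryTree([
  { _id: 100, name: 'level1 category', parentId: 0 },
  { _id: 101, name: 'level2 category', parentId: 100 },
  { _id: 102, name: 'level3 category', parentId: 101 },
  { _id: 103, name: 'another level2 category', parentId: 100 },
]);
console.log(tree[0].subcategories.map((c) => c.id)); // prints [ 101, 103 ]
```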
You may want to refresh this copy periodically, for example every hour:
setInterval(function() {
  // refresh your memory copy of categories.
}, 3600000);
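For illustration, the in-memory copy and its periodic refresh could be wrapped up as in the sketch below. `createCategoryCache` and `loadCategories` are hypothetical names; `loadCategories` stands for whatever async function reads the categories from the database.

```javascript
// Sketch of an in-memory category cache with a periodic refresh.
// `loadCategories` is a hypothetical async function that returns the category list.
function createCategoryCache(loadCategories, refreshMs = 3600000) {
  let current = [];

  async function refresh() {
    try {
      current = await loadCategories();
    } catch (err) {
      // Keep serving the previous copy if a refresh fails.
      console.error('category refresh failed, keeping old copy', err);
    }
    return current;
  }

  const timer = setInterval(refresh, refreshMs);
  timer.unref(); // do not keep the Node.js process alive just for the refresh

  return {
    get: () => current,
    refresh,
    stop: () => clearInterval(timer),
  };
}
```

A caller would typically `await cache.refresh()` once at startup and then read `cache.get()` on every request.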
That's all I have in mind right now. Hope it helps.
EDIT:
To provide an integer ID for each user, $inc and findAndModify are very useful. You could have an idSeed collection:
{
_id: ...,
seedValue: 1,
forCollection: "user"
}
When you want to get a unique ID:
db.collection("idSeed").findAndModify({forCollection: "user"}, {}, {$inc: {seedValue: 1}}, {}, function(err, doc) {
var newId = doc.seedValue;
});
findAndModify is an atomic operation provided by MongoDB, so it is safe with concurrent callers: the find and the modify effectively happen as a single step, as if in a "transaction".
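In recent versions of the Node.js driver the same atomic operation is, as far as I know, exposed as `findOneAndUpdate`. The sketch below assumes `idSeeds` is the idSeed Collection; `nextId` is a name invented here, and the result handling is hedged because older driver versions wrap the document in `{ value }` while newer ones resolve to the document itself.

```javascript
// Sketch: atomically increment and return the next sequential ID for a collection.
// `idSeeds` is assumed to be the MongoDB driver Collection holding the seed documents.
async function nextId(idSeeds, forCollection) {
  const result = await idSeeds.findOneAndUpdate(
    { forCollection: forCollection },
    { $inc: { seedValue: 1 } },
    { upsert: true, returnDocument: 'after' } // create the seed if missing, return the new value
  );
  // Some driver versions resolve to { value: doc }, others to the document itself.
  const doc = result && result.value !== undefined ? result.value : result;
  return doc.seedValue;
}
```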
The 2nd question is already covered in my answer.
Querying a subset of properties is described in the MongoDB manual, and the Node.js API is almost the same. Read the documentation for the projection parameter.
Updating a subset of properties is supported by MongoDB's $set operator.
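As a rough sketch of those last two points with the Node.js driver: `products` is assumed to be the product Collection, and the field names `name` and `price` as well as both function names are placeholders made up for this example.

```javascript
// Read only a few attributes of one product using a projection.
// `name` and `price` are placeholder field names.
async function getProductSummary(products, productId) {
  return products.findOne(
    { _id: productId },
    { projection: { name: 1, price: 1 } }
  );
}

// Change only the given attributes of one product using $set.
async function updateProductFields(products, productId, changes) {
  const result = await products.updateOne({ _id: productId }, { $set: changes });
  return result.modifiedCount === 1;
}
```

For example, `updateProductFields(products, id, { price: 12.5 })` would leave every other attribute of the product untouched.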