Mongo Updates being super slow - node.js

We are facing a timeout issue with our mongo updates. Our collection currently contains around 300 thousand documents. When we try to update a record via the UI, the server times out and the UI is stuck in limbo.
Lead.updateOne({
  _id: body.CandidateID
}, {
  $set: {
    ingestionStatus: 'SUBMITTED',
    program: body.program,
    participant: body.participant,
    promotion: body.promotion,
    addressMeta: body.addressMeta,
    CreatedByID: body.CreatedByID,
    entryPerson: body.entryPerson,
    lastEnteredOn: body.lastEnteredOn,
    zipcode: body.zipcode,
    state: body.state,
    readableAddress: body.readableAddress,
    promotionId: body.promotionId,
    programId: body.programId,
    phone1: body.phone1,
    personId: body.personId,
    lastName: body.lastName,
    hasSignature: body.hasSignature,
    firstName: body.firstName,
    city: body.city,
    email: body.email,
    addressVerified: body.addressVerified,
    address: body.address,
    accountId: body.accountId
  }
});
This is how we update a single record. We are using mlab and Heroku in our stack. Looking for advice on how to speed this up considerably.
Thank you.

If your indexes are otherwise fine, you could try rebuilding the indexes on this collection.
For example, rebuild the lead collection indexes from the mongo command line:
db.lead.reIndex();
Reference:
https://docs.mongodb.com/v3.2/tutorial/manage-indexes/
https://docs.mongodb.com/manual/reference/command/repairDatabase/

If you are not already doing this, try building your indexes in the background.
Index builds can block write operations on your database, so you don't want to build indexes in the foreground on large collections during peak usage. You can build indexes in the background by specifying background: true when creating them.
db.collection.createIndex({ a:1 }, { background: true })
This will ultimately take longer to complete, but it will not block operations and will have less of an impact on performance.

1) Shard the Lead collection, using _id as the shard key.
2) Check that the memory MongoDB uses for indexes is less than the memory available on the MongoDB server (see the shell sketch below).
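A quick way to check the second point from the mongo shell, using the standard stats() and serverStatus() helpers (note the units differ):
// total size of this collection's indexes, in bytes
db.lead.stats().totalIndexSize
// resident memory of the mongod process, in megabytes
db.serverStatus().mem.resident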

Have you tried what this answer suggests? Namely, updating with no write-concern?
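For illustration, a minimal sketch of that idea against the update from the question, assuming your Mongoose version passes the w write-concern option through to the driver:
Lead.updateOne({
  _id: body.CandidateID
}, {
  $set: { ingestionStatus: 'SUBMITTED' /* ...rest of the fields as above */ }
}, {
  w: 0 // "fire and forget": do not wait for the write to be acknowledged
}, function (err) {
  // with w: 0 there is no acknowledgement, so errors are mostly invisible here
});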

Related

How to limit users' daily posts (MERN)

I'm currently using passport for authentication and mongodb to store user information.
However, I'm stuck trying to enforce a daily post limit per user. I was thinking of having a field like dailyPostLimit in the User schema, and whenever the user posts something I deduct from the count.
const user = new mongoose.Schema({
githubId: {
required: true,
type: String,
},
username: {
required: true,
type: String,
},
dailyPostLimit: {
type: Number,
default: 3,
},
});
However, I'm not sure if there's a way to reset that count to the default (3) every day. Is a cron task suitable here, or is there a simpler way to accomplish this?
A cron task works well for resetting a value like this one, and caching a value like this one is a reasonable approach to solving this problem. But, keep in mind that you're caching this value, and cache invalidation is a hard problem that can often lead to bugs & additional complexity.
counting posts
Rather than caching, my first instinct would be to count the number of posts each time. Here's some pseudo code:
const count = await posts.countDocuments({ userId, createdAt: { $gte: startOfDay } });
// alternative: const count = await posts.countDocuments({ userId, _id: { $gte: startOfDayConvertedToMongoId } });
if (count >= 3) throw new Error("no more posts for you"); // >= so the 4th post of the day is rejected
await posts.create(newPost);
(note: if you're worried about race conditions, any solution you choose will need to check the count in a transaction)
If you have an index that starts with {userId: 1, createdAt: 1}, or if you use the _id instead {userId: 1, _id: 1} (assuming that you're not allowing client _id creation), these queries will be quite cheap, and it'll be hard for them to get out of sync.
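For reference, a one-line sketch of defining that index on a hypothetical Post schema (the schema variable name is an assumption, not from the question):
postSchema.index({ userId: 1, createdAt: 1 });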
separate cache collection
Even if you do decide to cache these creation values, I'd recommend caching them away from your users collection to keep your collections more focused. You could create a post_count collection and then update only the cache collection for these counts: post_count.updateOne({userId, day}, {$inc: {count: 1}, $setOnInsert: {day, userId}}, {upsert: true}); (note the operator is $inc, and count should not also appear in $setOnInsert or the upsert will report a path conflict; on the first upsert, $inc creates the count field for you). One nice benefit of this approach is you can use a TTL index on day to have mongo automatically remove the old documents in this collection after they've expired.
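A sketch of that TTL index from the mongo shell, assuming day is stored as a Date (TTL indexes only work on date fields) and a 7-day retention window picked arbitrarily for the example:
// documents become eligible for deletion 7 days after their `day` value
db.post_count.createIndex({ day: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 7 });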
Since you are using MongoDB, I would suggest:
Use agenda and create a job that runs at 00:00 UTC (if you have users from many different time zones) or at a time appropriate to your users' time zone.
In this job, call updateMany on your user model to reset the dailyPostLimit field, as sketched below.
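A minimal sketch of that job, assuming the connection string lives in process.env.MONGODB_URI (an assumption for the example) and the User model from the question:
const Agenda = require('agenda');
const agenda = new Agenda({ db: { address: process.env.MONGODB_URI } });

// Reset every user's quota back to the default of 3
agenda.define('reset daily post limits', async () => {
  await User.updateMany({}, { $set: { dailyPostLimit: 3 } });
});

(async () => {
  await agenda.start();
  // cron syntax: minute 0, hour 0 (midnight UTC, every day)
  await agenda.every('0 0 * * *', 'reset daily post limits');
})();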

Batch updates reporting contention error using nodejs

I am trying to update a collection that has 1400+ offices. After checking and running the query, I update a document in the collection and also update the subcollection with a few details, but sometimes I get this error:
10 ABORTED: Too much contention on these documents. Please try again.
I am simply using a batch to write to the doc. Here is my code for the update in the collection:
batch.set(
  rootCollections.x.doc(getISO8601Date())
    .collection(subCollection.y)
    .doc(change.after.get('Id')),
  {
    officeId: change.after.get('Id'),
    office: change.after.get('office'),
    status: change.after.get('status'),
    creationTimestamp:
      change.after.get('creationTimestamp') ||
      change.after.get('createTimestamp') ||
      change.after.get('timestamp'),
    activeUsers: [...new Set(phoneNumbers)].length,
    confirmedUers: activityCheckinSubscription.docs.length,
    uniqueActivities: [...new Set(activities)].length,
    payments: 0,
    contact: [
      `${change.after.data().creator.phoneNumber},${
        change.after.data().attachment['First Contact'].value
      }`,
    ],
  },
  { merge: true },
);
batch.set(
  rootCollections.x.doc(getISO8601Date()),
  {
    Added: admin.firestore.FieldValue.increment(1),
  },
  { merge: true },
);
PromiseArray.push(batch.commit());
await Promise.all(PromiseArray);
It seems that you are facing the same issue as in this similar case here, where thousands of records in the database were being updated. As clarified there, there is a limit on how many writes you can perform on a single document in one second - more details here - and even though Firestore can sometimes keep up with faster writes, it will fail at some point.
As this is a hard limit imposed by Firestore, what you can try is the solution explained in this similar case here: either switch to the Realtime Database, where the limit is the size of the data rather than the number of writes, or, if you are using a counter or some other data aggregation in Firestore, use a distributed counter solution, which you can get more details about here.
To summarize, there isn't much you can do other than work around it with one of those approaches, as this is a documented limitation of Firestore.
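For illustration, a rough sketch of the distributed counter pattern applied to the Added field, assuming the Firebase Admin SDK and a hypothetical counters/Added/shards layout (the names are invented for the example):
const admin = require('firebase-admin');
const db = admin.firestore();
const NUM_SHARDS = 10;

// Write path: pick a random shard so concurrent writers rarely touch the same document
async function incrementAdded() {
  const shardId = Math.floor(Math.random() * NUM_SHARDS);
  const shardRef = db
    .collection('counters').doc('Added')
    .collection('shards').doc(String(shardId));
  await shardRef.set(
    { count: admin.firestore.FieldValue.increment(1) },
    { merge: true },
  );
}

// Read path: sum the shards to get the total
async function readAdded() {
  const snap = await db
    .collection('counters').doc('Added')
    .collection('shards').get();
  return snap.docs.reduce((sum, doc) => sum + (doc.get('count') || 0), 0);
}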

Index a new document and get the indexed document in the same query

Is it possible to index a new document and return it once it has been successfully indexed?
I tried taking the _id that is returned, but that means making 2 queries, and since the index action takes some time, the second query doesn't always find the _id, so it doesn't work reliably.
This is the query that indexes the document:
const query = await elasticClient.index({
  routing: "dasdsad34_d",
  index: "milan",
  body: {
    text: "san siro",
    user: {
      user_id: "3",
      username: "maldini",
    },
    tags: ["Forza Milan", "grande milan"],
    publish_date: new Date(),
    likes: [],
    users_tags: [1, 5],
    type: {
      name: "comment",
      parent: "dasdsad34_d",
    },
  },
});
No, it's not possible with the default behavior. By default, Elasticsearch only offers near real-time support. Its default refresh interval is 1 second, as an index refresh is deemed a costly operation.
To overcome this, you can add refresh=true to your indexing operation. You can get further details from the links below.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
Please note that this is NOT a recommended option, as it comes with a huge overhead. Only use it if the volume of inserts into the index in question is very low.
The recommended way is to use refresh=wait_for on your indexing operation. The downside is that the call waits for the natural refresh to complete. So if you have the default refresh interval of 1 second and are okay with that as a trade-off, this is the way to go.
However, if you have a higher refresh interval set, the wait time for the indexing operation will be as high as the refresh interval. So choose your option carefully.
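For example, a minimal sketch of the same index call with refresh set to wait_for, assuming the official Node.js Elasticsearch client used in the question (the exact response shape depends on the client version):
const result = await elasticClient.index({
  index: "milan",
  routing: "dasdsad34_d",
  refresh: "wait_for", // resolve only once the document is visible to search
  body: {
    text: "san siro",
    // ...rest of the document as in the question
  },
});
// result.body._id identifies the document, which is now searchable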

Mongoose: How to populate 2 levels deep without populating fields of the first level? (MongoDB)

Here is my Mongoose Schema:
var SchemaA = new Schema({
field1: String,
.......
fieldB : { type: Schema.Types.ObjectId, ref: 'SchemaB' }
});
var SchemaB = new Schema({
field1: String,
.......
fieldC : { type: Schema.Types.ObjectId, ref: 'SchemaC' }
});
var SchemaC = new Schema({
field1: String,
.......
.......
.......
});
When I query SchemaA using find, I want to get the fields/properties
of SchemaA along with SchemaB and SchemaC, in the same way we would apply a join operation in a SQL database.
This is my approach:
SchemaA.find({})
  .populate('fieldB')
  .exec(function (err, result) {
    SchemaB.populate(result.fieldC, { path: 'fieldB' }, function (err, result) {
      .............................
    });
  });
The above code is working perfectly, but the problem is:
I want to have the information/properties/fields of SchemaC through SchemaA, and I don't want to populate the fields/properties of SchemaB.
The reason for not wanting the properties of SchemaB is that the extra population slows the query unnecessarily.
Long story short:
I want to populate SchemaC through SchemaA without populating SchemaB.
Can you please suggest any way/approach?
As an avid mongodb fan, I suggest you use a relational database for highly relational data - that's what it's built for. You are losing all the benefits of mongodb when you have to perform 3+ queries to get a single object.
Buuuuuut, I know that comment will fall on deaf ears. Your best bet is to be as conscious as you can about performance. Your first step is to limit the fields to the minimum required. This is just good practice even with basic queries and any database engine - only get the fields you need (eg. SELECT * FROM === bad... just stop doing it!). You can also try doing lean queries to help save a lot of post-processing work mongoose does with the data. I didn't test this, but it should work...
SchemaA.find({}, 'field1 fieldB', { lean: true })
  .populate({
    path: 'fieldB', // note: the populate option is "path", not "name"
    select: 'fieldC',
    options: { lean: true }
  })
  .exec(function (err, result) {
    // not sure how you are populating "result" in your example, as it should be an array,
    // but you said your code works... so I'll let you figure out what goes here.
  });
Also, a very "mongo" way of doing what you want is to save a reference in SchemaC back to SchemaA. When I say "mongo" way of doing it, you have to break away from your years of thinking about relational data queries. Do whatever it takes to perform fewer queries on the database, even if it requires two-way references and/or data duplication.
For example, if I had a Book schema and Author schema, I would likely save the authors first and last name in the Books collection, along with an _id reference to the full profile in the Authors collection. That way I can load my Books in a single query, still display the author's name, and then generate a hyperlink to the author's profile: /author/{_id}. This is known as "data denormalization", and it has been known to give people heartburn. I try and use it on data that doesn't change very often - like people's names. In the occasion that a name does change, it's trivial to write a function to update all the names in multiple places.
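A hypothetical sketch of that Book/Author denormalization (the schema and function names are invented for the example), including the fix-up update for the rare rename:
var BookSchema = new Schema({
  title: String,
  author: {
    _id: { type: Schema.Types.ObjectId, ref: 'Author' },
    firstName: String, // duplicated so listing books needs no second query
    lastName: String
  }
});

// When an author's name does change, sync every duplicated copy:
function renameAuthor(authorId, firstName, lastName, callback) {
  Book.updateMany(
    { 'author._id': authorId },
    { $set: { 'author.firstName': firstName, 'author.lastName': lastName } },
    callback
  );
}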
SchemaA.find({})
  .populate({
    path: "fieldB",
    populate: { path: "fieldC" }
  })
  .exec(function (err, result) {
    // this is how you can get all key/value pairs of SchemaA, SchemaB and SchemaC
    // example: result.fieldB.fieldC._id (key of SchemaC)
  });
Why not add a ref to SchemaC on SchemaA? The way you currently have it, there is no way to bridge to SchemaC from SchemaA without going through SchemaB, unless you populate SchemaB with no other data than the ref to SchemaC.
As explained in the docs under Field Selection, you can restrict what fields are returned.
.populate('fieldB') becomes .populate('fieldB', 'fieldC -_id'). The -_id is required to omit the _id field, just like when using select().
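Putting that together with the nested-populate approach above, an untested sketch that reaches SchemaC while selecting only the fieldC reference from SchemaB:
SchemaA.find({})
  .populate({
    path: 'fieldB',
    select: 'fieldC -_id',       // restrict SchemaB to just the reference
    populate: { path: 'fieldC' } // then populate SchemaC through it
  })
  .exec(function (err, results) {
    // results[0].fieldB.fieldC is the populated SchemaC document
  });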
I think this is not possible. When a document in A refers to a document in B, and that document refers to another document in C, how can the document in A know which document in C to refer to without any help from B?

Mongoose: populate() / DBref or data duplication?

I have two collections:
Users
Uploads
Each upload has a User associated with it, and I need to know their details when an Upload is viewed. Is it best practice to duplicate this data inside the Uploads record, or to use populate() to pull in these details from the Users collection referenced by _id?
OPTION 1
var UploadSchema = new Schema({
_id: { type: Schema.ObjectId },
_user: { type: Schema.ObjectId, ref: 'users'},
title: { type: String },
});
OPTION 2
var UploadSchema = new Schema({
_id: { type: Schema.ObjectId },
user: {
name: { type: String },
email: { type: String },
avatar: { type: String },
//...etc
},
title: { type: String },
});
With 'Option 2' if any of the data in the Users collection changes I will have to update this across all associated Upload records. With 'Option 1' on the other hand I can just chill out and let populate() ensure the latest User data is always shown.
Is the overhead of using populate() significant? What is the best practice in this common scenario?
If you need to query on your Users, keep users in their own collection. If you need to query on your uploads, keep uploads in their own collection.
Another question you should ask yourself is: every time I need this data, do I need the embedded objects (and vice versa)? How many times will this data be updated? How many times will this data be read?
Think about a friendship request:
Each time you need the request you also need the user who made the request, so embed the request inside the user document.
You will be able to create an index on the embedded object too, and your search will be a single query / fast / consistent.
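For illustration, a hypothetical sketch of that embedding (the field names are invented for the example):
var UserSchema = new Schema({
  name: String,
  friendRequests: [{
    from: { type: Schema.ObjectId, ref: 'users' },
    sentAt: Date
  }]
});

// An index on the embedded field keeps the lookup a single, fast query:
UserSchema.index({ 'friendRequests.from': 1 });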
Just a link to my previous reply on a similar question:
Mongo DB relations between objects
I think this post will be right for you http://www.mongodb.org/display/DOCS/Schema+Design
Use Cases
Customer / Order / Order Line-Item
Orders should be a collection. Customers should be a collection. Line items should be an array of line-item objects embedded in the order object.
Blogging system.
Posts should be a collection. The post author might be a separate collection, or simply a field within posts if it is only an email address. Comments should be embedded objects within a post, for performance.
Schema Design Basics
Kyle Banker, 10gen
http://www.10gen.com/presentation/mongosf2011/schemabasics
Indexing & Query Optimization
Alvin Richards, Senior Director of Enterprise Engineering
http://www.10gen.com/presentation/mongosf-2011/mongodb-indexing-query-optimization
These 2 videos are the best on MongoDB I have ever seen, imho.
Populate() is just a query. So the overhead is whatever the query is, which is a find() on your model.
Also, best practice for MongoDB is to embed what you can. It will result in a faster query. It sounds like you'd be duplicating a ton of data though, which makes relations (linking) a good fit here.
"Linking" is just putting an ObjectId in a field from another model.
Here is the Mongo Best Practices http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-SummaryofBestPractices
Linking/DBRefs http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-SimpleDirect%2FManualLinking
