I am using insertMany to insert about 300 documents at a time from AWS Lambda to MongoDB Atlas. We are using Node.js and Mongoose. The server seems to max out at 3% CPU, so I don't believe the problem described below is hardware related. There are no performance suggestions from Atlas either.
The issue we are having is that inserting 300 documents takes between 27 and 30+ seconds, and with AWS Lambda anything over 30 seconds causes a timeout.
I feel like we must be doing something incorrectly, as 30+ seconds seems like a very long time. Each document is only 7KB.
The indexing is done on a timestamp and a string (like a URL), with a unique constraint on the timestamp/string combination.
There are 13,500 documents in the collection.
Any ideas on how to speed this up?
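For reference, here is a minimal sketch of the insert path described above, assuming a Mongoose model named Item and the timestamp/URL index mentioned (the schema and field names are assumptions). Passing ordered: false is one thing worth trying, since it lets the server continue past individual failures instead of processing the batch strictly in order:

const mongoose = require('mongoose');

// Hypothetical schema matching the description above.
const itemSchema = new mongoose.Schema({ ts: Date, url: String });
itemSchema.index({ ts: 1, url: 1 }, { unique: true });
const Item = mongoose.model('Item', itemSchema);

async function insertBatch(docs) {
    // ~300 documents of ~7KB each per call
    return Item.insertMany(docs, { ordered: false });
}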
Related
When I query through the mongo shell or Robo 3T, the query takes about 3ms to 5ms. But the same query from my Node (12.18.3) application using Mongoose (5.5.10) takes about 6000ms on average. Robo 3T returns 50 documents by default, and the mongo shell returns 20, I believe. The Node application was returning 800 documents, so I made the mongo shell return 800 documents as well by setting DBQuery.shellBatchSize = 800. That query still took far less time than 6000ms.
What am I missing here?
Edit 1: It's a simple query: db.gateways.find({org_id: 1}). I have an index on org_id. Each document has about 450 fields, and the average document size is 6.5KB. The above query returns 800 documents.
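For comparison, here is roughly what that query looks like through Mongoose (the model name Gateway and its schema are assumptions). With ~450 fields per document, hydrating 800 results into full Mongoose documents is expensive, and .lean() is a common way to test whether that hydration accounts for the gap:

// Hypothetical model name; gatewaySchema is assumed to exist.
const Gateway = mongoose.model('Gateway', gatewaySchema);

async function compare() {
    // Default: each of the ~800 results is hydrated into a full Mongoose
    // document (getters, change tracking, etc.).
    const hydrated = await Gateway.find({ org_id: 1 });

    // .lean() returns plain JavaScript objects and skips that hydration.
    const plain = await Gateway.find({ org_id: 1 }).lean();
}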
I have ~19,000 docs in an mLab sandbox, total size ~160MB. I will have queries that may want a heavily filtered subset of those docs, or a completely unfiltered set. In all cases, the information I need returned is a 100-element array that is an attribute of each document. So in the case of no filter, I expect to get back 19k arrays, which I will then process further. The query I'm using to test this is db.find({}, { vector: 1 });.
Using .explain(), I can see that this query doesn't take long at all. However, it takes upwards of 50 seconds for my code to see the returned data. At first I thought it was simply a large download taking a long time, but the total vector data is only around 20MB, which shouldn't take that long. (For context, my Node.js server runs locally on my PC while the DB is hosted, so there is some transfer involved.)
What is taking so long? An empty query shouldn't take any time to execute, and projecting the results down to just the vectors means I should only have to download ~20MB of data, which should be a matter of seconds.
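For what it's worth, here is a sketch of streaming the projected cursor with the native driver instead of buffering all results first (the collection name docs and the handler handleVector are assumptions; on older driver versions the projection object is passed directly as the second argument rather than under a projection key):

async function fetchVectors(db) {
    // Project down to just the vector field.
    const cursor = db.collection('docs').find({}, { projection: { vector: 1 } });

    let doc;
    while ((doc = await cursor.next()) !== null) {
        // Process each 100-element array as it arrives instead of
        // waiting for all ~19k documents to be buffered in memory.
        handleVector(doc.vector); // hypothetical per-document handler
    }
}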
I'm using MongoDB to scrape a dataset with Node.js. The collection has 0.2 million documents, and Node.js is crashing with a segmentation fault. Is there a way to split the collection into 2 or more collections so that Node.js doesn't crash?
Thanks!!
Did you try using limit to constrain the number of documents returned? You can take the total document count in the collection and then split it up using limit and skip. For example, if the collection has 200 docs:
First time, limit 100 docs and skip 0.
Second time, limit 100 again, but this time skip 100.
This is one way I can think of; there may be other ways. A rough sketch follows.
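Here is a rough sketch of that limit/skip batching with the native driver (the collection name dataset and the batch size of 100 are assumptions):

async function processInBatches(db) {
    const batchSize = 100; // assumption; tune as needed
    const total = await db.collection('dataset').countDocuments();

    for (let skip = 0; skip < total; skip += batchSize) {
        const batch = await db.collection('dataset')
            .find({})
            .skip(skip)
            .limit(batchSize)
            .toArray();
        // Process the batch here; it goes out of scope before the next
        // iteration, so memory use stays bounded.
    }
}

Note that skip gets slower as it grows, since the server still walks past the skipped documents; for large collections, paginating on an indexed field (e.g. find({ _id: { $gt: lastId } }).limit(batchSize)) scales better.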
I am having issues with doing counts on a single collection with up to 1 million records. I have a 32-core, 244GB RAM box that I am running my test on, so hardware should not be an issue.
I have indexes set up for all of the queries I am using to perform counts. I have set Node's max_old_space_size to 15GB.
The process I am following is basically: loop through a huge array, create 1000 promises, perform 12 counts within each promise, wait for all the promises to resolve, and then continue with the next batch of one thousand.
As part of my test, I am doing inserts, updates, and reads as well. All of those show great performance, up to 20,000/sec each. However, when I get to the portion of my code doing the counts, I can see via mongostat that only 20-30 commands are being executed per second. I have not determined at this point whether my Node code is only sending that many, or whether Mongo is queuing them up.
Meanwhile, in my Node.js code, all 1000 promises are started and waiting to resolve. I know this is a lot of info, so please let me know what more granular details I should provide to get some insight into why the count performance is so slow.
So basically, for a batch of 1000 records, doing let's say 12 counts each, for a total of 12,000 counts, it takes close to 10 minutes on a collection of 1 million records.
MongoDB Native Client v2.2.1
Node v4.2.1
What I'd like to add is that I have tried changing the maxPoolSize on the driver from 100 to 1000 with no change in performance. I've also tried changing my queries from yield/generator/promise style to callbacks wrapped in promises, which has helped somewhat.
The strange thing is that when my program starts, even if I use just the default number of connections (which I see as 7 when running mongostat), I can get around 2500 count() queries per second throughout. However, after a few seconds this drops back down to about 300-400. This leads me to believe that Mongo can handle that many all the time, but my code is not able to send that many requests, even though I set maxPoolSize to 300 and start 10,000 simultaneous promises resolving in parallel. So what gives? Any ideas?
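One way to test whether the flood of simultaneous promises is the problem is to cap how many count() calls are in flight at a time. A rough sketch (the chunk size is an assumption, and this uses async/await for brevity, which the Node 4 mentioned above would need rewritten with callbacks or generators):

// `queries` is assumed to be an array of filter objects, one per count.
async function runCounts(collection, queries, concurrency) {
    const results = [];
    for (let i = 0; i < queries.length; i += concurrency) {
        const chunk = queries.slice(i, i + concurrency);
        // Only `concurrency` counts are in flight at once.
        const counts = await Promise.all(chunk.map(q => collection.count(q)));
        results.push(...counts);
    }
    return results;
}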
I am currently using the MongoDB cursor's toArray() function to convert the database results into an array:
run = true;
count = 0;
var start = process.hrtime();

db.collection.find({}, { limit: 2000 }).toArray(function (err, docs) {
    // Handle the error before using the results
    if (err) return console.log(err);

    var diff = process.hrtime(start);
    run = false;
    socket.emit('result', {
        result: docs,
        time: diff[0] * 1000 + diff[1] / 1000000, // elapsed milliseconds
        ticks: count
    });
});
This operation takes about 7ms on my computer. If I remove the .toArray() call, the operation takes about 0.15ms. Of course that won't work, because I need to forward the data, but I'm wondering what the function is doing that makes it take so long. Each document in the database consists of just 4 numbers.
In the end I'm hoping to run this on a much smaller processor, like a Raspberry Pi, and there the operation of fetching 500 documents from the database and converting them to an array takes about 230ms. That seems like a lot to me. Or am I just expecting too much?
Are there any alternative ways to get data from the database without using toArray()?
Another thing I noticed is that the entire Node application slows down remarkably while getting the database results. I created a simple interval function that should increment the count value every 1ms:
setInterval(function () {
    if (run) count++;
}, 1);
I would then expect the count value to be almost the same as the time, but for a time of 16ms on my computer, the count value was only 3 or 4. On the Raspberry Pi the count value was never incremented. What is using so much CPU? The monitor told me that my computer was at 27% CPU and the Raspberry Pi was at 92% CPU and 11% RAM when asked to run the database query repeatedly.
I know that was a lot of questions. Any help or explanations are much appreciated. I'm still new to Node and MongoDB.
db.collection.find() returns a cursor, not results, and opening a cursor is pretty fast.
Once you start reading the cursor (using .toArray() or by traversing it using .each() or .next()), the actual documents are being transferred from the database to your client. That operation is taking up most of the time.
I doubt that using .each()/.next() instead of .toArray() (which, under the hood, uses one of those two) will improve the performance much, but you could always try (who knows). Since .toArray() reads everything into memory, traversing the cursor may still be worthwhile, although it doesn't sound like your data set is that large.
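For completeness, here is a sketch of traversing the cursor one document at a time with the 2.x-era driver, matching the code above, instead of buffering with .toArray() (what you do with each document is an assumption):

var cursor = db.collection.find({}, { limit: 2000 });
cursor.each(function (err, doc) {
    if (err) return console.log(err);
    if (doc === null) {
        // The cursor is exhausted; all documents have been processed.
        return;
    }
    // Handle one document at a time; memory use stays flat.
    socket.emit('doc', doc);
});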
I really think that MongoDB on a Raspberry Pi (especially a Model 1) is not going to work well. If you don't depend too much on MongoDB's query features, you should consider an alternative data store, perhaps even in-memory storage (500 documents of 4 numbers each doesn't sound like it requires much RAM).