I'm developing a web service that is currently being load tested. To avoid wasting anyone's time, I won't go into detail about the REST API; it's too complicated to explain here.
I'm developing with Node and using MongoDB's native Node driver. I use the following query:
let mediaIdQuery = {_id: {$in: media}};
let result = await db.collection(COLLECTION_MEDIA).findOne(mediaIdQuery);
'media' is an array of strings. I'm essentially searching the database to see if I have a document with an _id that is in the array.
I add these documents elsewhere. In testing, it seems to work fine when the creation of the documents and search query for them are far apart in time. However, if I send a request that creates a document and a search request within ~100 ms of each other it seems that this query fails to find the document.
This is actually an attempt at optimization on my part. In the past, I had a for loop that went through each array element and sent a separate search query for each one. Doing it all in one query should be faster. When I used the for loop method, everything worked properly.
I'm just really confused about why the database can't find the documents when the requests are close together in time. I'm 100% sure the inserts happen before the search request, because inserting the documents takes practically 0 ms and the insert queries are sent first. Is there some sort of delay in MongoDB before the documents become visible? I've never run into anything like that before, so I'm hesitant to blame it on a delay in the system. Any ideas?
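For reference, this is roughly the difference between the old per-element approach and the new batched query (a sketch only; names are taken from the snippets above):
// Old approach: one findOne() per id (this worked reliably)
for (const id of media) {
  const doc = await db.collection(COLLECTION_MEDIA).findOne({ _id: id });
  // ... use doc
}

// New approach: a single query using $in (the one that sometimes misses
// documents inserted ~100 ms earlier); note that findOne() still returns
// at most one matching document.
const result = await db.collection(COLLECTION_MEDIA).findOne({ _id: { $in: media } });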
Edit:
My database is not sharded. Everything is stored on 1 instance.
Related
I recently stress-tested my express server with the following two queries:
db.collection.find(queryCondition).limit(1).toArray()
// and
db.collection.findOne(queryCondition)
[New Relic results screenshot]
Can someone explain why .find() shows fast transaction times for MongoDB yet slow transaction times for node.js? Then, in complete contrast, .findOne() shows slow MongoDB times but fast node.js times?
For context, my express server is on a t2.micro instance and my database is on another t2.micro instance.
Let's compare the performance of .find() and .findOne() at both the MongoDB level and the Node.js level.
MongoDB:
Here, find().limit() should emerge as the clear winner, because it only fetches a cursor (a pointer to the result of the query) rather than the data itself, and that is precisely what your observation shows.
Node.js:
Here, .find().limit() should theoretically also come out faster. However, in the New Relic screenshot you linked you're actually calling .find().limit().toArray(), which fetches an array of the matching documents rather than just the cursor, while findOne() fetches a single document (as a JS object in Node.js).
As per the MongoDB driver docs for Node.js, .find() returns a cursor immediately, so it's effectively a synchronous operation that doesn't need a .then() or await. .toArray(), on the other hand, is a method on Cursor that fetches all of the documents matching the query into an array (much like fetching the cursor yourself and collecting everything .next() returns). That can be time-consuming depending on the query, which is why it returns a promise.
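In driver terms, the difference looks roughly like this (a sketch; db.collection and queryCondition are just the names from your snippets above):
const cursor = db.collection.find(queryCondition).limit(1); // a Cursor comes back immediately; nothing to await
const docs = await cursor.toArray();                        // actually pulls the matching documents; returns a promise
const doc = await db.collection.findOne(queryCondition);    // fetches a single document directly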
In your test, what seems to be happening is this: with .findOne() you fetch exactly one document, which costs time at the MongoDB level and at least as much time in Node.js. With find() you first fetch the cursor (fast at the MongoDB level) and then tell the Node.js driver to pull the data through that cursor (time-consuming), which is why .find().limit(1).toArray() appears more expensive than findOne() in Node.js. That matches the bottom graph in your link, where the bar is almost entirely blue, the colour that represents Node.js.
I suggest you try a plain .find().limit() and check the result, but keep in mind that you won't be getting your actual data, just a cursor that's of little use until you fetch documents through it.
I hope this has been of use.
I am trying to request a large number of documents from my database (which has over 400k documents). I started with the built-in _all_docs view. I first tried this query:
http://database:port/databasename/_all_docs?limit=100&include_docs=true
No problem. Completes as expected. Now to ramp it up:
http://database:port/databasename/_all_docs?limit=1000&include_docs=true
Still fine. Took longer, more data, etc. as expected. Ramp it up again:
http://database:port/databasename/_all_docs?limit=10000&include_docs=true
The request never completes. Chrome's dev tools show Size = 5.3 MB (which seems significant), and this happens for any limit value over roughly 6500. Whether I specify 6500 or 10,000, it always shows 5.3 MB downloaded and then the request stalls.
I have also tried other combinations, such as adding "skip", and it seems that limit + skip must be < 6500 or I get the same stall.
My environment: CouchDB 1.6.1, Ubuntu 14.04.3 LTS, Azure A1 Standard
You have to pre-warm your queries; asking CouchDB for 100K or more docs in a single request and expecting to get them all back simply won't work.
When you ask for some items from a view (in your case the built-in _all_docs view), CouchDB will notice on the first read that the B-tree for the view doesn't exist yet and will build it then. Depending on how many documents you have in your database, that can take a while and puts a significant load on the database.
On every subsequent read, CouchDB checks whether documents have changed since the view was last updated and runs the changed documents through the map and reduce functions. So if you only query a view from time to time but have lots of changes in between, expect some delay on the next read.
There are two ways to handle this situation:
1. Pre-warm your view: run a cron job that issues reads so the B-tree for the view is already built (see the sketch after this list).
2. Prepare your view in advance for a particular query before inserting the data into CouchDB.
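For example, option 1 can be as simple as a small script run from cron that issues a cheap read against the view. This is only a sketch: the design document and view names are hypothetical, the host/db are the placeholders from the question, and it assumes Node 18+ so fetch is available globally.
// A tiny read is enough to make CouchDB build/update the view's B-tree
// before real traffic arrives.
async function warmView() {
  const url = 'http://database:port/databasename/_design/mydesign/_view/myview?limit=1';
  const res = await fetch(url);
  if (!res.ok) throw new Error('warm-up failed: ' + res.status);
}

warmView().catch(console.error);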
And for now, if you really want to read all your docs, don't read them all at once; page through them with skip and limit range queries instead, roughly as sketched below.
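A rough sketch of that paging (again assuming Node 18+ for the global fetch, with the host/db placeholders from the question):
const base = 'http://database:port/databasename/_all_docs';
const pageSize = 1000;

async function readAllDocs() {
  const docs = [];
  for (let skip = 0; ; skip += pageSize) {
    const res = await fetch(`${base}?include_docs=true&limit=${pageSize}&skip=${skip}`);
    const { rows } = await res.json();
    docs.push(...rows.map(r => r.doc));
    if (rows.length < pageSize) break; // last page reached
  }
  return docs;
}
// Note: large skip values get slow in CouchDB; for very deep paging,
// startkey-based paging from the last doc id of the previous page is usually faster.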
Can the size of a MongoDB document affect the performance of a find() query?
I'm running the following query on a collection in the MongoDB shell:
r.find({_id:ObjectId("5552966b380c2dbc29472755")})
The entire document is 3MB. When I run this query the operation takes about 8 seconds. The document has a "salaries" property which makes up the bulk of the document's size (about 2.9MB). So when I omit the salaries property and run the following query, it takes less than a second.
r.find({_id:ObjectId("5552966b380c2dbc29472755")},{salaries:0})
I only notice this performance difference when I run the find() query by itself. When I run find().count() there is no difference. It appears that performance degrades only when I want to fetch the entire document.
The collection is never updated (never changes in size), an index is set on _id and I've run repairDatabase() on the database. I've searched around the web but can't find a satisfactory answer to why there is a performance difference. Any insight and recommendations would be appreciated. Thanks.
I think the experiments you've just ran are an answer to your own question.
Mongo indexes the _id field by default, so document size shouldn't affect how long it takes to locate the document, but if the document is 3MB then you will likely notice the time it takes to actually download that data. I imagine that's why it takes less time when you omit some of the fields.
To get a better sense of how long your query is actually taking to run, try this:
// In the shell, explain() reports how the query was executed without
// transferring the 3MB document itself:
r.find({
  _id: ObjectId("5552966b380c2dbc29472755")
}).explain()
If salaries is the 3MB culprit and it's structured data, then to speed things up you could try (A) splitting it out into separate Mongo documents or (B) querying based on sub-properties of that document; in both cases you can build indexes to keep those queries fast.
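For example, option B could look something like this (a sketch only; the reports collection name and an employeeId field inside the salaries array are assumptions):
// Index the sub-property you query on so lookups inside salaries stay fast:
db.reports.createIndex({ "salaries.employeeId": 1 })

// Fetch just the matching salary entry instead of the whole 3MB document:
db.reports.find(
  { _id: ObjectId("5552966b380c2dbc29472755"), "salaries.employeeId": 12345 },
  { "salaries.$": 1 }  // positional projection: return only the first matching array element
)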
I have been thinking about how to make my Node.js app go faster, so I have tried querying for only some fields versus the entire document, because the MongoDB documentation says it's faster to query for only certain fields. The problem is that my results seem to contradict this; where am I going wrong? Here is the code I am using; it writes the timings to CSV so I can build a chart in LibreOffice:
http://pastebin.com/G8KRRY3n
Option A fetches the entire document.
Option B fetches only some fields.
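Roughly, the two options being compared look like this (a sketch, since the pastebin code isn't reproduced here; the collection and field names are assumptions):
// Option A: fetch the entire user document
const fullDoc = await db.collection('users').findOne({ _id: userId });

// Option B: fetch only the fields the app needs
// (newer drivers take a `projection` option; older 2.x drivers used `fields`)
const partialDoc = await db.collection('users').findOne(
  { _id: userId },
  { projection: { name: 1, email: 1 } }
);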
Here is the graph I took from it (every operation in milliseconds):
http://prntscr.com/5oofoz
I process almost 9,500 users. As you can see, for the first ~200 items processed the two are the same, but then the second option's times start to grow... I have tried switching the order of the options in case the garbage collector had something to do with it, but the results are almost the same.
Yes, the first option is faster for the first elements. So the question is: in a high-traffic web app, which option is recommended, and why? I'm a newbie when it comes to performance, so I'm pretty sure I'm doing something wrong...
I have a collection in a Mongo database to which I append logging-type information. I'm trying to figure out the most efficient/simplest way to "tail -f" it in a Meteor app: as a new document is added to the collection, it should be sent to the client, which should append it to the end of the documents it currently holds from that collection.
The client isn't going to be sent, nor keep, all of the documents in the collection; likely just the last ~100 or so.
Now, from a Mongo perspective, I don't see a way of saying "the last N documents in the collection" such that we wouldn't need to apply any sort at all. It seems like the best option available is doing natural sort descending, then a limit call, so something like what's listed in the mongo doc on $natural
db.collection.find().sort( { $natural: -1 } )
So, on the server side AFAICT the way of publishing this 'last 100 documents' Meteor collection would be something like:
Meteor.publish('logmessages', function () {
return LogMessages.find({}, { sort: { $natural: -1 }, limit: 100 });
});
Now, from a 'tail -f' perspective, this seems to have the right effect of sending the 'last 100 documents' to the client, but it does so in the wrong order (the newest document ends up at the start of the Meteor collection instead of at the end).
On the client side, this seems to mean needing to (unfortunately) reverse the collection. Now, I don't see a reverse() in the Meteor Collection docs and sorting by $natural: 1 doesn't work on the client (which seems reasonable, since there's no real Mongo context). In some cases, the messages will have timestamps within the documents and the client could sort by that to get the 'natural order' back, but that seems kind of hacky.
In any case, it feels like I'm likely missing a much simpler way to have a live 'last 100 documents inserted into the collection' collection published from Mongo through Meteor. :)
Thanks!
EDIT - looks like if I change the collection in Mongo to a capped collection, then the server could create a tailable cursor to efficiently (and quickly) get notified of new documents added to the collection. However, it's not clear to me if/how to get the server to do so through a Meteor collection.
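At the raw driver level, a tailable cursor on a capped collection looks roughly like this (a sketch only, using option names from recent versions of the native driver; how to wire this into a Meteor publication is exactly the open question):
// The collection must be capped; the cursor then stays open and yields
// new documents as they are inserted, like `tail -f`.
const cursor = db.collection('logmessages').find(
  {},
  { tailable: true, awaitData: true }
);

while (await cursor.hasNext()) {
  const doc = await cursor.next();
  // push doc to subscribers here
}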
An alternative that seems a little less efficient, but doesn't require switching to a capped collection (AFAICT), is using Smart Collections, which tails the oplog. At least that's event-driven instead of polling, and since all the operations on the source collection will be inserts, it seems like it would still be pretty efficient. Unfortunately, AFAICT I'm still left with the sorting issue, since I don't see how to define the server-side collection as 'the last 100 documents inserted'. :(
If there is a way of creating a collection in Mongo as a query of another ("materialized view" of sorts), then maybe I could create a log-last-100 "collection view" in Mongo, and then Meteor would be able to just publish/subscribe the entire pseudo-collection?
For insert-only data, $natural should get you the same results as indexing on a timestamp and sorting, so that's a good idea. The reverse thing is unfortunate; I think you have a couple of choices:
1. use $natural and do the reverse yourself
2. add a timestamp, still use $natural
3. add a timestamp, index by time, sort
#1 - For 100 items, doing the reverse client-side should be no problem even for mobile devices, and it offloads the work from the server. You can use .fetch() to convert the cursor to an array and then reverse it to maintain order, without needing timestamps. You'll be playing in plain-array land though; no more nice mini-mongo features, so do any filtering before you reverse.
#2 - This one is interesting because you don't have to add an index, but you can still use the timestamp on the client to sort the records. This gives you the benefit of staying in mini-mongo land.
#3 - Costs space in the db, but it's the most straightforward.
If you don't need the capabilities of mini-mongo (or are comfortable doing array filtering yourself) then #1 is probably best.
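A minimal sketch of option #1 on the client (template and collection names are assumptions):
Template.logViewer.helpers({
  logMessages() {
    // fetch() converts the cursor to a plain array (leaving mini-mongo),
    // so do any filtering in the cursor first, then reverse to get
    // oldest-first order for display.
    return LogMessages.find({}, { limit: 100 }).fetch().reverse();
  }
});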
Unfortunately MongoDB doesn't have views, so you can't do your log-last-100 view idea (although that would be a nice feature).
Beyond the above, keep an eye on your subscription life-cycle so users don't continually pull down log updates in the background when not viewing the log. I could see that quickly becoming a performance killer.