I am new to MongoDB and am facing an issue. I have hundreds of millions of documents in my collection and I am trying to find a single entry using the findOne({}) command. When I look for recent entries the response comes back in milliseconds, but when I try to fetch older entries, around the 600 millionth document, it takes around 2 minutes in the mongo shell and my Node server gives
{ MongoError : connection 1 to 127.0.0.1:27017 timed out }
and my Node.js server sends an empty response. Can anyone tell me what I should do to resolve this issue? Thanks in advance.
explain gives me
db.contacts.find({"phoneNumber":"9165900137"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "meanApp.contacts",
"indexFilterSet" : false,
"parsedQuery" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 321188,
"totalKeysExamined" : 0,
"totalDocsExamined" : 495587806,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 295230,
"works" : 495587808,
"advanced" : 1,
"needTime" : 495587806,
"needYield" : 0,
"saveState" : 3871779,
"restoreState" : 3871779,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 495587806
}
},
"serverInfo" : {
"host" : "li1025-15.members.linode.com",
"port" : 27017,
"version" : "3.2.16",
"gitVersion" : "056bf45128114e44c5358c7a8776fb582363e094"
},
"ok" : 1
}
As the explain output shows, the query is doing a collection scan (COLLSCAN). This means it has to scan every document in the collection to find the match, and you have about half a billion documents (totalDocsExamined is 495587806).
Try adding this index. It may take a while to build on a collection this size.
db.contacts.createIndex( { phoneNumber: 1 }, { background: true } )
Run the query again once the index has been created successfully; you should see a dramatic improvement in performance. To confirm that the index is being used, run explain again: it should no longer say COLLSCAN.
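The "run explain again" check can also be done programmatically. The following is only a sketch: planStages and usesCollectionScan are hypothetical helpers, not MongoDB functions, and the "after" plan is an assumed shape (a FETCH stage over an IXSCAN on the default index name phoneNumber_1) rather than output captured from the asker's server.

```javascript
// Hypothetical helper (not part of MongoDB): walks an explain() winning plan
// and collects the stage names, following inputStage / inputStages links.
function planStages(plan) {
  const stages = [];
  const visit = (node) => {
    if (!node || typeof node !== "object") return;
    if (node.stage) stages.push(node.stage);
    visit(node.inputStage);
    (node.inputStages || []).forEach(visit);
  };
  visit(plan);
  return stages;
}

// True when any stage of the winning plan is a full collection scan.
function usesCollectionScan(explainOutput) {
  return planStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Trimmed copy of the plan from the question.
const before = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } }
};

// Assumed shape of the plan once the phoneNumber index is used.
const after = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "phoneNumber_1" }
    }
  }
};

console.log(usesCollectionScan(before)); // true
console.log(usesCollectionScan(after));  // false
```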
I've written a back end node server for a multiplayer game I'm developing and most of the time each request takes about 20-100ms to resolve. However, sometimes (Maybe 1 out of 50 requests) I will do the same request and it will take 2000+ms to resolve.
The server is written entirely in node.js and is hosted on heroku. I am using mongoose to make the calls to the database.
Here is a screenshot of the logs. At the top you can see how queries normally behave: the request comes in at 19:03:03.68, the response is sent out at 19:03:03.73, and saving all the data finishes at 19:03:03.74. Heroku logs the request as taking 58ms, which is the desired and expected outcome.
Below that is when the issue occurs. You can see multiple requests come in from two separate clients (each client sends ~1 request per second, which is correct). However, the requests build up, and after about 2000-5000ms they all resolve quickly one after another. I've tried narrowing down the issue without much luck, but I believe it's related to the database query: multiple requests come in, but the first query to the database doesn't actually resolve until around 2300ms later. As far as I can tell, these requests are identical to the ones that resolve in 20-100ms, and the slow ones occur completely at random.
The actual code is similar to this on the server (Simplified for the sake of this question):
console.log("request received");
Game.findOne({'id': gameID}, function(err, theGame){
console.log("First Query");
// ... rest of the handler omitted in the question
});
I also opened up the mongo shell for the database to look for queries taking an excessive amount of time (>2000ms) with this code:
db.system.profile.find( {millis: {$gt : 2000} } ).sort( { ts: 1} );
Here are the slightly modified results which should include everything relevant:
{ "op" : "update", "ns" : "theDb.players", "query" :
{ "_id" : ObjectId("572b8eb242d70903005df0df")
}, "updateobj" :
{ "$set" :
{ "lastSeen" : ISODate("2016-05-05T18:19:30.761Z"), "timeElapsed" : 16
}
}, "nscanned" : 1, "nscannedObjects" : 1, "nMatched" : 1, "nModified" : 1, "fastmod" : true, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 0, "locks" :
{ "Global" :
{ "acquireCount":
{ "r" : NumberLong(2), "w" : NumberLong(2) }
}, "MMAPV1Journal" :
{ "acquireCount" :
{ "w" : NumberLong(2) }, "acquireWaitCount" :
{ "w" : NumberLong(1) }, "timeAcquiringMicros" :
{ "w" : NumberLong(7294179) }
}, "Database" :
{ "acquireCount" :
{ "w" : NumberLong(2) }
}, "Collection" :
{ "acquireCount" :
{ "W" : NumberLong(1) }
}, "oplog" :
{ "acquireCount" :
{ "w" : NumberLong(1) }
}
}, "millis" : 2298, "execStats" : {}, "ts" : ISODate("2016-05-05T18:19:33.060Z") }
Second Result:
{ "op" : "update", "ns" : "theDb.connections", "query" :
{ "_id" : ObjectId("572b8eaf42d70903005df0dd")
}, "updateobj" :
{ "$set" :
{ "internalCounter" : 3, "lastCount" : 3, "lastSeen" : ISODate("2016-05-05T18:19:30.761Z"), "playerID" : 128863276517, "sinceLast" : 0
}
}, "nscanned" : 1, "nscannedObjects" : 1, "nMatched" : 1, "nModified" : 1, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 0, "locks" :
{ "Global" :
{ "acquireCount" :
{ "r" : NumberLong(2), "w" : NumberLong(2)
}
}, "MMAPV1Journal" :
{ "acquireCount" :
{ "w" : NumberLong(2) }, "acquireWaitCount" :
{ "w" : NumberLong(1) }, "timeAcquiringMicros" :
{ "w" :NumberLong(7294149) }
}, "Database" :
{ "acquireCount" :
{ "w" : NumberLong(2) }
}, "Collection" :
{ "acquireCount" :
{ "W" : NumberLong(1) }
}, "oplog" :
{ "acquireCount" :
{ "w" : NumberLong(1) }
}
}, "millis" : 2299, "execStats" : {}, "ts" : ISODate("2016-05-05T18:19:33.061Z") }
I really need to ensure the latency for any request never exceeds 500ms, otherwise it is extremely irritating in the game itself. I'm really at a loss as to what might be causing this and how to find out more.
I'm assuming the cause for the issue is that timeAcquiringMicros is so long. I'm unsure of what is causing this though.
*Note, the client is requesting the data with just standard http requests, I’m not currently using any sockets.
Alright, I've finally solved the issue. The problem wasn't actually caused by anything that I had done. I was using the sandbox plan that mLab offers with Heroku, which had my application competing for processing time with other people who were also on the sandbox plan. Their queries were slowing down the database, causing those spikes in response times.
The solution: I had to upgrade to their shared cluster plan. Since upgrading I haven't had any irregularities in query times.
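To see how much of a profiled operation's time went to waiting on locks, as the question suspected from timeAcquiringMicros, the wait times can be summed per entry. This is a rough sketch: lockWaitMillis is a hypothetical helper, and the entry below is a trimmed copy of the first profiler result with the NumberLong values written as plain numbers.

```javascript
// Hypothetical helper: sums every timeAcquiringMicros value in a profiler
// entry's "locks" section and returns the total in milliseconds.
function lockWaitMillis(profileEntry) {
  let micros = 0;
  for (const lock of Object.values(profileEntry.locks || {})) {
    for (const value of Object.values(lock.timeAcquiringMicros || {})) {
      micros += Number(value);
    }
  }
  return micros / 1000;
}

// Trimmed copy of the first profiler entry (NumberLong written as numbers).
const entry = {
  op: "update",
  ns: "theDb.players",
  millis: 2298,
  locks: {
    Global: { acquireCount: { r: 2, w: 2 } },
    MMAPV1Journal: {
      acquireCount: { w: 2 },
      acquireWaitCount: { w: 1 },
      timeAcquiringMicros: { w: 7294179 }
    },
    Database: { acquireCount: { w: 2 } }
  }
};

console.log(lockWaitMillis(entry)); // 7294.179
```

A large lock wait on an update that touches a single document by _id points at contention from other activity on the server rather than at the query itself, which is consistent with the shared sandbox explanation above.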
I have a MongoDB with documents of the form:
{
...
"template" : "templates/Template1.html",
...
}
where template is either "templates/Template1.html", "templates/Template2.html" or "templates/Template3.html".
I'm using this query to group by template and count how many times each template is used:
var group = {
key:{'template':1},
reduce: function(curr, result){ result.count++ },
initial: { count: 0 }
};
messageModel.collection.group(group.key, null, group.initial, group.reduce, null, true, cb);
I'm getting back the correct result, but it's formatted like this:
{
"0" : {
"template" : "templates/Template1.html",
"count" : 2 },
"1" : {
"template" : "templates/Template2.html",
"count" : 2 },
"2" : {
"template" : "templates/Template3.html",
"count" : 1 }
}
I was wondering if it's possible to change the query so that it returns something like:
{
"templates/Template1.html" : { "count" : 2 },
"templates/Template2.html" : { "count" : 2 },
"templates/Template3.html" : { "count" : 1 }
}
or even:
{
"templates/Template1.html" : 2 ,
"templates/Template2.html" : 2 ,
"templates/Template3.html" : 1
}
I would rather change the query and not parse the returned object from the original query.
As mentioned by Blakes Seven in the comments you could use aggregate() instead of group() to achieve nearly your desired result.
messageModel.collection.aggregate([
{ // Group the collection by `template` and count the occurrences
$group: {
_id: "$template",
count: { $sum: 1 }
}
},
{ // Format the output
$project: {
_id: 0,
template: "$_id",
count: 1
}
},
{ // Sort the formatted output
$sort: { template: 1 }
}
]);
The output would look like this:
[
{
"template" : "templates/Template1.html",
"count" : 2 },
{
"template" : "templates/Template2.html",
"count" : 2 },
{
"template" : "templates/Template3.html",
"count" : 1 }
]
Again, as stated by Blakes in the comments the database can only output an array of objects rather than a solitary object. That would be a transformation that you would need to do outside of the database.
I think it deserves to be restated that this transformation produces an anti-pattern and should be avoided. An object key name provides the context or description for the value. Using a file location as a key name would be a fairly vague description whereas 'template' provides a bit more information about what that value represents.
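If the keyed shape is still needed by a consumer, the transformation can be done in application code after the aggregate() call returns. A minimal sketch, assuming the result array has the shape shown above; countsByTemplate is a hypothetical helper name.

```javascript
// Rows in the shape produced by the aggregate() pipeline above.
const rows = [
  { template: "templates/Template1.html", count: 2 },
  { template: "templates/Template2.html", count: 2 },
  { template: "templates/Template3.html", count: 1 }
];

// Hypothetical helper: folds the array into one object keyed by template.
function countsByTemplate(results) {
  return results.reduce((acc, row) => {
    acc[row.template] = row.count;
    return acc;
  }, {});
}

console.log(countsByTemplate(rows));
```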
I have a collection in which each document has a sub-document array of more than 40K records.
My aggregate query takes about 300 seconds. I tried optimizing it with compound and multikey indexes, which brought it down to 180 seconds.
I still need a shorter query execution time.
Here is a document from my collection:
{
"_id" : ObjectId("545b32cc7e9b99112e7ddd97"),
"grp_id" : 654,
"user_id" : 2,
"mod_on" : ISODate("2014-11-06T08:35:40.857Z"),
"crtd_on" : ISODate("2014-11-06T08:35:24.791Z"),
"uploadTp" : 0,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"total_cnt" : 25,
"succ_cnt" : 25,
"fail_cnt" : 0
}
and the following is my query:
db.member_id_transactions.aggregate([ { '$match':
{ id_url: { '$elemMatch': { mid: 'xyz12794' } } } },
{ '$unwind': '$id_url' },
{ '$match': { grp_id: 654, 'id_url.mid': 'xyz12794' } } ])
Has anyone faced the same issue?
Here's the output of the aggregate query with the explain option:
{
"result" : [
{
"_id" : ObjectId("546342467e6d1f4951b56285"),
"grp_id" : 685,
"user_id" : 2,
"mod_on" : ISODate("2014-11-12T11:24:01.336Z"),
"crtd_on" : ISODate("2014-11-12T11:19:34.682Z"),
"uploadTp" : 1,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"__v" : 0,
"total_cnt" : 21406,
"succ_cnt" : 21402,
"fail_cnt" : 4
}
],
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("545c8d37ab9cc679383a1b1b")
}
}
One way to reduce the number of documents that the later stages have to process is to include the grp_id field in the first $match stage.
db.member_id_transactions.aggregate([
{$match:{ "id_url.mid": 'xyz12794',"grp_id": 654 } },
{$unwind: "$id_url" },
{$match: { "id_url.mid": "xyz12794" } }
])
See how the performance is now. Adding grp_id to the index should give a better response time.
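Why the earlier grp_id filter helps can be illustrated with a small in-memory imitation of the stages. This is only a sketch of the data flow, not how MongoDB executes a pipeline; the three sample documents and the helpers hasMid, unwind and finalFilter are made up for illustration.

```javascript
// In-memory stand-ins for the pipeline stages (illustration only).
const docs = [
  { grp_id: 654, id_url: [{ mid: "xyz12793" }, { mid: "xyz12794" }] },
  { grp_id: 685, id_url: [{ mid: "xyz12794" }, { mid: "xyz12795" }, { mid: "xyz12796" }] },
  { grp_id: 654, id_url: [{ mid: "xyz12795" }] }
];

// Imitates matching documents whose id_url array contains the given mid.
const hasMid = (mid) => (doc) => doc.id_url.some((e) => e.mid === mid);

// Imitates $unwind: one output document per array element.
const unwind = (list) =>
  list.flatMap((doc) => doc.id_url.map((e) => ({ ...doc, id_url: e })));

// Original order: match on mid only, then unwind.
const unwoundLate = unwind(docs.filter(hasMid("xyz12794")));

// Suggested order: match on mid and grp_id first, then unwind.
const unwoundEarly = unwind(
  docs.filter((d) => d.grp_id === 654 && hasMid("xyz12794")(d))
);

// Final filter applied after the unwind in both variants.
const finalFilter = (d) => d.grp_id === 654 && d.id_url.mid === "xyz12794";

console.log(unwoundLate.length, unwoundEarly.length); // 5 2
console.log(unwoundLate.filter(finalFilter).length,
            unwoundEarly.filter(finalFilter).length); // 1 1
```

Both orders return the same final document, but the early filter unwinds fewer array elements, which matters when each array holds tens of thousands of entries.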
Although the above aggregation query works, it is unnecessary. Since you are not altering the structure of the document, and you expect only one element in the array to match the filter condition, you could just use a simple find with a projection.
db.member_id_transactions.find(
{ "id_url.mid": "xyz12794","grp_id": 654 },
{"_id":0,"grp_id":1,"id_url":{$elemMatch:{"mid":"xyz12794"}},
"user_id":1,"mod_on":1,"crtd_on":1,"uploadTp":1,
"tp":1,"status":1,"incl":1,"total_cnt":1,
"succ_cnt":1,"fail_cnt":1
}
)
In my collection each document has 2 dates, modified and sync. I would like to find the documents where modified > sync, or where sync does not exist.
I tried
{'modified': { $gt : 'sync' }}
but it's not showing what I expected. Any ideas?
Thanks
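You cannot compare a field with the value of another field using normal query matching. However, you can do this with the aggregation framework, by computing the comparison in a $project stage and matching on the result:
db.so.aggregate( [
{ $match: …your normal other query… },
{ $project: { modified: 1, sync: 1, gt: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] } } },
{ $match: { gt: 1 } }
] );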
I put …your normal other query… in there as you can make that bit use the index. So if you want to do this for only documents where the name field is charles you can do:
db.so.ensureIndex( { name: 1 } );
db.so.aggregate( [
{ $match: { name: 'charles' } },
{ $project: {
modified: 1,
sync: 1,
name: 1,
eq: { $cond: [ { $gt: [ '$modified', '$sync' ] }, 1, 0 ] }
} },
{ $match: { eq: 1 } }
] );
With the input:
{ "_id" : ObjectId("520276459bf0f0f3a6e4589c"), "modified" : 73845345, "sync" : 73234 }
{ "_id" : ObjectId("5202764f9bf0f0f3a6e4589d"), "modified" : 4, "sync" : 4 }
{ "_id" : ObjectId("5202765b9bf0f0f3a6e4589e"), "modified" : 4, "sync" : 4, "name" : "charles" }
{ "_id" : ObjectId("5202765e9bf0f0f3a6e4589f"), "modified" : 4, "sync" : 45, "name" : "charles" }
{ "_id" : ObjectId("520276949bf0f0f3a6e458a1"), "modified" : 46, "sync" : 45, "name" : "charles" }
This returns:
{
"result" : [
{
"_id" : ObjectId("520276949bf0f0f3a6e458a1"),
"modified" : 46,
"sync" : 45,
"name" : "charles",
"eq" : 1
}
],
"ok" : 1
}
If you want any more fields, you need to add them in the $project.
For MongoDB 3.6 and newer:
The $expr operator allows the use of aggregation expressions within the query language, thus you can do the following:
db.test.find({ "$expr": { "$gt": ["$modified", "$sync"] } })
or use the aggregation framework with a $match stage:
db.test.aggregate([
{ "$match": { "$expr": { "$gt": ["$modified", "$sync"] } } }
])
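The question also asks for documents where sync does not exist. One way to cover that case, sketched here under the assumption of MongoDB 3.6+ (for $expr), is to combine the comparison with an $exists check in an $or. The needsSync function is a hypothetical plain-JavaScript predicate with the same intent, included only so the logic can be checked locally; the sample documents are made up.

```javascript
// Filter for MongoDB 3.6+: sync is absent, or modified is greater than sync.
const filter = {
  $or: [
    { sync: { $exists: false } },
    { $expr: { $gt: ["$modified", "$sync"] } }
  ]
};

// Plain-JavaScript predicate with the same intent, for checking locally.
function needsSync(doc) {
  return !("sync" in doc) || doc.modified > doc.sync;
}

const sample = [
  { modified: 73845345, sync: 73234 },
  { modified: 4, sync: 4 },
  { modified: 4, sync: 45 },
  { modified: 46 }
];

console.log(sample.filter(needsSync).length); // 2
// In the shell or a driver, the filter would be passed as db.test.find(filter).
```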
For MongoDB 3.0+:
You can also use the aggregation framework with the $redact pipeline stage, which lets you evaluate the logical condition with the $cond operator and uses the system variables $$KEEP to keep a document where the condition is true and $$PRUNE to remove a document where it is false.
Consider running the following aggregate operation which demonstrates the above concept:
db.test.aggregate([
{ "$redact": {
"$cond": [
{ "$gt": ["$modified", "$sync"] },
"$$KEEP",
"$$PRUNE"
]
} }
])
This operation is similar to a $project stage that selects the fields and adds a new field holding the result of the logical condition, followed by a $match, except that $redact does it in a single pipeline stage, which is more efficient.
Simply
db.collection.find({$where:"this.modified>this.sync"})
Example
Kobkrits-MacBook-Pro-2:~ kobkrit$ mongo
MongoDB shell version: 3.2.3
connecting to: test
> db.time.insert({d1:new Date(), d2: new Date(new Date().getTime()+10000)})
WriteResult({ "nInserted" : 1 })
> db.time.find()
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1<this.d2"})
{ "_id" : ObjectId("577a619493653ac93093883f"), "d1" : ISODate("2016-07-04T13:16:04.167Z"), "d2" : ISODate("2016-07-04T13:16:14.167Z") }
> db.time.find({$where:"this.d1>this.d2"})
> db.time.find({$where:"this.d1==this.d2"})
>
Use JavaScript: iterate over the documents with forEach and compare the dates after converting them with toDateString().
db.ledgers.find({}).forEach(function(item){
if(item.fromdate.toDateString() == item.todate.toDateString())
{
printjson(item)
}
})
Right now your query is trying to return all results such that the modified field is greater than the word 'sync'. Try getting rid of the quotes around sync and see if that fixes anything. Otherwise, I did a little research and found this question. What you're trying to do just might not be possible in a single query, but you should be able to manipulate your data once you pull everything from the database.
To fix this issue without aggregation change your query to this:
{'modified': { $gt : ISODate(this.sync) }}