i am new in mongodb and i am facing an issue, i have around millions of documents in my collectionand i am trying to find single entry using findOne({}) command and when i am trying to find recent entries then response comes in miliseconds but when i am trying to fetch older entries around 600 millionth document then it takes around 2 minutes on mongo shell and my node server gives
{ MongoErro : connection 1 to 127.0.0.1:27017 timed out }
and my nodejs server sends an empty response. can any one tell me what should i do to resolve this issueThanks in advance
explain gives me
db.contacts.find({"phoneNumber":"9165900137"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "meanApp.contacts",
"indexFilterSet" : false,
"parsedQuery" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 321188,
"totalKeysExamined" : 0,
"totalDocsExamined" : 495587806,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"phoneNumber" : {
"$eq" : "9165900137"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 295230,
"works" : 495587808,
"advanced" : 1,
"needTime" : 495587806,
"needYield" : 0,
"saveState" : 3871779,
"restoreState" : 3871779,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 495587806
}
},
"serverInfo" : {
"host" : "li1025-15.members.linode.com",
"port" : 27017,
"version" : "3.2.16",
"gitVersion" : "056bf45128114e44c5358c7a8776fb582363e094"
},
"ok" : 1
}
As indicated in the explain plan results, the current query is doing Collection Scan. This means it has to scan every document in collection to produce the match and you have got about half a billion documents.
Try adding this index and it might take a bit to create it.
db.contacts.createIndex( { phoneNumber: 1 }, { background: true } )
Run the query once the index creation is successful, you must see a dramatic improvement in performance. To be certain whether index got picked up, try explain again and it should no longer say COLLSCAN.
Related
I have data in a collection ex:"jobs". I am trying to copy specific data from "jobs" after every 2 hours to a new collection (which may not exist initially) and also add a new key to the copied data.
I have been trying with this query to copy the data:
db.getCollection("jobs").aggregate([{ $match: { "job_name": "UploadFile", "created_datetime" : {"$gte":"2021-08-18 12:00:00"} } },{"$merge":{into: {coll : "reports"}}}])
But after this, the count in "reports" collection is 0. Also, how can I update the documents (with an extract key "report_name") without using an extra updateMany() query?
The data in jobs collection is as shown:
{
"_id" : ObjectId("60fa8e8283dc22799134dc6f"),
"job_id" : "408a5654-9a89-4c15-82b4-b0dc894b19d7",
"job_name" : "UploadFile",
"data" : {
"path" : "share://LOCALNAS/Screenshot from 2021-07-23 10-34-34.png",
"file_name" : "Screenshot from 2021-07-23 10-34-34.png",
"parent_path" : "share://LOCALNAS",
"size" : 97710,
"md5sum" : "",
"file_uid" : "c4411f10-a745-48d0-a55d-164707b7d6c2",
"version_id" : "c3dfd31a-80ba-4de0-9115-2d9b778bcf02",
"session_id" : "c4411f10-a745-48d0-a55d-164707b7d6c2",
"resource_name" : "Screenshot from 2021-07-23 10-34-34.png",
"metadata" : {
"metadata" : {
"description" : "",
"tag_ids" : [ ]
},
"category_id" : "60eed9ea33c690a0dfc89b41",
"custom_metadata" : [ ]
},
"upload_token" : "upload_token_c5043927484e",
"upload_url" : "/mnt/share_LOCALNAS",
"vfs_action_handler_id" : "91be4282a9ad5067642cdadb75278230",
"element_type" : "file"
},
"user_id" : "60f6c507d4ba6ee28aee5723",
"node_id" : "syeda",
"state" : "COMPLETED",
"priority" : 2,
"resource_name" : "Screenshot from 2021-07-23 10-34-34.png",
"group_id" : "upload_group_0babf8b7ce0b",
"status_info" : {
"progress" : 100,
"status_msg" : "Upload Completed."
},
"error_code" : "",
"error_message" : "",
"created_datetime" : ISODate("2021-07-23T15:10:18.506Z"),
"modified_datetime" : ISODate("2021-07-23T15:10:18.506Z"),
"schema_version" : "1.0.0",
}
Your $match stage contains a condition which takes created_datetime as string while in your sample data it is an ISODate. Such condtion won't return any document, try:
{
$match: {
"job_name": "UploadFile",
"created_datetime": {
"$gte": ISODate("2021-07-01T12:00:00.000Z")
}
}
}
Mongo Playground
I have this collection in MongoDB:
{
"_id" : ObjectId("5df013b10a88910018267a89"),
"StockNo" : "33598",
"Description" : "some description",
"detections" : [
{
"lastDetectedOn" : ISODate("2020-01-29T04:36:41.191+0000"),
"lastDetectedBy" : "comp-t",
"_id" : ObjectId("5e3135f68c9e930017de8aec")
},
{
"lastDetectedOn" : ISODate("2019-12-21T18:12:06.571+0000"),
"lastDetectedBy" : "comp-n",
"_id" : ObjectId("5e3135f68c9e930017de8ae9")
},
{
"lastDetectedOn" : ISODate("2020-01-29T07:36:06.910+0000"),
"lastDetectedBy" : "comp-a",
"_id" : ObjectId("5e3135f68c9e930017de8ae7")
}
],
"createdAt" : ISODate("2019-12-10T21:52:49.788+0000"),
"updatedAt" : ISODate("2020-01-29T07:36:22.950+0000"),
"__v" : NumberInt(0)
}
I want to search by StockNo and get the name of the computer that last detected it (lastDetectedBy) only if lastDetectedOn was in the last 5 minutes with Mongoose in node.js with Express.
I also have this collection:
{
"_id" : ObjectId("5df113b10d35670018267a89"),
"InvoiceNo" : "1",
"InvoiceDate" : ISODate("2020-01-14T02:18:11.196+0000"),
"InvoiceContact : "",
"isActive" : true
},
{
"_id" : ObjectId("5df013c90a88910018267a8a"),
"InvoiceNo" : "2",
"InvoiceDate" : ISODate("2020-01-14T02:18:44.279+0000"),
"InvoiceContact : "Bob Smith",
"isActive" : true
},
{
"_id" : ObjectId("5e3096bb8c9e930017dc6e20"),
"InvoiceNo" : "3",
"InvoiceDate" : ISODate("2020-01-14T02:19:50.155+0000"),
"InvoiceContact : "",
"isActive" : true
}
And I want to update all the documents with empty InvoiceContact which has been issued in the last 30 seconds (or any date range between now and sometime in the past) with isActive equals true to isActive equals false. So for example, the first record has been issued in the last 30 seconds without InvoiceContact and isActive is true so this must be updated but the next two records will remain untouched for different reasons, the second record has InvoiceContact and the third record is out of range.
First Part
var mins5 = new Date(ISODate() - 1000* 60 * 5 )
db.getCollection('user').find({$and:[
{ "StockNo":"33598"},
{"detections.lastDetectedOn" : { $gte : mins5 }}
]})
.map(function(list){
var results = [];
list.detections.forEach(function (detections){
if(detections.lastDetectedOn > mins5){
results.push(detections.lastDetectedBy);
}
})
return results;
});
Second Part could be solved by a similar query using update instead of find.
If I have these objects :
{
"_id" : ObjectId("5caf2c1642e3731464c2c79d"),
"requested" : [],
"roomNo" : "E0-1-09",
"capacity" : 40,
"venueType" : "LR(M)",
"seatingType" : "TB",
"slotStart" : "8:30AM",
"slotEnd" : "9:50AM",
"__v" : 0
}
/* 2 */
{
"_id" : ObjectId("5caf2deb4a7f5222305b55d5"),
"requested" : [],
"roomNo" : "E0-1-09",
"capacity" : 40,
"venueType" : "LR(M)",
"seatingType" : "TB",
"slotStart" : "10:00AM",
"slotEnd" : "11:20AM",
"__v" : 0
}
is it possible to get something like this using aggregate in mongodb?
[{ roomNo: "E0-1-09" , availability : [{slotStart : "8:30AM", slotEnd: "9:50AM"} ,
{slotStart: "10:00AM", slotEnd : "11:20AM"}]
what im using currently:
db.getDB().collection(collection).aggregate([
{ $group: {_id:{roomNo: "$roomNo", availability :[{slotStart:"$slotStart", slotEnd:"$slotEnd"}]}}}
])
actually getting it twice like so :
[{ roomNo: "E0-1-09" , availability : [{slotStart : "8:30AM", slotEnd: "9:50AM"}]
[{ roomNo: "E0-1-09" , availability : [{slotStart: "10:00AM", slotEnd : "11:20AM"}]
You have to use $push accumulator
db.collection.aggregate([
{ "$group": {
"_id": "$roomNo",
"availability": {
"$push": {
"slotEnd": "$slotEnd",
"slotStart": "$slotStart"
}
}
}}
])
I've written a back end node server for a multiplayer game I'm developing and most of the time each request takes about 20-100ms to resolve. However, sometimes (Maybe 1 out of 50 requests) I will do the same request and it will take 2000+ms to resolve.
The server is written entirely in node.js and is hosted on heroku. I am using mongoose to make the calls to the database.
Here is a screenshot of the logs, at the top you can see how queries normally function. The request comes in at 19:03:03.68 and the response is sent out at 19:03:03.73, saving all the data finishes at at 19:03:03.74. Heroku logs the request as taking 58ms which is the desired and expect outcome.
Below that is when the issue occurs. You can see multiple requests come in from two separate clients (Each client sends ~1 request per second which is correct) However the requests build up and after about 2000-5000ms they will all quickly resolve one after another. I’ve tried narrowing down the issue without much luck, but I believe it’s related to when I query the database as you can see multiple requests come in but the first query to the database doesn’t actually resolve until around 2300ms later. As far as I can tell these requests are identical to the ones that resolve in 20-100ms and occur completely at random.
The actual code is similar to this on the server (Simplified for the sake of this question):
console.log (“request received”);
Game.findOne({‘id’: gameID}, function(err, theGame){
console.log("First Query");
I also opened up the mongo shell for the database to look for queries taking an excessive amount of time (>2000ms) with this code:
db.system.profile.find( {millis: {$gt : 2000} } ).sort( { ts: 1} );
Here are the slightly modified results which should include everything relevant:
{ "op" : "update", "ns" : "theDb.players", "query" :
{ "_id" : ObjectId("572b8eb242d70903005df0df")
}, "updateobj" :
{ "$set" :
{ "lastSeen" : ISODate("2016-05-05T18:19:30.761Z"), "timeElapsed" : 16
}
}, "nscanned" : 1, "nscannedObjects" : 1, "nMatched" : 1, "nModified" : 1, "fastmod" : true, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 0, "locks" :
{ "Global" :
{ "acquireCount":
{ "r" : NumberLong(2), "w" : NumberLong(2) }
}, "MMAPV1Journal" :
{ "acquireCount" :
{ "w" : NumberLong(2) }, "acquireWaitCount" :
{ "w" : NumberLong(1) }, "timeAcquiringMicros" :
{ "w" : NumberLong(7294179) }
}, "Database" :
{ "acquireCount" :
{ "w" : NumberLong(2) }
}, "Collection" :
{ "acquireCount" :
{ "W" : NumberLong(1) }
}, "oplog" :
{ "acquireCount" :
{ "w" : NumberLong(1) }
}
}, "milli" : 2298, "execStats" : {}, "ts" : ISODate("2016-05-05T18:19:33.060Z")
Second Result:
{ "op" : "update", "ns" : "theDb.connections", "query" :
{ "_id" : ObjectId("572b8eaf42d70903005df0dd")
}, "updateobj" :
{ "$set" :
{ "internalCounter" : 3, "lastCount" : 3, "lastSeen" : ISODate("2016-05-05T18:19:30.761Z"), "playerID" : 128863276517, "sinceLast" : 0
}
}, "nscanned" : 1, "nscannedObjects" : 1, "nMatched" : 1, "nModified" : 1, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 0, "locks" :
{ "Global" :
{ "acquireCount" :
{ "r" : NumberLong(2), "w" : NumberLong(2)
}
}, "MMAPV1Journal" :
{ "acquireCount" :
{ "w" : NumberLong(2) }, "acquireWaitCount" :
{ "w" : NumberLong(1) }, "timeAcquiringMicros" :
{ "w" :NumberLong(7294149) }
}, "Database" :
{ "acquireCount" :
{ "w" : NumberLong(2) }
}, "Collection" :
{ "acquireCount" :
{ "W" : NumberLong(1) }
}, "oplog" :
{ "acquireCount" :
{ "w" : NumberLong(1) }
}
}, "millis" : 2299, "execStats" : {},"ts" : ISODate("2016-05-05T18:19:33.061Z")
I really need to ensure the latency for any request never exceeds 500ms otherwise it extremely irritating in the game itself. I’m really at a loss for what might be causing this and how to figure out more.
I'm assuming the cause for the issue is that timeAcquiringMicros is so long. I'm unsure of what is causing this though.
*Note, the client is requesting the data with just standard http requests, I’m not currently using any sockets.
Alright, I've finally solved the issue. The problem wasn't actually connected to anything that I had done. I was using the sandbox plan that mlab offers in connection to heroku which had my application competing for processing time with other people also using the sandbox plan. Their queries were slowing down the database causing those spikes in response times.
The solution: I had to upgrade to their shared cluster plan. Since upgrading I haven't had any irregularities in query times.
I am trying to remove the lowest homework score.
I tried this,
var a = db.students.find({"scores.type":"homework"}, {"scores.$":1}).sort({"scores.score":1})
but how can I remove this set of data?
I have 200 pieces of similar data below.
{
"_id" : 148,
"name" : "Carli Belvins",
"scores" : [
{
"type" : "exam",
"score" : 84.4361816750119
},
{
"type" : "quiz",
"score" : 1.702113040528119
},
{
"type" : "homework",
"score" : 22.47397850465176
},
{
"type" : "homework",
"score" : 88.48032660881387
}
]
}
you are trying to remove an element but the statement you provided is just to find it.
Use db.students.remove(<query>) instead. Full documentation here