MongoDB locking database for a single document - is it optimizable? - multithreading

I've got a mongo instance that serves dictionary knowledge to other services that concurrently read/update the collection. I've noticed (via the slow query log) that some queries take far longer than they should, because they lock the database and other queries queue up behind them. There are multiple indexes and a compound index on the collection, so each query only hits one document. There are approx. 1 million records in the collection.
I am bewildered as to why mongo needs to lock the global, database AND collection levels. The modification only takes place on one document within one collection of one database.
Since I am using WiredTiger, I would have assumed the lock would only be on the document itself. How can I optimize the performance?
I don't care if concurrent reads get an "old" version, or if a concurrent write tries to create the same document twice (upsert), as the unique index would just throw an error and let my service handle it. Are there other optimizations I could consider?
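For context, the duplicate-key handling I have in mind looks roughly like this (an illustrative mongoose sketch; the Word model name is a placeholder mirroring the query below):
try {
  // upsert the word document; the unique { name, lang } index guards against duplicates
  await Word.findOneAndUpdate(
    { lang: 'en', name: 'cat' },
    { $inc: { count: 2 }, $setOnInsert: { __v: 0 } },
    { new: true, upsert: true }
  );
} catch (err) {
  if (err.code === 11000) {
    // duplicate key from a concurrent upsert: retry or ignore
  } else {
    throw err;
  }
}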
The slow log message I get is the following (formatted for readability):
command myDB.words
command: findAndModify {
findandmodify: "words",
query: { lang: "en", name: "cat" },
new: true,
remove: false,
upsert: true,
fields: {},
update: {
$setOnInsert: { __v: 0 },
$inc: { count: 2 },
$addToSet: {
type: { $each: [ 0 ] },
occured: ObjectId('....')
},
$set: { lang: "en", name: "cat" }
},
writeConcern: { w: 1 }
}
planSummary:
IXSCAN { name: 1, lang: 1 }
update: { .... }
keysExamined:1
docsExamined:1
nMatched:1
nModified:1
keysInserted:1
numYields:0
reslen:2649211
locks:{
Global: { acquireCount: { r: 1, w: 1 } },
Database: { acquireCount: { w: 1 } },
Collection: { acquireCount: { w: 1 } }
}
protocol:op_query
754ms
The following indexes exist:
indexes on lang and name
compound index { name:1, lang:1 }
If it matters, I am using mongoose, and the services connecting to MongoDB are running on Node 8.4. MongoDB is 3.4.7.

Related

MongoDB Remove object key based on value

In my application, I have a MongoDB document that is updated with the $inc operation to increase/decrease the numbers in the appliesTo object. Here is a sample object:
{
name: "test-document",
appliesTo: {
profiles: {
Profile1: 3,
Profile2: 1
},
tags: {
Tag1: 7,
Tag2: 1
}
}
}
After running the following command
await db.items.updateOne({name: "test-document"}, {$inc: {'appliesTo.profiles.Profile2': -1}})
my document will be changed to
{
name: "test-document",
appliesTo: {
profiles: {
Profile1: 3,
Profile2: 0
},
tags: {
Tag1: 7,
Tag2: 1
}
}
}
I'm struggling to write a query that will remove all keys whose values are 0. The only solution I currently have is to iterate over each key and update it with the $unset command, but that is not an atomic operation.
Is there a smarter way to handle it in one query?
There is no way to do both operations in a single regular update query. Starting from MongoDB 4.2 you can try an update with an aggregation pipeline:
$cond checks whether the key field's value is greater than 1; if so it applies the $add operation, otherwise it removes the field with the $$REMOVE operator.
await db.items.updateOne(
{ name: "test-document" },
[{
$set: {
"appliesTo.profiles.Profile2": {
$cond: [
{ $gt: ["$appliesTo.profiles.Profile2", 1] },
{ $add: ["$appliesTo.profiles.Profile2", -1] },
"$$REMOVE"
]
}
}
}]
)
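If the goal is to drop every key that has reached 0, not just the one that was decremented, the same pipeline-update idea can be extended with $objectToArray / $filter / $arrayToObject. A sketch, assuming the document shape from the question:
await db.items.updateOne(
  { name: "test-document" },
  [{
    $set: {
      // rebuild appliesTo.profiles, keeping only entries whose value is > 0
      "appliesTo.profiles": {
        $arrayToObject: {
          $filter: {
            input: { $objectToArray: "$appliesTo.profiles" },
            as: "kv",
            cond: { $gt: ["$$kv.v", 0] }
          }
        }
      }
    }
  }]
)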

How can I optimize my MongoDB Upsert statement?

A decision was made to switch our database from SQL to NoSQL, and I have a few questions on best practices and whether my current implementation could be improved.
My current SQL implementation for upserting player data after a game:
let template = Players.map(
(player) =>
`(
${player.Rank},"${player.Player_ID}","${player.Player}",${player.Score},${tpp},1
)`,
).join(',');
let stmt = `INSERT INTO playerStats (Rank, Player_ID, Player, Score, TPP, Games_Played)
VALUES ${template}
ON CONFLICT(Player_ID) DO UPDATE
SET Score = Score+excluded.Score,
Games_Played=Games_Played+1,
TPP=TPP+excluded.TPP`;
db.run(stmt, function (upsert_error) { ...
The expected behaviour is to update existing players by checking whether the Player_ID already exists; if so, update their score among other things, otherwise insert a new player.
Mongo Implementation
const players = [
{ name: 'George', score: 10, id: 'g65873' },
{ name: 'Wayne', score: 100, id: 'g63853' },
{ name: 'Jhonny', score: 500, id: 'b1234' },
{ name: 'David', score: 3, id: 'a5678' },
{ name: 'Dallas', score: 333333, id: 'a98234' },
];
const db = client.db(dbName);
const results = players.map((player) => {
// updateOne(query, update, options)
db.collection('Players')
.updateOne(
{ Player_Name: player.name },
{
$setOnInsert: { Player_Name: player.name, id: player.id },
$inc: { Score: player.score },
},
{ upsert: true, multi: true },
);
});
Is there a better way in mongo to implement this? I tried using updateMany and bulkUpdate and I didn't get the results I expected.
Are there any tips, tricks, or resources aside from the mongo.db that you would recommend for those moving from SQL to noSQL?
Thanks again!
Your approach is fine. However, there are a few flaws:
The updateOne command updates exactly one document, as the name implies, so multi: true is redundant.
Field names are case-sensitive (unlike in most SQL databases). It should be $inc: { score: player.score }, not "Score".
The field Player_Name does not exist, so the query will never find any document to update.
So, your command should be like this:
db.collection('Players').updateOne(
{ name: player.name }, //or { id: player.id } ?
{
$setOnInsert: { name: player.name, id: player.id },
$inc: { score: player.score },
},
{ upsert: true }
)
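If you also want to avoid one round trip per player, the same upserts can be batched; a sketch using the driver's bulkWrite (field names follow the corrected command above):
const ops = players.map((player) => ({
  updateOne: {
    filter: { name: player.name }, // or { id: player.id }
    update: {
      $setOnInsert: { name: player.name, id: player.id },
      $inc: { score: player.score },
    },
    upsert: true,
  },
}));
// one request instead of players.length separate updateOne calls
const result = await db.collection('Players').bulkWrite(ops);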
In my experience, moving from SQL to NoSQL is harder if you try to translate the SQL statement you have in your mind into a NoSQL command one-to-one. For me it worked better to set the SQL idea aside and try to understand and develop the NoSQL command from scratch.
Of course, when you do your first find, delete, insert, or update you will see many analogies to SQL, but at the latest when you reach the aggregation framework you will get lost if you try to translate one into the other.

Push if not present or update a nested array mongoose [duplicate]

I have documents that look something like this, with a unique index on bars.name:
{ name: 'foo', bars: [ { name: 'qux', somefield: 1 } ] }
I want to either update the sub-document matching { name: 'foo', 'bars.name': 'qux' } with $set: { 'bars.$.somefield': 2 }, or create a new sub-document { name: 'qux', somefield: 2 } under { name: 'foo' }.
Is it possible to do this using a single query with upsert, or will I have to issue two separate ones?
Related: 'upsert' in an embedded document (it suggests changing the schema to have the sub-document identifier as the key, but this is from two years ago and I'm wondering if there are better solutions now.)
No, there isn't really a better solution to this, so perhaps an explanation is in order.
Suppose you have a document in place with the structure you show:
{
"name": "foo",
"bars": [{
"name": "qux",
"somefield": 1
}]
}
If you do an update like this:
db.foo.update(
{ "name": "foo", "bars.name": "qux" },
{ "$set": { "bars.$.somefield": 2 } },
{ "upsert": true }
)
Then all is fine because a matching document was found. But if you change the value of "bars.name":
db.foo.update(
{ "name": "foo", "bars.name": "xyz" },
{ "$set": { "bars.$.somefield": 2 } },
{ "upsert": true }
)
Then you will get a failure. The only thing that has really changed here is that in MongoDB 2.6 and above the error is a little more succinct:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 16836,
"errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: bars.$.somefield"
}
})
That is better in some ways, but you really do not want to "upsert" anyway. What you want to do is add the element to the array where the "name" does not currently exist.
So what you really want is the "result" from the update attempt without the "upsert" flag to see if any documents were affected:
db.foo.update(
{ "name": "foo", "bars.name": "xyz" },
{ "$set": { "bars.$.somefield": 2 } }
)
Yielding in response:
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
So when the number of modified documents is 0, you know you need to issue the following update:
db.foo.update(
{ "name": "foo" },
{ "$push": { "bars": {
"name": "xyz",
"somefield": 2
}}}
)
There really is no other way to do exactly what you want. Since additions to the array are not strictly a "set" type of operation, you cannot use $addToSet combined with the "bulk update" functionality here in order to "cascade" your update requests.
In this case it seems like you need to check the result, or otherwise accept reading the whole document and checking whether to update or insert a new array element in code.
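Put together in the node driver, that check-then-push flow looks roughly like this (a sketch; the collection handle is assumed):
// Try the positional update first; fall back to $push only when nothing matched.
const res = await db.collection('foo').updateOne(
  { name: 'foo', 'bars.name': 'xyz' },
  { $set: { 'bars.$.somefield': 2 } }
);
if (res.matchedCount === 0) {
  await db.collection('foo').updateOne(
    { name: 'foo' },
    { $push: { bars: { name: 'xyz', somefield: 2 } } }
  );
}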
If you don't mind changing the schema a bit and having a structure like so:
{ "name": "foo",
  "bars": {
    "qux": { "somefield": 1 },
    "xyz": { "somefield": 2 }
  }
}
You can perform your operations in one go.
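With that keyed layout, a single upsert-style update covers both the "update existing" and "add new" cases; a sketch:
db.foo.update(
  { "name": "foo" },
  // creates bars.xyz if it is missing, otherwise just updates somefield
  { "$set": { "bars.xyz.somefield": 2 } },
  { "upsert": true }
)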
Reiterating 'upsert' in an embedded document for completeness
I was digging for the same feature, and found that in version 4.2 or above, MongoDB provides a new feature called Update with aggregation pipeline.
This feature, if used with some other techniques, makes it possible to achieve an upsert-subdocument operation with a single query.
It's a very verbose query, but I believe that if you know you won't have too many records in the subCollection, it's viable. Here's an example of how to achieve this:
const documentQuery = { _id: '123' }
const subDocumentToUpsert = { name: 'xyz', id: '1' }
collection.update(documentQuery, [
{
$set: {
sub_documents: {
$cond: {
if: { $not: ['$sub_documents'] },
then: [subDocumentToUpsert],
else: {
$cond: {
if: { $in: [subDocumentToUpsert.id, '$sub_documents.id'] },
then: {
$map: {
input: '$sub_documents',
as: 'sub_document',
in: {
$cond: {
if: { $eq: ['$$sub_document.id', subDocumentToUpsert.id] },
then: subDocumentToUpsert,
else: '$$sub_document',
},
},
},
},
else: { $concatArrays: ['$sub_documents', [subDocumentToUpsert]] },
},
},
},
},
},
},
])
There's a way to do it in two queries, but it will still work in a bulkWrite.
This is relevant because in my case not being able to batch it is the biggest hangup. With this solution, you don't need to collect the result of the first query, which allows you to do bulk operations if you need to.
Here are the two successive queries to run for your example:
// Update subdocument if existing
collection.updateMany({
name: 'foo', 'bars.name': 'qux'
}, {
$set: {
'bars.$.somefield': 2
}
})
// Insert subdocument otherwise
collection.updateMany({
name: 'foo', 'bars.name': { $ne: 'qux' }
}, {
$push: {
bars: {
somefield: 2, name: 'qux'
}
}
})
This also has the added benefit of not having corrupted data / race conditions if multiple applications are writing to the database concurrently. You won't risk ending up with two bars: {somefield: 2, name: 'qux'} subdocuments in your document if two applications run the same queries at the same time.
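In the node driver both statements can go into a single batch; a sketch using bulkWrite (the second filter uses $ne, matching the corrected query above):
await collection.bulkWrite([
  {
    updateMany: {
      filter: { name: 'foo', 'bars.name': 'qux' },
      update: { $set: { 'bars.$.somefield': 2 } }
    }
  },
  {
    updateMany: {
      filter: { name: 'foo', 'bars.name': { $ne: 'qux' } },
      update: { $push: { bars: { somefield: 2, name: 'qux' } } }
    }
  }
]);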

Mongodb parallel queries via nodejs

I am trying to run a number of MongoDB queries via node async, but they are still taking a long time to run.
The database is indexed and completely optimised.
Is there a way I can increase the query speed via mongodb admin, or improve performance by allocating more memory to it?
The queries are running one by one when I watch the console, and some are taking too long, resulting in no response.
2015-12-29T10:31:48.958-0800 I COMMAND [conn63] command consumers.$cmd command: count { count: "consumer1s", query: { ZIP: 37089, $or: [ { ADULTS_F_18_24: "Y" }, { ADULTS_F_24_35: "Y" } ] } } planSummary: IXSCAN { ZIP: 1.0, GENDER: 1.0 } keyUpdates:0 writeConflicts:0 numYields:1 reslen:44 locks:{ Global: { acquireCount: { r: 4 } }, MMAPV1Journal: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { R: 2 }, acquireWaitCount: { R: 2 }, timeAcquiringMicros: { R: 54270 } } } 146ms
2015-12-29T10:31:54.925-0800 I COMMAND [conn62] command consumers.$cmd command: count { count: "consumer1s", query: { ZIP: 37024, $or: [ { ADULTS_F_18_24: "Y" }, { ADULTS_F_24_35: "Y" } ] } } planSummary: IXSCAN { ZIP: 1.0, GENDER: 1.0 } keyUpdates:0 writeConflicts:0 numYields:88 reslen:44 locks:{ Global: { acquireCount: { r: 178 } }, MMAPV1Journal: { acquireCount: { r: 172 } }, Database: { acquireCount: { r: 89 } }, Collection: { acquireCount: { R: 89 }, acquireWaitCount: { R: 83 }, timeAcquiringMicros: { R: 1654781 } } } 6114ms
Please see the logs above to understand my question: two queries following the same plan have a large execution time difference. Please tell me the reason and how to fix it.
The following info may be handy:
I am running this application on a Macintosh, OS X Yosemite 10.10.2, 3.2GHz Intel Core i5, 8GB 1600MHz DDR3 memory. Any suggestions on how I can allocate more memory to MongoDB?
As @Martin said, you need to profile. Use something like cursor.explain to make sure the query is using indexes and to find weak points. Use whatever resource monitor your system has (like top/htop on Linux) to see if it's running out of memory or if it's CPU-bound.
"The queries are running one by one" -- I assume you're not using async.series or similar, which is sequential.

mongo cursor timeout

I am trying to aggregate some records in a mongo database using the node driver. I am first matching on the org, frd, and sl fields (these are indexed). If I only include a few companies in the array that I am matching the org field against, the query runs fine and works as expected. However, when including all of the clients in the array, I always get:
MongoError: getMore: cursor didn't exist on server, possible restart or timeout?
I have tried playing with the allowDiskUse and batchSize settings, but nothing seems to work. With all the client strings in the array, the aggregation runs for ~5 hours before throwing the cursor error. Any ideas? Below is the pipeline along with the actual aggregate command.
Setting up the aggregation pipeline:
var aggQuery = [
{
$match: { //all clients, from last three days, and scored
org:
{ $in : array } //this is the array I am talking about
,
frd: {
$gte: _.last(util.lastXDates(3))
},
sl : true
}
}
, {
$group: { //group by isp and make fields for calculation
_id: "$gog",
count: {
$sum: 1
},
countRisky: {
$sum: {
$cond: {
if: { $gte: ["$scr", 65] },
then: 1,
else: 0
}
}
},
countTimeZoneRisky: {
$sum: {
$cond: {
if: { $eq: ["$gmt", "$gtz"] },
then: 0,
else: 1
}
}
}
}
}
, {
$match: { //show records with count >= 500
count: {
$gte: 500
}
}
}
, {
$project: { //rename _id to ISP, only show relevant fields
_id: 0,
ISP: "$_id",
percentRisky: {
$multiply: [{
$divide: ["$countRisky", "$count"]
},
100
]
},
percentTimeZoneDiscrancy: {
$multiply: [{
$divide: ["$countTimeZoneRisky", "$count"]
},
100
]
},
count: 1
}
}
, {
$sort: { //sort by percent risky and then by count
percentRisky: 1,
count: 1
}
}
];
Running the aggregation:
var cursor = reportingCollections.hitColl.aggregate(aggQuery, {
allowDiskUse: true,
cursor: {
batchSize: 40000
}
});
console.log('Writing data to csv ' + currentFileNamePrefix + '!');
//iterate through cursor and write documents to CSV
cursor.each(function (err, document) {
//write each document to csv file
//maybe start a nuclear war
});
You're calling the aggregate method, which doesn't return a cursor by default (unlike e.g. find()). To get the results back as a cursor, you must add the cursor option to the options. However, a timeout setting for the aggregation cursor is (currently) not supported; the native node.js driver only supports the batchSize setting.
You would set the batch size option like this:
var cursor = coll.aggregate(query, {cursor: {batchSize:100}}, writeResultsToCsv);
To circumvent such problems, I'd recommend running the aggregation or map-reduce directly through the mongo client, where you can add the notimeout option.
The default timeout is 10 minutes (obviously useless for long, time-consuming queries), and as far as I know there is currently no way to set a different one; you can only make it infinite via the aforementioned option. The timeout hits you especially with high batch sizes, because processing the incoming docs can take more than 10 minutes, and by the time you ask the mongo server for more, the cursor has already been deleted.
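Following that reasoning, one workaround is to keep batches small and consume the cursor as a stream, so each getMore happens well within the timeout window; a sketch (same collection and pipeline as above, the batch size is an assumption):
var cursor = reportingCollections.hitColl.aggregate(aggQuery, {
  allowDiskUse: true,
  cursor: { batchSize: 1000 } // smaller batches => more frequent getMore calls
});
// The aggregation cursor is a readable stream in the node driver.
cursor.on('data', function (doc) {
  // append doc to the CSV file here
});
cursor.on('end', function () {
  console.log('Finished writing ' + currentFileNamePrefix);
});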
I don't know your use case, but if it's a web view, there should only be fast queries/aggregations.
By the way, I think this didn't change with 3.0.*.
