MongoDB parallel queries via Node.js

I am trying to run a number of MongoDB queries via the Node async library, but they are still taking a long time to run.
The database is indexed and fully optimised.
Is there a way to speed the queries up through MongoDB administration, for example by allocating more memory to it?
Watching the console, the queries run one by one, and some take so long that I get no response at all.
2015-12-29T10:31:48.958-0800 I COMMAND [conn63] command consumers.$cmd command: count { count: "consumer1s", query: { ZIP: 37089, $or: [ { ADULTS_F_18_24: "Y" }, { ADULTS_F_24_35: "Y" } ] } } planSummary: IXSCAN { ZIP: 1.0, GENDER: 1.0 } keyUpdates:0 writeConflicts:0 numYields:1 reslen:44 locks:{ Global: { acquireCount: { r: 4 } }, MMAPV1Journal: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { R: 2 }, acquireWaitCount: { R: 2 }, timeAcquiringMicros: { R: 54270 } } } 146ms
2015-12-29T10:31:54.925-0800 I COMMAND [conn62] command consumers.$cmd command: count { count: "consumer1s", query: { ZIP: 37024, $or: [ { ADULTS_F_18_24: "Y" }, { ADULTS_F_24_35: "Y" } ] } } planSummary: IXSCAN { ZIP: 1.0, GENDER: 1.0 } keyUpdates:0 writeConflicts:0 numYields:88 reslen:44 locks:{ Global: { acquireCount: { r: 178 } }, MMAPV1Journal: { acquireCount: { r: 172 } }, Database: { acquireCount: { r: 89 } }, Collection: { acquireCount: { R: 89 }, acquireWaitCount: { R: 83 }, timeAcquiringMicros: { R: 1654781 } } } 6114ms
Please see the logs above: two queries following the same plan have a large difference in execution time. What is the reason, and how can I fix it?
The following info may be handy:
I am running this application on a Mac: OS X Yosemite 10.10.2, 3.2 GHz Intel Core i5, 8 GB 1600 MHz DDR3 memory. Any suggestions on how I can allocate more virtual memory to MongoDB?

As @Martin said, you need to profile. Use something like cursor.explain() to make sure the query is using indexes and to find weak points. Use whatever resource monitor your system has (like top/htop on Linux) to see if it's running out of memory or if it's CPU-bound.
"The queries are running one by one" -- make sure you're not using async.series or something similar, which runs the tasks sequentially.

Related

MongoDB Remove object key based on value

In my application, I have a MongoDB document that is updated with an $inc operation to increase/decrease the numbers in the appliesTo object. Here is a sample document:
{
  name: "test-document",
  appliesTo: {
    profiles: {
      Profile1: 3,
      Profile2: 1
    },
    tags: {
      Tag1: 7,
      Tag2: 1
    }
  }
}
After running the following command:
await db.items.updateOne({ name: "test-document" }, { $inc: { 'appliesTo.profiles.Profile2': -1 } })
my document will be changed to
{
  name: "test-document",
  appliesTo: {
    profiles: {
      Profile1: 3,
      Profile2: 0
    },
    tags: {
      Tag1: 7,
      Tag2: 1
    }
  }
}
I'm struggling with writing a query that will remove all keys whose values are 0. The only solution I currently have is to iterate over each key and remove it with an $unset command, but that is not an atomic operation.
Is there a smarter way to handle it in one query?
There is no way to do both operations in a single regular update query, but starting from MongoDB 4.2 you can try an update with an aggregation pipeline:
use $cond to check whether the field's value is greater than 1; if it is, decrement it with $add, otherwise remove the field with the $$REMOVE system variable.
await db.items.updateOne(
  { name: "test-document" },
  [{
    $set: {
      "appliesTo.profiles.Profile2": {
        $cond: [
          { $gt: ["$appliesTo.profiles.Profile2", 1] },
          { $add: ["$appliesTo.profiles.Profile2", -1] },
          "$$REMOVE"
        ]
      }
    }
  }]
)

Get top 100 documents based on multiple fields

I am trying to get the top 100 documents from my DB based on the sum of a few fields.
The data is similar to this:
{
  "userID": "227837830704005140",
  "shards": {
    "ancient": {
      "pulled": 410
    },
    "void": {
      "pulled": 1671
    },
    "sacred": {
      "pulled": 719
    }
  }
}
I want to sum the "pulled" number for the 3 types of shard, and then use that to determine the top 100.
I tried this in nodejs:
let top100: IShardData[] = await collection.find<IShardData>({}, {
  projection: {
    "shards.ancient.pulled": 1,
    "shards.void.pulled": 1,
    "shards.sacred.pulled": 1,
    orderBySumValue: { $add: ["$shards.ancient.pulled", "$shards.void.pulled", "$shards.sacred.pulled"] }
  },
  sort: { orderBySumValue: 1 },
  limit: 100
}).toArray()
This connects to the DB, gets the right collection, and seems to sum up the values correctly, but it is not sorting them for the top 100 by the sum of the fields. I used this as a basis for my code: https://www.tutorialspoint.com/mongodb-order-by-two-fields-sum
Not sure what I need to do to make it work. Any help is appreciated.
Thanks in advance!
Here's one way to do it using an aggregation pipeline.
db.collection.aggregate([
  {
    // create new field for the sum
    "$set": {
      "pulledSum": {
        "$sum": [
          "$shards.ancient.pulled",
          "$shards.void.pulled",
          "$shards.sacred.pulled"
        ]
      }
    }
  },
  {
    // sort on the sum
    "$sort": {
      "pulledSum": -1
    }
  },
  {
    // limit to the desired number
    "$limit": 10
  },
  {
    // don't return some fields
    "$unset": [
      "_id",
      "pulledSum"
    ]
  }
])
Try it on mongoplayground.net.
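For reference, a minimal sketch of the same pipeline issued through the Node.js driver used in the question (the collection variable is the one from the question, and the limit is raised to the 100 being asked for):
// Runs the aggregation and materialises the top 100 as an array.
const top100 = await collection.aggregate([
  { $set: { pulledSum: { $sum: ["$shards.ancient.pulled", "$shards.void.pulled", "$shards.sacred.pulled"] } } },
  { $sort: { pulledSum: -1 } },
  { $limit: 100 },
  { $unset: ["_id", "pulledSum"] }
]).toArray()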
sort should be written with a dollar sign.
$sort: { orderBySumValue: 1 }

MongoDB locking database for a single document - is it optimizable?

I've got a Mongo instance that serves dictionary knowledge to other services that concurrently read/update the collection. I've noticed (via the slow log) that some queries take a long time to do what they should, because they lock the database and other queries queue up. There are multiple indexes and a compound index on the collection, so a query only hits one document. There are approx. 1 million records in the collection.
I am bewildered: why does Mongo need to lock global, database AND collection? The modification only takes place on one document within one collection of one database.
Since I am using WiredTiger, I would assume the lock would only be on the document itself. How can I optimize the performance?
I don't care if other concurrent reads get an "old" version, or if a concurrent write tries to create the same document twice (upsert), as the unique index would just throw an error and let my service handle it. Are there other optimizations I could consider?
The slow-log message I get is the following (formatted for readability):
command myDB.words
command: findAndModify {
  findandmodify: "words",
  query: { lang: "en", name: "cat" },
  new: true,
  remove: false,
  upsert: true,
  fields: {},
  update: {
    $setOnInsert: { __v: 0 },
    $inc: { count: 2 },
    $addToSet: {
      type: { $each: [ 0 ] },
      occured: ObjectId('....')
    },
    $set: { lang: "en", name: "cat" }
  },
  writeConcern: { w: 1 }
}
planSummary: IXSCAN { name: 1, lang: 1 }
update: { .... }
keysExamined:1
docsExamined:1
nMatched:1
nModified:1
keysInserted:1
numYields:0
reslen:2649211
locks:{
  Global: { acquireCount: { r: 1, w: 1 } },
  Database: { acquireCount: { w: 1 } },
  Collection: { acquireCount: { w: 1 } }
}
protocol:op_query
754ms
The following indexes exist:
indexes on lang and name
a compound index { name: 1, lang: 1 }
If it matters, I am using Mongoose and the services connecting to MongoDB are running on Node 8.4. MongoDB 3.4.7.
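For what it's worth, here is a rough reconstruction of the Mongoose call that would produce the logged findAndModify (the Word model name and the someObjectId variable are assumptions; the query, update and options come straight from the slow-log entry above):
// Hypothetical model name; reconstructed from the slow log, not the actual service code.
Word.findOneAndUpdate(
  { lang: 'en', name: 'cat' },
  {
    $setOnInsert: { __v: 0 },
    $inc: { count: 2 },
    $addToSet: {
      type: { $each: [0] },
      occured: someObjectId
    },
    $set: { lang: 'en', name: 'cat' }
  },
  { new: true, upsert: true },
  function (err, word) {
    // reslen:2649211 in the log means roughly 2.6 MB is sent back per call,
    // which on its own can make the operation slow.
  }
);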

How to create a computed field using another computed field in a Mongoose projection?

Consider a schema with the following properties:
{
  maxDmg: Number,
  attacks: Number
}
I need to create a computed field which includes the result from another computed field, something like:
$project: {
  maxDmg: true,
  attacks: true,
  effDmg: { $multiply: ["$maxDmg", 0.70] },
  dps: { $divide: [{ $multiply: ["$effDmg", "$attacks"] }, 60] }
}
But it looks like the computed column 'effDmg' cannot be referenced within another computed column. Is there a workaround?
I don't think you can access an existing computed variable, but you can create the same computed variable using the $let operator and use it in the other computed variable. Hope this helps.
$project: {
  maxDmg: true,
  attacks: true,
  effDmg: {
    $multiply: ["$maxDmg", 0.70]
  },
  dps: {
    $let: {
      vars: {
        effDmg: {
          $multiply: ["$maxDmg", 0.70]
        }
      },
      in: {
        $divide: [{
          $multiply: ["$$effDmg", "$attacks"]
        }, 60]
      }
    }
  }
}
Update:
You can keep the computations inside one $project and project them out later. This way you at least end up reusing the variables.
$project: {
  maxDmg: true,
  attacks: true,
  together: {
    $let: {
      vars: {
        effDmg: {
          $multiply: ["$maxDmg", 0.70]
        }
      },
      in: {
        dps: {
          $divide: [{
            $multiply: ["$$effDmg", "$attacks"]
          }, 60]
        },
        effDmg: "$$effDmg"
      }
    }
  }
}
Sample Output:
{
  "maxDmg": 34,
  "attacks": 45,
  "together": {
    "dps": 17.849999999999998,
    "effDmg": 23.799999999999997
  }
}
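If it helps, running that projection from Mongoose is just a thin wrapper around the same stage (the Unit model name is a placeholder):
// "Unit" is a hypothetical model name; the stage itself is the one shown above.
const docs = await Unit.aggregate([
  {
    $project: {
      maxDmg: true,
      attacks: true,
      together: {
        $let: {
          vars: { effDmg: { $multiply: ["$maxDmg", 0.70] } },
          in: {
            dps: { $divide: [{ $multiply: ["$$effDmg", "$attacks"] }, 60] },
            effDmg: "$$effDmg"
          }
        }
      }
    }
  }
]);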

mongo cursor timeout

I am trying to aggregate some records in a Mongo database using the Node driver. I am first matching on the org, fed, and sl fields (these are indexed). If I only include a few companies in the array that I am matching the org field against, the query runs fine and works as expected. However, when including all of the clients in the array, I always get:
MongoError: getMore: cursor didn't exist on server, possible restart or timeout?
I have tried playing with the allowDiskUse and batchSize settings, but nothing seems to work. With all the client strings in the array, the aggregation runs for ~5 hours before throwing the cursor error. Any ideas? Below is the pipeline along with the actual aggregate command.
Setting up the aggregation pipeline:
var aggQuery = [
  {
    $match: { // all clients, from last three days, and scored
      org: { $in: array }, // this is the array I am talking about
      frd: {
        $gte: _.last(util.lastXDates(3))
      },
      sl: true
    }
  },
  {
    $group: { // group by isp and make fields for calculation
      _id: "$gog",
      count: {
        $sum: 1
      },
      countRisky: {
        $sum: {
          $cond: {
            if: {
              $gte: ["$scr", 65]
            },
            then: 1,
            else: 0
          }
        }
      },
      countTimeZoneRisky: {
        $sum: {
          $cond: {
            if: {
              $eq: ["$gmt", "$gtz"]
            },
            then: 0,
            else: 1
          }
        }
      }
    }
  },
  {
    $match: { // show records with count >= 500
      count: {
        $gte: 500
      }
    }
  },
  {
    $project: { // rename _id to isp, only show relevant fields
      _id: 0,
      ISP: "$_id",
      percentRisky: {
        $multiply: [{
          $divide: ["$countRisky", "$count"]
        }, 100]
      },
      percentTimeZoneDiscrancy: {
        $multiply: [{
          $divide: ["$countTimeZoneRisky", "$count"]
        }, 100]
      },
      count: 1
    }
  },
  {
    $sort: { // sort by percent risky and then by count
      percentRisky: 1,
      count: 1
    }
  }
];
Running the aggregation:
var cursor = reportingCollections.hitColl.aggregate(aggQuery, {
  allowDiskUse: true,
  cursor: {
    batchSize: 40000
  }
});

console.log('Writing data to csv ' + currentFileNamePrefix + '!');

// iterate through cursor and write documents to CSV
cursor.each(function (err, document) {
  // write each document to csv file
  // maybe start a nuclear war
});
You're calling the aggregate method, which doesn't return a cursor by default (unlike e.g. find()). To get a cursor back, you must add the cursor option in the options, as you did. However, a timeout setting for the aggregation cursor is (currently) not supported; the native Node.js driver only supports the batchSize setting.
You would set the batchSize option like this:
var cursor = coll.aggregate(query, { cursor: { batchSize: 100 } }, writeResultsToCsv);
To circumvent such problems, I'd recommend running the aggregation or map-reduce directly through the mongo shell client, where you can add the notimeout option.
The default cursor timeout is 10 minutes (obviously useless for long, time-consuming queries) and, as far as I know, there's currently no way to set a different one, only to make it infinite via the aforementioned option. The timeout hits you especially with high batch sizes, because it can take more than 10 minutes to process the incoming docs, and by the time you ask the mongo server for more, the cursor has already been deleted.
IDK your use case, but if it's a web view, there should be only fast queries/aggregations.
BTW I think this didn't change with 3.0.*
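To make that concrete, here is a rough sketch of how the asker's own code might be adjusted along those lines: a much smaller batchSize so the driver issues getMore calls well inside the 10-minute idle window, writing rows out as they arrive (field names are taken from the question's $project stage; the CSV file name and the fs usage are assumptions):
var fs = require('fs');

var csvStream = fs.createWriteStream(currentFileNamePrefix + '.csv');

var cursor = reportingCollections.hitColl.aggregate(aggQuery, {
  allowDiskUse: true,
  cursor: {
    batchSize: 1000 // smaller batches -> more frequent getMore calls
  }
});

cursor.each(function (err, doc) {
  if (err) throw err;
  if (doc === null) { // cursor exhausted
    csvStream.end();
    return;
  }
  csvStream.write([doc.ISP, doc.count, doc.percentRisky, doc.percentTimeZoneDiscrancy].join(',') + '\n');
});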
