mongodb - most efficient way of calculating missing indices in sequence - node.js

Given a collection with, let's say, 1,000,000 entries, each with its own unique, indexed property called number, how can I efficiently find the lowest gap in the number sequence?
A simple example would be a sequence like 1,2,3,4,6,7,10, where I would like to get back the number 5, since that is the lowest missing number in the sequence.
Is there a way (maybe aggregation) to do this without having to query all the numbers?

One way of doing this would be with a cursor. With a cursor, you can manually iterate through the documents until you find one that matches your criteria.
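var cursor = db.coll.find({}).sort({ number: 1 });
var prev = null;
while (cursor.hasNext()) {
  var curr = cursor.next();
  // Stop at the first pair of neighbours that are not consecutive.
  if (prev && prev.number + 1 !== curr.number) break;
  prev = curr;
}
// After the loop, prev.number + 1 is the lowest missing number
// (or the next free number if the sequence has no gap).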
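The same walk can be tried without a database. This is only a minimal sketch: it assumes the documents arrive sorted by number and that the sequence is expected to start at 1, and lowestMissingNumber is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Hypothetical helper: walks documents sorted by `number` and returns the
// first value missing from the sequence. Assumes the sequence starts at `start`.
function lowestMissingNumber(sortedDocs, start = 1) {
  let expected = start;
  for (const doc of sortedDocs) {
    if (doc.number !== expected) return expected; // first gap found
    expected += 1;
  }
  return expected; // no gap: the next free number follows the last one
}

const docs = [1, 2, 3, 4, 6, 7, 10].map((number) => ({ number }));
console.log(lowestMissingNumber(docs)); // 5
```

The function returns at the first gap, so only the documents before that gap are ever inspected.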

Another option is to fetch all the numbers and look for the ones missing between them.
Here is an aggregation example that computes the missing numbers on the server, so you don't have to fetch them all yourself: https://www.mongodb.com/community/forums/t/query-to-find-missing-sequence/123771/2
// Assuming the sample data with sequence numbers from 1 thru 10 as follows:
{ id: 1 },
{ id: 2 },
{ id: 4 },
{ id: 7 },
{ id: 9 },
{ id: 10 }
// And, note the missing numbers are 3, 5, 6 and 8. You can use the following aggregation to find them:
db.collection.aggregate([
{
$group: {
_id: null,
nos: { $push: "$id" }
}
},
{
$addFields: {
missing: { $setDifference: [ { $range: [ 1, 11 ] }, "$nos" ] }
}
}
])
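To sanity-check the expected result, the same set difference can be reproduced in plain JavaScript. This is a rough sketch rather than a replacement for the pipeline; missingNumbers is a hypothetical helper, and its end bound is exclusive to mirror $range.

```javascript
// Plain-JavaScript equivalent of $setDifference over $range, for checking results.
// `missingNumbers` is a hypothetical helper; `end` is exclusive, like $range.
function missingNumbers(ids, start, end) {
  const present = new Set(ids);
  const missing = [];
  for (let n = start; n < end; n++) {
    if (!present.has(n)) missing.push(n);
  }
  return missing;
}

console.log(missingNumbers([1, 2, 4, 7, 9, 10], 1, 11)); // [ 3, 5, 6, 8 ]
```

The first element of the returned array is the lowest gap asked for in the question.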

Related

How do I get count in MongoDB based on specific fields

I have documents like this in my MongoDB Listings collection.
listingID: 'abcd',
listingData: {
category: 'resedetial'
},
listingID: 'xyz',
listingData: {
category: 'resedetial'
},
listingID: 'efgh',
listingData: {
category: 'office'
}
I am trying to get the total count of all listings and the count per category.
I can get the total count of listings with an aggregation query, but I am not sure how to get output like this: resedentialCount: 2, officeCount: 1, ListingsCount: 3
This is my aggregation query:
{
$match: {
listingID,
},
},
{
$group: {
_id: 1,
ListingsCount: { $sum: 1 },
},
}
Try this:
let listingAggregationCursor = db.collection.aggregate([
{$group: {_id:"$listingData.category",ListingsCount:{$sum:1} }}
])
let listingAggregation=await listingAggregationCursor.toArray();
(I got this query from https://www.statology.org/mongodb-group-by-count)
This will give you an array of objects with each listing category as well as how many times they occur.
To get the total listings count, sum the ListingsCount fields from the array of objects. You can do that like this:
let listingsCount = 0;
for (const listingCategory of listingAggregation) {
  listingsCount += listingCategory.ListingsCount;
}
You should have the data you need at this point. Now it's just a matter of extracting and formatting it as you see fit.
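As one possible way to do that formatting, the rows can be folded into a single summary object. This is a sketch under assumptions: summarizeListings is a hypothetical helper, and it expects rows shaped like the output of the $group stage in this answer ({ _id: <category>, ListingsCount: <number> }).

```javascript
// Hypothetical helper: turns the per-category rows returned by the $group
// stage into one summary object. Assumes rows look like
// { _id: <category>, ListingsCount: <number> }.
function summarizeListings(rows) {
  const summary = { ListingsCount: 0 };
  for (const row of rows) {
    summary[`${row._id}Count`] = row.ListingsCount; // e.g. officeCount
    summary.ListingsCount += row.ListingsCount; // running total
  }
  return summary;
}

const rows = [
  { _id: "resedetial", ListingsCount: 2 },
  { _id: "office", ListingsCount: 1 },
];
console.log(summarizeListings(rows));
// { ListingsCount: 3, resedetialCount: 2, officeCount: 1 }
```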
Hope this helps!

how to select random documents with some conditions fulfilled in MongoDB

Basically, I have documents with a field called "Difficulty Level", whose value is between 1 and 10 for each document.
I have to select 10 or 20 random documents such that the selection contains at least one document for each difficulty level from 1 to 10, i.e. at least one document with "Difficulty level": 1, one with "Difficulty level": 2, one with "Difficulty level": 3, and so on up to "Difficulty level": 10.
So, how can I select documents randomly while fulfilling this condition?
Thanks
I tried the $rand operator for selecting random documents, but I can't find a solution that satisfies that condition.
If I've understood correctly you can try something like this:
The goal here is to build a query like the following example.
This query gets two random documents using $sample, one for level 1 and another for level 2. Using $facet, you can get multiple results from a single query.
db.collection.aggregate([
{
"$facet": {
"difficulty_level_1": [
{ "$match": { "difficulty_level": 1 } },
{ "$sample": { "size": 1 } }
],
"difficulty_level_2": [
{ "$match": { "difficulty_level": 2 } },
{ "$sample": { "size": 1 } }
]
}
}
])
So the point is to build this query dynamically. You can use JS to create the query object and pass it to the mongo call.
const random = Math.floor((Math.random()*10)+1) // Or whatever you use to get the random number
let query = {"$facet":{}}
for(let i = 1 ; i <= random; i++){
const difficulty_level = `difficulty_level_${i}`
query["$facet"][difficulty_level] = [
{ $match: { difficulty_level: i }},
{ $sample: { size: 1 }}
]
}
console.log(query) // This output can be used in mongoplayground and it works!
// To use the query you can use something like this (or whichever way you call the DB)
this.db.aggregate([query])
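Since the question asks for at least one document per level from 1 to 10, a variant that always covers all ten levels may be closer to what is needed. A minimal sketch, assuming the field is stored as difficulty_level as in the query above; buildFacetStage and its parameters are hypothetical names.

```javascript
// Hypothetical helper: builds one $facet stage with a branch per difficulty
// level, each sampling `perLevel` random documents of that level.
// Assumes the field is stored as `difficulty_level`, as in the query above.
function buildFacetStage(levels = 10, perLevel = 1) {
  const facet = {};
  for (let level = 1; level <= levels; level++) {
    facet[`difficulty_level_${level}`] = [
      { $match: { difficulty_level: level } },
      { $sample: { size: perLevel } },
    ];
  }
  return { $facet: facet };
}

const stage = buildFacetStage();
console.log(Object.keys(stage.$facet).length); // 10
```

Passing perLevel = 2 would give 20 documents, two per level, provided every level has at least two documents.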

Update collection to change the rank

I have a MongoDB collection that I sort by the number of points each item has, and I show a rank according to its place in the collection:
db.collection('websites').find({}).sort({ "points": -1 }).forEach(doc => {
rank++;
doc.rank = rank;
delete doc._id;
console.log(doc)
So I thought to myself: OK, I'm going to update the rank in the collection, so I added this:
db.collection('websites').updateMany({},
{ $set: { rank: doc.rank } }
)
But it was too good to be true: it updates every single item with the same rank, which changes at each refresh. What exactly is going on here?
EDIT : I managed to do it by doing this :
rank = 0;
db.collection('websites').find({}).sort({ "points": -1 }).forEach(doc => {
rank++;
doc.rank = rank;
//delete doc._id;
console.log(doc._id);
db.collection('websites').updateMany({_id : doc._id},
{ $set: { rank: doc.rank } },
{ upsert: true }
)
})
Try this:
db.collection('websites')
.updateOne( // update only one document
{ _id: doc._id }, // match the document currently being ranked by its _id
{ $set: { rank: doc.rank } } // store the rank computed for that document
)
db.collection('websites').updateMany({/*All docs match*/},
{ $set: { rank: doc.rank } }
)
The reason every document gets the same rank is that you have no filter, so updateMany matches all docs in the collection.
You need to set a filter to restrict which docs are updated.
db.collection('websites').updateMany({id: "someID"},
{ $set: { rank: doc.rank } }
)
The OP states we want to sort all the docs by points, then "rerank" them from 1 to n in that order and update the DB. Here is an example of where "aggregate is the new update" thanks to the power of $merge onto the same collection as the input:
db.foo.aggregate([
// Get everything in descending order...
{$sort: {'points':-1}}
// ... and turn it into a big array:
,{$group: {_id:null, X:{$push: '$$ROOT'}}}
// Walk the array and incrementally set rank. The input arg
// is $X and we set $X so we are overwriting the old X:
,{$addFields: {X: {$function: {
body: function(items) {
for(var i = 0; i < items.length; i++) {
items[i]['rank'] = (i+1);
}
return items;
},
args: [ '$X' ],
lang: "js"
}}
}}
// Get us back to regular docs, not an array:
,{$unwind: '$X'}
,{$replaceRoot: {newRoot: '$X'}}
// ... and update everything:
,{$merge: {
into: "foo",
on: [ "_id" ],
whenMatched: "merge",
whenNotMatched: "fail"
}}
]);
If using $function spooks you, you can use a somewhat more obscure approach with $reduce as a stateful for-loop substitute. To better understand what is happening, block comment with /* */ the stages below $group and uncomment each successive stage one by one to see how that operator affects the pipeline.
db.foo.aggregate([
// Get everything in descending order...
{$sort: {'points':-1}}
// ... and turn it into a big array:
,{$group: {_id:null, X:{$push: '$$ROOT'}}}
// Use $reduce as a for loop with state.
,{$addFields: {X: {$reduce: {
input: '$X',
// The value (stateful) part of the loop will contain a
// counter n and the array newX which we will rebuild with
// the incremental rank:
initialValue: {
n:0,
newX:[]
},
in: {$let: {
vars: {qq:{$add:['$$value.n',1]}}, // n = n + 1
in: {
n: '$$qq',
newX: {$concatArrays: [
'$$value.newX',
// A little weird but this means "take the
// current item in the array ($$this) and
// set $$this.rank = $$qq by merging it into the
// item". This results in a new object but
// $concatArrays needs an array so wrap it
// with [ ]:
[ {$mergeObjects: ['$$this',{rank:'$$qq'}]} ]
]}
}
}}
}}
}}
,{$unwind: '$X.newX'}
,{$replaceRoot: {newRoot: '$X.newX'}}
,{$merge: {
into: "foo",
on: [ "_id" ],
whenMatched: "merge",
whenNotMatched: "fail"
}}
]);
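For readers more at home in JavaScript than in aggregation expressions, the $reduce stage above behaves roughly like Array.prototype.reduce. The sketch below is only an analogy run on an in-memory array, not something MongoDB executes; assignRanks is a hypothetical name, and the input is assumed to be sorted by points already.

```javascript
// Plain-JavaScript analogue of the $reduce stage: `acc` plays the role of
// $$value and `item` the role of $$this. Assumes `items` is already sorted
// by points in descending order.
function assignRanks(items) {
  const result = items.reduce(
    (acc, item) => {
      const n = acc.n + 1; // n = n + 1, like the $let variable qq
      // Merge the rank into a copy of the item and append it to newX.
      return { n, newX: acc.newX.concat([{ ...item, rank: n }]) };
    },
    { n: 0, newX: [] } // same shape as initialValue in the pipeline
  );
  return result.newX;
}

console.log(assignRanks([{ _id: "a", points: 90 }, { _id: "b", points: 40 }]));
```

Each returned item is a copy of the input item with a rank equal to its 1-based position, which is what the pipeline then writes back with $merge.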
The problem here is that mongo is using the same doc.rank value to update all the records that match the filter criteria (all records in your case). You have two options to resolve the issue:
Option 1 (works, but is less efficient): The idea here is that you calculate the rank for each website that you want to update. Loop through all the documents and run the query below, which updates each document with its calculated rank. You might think that this is inefficient, and you would be right: we are making a large number of network calls to update the records. Worse, the cost is unbounded and grows as the number of records increases.
db.collection('websites')
.updateOne(
{ id: 'docIdThatNeedsToBeUpdated'},
{ $set: { rank: 'calculatedRankOfTheWebsite' } }
)
Option 2 (efficient): Use the same technique to calculate the rank for each website and loop through the documents to generate the update statements as above. But this time you do not make a separate update call for each website. Instead, you use the bulk update technique: add all your update statements to a batch and execute them all in one go.
// create the batch once, before the loop
var bulk = db.websites.initializeUnorderedBulkOp();
// inside the loop, add one statement per document to the batch
bulk.find({ id: 'docIdThatNeedsToBeUpdated' })
.updateOne({
$set: {
rank: 'calculatedRankOfTheWebsite'
}
});
// outside the loop, execute all of the statements in one go
bulk.execute();
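With the Node.js driver, the same batching idea is often expressed as an array of operations. A minimal sketch, assuming the documents have already been fetched sorted by points in descending order and are identified by _id; buildRankUpdates is a hypothetical helper.

```javascript
// Hypothetical helper: builds the operations array for a single bulkWrite call.
// Assumes `sortedDocs` is already sorted by points (descending) and that each
// document is identified by its _id.
function buildRankUpdates(sortedDocs) {
  return sortedDocs.map((doc, index) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { rank: index + 1 } }, // rank is the 1-based position
    },
  }));
}

const ops = buildRankUpdates([{ _id: "x", points: 9 }, { _id: "y", points: 3 }]);
console.log(JSON.stringify(ops[1]));
// {"updateOne":{"filter":{"_id":"y"},"update":{"$set":{"rank":2}}}}
```

The resulting array could then be sent in one round trip with something like db.collection('websites').bulkWrite(ops, { ordered: false }).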

How to update multiple fields (maxLength, maxBreadth or maxArea) of a document based on multiple conditions sent from req.body?

I want to store the maximum length, breadth, and area ever encountered in a request in the database. Any of the three can be a new maximum, and I want to update only the max values for the fields that were exceeded.
const body = {
oID: 123, // Primary key
length: 50,
breadth: 50
};
const { length, breadth } = body;
const currentArea = length * breadth;
await totalArea.updateOne(
{ oID: 123 },
{
$set: { $maxLength: maxLength < length ? length : $maxLength }, // I want to update the docs based on mongo query only since this won't work.
$set: { $maxBreadth: maxBreadth < breadth ? breadth : $maxBreadth }, // I want to update the docs based on mongo query only since this won't work.
$set: { $maxArea: maxArea < currentArea ? currentArea : $maxArea } // I want to update the docs based on mongo query only since this won't work.
},
{ upsert: true }
);
In the above example, I have demonstrated the logic using the ternary operator, and I want to perform the same operation using a MongoDB query, i.e. compare each incoming value to the existing field in the database and update that field only if the condition is satisfied.
await totalArea.updateOne(
{ oID: 123
},
{
$max: {
maxLength: length,
maxBreadth: breadth,
maxArea : currentArea
}
},
{ upsert: true }
);
I found this to be working correctly. Thanks to @Mehari Mamo's comment.
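This works because the $max update operator only changes a field when the given value is greater than the stored one, and it sets the field if it does not exist yet. As a small sketch, the update document can be derived from the request body; buildMaxUpdate is a hypothetical helper, and it assumes length and breadth are numbers.

```javascript
// Hypothetical helper: derives the $max update document from the request body.
// Assumes `length` and `breadth` are numbers, as in the example body above.
function buildMaxUpdate({ length, breadth }) {
  return {
    $max: {
      maxLength: length,
      maxBreadth: breadth,
      maxArea: length * breadth,
    },
  };
}

console.log(JSON.stringify(buildMaxUpdate({ oID: 123, length: 50, breadth: 50 })));
// {"$max":{"maxLength":50,"maxBreadth":50,"maxArea":2500}}
```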
If I understand correctly, you want something like this:
db.collection.update({
oID: 123
},
[{$set: {maxLength: {$max: [length, "$maxLength"]},
maxBreadth: {$max: [breadth, "$maxBreadth"]},
maxArea : {$max: [currentArea, "$maxArea"]}
}
}
],
{
upsert: true
})
You can check it here.
The [] around the second argument turns the update into an aggregation pipeline, which is what allows you to access the current values of the document fields.

Node.js relocate hash elements according to value order

I wonder whether it's possible to change the order of hash elements based on the orders for values.
for example,
a = { a:3, b:1, c:2}
a = sort_on_values(a)
a = { b:1, c:2, a:3}
It is my understanding that the properties of an object are best treated as an unordered set, so trying to sort them is not worthwhile.
Node.js happens to return string keys in insertion order (integer-like keys are always listed first, in ascending order), but I would not build sorting logic on that.
If you want an ordered list, then use an ordered structure such as an array.
For example:
var array = [ { a: 3 }, { b: 1 }, { c: 2 } ];
array.sort(function (a, b) {
return a[Object.keys(a)[0]] - b[Object.keys(b)[0]];
});
console.log(array);
prints out something like
[
{ b: 1 },
{ c: 2 },
{ a: 3 }
]
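If the starting point is a single object, as in the question, the same idea can be applied to its entries. A minimal sketch, assuming all values are numbers; sortEntriesByValue is a hypothetical name standing in for the sort_on_values function from the question.

```javascript
// Sketch of the sort_on_values idea from the question, done on entries.
// Returns [key, value] pairs sorted by value, which keeps the order explicit.
function sortEntriesByValue(obj) {
  return Object.entries(obj).sort(([, v1], [, v2]) => v1 - v2);
}

const sorted = sortEntriesByValue({ a: 3, b: 1, c: 2 });
console.log(sorted); // [ [ 'b', 1 ], [ 'c', 2 ], [ 'a', 3 ] ]
```

Object.fromEntries(sorted) would turn the pairs back into an object, but the result then depends on key insertion order again, so keeping the array is the safer choice.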
