Mongo custom multikey sorting - node.js

Mongo docs state:
The Mongo multikey feature can automatically index arrays of values.
That's nice. But how about sorting based on multikeys? More specifically, how do I sort a collection according to array match percentage?
For example, I have a pattern [ 'fruit', 'citrus' ] and a collection that looks like this:
{
    title: 'Apples',
    tags: [ 'fruit' ]
},
{
    title: 'Oranges',
    tags: [ 'fruit', 'citrus' ]
},
{
    title: 'Potato',
    tags: [ 'vegetable' ]
}
Now, I want to sort the collection according to the match percentage of each entry against the tags pattern. Oranges must come first, Apples second and Potato last.
What's the most efficient and easy way to do it?

As of MongoDB 2.1 a similar computation can be done using the aggregation framework. The syntax is something like
db.fruits.aggregate(
    {$match : {tags : {$in : ["fruit", "citrus"]}}},
    {$unwind : "$tags"},
    {$group : {_id : "$title", numTagMatches : {$sum : 1}}},
    {$sort : {numTagMatches : -1}} )
which returns
{
    "_id" : "Oranges",
    "numTagMatches" : 2
},
{
    "_id" : "Apples",
    "numTagMatches" : 1
}
This should be much faster than the map-reduce method, for two reasons. First, because the implementation is native C++ rather than JavaScript. Second, because "$match" will filter out the items which don't match at all (if this is not what you want, you can leave out the "$match" part and change the "$sum" part to add either 1 or 0 depending on whether the tag equals "fruit", "citrus", or neither).
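For instance, a sketch of that conditional variant might look like this (keeping the field names from the example above; without the "$match" stage, non-matching documents such as Potato simply come back with a count of 0):
db.fruits.aggregate([
    {$unwind : "$tags"},
    {$group : {_id : "$title", numTagMatches : {$sum :
        {$cond : [ {$or : [ {$eq : ["$tags", "fruit"]}, {$eq : ["$tags", "citrus"]} ]}, 1, 0 ]}
    }}},
    {$sort : {numTagMatches : -1}}
])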
The only caveat here is that mongo 2.1 isn't recommended for production yet. If you're running in production you'll need to wait for 2.2. But if you're just experimenting on your own you can play around with 2.1, as the aggregation framework should be more performant.

Note: The following explanation applies to Mongo 2.0 and earlier. For later versions you should consider the new aggregation framework.
We do something similar when trying to fuzzy-match input sentences that we index. You can use map-reduce to emit the object ID every time you get a match and then sum the matches up. You'll then need to load the results into your client and sort by the highest value first.
db.plants.mapReduce(
    function () {
        var matches = 0;
        for (var i = 0; i < targetTerms.length; i++) {
            var term = targetTerms[i];
            for (var j = 0; j < this.tags.length; j++) {
                matches += Number(term === this.tags[j]);
            }
        }
        emit(this._id, matches);
    },
    function (key, values) {
        var result = 0;
        for (var i = 0; i < values.length; i++) {
            result += values[i];
        }
        return result;
    },
    {
        out: { inline: 1 },
        scope: {
            targetTerms: [ 'fruit', 'citrus' ]
        }
    }
);
You would then pass your [ 'fruit', 'citrus' ] input values using the scope parameter in the mapReduce call, as { targetTerms: [ 'fruit', 'citrus' ] }, so that they are available in the map function above.
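For completeness, here is a rough sketch of how the same call could be issued from Node.js with the native driver and the results sorted client-side, as described above. The db handle, collection name and callback style are assumptions, not part of the original answer:
var mapFn = function () {
    var matches = 0;
    for (var i = 0; i < targetTerms.length; i++) {
        for (var j = 0; j < this.tags.length; j++) {
            matches += Number(targetTerms[i] === this.tags[j]);
        }
    }
    emit(this._id, matches);
};
var reduceFn = function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
};

db.collection('plants').mapReduce(mapFn, reduceFn, {
    out: { inline: 1 },
    scope: { targetTerms: [ 'fruit', 'citrus' ] }
}, function (err, results) {
    if (err) throw err;
    // sort by match count, highest first
    results.sort(function (a, b) { return b.value - a.value; });
    console.log(results); // e.g. [ { _id: ..., value: 2 }, { _id: ..., value: 1 }, ... ]
});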

Related

Update collection to change the rank

I have a MongoDB collection that I sort by the amount of points each item has, and it shows a rank according to its place in the collection:
rank = 0;
db.collection('websites').find({}).sort({ "points": -1 }).forEach(doc => {
    rank++;
    doc.rank = rank;
    delete doc._id;
    console.log(doc)
})
So I thought to myself: OK, I'm going to update the rank in the collection, so I added this:
db.collection('websites').updateMany({},
    { $set: { rank: doc.rank } }
)
But it was too good to be true: it updates every single item with the same rank, which changes at each refresh. What exactly is going on here?
EDIT: I managed to do it by doing this:
rank = 0;
db.collection('websites').find({}).sort({ "points": -1 }).forEach(doc => {
    rank++;
    doc.rank = rank;
    //delete doc._id;
    console.log(doc._id);
    db.collection('websites').updateMany({ _id: doc._id },
        { $set: { rank: doc.rank } },
        { upsert: true }
    )
})
Try this:
db.collection('websites')
    .updateOne( // update only one document
        { rank: doc.rank }, // update the one whose rank equals the passed-in doc.rank
        { $set: { rank: doc.rank } } // if multiple docs share the same rank you should pass in more filter parameters
    )
db.collection('websites').updateMany({ /* all docs match */ },
    { $set: { rank: doc.rank } }
)
The reason it updates everything with the same rank is that you have no filter, which means the query matches all docs in the collection, and you used updateMany.
You need to set a filter to restrict the docs to be updated.
db.collection('websites').updateMany({ id: "someID" },
    { $set: { rank: doc.rank } }
)
The OP states we want to sort all the docs by points, then "rerank" them from 1 to n in that order and update the DB. Here is an example of where "aggregate is the new update" thanks to the power of $merge onto the same collection as the input:
db.foo.aggregate([
    // Get everything in descending order...
    {$sort: {'points':-1}}

    // ... and turn it into a big array:
    ,{$group: {_id:null, X:{$push: '$$ROOT'}}}

    // Walk the array and incrementally set rank. The input arg
    // is $X and we set $X so we are overwriting the old X:
    ,{$addFields: {X: {$function: {
        body: function(items) {
            for(var i = 0; i < items.length; i++) {
                items[i]['rank'] = (i+1);
            }
            return items;
        },
        args: [ '$X' ],
        lang: "js"
    }}
    }}

    // Get us back to regular docs, not an array:
    ,{$unwind: '$X'}
    ,{$replaceRoot: {newRoot: '$X'}}

    // ... and update everything:
    ,{$merge: {
        into: "foo",
        on: [ "_id" ],
        whenMatched: "merge",
        whenNotMatched: "fail"
    }}
]);
If using $function spooks you, you can use a somewhat more obtuse approach with $reduce as a stateful for loop substitute. To better understand what is happening, block comment with /* */ the stages below $group and one by one uncomment each successive stage to see how that operator is affecting the pipeline.
db.foo.aggregate([
    // Get everything in descending order...
    {$sort: {'points':-1}}

    // ... and turn it into a big array:
    ,{$group: {_id:null, X:{$push: '$$ROOT'}}}

    // Use $reduce as a for loop with state.
    ,{$addFields: {X: {$reduce: {
        input: '$X',
        // The value (stateful) part of the loop will contain a
        // counter n and the array newX which we will rebuild with
        // the incremental rank:
        initialValue: {
            n:0,
            newX:[]
        },
        in: {$let: {
            vars: {qq:{$add:['$$value.n',1]}}, // n = n + 1
            in: {
                n: '$$qq',
                newX: {$concatArrays: [
                    '$$value.newX',
                    // A little weird but this means "take the
                    // current item in the array ($$this) and
                    // set $$this.rank = $$qq by merging it into the
                    // item. This results in a new object but
                    // $concatArrays needs an array so wrap it
                    // with [ ]":
                    [ {$mergeObjects: ['$$this',{rank:'$$qq'}]} ]
                ]}
            }
        }}
    }}
    }}

    ,{$unwind: '$X.newX'}
    ,{$replaceRoot: {newRoot: '$X.newX'}}
    ,{$merge: {
        into: "foo",
        on: [ "_id" ],
        whenMatched: "merge",
        whenNotMatched: "fail"
    }}
]);
The problem here is that mongo is using the same doc.rank value to update all the records that match the filter criteria (all records in your case). Now you have two options to resolve the issue -
Option 1 (works but is less efficient) - The idea here is that you need to calculate the rank for each website that you want to update. Loop through all the documents and run the query below, which will update every document with its calculated rank. You could probably think this is inefficient, and you would be right: we are making a large number of network calls to update the records. Worse, the slowness is unbounded and grows as the number of records increases.
db.collection('websites')
    .updateOne(
        { id: 'docIdThatNeedsToBeUpdated' },
        { $set: { rank: 'calculatedRankOfTheWebsite' } }
    )
Option 2 (efficient) - Use the same technique to calculate the rank for each website and loop through it to generate the update statements as above. But this time you would not make the update calls separately for each website. Rather, you would use the bulk update technique: add all your update statements to a batch and execute them all in one go.
// loop and use the two lines below to add the statements to a batch
var bulk = db.websites.initializeUnorderedBulkOp();
bulk.find({ id: 'docIdThatNeedsToBeUpdated' })
    .updateOne({
        $set: {
            rank: 'calculatedRankOfTheWebsite'
        }
    });
// execute all of the statements in one go, outside of the loop
bulk.execute();
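As a rough end-to-end sketch, assuming the modern Node.js driver where bulkWrite supersedes the Bulk API above (the collection name follows the question; everything else is illustrative):
async function rerankWebsites(db) {
    // read the ids in points order and compute each rank client-side
    const websites = await db.collection('websites')
        .find({}, { projection: { _id: 1 } })
        .sort({ points: -1 })
        .toArray();

    // one updateOne per document, all sent as a single batch
    const ops = websites.map((doc, i) => ({
        updateOne: {
            filter: { _id: doc._id },
            update: { $set: { rank: i + 1 } }
        }
    }));

    if (ops.length > 0) {
        await db.collection('websites').bulkWrite(ops, { ordered: false });
    }
}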
I managed to do it by doing:
rank = 0;
db.collection('websites').find({}).sort({ "points": -1 }).forEach(doc => {
    rank++;
    doc.rank = rank;
    //delete doc._id;
    console.log(doc._id);
    db.collection('websites').updateMany({ _id: doc._id },
        { $set: { rank: doc.rank } },
        { upsert: true }
    )
})
Thank you everyone !

how to remove object in array by index mongodb / mongoose [duplicate]

In the following example, assume the document is in the db.people collection.
How do I remove the 3rd element of the interests array by its index?
{
    "_id" : ObjectId("4d1cb5de451600000000497a"),
    "name" : "dannie",
    "interests" : [
        "guitar",
        "programming",
        "gadgets",
        "reading"
    ]
}
This is my current solution:
var interests = db.people.findOne({"name":"dannie"}).interests;
interests.splice(2,1)
db.people.update({"name":"dannie"}, {"$set" : {"interests" : interests}});
Is there a more direct way?
There is no straightforward way of pulling/removing by array index. In fact, this is an open issue (http://jira.mongodb.org/browse/SERVER-1014); you may vote for it.
The workaround is using $unset and then $pull:
db.people.update({}, {$unset : {"interests.2" : 1 }})
db.people.update({}, {$pull : {"interests" : null}})
Update: as mentioned in some of the comments this approach is not atomic and can cause some race conditions if other clients read and/or write between the two operations. If we need the operation to be atomic, we could:
Read the document from the database
Update the document and remove the item in the array
Replace the document in the database. To ensure the document has not changed since we read it, we can use the "update if current" pattern described in the mongo docs (a rough sketch follows below).
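A minimal sketch of that read-modify-replace flow in the shell, using the "update if current" idea (field names follow the question; the retry handling is only hinted at):
var doc = db.people.findOne({ "name": "dannie" });
var original = doc.interests.slice();   // snapshot used as the "current" check
doc.interests.splice(2, 1);             // remove the 3rd element client-side

var result = db.people.update(
    { "_id": doc._id, "interests": original },   // only matches if unchanged
    { "$set": { "interests": doc.interests } }
);
// if result.nModified is 0, another client changed the doc: re-read and retry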
You can use the $pull modifier of the update operation for removing a particular element from an array. In your case the query will look like this:
db.people.update({"name":"dannie"}, {'$pull': {"interests": "guitar"}})
Also, you may consider using $pullAll for removing all occurrences. More about this on the official documentation page - http://www.mongodb.org/display/DOCS/Updating#Updating-%24pull
This doesn't use the index as a criterion for removing the element, but it still might help in cases similar to yours. IMO, using indexes for addressing elements inside an array is not very reliable, since mongodb isn't consistent about element order as far as I know.
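For example, a quick sketch of the $pullAll variant mentioned above, using the question's document (it removes every occurrence of each listed value):
db.people.update(
    { "name": "dannie" },
    { "$pullAll": { "interests": [ "guitar", "reading" ] } }
)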
In MongoDB 4.2 you can do this:
db.example.update({}, [
    {$set: {field: {
        $concatArrays: [
            {$slice: ["$field", P]},
            {$slice: ["$field", {$add: [1, P]}, {$size: "$field"}]}
        ]
    }}}
]);
P is the index of the element you want to remove from the array.
If you want to remove everything from index P to the end, keep just the first P elements:
db.example.update({}, [
    { $set: { field: { $slice: ["$field", P] } } }
]);
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to update an array by removing an element at a given index:
// { "name": "dannie", "interests": ["guitar", "programming", "gadgets", "reading"] }
db.collection.update(
{ "name": "dannie" },
[{ $set:
{ "interests":
{ $function: {
body: function(interests) { interests.splice(2, 1); return interests; },
args: ["$interests"],
lang: "js"
}}
}
}]
)
// { "name": "dannie", "interests": ["guitar", "programming", "reading"] }
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to modify. The function here simply consists in using splice to remove 1 element at index 2.
args, which contains the fields from the record that the body function takes as parameter. In our case "$interests".
lang, which is the language in which the body function is written. Only js is currently available.
Rather than using $unset (as in the accepted answer), I solve this by setting the field to a unique value (i.e. not null) and then immediately pulling that value. A little safer from an async perspective. Here is the code:
var update = {};
var key = "ToBePulled_"+ new Date().toString();
update['feedback.'+index] = key;
Venues.update(venueId, {$set: update});
return Venues.update(venueId, {$pull: {feedback: key}});
Hopefully mongo will address this, perhaps by extending the $position modifier to support $pull as well as $push.
I would recommend using a GUID (I tend to use ObjectID) field, or an auto-incrementing field for each sub-document in the array.
With this GUID it is easy to issue a $pull and be sure that the correct one will be pulled. Same goes for other array operations.
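A minimal sketch of that idea, assuming the array holds sub-documents that each carry their own _id (the field names and id values below are purely illustrative):
// push items with their own ObjectId so they can be addressed later
db.people.update(
    { "name": "dannie" },
    { "$push": { "interests": { "_id": ObjectId(), "label": "guitar" } } }
)

// ...later, pull exactly that item by the id you stored
db.people.update(
    { "name": "dannie" },
    { "$pull": { "interests": { "_id": ObjectId("4d1cb5de451600000000497b") } } }
)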
For people who are searching for an answer using Mongoose with Node.js, this is how I do it.
exports.deletePregunta = function (req, res) {
    let codTest = req.params.tCodigo;
    let indexPregunta = req.body.pregunta; // the index that comes from the frontend
    let inPregunta = `tPreguntas.0.pregunta.${indexPregunta}`; // my field in my db
    let inOpciones = `tPreguntas.0.opciones.${indexPregunta}`; // my other field in my db
    let inTipo = `tPreguntas.0.tipo.${indexPregunta}`; // my other field in my db
    Test.findOneAndUpdate({ tCodigo: codTest },
        {
            '$unset': {
                [inPregunta]: 1, // put the field inside [] (computed property name)
                [inOpciones]: 1,
                [inTipo]: 1
            }
        }).then(() => {
            Test.findOneAndUpdate({ tCodigo: codTest }, {
                '$pull': {
                    'tPreguntas.0.pregunta': null,
                    'tPreguntas.0.opciones': null,
                    'tPreguntas.0.tipo': null
                }
            }).then(testModificado => {
                if (!testModificado) {
                    res.status(404).send({ accion: 'deletePregunta', message: 'No se ha podido borrar esa pregunta ' });
                } else {
                    res.status(200).send({ accion: 'deletePregunta', message: 'Pregunta borrada correctamente' });
                }
            })
        }).catch(err => { res.status(500).send({ accion: 'deletePregunta', message: 'error en la base de datos ' + err }); });
}
I can rewrite this answer if it isn't clear enough, but I think it is okay.
Hope this helps you; I lost a lot of time facing this issue.
It is a little bit late, but some who are using Robo 3T may find it useful:
db.getCollection('people').update(
    {"name":"dannie"},
    { $pull:
        {
            interests: "guitar" // you can change the value
        }
    },
    { multi: true }
);
If you have values something like -
property: [
    {
        "key" : "key1",
        "value" : "value 1"
    },
    {
        "key" : "key2",
        "value" : "value 2"
    },
    {
        "key" : "key3",
        "value" : "value 3"
    }
]
and you want to delete a record where the key is key3, then you can use something like this -
db.getCollection('people').update(
    {"name":"dannie"},
    { $pull:
        {
            property: { key: "key3" } // you can change the value
        }
    },
    { multi: true }
);
The same goes for the nested property.
This can be done using the $pop operator (note that $pop can only remove the first or the last element of an array):
db.getCollection('collection_name').updateOne( {}, {$pop: {"path_to_array_object":1}})

Query for a list contained in another list in mongodb

I'm fairly new to mongo and while I can manage to do most basic operations with $in, $or, $all, etc., I can't make what I want work.
I'll basically put a simple form of my problem. Part of my documents is a list of numbers, e.g.:
{_id:1,list:[1,4,3,2]}
{_id:2,list:[1]}
{_id:3,list:[1,3,4,6]}
I want a query that, given a list (let's call it L), would return every document whose entire list is contained in L.
For example, with the given list L = [1,2,3,4,5], I want documents with _id 1 and 2 to be returned. 3 mustn't be returned since 6 isn't in L.
"$in" doesn't work because it would also return _id 3 and "$all" doesn't work either because it would only return _id 1.
I then thought of "$where" but I can't seem to find how to bind an external variable to the JS code. What I mean by that is, for example:
var L = [1,2,3,4,5];
db.collection('myCollection').find({$where: function(l){
    // return something with the list "l" there
}.bind(null, L)})
I tried to bind the list to the function as shown above, but to no avail...
I'd gladly appreciate any hint concerning this issue, thanks.
There's a related question Check if every element in array matches condition with an answer with a nice approach for this scenario. It refers to an array of embedded documents but can be adapted for your scenario like this:
db.list.find({
    "list" : { $not : { $elemMatch : { $nin : [1,2,3,4,5] } } },
    "list.0" : { $exists: true }
})
ie. the list must not have any element that is not in [1,2,3,4,5] and the list must exist with at least 1 element (assuming that's also a requirement).
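Since the filter is just a JavaScript object built client-side, the external list L can be dropped straight into the $nin clause, with no $where needed. A rough Node.js sketch (collection name and callback style are assumptions):
var L = [1, 2, 3, 4, 5];
db.collection('myCollection').find({
    list: { $not: { $elemMatch: { $nin: L } } },
    "list.0": { $exists: true }
}).toArray(function (err, docs) {
    if (err) throw err;
    console.log(docs); // documents whose entire list is contained in L
});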
You could try using the aggregation framework for this, where you can make use of the set operators. In particular you would need the $setIsSubset operator, which returns true if all elements of the first set appear in the second set, including when the first set equals the second set (i.e. not a strict subset).
For example:
var L = [1,2,3,4,5];
db.collection('myCollection').aggregate([
    {
        "$project": {
            "list": 1,
            "isSubsetofL": {
                "$setIsSubset": [ "$list", L ]
            }
        }
    },
    {
        "$match": {
            "isSubsetofL": true
        }
    }
])
Result:
/* 0 */
{
    "result" : [
        {
            "_id" : 1,
            "list" : [
                1,
                4,
                3,
                2
            ],
            "isSubsetofL" : true
        },
        {
            "_id" : 2,
            "list" : [
                1
            ],
            "isSubsetofL" : true
        }
    ],
    "ok" : 1
}

Remove duplicate array objects mongodb

I have an array that contains duplicate values in BOTH fields; is there a way to remove one of the duplicate array items?
userName: "abc",
_id: 10239201141,
rounds:
[{
"roundId": "foo",
"money": "123
},// Keep one of these
{// Keep one of these
"roundId": "foo",
"money": "123
},
{
"roundId": "foo",
"money": "321 // Not a duplicate.
}]
I'd like to remove one of the first two, and keep the third because the id and money are not duplicated in the array.
Thank you in advance!
Edit: I found:
db.users.ensureIndex({'rounds.roundId':1, 'rounds.money':1}, {unique:true, dropDups:true})
This doesn't help me. Can someone help me? I spent hours trying to figure this out.
The thing is, I ran my node.js website on two machines so it was pushing the same data twice. Knowing this, the duplicate data should be 1 index away. I made a simple for loop that can detect if there is duplicate data in my situation, how could I implement this with mongodb so it removes an array object AT that array index?
for (var i in data) {
    var tempRounds = data[i]['rounds'];
    for (var ii in data[i]['rounds']) {
        var currentArrayItem = data[i]['rounds'][ii];
        if (tempRounds[ii - 1]) {
            if (currentArrayItem.roundId == tempRounds[ii - 1].roundId && currentArrayItem.money == tempRounds[ii - 1].money) {
                console.log("Found a match");
            }
        }
    }
}
Use the aggregation framework to compute a deduplicated version of each document (the example below uses a generic stats array; substitute your rounds field):
db.test.aggregate([
    { "$unwind" : "$stats" },
    { "$group" : { "_id" : "$_id", "stats" : { "$addToSet" : "$stats" } } }, // use $first to add in other document fields here
    { "$out" : "some_other_collection_name" }
])
Use $out to put the results in another collection, since aggregation cannot update documents. You can use db.collection.renameCollection with dropTarget to replace the old collection with the new deduplicated one. Be sure you're doing the right thing before you scrap the old data, though.
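For instance, the swap could look roughly like this in the shell (collection names follow the pipeline above; double-check the data before dropping anything):
// replaces the original collection with the deduplicated one (dropTarget: true)
db.some_other_collection_name.renameCollection("test", true)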
Warnings:
1: This does not preserve the order of elements in the stats array. If you need to preserve order, you will have to retrieve each document from the database, manually deduplicate the array client-side, then update the document in the database.
2: The following two objects won't be considered duplicates of each other:
{ "id" : "foo", "price" : 123 }
{ "price" : 123, "id" : foo" }
If you think you have mixed key orders, use a $project to enforce a key order between the $unwind stage and the $group stage:
{ "$project" : { "stats" : { "id_" : "$stats.id", "price_" : "$stats.price" } } }
Make sure to change id -> id_ and price -> price_ in the rest of the pipeline and rename them back to id and price at the end, or rename them in another $project after the swap. I discovered that, if you do not give different names to the fields in the project, it doesn't reorder them, even though key order is meaningful in an object in MongoDB:
> db.test.drop()
> db.test.insert({ "a" : { "x" : 1, "y" : 2 } })
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y" : "$a.y", "x" : "$a.x" } } }
])
{ "a" : { "x" : 1, "y" : 2 } }
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y_" : "$a.y", "x_" : "$a.x" } } }
])
{ "a" : { "y_" : 2, "x_" : 1 } }
Since the key order is meaningful, I'd consider this a bug, but it's easy to work around.

Node, MongoDB (mongoose) distinct count

I have a collection with multiple documents and every one of them has an 'eID' field that is not unique. I want to get the count for each distinct 'eID'.
Example: if there are 5 documents with 'eID' = ObjectID(123) and 2 documents with 'eID' = ObjectID(321), I want to output something like:
{
ObjectID(123): 5,
ObjectID(321): 2
}
I don't know if that can be done in the same query, but after knowing which eIDs occur the most I want to fetch the referenced documents using the ObjectID.
Mongoose version 3.8.8
In this example, $status is the field of the collection whose distinct values I need to count.
var agg = [
    {$group: {
        _id: "$status",
        total: {$sum: 1}
    }}
];
model.Site.aggregate(agg, function(err, logs){
    if (err) { return res.json(err); }
    return res.json(logs);
});
//output
[
    {
        "_id": "plan",
        "total": 3
    },
    {
        "_id": "complete",
        "total": 4
    },
    {
        "_id": "hault",
        "total": 2
    },
    {
        "_id": "incomplete",
        "total": 4
    }
]
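Adapted to the eID field from the question, the same pattern would look roughly like this (the model name is reused from the example above and is an assumption):
model.Site.aggregate([
    { $group: { _id: "$eID", total: { $sum: 1 } } },
    { $sort: { total: -1 } }
], function (err, counts) {
    if (err) { return res.json(err); }
    return res.json(counts); // e.g. [ { "_id": ObjectId("..."), "total": 5 }, ... ]
});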
This answer is not in terms of how this query can be written via mongoose, but I am familiar with the nodejs MongoClient class if you have further questions regarding implementation.
The best (most optimal) way I can think of doing this is to use mapReduce or aggregation on your database. The closest thing to a single command would be the distinct command, which can be invoked on collections, but this will only give you an array of distinct values for the eID key.
See here: http://docs.mongodb.org/manual/core/map-reduce/
For your specific problem, you will want your map and reduce functions roughly as follows:
var map = function() {
    var value = 1;
    emit(this.eID, value);
};
var reduce = function(key, values) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
        result += values[i];
    }
    return result;
};
There might be an easier way to do this using the aggregation pipeline (I would post the link but I don't have enough reputation).
I also found the mapReduce command for mongoose: http://mongoosejs.com/docs/api.html#model_Model.mapReduce
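A small sketch of how the map/reduce above could be wired up through Mongoose's Model.mapReduce (the model name is an assumption; with no out option the results come back inline):
var o = {
    map: function () { emit(this.eID, 1); },
    reduce: function (key, values) {
        return values.reduce(function (a, b) { return a + b; }, 0);
    }
};

MyModel.mapReduce(o, function (err, results) {
    if (err) { return console.error(err); }
    console.log(results); // [ { _id: <eID>, value: <count> }, ... ]
});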
