Remove duplicate array objects mongodb - node.js

I have an array that contains duplicate entries in which BOTH fields (roundId and money) are identical. Is there a way to remove one of the duplicate array items?
userName: "abc",
_id: 10239201141,
rounds:
[{
"roundId": "foo",
"money": "123"
}, // Keep one of these
{ // Keep one of these
"roundId": "foo",
"money": "123"
},
{
"roundId": "foo",
"money": "321" // Not a duplicate.
}]
I'd like to remove one of the first two and keep the third, because its roundId/money combination is not duplicated in the array.
Thank you in advance!
Edit: I found this suggestion:
db.users.ensureIndex({'rounds.roundId':1, 'rounds.money':1}, {unique:true, dropDups:true})
This doesn't help me. Can someone help? I've spent hours trying to figure this out.
The thing is, I ran my Node.js website on two machines, so it was pushing the same data twice. Knowing this, each duplicate should be adjacent to its original (one index apart). I wrote a simple for loop that detects the duplicate data in my situation; how could I implement this with MongoDB so that it removes the array object AT that index?
for (var i = 0; i < data.length; i++) {
  var rounds = data[i].rounds;
  // Start at 1 so each round is compared with the one just before it
  for (var j = 1; j < rounds.length; j++) {
    var current = rounds[j];
    var previous = rounds[j - 1];
    if (current.roundId === previous.roundId && current.money === previous.money) {
      console.log("Found a duplicate at index " + j);
    }
  }
}

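Use the aggregation framework to compute a deduplicated version of each document. (This example uses an array field named stats whose elements have id and price fields; for the question's data, substitute rounds, roundId and money.)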
db.test.aggregate([
{ "$unwind" : "$stats" },
{ "$group" : { "_id" : "$_id", "stats" : { "$addToSet" : "$stats" } } }, // use $first to add in other document fields here
{ "$out" : "some_other_collection_name" }
])
Use $out to write the results to another collection, because this pipeline cannot update the original documents in place (MongoDB 4.2 and later also provide $merge, which can write into an existing collection). You can then use db.collection.renameCollection() with dropTarget set to true to replace the old collection with the new deduplicated one. Be sure you're doing the right thing before you scrap the old data, though.
Warnings:
1: This does not preserve the order of elements in the stats array. If you need to preserve order, you will have to retrieve each document from the database, deduplicate the array manually on the client, and then update the document in the database.
2: The following two objects won't be considered duplicates of each other:
{ "id" : "foo", "price" : 123 }
{ "price" : 123, "id" : "foo" }
If you think you have mixed key orders, use a $project to enforce a key order between the $unwind stage and the $group stage:
{ "$project" : { "stats" : { "id_" : "$stats.id", "price_" : "$stats.price" } } }
Make sure to change id -> id_ and price -> price_ in the rest of the pipeline and rename them back to id and price at the end, or rename them in another $project after the swap. I discovered that if you do not give the fields different names in the $project, it doesn't reorder them, even though key order is meaningful in a MongoDB object:
> db.test.drop()
> db.test.insert({ "a" : { "x" : 1, "y" : 2 } })
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y" : "$a.y", "x" : "$a.x" } } }
])
{ "a" : { "x" : 1, "y" : 2 } }
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y_" : "$a.y", "x_" : "$a.x" } } }
])
{ "a" : { "y_" : 2, "x_" : 1 } }
Since the key order is meaningful, I'd consider this a bug, but it's easy to work around.
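Warning 1's client-side route can be sketched in plain JavaScript. dedupeRounds and buildDedupeUpdate are hypothetical helpers written for the question's rounds data: they keep the first occurrence of each roundId/money pair, preserve the array order, and produce a $set update for that document's _id (with the Node.js driver you could pass the pair to collection.updateOne(filter, update)). Keying on the two named fields rather than on whole objects also sidesteps the key-order problem from warning 2.

```javascript
// Hypothetical helper: remove repeated { roundId, money } entries from a
// rounds array, keeping the first occurrence and preserving the original order.
function dedupeRounds(rounds) {
  const seen = new Set();
  return rounds.filter(function (round) {
    // Key on the two fields that define a duplicate; independent of key order.
    const key = JSON.stringify([round.roundId, round.money]);
    if (seen.has(key)) {
      return false; // an identical round was already kept
    }
    seen.add(key);
    return true;
  });
}

// Hypothetical helper: build the filter/update pair that would overwrite the
// stored array with its deduplicated version.
function buildDedupeUpdate(doc) {
  return {
    filter: { _id: doc._id },
    update: { $set: { rounds: dedupeRounds(doc.rounds) } },
  };
}

// The document from the question.
const sampleUser = {
  userName: "abc",
  _id: 10239201141,
  rounds: [
    { roundId: "foo", money: "123" },
    { roundId: "foo", money: "123" },
    { roundId: "foo", money: "321" },
  ],
};

console.log(JSON.stringify(buildDedupeUpdate(sampleUser).update));
// {"$set":{"rounds":[{"roundId":"foo","money":"123"},{"roundId":"foo","money":"321"}]}}
```

Because the array is read and rewritten as a whole, this is best run while nothing else is writing to the same document.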

Related

How to merge data from another collection to an array of keys or ids?

Here is the problem I am facing:
I am using ArangoDB 3.7 and the arangojs driver.
I have the following collections:
collection A { _key, data }
collection B { _key, aDataList[A._key] }
I have tried the following:
FOR bdoc IN B
FILTER bdoc._key == "some_key"
FOR adoc IN A
FILTER adoc._key IN bdoc.aDataList[*]
RETURN MERGE(bdoc, adoc)
This query returns the objects that match the specified criteria.
But the problem I am facing is that the results are not in the same order as bdoc.aDataList in the actual B document.
Let's say here is the sample list:
bdoc.aDataList[ 1, 2, 3 ]
What it needs to look like after the update:
bdoc.aDataList[
{
"_key" : 1,
"data" : "somedata"
},
{
"_key" : 2,
"data" : "somedata"
},
{
"_key" : 3,
"data" : "somedata"
}
]
How can I properly replace aDataList[A._key] with aDataList[A] values using a single AQL query?
Any help would be appreciated.
I have found an answer :)
FOR bDoc IN B
FILTER bDoc._key == "somekey"
LET finalData = ( FOR bDocItem IN bDoc.aDataList
// aDataList holds A._key values, so compare each one directly
FOR aDoc IN A
FILTER aDoc._key == bDocItem
RETURN aDoc)
RETURN { "_key" : bDoc._key, aDataList: finalData }
Instead of iterating over collection A, I iterate over the aDataList array, so the order of the array is preserved.
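The same idea, walking the key array rather than collection A, can be illustrated client-side in JavaScript. This sketch assumes the B document and the relevant A documents have already been fetched (for example through arangojs); mergeInOrder is a hypothetical helper, and keys with no matching A document are dropped, mirroring the inner FOR/FILTER of the AQL query.

```javascript
// Hypothetical helper: replace the keys in bDoc.aDataList with the matching
// A documents, keeping the order of the key array.
function mergeInOrder(bDoc, aDocs) {
  // Index A documents by _key for constant-time lookups.
  const byKey = new Map(aDocs.map(function (a) { return [a._key, a]; }));
  const aDataList = bDoc.aDataList
    .filter(function (key) { return byKey.has(key); }) // drop unknown keys
    .map(function (key) { return byKey.get(key); });
  return { _key: bDoc._key, aDataList: aDataList };
}

// ArangoDB document keys are strings, so the sample uses string keys.
const sampleB = { _key: "somekey", aDataList: ["3", "1", "2"] };
const sampleA = [
  { _key: "1", data: "one" },
  { _key: "2", data: "two" },
  { _key: "3", data: "three" },
];

console.log(mergeInOrder(sampleB, sampleA).aDataList.map(function (a) { return a._key; }));
// [ '3', '1', '2' ]
```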

complicated mongoose pull list of data from api and insert into mongodb if it doesn't already exist

I am connecting to the Yelp API using the RapidAPI module in Node.js. I am able to request a token, connect, request and retrieve data, and insert the relevant information for each result into MongoDB. Here's where it gets complicated...
Let's say I make a Yelp API request and search for bars. I get a list of bars and insert them into the database. Let's say one of the bars in the list is "Joe's Bar & Grill". One of the fields in my MongoDB documents is "type", and it's an array. So now, this particular document will look something like this:
{
id: 'joes-bar-and-grill',
name: 'Joe\'s Bar & Grill',
type: ['bar']
}
But then I run another request on the Yelp API on "restaurants", and in this list "Joe's Bar & Grill" shows up again. Instead of inserting a new duplicate document into mongodb, I'd like the existing document to end up looking like this:
{
id: 'joes-bar-and-grill',
name: 'Joe\'s Bar & Grill',
type: ['bar', 'restaurant']
}
In addition, let's say I run another request for "bars" and "Joe's Bar & Grill" comes up again. I don't want "bar" to be inserted into the type array again if it's already there.
I've tried findOneAndUpdate with upsert: true and a $push of the new data into the array, but I can't get it to work at all. Does anyone have any ideas?
You can use findOneAndUpdate, combined with $addToSet (to make sure that an entry in the array only exists once) and $each (to allow passing arrays to $addToSet):
Bar.findOneAndUpdate({ id : 'joes-bar-and-grill' }, {
id : 'joes-bar-and-grill',
name : 'Joe\'s Bar & Grill',
$addToSet : { type : { $each : [ 'restaurant' ] } }
}, { upsert : true }).exec(); // Mongoose only runs the query once exec(), then() or a callback is used
EDIT: now that you posted your entire code, the problem becomes more obvious.
For one, I'm not sure the third and fourth arguments you're passing to Location.update() make sense. As far as I know, the third should be an options object and the fourth a callback function.
Secondly, it looks like you're just ignoring any update errors.
And lastly, this isn't going to work:
for (var i = 0; i < payload.businesses.length; i++) { Location.update(...) }
Because Location.update() is asynchronous, the i variable will get clobbered (you should browse around on SO to find the explanation for that; for example, see this question).
You're going to need a library that provides better async support, preferably one that also helps limit the number of concurrent update queries.
One such library is async; using it, your code would become something like this:
const async = require('async');
...
async.eachLimit(payload.businesses, 5, function(business, callback) {
Location.update({ yelpID : business.id }, {
name : business.name,
latitude : business.location.latitude,
longitude : business.location.longitude,
address1 : business.location.address1,
address2 : business.location.address2,
address3 : business.location.address3,
city : business.location.city,
state : business.location.state,
zip_code : business.location.zip_code,
country : business.location.country,
timezone : 'CST',
$addToSet : { type : 'bar' }
}, { upsert : true }, callback);
}, function(err) {
if (err) {
console.error(err);
} else {
console.log('All documents inserted');
}
});
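Mongoose accepts the mix of plain fields and $addToSet used above because it treats top-level keys that are not operators as $set operations, but the native MongoDB driver rejects update documents that mix the two. A small builder that keeps $set and $addToSet explicit works with either. buildLocationUpdate and its category parameter are hypothetical, the field list is a subset of the example above, and the sample business uses placeholder values.

```javascript
// Hypothetical builder: turns one Yelp business object into the filter,
// update and options for an upsert. Plain fields go under $set, and the
// category goes through $addToSet so repeated runs never duplicate it.
function buildLocationUpdate(business, category) {
  const loc = business.location;
  return {
    filter: { yelpID: business.id },
    update: {
      $set: {
        name: business.name,
        latitude: loc.latitude,
        longitude: loc.longitude,
        address1: loc.address1,
        city: loc.city,
        state: loc.state,
        zip_code: loc.zip_code,
        country: loc.country,
        timezone: "CST",
      },
      $addToSet: { type: category },
    },
    options: { upsert: true },
  };
}

// Placeholder input shaped like the fields the code above reads.
const sampleBusiness = {
  id: "joes-bar-and-grill",
  name: "Joe's Bar & Grill",
  location: { latitude: 0, longitude: 0, address1: "", city: "", state: "", zip_code: "", country: "" },
};

const locationOp = buildLocationUpdate(sampleBusiness, "bar");
console.log(Object.keys(locationOp.update)); // [ '$set', '$addToSet' ]
```

Inside the eachLimit iterator, the result could be passed to Mongoose as Location.update(op.filter, op.update, op.options, callback), or to the native driver as collection.updateOne(op.filter, op.update, op.options).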
You can use the $addToSet operator:
The $addToSet operator adds a value to an array unless the value is already present, in which case $addToSet does nothing to that array. $addToSet only ensures that there are no duplicate items added to the set and does not affect existing duplicate elements. $addToSet does not guarantee a particular ordering of elements in the modified set.
If the field is absent in the document to update, $addToSet creates the array field with the specified value as its element.
If the field is not an array, the operation will fail.
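To make these rules concrete, here is a small plain-JavaScript model of them for primitive values such as the type strings. addToSet is a hypothetical function that only illustrates the documented semantics; it is not how the server implements the operator, and it simply appends new values even though MongoDB does not guarantee any ordering. Its values parameter plays the role of $each.

```javascript
// Illustrative model of $addToSet (with $each) for primitive values only.
// This is NOT the server implementation; it just mirrors the documented rules.
function addToSet(current, values) {
  if (current === undefined) {
    // Field absent: start a new array from the given values.
    current = [];
  } else if (!Array.isArray(current)) {
    // Field present but not an array: MongoDB rejects the update.
    throw new TypeError("$addToSet requires an array field");
  }
  // Existing elements, including any old duplicates, are kept untouched.
  const result = current.slice();
  for (const value of values) {
    if (!result.includes(value)) {
      result.push(value); // only values not already present are added
    }
  }
  return result;
}

let types;                               // document has no "type" field yet
types = addToSet(types, ["bar"]);        // first "bars" search
types = addToSet(types, ["restaurant"]); // "restaurants" search
types = addToSet(types, ["bar"]);        // repeated "bars" search: no change
console.log(types); // [ 'bar', 'restaurant' ]
```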
The solution below assumes that each update receives a single type rather than an array. If the input is itself an array, you can use robertklep's solution with the $each operator.
db.mycoll.update(
{ "id" : "joes-bar-and-grill" },
{
$set:{
name : 'Joe\'s Bar & Grill',
},
$addToSet : { type : 'restaurant' }
},
true, false);
I have also used the $set operator:
The $set operator replaces the value of a field with the specified value.
The $set operator expression has the following form:
{ $set: { field1: value1, ... } }
Here is the mongo shell output to explain it further:
> db.mycoll.find({ "id" : "joes-bar-and-grill" });
// NO RESULT
> db.mycoll.update(
... { "id" : "joes-bar-and-grill" },
... {
... $set:{
... name : 'Joe\'s Bar & Grill',
... },
... $addToSet : { type : 'restaurant' }
... },
... true, false);
WriteResult({
"nMatched" : 0,
"nUpserted" : 1,
"nModified" : 0,
"_id" : ObjectId("58e719b4d543c5e30d615d59")
})
// INSERTED A NEW DOCUMENT AS IT DOES NOT EXIST
> db.mycoll.find({ "id" : "joes-bar-and-grill" }); // FINDING THE OBJECT
{ "_id" : ObjectId("58e719b4d543c5e30d615d59"), "id" : "joes-bar-and-grill", "name" : "Joe's Bar & Grill", "type" : [ "restaurant" ] }
> db.mycoll.update(
... { "id" : "joes-bar-and-grill" },
... {
... $set:{
... name : 'Joe\'s Bar & Grill',
... },
... $addToSet : { type : 'bar' }
... },
... true, false);
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
// UPDATING THE DOCUMENT WITH NEW TYPE : "bar"
> db.mycoll.findOne({ "id" : "joes-bar-and-grill" });
{
"_id" : ObjectId("58e719b4d543c5e30d615d59"),
"id" : "joes-bar-and-grill",
"name" : "Joe's Bar & Grill",
"type" : [
"restaurant",
"bar"
]
}

Sort descending String with Number in Mongo DB

I currently have a collection with two fields. Only one of them matters for the purpose of this question.
Imagine a collection with a single String field (let's call it "Tags") whose values follow the pattern [a-z]*[0-9]*, like:
test129
test130
some43
some44
some45
...
My application needs to generate new "Tags", given a prefix "identifier" (like test or some).
So let's say I input test as the prefix and 100 as the number of "Tags" to generate. The application:
Finds the LAST tag with the prefix test.
Parses the number after the prefix.
Adds 1 to that number and generates 100 tags in sequence.
The output in this specific case would be: test131, test132, ..., test230.
I implemented this, and it was working great with Mongoose. However, when I tried to generate a "Tag" from an existing Tag numbered above 1000, I found that the first step was flawed. It returned, say, test999 instead of test1200, so the iteration started from 999 and produced errors, since tags need to be unique.
This is because strings sort character by character rather than numerically, so "test1200" sorts before "test999". I know the problem, but how can I solve it in a simple way, without having to create extra fields?
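The three-step procedure described above (find the last tag, parse its number, generate the next ones) can be sketched in JavaScript. nextTags and escapeRegExp are hypothetical names, and the sketch assumes the last tag has already been found correctly, which is exactly the step the question is about.

```javascript
// Hypothetical helper: parse the numeric suffix of the last tag for a prefix
// and generate `count` consecutive tags after it.
function nextTags(prefix, lastTag, count) {
  // Case-insensitive match of prefix + digits, like the query in the question.
  const match = new RegExp("^" + escapeRegExp(prefix) + "(\\d+)$", "i").exec(lastTag);
  if (!match) {
    throw new Error("Tag does not match prefix: " + lastTag);
  }
  const last = parseInt(match[1], 10); // explicit radix 10
  const tags = [];
  for (let n = last + 1; n <= last + count; n++) {
    tags.push(prefix + n);
  }
  return tags;
}

// Escapes characters that have a special meaning in regular expressions,
// so a prefix such as "a.b" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const generated = nextTags("test", "test130", 100);
console.log(generated[0], generated[generated.length - 1], generated.length);
// test131 test230 100
```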
UPDATE: Part of the code where I find the tag:
lastAliasNumber: function (next){
console.log('process.lastAliasNumber');
// Skip if prefix is not set (tags already have name)
if(!prefix) return next();
// Build RegExp to find tags with the prefix given
var regexp = new RegExp('^'+prefix+'[0-9]+$', 'i');
Models.Tag
.findOne()
.where({
alias: regexp
})
.sort('-alias')
.exec(function (err, tag){
if(err) return next(err);
// Remove prefix and try parsing number
var lastId = 100;
if(tag){
// Remove prefix
var number = tag.alias.toLowerCase().replace(prefix, '');
// Get number from it
number = parseInt(number);
if(number) lastId = number;
}
console.log('lastAliasNumber', lastId);
next(null, lastId);
});
},
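Under MongoDB's default string ordering there is no ready way to do this kind of sorting: as your field is a string field, it will be sorted by the rules of string sorting, and there is no way to do variable-type sorting on one field. (MongoDB 3.4 and later support collations; with numericOrdering: true, runs of digits inside strings compare as numbers, which may be worth trying before changing your data.)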
Your best bet (assuming you cannot simply use an integer type and wish to keep only one field) would be to work out the theoretical maximum number of entries and pad your strings with the appropriate number of leading zeros.
E.g., assuming a maximum of 1,000,000 entries, your strings would be:
test000999
test001200
test000131
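A short JavaScript sketch shows both the problem and the padding fix. For ASCII strings like these, JavaScript's default sort compares character by character, which matches MongoDB's default string comparison. padTag and the six-digit width are assumptions chosen to match the example above.

```javascript
// Hypothetical helper: zero-pad the numeric part so that string order matches
// numeric order. `width` must cover the largest number you expect to store.
function padTag(prefix, n, width) {
  const digits = String(n);
  if (digits.length > width) {
    throw new RangeError(n + " does not fit in " + width + " digits");
  }
  return prefix + digits.padStart(width, "0");
}

// Unpadded: "1200" < "131" < "999" when compared as strings.
const unpadded = ["test999", "test1200", "test131"].sort();
// Padded to a fixed width: string order now equals numeric order.
const padded = [999, 1200, 131].map(function (n) { return padTag("test", n, 6); }).sort();

console.log(unpadded); // [ 'test1200', 'test131', 'test999' ]
console.log(padded);   // [ 'test000131', 'test000999', 'test001200' ]
```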
Another option would be to turn these entries into subdocuments with two fields of distinct data types (a string prefix and a numeric counter).
Consider my quick example documents below:
> db.bar.insert({x:{text:"test",num:1}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"test",num:100}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"test",num:2}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"sweet",num:2}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"sweet",num:1}})
WriteResult({ "nInserted" : 1 })
> db.bar.find().sort({x:1})
{ "_id" : ObjectId("55fa469d695632545d3aff1f"), "x" : { "text" : "sweet", "num" : 1 } }
{ "_id" : ObjectId("55fa469b695632545d3aff1e"), "x" : { "text" : "sweet", "num" : 2 } }
{ "_id" : ObjectId("55fa468a695632545d3aff1b"), "x" : { "text" : "test", "num" : 1 } }
{ "_id" : ObjectId("55fa4695695632545d3aff1d"), "x" : { "text" : "test", "num" : 2 } }
{ "_id" : ObjectId("55fa468f695632545d3aff1c"), "x" : { "text" : "test", "num" : 100 } }
> db.bar.find().sort({x:-1})
{ "_id" : ObjectId("55fa468f695632545d3aff1c"), "x" : { "text" : "test", "num" : 100 } }
{ "_id" : ObjectId("55fa4695695632545d3aff1d"), "x" : { "text" : "test", "num" : 2 } }
{ "_id" : ObjectId("55fa468a695632545d3aff1b"), "x" : { "text" : "test", "num" : 1 } }
{ "_id" : ObjectId("55fa469b695632545d3aff1e"), "x" : { "text" : "sweet", "num" : 2 } }
{ "_id" : ObjectId("55fa469d695632545d3aff1f"), "x" : { "text" : "sweet", "num" : 1 } }
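The ordering in that session follows from MongoDB comparing embedded documents field by field, in field order: first x.text as a string, then x.num as a number. The JavaScript comparator below (compareX, a hypothetical name) reproduces that ordering for these two-field subdocuments only; it is an illustration, not a general BSON comparison. To find the last tag for a single prefix in the database you would instead filter on "x.text" and sort on "x.num" descending, e.g. db.bar.find({"x.text": "test"}).sort({"x.num": -1}).limit(1).

```javascript
// Compares two { text, num } subdocuments the way this example sorts them:
// by text (string order) first, then by num (numeric order).
function compareX(a, b) {
  if (a.text !== b.text) {
    return a.text < b.text ? -1 : 1;
  }
  return a.num - b.num;
}

// The x values inserted in the shell session above.
const tagDocs = [
  { text: "test", num: 1 },
  { text: "test", num: 100 },
  { text: "test", num: 2 },
  { text: "sweet", num: 2 },
  { text: "sweet", num: 1 },
];

const ascending = tagDocs.slice().sort(compareX);
const descending = tagDocs.slice().sort(function (a, b) { return compareX(b, a); });

console.log(ascending.map(function (d) { return d.text + d.num; }));
// [ 'sweet1', 'sweet2', 'test1', 'test2', 'test100' ]
```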

Query for a list contained in another list in mongodb

I'm fairly new to MongoDB, and while I can manage most basic operations with $in, $or, $all, etc., I can't get this one to work.
Here is a simplified form of my problem. Part of each document is a list of numbers, e.g.:
{_id:1,list:[1,4,3,2]}
{_id:2,list:[1]}
{_id:3,list:[1,3,4,6]}
I want a query that, given a list (let's call it L), returns every document whose entire list is contained in L.
For example, with L = [1,2,3,4,5], I want the documents with _id 1 and 2 to be returned. _id 3 mustn't be returned, since 6 isn't in L.
"$in" doesn't work because it would also return _id 3, and "$all" doesn't work either because it requires every element of L to be in the document's list (so none of these documents would match).
I then thought of "$where", but I can't find how to bind an external variable to the JS code. What I mean is, for example:
var L = [1,2,3,4,5];
db.collections('myCollection').find({$where:function(l){
// return something with the list "l" there
}.bind(null,list)})
I tried to bind list to the function as shown above, but to no avail.
I'd gladly appreciate any hint concerning this issue, thanks.
There's a related question, "Check if every element in array matches condition", with an answer that takes a nice approach for this scenario. It refers to an array of embedded documents, but can be adapted for your scenario like this:
db.list.find({
"list" : { $not : { $elemMatch : { $nin : [1,2,3,4,5] } } },
"list.0" : { $exists: true }
})
i.e. the list must not have any element that is not in [1,2,3,4,5], and the list must exist with at least 1 element (assuming that's also a requirement).
You could also use the aggregation framework, whose set operators fit this problem. In particular, you need the $setIsSubset operator, which returns true if all elements of the first set appear in the second set, including when the two sets are equal (i.e. it is not a strict-subset test).
For example:
var L = [1,2,3,4,5];
db.getCollection('myCollection').aggregate([
{
"$project": {
"list": 1,
"isSubsetofL": {
"$setIsSubset": [ "$list", L ]
}
}
},
{
"$match": {
"isSubsetofL": true
}
}
])
Result:
/* 0 */
{
"result" : [
{
"_id" : 1,
"list" : [
1,
4,
3,
2
],
"isSubsetofL" : true
},
{
"_id" : 2,
"list" : [
1
],
"isSubsetofL" : true
}
],
"ok" : 1
}
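Both answers reduce to a subset test. The JavaScript sketch below models their semantics on the sample documents; it is not something MongoDB runs. isListInL mirrors the find() answer ($not/$elemMatch/$nin plus the "list.0" existence check), while setIsSubset mirrors $setIsSubset, which, unlike the find() version, also accepts an empty list. Both function names and the extra empty-list document are hypothetical.

```javascript
// Mirrors the find() answer: no element outside L, and at least one element.
function isListInL(list, L) {
  return list.length > 0 && list.every(function (x) { return L.includes(x); });
}

// Mirrors $setIsSubset: every element of `list` appears in L. The empty set
// is a subset of any set, so [] returns true here.
function setIsSubset(list, L) {
  return list.every(function (x) { return L.includes(x); });
}

const L = [1, 2, 3, 4, 5];
const listDocs = [
  { _id: 1, list: [1, 4, 3, 2] },
  { _id: 2, list: [1] },
  { _id: 3, list: [1, 3, 4, 6] },
  { _id: 4, list: [] }, // hypothetical extra document with an empty list
];

const idsOf = function (docs) { return docs.map(function (d) { return d._id; }); };
console.log(idsOf(listDocs.filter(function (d) { return isListInL(d.list, L); })));   // [ 1, 2 ]
console.log(idsOf(listDocs.filter(function (d) { return setIsSubset(d.list, L); }))); // [ 1, 2, 4 ]
```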

How to query parent based on subdocument's _id?

Consider the following record:
user record
{
"_id" : ObjectId("5234ccb7687ea597eabee677"),
"class" : [
{ "_id" : ObjectId("5234ccb7687ea597eabee671"), "num" : 10, "color" : "blue" },
{ "_id" : ObjectId("5234ccb7687ea597eabee672"), "num" : 100, "color" : "blue" }
]
}
This user has two class subdocuments. Now I need a query that finds all users whose class array contains at least one "class._id" value that also appears among a given user's "class._id" values.
Here is a more detailed example:
Suppose there are four users:
A:{_id:432645624232345,class:[{_id:123,name:'foo'}]}
B:{_id:432645624232555,class:[{_id:555,name:'foo'},{_id:123,name:'foo'}]}
C:{_id:432645344232345,class:[{_id:555,name:'foo'},{_id:111,name:'www'}]}
D:{_id:432644444232345,class:[{_id:222,name:'sss'},{_id:555,name:'www'},{_id:123,name:'foo'}]}
Now, if B logs in, I need to query all the users whose class array contains at least one subdocument with _id == 555 or _id == 123 (555 and 123 come from user B). In this case, the query result should be:
A:{_id:432645624232345,class:[{_id:123,name:'foo'}]} // match _id=123
B:{_id:432645624232555,class:[{_id:555,name:'foo'},{_id:123,name:'foo'}]} //match _id=123 and _id=555
C:{_id:432645344232345,class:[{_id:555,name:'foo'},{_id:111,name:'www'}]} //match _id=555
D:{_id:432644444232345,class:[{_id:222,name:'sss'},{_id:555,name:'www'},{_id:123,name:'foo'}]} // match _id=123 and _id=555
which is all the users.
So far I have this:
{"class._id" : { $in : ["5234ccb7687ea597eabee671", "5234ccb7687ea597eabee672"] } }
But when a different user logs in, the class._id condition is different. So is there an operator that lets me do something like this?
{"class._id" : { $in : req.user.class } }
I hope I made myself clear.
In order to achieve what you want, first you must isolate the class _ids in an array, and then use it in the query argument.
var classIds = req.user.class.map(function (c) {
  return c._id; // collect each class subdocument's _id
});
After that you can use the classIds array in the query:
{"class._id" : { $in : classIds } }
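As a sketch of the whole flow with the example users from the question (their _id fields omitted), buildClassFilter below collects the logged-in user's class ids with map() and builds the $in condition, and matchesClassFilter models client-side how that condition applies to an array of subdocuments. Both helpers are hypothetical. The sample uses the question's plain numeric ids; with real ObjectId values, a client-side check like this would need ObjectId's equals() method instead of includes(), although the server-side $in query itself compares ObjectIds correctly.

```javascript
// Hypothetical helper: collect the logged-in user's class ids and build the
// {"class._id": {$in: [...]}} condition from them.
function buildClassFilter(user) {
  const classIds = user.class.map(function (c) { return c._id; });
  return { "class._id": { $in: classIds } };
}

// Client-side model of the match: at least one class subdocument has an _id
// contained in the $in list.
function matchesClassFilter(doc, filter) {
  const ids = filter["class._id"].$in;
  return doc.class.some(function (c) { return ids.includes(c._id); });
}

// Users A-D from the question (top-level _id omitted).
const users = {
  A: { class: [{ _id: 123, name: "foo" }] },
  B: { class: [{ _id: 555, name: "foo" }, { _id: 123, name: "foo" }] },
  C: { class: [{ _id: 555, name: "foo" }, { _id: 111, name: "www" }] },
  D: { class: [{ _id: 222, name: "sss" }, { _id: 555, name: "www" }, { _id: 123, name: "foo" }] },
};

const filterForB = buildClassFilter(users.B);
console.log(JSON.stringify(filterForB)); // {"class._id":{"$in":[555,123]}}
console.log(Object.keys(users).filter(function (k) { return matchesClassFilter(users[k], filterForB); }));
// [ 'A', 'B', 'C', 'D' ]
```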
The following query condition would give you all the users that have at least one class with id equal to any of the elements in the given array:
{"class._id" : { $in : ["5234ccb7687ea597eabee671", "5234ccb7687ea597eabee672"] } }
In the array for the $in clause you can provide whatever ids you need, comma-separated.
In addition, if you need it, the query condition below checks for the existence of a nested document within the "class" property that has an "_id" property:
{ "class._id" : { $exists : true } }
Both conditions should work whether "class._id" is a single-valued property or an array (MongoDB supports both).
