Node.js and MongoDB: if an exact document match exists, ignore the insert

I am maintaining a collection of unique values alongside a companion collection that holds instances of those values. The reason I have it that way is that the companion collection has more than 10 million records, while the unique values add up to only 100K, and I use those values all over the place for partial-match lookups.
When I upload a CSV file, it is usually 10k to 500k records at a time that I insert into the companion collection. What is the best way to insert only the values that don't already exist into the unique values collection?
Example:
// Insert large quantities of objects into Mongo
var bulkInsert = [
  {
    name: "Some Name",
    other: "zxy",
    properties: "abc"
  },
  {
    name: "Some Name",
    other: "zxy",
    properties: "abc"
  },
  {
    name: "Other Name",
    other: "zxy",
    properties: "abc"
  }
];
// Need to insert only the values that do not already exist in the Mongo unique values collection
var uniqueValues = [
  {
    name: "Some Name"
  },
  {
    name: "Other Name"
  }
];
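One common way to do this in a single round trip (a sketch, not from the original post, assuming the official MongoDB Node.js driver and a collection named uniqueValues) is a bulk upsert with $setOnInsert, which writes a name only when it is missing and leaves existing documents untouched:
var ops = bulkInsert.map(function (doc) {
  return {
    updateOne: {
      filter: { name: doc.name },
      update: { $setOnInsert: { name: doc.name } }, // written only on insert
      upsert: true
    }
  };
});
db.collection('uniqueValues').bulkWrite(ops, { ordered: false }, function (err, result) {
  // result.upsertedCount is the number of genuinely new names
});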
EDIT
I tried creating a unique index on the field, but once it finds a duplicate in the array of documents that I am inserting, it stops the whole process and doesn't proceed to check any values after the break.

Figured it out. If you're doing it from the shell, you need to use Bulk() and create insert jobs like this:
var bulk = db.collection.initializeUnorderedBulkOp();
bulk.insert( { name: "1234567890a"} );
bulk.insert( { name: "1234567890b"} );
bulk.insert( { name: "1234567890"} );
bulk.execute();
and in Node, the continueOnError flag works on a straight collection.insert():
collection.insert([{ name: "1234567890a" }, { name: "1234567890c" }], { continueOnError: true }, function (err, doc) {});
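An added note (not from the original post): continueOnError comes from the legacy 1.x driver API. In current drivers the equivalent is insertMany() with ordered: false, which attempts every document and reports duplicate-key failures only at the end:
db.collection('uniqueValues').insertMany(
  [{ name: "1234567890a" }, { name: "1234567890c" }],
  { ordered: false }
).catch(function (err) {
  // duplicate-key errors (code 11000) arrive here after the
  // non-duplicate documents have already been inserted
});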

Well, I think the solution here is quite simple, if I understand your issue correctly.
Since the process stops when it finds a duplicated field, you should check whether the value already exists before trying to add it.
So, for each element in uniqueValues, run a find/findOne query; if it doesn't return any result, add the element; otherwise, don't.
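A minimal sketch of that check-then-insert approach (assuming the official Node.js driver and a collection named uniqueValues; note it is not atomic, so a concurrent writer could still sneak a duplicate in between the check and the insert):
uniqueValues.forEach(function (value) {
  db.collection('uniqueValues').findOne({ name: value.name }, function (err, existing) {
    if (!err && !existing) {
      db.collection('uniqueValues').insertOne(value, function (err) {
        // handle insert errors here
      });
    }
  });
});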

Related

How do I use upsert and $addToSet in conjunction?

I would like to insert a document that looks like this:
{
  name: "app1",
  theArray: [ /* ...unique string elements... */ ]
}
I tried to do an upsert using the query below but it somehow creates an array within an array if the document does not exist. If I use $push when the document does not exist, then the array is created fine. However I need to use $addToSet to maintain array element uniqueness.
Current query:
collection1.upsert({
  name: "app1"
}, {
  $addToSet: {
    theArray: data // data is an array of IP addresses, e.g. ["1.1.1", "2.2.2.2"], which is not unique
  }
});
Executing the above query when there is no existing document in the db creates:
{
  name: "app1",
  theArray: [
    // another array that contains the actual data
  ]
}
Is there a way I can get this behavior with just a single query?
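An added aside (not part of the original thread): the usual fix for the nested-array behavior is the $each modifier, which makes $addToSet treat data as a list of elements rather than as one single array value. A sketch in standard driver syntax:
collection1.updateOne(
  { name: "app1" },
  { $addToSet: { theArray: { $each: data } } }, // adds each element individually, keeping uniqueness
  { upsert: true }
);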

Node.js: checking for duplicate values in Mongoose

I'd like to save my JSON data via Mongoose, but duplicate values have to be filtered out.
my_json = [
  { "name": "michael", "age": 21, "sports": "basketball" },
  { "name": "nick", "age": 31, "sports": "golf" },
  { "name": "joan", "age": 41, "sports": "soccer" },
  { "name": "henry", "age": 51, "sports": "baseball" },
  { "name": "joe", "age": 61, "sports": "dance" }
];
The database already contains:
{
  "name": "joan", "age": 41, "sports": "soccer"
}
Is there a specific method to avoid inserting duplicate data directly with Mongoose? Four of the values should be saved, everything except the "joan" entry.
I tried using a for statement and it worked, but I would like to write simpler code that still covers the possible cases.
for (var i = 0; i < my_json.length; i++) {
  // check for a duplicate value before saving
  db.json_model.count({ "name": my_json[i].name }, function (err, count) {
    if (count === 0) { // count yields a number, not an array
      my_json_vo.savePost(function (err) {
      });
    }
  });
}
As you can see, I have to use the count method to check whether the value is duplicated. I would rather not use count and keep the code simpler. Could you give me some advice?
You can mark the field as unique in your Mongoose schema:
var schema = new Schema({
  name: { type: String, required: true, unique: true }
  // ...
});
Also, you can add a unique index for the name field in your database:
db.json_model.createIndex({ "name": 1 }, { unique: true, background: true });
Then, if a new entity with the same name is saved, Mongo won't store it and will respond with an error.
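For illustration (an added sketch, not from the original answer), the duplicate then surfaces as a duplicate-key error at save time:
var User = mongoose.model('User', schema);
new User({ name: "joan", age: 41, sports: "soccer" }).save(function (err) {
  if (err && err.code === 11000) {
    // E11000: the unique index rejected a duplicate name
  }
});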
In addition to @Alex's answer about adding a unique key on the name field, you can use the insertMany() method with the ordered parameter set to false, like this:
let my_json = [
  { "name": "michael", "age": 21, "sports": "basketball" },
  { "name": "nick", "age": 31, "sports": "golf" },
  { "name": "joan", "age": 41, "sports": "soccer" },
  { "name": "henry", "age": 51, "sports": "baseball" },
  { "name": "joe", "age": 61, "sports": "dance" }
];
User.insertMany(my_json, { ordered: false });
This query will run and insert the unique documents, and it also produces an error after the successful insertions. That way you will know there were duplicate records, but all the records now in the database are unique.
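As an added usage sketch of that behavior: with ordered: false the driver raises a BulkWriteError only after attempting every document, so you can catch it and carry on:
User.insertMany(my_json, { ordered: false })
  .then(function (docs) {
    // every document was new
  })
  .catch(function (err) {
    // duplicate-key failures (code 11000) land here after the
    // non-duplicate documents have already been inserted
  });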
Reference: insertMany with the ordered parameter

Updating Mongo without overwriting current data, just appending to it

I need to add to a list of items in Mongo. So if I have
items: [ { item: "apple" } ]
what would I use to add another item object instead of replacing the initial one, so I end up with:
items: [ { item: "apple" }, { item: "orange" }, { item: "blueberry" } ]
Can I use findOneAndUpdate, or will that overwrite the original data? I am having a hard time finding the distinction in the documentation.
In closing: which methods update and overwrite, and which append to objects and arrays?
You can use the $addToSet operator.
For example:
db.yourCollection.update(
  { _id: 1 },
  { $addToSet: { items: { item: "orange" } } }
)
The code above will add { item: "orange" } to the items list of the document with _id = 1.
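On the broader question (an added note, not from the original answer): $set replaces a field's value, while $push and $addToSet append to an array ($push always, $addToSet only when the element is not already present). findOneAndUpdate accepts the same update operators, so it appends or overwrites depending on which operator you give it:
db.yourCollection.findOneAndUpdate(
  { _id: 1 },
  { $push: { items: { item: "blueberry" } } } // appends, even if it is a duplicate
);
db.yourCollection.findOneAndUpdate(
  { _id: 1 },
  { $set: { items: [ { item: "apple" } ] } } // replaces the whole array
);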

MongoDB save entries the other way round

I use NodeJS and I have a MongoDB collection with a lot of entries. 99% of the time the last entry is selected, and sometimes the one before it. Since MongoDB has to go through the entries one by one, it would be more useful to store them the other way round:
Instead of this:
{
  _id: "foo",
  name: "name"
},
{
  _id: "bar",
  name: "name"
}
// <- new entry will be inserted here
I want to use this:
// <- new entry will be inserted here
{
  _id: "foo",
  name: "name"
},
{
  _id: "bar",
  name: "name"
},
So that in most cases the entry I search for is the first or the second item.
Is that possible or even necessary (does it make any difference in speed)?
I could also reverse the items and then iterate through them, but I don't think that this would be faster.
You should not be concerned about the position of an item in the collection. Every collection has an index on the _id field, so if you sort by that field and take the first (second or third) element, you will get it in almost no time (probably 0 ms).
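A minimal sketch of that lookup (an addition, assuming auto-generated ObjectIds, which sort roughly by insertion time, and a hypothetical entries collection):
db.collection('entries')
  .find()
  .sort({ _id: -1 }) // newest first, served by the built-in _id index
  .limit(1)
  .toArray(function (err, docs) {
    // docs[0] is the most recently inserted entry
  });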

MongoDB Ensure Index not stopping Duplicates

I am trying to stop duplicates in my MongoDB collection, but they are still getting in. I am reading data from Twitter and storing it like:
var data = {
  user_name: response[i].user.screen_name,
  profile_image: response[i].user.profile_image_url,
  content: {
    text: response[i].text
  },
  id: response[i].id_str
};
and I have the following to stop any duplicates:
db[collection].ensureIndex( { id: 1, "content.text": 1 }, { unique: true, dropDups: true } );
The id field is working and no duplicates appear, but the "content.text" field does not work and duplicates are appearing. Any ideas why?
When you enforce a unique constraint on a composite index, two documents are considered the same only if they have the same values for both the id and content.text fields, not for either key individually.
To enforce a unique constraint on each of the fields id and content.text individually, you could do it as below:
db.col.ensureIndex({ "id": 1 }, { unique: true })
and similarly for the other field.
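As an added note, ensureIndex is deprecated in modern MongoDB in favor of createIndex; the two single-field unique indexes would look like this:
db.col.createIndex({ "id": 1 }, { unique: true });
db.col.createIndex({ "content.text": 1 }, { unique: true });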
