MongoDB Ensure Index not stopping Duplicates - node.js

I am trying to stop duplicates in my MongoDB collection, but they are still getting in. I am reading data from Twitter and storing it like:
var data = {
    user_name: response[i].user.screen_name,
    profile_image: response[i].user.profile_image_url,
    content: {
        text: response[i].text
    },
    id: response[i].id_str,
};
and I have the following to stop any duplicates:
db[collection].ensureIndex( { id: 1, "content.text": 1 }, { unique: true, dropDups: true } );
The id field is working and no duplicates appear, but the "content.text" field does not work and duplicates are appearing. Any ideas why?

When you enforce a unique constraint on a compound index, two documents are considered duplicates only if they have the same values for both the id and content.text fields together, not for either key individually.
To enforce a unique constraint on id and content.text individually, create a separate single-field unique index for each:
db.col.ensureIndex({ "id": 1 }, { unique: true }) and similarly for the other field.
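For example, as a sketch against the collection from the question:
db[collection].ensureIndex({ id: 1 }, { unique: true });
// A second single-field unique index: each field must now be
// unique on its own, independently of the other.
db[collection].ensureIndex({ "content.text": 1 }, { unique: true });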

Related

How to avoid inserting duplicate values in the collection using db.insertMany() in MongoDB?

Query:
db.getCollection('parents').insertMany(
    [{
        'name': 'Mark',
        'children': 'No Childs'
    }, {
        'name': 'Carl',
        'children': '2'
    }], {
        'ordered': true,
    }
)
I'm using Mongoose as my ODM.
Can someone please suggest whether there are any options like unique: true?
I want the content to be checked, not the _id, since the _id will never be a duplicate.
Note: I've tried findOneAndUpdate with upsert: true, but here I have to insert multiple docs at the same time.
As per the documentation, if you have a unique index on a field (in your case name) and you try to insert multiple documents with insertMany, you will get an exception the moment an error occurs:
Inserting a duplicate value for any key that is part of a unique index, such as _id, throws an exception.
However, if you set 'ordered': false, the execution of insertMany will not stop, and the documents that do not conflict with the unique index will be inserted:
With ordered to false, the insert operation would continue with any remaining documents.
So set up a unique index on the name field; when you run insertMany, only the records that violate that index will be skipped.
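Applied to the query from the question, that would look like this (a sketch, assuming the unique index is created first):
// Create the unique index once.
db.getCollection('parents').createIndex({ name: 1 }, { unique: true });
// ordered: false keeps going past duplicate-key errors, so every
// non-duplicate document is still inserted.
db.getCollection('parents').insertMany(
    [{
        'name': 'Mark',
        'children': 'No Childs'
    }, {
        'name': 'Carl',
        'children': '2'
    }], {
        'ordered': false,
    }
)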
While defining the model schema, you can specify which field is unique, for example:
var userSchema = new mongoose.Schema({
    FieldName: {
        type: String,
        unique: true
    }
});
var UserModel = mongoose.model('User', userSchema);
If you just want to add an index, there is:
schema.index({ FieldName1: 1, FieldName2: 1 }, { unique: true });

node.js to check out duplication value in mongoose

I'd like to save my JSON data with Mongoose, but duplicate values have to be filtered out.
my_json = [
    {"name": "michael", "age": 21, "sports": "basketball"},
    {"name": "nick", "age": 31, "sports": "golf"},
    {"name": "joan", "age": 41, "sports": "soccer"},
    {"name": "henry", "age": 51, "sports": "baseball"},
    {"name": "joe", "age": 61, "sports": "dance"},
];
The database already contains:
{
    "name": "joan", "age": 41, "sports": "soccer"
}
Is there a specific method to avoid inserting duplicate data directly with Mongoose? It should save the 4 values other than the "joan" value. I tried using a for statement and it worked fine, but I'd like to write simpler code for this:
for (var i = 0; i < my_json.length; i++) {
    // check whether a document with this name already exists
    db.json_model.count({ "name": my_json[i].name }, function (err, count) {
        if (count === 0) {
            my_json_vo.savePost(function (err) {
            });
        }
    });
}
As you can see, I need to use the count method to check whether the value is duplicated or not. I'd rather not use count and make it simpler. Could you give me some advice?
You can mark the field as unique in the Mongoose schema:
var schema = new Schema({
    name: { type: String, required: true, unique: true }
    //...
});
Also, you can add a unique index for the name field directly in your database:
db.js_model.createIndex( {"name": 1}, { unique: true, background: true } );
Then, if a new entity with the same name is saved, Mongo won't save it and will respond with an error.
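For example, a second save of an existing name fails with a duplicate-key error (code 11000). A minimal sketch, assuming a model built from the schema above (the model name is illustrative):
var JsonModel = mongoose.model('JsonModel', schema);
new JsonModel({ name: "joan", age: 41, sports: "soccer" }).save(function (err) {
    if (err && err.code === 11000) {
        // duplicate name: MongoDB rejected the insert
    }
});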
In addition to @Alex's answer about adding a unique key on the name field, you can use the insertMany() method with the ordered parameter set to false, like this:
let my_json = [
    {"name": "michael", "age": 21, "sports": "basketball"},
    {"name": "nick", "age": 31, "sports": "golf"},
    {"name": "joan", "age": 41, "sports": "soccer"},
    {"name": "henry", "age": 51, "sports": "baseball"},
    {"name": "joe", "age": 61, "sports": "dance"},
];
User.insertMany(my_json, { ordered: false });
This query will run successfully and insert the unique documents, and it also produces an error after the successful insertions. That way you will know there were duplicate records, but all the records now in the database are unique.
Reference: insertMany with the ordered parameter
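If you need to act on that late error, a minimal sketch (assuming the User model above, with the duplicate-key failures surfacing as a rejected promise):
User.insertMany(my_json, { ordered: false })
    .then(function (docs) {
        // every document was unique and was inserted
    })
    .catch(function (err) {
        // duplicates were skipped; the non-duplicates are already stored
        console.log("insert finished with duplicate-key errors:", err.message);
    });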

Dynamic Schema field in Mongoose

I have a schema in which I'm storing a relationship between two users. Each of these relationships has user-specific data. I'm curious whether it's possible to do something along the lines of THIS:
{
    users: Array,
    users[0]: {
        typing: Boolean,
        last_checked: Date
    },
    users[1]: {
        typing: Boolean,
        last_checked: Date
    }
}
Instead of having the information stored like so:
{
    users: Array,
    data: Array
}
and doing logic on the server to find the index, etc., like so:
entry.data[entry.users.indexOf(id)].typing
Basically I'm just trying to find a decent way to store user-based information for each user in the 2-person relationship. The most ideal situation to me would be to use the user's _id as a key, but can you do that with Mongoose?
I propose creating an array that contains each user's data. Here I restricted the size of the array to two relationships.
DataSchema = {
    typing: Boolean,
    last_checked: Date,
}
UserSchema = {
    relationship: {
        type: [DataSchema],
        validate: [
            (val) => val.length <= 2,
            '{PATH} exceeds the limit of 2 relationships',
        ],
    }
}
Access the data:
// User 1 data
entry.relationship[0]
// User 2 data
entry.relationship[1]
// User 1 _id you can use
entry.relationship[0]._id
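If you specifically want each user's _id as the key, one option is Mongoose's Map type (a sketch, assuming Mongoose 5+, where map keys must be strings):
RelationshipSchema = new mongoose.Schema({
    // keyed by each user's _id, stringified
    users: {
        type: Map,
        of: new mongoose.Schema({
            typing: Boolean,
            last_checked: Date,
        }),
    },
})
// Access: entry.users.get(String(userId)).typing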

How to update an index with new variables in Elasticsearch?

I have an index 'user' which has a mapping with the fields "first", "last", and "email". The name fields get indexed at one point, and then the email field gets indexed at a separate point. I want these writes to target the same document id though, corresponding to one user_id parameter. So something like this:
function indexName(client, id, name) {
    return client.update({
        index: 'user',
        type: 'text',
        id: id,
        body: {
            first: name.first,
            last: name.last
        }
    })
}
function indexEmail(client, id, email) {
    return client.update({
        index: 'user',
        type: 'text',
        id: id,
        body: {
            email: email
        }
    })
}
When running:
indexName(client, "Jon", "Snow").then(indexEmail(client, "jonsnow@gmail.com"))
I get an error message saying that the document has not been created yet. How do I account for a document with a variable number of fields? And how do I create the index if it has not been created and then subsequently update it as I go?
The function you are using, client.update, updates part of an existing document. What you actually need is to first create the document using the client.create function.
To create an index, you need the indices.create function.
The variable number of fields in a document type is not a problem, because Elasticsearch supports dynamic mapping. However, it is advisable to provide a mapping when creating the index and to try to stick to it. Elasticsearch's default mapping can cause problems later on, e.g. analyzing UUIDs or email addresses, which then become difficult (or impossible) to search and match.
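Alternatively, if you would rather not track which call runs first, the update API accepts a doc_as_upsert flag that creates the document when it is missing. A sketch based on the question's indexEmail:
function indexEmail(client, id, email) {
    return client.update({
        index: 'user',
        type: 'text',
        id: id,
        body: {
            // merge into the existing document, or create it
            // from `doc` if it does not exist yet
            doc: { email: email },
            doc_as_upsert: true
        }
    })
}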

Node.js and MongoDB if document exact match exists, ignore insert

I am maintaining a collection of unique values that has a companion collection holding instances of those values. The reason I have it that way is that the companion collection has >10 million records, whereas the unique values collection only adds up to 100K, and I use those values all over the place for partial-match lookups.
When I upload a CSV file, it is usually 10k to 500k records at a time that I insert into the companion collection. What is the best way to insert only the values that don't already exist into the unique values collection?
Example:
// Insert large quantities of objects into mongo
var bulkInsert = [
    {
        name: "Some Name",
        other: "zxy",
        properties: "abc"
    },
    {
        name: "Some Name",
        other: "zxy",
        properties: "abc"
    },
    {
        name: "Other Name",
        other: "zxy",
        properties: "abc"
    }
]
// Need to insert only values that do not already exist in the mongo unique values collection
var uniqueValues = [
    {
        name: "Some Name"
    },
    {
        name: "Other Name"
    }
]
EDIT
I tried creating a unique index on the field, but once it finds a duplicate in the array of documents that I am inserting, it stops the whole process and doesn't proceed to check any values after the break.
Figured it out. If you're doing it from the shell, you need to use Bulk() and create insert jobs like this:
var bulk = db.collection.initializeUnorderedBulkOp();
bulk.insert( { name: "1234567890a"} );
bulk.insert( { name: "1234567890b"} );
bulk.insert( { name: "1234567890"} );
bulk.execute();
and in Node, the continueOnError flag works on a plain collection.insert():
collection.insert([{ name: "1234567890a" }, { name: "1234567890c" }], { continueOnError: true }, function (err, doc) {});
Well, I think the solution here is quite simple, if I understand your issue correctly.
Since the process stops when it finds a duplicated field, you should check whether the value already exists before trying to add it.
So, for each element in uniqueValues, run a find/findOne query; if it doesn't return any result, add the element, otherwise don't.
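A sketch of that loop with the Node driver (the collection and field names are taken from the question):
uniqueValues.forEach(function (value) {
    db.collection('uniqueValues').findOne({ name: value.name }, function (err, found) {
        if (!found) {
            db.collection('uniqueValues').insert(value, function (err) {
                // with a unique index in place, a concurrent duplicate
                // insert still fails safely with error code 11000
            });
        }
    });
});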
