How to compare documents with each other in MongoDB? - node.js

I have a collection A with N documents.
My collection looks like this:
{
"_id": "61721b17e52d6033c444059d",
"advertising_venue": "GAP Store, 1440 W Taylor st",
"ad_shelf_name": "11",
"gender": "man",
"age": "25-35",
"distance_to_shelf": "7.035805",
"date": "October 21st 2021 8:59:51 pm",
"user_id": "0.14136775694578052"
},
{
"_id": "61721b18e52d6033c444059e",
"advertising_venue": "GAP Store, 1440 W Taylor st",
"ad_shelf_name": "11",
"gender": "man",
"age": "25-35",
"distance_to_shelf": "8.065434999999999",
"date": "October 21st 2021 8:59:52 pm",
"user_id": "0.14136775694578052"
},
{
"_id": "61721b19e52d6033c444059f",
"advertising_venue": "GAP Store, 1440 W Taylor st",
"ad_shelf_name": "11",
"gender": "man",
"age": "25-35",
"distance_to_shelf": "10.124695",
"date": "October 21st 2021 8:59:53 pm",
"user_id": "0.14136775694578052"
}
I want to compare the documents by their user_id value: if two documents have the same user_id, I want to remove one of them; otherwise the documents stay in the collection as they are.
Is it possible to do this in MongoDB?

It can be achieved by creating a unique index with dropDups: true on user_id. Note that the dropDups option was removed in MongoDB 3.0, so this only works on older servers.
db.collection.ensureIndex({user_id: 1}, {unique: true, dropDups: true})

When you say "if it is similar", this has a particular meaning when talking about strings. If you want to delete all documents with identical user_id fields, that can be done.
If you want to delete all documents with almost the same, but slightly different, user_id, then no, that cannot be done with MongoDB directly, and you will have to solve that another way.
Assuming you want to delete documents with identical user_id fields, you should first decide which document you want to keep and which ones you want to delete.
Assuming you want to keep only the first copy of each, you can do so by creating a unique index on the user_id field with the option dropDups set to true (on MongoDB versions before 3.0, where dropDups is still available). MongoDB will then scan the collection on disk and index each user_id. As it comes across any documents which are duplicates, it will delete them.
db.mycollection.ensureIndex({'user_id' : 1}, {unique : true, dropDups : true})
However, if you want to delete documents based on some other kind of logic, say you want to keep the newest document, or maybe the document with the lowest distance_to_shelf, you will need to first query your data, sorting by the criteria which make certain records more valuable, and then delete all documents with identical user_id fields that do not have the same _id.
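The selection step described above can be sketched in plain JavaScript. This is only an illustration under assumptions: idsToDelete is a hypothetical helper (not part of any MongoDB driver), the documents are assumed to have been loaded into an array already, and "lowest distance_to_shelf" is used as the example criterion.

```javascript
// Hypothetical helper, not part of any MongoDB driver.
// Given documents shaped like the ones in the question, return the _id values
// of every duplicate that should be deleted, keeping the document with the
// lowest distance_to_shelf for each user_id.
function idsToDelete(docs) {
  const best = new Map(); // user_id -> document to keep
  for (const doc of docs) {
    const current = best.get(doc.user_id);
    // distance_to_shelf is stored as a string in the question, so convert it.
    if (!current || Number(doc.distance_to_shelf) < Number(current.distance_to_shelf)) {
      best.set(doc.user_id, doc);
    }
  }
  // Everything that is not the kept document for its user_id gets deleted.
  return docs
    .filter((doc) => best.get(doc.user_id) !== doc)
    .map((doc) => doc._id);
}
```

The returned ids could then be passed to a single delete call, for example deleteMany({ _id: { $in: ids } }) in the Node.js driver.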

Related

How can i get just one row where some column values are same in knexjs

I tried to get distinct values from my table
let records = db
.select("*")
.from("user_technical_skill")
.distinct('technical_skill_id')
In the user_technical_skill table I have, for example,
[{
"id": "84ed9c04-b1d3-4e69-b2d2-c569ad94545f",
"user_id": "5dfbf2cc-38f9-4388-a077-11480e62d893",
"technical_skill_id": "111",
"created_at": "2021-04-11T15:31:39.552Z",
"updated_at": "2021-04-11T15:31:39.552Z"
},
{
"id": "4b0fcdd6-cbab-4fdf-ada6-0154c7956630",
"user_id": "a3b91e2a-5d7e-4528-b496-3a0807299db7",
"technical_skill_id": "111",
"created_at": "2021-04-11T15:48:49.145Z",
"updated_at": "2021-04-11T15:48:49.145Z"
}]
two rows where technical_skill_id is 111,
but it does not work. I still get both rows from my query.
How can I fix this?
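If you need all columns, you can use groupBy() instead of distinct(). Note that some databases (for example PostgreSQL, or MySQL with ONLY_FULL_GROUP_BY enabled) reject select * combined with a group by on a single column.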
let records = db
.select("*")
.from("user_technical_skill")
.groupBy('technical_skill_id')
Otherwise, if you need just technical_skill_id in the result, you can use
let records = db.from("user_technical_skill").distinct('technical_skill_id')
You don't need select("*"), since distinct() acts as the select here.
Ref:
http://knexjs.org/#Builder-distinct
http://knexjs.org/#Builder-groupBy
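If neither query shape fits, one fallback is to fetch the rows and keep one per technical_skill_id in application code. The sketch below assumes the rows are already in memory; firstRowPerKey is a hypothetical helper, not a knex function, and "first row wins" is just one possible rule.

```javascript
// Hypothetical helper: keep the first row seen for each value of `key`.
function firstRowPerKey(rows, key) {
  const seen = new Map(); // key value -> first row with that value
  for (const row of rows) {
    if (!seen.has(row[key])) {
      seen.set(row[key], row);
    }
  }
  return [...seen.values()];
}
```

Called as firstRowPerKey(rows, 'technical_skill_id'), this returns a single full row for skill 111 from the sample data in the question.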

I can't query over populated children attributes

I am trying to query over populated children attributes using mongoose, but it doesn't work and returns empty arrays all the time.
Even hardcoding correct, existing values into the query returns empty arrays.
My schema is a business schema with a 1-to-1 relationship to the user schema via the attribute createdBy. The user schema has an attribute name, which I am trying to query on.
So if I make a query like this:
business.find({'createdBy.name': {$regex:"steve"}}).populate('createdBy')
The above will never return any documents, although without the find condition everything works fine.
Can I search by the name inside a populated child or not? All tutorials say this should work fine, but it just doesn't.
EDIT: an example of what the record looks like:
{
"_id": "5fddedd00e8a7e069085964f",
"status": 6,
"addInfo": "",
"descProduit": "",
"createdBy": {
"_id": "5f99b1bea9ba194dec3bd6aa",
"status": 1,
"fcmtokens": [
],
"emailVerified": 1,
"phoneVerified": 0,
"userType": "User",
"name": "steve buschemi",
"firstName": "steve",
"lastName": "buschemi",
"tel": "",
"email": "steve#buschemi.com",
"register_token": "747f1e1e8fa1ecd2f1797bb402563198",
"createdAt": "2020-10-28T18:00:30.814Z",
"updatedAt": "2020-12-18T13:52:07.430Z",
"__v": 19,
"business": "5f99b1e101bfff39a8259457",
"credit": 635,
},
"createdAt": "2020-12-19T12:10:57.703Z",
"updatedAt": "2020-12-19T12:11:16.538Z",
"__v": 0,
"nid": "187"
}
It seems there is no way to filter parent documents by conditions on child documents:
From the official documentation:
In general, there is no way to make populate() filter stories based on properties of the story's author. For example, the below query won't return any results, even though author is populated.
const story = await Story.
  findOne({ 'author.name': 'Ian Fleming' }).
  populate('author').
  exec();
story; // null
If you want to filter stories by their author's name, you should use denormalization.
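Besides denormalization, a different technique that is sometimes used for this is an aggregation with $lookup, which joins the referenced user first and only then filters on its name. The sketch below is not what the quoted documentation recommends; businessesByCreatorName is a hypothetical helper and 'users' is an assumed collection name (the real one depends on how the user model is registered).

```javascript
// Hypothetical helper: builds an aggregation pipeline that joins each business
// to its creator and keeps only businesses whose creator's name matches.
// 'users' is an assumed collection name, not taken from the question.
function businessesByCreatorName(namePattern) {
  return [
    // Replace the createdBy reference with the matching user document(s).
    { $lookup: { from: 'users', localField: 'createdBy', foreignField: '_id', as: 'createdBy' } },
    // $lookup yields an array; flatten it into a single embedded document.
    { $unwind: '$createdBy' },
    // Now the filter runs against the joined user, not the raw ObjectId.
    { $match: { 'createdBy.name': { $regex: namePattern, $options: 'i' } } },
  ];
}
```

With the model from the question this would be run as business.aggregate(businessesByCreatorName('steve')).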

Cloudant Sorting on a nullable field

I want to sort on a field, let's say name, which is indexed in Cloudant DB. Using the index without a sort, I get all the documents, both those that have the name field and those that don't. But when I try to sort on the name field, I do not get the documents that lack the name field.
Is there any way to do this using the query indexes? I want all the documents in sorted order, including those that don't have the name field.
For Example :
Below are some documents:
{
"_id": 1234,
"classId": "abc",
"name": "Happa"
}
{
"_id": 12345,
"classId": "abc",
"name": "Prasanth"
}
{
"_id": 123456,
"classId": "abc",
}
Below is the query I am trying to execute:
{
"selector": {
"classId": "abc",
"name" :{
"or" : [
{"$exists": true},{"$exists": false}
]
}
},
"sort": [{ "classId": "asc" }, { "name": "asc" }],
"use_index": "idx-classId_name"
},
I am expecting all the documents to be returned in a sorted order including the document which doesn't have that name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sorting only on the classId field, which appears in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix in the documents without the name field with those that have it.
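A minimal sketch of that client-side ordering, assuming the documents for one classId have already been fetched into an array. sortByNameMissingLast is a hypothetical helper, and putting the nameless documents last is an arbitrary choice that could just as well be reversed.

```javascript
// Hypothetical helper: order documents by name, placing documents that
// have no name field after all the named ones.
function sortByNameMissingLast(docs) {
  return [...docs].sort((a, b) => {
    const aMissing = a.name === undefined;
    const bMissing = b.name === undefined;
    // A document without a name always sorts after one that has a name.
    if (aMissing !== bMissing) return aMissing ? 1 : -1;
    // Two documents without a name keep their relative order.
    if (aMissing) return 0;
    // Plain string comparison for two named documents.
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
}
```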
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
  if (doc && doc.classId) {
    var name = doc.name || "[notfound]";
    emit(doc.classId + "-" + name, 1);
  }
}
which will key the docs on "classId-name" and for docs with no name, a specified sentinel value.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).

How to delete specific object under document in DocumentDB?

I am using DocumentDB as the backend for my project.
I have created a collection named ResellerCollection.
Under it I added a Reseller document with an id assigned to it.
Under the Reseller document I have added a list of Customer objects, and now I want to delete one customer from the reseller document by its id.
The JSON generated in DocumentDB is as follows.
{
"id": "73386791-5895-4a56-9108-df4a773331fe",
"Name": "Nadeem",
"PrimaryContact": "1234",
"Address": "bhusari clny",
"City": "pune",
"State": "maharashtra",
"Country": "india",
"ZipCode": "222",
"Telephone": "45234343",
"Email": "abc#xyz.com",
"Website": "asdfsd.com",
"Customer": [
{
"id": "4acf3ca9-f9e4-4117-a471-7ce8f905baec",
"FullName": "Test Cust1",
"Company": "safds",
"JobTitle": "sadf",
"Email": "abcd#xyz.com",
"Address": "asdfsaf",
"City": "sdf",
"State": "sdf",
"Country": "sadf",
"ZipCode": "2343",
"Telephone": "45234343",
"MerchantID": "232",
"IdentificationNo": "2342343",
"IsActive": true,
"CustomerGroupID": "34",
"ResellerID": "73386791-5895-4a56-9108-df4a773331fe"
},
{
"id": "e0d6d099-3d5d-4776-9b84-14b7ae0b9911",
"FullName": "Test Cust2",
"Company": "safds",
"JobTitle": "sadf",
"Email": "abcd#xyz.com",
"Address": "asdfsaf",
"City": "sdf",
"State": "sdf",
"Country": "sadf",
"ZipCode": "2343",
"Telephone": "sadf",
"MerchantID": "232",
"IdentificationNo": "2342343",
"IsActive": true,
"CustomerGroupID": "34",
"ResellerID": "73386791-5895-4a56-9108-df4a773331fe"
}
],
"UserId": "f807f027-2e21-45b1-b786-e4d2b3d677cb",
"_rid": "+JBQAOQWHQENAAAAAAAAAA==",
"_self": "dbs/+JBQAA==/colls/+JBQAOQWHQE=/docs/+JBQAOQWHQENAAAAAAAAAA==/",
"_etag": "\"0a004764-0000-0000-0000-583bd8b50000\"",
"_attachments": "attachments/",
"_ts": 1480317104
}
Please suggest how to write a delete function for a customer in MVC. Should I write a delete for the specific customer, or do I have to update the whole customer list?
There are two ways of doing this:
1. Implement "AddCustomer" and "RemoveCustomer" stored procedures in DocumentDB. These stored procedures will read the reseller document, then append or remove the customer passed in as an argument. Then just call ExecuteStoredProcedureAsync within your controller.
2. Instead of the stored procedure approach, perform three steps within the controller: read the document, make changes, and replace the document.
In both implementations, you'll probably want to use the ETag to perform a conditional write to avoid any clobbering across multiple clients/writers.
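The "make changes" step can be sketched as a small pure function, written here in JavaScript (the language DocumentDB stored procedures use). removeCustomer is a hypothetical helper; reading the document, replacing it, and the ETag condition are left to whichever SDK or stored procedure wraps it.

```javascript
// Hypothetical helper: returns a copy of the reseller document without the
// customer whose id matches, or null when no such customer exists.
function removeCustomer(reseller, customerId) {
  const remaining = reseller.Customer.filter((c) => c.id !== customerId);
  if (remaining.length === reseller.Customer.length) {
    return null; // nothing matched, so there is nothing to write back
  }
  // The input is left untouched; the caller replaces the stored document
  // with this copy.
  return { ...reseller, Customer: remaining };
}
```

Returning null when nothing matched lets the caller skip the replace call entirely.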
As Larry and David pointed out, you should also consider different JSON modeling options such as storing the customers data as separate documents, or by storing only the IDs within the array vs. all relevant customer data.
The only way to delete an element of an array (or any other change to the document) is to fetch the entire document, modify it, then create/upsert the entire document. You can do this client-side or in a stored procedure.
Try not to think of collections as tables in traditional databases or collections in MongoDB. I used to say, think of them as "partitions" but with partitioned collections, even that distinction is not useful. I use a single partitioned collection for everything now. I distinguish document types by having an element in each document: isReseller = true or isCustomer = true for your example. Alternatively, you can do type = 'Reseller' or type = 'Customer'. However, if the customer of one reseller is ever a reseller themselves, the former will allow you to add both is___ = true and the latter will not.
What David Makogon says is definitely a worry. Unless Resellers are restricted to a small number of Customers, you are better off storing them separately and having a foreign key link them. That way, deleting or adding one is a single step. However, getting the Reseller and all its Customers will be two round trips.

Elasticsearch delete/update a document in index1 and index2

If I have two indices, e.g. sample1 and sample2, and I delete or update a value in sample1, should the corresponding document also be deleted or updated in sample2?
Data: sample1: {name: 'Tom', id: '1', city: 'xx', state: 'yy', country: 'zz'}
sample2: {id: '1', city: 'xx', state: 'yy', country: 'zz'}
If I delete id: '1', then this document should be deleted from both indices on the server side itself. How can I do this?
The problem is that if I delete the values separately and hit a network issue after deleting from one index, the other index will still have the values. How can I avoid this?
You can use the bulk API for doing this and you'll have better guarantees that both delete/update operations succeed or fail since everything happens in a single network call:
For deleting both documents in two different indices:
POST _bulk
{"delete": {"_index": "index1", "_type": "type1", "_id": "1"}}
{"delete": {"_index": "index2", "_type": "type2", "_id": "1"}}
For updating both documents in two different indices:
POST _bulk
{"index": {"_index": "index1", "_type": "type1", "_id": "1"}}
{"name": "Tom", "id": "1", "city": "xx", "state": "yy", "country": "zz"}
{"index": {"_index": "index2", "_type": "type2", "_id": "1"}}
{"id": "1", "city": "xx", "state": "yy", "country": "zz"}
UPDATE
After discussing this, it seemed the needed solution was a mix of using
the delete-by-query API (don't forget to install the plugin if you're on ES 2.x) for deleting documents matching a country in multiple indices
and the update-by-query API for updating documents in multiple indices.
There is no clean way to do this with Elasticsearch. What you want feels like a transaction, and that is not possible with Elasticsearch. What you could do is send a bulk request with the two operations to update/delete the item in both indices. You still have to check the response of the bulk request to see whether both operations went well. The chance of only one of them failing might be a little smaller.
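Building the bulk body and checking its response could look roughly like the sketch below. Both functions are hypothetical helpers, and the index names and id are placeholders. The sketch omits the _type field used in the requests earlier in this thread, since mapping types were removed in later Elasticsearch versions; on versions that still require types it would need to be added back.

```javascript
// Hypothetical helpers; index names and ids are placeholders.

// Build the newline-delimited body for deleting one id from several indices.
// Each action is one JSON line and the body must end with a newline.
function bulkDeleteBody(indices, id) {
  return indices
    .map((index) => JSON.stringify({ delete: { _index: index, _id: id } }))
    .join('\n') + '\n';
}

// Return the per-action results of a bulk response that carry an error.
// The response has a top-level `errors` flag and one entry per action in `items`.
function failedItems(response) {
  if (!response.errors) return [];
  return response.items
    .map((item) => Object.values(item)[0]) // unwrap { delete: {...} } or { index: {...} }
    .filter((result) => result.error !== undefined);
}
```

If failedItems returns anything, the two indices may be out of step and the failed action has to be retried or compensated by the caller, which is exactly the gap a real transaction would close.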
I don't think you can do both at the same time, that is, delete the same document in two different indices.
But deleting a document from an index can be done using the Delete By Query API by giving a matching query, so that the appropriate document is deleted.
Source: Delete By Query API
Elasticsearch cannot guarantee you that it will do those two operations atomically, like a transaction in RDBs. I suggest looking at nested documents or parent/child relationships for what Elasticsearch understands by joined documents.
In this way, if you delete the parent, the children will be deleted as well.

Resources