Elasticsearch delete/update a document in index1 and index2 - search

If I have two indices, e.g. sample1 and sample2, and I delete or update a value in sample1, should the corresponding document also be deleted or updated in sample2?
Data:
sample1: {name: 'Tom', id: '1', city: 'xx', state: 'yy', country: 'zz'}
sample2: {id: '1', city: 'xx', state: 'yy', country: 'zz'}
If I delete id: '1', the document should be deleted from both indices on the server side itself. How can I do this?
The problem is that if I delete the values separately and run into a network issue after deleting the value from one index, the other index will still have the value. How do I avoid this?

You can use the bulk API for doing this and you'll have better guarantees that both delete/update operations succeed or fail since everything happens in a single network call:
For deleting both documents in two different indices:
POST _bulk
{"delete": {"_index": "index1", "_type": "type1", "_id": "1"}}
{"delete": {"_index": "index2", "_type": "type2", "_id": "1"}}
For updating both documents in two different indices:
POST _bulk
{"index": {"_index": "index1", "_type": "type1", "_id": "1"}}
{"name": "Tom", id: "1", "city": "xx", "state": "yy", "country": "zz"}
{"index": {"_index": "index2", "_type": "type2", "_id": "1"}}
{"id": "1", "city": "xx", "state": "yy", "country": "zz"}
UPDATE
After discussing this, the solution turned out to be a mix of
the delete-by-query API (don't forget to install the plugin if you're on ES 2.x) for deleting documents matching a country across multiple indices,
and the update-by-query API for updating documents in multiple indices.

There is no clean way to do this with Elasticsearch. What you want/need is effectively a transaction, and that is not possible with Elasticsearch. What you can do is send a bulk request containing the two operations to update/delete the item in both indices. You still have to check the response of the bulk request to see if both operations went well, but the chance of one of them failing is somewhat smaller.
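Checking the bulk response for partial failures can be sketched like this (the response dict below mirrors the standard bulk-response shape; the function name is illustrative):

```python
def failed_bulk_items(bulk_response: dict) -> list:
    """Return (action, index, id) for each bulk item that did not succeed."""
    if not bulk_response.get("errors"):
        return []  # Elasticsearch sets "errors": false when every item succeeded
    failures = []
    for item in bulk_response.get("items", []):
        # each item is a one-key dict such as {"delete": {...}} or {"index": {...}}
        for action, result in item.items():
            if result.get("status", 500) >= 300:
                failures.append((action, result.get("_index"), result.get("_id")))
    return failures

# Example: the first delete succeeded, the second document was already gone.
response = {
    "errors": True,
    "items": [
        {"delete": {"_index": "index1", "_id": "1", "status": 200}},
        {"delete": {"_index": "index2", "_id": "1", "status": 404}},
    ],
}
print(failed_bulk_items(response))  # [('delete', 'index2', '1')]
```

Any items reported here would need to be retried by the client, since the bulk call does not roll back the operations that succeeded.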

I don't think you can do both at the same time, i.e. delete the same document from two different indices in one operation.
That said, deleting a document from an index can be done using the Delete By Query API by supplying a matching query so that the appropriate document is deleted.
Source: Delete By Query API

Elasticsearch cannot guarantee that it will perform those two operations atomically, like a transaction in a relational database. I suggest looking at nested documents or parent/child relationships, which are what Elasticsearch offers for joined documents.
That way, if you delete the parent, the children are deleted as well.

Related

How to prevent entering duplicate data in Cosmos DB?

I have a container with id as the partition key. Based on some condition, I do not want to enter duplicate data in my container, but I am not sure how to do that in Cosmos. I tried to create unique keys, but that didn't help me.
Condition:
Record will be duplicate if name + addresses[].city + addresses[].state + addresses[].zipCode are same.
Document:
{
  "isActive": false,
  "id": "d94d7350-8a5c-4300-b4e4-d4528627ffbe",
  "name": "test name",
  "addresses": [
    {
      "address1": "718 Old Greenville Rd",
      "address2": "",
      "city": "Montrose",
      "state": "PA",
      "zipCode": "18801",
      "audit": {}
    }
  ]
}
Findings:
Per https://stackoverflow.com/a/61317715, unique keys cannot include arrays. Unfortunately, I cannot change the document structure, so the unique key approach is not an option.
Questions:
Do I need to change the partition key? I'm not sure if I can have /id#name (or something like that) in Cosmos, like in Dynamo.
Is there any other way of handling this at the DB level?
As a last resort, I can add the logic in my code, but that would be expensive in terms of RU/s.
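One workaround worth considering (an assumption on my part, not something from the question) is to derive the document id deterministically from the fields that define uniqueness; since id is the partition key here, creating a second document with the same id fails with a 409 Conflict instead of inserting a duplicate. A sketch of the hashing step:

```python
import hashlib
import json

def dedupe_id(doc: dict) -> str:
    """Build a deterministic id from name + each address's city/state/zipCode."""
    key_parts = [doc["name"]] + [
        [a["city"], a["state"], a["zipCode"]] for a in doc["addresses"]
    ]
    # json.dumps gives a stable serialization of the key material
    material = json.dumps(key_parts, sort_keys=True)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

doc = {
    "name": "test name",
    "addresses": [{"city": "Montrose", "state": "PA", "zipCode": "18801"}],
}
new_id = dedupe_id(doc)  # same name + addresses always yield the same id
```

The insert would then use this value as id and treat a conflict as "duplicate detected", which costs no extra RU/s beyond the write itself.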

How to compare documents with each other in MongoDB?

I have collection A with N documents.
My collection looks like this:
{
  "_id": "61721b17e52d6033c444059d",
  "advertising_venue": "GAP Store, 1440 W Taylor st",
  "ad_shelf_name": "11",
  "gender": "man",
  "age": "25-35",
  "distance_to_shelf": "7.035805",
  "date": "October 21st 2021 8:59:51 pm",
  "user_id": "0.14136775694578052"
},
{
  "_id": "61721b18e52d6033c444059e",
  "advertising_venue": "GAP Store, 1440 W Taylor st",
  "ad_shelf_name": "11",
  "gender": "man",
  "age": "25-35",
  "distance_to_shelf": "8.065434999999999",
  "date": "October 21st 2021 8:59:52 pm",
  "user_id": "0.14136775694578052"
},
{
  "_id": "61721b19e52d6033c444059f",
  "advertising_venue": "GAP Store, 1440 W Taylor st",
  "ad_shelf_name": "11",
  "gender": "man",
  "age": "25-35",
  "distance_to_shelf": "10.124695",
  "date": "October 21st 2021 8:59:53 pm",
  "user_id": "0.14136775694578052"
}
I want to compare the documents by their user_id value and, if two documents have a similar user_id, remove one of them; otherwise they stay in the collection as they are.
Is this possible to do in MongoDB?
It can be achieved by creating a unique index with dropDups: true on user_id (note that the dropDups option was removed in MongoDB 3.0, so this only works on older versions):
db.collection.ensureIndex({user_id: 1}, {unique: true, dropDups: true})
When you say
if it is similar
This has a particular meaning when talking about strings. If you want to delete all documents with identical user_id fields, that can be done.
If you want to delete all documents with almost the same, but a slightly different user_id, then no, that cannot be done with mongodb directly, and you will have to solve that another way.
Assuming you want to delete documents with identical user_id fields, you may want to consider which document you want to keep and which one you want to delete.
Assuming you want to keep only the first copy of each, you can do so by creating a unique index on the user_id field with the option dropDups set to true. Mongodb will then scan the collection on disk and index each user_id. As it comes across any documents which are duplicates it will delete them.
db.mycollection.ensureIndex({'user_id' : 1}, {unique : true, dropDups : true})
However, if you want to delete documents based on some other kind of logic, say keeping the newest document, or the one with the lowest distance_to_shelf, you will need to first query your data, sorted by the criterion that makes certain records more valuable, and then delete all documents that have an identical user_id but a different _id.
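On MongoDB versions without dropDups, the selection of which duplicate to keep has to happen in application code (or an aggregation). A minimal client-side sketch, assuming a keep-the-lowest-distance_to_shelf rule:

```python
def duplicate_ids(docs):
    """Keep the document with the lowest distance_to_shelf per user_id
    and return the _ids of the duplicates to delete."""
    best = {}        # user_id -> document we are keeping so far
    to_delete = []
    for doc in docs:
        uid = doc["user_id"]
        kept = best.get(uid)
        if kept is None:
            best[uid] = doc
        elif float(doc["distance_to_shelf"]) < float(kept["distance_to_shelf"]):
            to_delete.append(kept["_id"])  # demote the previously kept doc
            best[uid] = doc
        else:
            to_delete.append(doc["_id"])
    return to_delete

docs = [
    {"_id": "a", "user_id": "u1", "distance_to_shelf": "7.03"},
    {"_id": "b", "user_id": "u1", "distance_to_shelf": "8.06"},
    {"_id": "c", "user_id": "u2", "distance_to_shelf": "10.12"},
]
print(duplicate_ids(docs))  # ['b']
```

The returned _ids could then be passed to a single deleteMany filtering on _id with $in.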

How to delete specific object under document in DocumentDB?

I am using DocumentDB as the backend for my project.
I have created a collection named ResellerCollection.
In it I added a Reseller document with an id assigned to it.
The Reseller document contains a list of Customer entries, and now I want to delete a customer of the reseller document by a specific id.
My JSON generated in DocumentDB is as follows:
{
  "id": "73386791-5895-4a56-9108-df4a773331fe",
  "Name": "Nadeem",
  "PrimaryContact": "1234",
  "Address": "bhusari clny",
  "City": "pune",
  "State": "maharashtra",
  "Country": "india",
  "ZipCode": "222",
  "Telephone": "45234343",
  "Email": "abc@xyz.com",
  "Website": "asdfsd.com",
  "Customer": [
    {
      "id": "4acf3ca9-f9e4-4117-a471-7ce8f905baec",
      "FullName": "Test Cust1",
      "Company": "safds",
      "JobTitle": "sadf",
      "Email": "abcd@xyz.com",
      "Address": "asdfsaf",
      "City": "sdf",
      "State": "sdf",
      "Country": "sadf",
      "ZipCode": "2343",
      "Telephone": "45234343",
      "MerchantID": "232",
      "IdentificationNo": "2342343",
      "IsActive": true,
      "CustomerGroupID": "34",
      "ResellerID": "73386791-5895-4a56-9108-df4a773331fe"
    },
    {
      "id": "e0d6d099-3d5d-4776-9b84-14b7ae0b9911",
      "FullName": "Test Cust2",
      "Company": "safds",
      "JobTitle": "sadf",
      "Email": "abcd@xyz.com",
      "Address": "asdfsaf",
      "City": "sdf",
      "State": "sdf",
      "Country": "sadf",
      "ZipCode": "2343",
      "Telephone": "sadf",
      "MerchantID": "232",
      "IdentificationNo": "2342343",
      "IsActive": true,
      "CustomerGroupID": "34",
      "ResellerID": "73386791-5895-4a56-9108-df4a773331fe"
    }
  ],
  "UserId": "f807f027-2e21-45b1-b786-e4d2b3d677cb",
  "_rid": "+JBQAOQWHQENAAAAAAAAAA==",
  "_self": "dbs/+JBQAA==/colls/+JBQAOQWHQE=/docs/+JBQAOQWHQENAAAAAAAAAA==/",
  "_etag": "\"0a004764-0000-0000-0000-583bd8b50000\"",
  "_attachments": "attachments/",
  "_ts": 1480317104
}
Please suggest how to write the delete function for a customer in MVC. Should I write a delete that removes the specific customer, or do I have to update (replace) the whole customer list?
Two ways of doing this:
Implement "AddCustomer" and "RemoveCustomer" stored procedures in DocumentDB. These stored procedures read the reseller document, then append or remove the customer passed in as an argument. Then just call ExecuteStoredProcedureAsync from your controller.
Alternatively, skip the stored procedure and perform three steps in the controller: read the document, make the changes, and replace the document.
In both implementations, you'll probably want to use the ETag to perform a conditional write to avoid any clobbering across multiple clients/writers.
As Larry and David pointed out, you should also consider different JSON modeling options such as storing the customers data as separate documents, or by storing only the IDs within the array vs. all relevant customer data.
The only way to delete an element of an array (or any other change to the document) is to fetch the entire document, modify it, then create/upsert the entire document. You can do this client-side or in a stored procedure.
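The modify step of that read-modify-replace cycle is plain document manipulation; a sketch (the helper name is made up, and the actual read/replace calls would go through the DocumentDB SDK):

```python
def remove_customer(reseller_doc: dict, customer_id: str) -> dict:
    """Return a copy of the reseller document with one customer removed."""
    updated = dict(reseller_doc)  # shallow copy; the Customer list is replaced below
    before = reseller_doc.get("Customer", [])
    updated["Customer"] = [c for c in before if c["id"] != customer_id]
    if len(updated["Customer"]) == len(before):
        raise KeyError("no customer with id " + customer_id)
    return updated

doc = {
    "id": "73386791-5895-4a56-9108-df4a773331fe",
    "_etag": "\"0a004764-0000-0000-0000-583bd8b50000\"",
    "Customer": [
        {"id": "4acf3ca9-f9e4-4117-a471-7ce8f905baec", "FullName": "Test Cust1"},
        {"id": "e0d6d099-3d5d-4776-9b84-14b7ae0b9911", "FullName": "Test Cust2"},
    ],
}
trimmed = remove_customer(doc, "e0d6d099-3d5d-4776-9b84-14b7ae0b9911")
# trimmed is then written back with a conditional replace (If-Match on _etag)
# so a concurrent writer cannot clobber the change.
```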
Try not to think of collections as tables in traditional databases or collections in MongoDB. I used to say, think of them as "partitions" but with partitioned collections, even that distinction is not useful. I use a single partitioned collection for everything now. I distinguish document types by having an element in each document: isReseller = true or isCustomer = true for your example. Alternatively, you can do type = 'Reseller' or type = 'Customer'. However, if the customer of one reseller is ever a reseller themselves, the former will allow you to add both is___ = true and the latter will not.
What David Makogon says is definitely a worry. Unless Resellers are restricted to a small number of Customers, you are better off storing them separately and having a foreign key link them. That way, deleting or adding one is a single step. However, getting the Reseller and all its Customers will be two round trips.

What's best practice "joining" a bunch of values in mongoose/mongodb without populate

Let me start off by stating that I'm aware of the populate method that Mongoose offers, but since my work has decided to move to the native MongoDB driver in the future, I can no longer rely on populate if I want to avoid work for myself later on.
If I have two collections of documents:
People:
{_id: 1, name: "Austin"}
{_id: 2, name: "Doug"}
{_id: 3, name: "Nick"}
{_id: 4, name: "Austin"}
Hobbies:
{Person: 1, Hobby: "Cars"}
{Person: 1, Hobby: "Boats"}
{Person: 3, Hobby: "Chess"}
{Person: 4, Hobby: "Cars"}
How should I go about joining each document in People with its Hobbies? Ideally I would prefer to call the database only twice: once to get the people and a second time to get the hobbies, and then return the joined objects to the client app.
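The two-query client-side join described above can be sketched as pure data manipulation (the two lists stand in for the results of the two database calls):

```python
from collections import defaultdict

def join_people_hobbies(people, hobbies):
    """Attach each person's hobbies after two separate queries."""
    by_person = defaultdict(list)
    for h in hobbies:
        by_person[h["Person"]].append(h["Hobby"])
    return [{**p, "hobbies": by_person.get(p["_id"], [])} for p in people]

people = [
    {"_id": 1, "name": "Austin"},
    {"_id": 2, "name": "Doug"},
    {"_id": 3, "name": "Nick"},
    {"_id": 4, "name": "Austin"},
]
hobbies = [
    {"Person": 1, "Hobby": "Cars"},
    {"Person": 1, "Hobby": "Boats"},
    {"Person": 3, "Hobby": "Chess"},
    {"Person": 4, "Hobby": "Cars"},
]
joined = join_people_hobbies(people, hobbies)
# Doug ends up with an empty hobbies list; Austin (_id 1) gets both of his.
```

On MongoDB 3.2+ the $lookup aggregation stage can also perform this join server-side in a single call.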
It depends on what your primary concern is. Generally, I would say to embed the hobbies into the People documents, like:
{
  "_id": 1,
  "name": "Austin",
  "hobbies": ["Cars", "Boats"]
},
{
  "_id": 2,
  "name": "Doug",
  "hobbies": []
},
{
  "_id": 3,
  "name": "Nick",
  "hobbies": ["Chess"]
},
{
  "_id": 4,
  "name": "Austin",
  "hobbies": ["Cars"]
}
which would give you the possibility of using a multikey index on hobbies and would allow queries like this:
db.daCollection.find({"hobbies":"Cars"})
which would return both Austins as complete documents. Yes, I know there would be a lot of redundant entries. If you wanted to prevent that, you could model it like this:
{
  "_id": 1,
  "name": "Cars"
},...
{
  "_id": 1,
  "name": "Austin",
  "hobbies": [1, ...]
}
which would need an additional index on the name field of the hobby collection to be efficient. So when you want to find every person who is into cars, you would first need to look up that hobby's _id and then query for it:
db.person.find({"hobbies":1})
I think it is easier, more intuitive, and for most use cases faster to use embedding.

View with geospatial and non geospatial keys with CouchDB

I'm using CouchDB and GeoCouch, and I'm trying to understand whether it is possible to build a geospatial index and query the database by both a location and a value from another field.
Data
{
  "_id": "1",
  "profession": "medic",
  "location": [15.12, 30.22]
}
{
  "_id": "2",
  "profession": "secretary",
  "location": [15.12, 30.22]
}
{
  "_id": "3",
  "profession": "clown",
  "location": [27.12, 2.2]
}
Questions
Is there any way to perform the following queries on these documents:
Find all documents with profession = "medic" near location [15.12, 30.22] (more important)
List all the different professions near this location [15.12, 30.22] (a plus)
In case that's not possible, what options do I have? I'm already considering switching to MongoDB, but I'd rather solve this a different way.
Notes
Data changes quickly, new documents might be added and many might be removed
References
Faceted search with geo-index using CouchDB
