How to clear a Cosmos DB (Mongo API) collection using an Azure CLI command?

I want to clear all my data before each new data upload, which happens at a regular interval. So I only need to clear the data; I don't want to delete and recreate my collections to do it.

There is no way to delete all documents in a collection in a single operation. You would either need to do this on your own, using whatever method you prefer for enumerating and deleting documents, or drop and re-create the collection, which should be significantly more efficient than deleting documents individually and won't be subject to RU-based throttling.
One other alternative approach: Configure TTL on your documents. With a bit of creativity here, you could target the TTL on each of your added documents to expire around the same time, effectively resulting in an automatic document-deletion mechanism.
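Since the collection is exposed through the Mongo API, the drop-and-recreate approach mentioned above can be scripted with any MongoDB driver rather than the Azure CLI. A minimal sketch with pymongo, assuming a placeholder connection string and names:

from pymongo import MongoClient

# Placeholder connection string and names -- substitute your Cosmos DB (Mongo API) values.
client = MongoClient("<your-cosmos-mongo-connection-string>")
db = client["mydb"]

# Drop and re-create the collection instead of deleting documents one by one.
db.drop_collection("mycollection")
db.create_collection("mycollection")

# Alternatively, delete everything in place; the service still enumerates and
# deletes documents individually, so this is subject to RU-based throttling
# on large collections.
# db["mycollection"].delete_many({})

Keep in mind that any collection-level settings (dedicated throughput, custom indexes) would need to be re-applied after recreating the collection.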

Related

Does a Cosmos DB update delete the record even if only a single field changes?

I am trying to understand how a Cosmos DB update works. In Cosmos DB there is an upsert operation that updates or inserts depending on whether the item already exists in the container. Usually the flow is like this:
record = container.read_item(item=item_id, partition_key=partition_key)
record['one_field'] = 'new_value'
container.upsert_item(record)
My doubt here is whether such an update operation will delete the original record even when only a single field is changed. If that is the case, then an update becomes expensive when the record is large. Is my understanding correct?
Cosmos DB updates a document by replacing it, not by in-place update.
If you query (or read) a document, and then update some properties, you would then replace the document. Or, as you've done, call upsert() (which is similar to a replace, except that it will create a new document if the specified partition+id doesn't exist already).
The notion of "expensive" is not exactly easy to quantify; look at the returned headers to see the RU charge for a given upsert/replace, to determine the overall cost, and whether you'll need to scale your RU/sec setting based on overall usage patterns.
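As a rough illustration of checking that charge, here is a sketch using the azure-cosmos Python SDK (v4-style API); the account URL, key, database and container names are placeholders, and the exact way to read the response headers may differ by SDK version:

from azure.cosmos import CosmosClient

client = CosmosClient("<account-url>", credential="<account-key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Read, modify one field, then upsert -- the whole document is replaced server-side.
record = container.read_item(item="item-id", partition_key="pk-value")
record["one_field"] = "new_value"
container.upsert_item(record)

# The RU charge for the last request is reported in the response headers.
charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
print(f"Upsert consumed {charge} RUs")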

Deleting _deleted documents on CouchDB by date

My CouchDB database is getting bigger, and I would like to remove documents by date. I would also like to remove _deleted documents by date.
I know how to replicate my DB while filtering out documents by date, but:
Is there a way to do the same with _deleted documents? I mean, remove _deleted documents by date.
There's not really a way to conditionally cause a deletion using filtered replication, nor can you replicate a complete removal of a document.
You have a variety of options:
you can avoid replicating updates on old documents by filtering on date, but if they have already been replicated they won't be deleted
you can make a view to return old documents, and use a script to delete them at the source database (a sketch follows below). The deletions will replicate to any target databases, but all databases will retain at least a {_deleted:true} tombstone of the documents [that's how the deletion gets replicated in the first place]
you can find old documents and _purge them, but you'll have to do that on each replica
What is your main goal?
If you have hundreds of objects and you want to hide the old ones from the UI of all replicas, write a script to find them and either DELETE them or set _deleted:true on them in a source/master replica, and the changes will propagate.
If you have bazillions of e.g. log messages and you need to free up space by forgetting old ones, write a script to find and _purge and finally _compact, then run it on every replica. But for a case like that, it might be better to rotate databases instead, e.g. manually "shard" or bin into a different database each week, and every week simply drop the N+1 weeks old database on each replica.
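A sketch of the find-and-delete script in Python, assuming a hypothetical old_docs/by_date view that emits the document date as the key and {"rev": doc._rev} as the value (host, credentials, and names are placeholders):

import requests

COUCH = "http://admin:password@localhost:5984"  # placeholder credentials/host
DB = "mydb"                                     # placeholder database name
CUTOFF = "2020-01-01"                           # delete documents older than this date

# Fetch old documents from the (assumed) view; only ids and revs are needed.
rows = requests.get(
    f"{COUCH}/{DB}/_design/old_docs/_view/by_date",
    params={"endkey": f'"{CUTOFF}"'},
).json()["rows"]

# Mark them deleted in one _bulk_docs request; this writes {_deleted: true}
# tombstones that replicate to the target databases like any other edit.
docs = [{"_id": r["id"], "_rev": r["value"]["rev"], "_deleted": True} for r in rows]
requests.post(f"{COUCH}/{DB}/_bulk_docs", json={"docs": docs}).raise_for_status()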
If your database is getting bigger, this is probably due to the versioning of your documents. A simple way to free some space is to run database compaction (see the documentation).
As for _deleted documents, the only way to REALLY get rid of them is to purge them. However, a purge leaves no tombstone, so the removal cannot be replicated and purged documents can reappear from other replicas; for that reason purging isn't recommended as routine cleanup and should be reserved for removing sensitive documents such as leaked credentials.
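Compaction itself is a single admin request per database; a sketch with a placeholder host and database name:

import requests

COUCH = "http://admin:password@localhost:5984"  # placeholder
DB = "mydb"

# Trigger compaction; old revisions are discarded, keeping only the latest
# revision of each document plus the tombstones of deleted ones.
requests.post(
    f"{COUCH}/{DB}/_compact",
    headers={"Content-Type": "application/json"},
).raise_for_status()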

rename collection vs updating collection

I have a Mongo DB which I need to update daily (delete non-relevant documents and add new ones).
The DB is not sharded.
I take the data from an external data master which is not so easy to work with.
There are 2 options:
1. reingest the entire DB (it's not that big) into a temp collection and then rename it to the old collection name (with dropTarget set to true)
2. do the hard work myself: delete the old entries, figure out from the data master which new documents are relevant, and insert them into the DB
Option 1 is obviously preferable, but what is the impact? I'm doing this maintenance at a late hour, but I don't want the users to get errors when querying the DB during the rename process.
Is using rename to overwrite a collection a standard way to get things done, or am I abusing the API? :)
According to the documentation, renameCollection blocks all database activity for the duration of the operation. If your users have set a sufficiently large timeout, they will not be directly affected by the rename operation; however, since the dataset can change under their feet, there may be side effects. For example, renaming a collection can invalidate open cursors, which interrupts queries that are currently returning data.
Regarding renaming of collections in production, personally I would avoid this where possible, firstly because of the cursor issue above, but more importantly because an incomplete renameCollection operation can leave the target collection in an unusable state and require manual intervention to clean up. Instead I would use an update with upsert:true that overwrites the entire document or inserts a new record if it doesn't exist.
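For reference, both options are straightforward to express with pymongo; a sketch with placeholder names (items, items_tmp, new_docs are assumptions):

from pymongo import MongoClient, ReplaceOne

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["mydb"]                                 # placeholder database name

# Option 1: reingest into a temp collection, then rename over the old one.
# dropTarget=True replaces the existing collection, but the rename blocks
# database activity and can invalidate open cursors.
db["items_tmp"].rename("items", dropTarget=True)

# Option 2: upsert each document in place, overwriting existing records and
# inserting the ones that don't exist yet. new_docs is assumed to be the
# set of documents fetched from the data master.
ops = [ReplaceOne({"_id": doc["_id"]}, doc, upsert=True) for doc in new_docs]
db["items"].bulk_write(ops)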

Can ElasticSearch delete all and insert new documents in a single query?

I'd like to swap out all documents for a specific index's type. I'm thinking about this like a database transaction, where I'd:
Delete all documents inside of the type
Create new documents
Commit
It appears that this is possible with ElasticSearch's bulk API, but is there a more direct way?
Based on the following statement, from the elasticsearch Delete by Query API Documentation:
Note, delete by query bypasses versioning support. Also, it is not recommended to delete "large chunks of the data in an index", many times, it’s better to simply reindex into a new index.
You might want to reconsider removing entire types and recreating them within the same index. As this statement suggests, it is better to simply reindex. In fact, I have a scenario where we have an index of manufacturer products, and when a manufacturer sends an updated list of products, we load the new data into our persistent store and then completely rebuild the entire index. I have implemented index aliases to mask the actual index being used. When product changes occur, a process is started to rebuild the new index in the background (a process that currently takes about 15 minutes), then switch the alias to the new index once the data load is complete, and finally delete the old index. This is completely seamless and does not cause any downtime for our users.
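A sketch of that alias switch against the _aliases endpoint (endpoint, index, and alias names are placeholders):

import requests

ES = "http://localhost:9200"   # placeholder Elasticsearch endpoint
ALIAS = "products"             # alias the application queries
OLD_INDEX = "products_v1"      # placeholder index names
NEW_INDEX = "products_v2"

# After the new index has been built and loaded in the background,
# atomically move the alias from the old index to the new one.
actions = {
    "actions": [
        {"remove": {"index": OLD_INDEX, "alias": ALIAS}},
        {"add": {"index": NEW_INDEX, "alias": ALIAS}},
    ]
}
requests.post(f"{ES}/_aliases", json=actions).raise_for_status()

# Finally, drop the old index to reclaim space.
requests.delete(f"{ES}/{OLD_INDEX}").raise_for_status()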

Delete all documents in a CouchDB database *except* the design documents

Is it possible to delete all documents in a couchdb database, except design documents, without creating a specific view for that?
My first approach has been to access the _all_docs standard view, and discard those documents starting with _design. This works but, for large databases, is too slow, since the documents need to be requested from the database (in order to get the document revision) one at a time.
If this is the only valid approach, I think it is much more practical to delete the complete database, and create it from scratch inserting the design documents again.
I can think of a couple of ideas.
Use _all_docs
You do not need to fetch all the documents, only the ID and revisions. By default, that is all that _all_docs returns. You can make a pretty big request in a batch (10k or 100k docs at a time should be fine).
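A sketch of that batch approach in Python (host, credentials, and database name are placeholders):

import requests

COUCH = "http://admin:password@localhost:5984"  # placeholder
DB = "mydb"
BATCH = 10_000

while True:
    # _all_docs returns only id and rev by default -- no document bodies needed.
    rows = requests.get(f"{COUCH}/{DB}/_all_docs", params={"limit": BATCH}).json()["rows"]
    docs = [
        {"_id": r["id"], "_rev": r["value"]["rev"], "_deleted": True}
        for r in rows
        if not r["id"].startswith("_design/")
    ]
    # Once a batch contains nothing but design documents (or nothing at all),
    # everything else has been deleted. Assumes only a handful of design docs.
    if not docs:
        break
    # Write {_deleted: true} tombstones for the whole batch in one request.
    requests.post(f"{COUCH}/{DB}/_bulk_docs", json={"docs": docs}).raise_for_status()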
Replicate then delete
You could use an _all_docs query to get the IDs of all design documents.
GET /db/_all_docs?startkey="_design/"&endkey="_design0"
Then replicate them somewhere temporary.
POST /_replicator
{ "source":"db", "target":"db_ddocs", "create_target":true
, "user_ctx": {"roles":["_admin"]}
, "doc_ids": ["_design/ddoc_1", "_design/ddoc_2", "etc..."]
}
Now you can just delete the original database and replicate the temporary one back by swapping the "source" and "target" values.
Deleting vs "deleting"
Note, these are really apples vs. oranges techniques. By deleting a database, you are wiping out the edit history of all its documents. In other words, you cannot replicate those deletion events to any other database. When you "delete" a document in CouchDB, it stores a record of that deletion. If you replicate that database, those deletions will be reflected in the target. (CouchDB stores "tombstones" indicating the document ID, its revision history, and its deleted state.)
That may or may not be important to you. The first idea is probably considered more "correct"; however, I can see the value of the second. You can visualize the entire program to accomplish this in your head. It's only a few queries and you're done. No looping through _all_docs batches, no headache. Your specific situation will probably make it obvious which is better.
Install couchapp, pull down the design doc to your hard disk, delete the db in futon, push the design doc back up to your recreated database. =)
You could write a shell script that goes through the list of all documents and deletes them all one by one except design docs. Apparently couch-batch can do that. Note that you don't need to fetch the whole docs to do that, just the id and revision.
Other than that, I think filtered replication (or the replication proposed by JasonSmith) is your best bet.
