I have many snapshot restorations in my instance, and as a result I can see orphan records (with negative Company IDs).
This affects many tables, not just this one. Is there a process or best practice to clean these out, or do we go table by table running delete scripts against records with an invalid CompanyID?
Please suggest.
You can use the pp_DeleteCompany stored procedure to delete the orphan snapshots. For example, assuming the orphan CompanyID is -1234567:

EXEC pp_DeleteCompany -1234567
Say I have a database with 100 documents, each with 1000 revisions, and an additional 100,000 deleted documents, each with an extensive history of revisions. In addition we also have a design document with a view and some Mango indexes.
For this hypothetical situation let's assume I can't delete and rebuild the database. Also replication safety is not a concern.
If I am required to create some kind of script using curl to purge the database of all unused data, so that the result of running this script is exactly the same as deleting and rebuilding the database with only the 100 live documents, each with a single revision on file, how should I go about doing this?
For your hypothetical situation, you could do the following:
1. Make a backup of the 100 required documents
2. Delete all documents in the DB
3. Use the Purge API to delete the revision history
4. Re-create the 100 required documents
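A minimal sketch of those steps with curl might look like this. The server URL, credentials, database name mydb, and the document IDs/revisions are all assumptions; only the small helper that builds the _purge request body actually runs here, while the calls against a live server are shown as comments.

```shell
# Helper: build the JSON body for POST /{db}/_purge (CouchDB >= 2.3), i.e.
# {"<doc_id>": ["<rev1>", "<rev2>", ...]}
purge_body() {
  docid=$1; shift
  revs=""
  for r in "$@"; do
    revs="$revs\"$r\","
  done
  printf '{"%s": [%s]}' "$docid" "${revs%,}"
}

COUCH="http://admin:password@localhost:5984"  # assumed server and credentials
DB="mydb"                                     # assumed database name

# 1. Back up the required documents:
#      curl -s "$COUCH/$DB/_all_docs?include_docs=true" > backup.json
# 2. Delete each document (you need its current rev):
#      curl -s -X DELETE "$COUCH/$DB/doc1?rev=3-abc123"
# 3. Purge its full revision history (the deletion leaves a tombstone rev):
#      curl -s -X POST "$COUCH/$DB/_purge" \
#           -H 'Content-Type: application/json' \
#           -d "$(purge_body doc1 4-def456)"
# 4. Re-create each required document from the backup:
#      curl -s -X PUT "$COUCH/$DB/doc1" -d '{"field": "value"}'

purge_body doc1 1-abc 2-def   # prints {"doc1": ["1-abc","2-def"]}
```

Note that _purge expects leaf revisions, and after purging you would typically run compaction to actually reclaim the disk space.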
A safer approach for saving disk space and BTree size in a real-life scenario would be:
Properly configure CouchDB's revision limit (_revs_limit) and compaction settings so that not too many revisions are retained
Only purge documents that won't ever be modified again in the future.
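The compaction-related settings above can be adjusted over HTTP; the server URL and database name below are assumptions, and the curl calls are shown as comments since they require a running server.

```shell
COUCH="http://localhost:5984"  # assumed server URL
DB="mydb"                      # assumed database name

# Lower the number of revisions CouchDB remembers per document
# (the default _revs_limit is 1000); excess revisions are dropped
# on the next compaction:
#   curl -X PUT "$COUCH/$DB/_revs_limit" -d '50'

# Trigger compaction manually:
#   curl -X POST "$COUCH/$DB/_compact" -H 'Content-Type: application/json'

echo "$COUCH/$DB/_revs_limit"   # prints http://localhost:5984/mydb/_revs_limit
```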
If I deleted a value in a cell of a table but later want to recover it, how can I do that?
I know that Cassandra doesn't really delete data; it just marks it as deleted. So how can I recover the data?
Example use case: I want to delete all information about a user, so I first delete it from the Cassandra database. Then I try to delete their information somewhere else, but that fails with an error, so I have to stop the deletion process and recover the already-deleted data from the Cassandra database.
How can I do that?
Unfortunately not. You could, however, use sstabledump (Cassandra >= 3.0) to inspect SSTable contents, but there are some drawbacks:
if the data had not been flushed to disk (and was thus still in the memtable), it will be deleted before ever reaching an SSTable
you need to find the sstable that the data belongs to
There are probably some other drawbacks that I'm missing right now.
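For illustration, inspecting SSTable contents could look like the following; the keyspace/table names and the data path are assumptions, and the commands are commented out because they need a running node.

```shell
KEYSPACE="myks"   # assumed keyspace name
TABLE="users"     # assumed table name

# Flush memtables first, or recently written data won't be in any SSTable:
#   nodetool flush "$KEYSPACE" "$TABLE"

# Locate the SSTable data files (default data path on Linux):
#   ls /var/lib/cassandra/data/$KEYSPACE/$TABLE-*/*-Data.db

# Dump one SSTable as JSON and search for the key you deleted;
# tombstones show up as deletion markers in the output:
#   sstabledump /var/lib/cassandra/data/$KEYSPACE/$TABLE-*/mc-1-big-Data.db
```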
Some workarounds:
First copy the data to another table and then perform the delete. After you have deleted the information from the other location, you can safely delete it from your backup table.
Add a new column (e.g. "pending_delete") to record the state, and only query for your "live" data.
Add a new table where you store the primary key of the data to be deleted, and delete the row from both tables once the operation on the other location succeeds.
Choosing the right solution, I guess, depends on your use case and the size of your data.
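The first workaround (copy, then delete) could be sketched with cqlsh's COPY command; the keyspace, table, and column names are assumptions, and the backup table must already exist with the same schema, so the commands are shown as comments.

```shell
# 1. Copy the rows you are about to delete into a backup table:
#      cqlsh -e "COPY myks.users (id, name, email) TO 'users_backup.csv'"
#      cqlsh -e "COPY myks.users_backup (id, name, email) FROM 'users_backup.csv'"
# 2. Delete from the live table:
#      cqlsh -e "DELETE FROM myks.users WHERE id = 42"
# 3. Once the delete on the other system succeeds, drop the backup copy:
#      cqlsh -e "TRUNCATE myks.users_backup"
# (If the other system fails instead, re-import from users_backup.)
```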
My CouchDB database is getting bigger, and I would like to remove documents by date; I would also like to remove _deleted documents by date.
I know how to remove documents by date with filtered replication, but:
Is there a way to do the same with _deleted documents? I mean, remove _deleted documents by date.
There's not really a way to conditionally cause a deletion using filtered replication, nor can you replicate a complete removal of a document.
You have a variety of options:
you can avoid replicating updates on old documents by filtering on date, but if they have already been replicated they won't be deleted
you can make a view to return old documents, and use a script to delete them at the source database. The deletions will replicate to any target databases, but every database will retain at least a {_deleted:true} tombstone for each document [that's how the deletion gets replicated in the first place]
you can find old documents and _purge them, but you'll have to do that on each replica
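The view-plus-script option could be sketched like this; the design document name, view name, and cutoff date are assumptions, and the loop assumes jq is available. Only the URL-building helper actually runs here.

```shell
COUCH="http://localhost:5984"  # assumed server URL
DB="mydb"                      # assumed database name

# Build the DELETE URL for a given document id and revision:
delete_url() { printf '%s/%s/%s?rev=%s' "$COUCH" "$DB" "$1" "$2"; }

# Query a (hypothetical) by-date view for old documents and delete each one;
# every deletion still leaves a {_deleted:true} tombstone behind:
#   curl -s "$COUCH/$DB/_design/docs/_view/by_date?endkey=\"2015-01-01\"" \
#     | jq -r '.rows[] | "\(.id) \(.value.rev)"' \
#     | while read -r id rev; do
#         curl -s -X DELETE "$(delete_url "$id" "$rev")"
#       done

delete_url doc1 1-abc   # prints http://localhost:5984/mydb/doc1?rev=1-abc
```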
What is your main goal?
If you have hundreds of objects and you want to hide the old ones from the UI of all replicas, write a script to find and DELETE/_delete:true them from in a source/master replica and the changes will propagate.
If you have bazillions of e.g. log messages and you need to free up space by forgetting old ones, write a script to find and _purge them and finally _compact, then run it on every replica. For a case like that, though, it might be better to rotate databases instead: e.g. manually "shard" or bin into a different database each week, and every week simply drop the database that is N+1 weeks old on each replica.
If your database is getting bigger, this is probably due to the versioning of your documents. A simple way to free some space is to run database compaction (Documentation).
As for _deleted documents, you can only REALLY delete them by purging them.
However, purging is generally not recommended, as it can confuse replication; it should only be done when you truly must erase sensitive data, such as credentials.
I recently was asked by one of my customers if there was a method to clean out records with the "DeletedDatabaseRecord" flag set.
They are in the process of implementing a new base company and have done several import/delete/import/delete cycles of key records, which has resulted in quite a few of these records that they'd prefer not to carry over to their actual live company.
Looking through the system, I didn't see a built-in method to clear these records out.
Is there a method of purging these records that is part of the system, be it from the ERP Configuration tools, stored procedures, or in the interface itself?
Jeff,
No, there is no special functionality to remove records flagged as DeletedDatabaseRecord, but you can always use a simple SQL script to loop over all the tables that have this column and delete from each of them the records that have it set to 1.
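A sketch of such a script follows. The table names below are placeholders for illustration; in practice you would pull the real list from INFORMATION_SCHEMA.COLUMNS (WHERE COLUMN_NAME = 'DeletedDatabaseRecord'). This only generates the DELETE statements so they can be reviewed before being run.

```shell
# Placeholder table names -- in a real database, query
# INFORMATION_SCHEMA.COLUMNS for every table that has a
# DeletedDatabaseRecord column.
tables="Account Batch GLTran"

for t in $tables; do
  printf 'DELETE FROM %s WHERE DeletedDatabaseRecord = 1;\n' "$t"
done
# Review the generated statements, then run them, e.g.:
#   sqlcmd -d YourDatabase -i cleanup.sql
```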
Looking in my keyspace directory I see several versions of most of my tables. I am assuming this is because I dropped them at some point and recreated them as I was refining the schema.
table1-b3441432142142sdf02328914104803190
table1-ba234143018dssd810412asdfsf2498041
These generated table directory names are very cumbersome to work with. Try changing into one of those directories without copy-pasting the name from the terminal window... painful, and so easy to mistype something.
That side note aside, how do I tell which directory is the most current version of the table? Can I automatically delete the old versions? I am not clear if these are considered snapshots or not since each directory also can contain snapshots. I read in another post you can stop autosnapshot, but I'm not sure I want that. I'd rather just automatically delete any tables not being currently used (i.e.: that are not the latest version).
I stumbled across this while trying to do a backup. I realized I am forced to go to every table directory and copy out the snapshot files (there are about 50 directories, not including all the old table versions), which seems like a terrible design (maybe I'm missing something?).
I assumed I could do a snapshot of the whole keyspace and get one file back or at least output all the files to a single directory that represents the snapshot of the entire keyspace. At the very least it would be nice knowing what the current versions are so I can grab the correct files and offload them to storage somewhere.
DataStax Enterprise has a backup feature but it only supports AWS and I am using Azure.
So to clarify:
1. How do I automatically delete old table versions and know which is the current version?
2. How can I back up the most recent versions of the tables and output the files to a single directory that I can offload somewhere? I only have two nodes, so simply relying on repair is not a good option for me if a node goes down.
You can see the active version of a table by looking in the system keyspace and checking the cf_id field. (On Cassandra 3.x and later, the schema tables moved to the system_schema keyspace; the equivalent there is the id column of system_schema.tables.) For example, to see the version for a table in the 'test' keyspace with table name 'temp', you could do this:
cqlsh> SELECT cf_id FROM system.schema_columnfamilies WHERE keyspace_name='test' AND columnfamily_name='temp' allow filtering;
cf_id
--------------------------------------
d8ea9830-20e9-11e5-afc0-c381f961c62a
As far as I know, it is safe to delete (rm -r) outdated table version directories that are no longer active. I imagine they don't delete them automatically so that you can recover the data if you dropped them by mistake. I don't know of a way to have them removed automatically even if auto snapshot is disabled.
I don't think there is a command to write all the snapshot files to a single directory. According to the documentation on snapshot, "After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place." So it's left up to the application developer how they want to handle archiving the snapshot files.
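That said, a small script can do the gathering for you. The Cassandra data path and snapshot tag below are assumptions; here the logic is exercised against a throwaway directory tree instead of a live data directory.

```shell
# Copy every file under <data_dir>/*/snapshots/<tag>/ into one destination
# directory, prefixing each file with its table directory name so files
# from different tables don't collide.
collect_snapshots() {
  data_dir=$1; tag=$2; dest=$3
  mkdir -p "$dest"
  find "$data_dir" -type f -path "*/snapshots/$tag/*" | while read -r f; do
    table_dir=$(echo "$f" | sed "s|$data_dir/||; s|/snapshots/.*||")
    cp "$f" "$dest/${table_dir}_$(basename "$f")"
  done
}

# Demo against a fake layout: <data_dir>/<table-uuid>/snapshots/<tag>/<file>
mkdir -p /tmp/ks_demo/table1-b344/snapshots/backup1
touch /tmp/ks_demo/table1-b344/snapshots/backup1/mc-1-big-Data.db
collect_snapshots /tmp/ks_demo backup1 /tmp/ks_demo_out
ls /tmp/ks_demo_out   # prints table1-b344_mc-1-big-Data.db
```

Against a real node, data_dir would be something like /var/lib/cassandra/data/<keyspace>, and the tag would be the name you passed to nodetool snapshot.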