Cassandra - Delete Old Versions of Tables and Backup Database - linux

Looking in my keyspace directory I see several versions of most of my tables. I am assuming this is because I dropped them at some point and recreated them as I was refining the schema.
table1-b3441432142142sdf02328914104803190
table1-ba234143018dssd810412asdfsf2498041
These created tables names are very cumbersome to work with. Try changing to one of the directories without copy pasting the directory name from the terminal window... Painful. So easy to mistype something.
That side note aside, how do I tell which directory is the most current version of the table? Can I automatically delete the old versions? I am not clear if these are considered snapshots or not since each directory also can contain snapshots. I read in another post you can stop autosnapshot, but I'm not sure I want that. I'd rather just automatically delete any tables not being currently used (i.e.: that are not the latest version).
I stumbled across this trying to do a backup. I realized I am forced go to every table directory and copy out the snapshot files (there are like 50 directories..not including all the old table versions) which seems like a terrible design (maybe I'm missing something??).
I assumed I could do a snapshot of the whole keyspace and get one file back or at least output all the files to a single directory that represents the snapshot of the entire keyspace. At the very least it would be nice knowing what the current versions are so I can grab the correct files and offload them to storage somewhere.
DataStax Enterprise has a backup feature but it only supports AWS and I am using Azure.
So to clarify:
How do I automatically delete old table versions and know which is
the current version?
How can I backup the most recent versions of the tables and output the files to a single directory that I can offload somewhere? I only have two nodes, so simply relying on the repair is not a good option for me if a node goes down.

You can see the active version of a table by looking in the system keyspace and checking the cf_id field. For example, to see the version for a table in the 'test' keyspace with table name 'temp', you could do this:
cqlsh> SELECT cf_id FROM system.schema_columnfamilies WHERE keyspace_name='test' AND columnfamily_name='temp' allow filtering;
cf_id
--------------------------------------
d8ea9830-20e9-11e5-afc0-c381f961c62a
As far as I know, it is safe to delete (rm -r) outdated table version directories that are no longer active. I imagine they don't delete them automatically so that you can recover the data if you dropped them by mistake. I don't know of a way to have them removed automatically even if auto snapshot is disabled.
I don't think there is a command to write all the snapshot files to a single directory. According to the documentation on snapshot, "After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place." So it's left up to the application developer how they want to handle archiving the snapshot files.

Related

Combine Cassandra Snapshot with updated data

We deleted some old data within our 3 node cassandra cluster (v3.11) some days ago which shall now be restored from a Snapshot. Is there a possibility to restore the data from the snapshot without loosing updates made since the snapshot was taken?
There are two approaches which come to my mind
A)
Create export via COPY keypsace.table TO xy.csv
Truncate table
restore table from snapshot via sstableloader
Reimport newer data via COPY keypsace.table FROM xy.csv
B)
Just copy sstable files of snapshot into current table directory
Is A) a feasible option? What do we need to consider so that the COPY FROM/TO commands get synchronized over all nodes?
For option B) I read that the deletion commands that happend may be executed again (tombstone rows). Can I ignore this warning if we make sure the deletion commands happened more than 10 days ago (gc_grace_seconds)?
for exporting/importing data from Apache Cassandra®, there is an efficient tool -- DataStax Bulk Loader (aka DSBulk). You could refer to more documentation and examples here. For getting consistent reads and writes, you could leverage --datastax-java-driver.basic.request.consistency LOCAL_QUORUM in your unload & load commands.

cassandra: restoring partially lost data

Theoretical question:
Lets say I have a cassandra cluster with some data in it.
Backups are created on a daily basis.
Now a subset of data is being lost, either by application error or manual deletion.
What is the best way to restore data from existing backup?
I can think of starting a separate node with the backup disk attached, then export data manually through selects and reimport into the prod database.
That would work but sounds complicated, is there a more straight forward solution for such problems?
If its a single partition probably best bet is to use sstabledump or something like sstable-tools to read from it and just manually reinstert. If ok with restoring everything deleted from time of snapshot: reduce gcgrace to purge any tombstones with a force compact (or else they will continue to shadow the restored data) and use the sstable loader or if the token ranges are the same copy the backed up sstables back in the data directory.

Cassandra - recovery of data after accidental delete

As the data in case of Cassandra is physically removed during compaction, is it possible to access the recently deleted data in any way? I'm looking for something similar to Oracle Flashback feature (AS OF TIMESTAMP).
Also, I can see the pieces of deleted data in the relevant commit log file, however it's obviously unreadable. Is it possible to convert this file to a more readable format?
You will want to execute a restore from your commitlog.
The safest is to copy the commitlog to a new cluster (with same schema), and restore following the instructions (comments) from commitlog_archiving.properties file. In your case, you will want to set restore_point_in_time to a time between your insert and your delete.

How to recover a dropped database on MySQL

I accidentaly dropped a database on MySQL yog ultimate. Also, I found that the IT guy uninstalled MySQL yog from the machine.
Now am working on two machines which includes the one from which database was dropped and mysql was uninstalled.
Is there a way to recover the dropped databases.
You said in a comment that you have a backup from a couple of hours prior to the data loss.
If you also have binary logs, you can restore the backup, and then reapply changes from the binary logs.
Here is documentation on this operation: http://dev.mysql.com/doc/refman/5.6/en/point-in-time-recovery.html
You can even filter the binary logs to reapply changes for just one database (mysqlbinlog --database name). For example you may have other databases that were not dropped on the same instance, and you wouldn't want to reapply changes to those other databases.
Recovering two hours worth of binary logs won't take "a very long amount of time." The trickiest part is figuring out the start point to begin replaying the binary logs. If you were lucky enough to include the binary log position with the backup, this will be simpler and very precise. If you have to go by timestamp, it's less precise and you probably cannot hope to do an exact recovery.
If you didn't have binary logs enabled on this instance since you backed up the database, it's a lot trickier to do a data recovery of lost files. You might be able to use a filesystem undelete tool like the EaseUS Data Recovery Wizard (though I can't say I have experience using that tool).
Reconstructing the files you recover is not for the faint of heart, and it's too much to get into here. You might want to get help from a professional MySQL consulting firm. I work for one such firm, Percona, who offers data recovery services.
There's really only one word: Backups.
After MySQL drops database the data is still on the media for a while. So you can fetch records and rebuild database with DBRECOVER.
mysql> drop database employees;
Query OK, 14 rows affected (0.16 sec)
#sync
#sync
select drop database recovery
select MYSQL VERSION as you used; Page Size should be left as 16k
click select directory , and input the ##datadir directory
!!!caution: you should input the ##datadir directory here. pls don't copy the ##datadir directory to any other filesystem or mount point , and use the copy one . The software need to scan the orginal filesystem or mount point ,otherwise it can't work.You'd better set #datadir mount point as read only, avoid any more disk write is necessary. And don't locate DBRECOVER software package on the same filesystem.
https://youtu.be/ao7OY8IbZQE

DROP COLUMN FAMILY from cassandra CLI will not drop the CF

We tried to drop CF's using cassandra cli
DROP COLUMN FAMILY cfName
And when we list the CF from CLI it was not there and when i tried to get the existing CF's via hector
I still could see the CF name
KeyspaceDefinition keyspaceDefinition = newConnection().describeKeyspace(keyspaceName);
keyspaceDefinition.getCfDefs();
Data inside the CF is not there however, the CF is still listed, after listing the CFs via hector if i do a cassandra -cli list column families i can see my deleted CF again
I had to deal with this issue back on Cassandra 1.1 as well. Basically, my column family had become corrupted, and the only way to alter its schema, was to drop/restore the keyspace (which DataStax walked me through, at the time).
If you have a support contract with DataStax, I would HIGHLY recommend contacting them before proceeding. The first thing they'll tell you, is that this is a bug in specific versions of Cassandra 1.1, and that you should upgrade. I haven't tested it, but according to them an in-place upgrade would allow you to modify your schema in the new version. So you might be able to fix this by upgrading to 1.2 or 2.0.
In my case (production, enterprise environment) upgrading on-the-spot was not an option. To fix this, I basically had to drop my entire keyspace, re-create it (and my column families), and recover from a snapshot. I loosely followed the instructions found here:
Take a snapshot of the keyspace on each node. The snapshot files should be stored in the [keyspaceName]/snapshots dir, but I copied mine to another non-Cassandra location just to be on the safe side.
DROP your keyspace.
Stop all nodes.
On each node, delete the .db files in the keyspace directory (but not the snapshot dir).
Copy the files from the snapshot directory back into the keyspace directory.
Restart one node
From that node's cassandra-cli re-create your keyspace.
Verify that your data is there.
Restart the remaining nodes.

Resources