I have Cassandra 3.11.4 and have been running a test environment for a while. I have done nodetool cleanup, clearsnapshot, repair, compact, etc., and what remains in the data storage directory for my keyspace is numerous "empty" directories.
When running du from the directory:
0 ./a/backups
47804 ./a
0 ./b/backups
0 ./b
0 ./c/backups
0 ./c
0 ./d/backups
0 ./d
7748832 .
This is just a portion of the data, with names replaced by generic letters, but essentially there are many of these empty directories remaining. The tables they reference were dropped a long time ago, i.e. longer than gc_grace_seconds, yet the directories remain. These are not snapshots, as making a snapshot and clearing it with nodetool clearsnapshot works fine.
Before I manually delete each of the empty folders, which is going to be a pain as there are a lot of them: am I missing a step in maintaining my cluster that causes this, or is it something that just happens and has to be handled regularly, assuming many changes in my test schemas?
Snapshots do get cleared, so does the trailing /backups directory mean that these are incremental backups?
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupIncremental.html
Even if they are, I cannot find any nodetool method to remove incremental backups, and in any case incremental_backups is set to false in cassandra.yaml.
I believe there are answers stating it is safe to delete these "ghost" directories, but that would be extremely annoying if the keyspace has many of them. Also, maybe it is just my preference for clean directories, but would these "ghost" directories have an impact on performance?
So the "ghost" table directories are either from:
1) empty table - still a valid table, but no data ever inserted
2) truncated tables
3) dropped tables
In the first and second case, if you remove the directory, you could end up causing issues. If you want to validate whether a directory is in use for that table, you can query:
select id from system_schema.tables
where keyspace_name = 'xxxx' and
table_name = 'yyyy';
That id is the suffix used in the directory name for that table. Any other directories for that table in that keyspace are not in use.
-Jim
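For what it's worth, here is a minimal shell sketch of that check, assuming the default data directory layout; the keyspace name and path are placeholders. Directories whose suffix does not match any active id are candidates for manual removal:
#!/usr/bin/env bash
# Sketch only: flag table directories whose suffix does not match an active id
# in system_schema.tables. KEYSPACE and DATA_DIR are placeholders.
KEYSPACE=xxxx
DATA_DIR=/var/lib/cassandra/data/$KEYSPACE
# Collect active table ids, with dashes stripped so they match directory suffixes
ACTIVE_IDS=$(cqlsh -e "SELECT id FROM system_schema.tables WHERE keyspace_name='$KEYSPACE';" \
  | grep -Eo '[0-9a-f-]{36}' | tr -d '-')
for dir in "$DATA_DIR"/*/; do
  suffix=$(basename "$dir" | awk -F- '{print $NF}')
  if ! echo "$ACTIVE_IDS" | grep -q "$suffix"; then
    echo "not referenced by schema: $dir"
  fi
done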
Related
$ cd /tmp
$ cp -r /var/lib/cassandra/data/keyspace/table-6e9e81a0808811e9ace14f79cedcfbc4 .
$ nodetool compact --user-defined table-6e9e81a0808811e9ace14f79cedcfbc4/*-Data.db
I expected the two SSTables (where the second one contains only tombstones) to be merged into one, which would be equivalent to the first one minus data masked by tombstones from the second one.
However, the last command returns a 0 exit status and nothing changes in the table-6e9e81a0808811e9ace14f79cedcfbc4 directory (both SSTables are still there). Any ideas how to unconditionally merge potentially multiple SSTables into one in an offline manner (as above, not on SSTable files currently used by the running cluster)?
Just use nodetool compact <keyspace> <table>. There is no real offline compaction, only telling Cassandra which SSTables to compact. User-defined compaction is just a way to give it a custom list of SSTables, and a major compaction (the example above) will include all SSTables in a table.
Whether it will work really depends on which version you are using, but there is https://github.com/tolbertam/sstable-tools#compact available. If desperate, you can import cassandra-all for your version and do something like this: https://github.com/tolbertam/sstable-tools/blob/master/src/main/java/com/csforge/sstable/Compact.java
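As a small sketch of the two nodetool forms mentioned above (keyspace, table, and path names are placeholders; the user-defined form only works on SSTables that live in the running node's data directories, which is presumably why the copy under /tmp was ignored):
# Major compaction: merge all SSTables of one table on the live node
nodetool compact my_keyspace my_table
# User-defined compaction: hand nodetool an explicit list of live SSTable files
nodetool compact --user-defined /var/lib/cassandra/data/my_keyspace/my_table-<id>/*-Data.db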
When is it NOT necessary to truncate the table when restoring a snapshot (incremental) for Cassandra?
All the different documentation "providers", including the 2nd edition of Cassandra: The Definitive Guide, say something like this: "If necessary, truncate the table." If you restore without truncating (removing the tombstones), Cassandra continues to shadow the restored data. This behavior also occurs for other types of overwrites and causes the same problem.
If I have an insert only C* keyspace (no upserts and no deletes), do I ever need to truncate before restoring?
The documentation seems to imply that I can delete all of the SSTable files from a column family (rm -f /data/.), copy the snapshot to /data/, and run nodetool refresh.
Is this true?
You are right - you can restore a snapshot exactly this way. Copy over the SSTables, restart the node, and you are done. With incremental backups, be sure you have all the SSTables containing your data.
What could happen if you have updates and deletes is that, after restoring a node or while restoring multiple nodes, stale data becomes visible, or you run into problems with tombstones when data was deleted after the snapshot.
The magic of truncating tables is that all data is gone at once, so you avoid such problems.
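For the insert-only case described in the question, a rough sketch of the restore (every path and name below is a placeholder; this uses nodetool refresh on a running node rather than a restart):
TABLE_DIR=/var/lib/cassandra/data/my_keyspace/my_table-<id>
# Remove the table's current SSTable files (leaves the snapshots/ and backups/ dirs alone)
find "$TABLE_DIR" -maxdepth 1 -type f -delete
# Copy the snapshot files (plus any incremental backup files) into place
cp /mnt/backups/my_keyspace/my_table/* "$TABLE_DIR"/
# Tell the running node to pick up the new SSTables without a restart
nodetool refresh my_keyspace my_table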
I've been tasked with standing up and prepping our production Cassandra cluster (3.11.1). Everything was fine; I loaded in a few hundred million records with the stress testing tool, great. However, after I was done I did a "DROP KEYSPACE keyspace1;" (the keyspace used by the stress test), assuming this was like MySQL and the space would be cleaned up.
Now I've run nodetool cleanup, flush, truncatehints, clearsnapshot, and just about every other command variation I can find. The disk usage is still ~30GB per node and nothing seems to be going on in Cassandra.
So #1 - How do I recover the disk space that is being absorbed by the now-deleted keyspace?
And #2 - How should I have deleted this data, if this was the "wrong way"?
After you drop the keyspace, you can delete its directory inside your data directory. That will clean it up; there isn't a command to do it for you.
Like @Chris said, we can manually delete the data. Also, to add on to what he said, dropping does NOT really remove the data until the specified gc_grace_seconds has passed. The default is 864000 seconds.
We can actually modify this by running this in cqlsh:
ALTER TABLE keyspace.table WITH gc_grace_seconds = 5;
And check again with:
SELECT table_name,gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='transcript';
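Putting both answers together, a short sketch of the manual cleanup, assuming the default data directory and that auto_snapshot took a snapshot when the keyspace was dropped:
# Run on each node after DROP KEYSPACE keyspace1
# Drop any snapshots still pinning the space (auto_snapshot takes one on DROP)
nodetool clearsnapshot keyspace1
# Then remove what is left of the keyspace's data directory
rm -rf /var/lib/cassandra/data/keyspace1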
Looking in my keyspace directory I see several versions of most of my tables. I am assuming this is because I dropped them at some point and recreated them as I was refining the schema.
table1-b3441432142142sdf02328914104803190
table1-ba234143018dssd810412asdfsf2498041
These generated table directory names are very cumbersome to work with. Try changing into one of the directories without copy-pasting the name from the terminal window... Painful. So easy to mistype something.
That side note aside, how do I tell which directory is the most current version of the table? Can I automatically delete the old versions? I am not clear whether these are considered snapshots, since each directory can also contain snapshots. I read in another post that you can turn off autosnapshot, but I'm not sure I want that. I'd rather just automatically delete any table directories not currently in use (i.e. that are not the latest version).
I stumbled across this trying to do a backup. I realized I am forced to go to every table directory and copy out the snapshot files (there are like 50 directories, not including all the old table versions), which seems like a terrible design (maybe I'm missing something?).
I assumed I could do a snapshot of the whole keyspace and get one file back, or at least output all the files to a single directory that represents the snapshot of the entire keyspace. At the very least it would be nice to know what the current versions are so I can grab the correct files and offload them to storage somewhere.
DataStax Enterprise has a backup feature but it only supports AWS and I am using Azure.
So to clarify:
1) How do I automatically delete old table versions and know which is the current version?
2) How can I back up the most recent versions of the tables and output the files to a single directory that I can offload somewhere? I only have two nodes, so simply relying on repair is not a good option for me if a node goes down.
You can see the active version of a table by looking in the system keyspace and checking the cf_id field. For example, to see the version for a table in the 'test' keyspace with table name 'temp', you could do this:
cqlsh> SELECT cf_id FROM system.schema_columnfamilies WHERE keyspace_name='test' AND columnfamily_name='temp' allow filtering;
cf_id
--------------------------------------
d8ea9830-20e9-11e5-afc0-c381f961c62a
As far as I know, it is safe to delete (rm -r) outdated table version directories that are no longer active. I imagine they don't delete them automatically so that you can recover the data if you dropped them by mistake. I don't know of a way to have them removed automatically even if auto snapshot is disabled.
I don't think there is a command to write all the snapshot files to a single directory. According to the documentation on snapshot, "After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place." So it's left up to the application developer how they want to handle archiving the snapshot files.
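There is no built-in command for it, but a scripted sketch along these lines (keyspace name, data path, and destination are placeholders) takes a snapshot and then gathers every table's snapshot files for that tag into one directory; because only active tables get the snapshot, old table-version directories are skipped automatically:
TAG=backup_$(date +%Y%m%d%H%M%S)
KEYSPACE=test
DEST=/mnt/backups/$TAG
nodetool snapshot -t "$TAG" "$KEYSPACE"
mkdir -p "$DEST"
# Snapshot files live under <table_dir>/snapshots/<tag>/ for each active table
for snapdir in /var/lib/cassandra/data/"$KEYSPACE"/*/snapshots/"$TAG"; do
  table_dir=$(basename "$(dirname "$(dirname "$snapdir")")")
  mkdir -p "$DEST/$table_dir"
  cp "$snapdir"/* "$DEST/$table_dir"/
done
# Optionally release the hard links on the node once the copy is safe elsewhere
nodetool clearsnapshot -t "$TAG" "$KEYSPACE"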
I am looking for confirmation that my Cassandra backup and restore procedures are sound and I am not missing anything. Can you please confirm, or tell me if something is incorrect/missing?
Backups:
I run daily full backups of the keyspaces I care about, via "nodetool snapshot keyspace_name -t current_timestamp". After the snapshot has been taken, I copy the data to a mounted disk, dedicated to backups, then do a "nodetool clearsnapshot $keyspace_name -t $current_timestamp"
I also run hourly incremental backups - executing a "nodetool flush keyspace_name" and then moving files from the backup directory of each keyspace, into the backup mountpoint
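For the hourly incremental part, a sketch of what that step might look like (paths and the mountpoint are placeholders, and incremental_backups must be enabled in cassandra.yaml for the backups/ directories to be populated):
KEYSPACE=keyspace_name
DEST=/mnt/backups/incremental/$(date +%Y%m%d%H)
nodetool flush "$KEYSPACE"
mkdir -p "$DEST"
# Incremental backup hard links appear under each table's backups/ directory
for bdir in /var/lib/cassandra/data/"$KEYSPACE"/*/backups; do
  table_dir=$(basename "$(dirname "$bdir")")
  mkdir -p "$DEST/$table_dir"
  # Move rather than copy so the backups/ directories do not grow unbounded
  mv "$bdir"/* "$DEST/$table_dir"/ 2>/dev/null || true
done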
Restore:
So far, the only valid way I have found to do a restore (and tested/confirmed) is to do this, on ALL Cassandra nodes in the cluster:
Stop Cassandra
Clear the commitlog *.log files
Clear the *.db files from the table I want to restore
Copy the snapshot/full backup files into that directory
Copy any incremental files I need to (I have not tested with multiple incrementals, but I am assuming I will have to overlay the files, in sequence from oldest to newest)
Start Cassandra
On one of the nodes, run a "nodetool repair keyspace_name"
So my questions are:
Does the above backup and restore strategy seem valid? Are any steps inaccurate or anything missing?
Is there a way to do this without stopping Cassandra on EVERY node? For example, is there a way to restore the data on ONE node, then somehow make it "authoritative"? I tried this, and, as expected, since the restored data is older, the newer data on the other nodes overwrites it when they sync up during repair.
Thank you!
There are two ways to restore Cassandra backups without restarting C*:
1) Copy the files into place, then run "nodetool refresh". This has the caveat that the rows will still be older than the tombstones, so if you're trying to restore deleted data, it won't do what you want. It also only applies to the local server (you'll want to repair afterwards).
2) Use "sstableloader". This will load the data to all nodes. You'll need to make sure you have the SSTables from a complete replica, which may mean loading the SSTables from multiple nodes. Added bonus: this works even if the cluster size has changed. I'm not sure if ordering matters here (that is, I don't know if row timestamps are preserved through the load or redefined during it).