Truncating the data creates a snapshot in Cassandra consuming unnecessary space - cassandra

I had a column family occupying about 40GB of space. I truncated the column family. So, after GC_GRACE_SECONDS, Cassandra created snapshots of the truncated data which consume the same amount of space. Is there any way to get rid of the space used by snapshots without disabling snapshot creation? I mean, isn't there a timeout parameter after which Cassandra deletes the snapshot that is consuming unnecessary space?

The snapshot that you see being created after truncating the column family is actually a safety mechanism in C* to avoid mass data loss in case of an accidental table drop or truncation (by the way, it has nothing to do with gc_grace_seconds). There is a setting 'auto_snapshot' in cassandra.yaml which is true by default. From the DataStax documentation:
auto_snapshot
(Default: true) Enable or disable whether a snapshot is taken of the data before keyspace truncation or dropping of tables. To prevent data loss, using the default setting is strongly advised. If you set to false, you will lose data on truncation or drop.
If you want to delete snapshots then you can use the nodetool clearsnapshot command as explained here.
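For example, a rough sketch (mykeyspace and the snapshot name are placeholders; depending on your Cassandra version, omitting -t, or passing --all on newer releases, clears every snapshot for the keyspace):
nodetool listsnapshots                                   # see which snapshots exist and how much space they hold
nodetool clearsnapshot -t <snapshot_name> -- mykeyspace  # remove one named snapshot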

Related

Cassandra truncate performance

I have recently been told that Cassandra truncate is not performant and is an anti-pattern, but I do not know why.
So, I have 2 questions:
Is it more performant to upsert all records rather than doing a truncate?
Does the truncate operation create tombstones?
Cassandra Version: 3.x
From the Cassandra docs:
Note: TRUNCATE sends a JMX command to all nodes, telling them to delete SSTables that hold the data from the specified table. If any of these nodes is down or doesn't respond, the command fails and outputs a message like the following.
So, running truncate will issue a deletion of all SSTables belonging to your Cassandra table, which will be quite fast but must be acknowledged by all nodes. Depending on your cassandra.yaml, this will snapshot your data beforehand:
auto_snapshot (Default: true) Enable or disable whether a snapshot is taken of the data before keyspace truncation or dropping of tables. To prevent data loss, using the default setting is strongly advised. If you set to false, you will lose data on truncation or drop.
To your question:
upserts will be much slower (when there is significant data in your table)
truncate does not write tombstones at all (instead it immediately deletes all SSTables for your truncated table on all nodes); see the sketch below
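A rough sketch of the difference made in the second point (mykeyspace.mytable and id are placeholder names):
-- row-by-row deletion: each DELETE writes a tombstone that lives until gc_grace_seconds
DELETE FROM mykeyspace.mytable WHERE id = 42;
-- truncate: the table's SSTables are dropped wholesale, no tombstones are written
TRUNCATE mykeyspace.mytable;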

Cassandra - What is the difference between TTL at the table level and inserting data with a TTL

I have a Cassandra 2.1 cluster where we insert data through Java with a TTL, as the requirement is to persist the data for 30 days.
But this causes a problem, as the files containing old, tombstoned data are kept on disk. This results in disk space being occupied by data which is no longer required. Repairs take a lot of time to clear this data (up to 3 days on a single node).
Is there a better way to delete the data?
I have come across this on DataStax:
Cassandra allows you to set a default_time_to_live property for an entire table. Columns and rows marked with regular TTLs are processed as described above; but when a record exceeds the table-level TTL, Cassandra deletes it immediately, without tombstoning or compaction.
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html?hl=tombstone
Will the data be deleted more efficiently if I set the TTL at the table level instead of setting it each time while inserting?
Also, the documentation is for Cassandra 3, so will I have to upgrade to a newer version to get any benefits?
Setting default_time_to_live applies the default TTL to all rows and columns in your table - and if no individual TTL is set (and Cassandra has correct NTP time on all nodes), Cassandra can easily drop that data safely.
But keep some things in mind: your application is still able to set a specific TTL for a single row in your table - then normal processing will apply. On top of that, even if the data is TTLed it won't get deleted immediately - SSTables are still immutable, and tombstones will only be dropped during compaction.
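As a rough sketch of the two approaches (ks.events, id and payload are placeholder names; 2592000 seconds is 30 days):
-- table-level default, applied when no per-row TTL is given
ALTER TABLE ks.events WITH default_time_to_live = 2592000;
-- per-write TTL, which overrides the table default for this row
INSERT INTO ks.events (id, payload) VALUES (uuid(), 'data') USING TTL 2592000;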
What could really help you a lot - just guessing - would be an appropriate compaction strategy:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs-compaction
TimeWindowCompactionStrategy (TWCS)
Recommended for time series and expiring TTL workloads.
The TimeWindowCompactionStrategy (TWCS) is similar to DTCS with simpler settings. TWCS groups SSTables using a series of time windows. During compaction, TWCS applies STCS to uncompacted SSTables in the most recent time window. At the end of a time window, TWCS compacts all SSTables that fall into that time window into a single SSTable based on the SSTable maximum timestamp. Once the major compaction for a time window is completed, no further compaction of the data will ever occur. The process starts over with the SSTables written in the next time window.
This helps a lot - when choosing your time windows correctly. All data in the last compacted SSTable will have roughly equal TTL values (hint: don't do out-of-order inserts or manual TTLs!). Cassandra keeps the youngest TTL value in the SSTable metadata, and when that time has passed Cassandra simply deletes the entire SSTable, as all of its data is now obsolete. No need for compaction.
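For reference, switching a table to TWCS could look roughly like this (just a sketch - ks.events is a placeholder table, the one-day window is only an example to be matched to your TTL, and TWCS itself ships with Cassandra 3.0.8/3.8 and later):
ALTER TABLE ks.events
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': 1};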
How do you run your repair? Incremental? Full? Reaper? How big in terms of nodes and data is your cluster?
The quick answer is yes. The way it is implemented is by deleting the SSTable(s) directly from disk. Deleting an SSTable without the need to compact will clear up disk space faster. But you need to be sure that all the data in a specific SSTable is "older" than the globally configured TTL for the table.
This is the feature referred to in the paragraph you quoted. It was implemented for Cassandra 2.0, so it should be part of 2.1.

SSTables are never deleted on disk if table gets dropped

SSTables are never deleted from disk when a table gets dropped.
I had a table whose tombstone count was >100000, due to which my read queries were throwing a tombstone error. I then dropped the table, but this didn't delete the SSTable files. I re-created the table, ran my select queries, and saw the tombstone error again. I don't understand why the old tombstone error has come up again.
Also, when does an SSTable ever get deleted from disk?
Truncating a table will not remove the SSTable(s) on disk. You need to run nodetool cleanup.
Tombstones will disappear through compaction, but only once gc_grace_seconds has passed. The default is 10 days. Why so long? It's designed to be a bit longer than a week, providing enough time to run repair on a cluster before deletes are discarded. This maximizes the opportunity for consistency across the nodes.
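If you repair often enough, gc_grace_seconds can also be lowered per table. A sketch (ks.mytable is a placeholder; only lower the value if your repair interval is shorter than it):
ALTER TABLE ks.mytable WITH gc_grace_seconds = 86400;  -- 1 day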
In order to have your tables deleted from disk, you need to make sure that no hard links are currently pointing at them. By default, a DROP command will create a snapshot of the CF. You need to set the auto_snapshot property to false in the YAML file:
# Whether or not a snapshot is taken of the data before keyspace truncation
# or dropping of column families. The STRONGLY advised default of true
# should be used to provide data safety. If you set this flag to false, you will
# lose data on truncation or drop.
auto_snapshot: false
If you want to err on the safe side (and want a general procedure to recreate your table), you could go for:
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable (....);
TRUNCATE mytable;
I never had a single problem with this so far.
The truncate operation is safer than drop and recreate. Truncate may throw a timeout exception; if it does, run it again until it completes.

Freeing disk space of overwritten data?

I have a table whose rows get overwritten frequently using regular INSERT statements. This table holds ~50GB of data, and the majority of it is overwritten daily.
However, according to OpsCenter, disk usage keeps going up and is not freed.
I have validated that rows are being overwritten and not simply being appended to the table. But they're apparently still taking up space on disk.
How can I free disk space?
Under the covers, the way Cassandra handles these writes is that a new row is appended to the SSTable with a newer timestamp. When you perform a read, the newest row (based on timestamp) is returned to you as the row. However, this also means that you are using twice the disk space to accomplish this. It is not until Cassandra runs a compaction operation that the older rows will be removed and the disk space recovered. Here is some information on how Cassandra writes to disk which explains the process:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html?scroll=concept_ds_wt3_32w_zj__dml-compaction
A compaction is done on a node-by-node basis and is a very disk-intensive operation which may affect the performance of your cluster while it is running. You can run a manual compaction using the nodetool compact command:
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCompact.html
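For example (mykeyspace and mytable are placeholder names; run this on each node, ideally during off-peak hours):
nodetool compact mykeyspace mytable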
As Aaron mentioned in his comment above, overwriting all the data in your cluster daily is not really the best use case for Cassandra because of issues such as this one.

Physical disk space management of Cassandra

Recently I have been looking into Cassandra from our new project's perspective and have learned a lot from this community and its wiki too. But I have not found anything about how updates are managed in Cassandra in terms of physical disk space management, though it seems to be very similar to how record deletes are managed using compaction.
Suppose there are 100 records with 5 column values each. When all changes are flushed to disk, all records will be written adjacently; when a delete operation is done, it is marked in the memtable first, and the record is physically deleted after some time, as set in the configuration or when the memtable is full. The compaction process then reclaims the space.
Now the question is: on one hand, being schemaless, there is no fixed number of columns at the beginning; but on the other hand, when the compaction process takes place, does it put records adjacently on disk like a traditional RDBMS to speed up the read process? For an RDBMS this is easy, because it allocates a fixed amount of space as per the declared column datatypes.
But how exactly does Cassandra place records on disk during the compaction process (both for updates and deletes) to speed up reads?
One more question related to compaction: when there are no delete queries, but there is an update query which updates an existing record with some variable-length data, or inserts an altogether new column, how does compaction make space available on disk between already existing data rows?
Rows and columns are stored in sorted order in an SSTable. This allows a compaction of multiple SSTables to output a new, (sorted) SSTable with only sequential disk IO. This new SSTable will be output into a new file, into free space on the disks. This process doesn't depend on the number of rows or columns, just on them being stored in a sorted order. So yes, in all SSTables (even those resulting from compactions) rows and columns will be arranged in a sorted order on disk.
What's more, as you hint at in your question, updates are no different from inserts - they do not overwrite the value on disk, but instead get buffered in a memtable, then get flushed into a new SSTable. When the new SSTable eventually gets compacted with the SSTable containing the original value, the newer value will annihilate the old one - i.e. the old value will not be output from the compaction. Timestamps are used to decide which value is newest.
Deletes are handled in the same fashion, effectively inserting an "anti-value", or tombstone. The limitation of this process is that it can require significant space overhead. Deletes are effectively 'lazy', so the space doesn't get freed until some time later. Also, while the output of the compaction can be the same size as the input, the old SSTables cannot be deleted until the new one is completed, so this can reduce disk utilisation to 50%.
In the system described above, a new value for an existing key can be a different size from the existing value without padding to some pre-determined length, as the new value does not get written over the old value on update, but to a new SSTable.
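If you want to observe this behaviour, nodetool tablestats (cfstats on older 2.x releases) reports the SSTable count and space used per table before and after a compaction, e.g. (mykeyspace.mytable is a placeholder):
nodetool tablestats mykeyspace.mytable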
