Drop table or truncate table in Cassandra, which is better - cassandra

We have a use case where we need to re-create a table every day with current data in Cassandra. For this should we use drop table or truncate table, which would be efficient? We do not want the data to be backed up etc?
Thanks
Ankur

I think for almost all cases Truncate is a safer operation than a drop recreate. There have been several issues with dropping/recreating in the past with ghost data, schema disagreement, ect... Although there have been a number of fixes to try to make drop/recreate more stable, if its an operation you are performing every day Truncate should be much cheaper and more stable.

Drop table drops the table and all data. Truncate clears all data in the table, and by default creates a snapshot of the data (but not the schema). Efficiency wise, they're close - though truncate will create the snapshot. You can disable this by setting auto_snapshot to false in cassandra yaml config, but it is server wide. If it's not too much trouble, I'd drop and recreate table - but I've seen issues if you don't wait a while after drop before recreating.

Source : https://support.datastax.com/hc/en-us/articles/204226339-FAQ-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1
NOTE: By default, snapshots are created when tables are dropped or truncated. This will need to be cleaned out manually to reclaim disk space.
Tested manually as well.
Truncate will keep the schema though, drop will not.

Beware!
From datastax documentation: https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlTruncate.html
Note: TRUNCATE sends a JMX command to all nodes, telling them to delete SSTables that hold the data from the specified table. If any of these nodes is down or doesn't respond, the command fails and outputs a message like the following:
truncate cycling.user_activity;
Unable to complete request: one or more nodes were unavailable.
Unfortunately, there is nothing on the documentation saying if DROP behaves differently

Related

Cassandra query table without partition key

I am trying to extract data from a table as part of a migration job.
The schema is as follows:
CREATE TABLE IF NOT EXISTS ${keyspace}.entries (
username text,
entry_type int,
entry_id text,
PRIMARY KEY ((username, entry_type), entry_id)
);
In order to query the table we need the partition keys, the first part of the primary key.
Hence, if we know the username and the entry_type, we can query the table.
In this case the username can be whatever, but the entry_type is an integer in the range 0-9.
When doning the extraction we iterate the table 10 times for every username to make sure we try all versions of entry_type.
We can no longer find any entries as we have depleted our list of usernames. But our nodetool tablestats report that there is still data left in the table, gigabytes even. Hence we assume the table is not empty.
But I cannot find a way to inspect the table to figure out what usernames remains in the table. If I could inspect it I could add the usernames left in the table to our extraction job and eventually we could deplete the table. But I cannot simply query the table as such:
SELECT * FROM ${keyspace}.entries LIMIT 1
as cassandra requires the partition keys to make meaningful queries.
What can I do to figure out what is left in our table?
As per the comment, the migration process includes a DELETE operation from the Cassandra table, but the engine will have a delay before actually removing from disk the affected records; this process is controlled internally with tombstones and the gc_grace_seconds attribute of the table. The reason for this delay is fully explained in this blog entry, for a tl dr, if the default value is still in place, Cassandra will need to pass at least 10 days (864,000 seconds) from the execution of the delete before the actual removal of the data.
For your case, one way to proceed is:
Ensure that all your nodes are "Up" and "Healthy" (UN)
Decrease the gc_grace_seconds attribute of your table, in the example, it will set it to 1 minute, while the default is
ALTER TABLE .entries with GC_GRACE_SECONDS = 60;
Manually compact the table:
nodetool compact entries
Once that the process is completed, nodetool tablestats should be up to date
To answer your first question, I would like to put more light on gc_grace_seconds property.
In Cassandra, data isn’t deleted in the same way it is in RDBMSs. Cassandra is designed for high write throughput, and avoids reads-before-writes. So in Cassandra, a delete is actually an update, and updates are actually inserts. A “tombstone” marker is written to indicate that the data is now (logically) deleted (also known as soft delete). Records marked tombstoned must be removed to claim back the storage space. Which is done by a process called Compaction. But remember that tombstones are eligible for physical deletion / garbage collection only after a specific number of seconds known as gc_grace_seconds. This is a very good blog to read more in detail : https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
Now possibly you are looking into table size before gc_grace_seconds and data is still there.
Coming to your second issue where you want to fetch some samples from the table without providing partition keys. You can analyze your table content using Spark. The Spark Cassandra Connector allows you to create Java applications that use Spark to analyze database data. You can follow the articles / documentation to write a quick handy spark application to analyze Cassandra data.
https://www.instaclustr.com/support/documentation/cassandra-add-ons/apache-spark/using-spark-to-sample-data-from-one-cassandra-cluster-and-write-to-another/
https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/spark/sparkJavaApi.html
I would recommend not to delete records while you do the migration. Rather first complete the migration and post that do a quick validation / verification to ensure all records are migrated successfully (this use can easily do using Spark buy comparing dataframes from old and new tables). Post successful verification truncate the old table as truncate does not create tombstones and hence more efficient. Note that huge no of tombstone is not good for cluster health.

Cassandra - Truncate a table while inserts in progress

I want to understand how the truncate command works in Cassandra (version 3.9) to be able to know what would happen in the following scenario:
I have about 100GB of data on a table in production on a table that needs to be truncated.
I want to truncate this table, but at the same time there will be a few hundred requests per second that will be making inserts at the same time.
I am trying to understand, theoretically how would this play out.
Would the truncate try to acquire some sort of a lock on the table before it can proceed? and possibly stop the insert requests or itself be timed out?
Or would the truncate go through in sequence as the request came in and following insert requests would create the additional rows and I would end up with a small number of rows remaining after the truncate.
I am just trying to reclaim space, so I am not particularly concerned if a small amount of data remains from the insert requests run after the truncate command.
I am just trying to understand if you'd expect this to complete successfully or it would fail / time-out.
I will try to run a similar scenario on a smaller cluster, but I'm not sure if that will be a good substitute to understand the actual behavior. Any inputs will be helpful.
Truncate sends a message to all the nodes with a request to delete all the SSTables at the moment of execution, you will have information only of those upserts received after the truncate was issued.
In the Datastax documentation it is stated that this is done with JMX, but looking at the comments of this answer, this is done with CQL and the messaging service.
If you are trying to reclaim disk space, please note that a snapshot will be created with the truncate if auto_snapshot is set to true (true is the default value), so you will need to remove the snapshot after the execution of the command. Also, note that truncate will require to have all the nodes to be up and healthy to be able to complete.
I tried this for myself. On a 2 node Cassandra cluster I Made inserts at about 160 requests per second in the background and ran a truncate query on the same table that had about 200,000 records.
The table got truncated and the inserts continued without an error.
The new rows inserted after the truncate showed on the DB.

Is there a way to view data in 2 replicas in Cassandra?

I am a newbie to Cassandra.I have created a keyspace in Cassandra in NetworkTopology Strategy with 2 replicas in one datacenter. Is there a cql command or some other way to view my data in two replicas?
Like SELECT * FROM tablename in replica1 / replica2
Whether there is another way such that I can visually see the data in two replicas?
Thanks in advance.
So your question is not real clear "See the data in 2 replicas". If you ever want to validate your data, you can run some commands to visually see things.
The first thing you'd want to do is log onto the node you want to investigate. Go to the data directory of the interested table -> DataDir/keyspace/table. In there you'll see one or more files that look like *Data.db. Those are your sstables. Data in memory is flushed to sstables in certain scenarios. You want to be sure your data is flushed from memory to disk if you're validating (as you may not find what you're looking for otherwise). To do that, you issue a "nodetool flush" command (you can use the keyspace and table as parameters if you only want to flush the specific table).
Like I said, after that, everything in memory would be flushed to disk. So you'd be able to see your sstables (again, *Data.db) files. Once you have those sstables, you can run the "sstabledump" command on each sstable to see the data that resides in them, thus validating your data.
If you have only a few rows you want to validate and a lot of nodes, you can find which node the rows would reside by running "nodetool getendpoints" with the keyspace, table, and partition key. That will tell you every node that will have the data. That way you're not guessing which node the row(s) should be on. Unfortunately, there is no way to know which sstable the rows should exist in (and it could be more than one if updates/deletes, etc. occurred). You'll have to go through each sstable on the specific node(s).
Hope that helps answer your question?
Good luck.
-Jim
You can for a specific partition. If you are sure host1 is a replica (nodetool getendpoints or from query trace), then if you make your query with CL.ONE and explicitly to that host, the coordinator will always pick local first. So
Statement q = new SimpleStatement("SELECT * FROM tablename WHERE key = X");
q.setHost("host1")
Where host1 owns X.
For SELECT * FROM tablename its a bit harder because you are looking over entire data set and coordinator will send out multiple queries for each part of ring. If you do some queries with CL.ONE it will still only go to one node for each part of that range so if you set q.enableTracing() you can see what node answered for each range. You have no control over which coordinator picks so may take few queries.
If you just want to see if theres differences you can use preview repair. nodetool repair --preview --full.

How to validate the restoration is successful

I have taken the full snapshot from a node. I have copied the snapshot directory and placed in the /var/lib/cassandra/data/Keyspace/Tables/ directory in the restoration node. I have tried both restarting the service and also tried using nodetool refresh command for restoring the data in new node. It worked like a charm.
I am unable to list the number of records for tables with high number of records. I am facing Connection timed out error for tables with higher records. So I am unable to validate that the total data from the table has been successfully restored.
Also I tried check the size occupied by the keyspace using nodetool cfstats -H and nodetool tablestats -H and "Space used" parameter seems to be exactly matching.
I use below command for listing the total count of the specific tables.
select count(*) from milestone LIMIT 100000;
My Question:
What if few of the records went missing during restoration? What if the count from the backup and restored data has mismatched and I have no way of knowing it. Could you please suggest the way to validate that the restoration is successful?
How will I ensure the total number of records have successfully copied?
Usually to validate data restoration, you may take a CSV backup of your data sets at the beginning and after restoration take one more CSV backup. Then compare these two backup, is there anything missing or not.
To compare to csv:
# diff mytable_old.csv mytable_new.csv
To know more about CQLSH COPY for csv backup: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html
Depending on your dataset size it might not be possible (reasonable?) to compare the full dataset.
Either using a random approach and query a % of the dataset.
If you do want to query the full dataset the best approach is to query all partitions one by one by token, and compare with the original dataset. You can look here https://github.com/ckalantzis/cassTickler for an example of how to query the full dataset. The objective is different, but the approach I'm recommending is the same.

MemSQL for Last n Days Data

I plan to use memsql to store my last 7 days data for real time analytics using SQL.
I checked the documentation and find out that there is no such TTL / expiration feature in MemSQL
Is there any such feature (in case I missed it)?
Is memsql fit the use case if I do daily delete on >7 days data? I quite curious about the fragmentation
We tried it on postgresql and we need to execute Vacuum command, it takes a long time to run.
There is no TTL/expiration feature. You can do it by running delete queries. Many customer use cases are doing this type of thing, so yes MemSQL does fit the use case. Fragmentation generally shouldn't be too much of a problem here - what kind of fragmentation are you concerned about?
There is No Out of the Box TTL feature in MemSQL.
We achieved TTL by adding an additional TS column in our MemSQL Rowstore table with TIMESTAMP(6) datatype.
This provides automatic current timestamp insertion when you add a new row to the table.
When querying data from this table, you can apply a simple filter based on this TIMESTAMP column to filter older records beyond your TTL value.
https://docs.memsql.com/sql-reference/v6.7/datatypes/#time-and-date
You can always have a batch job which can run one a month which can delete older data.
we have not seen any issues due to fragmentation but you can do below once in a while if fragmentation is a concern for you:
MemSQL’s memory allocators can become fragmented over time (especially if a large table is shrunk dramatically by deleting data randomly). There is no command currently available that will compact them, but running ALTER TABLE ADD INDEX followed by ALTER TABLE DROP INDEX will do it.
Warning
Caution should be taken with this work around. Plans will rebuild and the two ALTER queries are going to move all moves in the table twice, so this should not be used that often.
Reference:
https://docs.memsql.com/troubleshooting/latest/troubleshooting/

Resources