How to flush data in all tables of a keyspace in Cassandra?

I am currently writing tests in Go and I want to get rid of all the data in the tables after the tests finish. I was wondering if it is possible to flush the data of all tables in Cassandra.
FYI: I am using Cassandra version 3.11.

The term "flush" is ambiguous in this case.
In Cassandra, "flush" is an operation where data is "flushed" from memory and written to disk as SSTables. Flushing can happen automatically based on certain triggers or can be done manually with the nodetool flush command.
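For reference, a manual flush looks like this (ks_name and table_name are placeholders):

    $ nodetool flush ks_name             # flush all tables in the keyspace
    $ nodetool flush ks_name table_name  # flush a single table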
However, based on your description, what you want is to "truncate" the contents of the tables. You can do this using the following CQL command:
cqlsh> TRUNCATE ks_name.table_name;
You will need to iterate over each table in the keyspace, as sketched below. For more info, see the CQL TRUNCATE command. Cheers!
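Since your tests are in Go, here is a minimal sketch of that loop using the gocql driver; the contact point 127.0.0.1 and the keyspace ks_name are assumptions to adapt to your setup:

    package main

    import (
        "fmt"
        "log"

        "github.com/gocql/gocql"
    )

    // truncateAllTables empties every table in the given keyspace.
    func truncateAllTables(session *gocql.Session, keyspace string) error {
        // Cassandra 3.x keeps table metadata in system_schema.tables.
        iter := session.Query(
            `SELECT table_name FROM system_schema.tables WHERE keyspace_name = ?`,
            keyspace,
        ).Iter()

        var table string
        for iter.Scan(&table) {
            // Identifiers can't be bound as parameters, so build the statement per table.
            if err := session.Query(fmt.Sprintf("TRUNCATE %s.%s", keyspace, table)).Exec(); err != nil {
                return err
            }
        }
        return iter.Close()
    }

    func main() {
        cluster := gocql.NewCluster("127.0.0.1") // assumed contact point
        session, err := cluster.CreateSession()
        if err != nil {
            log.Fatal(err)
        }
        defer session.Close()

        // "ks_name" is a placeholder keyspace.
        if err := truncateAllTables(session, "ks_name"); err != nil {
            log.Fatal(err)
        }
    }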

Related

How do I replicate a local Cassandra node to a remote node in another Cassandra cluster?

I need to replicate a local node that uses SimpleStrategy to a remote node in another Cassandra database. Does anyone have an idea of where I should begin?
The main complexity here, if you're writing data into both clusters, is how to avoid overwriting data that changed in the cloud more recently than in your local setup. There are several ways to do that:
If the structure of the tables is the same (including the names of the keyspaces, if user-defined types are used), then you can simply copy the SSTables from your local machine to the cloud and use sstableloader to replay them. In this case, Cassandra will respect the actual writetime and won't overwrite data that changed later. Also, if you delete from the tables, you need to copy the SSTables before the tombstones expire. You don't have to copy all SSTables every time, just the files that have changed since the last upload, but you always need to copy SSTables from every node from which you're uploading.
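As a rough sketch, replaying a copied table directory with sstableloader looks like this (hosts and path are placeholders; the directory must end in the keyspace/table layout Cassandra uses on disk):

    $ sstableloader -d remote_host1,remote_host2 /path/to/copied/ks_name/table_name-5a1c.../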
If the structure isn't the same, then you can look at using either DSBulk or the Spark Cassandra Connector. In both cases you'll need to export the data with its writetime and then load it with that timestamp as well. Note that in both cases, if different columns have different writetimes, you will need to load that data separately, because Cassandra allows only one timestamp to be specified when inserting or updating data.
With DSBulk, you can follow example 19.4 for exporting data from this blog post, and example 11.3 for loading from another blog post. This may require some shell scripting, and you'll need disk space to keep the exported data (though you can use compression).
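In outline, the approach in those examples looks something like this (column names and paths here are invented; see the posts for the exact options): unload each column together with its writetime, then load it back with USING TIMESTAMP:

    $ dsbulk unload -h localhost \
        -query "SELECT pk, col, writetime(col) AS col_ts FROM ks_name.table_name" \
        -url /tmp/table_export
    $ dsbulk load -h remote_host \
        -query "INSERT INTO ks_name.table_name (pk, col) VALUES (:pk, :col) USING TIMESTAMP :col_ts" \
        -url /tmp/table_export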
With the Spark Cassandra Connector, you can move the data without intermediate storage if both clusters are accessible from Spark, but you'll need to write some Spark code that reads the data using the RDD or DataFrame API.
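A minimal RDD sketch of such a copy, assuming both clusters are reachable from Spark; the hosts, keyspace, and table names are placeholders, and per-row writetime handling is omitted for brevity. The connector lets you scope a CassandraConnector per cluster via an implicit:

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.cql.CassandraConnector
    import org.apache.spark.{SparkConf, SparkContext}

    object CopyTable {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("copy-table")
        val sc = new SparkContext(conf)

        // One connector per cluster; hosts are placeholders.
        val localConnector  = CassandraConnector(conf.clone.set("spark.cassandra.connection.host", "local-node"))
        val remoteConnector = CassandraConnector(conf.clone.set("spark.cassandra.connection.host", "remote-node"))

        // Read from the local cluster...
        val rows = {
          implicit val c: CassandraConnector = localConnector
          sc.cassandraTable("ks_name", "table_name")
        }

        // ...and write the same rows to the remote cluster.
        {
          implicit val c: CassandraConnector = remoteConnector
          rows.saveToCassandra("ks_name", "table_name")
        }

        sc.stop()
      }
    }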

Cassandra repair read/write metrics

When repairs are run in Cassandra, do the reads and writes done for repair count towards the read/write metrics? Repair has to read the table to build the Merkle tree, and similarly it has to write to the table to repair data, so I think they might. Am I correct?
If so, is there any way to distinguish such reads/writes from regular reads/writes?
In Cassandra 3, the metrics from read repairs can be obtained via JMX, under the MBean domain "org.apache.cassandra.metrics"; those operations don't affect the metrics of regular read/write operations.
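For example, with a generic JMX client such as jmxterm you can read one of those read-repair counters; the jar name below is a placeholder, and the bean/attribute names should be verified against your version with the beans command:

    $ java -jar jmxterm-uber.jar --url localhost:7199
    $> get -b org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking Count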
This same question was asked on the Cassandra user mailing list and I'm posting my response here.
Not quite. Cassandra does a validation compaction for the Merkle tree calculation, and it streams SSTables (rather than individual mutations) from one node to another to synchronise data between replicas. Cheers!

What are the best practices when performing a Truncate in Cassandra?

I want to perform a TRUNCATE on multiple tables with around 25 million records, in a clustered, multi-datacenter environment. I would just like some advice on steps to take before/after the truncate to ensure that there aren't huge discrepancies between the nodes.
According to this, a TRUNCATE deletes the SSTable holding the data. Does this mean that I'll need to set my consistency level to ALL before the truncate? Is a nodetool repair necessary after the operation?
Any advice would be greatly appreciated.
cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4
Ensure that all nodes are up before issuing this command.
Truncate will naturally propagate out over the cluster as long as all nodes remain up and available.
Truncate automatically runs as if CONSISTENCY were set to ALL, which ensures that the command only returns after all nodes have deleted their data, and fails if a node cannot be reached.
Repair is not necessary, as there will not be any data left to repair after the operation.
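In practice, the pre-check plus the truncate looks like this (ks_name/table_name are placeholders):

    $ nodetool status ks_name            # every node should report UN (Up/Normal)
    cqlsh> TRUNCATE ks_name.table_name;  -- fails if any node is unreachable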

Why does a read fail with a cqlsh query when huge numbers of tombstones are present?

I have a table with a huge number of tombstones. When I ran a Spark job (which reads) on that specific table, it gave results without any issues. But when I executed the same query using cqlsh, it gave me an error because of the huge number of tombstones present in that table:
Cassandra failure during read query at consistency ONE (1 replica needed but 0 replicas responded, 1 failed)
I know the tombstones should not be there and I can run repair to get rid of them, but apart from that, why did Spark succeed while cqlsh failed? They both use the same sessions and queries.
How does the Spark Cassandra Connector work? Is it different from cqlsh?
Please let me know. Thank you.
The Spark Cassandra Connector is different from cqlsh in a few ways:
It uses the Java driver rather than the Python driver.
It has significantly more lenient retry policies.
It does full table scans by breaking the request up into pieces.
Any of these could contribute to why the query works in the SCC but not in cqlsh.
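On the third point: instead of one query over the whole table, the connector issues many small token-range scans, each of which crosses far fewer tombstones than a single full read. The schema and token values below are made up:

    SELECT pk, col FROM ks_name.table_name
    WHERE token(pk) > -9223372036854775808 AND token(pk) <= -9100000000000000000;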

Spark Job for Inserting data to Cassandra

I am trying to write data to Cassandra tables using Spark on Scala. Sometimes the Spark task fails partway through and there are partial writes. Does Spark roll back the partial writes when the task is restarted from the beginning?
No. Spark (and Cassandra for that matter) doesn't do a commit-style insert based on the whole task. This means that your writes must be idempotent, otherwise you can end up with strange behaviors.
No, but if I'm right, you can just reprocess your data, which will overwrite the partial writes. When writing to Cassandra, an insert with the same primary key acts as an update (an upsert).
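A quick CQL illustration of that upsert behaviour, using a hypothetical table:

    cqlsh> CREATE TABLE ks_name.events (id int PRIMARY KEY, val text);
    cqlsh> INSERT INTO ks_name.events (id, val) VALUES (1, 'partial');
    cqlsh> INSERT INTO ks_name.events (id, val) VALUES (1, 'reprocessed');  -- same key: overwrites
    cqlsh> SELECT val FROM ks_name.events WHERE id = 1;                     -- returns 'reprocessed'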
