Data Structure in Cassandra

Is there a way to read SSTables in Cassandra? I see from the documentation that sstabledump is an enterprise tool. Is it possible to get a trial version of sstabledump?
Or is there a way to read SSTables using the existing utilities in the Cassandra bin folder?

sstabledump is also available in Apache Cassandra.
It can be found in the tools/bin directory in Cassandra 3.x.
Note: sstable2json was replaced by sstabledump in 3.0.
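For example (a minimal sketch; the data path, table id and SSTable file name are placeholders for your own installation):

cd $CASSANDRA_HOME/tools/bin
./sstabledump /var/lib/cassandra/data/my_keyspace/my_table-<table-id>/mc-1-big-Data.db        # full JSON dump of the SSTable
./sstabledump -d /var/lib/cassandra/data/my_keyspace/my_table-<table-id>/mc-1-big-Data.db     # internal representation, one partition per line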

On Cassandra 2.x you can use sstable2json for that.
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsSSTable2Json.html
https://www.datastax.com/dev/blog/debugging-sstables-in-3-0-with-sstabledump
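For example, on a 2.x node (the path below is only a placeholder for one of your own Data.db files):

bin/sstable2json /var/lib/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-ka-1-Data.db      # dump the SSTable as JSON
bin/sstable2json /var/lib/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-ka-1-Data.db -e   # enumerate the partition keys only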

Related

Fewer rows being inserted by sstableloader in ScyllaDB

I'm trying to migrate data from Cassandra to ScyllaDB from a snapshot using sstableloader. Data in some tables gets loaded without any error, but when verifying the counts with PySpark, ScyllaDB shows fewer rows than Cassandra. Help needed!
I work at ScyllaDB.
There are two tools that can be used to help find the differences:
https://github.com/scylladb/scylla-migrate (https://github.com/scylladb/scylla-migrate/blob/master/docs/scylla-migrate-user-guide.md): you can use its check mode to find the missing rows.
https://github.com/scylladb/scylla-migrator is a tool for migrating from one live CQL cluster to another (Cassandra --> Scylla) that also supports validation (https://github.com/scylladb/scylla-migrator#running-the-validator). There is a blog series on using this tool: https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/.
Please open a bug at https://github.com/scylladb/scylla/issues if there are indeed missing rows.
Solved this problem by running nodetool repair on the Cassandra keyspace, then taking a fresh snapshot and loading it into ScyllaDB using sstableloader.
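As an illustration of that fix only, the repair-then-reload sequence looks roughly like this; the keyspace name, snapshot tag, host and paths are placeholders:

nodetool repair my_keyspace                       # make sure every node holds a consistent copy
nodetool snapshot -t migration my_keyspace        # take a fresh snapshot after the repair
# copy the snapshot directories to a machine that can reach the ScyllaDB cluster, then:
sstableloader -d scylla-node-1 /path/to/snapshots/my_keyspace/my_table/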

How to see the actual configuration values for Apache Cassandra in cqlsh?

This is what I have in cassandra.yaml
prepared_statements_cache_size_mb: 500MB
Is it possible to see the actual value of that variable once you're in cqlsh?
Since Cassandra 4.0 you'll be able to do that by reading from the system_views.settings virtual table.
See the blog post from TLP on the topic of virtual tables.
Because CQL statements are sent to the cluster (which should be three or more nodes), you can't use CQL to read the settings of an individual node; the value of prepared_statements_cache_size_mb is specific to each node.
You could use JMX to read org.apache.cassandra.config.Config on a node, including prepared_statements_cache_size_mb.
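On 4.0+, a quick way to check this from cqlsh (the setting name shown is the 4.0 one and may differ in later releases):

-- inside cqlsh
SELECT * FROM system_views.settings WHERE name = 'prepared_statements_cache_size_mb';
# or from the shell
cqlsh -e "SELECT name, value FROM system_views.settings;" | grep prepared_statements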

sstable2json in Cassandra 3.5

In 2.x versions of Cassandra we could view SSTable content with bin/sstable2json sstable.db. What is the proper way of checking SSTable data in the new 3.x versions (currently 3.5)?
Check out sstabledump. It has replaced sstable2json in Cassandra 3.X.
http://www.datastax.com/dev/blog/debugging-sstables-in-3-0-with-sstabledump

Cassandra data migration from 1.2 to 3.0.2

I know similar questions have been asked before, but I think my use case is very specific, and I could not find any answer for it.
In production we are using Cassandra 1.2 with the ByteOrderedPartitioner in a 6-node cluster, with Priam as the seed-management tool. We have recently upgraded all the dependencies and are trying to migrate to Cassandra 3.0.2 with the Murmur3Partitioner; for backward compatibility we need to enable Thrift on the new cluster. We also want to migrate away from Priam.
I was able to set up the new cluster but am facing a lot of issues during data migration. I tried three things:
1) COPY command: fails when the number of rows is large.
2) sstable2json: Cassandra 3.0.2 has stopped supporting sstable2json.
3) sstableloader: failing, I think, because of the different Cassandra versions of the source and destination:
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:233)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:119)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:67)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:225)
... 2 more
Right now I am kind of stuck; any help regarding this will be deeply appreciated. Please let me know if you need more details.
No, you cannot upgrade your SSTables from 1.2 to 3.0.2 directly, since the SSTable format differs between versions. This link describes the steps for upgrading Cassandra versions, but it does not help in your case either, since you are also changing the partitioner type.
Changing the partitioner type is not yet supported in Cassandra as of now (Link).
The solution I would prefer is: create a standalone utility built against Cassandra 3.0.2 that reads all the data from your source Cassandra and writes SSTables with the help of CQLSSTableWriter, using the Murmur3Partitioner. (The trick is that you are writing the SSTables with version 3.0.2, so they will be easily recognized by your new cluster.) Then use sstableloader in your target cluster; a rough sketch is shown after this answer.
I am not sure why you still require backward compatibility, but when creating the CQLSSTableWriter you can specify the column family schema with the "WITH COMPACT STORAGE" keyword. I have not tried CQLSSTableWriter with "WITH COMPACT STORAGE", but I have tried it without, and that worked, so it should work for your case too.
OK, so if you try to migrate directly from 1.2 to 3.0.2, you're really looking for trouble.
The migration path should be:
latest minor of 1.2
latest minor of 2.0
latest minor of 2.1
3.0.2
For each jump between versions, read the https://github.com/apache/cassandra/blob/trunk/NEWS.txt file to find out whether you need any special actions (upgrading sstables, ...).
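A rough sketch of what one such hop looks like on a single node (package names and service commands vary with your install method); repeat this per node and per major version on the path above:

nodetool drain                 # flush memtables and stop the node from accepting writes
# stop Cassandra, install the next major version on the upgrade path, start Cassandra
nodetool upgradesstables       # rewrite the SSTables into the new on-disk format
# check NEWS.txt for any extra steps required by that particular version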

How to implement CDC in Cassandra?

I am trying to use CDC in Cassandra. I tried using incremental backups as mentioned in this link, but the format of the SSTables is very weird for composite keys. Is there any way to implement CDC in Cassandra?
Any pointers will be very useful.
Native CDC support is available from Cassandra 3.8 onwards:
https://issues.apache.org/jira/browse/CASSANDRA-8844
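As a rough sketch of how the 3.8+ feature is switched on (keyspace and table names are placeholders): CDC is enabled per node in cassandra.yaml and per table via a table property, after which commit log segments containing CDC-enabled data land under the cdc_raw directory for an external consumer to process:

# cassandra.yaml (on every node)
cdc_enabled: true

-- CQL, per table
CREATE TABLE my_keyspace.my_table (id text PRIMARY KEY, value int) WITH cdc = true;
-- or, for an existing table
ALTER TABLE my_keyspace.my_table WITH cdc = true;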
