Cassandra data migration from 1.2 to 3.0.2

I know similar questions have been asked before, but I think my use case is very specific and I could not find any answer for it.
In production we are using Cassandra 1.2 with the ByteOrderedPartitioner in a 6-node cluster, with Priam as the seed-management tool. We have recently upgraded all the dependencies and are trying to migrate to Cassandra 3.0.2 with the Murmur3Partitioner; for backward compatibility we need to enable Thrift on the new cluster. We also want to migrate away from Priam.
I was able to set up the new cluster but am facing a lot of issues during data migration. I tried three things:
1) COPY command: fails when the number of rows is large.
2) sstable2json: Cassandra 3.0.2 has dropped support for sstable2json.
3) sstableloader: fails, I think because the source and destination run different Cassandra versions:
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:233)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:119)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:67)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:225)
... 2 more
Right now I am kind of stuck; any help regarding this will be deeply appreciated. Please let me know if you need more details.

No, you cannot upgrade your sstables from 1.2 to 3.0.2 directly, since the sstable format differs between versions. This link describes the steps for upgrading Cassandra versions, but it will not help you either, since you are also changing the partitioner type.
Changing the partitioner type is not supported in Cassandra as of now (Link).
The solution I would suggest is:
Create a standalone utility built against Cassandra 3.0.2 that reads all the data from your source cluster and writes sstables with the help of CQLSSTableWriter, using the Murmur3Partitioner. (The trick is that you are writing the sstables with version 3.0.2, so they will be easily recognized by your new cluster.) Then use sstableloader on your target cluster.
I am not sure why you still require backward compatibility; when creating the CQLSSTableWriter you can specify the column family schema with the keyword WITH COMPACT STORAGE. I have not tried CQLSSTableWriter with WITH COMPACT STORAGE, but I have tried it without, and it worked, so it should work for your case too.

OK, so if you try to migrate directly from 1.2 to 3.0.2, you're really looking for trouble.
The migration path should be:
1.2 latest minor
2.0 latest minor
2.1 latest minor
3.0.2
For each jump between versions, read the https://github.com/apache/cassandra/blob/trunk/NEWS.txt file to know whether you need special actions (upgrading sstables, ...).

Related

Fewer rows being inserted by sstableloader in ScyllaDB

I'm trying to migrate data from Cassandra to ScyllaDB from a snapshot using sstableloader. The data in some tables loads without any error, but when verifying the counts using PySpark, ScyllaDB shows fewer rows than Cassandra. Help needed!
I work at ScyllaDB.
There are two tools that can be used to help find the differences:
https://github.com/scylladb/scylla-migrate (user guide: https://github.com/scylladb/scylla-migrate/blob/master/docs/scylla-migrate-user-guide.md): you can use its check mode to find the missing rows.
https://github.com/scylladb/scylla-migrator: a tool for migrating from one live CQL cluster to another (Cassandra to Scylla) that also supports validation (https://github.com/scylladb/scylla-migrator#running-the-validator). There is a blog series on using this tool: https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/.
Please post a bug on https://github.com/scylladb/scylla/issues if rows are indeed missing.
Solved this problem by running nodetool repair on the Cassandra keyspace, then taking a snapshot and loading the snapshot into ScyllaDB using sstableloader.

Which one is better to use for interacting with Cassandra from Java?

I need to insert data into and pull data from an AWS C* table.
My data producer is built with Spring Boot on Java 8.
So which one should I use for my project that is stable and efficient?
I found these options (here, I guess):
1. Spring Data JPA.
2. cassandra-driver-core from DataStax.
Disclaimer: questions like this, asking which tool/library is "better", are subjective and aren't usually a good fit for Stack Overflow.
That being said, the Spring Data Cassandra driver inherently violates two known Cassandra data access anti-patterns (that I know of):
Unbound SELECT COUNT(*) as part of its paging mechanism.
Use of BATCH for multiple writes.
Additionally, the Spring Data Cassandra driver wraps the DataStax driver, which introduces an additional delay for bug fixes and upgrades.
tl;dr:
You cannot go wrong by using the DataStax Java driver, and I highly recommend its use.
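For reference, a minimal sketch of the basic access pattern with the plain DataStax Java driver (3.x-era API; the contact point, keyspace, and table below are placeholders). It uses prepared statements bound per request, rather than the unbound reads or batches mentioned above:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class DriverExample {
    public static void main(String[] args) {
        // Cluster and Session are Closeable, so try-with-resources works
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")     // placeholder contact point
                .build();
             Session session = cluster.connect("my_keyspace")) {

            // Prepare once, bind per request: the usual high-throughput pattern
            PreparedStatement insert = session.prepare(
                    "INSERT INTO users (id, name) VALUES (?, ?)");
            session.execute(insert.bind("u1", "Alice"));

            Row row = session.execute(
                    "SELECT name FROM users WHERE id = 'u1'").one();
            System.out.println(row.getString("name"));
        }
    }
}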

Cassandra UDT with Version 2

I want to use the Cassandra C++ driver. It still works on version 2 (binary protocol version 2).
I want to use it to connect to Cassandra 2.1.
Is it possible to create a prepared statement and insert a UDT value into it,
or can UDTs only be used with protocol version 3?
Regards
UDTs are only available in Cassandra 2.1 (and later):
https://issues.apache.org/jira/browse/CASSANDRA-5590
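Note that binding a UDT value into a prepared statement requires native protocol v3 (the version that shipped with 2.1), so a v2-only driver cannot do it directly; the same flow applies in the C++ driver once it speaks protocol v3. As an illustration, here is a sketch of that flow with the DataStax Java driver 2.1+ (keyspace, type, and table names are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.UDTValue;
import com.datastax.driver.core.UserType;

public class UdtExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {

            // Look up the UDT definition from the cluster metadata
            UserType addressType = cluster.getMetadata()
                    .getKeyspace("my_ks").getUserType("address");

            // Build a value of that type field by field
            UDTValue address = addressType.newValue()
                    .setString("street", "1 Main St")
                    .setString("city", "Springfield");

            // Bind the UDT value into a prepared statement like any other value
            PreparedStatement ps = session.prepare(
                    "INSERT INTO users (id, address) VALUES (?, ?)");
            session.execute(ps.bind("user1", address));
        }
    }
}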

Cassandra bulk insert solution

I have a Java program that runs as a service; it must insert 50k rows/s (each row has 25 columns) into a Cassandra cluster.
My cluster contains 3 nodes; each node has 4 CPU cores (Core i5, 2.4 GHz) and 4 GB of RAM.
I used the Hector API, multithreading, and bulk inserts, but the performance is much lower than expected (about 25k rows/s).
Does anyone have another solution for this? Does Cassandra support an internal bulk insert (without using Thrift)?
Astyanax is a high-level Java client for Apache Cassandra. Apache Cassandra is a highly available, column-oriented database.
Astyanax is currently in use at Netflix. Issues are generally fixed as quickly as possible and releases are done frequently.
https://github.com/Netflix/astyanax
I've had good luck creating sstables and loading them directly. There is an sstableloader tool included in the distribution, as well as a JMX interface. You can create the sstables using the SSTableSimpleUnsortedWriter class.
Details here.
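A minimal sketch with SSTableSimpleUnsortedWriter as it existed in the Thrift-era (1.x) API; the output directory, keyspace, and column family names are placeholders:

import java.io.File;
import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

public class BulkSSTableWriter {
    public static void main(String[] args) throws Exception {
        // Buffers up to 64 MB of rows in memory before flushing an sstable to disk
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                new File("/tmp/Keyspace1/Standard1"),   // placeholder output directory
                new RandomPartitioner(),                // must match the cluster's partitioner
                "Keyspace1", "Standard1",               // placeholder keyspace / column family
                AsciiType.instance, null, 64);

        long timestamp = System.currentTimeMillis() * 1000; // microseconds
        writer.newRow(bytes("row-key-1"));
        writer.addColumn(bytes("col1"), bytes("val1"), timestamp);
        writer.close();
    }
}

The resulting sstables can then be streamed in with sstableloader or the JMX bulk-load interface.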
The fastest way to bulk-insert data into Cassandra is sstableloader, a utility provided by Cassandra from 0.8 onwards. For that you have to create sstables first, which is possible with SSTableSimpleUnsortedWriter; more about this is described here.
Another fast way is Cassandra's BulkOutputFormat for Hadoop. With this you can write a Hadoop job to load data into Cassandra; see more on bulk loading to Cassandra with Hadoop.
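A sketch of the Hadoop side, assuming Cassandra's BulkOutputFormat and ConfigHelper from the 1.1-era Hadoop integration; addresses, keyspace, and column family are placeholders, and the mapper/reducer (whose reducer must emit (ByteBuffer key, List<Mutation>) pairs) is omitted:

import org.apache.cassandra.hadoop.BulkOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadJob {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "bulk-load");
        job.setJarByClass(BulkLoadJob.class);
        // Mapper/reducer setup omitted; the reducer writes
        // (ByteBuffer key, List<Mutation>) pairs to BulkOutputFormat
        job.setOutputFormatClass(BulkOutputFormat.class);

        Configuration conf = job.getConfiguration();
        ConfigHelper.setOutputColumnFamily(conf, "Keyspace1", "Standard1"); // placeholders
        ConfigHelper.setOutputInitialAddress(conf, "127.0.0.1");            // placeholder node
        ConfigHelper.setOutputRpcPort(conf, "9160");
        ConfigHelper.setOutputPartitioner(conf,
                "org.apache.cassandra.dht.RandomPartitioner");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}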

Upgrading Cassandra

I'm using Cassandra 0.6.8 and I want to upgrade from 0.6.8 to Cassandra 0.7. Will it impact the data I currently have?
NEWS.txt always covers upgrading:
Upgrading
The Thrift API has changed in incompatible ways; see below, and refer
to http://wiki.apache.org/cassandra/ClientOptions for a list of
higher-level clients that have been updated to support the 0.7 API.
The Cassandra inter-node protocol is incompatible with 0.6.x
releases (and with 0.7 beta1), meaning you will have to bring your
cluster down prior to upgrading: you cannot mix 0.6 and 0.7 nodes.
The hints schema was changed from 0.6 to 0.7. Cassandra automatically
snapshots and then truncates the hints column family as part of
starting up 0.7 for the first time.
Keyspace and ColumnFamily definitions are stored in the system
keyspace, rather than the configuration file.
The process to upgrade is:
1) run "nodetool drain" on _each_ 0.6 node. When drain finishes (log
message "Node is drained" appears), stop the process.
2) Convert your storage-conf.xml to the new cassandra.yaml using
"bin/config-converter".
3) Rename any of your keyspace or column family names that do not adhere
to the '^\w+' regex convention.
4) Start up your cluster with the 0.7 version.
5) Initialize your Keyspace and ColumnFamily definitions using
"bin/schematool <host> <jmxport> import". _You only need to do
this to one node_.
Yes, it will.
In my opinion, you have to move your data manually from 0.6 to 0.7 by writing some code in Java or another language.