Upgrading Cassandra

I'm using Cassandra 0.6.8 and want to upgrade to Cassandra 0.7. Will the upgrade affect the data I currently have?

NEWS.txt always covers upgrading:
Upgrading
- The Thrift API has changed in incompatible ways; see below, and refer
  to http://wiki.apache.org/cassandra/ClientOptions for a list of
  higher-level clients that have been updated to support the 0.7 API.
- The Cassandra inter-node protocol is incompatible with 0.6.x
  releases (and with 0.7 beta1), meaning you will have to bring your
  cluster down prior to upgrading: you cannot mix 0.6 and 0.7 nodes.
- The hints schema was changed from 0.6 to 0.7. Cassandra automatically
  snapshots and then truncates the hints column family as part of
  starting up 0.7 for the first time.
- Keyspace and ColumnFamily definitions are stored in the system
  keyspace, rather than the configuration file.
The process to upgrade is:
1) run "nodetool drain" on _each_ 0.6 node. When drain finishes (log
message "Node is drained" appears), stop the process.
2) Convert your storage-conf.xml to the new cassandra.yaml using
"bin/config-converter".
3) Rename any of your keyspace or column family names that do not adhere
to the '^\w+' regex convention.
4) Start up your cluster with the 0.7 version.
5) Initialize your Keyspace and ColumnFamily definitions using
"bin/schematool <host> <jmxport> import". _You only need to do
this to one node_.
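Step 3 can be checked ahead of time with a small script. The following is only a minimal sketch, not part of the Cassandra tooling: it assumes the intent of the '^\w+' convention is that the whole name consists of ASCII letters, digits, and underscores, and the function name and sample names are made up for illustration.

```python
import re

# Assumption: a name is acceptable only if it consists entirely of
# ASCII word characters (letters, digits, underscore).
_NAME_RE = re.compile(r"\w+", re.ASCII)


def names_needing_rename(names):
    """Return the names that do not consist solely of word characters."""
    return [name for name in names if not _NAME_RE.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical keyspace and column family names.
    for name in names_needing_rename(["Keyspace1", "user-events", "Standard_1", "my.cf"]):
        print("rename before upgrading:", name)
```

Run it against the keyspace and column family names from storage-conf.xml before converting the configuration.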

Yes, it will.
In my opinion, you have to move your data from 0.6 to 0.7 manually by writing some code in Java or another programming language.

Related

Skipping "nodetool upgradesstables" for time series ttl expiry data in Cassandra upgrade from 2.1 to 3.11

In a Cassandra 2.1 cluster the data format is ka; after upgrading to Cassandra 3.11, I see that new sstables are written in md format. For time-series data that is going to expire in 3 months, can I skip running nodetool upgradesstables?
I validated that reads work fine against the older ka-format sstables after the upgrade. I want to skip the conversion because I know from other threads that it takes a long time, and this data is going to expire in 3 months anyway.
I don't think it's mandatory to run nodetool upgradesstables. Cassandra 3 is able to work with old SSTables, but you lose a lot of the advantages of Cassandra 3 (for instance, space consumption is significantly reduced). Also, DataStax has a warning in their upgrade documentation:
WARNING: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage and possible data loss. Upgrading is not complete until the SSTables are upgraded.
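To see how much data is still in the old format before deciding, the format version can be read from the data file names. This is a rough sketch under an assumption that is not stated in the thread: that 2.1 files are named `<keyspace>-<table>-ka-<generation>-Data.db` and 3.x files are named `md-<generation>-big-Data.db`. The function names are illustrative.

```python
import re

# Assumed filename layouts (check them against your own data directory):
#   2.1: <keyspace>-<table>-ka-<generation>-Data.db
#   3.x: md-<generation>-big-Data.db
_NEW_STYLE = re.compile(r"^([a-z]{2})-(\d+)-\w+-Data\.db$")
_OLD_STYLE = re.compile(r"^.+-([a-z]{2})-(\d+)-Data\.db$")


def sstable_version(filename):
    """Return the two-letter format version, or None if the name is not a data file."""
    for pattern in (_NEW_STYLE, _OLD_STYLE):
        match = pattern.match(filename)
        if match:
            return match.group(1)
    return None


def old_format_files(filenames, current="md"):
    """Return the data files whose format version differs from `current`."""
    return [f for f in filenames if sstable_version(f) not in (None, current)]


if __name__ == "__main__":
    # Hypothetical directory listing.
    listing = ["ks-events-ka-12-Data.db", "md-3-big-Data.db", "md-3-big-Index.db"]
    print(old_format_files(listing))
```

If the list shrinks to nothing as the TTL'd data expires and compacts away, there is nothing left for upgradesstables to rewrite.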

Existing Cassandra 2.2.x cluster, changing the number of vNodes - will data be lost or not?

If the number of vNodes in the existing Cassandra 2.2.x cluster is changed - will it cause all the data in that cluster to be lost or not?
Is it possible to change # of vNodes and keep all the data stored in the Cassandra cluster?
The value in the config (cassandra.yaml) is only read on startup. Changing the value here will basically have no effect. You won't lose data.
There used to be a feature called shuffle, but it turned out you really don't want to change the token layout this way: the streaming associated with shuffle will pretty much kill your cluster.
If you need to do this, the best method is to create a new DC with the desired token ranges and then rebuild its nodes as per the instructions here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
You can then point your app at the new DC and throw away the old.

Cassandra data Migration from 1.2 to 3.0.2

I know similar questions have been asked before but I think my use case is very specific for which I could not find any answer.
In production we are using Cassandra 1.2 with ByteOrderedPartitioner in a 6-node cluster, with Priam as the seed management tool. We have recently upgraded all the dependencies and are trying to migrate to Cassandra 3.0.2 with Murmur3Partitioner, and for backward compatibility we need to enable Thrift on the new cluster. We also want to migrate away from Priam.
I was able to set up the new cluster but am facing a lot of issues during data migration. I tried 3 things:
1) COPY command: fails when the number of rows is large
2) SSTable2Json: Cassandra 3.0.2 has stopped supporting SSTable2Json
3) SSTableLoader: fails, I think because the source and destination run different Cassandra versions
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:233)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:119)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:67)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
    at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
    at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:225)
    ... 2 more
Right now I am kind of stuck; any help with this will be deeply appreciated. Please let me know if you need more details.
No, you cannot upgrade your sstables from 1.2 to 3.0.2 directly, since the sstable format differs between versions. This link describes the steps for upgrading Cassandra versions. But that does not help you either, since you are also changing the partitioner type.
Changing the partitioner type is not yet supported in Cassandra as of now (Link).
One solution I would prefer is:
Create a standalone utility, built against Cassandra 3.0.2, that reads all the data from your source Cassandra and writes it to sstables with the help of CQLSSTableWriter, using Murmur3Partitioner as the partitioner. (The trick is that you are writing the sstables with version 3.0.2, so they will be easily recognized by your new cluster.) Then use SSTableLoader to load them into your target cluster.
I am not sure why you still require backward compatibility, but when creating the CQLSSTableWriter you can specify the column family schema with the "WITH COMPACT STORAGE" keyword. I have not tried CQLSSTableWriter with "WITH COMPACT STORAGE"; I have tried it without, and that approach should work for your case too.
OK, so if you try to migrate directly from 1.2 to 3.0.2, you're really looking for trouble.
The migration path should be:
latest minor of 1.2
2.0 latest minor
2.1 latest minor
3.0.2
For each jump between versions, read the https://github.com/apache/cassandra/blob/trunk/NEWS.txt file to find out whether you need special actions (upgrade sstables, ...)
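The path above can be written down as an ordered list so that each jump gets its own NEWS.txt check. This is just an illustrative sketch: the release lines come from the answer, while the function name and the idea of scripting the plan are assumptions, not an official procedure.

```python
# Release lines from the answer above, oldest first.
UPGRADE_PATH = ["1.2", "2.0", "2.1", "3.0"]


def upgrade_hops(current, target):
    """Return the (from, to) jumps needed to get from `current` to `target`."""
    start = UPGRADE_PATH.index(current)
    end = UPGRADE_PATH.index(target)
    if start > end:
        raise ValueError("downgrades are not covered by this sketch")
    return list(zip(UPGRADE_PATH[start:end], UPGRADE_PATH[start + 1:end + 1]))


if __name__ == "__main__":
    for source, destination in upgrade_hops("1.2", "3.0"):
        # Each jump starts from the latest minor release of the source line.
        print(f"upgrade {source} -> {destination}, then read NEWS.txt for {destination}")
```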

Cassandra UDT with Version 2

I want to use the Cassandra C++ driver. It still works with version 2 only (binary protocol version 2).
I want to use it to connect to Cassandra 2.1.
Is it possible to create a prepared statement and insert a UDT value into it?
Or can UDTs only be used with protocol version 3?
regards
UDTs are only available in Cassandra 2.1 (and later):
https://issues.apache.org/jira/browse/CASSANDRA-5590

Cassandra bulk insert solution

I have a Java program that runs as a service. This program must insert 50k rows/s (1 row has 25 columns) into a Cassandra cluster.
My cluster contains 3 nodes; each node has 4 CPU cores (Core i5, 2.4 GHz) and 4 GB of RAM.
I used the Hector API with multithreading and bulk inserts, but the performance is lower than expected (about 25k rows/s).
Can anyone suggest another solution? Does Cassandra support an internal bulk insert (without using Thrift)?
Astyanax is a high-level Java client for Apache Cassandra. Apache Cassandra is a highly available column-oriented database.
Astyanax is currently in use at Netflix. Issues are generally fixed as quickly as possible and releases are done frequently.
https://github.com/Netflix/astyanax
I've had good luck creating sstables and loading them directly. There is an sstableloader tool included in the distribution as well as a JMX interface. You can create the sstables using the SSTableSimpleUnsortedWriter class.
Details here.
The fastest way to bulk-insert data into Cassandra is sstableloader, a utility provided with Cassandra from 0.8 onwards. For that you have to create sstables first, which is possible with SSTableSimpleUnsortedWriter; more about this is described here
Another fast way is Cassandra's BulkOutputFormat for Hadoop. With this we can write a Hadoop job to load data into Cassandra. See more on this: bulkload to cassandra with hadoo
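For the multithreaded client-side approach described in the question, the general shape is to group rows into batches and send the batches from a pool of worker threads. The sketch below shows only that shape in Python: `write_batch` is a hypothetical callable standing in for whatever client call performs the write (it is not a Hector or Cassandra API), and the batch size and worker count are placeholder values to tune.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_insert(rows, write_batch, batch_size=100, workers=8):
    """Send batches through `write_batch` from worker threads; return rows written."""

    def send(batch):
        write_batch(batch)  # hypothetical client call, supplied by the caller
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every batch up front, so this suits inputs
        # that fit comfortably in memory.
        return sum(pool.map(send, chunked(rows, batch_size)))
```

An exception raised by `write_batch` surfaces when the results are summed, so a failed batch is not silently dropped.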

Resources