Upgrading Cassandra with DataStax Lifecycle Manager - cassandra

The DataStax OpsCenter Lifecycle Manager only seems to have an option to run an 'install' job. Judging by the language, it appears to be intended only for provisioning new nodes.
Can Lifecycle Manager be used to upgrade existing (managed) clusters to a newer version of DataStax Enterprise?

Edit 2018-05: OpsCenter 6.5.0 has been released and provides assistance with upgrading DSE between patch releases, e.g. going from DSE 5.0.3 to 5.0.6. Docs: https://docs.datastax.com/en/opscenter/6.5/opsc/LCM/opscLCMjobsOverview.html and https://docs.datastax.com/en/opscenter/6.5/opsc/LCM/upgradeDSEjob.html.
DataStax engineer here, I work on Lifecycle Manager. Currently LCM cannot help you upgrade nodes, and while I'm not able to share information about future roadmap and unreleased features, I can say we know that customers want to use LCM for upgrades and we agree that it would be a valuable feature.
As of OpsCenter 6.1.x, you must manually upgrade your nodes and then update your LCM configs to match the new versions. From that point onward you can use LCM for install/config jobs in the upgraded cluster. This isn't a detailed how-to, but broadly:
Review the upgrade guide so you know what needs to be done: https://docs.datastax.com/en/upgrade/doc/upgrade/datastax_enterprise/upgrdDSE.html
Perform the upgrade manually, outside of LCM. Note that if you use apt to manage packages and are not upgrading to the most recent version available, you'll have to use a fairly long apt command to work around a dependency resolution issue in apt when upgrading to an "old" version. The resulting command will look something like:
apt-get install -y -qq -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold dse-pig=5.0.11-1 dse-libhadoop2-client=5.0.11-1 dse-libspark=5.0.11-1 dse-libhadoop-native=5.0.11-1 dse-libmahout=5.0.11-1 dse-hive=5.0.11-1 dse-libpig=5.0.11-1 dse-libsolr=5.0.11-1 dse-libgraph=5.0.11-1 dse-libtomcat=5.0.11-1 dse-libhadoop=5.0.11-1 dse-libhive=5.0.11-1 dse-full=5.0.11-1 dse-libcassandra=5.0.11-1 dse=5.0.11-1 dse-libsqoop=5.0.11-1 dse-libhadoop2-client-native=5.0.11-1 dse-liblog4j=5.0.11-1
Once the manual upgrade is complete, you'll temporarily be in a position where LCM jobs cannot run successfully, since the version of DSE installed does not match the version of DSE that LCM is configured to deploy. At this point LCM jobs will fail with a DSE version mismatch error. To fix this, proceed as follows:
Clone your configuration profile (which is associated with your old DSE version) to a new CP using the new DSE version. If you're doing a patch upgrade, this will be pretty simple. If you're doing a major upgrade via the API, you need to be very careful to remove config parameters that DSE no longer supports.
Edit your cluster model so that the cluster, plus any datacenters or nodes with CPs defined, use your newly cloned CP for your current DataStax version instead of your old CP for your old DataStax version. At this point, you have brought LCM back into sync with your cluster, and you can proceed to run install/configure jobs again.
It's not a simple procedure, but it is possible to upgrade your cluster outside of LCM and then sync LCM up with the new config so you can continue managing it from there. As previously noted, we understand this is not a simple process, and we understand that there's significant value in providing LCM upgrades natively.

Related

How can I upgrade Apache Hive to version 3 on GCP Apache Spark Dataproc Cluster

For one reason or another, I want to upgrade the version of Apache Hive from 2.3.4 to 3 on a Google Cloud Dataproc (1.4.3) Spark cluster. How can I upgrade the version of Hive while also maintaining compatibility with the Cloud Dataproc tooling?
Unfortunately there's no real way to guarantee compatibility with such customizations, and there are known incompatibilities between currently released Spark versions and Hive 3.x, so you'll likely run into problems unless you've managed to cross-compile all the versions you need yourself.
In any case, if you're only trying to get limited subsets of functionality working, the easiest way to go about it is simply dumping your custom jarfiles into:
/usr/lib/hive/lib/
on all your nodes via an init action. You may need to reboot your master node after doing so to update the Hive metastore and HiveServer2, or at least run:
sudo systemctl restart hive-metastore
sudo systemctl restart hive-server2
on your master node.
For Spark issues you may need a custom build of Spark as well, replacing the jarfiles under:
/usr/lib/spark/jars/
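As a rough illustration only (not from the original answer), an init action along these lines could place the custom Hive jars on every node and restart the Hive services on the master; the GCS path gs://my-bucket/hive3-jars/ is a placeholder for wherever you staged your own builds:
#!/bin/bash
# Hypothetical init-action sketch: copy custom Hive jars onto every node.
# gs://my-bucket/hive3-jars/ is a placeholder path, not part of the original answer.
set -euo pipefail
gsutil cp 'gs://my-bucket/hive3-jars/*.jar' /usr/lib/hive/lib/
# Only the master runs the Hive metastore and HiveServer2, so restart them there.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
  systemctl restart hive-metastore
  systemctl restart hive-server2
fi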

Upgrade cassandra 2.1.19 cluster to 3.11.1

I want to upgrade a Cassandra 2.1.19 cluster to 3.11.1 without downtime.
Will 3.11.1 nodes work together with 2.1.19 nodes at the same time?
The key point will be how you connect to your cluster. You will need to try out on test systems whether everything works from your application side when doing the switch.
I recommend a two-step process in this case: migrate from 2.1.19 to 3.0.x, one node at a time.
For every node, do the following (I said you need to test before going to production, right?) - a command-line sketch follows the list:
nodetool drain - wait for finish
stop cassandra
back up your configs; the old ones won't work out of the box
remove the cassandra package / tarball
read about the Java and other Cassandra 3.x requirements and ensure you meet them
add the repo and install the 3.0.x package or tarball
some packages start the node immediately - you may have to stop them again
make up the new config files (diff or something similar will be your friend; read the docs about the new options); you should only have to do this once and can reuse them on all the other nodes
start Cassandra (did I say to test this on a test system?) and wait until the node has joined the ring again - check with nodetool status
upgrade your sstables with nodetool upgradesstables - almost always needed; don't skip this even if "something" works right now
this upgrade tends to be really slow - it's just a single thread rewriting all your data, so I/O will be a factor here
all up and running -> go ahead to the next node and repeat
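A rough command-line sketch of the per-node steps above, assuming a Debian/Ubuntu package install with default paths; the 3.0.x version number and backup path below are placeholders, so adapt them to your environment:
nodetool drain                                    # flush memtables and stop accepting writes
sudo service cassandra stop
sudo cp -a /etc/cassandra /etc/cassandra.bak-2.1  # keep the old configs for the diff
sudo apt-get remove cassandra                     # removes the package only; data stays in place
# add the 3.0.x repo here, then:
sudo apt-get update
sudo apt-get install cassandra=3.0.15             # placeholder version
sudo service cassandra stop                       # some packages start the node immediately
diff /etc/cassandra.bak-2.1/cassandra.yaml /etc/cassandra/cassandra.yaml   # merge settings by hand
sudo service cassandra start
nodetool status                                   # wait for the node to show UN (up/normal) again
nodetool upgradesstables                          # rewrite sstables in the new format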
After that, upgrade 3.0.x to 3.11.x in the same fashion: add the new repo, configure for 3.11.x as for 3.0.x above, and so on. This time you can skip upgrading sstables, as the format stays the same (but it won't harm if you do so).
Did I mention to do this on a testing system first? One thing that will happen and may break things: older native protocol versions will be gone, as well as rpc/thrift.
Hope I didn't miss something ;)

Cassandra upgrade from 2.0.x to 2.1.x or 3.0.x

I've searched for previous versions of this question, but none seem to fit my case. I have an existing Cassandra cluster running 2.0.x. I've been allocated new VMs, so I do NOT want to upgrade my existing Cassandra nodes - rather I want to migrate to a) new VMs and b) a more current version of Cassandra.
I know for in-place upgrades, I would upgrade to the latest 2.0.x, then to the latest 2.1.x. AFAIK, there's no SSTable inconsistency here. If I go this route via addition of new nodes, I assume I would follow the datastax instructions for adding new nodes/decommissioning old nodes?
Given the above, is it possible to move from 2.0.x to 3.0.x? I know the SSTable format is different; however, if I'm adding new nodes (rather than re-using SSTables on disk), does this matter?
It seems to me that #2 has to work - otherwise, it implies that any upgrade requiring SSTable upgrades would require all nodes to be taken offline simultaneously; otherwise, there would be mixed 2.x.x and 3.0.x versions running in the same cluster at some point.
Am I completely wrong? Does anyone have any experience doing this?
Yes, it is possible to migrate data to a different environment (the new VMs with the updated Cassandra) using sstableloader, but you will need C* 3.0.5 or above, as that version added support for uploading sstables from previous versions.
Once the process is completed, it is recommended to execute nodetool upgradesstables to ensure that there are no incompatibilities in the data, and then a nodetool cleanup.
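As an illustration only (the host name, keyspace, table and path below are placeholders), the streaming step and the follow-up could look roughly like this:
sstableloader -d new-node-1.example.com /var/lib/cassandra/data/my_keyspace/my_table/
# then, on the nodes of the new cluster:
nodetool upgradesstables
nodetool cleanup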
Regarding your comment ...it implies that any upgrade requiring SSTable upgrades would require all nodes to be taken offline simultaneously...: that is not true; doing the upgrade one node at a time will create a mixed cluster with nodes on the two versions, as you mentioned, which is not optimal, but it will allow you to avoid any downtime in production. (Note that the impact of this operation will depend on the consistency level used in your application.)
Don't worry about the migration; you can simply migrate your Cassandra 2.0.x cluster to Cassandra 3.0.x. But it's better if you migrate your cluster from Cassandra 2.0.x to the latest Cassandra 2.x.x first, and then to Cassandra 3.0.x. You need to follow some steps:
Backup data
Uninstall the present version
Install the version you want to upgrade to
Restore data
As you are doing a migration, you always need to be careful with your data. For the data backup and restore you can follow two approaches:
Creating snapshots of your sstables and then, after installing the new version of Cassandra, placing the files in the data location and running sstableloader.
Backing up your schema to a .cql file and copying all the tables to .csv files; then, after installing the new version of Cassandra, sourcing your schema from the .cql file and copying all the tables back in from each .csv file.
If you are fully confident in how you will complete the migration, you can write a bash script to complete the backup and restore steps (a sketch of the second approach follows below).
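A minimal sketch of the second approach, assuming cqlsh is available on both clusters; my_keyspace and my_table are placeholder names:
# On the old cluster:
cqlsh -e "DESCRIBE KEYSPACE my_keyspace" > schema.cql
cqlsh -e "COPY my_keyspace.my_table TO 'my_table.csv' WITH HEADER = TRUE"
# On the new cluster, after installing the target Cassandra version:
cqlsh -f schema.cql
cqlsh -e "COPY my_keyspace.my_table FROM 'my_table.csv' WITH HEADER = TRUE"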

DataStax OpsCenter v5.2.4 Create Cluster Error

Using DataStax OpsCenter v5.2.4 (currently the latest), installed using AMI ami-8f3e2bbf provided by DataStax, and following DataStax's instructions on how to create a cluster on EC2, all DSE nodes fail during creation with this error:
Install Errored: Could not find a matching version for package dse-libpig
Is there a work around for this?
Note that during the process I selected Package: DataStax Enterprise 4.8.1, which is the latest available in the list at this time.
I faced the same issue and, taking a clue from BrianC's comment, resolved it by removing a trailing '#' from my DataStax account password.

Migrating Cassandra from 1.1.2 to 1.2.6

My current Cassandra version is 1.1.2, implemented as a single-node cluster. I would like to upgrade it to 1.2.6 with multiple nodes in the ring. Is it proper to migrate directly to 1.2.6, or should I follow a version-by-version migration?
I found the upgrade steps at this link:
http://fossies.org/linux/misc/apache-cassandra-1.2.6-bin.tar.gz:a/apache-cassandra-1.2.6/NEWS.txt.
There are 9 other releases between these two versions.
I migrated a two-node cluster from 1.1.6 to 1.2.6 without problems and without going version by version. Anyway, you should take a closer look at:
http://www.datastax.com/documentation/cassandra/1.2/index.html?pagename=docs&version=1.2&file=index#upgrade/upgradeC_c.html#concept_ds_smb_nyr_ck
Because there are a lot of new features in version 1.2, like the partitioners, you may need to change some configuration for your cluster.
You may directly hop on to C1.2.6.
We migrated our 4-node cluster from C* 1.0.9 to C* 1.2.8 recently without any issues. This was a rolling upgrade, i.e. upgrade one node at a time and, after each node is upgraded, allow the cluster to stabilize (how long depends on the traffic during the upgrade).
These are the steps that we followed (a command-line sketch follows the list):
Perform the below on each node:
Run nodetool disablegossip and nodetool disablethrift, so that this node is seen as DOWN by the other nodes.
flush/drain the memtables, run compaction to merge SSTables
take snapshot and enable incremental backups
This stops all the other nodes/clients from writing to this node, and since the memtables are flushed to disk, startup is fast as it need not walk through the commit logs.
stop Cassandra (though this node is down, the cluster is available for writes/reads, so zero downtime)
upgrade the sstables to the new storage format using sstableupgrade
install/untar Cassandra 1.2.8 in the new location
move the upgraded sstables to the appropriate location
merge the cassandra.yaml from the previous version and the current version via a manual diff (you need to work out the differences in detail)
start Cassandra
watch the startup messages to ensure the node comes up without difficulty and is shown in the ring, with mixed 1.0.x/1.2.x versions
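A rough command-line sketch of the per-node steps above, assuming a tarball install; the install paths and version directories below are placeholders:
nodetool disablegossip
nodetool disablethrift
nodetool flush
nodetool drain
nodetool snapshot -t pre-1.2-upgrade
# enable incremental backups in cassandra.yaml (incremental_backups: true)
# stop the old Cassandra process, then unpack the new version:
tar xzf apache-cassandra-1.2.8-bin.tar.gz -C /opt
diff /opt/apache-cassandra-1.0.9/conf/cassandra.yaml /opt/apache-cassandra-1.2.8/conf/cassandra.yaml
# merge the yaml by hand, point the data/commitlog directories at the existing locations, then:
/opt/apache-cassandra-1.2.8/bin/cassandra
nodetool upgradesstables     # or run the offline sstableupgrade tool before starting, as in the list above
nodetool status              # the node should show up in the mixed 1.0.x/1.2.x ring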
