Mixing Datastax Enterprise with Cassandra community - cassandra

I'm experimenting with Datastax Enterprise and I'm trying to have a cluster that mixes Enterprise nodes and standard Cassandra community nodes. I would only need a few nodes with advanced features like Solr and it would be nice to have all the nodes in the same cluster.
I tried to bootstrap a community node to a test Enterprise cluster, and it couldn't join the ring properly, throwing exceptions like that:
Unable to find compaction strategy class
'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'
I assume that the Enterprise node tries to replicate CFs that have features from DSE, which are not recognized by the community node.
Is there a way to prevent that from happening? Am I trying to do something that's not possible/supported/allowed by DSE?

That is an unsupported configuration. The full cluster needs to be installed with DataStax enterprise binaries on all nodes. You can choose which nodes run as vanilla Cassandra, Hadoop or Solr by startup options on each node. DSE has a custom compaction strategy and snitch so that error is expected.

Related

Is it Opscenter configurable with Scylla?

For Scylla monitoring, we need to configure Grafana but is it possible to integrate Cassandra Opscenter to Scylla?
TL;DR: No.
OpsCenter is a closed source product, which was not tested with Scylla. Part of it that uses Apache Cassandra CQL and JMX will probably work, while others might not.
In addition to the open source, Scylla monitoring stack (base on Prometheus and Grafana), ScyllaDB has its own close version product for cluster management named Scylla Manager.
Tzach (Scylla Product Manager)

How to setup Spark with a multi node Cassandra cluster?

First of all, I am not using the DSE Cassandra. I am building this on my own and using Microsoft Azure to host the servers.
I have a 2-node Cassandra cluster, I've managed to set up Spark on a single node but I couldn't find any online resources about setting it up on a multi-node cluster.
This is not a duplicate of how to setup spark Cassandra multi node cluster?
To set it up on a single node, I've followed this tutorial "Setup Spark with Cassandra Connector".
You have two high level tasks here:
setup Spark (single node or cluster);
setup Cassandra (single node or cluster);
This tasks are different and not related (if we are not talking about data locality).
How to setup Spark in Cluster you can find here Architecture overview.
Generally there are two types (standalone, where you setup Spark on hosts directly, or using tasks schedulers (Yarn, Mesos)), you should draw upon your requirements.
As you built all by yourself, I suppose you will use Standalone installation. The difference between one node is network communication. By default Spark runs on localhost, more commonly it uses FQDNS name, so you should configure it in /etc/hosts and hostname -f or try IPs.
Take a look at this page, which contains all necessary ports for nodes communication. All ports should be open and available between nodes.
Be attentive that by default Spark uses TorrentBroadcastFactory with random ports.
For Cassandra see this docs: 1, 2, tutorials 3, etc.
You will need 4 likely. You also could use Cassandra inside Mesos using docker containers.
p.s. If data locality it is your case you should come up with something yours, because nor Mesos, nor Yarn don't handle running spark jobs for partitioned data closer to Cassandra partitions.

Does DataStax OpsCenter support Cassandra 2.2.5?

I've having some issues with my Cassandra cluster and I would like to install OpsCenter Community in order to debug what's going on.
I've found this and this pages talking about the compatibility between DataStax OpsCenter and Cassandra, but this don't list Cassandra 2.2.5 (actually I'm using DataStax Cassandra - dsc 22).
My question is: can I use DataStax OpsCenter (free / community version) within Cassandra 2.2.5? If not, there's an alternative?
No, the docs you cited indicate that OpsCenter doesn't support any cassandra greater than 2.1.x and the next version of opscenter (6.x) will only support Datastax Enterprise. I don't know of another visual front-end to cassandra at this time.

Migrate Datastax Enterprise Cassandra to Apache Cassandra or Datastax Community?

I have a large, but simple Cassandra database on a Datastax 4.6 cluster. The license renewal is prohibitive for this very simple use case and I am trying to migrate to either a straight Apache or Datastax Comunity version. First is it possible to do an inline update?
I have altered all the keyspaces to remove the "EverywhereStrategy" replication strategy but I still get an error that the DSC version of cassandra I'm trying to get to join the cluster doesn't support it. I'm using Like Cassandra versions (2.0.16) and most other things seem to be close.
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
If it's not possible to do an inline upgrade what would be the best strategy to migrate a decent size (30 node, 150Tb) cluster?
So to make this work you have to extract any of the DSE features that you may have on any of your tables.
This meant I had to change the replication strategy on the dse_system table from EverywhereStrategy to SimpleStrategy with RF=3 (or almost anything after conversion you can drop this keyspace) The error message was:
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
I Also had to drop the unused CFS keyspaces. We never used the hadoop/CFS integration so we had nothing in these keyspaces anyway. I didn't capture the error for this.
We did have a solr index on a table we were testing on this cluster about a year ago so I had to drop this columnfamily. The error message was:
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex
There may be other incompatibilities if you use other features of Datastax Enterprise that you would have to remove, but this was enough for me to get the migration working.
dse-core.jar contains the EverywhereStrategy class.
We solved this problem by doing the following:
Replace everything except the above JAR so nodes can come up fine. Once all nodes are migrated to OSS, drop the dse_system keyspace (that uses this replication), delete the JAR and restart the nodes one by one.

Performing Analytics over Cassandra DB

I am working for a small concern and very new to apache cassandra. Studying about cassandra and performing some small analytics like sum function on cassandra DB for creating reports. For the same, Hive and Accunu can be choices.
Datastax Enterprise provides the solution for Apache Cassandra and Hive Integration. Is Datastax Enterprise is the only solution for such integration. Is there any way to resolve the hive and cassandra integration. If so, Can I get the links or documents regarding the same. Is that possible to work the same with the windows platform.
Is any other solution to perform analytics on cassandra DB?
Thanks in advance .
I was trying to download DataStax Enterprise (DSE) for Windows but found there is no such option on their website. I suppose they do not support DSE for Windows.
Apache Cassandra does have builtin Hadoop support. You need to set up a standalone Hadoop cluster colocated with Apache Cassandra nodes and then use ColumnFamilyInputFormat and ColumnFamilyOutputFormat to read/write data from/to your Hadoop cluster.

Resources