Dynamically adding new nodes in Cassandra - cassandra

Is it possible to add new hosts to a Cassandra cluster dynamically?
What I'm trying to do is set up a program that can:
Set up a local version of the database for each user
Each user's machine will become part of the cluster (the machines will be hosts)
Data will be replicated across all the clusters
Building a cluster of multiple hosts usually entails configuring the cassandra.yaml to store the seeds, listen_address and rpc_address of each host.
My idea is to edit these files through java and insert the new host addresses as required but making sure that data is accurate across each users's cassandra.yaml files would be challenging.
I'm wondering if someone has done something similar or has any advice on a better way to achieve this.

Yes is possible. Look at Netflix's Priam for an complete example of a dynamic cassandra cluster management (but designed to work with Amazon EC2).
For rpc_address and listen_address, you can setup a startup script that configures the cassandra.yaml if it's not ok.
For seeds you can configure a custom seed provider. Look at the seed provider used for Netflix's Priam for some ideas how to implement it
The most difficult part will be managing the tokens assigned to each node in a efficient way. Cassandra 1.2 is around the corner and will include a feature called virtual nodes that, IMO, will work well in your case. See the Acunu presentation about it

Related

Cassndra in Production

Anybody supporting a Cassandra application in production? Curious to know about, how you handle cassandra.yaml file. Also, do you think "seed node" get's a status of master node (partially).
Anybody supporting a Cassandra application in production?
Yes, my team supports several applications which use Cassandra in production.
Curious to know about, how you handle cassandra.yaml file.
By "handle" the cassandra.yaml file, I assume you mean deploy with different values with automation at large scale. We use an open source tool called Rundeck for that.
Rundeck allows you to build options into your jobs, which is useful for properties like cluster_name, seeds, etc. Then, you inject those options into your deploy scripts, using a regex replace (sed) to get them into specific properties in the yaml. Ex:
sed -i "s/cluster_name: 'Test Cluster'/cluster_name: '#cluster_name#'/" cassandra.yaml
Also, do you think "seed node" get's a status of master node (partially).
No, a seed node is not any kind of "master" node.
A seed node is no different from any other node.
In theory, every node in your cluster could be a seed node for another node. All it is, is a way for a new node to discover the network topology of the cluster. Think of it as an entry point to the cluster.

Hazelcast Mancenter enable/disable sanpshot

I am trying to configure wan replication using hazelcast mancenter, but I am not getting the option to select snapshot enable/disable feature here as the option is not listed in the dropdown.Is there a way to achieve this through mancenter?
Version 3.9.4
hazelcast version 3.9.3
thanks
You can add a WAN replication configuration dynamically to a cluster. It is for having one-off WAN sync operations, not continuous replication. The added configuration has two caveats:
It is not persistent, so it will not survive a member restart.
It cannot be used as a target for regular WAN replication. It can only be used for WAN sync.
That's why snapshot setting is not there as well. For adding a persistent WAN config, it must be defined in member configurations.

Thingsboard cluster setup

Building a Thingsboard cluster
I need help setting up a Thingsboard cluster, the documentation online is very limited.
The cluster will contain 2 Zookeeper nodes and 4 Thingsboard nodes with Cassandra DB.
Should Zookeeper be installed separately?
A step-by-step guide would be much appreciated!
I cannot provide you detailed step-by-step instructions to setup a ThingsBoard cluster. I can point you into the right direction by sharing the different documents you need to do so.
Bottom line, the following tasks must be completed:
Install and configure a ZooKeeper ensemble.
Check the ZooKeeper documentation for further installation details. Keep in mind that you need at least three different ZK-nodes in a clustered environment and that you always need an odd number of ZK nodes (3,5,7,...). It is a very very very bad idea to build a cluster consisting out of two ZK-nodes, check split brain condition that might appear under these circumstances! Basically you setup the number of individual nodes you wish to use and change the configuration file to enable the different nodes as an ensemble. This is documented quite well in the ZK-docs.
Install and configure a Cassandra cluster.
Again you will setup the number of individual nodes you need for your Cassandra cluster and modify the individual configuration files to convert them into a Cassandra cluster. Check Cassandra documentation for details. Be sure to check proper configuration using the nodetool status command as described at the end of the document. All your nodes should be up and running.
Install and configure a ThingsBoard cluster.
Use the instructions provided with ThingsBoard single node setup.
Install Java
Skip External database installation
ThingsBoard service installation
Configure ThingsBoard to use the external database - Cassandra
Go to Cluster setup and apply the configuration steps depicted (ZK, Cassandra and RPC). Keep in mind to point to ALL members of your ZK, Cassandra cluster. You can also use IP-addresses instead of host names.
Return to single node setup and run the installation script at ONE NODE only!
Start ThingsBoard service
If everything went well, you should be able to access your ThingsBoard nodes directly using the URL http://[NODE_IP]:8080. You can verify proper cluster operation by creating a tenant on one node and check its presence on another node.
I don't know if using an even number of ThingsBoard nodes is a good idea. The documentation does not mention anything about this.
One final remark, you could/should consider putting a proxy in front of your ThingsBoard cluster to provide load balancing to your web clients and improve user experience. This way you shouldn't share the individual host addresses with your users and you will prevent node overloading due to the fact that everybody is using the same web-address to access your dashboard(s). You could also proxy your MQTT broker to provide load balancing as well.
Good luck in setting up your cluster!
Zookeeper needs at least 3 nodes to run in a cluster mode. Each node voting and the valid replica count to gain the QUORUM is 3.

Preventing Cassandra Node from Being Overwhelemed

When in Java, I create a Cassandra cluster builder, I provide a list of multiple Cassandra nodes as shown below:
Cluster cluster = Cluster.builder().addContactPoint(host1, host2, host3, host4).build();
But from what I understand, the connector connects only to the first host in the list that is available, and that host becomes my connection point to the Cassandra cluster.
Now, my question is if my Java application reads/writes huge amount of data from/to Cassandra, then doesn't my Java application overwhelm the node that it is connected to?
Is there a way to configure my connection such that it uses multiple nodes of Cassandra for its reads/writes? What is the common practice?
It uses the contact point to find the rest of the nodes in the cluster, then creates a pool of connections to all the hosts and balances the requests among them. It doesn't only connect to the hosts you provide unless you use the whitelist load balancing policy or a custom one.
If your worried about overwhelming nodes use the RoundRobinLoadBalancingPolicy (DC aware if multiple DCs) and it will distribute it amongst all of them evenly. If you have hot spots of data and use the TokenAware policy you may have it uneven, but you shouldn't need to worry about it.

Set cluster name when using Cassandra CQL/JDBC driver

I'm using the Cassandra CQL/JDBC driver I got from google code but it doesn't seem to let me provide a cluster name - is there a way?
I'm using cluster names to ensure I don't run commands against a live system, it has a different cluster name to my dev systems.
Edit: Just to clarify, I have two totally separate Cassandra clusters, one live and one for test. They have different cluster names to ensure that I don't accidentally run test code meant for the test cluster on the live cluster. Therefore any client I need to use must let me set a cluster name. Hector does this.
There is no inbuilt protection for checking cluster names for Cassandra clients. It is built to ensure nodes from different clusters don't try and join together but not to ensure clients connect to the right cluster. It would be possible to add this checking to a client though (since the cluster name is exposed to the client) but I'm not aware of any clients doing this.
I'd strongly recommend firewalling off your different environments to avoid this kind of mistake. If that isn't possible, you should choose different ports to avoid confusion. Change this with the 'rpc_port' setting in cassandra.yaml.
You'd have to mirror the data on two different clusters. You cant access the same cluster with different names.
To rename your cluster (from the default 'Test Cluster') you edit the cassandra configuration file found in location/of/cassandra/conf/cassandra.yaml. Its the top line, if you need more details look at the datastax configuration documentation and explanation.

Resources