ThingsBoard cluster setup - Cassandra

Building a ThingsBoard cluster
I need help setting up a ThingsBoard cluster; the documentation online is very limited.
The cluster will contain 2 ZooKeeper nodes and 4 ThingsBoard nodes with a Cassandra DB.
Should ZooKeeper be installed separately?
A step-by-step guide would be much appreciated!

I cannot provide you with detailed step-by-step instructions to set up a ThingsBoard cluster, but I can point you in the right direction by sharing the documents you need to do so.
Bottom line, the following tasks must be completed:
Install and configure a ZooKeeper ensemble.
Check the ZooKeeper documentation for further installation details. Keep in mind that you need at least three ZK nodes in a clustered environment and that you should always use an odd number of ZK nodes (3, 5, 7, ...), because an even number adds no extra fault tolerance. It is a very, very, very bad idea to build an ensemble of only two ZK nodes! The ensemble needs a majority (2 of 2) to operate, so losing either node takes the whole ensemble down; read up on quorums and the split-brain problem to understand why a majority is required. Basically, you set up the number of individual nodes you wish to use and change the configuration file so the nodes form an ensemble. This is documented quite well in the ZK docs.
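A minimal sketch of the ensemble configuration described above: every member gets the same zoo.cfg listing all servers, plus a myid file containing its own server number. The host names, data directory and timing values are illustrative assumptions (the keys and the 2888/3888 peer ports follow common ZooKeeper defaults); the odd-size check only enforces the advice above, since ZooKeeper itself accepts even-sized ensembles.

```python
def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Return the zoo.cfg text shared by every member of the ensemble."""
    if len(hosts) < 3 or len(hosts) % 2 == 0:
        raise ValueError("use an odd number of ZooKeeper nodes, at least 3")
    lines = [
        "tickTime=2000",   # basic time unit in milliseconds
        "initLimit=10",    # ticks a follower may take to connect and sync to the leader
        "syncLimit=5",     # ticks a follower may lag behind the leader
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # server.N=host:peerPort:electionPort; N must match the myid file on that host
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_for(hosts, host):
    """Content of <dataDir>/myid on the given host."""
    return str(hosts.index(host) + 1)


if __name__ == "__main__":
    # Hypothetical host names
    ensemble = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    print(render_zoo_cfg(ensemble))
    print("myid on zk2:", myid_for(ensemble, "zk2.example.com"))
```

Copy the same zoo.cfg to every node; only the myid file differs per node.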
Install and configure a Cassandra cluster.
Again, you set up the number of individual nodes you need for your Cassandra cluster and modify each node's configuration file to join them into a Cassandra cluster. Check the Cassandra documentation for details. Be sure to verify the configuration with the nodetool status command, as described at the end of that document: all your nodes should be listed as UN (Up/Normal).
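The nodetool status check can be scripted. A minimal sketch, assuming you capture the command's text output yourself: each node line starts with a two-letter code, U/D (up/down) followed by N/L/J/M (normal, leaving, joining, moving), and anything other than UN deserves attention. The column layout after the address varies between Cassandra versions, so only the first two columns are parsed.

```python
import re

# Node lines start with a status code such as "UN" or "DN", then the address
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(nodetool_output):
    """Map each node address to its status code, e.g. {'10.0.0.1': 'UN'}."""
    states = {}
    for line in nodetool_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            code, address = match.groups()
            states[address] = code
    return states


def unhealthy_nodes(nodetool_output):
    """Sorted addresses of nodes that are not Up/Normal."""
    return sorted(addr for addr, code in node_states(nodetool_output).items() if code != "UN")
```

Header lines such as "Datacenter: ..." or "--  Address ..." do not match the pattern and are skipped.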
Install and configure a ThingsBoard cluster.
Use the instructions provided in the ThingsBoard single-node setup guide:
Install Java
Skip External database installation
ThingsBoard service installation
Configure ThingsBoard to use the external database - Cassandra
Go to Cluster setup and apply the configuration steps described there (ZK, Cassandra and RPC). Make sure to point to ALL members of your ZK and Cassandra clusters. You can also use IP addresses instead of host names.
Return to the single-node setup and run the installation script on ONE node only!
Start ThingsBoard service
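For the cluster-setup step, "point to ALL members" usually means a comma-separated host:port list. A minimal sketch that builds those values; the environment variable names (ZOOKEEPER_ENABLED, ZOOKEEPER_URL, CASSANDRA_URL, RPC_HOST) are an assumption based on the placeholders in some ThingsBoard releases' thingsboard.yml, so check them against the file shipped with your version. The ports are the ZooKeeper (2181) and Cassandra native protocol (9042) defaults.

```python
def join_endpoints(hosts, port):
    """Comma-separated host:port list covering every cluster member."""
    return ",".join(f"{host}:{port}" for host in hosts)


def thingsboard_env(zk_hosts, cassandra_hosts, rpc_host):
    # ASSUMPTION: variable names as used in some thingsboard.yml versions; verify yours
    return {
        "ZOOKEEPER_ENABLED": "true",
        "ZOOKEEPER_URL": join_endpoints(zk_hosts, 2181),
        "CASSANDRA_URL": join_endpoints(cassandra_hosts, 9042),
        "RPC_HOST": rpc_host,  # this node's own address, reachable by the other ThingsBoard nodes
    }


if __name__ == "__main__":
    # Hypothetical hosts: 3 ZK nodes, 4 Cassandra nodes, run on the ThingsBoard node 10.0.0.11
    env = thingsboard_env(["zk1", "zk2", "zk3"], ["cas1", "cas2", "cas3", "cas4"], "10.0.0.11")
    for name, value in env.items():
        print(f"export {name}={value}")
```

Every ThingsBoard node gets the same ZooKeeper and Cassandra lists; only its own RPC address differs.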
If everything went well, you should be able to access your ThingsBoard nodes directly using the URL http://[NODE_IP]:8080. You can verify proper cluster operation by creating a tenant on one node and checking its presence on another node.
I don't know whether using an even number of ThingsBoard nodes is a good idea; the documentation does not mention anything about this.
One final remark: you could (and probably should) put a proxy in front of your ThingsBoard cluster to load-balance your web clients and improve the user experience. This way you don't have to share the individual host addresses with your users, and you avoid overloading a single node because everybody uses the same web address to reach your dashboard(s). You could also proxy your MQTT broker to load-balance it as well.
Good luck in setting up your cluster!

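ZooKeeper needs at least 3 nodes to run in cluster mode with any fault tolerance. Every node votes, and a quorum is a strict majority of the ensemble: with 3 nodes the quorum is 2, so the ensemble keeps working after losing one node.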
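The quorum arithmetic behind this answer (and behind the odd-number advice in the previous answer) fits in a few lines: a quorum is the smallest strict majority, and the ensemble survives as many failures as it has nodes beyond that majority.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Nodes that can fail while a quorum is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} nodes: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Two nodes tolerate no failure (no better than one), and four tolerate only one (no better than three), which is why odd sizes are recommended.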

Related

Does "spring data cassandra" have client-side load balancing?

I'm running a project using Spring Boot and spring-data-cassandra.
When I set up that project, I configured the Cassandra properties with an IP and port.
(following https://www.baeldung.com/spring-data-cassandra-tutorial)
Set up like this, if I had 3 Cassandra nodes and 1 of them died, I would expect the project to fail to connect to Cassandra with a 33% probability.
But my project was fine even though 1 Cassandra node was dead (it just logged some errors while the node was going down).
Does spring-data-cassandra happen to have a feature like client-side load balancing?
If so, where can I see that code?
I tried to find that code but failed.
Please give me a little clue.
Spring Data Cassandra relies on the DataStax Java driver, which is responsible for making everything work. This includes:
establishing the initial connection to the cluster. This is where the contact points play their role: after the driver connects to any of the contact points, it reads information about the whole cluster and (by default) establishes connections to all nodes
establishing the control connection that is used to receive notifications about changes in the cluster: nodes going up and down, schema changes, etc. When a node goes down or comes back up, this information is used to update the list of active nodes
load balancing requests based on replication and node availability: if a node is down, it's excluded from the list of candidates, so queries are not sent to a node that is known to be down
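To see why a dead node does not cause a 33% failure rate, here is a toy model of the last point. It is not the driver's code: the class and method names are hypothetical, and the real driver's default policy is also token-aware and datacenter-aware. It only shows the core idea of rotating over known nodes while skipping those marked down.

```python
import itertools


class RoundRobinBalancer:
    """Toy client-side balancer: rotate over known nodes, skip nodes marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.up = set(self.nodes)
        self._counter = itertools.count()

    def mark_down(self, node):
        # In the real driver, this kind of update comes from control-connection events
        self.up.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.up.add(node)

    def query_plan(self):
        """Live nodes, in the order they would be tried for the next request."""
        start = next(self._counter) % len(self.nodes)
        rotated = self.nodes[start:] + self.nodes[:start]
        return [node for node in rotated if node in self.up]


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    balancer.mark_down("10.0.0.2")
    for _ in range(3):
        print(balancer.query_plan())  # 10.0.0.2 never appears while it is down
```

Because a request only fails over to the next node in its plan, a known-down node costs nothing; errors appear only in the short window before the node is detected as down.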

Cassandra in Production

Is anybody supporting a Cassandra application in production? I'm curious how you handle the cassandra.yaml file. Also, do you think a "seed node" gets (partially) the status of a master node?
Anybody supporting a Cassandra application in production?
Yes, my team supports several applications which use Cassandra in production.
Curious to know how you handle the cassandra.yaml file.
By "handle" the cassandra.yaml file, I assume you mean deploying it with different values, automatically and at large scale. We use an open source tool called Rundeck for that.
Rundeck allows you to build options into your jobs, which is useful for properties like cluster_name, seeds, etc. You then inject those options into your deploy scripts and use a regex replace (sed) to write them into specific properties in the YAML. Example:
sed -i "s/cluster_name: 'Test Cluster'/cluster_name: '#cluster_name#'/" cassandra.yaml
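The same substitution can be done from a deploy script in Python. A minimal sketch, not part of Rundeck: unlike the sed line, it matches the key whatever its current value is, and it fails loudly if the key is missing instead of silently changing nothing. It only handles single-line top-level keys; nested settings such as the seeds list need more care or a real YAML library.

```python
import re


def set_yaml_scalar(text, key, value):
    """Set a top-level `key: value` line in cassandra.yaml-style text."""
    # [ \t]* (not \s*) so an empty value never lets the match spill onto the next line
    pattern = re.compile(rf"^({re.escape(key)}:[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids surprises if `value` contains backslashes
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}'{value}'", text)
    if count != 1:
        raise KeyError(f"expected exactly one '{key}:' line, found {count}")
    return new_text


if __name__ == "__main__":
    config = "cluster_name: 'Test Cluster'\nnum_tokens: 256\n"
    print(set_yaml_scalar(config, "cluster_name", "Prod Cluster"))
```

The value is wrapped in single quotes like the original line, so a value that itself contains a single quote would need YAML escaping.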
Also, do you think a "seed node" gets (partially) the status of a master node?
No, a seed node is not any kind of "master" node.
A seed node is no different from any other node.
In theory, every node in your cluster could be a seed node for another node. All a seed does is give a new node a way to discover the network topology of the cluster. Think of it as an entry point to the cluster.
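A deliberately simplified model of that entry-point role (real Cassandra uses gossip, and the function and parameter names here are hypothetical): a joining node tries its configured seeds in order, and the first reachable one hands over the membership it knows. Since every seed knows the same cluster, it does not matter which one answers, which is exactly why a seed is not a master.

```python
def bootstrap(new_node, seeds, reachable, membership_of):
    """Ask seeds in order; the first reachable one shares the cluster membership it knows."""
    for seed in seeds:
        if seed in reachable:
            return sorted(set(membership_of(seed)) | {new_node})
    raise RuntimeError("no seed reachable; cannot join the cluster")


if __name__ == "__main__":
    cluster = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
    # Seed 10.0.0.1 is down, so the new node learns the topology from 10.0.0.2 instead
    print(bootstrap("10.0.0.4", ["10.0.0.1", "10.0.0.2"], {"10.0.0.2", "10.0.0.3"},
                    lambda seed: cluster))
```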

Is there a way to remove a rogue node from our hazelcast cluster?

We are currently running a Hazelcast cluster and using it to put information on a queue to be picked up by a single node in the cluster. However, we are vulnerable to a "rogue" node that joins the cluster without the right version of the software to handle the request properly.
Is there a way to proactively remove rogue nodes of this nature, in a way that prevents them from re-joining the cluster? I haven't been able to find a way in the documentation.
It looks like you are using the default hazelcast.xml. You should use a custom hazelcast.xml with updated group credentials.
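A minimal sketch of generating such a configuration, assuming Hazelcast 3.x, where the group is configured with a group element (Hazelcast 4.x and later replaced it with cluster-name). The idea is to give the cluster running the correct software version its own group name; the name orders-v2 and the password are hypothetical. Nodes still carrying the old default configuration then form their own cluster instead of joining yours. Depending on the 3.x release, the password may not be checked unless security features are enabled, so the distinct name is what you should rely on.

```python
import xml.etree.ElementTree as ET


def hazelcast_group_xml(group_name, group_password):
    """Build a minimal Hazelcast 3.x config with custom group credentials."""
    root = ET.Element("hazelcast", xmlns="http://www.hazelcast.com/schema/config")
    group = ET.SubElement(root, "group")
    ET.SubElement(group, "name").text = group_name          # nodes only join a matching group
    ET.SubElement(group, "password").text = group_password
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: bump the name whenever an incompatible software version ships
    print(hazelcast_group_xml("orders-v2", "change-me"))
```

Merge the group element into your full hazelcast.xml rather than replacing your network settings with this fragment.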

Cassandra 2+ HPC Deployment

I am trying to deploy Cassandra on a Linux-based HPC cluster and I need some guidelines if possible. Specifically, what is the difference between running Cassandra locally and in a cluster?
When running locally (in which case it runs smoothly), we duplicate the original files for every node inside our Cassandra directory and apply the appropriate changes for IP address, RPC, JMX, etc. However, when running across a network, which files do we need to install on each node? The whole package with all the files, or just some of the required ones, like bin/cassandra.in.sh, conf/cassandra.yaml and bin/cassandra?
I am a little bit confused about what to store on each node in order to start working on the cluster.
You need to install Cassandra on each node (VM), i.e. the whole package, and then update the config files as necessary. As described here, to configure a cluster in a single data center you need to:
Install Cassandra on each node
Configure cluster name
Configure seeds
Configure snitch, if needed
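The steps above boil down to a handful of cassandra.yaml settings that differ per node. A minimal sketch that renders them; the keys exist in the stock cassandra.yaml, while the cluster name and addresses are hypothetical. Merge the output into the installed file rather than replacing it, and check the seeds format for your Cassandra version. SimpleSnitch is the stock default and is fine for a single data center.

```python
def render_overrides(cluster_name, seeds, node_ip, snitch="SimpleSnitch"):
    """Per-node cassandra.yaml settings for a single-data-center cluster."""
    seed_list = ",".join(seeds)
    return "\n".join([
        f"cluster_name: '{cluster_name}'",   # must be identical on every node
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{seed_list}"',  # same seed list on every node
        f"listen_address: {node_ip}",       # node-to-node traffic
        f"rpc_address: {node_ip}",          # client traffic
        f"endpoint_snitch: {snitch}",
    ]) + "\n"


if __name__ == "__main__":
    nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    for ip in nodes:
        print(f"# --- {ip} ---")
        print(render_overrides("HPC Cluster", nodes[:2], ip))
```

Only listen_address and rpc_address change from node to node; the cluster name, seeds and snitch stay the same everywhere.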

Set cluster name when using Cassandra CQL/JDBC driver

I'm using the Cassandra CQL/JDBC driver I got from Google Code, but it doesn't seem to let me provide a cluster name. Is there a way?
I'm using cluster names to ensure I don't run commands against a live system, it has a different cluster name to my dev systems.
Edit: Just to clarify, I have two totally separate Cassandra clusters, one live and one for test. They have different cluster names to ensure that I don't accidentally run test code meant for the test cluster on the live cluster. Therefore any client I need to use must let me set a cluster name. Hector does this.
There is no inbuilt protection for checking cluster names for Cassandra clients. It is built to ensure nodes from different clusters don't try and join together but not to ensure clients connect to the right cluster. It would be possible to add this checking to a client though (since the cluster name is exposed to the client) but I'm not aware of any clients doing this.
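A minimal sketch of the client-side check described above. Cassandra exposes its name in the cluster_name column of the system.local table, so a client can query it right after connecting and refuse to continue on a mismatch. The fetch_cluster_name callable is a hypothetical hook for whatever driver you use.

```python
class WrongClusterError(RuntimeError):
    """Raised when the client is connected to an unexpected cluster."""


def assert_cluster(fetch_cluster_name, expected):
    """Abort unless the connected cluster reports the expected name.

    fetch_cluster_name: hypothetical callable that runs
    "SELECT cluster_name FROM system.local" with your driver and returns the value.
    """
    actual = fetch_cluster_name()
    if actual != expected:
        raise WrongClusterError(f"connected to {actual!r}, expected {expected!r}")
    return actual
```

Call it once after connecting, before running any test code, e.g. `assert_cluster(my_fetch, "Test Cluster")`.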
I'd strongly recommend firewalling off your different environments to avoid this kind of mistake. If that isn't possible, you should choose different ports to avoid confusion. Change this with the 'rpc_port' setting in cassandra.yaml.
You'd have to mirror the data on two different clusters; you can't access the same cluster under different names.
To rename your cluster (from the default 'Test Cluster'), edit the Cassandra configuration file found at location/of/cassandra/conf/cassandra.yaml. It's the top line; for more details, look at the DataStax configuration documentation and explanation.
