Cassandra in Production - cassandra

Anybody supporting a Cassandra application in production? Curious to know how you handle the cassandra.yaml file. Also, do you think a "seed node" gets the status of a master node (partially)?

Anybody supporting a Cassandra application in production?
Yes, my team supports several applications which use Cassandra in production.
Curious to know how you handle the cassandra.yaml file.
By "handle" the cassandra.yaml file, I assume you mean deploy with different values with automation at large scale. We use an open source tool called Rundeck for that.
Rundeck allows you to build options into your jobs, which is useful for properties like cluster_name, seeds, etc. Then, you inject those options into your deploy scripts, using a regex replace (sed) to get them into specific properties in the yaml. Ex:
sed -i "s/cluster_name: 'Test Cluster'/cluster_name: '#cluster_name#'/" cassandra.yaml
Also, do you think a "seed node" gets the status of a master node (partially)?
No, a seed node is not any kind of "master" node.
A seed node is no different from any other node.
In theory, every node in your cluster could be a seed node for another node. All it is, is a way for a new node to discover the network topology of the cluster. Think of it as an entry point to the cluster.
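For reference, the seed list a new node uses is just a property in its cassandra.yaml (the addresses here are placeholders):
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"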

Related

How to sync configuration between Hadoop worker machines

We have a huge Hadoop cluster, and we installed one Presto coordinator node and 850 Presto worker nodes. Now we want to change the values in the file config.properties, but this needs to be done on all the workers!
So, under
/opt/DBtasks/presto/presto-server-0.216/etc
the file looks like this:
[root@worker01 etc]# more config.properties
#
coordinator=false
http-server.http.port=8008
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://master01.sys76.com:8008
and we want to change it to
coordinator=false
http-server.http.port=8008
query.max-memory=500GB
query.max-memory-per-node=5GB
query.max-total-memory-per-node=20GB
discovery.uri=http://master01.sys76.com:8008
But this was done only on the first node, worker01, and we need to do it on all the workers as well. We could copy this file by scp to all the other workers, but not when root is restricted. What I want to know is whether Presto already offers a more elegant approach to sync the configuration across all worker nodes. As we all know, after setting new values we also need to restart the Presto launcher script.
Does Presto have a solution for this?
I should mention that root is restricted on my cluster, so we can't copy the files via SSH.
Presto does not have the ability to sync the configurations. This is something you would need to manage outside of Presto, e.g. using a tool like Ansible. There is also the command-line tool presto-admin (https://github.com/prestosql/presto-admin) that can assist with deploying the configs across the cluster.
Additionally, if you are using a public cloud such as AWS, there are commercial solutions from Starburst (https://www.starburstdata.com/) that can assist with managing the configurations as well.
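For example, with the presto-admin tool mentioned above, the workflow looks roughly like this. This is only a sketch: the sub-commands and directory layout are taken from the presto-admin docs and may differ between versions, and presto-admin connects with its own configured (non-root) user, which can help where direct root SSH is restricted.
# Put the edited file into presto-admin's workers config directory
cp config.properties ~/.prestoadmin/workers/config.properties
# Push it to every worker listed in ~/.prestoadmin/config.json
presto-admin configuration deploy workers
# Restart the launcher on every node so the new memory limits take effect
presto-admin server restart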

Thingsboard cluster setup

Building a Thingsboard cluster
I need help setting up a ThingsBoard cluster; the documentation online is very limited.
The cluster will contain 2 Zookeeper nodes and 4 Thingsboard nodes with Cassandra DB.
Should Zookeeper be installed separately?
A step-by-step guide would be much appreciated!
I cannot give you detailed step-by-step instructions to set up a ThingsBoard cluster, but I can point you in the right direction by sharing the different documents you need to do so.
Bottom line, the following tasks must be completed:
Install and configure a ZooKeeper ensemble.
Check the ZooKeeper documentation for further installation details. Keep in mind that you need at least three different ZK nodes in a clustered environment and that you always need an odd number of ZK nodes (3, 5, 7, ...). It is a very bad idea to build a cluster consisting of two ZK nodes; look up the split-brain condition that can appear under those circumstances! Basically, you set up the number of individual nodes you wish to use and change the configuration file to join them as an ensemble. This is documented quite well in the ZK docs.
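A minimal sketch of the ensemble part of conf/zoo.cfg for three nodes (hostnames and paths are placeholders):
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
Each node also needs its own id (1, 2 or 3) written to dataDir/myid.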
Install and configure a Cassandra cluster.
Again, you set up the number of individual nodes you need for your Cassandra cluster and modify each node's configuration file to join them into a cluster. Check the Cassandra documentation for details. Verify the configuration using the nodetool status command as described at the end of that document; all your nodes should be up and running.
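The per-node settings that turn individual installs into one cluster boil down to a handful of cassandra.yaml properties; a sketch with placeholder names and addresses:
cluster_name: 'thingsboard-cluster'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"
listen_address: 10.0.0.3
rpc_address: 10.0.0.3
Afterwards, nodetool status should list every node in state UN (Up/Normal).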
Install and configure a ThingsBoard cluster.
Use the instructions provided with ThingsBoard single node setup.
Install Java
Skip External database installation
ThingsBoard service installation
Configure ThingsBoard to use the external database - Cassandra
Go to Cluster setup and apply the configuration steps described there (ZK, Cassandra and RPC); see the sketch after this list. Keep in mind that you must point to ALL members of your ZK and Cassandra clusters. You can also use IP addresses instead of host names.
Return to the single node setup and run the installation script on ONE NODE only!
Start ThingsBoard service
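As referenced in the cluster-setup step above, a rough sketch of the per-node cluster settings in thingsboard.yml. The key names below are from a 2.x install and may differ between versions, so check your own file; hosts are placeholders:
zk:
  enabled: "true"
  url: "zk1:2181,zk2:2181,zk3:2181"
cassandra:
  cluster_name: "Thingsboard Cluster"
  url: "cass1:9042,cass2:9042,cass3:9042"
rpc:
  bind_host: "THIS_NODE_IP"
  bind_port: "9001"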
If everything went well, you should be able to access your ThingsBoard nodes directly using the URL http://[NODE_IP]:8080. You can verify proper cluster operation by creating a tenant on one node and checking its presence on another node.
I don't know if using an even number of ThingsBoard nodes is a good idea. The documentation does not mention anything about this.
One final remark: you could/should consider putting a proxy in front of your ThingsBoard cluster to provide load balancing for your web clients and improve the user experience. This way you don't have to share the individual host addresses with your users, and you prevent node overloading caused by everybody using the same web address to access your dashboard(s). You could also proxy your MQTT broker to provide load balancing as well.
Good luck in setting up your cluster!
ZooKeeper needs at least 3 nodes to run in cluster mode. Each node votes, and a majority (quorum) of the ensemble must agree for the cluster to stay available.

How to update configuration of a Cassandra cluster

I have a 3-node Cassandra cluster and I want to make some adjustments to cassandra.yaml.
My question is, how should I perform this? One node at a time or is there a way to make it happen without shutting down nodes?
Btw, I am using Cassandra 2.2 and this is a production cluster.
There are multiple approaches here:
If you edit the cassandra.yaml file, you need to restart Cassandra so it re-reads the contents of that file. If you restart all nodes at once, your cluster will be unavailable. Restarting one node at a time is almost always safe (provided you have sane replication factors and consistency levels). If your cluster is configured to survive a rack or datacenter outage, then you can safely restart more nodes concurrently.
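A rough sketch of the per-node procedure (service name and commands depend on your install):
nodetool drain                    # flush memtables and stop accepting new writes
sudo service cassandra restart    # or: sudo systemctl restart cassandra
nodetool status                   # wait until every node shows UN again before moving on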
Many settings can be changed without a restart via JMX, though I don't have a documentation link handy. Changing them via JMX WON'T change cassandra.yaml though, so you'll need to update that file as well, or your config will revert to what's in the file when the node restarts.
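For example, nodetool exposes a few of these JMX-backed settings directly (the values here are just illustrative):
nodetool setcompactionthroughput 64   # MB/s, takes effect immediately
nodetool setstreamthroughput 200      # megabits/s
Remember to mirror the same values in cassandra.yaml so they survive a restart.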
If you're using DSE, OpsCenter's Lifecycle Manager feature makes updating configs a simple point-and-click affair (disclaimer, I'm biased as I'm an LCM dev).

Set cluster name when using Cassandra CQL/JDBC driver

I'm using the Cassandra CQL/JDBC driver I got from Google Code, but it doesn't seem to let me provide a cluster name - is there a way?
I'm using cluster names to ensure I don't run commands against a live system; it has a different cluster name from my dev systems.
Edit: Just to clarify, I have two totally separate Cassandra clusters, one live and one for test. They have different cluster names to ensure that I don't accidentally run test code meant for the test cluster on the live cluster. Therefore any client I need to use must let me set a cluster name. Hector does this.
There is no built-in protection for checking cluster names in Cassandra clients. The cluster name exists to ensure nodes from different clusters don't try to join together, not to ensure clients connect to the right cluster. It would be possible to add this check to a client, though (since the cluster name is exposed to the client), but I'm not aware of any clients doing this.
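For example, a client (or a wrapper script around one) could read the cluster name before doing anything destructive; with a reasonably recent cqlsh that is just the following (my-live-host is a placeholder):
cqlsh my-live-host -e "SELECT cluster_name FROM system.local;"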
I'd strongly recommend firewalling off your different environments to avoid this kind of mistake. If that isn't possible, you should choose different ports to avoid confusion. Change this with the 'rpc_port' setting in cassandra.yaml.
You'd have to mirror the data on two different clusters. You can't access the same cluster with different names.
To rename your cluster (from the default 'Test Cluster'), edit the Cassandra configuration file found at location/of/cassandra/conf/cassandra.yaml. It's the top line; if you need more details, look at the DataStax configuration documentation.
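For reference, that top line looks like this (use whatever name you want, but it must match on every node in the cluster):
cluster_name: 'My Production Cluster'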

Dynamically adding new nodes in Cassandra

Is it possible to add new hosts to a Cassandra cluster dynamically?
What I'm trying to do is set up a program that can:
Set up a local version of the database for each user
Each user's machine will become part of the cluster (the machines will be hosts)
Data will be replicated across all the clusters
Building a cluster of multiple hosts usually entails configuring the cassandra.yaml to store the seeds, listen_address and rpc_address of each host.
My idea is to edit these files through Java and insert the new host addresses as required, but making sure that the data is accurate across each user's cassandra.yaml file would be challenging.
I'm wondering if someone has done something similar or has any advice on a better way to achieve this.
Yes, it is possible. Look at Netflix's Priam for a complete example of dynamic Cassandra cluster management (though it is designed to work with Amazon EC2).
For rpc_address and listen_address, you can set up a startup script that fixes cassandra.yaml if it is not correct.
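A minimal sketch of such a startup script, assuming a Linux host and the package-install location of the yaml:
#!/bin/sh
# Stamp this machine's primary IP into cassandra.yaml before Cassandra starts.
MY_IP=$(hostname -I | awk '{print $1}')
YAML=/etc/cassandra/cassandra.yaml
sed -i "s/^listen_address:.*/listen_address: $MY_IP/" "$YAML"
sed -i "s/^rpc_address:.*/rpc_address: $MY_IP/" "$YAML"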
For seeds, you can configure a custom seed provider. Look at the seed provider used by Netflix's Priam for some ideas on how to implement it.
The most difficult part will be managing the tokens assigned to each node in an efficient way. Cassandra 1.2 is around the corner and will include a feature called virtual nodes that, IMO, will work well in your case. See the Acunu presentation about it.
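With virtual nodes you largely stop managing tokens by hand; each node just gets a num_tokens setting in cassandra.yaml (a sketch; 256 was the commonly suggested value):
num_tokens: 256
# leave initial_token blank when vnodes are enabled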
