How to add new nodes in Cassandra with different machine configurations

I have 6 Cassandra nodes in two datacenters, each with 16GB of memory and a 1TB hard drive.
Now I am adding 3 more nodes with 32GB of memory. Will these machines cause overhead for the existing machines (for example in token distribution)? If so, please suggest how to configure these machines to avoid such problems.
Thanks in advance.

The "balance" between nodes is best regulated using vnodes. If you recall (if you don't, you should read about it), the ring that Cassandra nodes form is actually consisted out of virtual nodes (vnodes). Each node in the ring has a certain portion of vnodes, which is set up in the Cassandra configuration on each node. Based on that number of vnodes, or rather the proportion between them, the amount of data going to those nodes is calculated. The configuration you are looking for is num_tokens. If you have similarly powerful machines, than an equal vnode number is available. The default is 256.
When adding a new, more powerful machine, you should assign it a greater number of vnodes. How much greater is hard to say. It would be unwise to simply double it based on RAM alone, since those nodes would then hold twice as much data as the others; you could then expect more IO operations on them (remember, you still have the same HDD) and higher CPU utilization (and the same CPU). You might also want to take a look at this answer.
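For illustration, a minimal cassandra.yaml sketch of what that might look like; the value 384 below is a hypothetical choice, not a recommendation, and the right proportion should come from measuring disk and CPU headroom rather than RAM alone:

    # cassandra.yaml on the existing 16GB nodes (the default)
    num_tokens: 256

    # cassandra.yaml on a new 32GB node -- hypothetical value giving it roughly
    # 1.5x the data share; this must be set before the node bootstraps
    num_tokens: 384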

Related

Does using different hardware specs for different nodes in cassandra cause problems?

We started using Cassandra years ago with four nodes. All the nodes have the same instance size, m5.xlarge (4 vCPUs, 16GB memory). Over time we increased the disk space to 300GB per node.
If we want to add one or more new nodes with different, higher hardware specs, such as 16 vCPUs and 64GB of memory (1TB disk), does that cause problems across the nodes? It may sound silly, but IOPS will be different, latency may be lower, and the data may even be more up to date on the higher-spec node (maybe it chokes the other nodes?). Since the traffic is distributed evenly, the way the new node handles I/O will be different from the other nodes.
Thank you.
Yes, it will cause problems, especially if you're using a consistency level higher than LOCAL_ONE or ONE. Read and write latencies will differ, and even if your new nodes answer quickly, they will still need to wait for answers from the smaller nodes, etc.
If you're using AWS (or another cloud provider), why not upgrade the nodes in place? If you're using EBS, just remount the volumes onto bigger instances, but keep the instance type the same for all nodes.
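If you go the in-place route, a rough per-node sketch might look like the following; the instance ID, target instance type, and service name are placeholders, and the exact steps depend on your setup:

    # 1. Stop Cassandra cleanly on the node being upgraded
    nodetool drain
    sudo systemctl stop cassandra

    # 2. Resize the instance; the attached EBS data volumes stay with it
    #    (i-0abc1234 and m5.2xlarge are placeholders)
    aws ec2 stop-instances --instance-ids i-0abc1234
    aws ec2 modify-instance-attribute --instance-id i-0abc1234 \
        --instance-type "{\"Value\": \"m5.2xlarge\"}"
    aws ec2 start-instances --instance-ids i-0abc1234

    # 3. Start Cassandra again and confirm the node comes back as UN (Up/Normal)
    sudo systemctl start cassandra
    nodetool status

Repeat one node at a time so the ring keeps its replicas available throughout.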

Cassandra CPU imbalance in Azure

We have a 30+ node Cassandra cluster (3.11.2) across 4 data centers. One of the centers consists of 8 nodes in Azure running on Standard DS12 v2 (4 CPUs, 28GB) instances with a 500GB premium SSD drive, all in the same data center (Central US).
We are seeing a dramatic CPU imbalance in the node activity when pushed to the max. We have a keyspace with about 200 million records, and we're running a process to check and refresh the records if necessary from another data stream.
What's happening is that we have 4 nodes running at 70-90% CPU compared to 15-25% on the other 4. The CPU measurement is done on the nodes themselves, because Azure's own metrics are broken and never represent what is actually happening.
Digging into a pair of nodes (one low CPU and one high), the difference is the iowait% of the two. The data in the keyspace is balanced (within reason: they are all within 5% of one another in record count and size). The number of reads looks balanced, and even the read latency as reported by Cassandra is similar.
When I compare iostat output between the nodes, the high-CPU node reports much higher (by 50 to 100%) rKB/s numbers, which is likely what leads to the difference in iowait% time.
These nodes are configured 100% the same and run the same version of everything (OS, libraries, everything I can think to check). I cannot figure out why some nodes are doing more disk reads than the others, slowing down the cluster as a whole.
Anybody have any suggestions on where I can look for differences?
The only pattern is that the slower nodes are the 4 nodes that were added later in our expansion. We started with 4 nodes for a while and added 4 more when we needed space. All the appropriate repairs and other tasks required for node additions were done; the fact that the record counts and the physical size of the data files on disk are equal should attest to that.
When we shut down our refresh process, all the nodes settle down to an even 5% or less CPU across the board. No compaction or other maintenance is happening that would indicate anything different.
Please help... :)
Our final solution, which fixed ONLY the imbalance problem, was to run cleanup, a full repair, and a compaction. At that point the nodes are used relatively equally. We suspect that expanding the cluster (adding nodes) left data on the older nodes that regular compaction never cleaned out.
We are still working to solve the load issue, but now at least all the nodes are feeling the same CPU crunch.
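For reference, a rough sketch of that sequence, run one node at a time (the keyspace name is a placeholder):

    # Remove data the node no longer owns after the topology change
    nodetool cleanup my_keyspace

    # Full (not incremental) repair of the keyspace
    nodetool repair --full my_keyspace

    # Force a major compaction to merge the leftover SSTables
    nodetool compact my_keyspace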

Cassandra with mixed disk sizes?

Can we use Cassandra on nodes with different disk sizes? If so, how does Cassandra balance the nodes, and do we have any control over it?
I've found this thread but it's quite old http://grokbase.com/t/cassandra/user/113nvs23r4/cassandra-nodes-with-mixed-hard-disk-sizes
It is highly recommended not to introduce an imbalance between nodes in a cluster (at least within the same DC) in terms of hard disk, CPU, or memory. All nodes in the cluster are treated as equals, and there is no intelligence behind the disk capacity on each node.
Unless you can take on the pain of manually distributing tokens instead of using vnodes, this is not advisable. With manual distribution, you control which node is assigned more tokens and which fewer, and then hope and pray that the data distribution is uniform enough that the node with fewer tokens really does receive less data.
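If you do go manual, each node gets num_tokens: 1 and an explicit initial_token in cassandra.yaml. A hypothetical sketch for computing evenly spaced Murmur3Partitioner tokens, which you could then skew toward the larger-disk nodes, might be:

    # Print N evenly spaced tokens for the Murmur3 range (-2^63 .. 2^63 - 1);
    # N=6 is a placeholder for your node count
    N=6
    for i in $(seq 0 $((N-1))); do
        python3 -c "print($i * (2**64 // $N) - 2**63)"
    done

    # cassandra.yaml on each node then gets one of the printed values, e.g.:
    # num_tokens: 1
    # initial_token: -9223372036854775808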

Large Write Performance Questions

My company and I have purchased about $80,000 in hardware to accomplish a goal. We have about 22,000 writes/sec across multiple application databases for our Cassandra cluster. We built 2 nodes with dual 3.5GHz Xeons, 128GB of RAM, and an Areca 1883 controller, all top-of-the-line, high-throughput hardware. We also have an SSD RAID 10 array for the commitlog/saved_caches so that is not a bottleneck.
The issue we have is the amount of data. In about 4 days we collected 1.8TB of data, and we have no intention of ever deleting data. We then got a JBOD enclosure and put in 6TB platter drives, 10 per machine, 20 in total, for about 110TB of space. We run fine with single replication; the issue is when we move to double replication.
We would love to add more nodes, and we know that is the correct way, but at $20,000 a node it's costly. My question is: if our write speed is the issue, is it true that adding 10 more drives to each machine should double the write speed?
Does anyone have something similar going on, and any tweaks they made to cassandra.yaml?
We did run htop for a while when we were at double replication, and CPU did seem to get fairly busy (it read 24% on average but looked pretty close to maxed out). RAM is all being used, all 128GB.
ANY thoughts on the matter will be considered and investigated.
Thanks,
Ken
It is not generally true that you can increase write speed simply by adding disks, unless you are sure that you are IO bound. Cassandra batches writes: mutations go to the commitlog first, then to a memtable in RAM, and are then written out in bulk to SSTables when that memtable reaches a certain threshold. These are sequential writes, so they are generally fast even on spinning disks. At some point you will max out the commitlog drive, fill memtables faster than you can flush them, or simply reach the point where GC can't keep up.
There are fairly large users of Cassandra who run multiple Cassandra instances on a single server simply to get the benefits of additional nodes without "just" adding disk. By running two JVMs, you can mitigate the pause times of a single huge node and still take advantage of your (oversized) hardware. This is easiest if you can assign multiple IPs to each server, but running on different ports also works. It is fairly atypical, and you'll need to pay close attention to your configs so the instances don't step on each other, but it will work and will make more efficient use of your hardware than simply running huge nodes.
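As a rough illustration of what has to differ per instance, a hypothetical pair of cassandra.yaml fragments (IPs, ports, and paths are all placeholders):

    # Instance A on the host
    listen_address: 10.0.0.11
    rpc_address: 10.0.0.11
    native_transport_port: 9042
    data_file_directories:
        - /data/cassandra-a/data
    commitlog_directory: /data/cassandra-a/commitlog
    saved_caches_directory: /data/cassandra-a/saved_caches

    # Instance B on the same host: a second IP (or different ports if only one
    # IP is available) and its own directories
    # listen_address: 10.0.0.12
    # data_file_directories:
    #     - /data/cassandra-b/data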
If I'm reading that correctly, you only have 2 nodes total?
If you only have 2 nodes I doubt that disk bandwidth would be the problem. Cassandra is usually CPU limited more than anything else.
Writes generally go to memory, so the disk only comes into play when memtables are flushed to disk as SSTables. The thing that will probably kill your performance is when those SSTables need to be compacted, and when compaction starts happening, guess which part of the system it stresses: yup, the CPU.
You will also have a problem running repairs with disks that huge. I usually find that sustained transaction throughput is limited by compactions and repairs more than by raw write performance (see the nodetool sketch at the end of this answer).
With two nodes and single replication, you'd be splitting the load between the two nodes, with half going to one and half going to the other. If you set the replication factor to two, now every write would be going to both nodes, which is like going back in time to having a single machine database.
So I think it was a bad call to buy a small number of high end machines. You would have had much better performance with more machines where each machine was less expensive. You need more machines to spread out the load and get more CPUs into the equation.
Also, you mention a disk enclosure. I hope you are not trying to use network storage with Cassandra; it needs the disks to be local.
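If you want to confirm whether compaction, rather than raw write throughput, is the bottleneck, the standard nodetool views are a reasonable starting point:

    # Pending and active compactions, and how far behind they are
    nodetool compactionstats

    # Thread-pool backlog; a growing Pending count on MutationStage or
    # CompactionExecutor points at the real bottleneck
    nodetool tpstats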

Can I use num_tokens as load factor in Cassandra?

If I run Cassandra on machines with different hard disk sizes (e.g., one with 1TB, another with 2TB), can I use num_tokens as a load factor?
I want to reduce the risk of one node running out of disk space and balance disk usage across the different machines.
I know that the more data a node holds, the more likely it is to become a hotspot. Apart from that, what other considerations do I need to take into account? What limits or practical restrictions exist on the number of tokens?
Can I change the number of tokens later, without trouble, if the disk space changes?
I would appreciate some advice on this topic because I have not found much about it on Google or on the Cassandra website.
EDIT: numnodes replaced by num_tokens
Are you referring to the num_tokens setting? Yes, you can use a different number of tokens based on the hardware resources of each node. Nodes with a larger number of tokens will see higher load and disk usage. Once set, num_tokens cannot be changed later without decommissioning the node.
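If you ever do need to change it, the usual approach (sketched below; paths and the service name are placeholders) is to decommission the node, wipe its local state, set the new num_tokens, and let it bootstrap back in:

    # Stream this node's data to the rest of the ring and leave the cluster
    nodetool decommission

    # Stop Cassandra and clear its local state (default paths; adjust to your install)
    sudo systemctl stop cassandra
    sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches

    # Edit num_tokens in cassandra.yaml, then start again; the node will
    # bootstrap back into the cluster with the new token count
    sudo systemctl start cassandra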
