Is it possible to speed up Cassandra cleanup process?

I have a Cassandra 3.11.1.0 cluster (6 nodes), and cleanup was not run after 2 new nodes joined.
I started nodetool cleanup on the first node (192.168.20.197) and it has now been in progress for almost 30 days.
$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.20.109 33.47 GiB 256 ? 677dc8b6-eb00-4414-8d15-9f1c79171069 rack1
UN 192.168.20.47 35.41 GiB 256 ? df8c1ee0-fabd-404e-8c55-42531b89d462 rack1
UN 192.168.20.98 20.65 GiB 256 ? 70ce02d7-779b-4b5a-830f-add6ed64bcc2 rack1
UN 192.168.20.21 33.03 GiB 256 ? 40863a80-5f25-464f-aa52-660149bc0070 rack1
UN 192.168.20.197 25.98 GiB 256 ? 5420eae3-e643-49e2-b2d8-703bd5a1f2d4 rack1
UN 192.168.20.151 21.9 GiB 256 ? be7d5df1-3edd-4bc3-8f34-867cb3b8bfca rack1
All nodes that have not been cleaned are under load now (CPU load ~80-90%), but the newly joined nodes (192.168.20.98 and 192.168.20.151) have a CPU load of ~10-20%.
It looks like the old nodes are loaded because of old data that could be cleaned up.
Each node has 61 GB RAM and 8 CPU cores. The heap size is 30 GB.
So, my questions are:
Is it possible to speed up the cleanup process?
Is the CPU load related to the old, unused data (ranges the node no longer owns) still sitting on those nodes?
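For reference, a minimal sketch of the knobs usually tried when cleanup is slow on 3.11; the keyspace name below is a placeholder, and since cleanup runs as a compaction it shares the compaction throughput limit, so check your current values before changing anything:
$ nodetool getcompactionthroughput    # current throttle (default 16 MB/s)
$ nodetool setcompactionthroughput 0  # 0 = unthrottled; cleanup obeys this limit
$ nodetool cleanup -j 2 my_keyspace   # -j runs cleanup on several tables in parallel (placeholder keyspace)
$ nodetool compactionstats            # watch the progress of the running cleanup compactions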

Related

I added nodes (10 nodes) but the cassandra-stress result is slower than a single node?

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.170.128 317.66 MiB 256 62.4% 45e953bd-5cca-44d9-ba26-99e0db28398d rack1
UN 192.168.170.129 527.05 MiB 256 60.2% e0d2faec-9714-49cf-af71-bfe2f2fb0783 rack1
UN 192.168.170.130 669.08 MiB 256 60.6% eaa1e39b-2256-4821-bbc8-39e47debf5e8 rack1
UN 192.168.170.132 537.11 MiB 256 60.0% 126e151f-92bc-4197-8007-247e385be0a6 rack1
UN 192.168.170.133 417.6 MiB 256 56.8% 2eb9dd83-ab44-456c-be69-6cead1b5d1fd rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.170.136 386.12 MiB 256 41.0% 2e57fac6-95db-4dc3-88f7-936cd8038cac rack1
UN 192.168.170.137 518.74 MiB 256 40.9% b6d61651-7c65-4ac9-a5b3-053c77cfbd37 rack1
UN 192.168.170.138 554.43 MiB 256 38.6% f1ba3e80-5dac-4a22-9025-85e868685de5 rack1
UN 192.168.170.134 153.76 MiB 256 40.7% 568389b3-304b-4d8f-ae71-58eb2a55601c rack1
UN 192.168.170.135 350.76 MiB 256 38.7% 1a7d557b-8270-4181-957b-98f6e2945fd8 rack1
CREATE KEYSPACE grudb WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '2'} AND durable_writes = true;
That's my setup.
CL is ONE.
In general, a 10-node cluster can sustain a higher throughput, but whether this actually translates into higher cassandra-stress scores depends on what exactly you're doing:
First, you need to ensure that the cassandra-stress client is not your bottleneck. For example, if the machine running cassandra-stress is at 100% CPU or network utilization, you will never get a better score even if you have 100 server nodes.
Second, you need to ensure that cassandra-stress's concurrency is high enough. In the extreme case, if cassandra-stress sends just one request after another, all you're doing is measuring latency, not throughput. Moreover, it doesn't help to have 100 nodes if you only send one request at a time to one of them. So please try increasing cassandra-stress's concurrency to see if that makes any difference.
Now that we've got the potential cassandra-stress issues out of the way, let's look at the server. You didn't simply grow your cluster from 1 node to 10 nodes; if you had, you'd rightly be surprised if performance didn't increase. You also greatly increased the work per write: in your setup each write needs to go to 5 nodes, 3 in one DC and 2 in the other (those are the RFs you configured). So even in the best case, you can't expect write throughput to be more than twice as high on this cluster as on a single node. In practice, because of the overhead of all this replication, you should expect even less than twice the performance, so seeing similar performance is not surprising.
The above estimate was for write performance. For read performance, since you said you're using CL=ONE (you could use CL=LOCAL_ONE, by the way), read throughput should indeed scale roughly linearly with the cluster's size. If it does not, I'm guessing you have a setup problem like the ones described above (the client is the bottleneck or is using too little concurrency).
Please try to run read and write benchmarks separately to better understand which of them is the main scalability problem.
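For example, a sketch of separate write and read runs with explicitly raised client-side concurrency; the row count, thread count, and contact node are placeholders (the IP is taken from the status output above):
$ cassandra-stress write n=1000000 cl=ONE -rate threads=200 -node 192.168.170.128
$ cassandra-stress read n=1000000 cl=ONE -rate threads=200 -node 192.168.170.128
If the read score keeps climbing as you raise the thread count while the write score does not, the write path (RF 3 + 2 across the two DCs) is likely the limiting factor rather than the client.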

Third Cassandra node has different load

We had a Cassandra cluster with 2 nodes in the same datacenter, with a replication factor of 2 for the keyspace "newts". If I ran nodetool status I could see that the load was roughly the same between the two nodes, with each node owning 100%.
I went ahead and added a third node, and I can see all three nodes in the nodetool status output. I increased the replication factor to three since I now have three nodes and ran "nodetool repair" on the third node. However, when I now run nodetool status I can see that the load differs between the three nodes, even though each node owns 100%. How can this be, and is there something I'm missing here?
nodetool -u cassandra -pw cassandra status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.6 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.51 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 5.84 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
nodetool status newts output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.85 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.75 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 6.17 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
Since you added a node (there are now three) and increased your replication factor to three, each node holds a copy of your data and therefore owns 100% of it.
The differing "Load" values can result from not running nodetool cleanup on the two old nodes after adding your third node: old data in their SSTables is not removed when the new node joins, only later after a cleanup and/or compaction:
Load - updates every 90 seconds. The amount of file system data under the cassandra data directory after excluding all content in the snapshots subdirectories. Because all SSTable data files are included, any data that is not cleaned up (such as TTL-expired cells or tombstoned data) is counted.
(from https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsStatus.html)
Just run nodetool repair on all 3 nodes, then run nodetool cleanup one by one on the existing nodes, and restart the nodes one after another; that seems to work.
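A minimal sketch of that sequence, using the keyspace name from the question (run the cleanup only on the two pre-existing nodes, one node at a time):
$ nodetool repair newts    # on each of the 3 nodes
$ nodetool cleanup newts   # on each of the two original nodes, one at a time
$ nodetool status newts    # Load should drop once the obsolete SSTables are rewritten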

Cassandra multi-region settings/optimization

I configured two DCs with replication in two regions (NCSA and EMEA) using JanusGraph (Gremlin/Cassandra/Elasticsearch). The replication works well and everything, however the performance is not that great.
I get times of around 250 ms just for a read on a node in NCSA (vs 30 ms when I had only 1 DC / 1 node), and a write takes around 800 ms.
I tried to modify some configuration:
storage.cassandra.replication-factor
storage.cassandra.read-consistency-level
storage.cassandra.write-consistency-level
Are there any other settings/configurations that I could modify in order to get better performance for a multi-region setup, or is that kind of performance expected with JanusGraph/Cassandra?
Thanks
The lowest times I was able to get were with:
storage.replication-strategy-class=org.apache.cassandra.locator.NetworkTopologyStrategy
storage.cassandra.replication-factor=6
storage.cassandra.read-consistency-level=ONE
storage.cassandra.write-consistency-level=ONE
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.130.xxx.xxx 184.02 KB 256 100.0% 7c4c23f4-0112-4023-8af1-81a179f68973 RAC2
UN 10.130.xxx.xxx 540.67 KB 256 100.0% 193f0814-649f-4450-8b2e-85344f2c3cf2 RAC3
UN 10.130.xxx.xxx 187.47 KB 256 100.0% fbbc42d6-a061-4604-935e-dbe1155d4017 RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.30.xxx.xxx 93.3 KB 256 100.0% e7221808-ccb4-414a-b5b6-6e578ecb6f25 RAC3
UN 10.30.xxx.xxx 287.62 KB 256 100.0% ca868262-4b5d-44d6-80f9-25439f8d2611 RAC2
UN 10.30.xxx.xxx 282.27 KB 256 100.0% 82d0f75d-635c-4016-84ca-ef9d1afda066 RAC1
JanusGraph comes with different cache levels; activating some of them may help.
Regarding the consistency level, in a multi-DC configuration the LOCAL_xxx values will provide better performance, but to be safe you should also set the name of the local or closest Cassandra datacenter (configuration parameter: storage.cassandra.astyanax.local-datacenter).
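A minimal sketch of those settings combined, assuming the backend accepts the LOCAL_ONE level; the datacenter name DC1 is taken from the nodetool output above and must match your local DC:
storage.cassandra.read-consistency-level=LOCAL_ONE
storage.cassandra.write-consistency-level=LOCAL_ONE
storage.cassandra.astyanax.local-datacenter=DC1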
Are you able to say where the time is spent (in the Cassandra layer or in the JanusGraph layer)? To see the response time of Cassandra itself, you can run nodetool proxyhistograms, which shows the full request latency recorded by the coordinator.
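For example, comparing the coordinator view with the local, per-table view can show where the time goes; the keyspace and table names here are hypothetical placeholders (use tablehistograms instead of cfhistograms on newer Cassandra versions):
$ nodetool proxyhistograms                     # request latency as seen by the coordinator
$ nodetool cfhistograms my_keyspace my_table   # local read/write latency for a single table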

Cassandra data files MUCH larger than expected

I just did an experiment in which I loaded around a dozen CSV files, weighing in at around 5.2 GB (compressed). After they are uploaded to Cassandra, they take up 64 GB! (Actually around 128 GB, but that is due to the replication factor being 2.)
Frankly, I expected Cassandra's data to take up even less than the original 5.2 GB of CSV because:
1. Cassandra should be able to store data (mostly numbers) in binary format instead of ASCII
2. Cassandra should have split a single file into its column constituents and improved compression dramatically
I'm completely new to Cassandra and this was an experiment. It is entirely possible that I misunderstand the product or mis-configured it.
Is it expected that 5.2 GB of CSVs will end up as 64 GB of Cassandra files?
EDIT: Additional info:
[cqlsh 5.0.1 | Cassandra 2.1.11 | CQL spec 3.2.1 | Native protocol v3]
[~]$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN xx.x.xx.xx1 13.17 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx2 14.02 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx3 13.09 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx4 12.32 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx5 12.84 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx6 12.66 GB 256 ? HOSTID RAC1
du -h [directory which contains the sstables before they are loaded]: 67GB
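One way to break the space usage down further is to look at per-table statistics, which report the on-disk size and the achieved compression ratio; this is a sketch with a hypothetical keyspace/table name:
$ nodetool cfstats my_keyspace.my_table
# look for the space-used and SSTable-compression-ratio lines in the output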

Completely unbalanced DC after bootstrapping new node

I've just added a new node into my Cassandra DC. Previously, my topology was as follows:
DC Cassandra: 1 node
DC Solr: 5 nodes
When I bootstrapped a 2nd node for the Cassandra DC, I noticed that the total bytes to be streamed was almost as big as the load of the existing node (916 GB to stream; the load of the existing Cassandra node is 956 GB). Nevertheless, I allowed the bootstrap to proceed. It completed a few hours ago and now my fear is confirmed: the Cassandra DC is completely unbalanced.
Nodetool status shows the following:
Datacenter: Solr
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN solr node4 322.9 GB 40.3% 30f411c3-7419-4786-97ad-395dfc379b40 -8998044611302986942 rack1
UN solr node3 233.16 GB 39.7% c7db42c6-c5ae-439e-ab8d-c04b200fffc5 -9145710677669796544 rack1
UN solr node5 252.42 GB 41.6% 2d3dfa16-a294-48cc-ae3e-d4b99fbc947c -9004172260145053237 rack1
UN solr node2 245.97 GB 40.5% 7dbbcc88-aabc-4cf4-a942-08e1aa325300 -9176431489687825236 rack1
UN solr node1 402.33 GB 38.0% 12976524-b834-473e-9bcc-5f9be74a5d2d -9197342581446818188 rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN cs node2 705.58 GB 99.4% fa55e0bb-e460-4dc1-ac7a-f71dd00f5380 -9114885310887105386 rack1
UN cs node1 1013.52 GB 0.6% 6ab7062e-47fe-45f7-98e8-3ee8e1f742a4 -3083852333946106000 rack1
Notice the 'Owns' column in the Cassandra DC: node2 owns 99.4% while node1 owns 0.6% (despite node2 having a smaller 'Load' than node1). I expected them to own 50% each, but this is what I got. I don't know what caused this. What I do remember is that I was running a full repair on Solr node1 when I started the bootstrap of the new node. The repair is still running as of this moment (I think it actually restarted when the new node finished bootstrapping).
How do I fix this? (repair?)
Is it safe to bulk-load new data while the Cassandra DC is in this state?
Some additional info:
DSE 4.0.3 (Cassandra 2.0.7)
NetworkTopologyStrategy
RF1 in Cassandra DC; RF2 in Solr DC
DC auto-assigned by DSE
Vnodes enabled
The config of the new node is modeled after the config of the existing node, so it should be more or less correct
EDIT:
It turns out that I can't run cleanup on cs-node1 either. I'm getting the following exception:
Exception in thread "main" java.lang.AssertionError: [SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-18509-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-18512-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38320-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38325-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38329-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38322-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38330-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38331-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38321-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38323-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38344-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38345-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38349-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38348-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38346-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-13913-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-13915-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38389-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-39845-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38390-Data.db')]
at org.apache.cassandra.db.ColumnFamilyStore$13.call(ColumnFamilyStore.java:2115)
at org.apache.cassandra.db.ColumnFamilyStore$13.call(ColumnFamilyStore.java:2112)
at org.apache.cassandra.db.ColumnFamilyStore.runWithCompactionsDisabled(ColumnFamilyStore.java:2094)
at org.apache.cassandra.db.ColumnFamilyStore.markAllCompacting(ColumnFamilyStore.java:2125)
at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:214)
at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:265)
at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1105)
at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
EDIT:
Nodetool status output (without keyspace)
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Solr
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN solr node4 323.78 GB 17.1% 30f411c3-7419-4786-97ad-395dfc379b40 -8998044611302986942 rack1
UN solr node3 236.69 GB 17.3% c7db42c6-c5ae-439e-ab8d-c04b200fffc5 -9145710677669796544 rack1
UN solr node5 256.06 GB 16.2% 2d3dfa16-a294-48cc-ae3e-d4b99fbc947c -9004172260145053237 rack1
UN solr node2 246.59 GB 18.3% 7dbbcc88-aabc-4cf4-a942-08e1aa325300 -9176431489687825236 rack1
UN solr node1 411.25 GB 13.9% 12976524-b834-473e-9bcc-5f9be74a5d2d -9197342581446818188 rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN cs node2 709.64 GB 17.2% fa55e0bb-e460-4dc1-ac7a-f71dd00f5380 -9114885310887105386 rack1
UN cs node1 1003.71 GB 0.1% 6ab7062e-47fe-45f7-98e8-3ee8e1f742a4 -3083852333946106000 rack1
Cassandra yaml from node1: https://www.dropbox.com/s/ptgzp5lfmdaeq8d/cassandra.yaml (only difference with node2 is listen_address and commitlog_directory)
Regarding CASSANDRA-6774, my case is a bit different because I didn't stop a previous cleanup. Although I think I took the wrong route by starting a scrub (still in progress) instead of restarting the node first, as their suggested workaround recommends.
UPDATE (2014/04/19):
nodetool cleanup still fails with an assertion error after doing the following:
Full scrub of the keyspace
Full cluster restart
I'm now doing a full repair of the keyspace in cs-node1
UPDATE (2014/04/20):
Any attempt to repair the main keyspace in cs-node1 fails with:
Lost notification. You should check server log for repair status of keyspace
I also saw this just now (output of dsetool ring)
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Workload Status State Load Owns VNodes
solr-node1 Solr rack1 Search Up Normal 447 GB 13.86% 256
solr-node2 Solr rack1 Search Up Normal 267.52 GB 18.30% 256
solr-node3 Solr rack1 Search Up Normal 262.16 GB 17.29% 256
cs-node2 Cassandra rack1 Cassandra Up Normal 808.61 GB 17.21% 256
solr-node5 Solr rack1 Search Up Normal 296.14 GB 16.21% 256
solr-node4 Solr rack1 Search Up Normal 340.53 GB 17.07% 256
cd-node1 Cassandra rack1 Cassandra Up Normal 896.68 GB 0.06% 256
Warning: Node cs-node2 is serving 270.56 times the token space of node cs-node1, which means it will be using 270.56 times more disk space and network bandwidth. If this is unintentional, check out http://wiki.apache.org/cassandra/Operations#Ring_management
Warning: Node solr-node2 is serving 1.32 times the token space of node solr-node1, which means it will be using 1.32 times more disk space and network bandwidth. If this is unintentional, check out http://wiki.apache.org/cassandra/Operations#Ring_management
Keyspace-aware:
Address DC Rack Workload Status State Load Effective-Ownership VNodes
solr-node1 Solr rack1 Search Up Normal 447 GB 38.00% 256
solr-node2 Solr rack1 Search Up Normal 267.52 GB 40.47% 256
solr-node3 Solr rack1 Search Up Normal 262.16 GB 39.66% 256
cs-node2 Cassandra rack1 Cassandra Up Normal 808.61 GB 99.39% 256
solr-node5 Solr rack1 Search Up Normal 296.14 GB 41.59% 256
solr-node4 Solr rack1 Search Up Normal 340.53 GB 40.28% 256
cs-node1 Cassandra rack1 Cassandra Up Normal 896.68 GB 0.61% 256
Warning: Node cd-node2 is serving 162.99 times the token space of node cs-node1, which means it will be using 162.99 times more disk space and network bandwidth. If this is unintentional, check out http://wiki.apache.org/cassandra/Operations#Ring_management
This is a strong indicator that something is wrong with the way cs-node2 bootstrapped (as I described at the beginning of my post).
It looks like your issue is that you most likely switched from single tokens to vnodes on your existing nodes, so all of their tokens ended up in one contiguous range. This is actually not possible to do directly in current Cassandra versions because it was too hard to get right.
The only real way to fix it and be able to add a new node is to decommission the first new node you added, then follow the current documentation on switching from single-token nodes to vnodes, which basically says you need to build a brand-new datacenter of vnode-enabled nodes and then decommission the existing nodes.
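A rough sketch of that documented migration path, with placeholder names (my_ks comes from the stack trace above, NewCassandraDC is invented for illustration); verify the exact steps against the DataStax documentation for DSE 4.0 / Cassandra 2.0:
# 1. On cs-node2 (the node that bootstrapped badly): leave the ring
$ nodetool decommission
# 2. cassandra.yaml on each node of the brand-new, vnode-enabled DC:
#      num_tokens: 256   (and no initial_token set)
# 3. Include the new DC in the keyspace replication (RFs taken from the question, new DC name is a placeholder):
ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 2, 'Cassandra': 1, 'NewCassandraDC': 1};
# 4. On each node of the new DC, stream the data from the old Cassandra DC:
$ nodetool rebuild -- Cassandra
# 5. Once the new DC is populated and clients point at it, decommission the old Cassandra node(s) one at a time:
$ nodetool decommission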

Resources