Cassandra data files MUCH larger than expected

I just did an experiment in which I loaded around a dozen CSV files weighing in at around 5.2 GB (compressed). After they were loaded into Cassandra, they take up 64 GB! (Actually around 128 GB, but that is due to the replication factor being 2.)
Frankly, I expected Cassandra's data to take up even less space than the original 5.2 GB of CSV because:
1. Cassandra should be able to store data (mostly numbers) in binary format instead of ASCII
2. Cassandra should have split a single file into its column constituents and improved compression dramatically
I'm completely new to Cassandra and this was an experiment. It is entirely possible that I misunderstood the product or misconfigured it.
Is it expected that 5.2 GB of CSVs will end up as 64 GB of Cassandra files?
EDIT: Additional info:
[cqlsh 5.0.1 | Cassandra 2.1.11 | CQL spec 3.2.1 | Native protocol v3]
[~]$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN xx.x.xx.xx1 13.17 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx2 14.02 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx3 13.09 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx4 12.32 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx5 12.84 GB 256 ? HOSTID RAC1
UN xx.x.xx.xx6 12.66 GB 256 ? HOSTID RAC1
du -h [directory which contains sstables before they are loaded]: 67 GB
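For reference, a quick way to see how that 67 GB breaks down is to check the table's compression settings and the per-table statistics. A minimal sketch, assuming a hypothetical keyspace myks and table mytable (substitute your own names):
cqlsh> DESCRIBE TABLE myks.mytable;    -- the full CREATE TABLE output includes the compression settings
[~]$ nodetool cfstats myks.mytable     # "Space used (live)" and "SSTable Compression Ratio" show the actual on-disk footprint
[~]$ nodetool compactionstats          # pending compactions can leave extra, not-yet-merged SSTables on disk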

Related

Is it possible to speed up Cassandra cleanup process?

I have a Cassandra 3.11.1.0 cluster (6 nodes), and cleanup was not done after 2 nodes were joined.
I started nodetool cleanup on the first node (192.168.20.197), and the cleanup has been in progress for almost 30 days.
$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.20.109 33.47 GiB 256 ? 677dc8b6-eb00-4414-8d15-9f1c79171069 rack1
UN 192.168.20.47 35.41 GiB 256 ? df8c1ee0-fabd-404e-8c55-42531b89d462 rack1
UN 192.168.20.98 20.65 GiB 256 ? 70ce02d7-779b-4b5a-830f-add6ed64bcc2 rack1
UN 192.168.20.21 33.03 GiB 256 ? 40863a80-5f25-464f-aa52-660149bc0070 rack1
UN 192.168.20.197 25.98 GiB 256 ? 5420eae3-e643-49e2-b2d8-703bd5a1f2d4 rack1
UN 192.168.20.151 21.9 GiB 256 ? be7d5df1-3edd-4bc3-8f34-867cb3b8bfca rack1
All nodes which were not cleaned are under load now (CPU load ~80-90%), but the newly joined nodes (192.168.20.98 and 192.168.20.151) have a CPU load of ~10-20%.
It looks like the old nodes are loaded because of old data which could be cleaned up.
Each node has 61 GB RAM and 8 CPU cores. The heap size is 30 GB.
So, my questions are:
Is it possible to speed up the cleanup process?
Is the CPU load related to the old, unused data (which the node no longer owns) on those nodes?
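A sketch of the knobs that usually matter here, assuming a hypothetical keyspace name my_keyspace: cleanup goes through the compaction machinery, so the compaction throughput throttle and (in 3.11, if I recall the flag correctly) the number of parallel cleanup jobs are what typically limit it.
$ nodetool getcompactionthroughput       # current throttle in MB/s
$ nodetool setcompactionthroughput 0     # 0 disables throttling for compaction-type operations, cleanup included
$ nodetool cleanup -j 2 my_keyspace      # -j/--jobs lets cleanup work on several tables at once
$ nodetool compactionstats               # cleanup progress appears here alongside normal compactions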

Third Cassandra node has different load

We had a Cassandra cluster with 2 nodes in the same datacenter with a keyspace replication factor of 2 for the keyspace "newts". If I ran nodetool status, I could see that the load was roughly the same between the two nodes, with each node owning 100%.
I went ahead and added a third node, and I can see all three nodes in the nodetool status output. I increased the replication factor to three since I now have three nodes and ran "nodetool repair" on the third node. However, when I now run nodetool status I can see that the load between the three nodes differs, but each node owns 100%. How can this be, and is there something I'm missing here?
nodetool -u cassandra -pw cassandra status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.6 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.51 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 5.84 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
nodetool status newts output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.85 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.75 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 6.17 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
Since you added a node (there are now three nodes) and increased your replication factor to three, each node will have a copy of your data and so own 100% of it.
The differing "Load" values can result from not running nodetool cleanup on the two old nodes after adding your third node: old data in their SSTables won't be removed when the node is added (only later, after a cleanup and/or compaction):
Load - updates every 90 seconds. The amount of file system data under the cassandra data directory after excluding all content in the snapshots subdirectories. Because all SSTable data files are included, any data that is not cleaned up (such as TTL-expired cells or tombstoned data) is counted.
(from https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsStatus.html)
Just run nodetool repair on all 3 nodes, then run nodetool cleanup one by one on the existing nodes and restart the nodes one after another; that seems to work.
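A rough sketch of that sequence, using the keyspace newts and the node addresses from the status output above (add -u/-pw as in your status command if JMX authentication is enabled):
nodetool repair newts                      # run on each of the three nodes, one at a time
nodetool -h 84.19.159.93 cleanup newts     # then clean up the two original nodes, one by one
nodetool -h 84.19.159.94 cleanup newts
nodetool status newts                      # Load should converge once cleanup and compaction finish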

Cassandra multi-region settings/optimization

I configured two DCs with replication in two regions (NCSA and EMEA) using JanusGraph (Gremlin/Cassandra/Elasticsearch). The replication works well and everything, but the performance is not that great.
I get times of around 250 ms just for a read on a node in NCSA (vs. 30 ms when I have only 1 DC / 1 node), and for a write it is around 800 ms.
I tried to modify some configuration:
storage.cassandra.replication-factor
storage.cassandra.read-consistency-level
storage.cassandra.write-consistency-level
Are there any other settings/configurations that I could modify in order to get better performance for a multi-region setup, or is that kind of performance expected with JanusGraph/Cassandra?
Thanks
The lowest times I was able to get were with:
storage.replication-strategy-class=org.apache.cassandra.locator.NetworkTopologyStrategy
storage.cassandra.replication-factor=6
storage.cassandra.read-consistency-level=ONE
storage.cassandra.write-consistency-level=ONE
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.130.xxx.xxx 184.02 KB 256 100.0% 7c4c23f4-0112-4023-8af1-81a179f68973 RAC2
UN 10.130.xxx.xxx 540.67 KB 256 100.0% 193f0814-649f-4450-8b2e-85344f2c3cf2 RAC3
UN 10.130.xxx.xxx 187.47 KB 256 100.0% fbbc42d6-a061-4604-935e-dbe1155d4017 RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.30.xxx.xxx 93.3 KB 256 100.0% e7221808-ccb4-414a-b5b6-6e578ecb6f25 RAC3
UN 10.30.xxx.xxx 287.62 KB 256 100.0% ca868262-4b5d-44d6-80f9-25439f8d2611 RAC2
UN 10.30.xxx.xxx 282.27 KB 256 100.0% 82d0f75d-635c-4016-84ca-ef9d1afda066 RAC1
JanusGraph comes with different cache levels; activating some of them may help.
Regarding the consistency level, in a multi-DC configuration the LOCAL_xxx values will provide better performance, but for safety I would also set the name of the local or closest Cassandra datacenter (configuration parameter: storage.cassandra.astyanax.local-datacenter).
Are you able to say where the time is spent (in the Cassandra layer or in the JanusGraph layer)? To see Cassandra's response time, you can run nodetool proxyhistograms, which shows the full request latency recorded by the coordinator.
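A sketch of how those suggestions might look in the JanusGraph configuration, assuming the local datacenter is named DC1 and using the property names already mentioned above (LOCAL_ONE and LOCAL_QUORUM are standard Cassandra consistency levels; pick what matches your durability needs):
storage.cassandra.read-consistency-level=LOCAL_ONE
storage.cassandra.write-consistency-level=LOCAL_QUORUM
storage.cassandra.astyanax.local-datacenter=DC1
Then, to see where the time goes on the Cassandra side, run the following on a coordinator node:
nodetool proxyhistograms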

Completely unbalanced DC after bootstrapping new node

I've just added a new node to my Cassandra DC. Previously, my topology was as follows:
DC Cassandra: 1 node
DC Solr: 5 nodes
When I bootstrapped a 2nd node for the Cassandra DC, I noticed that the total bytes to be streamed was almost as big as the load of the existing node (916 GB to stream; the load of the existing Cassandra node is 956 GB). Nevertheless, I allowed the bootstrap to proceed. It completed a few hours ago and now my fear is confirmed: the Cassandra DC is completely unbalanced.
Nodetool status shows the following:
Datacenter: Solr
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN solr node4 322.9 GB 40.3% 30f411c3-7419-4786-97ad-395dfc379b40 -8998044611302986942 rack1
UN solr node3 233.16 GB 39.7% c7db42c6-c5ae-439e-ab8d-c04b200fffc5 -9145710677669796544 rack1
UN solr node5 252.42 GB 41.6% 2d3dfa16-a294-48cc-ae3e-d4b99fbc947c -9004172260145053237 rack1
UN solr node2 245.97 GB 40.5% 7dbbcc88-aabc-4cf4-a942-08e1aa325300 -9176431489687825236 rack1
UN solr node1 402.33 GB 38.0% 12976524-b834-473e-9bcc-5f9be74a5d2d -9197342581446818188 rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN cs node2 705.58 GB 99.4% fa55e0bb-e460-4dc1-ac7a-f71dd00f5380 -9114885310887105386 rack1
UN cs node1 1013.52 GB 0.6% 6ab7062e-47fe-45f7-98e8-3ee8e1f742a4 -3083852333946106000 rack1
Notice the 'Owns' column in the Cassandra DC: node2 owns 99.4% while node1 owns 0.6% (despite node2 having a smaller 'Load' than node1). I expected them to own 50% each, but this is what I got. I don't know what caused this. What I can remember is that I was running a full repair on Solr node1 when I started the bootstrap of the new node. The repair is still running as of this moment (I think it actually restarted when the new node finished bootstrapping).
How do I fix this? (repair?)
Is it safe to bulk-load new data while the Cassandra DC is in this state?
Some additional info:
DSE 4.0.3 (Cassandra 2.0.7)
NetworkTopologyStrategy
RF1 in Cassandra DC; RF2 in Solr DC
DC auto-assigned by DSE
Vnodes enabled
The config of the new node is modeled after the config of the existing node, so it is more or less correct
EDIT:
It turns out that I can't run cleanup on cs-node1 either. I'm getting the following exception:
Exception in thread "main" java.lang.AssertionError: [SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-18509-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-18512-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38320-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38325-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38329-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38322-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38330-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38331-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38321-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38323-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38344-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38345-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38349-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38348-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38346-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-13913-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-13915-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38389-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-39845-Data.db'), SSTableReader(path='/home/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-38390-Data.db')]
at org.apache.cassandra.db.ColumnFamilyStore$13.call(ColumnFamilyStore.java:2115)
at org.apache.cassandra.db.ColumnFamilyStore$13.call(ColumnFamilyStore.java:2112)
at org.apache.cassandra.db.ColumnFamilyStore.runWithCompactionsDisabled(ColumnFamilyStore.java:2094)
at org.apache.cassandra.db.ColumnFamilyStore.markAllCompacting(ColumnFamilyStore.java:2125)
at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:214)
at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:265)
at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1105)
at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
EDIT:
Nodetool status output (without keyspace)
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Solr
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN solr node4 323.78 GB 17.1% 30f411c3-7419-4786-97ad-395dfc379b40 -8998044611302986942 rack1
UN solr node3 236.69 GB 17.3% c7db42c6-c5ae-439e-ab8d-c04b200fffc5 -9145710677669796544 rack1
UN solr node5 256.06 GB 16.2% 2d3dfa16-a294-48cc-ae3e-d4b99fbc947c -9004172260145053237 rack1
UN solr node2 246.59 GB 18.3% 7dbbcc88-aabc-4cf4-a942-08e1aa325300 -9176431489687825236 rack1
UN solr node1 411.25 GB 13.9% 12976524-b834-473e-9bcc-5f9be74a5d2d -9197342581446818188 rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN cs node2 709.64 GB 17.2% fa55e0bb-e460-4dc1-ac7a-f71dd00f5380 -9114885310887105386 rack1
UN cs node1 1003.71 GB 0.1% 6ab7062e-47fe-45f7-98e8-3ee8e1f742a4 -3083852333946106000 rack1
Cassandra yaml from node1: https://www.dropbox.com/s/ptgzp5lfmdaeq8d/cassandra.yaml (only difference with node2 is listen_address and commitlog_directory)
Regarding CASSANDRA-6774, it's a bit different because I didn't stop a previous cleanup. Although I think I've taken a wrong route now by starting a scrub (still in progress) instead of restarting the node first, as their suggested workaround does.
UPDATE (2014/04/19):
nodetool cleanup still fails with an assertion error after doing the following:
Full scrub of the keyspace
Full cluster restart
I'm now doing a full repair of the keyspace in cs-node1
UPDATE (2014/04/20):
Any attempt to repair the main keyspace in cs-node1 fails with:
Lost notification. You should check server log for repair status of keyspace
I also saw this just now (output of dsetool ring)
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Workload Status State Load Owns VNodes
solr-node1 Solr rack1 Search Up Normal 447 GB 13.86% 256
solr-node2 Solr rack1 Search Up Normal 267.52 GB 18.30% 256
solr-node3 Solr rack1 Search Up Normal 262.16 GB 17.29% 256
cs-node2 Cassandra rack1 Cassandra Up Normal 808.61 GB 17.21% 256
solr-node5 Solr rack1 Search Up Normal 296.14 GB 16.21% 256
solr-node4 Solr rack1 Search Up Normal 340.53 GB 17.07% 256
cs-node1 Cassandra rack1 Cassandra Up Normal 896.68 GB 0.06% 256
Warning: Node cs-node2 is serving 270.56 times the token space of node cs-node1, which means it will be using 270.56 times more disk space and network bandwidth. If this is unintentional, check out http://wiki.apache.org/cassandra/Operations#Ring_management
Warning: Node solr-node2 is serving 1.32 times the token space of node solr-node1, which means it will be using 1.32 times more disk space and network bandwidth. If this is unintentional, check out http://wiki.apache.org/cassandra/Operations#Ring_management
Keyspace-aware:
Address DC Rack Workload Status State Load Effective-Ownership VNodes
solr-node1 Solr rack1 Search Up Normal 447 GB 38.00% 256
solr-node2 Solr rack1 Search Up Normal 267.52 GB 40.47% 256
solr-node3 Solr rack1 Search Up Normal 262.16 GB 39.66% 256
cs-node2 Cassandra rack1 Cassandra Up Normal 808.61 GB 99.39% 256
solr-node5 Solr rack1 Search Up Normal 296.14 GB 41.59% 256
solr-node4 Solr rack1 Search Up Normal 340.53 GB 40.28% 256
cs-node1 Cassandra rack1 Cassandra Up Normal 896.68 GB 0.61% 256
Warning: Node cs-node2 is serving 162.99 times the token space of node cs-node1, which means it will be using 162.99 times more disk space and network bandwidth. If this is unintentional, check out http://wiki.apache.org/cassandra/Operations#Ring_management
This is a strong indicator that something is wrong with the way cs-node2 bootstrapped (as I described at the beginning of my post).
It looks like your issue is that you most likely switched from single tokens to vnodes on your existing nodes, so all of their tokens are in a row. This is actually not possible to do in current Cassandra versions because it was too hard to get right.
The only real way to fix it and be able to add a new node is to decommission the first new node you added, then follow the current documentation on switching from single tokens to vnodes, which is basically that you need to make brand-new datacenters with brand-new vnode-enabled nodes in them and then decommission the existing nodes.
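A sketch of that first step, assuming cs-node2 (the node that was just bootstrapped) is the one to take back out. nodetool decommission is run on the node itself and streams its ranges away before the node leaves the ring:
nodetool decommission     # run on cs-node2 itself
nodetool netstats         # watch the outbound streaming progress
nodetool status           # cs-node2 should eventually drop out of the ring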

Cassandra: Removing a node

I would like to remove a node from my Cassandra cluster, and I am following these two related questions (here and here) and the Cassandra documentation, but I am still not quite sure of the exact process.
My first question is: Is the following way to remove a node from a Cassandra cluster correct?
1. Decommission the node that I would like to remove.
2. removetoken the node that I just decommissioned.
If the above process is right, then how can I tell when the decommission process is completed so that I can proceed to the second step? Or is it always safe to do step 2 right after step 1?
In addition, Cassandra document says:
You can take a node out of the cluster with nodetool decommission to a
live node, or nodetool removetoken (to any other machine) to remove a
dead one. This will assign the ranges the old node was responsible for
to other nodes, and replicate the appropriate data there. If
decommission is used, the data will stream from the decommissioned
node. If removetoken is used, the data will stream from the remaining
replicas.
No data is removed automatically from the node being decommissioned,
so if you want to put the node back into service at a different token
on the ring, it should be removed manually.
Does this mean a decommissioned node is a dead node? In addition, since no data is removed automatically from the node being decommissioned, how can I tell when it is safe to remove the data from the decommissioned node (i.e., how do I know when the data streaming is completed)?
Removing a node from a Cassandra cluster consists of the following steps (in Cassandra v1.2.8):
1. Decommission the target node with nodetool decommission.
2. Once the data streaming from the decommissioned node is completed, manually delete the data on the decommissioned node (optional).
From the docs:
nodetool decommission - Decommission the *node I am connecting to*
Update: The above process also works for seed nodes. In such a case, the cluster is still able to run smoothly without requiring a restart. When you do need to restart the cluster for other reasons, be sure to update the seeds parameter specified in cassandra.yaml on all nodes.
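Concretely, that means editing the seed_provider section of cassandra.yaml on every node so that it no longer lists the decommissioned node. A sketch with placeholder addresses:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"   # remaining seed nodes only (placeholder IPs)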
Decommission the target node
When the decommission starts, the decommissioned node will first be labeled as leaving (marked as L). In the following example, we will remove node-76:
> nodetool -host node-76 decommission
> nodetool status
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN node-70 9.79 GB 256 8.3% e0a7fb7a-06f8-4f8b-882d-c60bff51328a 155
UN node-80 8.9 GB 256 9.2% 43dfc22e-b838-4b0b-9b20-66a048f73d5f 155
UN node-72 9.47 GB 256 9.2% 75ebf2a9-e83c-4206-9814-3685e5fa0ab5 155
UN node-71 9.48 GB 256 9.5% cdbfafef-4bfb-4b11-9fb8-27757b0caa47 155
UN node-91 8.05 GB 256 8.4% 6711f8a7-d398-4f93-bd73-47c8325746c3 155
UN node-78 9.11 GB 256 9.4% c82ace5f-9b90-4f5c-9d86-0fbfb7ac2911 155
UL node-76 8.36 GB 256 9.5% 15d74e9e-2791-4056-a341-c02f6614b8ae 155
UN node-73 9.36 GB 256 8.9% c1dfab95-d476-4274-acac-cf6630375566 155
UN node-75 8.93 GB 256 8.2% 8789d89d-2db8-4ddf-bc2d-60ba5edfd0ad 155
UN node-74 8.91 GB 256 9.6% 581fd5bc-20d2-4528-b15d-7475eb2bf5af 155
UN node-79 9.71 GB 256 9.9% 8e192e01-e8eb-4425-9c18-60279b9046ff 155
While a decommissioned node is marked as leaving, it is streaming data to the other live nodes. Once the streaming is completed, the node will no longer appear in the ring, and the data owned by the other nodes will increase:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN node-70 9.79 GB 256 9.3% e0a7fb7a-06f8-4f8b-882d-c60bff51328a 155
UN node-80 8.92 GB 256 9.6% 43dfc22e-b838-4b0b-9b20-66a048f73d5f 155
UN node-72 9.47 GB 256 10.2% 75ebf2a9-e83c-4206-9814-3685e5fa0ab5 155
UN node-71 9.69 GB 256 10.6% cdbfafef-4bfb-4b11-9fb8-27757b0caa47 155
UN node-91 8.05 GB 256 9.1% 6711f8a7-d398-4f93-bd73-47c8325746c3 155
UN node-78 9.11 GB 256 10.5% c82ace5f-9b90-4f5c-9d86-0fbfb7ac2911 155
UN node-73 9.36 GB 256 9.7% c1dfab95-d476-4274-acac-cf6630375566 155
UN node-75 9.01 GB 256 9.5% 8789d89d-2db8-4ddf-bc2d-60ba5edfd0ad 155
UN node-74 8.91 GB 256 10.5% 581fd5bc-20d2-4528-b15d-7475eb2bf5af 155
UN node-79 9.71 GB 256 11.0% 8e192e01-e8eb-4425-9c18-60279b9046ff 155
Removing the remaining data manually
Once the streaming is completed, the data stored on the decommissioned node can be removed manually, as described in the Cassandra documentation:
No data is removed automatically from the node being decommissioned,
so if you want to put the node back into service at a different token
on the ring, it should be removed manually.
This can be done by removing the data stored in the data_file_directories, commitlog_directory, and saved_caches_directory specified in the cassandra.yaml file in the decommissioned node.
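For example, on a packaged install with the default directory layout this would be something like the following (a sketch; verify the actual paths configured in your cassandra.yaml, and note that the service name can differ between installs):
> sudo service cassandra stop
> rm -rf /var/lib/cassandra/data/*
> rm -rf /var/lib/cassandra/commitlog/*
> rm -rf /var/lib/cassandra/saved_caches/*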
