Cassandra version: 3.11
I already enabled cdc in cassandra.yaml:
cdc_enabled: true
cdc_raw_directory: /var/lib/cassandra/data/cdc_raw
And enabled the table as well:
cqlsh> describe cycling.cyclist_name;
CREATE TABLE cycling.cyclist_name (
id uuid PRIMARY KEY,
firstname text,
lastname text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = true <<<<<<<<<
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
After restart, Cassandra created the cdc_raw directory:
root#docker-desktop:/var/lib/cassandra# ls -la /var/lib/cassandra/data/
total 36
drwxr-xr-x 9 cassandra cassandra 4096 Jan 28 10:04 .
drwxrwxrwx 6 cassandra cassandra 4096 Jan 28 09:48 ..
drwxr-xr-x 2 cassandra cassandra 4096 Jan 28 09:48 cdc_raw
drwxr-xr-x 3 cassandra cassandra 4096 Jan 28 10:04 cycling
drwxr-xr-x 26 cassandra cassandra 4096 Jan 28 09:48 system
drwxr-xr-x 6 cassandra cassandra 4096 Jan 28 09:48 system_auth
drwxr-xr-x 5 cassandra cassandra 4096 Jan 28 09:48 system_distributed
drwxr-xr-x 12 cassandra cassandra 4096 Jan 28 09:48 system_schema
drwxr-xr-x 4 cassandra cassandra 4096 Jan 28 09:48 system_traces
I executed a little Python script to insert some data in the fresh table:
>>> for lp in range(50000):
... session.execute("INSERT INTO cycling.cyclist_name (lastname, firstname, id) VALUES (%s, %s, %s)", ["RATTO_BULK", "Rissella", uuid.uuid4()])
...
But even after this 50,000-insert script, the cdc_raw directory is still empty. Can someone explain how CDC works in Cassandra?
root#docker-desktop:/var/lib/cassandra# ls -la /var/lib/cassandra/data/cdc_raw/
total 8
drwxr-xr-x 2 cassandra cassandra 4096 Jan 28 09:48 .
drwxr-xr-x 9 cassandra cassandra 4096 Jan 28 10:04 ..
In Cassandra 3.11, commit log segments are copied into the cdc_raw directory only when the data in the memtable is flushed to disk. That happens when the memtable limit or the commit log limit is reached, or when you run nodetool flush. 50k small writes may not be enough to trigger a flush.
In Cassandra 4.0 (not yet released) the situation is slightly improved, and you can read the data sooner. If you're interested in the details, look at the presentations from DataStax Accelerate 2019; there were 2 or 3 presentations about CDC.
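To check the flush behaviour described above, you can force a flush and compare the contents of cdc_raw before and after. This is only a minimal sketch: it assumes nodetool is on the PATH of the Cassandra node and that cdc_raw lives at the path from the question's cassandra.yaml. The helper names (flush_command, list_cdc_segments, main) are made up for illustration.

```python
import os
import subprocess


def flush_command(keyspace, table):
    """Build the nodetool call that flushes one table's memtable to disk."""
    return ["nodetool", "flush", keyspace, table]


def list_cdc_segments(cdc_raw_dir):
    """Return the sorted names of the files currently in the cdc_raw directory."""
    return sorted(
        name for name in os.listdir(cdc_raw_dir)
        if os.path.isfile(os.path.join(cdc_raw_dir, name))
    )


def main():
    # Path taken from the question; adjust it to your cdc_raw_directory setting.
    cdc_raw_dir = "/var/lib/cassandra/data/cdc_raw"
    print("before flush:", list_cdc_segments(cdc_raw_dir))
    # Assumption: nodetool is installed and can reach the local node.
    subprocess.run(flush_command("cycling", "cyclist_name"), check=True)
    print("after flush:", list_cdc_segments(cdc_raw_dir))
```

Call main() on the node itself. Whether a segment shows up right after the flush may still depend on the state of the commit log, so treat an empty listing as a hint rather than proof.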
I have a WiFi adapter plugged into my computer, and I can find its ID numbers by looking through the output of lsusb.
Bus 005 Device 009: ID 1737:0071 Linksys WUSB600N v1 Dual-Band Wireless-N Network Adapter [Ralink RT2870]
This is the only WiFi adapter I have plugged in currently, so this obviously is it. I searched around in /sys/bus/usb/devices/ until I found this path on my machine:
# ls -l /sys/bus/usb/devices/5-3.1:1.0/
total 0
-rw-r--r-- 1 root root 4096 Dec 28 18:11 authorized
-r--r--r-- 1 root root 4096 Dec 28 18:11 bAlternateSetting
-r--r--r-- 1 root root 4096 Dec 28 18:06 bInterfaceClass
-r--r--r-- 1 root root 4096 Dec 28 18:06 bInterfaceNumber
-r--r--r-- 1 root root 4096 Dec 28 18:06 bInterfaceProtocol
-r--r--r-- 1 root root 4096 Dec 28 18:06 bInterfaceSubClass
-r--r--r-- 1 root root 4096 Dec 28 18:11 bNumEndpoints
lrwxrwxrwx 1 root root 0 Dec 28 18:06 driver -> ../../../../../../../../bus/usb/drivers/rt2800usb
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_01
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_02
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_03
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_04
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_05
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_06
drwxr-xr-x 3 root root 0 Dec 28 18:11 ep_81
drwxr-xr-x 3 root root 0 Dec 28 18:06 ieee80211
drwxr-xr-x 5 root root 0 Dec 28 18:06 leds
-r--r--r-- 1 root root 4096 Dec 28 18:11 modalias
drwxr-xr-x 3 root root 0 Dec 28 18:06 net
drwxr-xr-x 2 root root 0 Dec 28 18:11 power
lrwxrwxrwx 1 root root 0 Dec 28 18:06 subsystem -> ../../../../../../../../bus/usb
-r--r--r-- 1 root root 4096 Dec 28 18:11 supports_autosuspend
-rw-r--r-- 1 root root 4096 Dec 28 18:06 uevent
By looking at the driver symbolic link I see this is using the rt2800usb driver, so this has to be the correct entry for my WiFi adapter. But identifying a device by its kernel driver name is inexact and I would prefer not to do it that way. Is there a file under /sys/bus/usb/devices/5-3.1:1.0/ that can tell me the vendor ID and the product ID of the entry I am looking at?
If I run ls -h, I get a total of 126 GB, whereas du -h reports half of that: 63 GB.
It's a directory with 24 files. If I add up the individual file sizes I get a total of 126 GB. There are no symbolic links.
What's causing the difference?
ls -alh
total 126G
drwxrwxrwx 3 root root 4.0K Dec 11 12:48 .
drwxrwxrwx 3 root root 4.0K May 19 2008 ..
-rw-rw-rw- 1 root root 0 Dec 11 10:28 auto-opschoning.errtmp
-rw-rw-rw- 1 root root 11M Dec 11 12:33 auto-opschoning.logtmp
drwxrwxrwx 2 root root 4.0K Feb 19 2016 backup
-rw-rw-rw- 2 root root 9.7M Dec 11 12:48 batchkop
-rw-rw-rw- 2 root root 9.7M Dec 11 12:48 batchkop.his
-rw-rw-rw- 2 root root 9.2G Dec 11 12:48 dispudet
-rw-rw-rw- 2 root root 9.2G Dec 11 12:48 dispudet.his
-rw-rw-rw- 2 root root 1.2G Dec 11 12:48 dispukop
-rw-rw-rw- 2 root root 1.2G Dec 11 12:48 dispukop.his
-rw-rw-rw- 2 root root 765M Dec 11 12:48 loktrail
-rw-rw-rw- 2 root root 765M Dec 11 12:48 loktrail.his
-rw-rw-rw- 2 root root 19G Dec 11 12:48 orddet
-rw-rw-rw- 2 root root 19G Dec 11 12:48 orddet.his
-rw-rw-rw- 2 root root 4.1G Dec 11 12:48 orddetkl
-rw-rw-rw- 2 root root 4.1G Dec 11 12:48 orddetkl.his
-rw-rw-rw- 2 root root 977M Dec 11 12:48 ordkop
-rw-rw-rw- 2 root root 977M Dec 11 12:48 ordkop.his
-rw-rw-rw- 2 root root 12G Dec 11 12:48 trail
-rw-rw-rw- 2 root root 12G Dec 11 12:48 trail.his
-rw-rw-rw- 2 root root 5.7G Dec 11 12:48 verzdud
-rw-rw-rw- 2 root root 7.4G Dec 11 12:48 verzdudd
-rw-rw-rw- 2 root root 7.4G Dec 11 12:48 verzdudd.his
-rw-rw-rw- 2 root root 5.7G Dec 11 12:48 verzdud.his
-rw-rw-rw- 2 root root 251M Dec 11 12:48 verzduk
-rw-rw-rw- 2 root root 251M Dec 11 12:48 verzduk.his
-rw-rw-rw- 2 root root 3.5G Dec 11 12:48 voorsnap
-rw-rw-rw- 2 root root 3.5G Dec 11 12:48 voorsnap.his
du -h
4.0K ./backup
63G .
I think the difference comes from the kind of files you are measuring.
Some files are sparse files.
A sparse file is a file whose space is not fully allocated on disk (the space is allocated virtually, not physically).
They are used a lot as virtual machine storage files, and some data structures need them.
You can use dd to create a sparse file and test with it.
Check this example I just did:
h#localhost:~$ mkdir test
h#localhost:~$ cd test/
h#localhost:~/test$ dd if=/dev/zero of=file.img bs=1 count=0 seek=512M
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000214033 s, 0.0 kB/s
h#localhost:~/test$ ls -h
file.img
h#localhost:~/test$ ls -alh
total 8.0K
drwxr-xr-x 2 h h 4.0K Dec 16 14:04 .
drwxr-xr-x 3 h h 4.0K Dec 16 14:02 ..
-rw-r--r-- 1 h h 512M Dec 16 14:04 file.img
h#localhost:~/test$ du -c
4 .
4 total
h#localhost:~/test$
As the link posted in the comments says, the difference between ls -h and du -c is that du -c reports the space actually used on disk, while ls -h reports the virtually allocated (apparent) size.
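The same comparison can be made from Python, which exposes both numbers through os.stat. This is a rough sketch for Linux/Unix only (st_blocks is not available on Windows), and the function names are invented for the example: st_size is the apparent size that ls shows, while st_blocks counts the 512-byte units actually allocated, which is what du adds up.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


def apparent_and_allocated(path):
    """Return (apparent size, allocated size) in bytes for one file."""
    st = os.stat(path)
    # st_size is the length ls reports; st_blocks is in 512-byte units
    # and reflects the space really allocated on disk.
    return st.st_size, st.st_blocks * 512


def demo():
    """Create a 512 MiB sparse file in a temp directory and print both sizes."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.img")
        make_sparse_file(path, 512 * 1024 * 1024)
        apparent, allocated = apparent_and_allocated(path)
        print("apparent bytes:", apparent)
        print("allocated bytes:", allocated)
```

Calling demo() on a filesystem that supports sparse files should print an allocated size far below the apparent one; on a filesystem without sparse-file support the two may be close.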
I have Cassandra 3.7 installed in a container and managed by Kubernetes.
I created a keyspace cathy1 with replication factor 3.
Inside the Cassandra container on node1, I created the keyspace and a table as follows:
CREATE KEYSPACE cathy1 WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};
CREATE TABLE cathy1.employees(emp_id int PRIMARY KEY,emp_name text);
INSERT INTO cathy1.employees(emp_id,emp_name) VALUES (1,'cathy');
INSERT INTO cathy1.employees(emp_id,emp_name) VALUES (2,'jon');
so each node owns 100% of the data
I run a cqlsh -f list_tables on each node:
emp_id | emp_name
--------+----------
1 | cathy
2 | jon
(2 rows)
I run on node 2 :
nodetool snapshot -t mycathy1-node2 cathy1
I see a directory mycathy1-node2 under cassandra/data/cathy1/employees*/snapshots containing this:
-rw-r--r-- 1 root root 32 Oct 18 20:27 manifest.json
-rw-r--r-- 2 root root 43 Oct 18 20:22 mb-12-big-CompressionInfo.db
-rw-r--r-- 2 root root 96 Oct 18 20:22 mb-12-big-Data.db
-rw-r--r-- 2 root root 9 Oct 18 20:22 mb-12-big-Digest.crc32
-rw-r--r-- 2 root root 16 Oct 18 20:22 mb-12-big-Filter.db
-rw-r--r-- 2 root root 32 Oct 18 20:22 mb-12-big-Index.db
-rw-r--r-- 2 root root 4610 Oct 18 20:23 mb-12-big-Statistics.db
-rw-r--r-- 2 root root 56 Oct 18 20:22 mb-12-big-Summary.db
-rw-r--r-- 2 root root 92 Oct 18 20:22 mb-12-big-TOC.txt
Then I truncate the table
cqlsh -e "truncate cathy1.employees"
At that moment there are no files under cassandra/data/cathy1/employees* on any nodes
Only the snapshots directory remains
I run a cqlsh -f list_tables on each node:
emp_id | emp_name
--------+----------
(0 rows)
I run a repair on node 2:
nodetool repair cathy1
it finishes successfully
then still on node 2
cd cassandra/data/employees*
cp ./snapshots/mycathy1-node2/* .
-rw-r--r-- 1 root root 32 Oct 18 20:34 manifest.json
-rw-r--r-- 1 root root 43 Oct 18 20:34 mb-12-big-CompressionInfo.db
-rw-r--r-- 1 root root 96 Oct 18 20:34 mb-12-big-Data.db
-rw-r--r-- 1 root root 9 Oct 18 20:34 mb-12-big-Digest.crc32
-rw-r--r-- 1 root root 16 Oct 18 20:34 mb-12-big-Filter.db
-rw-r--r-- 1 root root 32 Oct 18 20:34 mb-12-big-Index.db
-rw-r--r-- 1 root root 4610 Oct 18 20:34 mb-12-big-Statistics.db
-rw-r--r-- 1 root root 56 Oct 18 20:34 mb-12-big-Summary.db
-rw-r--r-- 1 root root 92 Oct 18 20:34 mb-12-big-TOC.txt
drwxr-xr-x 16 root root 4096 Oct 18 20:29 snapshots
Then I run nodetool refresh employees
I run a cqlsh -f list_tables on each node:
emp_id | emp_name
--------+----------
(0 rows)
I run nodetool repair cathy1
and there is still no data visible !!!!!
Pending Flushes: 0
Table: employees
Space used (live): 4954
Space used (total): 4954
Space used by snapshots (total): 59873
Off heap memory used (total): 32
SSTable Compression Ratio: 0.75
Number of keys (estimate): 4
Even though the statistics say there are 4 keys in table cathy1.employees
nodetool flush cathy1
still no data visible with cqlsh
Why is that ?
You need to run sstableloader on the directory where you've copied your snapshot files.
sstableloader -d <node_ip> -u cassandra -pw cassandra <directory_location>
Note: If your current directory is the directory where you copied your snapshot files, then you don't need to put anything in the directory_location field.
For more details about sstableloader: https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html
Problem:
When I copy the whole disk holding my virtual machines onto an empty disk with rsync --sparse, the disk images (qcow2 files) on the new disk are bigger than the original files.
Old Disk:
/dev/sda1 => /ssdstor
New Disk:
/dev/sdb1 => /new
Details:
Hardware:
2x SSD Crucial M500 960GB Firmware MU5
OS: Proxmox 3.4
Filesystem: XFS
Command:
rsync -axHv --force --progress --stats --sparse /ssdstor/ /new/
Rsync Version:
dpkg -l | grep rsync
ii rsync 993.1.1-1 amd64 fast, versatile, remote (and local) file-copying tool
File / disk comparison after the first copy
(to check everything was transferred correctly)
rsync -axHv --dry-run --force --progress --stats --sparse /ssdstor/ /new/
sending incremental file list
Number of files: 90,545 (reg: 70,269, dir: 9,395, link: 10,817, dev: 4, special: 60)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 634,456,255,674 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 65,536
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 2,097,654
Total bytes received: 9,993
sent 2,097,654 bytes received 9,993 bytes 1,405,098.00 bytes/sec
total size is 634,456,255,674 speedup is 301,025.86 (DRY RUN)
mount | egrep '(sda|sdb)'
/dev/sda1 on /ssdstor type xfs (rw,noatime,nodiratime,attr2,inode64,noquota)
/dev/sdb1 on /new type xfs (rw,noatime,nodiratime,attr2,inode64,noquota)
df -h | egrep '(sda|sdb)'
/dev/sda1 894G 388G 506G 44% /ssdstor
/dev/sdb1 894G 430G 465G 49% /new
ls -alshR /ssdstor | grep qcow2
77G -rw-r--r-- 1 root root 103G Jul 14 09:09 vm-100-disk-1.qcow2
6,2G -rw-r--r-- 1 root root 14G Jul 14 09:07 vm-101-disk-1.qcow2
2,0G -rw-r--r-- 1 root root 4,1G Jul 14 09:07 vm-101-disk-2.qcow2
17G -rw-r--r-- 1 root root 61G Feb 18 09:10 vm-102-disk-1.qcow2
40G -rw-r--r-- 1 root root 78G Jul 14 09:06 vm-103-disk-1.qcow2
40G -rw-r--r-- 1 root root 41G Jul 14 09:05 vm-103-disk-2.qcow2
31G -rw-r--r-- 1 root root 44G Jul 14 09:05 vm-104-disk-1.qcow2
5,2G -rw-r--r-- 1 root root 41G Mai 1 01:00 vm-105-disk-2.qcow2
63G -rw-r--r-- 1 root root 65G Jul 14 10:04 vm-106-disk-1.qcow2
26G -rw-r--r-- 1 root root 65G Jul 14 09:14 vm-107-disk-2.qcow2
51G -rw-r--r-- 1 root root 51G Mai 19 21:21 vm-108-disk-1.qcow2
ls -alshR /new | grep qcow2
79G -rw-r--r-- 1 root root 103G Jul 14 09:09 vm-100-disk-1.qcow2
6,2G -rw-r--r-- 1 root root 14G Jul 14 09:07 vm-101-disk-1.qcow2
2,0G -rw-r--r-- 1 root root 4,1G Jul 14 09:07 vm-101-disk-2.qcow2
17G -rw-r--r-- 1 root root 61G Feb 18 09:10 vm-102-disk-1.qcow2
40G -rw-r--r-- 1 root root 78G Jul 14 09:06 vm-103-disk-1.qcow2
41G -rw-r--r-- 1 root root 41G Jul 14 09:05 vm-103-disk-2.qcow2
37G -rw-r--r-- 1 root root 44G Jul 14 09:05 vm-104-disk-1.qcow2
34G -rw-r--r-- 1 root root 41G Mai 1 01:00 vm-105-disk-2.qcow2
63G -rw-r--r-- 1 root root 65G Jul 14 10:04 vm-106-disk-1.qcow2
33G -rw-r--r-- 1 root root 65G Jul 14 09:14 vm-107-disk-2.qcow2
51G -rw-r--r-- 1 root root 51G Mai 19 21:21 vm-108-disk-1.qcow2
Has anyone an idea?
More Tests:
cp --sparse=always vm-105-disk-2.qcow2 vm-105-disk-2.qcow2.new
5,2G -rw-r--r-- 1 root root 41G Jul 16 08:07 vm-105-disk-2.qcow2
34G -rw-r--r-- 1 root root 41G Jul 16 11:51 vm-105-disk-2.qcow2.new
I just stored a 3.1 GB CSV via the Spark-Cassandra-Connector to a table in a Cassandra cluster (5 nodes, 30 GB each, 7.5 GB RAM per instance, of which Cassandra uses ~1.8 GB).
I just saw via DataOpsCenter that my cluster holds 16 GB of data (each node ~3.x GB) and that my storage usage has grown from 14 GB (before) to 64 GB (after the writing process)!
My keyspace has the following settings:
replica_placement_strategy org.apache.cassandra.locator.SimpleStrategy
replication_factor 2
CREATE TABLE debs.energydata10m (
id int PRIMARY KEY,
house_id int,
household_id int,
plug_id int,
ts timestamp,
type int,
val float
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.000000 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Why does Cassandra need that much storage for this 3.1 GB CSV?
Edit: Here is the output of the ls -lR /var/lib/cassandra/data/debs/ command:
ubuntu#ip-xx-xx-xx-xx:~$ ls -lR /var/lib/cassandra/data/debs/
/var/lib/cassandra/data/debs/:
total 24
drwxr-xr-x 2 cassandra cassandra 6 Jun 16 12:43 energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 16384 Jun 17 13:39 energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 6 Jun 17 08:41 energydata10m-46487f90142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 4096 Jun 17 10:58 energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3
drwxr-xr-x 3 cassandra cassandra 22 Jun 17 10:07 energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 6 Jun 16 12:40 energydata-d615ace0141d11e5b5ddabd6d8b6d1d3
/var/lib/cassandra/data/debs/energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3:
total 0
/var/lib/cassandra/data/debs/energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3:
total 3294336
-rw-r--r-- 1 cassandra cassandra 361779 Jun 17 12:36 debs-energydata100m-ka-187-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 943405306 Jun 17 12:36 debs-energydata100m-ka-187-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 12:36 debs-energydata100m-ka-187-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 17615016 Jun 17 12:36 debs-energydata100m-ka-187-Filter.db
-rw-r--r-- 1 cassandra cassandra 254001924 Jun 17 12:36 debs-energydata100m-ka-187-Index.db
-rw-r--r-- 1 cassandra cassandra 9911 Jun 17 12:36 debs-energydata100m-ka-187-Statistics.db
-rw-r--r-- 1 cassandra cassandra 1763968 Jun 17 12:36 debs-energydata100m-ka-187-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 12:36 debs-energydata100m-ka-187-TOC.txt
-rw-r--r-- 1 cassandra cassandra 46747 Jun 17 12:25 debs-energydata100m-ka-211-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 120719760 Jun 17 12:25 debs-energydata100m-ka-211-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 12:25 debs-energydata100m-ka-211-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 2266552 Jun 17 12:25 debs-energydata100m-ka-211-Filter.db
-rw-r--r-- 1 cassandra cassandra 32799168 Jun 17 12:25 debs-energydata100m-ka-211-Index.db
-rw-r--r-- 1 cassandra cassandra 9955 Jun 17 12:25 debs-energydata100m-ka-211-Statistics.db
-rw-r--r-- 1 cassandra cassandra 227840 Jun 17 12:25 debs-energydata100m-ka-211-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 12:25 debs-energydata100m-ka-211-TOC.txt
-rw-r--r-- 1 cassandra cassandra 400275 Jun 17 13:39 debs-energydata100m-ka-353-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1053658168 Jun 17 13:39 debs-energydata100m-ka-353-Data.db
-rw-r--r-- 1 cassandra cassandra 9 Jun 17 13:39 debs-energydata100m-ka-353-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 19254504 Jun 17 13:39 debs-energydata100m-ka-353-Filter.db
-rw-r--r-- 1 cassandra cassandra 281034756 Jun 17 13:39 debs-energydata100m-ka-353-Index.db
-rw-r--r-- 1 cassandra cassandra 9911 Jun 17 13:39 debs-energydata100m-ka-353-Statistics.db
-rw-r--r-- 1 cassandra cassandra 1951696 Jun 17 13:39 debs-energydata100m-ka-353-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:39 debs-energydata100m-ka-353-TOC.txt
-rw-r--r-- 1 cassandra cassandra 106147 Jun 17 13:32 debs-energydata100m-ka-377-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 275239666 Jun 17 13:32 debs-energydata100m-ka-377-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:32 debs-energydata100m-ka-377-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 5209632 Jun 17 13:32 debs-energydata100m-ka-377-Filter.db
-rw-r--r-- 1 cassandra cassandra 74503386 Jun 17 13:32 debs-energydata100m-ka-377-Index.db
-rw-r--r-- 1 cassandra cassandra 9935 Jun 17 13:32 debs-energydata100m-ka-377-Statistics.db
-rw-r--r-- 1 cassandra cassandra 517456 Jun 17 13:32 debs-energydata100m-ka-377-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:32 debs-energydata100m-ka-377-TOC.txt
-rw-r--r-- 1 cassandra cassandra 63267 Jun 17 13:36 debs-energydata100m-ka-392-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 163610575 Jun 17 13:36 debs-energydata100m-ka-392-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:36 debs-energydata100m-ka-392-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 3146928 Jun 17 13:36 debs-energydata100m-ka-392-Filter.db
-rw-r--r-- 1 cassandra cassandra 44398512 Jun 17 13:36 debs-energydata100m-ka-392-Index.db
-rw-r--r-- 1 cassandra cassandra 9971 Jun 17 13:36 debs-energydata100m-ka-392-Statistics.db
-rw-r--r-- 1 cassandra cassandra 308400 Jun 17 13:36 debs-energydata100m-ka-392-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:36 debs-energydata100m-ka-392-TOC.txt
-rw-r--r-- 1 cassandra cassandra 16475 Jun 17 13:37 debs-energydata100m-ka-398-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 42447012 Jun 17 13:37 debs-energydata100m-ka-398-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:37 debs-energydata100m-ka-398-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 819112 Jun 17 13:37 debs-energydata100m-ka-398-Filter.db
-rw-r--r-- 1 cassandra cassandra 11540160 Jun 17 13:37 debs-energydata100m-ka-398-Index.db
-rw-r--r-- 1 cassandra cassandra 9915 Jun 17 13:37 debs-energydata100m-ka-398-Statistics.db
-rw-r--r-- 1 cassandra cassandra 80208 Jun 17 13:37 debs-energydata100m-ka-398-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:37 debs-energydata100m-ka-398-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3307 Jun 17 13:37 debs-energydata100m-ka-399-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8375321 Jun 17 13:37 debs-energydata100m-ka-399-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:37 debs-energydata100m-ka-399-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159248 Jun 17 13:37 debs-energydata100m-ka-399-Filter.db
-rw-r--r-- 1 cassandra cassandra 2292966 Jun 17 13:37 debs-energydata100m-ka-399-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 13:37 debs-energydata100m-ka-399-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16000 Jun 17 13:37 debs-energydata100m-ka-399-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:37 debs-energydata100m-ka-399-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3299 Jun 17 13:39 debs-energydata100m-ka-400-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8332947 Jun 17 13:39 debs-energydata100m-ka-400-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 13:39 debs-energydata100m-ka-400-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159088 Jun 17 13:39 debs-energydata100m-ka-400-Filter.db
-rw-r--r-- 1 cassandra cassandra 2290716 Jun 17 13:39 debs-energydata100m-ka-400-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 13:39 debs-energydata100m-ka-400-Statistics.db
-rw-r--r-- 1 cassandra cassandra 15984 Jun 17 13:39 debs-energydata100m-ka-400-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 13:39 debs-energydata100m-ka-400-TOC.txt
/var/lib/cassandra/data/debs/energydata10m-46487f90142511e5b5ddabd6d8b6d1d3:
total 0
/var/lib/cassandra/data/debs/energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3:
total 326684
-rw-r--r-- 1 cassandra cassandra 95051 Jun 17 10:30 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 245687780 Jun 17 10:30 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 10:30 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 4617168 Jun 17 10:30 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra 66716856 Jun 17 10:30 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra 9923 Jun 17 10:30 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra 463376 Jun 17 10:30 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:30 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3379 Jun 17 10:28 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8505046 Jun 17 10:28 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra 9 Jun 17 10:28 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 162984 Jun 17 10:28 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra 2346732 Jun 17 10:28 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 10:28 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16368 Jun 17 10:28 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:28 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra 1811 Jun 17 10:58 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 4475513 Jun 17 10:58 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 10:58 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 86392 Jun 17 10:58 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra 1243818 Jun 17 10:58 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 10:58 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra 8704 Jun 17 10:58 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:58 debs-energydata10m-ka-39-TOC.txt
/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3:
total 0
drwxr-xr-x 3 cassandra cassandra 40 Jun 17 10:07 snapshots
/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots:
total 4
drwxr-xr-x 2 cassandra cassandra 4096 Jun 17 10:07 1434535647574-energydata10m
/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots/1434535647574-energydata10m:
total 326784
-rw-r--r-- 1 cassandra cassandra 92923 Jun 17 09:15 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 240323836 Jun 17 09:15 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 09:15 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 4520064 Jun 17 09:15 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra 65218608 Jun 17 09:15 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra 9919 Jun 17 09:15 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra 452976 Jun 17 09:15 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 09:15 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3307 Jun 17 09:14 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8321541 Jun 17 09:14 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 09:14 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159384 Jun 17 09:14 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra 2294964 Jun 17 09:14 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 09:14 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16016 Jun 17 09:14 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 09:14 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra 3307 Jun 17 09:15 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 8316992 Jun 17 09:15 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 09:15 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 159296 Jun 17 09:15 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra 2293614 Jun 17 09:15 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 09:15 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra 16000 Jun 17 09:15 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 09:15 debs-energydata10m-ka-39-TOC.txt
-rw-r--r-- 1 cassandra cassandra 755 Jun 17 10:07 debs-energydata10m-ka-40-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1781300 Jun 17 10:07 debs-energydata10m-ka-40-Data.db
-rw-r--r-- 1 cassandra cassandra 10 Jun 17 10:07 debs-energydata10m-ka-40-Digest.sha1
-rw-r--r-- 1 cassandra cassandra 34752 Jun 17 10:07 debs-energydata10m-ka-40-Filter.db
-rw-r--r-- 1 cassandra cassandra 500220 Jun 17 10:07 debs-energydata10m-ka-40-Index.db
-rw-r--r-- 1 cassandra cassandra 9895 Jun 17 10:07 debs-energydata10m-ka-40-Statistics.db
-rw-r--r-- 1 cassandra cassandra 3552 Jun 17 10:07 debs-energydata10m-ka-40-Summary.db
-rw-r--r-- 1 cassandra cassandra 91 Jun 17 10:07 debs-energydata10m-ka-40-TOC.txt
-rw-r--r-- 1 cassandra cassandra 152 Jun 17 10:07 manifest.json
/var/lib/cassandra/data/debs/energydata-d615ace0141d11e5b5ddabd6d8b6d1d3:
total 0
Information: The data of energydata10m and energydata1000m already existed before the writing process of energydata100m (the 14 GB of disk space before launching)!
************** EDIT **************
I found calculation formulas here: http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html They say that the data on disk can be much higher than the original dataset. Can someone explain how to calculate the values of the link above? I don't know about the needed data-sizes...
The following documentation explains the data sizes and their calculation:
http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html
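Before working through the formulas, it may help to see which directories actually hold the bytes, since the listing in the question contains several table directories and at least one snapshots directory. The following is a small sketch, not an official tool: usage_by_table is a made-up helper that walks a keyspace directory and sums apparent file sizes per table directory, keeping anything under snapshots separate from the rest.

```python
import os


def usage_by_table(keyspace_dir):
    """Sum file sizes per table directory, split into live data and snapshots."""
    totals = {}
    for table in sorted(os.listdir(keyspace_dir)):
        table_dir = os.path.join(keyspace_dir, table)
        if not os.path.isdir(table_dir):
            continue
        totals[table] = {"live": 0, "snapshots": 0}
        for root, _dirs, files in os.walk(table_dir):
            rel = os.path.relpath(root, table_dir)
            # Everything below <table>/snapshots counts as snapshot data;
            # all other files are counted as live.
            kind = "snapshots" if rel.split(os.sep)[0] == "snapshots" else "live"
            for name in files:
                # Apparent size only: snapshot files that are hard links to
                # live SSTables would be counted in both columns.
                totals[table][kind] += os.path.getsize(os.path.join(root, name))
    return totals


def main():
    # Path taken from the question; adjust it for your installation.
    for table, sizes in usage_by_table("/var/lib/cassandra/data/debs").items():
        print(table, "live:", sizes["live"], "snapshots:", sizes["snapshots"])
```

Run main() on each node and multiply by the number of nodes only as a rough estimate, since replicas are not spread perfectly evenly.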