Cassandra 1.2 memory per CF basis

What is the default memory size allotted per CF in Cassandra 1.2?
In previous versions it was 128 MB, declared via the MemtableThroughputInMB parameter in the cassandra.yaml file, but now I can't find it in the Cassandra 1.2 config file.
Thanks.

It is replaced by memtable_total_space_in_mb.
(Default: 1/3 of the heap) Specifies the total memory used for all memtables on a node. This replaces the per-table storage settings memtable_operations_in_millions and memtable_throughput_in_mb.

These days Cassandra does not give you a way to set the memory size on a per-column-family basis. Instead, you define the total memtable space in your configuration (cassandra.yaml) using memtable_total_space_in_mb. By default its value is one third of your JVM heap size.
Cassandra manages this space across all your ColumnFamilies and flushes memtables to disk as needed. Do note that a minimum of 1 MB per memtable is used by the per-memtable arena allocator, which is worth keeping in mind if you are looking at going from thousands to tens of thousands of ColumnFamilies.
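As a rough sketch of what this looks like in practice (the value below is purely illustrative, not a recommendation; leaving the line commented out keeps the 1/3-of-heap default):

    # cassandra.yaml (1.2+) -- illustrative sketch only
    # Total space shared by ALL memtables on the node; replaces the old per-CF
    # memtable_throughput_in_mb / memtable_operations_in_millions settings.
    # If left unset, Cassandra defaults to 1/3 of the JVM heap.
    memtable_total_space_in_mb: 2048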

Related

memtable_flush_writers significance and uses

Can anyone explain the use case and significance of memtable_flush_writers, and in what situations we should tune it from the default value? I have already read the DataStax docs, but the actual uses and benefits are not clear to me.
By default, memtable_cleanup_threshold is computed as: 1 / (memtable_flush_writers + 1)
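For example, with memtable_flush_writers at 2 (a common default; the exact default depends on your version and the number of data directories), that works out to:

    memtable_cleanup_threshold = 1 / (2 + 1) ≈ 0.33

so a flush of the largest memtable is triggered once roughly a third of the total memtable space is in use.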
There is some guidance in the YAML about how to set this value, as Mehul pointed out. Contrary to that, I would never set it to the number of cores, regardless of whether or not you're using SSDs.
The problems come when memtable_flush_writers is set too high: your node can become overwhelmed with small flushes that trigger compaction. This has the unfortunate side effect of causing your commitlog to fill up, eventually reaching a point where it cannot keep up with the flush frequency.
If that happens, you can force a flush manually using nodetool flush. But if you see your commitlog filling your disk, lowering memtable_flush_writers is a good thing to try.
Note: As with all tuning-like changes in Cassandra, I'd make incremental changes over time, as opposed to one drastic change, just to be on the safe side.
memtable_cleanup_threshold: When the ratio of memory used by all non-flushing memtables exceeds this threshold, Cassandra flushes the largest memtable to disk.
memtable_flush_writers: This defines the number of memtable flush writer threads. These threads write SSTables to disk in parallel. Changing this parameter is suggested mainly when solid-state drives (SSDs) are used.
Note: If your data directories are backed by SSDs, increase this setting to the number of cores.
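A sketch of how these two knobs sit together in cassandra.yaml (values are illustrative only; as noted above, opinions differ on raising the writer count even on SSDs):

    # cassandra.yaml -- illustrative sketch only
    # Number of threads flushing memtables to disk in parallel. Raise it in
    # small increments, and only for SSD-backed data directories.
    memtable_flush_writers: 2
    # Leave memtable_cleanup_threshold unset to keep the computed default of
    # 1 / (memtable_flush_writers + 1); override it only deliberately.
    # memtable_cleanup_threshold: 0.33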
I hope this solves your query.

Cassandra compacting wide rows / large partitions

I have been searching some docs online to get a good understanding of how to tackle large partitions in Cassandra.
I followed a document on the below link:
https://www.safaribooksonline.com/library/view/cassandra-high-performance/9781849515122/ch13s10.html.
Regarding "LARGE ROWS WITH COMPACTION LIMITS", the following is mentioned:
"The default value for in_memory_compaction_limit_in_mb is 64. This value is set in conf/cassandra.yaml. For use cases that have fixed columns, the limit should never be exceeded. Setting this value can work as a sanity check to ensure that processes are not inadvertently writing too many columns to the same key.
Keys with many columns can also be problematic when using the row cache because it requires the entire row to be stored in memory."
In the /conf/cassandra.yaml, I did find a configuration named "in_memory_compaction_limit_in_mb".
The definition in cassandra.yaml goes as below:
In Cassandra 2.0:
in_memory_compaction_limit_in_mb
(Default: 64) Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.
In Cassandra 3.0: (No such entries found in cassandra.yaml)
compaction_large_partition_warning_threshold_mb
(Default: 100) Cassandra logs a warning when compacting partitions larger than the set value.
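For reference, the two entries look roughly like this in the respective cassandra.yaml files (the numbers are just the documented defaults quoted above):

    # Cassandra 2.0 cassandra.yaml
    in_memory_compaction_limit_in_mb: 64

    # Cassandra 3.0 cassandra.yaml -- the old limit is gone; only a warning threshold remains
    compaction_large_partition_warning_threshold_mb: 100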
I have been searching a lot to understand what exactly the setting in_memory_compaction_limit_in_mb does.
It mentions that some compaction is done in memory and some compaction is done on disk.
As per my understanding, when the compaction process runs:
SSTables are read from disk ----> (compared, tombstones removed, stale data removed) all in memory ----> a new SSTable is written to disk ----> the old SSTables are removed
These operations account for high disk space requirements and disk I/O (bandwidth).
Do correct me if my understanding of compaction is wrong. Is there anything in compaction that happens in memory?
In my environment, in_memory_compaction_limit_in_mb is set to 800.
I need to understand its purpose and implications.
Thanks in advance.
in_memory_compaction_limit_in_mb is no longer necessary, since the size doesn't need to be known before writing. There is no longer a two-pass compaction, so it can be ignored. Compaction no longer has to process the entire partition at once, just a row at a time.
Now the primary cost is in deserializing the large index at the beginning of the partition, which happens in memory. You can increase column_index_size_in_kb to reduce the size of that index (at the cost of more I/O during reads, but likely insignificant compared to the deserialization). Also, if you use a newer version (3.11+), the index is lazily loaded after exceeding a certain size, which improves things quite a bit.
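If you want to experiment with that index granularity, the knob lives in cassandra.yaml; a hedged sketch (64 is the long-standing default, and the larger value is only an illustration of the trade-off, not a recommendation):

    # cassandra.yaml -- illustrative sketch only
    # Granularity of the per-partition column index. A larger value means a
    # smaller index to deserialize for huge partitions, at the cost of reading
    # more data per index block.
    column_index_size_in_kb: 64      # default
    # column_index_size_in_kb: 256   # hypothetical larger value for very wide partitions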

Cassandra - how to disable memtable flush

I'm running Cassandra with a very small dataset so that the data can live in the memtable only. Below are my configurations:
In jvm.options:
-Xms4G
-Xmx4G
In cassandra.yaml,
memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers
As per the documentation in cassandra.yaml, memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each be set to 1/4 of the heap size, i.e. 1000 MB.
According to the documentation here (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), a memtable flush will trigger if the total size of the memtable(s) goes beyond (1000+1000)*0.50 = 1000 MB.
Now if I perform several write requests which result in almost ~300 MB of data, the memtable still gets flushed, since I see SSTables being created on the file system (Data.db etc.), and I don't understand why.
Could anyone explain this behavior and point out if I'm missing something here?
One additional trigger for memtable flushing is the commit log space used exceeding commitlog_total_space_in_mb (default 32 MB on 32-bit JVMs, 8192 MB on 64-bit JVMs).
http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsMemtableThruput.html
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commitlog_total_space_in_mb
Since Cassandra is meant to be persistent, it has to write to disk so it can recover the data after a node failure. If you don't need this durability, you can use other memory-based databases such as Redis or Memcached.
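To check whether the commit log is what is forcing your flushes, these are the cassandra.yaml settings involved (the numbers below are the documented defaults for a 64-bit JVM, shown only for illustration):

    # cassandra.yaml -- illustrative sketch only
    # When the total commit log size exceeds this, Cassandra flushes the
    # memtables covering the oldest segments so those segments can be recycled.
    commitlog_total_space_in_mb: 8192
    commitlog_segment_size_in_mb: 32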
Below is the response I got from Cassandra user group, copying it here in case someone else is looking for the similar info.
After thinking about your scenario I believe your small SSTable size might be due to data compression. By default, all tables enable SSTable compression.
Let's go through your scenario. Say you have allocated 4 GB to your Cassandra node. Your memtable_heap_space_in_mb and memtable_offheap_space_in_mb will roughly come to around 1 GB. Since you have memtable_cleanup_threshold set to 0.50, memtable cleanup will be triggered when the total allocated memtable space exceeds 1/2 GB. Note the cleanup threshold is 0.50 of 1 GB, not of a combination of heap and off-heap space. This memtable allocation size is the total amount available for all tables on your node, including all system-related keyspaces. The cleanup process will write the largest memtable to disk.
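Putting numbers on your settings (this is just the arithmetic implied above):

    heap                 = 4 GB   (-Xmx4G)
    memtable_heap_space  ≈ 1/4 of heap = 1 GB
    flush trigger        = 0.50 * 1 GB ≈ 500 MB

so once roughly 500 MB of live memtable data accumulates on heap, across all tables including the system keyspaces, the largest memtable is written out.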
For your case, I am assuming that you are on a single node with only one table receiving insert activity. I do not think the commit log will trigger a flush in this circumstance, as by default the commit log has 8192 MB of space, unless the commit log is placed on a very small disk.
I am assuming your table on disk is smaller than 500MB because of compression. You can disable compression on your table and see if this helps get the desired size.
I have written up a blog post explaining memtable flushing (http://abiasforaction.net/apache-cassandra-memtable-flush/)
Let me know if you have any other questions.
I hope this helps.

Too much disk space used by Apache Kudu for WALs

I have a hive table that is of 2.7 MB (which is stored in a parquet format). When I use impala-shell to convert this hive table to kudu, I notice that the /tserver/ folder size increases by around 300 MB. Upon exploring further, I see it is the /tserver/wals/ folder that holds the majority of this increase. I am facing serious issues due to this. If a 2.7 MB file generates a 300 MB WAL, then I cannot really work on bigger data. Is there a solution to this?
My kudu version is 1.1.0 and impala is 2.7.0.
I have never used Kudu, but I was able to Google a few keywords and read some documentation.
From the Kudu configuration reference section "Unsupported flags"...
--log_preallocate_segments: Whether the WAL should preallocate the entire segment before writing to it. Default: true
--log_segment_size_mb: The default segment size for log roll-overs, in MB. Default: 64
--log_min_segments_to_retain: The minimum number of past log segments to keep at all times, regardless of what is required for durability. Must be at least 1. Default: 2
--log_max_segments_to_retain: The maximum number of past log segments to keep at all times for the purposes of catching up other peers. Default: 10
Looks like you have a minimum disk requirement of (2+1) x 64 MB per tablet, for the WAL only. And it can grow up to 10 x 64 MB if some tablets are straggling and cannot catch up.
Plus some temporary disk space for compaction and so on.
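If you decide to tune this on Kudu 1.1 rather than upgrade, the flags quoted above would typically go into the tablet server's flag file (the exact path and layout depend on your deployment; the values below are only an illustration of shrinking the per-tablet WAL footprint, and some of these flags are listed as unsupported, so test carefully):

    # tserver gflagfile -- illustrative sketch only, hypothetical values
    --log_segment_size_mb=8
    --log_min_segments_to_retain=1
    --log_max_segments_to_retain=4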
[Edit] these default values have changed in Kudu 1.4 (released in June 2017); quoting the Release Notes...
"The default size for Write Ahead Log (WAL) segments has been reduced from 64MB to 8MB. Additionally, in the case that all replicas of a tablet are fully up to date and data has been flushed from memory, servers will now retain only a single WAL segment rather than two. These changes are expected to reduce the average consumption of disk space on the configured WAL disk by 16x."

Cassandra In-Memory option

There is an in-memory option introduced in Cassandra by DataStax Enterprise 4.0:
http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html
But the size of an in-memory table is limited to 1 GB.
Does anyone know the reasoning behind limiting it to 1 GB? And is it possible to extend an in-memory table to a larger size, such as 64 GB?
To answer your question: today it's not possible to bypass this limitation.
In-memory tables are stored within the JVM heap, and regardless of the amount of memory available on a single node, allocating more than 8 GB to the JVM heap is not recommended.
The main reason for this limitation is that the Java garbage collector slows down when dealing with a huge amount of memory.
However, if you consider Cassandra as a distributed system, 1 GB is not the real limitation.
(nodes*allocated_memory)/ReplicationFactor
allocated_memory is at most 1 GB, so your table may contain many GB in memory, allocated across different nodes.
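For example (numbers purely illustrative): with 6 nodes, 1 GB of in-memory space per node, and a replication factor of 3, that gives

    (6 * 1 GB) / 3 = 2 GB

of distinct data that can be held in memory across the cluster.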
I think that in the future something will improve, but dealing with 64 GB in memory could be a real problem when you need to flush data to disk. One more consideration that creates a limitation: avoid TTL when working with in-memory tables. TTL creates tombstones, and a tombstone is not deallocated until the GCGraceSeconds period passes; considering the default value of 10 days, each tombstone will keep its portion of memory busy and unavailable, possibly for a long time.
HTH,
Carlo
