Cassandra: Mutation too large for a small insert

I'm getting this error in Apache Cassandra 3.x:
java.lang.IllegalArgumentException: Mutation of 16.000MiB is too large for the maximum size of 16.000MiB
I'm inserting blobs of 4MB or 8MB, never anything larger than 8MB. Why am I hitting the 16MB limit? Is Cassandra batching up multiple writes (inserts) into a single "mutation" that is too large? (If so, why would it do that, given that the configured limit is 8MB?)
There is little documentation on mutations, other than that a mutation is an insert or delete. How can I prevent these errors?

You can increase the commit log segment size to 64 MB in cassandra.yaml:
commitlog_segment_size_in_mb: 64
By default the commit log segment size is 32 MB.
By design, the maximum allowed mutation size is 50% of the configured commitlog_segment_size_in_mb, so that Cassandra avoids writing segments with large amounts of empty space. With the default 32 MB segments, that yields the 16 MB mutation limit you are hitting.
You should also investigate why the write size has suddenly increased. If it is not expected, i.e. not due to a planned change, then it may well be a problem with the client application that needs further inspection.
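If raising the segment size is not an option, the other way to stay under the limit is to keep each mutation small on the client side, for example by chunking large blobs across several rows. Below is a minimal sketch using the DataStax Python driver; the keyspace, table, and column names are assumptions for illustration, not from the original question:
from cassandra.cluster import Cluster

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB per chunk, well under the 16 MB mutation cap

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

# Assumed schema: CREATE TABLE blob_chunks (blob_id uuid, chunk_no int, data blob,
#                                            PRIMARY KEY (blob_id, chunk_no))
insert = session.prepare(
    "INSERT INTO blob_chunks (blob_id, chunk_no, data) VALUES (?, ?, ?)")

def store_blob(blob_id, payload):
    # Each chunk becomes its own mutation instead of one oversized insert.
    for offset in range(0, len(payload), CHUNK_SIZE):
        session.execute(insert, (blob_id, offset // CHUNK_SIZE,
                                 payload[offset:offset + CHUNK_SIZE]))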

Related

Cassandra: compacting wide rows / large partitions

I have been reading documentation online to get a good understanding of how to tackle large partitions in Cassandra.
I followed the document at the link below:
https://www.safaribooksonline.com/library/view/cassandra-high-performance/9781849515122/ch13s10.html
Regarding "LARGE ROWS WITH COMPACTION LIMITS", the following is mentioned:
"The default value for in_memory_compaction_limit_in_mb is 64. This value is set in conf/cassandra.yaml. For use cases that have fixed columns, the limit should never be exceeded. Setting this value can work as a sanity check to ensure that processes are not inadvertently writing too many columns to the same key.
Keys with many columns can also be problematic when using the row cache because it requires the entire row to be stored in memory."
In conf/cassandra.yaml, I did find a configuration named "in_memory_compaction_limit_in_mb".
Its definition in cassandra.yaml is as below:
In Cassandra 2.0:
in_memory_compaction_limit_in_mb
(Default: 64) Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.
In Cassandra 3.0: (No such entries found in cassandra.yaml)
compaction_large_partition_warning_threshold_mb
(Default: 100) Cassandra logs a warning when compacting partitions larger than the set value.
I have searched a lot for what exactly the setting in_memory_compaction_limit_in_mb does.
The documentation mentions that some compaction is done in memory and some compaction is done on disk.
As per my understanding, when the compaction process runs:
SSTables are read from disk ----> (merged, tombstones removed, stale data removed) all in memory ---> new SSTable written to disk --> old SSTables removed
This operation accounts for high disk space requirements and disk I/O (bandwidth).
Please correct me if my understanding of compaction is wrong. Is there anything in compaction that happens in memory?
In my environment, in_memory_compaction_limit_in_mb is set to 800.
I need to understand the purpose and implications.
Thanks in advance
in_memory_compaction_limit_in_mb is no longer necessary, since the size doesn't need to be known before writing. There is no longer a two-pass compaction, so the setting can be ignored. Compaction no longer has to process the entire partition at once, just a row at a time.
Now the primary cost is deserializing the large index at the beginning of the partition, which happens in memory. You can increase column_index_size_in_kb to reduce the size of that index (at the cost of more IO during reads, though that is likely insignificant compared to the deserialization). Also, if you use a newer version (3.11+), the index is lazily loaded after it exceeds a certain size, which improves things quite a bit.
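For example, in cassandra.yaml (64 is the default, so the value below is purely illustrative; the right number depends on your partition layout and read patterns):
column_index_size_in_kb: 256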

Batch insert overflow

I am using Cassandra 3.10 and am trying to follow best practice by having a table per query, so I am using the batch insert principle to insert into multiple tables as a single transaction. However, I get the following error in the Cassandra log.
Batch for [zed.payment, zed.trade_party_b_ref, zed.trade_product_type, zed.trade, zed.fx_variance_swap, zed.trade_party_a_ref, zed.trade_party_b_trade_id, zed.market_value] is of size 5.926KiB, exceeding specified threshold of 5.000KiB by 0.926KiB.
The log is saying that you are sending a batch of almost 6KB when the warning threshold is 5KB.
You should send smaller batches of data to avoid going over that batch size limit.
You can also change the batch size limit in cassandra.yaml, but I would not recommend changing it.
Thanks for the info. The parameter in cassandra.yaml is:
# Log WARN on any multiple-partition batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
which is in KB, so my batch statement is really only about 6KB.
After 30 years working with Oracle, this is my first venture into Cassandra, so I have tried to follow the guideline of having a separate table for each query. Where I have a financial trade table that has to be queried in up to 8 different ways, I have 8 tables. That implies that an insert into those tables must be done in a batch to create what would be a single transaction in Oracle. The master table of the eight has a significant number of sibling tables which must also be included in the batch, so here is my point:
If Cassandra does not support transactions but relies on batch functionality to achieve the same effect, it must not impose a limit on the size of the batch. If that is not possible, then Cassandra is really limited to applications with VERY simple data structures.
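For what it's worth, a logged batch across those tables can stay under the warning threshold as long as each batch carries only a single trade's rows. Here is a minimal sketch with the DataStax Python driver; the keyspace and table names come from the log message above, but the column names and values are invented for illustration:
import uuid
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('zed')

# Hypothetical column lists; the real tables have more fields.
ins_trade = session.prepare(
    "INSERT INTO trade (trade_id, product_type) VALUES (?, ?)")
ins_payment = session.prepare(
    "INSERT INTO payment (trade_id, amount) VALUES (?, ?)")

trade_id = uuid.uuid4()

# A LOGGED batch gives atomicity across the denormalized tables:
# all statements eventually apply, or none do.
batch = BatchStatement(batch_type=BatchType.LOGGED)
batch.add(ins_trade, (trade_id, 'fx_variance_swap'))
batch.add(ins_payment, (trade_id, 1000000))
session.execute(batch)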

Cassandra - how to disable memtable flush

I'm running Cassandra with a very small dataset so that the data can exist in the memtable only. Below are my configurations:
In jvm.options:
-Xms4G
-Xmx4G
In cassandra.yaml,
memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers
As per the documentation in cassandra.yaml, memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each default to 1/4 of the heap size, i.e. 1000MB.
According to the documentation here (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), a memtable flush will trigger if the total size of the memtable(s) goes beyond (1000+1000)*0.50=1000MB.
Now if I perform several write requests which result in only ~300MB of data, the memtable still gets flushed, since I see SSTables being created on the file system (Data.db etc.), and I don't understand why.
Could anyone explain this behavior and point out if I'm missing something here?
One additional trigger for memtable flushing is the total commit log space used (commitlog_total_space_in_mb).
http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsMemtableThruput.html
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commitlog_total_space_in_mb
Since Cassandra is meant to be persistent, it has to write to disk so it can recover the data after a node failure. If you don't need this durability, you can use a memory-based database instead - Redis, Memcached, etc.
Below is the response I got from the Cassandra user group; I'm copying it here in case someone else is looking for similar info.
After thinking about your scenario, I believe your small SSTable size might be due to data compression. By default, all tables enable SSTable compression.
Let's go through your scenario. Let's say you have allocated 4GB to your Cassandra node. Your memtable_heap_space_in_mb and memtable_offheap_space_in_mb will roughly come to around 1GB. Since you have memtable_cleanup_threshold set to 0.50, memtable cleanup will be triggered when the total allocated memtable space exceeds 1/2GB. Note the cleanup threshold is 0.50 of 1GB, not of the combined heap and off-heap space. This memtable allocation size is the total amount available for all tables on your node, including all system-related keyspaces. The cleanup process will write the largest memtable to disk.
For your case, I am assuming that you are on a single node with only one table with insert activity. I do not think the commit log will trigger a flush in this circumstance, as by default the commit log has 8192 MB of space, unless the commit log is placed on a very small disk.
I am assuming your table on disk is smaller than 500MB because of compression. You can disable compression on your table and see if this helps get the desired size.
I have written up a blog post explaining memtable flushing (http://abiasforaction.net/apache-cassandra-memtable-flush/)
Let me know if you have any other question.
I hope this helps.
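As a concrete illustration of the compression suggestion above (the keyspace and table names are placeholders), compression can be disabled per table with:
ALTER TABLE my_keyspace.my_table WITH compression = {'enabled': 'false'};
Only SSTables flushed after the change are written uncompressed; existing SSTables keep their compression until they are rewritten.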

Too much disk space used by Apache Kudu for WALs

I have a Hive table of 2.7 MB (stored in Parquet format). When I use impala-shell to convert this Hive table to Kudu, I notice that the /tserver/ folder size increases by around 300 MB. Upon exploring further, I see it is the /tserver/wals/ folder that holds the majority of this increase. I am facing serious issues due to this. If a 2.7 MB file generates 300 MB of WALs, then I cannot really work with bigger data. Is there a solution to this?
My kudu version is 1.1.0 and impala is 2.7.0.
I have never used Kudu, but I was able to Google a few keywords and read some documentation.
From the Kudu configuration reference, section "Unsupported flags"...
--log_preallocate_segments  Whether the WAL should preallocate the entire segment before writing to it. Default: true
--log_segment_size_mb  The default segment size for log roll-overs, in MB. Default: 64
--log_min_segments_to_retain  The minimum number of past log segments to keep at all times, regardless of what is required for durability. Must be at least 1. Default: 2
--log_max_segments_to_retain  The maximum number of past log segments to keep at all times for the purposes of catching up other peers. Default: 10
Looks like you have a minimum disk requirement of (2+1)x64 MB per tablet, for the WAL only. And it can grow up to 10x64 MB if some tablets are straggling and cannot catch up.
Plus some temp disk space for compaction etc. etc.
[Edit] These default values have changed in Kudu 1.4 (released in June 2017); quoting the release notes:
"The default size for Write Ahead Log (WAL) segments has been reduced from 64MB to 8MB. Additionally, in the case that all replicas of a tablet are fully up to date and data has been flushed from memory, servers will now retain only a single WAL segment rather than two. These changes are expected to reduce the average consumption of disk space on the configured WAL disk by 16x."
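On an older release such as 1.1.0, a possible workaround is to approximate those newer defaults when starting the tablet server; note these are listed among the unsupported flags, so treat this as a hedged sketch and test before relying on it:
kudu-tserver --log_segment_size_mb=8 --log_min_segments_to_retain=1 <your existing flags>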

Cassandra configs

Recently I began to study Cassandra. Please help me understand what effect these settings have (I need your interpretation; I have read cassandra.yaml):
memtable_flush_writers
memtable_flush_queue_size
thrift_framed_transport_size_in_mb
in_memory_compaction_limit_in_mb
sliced_buffer_size_in_kb
thrift_max_message_length_in_mb
binary_memtable_throughput_in_mb
column_index_size_in_kb
I know it's very late to answer, but I am answering it as it might help someone else.
Most of the parameters you have mentioned above are related to the Cassandra write path.
memtable_flush_writers:
Sets the number of memtable flush writer threads. These threads are blocked by disk I/O, and each one holds a memtable in memory while blocked. If your data directories are backed by SSD, increase this setting to the number of cores.
memtable_flush_queue_size:
The number of full memtables to allow pending flush (memtables waiting for a write thread). At a minimum, set this to the maximum number of indexes created on a single table.
in_memory_compaction_limit_in_mb : Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.
thrift_framed_transport_size_in_mb : Frame size (maximum field length) for Thrift. The frame is the row or part of the row that the application is inserting.
thrift_max_message_length_in_mb: The maximum length of a Thrift message in megabytes, including all fields and internal Thrift overhead (1 byte of overhead for each frame). Message length is usually used in conjunction with batches. A frame length greater than or equal to 24 accommodates a batch with four inserts, each of which is 24 bytes. The required message length is greater than or equal to 24+24+24+24+4 (number of frames).
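Putting those descriptions together, a purely illustrative cassandra.yaml fragment might look like this (the first value assumes SSD-backed data directories on an 8-core node, as suggested above; the remaining values are simply the old 1.x/2.x defaults):
memtable_flush_writers: 8
memtable_flush_queue_size: 4
in_memory_compaction_limit_in_mb: 64
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
column_index_size_in_kb: 64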
You can find more details in the DataStax Cassandra documentation.
