Too much disk space used by Apache Kudu for WALs - apache-kudu

I have a Hive table of 2.7 MB (stored in Parquet format). When I use impala-shell to convert this Hive table to Kudu, I notice that the /tserver/ folder size increases by around 300 MB. Exploring further, I see it is the /tserver/wals/ folder that holds the majority of this increase. I am facing serious issues due to this: if a 2.7 MB file generates 300 MB of WALs, then I cannot really work with bigger data. Is there a solution to this?
My Kudu version is 1.1.0 and Impala is 2.7.0.

I have never used Kudu, but I was able to Google a few keywords and read some documentation.
From the Kudu configuration reference section "Unsupported flags"...
--log_preallocate_segments: Whether the WAL should preallocate the entire segment before writing to it. Default: true
--log_segment_size_mb: The default segment size for log roll-overs, in MB. Default: 64
--log_min_segments_to_retain: The minimum number of past log segments to keep at all times, regardless of what is required for durability. Must be at least 1. Default: 2
--log_max_segments_to_retain: The maximum number of past log segments to keep at all times for the purposes of catching up other peers. Default: 10
It looks like you have a minimum disk requirement of (2 retained + 1 active) x 64 MB = 192 MB per tablet, for the WAL alone. And it can grow up to 10 x 64 MB if some replicas are straggling and cannot catch up.
Plus some temporary disk space for compaction, etc.
[Edit] These default values changed in Kudu 1.4 (released in June 2017); quoting the release notes...
The default size for Write Ahead Log (WAL) segments has been reduced from 64MB to 8MB. Additionally, in the case that all replicas of a tablet are fully up to date and data has been flushed from memory, servers will now retain only a single WAL segment rather than two. These changes are expected to reduce the average consumption of disk space on the configured WAL disk by 16x.
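On Kudu 1.1, if WAL disk usage is the main constraint, one workaround is to shrink the segments on the tablet server yourself, roughly approximating what 1.4 later made the default. These flags sit in the unsupported/advanced group, so treat the following as a sketch to test rather than a supported recipe:
kudu-tserver --fs_wal_dir=... --fs_data_dirs=... \
  --log_segment_size_mb=8 \
  --log_min_segments_to_retain=1 \
  --log_preallocate_segments=false
With those settings the per-tablet WAL floor drops from roughly 3 x 64 MB to a couple of 8 MB segments, at the cost of more frequent log roll-overs.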

Related

Where does the idea of a 10MB partition size come from?

I'm doing some data modelling for time series data in Cassandra, and I've decided to implement buckets to regulate my partition sizes and maintain reasonable distribution on my cluster.
I decided to bucketise such that my partitions would not exceed a size of 10MB, as I've seen numerous sources that state this as an ideal partition size, but I can't find any information on why 10MB was chosen. On top of this I can't find anything from DataStax or Apache that mentions this soft 10MB limit at all.
Our data can be requested for large periods of time, meaning lots of partitions will be required to service 1 request if the partition sizes remain at 10MB. I'd rather increase the size of the partitions, and have fewer partitions required to service these requests.
Where does this idea of a 10MB partition size come from? Is it still relevant? What would be so bad if my partitions were 20MB in size? Or even 50MB?
With 10MB referenced in so many places, I feel like there must be something to it. Any information would be appreciated. Cheers.
I think much of this advice comes from older times, when support for wide partitions wasn't very good: reading them put a lot of pressure on the heap, etc. Since Cassandra 3.0 the situation has improved a lot, but it's still recommended to keep the on-disk size under 100 MB.
For example, the DataStax planning guide says, in the section "Estimating partition size":
a good rule of thumb is to keep the maximum number of rows below 100,000 items and the disk size under 100 MB
In recent versions of Cassandra you can go beyond this recommendation, but it's still not advised, although it heavily depends on the access patterns. You can find more information in the linked blog post and video.
I have seen users with 60+ GB partitions: the system still works, but the data distribution is not ideal, so nodes become "hot" and performance may suffer.
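To make the trade-off concrete, here is a rough back-of-the-envelope estimate; the row size is a made-up assumption, not something from the question. If a time-series row averages about 200 bytes on disk, a 10 MB partition holds roughly 50,000 rows and a 50 MB partition roughly 250,000. At one sample per second per source, that is about half a day of data per bucket versus almost three days, so larger buckets mean far fewer partitions per long-range query, at the cost of concentrating more load on the replicas that own each bucket.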

Cassandra Mutation too Large, for small insert

I'm getting these errors:
java.lang.IllegalArgumentException: Mutation of 16.000MiB is too large for the maximum size of 16.000MiB
in Apache Cassandra 3.x. I'm doing inserts of 4MB or 8MB blobs, but not anything greater than 8MB. Why am I hitting the 16MB limit? Is Cassandra batching up multiple writes (inserts) and creating a "mutation" that is too large? (If so, why would it do that, since the configured limit is 8MB?)
There is little documentation on mutations -- except to say that a mutation is an insert or delete. How can I prevent these errors?
You can increase the commit log segment size to 64 MB in cassandra.yaml:
commitlog_segment_size_in_mb: 64
By default the commit log segment size is 32 MB.
By design intent, the maximum allowed mutation size is 50% of the configured commitlog_segment_size_in_mb (hence the 16 MiB limit with the 32 MB default). This is so Cassandra avoids writing segments with large amounts of empty space.
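In recent 3.x versions there is also a max_mutation_size_in_kb setting in cassandra.yaml; it is normally left commented out, in which case it defaults to half the segment size. The explicit value below is only an illustration of the relationship:
commitlog_segment_size_in_mb: 64
# defaults to commitlog_segment_size_in_mb * 1024 / 2 (32 MB here) when left unset
# max_mutation_size_in_kb: 32768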
You should also investigate why the write size has suddenly increased. If it is not expected, i.e. due to a planned change, then it may well be a problem with the client application that needs further inspection.

Cassandra Cache Memory Management

I have a 4-node Cassandra 2.1.13 cluster with the configuration below:
32 GB Ram
Max HEAP SIZE - 8 GB
250 GB Hard Disk Each (Not SSD).
I am trying to do a load test on writes and reads. I have created a multi-threaded program to create 50 million records. Each row has 30 columns.
I was able to insert the 50 million records in 84 minutes, at a rate of 9.5K inserts per second.
Next I tried to read those 50 million records randomly using 32 clients, and I was able to read at 28K per second.
The problem is that after some time memory fills up and most of it, almost 20 GB, is cached. After some time the system hangs because it is out of memory.
If I clear the cache memory, my read throughput goes down to 100 per second.
How should I manage my cache memory without affecting read performance?
Let me know if you need any more information.
What you noticed is the Linux disk cache, which is supposed to serve data from RAM instead of going to disk in order to speed up data read access. Please make sure to understand how it works, e.g. see here.
As you're already using top, I'd recommend adding "cache misses" to the overview as well (hit F + select nMaj). This will show you whenever a disk read cannot be served by the cache. You should see an increase in misses once the page cache starts to become saturated.
How should I manage my cache memory without affecting read performance.
The cache is fully managed by Linux and does not need any actions from your side to take care of.
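If you want to keep an eye on it from the command line anyway, here is a minimal sketch using standard Linux tools (nothing Cassandra-specific, and intended for inspection and testing only):
free -h        # the cached/buffers figures are the page cache, not memory Cassandra has lost
vmstat 5       # the "bi" column shows blocks read from disk, i.e. reads the cache could not serve
sync; echo 3 > /proc/sys/vm/drop_caches   # as root: drops the cache; expect reads to fall back to disk speed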

How to prevent Cassandra commit logs filling up disk space

I'm running a two node Datastax AMI cluster on AWS. Yesterday, Cassandra started refusing connections from everything. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled up all the disk space on the allotted mount and this seemed to be causing the connection refusal (deleted some of the commit logs, restarted and was able to connect).
I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7
If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?
You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB for 64-bit systems (it should be commented-out in your .yaml file... you'll have to un-comment it when setting it). It's usually a good idea to plan for that when sizing your disk(s).
You can verify this by running a du on your commitlog directory:
$ du -d 1 -h ./commitlog
8.1G ./commitlog
Although, a smaller commit log space will cause more frequent flushes (increased disk I/O), so you'll want to keep an eye on that.
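As a sketch, the relevant cassandra.yaml line would look something like this (4096 is just an example value, not a recommendation for your cluster):
# default is 8192 on 64-bit systems; lower it if the commitlog mount is small
commitlog_total_space_in_mb: 4096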
Edit 20190318
Just had a related thought (on my 4-year-old answer). I saw that it received some attention recently, and wanted to make sure that the right information is out there.
It's important to note that sometimes the commit log can grow in an "out of control" fashion. Essentially, this can happen because the write load on the node exceeds Cassandra's ability to keep up with flushing the memtables (and thus, removing old commitlog files). If you find a node with dozens of commitlog files, and the number seems to keep growing, this might be your issue.
Essentially, your memtable_cleanup_threshold may be too low. Although this property is deprecated, you can still control how it is calculated by lowering the number of memtable_flush_writers.
memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
The documentation has been updated as of 3.x, but used to say this:
# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
...which (I feel) led to many folks setting this value WAY too high.
Assuming a value of 8, the memtable_cleanup_threshold is .111. When the footprint of all memtables exceeds this ratio of total memory available, flushing occurs. Too many flush (blocking) writers can prevent this from happening expediently. With a single /data dir, I recommend setting this value to 2.
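In cassandra.yaml that would look like the following; the threshold in the comment is simply the formula above applied to the recommended value:
# memtable_cleanup_threshold = 1 / (2 + 1) = 0.333, so flushing kicks in once memtables
# use about a third of the available memtable space
memtable_flush_writers: 2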
In addition to decreasing the commitlog size as suggested by BryceAtNetwork23, a proper solution to ensure it won't happen again is to monitor the disk setup, so that you are alerted when it is getting full and have time to act or increase the disk size.
Seeing as you are using DataStax, you could set an alert for this in OpsCenter. Haven't used this within the cloud myself, but I imagine it would work. Alerts can be set by clicking Alerts in the top banner -> Manage Alerts -> Add Alert. Configure the mounts to watch and the thresholds to trigger on.
Or, I'm sure there are better tools to monitor disk space out there.
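For example, a bare-bones option is a cron job built on df; the mount point, threshold, and mail address below are placeholders (and it assumes a mail command is configured), not values taken from the question:
#!/bin/sh
# warn when the commitlog mount is more than 80% full
USAGE=$(df --output=pcent /var/lib/cassandra/commitlog | tail -1 | tr -dc '0-9')
if [ "$USAGE" -ge 80 ]; then
  echo "commitlog mount at ${USAGE}% on $(hostname)" | mail -s "Cassandra disk space alert" ops@example.com
fi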

cassandra 1.2 memory per CF basis

What is the default memory size considered per CF in Cassandra 1.2?
In previous versions it was 128 MB, declared via the MemtableThroughputInMB parameter in the cassandra.yaml file, but now I can't find it in the Cassandra 1.2 config file.
Thanks.
It is replaced by memtable_total_space_in_mb.
(Default: 1/3 of the heap) Specifies the total memory used for all memtables on a node. This replaces the per-table storage settings memtable_operations_in_millions and memtable_throughput_in_mb.
Nowadays Cassandra does not give you the scope to set a default memory size on a per-column-family basis. Rather, you define the total memtable size in your configuration (the .yaml file) using memtable_total_space_in_mb. By default its value is one third of your JVM heap size.
Cassandra manages this space across all your ColumnFamilies and flushes memtables to disk as needed. Do note that a minimum of 1 MB per memtable is used by the per-memtable arena allocator, which is worth keeping in mind if you are looking at going from thousands to tens of thousands of ColumnFamilies.
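In cassandra.yaml it is a single node-wide setting; a minimal sketch, where 2048 is just an illustrative value (roughly a third of a 6 GB heap), not a recommendation:
# replaces the old per-CF memtable_throughput_in_mb; defaults to 1/3 of the heap if left unset
memtable_total_space_in_mb: 2048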
