Logstash memory heap issues

I am trying to ingest JSON records using Logstash but am running into memory issues. Previously our pipeline could run with default settings (memory queue, batch size 125, one worker per core) and process 5k events per second. We added some data to the JSON records, and now heap usage climbs steadily until the pipeline falls apart after about an hour of ingesting.
What I have tried so far to improve this:
Increased the heap via docker-compose options to "LS_JAVA_OPTS=-Xmx8g -Xms8g" (the virtual machine has 16GB of memory).
Switched to a persistent queue.
Lowered the pipeline batch size from 125 down to 75.
While these changes have helped, they only delay the point at which the memory issues start. Any ideas on what I should do to fix this? Should I increase the memory some more? Should I increase the size of the persistent queue? I am at my wits' end!
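For reference, here is roughly what those settings look like in my setup (a minimal sketch; the queue.max_bytes value below is just an example, not taken from my actual config):

# docker-compose.yml (excerpt)
environment:
  - "LS_JAVA_OPTS=-Xmx8g -Xms8g"

# logstash.yml
queue.type: persisted
queue.max_bytes: 4gb          # example cap for the persistent queue
pipeline.batch.size: 75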
Edit: Here is another image of memory usage after reducing pipeline workers to 6 and batch size to 75:

For anybody who runs into this and is using a lot of different field names: my problem was due to an issue with Logstash, described here, that will be fixed in version 7.17. Logstash caches field names, and if your events have a lot of unique field names it will cause out-of-memory errors like those in my attached graphs.

Related

How to restrict Cassandra to fixed memory

We are using Cassandra to collect data from ThingsBoard. Memory usage started at 4 GB (as reported by systemctl status for Cassandra) and after 15 hours it has reached 9.3 GB.
I want to know why memory grows this much, and whether there is any way to control it or restrict Cassandra to a fixed amount of memory without losing data.
Check this for setting the max heap size. But be sure to tune Cassandra's GC properly when you change it.
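As a minimal sketch (assuming a package install where conf/cassandra-env.sh is read at startup; the values are examples only, size them to your hardware):

# conf/cassandra-env.sh
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"

Keep in mind this only caps the JVM heap; off-heap memory and the OS page cache are not limited by it.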

Too much disk space used by Apache Kudu for WALs

I have a Hive table of 2.7 MB (stored in Parquet format). When I use impala-shell to convert this Hive table to Kudu, I notice that the /tserver/ folder size increases by around 300 MB. Exploring further, I see it is the /tserver/wals/ folder that accounts for most of this increase. This is causing me serious issues: if a 2.7 MB file generates a 300 MB WAL, then I cannot really work with bigger data. Is there a solution to this?
My Kudu version is 1.1.0 and Impala is 2.7.0.
I have never used Kudu, but I was able to Google a few keywords and read some documentation.
From the Kudu configuration reference, section "Unsupported flags"...
--log_preallocate_segments: Whether the WAL should preallocate the entire segment before writing to it. Default: true
--log_segment_size_mb: The default segment size for log roll-overs, in MB. Default: 64
--log_min_segments_to_retain: The minimum number of past log segments to keep at all times, regardless of what is required for durability. Must be at least 1. Default: 2
--log_max_segments_to_retain: The maximum number of past log segments to keep at all times for the purposes of catching up other peers. Default: 10
Looks like you have a minimum disk requirement of (2+1)x64 MB per tablet, for the WAL only. And it can grow up to 10x64 MB if some tablets are straggling and cannot catch up.
Plus some temp disk space for compaction etc. etc.
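If you do experiment with these (they are listed as unsupported, so treat the following purely as a sketch and try it on a disposable cluster first), the tablet server flags would look something like:

kudu-tserver \
  --log_segment_size_mb=8 \
  --log_min_segments_to_retain=1 \
  ...other flags...

With those values the per-tablet WAL floor drops from roughly (2+1)x64 MB to about (1+1)x8 MB.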
[Edit] these default values have changed in Kudu 1.4 (released in June 2017); quoting the Release Notes...
The default size for Write Ahead Log (WAL) segments has been reduced
from 64MB to 8MB. Additionally, in the case that all replicas of a
tablet are fully up to date and data has been flushed from memory,
servers will now retain only a single WAL segment rather than two.
These changes are expected to reduce the average consumption of disk
space on the configured WAL disk by 16x.

Cassandra Mutation too Large, for small insert

I'm getting these errors:
java.lang.IllegalArgumentException: Mutation of 16.000MiB is too large for the maximum size of 16.000MiB
in Apache Cassandra 3.x. I'm doing inserts of 4MB or 8MB blobs, but not anything greater than 8MB. Why am I hitting the 16MB limit? Is Cassandra batching up multiple writes (inserts) and creating a "mutation" that is too large? (If so, why would it do that, since the configured limit is 8MB?)
There is little documentation on mutations -- except to say that a mutation is an insert or delete. How can I prevent these errors?
You can increase the commit log segment size to 64 MB in cassandra.yaml:
commitlog_segment_size_in_mb: 64
By default the commit log segment size is 32 MB.
By design, the maximum allowed mutation size is 50% of the configured commitlog_segment_size_in_mb. This is so Cassandra avoids writing segments with large amounts of empty space.
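As a quick worked example of that 50% rule:

commitlog_segment_size_in_mb: 32 (default)  ->  maximum mutation size 16 MiB
commitlog_segment_size_in_mb: 64            ->  maximum mutation size 32 MiB

So with the default 32 MB segment, any single mutation over 16 MiB is rejected, which matches the error you are seeing.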
You should also investigate why the write size has suddenly increased. If it is not expected (i.e. due to a planned change), it may well be a problem with the client application that needs further inspection.

Cassandra Cache Memory Management

I have a 4-node Cassandra 2.1.13 cluster with the following configuration:
32 GB RAM
Max heap size: 8 GB
250 GB hard disk each (not SSD)
I am doing a load test on writes and reads. I created a multi-threaded program to insert 50 million records; each row has 30 columns.
I was able to insert the 50 million records in 84 minutes, at a rate of about 9.5K inserts per second.
Next I read those 50 million records randomly using 32 clients, and achieved about 28K reads per second.
The problem is that after some time, memory fills up and most of it is cached (almost 20 GB). Eventually the system hangs because it is out of memory.
If I clear the cache, my read throughput drops to 100 per second.
How should I manage my cache memory without affecting read performance?
Let me know if you need any more information.
What you noticed is the Linux disk cache (page cache), which serves data from RAM instead of going to disk in order to speed up read access. Please make sure you understand how it works, e.g. see here.
As you're already using top, I'd recommend adding "cache misses" to the overview as well (hit F and select nMaj). This will show you whenever a disk read cannot be served by the cache. You should see misses increase once the page cache starts to become saturated.
How should I manage my cache memory without affecting read performance?
The cache is fully managed by Linux and does not require any action on your side.
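If you want to watch this behaviour yourself, a couple of generic Linux commands (nothing Cassandra-specific) are enough:

free -h      # the buff/cache column is the page cache; it is reclaimed automatically under memory pressure
vmstat 5     # the bi (blocks in) column climbs once reads start missing the cache

A box that looks "full" because of buff/cache is normal; that memory is handed back to applications on demand.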

How to prevent Cassandra commit logs filling up disk space

I'm running a two node Datastax AMI cluster on AWS. Yesterday, Cassandra started refusing connections from everything. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled up all the disk space on the allotted mount and this seemed to be causing the connection refusal (deleted some of the commit logs, restarted and was able to connect).
I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7
If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?
You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB for 64-bit systems (it should be commented-out in your .yaml file... you'll have to un-comment it when setting it). It's usually a good idea to plan for that when sizing your disk(s).
You can verify this by running a du on your commitlog directory:
$ du -d 1 -h ./commitlog
8.1G ./commitlog
Although, a smaller commit log space will cause more frequent flushes (increased disk I/O), so you'll want to keep an eye on that.
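A minimal cassandra.yaml sketch of what that change looks like (the value here is only an example; size it against your disk and write load):

# cassandra.yaml - un-comment and lower the total commitlog cap
commitlog_total_space_in_mb: 4096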
Edit 20190318
Just had a related thought (on my 4-year-old answer). I saw that it received some attention recently, and wanted to make sure that the right information is out there.
It's important to note that sometimes the commit log can grow in an "out of control" fashion. Essentially, this can happen because the write load on the node exceeds Cassandra's ability to keep up with flushing the memtables (and thus, removing old commitlog files). If you find a node with dozens of commitlog files, and the number seems to keep growing, this might be your issue.
Essentially, your memtable_cleanup_threshold may be too low. Although this property is deprecated, you can still control how it is calculated by lowering the number of memtable_flush_writers.
memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
The documentation has been updated as of 3.x, but used to say this:
# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
...which (I feel) led to many folks setting this value WAY too high.
Assuming a value of 8, the memtable_cleanup_threshold is .111. When the footprint of all memtables exceeds this ratio of total memory available, flushing occurs. Too many flush (blocking) writers can prevent this from happening expediently. With a single /data dir, I recommend setting this value to 2.
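A quick worked example of that formula:

memtable_flush_writers: 8  ->  memtable_cleanup_threshold = 1 / (8 + 1) ≈ 0.11
memtable_flush_writers: 2  ->  memtable_cleanup_threshold = 1 / (2 + 1) ≈ 0.33

In other words, with 2 flush writers the cleanup threshold works out to roughly a third of the memtable space rather than roughly a ninth.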
In addition to decreasing the commitlog size as suggested by BryceAtNetwork23, a proper solution to ensure this won't happen again is to monitor the disks, so that you are alerted when they are getting full and have time to act or increase the disk size.
Seeing as you are using DataStax, you could set an alert for this in OpsCenter. Haven't used this within the cloud myself, but I imagine it would work. Alerts can be set by clicking Alerts in the top banner -> Manage Alerts -> Add Alert. Configure the mounts to watch and the thresholds to trigger on.
Or, I'm sure there are better tools to monitor disk space out there.
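Even without a full monitoring stack, a minimal cron-driven check is better than nothing. A sketch (the mount point, threshold, and mail address are all examples):

#!/bin/sh
# check_commitlog_disk.sh - run from cron; warns when the commitlog mount passes 80% used
usage=$(df -P /var/lib/cassandra/commitlog | awk 'NR==2 {print $5+0}')
if [ "$usage" -gt 80 ]; then
  echo "commitlog mount is at ${usage} percent" | mail -s "Cassandra disk alert" ops@example.com
fi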
