Clearing logstash memory - logstash

Please explain me how logstash works with memory and events. I understand that when an event occurs, it is written to elasticsearch (in my case) and after that it should be cleaned from memory by the garbage collector.
But in debug mode, I see in the logs all the entries that went to elasticsearch and I don’t see them being cleaned out.
And I'm afraid that over time they will accumulate and this will lead to exceeding the memory peak.
On my volume of transmitted data, I still do not see a strong change in memory consumption, but I want to understand how to do it right.
Accordingly, the question is whether it is necessary to forcefully clean up the events so that they do not clog the memory?
If so, how to do it?
logstash.conf
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://<server>:5432/<DB>"
jdbc_user => "user"
jdbc_password => "pass"
jdbc_driver_class => "org.postgresql.Driver"
tracking_column => "unix_ts_in_secs"
schedule => "*/3 * * * * *"
statement => "SELECT uuid, header, location, price, parse_date AS unix_ts_in_secs FROM pdf_store WHERE (parse_date > :sql_last_value AND parse_date < NOW()) ORDER BY parse_date;"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
hosts => ["http://localhost:9200"]
document_id => "%{uuid}"
index => "test_index7"
}
}
pipelines.yml
- pipeline.id: pipeline_test
pipeline.workers: 1
pipeline.batch.size: 1
pipeline.batch.delay: 1
path.config: "/config/logstash.conf"

According to Elastic recommandation you have to check the JVM heap:
Be aware of the fact that Logstash runs on the Java VM. This means that Logstash will always use the maximum amount of memory you allocate to it.
Look for other applications that use large amounts of memory and may be causing Logstash to swap to disk. This can happen if the total memory used by applications exceeds physical memory.
The recommended heap size for typical ingestion scenarios should be no less than 4GB and no more than 8GB.
CPU utilization can increase unnecessarily if the heap size is too low, resulting in the JVM constantly garbage collecting. You can check for this issue by doubling the heap size to see if performance improves.
Do not increase the heap size past the amount of physical memory. Some memory must be left to run the OS and other processes. As a general guideline for most installations, don’t exceed 50-75% of physical memory. The more memory you have, the higher percentage you can use.
Set the minimum (Xms) and maximum (Xmx) heap allocation size to the same value to prevent the heap from resizing at runtime, which is a very costly process.
Link can help you : https://www.elastic.co/guide/en/logstash/master/performance-troubleshooting.html

Related

Why is the default value of spark.memory.fraction so low?

From the Spark configuration docs, we understand the following about the spark.memory.fraction configuration parameter:
Fraction of (heap space - 300MB) used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. Leaving this at the default value is recommended.
The default value for this configuration parameter is 0.6 at the time of writing this question. This means that for an executor with, for example, 32GB of heap space and the default configurations we have:
300MB of reserved space (a hardcoded value on this line)
(32GB - 300MB) * 0.6 = 19481MB of shared memory for execution + storage
(32GB - 300MB) * 0.4 = 12987MB of user memory
This "user memory" is (according to the docs) used for the following:
The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records.
On an executor with 32GB of heap space, we're allocating 12,7GB of memory for this, which feels rather large!
Do these user data structures/internal metadata/safeguarding against OOM errors really need that much space? Are there some striking examples of user memory usage which illustrate the need of this big of a user memory region?
I did some research and imo its 0.6 not to ensure enough memory for user memory but to ensure that execution + storage can fit into old gen region of jvm
Here i found something interesting: Spark tuning
The tenured generation size is controlled by the JVM’s NewRatio
parameter, which defaults to 2, meaning that the tenured generation is
2 times the size of the new generation (the rest of the heap). So, by
default, the tenured generation occupies 2/3 or about 0.66 of the
heap. A value of 0.6 for spark.memory.fraction keeps storage and
execution memory within the old generation with room to spare. If
spark.memory.fraction is increased to, say, 0.8, then NewRatio may
have to increase to 6 or more.
So by default in OpenJvm this ratio is set to 2 so you have 0,66% for old-gen, they choose to use 0,6 to have small margin
I found that in version 1.6 this was changed to 0,75 and it was causing some issues, here is Jira ticket
In the description you will find sample code which is adding records to cache just to use whole memory reserved for exeution + storage. With storage + execution set to higher amount than old gen overhead for gc was really high and code which was executed on older version (with this setting equal to 0.6) was 6 time faster (40-50 sec vs 6 min)
There was discussion and community decided to roll it back to 0.6 in Spark 2.0, here is PR with changes
I think that if you want to increase performance a little bit, you can try to change it up to 0.66 but if you want to have more memory for execution+storageyou need to also adjust your jvm and change old/new ratio as well otherwise you may face performance issues

Has anyone faced Cassandra issue "Maximum memory usage reached"?

I'm using apache cassandra-3.0.6 ,4 node cluster, RF=3, CONSISTENCY is '1', Heap 16GB.
Im getting info message in system.log as
INFO [SharedPool-Worker-1] 2017-03-14 20:47:14,929 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
don't know exactly which memory it mean and I have tried by increasing the file_cache_size_in_mb to 1024 from 512 in Cassandra.yaml file But again it immediatly filled the remaining 512MB increased and stoping the application recording by showing the same info message as
INFO [SharedPool-Worker-5] 2017-03-16 06:01:27,034 NoSpamLogger.java:91 - Maximum memory usage reached (1073741824 bytes), cannot allocate chunk of 1048576 bytes
please suggest if anyone has faced the same issue..Thanks!!
Bhargav
As far as I can tell with Cassandra 3.11, no matter how large you set file_cache_size_in_mb, you will still get this message. The cache fills up, and writes this useless message. It happens in my case whether I set it to 2GB or 20GB. This may be a bug in the cache eviction strategy, but I can't tell.
The log message indicates that the node's off-heap cache is full because the node is busy servicing reads.
The 536870912 bytes in the log message is equivalent to 512 MB which is the default file_cache_size_in_mb.
It is fine to see the occasional occurrences of the message in the logs which is why it is logged at INFO level but if it gets logged repeatedly, it is an indicator that the node is getting overloaded and you should consider increasing the capacity of your cluster by adding more nodes.
For more info, see my post on DBA Stack Exchange -- What does "Maximum memory usage reached" mean in the Cassandra logs?. Cheers!

What is spark.driver.maxResultSize?

The ref says:
Limit of total size of serialized results of all partitions for each
Spark action (e.g. collect). Should be at least 1M, or 0 for
unlimited. Jobs will be aborted if the total size is above this limit.
Having a high limit may cause out-of-memory errors in driver (depends
on spark.driver.memory and memory overhead of objects in JVM). Setting
a proper limit can protect the driver from out-of-memory errors.
What does this attribute do exactly? I mean at first (since I am not battling with a job that fails due to out of memory errors) I thought I should increase that.
On second thought, it seems that this attribute defines the max size of the result a worker can send to the driver, so leaving it at the default (1G) would be the best approach to protect the driver..
But will happen on this case, the worker will have to send more messages, so the overhead will be just that the job will be slower?
If I understand correctly, assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G, will cause the worker to send 4 messages (instead of 1 with unlimited spark.driver.maxResultSize). If so, then increasing that attribute to protect my driver from being assassinated from Yarn should be wrong.
But still the question above remains..I mean what if I set it to 1M (the minimum), will it be the most protective approach?
assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G, will cause the worker to send 4 messages (instead of 1 with unlimited spark.driver.maxResultSize).
No. If estimated size of the data is larger than maxResultSize given job will be aborted. The goal here is to protect your application from driver loss, nothing more.
if I set it to 1M (the minimum), will it be the most protective approach?
In sense yes, but obviously it is not useful in practice. Good value should allow application to proceed normally but protect application from unexpected conditions.

Severe degradation in Cassandra Write performance with continuous streaming data over time

I notice a severe degradation in Cassandra write performance with continuous writes over time.
I am inserting time series data with time stamp (T) as the column name in a wide column that stores 24 hours worth of data in a single row.
Streaming data is written from data generator (4 instances, each with 256 threads) inserting data into multiple rows in parallel.
Additionally, data is also inserted into a column family that has indexes over DateType and UUIDType.
CF1:
Col1 | Col2 | Col3(DateType) | Col(UUIDType4) |
RowKey1
RowKey2
:
:
CF2 (Wide column family):
RowKey1 (T1, V1) (T2, V3) (T4, V4) ......
RowKey2 (T1, V1) (T3, V3) .....
:
:
The no. of data points inserted/sec decreases over time until no further inserts are possible. The initial performance is of the order of 60000 ops/sec for ~6-8 hours and then it gradually tapers down to 0 ops/sec. Restarting the DataStax_Cassandra_Community_Server on all nodes helps restore the original throughput, but the behaviour is observed again after a few hours.
OS: Windows Server 2008
No.of nodes: 5
Cassandra version: DataStax Community 1.2.3
RAM: 8GB
HeapSize: 3GB
Garbage collector: default settings [ParNewGC]
I also notice a phenomenal increase in the no. of Pending write requests as reported by the OpsCenter (~of magnitude 200,000) when the performance begins to degrade.
I fail to understand what is preventing the write operations to be completed and why do they pile up over time? I do not see anything suspicious in the Cassandra logs.
Has the OS settings got anything to do with this?
Any suggestions to probe this issue further?
Do you see an increase in pending compactions (nodetool compactionstats)? Or are you seeing blocked flush writers (nodetool tpstats)? I'm guessing you're writing data to Cassandra faster than it can be consumed.
Cassandra won't block on writes, but that doesn't mean that you won't see an increase in the amount of heap used. Pending writes have overhead, as do blocked memtables. In addition, each SSTable has some memory overhead. If compactions fall behind this is magnified. At some point you probably don't have enough headroom in your heap to allocate the objects required for a single write, and you end up spending all your time waiting for an allocation that the GC can't provide.
With increased total capacity, or more IO on the machines consuming the data you would be able to sustain this write rate, but everything indicates you don't have enough capacity to sustain that load over time.
Bringing your write timeout in line with the new default in 2.0 (of 2s instead of 10s) will help with your write backlog by allowing load shedding to kick in faster: https://issues.apache.org/jira/browse/CASSANDRA-6059

Couchdb 100% CPU usage

I used Couchdb to create a private NPM mirror, but I found that beam.smp kept my CPU usage to 100%, is there any way to make it lower, like 50%?
Thank you very much.
You cannot directly limit CPU/memory usage for CouchDB, but you may tweak Replicator options to reduce their usage. Options you're interested:
http_connections
Defines maximum number of HTTP connections per replication. Keeping them lower reduces transfer bandwidth.
[replicator]
http_connections = 20
worker_batch_size
With lower batch sizes checkpoints are done more frequently. Lower batch sizes also reduce the total amount of used RAM memory.
[replicator]
worker_batch_size = 500
worker_processes
Amount of replication workers. Keeping them lower reduces amount of data replication handled => reduces CPU usage because of less data to process.
[replicator]
worker_processes = 4
Play with these options to find right combination to fit your limits.

Resources