Why does Apache Ignite use more memory than configured?

When using Ignite version 2.8.1-1 with the default configuration (1GB heap, the default region at 20% of RAM for off-heap storage, and persistence enabled) on a Linux host with 16GB memory, I notice the Ignite process can use up to 11GB of memory (verified by checking the resident size of the process in top, see attachment). When I check the metrics in the log, the consumed memory (heap + off-heap) doesn't add up to anywhere close to that; roughly 7GB is unaccounted for. One possibility is that the extra memory is used by the checkpoint buffer, but by default that is 1/4 of the default region, i.e. only about 0.25 * 0.2 * 16GB ≈ 0.8GB.
Any hints on what the rest of the memory is used for?
Thanks!
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=a52857f2, name=dummy-ignite-node, uptime=6 days, 10:41:45.541]
^-- H/N/C [hosts=98, nodes=221, CPUs=1481]
^-- CPU [cur=16.17%, avg=16.02%, GC=0%]
^-- PageMemory [pages=788278]
^-- Heap [used=1474MB, free=58.24%, comm=2020MB]
^-- Off-heap [used=3115MB, free=10.39%, comm=3376MB]
^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
^-- default region [used=3115MB, free=1.93%, comm=3176MB]
^-- metastoreMemPlc region [used=0MB, free=99.73%, comm=0MB]
^-- TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=57084MB]
^-- sysMemPlc region [used=0MB]
^-- default region [used=57084MB]
^-- metastoreMemPlc region [used=0MB]
^-- TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=8, qSize=0]
=== Some logs regarding settings ===
dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=0, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=default, *maxSize=3330655027, initSize=268435456,* swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true], dataRegions=null, storagePath=null, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=10, walSegmentSize=67108864, walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=false, walMode=FSYNC, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory#285c08c8, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null, walPageCompression=DISABLED, walPageCompressionLevel=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, handshakeTimeout=10000, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null, thinCliCfg=ThinClientConfiguration [maxActiveTxPerConn=100]], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=RestartProcessFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], commFailureRslvr=DefaultCommunicationFailureResolver []]
[2022-01-27T16:41:44,227][INFO ][main][IgniteKernal%ignite-jetstream-prd1] Daemon mode: on
[2022-01-27T16:41:44,227][INFO ][main][IgniteKernal%ignite-jetstream-prd1] OS: Linux 3.10.0-1160.49.1.el7.x86_64 amd64
[2022-01-27T16:41:44,227][INFO ][main][IgniteKernal%ignite-jetstream-prd1] OS user: root
[2022-01-27T16:41:44,227][INFO ][main][IgniteKernal%ignite-jetstream-prd1] PID: 7181
[2022-01-27T16:41:44,228][INFO ][main][IgniteKernal%ignite-jetstream-prd1] Language runtime: Scala ver. 2.11.12
[2022-01-27T16:41:44,228][INFO ][main][IgniteKernal%ignite-jetstream-prd1] VM information: OpenJDK Runtime Environment 1.8.0_312-b07 Red Hat, Inc. OpenJDK 64-Bit Server VM 25.312-b07
[2022-01-27T16:41:44,229][INFO ][main][IgniteKernal%ignite-jetstream-prd1] VM total memory: 0.96GB
[2022-01-27T16:41:44,229][INFO ][main][IgniteKernal%ignite-jetstream-prd1] Remote Management [restart: off, REST: off, JMX (remote: off)]
[2022-01-27T16:41:44,229][INFO ][main][IgniteKernal%ignite-jetstream-prd1] Logger: Log4J2Logger [quiet=true, config=config/ignite-log4j2.xml]
[2022-01-27T16:41:44,229][INFO ][main][IgniteKernal%ignite-jetstream-prd1] IGNITE_HOME=/usr/share/apache-ignite
[2022-01-27T16:41:44,230][INFO ][main][IgniteKernal%ignite-jetstream-prd1] VM arguments: [-XX:+AggressiveOpts, *-Xms1g, -Xmx1g, -XX:MaxPermSize=128M*, -Dfile.encoding=UTF-8, -DIGNITE_QUIET=true, -DIGNITE_HOME=/usr/share/apache-ignite, -DIGNITE_PROG_NAME=/usr/share/apache-ignite/bin/ignitevisorcmd.sh, -DIGNITE_DEPLOYMENT_MODE_OVERRIDE=ISOLATED]
[Screenshot of top command output attached]

Yes, the checkpoint buffer size is also taken into account here; if you haven't overridden the defaults, it should be 3GB/4, as you correctly highlighted. I wonder if it might be sized up automatically, since you have far more data stored (^-- Ignite persistence [used=57084MB]) than the region capacity of only ~3GB. Also, this might be related to direct memory usage, which I suppose is not counted as part of the Java heap usage.
Anyway, I think it's better to check the Ignite memory metrics explicitly, such as data region and on-heap usage, and inspect them in detail.

Apache Ignite RAM required (rough estimate; see the sketch after this list):
ON_HEAP_MEMORY (JVM heap max size)
+ OFF_HEAP_MEMORY (default region max size)
+ OFF_HEAP_MEMORY * 0.3 (indexes also require memory; basic use cases add roughly 30%)
+ default region checkpoint buffer:
    MIN(256MB, OFF_HEAP_MEMORY) if OFF_HEAP_MEMORY < 1GB
    OFF_HEAP_MEMORY / 4 if OFF_HEAP_MEMORY is between 1GB and 8GB
    2GB if OFF_HEAP_MEMORY > 8GB
+ 100MB (sysMemPlc region max size)
+ 100MB (metastoreMemPlc region max size)
+ 100MB (TxLog region max size)
+ 100MB (volatileDsMemPlc region max size)
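
A quick sketch of this arithmetic, applied to the node from the question (1GB heap, default region of roughly 20% of 16GB); the helper names below are illustrative, not part of any Ignite API:

# Rough estimate of Apache Ignite process RAM, following the formula above.
# estimate_ignite_ram and checkpoint_buffer are illustrative helpers, not Ignite APIs.
GB = 1024 ** 3
MB = 1024 ** 2

def checkpoint_buffer(off_heap):
    # Default checkpoint buffer sizing for a persistent data region.
    if off_heap < 1 * GB:
        return min(256 * MB, off_heap)
    if off_heap <= 8 * GB:
        return off_heap // 4
    return 2 * GB

def estimate_ignite_ram(on_heap, off_heap):
    total = on_heap + off_heap
    total += int(off_heap * 0.3)          # indexes: ~30% extra for basic use cases
    total += checkpoint_buffer(off_heap)  # default region checkpoint buffer
    total += 4 * 100 * MB                 # sysMemPlc, metastoreMemPlc, TxLog, volatileDsMemPlc
    return total

# Node from the question: 1GB heap, default region ~20% of 16GB RAM.
print(round(estimate_ignite_ram(1 * GB, int(0.2 * 16 * GB)) / GB, 1))  # ~6.4, still well short of the observed 11GB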

Related

Understanding trigger points for vm.dirty_background_ratio and vm.dirty_ratio

We use the Prometheus node exporter to monitor our systems.
It exposes node_memory_Dirty_bytes and node_memory_MemAvailable_bytes.
So is it safe to say that when node_memory_Dirty_bytes / node_memory_MemAvailable_bytes * 100 exceeds vm.dirty_background_ratio, the background flusher runs?
Similarly, when it exceeds vm.dirty_ratio, do all writers block?
Some other documentation says it's more than just available bytes:
dirty_background_ratio
Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which the background kernel flusher threads will start writing out dirty data.
So here it says available = free + reclaimable pages
So does the formula change to:
node_memory_Dirty_bytes / (node_memory_MemFree_bytes + node_memory_KReclaimable_bytes + node_memory_SReclaimable_bytes) * 100 ?
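
A small sketch of that second reading, pulling the raw numbers straight from /proc/meminfo (KReclaimable only exists on newer kernels, hence the fallback; whether this matches the kernel's internal notion of "available" exactly is precisely the open question here):

# Compare dirty memory against "free + reclaimable", as the quoted documentation describes.
def meminfo_kb():
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])  # all values here are in kB
    return values

m = meminfo_kb()
available = m["MemFree"] + m.get("KReclaimable", 0) + m["SReclaimable"]
dirty_pct = m["Dirty"] / available * 100
print(f"Dirty is {dirty_pct:.2f}% of free + reclaimable")
with open("/proc/sys/vm/dirty_background_ratio") as f:
    print("vm.dirty_background_ratio =", f.read().strip())
with open("/proc/sys/vm/dirty_ratio") as f:
    print("vm.dirty_ratio =", f.read().strip())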

How is memory used value derived in check_snmp_mem.pl?

I was configuring icinga2 to get memory usage information from a Linux client using the check_snmp_mem.pl script. Any idea how the memory used value is derived in this script?
Here is the free command output:
# free
total used free shared buff/cache available
Mem: 500016 59160 89564 3036 351292 408972
Swap: 1048572 4092 1044480
where as the performance data shown in icinga dashboard is
Label Value Max Warning Critical
ram_used 137,700.00 500,016.00 470,015.00 490,016.00
swap_used 4,092.00 1,048,572.00 524,286.00 838,858.00
Looking through the source code, it mentions ram_used for example in this line:
$n_output .= " | ram_used=" . ($$resultat{$nets_ram_total}-$$resultat{$nets_ram_free}-$$resultat{$nets_ram_cache}).";";
This strongly suggests that ram_used is calculated as total RAM minus free RAM minus the RAM used for cache. These values are retrieved via the following SNMP IDs:
my $nets_ram_free = "1.3.6.1.4.1.2021.4.6.0"; # Real memory free
my $nets_ram_total = "1.3.6.1.4.1.2021.4.5.0"; # Real memory total
my $nets_ram_cache = "1.3.6.1.4.1.2021.4.15.0"; # Real memory cached
I don't know exactly how these correlate to the output of free. The difference between the free memory reported by free and the value reported to Icinga is 48136, so maybe you can find that number somewhere.
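
To make the formula concrete, here is the same calculation in Python. The OIDs are the ones hard-coded in the plugin; the sample numbers are simply the ones from the free output above, used as stand-ins rather than actual SNMP readings from this host:

# ram_used exactly as check_snmp_mem.pl computes it: total - free - cached.
OID_RAM_TOTAL = "1.3.6.1.4.1.2021.4.5.0"   # Real memory total
OID_RAM_FREE  = "1.3.6.1.4.1.2021.4.6.0"   # Real memory free
OID_RAM_CACHE = "1.3.6.1.4.1.2021.4.15.0"  # Real memory cached

def ram_used(total_kb, free_kb, cached_kb):
    return total_kb - free_kb - cached_kb

# Using the free(1) numbers from the question as stand-ins for the SNMP values:
print(ram_used(total_kb=500016, free_kb=89564, cached_kb=351292))  # 59160 kB

That gives 59160 kB, i.e. the used column of free, rather than the 137,700 kB Icinga shows, so the agent's free/cached figures apparently do not line up one-to-one with free's free and buff/cache columns.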

PostgreSQL out of memory: Linux OOM killer

I am having issues with a large query that I suspect are caused by wrong settings in my postgresql.conf. My setup is PostgreSQL 9.6 on Ubuntu 17.10 with 32GB RAM and a 3TB HDD. The query runs pgr_dijkstraCost to create an OD matrix of ~10,000 points in a network of 25,000 links. The resulting table is thus expected to be very big (~100,000,000 rows with columns from, to, costs). However, a simple test such as select x, 1 as c2, 2 as c3
from generate_series(1,90000000) succeeds.
The query plan:
QUERY PLAN
--------------------------------------------------------------------------------------
 Function Scan on pgr_dijkstracost  (cost=393.90..403.90 rows=1000 width=24)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=196.82..196.83 rows=1 width=32)
           ->  Seq Scan on building_nodes b  (cost=0.00..166.85 rows=11985 width=4)
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=196.82..196.83 rows=1 width=32)
           ->  Seq Scan on building_nodes b_1  (cost=0.00..166.85 rows=11985 width=4)
This leads to a crash of PostgreSQL:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
normally and possibly corrupted shared memory.
Running dmesg, I could trace it down to an out-of-memory issue:
Out of memory: Kill process 5630 (postgres) score 949 or sacrifice child
[ 5322.821084] Killed process 5630 (postgres) total-vm:36365660kB,anon-rss:32344260kB, file-rss:0kB, shmem-rss:0kB
[ 5323.615761] oom_reaper: reaped process 5630 (postgres), now anon-rss:0kB,file-rss:0kB, shmem-rss:0kB
[11741.155949] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
[11741.155953] postgres cpuset=/ mems_allowed=0
When running the query I can also observe with top that my RAM goes down to 0 before the crash. The amount of committed memory just before the crash:
$ grep Commit /proc/meminfo
CommitLimit: 18574304 kB
Committed_AS: 42114856 kB
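
For context on those two numbers, here is a hedged sketch of how the kernel derives CommitLimit (the formula documented for vm.overcommit_memory=2, with the default overcommit_ratio of 50; the swap size below is only inferred to make the numbers roughly line up, since it isn't shown in the question):

# CommitLimit = swap + RAM * overcommit_ratio / 100 (documented for vm.overcommit_memory = 2).
# Committed_AS is the total address space handed out; here it far exceeds RAM + swap,
# so once the backend actually touches those pages the OOM killer fires.
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    return swap_kb + ram_kb * overcommit_ratio // 100

ram_kb = 32 * 1024 * 1024   # the 32GB machine from the question
swap_kb = 1792 * 1024       # assumed ~1.75GB swap (not given in the question)
print(commit_limit_kb(ram_kb, swap_kb))             # ~18.6M kB, close to the CommitLimit above
print(42114856 > commit_limit_kb(ram_kb, swap_kb))  # Committed_AS exceeds the limit -> True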
I would expect the HDD to be used to write/buffer temporary data when RAM is not enough, but the available space on my HDD does not change during processing. So I began to dig for missing configs (expecting issues due to my relocated data directory), following different sites:
https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
https://www.credativ.com/credativ-blog/2010/03/postgresql-and-linux-memory-management
My original settings in postgresql.conf are the defaults, except for the changed data directory:
data_directory = '/hdd_data/postgresql/9.6/main'
shared_buffers = 128MB # min 128kB
#huge_pages = try # on, off, or try
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
#work_mem = 4MB # min 64kB
#maintenance_work_mem = 64MB # min 1MB
#replacement_sort_tuples = 150000 # limits use of replacement selection sort
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB # min 100kB
dynamic_shared_memory_type = posix # the default is the first option
I changed the config:
shared_buffers = 128MB
work_mem = 40MB # min 64kB
maintenance_work_mem = 64MB
I reloaded the configuration with sudo service postgresql reload and tested the same query, but found no change in behavior. Does this simply mean that such a large query cannot be done? Any help is appreciated.
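
As a rough, hedged sanity check of those settings (this is only the usual rule of thumb that work_mem is granted per sort/hash node per session; it does not model what pgr_dijkstraCost itself allocates while building its result):

# Rough worst-case memory for the PostgreSQL settings shown above.
def rough_pg_memory_mb(shared_buffers_mb, work_mem_mb, sessions, sort_hash_nodes=1,
                       maintenance_work_mem_mb=64):
    # shared_buffers is allocated once; work_mem can be used per sort/hash node per session.
    return shared_buffers_mb + sessions * work_mem_mb * sort_hash_nodes + maintenance_work_mem_mb

# Config from the question: shared_buffers=128MB, work_mem=40MB, a single active session.
print(rough_pg_memory_mb(128, 40, sessions=1), "MB")  # 232 MB, nowhere near the 32GB that got consumed

So the configured limits alone don't account for a 32GB backend, which is consistent with the memory being allocated by the pgr_dijkstraCost call itself rather than by sort/hash work areas, and would explain why tweaking work_mem changed nothing.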
I'm having similar trouble, but not with PostgreSQL (which is running happily): what is happening is simply that the kernel cannot allocate more RAM to the process, whichever process it is.
It would certainly help to add some swap to your configuration.
To check how much RAM and swap you have, run: free -h
On my machine, here is what it returns:
total used free shared buff/cache available
Mem: 7.7Gi 5.3Gi 928Mi 865Mi 1.5Gi 1.3Gi
Swap: 9.4Gi 7.1Gi 2.2Gi
You can clearly see that my machine is quite overloaded: about 8GB of RAM and 9GB of swap, of which 7GB are used.
When the RAM-hungry process got killed after the out-of-memory condition, I saw both RAM and swap being used at 100%.
So, allocating more swap may alleviate the problem.

Linux `top` command: how much process memory is physically stored in swap space?

Let's say I run my program on a 64-bit Linux machine with 64 GB of RAM. In my very small C program, immediately after start-up I do
void *p = sbrk(1024ull * 1024 * 1024 * 120);
thus moving my data segment break forward by 120 GB.
After the above sbrk call, the top entry for my process shows RES at some low value, VIRT at 120g, and SWAP at 120g.
After this operation I write something into the first 90 GB of the above region:
memset(p, 0xAB, 1024ull * 1024 * 1024 * 90);
This causes some changes in the top entry for my process: VIRT expectedly remains at 120g, RES becomes almost 64g, and SWAP drops to around 56g.
The overall Swap stats in the header of the top output show that swap file usage increases, which is expected, since my program has to push about 26 GB of memory pages into the swap file.
So, according to the above observations, the SWAP column simply reports my process's non-resident address space, regardless of whether that address space has been "materialized", i.e. regardless of whether I have already written something into that region of virtual memory.
But is there any way to figure out how much of that SWAP size has actually been "materialized" and is backed by something stored in the swap file? I.e., is there any way to make top display that 26 GB value for my process?
The behavior depends on the version of procps you are using. For instance, in version 3.0.5 the SWAP value equals:
task->size - task->resident
which is exactly what you are encountering. man top.1 says:
VIRT = SWAP + RES
procps-ng, however, reads /proc/<pid>/status and sets SWAP correctly:
https://gitlab.com/procps-ng/procps/blob/master/proc/readproc.c#L383
So you can either update procps or look at /proc/<pid>/status directly.
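
A small sketch of that last suggestion: reading the real per-process swap usage from /proc/<pid>/status (the VmSwap field, which is what newer procps-ng reports), alongside the old VIRT - RES heuristic:

# Compare the old procps SWAP heuristic (VIRT - RES) with the actual VmSwap value.
import sys

def status_kb(pid):
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmSize", "VmRSS", "VmSwap")):
                key, rest = line.split(":", 1)
                fields[key] = int(rest.split()[0])  # values are in kB
    return fields

pid = sys.argv[1] if len(sys.argv) > 1 else "self"
s = status_kb(pid)
print("VIRT - RES (old top SWAP column):", s["VmSize"] - s["VmRSS"], "kB")
print("VmSwap (actually swapped out):   ", s.get("VmSwap", 0), "kB")

Run it with the PID of the test program; for the scenario above it should print roughly 56 GB for the first line and around 26 GB for the second.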

Why is my process taking higher resident memory as compared to virtual memory?

top output for my Linux process shows that its resident memory is around 6 times its virtual memory. I have researched a lot but couldn't find any reason for such behavior. Normally VIRT is always higher than RES due to the Linux kernel's memory management. The top output is below:
13743 root 20 0 15.234g 0.010t 4372 R 13.4 4.0 7:43.41 q
Not quite.
The g suffix indicates Gibibyte(s), and t indicates Tebibyte(s).
Let's do the conversion of 0.010t to g (GiB):
zsh% print $((0.010 * 1024))g
10.24g
And 10.24g < 15.234g, so your assumption is not correct, i.e. top is showing the correct values for virtual memory size (VSZ) and resident set size (RSS), just in different units (you'd need to take a peek at the source to see why).
