Identify Container Memory (MEM) consumption (what uses all the memory?) - linux

I have a container (registry.access.redhat.com/ubi8/ubi-minimal) which runs a bash script (moving files) in an infinite loop.
This test is with 900 files every minute, and each file is ~1 KB (just a small XML).
Here is part of the YAML file, including the command executed by the pod:
command: ["/bin/sh", "-c", "shopt -s nullglob && while true ; do for f in $vfsourcefolder/*.xml ; do randomNum=$(shuf -i $FolderStartNumber-$FolderEndNumber -n 1) ; mkdir -p $vfsourcefolder/$vfsubfolderprefix$randomNum ; mv $f $_ ; done ; done"]
livenessProbe:
exec:
command: ["/bin/sh", "-c", "test $checkFiles -gt $(ls -f $vfsourcefolder | wc -l)"]
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 3
resources:
requests:
memory: 256Mi
cpu: 25m
limits:
memory: 4Gi
cpu: 2
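For readability, here is the loop from the command above written out as a plain script (same logic, only reformatted; the environment variables come from the pod spec):

#!/bin/bash
shopt -s nullglob                       # let *.xml expand to nothing when the folder is empty
while true; do
  for f in "$vfsourcefolder"/*.xml; do
    # pick a random subfolder number in the configured range
    randomNum=$(shuf -i "$FolderStartNumber"-"$FolderEndNumber" -n 1)
    mkdir -p "$vfsourcefolder/$vfsubfolderprefix$randomNum"
    mv "$f" "$_"                        # $_ expands to the last argument of the previous command (the folder just created)
  done
done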
After running for ~3 days it consumes 3 GB of memory (according to kubectl top):
tilo@myserver:/$ kubectl top pod fs-probe-spreader1-0
NAME                   CPU(cores)   MEMORY(bytes)
fs-probe-spreader1-0   217m         3207Mi
But I can't find out what is taking all the memory.
The slabinfo shows a huge number of objects in cifs_inode_cache and dentry.
Here are stats from the pod:
ps aux
top -b
df -TPh
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/memory.stat
cat /sys/fs/cgroup/memory/memory.kmem.slabinfo
[root@fs-probe-spreader1-0 /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 6.5 0.0 13148 3136 ? Ds Jan18 279:58 /bin/sh -c shopt -s nullglob && while true ; do for f in $vfsourcefolder/*.xml ; do randomNum=$(shuf -i $FolderStar
root 1266813 0.0 0.0 19352 3764 pts/0 Ss 23:12 0:00 bash
root 1372717 0.0 0.0 1092036 9720 ? Rsl 23:45 0:00 /usr/bin/runc init
root 1372719 0.0 0.0 51860 3676 pts/0 R+ 23:45 0:00 ps aux
[root@fs-probe-spreader1-0 /]# top -b
top - 23:53:56 up 4 days, 2:52, 0 users, load average: 2.46, 2.31, 2.26
Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 0.0 sy, 0.0 ni, 95.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16009.0 total, 507.8 free, 6127.2 used, 9374.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9551.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 13148 3136 2544 D 6.7 0.0 280:56.60 sh
1398222 root 20 0 19352 3716 3148 S 0.0 0.0 0:00.01 bash
1401883 root 20 0 56192 4208 3660 R 0.0 0.0 0:00.00 top
[root@fs-probe-spreader1-0 /]# df -TPh
Filesystem Type Size Used Avail Use% Mounted on
overlay overlay 97G 23G 75G 24% /
tmpfs tmpfs 64M 0 64M 0% /dev
tmpfs tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
//myshare1.file.core.windows.net/mainfs cifs 100G 28G 73G 28% /trex/root
/dev/sda1 ext4 97G 23G 75G 24% /etc/hosts
shm tmpfs 64M 0 64M 0% /dev/shm
tmpfs tmpfs 4.0G 12K 4.0G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs tmpfs 7.9G 0 7.9G 0% /proc/acpi
tmpfs tmpfs 7.9G 0 7.9G 0% /proc/scsi
tmpfs tmpfs 7.9G 0 7.9G 0% /sys/firmware
[root@fs-probe-spreader1-0 /]# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
3374436352
[root@fs-probe-spreader1-0 /]# cat /sys/fs/cgroup/memory/memory.stat
cache 19505152
rss 1482752
rss_huge 0
shmem 0
mapped_file 0
dirty 135168
writeback 0
pgpgin 989469294
pgpgout 989464142
pgfault 2149218225
pgmajfault 0
inactive_anon 0
active_anon 1368064
inactive_file 6352896
active_file 13246464
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 19505152
total_rss 1482752
total_rss_huge 0
total_shmem 0
total_mapped_file 0
total_dirty 135168
total_writeback 0
total_pgpgin 989469294
total_pgpgout 989464142
total_pgfault 2149218225
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 1368064
total_inactive_file 6352896
total_active_file 13246464
total_unevictable 0
[root@fs-probe-spreader1-0 /]# cat /sys/fs/cgroup/memory/memory.kmem.slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-rcl-128 64 64 128 32 1 : tunables 0 0 0 : slabdata 2 2 0
TCP 42 42 2240 14 8 : tunables 0 0 0 : slabdata 3 3 0
kmalloc-rcl-64 320 320 64 64 1 : tunables 0 0 0 : slabdata 5 5 0
kmalloc-rcl-96 126 126 96 42 1 : tunables 0 0 0 : slabdata 3 3 0
radix_tree_node 252 252 584 28 4 : tunables 0 0 0 : slabdata 9 9 0
UDPv6 96 96 1344 24 8 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-96 168 168 96 42 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-2k 64 64 2048 16 8 : tunables 0 0 0 : slabdata 4 4 0
cifs_inode_cache 3454395 3454395 776 21 4 : tunables 0 0 0 : slabdata 164495 164495 0
kmalloc-8 2048 2048 8 512 1 : tunables 0 0 0 : slabdata 4 4 0
buffer_head 5460 5460 104 39 1 : tunables 0 0 0 : slabdata 140 140 0
ext4_inode_cache 290 290 1096 29 8 : tunables 0 0 0 : slabdata 10 10 0
shmem_inode_cache 66 66 720 22 4 : tunables 0 0 0 : slabdata 3 3 0
ovl_inode 736 736 688 23 4 : tunables 0 0 0 : slabdata 32 32 0
pde_opener 408 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0
eventpoll_pwq 224 224 72 56 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-1k 64 64 1024 16 4 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-32 512 512 32 128 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-4k 32 32 4096 8 8 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-512 64 64 512 16 2 : tunables 0 0 0 : slabdata 4 4 0
skbuff_head_cache 64 64 256 16 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-192 84 84 192 21 1 : tunables 0 0 0 : slabdata 4 4 0
inode_cache 104 104 608 26 4 : tunables 0 0 0 : slabdata 4 4 0
pid 128 128 128 32 1 : tunables 0 0 0 : slabdata 4 4 0
anon_vma 2028 2028 104 39 1 : tunables 0 0 0 : slabdata 52 52 0
vm_area_struct 837 912 208 19 1 : tunables 0 0 0 : slabdata 48 48 0
mm_struct 120 120 1088 30 8 : tunables 0 0 0 : slabdata 4 4 0
signal_cache 112 112 1152 28 8 : tunables 0 0 0 : slabdata 4 4 0
sighand_cache 60 60 2112 15 8 : tunables 0 0 0 : slabdata 4 4 0
anon_vma_chain 1957 2368 64 64 1 : tunables 0 0 0 : slabdata 37 37 0
files_cache 92 92 704 23 4 : tunables 0 0 0 : slabdata 4 4 0
task_delay_info 204 204 80 51 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-64 3264 3264 64 64 1 : tunables 0 0 0 : slabdata 51 51 0
cred_jar 1323 1323 192 21 1 : tunables 0 0 0 : slabdata 63 63 0
task_struct 33 52 7680 4 8 : tunables 0 0 0 : slabdata 13 13 0
PING 64 64 1024 16 4 : tunables 0 0 0 : slabdata 4 4 0
sock_inode_cache 76 76 832 19 4 : tunables 0 0 0 : slabdata 4 4 0
proc_inode_cache 432 432 680 24 4 : tunables 0 0 0 : slabdata 18 18 0
dentry 3346497 3346497 192 21 1 : tunables 0 0 0 : slabdata 159357 159357 0
filp 576 576 256 16 1 : tunables 0 0 0 : slabdata 36 36 0
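Not part of the original post, but as a sketch of how to account for the gap: the difference between the cgroup's usage_in_bytes and the rss+cache reported in memory.stat is kernel memory, which here is dominated by the cifs inode and dentry slabs. Assuming the cgroup v1 paths shown above, run inside the pod:

# approximate per-cache slab usage: num_objs (col 3) * objsize (col 4), skipping the two header lines
awk 'NR>2 { printf "%-24s %8.1f MiB\n", $1, $3*$4/1048576 }' \
    /sys/fs/cgroup/memory/memory.kmem.slabinfo | sort -k2 -rn | head -5

# compare the cgroup's total charge with what memory.stat accounts for;
# the remainder is kernel (slab) memory
usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
rss=$(awk '$1=="total_rss" {print $2}' /sys/fs/cgroup/memory/memory.stat)
cache=$(awk '$1=="total_cache" {print $2}' /sys/fs/cgroup/memory/memory.stat)
echo "usage=$usage rss=$rss cache=$cache kernel(slab)~=$((usage - rss - cache))"

# dentries and inodes are reclaimable; on the host (affects the whole node, not just the pod):
#   echo 2 > /proc/sys/vm/drop_caches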

Related

How to interpret such value of the time column in /proc/self/mountstats - does it indicate a performance issue?

I have a bladefs volume and I just checked /proc/self/mountstats, where I see per-operation statistics:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
These are the stats for my bladefs mount. I am interested in the READ operation statistics. As far as I know, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How do I interpret this? Is it the average time of each read operation, regardless of block size? I have a strong suspicion that I have problems with NFS: am I right? A value of 0.1 s looks bad to me, but I am not sure exactly how to interpret this time: an average, some kind of sum...?
After reading the kernel source: the statistics are printed from rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics seems to be printed from _print_rpc_iostats, which prints the struct rpc_iostats member om_execute. (The newest kernels have 9 columns, with errors in the last column.)
That member appears to be referenced/actually changed only in rpc_count_iostats_metrics, with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
Assuming ktime_add does what it says, the value of om_execute only increases. So the 8th column of mountstats would be the sum of the time of operations of this type.
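A hedged sketch (not from the original answer): since the value is cumulative, dividing it by the operation count in the first column gives the average execute time per call, in whatever unit the column is reported in:

# per-op fields start after the "READ:" label, so with awk the op count is $2 and the cumulative execute value is $9
awk '$1 == "READ:" { printf "READ: %d ops, cumulative execute %d, average %.1f per op\n", $2, $9, $9/$2 }' /proc/self/mountstats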

Why does Cassandra major compaction fail to clear expired tombstones?

We have deployed a global Apache Cassandra cluster (nodes: 12, RF: 3, version: 3.11.2) in our production environment. We are running into an issue where a major compaction on a column family fails to clear tombstones on one node (out of 3 replicas), even though the metadata shows the min timestamp is past the gc_grace_seconds set on the table.
Here is the sstable metadata output:
SSTable: mc-4302-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1
Maximum timestamp: 1560326019515476
SSTable min local deletion time: 1560233203
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.8808303792058351
TTL min: 0
TTL max: 0
First token: -9201661616334346390 (key=bca773eb-ecbb-49ec-9330-cc16da310b58:::)
Last token: 9117719078924671254 (key=7c23b975-5354-4c82-82e5-1762bac75a8d:::)
minClustringValues: [00000f8f-74a9-4ce3-9d87-0a4dabef30c1]
maxClustringValues: [ffffc966-a02c-4e1f-bdd1-256556624288]
Estimated droppable tombstones: 46.31761624099541
SSTable Level: 0
Repaired at: 0
Replay positions covered: {}
totalColumnsSet: 0
totalRows: 618382
Estimated tombstone drop times:
1560233680: 353
1560234658: 237
1560235604: 176
1560236803: 471
1560237652: 402
1560238342: 195
1560239166: 373
1560239969: 356
1560240586: 262
1560241207: 247
1560242037: 387
1560242847: 357
1560243742: 280
1560244469: 283
1560245095: 353
1560245957: 357
1560246773: 362
1560247956: 449
1560249034: 217
1560249849: 310
1560251080: 296
1560251984: 304
1560252993: 239
1560253907: 407
1560254839: 977
1560255761: 671
1560256486: 317
1560257199: 679
1560258020: 703
1560258795: 507
1560259378: 298
1560260093: 2302
1560260869: 2488
1560261535: 2818
1560262176: 2842
1560262981: 1685
1560263708: 1830
1560264308: 808
1560264941: 1990
1560265753: 1340
1560266708: 2174
1560267629: 2253
1560268400: 1627
1560269174: 2347
1560270019: 2579
1560270888: 3947
1560271690: 1727
1560272446: 2573
1560273249: 1523
1560274086: 3438
1560275149: 2737
1560275966: 3487
1560276814: 4101
1560277660: 2012
1560278617: 1198
1560279680: 769
1560280441: 1337
1560281033: 608
1560281876: 2065
1560282546: 2926
1560283128: 6305
1560283836: 824
1560284574: 71
1560285166: 140
1560285828: 118
1560286404: 83
1560295835: 72
1560296951: 456
1560297814: 670
1560298496: 271
1560299333: 473
1560300159: 284
1560300831: 127
1560301551: 536
1560302309: 425
1560303302: 860
1560304064: 465
1560304782: 319
1560305657: 323
1560306552: 236
1560307454: 368
1560308409: 320
1560309178: 210
1560310091: 177
1560310881: 85
1560311970: 147
1560312706: 76
1560313495: 88
1560314847: 687
1560315817: 1618
1560316544: 1245
1560317423: 5361
1560318491: 2060
1560319595: 5853
1560320587: 5390
1560321473: 3868
1560322644: 5784
1560323703: 6861
1560324838: 7200
1560325744: 5642
Count Row Size Cell Count
1 0 3054
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
10 0 0
12 0 0
14 0 0
17 0 0
20 0 0
24 0 0
29 0 0
35 0 0
42 0 0
50 0 0
60 98 0
72 49 0
86 46 0
103 2374 0
124 39 0
149 36 0
179 43 0
215 18 0
258 26 0
310 24 0
372 18 0
446 16 0
535 19 0
642 27 0
770 17 0
924 12 0
1109 14 0
1331 23 0
1597 20 0
1916 12 0
2299 11 0
2759 11 0
3311 11 0
3973 12 0
4768 5 0
5722 8 0
6866 5 0
8239 5 0
9887 6 0
11864 5 0
14237 10 0
17084 1 0
20501 8 0
24601 2 0
29521 2 0
35425 3 0
42510 2 0
51012 2 0
61214 1 0
73457 2 0
88148 3 0
105778 0 0
126934 3 0
152321 2 0
182785 1 0
219342 0 0
263210 0 0
315852 0 0
379022 0 0
454826 0 0
545791 0 0
654949 0 0
785939 0 0
943127 0 0
1131752 0 0
1358102 0 0
1629722 0 0
1955666 0 0
2346799 0 0
2816159 0 0
3379391 1 0
4055269 0 0
4866323 0 0
5839588 0 0
7007506 0 0
8409007 0 0
10090808 1 0
12108970 0 0
14530764 0 0
17436917 0 0
20924300 0 0
25109160 0 0
30130992 0 0
36157190 0 0
43388628 0 0
52066354 0 0
62479625 0 0
74975550 0 0
89970660 0 0
107964792 0 0
129557750 0 0
155469300 0 0
186563160 0 0
223875792 0 0
268650950 0 0
322381140 0 0
386857368 0 0
464228842 0 0
557074610 0 0
668489532 0 0
802187438 0 0
962624926 0 0
1155149911 0 0
1386179893 0 0
1663415872 0 0
1996099046 0 0
2395318855 0 0
2874382626 0
3449259151 0
4139110981 0
4966933177 0
5960319812 0
7152383774 0
8582860529 0
10299432635 0
12359319162 0
14831182994 0
17797419593 0
21356903512 0
25628284214 0
30753941057 0
36904729268 0
44285675122 0
53142810146 0
63771372175 0
76525646610 0
91830775932 0
110196931118 0
132236317342 0
158683580810 0
190420296972 0
228504356366 0
274205227639 0
329046273167 0
394855527800 0
473826633360 0
568591960032 0
682310352038 0
818772422446 0
982526906935 0
1179032288322 0
1414838745986 0
Estimated cardinality: 3054
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1560233203
EncodingStats minTimestamp: 1
KeyType: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
ClusteringTypes: [org.apache.cassandra.db.marshal.UUIDType]
StaticColumns: {}
RegularColumns: {}
So far, here is what we have tried (the exact commands are sketched below):
1) major compaction with a lower gc_grace_seconds
2) nodetool garbagecollect
3) nodetool scrub
None of the above methods is helping. Again, this is only happening on one node (out of the 3 replicas).
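For concreteness, a hedged sketch of those three attempts as commands (keyspace and table names are placeholders, not from the question):

# 1) lower gc_grace_seconds on the table, then force a major compaction
cqlsh -e "ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 3600;"
nodetool compact my_ks my_table

# 2) single-SSTable tombstone cleanup
nodetool garbagecollect my_ks my_table

# 3) rewrite the SSTables
nodetool scrub my_ks my_table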
The tombstone markers generated during your major compaction are just that, markers. The data has been removed, but a delete marker is left in place so that the other replicas can have gc_grace_seconds to process them too. The tombstone markers are fully dropped the next time the SSTable is compacted. Unfortunately, because you've run a major compaction (rarely ever recommended), it may be a long time until there are other SSTables suitable for compacting with it to clean up the tombstones. Remember that the tombstone drop will also only happen after local_delete_time + gc_grace_seconds, as defined on the table.
If you're interested in learning more about how tombstones and compaction work together in the context of delete operations, I suggest reading the following articles:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html
https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
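As a hedged illustration of that timing rule (file paths and names below are placeholders), one way to check when a given SSTable's tombstones become eligible to drop:

# estimated droppable tombstones and the local deletion times of the sstable in question
sstablemetadata /var/lib/cassandra/data/my_ks/my_table-*/mc-4302-big-Data.db | grep -E 'droppable|deletion time'

# the table's gc_grace_seconds; tombstones drop only after local deletion time + gc_grace_seconds
cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='my_ks' AND table_name='my_table';"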

Is there a way to find out the total number of bytes actually written on each node per second in a Cassandra Cluster

I see that the bytes written to the commit log are in the MBs, but the data that was actually sent was only a couple of MB (< 4 MB). Not sure why I am seeing such stats?
Here is the dstat output for my disk (commit log):
date/time |usr sys idl wai hiq siq| 1m 5m 15m | read writ| read writ|util| recv send
23-03 12:08:06| 27 4 66 2 0 0|13.8 6.14 3.50| 0 110M| 0 893 |66.8| 73M 79M
23-03 12:08:07| 29 5 64 2 0 0|13.8 6.14 3.50| 0 119M| 0 970 |58.8| 84M 81M
23-03 12:08:08| 29 4 64 3 0 0|13.8 6.14 3.50| 0 114M| 0 925 |70.4| 76M 75M
23-03 12:08:09| 30 6 63 2 0 0|13.2 6.13 3.52| 0 104M| 0 852 |58.0| 84M 73M
23-03 12:08:10| 30 5 63 2 0 0|13.2 6.13 3.52| 0 147M| 0 1190 |62.4| 92M 93M
23-03 12:08:11| 30 4 64 2 0 0|13.2 6.13 3.52| 0 113M| 0 923 |61.6| 77M 74M
23-03 12:08:12| 26 4 67 2 0 0|13.2 6.13 3.52| 0 134M| 0 1094 |56.0| 94M 90M
23-03 12:08:13| 39 5 54 1 0 0|13.2 6.13 3.52| 0 121M| 0 986 |54.4| 98M 88M
23-03 12:08:14| 25 4 68 3 0 0|12.7 6.15 3.53| 0 121M| 0 979 |71.2| 99M 87M
23-03 12:08:15| 36 6 55 3 0 0|12.7 6.15 3.53| 0 123M| 0 993 |62.0| 90M 93M
23-03 12:08:16| 31 6 60 2 0 0|12.7 6.15 3.53| 0 106M| 0 854 |54.8| 98M 104M
23-03 12:08:17| 37 6 54 2 0 1|12.7 6.15 3.53| 0 133M| 0 1067 |59.2| 92M 93M
23-03 12:08:18| 27 4 66 3 0 0|12.7 6.15 3.53| 0 116M| 0 936 |64.8| 97M 96M
23-03 12:08:19| 33 6 59 2 0 0|
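As a hedged sketch (the device name is an assumption; substitute the disk that holds the commit log), bytes written per second can be derived from the sectors-written field of /proc/diskstats:

dev=sdb                                            # assumption: the disk holding the commit log
prev=$(awk -v d="$dev" '$3==d {print $10}' /proc/diskstats)
while sleep 1; do
  cur=$(awk -v d="$dev" '$3==d {print $10}' /proc/diskstats)
  echo "$(( (cur - prev) * 512 )) bytes/s written to $dev"   # field 10 is sectors written, 512 bytes each
  prev=$cur
done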

Benchmarking CPU and File IO for an application running on Linux

I wrote two programs to run on Linux, each using a different algorithm, and I want to find a way (preferably using benchmarking software) to compare the CPU usage and I/O operations of these two programs.
Is there such a thing? And if so, where can I find it? Thanks.
You can try hardinfo. There are also any number of tools for measuring system performance, if measuring the system while running your app serves your purpose. You can also check this thread.
You might try the vmstat command:
vmstat 2 20 > vmstat.txt
This takes 20 samples at 2-second intervals.
bi = KB in, bo = KB out, and wa = waiting for I/O.
I/O can also increase cache demands.
%CPU utilisation = us (user) + sy (system)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 277504 17060 82732 0 0 91 87 1432 236 11 3 84 1
0 0 0 277372 17068 82732 0 0 0 24 1361 399 23 8 59 10
test start
0 1 0 275240 17068 82732 0 0 0 512 1342 305 24 4 69 4
2 1 0 275232 17068 82780 0 0 24 10752 4176 216 7 8 0 85
1 1 0 275240 17076 82732 0 0 12288 2590 5295 243 15 8 0 77
0 1 0 275240 17076 82748 0 0 8 11264 4329 214 6 12 0 82
0 1 0 275240 17076 82780 0 0 16 11264 4278 233 15 10 0 75
0 1 0 275240 17084 82780 0 0 19456 542 6563 255 10 7 0 83
0 1 0 275108 17084 82748 0 0 5128 3072 3501 265 16 37 0 47
3 1 0 275108 17084 82748 0 0 924 5120 8369 3845 12 33 0 55
0 1 0 275116 17092 82748 0 0 1576 85 11483 6645 5 50 0 45
1 1 0 275116 17092 82748 0 0 0 136 2304 689 3 9 0 88
2 1 0 275084 17100 82732 0 0 0 352 2374 800 14 26 0 61
0 0 0 275076 17100 82732 0 0 546 118 2408 1014 35 17 47 1
0 1 0 275076 17104 82732 0 0 0 62 1324 76 3 2 89 7
1 1 0 275076 17108 82732 0 0 0 452 1879 442 8 13 66 12
0 0 0 275116 17108 82732 0 0 800 352 2456 1195 19 17 56 8
0 1 0 275116 17112 82732 0 0 0 54 1325 76 4 1 88 8
test end
1 1 0 275116 17116 82732 0 0 0 510 1717 286 6 10 72 11
1 0 0 275076 17116 82732 0 0 1600 1152 3087 1344 23 29 41 7
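Another option, as a hedged sketch (program names are placeholders): run each program under /usr/bin/time for the CPU side and sample the disks with iostat for the I/O side, then compare the two runs:

iostat -dxk 1 > iostat_a.txt &        # per-device kB/s and utilisation sampled every second
/usr/bin/time -v ./program_a          # reports user/sys CPU time, max RSS, page faults, fs inputs/outputs
kill %1                               # stop the background iostat

iostat -dxk 1 > iostat_b.txt &
/usr/bin/time -v ./program_b
kill %1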

Understanding the Linux oom-killer's logs

My app was killed by the oom-killer. It is Ubuntu 11.10 running from a live USB with no swap, and the PC has 1 GB of RAM. The only app running (other than all the built-in Ubuntu stuff) is my program flasherav. Note that /tmp is memory mapped and at the time of the crash had about 200 MB of files in it (so was taking up ~200 MB of RAM).
I'm trying to understand how to analyze the oom-killer log so that I can see exactly where all the memory is being used, i.e. what are the different chunks that add up to ~1 GB and caused the oom-killer to kick in? Once I understand that, I can work on reducing the offender's usage so the app will run on a machine with 1 GB of RAM. My specific questions are:
To try to analyze the situation, I summed up the "total_vm" column and I only get 609342 KB (which, when added to the 200 MB in /tmp, is still only 809 MB). Maybe I'm wrong about what the "total_vm" column is: does it include allocated-but-unused memory plus shared memory? If so, shouldn't it far overstate the memory actually used (and therefore I shouldn't be out of memory), right? Are there other chunks of memory in use that aren't accounted for in the list below?
[11686.040460] flasherav invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[11686.040467] flasherav cpuset=/ mems_allowed=0
[11686.040472] Pid: 2859, comm: flasherav Not tainted 3.0.0-12-generic #20-Ubuntu
[11686.040476] Call Trace:
[11686.040488] [<c10e1c15>] dump_header.isra.7+0x85/0xc0
[11686.040493] [<c10e1e6c>] oom_kill_process+0x5c/0x80
[11686.040498] [<c10e225f>] out_of_memory+0xbf/0x1d0
[11686.040503] [<c10e6123>] __alloc_pages_nodemask+0x6c3/0x6e0
[11686.040509] [<c10e78d3>] ? __do_page_cache_readahead+0xe3/0x170
[11686.040514] [<c10e0fc8>] filemap_fault+0x218/0x390
[11686.040519] [<c1001c24>] ? __switch_to+0x94/0x1a0
[11686.040525] [<c10fb5ee>] __do_fault+0x3e/0x4b0
[11686.040530] [<c1069971>] ? enqueue_hrtimer+0x21/0x80
[11686.040535] [<c10fec2c>] handle_pte_fault+0xec/0x220
[11686.040540] [<c10fee68>] handle_mm_fault+0x108/0x210
[11686.040546] [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040551] [<c152fb5b>] do_page_fault+0x15b/0x4a0
[11686.040555] [<c1069a90>] ? update_rmtp+0x80/0x80
[11686.040560] [<c106a7b6>] ? hrtimer_start_range_ns+0x26/0x30
[11686.040565] [<c106aeaf>] ? sys_nanosleep+0x4f/0x60
[11686.040569] [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040574] [<c152cfcf>] error_code+0x67/0x6c
[11686.040580] [<c1520000>] ? reserve_backup_gdb.isra.11+0x26d/0x2c0
[11686.040583] Mem-Info:
[11686.040585] DMA per-cpu:
[11686.040588] CPU 0: hi: 0, btch: 1 usd: 0
[11686.040592] CPU 1: hi: 0, btch: 1 usd: 0
[11686.040594] Normal per-cpu:
[11686.040597] CPU 0: hi: 186, btch: 31 usd: 5
[11686.040600] CPU 1: hi: 186, btch: 31 usd: 30
[11686.040603] HighMem per-cpu:
[11686.040605] CPU 0: hi: 42, btch: 7 usd: 7
[11686.040608] CPU 1: hi: 42, btch: 7 usd: 22
[11686.040613] active_anon:113150 inactive_anon:113378 isolated_anon:0
[11686.040615] active_file:86 inactive_file:1964 isolated_file:0
[11686.040616] unevictable:0 dirty:0 writeback:0 unstable:0
[11686.040618] free:13274 slab_reclaimable:2239 slab_unreclaimable:2594
[11686.040619] mapped:1387 shmem:4380 pagetables:1375 bounce:0
[11686.040627] DMA free:4776kB min:784kB low:980kB high:1176kB active_anon:5116kB inactive_anon:5472kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:80kB slab_unreclaimable:168kB kernel_stack:96kB pagetables:64kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[11686.040634] lowmem_reserve[]: 0 865 1000 1000
[11686.040644] Normal free:48212kB min:44012kB low:55012kB high:66016kB active_anon:383196kB inactive_anon:383704kB active_file:344kB inactive_file:7884kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:5548kB shmem:17520kB slab_reclaimable:8876kB slab_unreclaimable:10208kB kernel_stack:1960kB pagetables:3976kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:930 all_unreclaimable? yes
[11686.040652] lowmem_reserve[]: 0 0 1078 1078
[11686.040662] HighMem free:108kB min:132kB low:1844kB high:3560kB active_anon:64288kB inactive_anon:64336kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:138072kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1460kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:61 all_unreclaimable? yes
[11686.040669] lowmem_reserve[]: 0 0 0 0
[11686.040675] DMA: 20*4kB 24*8kB 34*16kB 26*32kB 19*64kB 13*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4784kB
[11686.040690] Normal: 819*4kB 607*8kB 357*16kB 176*32kB 99*64kB 49*128kB 23*256kB 4*512kB 0*1024kB 0*2048kB 2*4096kB = 48212kB
[11686.040704] HighMem: 16*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80kB
[11686.040718] 14680 total pagecache pages
[11686.040721] 8202 pages in swap cache
[11686.040724] Swap cache stats: add 2191074, delete 2182872, find 1247325/1327415
[11686.040727] Free swap = 0kB
[11686.040729] Total swap = 524284kB
[11686.043240] 262100 pages RAM
[11686.043244] 34790 pages HighMem
[11686.043246] 5610 pages reserved
[11686.043248] 2335 pages shared
[11686.043250] 240875 pages non-shared
[11686.043253] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[11686.043266] [ 1084] 0 1084 662 1 0 0 0 upstart-udev-br
[11686.043271] [ 1094] 0 1094 743 79 0 -17 -1000 udevd
[11686.043276] [ 1104] 101 1104 7232 42 0 0 0 rsyslogd
[11686.043281] [ 1149] 103 1149 1066 188 1 0 0 dbus-daemon
[11686.043286] [ 1165] 0 1165 1716 66 0 0 0 modem-manager
[11686.043291] [ 1220] 106 1220 861 42 0 0 0 avahi-daemon
[11686.043296] [ 1221] 106 1221 829 0 1 0 0 avahi-daemon
[11686.043301] [ 1255] 0 1255 6880 117 0 0 0 NetworkManager
[11686.043306] [ 1308] 0 1308 5988 144 0 0 0 polkitd
[11686.043311] [ 1334] 0 1334 723 85 0 -17 -1000 udevd
[11686.043316] [ 1335] 0 1335 730 108 0 -17 -1000 udevd
[11686.043320] [ 1375] 0 1375 663 37 0 0 0 upstart-socket-
[11686.043325] [ 1464] 0 1464 1333 120 1 0 0 login
[11686.043330] [ 1467] 0 1467 1333 135 1 0 0 login
[11686.043335] [ 1486] 0 1486 1333 135 1 0 0 login
[11686.043339] [ 1487] 0 1487 1333 136 1 0 0 login
[11686.043344] [ 1493] 0 1493 1333 134 1 0 0 login
[11686.043349] [ 1528] 0 1528 496 45 0 0 0 acpid
[11686.043354] [ 1529] 0 1529 607 46 1 0 0 cron
[11686.043359] [ 1549] 0 1549 10660 100 0 0 0 lightdm
[11686.043363] [ 1550] 0 1550 570 28 0 0 0 atd
[11686.043368] [ 1584] 0 1584 855 35 0 0 0 irqbalance
[11686.043373] [ 1703] 0 1703 17939 9653 0 0 0 Xorg
[11686.043378] [ 1874] 0 1874 7013 174 0 0 0 console-kit-dae
[11686.043382] [ 1958] 0 1958 1124 52 1 0 0 bluetoothd
[11686.043388] [ 2048] 999 2048 2435 641 1 0 0 bash
[11686.043392] [ 2049] 999 2049 2435 595 0 0 0 bash
[11686.043397] [ 2050] 999 2050 2435 587 1 0 0 bash
[11686.043402] [ 2051] 999 2051 2435 634 1 0 0 bash
[11686.043406] [ 2054] 999 2054 2435 569 0 0 0 bash
[11686.043411] [ 2155] 0 2155 1333 128 0 0 0 login
[11686.043416] [ 2222] 0 2222 684 67 1 0 0 dhclient
[11686.043420] [ 2240] 999 2240 2435 415 0 0 0 bash
[11686.043425] [ 2244] 0 2244 3631 58 0 0 0 accounts-daemon
[11686.043430] [ 2258] 999 2258 11683 277 0 0 0 gnome-session
[11686.043435] [ 2407] 999 2407 964 24 0 0 0 ssh-agent
[11686.043440] [ 2410] 999 2410 937 53 0 0 0 dbus-launch
[11686.043444] [ 2411] 999 2411 1319 300 1 0 0 dbus-daemon
[11686.043449] [ 2413] 999 2413 2287 88 0 0 0 gvfsd
[11686.043454] [ 2418] 999 2418 7867 123 1 0 0 gvfs-fuse-daemo
[11686.043459] [ 2427] 999 2427 32720 804 0 0 0 gnome-settings-
[11686.043463] [ 2437] 999 2437 10750 124 0 0 0 gnome-keyring-d
[11686.043468] [ 2442] 999 2442 2321 244 1 0 0 gconfd-2
[11686.043473] [ 2447] 0 2447 6490 156 0 0 0 upowerd
[11686.043478] [ 2467] 999 2467 7590 87 0 0 0 dconf-service
[11686.043482] [ 2529] 999 2529 11807 211 0 0 0 gsd-printer
[11686.043487] [ 2531] 999 2531 12162 587 0 0 0 metacity
[11686.043492] [ 2535] 999 2535 19175 960 0 0 0 unity-2d-panel
[11686.043496] [ 2536] 999 2536 19408 1012 0 0 0 unity-2d-launch
[11686.043502] [ 2539] 999 2539 16154 1120 1 0 0 nautilus
[11686.043506] [ 2540] 999 2540 17888 534 0 0 0 nm-applet
[11686.043511] [ 2541] 999 2541 7005 253 0 0 0 polkit-gnome-au
[11686.043516] [ 2544] 999 2544 8930 430 0 0 0 bamfdaemon
[11686.043521] [ 2545] 999 2545 11217 442 1 0 0 bluetooth-apple
[11686.043525] [ 2547] 999 2547 510 16 0 0 0 sh
[11686.043530] [ 2548] 999 2548 11205 301 1 0 0 gnome-fallback-
[11686.043535] [ 2565] 999 2565 6614 179 1 0 0 gvfs-gdu-volume
[11686.043539] [ 2567] 0 2567 5812 164 1 0 0 udisks-daemon
[11686.043544] [ 2571] 0 2571 1580 69 0 0 0 udisks-daemon
[11686.043549] [ 2579] 999 2579 16354 1035 0 0 0 unity-panel-ser
[11686.043554] [ 2602] 0 2602 1188 47 0 0 0 sudo
[11686.043559] [ 2603] 0 2603 374634 181503 0 0 0 flasherav
[11686.043564] [ 2607] 999 2607 12673 189 0 0 0 indicator-appli
[11686.043569] [ 2609] 999 2609 19313 311 1 0 0 indicator-datet
[11686.043573] [ 2611] 999 2611 15738 225 0 0 0 indicator-messa
[11686.043578] [ 2615] 999 2615 17433 237 1 0 0 indicator-sessi
[11686.043583] [ 2627] 999 2627 2393 132 0 0 0 gvfsd-trash
[11686.043588] [ 2640] 999 2640 1933 85 0 0 0 geoclue-master
[11686.043592] [ 2650] 0 2650 2498 1136 1 0 0 mount.ntfs
[11686.043598] [ 2657] 999 2657 6624 128 1 0 0 telepathy-indic
[11686.043602] [ 2659] 999 2659 2246 112 0 0 0 mission-control
[11686.043607] [ 2662] 999 2662 5431 346 1 0 0 gdu-notificatio
[11686.043612] [ 2664] 0 2664 3716 2392 0 0 0 mount.ntfs
[11686.043617] [ 2679] 999 2679 12453 197 1 0 0 zeitgeist-datah
[11686.043621] [ 2685] 999 2685 5196 1581 1 0 0 zeitgeist-daemo
[11686.043626] [ 2934] 999 2934 16305 710 0 0 0 gnome-terminal
[11686.043631] [ 2938] 999 2938 553 0 0 0 0 gnome-pty-helpe
[11686.043636] [ 2939] 999 2939 1814 406 0 0 0 bash
[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
Memory management in Linux is a bit tricky to understand, and I can't say I fully understand it yet, but I'll try to share a little of my experience and knowledge.
Short answer to your question: yes, there is other stuff included beyond what's in the list.
What's shown in your list is applications running in userspace. The kernel uses memory for itself and its modules; on top of that, it also keeps a lower limit of free memory that you can't go under. Once that level is reached it tries to free up resources, and when it can't do that anymore, you end up with an OOM problem.
From the last line of your list you can read that the kernel reports a total-vm usage of 1498536 kB (~1.5 GB), where total-vm includes both your physical RAM and swap space. You stated you don't have any swap, but the kernel seems to think otherwise, since your swap space is reported as full (Total swap = 524284kB, Free swap = 0kB) and it reports a total vmem size of ~1.5 GB.
Another thing that can complicate matters further is memory fragmentation. You can hit the OOM killer when the kernel tries to allocate, say, 4096 kB of contiguous memory but none is available.
That alone probably won't help you solve the actual problem. I don't know whether it's normal for your program to require that amount of memory, but I would recommend trying a static code analyzer like cppcheck to check for memory leaks or file descriptor leaks. You could also run it through Valgrind to get more information about memory usage.
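For the Valgrind suggestion, a minimal invocation (the flags are standard Valgrind options; the binary name is the one from the log):

# report leaked memory and open file descriptors at exit
valgrind --leak-check=full --track-fds=yes ./flasherav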
The sum of total_vm is 847170 and the sum of rss is 214726; both values are counted in 4 kB pages, which means that when the oom-killer was running, you had used 214726*4kB = 858904kB of physical memory plus swap space.
Since your physical memory is 1 GB and ~200 MB of it was used for the memory mapping, invoking the oom-killer when 858904kB was in use is reasonable.
rss for process 2603 is 181503, which means 181503*4kB = 726012kB of rss, equal to the sum of anon-rss and file-rss:
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB,
anon-rss:721784kB, file-rss:4228kB
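A hedged sketch of the same sums done with a script instead of by hand (it assumes the oom report was saved to oom.log, e.g. from dmesg or the syslog):

# keep only the process-table rows, strip everything up to the last ']',
# then the remaining fields are: uid tgid total_vm rss cpu oom_adj oom_score_adj name
grep -E '\[ *[0-9]+\] +[0-9]+ +[0-9]+' oom.log | sed 's/.*]//' \
  | awk '{ vm += $3; rss += $4 }
         END { printf "total_vm = %d pages (~%d MB), rss = %d pages (~%d MB)\n", vm, vm*4/1024, rss, rss*4/1024 }'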
This webpage has an explanation and a solution.
The solution is:
To fix this problem, the behavior of the kernel has to be changed so that it no longer overcommits memory for application requests. Finally, I have included the mentioned values in the /etc/sysctl.conf file, so they get applied automatically on start-up:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
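The same settings can also be applied immediately with sysctl, without waiting for a reboot:

sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
# or reload everything from /etc/sysctl.conf
sysctl -p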
You can parse the different columns; here is an online example:
root@device:~# cat /var/log/syslog | grep kernel | rev | cut -d"]" -f1 | rev | awk '{ print $3, $4, $5, $8 }' | grep '^[0-9].*[a-Z][a-Z]' | perl -MData::Dumper -p -e 'BEGIN { $db = {}; } ($total_vm, $rss, $pgtables_bytes, $name) = split; $db->{$name}->{total_vm} += $total_vm; $db->{$name}->{rss} += $rss; $db->{$name}->{pgtables_bytes} += $pgtables_bytes; $_=undef; END { map { printf("%.1fG %s\n", ($db->{$_}->{rss} * 4096)/(1024*1024*1024), $_) } sort { $db->{$a}->{rss} <=> $db->{$b}->{rss} } keys %{$db}; }' | tail -n 10 | tac
8.1G mysql
5.2G php5.6
0.7G nothing-server
0.2G apache2
0.1G systemd-journal
0.1G python3.7
0.1G nginx
0.1G stats
0.0G php-login
0.0G python3
