Understanding the Linux oom-killer's logs - linux

My app was killed by the oom-killer. It is Ubuntu 11.10 running from a live USB with no swap, and the PC has 1 GB of RAM. The only app running (other than all the built-in Ubuntu stuff) is my program flasherav. Note that /tmp is memory-backed and at the time of the crash it held about 200MB of files (so it was taking up ~200MB of RAM).
I'm trying to understand how to analyze the oom-killer log so that I can see where exactly all the memory is being used, i.e. what are the different chunks that add up to ~1 GB and caused the oom-killer to kick in? Once I understand that, I can work on reducing the offender's usage so the app will run on a machine with 1 GB of RAM. My specific questions follow.
To try to analyze the situation, I summed up the "total_vm" column and I only get 609342KB (which, when added to the 200MB in /tmp, is still only 809MB). Maybe I'm wrong about what the "total_vm" column is: does it include allocated-but-unused memory plus shared memory? If so, shouldn't it far overstate the memory actually used (and therefore I shouldn't be out of memory), right? Are there other chunks of memory in use that aren't accounted for in the list below?
[11686.040460] flasherav invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[11686.040467] flasherav cpuset=/ mems_allowed=0
[11686.040472] Pid: 2859, comm: flasherav Not tainted 3.0.0-12-generic #20-Ubuntu
[11686.040476] Call Trace:
[11686.040488] [<c10e1c15>] dump_header.isra.7+0x85/0xc0
[11686.040493] [<c10e1e6c>] oom_kill_process+0x5c/0x80
[11686.040498] [<c10e225f>] out_of_memory+0xbf/0x1d0
[11686.040503] [<c10e6123>] __alloc_pages_nodemask+0x6c3/0x6e0
[11686.040509] [<c10e78d3>] ? __do_page_cache_readahead+0xe3/0x170
[11686.040514] [<c10e0fc8>] filemap_fault+0x218/0x390
[11686.040519] [<c1001c24>] ? __switch_to+0x94/0x1a0
[11686.040525] [<c10fb5ee>] __do_fault+0x3e/0x4b0
[11686.040530] [<c1069971>] ? enqueue_hrtimer+0x21/0x80
[11686.040535] [<c10fec2c>] handle_pte_fault+0xec/0x220
[11686.040540] [<c10fee68>] handle_mm_fault+0x108/0x210
[11686.040546] [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040551] [<c152fb5b>] do_page_fault+0x15b/0x4a0
[11686.040555] [<c1069a90>] ? update_rmtp+0x80/0x80
[11686.040560] [<c106a7b6>] ? hrtimer_start_range_ns+0x26/0x30
[11686.040565] [<c106aeaf>] ? sys_nanosleep+0x4f/0x60
[11686.040569] [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040574] [<c152cfcf>] error_code+0x67/0x6c
[11686.040580] [<c1520000>] ? reserve_backup_gdb.isra.11+0x26d/0x2c0
[11686.040583] Mem-Info:
[11686.040585] DMA per-cpu:
[11686.040588] CPU 0: hi: 0, btch: 1 usd: 0
[11686.040592] CPU 1: hi: 0, btch: 1 usd: 0
[11686.040594] Normal per-cpu:
[11686.040597] CPU 0: hi: 186, btch: 31 usd: 5
[11686.040600] CPU 1: hi: 186, btch: 31 usd: 30
[11686.040603] HighMem per-cpu:
[11686.040605] CPU 0: hi: 42, btch: 7 usd: 7
[11686.040608] CPU 1: hi: 42, btch: 7 usd: 22
[11686.040613] active_anon:113150 inactive_anon:113378 isolated_anon:0
[11686.040615] active_file:86 inactive_file:1964 isolated_file:0
[11686.040616] unevictable:0 dirty:0 writeback:0 unstable:0
[11686.040618] free:13274 slab_reclaimable:2239 slab_unreclaimable:2594
[11686.040619] mapped:1387 shmem:4380 pagetables:1375 bounce:0
[11686.040627] DMA free:4776kB min:784kB low:980kB high:1176kB active_anon:5116kB inactive_anon:5472kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:80kB slab_unreclaimable:168kB kernel_stack:96kB pagetables:64kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[11686.040634] lowmem_reserve[]: 0 865 1000 1000
[11686.040644] Normal free:48212kB min:44012kB low:55012kB high:66016kB active_anon:383196kB inactive_anon:383704kB active_file:344kB inactive_file:7884kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:5548kB shmem:17520kB slab_reclaimable:8876kB slab_unreclaimable:10208kB kernel_stack:1960kB pagetables:3976kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:930 all_unreclaimable? yes
[11686.040652] lowmem_reserve[]: 0 0 1078 1078
[11686.040662] HighMem free:108kB min:132kB low:1844kB high:3560kB active_anon:64288kB inactive_anon:64336kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:138072kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1460kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:61 all_unreclaimable? yes
[11686.040669] lowmem_reserve[]: 0 0 0 0
[11686.040675] DMA: 20*4kB 24*8kB 34*16kB 26*32kB 19*64kB 13*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4784kB
[11686.040690] Normal: 819*4kB 607*8kB 357*16kB 176*32kB 99*64kB 49*128kB 23*256kB 4*512kB 0*1024kB 0*2048kB 2*4096kB = 48212kB
[11686.040704] HighMem: 16*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80kB
[11686.040718] 14680 total pagecache pages
[11686.040721] 8202 pages in swap cache
[11686.040724] Swap cache stats: add 2191074, delete 2182872, find 1247325/1327415
[11686.040727] Free swap = 0kB
[11686.040729] Total swap = 524284kB
[11686.043240] 262100 pages RAM
[11686.043244] 34790 pages HighMem
[11686.043246] 5610 pages reserved
[11686.043248] 2335 pages shared
[11686.043250] 240875 pages non-shared
[11686.043253] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[11686.043266] [ 1084] 0 1084 662 1 0 0 0 upstart-udev-br
[11686.043271] [ 1094] 0 1094 743 79 0 -17 -1000 udevd
[11686.043276] [ 1104] 101 1104 7232 42 0 0 0 rsyslogd
[11686.043281] [ 1149] 103 1149 1066 188 1 0 0 dbus-daemon
[11686.043286] [ 1165] 0 1165 1716 66 0 0 0 modem-manager
[11686.043291] [ 1220] 106 1220 861 42 0 0 0 avahi-daemon
[11686.043296] [ 1221] 106 1221 829 0 1 0 0 avahi-daemon
[11686.043301] [ 1255] 0 1255 6880 117 0 0 0 NetworkManager
[11686.043306] [ 1308] 0 1308 5988 144 0 0 0 polkitd
[11686.043311] [ 1334] 0 1334 723 85 0 -17 -1000 udevd
[11686.043316] [ 1335] 0 1335 730 108 0 -17 -1000 udevd
[11686.043320] [ 1375] 0 1375 663 37 0 0 0 upstart-socket-
[11686.043325] [ 1464] 0 1464 1333 120 1 0 0 login
[11686.043330] [ 1467] 0 1467 1333 135 1 0 0 login
[11686.043335] [ 1486] 0 1486 1333 135 1 0 0 login
[11686.043339] [ 1487] 0 1487 1333 136 1 0 0 login
[11686.043344] [ 1493] 0 1493 1333 134 1 0 0 login
[11686.043349] [ 1528] 0 1528 496 45 0 0 0 acpid
[11686.043354] [ 1529] 0 1529 607 46 1 0 0 cron
[11686.043359] [ 1549] 0 1549 10660 100 0 0 0 lightdm
[11686.043363] [ 1550] 0 1550 570 28 0 0 0 atd
[11686.043368] [ 1584] 0 1584 855 35 0 0 0 irqbalance
[11686.043373] [ 1703] 0 1703 17939 9653 0 0 0 Xorg
[11686.043378] [ 1874] 0 1874 7013 174 0 0 0 console-kit-dae
[11686.043382] [ 1958] 0 1958 1124 52 1 0 0 bluetoothd
[11686.043388] [ 2048] 999 2048 2435 641 1 0 0 bash
[11686.043392] [ 2049] 999 2049 2435 595 0 0 0 bash
[11686.043397] [ 2050] 999 2050 2435 587 1 0 0 bash
[11686.043402] [ 2051] 999 2051 2435 634 1 0 0 bash
[11686.043406] [ 2054] 999 2054 2435 569 0 0 0 bash
[11686.043411] [ 2155] 0 2155 1333 128 0 0 0 login
[11686.043416] [ 2222] 0 2222 684 67 1 0 0 dhclient
[11686.043420] [ 2240] 999 2240 2435 415 0 0 0 bash
[11686.043425] [ 2244] 0 2244 3631 58 0 0 0 accounts-daemon
[11686.043430] [ 2258] 999 2258 11683 277 0 0 0 gnome-session
[11686.043435] [ 2407] 999 2407 964 24 0 0 0 ssh-agent
[11686.043440] [ 2410] 999 2410 937 53 0 0 0 dbus-launch
[11686.043444] [ 2411] 999 2411 1319 300 1 0 0 dbus-daemon
[11686.043449] [ 2413] 999 2413 2287 88 0 0 0 gvfsd
[11686.043454] [ 2418] 999 2418 7867 123 1 0 0 gvfs-fuse-daemo
[11686.043459] [ 2427] 999 2427 32720 804 0 0 0 gnome-settings-
[11686.043463] [ 2437] 999 2437 10750 124 0 0 0 gnome-keyring-d
[11686.043468] [ 2442] 999 2442 2321 244 1 0 0 gconfd-2
[11686.043473] [ 2447] 0 2447 6490 156 0 0 0 upowerd
[11686.043478] [ 2467] 999 2467 7590 87 0 0 0 dconf-service
[11686.043482] [ 2529] 999 2529 11807 211 0 0 0 gsd-printer
[11686.043487] [ 2531] 999 2531 12162 587 0 0 0 metacity
[11686.043492] [ 2535] 999 2535 19175 960 0 0 0 unity-2d-panel
[11686.043496] [ 2536] 999 2536 19408 1012 0 0 0 unity-2d-launch
[11686.043502] [ 2539] 999 2539 16154 1120 1 0 0 nautilus
[11686.043506] [ 2540] 999 2540 17888 534 0 0 0 nm-applet
[11686.043511] [ 2541] 999 2541 7005 253 0 0 0 polkit-gnome-au
[11686.043516] [ 2544] 999 2544 8930 430 0 0 0 bamfdaemon
[11686.043521] [ 2545] 999 2545 11217 442 1 0 0 bluetooth-apple
[11686.043525] [ 2547] 999 2547 510 16 0 0 0 sh
[11686.043530] [ 2548] 999 2548 11205 301 1 0 0 gnome-fallback-
[11686.043535] [ 2565] 999 2565 6614 179 1 0 0 gvfs-gdu-volume
[11686.043539] [ 2567] 0 2567 5812 164 1 0 0 udisks-daemon
[11686.043544] [ 2571] 0 2571 1580 69 0 0 0 udisks-daemon
[11686.043549] [ 2579] 999 2579 16354 1035 0 0 0 unity-panel-ser
[11686.043554] [ 2602] 0 2602 1188 47 0 0 0 sudo
[11686.043559] [ 2603] 0 2603 374634 181503 0 0 0 flasherav
[11686.043564] [ 2607] 999 2607 12673 189 0 0 0 indicator-appli
[11686.043569] [ 2609] 999 2609 19313 311 1 0 0 indicator-datet
[11686.043573] [ 2611] 999 2611 15738 225 0 0 0 indicator-messa
[11686.043578] [ 2615] 999 2615 17433 237 1 0 0 indicator-sessi
[11686.043583] [ 2627] 999 2627 2393 132 0 0 0 gvfsd-trash
[11686.043588] [ 2640] 999 2640 1933 85 0 0 0 geoclue-master
[11686.043592] [ 2650] 0 2650 2498 1136 1 0 0 mount.ntfs
[11686.043598] [ 2657] 999 2657 6624 128 1 0 0 telepathy-indic
[11686.043602] [ 2659] 999 2659 2246 112 0 0 0 mission-control
[11686.043607] [ 2662] 999 2662 5431 346 1 0 0 gdu-notificatio
[11686.043612] [ 2664] 0 2664 3716 2392 0 0 0 mount.ntfs
[11686.043617] [ 2679] 999 2679 12453 197 1 0 0 zeitgeist-datah
[11686.043621] [ 2685] 999 2685 5196 1581 1 0 0 zeitgeist-daemo
[11686.043626] [ 2934] 999 2934 16305 710 0 0 0 gnome-terminal
[11686.043631] [ 2938] 999 2938 553 0 0 0 0 gnome-pty-helpe
[11686.043636] [ 2939] 999 2939 1814 406 0 0 0 bash
[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB

Memory management in Linux is a bit tricky to understand, and I can't say I fully understand it yet, but I'll try to share a little bit of my experience and knowledge.
Short answer to your question: yes, there is other memory in use besides what's in the list.
What's shown in your list is the applications running in userspace. The kernel uses memory for itself and its modules, and on top of that it keeps a lower limit of free memory that you can't go under. Once you've reached that level it will try to free up resources, and when it can't do that anymore you end up with an OOM condition.
From the last line of your log you can read that the kernel reports a total-vm usage of 1498536kB (~1.5 GB) for the killed process; total-vm covers both physical RAM and swap space. You stated you don't have any swap, but the kernel seems to think otherwise, since your swap space is reported as full (Total swap = 524284kB, Free swap = 0kB) and it reports a total virtual size of ~1.5 GB.
Another thing that can complicate matters is memory fragmentation. You can hit the OOM killer when the kernel tries to allocate, say, 4096kB of contiguous memory and no such block is available.
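Both the free-memory watermark and the fragmentation state can be inspected directly on a live system, for example:

cat /proc/sys/vm/min_free_kbytes    # the global watermark the kernel defends (the per-zone min:/low:/high: values above derive from it)
cat /proc/buddyinfo                 # free blocks per allocation order and zone, i.e. how fragmented free memory is
grep -iE 'slab|vmalloc|kernelstack|pagetables' /proc/meminfo    # kernel-side consumers that never show up in the process table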
Now, that alone probably won't help you solve the actual problem. I don't know whether it's normal for your program to require that amount of memory, but I would recommend trying a static code analyzer like cppcheck to check for memory leaks or file descriptor leaks. You could also run it through Valgrind to get more information about memory usage.
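For example, assuming flasherav can be launched from a shell, something along these lines gives leak and heap-profile information (a generic sketch, not specific to your program):

valgrind --leak-check=full ./flasherav     # reports lost/unfreed blocks on exit
valgrind --tool=massif ./flasherav         # samples heap usage over the run
ms_print massif.out.<pid>                  # massif writes massif.out.<pid>; this renders it as a text graph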

The sum of total_vm is 847170 and the sum of rss is 214726. Both values are counted in 4kB pages, which means that when the oom-killer was running you had used 214726*4kB = 858904kB of physical memory plus swap space.
Since your physical memory is 1GB and ~200MB of it was used by /tmp, it's reasonable for the oom-killer to be invoked once 858904kB was in use.
rss for process 2603 is 181503, which means 181503*4kB = 726012kB of rss, equal to the sum of anon-rss and file-rss:
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
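If you want to reproduce those sums straight from the report, a small pipeline works; a sketch that assumes the column layout shown above (uid, tgid, total_vm, rss, ... after the bracketed pid), that the pid bracket still contains a space (so total_vm and rss land in awk's $6 and $7), and that the report is still in dmesg (otherwise pipe from /var/log/kern.log):

dmesg | grep -E '^\[ *[0-9.]+\] \[ *[0-9]+\]' | awk '{ vm += $6; rss += $7 } END { printf "total_vm: %d pages (%d kB)  rss: %d pages (%d kB)\n", vm, vm*4, rss, rss*4 }'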

This webpage has an explanation and a solution.
The solution is:
To fix this problem, the behaviour of the kernel has to be changed so that it no longer overcommits memory for application requests. Finally, I included the values mentioned there in the /etc/sysctl.conf file, so they get applied automatically on start-up:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
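Assuming the two lines above go into /etc/sysctl.conf, they can be applied and checked without a reboot:

sysctl -p /etc/sysctl.conf                         # load the file
sysctl vm.overcommit_memory vm.overcommit_ratio    # confirm the new values
grep -i commit /proc/meminfo                       # CommitLimit vs. Committed_AS

With vm.overcommit_memory = 2, CommitLimit is swap plus overcommit_ratio percent of physical RAM, and allocations that would push Committed_AS past it fail immediately with ENOMEM instead of the oom-killer firing later.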

You can parse the different columns; here is an example from another system:
root@device:~# cat /var/log/syslog | grep kernel | rev | cut -d"]" -f1 | rev | awk '{ print $3, $4, $5, $8 }' | grep '^[0-9].*[a-Z][a-Z]' | perl -MData::Dumper -p -e 'BEGIN { $db = {}; } ($total_vm, $rss, $pgtables_bytes, $name) = split; $db->{$name}->{total_vm} += $total_vm; $db->{$name}->{rss} += $rss; $db->{$name}->{pgtables_bytes} += $pgtables_bytes; $_=undef; END { map { printf("%.1fG %s\n", ($db->{$_}->{rss} * 4096)/(1024*1024*1024), $_) } sort { $db->{$a}->{rss} <=> $db->{$b}->{rss} } keys %{$db}; }' | tail -n 10 | tac
8.1G mysql
5.2G php5.6
0.7G nothing-server
0.2G apache2
0.1G systemd-journal
0.1G python3.7
0.1G nginx
0.1G stats
0.0G php-login
0.0G python3
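Note that the $3, $4, $5, $8 picks above assume the newer oom-report column layout (uid, tgid, total_vm, rss, pgtables_bytes, swapents, oom_score_adj, name after the bracketed pid). On older kernels, like the one in the question above (uid, tgid, total_vm, rss, cpu, oom_adj, oom_score_adj, name), the fifth column is the cpu rather than pgtables_bytes, but the per-name rss totals that the script prints still come out right.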

Related

Identify Container Memory (MEM) consumption (what uses all the memory?)

I have a container (registry.access.redhat.com/ubi8/ubi-minimal) which runs a bash script (moving files) in an infinite loop.
This test runs with 900 files every minute, and each file is ~1 KB (just a small XML).
Here is part of the YAML file, including the command executed by the pod:
command: ["/bin/sh", "-c", "shopt -s nullglob && while true ; do for f in $vfsourcefolder/*.xml ; do randomNum=$(shuf -i $FolderStartNumber-$FolderEndNumber -n 1) ; mkdir -p $vfsourcefolder/$vfsubfolderprefix$randomNum ; mv $f $_ ; done ; done"]
livenessProbe:
exec:
command: ["/bin/sh", "-c", "test $checkFiles -gt $(ls -f $vfsourcefolder | wc -l)"]
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 3
resources:
requests:
memory: 256Mi
cpu: 25m
limits:
memory: 4Gi
cpu: 2
After running for ~3 days it consumes 3 GB of memory (according to kubectl top):
tilo@myserver:/$ kubectl top pod fs-probe-spreader1-0
NAME CPU(cores) MEMORY(bytes)
fs-probe-spreader1-0 217m 3207Mi
but I can't find out what is taking all the memory.
The slabinfo shows lots of objects in cifs_inode_cache (and, as shown below, in dentry).
Here are stats from the pod:
ps aux
top -b
df -TPh
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/memory.stat
cat /sys/fs/cgroup/memory/memory.kmem.slabinfo
[root@fs-probe-spreader1-0 /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 6.5 0.0 13148 3136 ? Ds Jan18 279:58 /bin/sh -c shopt -s nullglob && while true ; do for f in $vfsourcefolder/*.xml ; do randomNum=$(shuf -i $FolderStar
root 1266813 0.0 0.0 19352 3764 pts/0 Ss 23:12 0:00 bash
root 1372717 0.0 0.0 1092036 9720 ? Rsl 23:45 0:00 /usr/bin/runc init
root 1372719 0.0 0.0 51860 3676 pts/0 R+ 23:45 0:00 ps aux
[root@fs-probe-spreader1-0 /]# top -b
top - 23:53:56 up 4 days, 2:52, 0 users, load average: 2.46, 2.31, 2.26
Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 0.0 sy, 0.0 ni, 95.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16009.0 total, 507.8 free, 6127.2 used, 9374.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9551.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 13148 3136 2544 D 6.7 0.0 280:56.60 sh
1398222 root 20 0 19352 3716 3148 S 0.0 0.0 0:00.01 bash
1401883 root 20 0 56192 4208 3660 R 0.0 0.0 0:00.00 top
[root@fs-probe-spreader1-0 /]# df -TPh
Filesystem Type Size Used Avail Use% Mounted on
overlay overlay 97G 23G 75G 24% /
tmpfs tmpfs 64M 0 64M 0% /dev
tmpfs tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
//myshare1.file.core.windows.net/mainfs cifs 100G 28G 73G 28% /trex/root
/dev/sda1 ext4 97G 23G 75G 24% /etc/hosts
shm tmpfs 64M 0 64M 0% /dev/shm
tmpfs tmpfs 4.0G 12K 4.0G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs tmpfs 7.9G 0 7.9G 0% /proc/acpi
tmpfs tmpfs 7.9G 0 7.9G 0% /proc/scsi
tmpfs tmpfs 7.9G 0 7.9G 0% /sys/firmware
[root@fs-probe-spreader1-0 /]# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
3374436352
[root@fs-probe-spreader1-0 /]# cat /sys/fs/cgroup/memory/memory.stat
cache 19505152
rss 1482752
rss_huge 0
shmem 0
mapped_file 0
dirty 135168
writeback 0
pgpgin 989469294
pgpgout 989464142
pgfault 2149218225
pgmajfault 0
inactive_anon 0
active_anon 1368064
inactive_file 6352896
active_file 13246464
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 19505152
total_rss 1482752
total_rss_huge 0
total_shmem 0
total_mapped_file 0
total_dirty 135168
total_writeback 0
total_pgpgin 989469294
total_pgpgout 989464142
total_pgfault 2149218225
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 1368064
total_inactive_file 6352896
total_active_file 13246464
total_unevictable 0
[root@fs-probe-spreader1-0 /]# cat /sys/fs/cgroup/memory/memory.kmem.slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-rcl-128 64 64 128 32 1 : tunables 0 0 0 : slabdata 2 2 0
TCP 42 42 2240 14 8 : tunables 0 0 0 : slabdata 3 3 0
kmalloc-rcl-64 320 320 64 64 1 : tunables 0 0 0 : slabdata 5 5 0
kmalloc-rcl-96 126 126 96 42 1 : tunables 0 0 0 : slabdata 3 3 0
radix_tree_node 252 252 584 28 4 : tunables 0 0 0 : slabdata 9 9 0
UDPv6 96 96 1344 24 8 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-96 168 168 96 42 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-2k 64 64 2048 16 8 : tunables 0 0 0 : slabdata 4 4 0
cifs_inode_cache 3454395 3454395 776 21 4 : tunables 0 0 0 : slabdata 164495 164495 0
kmalloc-8 2048 2048 8 512 1 : tunables 0 0 0 : slabdata 4 4 0
buffer_head 5460 5460 104 39 1 : tunables 0 0 0 : slabdata 140 140 0
ext4_inode_cache 290 290 1096 29 8 : tunables 0 0 0 : slabdata 10 10 0
shmem_inode_cache 66 66 720 22 4 : tunables 0 0 0 : slabdata 3 3 0
ovl_inode 736 736 688 23 4 : tunables 0 0 0 : slabdata 32 32 0
pde_opener 408 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0
eventpoll_pwq 224 224 72 56 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-1k 64 64 1024 16 4 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-32 512 512 32 128 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-4k 32 32 4096 8 8 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-512 64 64 512 16 2 : tunables 0 0 0 : slabdata 4 4 0
skbuff_head_cache 64 64 256 16 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-192 84 84 192 21 1 : tunables 0 0 0 : slabdata 4 4 0
inode_cache 104 104 608 26 4 : tunables 0 0 0 : slabdata 4 4 0
pid 128 128 128 32 1 : tunables 0 0 0 : slabdata 4 4 0
anon_vma 2028 2028 104 39 1 : tunables 0 0 0 : slabdata 52 52 0
vm_area_struct 837 912 208 19 1 : tunables 0 0 0 : slabdata 48 48 0
mm_struct 120 120 1088 30 8 : tunables 0 0 0 : slabdata 4 4 0
signal_cache 112 112 1152 28 8 : tunables 0 0 0 : slabdata 4 4 0
sighand_cache 60 60 2112 15 8 : tunables 0 0 0 : slabdata 4 4 0
anon_vma_chain 1957 2368 64 64 1 : tunables 0 0 0 : slabdata 37 37 0
files_cache 92 92 704 23 4 : tunables 0 0 0 : slabdata 4 4 0
task_delay_info 204 204 80 51 1 : tunables 0 0 0 : slabdata 4 4 0
kmalloc-64 3264 3264 64 64 1 : tunables 0 0 0 : slabdata 51 51 0
cred_jar 1323 1323 192 21 1 : tunables 0 0 0 : slabdata 63 63 0
task_struct 33 52 7680 4 8 : tunables 0 0 0 : slabdata 13 13 0
PING 64 64 1024 16 4 : tunables 0 0 0 : slabdata 4 4 0
sock_inode_cache 76 76 832 19 4 : tunables 0 0 0 : slabdata 4 4 0
proc_inode_cache 432 432 680 24 4 : tunables 0 0 0 : slabdata 18 18 0
dentry 3346497 3346497 192 21 1 : tunables 0 0 0 : slabdata 159357 159357 0
filp 576 576 256 16 1 : tunables 0 0 0 : slabdata 36 36 0
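Given the huge cifs_inode_cache and dentry object counts in the slabinfo above, it may be worth checking how much of the cgroup's charge is kernel memory (slab) rather than process RSS, and whether it is reclaimable. A rough sketch, assuming the same cgroup-v1 paths already used above and that kernel-memory accounting is enabled:

cat /sys/fs/cgroup/memory/memory.usage_in_bytes        # total charged to the cgroup (roughly what kubectl top reports)
cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes   # kernel-memory share (dentries, cifs inodes, ...)
grep -E '^total_(rss|cache)' /sys/fs/cgroup/memory/memory.stat
# on the node, with privileges: drop dentry/inode caches and see whether the usage falls
sync; echo 2 > /proc/sys/vm/drop_caches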

Why does Cassandra major compaction fail to clear expired tombstones?

We have deployed a global Apache Cassandra cluster (nodes: 12, RF: 3, version: 3.11.2) in our production environment. We are running into an issue where a major compaction on a column family fails to clear tombstones on one node (out of the 3 replicas), even though the metadata shows that the minimum timestamp has passed the gc_grace_seconds set on the table.
Here is the sstablemetadata output:
SSTable: mc-4302-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1
Maximum timestamp: 1560326019515476
SSTable min local deletion time: 1560233203
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.8808303792058351
TTL min: 0
TTL max: 0
First token: -9201661616334346390 (key=bca773eb-ecbb-49ec-9330-cc16da310b58:::)
Last token: 9117719078924671254 (key=7c23b975-5354-4c82-82e5-1762bac75a8d:::)
minClustringValues: [00000f8f-74a9-4ce3-9d87-0a4dabef30c1]
maxClustringValues: [ffffc966-a02c-4e1f-bdd1-256556624288]
Estimated droppable tombstones: 46.31761624099541
SSTable Level: 0
Repaired at: 0
Replay positions covered: {}
totalColumnsSet: 0
totalRows: 618382
Estimated tombstone drop times:
1560233680: 353
1560234658: 237
1560235604: 176
1560236803: 471
1560237652: 402
1560238342: 195
1560239166: 373
1560239969: 356
1560240586: 262
1560241207: 247
1560242037: 387
1560242847: 357
1560243742: 280
1560244469: 283
1560245095: 353
1560245957: 357
1560246773: 362
1560247956: 449
1560249034: 217
1560249849: 310
1560251080: 296
1560251984: 304
1560252993: 239
1560253907: 407
1560254839: 977
1560255761: 671
1560256486: 317
1560257199: 679
1560258020: 703
1560258795: 507
1560259378: 298
1560260093: 2302
1560260869: 2488
1560261535: 2818
1560262176: 2842
1560262981: 1685
1560263708: 1830
1560264308: 808
1560264941: 1990
1560265753: 1340
1560266708: 2174
1560267629: 2253
1560268400: 1627
1560269174: 2347
1560270019: 2579
1560270888: 3947
1560271690: 1727
1560272446: 2573
1560273249: 1523
1560274086: 3438
1560275149: 2737
1560275966: 3487
1560276814: 4101
1560277660: 2012
1560278617: 1198
1560279680: 769
1560280441: 1337
1560281033: 608
1560281876: 2065
1560282546: 2926
1560283128: 6305
1560283836: 824
1560284574: 71
1560285166: 140
1560285828: 118
1560286404: 83
1560295835: 72
1560296951: 456
1560297814: 670
1560298496: 271
1560299333: 473
1560300159: 284
1560300831: 127
1560301551: 536
1560302309: 425
1560303302: 860
1560304064: 465
1560304782: 319
1560305657: 323
1560306552: 236
1560307454: 368
1560308409: 320
1560309178: 210
1560310091: 177
1560310881: 85
1560311970: 147
1560312706: 76
1560313495: 88
1560314847: 687
1560315817: 1618
1560316544: 1245
1560317423: 5361
1560318491: 2060
1560319595: 5853
1560320587: 5390
1560321473: 3868
1560322644: 5784
1560323703: 6861
1560324838: 7200
1560325744: 5642
Count Row Size Cell Count
1 0 3054
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
10 0 0
12 0 0
14 0 0
17 0 0
20 0 0
24 0 0
29 0 0
35 0 0
42 0 0
50 0 0
60 98 0
72 49 0
86 46 0
103 2374 0
124 39 0
149 36 0
179 43 0
215 18 0
258 26 0
310 24 0
372 18 0
446 16 0
535 19 0
642 27 0
770 17 0
924 12 0
1109 14 0
1331 23 0
1597 20 0
1916 12 0
2299 11 0
2759 11 0
3311 11 0
3973 12 0
4768 5 0
5722 8 0
6866 5 0
8239 5 0
9887 6 0
11864 5 0
14237 10 0
17084 1 0
20501 8 0
24601 2 0
29521 2 0
35425 3 0
42510 2 0
51012 2 0
61214 1 0
73457 2 0
88148 3 0
105778 0 0
126934 3 0
152321 2 0
182785 1 0
219342 0 0
263210 0 0
315852 0 0
379022 0 0
454826 0 0
545791 0 0
654949 0 0
785939 0 0
943127 0 0
1131752 0 0
1358102 0 0
1629722 0 0
1955666 0 0
2346799 0 0
2816159 0 0
3379391 1 0
4055269 0 0
4866323 0 0
5839588 0 0
7007506 0 0
8409007 0 0
10090808 1 0
12108970 0 0
14530764 0 0
17436917 0 0
20924300 0 0
25109160 0 0
30130992 0 0
36157190 0 0
43388628 0 0
52066354 0 0
62479625 0 0
74975550 0 0
89970660 0 0
107964792 0 0
129557750 0 0
155469300 0 0
186563160 0 0
223875792 0 0
268650950 0 0
322381140 0 0
386857368 0 0
464228842 0 0
557074610 0 0
668489532 0 0
802187438 0 0
962624926 0 0
1155149911 0 0
1386179893 0 0
1663415872 0 0
1996099046 0 0
2395318855 0 0
2874382626 0
3449259151 0
4139110981 0
4966933177 0
5960319812 0
7152383774 0
8582860529 0
10299432635 0
12359319162 0
14831182994 0
17797419593 0
21356903512 0
25628284214 0
30753941057 0
36904729268 0
44285675122 0
53142810146 0
63771372175 0
76525646610 0
91830775932 0
110196931118 0
132236317342 0
158683580810 0
190420296972 0
228504356366 0
274205227639 0
329046273167 0
394855527800 0
473826633360 0
568591960032 0
682310352038 0
818772422446 0
982526906935 0
1179032288322 0
1414838745986 0
Estimated cardinality: 3054
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1560233203
EncodingStats minTimestamp: 1
KeyType: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
ClusteringTypes: [org.apache.cassandra.db.marshal.UUIDType]
StaticColumns: {}
RegularColumns: {}
So far, here is what we have tried:
1) major compaction with lower gc_grace_seconds
2) nodetool garbagecollect
3) nodetool scrub
None of the above methods is helping. Again, this is only happening on one node (out of the 3 replicas).
The tombstone markers generated during your major compaction are just that: markers. The data has been removed, but a delete marker is left in place so that the other replicas have gc_grace_seconds to process it too. The tombstone markers are fully dropped the next time the SSTable is compacted. Unfortunately, because you've run a major compaction (rarely ever recommended), it may be a long time before there are SSTables suitable for compacting with it to clean up the tombstones. Remember that the tombstone drop will also only happen after local_delete_time + gc_grace_seconds, as defined on the table.
If you're interested in learning more about how tombstones and compaction work together in the context of delete operations I suggest reading the following articles:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html
https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
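For completeness, a couple of commands that are often useful when chasing this; my_keyspace/my_table are placeholders, and the ALTER assumes the table already uses SizeTieredCompactionStrategy:

sstablemetadata mc-4302-big-Data.db | grep -i tombstone    # re-check the estimated droppable tombstones after each compaction
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'unchecked_tombstone_compaction': 'true', 'tombstone_threshold': '0.2'};"

unchecked_tombstone_compaction lets single-SSTable tombstone compactions run without the usual overlap check, which can help expired tombstones get purged after a major compaction has left one giant SSTable.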

How to check where the memory goes when the oom-killer is invoked in Linux

In our system the oom-killer is invoked when we run with high throughput. We believe that most of the memory is being consumed by a kernel driver, but we can't find the specific consumer; any suggestions would be really appreciated.
Below is the detailed dmesg log:
[14839.077171] passkey-agent invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[14839.077187] CPU: 0 PID: 3443 Comm: passkey-agent Tainted: G O 4.1.35-rt41 #1
[14839.077190] Hardware name: LS1043A RDB Board (DT)
[14839.077193] Call trace:
[14839.079644] [<ffff8000000898f4>] dump_backtrace+0x0/0x154
[14839.079650] [<ffff800000089a5c>] show_stack+0x14/0x1c
[14839.079656] [<ffff8000008f3174>] dump_stack+0x90/0xb0
[14839.079663] [<ffff80000013eea4>] dump_header.isra.10+0x88/0x1b8
[14839.079668] [<ffff80000013f5f8>] oom_kill_process+0x210/0x3d4
[14839.079672] [<ffff80000013faec>] __out_of_memory.isra.15+0x330/0x374
[14839.079676] [<ffff80000013fd5c>] out_of_memory+0x5c/0x80
[14839.079682] [<ffff8000001442d8>] __alloc_pages_nodemask+0x55c/0x7c4
[14839.079687] [<ffff80000013e0cc>] filemap_fault+0x188/0x400
[14839.079693] [<ffff80000015f424>] __do_fault+0x3c/0x98
[14839.079698] [<ffff8000001641c8>] handle_mm_fault+0xc28/0x14f8
[14839.079704] [<ffff800000094c04>] do_page_fault+0x224/0x2b4
[14839.079709] [<ffff8000000822a0>] do_mem_abort+0x40/0xa0
[14839.079713] Exception stack(0xffff80001e47be20 to 0xffff80001e47bf50)
[14839.079719] be20: 00000000 00000000 000001f4 00000000 ffffffff ffffffff a6f90990 0000ffff
[14839.079725] be40: ffffffff ffffffff 3b969772 00000000 dbbcc280 0000ffff 00085db0 ffff8000
[14839.079730] be60: 00000000 00000000 000001f4 00000000 ffffffff ffffffff a6f90990 0000ffff
[14839.079736] be80: 1e47bea0 ffff8000 000895f8 ffff8000 00000008 00000000 00085b90 ffff8000
[14839.079742] bea0: dbbcc280 0000ffff 00085c9c ffff8000 00000000 00000000 0ee34088 00000000
[14839.079747] bec0: 00000000 00000000 00000001 00000000 dbbcc2b0 0000ffff 00000000 00000000
[14839.079752] bee0: 00000000 00000000 00000000 00000000 000f4240 00000000 00000000 00000000
[14839.079758] bf00: 00000049 00000000 0000001c 00000000 0000011b 00000000 00000013 00000000
[14839.079763] bf20: 00000028 00000000 00000000 00000000 a71c1c20 0000ffff 00000000 003b9aca
[14839.079767] bf40: a720a990 0000ffff a6f90918 0000ffff
[14839.082683] Mem-Info:
[14839.082700] active_anon:16910 inactive_anon:6202 isolated_anon:0
active_file:15 inactive_file:0 isolated_file:26
unevictable:62887 dirty:0 writeback:0 unstable:0
slab_reclaimable:944 slab_unreclaimable:8027
mapped:5421 shmem:2349 pagetables:527 bounce:0
free:5120 free_pcp:627 free_cma:0
[14839.082719] DMA free:20480kB min:22528kB low:28160kB high:33792kB active_anon:67640kB inactive_anon:24808kB active_file:60kB inactive_file:0kB unevictable:251548kB isolated(anon):0kB isolated(file):104kB present:1046528kB managed:890652kB mlocked:251548kB dirty:0kB writeback:0kB mapped:21684kB shmem:9396kB slab_reclaimable:3776kB slab_unreclaimable:32108kB kernel_stack:6064kB pagetables:2108kB unstable:0kB bounce:0kB free_pcp:2508kB local_pcp:424kB free_cma:0kB writeback_tmp:0kB pages_scanned:208 all_unreclaimable? no
[14839.082723] lowmem_reserve[]: 0 0 0
[14839.082729] DMA: 755*4kB (EM) 486*8kB (UEM) 617*16kB (UEM) 2*32kB (M) 1*64kB (R) 2*128kB (R) 1*256kB (R) 0*512kB 1*1024kB (R) 1*2048kB (R) 0*4096kB = 20492kB
[14839.082752] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[14839.082755] 7756 total pagecache pages
[14839.082760] 0 pages in swap cache
[14839.082763] Swap cache stats: add 0, delete 0, find 0/0
[14839.082765] Free swap = 0kB
[14839.082768] Total swap = 0kB
[14839.082856] 261632 pages RAM
[14839.082858] 0 pages HighMem/MovableOnly
[14839.082861] 34873 pages reserved
[14839.082863] 4096 pages cma reserved
[14839.082867] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[14839.082890] [ 1353] 0 1353 876 161 7 3 0 -1000 udevd
[14839.082899] [ 1863] 999 1863 695 48 5 3 0 0 dbus-daemon
[14839.082906] [ 1944] 0 1944 833 23 5 3 0 0 syslogd
[14839.082913] [ 1947] 0 1947 833 18 5 3 0 0 klogd
[14839.082919] [ 1990] 0 1990 2307 686 8 2 0 0 php-fpm
[14839.082925] [ 1991] 65534 1991 2307 857 8 2 0 0 php-fpm
[14839.082932] [ 1992] 65534 1992 2307 857 8 2 0 0 php-fpm
[14839.082938] [ 1999] 0 1999 720 31 5 3 0 0 bash
[14839.083042] [ 2001] 0 2001 1083 393 6 3 0 0 start_appli
[14839.083049] [ 2010] 0 2010 849 26 5 3 0 0 getty
[14839.083055] [ 2115] 0 2115 1262 96 6 4 0 -1000 sshd
[14839.083062] [ 3051] 0 3051 2709 210 6 2 0 0 optf_write
[14839.083068] [ 3052] 0 3052 1719 686 7 2 0 0 launcher
[14839.083074] [ 3055] 0 3055 5056 4196 13 2 0 0 globMW0
[14839.083081] [ 3066] 0 3066 10430 6805 27 2 0 0 confd
[14839.083088] [ 3085] 0 3085 9735 7449 23 2 0 0 hal0
[14839.083095] [ 3086] 0 3086 7781 6642 19 2 0 0 SystemMgr
[14839.083102] [ 3087] 0 3087 7455 6372 20 2 0 0 HWMgr
[14839.083108] [ 3088] 0 3088 8319 7118 20 2 0 0 SWMgr
[14839.083115] [ 3089] 0 3089 7824 6696 19 2 0 0 FaultMgr
[14839.083121] [ 3090] 0 3090 7488 6359 20 2 0 0 TSMgr
[14839.083127] [ 3091] 0 3091 7009 6144 20 2 0 0 SecurityMgr
[14839.083133] [ 3092] 0 3092 7736 6337 20 2 0 0 DHCPRelayMgr
[14839.083225] [ 3093] 0 3093 8747 6555 21 2 0 0 ItfMgr
[14839.083232] [ 3094] 0 3094 8192 6686 21 2 0 0 WlanItfMgr
[14839.083239] [ 3095] 0 3095 7602 6518 20 2 0 0 L2Mgr
[14839.083246] [ 3096] 0 3096 7399 6017 20 2 0 0 QoSMgr
[14839.083252] [ 3097] 0 3097 8647 6486 21 2 0 0 L3Mgr
[14839.083258] [ 3098] 0 3098 7482 6356 17 2 0 0 MulticastMgr
[14839.083264] [ 3099] 0 3099 7783 6609 21 2 0 0 DHCPMgr
[14839.083271] [ 3100] 0 3100 6864 6409 16 2 0 0 CallHomeMgr
[14839.083279] [ 3422] 0 3422 472 23 4 3 0 0 hciattach
[14839.083286] [ 3426] 0 3426 1035 50 6 3 0 0 bluetoothd
[14839.083292] [ 3443] 0 3443 2039 112 8 3 0 0 passkey-agent
[14839.083298] [ 3462] 0 3462 3852 2368 11 3 0 0 dhcpd
[14839.083304] [ 3517] 0 3517 860 161 7 3 0 -1000 udevd
[14839.083393] [ 3518] 0 3518 860 161 7 3 0 -1000 udevd
[14839.083400] [ 3650] 0 3650 1629 132 6 3 0 0 wpa_supplicant
[14839.083406] [ 3720] 0 3720 3134 1711 10 3 0 0 dhclient
[14839.083412] [ 3747] 0 3747 891 149 6 3 0 0 zebra
[14839.083419] [ 3751] 0 3751 834 132 7 3 0 0 ripd
[14839.083425] [ 3949] 0 3949 1037 67 6 4 0 0 ntpd
[14839.083431] [ 8000] 0 8000 721 33 5 3 0 0 sh
[14839.083436] Out of memory: Kill process 3085 (hal0) score 32 or sacrifice child
[14839.083447] Killed process 3085 (hal0) total-vm:38940kB, anon-rss:15236kB, file-rss:14560kB
We have 1G of memory in total, and I can see that slab occupies about 35M (3776kB + 32108kB), the kernel stack is 6064kB, and active_anon + inactive_anon is about 92M (67640kB + 24808kB); user-space memory consumption is normal as usual.
So where did the rest of the memory go? How can I check it?
For example, how can I check how much memory is consumed by a particular driver, such as the driver of a PCIe network card?
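Note that the zone report above already shows about 245 MB as unevictable/mlocked (unevictable:251548kB, mlocked:251548kB), which is worth chasing in its own right. Beyond that, a few places that are commonly checked for kernel-side consumers; a sketch, and availability depends on the kernel configuration:

grep -E 'Slab|VmallocUsed|KernelStack|PageTables|Mlocked|Unevictable|Cma' /proc/meminfo
slabtop -o | head -20                  # the largest slab caches by size
sort -k2 -nr /proc/vmallocinfo | head  # biggest vmalloc regions, each tagged with the allocating function (often a driver)
# attributing alloc_pages/DMA buffers to a specific driver needs CONFIG_PAGE_OWNER;
# if it is enabled and the kernel is booted with page_owner=on, /sys/kernel/debug/page_owner lists the owners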

Benchmarking CPU and File IO for an application running on Linux

I wrote two programs to run on Linux, each using a different algorithm, and I want to find a way (preferably using benchmarking software) to compare the CPU usage and I/O operations of these two programs.
Is there such a thing? And if yes, where can I find it? Thanks.
You can try hardinfo. There are also any number of tools for measuring system performance, if measuring it while running your app solves your purpose. You can also check this thread.
You might try the vmstat command:
vmstat 2 20 > vmstat.txt
That takes 20 samples at 2-second intervals.
bi = KB in, bo = KB out, and wa = time spent waiting for I/O.
I/O can also increase cache demands.
%CPU utilisation = us (user) + sy (system).
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 277504 17060 82732 0 0 91 87 1432 236 11 3 84 1
0 0 0 277372 17068 82732 0 0 0 24 1361 399 23 8 59 10
test start
0 1 0 275240 17068 82732 0 0 0 512 1342 305 24 4 69 4
2 1 0 275232 17068 82780 0 0 24 10752 4176 216 7 8 0 85
1 1 0 275240 17076 82732 0 0 12288 2590 5295 243 15 8 0 77
0 1 0 275240 17076 82748 0 0 8 11264 4329 214 6 12 0 82
0 1 0 275240 17076 82780 0 0 16 11264 4278 233 15 10 0 75
0 1 0 275240 17084 82780 0 0 19456 542 6563 255 10 7 0 83
0 1 0 275108 17084 82748 0 0 5128 3072 3501 265 16 37 0 47
3 1 0 275108 17084 82748 0 0 924 5120 8369 3845 12 33 0 55
0 1 0 275116 17092 82748 0 0 1576 85 11483 6645 5 50 0 45
1 1 0 275116 17092 82748 0 0 0 136 2304 689 3 9 0 88
2 1 0 275084 17100 82732 0 0 0 352 2374 800 14 26 0 61
0 0 0 275076 17100 82732 0 0 546 118 2408 1014 35 17 47 1
0 1 0 275076 17104 82732 0 0 0 62 1324 76 3 2 89 7
1 1 0 275076 17108 82732 0 0 0 452 1879 442 8 13 66 12
0 0 0 275116 17108 82732 0 0 800 352 2456 1195 19 17 56 8
0 1 0 275116 17112 82732 0 0 0 54 1325 76 4 1 88 8
test end
1 1 0 275116 17116 82732 0 0 0 510 1717 286 6 10 72 11
1 0 0 275076 17116 82732 0 0 1600 1152 3087 1344 23 29 41 7
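If you want per-process rather than system-wide numbers, GNU time and pidstat (from the sysstat package) can be combined; a minimal sketch where ./prog_a stands in for one of your two programs:

/usr/bin/time -v ./prog_a                  # wall/user/sys time, maximum RSS, page faults, filesystem I/O counts
pidstat -u -r -d -p $(pidof prog_a) 2      # per-process CPU, memory and disk I/O, sampled every 2 seconds while it runs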

cstime error in /proc/pid/stat file

The stime or cstime in a /proc/pid/stat file is sometimes so huge that it doesn't make any sense, but only some processes show this wrong value, and only occasionally. For example:
# ps -eo pid,ppid,stime,etime,time,%cpu,%mem,command |grep nsc
4815 1 Jan08 1-01:20:02 213503-23:34:33 20226149 0.1 /usr/sbin/nscd
#
# cat /proc/4815/stat
4815 (nscd) S 1 4815 4815 0 -1 4202560 2904 0 0 0 21 1844674407359 0 0 20 0 9 0 4021 241668096 326 18446744073709551615 139782748139520 139782748261700 140737353849984 140737353844496 139782734487251 0 0 3674112 16390 18446744073709551615 0 0 17 1 0 0 0 0 0
You can see that the stime of process 4815, nscd, is 1844674407359, equal to 213503-23:34:33, yet it has only been running for 1-01:20:02.
Another process with a wrong cstime is the following:
a bash forks a sh, which forks a sleep.
8155 (bash) S 3124 8155 8155 0 -1 4202752 1277 6738 0 0 3 0 4 1844674407368 20 0 1 0 1738175 13258752 451 18446744073709551615 4194304 4757932 140736528897536 140736528896544 47722675403157 0 65536 4100 65538 18446744071562341410 0 0 17 5 0 0 0 0 0
8184 (sh) S 8155 8155 8155 0 -1 4202496 475 0 0 0 0 0 0 0 20 0 1 0 1738185 11698176 357 18446744073709551615 4194304 4757932 140733266239472 140733266238480 47964680542613 0 65536 4100 65538 18446744071562341410 0 0 17 6 0 0 0 0 0
8185 (sleep) S 8184 8155 8155 0 -1 4202496 261 0 0 0 0 0 0 0 20 0 1 0 1738186 8577024 177 18446744073709551615 4194304 4212204 140734101195248 140734101194776 48002231427168 0 0 4096 0 0 0 0 17 7 0 0 0 0 0
So you can see that the cstime in the bash process is 1844674407368, which is much larger than the total CPU time of its children.
My server has one Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, which has 4 cores and 8 threads. The operating system is SUSE Linux Enterprise Server 11 SP1 x86_64, as follows:
# lsb_release -a
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 11 (x86_64)
Release: 11
Codename: n/a
#
# uname -a
Linux node2 2.6.32.12-0.7-xen #1 SMP 2010-05-20 11:14:20 +0200 x86_64 x86_64 x86_64 GNU/Linux
So is it the kernel's problem? Can anybody help fix it?
I suspect that you are simply seeing a kernel bug. Update to the latest update kernel offered for SLES (which is something like 2.6.32.42 or so) and see if it still occurs. By the way, in the nscd case it is stime, not cstime, that is unusually high; in fact, looking closely you will notice the value looks like a string truncation of 18446744073709551615 (2^64-1), give or take a few clock ticks. For reference, here is that nscd stat line decoded field by field:
pid_nr: 4815
tcomm: (nscd)
state: S
ppid: 1
pgid: 4815
sid: 4815
tty_nr: 0
tty_pgrp: -1
task_flags: 4202560 / 0x402040
min_flt: 2904
cmin_flt: 0
max_flt: 0
cmax_flt: 0
utime: 21 clocks (= 21 clocks) (= 0.210000 s)
stime: 1844674407359 clocks (= 1844674407359 clocks) (= 18446744073.590000 s)
cutime: 0 clocks (= 0 clocks) (= 0.000000 s)
cstime: 0 clocks (= 0 clocks) (= 0.000000 s)
priority: 20
nice: 0
num_threads: 9
always-zero: 0
start_time: 4021
vsize: 241668096
get_mm_rss: 326
rsslim: 18446744073709551615 / 0xffffffffffffffff
mm_start_code: 139782748139520 / 0x7f21b50c7000
mm_end_code: 139782748261700 / 0x7f21b50e4d44
mm_start_stack: 140737353849984 / 0x7ffff7fb9c80
esp: 140737353844496 / 0x7ffff7fb8710
eip: 139782734487251 / 0x7f21b43c1ed3
obsolete-pending-signals: 0 / 0x0
obsolete-blocked-signals: 0 / 0x0
obsolete-sigign: 3674112 / 0x381000
obsolete-sigcatch: 16390 / 0x4006
wchan: 18446744073709551615 / 0xffffffffffffffff
always-zero: 0
always-zero: 0
task_exit_signal: 17
task_cpu: 1
task_rt_priority: 0
task_policy: 0
delayacct_blkio_ticks: 0
gtime: 0 clocks (= 0 clocks) (= 0.000000 s)
cgtime: 0 clocks (= 0 clocks) (= 0.000000 s)
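To sanity-check the conversion yourself (USER_HZ is normally 100 on Linux, which getconf confirms):

getconf CLK_TCK    # ticks per second used for the utime/stime fields, normally 100
awk 'BEGIN { t = 1844674407359 / 100; printf "%.2f s = %.1f days\n", t, t / 86400 }'

That gives about 18446744073.59 s, roughly 213503 days, which matches the 213503-23:34:33 that ps printed for the nscd process above.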
