CPU stuck on CentOS 7.2 on Azure

I'm running a CentOS 7.2 VM on Azure and get a "CPU stuck" kernel-bug warning. top shows that CPU#0 is 100% in use.
[admin@bench2 ~]$
Message from syslogd@bench2 at Feb 9 10:06:43 ...
kernel:BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u128:1:13777]
This is the top output:
Tasks: 258 total, 7 running, 251 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us,100.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 28813448 total, 26938144 free, 653860 used, 1221444 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 27557900 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
73 root 20 0 0 0 0 S 0.7 0.0 1:03.03 rcu_sched
1 root 20 0 43668 6204 3796 S 0.0 0.0 0:04.70 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd
3 root 20 0 0 0 0 R 0.0 0.0 0:00.10 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
CentOS + kernel version:
CentOS Linux release 7.1.1503 (Core)
Linux bench2 3.10.0-229.7.2.el7.x86_64 #1 SMP Tue Jun 23 22:06:11 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
I've noticed that this error also appears on CentOS 7.2 versions.
[84176.089080] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u128:1:13777]
[84176.089080] Modules linked in: vfat fat isofs xfs libcrc32c iptable_filter ip_tables udf crc_itu_t hyperv_fb hyperv_keyboard hv_utils i2c_piix4 i2c_core serio_raw pcspkr crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common hv_netvsc hv_storvsc hid_hyperv sr_mod cdrom ata_generic pata_acpi ata_piix libata floppy hv_vmbus
[84176.089080] CPU: 0 PID: 13777 Comm: kworker/u128:1 Tainted: G W -------------- 3.10.0-229.7.2.el7.x86_64 #1
[84176.089080] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
If this version causes problems on Azure, I have no problem switching to another. In that case, I want to know which CentOS version would be best to run in an Azure environment.

I solved the problem by setting Host caching on the VHD to None. Odd behaviour, but it works.
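For reference, the same change can be made without the portal. A sketch with the Azure CLI (the resource group name is a placeholder; --disk-caching takes the OS disk and data disks by index):
az vm update -g myResourceGroup -n bench2 --disk-caching os=None 0=None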

I had the same issue; it's a disk performance issue (high IOPS/latency etc.), not related to CPU or RAM (at least in my case).
The storage (NetApp) was heavily loaded. I solved it by moving to SSD; even using a large RAID group of HDDs (without any special load) didn't help.
We used a K8s setup, but I saw it on a lot of CentOS machines running simple applications as well.
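To confirm that storage rather than CPU is the real bottleneck, one check worth trying (assuming the sysstat package is installed) is to watch device latency while the lockups occur:
iostat -x 1     # sustained high await (ms) and %util near 100 point to the disk
dmesg | grep -i "soft lockup"     # correlate lockup timestamps with the I/O spikes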

Related

Ratio of sched_rt_runtime_us to sched_rt_period_us is NOT shown in 'top' in Linux

I am checking the impact of Linux's sched_rt_runtime_us.
My understanding of Linux RT scheduling is that sched_rt_period_us defines the scheduling period for RT processes, and sched_rt_runtime_us defines how long RT processes can run within that period.
In my Linux 4.18.20, kernel.sched_rt_period_us = 1000000 and kernel.sched_rt_runtime_us = 950000, so in each one-second period, 95% of the time is available to RT processes and 5% is left for SCHED_OTHER processes.
When changing kernel.sched_rt_runtime_us, the CPU usage of an RT process shown in top should therefore be proportional to sched_rt_runtime_us/sched_rt_period_us.
But my testing does NOT produce the expected results; the %CPU I got from top is as follows:
kernel.sched_rt_runtime_us = 50000
2564 root rt 0 4516 748 684 R 19.9 0.0 0:37.82 testsched_top
kernel.sched_rt_runtime_us = 100000
2564 root rt 0 4516 748 684 R 40.5 0.0 0:23.16 testsched_top
kernel.sched_rt_runtime_us = 150000
2564 root rt 0 4516 748 684 R 60.1 0.0 0:53.29 testsched_top
kernel.sched_rt_runtime_us = 200000
2564 root rt 0 4516 748 684 R 80.1 0.0 1:24.96 testsched_top
The testsched_top program is a SCHED_FIFO process with priority 99, running on an isolated CPU.
cgroups are disabled in grub.cfg with cgroup_disable=cpuset,cpu,cpuacct to rule out the CPU-related controllers.
I don't know why this happens. Is there anything missing or wrong in my testing, or in my understanding of Linux SCHED_FIFO scheduling?
N.B.: I am running this in an Ubuntu VM configured with 8 vCPUs, of which CPUs 4-7 are isolated to run RT processes. The host is Intel x86_64 with 6 cores (12 threads), and there are NO other VMs running on the host. The testsched_top program was copied from https://viviendolared.blogspot.com/2017/03/death-by-real-time-scheduling.html?m=0; it sets priority 99 for SCHED_FIFO and loops indefinitely on one isolated CPU. I checked that isolated CPU's usage and got the results above.
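For anyone reproducing the test, a minimal sketch (run as root; assumes CPUs 4-7 are isolated via isolcpus and that chrt/taskset from util-linux are available; the busy loop stands in for testsched_top):
sysctl -w kernel.sched_rt_period_us=1000000
sysctl -w kernel.sched_rt_runtime_us=50000
chrt -f 99 taskset -c 4 sh -c 'while :; do :; done' &     # SCHED_FIFO prio 99 on isolated CPU 4
top -d 1     # watch the busy loop's %CPU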
I think I got the answer; thanks to Rachid for the question.
In short, the RT runtime budget set by kernel.sched_rt_runtime_us is shared across a group of CPUs within each kernel.sched_rt_period_us window.
For example, in my 8-vCPU VM, CPUs 4-7 are isolated for running specific processes, so the budget is effectively divided evenly among these 4 isolated CPUs: kernel.sched_rt_period_us/4 = 250000 corresponds to the 100% CPU quota for each CPU in the isolated group. Setting kernel.sched_rt_runtime_us to 250000 therefore makes the SCHED_FIFO process take all of one CPU. Accordingly, 25000 means 10% CPU usage on that CPU, 50000 means 20%, etc.
I validated this with only CPU6 and CPU7 isolated: in that case a runtime of 500000 makes one CPU 100% used by the SCHED_FIFO process, and 250000 gives 50% CPU usage.
Since these two kernel parameters are global, if the SCHED_FIFO process is instead placed on CPUs 0-5, 1000000/6 ≈ 166000 should be the 100% quota for each CPU, and 83000 should give 50% CPU usage; I validated this as well.
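In other words, pick the runtime as a fraction of the per-CPU share of the period. A sketch for my 4-isolated-CPU case:
sysctl -w kernel.sched_rt_runtime_us=250000     # 1000000/4: 100% of one isolated CPU
sysctl -w kernel.sched_rt_runtime_us=125000     # half of that: ~50%, as in the snapshot below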
Here is a snapshot of top:
%Cpu4 : 49.7 us, 0.0 sy, 0.0 ni, 50.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16422956 total, 14630144 free, 964880 used, 827932 buff/cache
KiB Swap: 1557568 total, 1557568 free, 0 used. 15245156 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3748 root rt 0 4516 764 700 R 49.5 0.0 30:21.03 testsched_top

Apache Spark - one Spark core divided into several CPU cores

I have a question about Apache Spark. I set up an Apache Spark standalone cluster on my Ubuntu desktop. Then I wrote two lines in the conf/spark-env.sh file: SPARK_WORKER_INSTANCES=4 and SPARK_WORKER_CORES=1. (I found that export is not necessary in spark-env.sh if I start the cluster after editing the file.)
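That is, the two settings in conf/spark-env.sh were:
SPARK_WORKER_INSTANCES=4
SPARK_WORKER_CORES=1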
I wanted to have 4 worker instances on my single desktop, each occupying 1 CPU core. The result was like this:
top - 14:37:54 up 2:35, 3 users, load average: 1.30, 3.60, 4.84
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu0 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 1.7 us, 0.3 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 41.6 us, 0.0 sy, 0.0 ni, 58.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 59.0 us, 0.0 sy, 0.0 ni, 41.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16369608 total, 11026436 used, 5343172 free, 62356 buffers
KiB Swap: 16713724 total, 360 used, 16713364 free. 2228576 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10829 aaaaaa 20 0 42.624g 1.010g 142408 S 101.2 6.5 0:22.78 java
10861 aaaaaa 20 0 42.563g 1.044g 142340 S 101.2 6.7 0:22.75 java
10831 aaaaaa 20 0 42.704g 1.262g 142344 S 100.8 8.1 0:24.86 java
10857 aaaaaa 20 0 42.833g 1.315g 142456 S 100.5 8.4 0:26.48 java
1978 aaaaaa 20 0 1462096 186480 102652 S 1.0 1.1 0:34.82 compiz
10720 aaaaaa 20 0 7159748 1.579g 32008 S 1.0 10.1 0:16.62 java
1246 root 20 0 326624 101148 65244 S 0.7 0.6 0:50.37 Xorg
1720 aaaaaa 20 0 497916 28968 20624 S 0.3 0.2 0:02.83 unity-panel-ser
2238 aaaaaa 20 0 654868 30920 23052 S 0.3 0.2 0:06.31 gnome-terminal
I think the java processes in the first 4 lines are the Spark workers. If that's correct, it's nice that there are four Spark workers, each using about one physical core (e.g., 101.2%).
But I see that 5 physical cores are in use. Among them, CPU0, CPU3, and CPU7 are fully used. I assume one Spark worker is running on each of those physical cores, which is fine.
However, the usage levels of CPU2 and CPU6 are 41.6% and 59.0%, respectively. They add up to 100.6%, so I think one worker's job is being distributed across those 2 physical cores.
With SPARK_WORKER_INSTANCES=4 and SPARK_WORKER_CORES=1, is this a normal situation? Or is this a sign of some error or problem?
This is perfectly normal behavior. Whenever Spark uses the term core, it actually means either a process or a thread, and neither one is bound to a single core or processor.
In any multitasking environment, processes are not executed continuously. Instead, the operating system constantly switches between processes, with each one getting only a small share of the available processor time.
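That said, if you really want each worker JVM hard-bound to one core, you could pin it yourself after launch; a sketch using taskset with the worker PIDs from the top listing above:
taskset -cp 0 10829     # bind this worker to CPU0
taskset -cp 3 10861     # bind this worker to CPU3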

linux network interface irq smp_affinity

I am doing network performance tests and realized that the interface's interrupt processing is not balanced across the 8 CPUs, so I want to make it more balanced.
I set the following files:
echo 11 > /proc/irq/16/smp_affinity
echo 22 > /proc/irq/17/smp_affinity
echo 44 > /proc/irq/18/smp_affinity
echo 88 > /proc/irq/19/smp_affinity
where 16, 17, 18, and 19 are the four IRQ numbers of my network interfaces.
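For reference, smp_affinity is a hexadecimal CPU bitmask, so these values spread the IRQs over CPU pairs: 0x11 = CPUs 0 and 4, 0x22 = CPUs 1 and 5, 0x44 = CPUs 2 and 6, 0x88 = CPUs 3 and 7. A quick way to verify the masks took effect and to watch the distribution:
for irq in 16 17 18 19; do cat /proc/irq/$irq/smp_affinity; done
watch -n 1 'grep ens /proc/interrupts'     # per-CPU counters should grow on the chosen CPUs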
[root@localhost ~]# cat /proc/interrupts | grep ens
16: 30490 0 16838 427032 379 0 10678 0 IO-APIC-fasteoi vmwgfx, ens34, ens42
17: 799858 0 68176 0 78056 0 44715 0 IO-APIC-fasteoi ioc0, ens35, ens43, ens39
18: 2673 0 6149 0 7651 0 5585 0 IO-APIC-fasteoi uhci_hcd:usb2, snd_ens1371, ens40, ens44
19: 145769 1431206 0 0 0 0 305 0 IO-APIC-fasteoi ehci_hcd:usb1, ens41, ens45, ens33
But, sadly, I still find that the IRQs are not balanced across the CPUs:
Tasks: 263 total, 2 running, 261 sleeping, 0 stopped, 0 zombie
%Cpu0 : 7.5 us, 10.0 sy, 0.0 ni, 65.3 id, 0.0 wa, 0.4 hi, 16.7 si, 0.0 st
%Cpu1 : 9.7 us, 15.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.0 hi, 16.2 si, 0.0 st
%Cpu2 : 11.7 us, 21.6 sy, 0.0 ni, 66.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 10.4 us, 16.6 sy, 0.0 ni, 66.0 id, 0.0 wa, 0.0 hi, 6.9 si, 0.0 st
%Cpu4 : 10.9 us, 24.5 sy, 0.0 ni, 64.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 11.8 us, 29.4 sy, 0.0 ni, 58.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 9.0 us, 19.8 sy, 0.0 ni, 71.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 11.5 us, 22.6 sy, 0.0 ni, 65.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
So why don't the IRQs occur on all of the CPUs?
How can I balance IRQ processing across all CPUs?

Memory usage up to 105% on MediaTemple

Three hours ago the server memory usage blew up to 105% from around 60%. I am using a dedicated MediaTemple server with 512 MB RAM. Should I be worried? Why would something like this happen?
Any help would be greatly appreciated.
Tasks: 38 total, 2 running, 36 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 946344k total, 550344k used, 396000k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 15 0 10364 740 620 S 0.0 0.1 0:38.54 init
3212 root 18 0 96620 4068 3200 R 0.0 0.4 0:00.21 sshd
3214 root 15 0 12080 1728 1316 S 0.0 0.2 0:00.05 bash
3267 apache 15 0 412m 43m 4396 S 0.0 4.7 0:03.88 httpd
3290 apache 15 0 412m 43m 4340 S 0.0 4.7 0:02.98 httpd
3348 root 15 0 114m 52m 2112 S 0.0 5.6 0:48.94 spamd
3349 popuser 15 0 114m 50m 972 S 0.0 5.5 0:00.06 spamd
3455 sw-cp-se 18 0 60116 3216 1408 S 0.0 0.3 0:00.12 sw-cp-serverd
3525 admin 18 0 81572 4604 2912 S 0.0 0.5 0:01.74 in.proftpd
3585 apache 18 0 379m 15m 3356 S 0.0 1.7 0:00.01 httpd
3589 root 15 0 12624 1224 956 R 0.0 0.1 0:00.00 top
7397 root 15 0 21660 944 712 S 0.0 0.1 0:00.58 xinetd
9500 named 16 0 301m 5284 1968 S 0.0 0.6 0:00.43 named
9575 root 15 -4 12632 680 356 S 0.0 0.1 0:00.00 udevd
9788 root 25 0 13184 608 472 S 0.0 0.1 0:00.00 couriertcpd
9790 root 25 0 3672 380 312 S 0.0 0.0 0:00.00 courierlogger
9798 root 25 0 13184 608 472 S 0.0 0.1 0:00.00 couriertcpd
First, identify the process that is consuming that many resources with the same top command. If the process is a multi-threaded program, use the following top command:
top -H -p <pid of that process>
It will help you find whichever thread is taking a lot of CPU, for further diagnosis.
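For example, taking the largest resident process from the listing above (spamd, PID 3348):
top -H -p 3348     # per-thread view of PID 3348
ps -eo pid,%mem,rss,comm --sort=-rss | head     # or rank all processes by resident memory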

Can I measure memory taken by mod_perl?

Problem: my mod_perl leaks and I cannot control it.
I run a mod_perl script under Ubuntu (production code).
Usually there are 8-10 script instances running concurrently.
According to the Unix "top" utility, each instance takes 55M of memory.
55M is a lot, but I was told here that most of this memory is shared.
The memory is leaking.
The server has 512M of RAM.
There is a significant decrease of free memory in 24 hours after reboot.
Test: free memory on the system at the moment 10 scripts are running:
-after reboot: 270M
-in 24 hours since reboot: 50M
After 24 hours, the memory taken by each script is roughly the same - 55M (according to the "top" utility).
I don't understand where the memory leaks out.
And I don't know how to find the leaks.
I do share memory: I preload all the modules required by the script in startup.pl.
One more test.
A very simple mod_perl script ("Hello world!") takes 52M (according to "top").
According to "Practical mod_perl", I can use the GTop utility to measure the real memory taken by mod_perl.
I have made a very simple script that measures the memory with GTop.
It shows 54M of real memory taken by a very simple perl script!
54 megabytes for "Hello world"?!
proc-mem-size: 59,707,392
proc-mem-share: 5,259,264
diff: 54,448,128
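If your kernel exposes /proc/<pid>/smaps, you can cross-check GTop's numbers by summing the shared and private pages of each mapping; a sketch (run as root; PID 2307 is one of the apache2 children from the snapshots below):
awk '/^Private/ {p += $2} /^Shared/ {s += $2} END {printf "private: %d kB, shared: %d kB\n", p, s}' /proc/2307/smaps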
There must be something wrong in the way I measure mod_perl memory.
Help, please!
This problem has been driving me mad for several days.
Below are snapshots of the "top" output right after reboot and roughly 24 hours later.
The processes are sorted by memory.
---- RIGHT AFTER REBOOT ----
top - 10:25:24 up 55 min, 2 users, load average: 0.10, 0.07, 0.07
Tasks: 59 total, 3 running, 56 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 97.3%id, 0.7%wa, 0.0%hi, 0.0%si, 2.0%st
Mem: 524456k total, 269300k used, 255156k free, 12024k buffers
Swap: 0k total, 0k used, 0k free, 71276k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2307 www-data 15 0 58500 27m 5144 S 0.0 5.3 0:02.02 apache2
2301 www-data 15 0 58492 27m 4992 S 0.0 5.3 0:02.09 apache2
2302 www-data 15 0 57936 26m 4960 R 0.0 5.2 0:01.74 apache2
2895 www-data 15 0 57812 26m 5048 S 0.0 5.2 0:00.98 apache2
2903 www-data 15 0 56944 26m 4792 S 0.0 5.1 0:01.12 apache2
2886 www-data 15 0 56860 26m 4784 S 0.0 5.1 0:01.20 apache2
2896 www-data 15 0 56520 26m 4804 S 0.0 5.1 0:00.85 apache2
2911 www-data 15 0 56404 25m 4768 S 0.0 5.1 0:00.87 apache2
2901 www-data 15 0 56520 25m 4744 S 0.0 5.1 0:00.84 apache2
2893 www-data 15 0 56608 25m 4740 S 0.0 5.1 0:00.73 apache2
2277 root 15 0 51504 22m 6332 S 0.0 4.5 0:01.02 apache2
2056 mysql 18 0 98628 21m 5164 S 0.0 4.2 0:00.64 mysqld
3162 root 15 0 6356 3660 1276 S 0.0 0.7 0:00.00 vi
2622 root 15 0 8584 2980 2392 R 0.0 0.6 0:00.07 sshd
3083 root 15 0 8448 2968 2392 S 0.0 0.6 0:00.06 sshd
3164 par 15 0 5964 2828 1868 S 0.0 0.5 0:00.05 proftpd
1 root 18 0 3060 1900 576 S 0.0 0.4 0:00.00 init
2690 root 17 0 4272 1844 1416 S 0.0 0.4 0:00.00 bash
3151 root 15 0 4272 1844 1416 S 0.0 0.4 0:00.00 bash
2177 root 15 0 8772 1640 520 S 0.0 0.3 0:00.00 sendmail-mta
2220 proftpd 15 0 5276 1448 628 S 0.0 0.3 0:00.00 proftpd
2701 root 15 0 2420 1120 876 R 0.0 0.2 0:00.09 top
1966 root 18 0 5396 1084 692 S 0.0 0.2 0:00.00 sshd
---- ROUGHLY 24 HOURS AFTER REBOOT ----
top - 17:45:38 up 23:39, 1 user, load average: 0.02, 0.09, 0.11
Tasks: 55 total, 2 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 524456k total, 457660k used, 66796k free, 127780k buffers
Swap: 0k total, 0k used, 0k free, 114620k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16248 www-data 15 0 63712 35m 6668 S 0.0 6.8 0:23.79 apache2
19417 www-data 15 0 60396 31m 6472 S 0.0 6.2 0:10.95 apache2
19419 www-data 15 0 60276 31m 6376 S 0.0 6.1 0:11.71 apache2
19321 www-data 15 0 60480 29m 4888 S 0.0 5.8 0:11.51 apache2
21241 www-data 15 0 58632 29m 6260 S 0.0 5.8 0:05.18 apache2
22063 www-data 15 0 57400 28m 6396 S 0.0 5.6 0:02.05 apache2
21240 www-data 15 0 58520 27m 4856 S 0.0 5.5 0:04.60 apache2
21236 www-data 15 0 58244 27m 4868 S 0.0 5.4 0:05.24 apache2
22499 www-data 15 0 56736 26m 4776 S 0.0 5.1 0:00.70 apache2
2055 mysql 15 0 100m 25m 5656 S 0.0 5.0 0:20.95 mysqld
2277 root 18 0 51500 22m 6332 S 0.0 4.5 0:01.07 apache2
22686 www-data 15 0 53004 21m 4092 S 0.0 4.3 0:00.21 apache2
22689 root 15 0 8584 2980 2392 R 0.0 0.6 0:00.06 sshd
2176 root 15 0 8768 1928 736 S 0.0 0.4 0:00.00 sendmail-mta
1 root 18 0 3064 1900 576 S 0.0 0.4 0:00.02 init
22757 root 15 0 4268 1844 1416 S 0.0 0.4 0:00.00 bash
2220 proftpd 18 0 5276 1448 628 S 0.0 0.3 0:00.00 proftpd
22768 root 15 0 2424 1100 876 R 0.0 0.2 0:00.00 top
1965 root 15 0 5400 1088 692 S 0.0 0.2 0:00.00 sshd
2258 root 18 0 3416 1036 820 S 0.0 0.2 0:00.01 cron
1928 klog 25 0 2248 1008 420 S 0.0 0.2 0:00.04 klogd
1946 messageb 19 0 2648 804 596 S 0.0 0.2 0:01.63 dbus-daemon
1908 syslog 18 0 2016 716 556 S 0.0 0.1 0:00.17 syslogd
It doesn't actually look like the number of apache/mod_perl processes in existence, or the memory they use, has changed much between the two reports you posted. The headers tell the real story: between the two snapshots, "buffers" grew from 12024k to 127780k and "cached" from 71276k to 114620k. That is where your memory is going - Linux is using it for caching file I/O. You can think of the file I/O cache as essentially free memory, since Linux will make that memory available if processes need it.
You can also check that this is what's going on by performing
sync; echo 3 > /proc/sys/vm/drop_caches
as root to cause the memory in use by the caches to be released, and confirming that this causes the amount of free memory reported to revert to initial values.
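A before-and-after check makes this visible:
free -m     # note the "free" and "cached" columns
sync; echo 3 > /proc/sys/vm/drop_caches
free -m     # "free" should jump back toward its post-reboot value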
