Why is the system CPU time (% sy) high? - linux

I am running a script that loads big files. I ran the same script on a single-core openSUSE server and on a quad-core PC. As expected, it is much faster on my PC than on the server. However, the script slows the server down so much that it is impossible to do anything else on it.
My script is:
for 100 iterations
Load saved data (about 10 MB)
time myscript (in PC)
real 0m52.564s
user 0m51.768s
sys 0m0.524s
time myscript (in server)
real 32m32.810s
user 4m37.677s
sys 12m51.524s
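To put a number on how much of the CPU time is spent in the kernel, the user and sys figures reported by time can be converted to seconds and compared. This is only a sketch of the arithmetic; to_seconds and sys_share are hypothetical helper names, and the input is assumed to be in the XmY.YYYs format that the Bash time keyword prints.

```shell
# Convert a value such as 12m51.524s (Bash `time` format) into seconds.
# to_seconds is a hypothetical helper name.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the percentage of CPU time (user + sys) that was spent in the kernel.
# sys_share is a hypothetical helper: sys_share USER_TIME SYS_TIME
sys_share() {
    user=$(to_seconds "$1")
    sys=$(to_seconds "$2")
    awk -v u="$user" -v s="$sys" 'BEGIN { printf "%.1f\n", 100 * s / (u + s) }'
}

# Server figures from the run above.
sys_share 4m37.677s 12m51.524s
```

For the server run this comes out at roughly three quarters of the CPU time, while the same calculation for the PC run gives about 1%.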
I wonder why "sys" is so high when I run the code on the server. I used the top command to check memory and CPU usage.
It seems there is still free memory, so swapping is not the reason. %sy is very high, which is probably why the server is slow, but I don't know what is causing it. The process using the most CPU (99%) is "myscript". %wa is zero in the screenshot, but sometimes it gets very high (50%).
While the script is running, the load average is greater than 1, but I have never seen it reach 2.
I also checked my disk:
strt:~ # hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 16480 MB in 2.00 seconds = 8247.94 MB/sec
Timing buffered disk reads: 20 MB in 3.44 seconds = 5.81 MB/sec
john#strt:~> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 245G 102G 131G 44% /
udev 4.0G 152K 4.0G 1% /dev
tmpfs 4.0G 76K 4.0G 1% /dev/shm
I have checked these things but I am still not sure what is the real problem in my server and how to fix it. Can anyone identify a probable reason for the slowness? What could be the solution?
Or is there anything else I should check?
Thanks!

You're seeing high sys time because loading the data involves system calls, and that work is done in the kernel on behalf of your script. It may be possible to resolve your slowness problems without upgrading the server: you can modify the scheduling priority of the script so that it leaves CPU time for other work. See the man pages for nice and renice, and especially:
Niceness values range from -20 (the highest priority, lowest niceness) to 19 (the lowest priority, highest niceness).
$ ps -lp 941
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 941 1 0 70 -10 - 1713 poll_s ? 00:00:00 sshd
$ nice -n 19 ./test.sh
My niceness value is 19
$ renice -n 10 -p 941
941 (process ID) old priority -10, new priority 10
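Applied to the script from the question, the idea could look like the sketch below. run_low_priority is a hypothetical helper name and ./myscript is an assumed path; the only real command used is nice.

```shell
# Run any command at the lowest CPU priority (niceness 19).
# run_low_priority is a hypothetical helper name, not a standard tool.
run_low_priority() {
    nice -n 19 "$@"
}

# With no arguments, nice prints the niceness it was started with,
# so this shows the value a child process would inherit.
run_low_priority nice

# Usage for the script in the question (path assumed):
#   run_low_priority ./myscript
```

This does not reduce the amount of work the script does; it only lets other processes get the CPU first.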

Related

Is there an equivalent for time([some command]) for checking peak memory usage of a bash command?

I want to figure out how much memory a specific command uses but I'm not sure how to check for the peak memory of the command. Is there anything like the time([command]) usage but for memory?
Basically, I'm going to have to run an interactive queue using SLURM, then test a command for a program I need to use for a single sample, see how much memory was used, then submit a bunch of jobs using that info.
Yes, GNU time is a program that runs a command and reports its resource usage, including the maximum resident set size. It is not to be confused with the time Bash keyword, which only shows real/user/sys times. On Arch Linux it is a separate package that you have to install with pacman -S time.
$ command time -v echo 1
1
Command being timed: "echo 1"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1968
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 90
Voluntary context switches: 1
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Note:
$ type time
time is a shell keyword
$ time -V
bash: -V: command not found
real 0m0.002s
user 0m0.000s
sys 0m0.002s
$ command time -V
time (GNU Time) 1.9
$ /bin/time -V
time (GNU Time) 1.9
$ /usr/bin/time -V
time (GNU Time) 1.9
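For sizing batch jobs, only the peak figure is needed. The sketch below pulls it out of the -v report and rounds it up to whole MB. peak_rss_kb and kb_to_mb are hypothetical helper names, my_command is a placeholder, and the label text is assumed to match the GNU time 1.9 output shown above.

```shell
# Read GNU time -v output on stdin and print the peak RSS in kB.
# peak_rss_kb is a hypothetical helper name.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }'
}

# Round a kB figure up to whole MB (1 MB = 1024 kB here).
# kb_to_mb is a hypothetical helper name.
kb_to_mb() {
    echo $(( ($1 + 1023) / 1024 ))
}

# Usage (GNU time writes its report to stderr; my_command is a placeholder):
#   kb=$(command time -v my_command 2>&1 >/dev/null | peak_rss_kb)
#   kb_to_mb "$kb"
```

It is probably wise to add some headroom to the measured value before using it as a memory request for the remaining jobs.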

Solving the SIGKILL killing signal

I'm trying to run a simulation on my local computer at university, but after some iterations it is killed by a SIGKILL. When I check the available swap space, it shows that I still have enough space:
:~$ free -m
total used free shared buffers cached
Mem: 3937 2091 1845 0 64 677
-/+ buffers/cache: 1349 2587
Swap: 3860 738 3122
The same thing happens when I use another server over ssh:
:~$ free -m
total used free shared buffers cached
Mem: 129043 98281 30761 52 4 32901
-/+ buffers/cache: 65375 63668
Swap: 4095 120 3975
When I run it on my own laptop it works properly.
I'd really appreciate any help.
Are you checking the swap space after the fact or during the run? If there is a memory crunch, the operating system's out-of-memory killer (OOM killer) may kill the process (depending on the configuration, the victim could be the worst offender, a random process, or something else). Run the sar command and look at the system state around the time your process was killed.
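When the OOM killer is responsible, the kernel normally records it in its log, so searching that log is another way to confirm the cause. A minimal sketch, assuming the kernel messages contain one of the phrases below; oom_lines is a hypothetical helper name.

```shell
# Keep only kernel log lines that mention the OOM killer.
# oom_lines is a hypothetical helper name; it reads log text on stdin.
oom_lines() {
    grep -Ei 'out of memory|oom-killer|killed process'
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | oom_lines
#   journalctl -k | oom_lines    # on systemd machines
```

If nothing matches, the SIGKILL probably came from somewhere else, for example a resource limit or a batch scheduler.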

CentOS 7 fails to boot crash kernel and generate dump in /var/crash

We have an issue where our CentOS 7 server will not generate a kernel dump file in /var/crash upon a kernel panic. It appears the crash kernel never boots. We’ve followed the RHEL guide (http://red.ht/1sCztdv) on configuring crash dumps, and at first glance everything appears to be configured correctly. We are triggering a panic like this:
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
This causes the system to freeze. We get no messages on the console and the console becomes unresponsive. At this point I would imagine the system would boot a crash kernel and begin writing a dump out to /var/crash. I’ve left it in this frozen state for up to 30 minutes to give it time to complete the entire dump. However after a hard cold reboot /var/crash is empty.
Additionally, I've replicated the configuration in a KVM virtual machine, and kdump works as expected there. So there is either something wrong with my configuration on the physical system, or something odd about that hardware configuration that causes the hang rather than the dump.
Our server is an HP G9 with 24 cores and 128GB of memory. Here are some other details:
[user#host]$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=287798f7-fe7a-4172-a35a-6a78051af4d2 ro rd.lvm.lv=vg_sda/lv_root vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_sda/lv_swap crashkernel=auto vconsole.keymap=us rhgb nosoftlockup intel_idle.max_cstate=0 mce=ignore_ce processor.max_cstate=0 idle=mwait isolcpus=2-11,14-23
[user#host]$ systemctl is-active kdump
active
[user#host]$ cat /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31 -c
[user#host]$ cat /proc/iomem |grep Crash
2b000000-357fffff : Crash kernel
[user#host]$ dmesg|grep Reserving
[ 0.000000] Reserving 168MB of memory at 688MB for crashkernel (System RAM: 131037MB)
[user#host]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_sda-lv_root 133G 4.7G 128G 4% /
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 63G 9.1M 63G 1% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda1 492M 175M 318M 36% /boot
/dev/mapper/vg_sdb-lv_data 2.8T 145G 2.6T 6% /data
After modifying the following parameters we were able to reliably get crash dumps:
Changed crashkernel=auto to crashkernel=1G. I'm not sure why we need 1G, as the formula indicated 128M plus 64M for every 1TB of RAM.
/etc/sysconfig/kdump: Removed everything from KDUMP_COMMANDLINE_APPEND except irqpoll nr_cpus=1, resulting in: KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1"
/etc/kdump.conf: Added compression ("-c") to the makedumpfile command.
I'm not 100% sure why this works, but it does. I would love to know what others think.
Eric
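After changing the reservation and rebooting, it can help to confirm what the running kernel was actually booted with. A small sketch, with crashkernel_value as a hypothetical helper name:

```shell
# Print the value of the crashkernel= parameter from a kernel command line
# read on stdin. crashkernel_value is a hypothetical helper name.
crashkernel_value() {
    tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Usage:
#   crashkernel_value < /proc/cmdline
#   cat /sys/kernel/kexec_crash_loaded   # 1 means a crash kernel is loaded
```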
Eric,
1G seems a bit large. I've never seen anything larger than 200M for a normal server. I'm not sure about the sysconfig settings. Compression is a good idea, but I don't think it would affect the issue, since your target is close to total memory and you're only dumping the kernel ring.

Unknown memory utilization in Ubuntu 14.04 Trusty

I'm running Ubuntu Trusty 14.04 on a new machine with 8GB of RAM. It seems to lock up periodically, and nothing shows up in the syslog file. I've installed Nagios and have been watching the graphs, and memory usage climbs from 7% to 72% in a span of just 10 minutes. Only node processes are running on the server. In top, every process shows normal memory consumption. Even after I stop the node processes, memory utilization stays the same.
free agrees, claiming I'm using more than 5.7G of memory:
free -h
total used free shared buffers cached
Mem: 7.8G 6.5G 1.3G 2.2M 233M 612M
-/+ buffers/cache: 5.7G 2.1G
Swap: 2.0G 0B 2.0G
Summing the resident set size of every process, however, gives a much smaller total (in KiB):
# ps -e -orss=,args= | sort -b -k1,1n | awk '{total = total + $1}END{print total}'
503612
If the processes only total 500 MiB, where's the rest of the memory going?
I found a solution to this, so I'm posting an update:
echo 2 > /proc/sys/vm/drop_caches
This resolved my issue, so I have added it to cron to run every 5 minutes on each of my Ubuntu servers.
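Writing 2 to drop_caches frees reclaimable slab objects such as dentries and inodes, which is kernel memory that no process's RSS accounts for. To see how much memory sits there before dropping it, the slab figures can be read from /proc/meminfo. This is a sketch that assumes the kernel exposes the Slab and SReclaimable fields; meminfo_kb is a hypothetical helper name.

```shell
# Print one field (in kB) from /proc/meminfo-style text read on stdin.
# meminfo_kb is a hypothetical helper name.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Usage:
#   meminfo_kb Slab < /proc/meminfo
#   meminfo_kb SReclaimable < /proc/meminfo
```

If SReclaimable accounts for most of the gap between what free reports and what the processes use, the slab caches are the likely explanation.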

Getting CPU utilization information

How can I get the CPU utilization of a process on Linux, together with time information? Basically, I want to let my application run overnight and monitor its CPU utilization while it runs.
I tried top | grep appName >& log, but nothing seems to end up in the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature. I would suggest either setting the number of samples manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4320 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
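The count passed to vmstat is simply the monitoring duration divided by the interval. A small sketch of that arithmetic, with samples_for as a hypothetical helper name:

```shell
# Number of samples needed to cover a duration at a given interval.
# samples_for is a hypothetical helper: samples_for INTERVAL_SECONDS HOURS
samples_for() {
    echo $(( $2 * 3600 / $1 ))
}

samples_for 20 24   # prints 4320
```

The result can be used directly, for example vmstat 20 "$(samples_for 20 24)" >> cpu_log_file.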
Install the sysstat package and run sar:
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
Use the top or watch command:
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
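By default top draws an interactive screen, which does not pipe or redirect well. On Linux, batch mode (-b) prints plain text instead, and each matching line can be given a timestamp before it is logged. A minimal sketch, assuming procps top and GNU grep; stamp_lines is a hypothetical helper name, and appName and the 20-second delay are placeholders.

```shell
# Prefix every line read on stdin with the current date and time.
# stamp_lines is a hypothetical helper name.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage (appName and the 20-second delay are placeholders):
#   top -b -d 20 | grep --line-buffered appName | stamp_lines >> cpu.log
```

The --line-buffered option makes grep pass each line on immediately rather than holding it in a buffer, so the log fills as the samples arrive.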
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app
For more info, man 1 time

Resources