How do I get 'stress --hdd' to stop at end of '--timeout' period? - linux

I am using stress --hdd 1 --timeout 10s -v to stress the IO/disk on an Ubuntu system. Sometimes when running this command, I get something like:
stress: info: [16420] dispatching hogs: 0 cpu, 0 io, 0 vm, 1 hdd
...
stress: info: [16420] successful run completed in 13s
and other times I get something like:
...
stress: info: [16390] successful run completed in 31s
With iostat -yx 1 open in a separate terminal, I can indeed see that the %util and aqu-sz values stay high for roughly that longer-than-specified amount of time. I also see the mmcqd/0 process at the top of top.
How can I get the IO stress to stop within a couple of seconds of the --timeout value given in the stress command?
I have tried things like timeout 10 stress --hdd 1 --timeout 10s -v; pkill stress; pkill mmcqd;
without much success. Is there a way to purge the IO queue?
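My working guess is that the extra seconds are the kernel flushing dirty pages left behind by the hdd worker (mmcqd/0 is a kernel thread, so pkill cannot touch it). If that guess is right, something like the following should at least make the flush visible and bound the end of the run, since sync blocks until outstanding writeback completes rather than cancelling it:
$ stress --hdd 1 --timeout 10s -v ; time sync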
Thanks!

Related

What does `--oom-kill-disable` do for a Docker container?

My understanding is that docker run -m 256m --memory-swap 256m will limit a container so that it can use at most 256 MB of memory and no swap. If it allocates more, then a process in the container (not "the container") will be killed. For example:
$ sudo docker run -it --rm -m 256m --memory-swap 256m \
stress --vm 1 --vm-bytes 2000M --vm-hang 0
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [1] (415) <-- worker 7 got signal 9
stress: WARN: [1] (417) now reaping child worker processes
stress: FAIL: [1] (421) kill error: No such process
stress: FAIL: [1] (451) failed run completed in 1s
Apparently one of the workers allocates more memory than is allowed and receives a SIGKILL. Note that the parent process stays alive.
Now, if the effect of -m is to invoke the OOM killer when a process allocates too much memory, what happens when both -m and --oom-kill-disable are specified? Trying it as above gives the following result:
$ sudo docker run -it --rm -m 256m --memory-swap 256m --oom-kill-disable \
stress --vm 1 --vm-bytes 2000M --vm-hang 0
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
(waits here)
In a different shell:
$ docker stats
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f5e4c30d75c9 0.00% 256 MiB / 256 MiB 100.00% 0 B / 508 B 0 B / 0 B 2
$ top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19391 root 20 0 2055904 262352 340 D 0.0 0.1 0:00.05 stress
docker stats shows a memory consumption of 256 MB, and top shows a RES of 256 MB and a VIRT of 2000 MB. But what does that actually mean? What will happen to a process inside the container that tries to use more memory than allowed? In what sense is it constrained by -m?
As I understand the docs, --oom-kill-disable is not constrained by -m but actually requires it:
By default, kernel kills processes in a container if an out-of-memory (OOM) error occurs. To change this behaviour, use the --oom-kill-disable option. Only disable the OOM killer on containers where you have also set the -m/--memory option. If the -m flag is not set, this can result in the host running out of memory and require killing the host's system processes to free memory.
A developer stated back in 2015 that
The host can run out of memory with or without the -m flag set. But it's also irrelevant as --oom-kill-disable does nothing unless -m is passed.
In regard to your update, i.e. what happens when the OOM killer is disabled and yet the memory limit is hit (interesting OOM article), I'd say that new calls to malloc and the like will simply fail, as described here, but it also depends on the swap configuration and the host's available memory. If your -m limit is above the actually available memory, the host will start killing processes, one of which might be the Docker daemon (which they try to avoid by changing its OOM priority).
The kernel docs (cgroup/memory.txt) say
If OOM-killer is disabled, tasks under cgroup will hang/sleep in memory cgroup's OOM-waitqueue when they request accountable memory
For the actual cgroups implementation (which Docker relies on as well), you'd have to check the source code.
The job of the 'OOM killer' in Linux is to sacrifice one or more processes in order to free up memory for the system when all else fails. The OOM killer is only enabled if the host has memory overcommit enabled.
Setting --oom-kill-disable sets the cgroup parameter that disables the OOM killer for this specific container when the condition specified by -m is met. Without the -m flag, the OOM killer is irrelevant.
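As a quick sanity check, the flag and the "stuck at the limit" state are both visible in the container's memory cgroup. This is a sketch assuming a cgroup-v1 host with the default /sys/fs/cgroup layout and Docker's per-container cgroup directory; using the container ID from the docker stats output above:
$ CID=$(docker inspect --format '{{.Id}}' f5e4c30d75c9)
$ cat /sys/fs/cgroup/memory/docker/$CID/memory.oom_control
oom_kill_disable 1
under_oom 1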
The -m flag doesn't mean 'stop the process when it uses more than X MB of RAM'; it ensures that the container doesn't consume all host memory, which could force the kernel to kill host processes. With the -m flag, the container is not allowed to use more than a given amount of user or system memory.
When the container hits OOM, it won't be killed, but it can hang in a defunct state, and the processes inside it can't respond until you manually intervene and restart or kill the container. Hope this helps clear up your questions.
For more details on how the kernel acts on OOM, check the Linux OOM management and Docker memory limitations pages.
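If a container does wedge at its limit like this, the manual intervention mentioned above is just the standard commands, using the container ID shown by docker stats:
$ docker restart f5e4c30d75c9
$ docker kill f5e4c30d75c9    # if a restart hangs too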

Why is the system CPU time (% sy) high?

I am running a script that loads big files. I ran the same script on a single-core OpenSuSE server and on a quad-core PC. As expected, it is much faster on my PC than on the server. But the script slows down the server and makes it impossible to do anything else.
My script is
for 100 iterations
Load saved data (about 10 MB)
time myscript (in PC)
real 0m52.564s
user 0m51.768s
sys 0m0.524s
time myscript (in server)
real 32m32.810s
user 4m37.677s
sys 12m51.524s
I wonder why "sys" is so high when I run the code on the server. I used the top command to check the memory and CPU usage.
It seems there is still free memory, so swapping is not the reason. %sy is very high and is probably the reason for the server's slowness, but I don't know what is causing it. The process using the highest percentage of CPU (99%) is "myscript". %wa is zero in the screenshot, but sometimes it gets very high (50%).
When the script is running, the load average is greater than 1, but I have never seen it as high as 2.
I also checked my disk:
strt:~ # hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 16480 MB in 2.00 seconds = 8247.94 MB/sec
Timing buffered disk reads: 20 MB in 3.44 seconds = 5.81 MB/sec
john#strt:~> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 245G 102G 131G 44% /
udev 4.0G 152K 4.0G 1% /dev
tmpfs 4.0G 76K 4.0G 1% /dev/shm
I have checked these things, but I am still not sure what the real problem on my server is or how to fix it. Can anyone identify a probable reason for the slowness? What could be the solution?
Or is there anything else I should check?
Thanks!
You're seeing high sys activity because loading the data requires system calls that run in the kernel. Resolving the slowness without upgrading the server may be possible: you can modify the scheduling priority. See the man pages for nice and renice. See here and especially:
Niceness values range from -20 (the highest priority, lowest niceness) to 19 (the lowest priority, highest niceness).
$ ps -lp 941
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 941 1 0 70 -10 - 1713 poll_s ? 00:00:00 sshd
$ nice -n 19 ./test.sh
My niceness value is 19
$ renice -n 10 -p 941
941 (process ID) old priority -10, new priority 10
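Since %wa sometimes spikes to 50%, disk contention may matter as much as CPU scheduling here. The same idea applied to IO priority, as a sketch (assumes an IO scheduler that honors IO priorities, such as CFQ/BFQ, and uses pidof to find the script's process):
$ ionice -c 2 -n 7 -p "$(pidof myscript)"    # best-effort class, lowest priority, on the running script
$ ionice -c 3 ./myscript                     # or start it in the idle IO class instead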

Siege aborted due to excessive socket failure

I encountered this problem while trying to run the following siege command on Mac OS X 10.8.3:
siege -d1 -c 20 -t2m -i -f -r10 urls.txt
The output from Siege is the following:
** SIEGE 2.74
** Preparing 20 concurrent users for battle.
The server is now under siege...
done.
siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc
Transactions: 0 hits
Availability: 0.00 %
Elapsed time: 27.04 secs
Data transferred: 0.00 MB
Response time: 0.00 secs
Transaction rate: 0.00 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 0.00
Successful transactions: 0
Failed transactions: 1043
Longest transaction: 0.00
Shortest transaction: 0.00
FILE: /usr/local/var/siege.log
You can disable this annoying message by editing
the .siegerc file in your home directory; change
the directive 'show-logfile' to false.
The problem may be that you run out of ephemeral ports. To remedy that, either expand the number of ports to use, or reduce the duration that ports stay in TIME_WAIT, or both.
Expand the usable ports:
Check your current setting:
$ sudo sysctl net.inet.ip.portrange.hifirst
net.inet.ip.portrange.hifirst: 49152
Set it lower to expand your window:
$ sudo sysctl -w net.inet.ip.portrange.hifirst=32768
net.inet.ip.portrange.hifirst: 49152 -> 32768
(hilast should already be at the max, 65535)
Reduce the maximum segment lifetime
$ sudo sysctl -w net.inet.tcp.msl=1000
net.inet.tcp.msl: 15000 -> 1000
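Note that sysctl -w changes are lost on reboot. To make them persistent, they can go in /etc/sysctl.conf, which OS X reads at boot (create the file if it does not exist):
net.inet.ip.portrange.hifirst=32768
net.inet.tcp.msl=1000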
I had this error too. It turned out my URIs were faulty: most of them returned a 404 or 500 status. When I fixed the URIs, all went well.

Why is Cygwin so slow?

I ran a script on Ubuntu and timed it:
$ time ./merger
./merger 0.02s user 0.03s system 99% cpu 0.050 total
It took less than 1 second.
But when I used Cygwin:
$ time ./merger
real 3m22.407s
user 0m0.367s
sys 0m0.354s
It took more than 3 minutes.
Why does this happen? What can I do to increase execution speed on Cygwin?
As others have already mentioned, Cygwin's implementation of fork and process spawning on Windows in general are slow.
Using this fork() benchmark, I get the following results:
rr-@cygwin:~$ ./test 1000
Forked, executed and destroyed 1000 processes in 5.660011 seconds.
rr-@arch:~$ ./test 1000
Forked, executed and destroyed 1000 processes in 0.142595 seconds.
rr-@debian:~$ ./test 1000
Forked, executed and destroyed 1000 processes in 1.141982 seconds.
Using time (for i in {1..10000};do cat /dev/null;done) to benchmark process spawning performance, I get the following results:
rr-@work:~$ time (for i in {1..10000};do cat /dev/null;done)
(...) 19.11s user 38.13s system 87% cpu 1:05.48 total
rr-@arch:~$ time (for i in {1..10000};do cat /dev/null;done)
(...) 0.06s user 0.56s system 18% cpu 3.407 total
rr-@debian:~$ time (for i in {1..10000};do cat /dev/null;done)
(...) 0.51s user 4.98s system 21% cpu 25.354 total
Hardware specifications:
cygwin: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
arch: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
debian: Intel(R) Core(TM)2 Duo CPU T5270 @ 1.40GHz
So, as you can see, no matter what you use, Cygwin performs worse. It loses hands down even to weaker hardware (cygwin vs. debian in this benchmark, as per this comparison).
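One practical takeaway: on Cygwin it pays to avoid spawning external processes in tight loops. As a rough illustration (same benchmark shape as above; absolute numbers will vary by machine), replacing the external cat with a shell builtin removes the fork/exec cost entirely:
$ time (for i in {1..10000};do cat /dev/null;done)   # forks and execs cat on every iteration
$ time (for i in {1..10000};do :;done)               # builtin no-op, no process spawned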

Getting CPU utilization information

How can I get the CPU utilization of a process over time on Linux? Basically, I want to let my application run overnight and monitor its CPU utilization during that period.
I tried top | grep appName >& log, but it does not seem to write anything to the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting the number of samples manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4320 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
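If your vmstat comes from procps-ng, adding -t prints a timestamp on each sample, which makes it easier to correlate spikes in the log with what the application was doing:
vmstat -t 20 4320 >> cpu_log_file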
Install the sysstat package and run sar:
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
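The -o file is binary; to read it back later, point sar at the file (-u selects CPU utilization):
sar -u -f output.file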
Use the top or watch command:
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
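To log that unattended, top needs batch mode, which is also likely why the top | grep attempt in the question produced nothing. A sketch using procps top flags and pidof to resolve the PID:
top -b -d 60 -p "$(pidof my_app)" >> cpu_log_file   # assumes a single my_app process is running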
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app.
For more info, man 1 time
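If GNU time is installed, the external binary (as opposed to the shell builtin) can report the getrusage statistics mentioned above directly, including peak memory usage:
/usr/bin/time -v ./my_app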
