Running perf-top in a script - linux

I have an intermittent performance issue that I want to capture via perf top. Since the issue comes and goes, I want to write a script that runs perf top while the issue is occurring, so that I can save the data and view it at a later time.
I can't seem to figure out how to get perf top to put its output in a file; it seems to demand to be run interactively. Here is what I've tried so far:
# timeout 10 perf top --stdio -E 20 > 'perf-top'
This does not kill perf; it just leaves it running in the background forever until I open another console session, find the PID, and kill it manually.
# timeout --signal=9 10 perf top --stdio -E 20 > 'perf-top'
This kills perf in the expected 10 seconds, but the output is not written to the file that I specified.
Is there some special way that this command needs to be run? It works if I run it from an interactive SSH session, but I'd really like to be able to run it from a script. I'm trying to put it into an Ansible task with a few other metrics-gathering programs.

SIGKILL (9) isn't catchable, so it's impossible for perf to flush any buffered output.
Use any other signal so that perf can clean up after receiving it and write its output.
If the default SIGTERM doesn't work, then maybe try SIGHUP, SIGINT, or others.
timeout --signal=INT 10 perf top --stdio -E 20 > perf-top
timeout 10 perf top --stdio -E 20 > perf-top (with the default SIGTERM) works for me (as non-root) on my Arch Linux desktop. perf version 5.0.g1c163f4
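For unattended capture from a script (or an Ansible task), a minimal wrapper along these lines should work; this is a sketch assuming a catchable signal is enough for perf to flush its report, and the output path is arbitrary:
#!/bin/sh
# Capture ~10 seconds of perf top output non-interactively.
# INT is catchable, so perf can write its report before exiting.
timeout --signal=INT 10 perf top --stdio -E 20 > /tmp/perf-top.log 2>&1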

Two xwin-xdg-menu processes with high CPU consumption

I have a Windows 7 computer with an Intel i7 (2 cores with hyperthreading) and a Linux virtual machine in the cloud. I don't like VNC (it's laggy), so I use X windowing.
I start my Cygwin XWin with the following command:
C:\cygwin64\bin\run.exe --quote /usr/bin/bash.exe -l -c "cd; /usr/bin/xinit /etc/X11/xinit/startxwinrc -- /usr/bin/XWin :0 -multiwindow -listen tcp"
It's otherwise working just as intended, but for some reason it spawns two xwin-xdg-menu processes, one of which consumes 25% of my CPU. When I kill that one, the CPU usage returns to normal and everything keeps working fine, including the other xwin-xdg-menu process.
I tried also this:
C:\cygwin64\bin\XWin.exe :0 -multiwindow -listen tcp
but it makes the application run slowly and with a bad resolution.
Is there a way to start X with -listen tcp, with a resolution adapted to the multiple screens I have, and without having to kill the extra process manually every time?
It seems I'm not the only one with this problem, but so far I haven't found any solution:
https://cygwin.com/ml/cygwin/2017-05/msg00345.html
https://superuser.com/questions/1210325/cygwin-at-spi-bus-launcher-and-xwin-xdg-menu-high-cpu (I don't have problems with at-spi-bus-launcher though)
Solution:
Create a ~/.startxwinrc file, and add one line:
exec sleep infinity
Make ~/.startxwinrc executable by running chmod +x ~/.startxwinrc.
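Equivalently, as shell commands (a sketch of the same two steps):
printf 'exec sleep infinity\n' > ~/.startxwinrc
chmod +x ~/.startxwinrc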
Reason I suspect this worked:
startxwin searches for a ~/.startxwinrc file to execute when launching. If startxwin does not find a ~/.startxwinrc file, startxwin will follow the default routine outlined in /etc/X11/xinit/startxwinrc.
The default routine launches /usr/bin/xwin-xdg-menu, somehow causing me to have two xwin-xdg-menu processes, one of them with very high cpu. Creating ~/.startxwinrc bypasses the default routine, disabling /usr/bin/xwin-xdg-menu from launching altogether.
exec sleep infinity keeps the x server alive after launching.

strace on Linux not logging all calls to open()

I am using strace to capture calls to open(), close() and read() on Linux. The target process is the Jetty web server. As far as I can tell, strace is not logging all calls to open(). Maybe the others too; I have not tried to correlate the file descriptors with open() calls.
For example, starting strace:
strace -f -e trace=open,close,read -o/tmp/strace.out -p62881
I then use wget to fetch 100 static files; all were retrieved successfully. In one run, only 56 open events were logged; on another run of 100 different files, I got 66 open events.
I believe that using "-f" results in strace attaching to all the LWPIDs for the threads ("Process 62881 attached with 25 threads - interrupt to quit"); when I try to explicitly attach to them all using multiple "-p" options, I get a single "attach" success message but multiple "Operation not permitted" messages, one for each child PID.
I restarted Jetty to clear its cache before my tests.
Kernel version is 2.6.32-504.3.3.el6.x86_64 (Red Hat). Strace package version is strace-4.5.19-1.19.el6.x86_64.
What am I missing?
Thanks
On some systems the C library implements open() via the openat() system call, so strace never sees open() at all; you have to trace openat() as well.
Try:
strace -f -e trace=open,openat,close,read -o/tmp/strace.out -p62881
Try -ff (in addition to -f). From the man page:
-ff: If the -o filename option is in effect, each process's trace is written to filename.pid where pid is the numeric process id of each process. This is incompatible with -c, since no per-process counts are kept.
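Putting both suggestions together (a sketch, reusing the PID from the question):
strace -f -ff -e trace=open,openat,close,read -o /tmp/strace.out -p 62881
ls /tmp/strace.out.*                      # one trace file per thread/LWP after you stop strace with Ctrl-C
grep -h open /tmp/strace.out.* | wc -l    # rough total of open/openat events across all threads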

Simulating a process stuck in a blocking system call

I'm trying to test a behaviour which is hard to reproduce in a controlled environment.
Use case:
Linux system; usually Redhat EL 5 or 6 (we're just starting with RHEL 7 and systemd, so it's currently out of scope).
There are situations where I need to restart a service. The script we use for stopping the service usually works quite well: it sends a SIGTERM to the process, which is designed to handle it; if the process hasn't exited within a timeout (usually a couple of minutes), the script sends a SIGKILL, then waits a couple of minutes more.
The problem is: in some (rare) situations, the process doesn't exit after a SIGKILL; this usually happens when it's badly stuck in a system call, possibly because of a kernel-level issue (a corrupt filesystem, a non-working NFS filesystem, or something equally bad requiring manual intervention).
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old one was still running; we're fixing this with a stronger locking system (so that at least the new process doesn't start if the old one is running), but I find it difficult to test the whole thing because I haven't found a way to simulate a hard-stuck process.
So, the question is:
How can I manually simulate a process that doesn't exit when sending a SIGKILL to it, even as a privileged user?
If your process is stuck doing I/O, you can simulate the situation this way:
lvcreate -n lvtest -L 2G vgtest
mkfs.ext3 -m0 /dev/vgtest/lvtest
mount /dev/vgtest/lvtest /mnt
dmsetup suspend /dev/vgtest/lvtest && dd if=/dev/zero of=/mnt/file.img bs=1M count=2048 &
This way, the dd process will get stuck waiting for I/O and will ignore every signal. (Note that in the latest kernels, signals are no longer ignored when a process is waiting for I/O on an NFS filesystem.)
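To verify the stuck state and clean up afterwards (a sketch; the volume names match the commands above):
ps -o pid,stat,wchan:20,cmd -C dd    # STAT shows "D": uninterruptible sleep
kill -9 $(pgrep -x dd)               # no visible effect while the LV is suspended
dmsetup resume /dev/vgtest/lvtest    # I/O completes and dd dies from the pending SIGKILL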
Well... how about just not sending the SIGKILL? Your environment will then behave as if it had been sent but the process didn't quit.
Once a process is in the "D" state (TASK_UNINTERRUPTIBLE), it is executing a kernel code path that cannot be interrupted until the task in question is finished, which means that sending any signal to the process is useless: the signals will not be acted upon.
This can be caused by a device driver getting too many interrupts from the hardware, too many incoming network packets or data from NIC firmware, or by being blocked on a hard disk performing I/O. Normally, when this happens, it is over very quickly and threads remain in this state for only a very short span of time.
Therefore, what you need to do is look at the syslog and sar reports covering the time when the process was stuck in the D state. If you find stack traces in the log, try to search bugzilla.kernel.org for similar issues or seek support from your Linux vendor.
I would code it the opposite way. Have your server process write its pid into e.g. /var/run/yourserver.pid (this is common practice). Have the starting script read that file and test that the process does not exist, e.g. with kill -0, or with something like:
yourserver_pid=$(cat /var/run/yourserver.pid)
if [ -f /proc/$yourserver_pid/exe ]; then
    echo "old server (pid $yourserver_pid) still running" >&2; exit 1
fi
You could improve on that with readlink /proc/$yourserver_pid/exe, comparing the result to /usr/bin/yourserver.
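For instance (a sketch; /usr/bin/yourserver is the placeholder binary path from above):
exe=$(readlink /proc/$yourserver_pid/exe 2>/dev/null)
if [ "$exe" != /usr/bin/yourserver ]; then
    echo "pid $yourserver_pid is not yourserver (stale pid file?)" >&2
fi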
BTW, having a process still alive a few seconds after a SIGKILL is a serious situation (the common case where it can happen is a process stuck in the D state, waiting for some NFS server), and you probably should detect and syslog it (e.g. with logger in your script).
I would also try first sending SIGTERM, waiting a few seconds, sending SIGQUIT, waiting a few seconds more, and at last sending SIGKILL, then only a few seconds later testing that the server process has gone.
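As a sketch of that escalation (the 5-second waits are illustrative; $yourserver_pid as above):
for sig in TERM QUIT KILL; do
    kill -$sig $yourserver_pid 2>/dev/null || break   # stop once the process is gone
    sleep 5
done
if kill -0 $yourserver_pid 2>/dev/null; then
    logger -p daemon.err "yourserver (pid $yourserver_pid) survived SIGKILL"   # serious: likely D state
fi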
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old was still running;
This is a bug at the OS/kernel level, not in your service script. The situation is rare and hard to simulate because the OS is supposed to kill the process when a SIGKILL arrives. So I guess your goal is to make your script work well under a buggy kernel. Is that correct?
You can attach gdb to the process; SIGKILL won't remove such a process from the process list, but it will flag it as a zombie, which might still be acceptable for your purpose.
void@tahr:~$ ping 8.8.8.8 > /tmp/ping.log &
[1] 3770
void@tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 S 0:00 ping 8.8.8.8
void@tahr:~$ sudo gdb -p 3770
...
(gdb)
In another terminal:
void@tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 t 0:00 ping 8.8.8.8
void@tahr:~$ sudo kill -9 3770
...
void@tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 Z 0:00 [ping] <defunct>
Back in the first terminal:
(gdb) quit

Running 'top' command of linux for few minutes and then come out of system command in perl

I am new to Perl. I am working on a web UI where I have to provide CPU and memory monitoring data, so I am using the top command and gnuplot. I am able to do it from my terminal, but when I execute the same commands from a Perl script they don't work. The problem is that when I run top in my terminal, I have to wait a few minutes and then plot the data using gnuplot, but when I do the same work using a system command in Perl, I am unable to introduce that delay.
Here is what I am doing
system "top -p 1758 -b -d5 | tee -a stats.log";
system "./top_stats.sh -f stats.log"
Here 1758 is the PID of the app whose top data I have to monitor, and stats.log is the file where I am saving the logs; I then use stats.log as input to the top_stats.sh script, which plots graphs from the log file using gnuplot.
Now the problem: whenever I execute the first command in a terminal, I have to wait some time, say 2 to 3 minutes, to gather an ample number of data points, and then press Ctrl+C to come out of top before running the plotting script. But in the Perl script, as soon as the first system command is executed, the next command is executed without any delay, so I don't get any data points to plot the graph. Is there any way I can execute my first command, wait for 3 minutes without coming out of the system command, and then execute the next command?
Is there any way I can execute my first command, wait for 3 minutes without coming out of the system command, and then execute the next command?
Why not specify the number of iterations for top? Your command will then run for roughly (N-1)*5 seconds; 37 iterations with an interval of 5 seconds are going to take 36*5 = 180 seconds:
system "top -p 1758 -b -d5 -n37 | tee -a stats.log && ./top_stats.sh -f stats.log";

linux - running process background

I want to run a process on a remote Linux server and keep that process alive after closing the PuTTY terminal.
What is the correct command?
You have two options:
Use GNU screen, which will allow you to run the command and detach it from your terminal, and later re-attach it to a different session. I use it for long-running processes whose output I want to be able to monitor at any time. Screen is a truly powerful tool and I would highly recommend spending some time to learn it.
Run the command as nohup some-command &, which will run the command in the background, detach it from the console, and redirect its output into nohup.out. It will swallow SIGHUPs that are sent to the process. (When you close the terminal or log out, SIGHUP is sent to all processes that were started by the login shell, and the default action the kernel will take is to kill the process off. This is why appending & to put the process in the background is not enough for it to survive a logout.)
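For example (a sketch; long_task.sh stands in for your command):
nohup ./long_task.sh > task.log 2>&1 &   # survives logout; output goes to task.log instead of nohup.out
exit                                     # close the session; the task keeps running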
don't use that nohup junk, i hate seeing that on servers; screen is a wasting pile of bits and rot -- use tmux.
if you want to background a process, double fork like every other daemon since the beginning of time:
# ((exec sleep 30)&)
# grep PPid /proc/`pgrep sleep`/status
PPid: 1
# jobs
# disown
bash: disown: current: no such job
enjoy.
A modern and easy-to-use approach that allows managing multiple processes and has a nice terminal UI is the hapless utility.
Install with pip install hapless (or python3 -m pip install hapless) and just run
$ hap run my-command # e.g. hap run python my_long_running_script.py
$ hap status # check all the launched processes
See docs for more info.
A command launched enclosed in parentheses
(command &)
will survive the death of the originating shell.
