I'm running Valgrind's Helgrind tool on a program in a script.
Here's the relevant part of the script:
(The only line I wrote is the first one)
sudo valgrind --tool=helgrind ./core-linux.bin --reset PO 2>> ../Test_CFE_SB/valgrindLog.txt &
PID=$!
printf "\n" >> ../Test_CFE_SB/valgrindLog.txt
sleep $sleepTime
#did it crash?
ps ax | grep $PID | grep -vc grep
RESULT=$?
if [ $RESULT -eq 0 ]
then
    sudo kill $PID
    echo "Process killed by buildscript."
else
    echo $name >> crash.log
fi
OS: 32-bit Xubuntu 14.04
The program Helgrind is running on, core-linux.bin, does not shut down by itself; like a server, it runs until it gets a kill command.
What happens is that the program shuts down after the kill $PID command, but Helgrind keeps going in the background, taking about 94% of the CPU according to top. I then have to kill it using kill -9, and valgrindLog.txt contains only the starting message from Valgrind, no report or anything. I have let it run through the night with the same result, so it's not that it's just slow.
I ran the exact same script with --tool=memcheck instead, and that runs perfectly well: valgrindLog.txt contains everything it should and all is well there. Same if I use --tool=drd, all good. But Helgrind doesn't want to play ball, and unfortunately I'm not familiar enough with Valgrind to figure this out on my own, so far at least.
To see what your application is doing under Valgrind/Helgrind, you can attach using gdb + vgdb and examine whether your program advances or where it stays blocked.
If you cannot attach, that means Valgrind is running in its own code, and that might be a Valgrind/Helgrind bug.
If you have a small reproducer, file a bug in the Valgrind Bugzilla.
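A minimal sketch of the attach procedure (the flags are from the Valgrind manual; the program arguments are kept from the question):

# terminal 1: run under Helgrind with the embedded gdbserver enabled (--vgdb=yes is the default)
sudo valgrind --tool=helgrind --vgdb=yes ./core-linux.bin --reset PO

# terminal 2: attach gdb through vgdb; if the connection succeeds, bt shows where the program is
gdb ./core-linux.bin
(gdb) target remote | vgdb
(gdb) bt

If vgdb cannot connect at all while Helgrind spins at high CPU, that points at Valgrind being stuck in its own code, as described above.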
Related
Hi all! I'm trying to make something like a parallel tool for the shell, simply because the functionality of parallel is not enough for my task. The reason is that I need to run different versions of a compiler.
Imagine that I need to compile 12 programs with different compilers, but I can run only 4 of them simultaneously (otherwise the PC runs out of memory and crashes :). I also want to be able to observe what's going on with each compile, so I execute every compile in a new window.
Just to make it easier, here I'll replace the compiler that I run with a small script, sleep.sh, that waits and prints its process ID:
#!/bin/bash
sleep 30
echo $$
So the main script, parallel_run.sh, looks like this:
#!/bin/bash
for i in {0..11}; do
    xfce4-terminal -H -e "./sleep.sh" &
    pids[$i]=$!
    pstree -p ${pids[$i]}
    if (( $i % 4 == 0 ))
    then
        for pid in ${pids[*]}; do
            wait $pid
        done
    fi
done
The problem is that with $! I get the PID of xfce4-terminal and not of the program it executes. So if I look at the pstree of the 1st iteration, I can see this output from the main script:
xfce4-terminal(31666)----{xfce4-terminal}(31668)
|--{xfce4-terminal}(31669)
and sleep.sh reports that it had PID 30876 at that time. Thus wait doesn't work at all in this case.
Q: How do I get the right PID of the compiler that runs in the subshell?
Or maybe there is another way to solve a task like this?
It seems there is no way to trace the PID from parent to child if you invoke the process in a new xfce4-terminal, since the terminal process dies right after it executes the given command. So I came to a solution which is not perfect, but acceptable in my situation: I run the compiler processes in the background and redirect their output to .log files. Then I run tail on those log files in the terminal windows, and when the compilers from the current batch are done, I kill all tails belonging to the current $USER and run the next batch.
#!/bin/bash
for i in {1..8}; do
    ./sleep.sh > ./process_$i.log &
    prcid=$!
    xfce4-terminal -e "tail -f ./process_$i.log" &
    pids[$i]=$prcid
    if (( $i % 4 == 0 ))
    then
        for pid in ${pids[*]}; do
            wait $pid
        done
        killall -u $USER tail
    fi
done
Hopefully there will be no other tails running at that time :)
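On bash 4.3 or newer there is also wait -n, which returns as soon as any background job finishes. A sketch of the batching with it (./compile.sh is a hypothetical stand-in for one compiler invocation):

#!/bin/bash
max_jobs=4
for i in {1..12}; do
    ./compile.sh "$i" > "./process_$i.log" 2>&1 &
    # once $max_jobs jobs are running, wait for any one of them to finish
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n
    done
done
wait   # wait for the remaining jobs

This keeps four compiles running at all times instead of stalling until a whole batch of four is done.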
I am currently working on an NFC system using the ACR122U reader without the manufacturer's drivers, which leads to some occasional crashes of those drivers.
The problem is that when a driver crashes, the whole process doesn't crash: my program keeps running but the drivers don't. (No need to say that this makes my code useless.)
I am aware of ways to restart a crashed program, but not crashed drivers. I thought of using a watchdog to hard-reset the Raspberry Pi, but needless to say a full reboot isn't the best choice because of the time it takes (I am using the very first Raspberry Pi).
So, is there a way to restart only the driver and, more importantly, to detect when it fails?
I found a solution to my own problem after many hours of research and trials. The solution is actually very simple: a script that runs my program in the background and greps its log every two seconds:
#!/usr/bin/env bash
command="/your/path/to/your_script"
log="prog.log"
match="error libnfc"

$command > "$log" 2>&1 &
pid=$!
while sleep 2
do
    if fgrep --quiet "$match" "$log"
    then
        echo "SOME MESSAGE"
        kill "$pid"               # kill the instance we started
        truncate -s 0 "$log"      # clear the log so the old error is not matched again
        $command > "$log" 2>&1 &
        pid=$!                    # remember the new instance's PID
        echo "SOME OTHER MESSAGE..."
    fi
done
This restarts the program whenever a line matching "error libnfc" is found in the log file.
Writing a small bash script.
I have a command that spawns a process then returns, leaving the spawned process running. I need to wait for the spawned process to terminate, then run some commands. How can I do this?
The specific case is:
VBoxManage startvm "my_vm"
#when my_vm closes
do_things
However, I've encountered this issue before in other contexts, so if possible I'm looking for a general solution rather than just one relating to VirtualBox VMs.
I have an answer, and it's not pretty:
tail --pid=$pid -f /dev/null
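The general pattern, assuming you can find the PID somehow (pidof and my_daemon below are just placeholders), is:

pid=$(pidof my_daemon)           # any way of obtaining the PID works
tail --pid="$pid" -f /dev/null   # blocks until that PID exits, prints nothing
do_things

Note that --pid is a GNU tail option; it is not available in every tail implementation.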
In the context of a VBox VM, the following heroic one-liner has proved successful:
VBoxManage startvm "my_vm"; tail --pid=$(awk '/Process ID:/ {print $4;}' /path_to/my_vm/Logs/VBox.log) -f /dev/null; echo hello
Running this, I was able to see that 'hello' was not output until after my_vm had shut down.
Good grief, there needs to be a better way of doing that than a side effect of an option of a command out of left field. Any better answers? Please...
Unfortunately, I do not think you can generalize VirtualBox (or any process that spawns processes in the background) to a one-size-fits-all solution.
Here is a VirtualBox specific answer.
As you noticed, VBox runs the machine (and a bunch of other things) in the background. But you can query VBoxManage for what is running:
vm_name="my_vm"
VBoxManage startvm "$vm_name"
# poll until the VM no longer shows up in the list of running VMs
while VBoxManage list runningvms | grep -q "$vm_name"; do
    echo Sleeping ...
    sleep 5
done
echo Done!
do_things
Hopefully you can tune this to your specific needs.
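If the VM does not need a visible window, another way to avoid polling is to start it with VBoxHeadless, which stays in the foreground until the VM powers off:

VBoxHeadless --startvm "my_vm"   # blocks until the VM shuts down
do_things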
I am trying to find a way to monitor a process. If the process is not running, it should be checked again to make sure it has really crashed; if it has really crashed, run a script (start.sh).
I have tried monit with no success. I have also tried adding the following script to crontab; I made it executable with chmod +x monitor.sh. The actual program is called program1:
case "$(pidof program | wc -w)" in
0) echo "Restarting program1: $(date)" >> /var/log/program1_log.txt
/home/user/files/start.sh &
;;
1) # all ok
;;
*) echo "Removed double program1: $(date)" >> /var/log/program1_log.txt
kill $(pidof program1 | awk '{print $1}')
;;
esac
The problem is that this script does not work. I added it to crontab and set it to run every 2 minutes, but if I close the program it won't restart.
Is there any other way to check a process, and run start.sh when it has crashed?
Not to be rude, but have you considered a more obvious solution?
When a shell (e.g. bash or tcsh) starts a subprocess, by default it waits for that subprocess to complete.
So why not have a shell that runs your process in a while(1) loop? Whenever the process terminates, for any reason, legitimate or not, it will automatically restart your process.
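In bash, a minimal sketch of that idea (assuming ./program1 runs in the foreground; the log path is borrowed from the question) looks like:

#!/bin/bash
while true; do
    ./program1
    echo "$(date): program1 exited with status $?, restarting in 30s" >> /var/log/program1_log.txt
    sleep 30   # avoid a tight respawn loop if the program dies instantly
done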
I ran into this same problem with MythTV. The backend keeps crashing on me. It's a Heisenbug: it happens about once a month on average, and is very hard to track down. So I just wrote a little script that I run in an xterm.
The, ahh, onintr business means that control-C will terminate the subprocess and not my (parent-process) script. Similarly, the sleep is in there so I can control-C several times to kill the subprocess and then kill the parent-process script while it's sleeping...
Coredumpsize is limited just because I don't want to fill up my disk with core files that I cannot use.
#!/bin/tcsh -f
limit coredumpsize 0
while( 1 )
echo "`date`: Running mythtv-backend"
# Now we cannot control-c this (tcsh) process...
onintr -
# This will let /bin/ls directory-sort my logfiles based on day & time.
# It also keeps the logfile names pretty unique.
mythbackend |& tee /....../mythbackend.log.`date "+%Y.%m.%d.%H.%M.%S"`
# Now we can control-c this (tcsh) process.
onintr
echo "`date`: mythtv-backend exited. Sleeping for 30 seconds, then restarting..."
sleep 30
end
p.s. That sleep will also save you in the event your subprocess dies immediately. Otherwise the constant respawning without delay will drive your IO and CPU through the roof, making it difficult to correct the problem.
I would like a "system" that monitors a process and would kill said process if:
the process exceeds some memory requirements
the process does not respond to a message from the "system" in some period of time
I assume this "system" could be something as simple as a monitoring process? A code example of how this could be done would be useful. I am of course not averse to a completely different solution to this problem.
For the first requirement, you might want to look into either using ulimit, or tweaking the kernel OOM-killer settings on your system.
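For example, ulimit can cap the address space of anything started from the same shell (a sketch; the 512 MiB figure and ./my_program are placeholders):

# limit virtual address space to 512 MiB (the value is in KiB);
# allocations beyond the cap fail, which typically makes the process exit
ulimit -v 524288
./my_program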
Monitoring daemons exist for this sort of thing as well. God is a recent example.
I wrote a script that runs as a cron job and can be customized to kill problem processes:
#!/usr/local/bin/perl
use strict;
use warnings;
use Proc::ProcessTable;

my $table = Proc::ProcessTable->new;
for my $process (@{$table->table}) {
    # skip root processes
    next if $process->uid == 0 or $process->gid == 0;
    # skip anything other than Passenger application processes
    #next unless $process->fname eq 'ruby' and $process->cmndline =~ /\bRails\b/;
    # skip any using less than 1 GiB
    next if $process->rss < 1_073_741_824;
    # document the slaughter
    (my $cmd = $process->cmndline) =~ s/\s+\z//;
    print "Killing process: pid=", $process->pid, " uid=", $process->uid, " rss=", $process->rss, " fname=", $process->fname, " cmndline=", $cmd, "\n";
    # try first to terminate process politely
    kill 15, $process->pid;
    # wait a little, then kill ruthlessly if it's still around
    sleep 5;
    kill 9, $process->pid;
}
https://www.endpointdev.com/blog/2012/08/automatically-kill-process-using-too/
To limit memory usage of processes, check /etc/security/limits.conf
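For example (the user name and the 1 GiB cap below are placeholders), a line like this caps the address space of every process that user runs:

# /etc/security/limits.conf format: <domain> <type> <item> <value> ('as' is in KiB)
someuser  hard  as  1048576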
Try Process Resource Monitor for a classic, easy-to-use process monitor. Code available under the GPL.
There are a few other monitoring scripts there you might find interesting too.
If you want to set up a fairly comprehensive monitoring system, check out monit. It can be very chatty at times, but it will do a lot of monitoring, restart services, alert you, etc.
That said, don't be surprised if you're getting dozens of e-mails a day until you get used to configuring it and telling it what not to bug you about.
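For a rough idea, a monit stanza along these lines (the pidfile path and the threshold are assumptions, reusing the program1 example from above) restarts a process that grows past a memory limit:

check process program1 with pidfile /var/run/program1.pid
    start program = "/home/user/files/start.sh"
    stop program  = "/usr/bin/pkill program1"
    if totalmem > 1024 MB for 3 cycles then restart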
I have a shell script here that could be your starting point. I wrote it because I also had some issues with processes exceeding a memory limit. Actually it just checks against a given limit of CPU usage, but you can easily change it to watch memory, or the jobs list for an idle process.
file: pkill.sh
#!/bin/bash
if [ -z "$1" ]
then
    maxlimit=99
else
    maxlimit=$1
fi

# columns: 1=user 2=%cpu 3=pid 4=vsz 5=rss 6=uid 7=gid
ps axo user,%cpu,pid,vsz,rss,uid,gid --sort %cpu,rss \
    | awk -v max=$maxlimit '$6 != 0 && $7 != 0 && $2 > max {print $3}' \
    | while read pid
do
    ps u --no-headers -p $pid
    echo "$(date) - $(ps u --no-headers -p $pid)" >> pkill.log
    notify-send 'Killing process!' "$(ps -p $pid -o command --no-headers | awk '{print $1}')" -u normal -i dialog-warning -t 3000
    kill $pid
done
Simply run it once like: sh ./pkill.sh <limit-cpu>
Or, to keep it running: watch -n 10 sh ./pkill.sh 90
In the case above it will run every 10 seconds, killing processes that exceed 90% CPU.
Are the monitored processes ones you're writing, or just any process?
If they're arbitrary processes, then it might be hard to monitor for responsiveness. Unless a process is already set up to handle and respond to events that you can send it, I doubt you'll be able to monitor it. If they're processes that you're writing, you'd need to add some kind of message handling that you can check against.
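If you control the monitored process, one simple scheme is a heartbeat file: the process touches a file from its main loop, and the monitor kills it when the file goes stale. A sketch (the heartbeat path, timeout, and the myapp name are all assumptions):

#!/bin/bash
# the monitored process is assumed to run: touch /tmp/myapp.heartbeat
# from its main loop every few seconds
heartbeat=/tmp/myapp.heartbeat
timeout=30
age=$(( $(date +%s) - $(stat -c %Y "$heartbeat") ))
if (( age > timeout )); then
    echo "no heartbeat for ${age}s, killing myapp"
    kill "$(pidof myapp)"
fi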