Problem:
I'm writing a program in Go on Linux that needs to execute a long-running process such that:
I redirect the stdout of the running process to a file.
I control the user the process runs as.
The process doesn't die when my program exits.
The process doesn't become a zombie when my program crashes.
I get the PID of the running process.
I'm running my program with root permissions.
Attempted solution:
func Run(pathToBin string, args []string, uid uint32, stdLogFile *os.File) (int, error) {
    cmd := exec.Command(pathToBin, args...)
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Credential: &syscall.Credential{
            Uid: uid,
        },
    }
    cmd.Stdout = stdLogFile
    if err := cmd.Start(); err != nil {
        return -1, err
    }
    go func() {
        cmd.Wait() // Wait is necessary so cmd doesn't become a zombie
    }()
    return cmd.Process.Pid, nil
}
This solution seems to satisfy almost all of my requirements, except that when I send SIGTERM/SIGKILL to my program the underlying process crashes. In fact, I want my background process to be as separate as possible: it has a different parent pid, group pid etc. from my program. I want to run it as a daemon.
Other solutions on Stack Overflow suggested using cmd.Process.Release() for similar use cases, but it doesn't seem to work.
Solutions which are not applicable in my case:
I have no control over the code of the process I'm running. My solution has to work for any process.
I can't use external commands to run it, just pure Go. So using systemd or something similar is not applicable.
I can, in fact, use a library that is easily importable from GitHub etc.
TL;DR:
Just use https://github.com/hashicorp/go-reap
There is a great Russian expression that translates roughly as "don't try to give birth to a bicycle": don't reinvent the wheel and keep it simple. I think it applies here. If I were you, I'd reconsider using one of:
https://github.com/krallin/tini
https://busybox.net/
https://software.clapper.org/daemonize/
https://wiki.gentoo.org/wiki/OpenRC
https://www.freedesktop.org/wiki/Software/systemd/
This issue has already been solved ;)
Your question is imprecise or you are asking for non-standard features.
In fact, I want my background process to be as separate as possible: it has a different parent pid, group pid etc. from my program. I want to run it as a daemon.
That is not how process inheritance works. You cannot have process A start process B and somehow change the parent of B to C. To the best of my knowledge this is not possible on Linux.
In other words, if process A (pid 55) starts process B (100), then B must have parent pid 55.
The only way to avoid that is to have something else start process B, such as atd or crond - which is not what you are asking for.
If parent 55 dies, then PID 1 will be the parent of 100, not some arbitrary process.
Your statement "it has a different parent pid" does not make sense.
I want to run it as a daemon.
That's excellent. However, on a GNU/Linux system, all daemons have a parent pid, and those parents have a parent pid, all the way up to pid 1, strictly according to the parent -> child rule.
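That said, Go does let you detach the child from your program's session and process group at start time, which covers the "different group pid" part of the requirement. Below is a minimal sketch, not your code: StartDetached is a made-up helper name, while Setsid is the real syscall.SysProcAttr field on Linux. The parent pid still follows the reparenting rule described above.

package main

import (
    "os"
    "os/exec"
    "syscall"
)

// StartDetached starts pathToBin in its own session, so the child gets its
// own session id and process group and no controlling terminal. Its parent
// pid is still this process (and pid 1 after this process exits).
func StartDetached(pathToBin string, args []string, logFile *os.File) (int, error) {
    cmd := exec.Command(pathToBin, args...)
    cmd.Stdout = logFile
    cmd.Stderr = logFile
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Setsid: true, // run the child in a new session
    }
    if err := cmd.Start(); err != nil {
        return -1, err
    }
    go cmd.Wait() // reap the child if it exits while we are still running
    return cmd.Process.Pid, nil
}

func main() {
    pid, err := StartDetached("/bin/sleep", []string{"60"}, os.Stdout)
    if err != nil {
        panic(err)
    }
    println("started detached child with pid", pid)
}

With Setsid set, terminal-generated signals sent to your program's foreground process group no longer reach the child.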
when I send SIGTERM/SIGKILL to my program the underlying process crashes.
I cannot reproduce that behavior. See case8 and case7 from the proof-of-concept repo:
make case8
export NOSIGN=1; make build case7
unset NOSIGN; make build case7
$ make case8
{ sleep 6 && killall -s KILL zignal; } &
./bin/ctrl-c &
sleep 2; killall -s TERM ctrl-c
kill with:
{ pidof ctrl-c; pidof signal ; } | xargs -r -t kill -9
main() 2476074
bashed 2476083 (2476081)
bashed 2476084 (2476081)
bashed 2476085 (2476081)
zignal 2476088 (2476090)
go main() got 23 urgent I/O condition
go main() got 23 urgent I/O condition
zignal 2476098 (2476097)
go main() got 23 urgent I/O condition
zignal 2476108 (2476099)
main() wait...
p 2476088
p 2476098
p 2476108
p 2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p 2476098
p 2476108
p 2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p 2476098
p 2476108
p 2476088
Bash c 2476085 EXITs ELAPSED 4
go main() got 17 child exited
go main() got 23 urgent I/O condition
main() children done: 1 %!s(<nil>)
main() wait...
go main() got 15 terminated
go main() got 23 urgent I/O condition
sleep 1; killall -s KILL ctrl-c
p 2476098
p 2476108
p 2476088
balmora: ~/src/my/go/doodles/sub-process [main]
$ p 2476098
p 2476108
Bash _ 2476083 EXITs ELAPSED 6
Bash q 2476084 EXITs ELAPSED 8
The bash processes keep running after the parent is killed.
killall -s KILL ctrl-c;
All 3 "zignal" sub-processes are running until killed by
killall -s KILL zignal;
In both cases the sub-processes continue to run despite the main process being signaled with TERM, HUP, INT. This behavior is different in a shell environment, where terminal-generated signals (e.g. Ctrl-C) are delivered to the whole foreground process group for convenience. See the related questions about signals; this particular answer illustrates a key difference for SIGINT. Note that SIGSTOP and SIGKILL cannot be caught by an application.
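For reference, this is roughly how a parent like the ctrl-c binary in the demo can observe signals itself while the children it started keep running. The sketch below is not the repo's code, just a minimal illustration built on signal.Notify; KILL and STOP can never be delivered to such a handler.

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 1)
    // TERM, INT and HUP can be observed; KILL and STOP cannot be caught.
    signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)

    for s := range sigs {
        fmt.Printf("main() got %v\n", s)
        if s == syscall.SIGTERM {
            return // exiting here does not signal any children we started
        }
    }
}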
It was necessary to clarify the above before proceeding with the other parts of the question.
So far you have already solved the following:
redirect stdout of sub-process to a file
set owner UID of sub-process
sub-process survives death of parent (my program exits)
the PID of sub-process can be seen by the main program
The next one depends on whether the children are "attached" to a shell or not
sub-process survives the parent being killed
The last one is hard to reproduce, but I have heard about this problem in the docker world, so the rest of this answer is focused on addressing this issue.
sub-process survives if the parent crashes and does not become a zombie
As you have noted, the Cmd.Wait() is necessary to avoid creating zombies. After some experimentation I was able to consistently produce zombies in a docker environment using an intentionally simple replacement for /bin/sh. This "shell", implemented in Go, will only run a single command and does not do much else in terms of reaping children. You can study the code over at GitHub.
The zombie solution
the simple wrapper which causes zombies
package main

func main() {
    Sh()
}
The reaper wrapper
package main

import (
    "fmt"
    "sync"

    "github.com/fatih/color"
    "github.com/hashicorp/go-reap"
)

func main() {
    if reap.IsSupported() {
        done := make(chan struct{})
        var reapLock sync.RWMutex
        pids := make(reap.PidCh, 1)
        errors := make(reap.ErrorCh, 1)
        go reap.ReapChildren(pids, errors, done, &reapLock)
        go report(pids, errors, done)
        Sh()
        close(done)
    } else {
        fmt.Println("Sorry, go-reap isn't supported on your platform.")
    }
}
func report(pids reap.PidCh, errors reap.ErrorCh, done chan struct{}) {
    sprintf := color.New(color.FgWhite, color.Bold).SprintfFunc()
    for {
        select {
        case pid := <-pids:
            println(sprintf("raeper pid %d", pid))
        case err := <-errors:
            println(sprintf("raeper er %s", err))
        case <-done:
            return
        }
    }
}
The init / sh (pid 1) process which runs other commands
package main

import (
    "os"
    "os/exec"
    "strings"
    "time"

    "github.com/google/shlex"
    "github.com/tox2ik/go-poc-reaper/fn"
)

func Sh() {
    args := os.Args[1:]
    script := args[0:0]
    if len(args) >= 1 {
        if args[0] == "-c" {
            script = args[1:]
        }
    }
    if len(script) == 0 {
        fn.CyanBold("cmd: expecting sh -c 'foobar'")
        os.Exit(111)
    }
    var cmd *exec.Cmd
    parts, _ := shlex.Split(strings.Join(script, " "))
    if len(parts) >= 2 {
        cmd = fn.Merge(exec.Command(parts[0], parts[1:]...), nil)
    }
    if len(parts) == 1 {
        cmd = fn.Merge(exec.Command(parts[0]), nil)
    }
    if fn.IfEnv("HANG") {
        fn.CyanBold("cmd: %v\n start", parts)
        ex := cmd.Start()
        if ex != nil {
            fn.CyanBold("cmd %v err: %s", parts, ex)
        }
        go func() {
            time.Sleep(time.Millisecond * 100)
            errw := cmd.Wait()
            if errw != nil {
                fn.CyanBold("cmd %v err: %s", parts, errw)
            } else {
                fn.CyanBold("cmd %v all done.", parts)
            }
        }()
        fn.CyanBold("cmd: %v\n dispatched, hanging forever (i.e. to keep docker running)", parts)
        for {
            time.Sleep(time.Millisecond * time.Duration(fn.EnvInt("HANG", 2888)))
            fn.SystemCyan("/bin/ps", "-e", "-o", "stat,comm,user,etime,pid,ppid")
        }
    } else {
        if fn.IfEnv("NOWAIT") {
            ex := cmd.Start()
            if ex != nil {
                fn.CyanBold("cmd %v start err: %s", parts, ex)
            }
        } else {
            ex := cmd.Run()
            if ex != nil {
                fn.CyanBold("cmd %v run err: %s", parts, ex)
            }
        }
        fn.CyanBold("cmd %v\n dispatched, exit docker.", parts)
    }
}
The Dockerfile
FROM scratch
# for sh.go
ENV HANG ""
# for sub-process.go
ENV ABORT ""
ENV CRASH ""
ENV KILL ""
# for ctrl-c.go, signal.go
ENV NOSIGN ""
# wrapped or simple /bin/sh or "init"
COPY bin/sh /bin/sh
COPY bin/sub-process /bin/sub-process
COPY bin/zleep /bin/zleep
COPY bin/fork-if /bin/fork-if
COPY --from=busybox:latest /bin/find /bin/find
COPY --from=busybox:latest /bin/ls /bin/ls
COPY --from=busybox:latest /bin/ps /bin/ps
COPY --from=busybox:latest /bin/killall /bin/killall
Remaining code / setup can be seen here:
https://github.com/tox2ik/go-poc-reaper
Case 5 (simple /bin/sh)
The gist of it is we start two sub-processes from Go, using the "parent" sub-process binary. The first child is zleep and the second is fork-if. The second one starts a "daemon" that runs a forever-loop in addition to a few short-lived threads. After a while, we kill the sub-process parent, forcing sh to take over the parenting of these children.
Since this simple implementation of sh does not know how to deal with abandoned children, the children become zombies.
This is standard behavior. Init systems are normally responsible for reaping any such abandoned children.
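For the curious, this is roughly what a reaper (go-reap, or pid 1 in a full init system) does: wait for SIGCHLD and then call wait4 with WNOHANG until there are no more exited children to collect. A minimal sketch, not the library's actual implementation:

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

// reapChildren collects the exit status of every terminated child so that
// none of them lingers in the process table as a zombie.
func reapChildren() {
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGCHLD)
    for range sigs {
        for {
            var status syscall.WaitStatus
            // -1 means "any child"; WNOHANG means "don't block if none exited".
            pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
            if pid <= 0 || err != nil {
                break
            }
            fmt.Printf("reaped pid %d (status %d)\n", pid, status.ExitStatus())
        }
    }
}

func main() {
    go reapChildren()
    select {} // stands in for the real work of an init-like process
}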
Check out this repo and run the cases:
$ make prep build
$ make prep build2
The first one will use the simple /bin/sh in the docker container, and the second one will use the same code wrapped in a reaper.
With zombies:
$ make prep build case5
(…)
main() Daemon away! 16 (/bin/zleep)
main() Daemon away! 22 (/bin/fork-if)
(…)
main() CRASH imminent
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x49e45c]
goroutine 1 [running]:
main.main()
/home/jaroslav/src/my/go/doodles/sub-process/sub-process.go:137 +0xfc
cmd [/bin/sub-process /log/case5 3 /bin/zleep 111 2 -- /dev/stderr 3 /bin/fork-if --] err: exit status 2
Child '1' done
thread done
STAT COMMAND USER ELAPSED PID PPID
R sh 0 0:02 1 0
S zleep 3 0:02 16 1
Z fork-if 3 0:02 22 1
R fork-child-A 3 0:02 25 1
R fork-child-B 3 0:02 26 25
S fork-child-C 3 0:02 27 26
S fork-daemon 3 0:02 28 27
R ps 0 0:01 30 1
Child '2' done
thread done
daemon
(…)
STAT COMMAND USER ELAPSED PID PPID
R sh 0 0:04 1 0
Z zleep 3 0:04 16 1
Z fork-if 3 0:04 22 1
Z fork-child-A 3 0:04 25 1
R fork-child-B 3 0:04 26 1
S fork-child-C 3 0:04 27 26
S fork-daemon 3 0:04 28 27
R ps 0 0:01 33 1
(…)
With reaper:
$ make -C ~/src/my/go/doodles/sub-process case5
(…)
main() CRASH imminent
(…)
Child '1' done
thread done
raeper pid 24
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:02 1 0
S zleep 3 0:01 18 1
R fork-child-A 3 0:01 27 1
R fork-child-B 3 0:01 28 27
S fork-child-C 3 0:01 30 28
S fork-daemon 3 0:01 31 30
R ps 0 0:01 32 1
Child '2' done
thread done
raeper pid 27
daemon
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:03 1 0
S zleep 3 0:02 18 1
R fork-child-B 3 0:02 28 1
S fork-child-C 3 0:02 30 28
S fork-daemon 3 0:02 31 30
R ps 0 0:01 33 1
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:03 1 0
S zleep 3 0:02 18 1
R fork-child-B 3 0:02 28 1
S fork-child-C 3 0:02 30 28
S fork-daemon 3 0:02 31 30
R ps 0 0:01 34 1
raeper pid 18
daemon
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:04 1 0
R fork-child-B 3 0:03 28 1
S fork-child-C 3 0:03 30 28
S fork-daemon 3 0:03 31 30
R ps 0 0:01 35 1
(…)
Here is a picture of the same output, which may be less confusing to read.
Zombies
Reaper
How to run the cases in the poc repo
Get the code
git clone https://github.com/tox2ik/go-poc-reaper.git
One terminal:
make tail-cases
Another terminal
make prep
make build
or make build2
make case0 case1
...
Related questions:
go
How to create a daemon process in Golang?
How to start a Go program as a daemon in Ubuntu?
how to keep subprocess running after program exit in golang?
Prevent Ctrl+C from interrupting exec.Command in Golang
signals
https://unix.stackexchange.com/questions/386999/what-terminal-related-signals-are-sent-to-the-child-processes-of-the-shell-direc
https://unix.stackexchange.com/questions/6332/what-causes-various-signals-to-be-sent
https://en.wikipedia.org/wiki/Signal_(IPC)#List_of_signals
Related discussions:
https://github.com/golang/go/issues/227
https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
Relevant projects:
http://software.clapper.org/daemonize/ (what I would use)
https://github.com/hashicorp/go-reap (if you must run Go as pid 1)
https://github.com/sevlyar/go-daemon (mimics POSIX fork)
Relevant prose:
A zombie process is a process whose execution is completed but it still has an entry in the process table. Zombie processes usually occur for child processes, as the parent process still needs to read its child’s exit status. Once this is done using the wait system call, the zombie process is eliminated from the process table. This is known as reaping the zombie process.
from https://www.tutorialspoint.com/what-is-zombie-process-in-linux
Related
I have this simple test:
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int res = fork();
    if (res == 0) { // child
        printf("Son running now, pid = %d\n", getpid());
    }
    else { // parent
        printf("Parent running now, pid = %d\n", getpid());
        wait(NULL);
    }
    return 0;
}
When I run it a hundred times, i.e. run this command,
for ((i=0;i<100;i++)); do echo ${i}:; ./test; done
I get:
0:
Parent running now, pid = 1775
Son running now, pid = 1776
1:
Parent running now, pid = 1777
Son running now, pid = 1778
2:
Parent running now, pid = 1779
Son running now, pid = 1780
and so on; whereas when I first write to a file and then read the file, i.e. run this command,
for ((i=0;i<100;i++)); do echo ${i}:; ./test; done > forout
cat forout
I get it flipped! That is,
0:
Son running now, pid = 1776
Parent running now, pid = 1775
1:
Son running now, pid = 1778
Parent running now, pid = 1777
2:
Son running now, pid = 1780
Parent running now, pid = 1779
I know about the scheduler. What does this result mean (or not mean) in terms of who runs first after forking?
The forking function, do_fork() (at kernel/fork.c) ends with setting the need_resched flag to 1, with the comment by kernel developers saying, "let the child process run first."
I guessed that this has something to do with the buffers that the printf writes to.
Also, is it true to say that the output redirection (>) writes everything to a buffer first and only then copies it to the file? And even so, why would this change the order of the prints?
Note: I am running the test on a single-core virtual machine with a Linux kernel v2.4.14.
Thank you for your time.
When you redirect, glibc detects that stdout is not a tty and turns on output buffering for efficiency. The buffer is therefore not written until the process exits. You can see this with e.g.:
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("hello world\n");
    sleep(60);
}
When you run it interactively, it prints "hello world" and waits. When you redirect to a file, you will see that nothing is written for 60 seconds:
$ ./foo > file & tail -f file
(no output for 60 seconds)
Since your parent process waits for the child, it will necessarily always exit last, and therefore flush its output last.
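The same behaviour can be illustrated in Go with an explicit bufio.Writer (Go does not go through libc's stdio, so os.Stdout is unbuffered there by default and the buffering has to be added by hand). A small illustrative sketch: nothing reaches the redirected file until the buffer is flushed.

package main

import (
    "bufio"
    "os"
    "time"
)

func main() {
    w := bufio.NewWriter(os.Stdout) // explicit buffering, like stdio's full buffering
    w.WriteString("hello world\n")  // sits in the buffer, not yet in the file
    time.Sleep(60 * time.Second)    // redirect to a file and tail it: still empty
    w.Flush()                       // only now does the line reach the file
}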
Is there any option in perf to look at the processes running on a particular CPU/core, and what percentage of that core is taken by each process?
Reference links would be helpful.
perf is intended for profiling, which is not a good fit for your case. You may try sampling /proc/sched_debug (if it is compiled into your kernel). For example, you may check which process is currently running on each CPU:
egrep '^R|cpu#' /proc/sched_debug
cpu#0, 917.276 MHz
R egrep 2614 37730.177313 ...
cpu#1, 917.276 MHz
R bash 2023 218715.010833 ...
Using its PID as a key, you may check how much CPU time (in milliseconds) it has consumed:
grep se.sum_exec_runtime /proc/2023/sched
se.sum_exec_runtime : 279346.058986
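If it helps, that counter can also be sampled programmatically. The sketch below (in Go, with the PID hard-coded purely for illustration) reads se.sum_exec_runtime twice, one second apart, and turns the difference into a rough CPU percentage. It assumes the kernel exposes /proc/<pid>/sched (CONFIG_SCHED_DEBUG), and note it is per process, not per CPU.

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
    "time"
)

// sumExecRuntime reads se.sum_exec_runtime (in milliseconds) from /proc/<pid>/sched.
func sumExecRuntime(pid int) (float64, error) {
    data, err := os.ReadFile(fmt.Sprintf("/proc/%d/sched", pid))
    if err != nil {
        return 0, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.HasPrefix(line, "se.sum_exec_runtime") {
            fields := strings.Fields(line) // "se.sum_exec_runtime : 279346.058986"
            return strconv.ParseFloat(fields[len(fields)-1], 64)
        }
    }
    return 0, fmt.Errorf("se.sum_exec_runtime not found for pid %d", pid)
}

func main() {
    pid := 2023 // example PID from the output above
    before, _ := sumExecRuntime(pid)
    time.Sleep(time.Second)
    after, _ := sumExecRuntime(pid)
    // The counter is in ms; over a 1 s window the delta is roughly a CPU percentage.
    fmt.Printf("pid %d used about %.1f%% CPU\n", pid, (after-before)/10.0)
}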
However, as @BrenoLeitão mentioned, SystemTap is quite useful for your case. Here is a script for your task:
global cputimes;
global cmdline;
global oncpu;
global NS_PER_SEC = 1000000000;

probe scheduler.cpu_on {
    oncpu[pid()] = local_clock_ns();
}

probe scheduler.cpu_off {
    if(oncpu[pid()] == 0)
        next;
    cmdline[pid()] = cmdline_str();
    cputimes[pid(), cpu()] <<< local_clock_ns() - oncpu[pid()];
    delete oncpu[pid()];
}

probe timer.s(1) {
    printf("%6s %3s %6s %s\n", "PID", "CPU", "PCT", "CMDLINE");
    foreach([pid+, cpu] in cputimes) {
        cpupct = @sum(cputimes[pid, cpu]) * 10000 / NS_PER_SEC;
        printf("%6d %3d %3d.%02d %s\n", pid, cpu,
               cpupct / 100, cpupct % 100, cmdline[pid]);
    }
    delete cputimes;
}
It traces the moments when a process starts running on a CPU and when it stops running there (due to migration or sleeping) by attaching to the scheduler.cpu_on and scheduler.cpu_off probes. The second probe calculates the time difference between these events and saves it to the cputimes aggregation along with the process command line.
timer.s(1) fires once per second; it walks over the aggregation and calculates percentages. Here is sample output for CentOS 7 with bash running an infinite loop:
0 0 100.16
30 1 0.00
51 0 0.00
380 0 0.02 /usr/bin/python -Es /usr/sbin/tuned -l -P
2016 0 0.08 sshd: root@pts/0 "" "" "" ""
2023 1 100.11 -bash
2630 0 0.04 /usr/libexec/systemtap/stapio -R stap_3020c9e7ba76838179be68cd2390a10c_2630 -F3
I understand that perf is not the proper way to do it, although you can limit perf to particular CPUs, e.g. using perf record -C <cpulist> or even perf stat -C <cpulist>.
The closest you are going to get is the context-switch event, but this is not going to give you the application names at all.
I think you are going to need something more powerful, such as SystemTap.
I'm learning about the scheduler and trying to print all runnable processes. So I have written a kernel module that uses the for_each_process macro to iterate over all processes and print the ones in the "runnable" state. But this seems like a stupid (and inefficient) way of doing it, so I thought about getting a reference to each running queue and using its red-black tree to walk the runnable processes, but I couldn't find a way to do this.
I have found out that there is a list of sched_class-es for each CPU, namely stop_sched_class -> rt_sched_class -> fair_sched_class -> idle_sched_class, and each one of them has its own running queue. But I couldn't find a way to reach them all.
I have used the module that uses the tasks_timeline to find all runnable processes to print the address of the running queues - it seems I have 3 running queues (while having only two processors).
The module:
#include <linux/module.h> /* Needed by all modules */
#include <linux/kernel.h> /* Needed for KERN_INFO */
#include <linux/sched.h>
MODULE_LICENSE("GPL");
struct cfs_rq {
    struct load_weight load;
    unsigned int nr_running, h_nr_running;
};

void printList(void){
    int count;
    struct task_struct *tsk;
    count = 0;
    for_each_process(tsk){
        if(tsk->state)
            continue;
        printk("pid: %d rq: %p (%d)\n", tsk->pid, tsk->se.cfs_rq, tsk->se.cfs_rq->nr_running);
        count++;
    }
    printk("count is: %d\n", count);
}

int init_module(void)
{
    printList();
    return 0;
}

void cleanup_module(void)
{
    printk(KERN_INFO "Goodbye world proc.\n");
}
The output:
[ 8215.627038] pid: 9147 ffff88007bbe9200 (3)
[ 8215.627043] pid: 9148 ffff8800369b0200 (2)
[ 8215.627045] pid: 9149 ffff8800369b0200 (2)
[ 8215.627047] pid: 9150 ffff88007bbe9200 (3)
[ 8215.627049] pid: 9151 ffff88007bbe9200 (3)
[ 8215.627051] pid: 9154 ffff8800a46d4600 (1)
[ 8215.627053] count is: 6
[ 8215.653741] Goodbye world proc.
About the computer:
$ uname -a
Linux k 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep 'processor' | wc -l
2
So my questions are:
How can I print all runnable processes in a nicer way?
How are running queues made and managed?
Are the running queues somehow linked to each other? (How?)
You can try the command below: run ps -A -l and find the entries where the process state (R) and the process flags are as shown.
Sample output:
127:~$ ps -A -l | grep -e R -e D
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
1 S 0 1367 2 0 80 0 - 0 - ? 00:00:01 SEPDRV_ABNORMAL
4 R 1000 2634 2569 2 80 0 - 794239 - ? 00:25:06 Web Content
1 D 0 20091 2 0 80 0 - 0 - ? 00:00:00 kworker/3:2
4 R 1000 21077 9361 0 80 0 - 7229 - pts/17 00:00:00 ps
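If a userspace view is enough, you can also get the same list without a kernel module by scanning /proc. A minimal sketch (in Go, since procfs is just text files) that prints every process whose state field in /proc/<pid>/stat is R:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    entries, err := os.ReadDir("/proc")
    if err != nil {
        panic(err)
    }
    for _, e := range entries {
        pid := e.Name()
        if pid[0] < '0' || pid[0] > '9' {
            continue // not a numeric (process) directory
        }
        data, err := os.ReadFile("/proc/" + pid + "/stat")
        if err != nil {
            continue // the process may have exited in the meantime
        }
        // Format: "<pid> (<comm>) <state> ...". comm can contain spaces,
        // so locate the state relative to the closing parenthesis.
        s := string(data)
        open := strings.Index(s, "(")
        closing := strings.LastIndex(s, ")")
        if open < 0 || closing < open {
            continue
        }
        fields := strings.Fields(s[closing+1:])
        if len(fields) > 0 && fields[0] == "R" {
            fmt.Printf("runnable: pid %s %s\n", pid, s[open:closing+1])
        }
    }
}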
This happens if the parent crashes after cloning the child process, but before sending the unblocking byte with SendContinueSignalToChild(). In this case the pipe file handle remains open and the child stays blocked forever on read(...) within WaitForContinueSignal(). After the crash, the child is adopted by the init process.
Steps to reproduce:
1. Simulate parent crash in google_breakpad::ExceptionHandler::GenerateDump(CrashContext *context):
...
const pid_t child = sys_clone(
ThreadEntry, stack, CLONE_FILES | CLONE_FS | CLONE_UNTRACED, &thread_arg, NULL, NULL, NULL);
int r, status;
// Allow the child to ptrace us
sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
int *ptr = 0;
*ptr = 42; // <------- Crash here
SendContinueSignalToChild();
...
2. Send one of the handled signals to the parent (e.g. SIGSEGV), so that the above GenerateDump(...) method is invoked.
3. Observe that the parent exits but the child still exists, blocked on WaitForContinueSignal().
Output for the above steps:
dmytro@db:~$ ./test &
[1] 25050
dmytro@db:~$ Test: started
dmytro@db:~$ ps aflxw | grep test
0 1000 25050 18923 20 0 40712 2680 - R pts/37 0:13 | | \_ ./test
0 1000 25054 18923 20 0 6136 856 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
dmytro@db:~$ kill -11 25050
[1]+ Segmentation fault (core dumped) ./test
dmytro@db:~$ ps aflxw | grep test
0 1000 25058 18923 20 0 6136 852 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
1 1000 25055 1687 20 0 40732 356 pipe_w S pts/37 0:00 \_ ./test
1687 is the init pid.
In the real world the crash happens in a thread running in parallel to the one that handles the signal.
NOTE: the issue can also happen because of normal program termination (i.e. exit(0) being called in a parallel thread).
Tested on Linux 3.3.8-2.2, on MIPS and i686 platforms.
So, my 2 questions:
Is it the expected behavior for the breakpad library to keep the child alive? My expectation is that the child should exit immediately after the parent crashes/exits.
If it is not the expected behavior, what is the best solution to finish the client after a parent crash/exit?
Thanks in advance!
Any clue on a possible solution?
This can also happen during a shutdown crash, if the crashing thread is not the main one and the parent process exits from main() in exactly this time slot, so apparently it is not as unlikely to happen as it seems at first glance.
At the moment, I think this is happening because of the CLONE_FILES flag passed to clone(). With a shared file descriptor table, the child itself keeps the write end of the pipe open, so read() on the pipe in the child does not return EOF when the parent process quits.
I have not yet examined whether we can safely get rid of this flag in the clone() call.
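To see why the shared descriptor table matters here: read() on a pipe only reports EOF once every file descriptor referring to the write end is closed. The small Go sketch below imitates that situation, with dup() standing in for the extra reference a CLONE_FILES child would hold:

package main

import (
    "fmt"
    "os"
    "syscall"
    "time"
)

func main() {
    r, w, _ := os.Pipe()

    // A second reference to the write end, like the one a child sharing the
    // fd table (CLONE_FILES) would keep alive.
    dupFd, _ := syscall.Dup(int(w.Fd()))

    w.Close() // the original write end is gone, but the duplicate still exists

    go func() {
        time.Sleep(2 * time.Second)
        syscall.Close(dupFd) // only now is the last write end closed
    }()

    buf := make([]byte, 1)
    n, err := r.Read(buf) // blocks for ~2 s, then returns 0, io.EOF
    fmt.Printf("read returned n=%d err=%v\n", n, err)
}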
I have this simple perl daemon:
#!/usr/bin/perl
use strict;
use warnings;
use Proc::Daemon;
Proc::Daemon::Init;
my $continue = 1;
$SIG{TERM} = sub { $continue = 0 };
$SIG{USR1} = sub { do_process(1) };
# basic daemon
boxesd_log("started boxesd");
while ($continue) {
do_process(0);
sleep(30);
}
boxesd_log("finished boxesd");
exit(0);
# required subroutines
sub do_process {
    my ($notified) = @_;
    boxesd_log("doing it $notified");
}
But there is something that is not working right.
When the daemon starts, it logs every 30 seconds without the notification as expected:
Sat Oct 30 21:05:47 2010 doing it 0
Sat Oct 30 21:06:17 2010 doing it 0
Sat Oct 30 21:06:47 2010 doing it 0
The problem comes when I send the USR1 signal to the process using kill -USR1 xxxxx. The output is not what I expect:
Sat Oct 30 21:08:25 2010 doing it 1
Sat Oct 30 21:08:25 2010 doing it 0
I get two consecutive entries, one from the signal-handling subroutine and another from the ever-running loop. It seems as if the sleep gets interrupted whenever the USR1 signal is received.
What is going on?
The sleep is indeed getting interrupted: your program handles the incoming signal, sleep() returns early, and the loop body runs again immediately (hence the extra "doing it 0" entry) before going back to sleep; the loop itself keeps running until a TERM signal is received. This is documented behaviour for the sleep() function:
May be interrupted if the process receives a signal such as "SIGALRM".
Note that if you need to sleep for 30 seconds, even if interrupted by a signal, you can determine the number of seconds slept and then sleep again for the remainder:
while (1)
{
    my $actual = sleep(30);
    sleep(30 - $actual) if $actual < 30;
}
PS. You can read more about signal handling in perldoc perlipc, and in virtually any unix programming book by W. Richard Stevens. :)