Why is a spawned Node child process detached from its parent process and running independently? - node.js

I have a Node child process created with spawn, whose output is continuously written to a file through a write stream on every "data" event received. The script runs over SSH, and I am facing an edge-case problem.
Suppose several terminals are open to the same SSH host and the script is started in one of them. Someone accidentally closes that SSH terminal during execution, and the child process should not stop. Using ps -ef | grep "command name" I can see that the child process is still running with a different parent process ID (it shows 1), but the write stream stops writing to the file. It seems the child process has become a zombie process even though I detached it from the parent. The script is below:
var fs = require('fs'); // needed for fs.createWriteStream below
var execSpawn = require('child_process').spawn;
var Promise = require('bluebird');
var spawnAction = function(path, cmd, cb){
    return function(resolve, reject, onCancel){
        var cmdExec = execSpawn(path, cmd, {detached: true});
        // unref() returns undefined, so chaining it loses the child handle:
        //cmdExec = execSpawn(path, cmd, {detached: true}).unref();
        var fileData = {};
        var count = 0;
        var stream = fs.createWriteStream('filepath');
        cmdExec.stdout.setEncoding('utf8');
        cmdExec.stdout.on('data', function(data){
            //Certain actions with filedata and count;
            stream.write(data);
        });
        cmdExec.stderr.on('data', function(data){
            //some actions
            stream.write("error");
        });
        cmdExec.on('close', function(){
            stream.end();
            if(cb){
                resolve(cb(fileData));
            }else{
                resolve(count);
            }
        });
    };
};
This script runs properly when it is allowed to run to completion without interruption. When the terminal running the script is closed, the write stream stops writing to the file. If I use detached together with unref(), it throws an error because stdout cannot be read from the child process object:
Cannot read property 'stdout' of undefined
Here is more information captured while the script was running. This was taken on the same host from a different terminal:
ps -ef | grep command_name
root 19904 19191 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sdb
root 19905 19191 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sdc
root 19906 19191 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sdd
root 19907 19191 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sde
root 23101 13105 0 20:16 pts/0 00:00:00 grep --color=auto command_name
After closing the terminal running the script before it finishes, I get this:
ps -ef | grep command_name
root 19904 1 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sdb
root 19905 1 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sdc
root 19906 1 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sdd
root 19907 1 0 20:16 ? 00:00:00 cli command_name pfitzner7 /dev/sde
root 23163 13105 0 20:16 pts/0 00:00:00 grep --color=auto command_name
I am trying to figure out why this happens and whether there is a different way to do it. Why did the parent process ID change to 1 even though the child process is detached? How can a spawned child process run independently of the parent process?
Please let me know your suggestions on this approach or the reason for the error.
Thanks in advance.

I think you don't need to detach your children. The children are spawned asynchronously, so you can work with the stdio of several children at the same time.

Related

Run command in golang and detach it from process

Problem:
I'm writing a program in Go on Linux that needs to execute a long-running process so that:
I redirect stdout of running process to file.
I control the user of process.
Process doesn't die when my program exits.
The process doesn't become a zombie when it crashes.
I get PID of running process.
I'm running my program with root permissions.
Attempted solution:
import (
	"os"
	"os/exec"
	"syscall"
)

// Run starts pathToBin as the given uid, sends its stdout to stdLogFile,
// and returns the PID of the started process.
func Run(pathToBin string, args []string, uid uint32, stdLogFile *os.File) (int, error) {
	cmd := exec.Command(pathToBin, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{
			Uid: uid,
		},
	}
	cmd.Stdout = stdLogFile
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	go func() {
		cmd.Wait() //Wait is necessary so cmd doesn't become a zombie
	}()
	return cmd.Process.Pid, nil
}
This solution seems to satisfy almost all of my requirements except that when I send SIGTERM/SIGKILL to my program the underlying process crashes. In fact I want my background process to be as separate as possible: it has different parent pid, group pid etc. from my program. I want to run it as daemon.
Other solutions on Stack Overflow suggested using cmd.Process.Release() for similar use cases, but it doesn't seem to work.
Solutions which are not applicable in my case:
I have no control over code of process I'm running. My solution has to work for any process.
I can't use external commands to run it, just pure go. So using systemd or something similar is not applicable.
I can, in fact, use a library that is easily importable from GitHub etc.
TLDR;
Just use https://github.com/hashicorp/go-reap
There is a great Russian expression which reads "don't try to give birth to a bicycle" and it means don't reinvent the wheel and keep it simple. I think it applies here. If I were you, I'd reconsider using one of:
https://github.com/krallin/tini
https://busybox.net/
https://software.clapper.org/daemonize/
https://wiki.gentoo.org/wiki/OpenRC
https://www.freedesktop.org/wiki/Software/systemd/
This issue has already been solved ;)
Your question is imprecise or you are asking for non-standard features.
In fact I want my background process to be as separate as possible: it has different parent pid, group pid etc. from my program. I want to run it as daemon.
That is not how process inheritance works. You cannot have process A start process B and then somehow change the parent of B to C. To the best of my knowledge this is not possible in Linux.
In other words, if process A (pid 55) starts process B (pid 100), then B must have parent pid 55.
The only way to avoid that is to have something else start the B process, such as atd or crond, which is not what you are asking for.
If parent 55 dies, then PID 1 will be the parent of 100, not some arbitrary process.
Your statement "it has different parent pid" does not make sense.
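The reparenting rule described above can be observed directly. The following is a minimal sketch, assuming Linux with /proc mounted and sh and sleep on the PATH; parentOf and orphanedGrandchild are hypothetical helper names used only for this illustration. A short-lived shell starts a background sleep and exits, and the program then reads the new parent of the orphan from /proc. The adopting process is normally PID 1, or a designated subreaper if one is configured on the system.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// parentOf returns the parent PID of pid by parsing /proc/<pid>/stat (Linux only).
func parentOf(pid int) (int, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return 0, err
	}
	s := string(data)
	// The command name is wrapped in parentheses and may contain spaces,
	// so parse the fields that follow the last ')': state, then ppid.
	i := strings.LastIndex(s, ")")
	if i < 0 {
		return 0, fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	fields := strings.Fields(s[i+1:])
	if len(fields) < 2 {
		return 0, fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	return strconv.Atoi(fields[1])
}

// orphanedGrandchild runs a shell that backgrounds "sleep 2" and exits.
// It returns the PID of the sleep, the PID of the exited shell, and the
// PID of whichever process adopted the sleep.
func orphanedGrandchild() (grandchild, shellPid, newParent int, err error) {
	// Redirect the sleep's output so it does not keep our stdout pipe open.
	cmd := exec.Command("sh", "-c", "sleep 2 >/dev/null 2>&1 & echo $!")
	out, err := cmd.Output() // returns after the shell has exited and been reaped
	if err != nil {
		return 0, 0, 0, err
	}
	grandchild, err = strconv.Atoi(strings.TrimSpace(string(out)))
	if err != nil {
		return 0, 0, 0, err
	}
	newParent, err = parentOf(grandchild)
	if err != nil {
		return 0, 0, 0, err
	}
	return grandchild, cmd.Process.Pid, newParent, nil
}

func main() {
	gc, sh, parent, err := orphanedGrandchild()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("sleep %d was started by shell %d, now has parent %d\n", gc, sh, parent)
}
```

The program never chooses the new parent; the kernel assigns it when the shell exits.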
I want to run it as daemon.
That's excellent. However, in a GNU / Linux system, all daemons have a parent pid and those parents have a parent pid going all the way up to pid 1, strictly according to the parent -> child rule.
when I send SIGTERM/SIGKILL to my program the underlying process crashes.
I cannot reproduce that behavior. See case8 and case7 from the proof-of-concept repo.
make case8
export NOSIGN=1; make build case7
unset NOSIGN; make build case7
$ make case8
{ sleep 6 && killall -s KILL zignal; } &
./bin/ctrl-c &
sleep 2; killall -s TERM ctrl-c
kill with:
{ pidof ctrl-c; pidof signal ; } | xargs -r -t kill -9
main() 2476074
bashed 2476083 (2476081)
bashed 2476084 (2476081)
bashed 2476085 (2476081)
zignal 2476088 (2476090)
go main() got 23 urgent I/O condition
go main() got 23 urgent I/O condition
zignal 2476098 (2476097)
go main() got 23 urgent I/O condition
zignal 2476108 (2476099)
main() wait...
p 2476088
p 2476098
p 2476108
p 2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p 2476098
p 2476108
p 2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p 2476098
p 2476108
p 2476088
Bash c 2476085 EXITs ELAPSED 4
go main() got 17 child exited
go main() got 23 urgent I/O condition
main() children done: 1 %!s(<nil>)
main() wait...
go main() got 15 terminated
go main() got 23 urgent I/O condition
sleep 1; killall -s KILL ctrl-c
p 2476098
p 2476108
p 2476088
balmora: ~/src/my/go/doodles/sub-process [main]
$ p 2476098
p 2476108
Bash _ 2476083 EXITs ELAPSED 6
Bash q 2476084 EXITs ELAPSED 8
The bash processes keep running after the parent is killed.
killall -s KILL ctrl-c;
All 3 "zignal" sub-processes are running until killed by
killall -s KILL zignal;
In both cases the sub-processes continue to run despite the main process being signaled with TERM, HUP, or INT. The behavior is different in a shell environment for convenience reasons. See the related questions about signals. This particular answer illustrates a key difference for SIGINT. Note that SIGSTOP and SIGKILL cannot be caught by an application.
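If the sub-process should also be insulated from signals that a terminal delivers to the launcher's whole process group (Ctrl+C, for example), one option is to start it in its own session. This is a minimal sketch, assuming Linux and the Setsid field of syscall.SysProcAttr; startInOwnSession and leadsOwnGroup are hypothetical helper names and are not part of the proof-of-concept repo.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// startInOwnSession starts bin in a new session, which also makes the
// child the leader of a new process group.
func startInOwnSession(bin string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(bin, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// leadsOwnGroup reports whether pid leads a process group that differs
// from the process group of the calling program.
func leadsOwnGroup(pid int) (bool, error) {
	pgid, err := syscall.Getpgid(pid)
	if err != nil {
		return false, err
	}
	return pgid == pid && pgid != syscall.Getpgrp(), nil
}

func main() {
	cmd, err := startInOwnSession("sleep", "1")
	if err != nil {
		fmt.Fprintln(os.Stderr, "start failed:", err)
		os.Exit(1)
	}
	ok, err := leadsOwnGroup(cmd.Process.Pid)
	fmt.Println("child pid:", cmd.Process.Pid, "own process group:", ok, "err:", err)
	_ = cmd.Wait() // reap the child so it does not linger as a zombie
}
```

The parent PID of the child is unchanged by this; only its session and process group differ from the launcher's.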
It was necessary to clarify the above before proceeding with the other parts of the question.
So far you have already solved the following:
redirect stdout of sub-process to a file
set owner UID of sub-process
sub-process survives death of parent (my program exits)
the PID of sub-process can be seen by the main program
The next one depends on whether the children are "attached" to a shell or not
sub-process survives the parent being killed
The last one is hard to reproduce, but I have heard about this problem in the docker world, so the rest of this answer is focused on addressing this issue.
sub-process survives if the parent crashes and does not become a zombie
As you have noted, Cmd.Wait() is necessary to avoid creating zombies. After some experimentation I was able to consistently produce zombies in a docker environment using an intentionally simple replacement for /bin/sh. This "shell", implemented in Go, will only run a single command and does not do much else in terms of reaping children. You can study the code over at GitHub.
The zombie solution
the simple wrapper which causes zombies
package main

func main() {
	Sh()
}
The reaper wrapper
package main

import (
	"fmt"
	"sync"

	"github.com/fatih/color"
	"github.com/hashicorp/go-reap"
)

func main() {
	if reap.IsSupported() {
		done := make(chan struct{})
		var reapLock sync.RWMutex
		pids := make(reap.PidCh, 1)
		errors := make(reap.ErrorCh, 1)
		go reap.ReapChildren(pids, errors, done, &reapLock)
		go report(pids, errors, done)
		Sh()
		close(done)
	} else {
		fmt.Println("Sorry, go-reap isn't supported on your platform.")
	}
}

func report(pids reap.PidCh, errors reap.ErrorCh, done chan struct{}) {
	sprintf := color.New(color.FgWhite, color.Bold).SprintfFunc()
	for {
		select {
		case pid := <-pids:
			println(sprintf("raeper pid %d", pid))
		case err := <-errors:
			println(sprintf("raeper er %s", err))
		case <-done:
			return
		}
	}
}
The init / sh (pid 1) process which runs other commands
package main

import (
	"os"
	"os/exec"
	"strings"
	"time"

	"github.com/google/shlex"
	"github.com/tox2ik/go-poc-reaper/fn"
)

func Sh() {
	args := os.Args[1:]
	script := args[0:0]
	if len(args) >= 1 {
		if args[0] == "-c" {
			script = args[1:]
		}
	}
	if len(script) == 0 {
		fn.CyanBold("cmd: expecting sh -c 'foobar'")
		os.Exit(111)
	}
	var cmd *exec.Cmd
	parts, _ := shlex.Split(strings.Join(script, " "))
	if len(parts) >= 2 {
		cmd = fn.Merge(exec.Command(parts[0], parts[1:]...), nil)
	}
	if len(parts) == 1 {
		cmd = fn.Merge(exec.Command(parts[0]), nil)
	}
	if fn.IfEnv("HANG") {
		fn.CyanBold("cmd: %v\n start", parts)
		ex := cmd.Start()
		if ex != nil {
			fn.CyanBold("cmd %v err: %s", parts, ex)
		}
		go func() {
			time.Sleep(time.Millisecond * 100)
			errw := cmd.Wait()
			if errw != nil {
				fn.CyanBold("cmd %v err: %s", parts, errw)
			} else {
				fn.CyanBold("cmd %v all done.", parts)
			}
		}()
		fn.CyanBold("cmd: %v\n dispatched, hanging forever (i.e. to keep docker running)", parts)
		for {
			time.Sleep(time.Millisecond * time.Duration(fn.EnvInt("HANG", 2888)))
			fn.SystemCyan("/bin/ps", "-e", "-o", "stat,comm,user,etime,pid,ppid")
		}
	} else {
		if fn.IfEnv("NOWAIT") {
			ex := cmd.Start()
			if ex != nil {
				fn.CyanBold("cmd %v start err: %s", parts, ex)
			}
		} else {
			ex := cmd.Run()
			if ex != nil {
				fn.CyanBold("cmd %v run err: %s", parts, ex)
			}
		}
		fn.CyanBold("cmd %v\n dispatched, exit docker.", parts)
	}
}
The Dockerfile
FROM scratch
# for sh.go
ENV HANG ""
# for sub-process.go
ENV ABORT ""
ENV CRASH ""
ENV KILL ""
# for ctrl-c.go, signal.go
ENV NOSIGN ""
# wrapped or simple /bin/sh, acting as "init"
COPY bin/sh /bin/sh
COPY bin/sub-process /bin/sub-process
COPY bin/zleep /bin/zleep
COPY bin/fork-if /bin/fork-if
COPY --from=busybox:latest /bin/find /bin/find
COPY --from=busybox:latest /bin/ls /bin/ls
COPY --from=busybox:latest /bin/ps /bin/ps
COPY --from=busybox:latest /bin/killall /bin/killall
Remaining code / setup can be seen here:
https://github.com/tox2ik/go-poc-reaper
Case 5 (simple /bin/sh)
The gist of it is that we start two sub-processes from Go, using the "parent" sub-process binary. The first child is zleep and the second is fork-if. The second one starts a "daemon" that runs a forever-loop in addition to a few short-lived threads. After a while, we kill the sub-process parent, forcing sh to take over the parenting of these children.
Since this simple implementation of sh does not know how to deal with abandoned children, the children become zombies.
This is standard behavior. To avoid this, init systems are usually responsible for cleaning up any such children.
Check out this repo and run the cases:
$ make prep build
$ make prep build2
The first one will use the simple /bin/sh in the docker container, and the second one will use the same code wrapped in a reaper.
With zombies:
$ make prep build case5
(…)
main() Daemon away! 16 (/bin/zleep)
main() Daemon away! 22 (/bin/fork-if)
(…)
main() CRASH imminent
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x49e45c]
goroutine 1 [running]:
main.main()
/home/jaroslav/src/my/go/doodles/sub-process/sub-process.go:137 +0xfc
cmd [/bin/sub-process /log/case5 3 /bin/zleep 111 2 -- /dev/stderr 3 /bin/fork-if --] err: exit status 2
Child '1' done
thread done
STAT COMMAND USER ELAPSED PID PPID
R sh 0 0:02 1 0
S zleep 3 0:02 16 1
Z fork-if 3 0:02 22 1
R fork-child-A 3 0:02 25 1
R fork-child-B 3 0:02 26 25
S fork-child-C 3 0:02 27 26
S fork-daemon 3 0:02 28 27
R ps 0 0:01 30 1
Child '2' done
thread done
daemon
(…)
STAT COMMAND USER ELAPSED PID PPID
R sh 0 0:04 1 0
Z zleep 3 0:04 16 1
Z fork-if 3 0:04 22 1
Z fork-child-A 3 0:04 25 1
R fork-child-B 3 0:04 26 1
S fork-child-C 3 0:04 27 26
S fork-daemon 3 0:04 28 27
R ps 0 0:01 33 1
(…)
With reaper:
$ make -C ~/src/my/go/doodles/sub-process case5
(…)
main() CRASH imminent
(…)
Child '1' done
thread done
raeper pid 24
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:02 1 0
S zleep 3 0:01 18 1
R fork-child-A 3 0:01 27 1
R fork-child-B 3 0:01 28 27
S fork-child-C 3 0:01 30 28
S fork-daemon 3 0:01 31 30
R ps 0 0:01 32 1
Child '2' done
thread done
raeper pid 27
daemon
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:03 1 0
S zleep 3 0:02 18 1
R fork-child-B 3 0:02 28 1
S fork-child-C 3 0:02 30 28
S fork-daemon 3 0:02 31 30
R ps 0 0:01 33 1
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:03 1 0
S zleep 3 0:02 18 1
R fork-child-B 3 0:02 28 1
S fork-child-C 3 0:02 30 28
S fork-daemon 3 0:02 31 30
R ps 0 0:01 34 1
raeper pid 18
daemon
STAT COMMAND USER ELAPSED PID PPID
S sh 0 0:04 1 0
R fork-child-B 3 0:03 28 1
S fork-child-C 3 0:03 30 28
S fork-daemon 3 0:03 31 30
R ps 0 0:01 35 1
(…)
Here is a picture of the same output, which may be less confusing to read.
[Images: "Zombies" and "Reaper"]
How to run the cases in the poc repo
Get the code
git clone https://github.com/tox2ik/go-poc-reaper.git
One terminal:
make tail-cases
Another terminal
make prep
make build
or make build2
make case0 case1
...
Related questions:
go
How to create a daemon process in Golang?
How to start a Go program as a daemon in Ubuntu?
how to keep subprocess running after program exit in golang?
Prevent Ctrl+C from interrupting exec.Command in Golang
signals
https://unix.stackexchange.com/questions/386999/what-terminal-related-signals-are-sent-to-the-child-processes-of-the-shell-direc
https://unix.stackexchange.com/questions/6332/what-causes-various-signals-to-be-sent
https://en.wikipedia.org/wiki/Signal_(IPC)#List_of_signals
Related discussions:
https://github.com/golang/go/issues/227
https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
Relevant projects:
http://software.clapper.org/daemonize/ (what I would use)
https://github.com/hashicorp/go-reap (if you must have run go on pid 1)
https://github.com/sevlyar/go-daemon (mimics posix fork)
Relevant prose:
A zombie process is a process whose execution is completed but it still has an entry in the process table. Zombie processes usually occur for child processes, as the parent process still needs to read its child’s exit status. Once this is done using the wait system call, the zombie process is eliminated from the process table. This is known as reaping the zombie process.
from https://www.tutorialspoint.com/what-is-zombie-process-in-linux
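The definition above can be checked with a small experiment. This is a minimal sketch, assuming Linux with /proc mounted and a true binary on the PATH; procState and zombieThenReaped are hypothetical helper names. The child exits almost immediately, shows up with state Z until the parent calls Wait, and disappears from the process table afterwards.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// procState returns the one-letter process state from /proc/<pid>/stat (Linux only).
func procState(pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return "", err
	}
	s := string(data)
	// The state is the first field after the parenthesised command name.
	i := strings.LastIndex(s, ")")
	if i < 0 {
		return "", fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	fields := strings.Fields(s[i+1:])
	if len(fields) == 0 {
		return "", fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	return fields[0], nil
}

// zombieThenReaped starts a child that exits at once, records its state
// before Wait is called, and reports whether its /proc entry is gone after Wait.
func zombieThenReaped() (string, bool, error) {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return "", false, err
	}
	pid := cmd.Process.Pid
	state := ""
	// Poll for up to about two seconds until the exited child shows state Z.
	for i := 0; i < 200; i++ {
		s, err := procState(pid)
		if err != nil {
			_ = cmd.Wait()
			return "", false, err
		}
		state = s
		if state == "Z" {
			break
		}
		time.Sleep(10 * time.Millisecond)
	}
	_ = cmd.Wait() // reads the exit status, which removes the zombie
	_, statErr := os.Stat(fmt.Sprintf("/proc/%d", pid))
	return state, os.IsNotExist(statErr), nil
}

func main() {
	state, gone, err := zombieThenReaped()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("state before Wait:", state, "| entry gone after Wait:", gone)
}
```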

Self duplicate background script

This is a test of a background script.
When run, it launches two processes and I don't understand why.
One stops after sleep 20, and the other keeps running.
#!/bin/bash
back(){
    n=0
    while [ 1 ]
    do
        echo $n
        n=$(($n+1))
        sleep 5
    done
}
back &
sleep 20
exit
Output of "ps -a" while the script is running:
PID TTY TIME CMD
8964 pts/2 00:00:00 backgroundtest
8965 pts/2 00:00:00 backgroundtest
8966 pts/2 00:00:00 sleep
8982 pts/2 00:00:00 sleep
after sleep 20:
PID TTY TIME CMD
8965 pts/2 00:00:00 backgroundtest
9268 pts/2 00:00:00 sleep
then run forever...
why?
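while [ 1 ] is an infinite loop, because [ 1 ] is always true.
So back & runs an infinite loop in the background. Bash runs a backgrounded function in a subshell, which is a forked copy of the script, and that is why ps shows a second backgroundtest process. Execution of the main script continues with sleep 20, which ends after 20 seconds. You are left with two processes for 20 seconds, then only the infinite one after that, and nothing stops it when the main script exits.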

pgrep on Linux only matches the first 15 bytes of the process name

On my Linux machine, when I run ps -ef | grep Speed in a shell, I get the following:
myid 143410 49092 0 10:21 pts/12 00:00:00 ./OutSpeedyOrderConnection
myid 145492 49053 0 10:35 pts/11 00:00:00 ./SpeedyOrderConnection
That means the PIDs of these two processes are 143410 and 145492.
Then I run pgrep -l Speed and get the following:
143410 OutSpeedyOrderC
145492 SpeedyOrderConn
When I run pgrep OutSpeedyOrderC, I get:
143410
But pgrep OutSpeedyOrderCo returns nothing!
It looks like pgrep only matches the first 15 bytes of the process name.
Is there anything I can do to get the right answer when I run
pgrep OutSpeedyOrderConnection?

casperjs does not run from cron

The sh_run.sh file:
#!/bin/bash
PHANTOMJS_EXECUTABLE=/usr/local/bin/phantomjs /usr/local/bin/casperjs /home/test/html/run/site_check.js
The crontab entry:
# cat /etc/crontab
45 0 * * * root sh /home/test/html/run/sh_run.sh
But casperjs does not run.
The process started by cron shows status Rl.
What does Rl mean?
# ps ax|grep phantomjs
28155 ? Rl 0:18 /usr/local/bin/phantomjs /usr/local/casperjs/bin/bootstrap.js --casper-path=/usr/local/casperjs --cli /home/test/html/run/site_check.js
Of course, running it by hand works:
# casperjs site_check.js
Additional information:
# sh sh_run.sh &
# ps ax|grep phantomjs
1625 pts/0 Sl 0:01 /usr/local/bin/phantomjs /usr/local/casperjs/bin/bootstrap.js --casper-path=/usr/local/casperjs --cli /home/test/html/run/site_check.js
This also runs.
When the ps status is Sl, the data changes (that is, the script is working).
When the ps status is Rl, the data does not change. When run by cron, the status is always Rl.
The Rl status never changes.
What is the problem?
Please help.

Child hangs if parent crashes or exits in google_breakpad::ExceptionHandler::SignalHandler

This happens if the parent crashes after cloning the child process, but before sending the unblocking byte with SendContinueSignalToChild(). In this case the pipe file handle remains open and the child stays blocked indefinitely on read(...) within WaitForContinueSignal(). After the crash, the child is adopted by the init process.
Steps to reproduce:
1. Simulate a parent crash in google_breakpad::ExceptionHandler::GenerateDump(CrashContext *context):
...
const pid_t child = sys_clone(
ThreadEntry, stack, CLONE_FILES | CLONE_FS | CLONE_UNTRACED, &thread_arg, NULL, NULL, NULL);
int r, status;
// Allow the child to ptrace us
sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
int *ptr = 0;
*ptr = 42; // <------- Crash here
SendContinueSignalToChild();
...
2. Send one of the handled signals to the parent (e.g. SIGSEGV), so that the above GenerateDump(...) method is invoked.
3. Observe that the parent exits but the child still exists, blocked on WaitForContinueSignal().
Output for the above steps:
dmytro#db:~$ ./test &
[1] 25050
dmytro#db:~$ Test: started
dmytro#db:~$ ps aflxw | grep test
0 1000 25050 18923 20 0 40712 2680 - R pts/37 0:13 | | \_ ./test
0 1000 25054 18923 20 0 6136 856 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
dmytro#db:~$ kill -11 25050
[1]+ Segmentation fault (core dumped) ./test
dmytro#db:~$ ps aflxw | grep test
0 1000 25058 18923 20 0 6136 852 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
1 1000 25055 1687 20 0 40732 356 pipe_w S pts/37 0:00 \_ ./test
1687 is the init pid.
In the real world the crash happens in a thread parallel to the one that handles signal.
NOTE: the issue can also happen because of normal program termination (i.e. exit(0) is called in a parallel thread).
Tested on Linux 3.3.8-2.2., mips and i686 platforms.
So, my 2 questions:
Is it the expected behavior for the breakpad library to keep child alive? My expectation is that child should exit immediately after parent crashes/exits.
If it is not expected behavior, what is the best solution to finish client after parent crash/exit?
Thanks in advance!
Any clue on possible solution?
This can also happen during shutdown crash, if the crashed thread is not main, and the parent process exits from main() in exactly this time slot, so apparently it's not that unlikely to happen as it seems at a first glance.
At this moment, I think this is happening because of CLONE_FILES flag of clone() function. This leads to the situation where read() on pipe in child is not returning EOF if parent process quits.
I have not yet examined whether we can safely drop this flag from the clone() call.
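The pipe behavior behind this suspicion can be shown in isolation: read() on a pipe returns EOF only once every descriptor that refers to the write end has been closed. With CLONE_FILES the child shares the descriptor table, so the write end stays open for as long as the child itself lives. The following is a minimal sketch in Go rather than breakpad code, assuming a Unix system where syscall.Dup is available; readerSeesEOF is a hypothetical helper name. A duplicated write descriptor stands in for the copy that survives the parent.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
	"time"
)

// readerSeesEOF reports whether a pipe read stayed blocked while a duplicate
// of the write end was still open, and whether it returned EOF once that
// last write descriptor was closed.
func readerSeesEOF() (bool, bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, false, err
	}
	defer r.Close()

	// Keep a second descriptor for the write end, then close the original.
	dupFd, err := syscall.Dup(int(w.Fd()))
	if err != nil {
		w.Close()
		return false, false, err
	}
	w.Close()

	done := make(chan error, 1)
	go func() {
		_, rerr := r.Read(make([]byte, 1))
		done <- rerr
	}()

	blocked := false
	select {
	case rerr := <-done:
		// Not expected: the read finished although a writer is still open.
		syscall.Close(dupFd)
		return false, rerr == io.EOF, nil
	case <-time.After(200 * time.Millisecond):
		blocked = true
	}

	// Closing the last write descriptor lets the blocked read return EOF.
	if err := syscall.Close(dupFd); err != nil {
		return blocked, false, err
	}
	rerr := <-done
	return blocked, rerr == io.EOF, nil
}

func main() {
	blocked, eof, err := readerSeesEOF()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("blocked while a writer was open:", blocked, "| EOF after last close:", eof)
}
```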
