execl()-ing in parent process: SIGCHLD caught by ps - linux

I'm doing an assignment on fork(), exec() and related UNIX calls where I need to show the zombie state of a (child) process. Here's the relevant piece of code:
pid = vfork(); /* used vfork() for showing zombie state */
if (pid > 0)
{
    /* (some sorting code) */
    execl("/bin/ps", "/bin/ps", "a", (char *)0);
}
What I expect is:
(child's output)
(parent's output)
(Output of the ps command where I then would be able to show a 'defunct' entry)
What I get is:
(child's output)
(parent's output)
No ps command output. Instead I get: Signal 17 (CHLD) caught by ps (procps version 3.2.8)
However, when a sleep() call (for some integer number of seconds) is inserted before the execl call, I get the desired output and no signal errors are reported.
What's happening here? Does ps become the new parent of the (as-yet-zombie) child?
And why does the ps command not execute? What does sleep() do that makes ps execute as required?
I'm new to POSIX/Linux programming, so any relevance of this SIGCHLD signal to my particular situation would be appreciated. Thanks!

I might be wrong, but I think what's happening is this:
Your child starts and does the sorting code while the parent blocks.
The child exits.
The parent does its half of the if, executing ps.
After ps is started, SIGCHLD is sent to the parent process because of the termination of the child (signal delivery can be slow and unpredictable).
If you add the sleep, SIGCHLD is delivered to the parent, which ignores it, and then control passes to ps.
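One way to close that race (my own sketch, not part of the original answer): block SIGCHLD in the parent before the execl. The signal mask survives execve, so the stray SIGCHLD stays pending instead of being delivered to ps, and the child still shows up as <defunct> because nothing has reaped it:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = vfork();
    if (pid == 0)
        _exit(0);                 /* child exits immediately and becomes a zombie */

    /* parent: block SIGCHLD before replacing ourselves with ps */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    execl("/bin/ps", "/bin/ps", "a", (char *)0);
    perror("execl");              /* only reached if exec fails */
    return 1;
}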

Title
ps -ef fails with "Signal 17 (CHLD) caught by ps (procps version 3.2.8)" on Red Hat 6.6
Description
When running a ps -ef command on Red Hat 6.6, it fails with the following error:
"Signal 17 (CHLD) caught by ps (procps version 3.2.8)"
Cause
This is a third-party issue.
Red Hat has created the following article to track the issue:
https://access.redhat.com/solutions/1235753
Resolution
Please refer to the Red Hat article for the latest workarounds: https://access.redhat.com/solutions/1235753
These include the renaming of the libfreebl3.chk files as follows:
# mv /lib/libfreebl3.chk /lib/libfreebl3.chk-bz1153759
# mv /lib64/libfreebl3.chk /lib64/libfreebl3.chk-bz1153759
Additional Information
This appears to have been fixed by Red Hat now. See RHBA-2014:1867.

Does adding '&' make it run as a daemon?

I am aware that adding '&' at the end makes it run in the background, but does it also mean that it runs as a daemon?
Like:
celery -A project worker -l info &
celery -A project worker -l info --detach
I am sure that the first one runs in the background; however, the second, as stated in the documentation, runs in the background as a daemon.
I would love to know the main difference between the commands above.
They are different!
"&" version is background , but not run as daemon, daemon process will detach with terminal.
In C, a daemon can be written like this:
if (fork() > 0) _exit(0);    /* parent exits; the child carries on */
setsid();                    /* new session: detach from the controlling terminal */
close(0);                    /* drop the inherited stdin/stdout/stderr... */
close(1);
close(2);
open("/dev/null", O_RDWR);   /* ...and reopen fd 0 on /dev/null */
dup(0);                      /* fd 1 */
dup(0);                      /* fd 2 */
if (fork() > 0) _exit(0);    /* second fork: the daemon is no longer a session leader */
This ensures that the process is no longer in the same process group as the terminal and thus won't be killed together with it. The I/O redirection is to make output not appear on the terminal (see: https://unix.stackexchange.com/questions/56495/whats-the-difference-between-running-a-program-as-a-daemon-and-forking-it-into).
A daemon runs in its own session, is not attached to a terminal, has no file descriptor inherited from the parent open to anything, has no parent caring for it (other than init), and has its current directory set to / so as not to prevent a umount... while the "&" version does none of this.
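For reference, glibc and the BSDs also ship a daemon(3) helper that performs essentially these steps (a fork plus setsid(), an optional chdir("/"), and stdio redirected to /dev/null). A minimal sketch, assuming glibc:

#define _DEFAULT_SOURCE   /* exposes daemon(3) in glibc */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* daemon(nochdir, noclose): 0, 0 means chdir to "/" and redirect
       stdin/stdout/stderr to /dev/null -- the same steps as the
       hand-rolled version above, minus the second fork (the process
       remains the session leader). */
    if (daemon(0, 0) == -1) {
        perror("daemon");
        return 1;
    }
    sleep(60);   /* detached from the terminal from here on */
    return 0;
}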
Yes, the process will be run as a daemon, or background process; they both do the same thing.
You can verify this by looking at the option parser in the source code (if you really want to):
. cmdoption:: --detach
Detach and run in the background as a daemon.
https://github.com/celery/celery/blob/d59518f5fb68957b2d179aa572af6f58cd02de40/celery/bin/beat.py#L12
https://github.com/celery/celery/blob/d59518f5fb68957b2d179aa572af6f58cd02de40/celery/platforms.py#L365
Ultimately, the code below is what detaches it in the DaemonContext. Notice the fork and exit calls:
def _detach(self):
    if os.fork() == 0:      # first child
        os.setsid()         # create new session
        if os.fork() > 0:   # pragma: no cover
            # second child
            os._exit(0)
    else:
        os._exit(0)
    return self
Not really. The process started with & runs in the background, but is attached to the shell that started it, and the process output goes to the terminal.
Meaning, if the shell dies or is killed (or the terminal is closed), that process will be sent a HUP signal and will die as well (if it doesn't catch it, or if its output goes to the terminal).
The command nohup detaches a process (command) from the shell and redirects its I/O, and prevents it from dying when the parent process (shell) dies.
Example:
You can see that by opening two terminals. In one run
sleep 500 &
in the other one run ps -ef to see the list of processes; near the bottom you'll see something like
me    1234  1201  ...  sleep 500
      ^     ^
      |     parent process id (the shell)
      process id
Close the terminal in which sleep is sleeping in the background, then do a ps -ef again; the sleep process is gone.
A daemon job is usually started by the system via upstart or init (its owner may then be changed to a regular user).

What happens to other processes when a Docker container's PID1 exits?

Consider the following, which runs sleep 60 in the background and then exits:
$ cat run.sh
sleep 60&
ps
echo Goodbye!!!
$ docker run --rm -v $(pwd)/run.sh:/run.sh ubuntu:16.04 bash /run.sh
PID TTY TIME CMD
1 ? 00:00:00 bash
5 ? 00:00:00 sleep
6 ? 00:00:00 ps
Goodbye!!!
This will start a Docker container, with bash as PID1. It then fork/execs a sleep process, and then bash exits. When the Docker container dies, the sleep process somehow dies too.
My question is: what is the mechanism by which the sleep process is killed? I tried trapping SIGTERM in a child process, and that appears to not get tripped. My presumption is that something (either Docker or the Linux kernel) is sending SIGKILL when shutting down the cgroup the container is using, but I've found no documentation anywhere clarifying this.
EDIT The closest I've come to an explanation is the following quote from baseimage-docker:
If your init process is your app, then it'll probably only shut down itself, not all the other processes in the container. The kernel will then forcefully kill those other processes, not giving them a chance to gracefully shut down, potentially resulting in file corruption, stale temporary files, etc. You really want to shut down all your processes gracefully.
So at least according to this, the implication is that when the container exits, the kernel will send a SIGKILL to all remaining processes. But I'd still like clarity on how it decides to do that (i.e., is it a feature of cgroups?), and ideally a more authoritative source would be nice.
OK, I seem to have come up with some more solid evidence that this is, in fact, the Linux kernel doing the terminating. In the clone(2) man page, there's this useful section:
CLONE_NEWPID (since Linux 2.6.24)
The first process created in a new namespace (i.e., the process
created using the CLONE_NEWPID flag) has the PID 1, and is the
"init" process for the namespace. Children that are orphaned
within the namespace will be reparented to this process rather than
init(8). Unlike the traditional init process, the "init" process of a
PID namespace can terminate, and if it does, all of the processes in
the namespace are terminated.
Unfortunately this is still vague on how exactly the processes in the namespace are terminated, but perhaps that's because, unlike a normal process exit, no entry is left in the process table. Whatever the case is, it seems clear that:
The kernel itself is killing the other processes
They are not killed in a way that allows them any chance to do cleanup, making it (almost?) identical to a SIGKILL
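To see the same behavior outside Docker, here's a small C sketch (my own illustration, not from the question) that uses clone(2) with CLONE_NEWPID directly. When the namespace's PID 1 returns, the kernel kills the sleeper outright; even ignoring SIGTERM makes no difference, which is consistent with a SIGKILL:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char stack[1024 * 1024];

/* Runs as PID 1 of the new PID namespace. */
static int ns_init(void *arg)
{
    (void)arg;
    if (fork() == 0) {              /* PID 2 inside the namespace */
        signal(SIGTERM, SIG_IGN);   /* ignoring SIGTERM won't save it */
        sleep(1000);
        puts("sleeper survived");   /* never printed */
        _exit(0);
    }
    sleep(1);
    return 0;   /* PID 1 exits: the kernel tears down the namespace */
}

int main(void)
{
    /* Creating a PID namespace requires root (CAP_SYS_ADMIN). */
    pid_t pid = clone(ns_init, stack + sizeof(stack),
                      CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(pid, NULL, 0);
    /* By this point the sleeper is gone; it got no chance to clean up. */
    return 0;
}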

How to debug a running bash script

I have a bash script running on Ubuntu.
Is it possible to see which line/command is being executed right now, without restarting the script?
The issue is that the script sometimes never exits. This is really hard to reproduce (I've caught it now), so I can't just stop the script and start debugging.
Any help would be really appreciated.
P.S. The script logic is hard to understand, so I can't figure out why it's frozen just by reading it.
Try to find the process id (pid) of the shell; you may use ps -ef | grep <script_name>
Let's set this pid in the shell variable $PID.
Find all the child processes of this $PID by:
ps --ppid $PID
You might find one or more (if, for example, it's stuck in a pipelined series of commands). Repeat this command a couple of times. If the output doesn't change, the script is stuck in a certain command. In this case, you may attach strace to the running child process:
sudo strace -p $CHILD_PID
(where $CHILD_PID is the child's pid from the ps --ppid output). This will show you what is being executed: either an indefinite loop, or waiting on some event that never happens (like reading from a pipe).
If you find that the output of ps --ppid $PID changes, this indicates that your script is advancing but getting stuck somewhere, e.g. in a loop inside the script. The changing commands can give you a hint as to where in the script it's looping.
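As a concrete illustration (a made-up session; the script name and pids below are hypothetical):
$ ps -ef | grep stuck.sh
me   4321  2200  0 10:02 pts/1 00:00:00 bash /tmp/stuck.sh
$ ps --ppid 4321
  PID TTY          TIME CMD
 4330 pts/1    00:00:00 cat
$ sudo strace -p 4330
strace: Process 4330 attached
read(0,
Here the child is blocked in read(2) on fd 0, i.e. the script is hanging on a cat that is waiting for input that never comes.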

kill signal from an underprivileged user to a root-owned process

I have been banging my head on this problem.
I want to send a kill(pid, SIGUSR1) signal to a process running as the root user from a process running as the tom user. However, every time I do this, "Operation not permitted" comes up.
I searched the net for a programmatic solution, but to no avail. All responses say it's impossible. But I am a bit skeptical and think it can be done programmatically using C.
I need a sample program, or a few lines explaining how this can be achieved.
I tried using execl as well.
To be more specific, this kill signal is sent from the mysql user to a process running as root; trying it as the mysql user as well returned the same result, operation not permitted.
Tom
Have you considered creating a process with a setuid() setting?
The following is what you'd do from a unix/linux command line. I haven't used C in a while, but I'm pretty sure there's some "system" or "shell" function you can pass a shell command to.
If you can use sudo from your account, that should do it:
sudo kill -9 <pid>
Normally, you'd just need
kill -9 <pid>
but some processes need more authority to kill.
You can get the process id with
ps -aux | grep <process_name>
I'm afraid I don't know any more than that, hope this helps!
kyle
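For completeness, a minimal C sketch (my own illustration, not from the thread). An unprivileged caller may only signal processes whose real or saved set-user-ID matches its own real or effective UID (or it needs CAP_KILL); otherwise kill(2) fails with EPERM, which is exactly the "Operation not permitted" seen above:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atoi(argv[1]);
    if (kill(pid, SIGUSR1) == -1) {
        perror("kill");   /* run as tom against a root pid: EPERM */
        return 1;
    }
    puts("SIGUSR1 delivered");
    return 0;
}

If such a helper is installed setuid-root (chown root helper; chmod u+s helper), tom can run it and the kill succeeds, because the effective UID is then 0. That is the setuid() arrangement suggested above; it should be locked down (e.g. hard-code the target pid and the signal) before any real use.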

Stracing to attach to a multi-threaded process

If I want to strace a multi-threaded process (all of its threads), how should I do it?
I know that one can use strace -f to follow forked processes. But what about attaching to a process which is already multi-threaded when I start stracing? Is there a way to tell strace to trace all the system calls of all the threads which belong to this process?
2021 update
strace -fp PID just does the right thing on my system (Ubuntu 20.04.1 LTS). The strace manual page points this out:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls. Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.
Looks like this text was added back in 2013. If -f had this behavior on my system at the time, I didn't realize it. It does now, though!
Original 2013 answer
I just did this in a kludgy way, by listing each tid to be traced.
You can find them through ps:
$ ps auxw -T | fgrep program_to_trace
me pid tid1 ...
me pid tid2 ...
me pid tid3 ...
me pid tid4 ...
and then, according to man strace, you can attach to multiple pids at once:
-p pid Attach to the process with the process ID pid and begin tracing. The trace may be terminated at any time by a keyboard interrupt signal (CTRL-C). strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running. Multiple -p options can be used to attach to up to 32 processes in addition to command (which is optional if at least one -p option is given).
It says pid, but iirc on Linux the pid and tid share the same namespace, and this appeared to work:
$ strace -f -p tid1 -p tid2 -p tid3 -p tid4
I think that might be the best you can do for now. But I suppose someone could extend strace with a flag for expanding tids. There would probably still be a race between finding the processes and attaching to them in which a freshly started one would be missed. It'd fit in with the existing caveat about strace -f:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.
On non-Linux platforms the new process is attached to as soon as its pid is known (through the return value of fork(2) in the parent process). This means that such children may run uncontrolled for a while (especially in the case of a vfork(2)), until the parent is scheduled again to complete its (v)fork(2) call. On Linux the child is traced from its first instruction with no delay. If the parent process decides to wait(2) for a child that is currently being traced, it is suspended until an appropriate child process either terminates or incurs a signal that would cause it to terminate (as determined from the child's current signal disposition).
On SunOS 4.x the tracing of vforks is accomplished with some dynamic linking trickery.
As answered in multiple comments, strace -fp <pid> will show the trace of all threads owned by that process - even ones that process already has spawned before strace begins.
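One more convenience for the older multiple -p approach (my addition, not from the answers above): the kernel lists a process's thread ids as directories under /proc/PID/task, so you can build the option list from there. The pid below is hypothetical:
$ ls /proc/1234/task
1234  1237  1238  1239
$ strace $(printf -- '-p %s ' $(ls /proc/1234/task))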
