strace init process (PID 1) in Linux

The strace manpage says:
On Linux, exciting as it would be, tracing the init process is
forbidden.
I checked the same and it doesn't allow it:
$ strace -p 1
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Why isn't it possible? The ptrace manpage says the same about tracing the init process. Are these tools unsafe, or is the init process deemed so special that no other process (strace/ptrace) may attach to it?

sudo strace -p 1 works for me (you need root privileges for strace).
There was work to allow debugging of init. In 2.4.37 you can't attach to init, but in some later kernel this condition was removed; I found it gone in the 3.8 kernel.
Edit: on my Kubuntu 15.10 there is no "On Linux, exciting as it would be, tracing the init process is forbidden." in the strace man page. Has the man page been updated?

Related

Sudo gets separate PID when starting a command

I do not understand why sudo gets a separate PID (e.g. 1620) while starting dockerd (e.g. 1628) with sudo? To which PID should I send SIGTERM to stop the dockerd?
ps aux | grep dockerd
pstree -ps
I do not understand why sudo gets a separate PID (e.g. 1620) while starting dockerd (e.g. 1628) with sudo?
It is just the way that sudo works. It runs the command as a child process because it needs to do things after the child process exits.
You may be able to tweak the sudo configuration so that sudo doesn't fork a child process. On my system, man sudo says:
"If no I/O logging plugins are loaded and the policy plugin has not defined a close() function, set a command timeout or required that the command be run in a new pty, sudo may execute the command directly instead of running it as a child process."
But notice that:
it says may rather than will, and
you are necessarily sacrificing some functionality to achieve this "no fork" behavior.
To which PID should I send SIGTERM to stop the dockerd?
You can send signals to the sudo process and they will be relayed to the dockerd process. That's what man sudo says. Look in the man page's section on signal handling.
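To check which process is the parent before sending a signal, something like the following may help. It is only a sketch: it assumes a Linux /proc filesystem, and ppid_of is a made-up helper name, not part of sudo or Docker.

```shell
# ppid_of PID: print the parent PID of PID, read from /proc/PID/status.
# Hypothetical helper for illustration; assumes a Linux /proc filesystem.
ppid_of() {
    while read -r key value; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Example with the PIDs from the question (1628 = dockerd, 1620 = sudo):
#   ppid_of 1628
# If that prints 1620, sudo is the direct parent, and a SIGTERM sent to
# 1620 (e.g. with: sudo kill -TERM 1620) is the signal sudo would relay.
```

Reading /proc directly avoids depending on a particular ps output format; pstree -ps, as used in the question, shows the same relationship.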

/bin/sh: can't access tty; job control turned off

I have been following the commands from the book "Mastering Embedded Linux Programming" by Chris Simmonds. I have created the toolchain, kernel zImage and busybox file system. When I combine these to run on QEMU, it should display a root shell prompt.
When I run the command, I get the following.
/bin/sh: can't access tty; job control turned off
input: ImExPS/2 Generic Explorer Mouse as /devices/platform/amba/amba:fpga/10007000.kmi/serio1/input/input2
When I press enter, I am able to see the root shell prompt and I am able to execute simple shell commands.
However, when I type exit, I get the following error.
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
CPU: 0 PID: 1 Comm: sh Not tainted 4.9.13 #1
Hardware name: ARM-Versatile (Device Tree Support)
[<c001b5a4>] (unwind_backtrace) from [<c0018860>] (show_stack+0x10/0x14)
[<c0018860>] (show_stack) from [<c00737f4>] (panic+0xb8/0x230)
[<c00737f4>] (panic) from [<c0024e24>] (do_exit+0x8e8/0x938)
[<c0024e24>] (do_exit) from [<c0025cf8>] (do_group_exit+0x38/0xb4)
[<c0025cf8>] (do_group_exit) from [<c0025d84>] (__wake_up_parent+0x0/0x18)
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
How do I resolve this?
EDIT:
The following is the QEMU command that I ran
QEMU_AUDIO_DRV=none \
qemu-system-arm \
-m 256M -nographic \
-M versatilepb \
-kernel ~/linux-4.9.13/arch/arm/boot/zImage \
-append "console=ttyAMA0,115200 rdinit=/bin/sh" \
-dtb ~/linux-4.9.13/arch/arm/boot/dts/versatile-pb.dtb \
-initrd ~/busybox/initramfs.cpio.gz
(You don't say what your QEMU command line is.)
These error messages are generally what you should expect if you tell the kernel to run /bin/sh directly as its process 1 (e.g. with "init=/bin/sh" on the kernel command line). First the shell complains that it doesn't have a tty, but it can continue anyway with some facilities disabled. Then, when you eventually tell the shell to exit, the kernel complains because the shell itself is process 1. (Usually process 1 is supposed to be an "init" program, which runs forever and deals with starting other processes in the system. If "init" ever dies, the kernel has nothing else it can do.)
If you were intending to run /bin/sh as your process 1, then this is all normal. If you didn't want to do that, then you have an issue with either your root filesystem or with your command line which means that it isn't properly starting an /sbin/init in the guest, and you should look at why.

strace on Linux not logging all calls to open()

I am using strace to capture calls to open(), close() and read() on Linux. The target process is the jetty web server. As far as I can tell, strace is not logging all calls to open(). Maybe the others too, I have not tried to correlate the file descriptors to open() calls.
For example, starting strace:
strace -f -e trace=open,close,read -o/tmp/strace.out -p62881
I then use wget to fetch 100 static files; all were retrieved successfully. In one run, only 56 open events were logged; on another run of 100 different files, I got 66 open events.
I believe that using "-f" results in strace attaching to all the LWPIDs for the threads ("Process 62881 attached with 25 threads - interrupt to quit"); when I try to explicitly attach to all of them using multiple "-p" options, I get a single "attach" success message but multiple "Operation not permitted" messages, one for each child PID.
I restarted Jetty to clear its cache before my tests.
Kernel version is 2.6.32-504.3.3.el6.x86_64 (Red Hat). Strace package version is strace-4.5.19-1.19.el6.x86_64.
What am I missing?
Thanks
On some systems the C library implements open() with the openat() system call, so strace reports openat() rather than open().
Try tracing both:
strace -f -e trace=open,openat,close,read -o/tmp/strace.out -p62881
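To see whether openat accounts for the missing events, the output file can be counted for both call names. This is only a sketch: count_opens is a hypothetical helper name, and the pattern assumes the usual strace -f -o layout where each line is a PID followed by the call.

```shell
# count_opens FILE: count lines in a strace output file that start an
# open() or openat() call. Hypothetical helper for illustration.
# A call that was interrupted shows up a second time as
# "<... open resumed>", which this pattern does not match, so each
# call is counted once.
count_opens() {
    grep -c -E '(^|[[:space:]])(open|openat)\(' "$1"
}

# Possible use with the output file from the question:
#   count_opens /tmp/strace.out
```

Comparing that number with the number of files fetched gives a quick check before digging into individual file descriptors.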
Try -ff (in addition to -f):
-ff: If the -o filename option is in effect, each processes trace is written to filename.pid where pid is
the numeric process id of each process. This is incompatible with -c, since no per-process counts are
kept.

Stracing to attach to a multi-threaded process

If I want to strace a multi-threaded process (all of its threads), how should I do it?
I know that one can use strace -f to follow forked processes. But what about attaching to a process that is already multi-threaded when I start stracing? Is there a way to tell strace to trace the system calls of all the threads that belong to this process?
2021 update
strace -fp PID just does the right thing on my system (Ubuntu 20.04.1 LTS). The strace manual page points this out:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system
calls. Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.
Looks like this text was added back in 2013. If -f had this behavior on my system at the time, I didn't realize it. It does now, though!
Original 2013 answer
I just did this in a kludgy way, by listing each tid to be traced.
You can find them through ps:
$ ps auxw -T | fgrep program_to_trace
me pid tid1 ...
me pid tid2 ...
me pid tid3 ...
me pid tid4 ...
and then, according to man strace, you can attach to multiple pids at once:
-p pid Attach to the process with the process ID pid and begin tracing. The trace may be terminated at any time by a keyboard interrupt
signal (CTRL-C). strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running.
Multiple -p options can be used to attach to up to 32 processes in addition to command (which is optional if at least one -p option is
given).
It says pid, but iirc on Linux the pid and tid share the same namespace, and this appeared to work:
$ strace -f -p tid1 -p tid2 -p tid3 -p tid4
I think that might be the best you can do for now. But I suppose someone could extend strace with a flag for expanding tids. There would probably still be a race between finding the processes and attaching to them in which a freshly started one would be missed. It'd fit in with the existing caveat about strace -f:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.
On non-Linux platforms the new process is attached to as soon as its pid is known (through the return value of fork(2) in the
parent process). This means that such children may run uncontrolled for a while (especially in the case of a vfork(2)), until the
parent is scheduled again to complete its (v)fork(2) call. On Linux the child is traced from its first instruction with no delay. If
the parent process decides to wait(2) for a child that is currently being traced, it is suspended until an appropriate child
process either terminates or incurs a signal that would cause it to terminate (as determined from the child's current signal
disposition).
On SunOS 4.x the tracing of vforks is accomplished with some dynamic linking trickery.
As answered in multiple comments, strace -fp <pid> will show the trace of all threads owned by that process - even ones that process already has spawned before strace begins.
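For strace versions where -f does not pick up existing threads, the thread IDs for the multiple -p approach can also be read from /proc instead of ps. A minimal sketch, assuming a Linux /proc filesystem; list_tids is a made-up helper name.

```shell
# list_tids PID: print one thread ID per line for process PID.
# Hypothetical helper; relies on Linux exposing threads under /proc/PID/task.
list_tids() {
    for dir in /proc/"$1"/task/*; do
        if [ -d "$dir" ]; then
            echo "${dir##*/}"
        fi
    done
}

# Possible use (not run here): build one -p option per thread, e.g.
#   strace -f $(list_tids "$pid" | sed 's/^/-p /')
# The race described in the 2013 answer still applies: threads created
# between listing and attaching can be missed.
```

The first entry is normally the main thread, whose thread ID equals the process ID.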

I don't get a core dump with every process

I am trying to get a core dump, so I use:
ulimit -c unlimited
I run my program in the background and kill it:
kill -SEGV %1
But I just get:
[1]+ Exit 1 ./Test
And no core dump is created.
I did the same with other programs and it works, so why doesn't it work with all of them? Can anybody help me?
Thanks. (GNU/Linux, Debian 2.6.26)
If your program traps the SEGV signal and does something else, it won't invoke the OS core dump routine. Check that it doesn't do that.
Under Linux, processes that change their user ID (using setuid, seteuid or similar) are excluded from dumping core for security reasons (think of /bin/passwd dumping core while it has /etc/shadow in memory).
You can re-enable core dumps for a program that changes its user ID by calling prctl(PR_SET_DUMPABLE, 1) after the change of UID.
Also you might want to check that the program you're running is not changing its working directory ( chdir() ), because then it will create the core file in a different directory than the one you're running it from.
And you can try this too:
kill -ABRT pid
Try (as root):
sysctl kernel.core_pattern=core
and then repeat your experiment. On some systems that variable is set to /dev/null by default.
However, if you see exit status 1, perhaps the program indeed intercepts the signal.
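The settings mentioned in these answers can be checked in one go before repeating the experiment. A rough sketch, assuming Linux; core_dump_settings is a hypothetical helper name.

```shell
# core_dump_settings: show the two settings that most often suppress
# core files. Hypothetical helper for illustration; assumes Linux.
core_dump_settings() {
    # Soft limit on core file size for this shell; 0 disables core files.
    echo "core size limit: $(ulimit -c)"
    # Where the kernel writes cores; a leading | pipes them to a program.
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
}

# Run it in the same shell that starts the program, since the limit
# is inherited from that shell:
#   core_dump_settings
```

If both look fine and the job still reports an exit status instead of a signal, the signal handler explanation above is the more likely cause.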

Resources