I have been trying to understand system calls, and I want to understand how set_tid_address works. Basically, from what I have read, it returns the PID of the program or process that is executed.
I have tested this with ls. However, with some commands like uptime and top, I don't see set_tid_address being used. Why is that?
The clone() syscall can take a CLONE_CHILD_CLEARTID flag. With it, the value at child_tidptr (another clone() argument) is cleared and a wake-up is signalled on the associated futex when the child thread exits. This is used to implement pthread_join() (the parent thread waits on the futex).
set_tid_address() registers such a pointer for the calling thread, which makes it possible to pthread_join() the initial thread. More information in the following LKML threads:
[patch] threading fix, tid-2.5.47-A3
[patch] user-vm-unlock-2.5.31-A2
As to why some programs call set_tid_address() and others don't, the answer is easy. Programs linked (directly or indirectly) to libpthread call set_tid_address. ls is linked to librt, which is linked to libpthread, so it runs the initialization for NPTL.
According to the Linux Programmer's Manual, set_tid_address is used to:
set pointer to thread ID
It returns the caller's thread ID, which equals the PID in a single-threaded process. Unfortunately the manual is vague as to when you would actually want to use this system call.
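A minimal sketch of the return value, assuming Linux and a libc that exposes syscall(2). The names clear_tid_word, register_clear_tid and current_tid are made up for this example. Note that calling set_tid_address() yourself replaces the pointer your libc registered for the thread, so treat this as an experiment rather than something to put in real code.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Word the kernel will clear (and futex-wake) when this thread exits.
 * It must stay valid for the thread's lifetime, hence static storage.
 * The name is hypothetical, chosen for this example. */
static int clear_tid_word;

/* Registers clear_tid_word via set_tid_address(2) and returns the
 * syscall's result, which is the caller's thread ID. */
long register_clear_tid(void)
{
    return syscall(SYS_set_tid_address, &clear_tid_word);
}

/* The calling thread's ID as reported by gettid(2), for comparison. */
long current_tid(void)
{
    return syscall(SYS_gettid);
}
```

Comparing register_clear_tid() with current_tid() in the same thread should give the same number, and in the main thread that number is also what getpid() returns.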
In any case, why do you think that these commands are using set_tid_address?
To verify the behavior of a third party binary distributed software I'd like to use, I'm implementing a kernel module whose objective is to keep track of each child this software produces and terminates.
The target binary is produced by Golang, and it is heavily multithreaded.
The kernel module I wrote installs hooks on the kernel functions _do_fork() and do_exit() to keep track of each process/thread this binary produces and terminates.
The LKM works, more or less.
Under some conditions, however, I see a scenario I'm not able to explain.
It seems like a process/thread could terminate without passing through do_exit().
The evidence I collected by putting printk() shows the process creation but does not indicate the process termination.
I'm aware that printk() can be slow, and I'm also aware that messages can be lost in such situations.
Trying to prevent message loss due to slow console (for this particular application, serial tty 115200 is used), I tried to implement a quicker console, and messages have been collected using netconsole.
The described setup seems to confirm that a process can terminate without passing through the do_exit() function.
But because I wasn't sure my messages couldn't be lost on the printk() infrastructure, I decided to repeat the same test but replacing printk() with ftrace_printk(), which should be a leaner alternative to printk().
Still the same result: occasionally I see processes that do not pass through do_exit(), and when I check whether the PID is still running, I find that it is not.
Also note that I put my hook in the do_exit() kernel function as the first instruction to ensure the function flow does not terminate inside a called function.
My question is then the following:
Can a Linux process terminate without its flow passing through the do_exit() function?
If so, can someone give me a hint of what this scenario can be?
After a long debug session, I'm finally able to answer my own question.
That's not all; I'm also able to explain why I saw the strange behavior I described in my scenario.
Let's start from the beginning: monitoring a heavily multithreaded application, I observed rare cases where a PID suddenly stops existing without its flow passing through the Linux kernel's do_exit() function.
Hence my original question:
Can a Linux process terminate without passing through the do_exit() function?
To my current knowledge, which I would by now consider reasonably extensive, a Linux process cannot end its execution without passing through the do_exit() function.
But this answer is in contrast with my observations, and the problem leading me to this question is still there.
Someone here suggested that the strange behavior I saw was because my observations were somehow wrong, implying that my method was inaccurate, and so were my conclusions.
My observations were correct, and the process I watched didn't pass through the do_exit() but terminated.
To explain this phenomenon, I want to put on the table another question that I think internet searchers may find somehow useful:
Can two processes share the same PID?
If you'd asked me this a month ago, I'd surely have answered: "definitely not, two processes cannot share the same PID."
Linux is more complex, though.
There's a situation in which, in a Linux system, two different processes can share the same PID!
https://elixir.bootlin.com/linux/v4.19.20/source/fs/exec.c#L1141
Surprisingly, this behavior does not harm anyone; when this happens, one of these two processes is a zombie.
updated to correct an error
The circumstances of this duplicate PID are more intricate than those described previously. If a threaded process forks before invoking execve() (the fork also copies the threads), the process must flush the previous exec context. If the intention is to use the execve() function to execute a new text, the kernel must first call the flush_old_exec() function, which then calls the de_thread() function for each thread in the process other than the task leader. Except for the task leader, all the process's threads are eliminated as a result. Each thread's PID is changed to that of the leader, and if the thread is not terminated immediately, for example because it needs to wait for an operation to complete, it keeps using that PID.
end of the update
That was what I was watching: the PID I was monitoring did not pass through do_exit() because when the corresponding thread terminated, it no longer had the PID it had when it started; it had its leader's.
For people who know the Linux kernel's mechanics very well, this is nothing to be surprised by; this behavior is intended and hasn't changed since 2.6.17.
The current version, 5.10.3, still works this way.
I hope this is useful to internet searchers; I'd also like to add that it answers the following:
Question: Can a Linux process/thread terminate without passing through do_exit()? Answer: No, do_exit() is the only way a process can end its execution, whether intentionally or unintentionally.
Question: Can two processes share the same PID? Answer: Normally not. There are some rare cases in which two schedulable entities have the same PID.
Question: Does the Linux kernel have scenarios where a process changes its PID? Answer: Yes, there's at least one scenario where a process changes its PID.
Can a Linux process terminate without its flow passing through the do_exit() function?
Probably not, but you should study the source code of the Linux kernel to be sure. Ask on KernelNewbies. Kernel threads and udev or systemd related things (or perhaps modprobe or the older hotplug) are probable exceptions. When your /sbin/init of pid 1 terminates (that should not happen) strange things would happen.
The LKM works, more or less.
What does that mean? How could a kernel module half-work?
And in real life, it does sometimes happen that your Linux kernel panics or crashes (and it could happen with your LKM, if it has not been peer-reviewed by the Linux kernel community). In such a case, there is no longer any notion of processes, since they are an abstraction provided by a living Linux kernel.
See also dmesg(1), strace(1), proc(5), syscalls(2), ptrace(2), clone(2), fork(2), execve(2), waitpid(2), elf(5), credentials(7), pthreads(7)
Look also inside the source code of your libc, e.g. GNU libc or musl-libc
Of course, see Linux From Scratch and Advanced Linux Programming
And verifying if the PID is currently running,
This can be done in user land with /proc/, or using kill(2) with a 0 signal (and maybe also pidfd_send_signal(2)...)
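A small sketch of the kill(2) variant, assuming a POSIX system. The helper name pid_exists is made up for this example. Keep in mind that a zombie still counts as existing for kill(), and that a PID can be reused by an unrelated process between two checks.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a process with this PID exists, 0 if it does not,
 * and -1 for invalid input or an unexpected error.
 * pid_exists is a hypothetical helper name. */
int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;          /* 0 and negative values address groups, not one PID */
    if (kill(pid, 0) == 0)
        return 1;           /* signal 0 only performs the existence/permission check */
    if (errno == EPERM)
        return 1;           /* the process exists but belongs to someone else */
    if (errno == ESRCH)
        return 0;           /* no such process */
    return -1;
}
```

For example, pid_exists(getpid()) reports 1 for the calling process.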
PS. I still don't understand why you need to write a kernel module or change the kernel code. My intuition would be to avoid doing that when possible.
After attaching to a pthread using its pid and manipulating the content of its debug registers, while waiting using waitpid(-1, &status, __WALL), I would like to be able to stop that thread and make additional manipulations (defining another breakpoint etc).
When I try sending a signal using kill() and waiting for the thread to be ready for additional ptrace requests, it works fine for just one target thread. On the other hand, when the number of traced threads increases, I get stuck in the waitpid() call and never get unblocked.
Is there a safe and fast mechanism to stop an attached thread that is running, so that I can make additional modifications?
cheers.
When sending a signal to a thread, do not use the pid. Sending a signal to a process (which is what you are doing) sends it to an arbitrary thread within that process, which is almost certainly not what you want. The tool for sending signals to threads is pthread_kill.
That's where things become a little more hairy. The ptrace interface uses a "thread ID" (or tid). These are framed in the same context as process IDs, i.e. integers. pthread_kill, on the other hand, uses the pthread_t type, which is opaque and is not the same thing.
Since using ptrace means you are in dark magic land already, the simplest solution is to use tgkill. Just place your tid and pid in the relevant fields, and you're golden.
Of course, older glibc versions do not export tgkill (a wrapper was only added in glibc 2.30). There you'll need to invoke it through syscall().
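A minimal sketch of such a wrapper, assuming Linux and that syscall(2) and SYS_tgkill are available from your libc headers. The function name send_signal_to_thread is made up for this example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends sig to the thread `tid` inside thread group (process) `tgid`.
 * Returns 0 on success, -1 with errno set on failure.
 * send_signal_to_thread is a hypothetical name. */
int send_signal_to_thread(pid_t tgid, pid_t tid, int sig)
{
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

To stop one traced thread you would call something like send_signal_to_thread(pid, tid, SIGSTOP) and then wait for that specific tid with waitpid(tid, &status, __WALL). Passing 0 as the signal only checks that the thread exists.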
What is the difference between exit() and exit_group()? Should any process that has multiple threads use exit_group instead of exit?
To answer the question of why I ask: we have a process that has around forty threads. When a thread is locked up, we automatically exit the process and then restart it. We also print the backtrace of the thread that was locked up. We wanted to know whether calling exit in this case is any different from calling exit_group.
From the docs: "This system call is equivalent to exit(2) except that it terminates not only the calling thread, but all threads in the calling process's thread group". However, what is the difference between exiting the process and exiting all the threads? Isn't exiting the process the same as exiting all the threads?
All thread libraries I know (e.g. recent glibc or musl-libc) are using the low-level clone(2) system call for their thread implementations (and some C libraries are even using clone to fork a process).
clone is a difficult Linux syscall. Unless you are a thread library implementor, you should not use it directly but only through library functions (e.g. pthread_create(3)); see also futex(7), which is used in the pthread_mutex* functions.
The clone syscall is used to create tasks: either threads (sharing address space in a multi-threaded process) or processes.
The exit_group syscall is related to exiting these tasks.
In short, you'll never use exit_group or clone directly. Your libc does that for you. So don't worry about exit_group or _Exit; you should use only the standard library function exit(3), which notably runs the handlers registered with atexit(3) and on_exit(3) and flushes <stdio.h> buffers. In the rare cases where you don't want that to happen, use _exit(2) (but you probably don't need that).
Of course, if you are reimplementing your own libc from scratch, you need to care about exit_group and clone; otherwise you don't.
If you care about gory implementation details, dive into the source code of your libc. Details may be libc-version, kernel-version, and compiler specific!
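To make the exit(3) versus _exit(2) difference concrete, here is a small sketch assuming a POSIX system. It forks a child that registers an atexit handler and reports back through a pipe whether the handler ran. The names report_fd, on_exit_handler and atexit_handler_ran are made up for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pipe end the child's handler writes to (hypothetical name). */
static int report_fd = -1;

static void on_exit_handler(void)
{
    /* One byte on the pipe tells the parent that the handler ran. */
    ssize_t ignored = write(report_fd, "x", 1);
    (void)ignored;
}

/* Forks a child that registers an atexit handler and then terminates
 * with exit(3) if use_exit is nonzero, or with _exit(2) otherwise.
 * Returns 1 if the handler ran in the child, 0 if not, -1 on error. */
int atexit_handler_ran(int use_exit)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        report_fd = fds[1];
        atexit(on_exit_handler);
        if (use_exit)
            exit(0);        /* runs atexit handlers, flushes stdio */
        _exit(0);           /* goes straight to the kernel */
    }

    close(fds[1]);
    char byte;
    ssize_t n = read(fds[0], &byte, 1);   /* 0 means EOF: nothing was written */
    close(fds[0]);
    waitpid(child, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

With this sketch, atexit_handler_ran(1) reports that the handler ran, while atexit_handler_ran(0) reports that it did not.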
I'm here to ask about the difference between a process and a thread in Linux. I know that a thread in Linux is just a "task" that shares with its parent process the things they need to have in common (the address space and other important information). I also know that both are created by calling the same function (clone()), but there's still something I'm missing: what really happens when a thread exits? What function is called inside the Linux kernel?
I know that a process calls the do_exit function when it exits, but here or somewhere else there should be a way to tell whether just a thread or a whole process is exiting. Can you explain this or point me to a textbook? I tried 'Understanding the Linux Kernel' but I was not satisfied with it.
I'm asking because I need to add fields to struct task_struct, and I need to decide how to manage that information for a process and its children.
Thank you.
The exit() syscall exits a single thread, and the exit_group() syscall exits the entire POSIX process ("thread group").
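A rough sketch of that difference, assuming Linux with pthreads. A forked child starts a thread that invokes either raw syscall with status 7, and the parent looks at the child's exit status. The names thread_main and status_after_thread_exit are made up for this example, and calling the raw exit syscall from a pthread skips the library's own thread cleanup, so this is for experimentation only.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: invoke the raw syscall whose number is passed in, status 7. */
static void *thread_main(void *arg)
{
    long nr = *(long *)arg;
    syscall(nr, 7);
    return NULL;            /* not reached: both syscalls end this thread */
}

/* Returns the exit status of a child process in which a secondary thread
 * calls exit_group (use_exit_group != 0) or exit (use_exit_group == 0).
 * Returns -1 on error. */
int status_after_thread_exit(int use_exit_group)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        static long nr;
        nr = use_exit_group ? SYS_exit_group : SYS_exit;
        pthread_t t;
        if (pthread_create(&t, NULL, thread_main, &nr) != 0)
            _exit(99);
        if (use_exit_group)
            for (;;)
                pause();    /* main thread idles; exit_group kills it too */
        usleep(200 * 1000); /* give the other thread time to exit */
        _exit(3);           /* main thread is still alive and ends the process */
    }
    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With exit_group the whole child ends with the thread's status 7 even though its main thread never returns. With plain exit only that one thread goes away, and the process ends later with the main thread's status 3.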
The main difference between processes and threads is that processes run in their own virtual memory space, apart from every other process. That means two processes cannot access each other's data. The only way for two processes to interact is through the operating system somehow (shared memory sections, semaphores, sockets, etc.).
Threads on the other hand all exist within their creating process. That means threads have access to all the same data (variables, pointers, handles, etc.) that any other thread in the same process has. That is the main difference.
There are some implications of this. For instance, when the process terminates for some reason, all its threads go with it. It is also a lot easier to get multi-processing errors like torn data in threads, just because nothing is forcing you to use the OS synchronization functions that you really ought to be using.
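A short sketch of the memory-sharing difference, assuming a POSIX system with pthreads. The same global variable is written once by a thread and once by a forked child process. The names shared_counter, value_after_thread_write and value_after_child_write are made up for this example.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A plain global: visible to every thread of the process. */
static int shared_counter;

static void *bump(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return NULL;
}

/* A thread writes the global; the caller reads it after joining. */
int value_after_thread_write(void)
{
    shared_counter = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, NULL);  /* join also orders the write before our read */
    return shared_counter;
}

/* A forked child writes its own copy of the global; the parent's copy
 * is untouched because the two processes have separate address spaces. */
int value_after_child_write(void)
{
    shared_counter = 0;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        shared_counter = 42;
        _exit(0);
    }
    waitpid(child, NULL, 0);
    return shared_counter;
}
```

The thread's write is visible to the caller (42), while the child process's write is not (the parent still sees 0).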
I'm working on a memory tracking library where we use mprotect to remove access to most of a program's memory and a SIGSEGV handler to restore access to individual pages as the program touches them. This works great most of the time.
My problem is that when the program invokes a system call (say read) with memory that my library has marked no access, the system call just returns -1 and sets errno to EFAULT. This changes behavior of the programs being tested in strange ways. I would like to be able to restore access to each page of memory given to a system call before it actually goes to the kernel.
My current approach is to create a wrapper for each system call that touches memory. Each wrapper would touch all the memory given to it before handing it off to the real system call. It seems like this will work for calls made directly from the program, but not for those made by libc (for instance, fread will call read directly without using my wrapper). Is there any better approach? How is it possible to get this behavior?
You can use ptrace(2) to achieve this. It allows you to monitor a process and get told whenever certain events occur. For your purposes, look at PTRACE_SYSCALL which allows you to stop the process upon syscall entry and exit.
You will have to change some of your memory tracking infrastructure, however, as ptrace operates such that a parent process monitors a child process, and as far as the child is concerned it doesn't have visibility of when a monitored event occurs. Having said that, you should be able to do something along the lines of:
1. Set up the ptrace parent and child, monitoring (at least) PTRACE_SYSCALL.
2. The child process makes a syscall, and the parent is notified.
3. The parent saves the requested syscall info, and uses PTRACE_GETREGS and PTRACE_SETREGS to change the child's state so that instead of making the syscall, the child process calls the 'memory unprotect' routine.
4. The child unprotects its memory, then raises SIGUSR1 or similar to tell the controlling parent that the memory work is complete.
5. The parent catches SIGUSR1, uses PTRACE_SETREGS to restore the previously saved syscall info and resumes the child.
6. The child resumes and executes the original syscall.
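The first two steps can be sketched as follows, assuming Linux and that ptrace is permitted in your environment. The parent only counts the syscall stops of a traced child; the register rewriting of the later steps is architecture specific and is left out. The function name count_syscall_stops is made up for this example.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a traced child that makes a syscall, and returns how many
 * syscall-entry/exit stops the parent observed, or -1 on error. */
long count_syscall_stops(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);               /* tracing not permitted here */
        raise(SIGSTOP);             /* stop so the parent can set options */
        syscall(SYS_getppid);       /* the syscall we want to observe */
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Mark syscall stops with bit 0x80 so they differ from real SIGTRAPs. */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)PTRACE_O_TRACESYSGOOD);

    long stops = 0;
    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0) {
            kill(child, SIGKILL);
            waitpid(child, NULL, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;                /* here the parent could inspect registers */
    }
    return stops;
}
```

Each observed syscall normally produces two stops, one on entry and one on exit, so the entry stop is the place where the parent would save and rewrite the child's registers.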