stopping an attached thread asynchrously using ptrace - linux - linux

after attaching a pthread using its pid and manipulating the content of its debug registers, while waiting using waitpid(-1, &status, __WALL) ; I would like to be able to stop that thread and make additional manipulations (defining another breakpoint etc).
when I try sending a signal using kill() and waiting for the thread to be ready for additional ptrace requests, for just one target thread, it works fine. on the other hand, when the number of traced threads increase, i got stuck within waitpid() call and never get unblocked.
is there a safe and fast mechanism to stop an attached thread that is running for additional modifications?
cheers.

When sending a signal to a thread, do not use the pid. Sending a signal to a process (which is what you are doing) sends it to some random thread within that process, which is almost certainly not what you would like to do. The tool to send threads signals is ptrhread_kill.
That's where things become a little more hairy. The ptrace interface uses "thread ID" (or tid). These are framed in the same context as process IDs, i.e. - integers. pthread_kill, on the other hand, uses the pthread_t type, which is an opaque, and is not the same thing.
Since using ptrace means you are in dark magic land already, the simplest solution is to use tgkill. Just place your tid and pid in the relevant fields, and you're golden.
Of course, tgkill is not an exported function. You'll need to wrap it in syscall in order to invoke it.

Related

How to detect if a linux thread is crashed

I've this problem, I need to understand if a Linux thread is running or not due to crash and not for normal exit. The reason to do that is try to restart the thread without reset\restart all system.
The pthread_join() seems not a good option because I've several thread to monitoring and the function return on specific thread, It doesn't work in "parallel". At moment I've a keeep live signal from thread to main but I'm looking for some system call or thread attribute to understand the state
Any suggestion?
P
Thread "crashes"
How to detect if a linux thread is crashed
if (0) //...
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
it could update a shared object that tracks information about your threads
it could write to a pipe designated for the purpose
it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
it doesn't work for daemon threads, and
pthread_join() blocks until the specified thread terminates.
For daemon threads, you need one of the approaches already discussed, but for ordinary threads on GNU/Linux, Glibc provides non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See How to check the state of Linux threads?, for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
Overall
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.

Multi-threaded fork()

In a multi-threaded application, if a thread calls fork(), it will copy the state of only that thread. So the child process created would be a single-thread process. If some other thread were to hold a lock required by the thread which called the fork(), that lock would never be released in the child process. This is a problem.
To counter this, we can modify the fork() in two ways. Either we can copy all the threads instead of only that single one. Or we can make sure that any lock held by the (other) non-copied threads will be released. So what will be the modified fork() system call in both these cases. And which of these two would be better, or what would be the advantages and disadvantages of either option?
This is a thorny question.
POSIX has pthread_atfork() to work through the mess of mixing forks and thread creation. The NOTES section of that man page discusses mutexes etc. However, it acknowledges that getting it right is hard.
The function isn't so much an alternative to fork() as it is a way to explain to the pthread library how your program needs to be prepared for the use of fork().
In general not trying to launch a thread from the child of fork but either exiting that child or calling exec asap, will minimize problems.
This post has a good discussion of pthread_atfork().
...Or we can make sure that any lock held by the (other) non-copied threads will be released.
That's going to be harder than you realize because a program can implement "locks" entirely in user-mode code, in which case, the OS would have no knowledge of them.
Even if you were careful only to use locks that were known to the OS you still have a more general problem: Creating a new process with just the one thread would effectively be no different from creating a new process with all of the threads and then immediately killing all but one of them.
Read about why we don't kill threads. In a nutshell: Locks aren't the only state that needs to be cleaned up. Any of the threads that existed in the parent but not in the child could, at the moment of the fork call, been in the middle of making a mess that needs to be cleaned up. If that thread doesn't exist in the child, then you've lost the knowledge of what needs to be cleaned up.
we can copy all the threads instead of only that single one...
That also is a potential problem. The one thread that calls fork() would know when and why fork() was called, and it would be prepared for the fork call. None of the other threads would have any warning. And, if any of those threads is interacting with something outside of the process (e.g., talking to a remote service) then,where you previously had one client talking to the service, you suddenly have two clients, talking to the same service, and they both think that they are the only one. That's not going to end well.
Don't call fork() from multi-threaded programs.
In one project I worked on: We had a big multi-threaded program that needed to spawn other processes. How we did it is, we had it spawn a simple, single-threaded "helper" program before it created any new threads. Then, whenever it needed to spawn another process, it sent a message to the helper, and the helper did it.

Does kill(SIGSTOP) take effect by the time kill() returns?

Suppose I have a parent process and a child process (started with e.g. fork() or clone()) running on Linux. Further suppose that there is some shared memory that both the parent and the child can modify.
Within the context of the parent process, I would like to stop the child process and know that it has actually stopped, and moreover that any shared memory writes made by the child are visible to the parent (including whatever synchronization or cache flushes that may require in a multi-processor system).
This answer, which speaks of using kill(SIGSTOP) to stop a child process, contains an interesting tidbit:
When the first kill() call succeeds, you can safely assume that the child has stopped.
Is this statement actually true, and if so, can anyone expound on it, or point me to some more detailed documentation (e.g. a Linux manpage)? Otherwise, is there another mechanism that I can use to ensure that the child process is completely stopped and is not going to be doing any more writes to the shared memory?
I'm imagining something along the lines of:
the parent sends a different signal (e.g. SIGUSR1), which the child can handle
the child handles the SIGUSR1 and does something like a pthread_cond_wait() in the signal handler to safely "stop" (though still running from the kernel perspective) -- this is not fully fleshed out in my mind yet, just an idea
I'd like to avoid reinventing the wheel if there's already an established solution to this problem. Note that the child process needs to be stopped preemptively; adding some kind of active polling to the child process is not an option in this case.
If it only existed on Linux, pthread_suspend() would be perfect ...
It definitely sounds like you should be using a custom signal with a handler, and not sigstop.
It's rare not to care about the state of the child at all, e.g. being fine with it having stored 32bits out of a single non-atomic 64bit write, or logically caught between two dependent writes.
Even if you are, POSIX allows the OS to not make shared writes immediately visible to other processes, so the child should have a chance to call msync for portability, to ensure that writes are completely synced.
The POSIX documentation on Signal Concepts strongly suggests, but does not explicitly say, that the targeted process will be STOPped by the time kill() returns:
A signal is said to be "generated" for (or sent to) a process or thread when the event that causes the signal first occurs... Examples of such events include ... invocations of the kill() and sigqueue() functions.
The documentation is at pains to distinguish signal generation from delivery (when the signal action takes effect) or acceptance. Unfortunately, it sometimes mentions actions taken in response to a stop signal upon generation, and sometimes upon delivery. Given that something must happen upon generation per se, I'd agree that the target process must be STOPped by the time your call returns.
However, at the cost of another syscall, you can be sure. Since you have a parent/child relationship in your design, you can waitpid()/WUNTRACED to receive notification that your child process has, indeed, STOPped.
Edit
See the other answer from that other guy [sic] for reasons why you might not want to do this.

When does a process handle a signal

I want to know when does a linux process handles the signal.
Assuming that the process has installed the signal handler for a signal, I wanted to know when would the process's normal execution flow be interrupted and signal handler called.
According to http://www.tldp.org/LDP/tlk/ipc/ipc.html, the process would handle the signal when it exits from a system call. This would mean that a normal instruction like a = b+c (or its equivalent machine code) would not be interrupted because of signal.
Also, there are system calls which would get interrupted (and fail with EINTR or get restarted) upon receiving a signal. This means that signal is processed even before the system call completes. This behaviour seems to b conflicting with what I have mentioned in the previous paragraph.
So, I am not clear as to when is the signal processed and in which process states would it be handled by the process. Can it be interrupted
Anytime it enters from kernel space to user space, or
Anytime it is in user space, or
Anytime the process is scheduled for execution by the scheduler
Thanks!
According to http://www.tldp.org/LDP/tlk/ipc/ipc.html, the process would handle the signal when it exits from a system call. This would mean that a normal instruction like a = b+c (or its equivalent machine code) would not be interrupted because of signal.
Well, if that were the case, a CPU-intensive process would not obey the process scheduler. The scheduler, in fact, can interrupt a process at any point of time when its time quantum has elapsed. Unless it is a FIFO real-time process.
A more correct definition: One point when a signal is delivered to the process is when the control flow leaves the kernel mode to resume executing user-mode code. That doesn't necessarily involve a system call.
A lot of the semantics of signal handling are documented (for Linux, anyway - other OSes probably have similar, but not necessarily in the same spot) in the section 7 signal manual page, which, if installed on your system, can be accessed like this:
man 7 signal
If manual pages are not installed, online copies are pretty easy to find.

Starting and stopping a forked process

Is it possible for a parent process to start and stop a child (forked) process in Unix?
I want to implement a task scheduler (see here) which is able to run multiple processes at the same time which I believe requires either separate processes or threads.
How can I stop the execution of a child process and resume it after a given amount of time?
(If this is only possible with threads, how are threads implemented?)
You could write a simple scheduler imitation using signals.
If you have the permissions, then stop signal (SIGSTOP) stops the execution of a process, and continue signal (SIGCONT) continues it.
With signals you would not have any fine grained control on the "scheduling",
but I guess OS grade scheduler is not the purpose of this execersice any way.
Check kill (2) and signal (7) manual pages.
There are also many guides to using Unix signals in the web.
You can use signals, but in the usual UNIX world it's probably easier to use semaphores. Once you set the semaphore to not let the other process proceed, the scheduler will swap it out in the normal course of things; when you clear the semaphore, it will become ready to run again.
You can do the exact same thing with threads of course; the only dramatic difference is you save a heavyweight context switch.
Just a side note: If you are using signal(), the behavior may be different on different unixes. If you are using Linux, check the "Portability" section of the signal manpage, and the sigaction manpage, which is preferred.

Resources