can you use multiple threads to ptrace an application? - linux

I am writing a GUI oriented debugger which targets Linux primarily, but I plan ports to other OSes in the future. Because the GUI must stay interactive at all times, I have a few threads handling different things.
Primarily I have a "debug event" thread which simply loops waiting for waitpid to return and delivers the received events to the other threads. I do this because waitpid does not have a timeout, which makes it very hard to integrate it with other event loops and keep things responsive (waitpid can hang indefinitely!).
This strategy has worked wonderfully for the Linux builds so far. Lately I've been trying to make my debugger thread aware (as in the threads in the debugged application, not the debugger itself).
So I set the ptrace options to follow clone events and look for a status whose upper 16 bits are set to PTRACE_EVENT_CLONE. Then I use PTRACE_GETEVENTMSG to get the TID of the new thread. This all works nicely in my small test harness applications. But for some reason, it fails when I put that code in my actual debugger (I get a "No such process" error code).
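For reference, the status check described above can be sketched as follows. This is a minimal illustration based on the encoding documented in the ptrace man page; the function name is_clone_event is made up for this example.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* True if a waitpid() status reports a PTRACE_EVENT_CLONE stop.
 * For ptrace event stops, bits 8-15 of the status hold SIGTRAP and the
 * bits above them hold the event code, so the documented test is
 * status >> 8 == (SIGTRAP | (event << 8)). */
bool is_clone_event(int status) {
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_CLONE << 8));
}
```

When this returns true, the new thread's TID can be fetched with ptrace(PTRACE_GETEVENTMSG, pid, NULL, &tid), where tid is an unsigned long, from the thread that is the tracer.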
The one thing that occurred to me is that Windows has a rule that only the thread which attached to an application can listen for debug events. Does Linux's ptrace have a similar limitation? If so, why does my code work for other debug events?
It seems that at the very least waitpid supports waiting from a different thread, the man page says:
Before Linux 2.4, a thread was just a
special case of a process, and as a
consequence one thread could not wait on the
children of another thread, even when
the latter belongs to the same thread
group. However, POSIX prescribes
such functionality, and since Linux 2.4 a
thread can, and by default
will, wait on children of other
threads in the same thread group.
So at most this is a ptrace limitation.

I had the same issue (plus many others!) while implementing the Linux-specific part of the Maxine VM debugger. You are correct in your guess that only one thread in the debugger can use ptrace to control the debuggee. We accomplish this by making all calls to ptrace on a dedicated thread. You may find it useful to look at the linuxTask.h and linuxTask.c files in the Maxine sources available at
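A minimal sketch of the "one dedicated ptrace thread" idea, not the Maxine code: the names tracer_main and trace_on_dedicated_thread are hypothetical, and the debuggee here is just a forked child that exits with code 7 instead of exec'ing a real target. The point is that fork, every ptrace call, and the waitpid loop all run on the same thread.

```c
#include <pthread.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs entirely on the tracer thread: fork, ptrace and waitpid. */
static void *tracer_main(void *arg) {
    int *exit_code = arg;
    *exit_code = -1;

    pid_t child = fork();
    if (child < 0)
        return NULL;
    if (child == 0) {
        /* Debuggee: ask to be traced by the forking thread, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(127);
        raise(SIGSTOP);
        _exit(7); /* a real debugger would exec the target here */
    }

    int status;
    while (waitpid(child, &status, 0) == child) {
        if (WIFEXITED(status)) {
            *exit_code = WEXITSTATUS(status);
            break;
        }
        if (WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            if (sig == SIGSTOP)
                sig = 0; /* swallow the initial stop */
            /* Resume from the same thread that became the tracer. */
            if (ptrace(PTRACE_CONT, child, NULL, (void *)(long)sig) != 0)
                kill(child, SIGKILL);
        }
    }
    return NULL;
}

/* Starts the tracer thread and returns the debuggee's exit code. */
int trace_on_dedicated_thread(void) {
    pthread_t tid;
    int exit_code = -1;
    if (pthread_create(&tid, NULL, tracer_main, &exit_code) != 0)
        return -1;
    pthread_join(tid, NULL);
    return exit_code;
}
```

In a real debugger the GUI threads would hand requests to this thread through a queue rather than joining it; that part is left out here.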

As far as I can tell, this is not allowed. A task cannot use ptrace on a task which it has not attached. Also, a task can be traced by at most one other task, so you can't simply attach it once in each thread. I think this is because when one task attaches to another task, the tracing task becomes the parent of the traced task, and each task can only have one parent.
It seems like multi-thread tracing ought to be allowed because the threads are part of the same process, but implementation-wise, there isn't actually much distinction between threads and processes in the Linux kernel. A thread is just a task that happens to share most of its resources with another task.
If you're interested, you can browse the source code for ptrace in the kernel. Specifically look at ptrace_check_attach, which is called by sys_ptrace for most requests. It returns -ESRCH (sounds like the error code you're getting) if the target task's parent is not the current task.


How to detect if a linux thread is crashed

I have this problem: I need to determine whether a Linux thread has stopped running because of a crash rather than a normal exit. The goal is to restart the thread without resetting or restarting the whole system.
pthread_join() does not seem like a good option because I have several threads to monitor and the function returns only for one specific thread; it doesn't work "in parallel". At the moment I have a keep-alive signal from each thread to main, but I'm looking for some system call or thread attribute that reveals the thread's state.
Any suggestion?
Thread "crashes"
How to detect if a linux thread is crashed
if (0) //...
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
it could update a shared object that tracks information about your threads
it could write to a pipe designated for the purpose
it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
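As a rough sketch of the cleanup-handler approach, assuming a made-up struct worker with a should_fail switch that simulates the thread detecting an error: the handler records the failure in a shared object that a monitor can read later.

```c
#include <pthread.h>
#include <stdatomic.h>

enum worker_state { WORKER_RUNNING, WORKER_DONE, WORKER_FAILED };

struct worker {
    int should_fail;  /* hypothetical switch to simulate an error */
    atomic_int state; /* shared status that a monitor can read */
};

/* Cleanup handler: runs on pthread_exit() or cancellation. */
static void mark_failed(void *arg) {
    struct worker *w = arg;
    atomic_store(&w->state, WORKER_FAILED);
}

static void *worker_main(void *arg) {
    struct worker *w = arg;
    atomic_store(&w->state, WORKER_RUNNING);

    pthread_cleanup_push(mark_failed, w);
    if (w->should_fail)
        pthread_exit(NULL); /* runs mark_failed before the thread ends */
    pthread_cleanup_pop(0); /* normal path: remove the handler unrun */

    atomic_store(&w->state, WORKER_DONE);
    return NULL;
}

/* Runs one worker to completion and returns its final state. */
int run_worker(int should_fail) {
    struct worker w = { .should_fail = should_fail };
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker_main, &w) != 0)
        return -1;
    pthread_join(tid, NULL);
    return atomic_load(&w.state);
}
```

Note that cleanup handlers do not run when the start function simply returns, which is why the normal path sets WORKER_DONE itself.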
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
it doesn't work for detached threads, and
pthread_join() blocks until the specified thread terminates.
For detached threads, you need one of the approaches already discussed, but for ordinary (joinable) threads on GNU/Linux, Glibc provides the non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See How to check the state of Linux threads?, for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.

Multi-threaded fork()

In a multi-threaded application, if a thread calls fork(), it will copy the state of only that thread. So the child process created would be a single-thread process. If some other thread were to hold a lock required by the thread which called the fork(), that lock would never be released in the child process. This is a problem.
To counter this, we can modify the fork() in two ways. Either we can copy all the threads instead of only that single one. Or we can make sure that any lock held by the (other) non-copied threads will be released. So what will be the modified fork() system call in both these cases. And which of these two would be better, or what would be the advantages and disadvantages of either option?
This is a thorny question.
POSIX has pthread_atfork() to work through the mess of mixing forks and thread creation. The NOTES section of that man page discusses mutexes etc. However, it acknowledges that getting it right is hard.
The function isn't so much an alternative to fork() as it is a way to explain to the pthread library how your program needs to be prepared for the use of fork().
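To make that concrete, here is a small sketch of the usual pthread_atfork() pattern for a single known mutex. The mutex state_lock and the function names are assumptions for illustration; it only covers locks the program knows about and registers.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare: take the lock so no other thread holds it across fork(). */
static void before_fork(void) { pthread_mutex_lock(&state_lock); }
/* parent and child: each releases its own copy of the lock. */
static void after_fork_parent(void) { pthread_mutex_unlock(&state_lock); }
static void after_fork_child(void) { pthread_mutex_unlock(&state_lock); }

int install_fork_handlers(void) {
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Forks and reports whether the child could take state_lock: 0 = yes. */
int child_can_take_lock(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(pthread_mutex_trylock(&state_lock) == 0 ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the prepare handler blocks until it owns the lock, the child never inherits it in a state held by a thread that no longer exists.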
In general, you minimize problems by not launching threads from the child of fork(), and instead either exiting that child or calling exec as soon as possible.
This post has a good discussion of pthread_atfork().
...Or we can make sure that any lock held by the (other) non-copied threads will be released.
That's going to be harder than you realize because a program can implement "locks" entirely in user-mode code, in which case, the OS would have no knowledge of them.
Even if you were careful only to use locks that were known to the OS you still have a more general problem: Creating a new process with just the one thread would effectively be no different from creating a new process with all of the threads and then immediately killing all but one of them.
Read about why we don't kill threads. In a nutshell: Locks aren't the only state that needs to be cleaned up. Any of the threads that existed in the parent but not in the child could, at the moment of the fork call, have been in the middle of making a mess that needs to be cleaned up. If that thread doesn't exist in the child, then you've lost the knowledge of what needs to be cleaned up.
we can copy all the threads instead of only that single one...
That also is a potential problem. The one thread that calls fork() would know when and why fork() was called, and it would be prepared for the fork call. None of the other threads would have any warning. And if any of those threads is interacting with something outside of the process (e.g., talking to a remote service), then where you previously had one client talking to the service, you suddenly have two clients talking to the same service, and they both think that they are the only one. That's not going to end well.
Don't call fork() from multi-threaded programs.
In one project I worked on: We had a big multi-threaded program that needed to spawn other processes. How we did it is, we had it spawn a simple, single-threaded "helper" program before it created any new threads. Then, whenever it needed to spawn another process, it sent a message to the helper, and the helper did it.
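The helper arrangement could be sketched like this. It is not the code from that project: the function names, the socketpair transport, and running commands through /bin/sh are all assumptions made for the example. helper_start() must be called before the program creates any threads.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int helper_fd = -1;

/* Helper side: read one command per message, run it, reply with status. */
static void helper_loop(int fd) {
    char cmd[256];
    ssize_t n;
    while ((n = read(fd, cmd, sizeof cmd - 1)) > 0) {
        cmd[n] = '\0';
        int result = -1;
        pid_t pid = fork(); /* safe here: the helper is single-threaded */
        if (pid == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);
        }
        int status;
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
        if (write(fd, &result, sizeof result) != (ssize_t)sizeof result)
            break;
    }
    _exit(0); /* parent closed its end, or an I/O error occurred */
}

/* Call once, before creating any threads. Returns 0 on success. */
int helper_start(void) {
    int sv[2];
    /* SOCK_SEQPACKET keeps message boundaries, one command per message. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]);
        helper_loop(sv[1]);
    }
    close(sv[1]);
    helper_fd = sv[0];
    return 0;
}

/* Asks the helper to run `cmd`; returns its exit status, or -1. */
int helper_run(const char *cmd) {
    int result = -1;
    size_t len = strlen(cmd);
    if (write(helper_fd, cmd, len) != (ssize_t)len)
        return -1;
    if (read(helper_fd, &result, sizeof result) != (ssize_t)sizeof result)
        return -1;
    return result;
}
```

If several threads call helper_run() concurrently, the write/read pair would need a mutex around it; that is omitted to keep the sketch short.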

Forking vs Threading

I have used threading before in my applications and know its concepts well, but recently in my operating system lecture I came across fork(), which is something similar to threading.
I searched Google for the difference between them and learned that:
Fork creates a new process that looks exactly like the old (parent) process, but it is still a different process with a different process ID and its own memory.
Threads are light-weight processes which have less overhead.
But there are still some questions in my mind.
When should you prefer fork() over threading and vice versa?
If I want to call an external application as a child, then should I use fork() or threads to do it?
While searching Google I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when they do similar things?
Is it true that fork() cannot take advantage of a multiprocessor system because parent and child process don't run simultaneously?
The main difference between forking and threading approaches is one of operating system architecture. Back in the days when Unix was designed, forking was an easy, simple system that answered the mainframe and server type requirements best, as such it was popularized on the Unix systems. When Microsoft re-architected the NT kernel from scratch, it focused more on the threading model. As such there is today still a notable difference with Unix systems being efficient with forking, and Windows more efficient with threads. You can most notably see this in Apache which uses the prefork strategy on Unix, and thread pooling on Windows.
Specifically to your questions:
When should you prefer fork() over threading and vice versa?
On a Unix system where you're doing a far more complex task than just instantiating a worker, or you want the implicit security sandboxing of separate processes.
If I want to call an external application as a child, then should I use fork() or threads to do it?
If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks use threads. For separate external processes use neither, just call them with the proper API calls.
While doing google search I found people saying it is bad thing to call a fork() inside a thread. why do people want to call a fork() inside a thread when they do similar things?
Not entirely sure but I think it's computationally rather expensive to duplicate a process and a lot of subthreads.
Is it True that fork() cannot take advantage of multiprocessor system because parent and child process don't run simultaneously?
This is false: fork creates a new process, which then takes advantage of all features available to processes in the OS task scheduler, including running on another CPU at the same time as the parent.
A forked process is called a heavy-weight process, whereas a thread is called a light-weight process.
The following are the differences between them:
A forked process is considered a child process, whereas a thread is considered a sibling.
A forked process shares no resources such as code, data, or stack with the parent process (it gets its own copies), whereas threads share code and data but each has its own stack.
Process switching requires the help of the OS, whereas switching between user-level threads does not; kernel-scheduled threads, the norm on Linux, are still switched by the OS, just more cheaply.
Creating multiple processes is resource intensive, whereas creating multiple threads is less so.
Each process runs independently, whereas one thread can read and write another thread's data.
Thread and process lecture
fork() spawns a new copy of the process, as you've noted. What isn't mentioned above is the exec() call which often follows. This replaces the existing process with a new process (a new executable) and as such, fork()/exec() is the standard means of spawning a new process from an old one.
e.g. that's how your shell will invoke a process from the command line. You specify your process (ls, say) and the shell forks and then execs ls.
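A bare-bones version of that fork-then-exec sequence, roughly what a shell does for each command (the function name run_program is made up for this sketch):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with the given arguments and returns its exit status,
 * 127 if the program could not be executed, or -1 on other failures. */
int run_program(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this copy of the process with the new program. */
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }
    /* Parent: wait for the child, as a shell does for foreground jobs. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, passing the argument vector {"ls", "-l", NULL} would run ls -l and return its exit status.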
Note that this operates at a very different level from threading. Threading runs multiple lines of execution intra-process. Forking is a means of creating new processes.
As #2431234123412341234123 said, on Linux, thanks to COW, processes are not much heavier than threads, and the choice boils down to the use case. COW (copy-on-write) means that a memory page is copied only when one of the processes writes to it; otherwise the OS keeps mapping the page to the same physical memory for both parent and child.
As a programming example, say you have a big data structure on the heap, a 2D byte array[2000000][100] (200 MB), and the kernel page size is 4 KB. When the process is forked, no new memory is allocated for this array. If one particular row (100 bytes) is changed, in either the parent or the child, only the corresponding page (4 KB, or 8 KB if the row straddles two pages) is copied and updated for the process that wrote to it.
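The visible effect of copy-on-write can be checked with a small experiment like the one below. It is only a sketch with a hypothetical function name and a 16 MiB buffer standing in for the big array; it shows that a write in the child stays private, not how many pages the kernel actually copied.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fills a buffer with 1s, forks, and lets the child overwrite one byte.
 * Reports the first byte as seen by each process. Returns 0 on success. */
int fork_private_write(int *parent_sees, int *child_sees) {
    size_t size = 16u * 1024 * 1024;
    unsigned char *buf = malloc(size);
    if (!buf)
        return -1;
    memset(buf, 1, size);

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 2;    /* the write makes this page private to the child */
        _exit(buf[0]); /* report the child's view via the exit status */
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid && WIFEXITED(status);
    if (ok) {
        *child_sees = WEXITSTATUS(status);
        *parent_sees = buf[0]; /* the parent's page is untouched */
    }
    free(buf);
    return ok ? 0 : -1;
}
```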
Other parts of memory behave in forked processes much as they do with threads (the code is the same; registers and call stacks are separate).
On Windows, as #Niels Keurentjes said, threads might be better from a performance point of view, but on Linux it is more a matter of use case.

Starting and stopping a forked process

Is it possible for a parent process to start and stop a child (forked) process in Unix?
I want to implement a task scheduler (see here) which is able to run multiple processes at the same time which I believe requires either separate processes or threads.
How can I stop the execution of a child process and resume it after a given amount of time?
(If this is only possible with threads, how are threads implemented?)
You could write a simple scheduler imitation using signals.
If you have the permissions, the stop signal (SIGSTOP) stops the execution of a process, and the continue signal (SIGCONT) resumes it.
With signals you would not have any fine-grained control over the "scheduling",
but I guess an OS-grade scheduler is not the purpose of this exercise anyway.
Check kill (2) and signal (7) manual pages.
There are also many guides to using Unix signals in the web.
You can use signals, but in the usual UNIX world it's probably easier to use semaphores. Once you set the semaphore to not let the other process proceed, the scheduler will swap it out in the normal course of things; when you clear the semaphore, it will become ready to run again.
You can do the exact same thing with threads of course; the only dramatic difference is you save a heavyweight context switch.
Just a side note: If you are using signal(), the behavior may be different on different unixes. If you are using Linux, check the "Portability" section of the signal manpage, and the sigaction manpage, which is preferred.

How independent are threads inside the same process?

Now, this might be a very newbie question, but I don't really have experience with multithreaded programming and I haven't fully understood how threads work compared to processes.
When a process on my machine hangs, say it's waiting for some IO that never comes or something similar, I can kill and restart it because other processes aren't affected and can, for example, still operate my terminal. This is very obvious, of course.
I'm not sure whether it is the same with threads inside a process: If one hangs, are the others unaffected? In other words, can I run a "watchdog" thread which supervises the other threads and, for example kill and recreate hanging threads? For example, if I have a threadpool that I don't want to be drained by occasional hangups.
Threads are independent, but there's a difference between a process and a thread, and that is that in the case of processes, the operating system does more than just "kill" it. It also cleans up after it.
If you start killing threads that seem to be hung, you will most likely leave resources locked and the like, which the operating system would release for you if you did the same to a process.
So for instance, if you open a file for writing, and start producing data and write it to the file, and this thread now hangs, for whatever reason, killing the thread will leave the file still open, and most likely locked, up until you close the entire program.
So the real answer to your question is: No, you can not kill threads the hard way.
If you simply ask a thread to close, that's different because then the thread is still in control and can clean up and close resources before terminating, but calling an API function like "KillThread" or similar is bad.
If a thread hangs, the others will continue executing. However, if the hung thread has locked a semaphore, critical section or other kind of synchronization object, and another thread attempts to lock the same synchronization object, you now have a deadlock with two dead threads.
It is possible to monitor other threads from a thread. Depending on your platform, there are applicable APIs; I refer you to those, as you haven't stated which OS you are writing for.
You didn't mention the platform, but as far as I know, the NT kernel schedules threads, not processes, and treats them independently in that respect. This is not necessarily true on other platforms (some, like Windows 3.1, do not use preemptive multithreading, and if one thread goes into an infinite loop, everything is affected).
The simple answer is yes.
Typically, though, code in a thread will handle this likelihood itself. Most commonly, APIs that perform operations that may hang have timeout features of their own.
Alternatively, a thread can wait not just on the operation that might hang but also on a timer. If the timer signals first, it's assumed the operation has hung.
Since a watchdog thread would need some cooperation from code in the other threads to be useful in this scenario, having the threads themselves set timeouts makes more sense than a watchdog.
Threads get scheduled independent of each other. So you could indeed stop and restart hanging threads. Threads do not run in a separate address-space so a misbehaving thread can still overwrite memory or take locks needed by other threads in the same process.
There's a pretty good overview of some of the pitfalls of killing and suspending threads in the Java documentation explaining why the methods that do it are deprecated. Basically, if you expect to be able to kill a thread, you have to be very, very careful to make it work without some sort of corruption. If a thread is hung, it's probably because of a bug, in which case killing it will probably result in corruption.
If you need to be able to kill things, use processes.
