Starting and stopping a forked process - multithreading

Is it possible for a parent process to start and stop a child (forked) process in Unix?
I want to implement a task scheduler (see here) which is able to run multiple processes at the same time which I believe requires either separate processes or threads.
How can I stop the execution of a child process and resume it after a given amount of time?
(If this is only possible with threads, how are threads implemented?)

You could write a simple scheduler imitation using signals.
If you have the permissions, then stop signal (SIGSTOP) stops the execution of a process, and continue signal (SIGCONT) continues it.
With signals you would not have any fine grained control on the "scheduling",
but I guess OS grade scheduler is not the purpose of this execersice any way.
Check kill (2) and signal (7) manual pages.
There are also many guides to using Unix signals in the web.

You can use signals, but in the usual UNIX world it's probably easier to use semaphores. Once you set the semaphore to not let the other process proceed, the scheduler will swap it out in the normal course of things; when you clear the semaphore, it will become ready to run again.
You can do the exact same thing with threads of course; the only dramatic difference is you save a heavyweight context switch.

Just a side note: If you are using signal(), the behavior may be different on different unixes. If you are using Linux, check the "Portability" section of the signal manpage, and the sigaction manpage, which is preferred.

Related

How to detect if a linux thread is crashed

I've this problem, I need to understand if a Linux thread is running or not due to crash and not for normal exit. The reason to do that is try to restart the thread without reset\restart all system.
The pthread_join() seems not a good option because I've several thread to monitoring and the function return on specific thread, It doesn't work in "parallel". At moment I've a keeep live signal from thread to main but I'm looking for some system call or thread attribute to understand the state
Any suggestion?
P
Thread "crashes"
How to detect if a linux thread is crashed
if (0) //...
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
it could update a shared object that tracks information about your threads
it could write to a pipe designated for the purpose
it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
it doesn't work for daemon threads, and
pthread_join() blocks until the specified thread terminates.
For daemon threads, you need one of the approaches already discussed, but for ordinary threads on GNU/Linux, Glibc provides non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See How to check the state of Linux threads?, for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
Overall
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.

Linux fork, execve - no wait zombies

In Linux & C, will not waiting (waitpid) for a fork-execve launched process create zombies?
What is the correct way to launch a new program (many times) without waiting and without resource leaks?
It would also be launched from a 2nd worker thread.
Can the first program terminate first cleanly if launched programs have not completed?
Additional: In my case I have several threads that can fork-execve processes at ANY TIME and THE SAME TIME -
1) Some I need to wait for completion and want to report any errors codes with waitpid
2) Some I do not want to block the thread and but would like to report errors
3) Some I don't want to wait and don't care about the outcome and could run after the program terminates
For #2, should I have to create an additional thread to do waitpid ?
For #3, should I do a fork-fork-execve and would ending the 1st fork cause the 2nd process to get cleaned up (no zombie) separately via init ?
Additional: I've read briefly (not sure I understand all) about using nohup, double fork, setgpid(0,0), signal(SIGCHLD, SIG_IGN).
Doesn't global signal(SIGCHLD, SIG_IGN) have too many side effects like getting inherited (or maybe not) and preventing monitoring other processes you do want to wait for ?
Wouldn't relying on init to cleanup resources leak while the program continues to run (weeks in my case)?
In Linux & C, will not waiting (waitpid) for a fork-execve launched process create zombies?
Yes, they become zombies after death.
What is the correct way to launch a new program (many times) without waiting and without resource leaks? It would also be launched from a 2nd worker thread.
Set SIGCHLd to SIG_IGN.
Can the first program terminate first cleanly if launched programs have not completed?
Yes, orphaned processes will be adopted by init.
I ended up keeping an array of just the fork-exec'd pids I did not wait for (other fork-exec'd pids do get waited on) and periodically scanned the list using
waitpid( pids[xx], &status, WNOHANG ) != 0
which gives me a chance report outcome and avoid zombies.
I avoided using global things like signal handlers that might affect other code elsewhere.
It seemed a bit messy.
I suppose that fork-fork-exec would be an alternative to asynchronously monitor the other program's completion by the first fork, but then the first fork needs cleanup.
In Windows, you just keep a handle to the process open if you want to check status without worry of pid reuse, or close the handle if you don't care what the other process does.
(In Linux, there seems no way for multiple threads or processes to monitor the status of the same process safely, only the parent process-thread can, but not my issue here.)

When two threads run a specific process separately, will the program end when one thread returns the value?

Here's the scenario:
You have two threads (which represent different machines) who take the same input from a singular data source, run through the same processes (which do not depend on any shared resources) and return the same value.
If one thread (read:machine) is faster than the other and finishes first, will the program accept that value and end, or will it wait for the other thread to finish? If the answer is the latter, is there anyway to force the program to take the first answer?
The practical reason for this would be to handle unbearably slow machines.
This is entirely you to decide. If you spawn two threads, you can control them from the parent process and decide the behavior you want. You may wait on both threads, or wait until one of them is available (eg. using select from the parent thread or signal from the child thread), and possibly kill the other one (using a signal again, or kill).
For a very good reference on system programming (multi-processing, threads, communications, concurrency..), see Unix system programming in Objective Caml. It has an example (psearch here) where threads collaborate to find a result, and stop as soon as one of them succeeded.

can you use multiple threads to ptrace an application?

I am writing a GUI oriented debugger which targets Linux primarily, but I plan ports to other OSes in the future. Because the GUI must stay interactive at all times, I have a few threads handling different things.
Primarily I have a "debug event" thread which simply loops waiting for waitpid to return and delivers the received events to the other threads. I do this because waitpid does not have a timeout, which makes it very hard to integrate it with other event loops and keep things responsive (waitpid can hang indefinitely!).
This strategy has worked wonderfully for the Linux builds so far. Lately I've been trying to make my debugger thread aware (as in the threads in the debugged application, not the debugger itself).
So I set the ptrace options to follow clone events and look for a status which has the upper 16-bit set to PTRACE_EVENT_CLONE. Then I use PTRACE_GETEVENTMSG to get the TID of the new thread. This all works nicely in my small test harness applications. But for some reason, it is failing when i put that code in my actual debugger. (I get a "No such process" error code)
The one thing that occurred to me is that Windows has a rule that only the thread which attached to an application can listen for debug events. Does Linux's ptrace have a similar limitation? If so, why does my code work for other debug events?
EDIT:
It seems that at the very least waitpid supports waiting from a different thread, the man page says:
Before Linux 2.4, a thread was just a
special case of a process, and as a
consequence one thread could not wait on the
children of another thread, even when
the latter belongs to the same thread
group. However, POSIX prescribes
such functionality, and since Linux 2.4 a
thread can, and by default
will, wait on children of other
threads in the same thread group.
So at most this is a ptrace limitation.
I had the same issue (plus many others!) while implementing the Linux-specific part of the Maxine VM debugger. You are correct in your guess that only one thread in the debugger can use ptrace to control the debuggee. We accomplish this by making all calls to ptrace on a dedicated thread. You may find it useful to look at the LinuxTask.java, linuxTask.h and linuxTask.c files in the Maxine sources available at kenai.com/projects/maxine/sources/maxine/show
As far as I can tell, this is not allowed. A task cannot use ptrace on a task which it has not attached. Also, a task can be traced by at most one other task, so you can't simply attach it once in each thread. I think this is because when one task attaches to another task, the tracing task becomes the parent of the traced task, and each task can only have one parent.
It seems like multi-thread tracing ought to be allowed because the threads are part of the same process, but implementation-wise, there isn't actually much distinction between threads and processes in the Linux kernel. A thread is just a task that happens to share most of its resources with another task.
If you're interested, you can browse the source code for ptrace in the kernel. Specifically look at ptrace_check_attach, which is called by sys_ptrace for most requests. It returns -ESRCH (sounds like the error code you're getting) if the target task's parent is not the current task.

How independent are threads inside the same process?

Now, this might be a very newbie question, but I don't really have experience with multithreaded programming and I haven't fully understood how threads work compared to processes.
When a process on my machine hangs, say it's waiting for some IO that never comes or something similar, I can kill and restart it because other processes aren't affected and can, for example, still operate my terminal. This is very obvious, of course.
I'm not sure whether it is the same with threads inside a process: If one hangs, are the others unaffected? In other words, can I run a "watchdog" thread which supervises the other threads and, for example kill and recreate hanging threads? For example, if I have a threadpool that I don't want to be drained by occasional hangups.
Threads are independent, but there's a difference between a process and a thread, and that is that in the case of processes, the operating system does more than just "kill" it. It also cleans up after it.
If you start killing threads that seems to be hung, most likely you'll leave resources locked and similar, something that the operating system would close for you if you did the same to a process.
So for instance, if you open a file for writing, and start producing data and write it to the file, and this thread now hangs, for whatever reason, killing the thread will leave the file still open, and most likely locked, up until you close the entire program.
So the real answer to your question is: No, you can not kill threads the hard way.
If you simply ask a thread to close, that's different because then the thread is still in control and can clean up and close resources before terminating, but calling an API function like "KillThread" or similar is bad.
If a thread hangs, the others will continue executing. However, if the hung thread has locked a semaphore, critical section or other kind of synchronization object, and another thread attempts to lock the same synchronization object, you now have a deadlock with two dead threads.
It is possible to monitor other threads from a thread. Depending on your platform, there are appliable API's: I refer you to those as you haven't stated what OS you are writing for.
You didn't mention about the platform, but as far as I'm concerned, NT kernel schedules threads, not processes and threats them independently in that manner. This might not be and is not true on other platforms (some platforms, like Windows 3.1, do not use preemptive multithreading and if one thread goes in infinite loop, everything is affected).
The simple answer is yes.
Typically though code in a thread will handle this likely hood itself. Most commonly many APIs that perform operations that may hang will have timeout features of their own.
Alternatively a thread will wait on not just an the operation that might hang but also a timer. If the timer signals first its assummed the operation has hung.
Since for a watch dog thread to be useful in this scenario would need some co-operation from code in the other threads having the threads themselves set timeouts makes more sense than a watchdog.
Threads get scheduled independent of each other. So you could indeed stop and restart hanging threads. Threads do not run in a separate address-space so a misbehaving thread can still overwrite memory or take locks needed by other threads in the same process.
There's a pretty good overview of some of the pitfalls of killing and suspending threads in the Java documentation explaining why the methods that do it are deprecated. Basically, if you expect to be able to kill a thread, you have to be very, very careful to make it work without some sort of corruption. If a thread is hung it's probably because of a bug...in which case killing it will probably result in corruption.
http://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
If you need to be able to kill things, use processes.

Resources