I think threads and subroutines are not directly comparable, but I'm designing a system in which many class instances communicate with each other, and I'm not sure exactly how I should do it.
Let's say I have a class that receives a command, executes it, and returns the result. To execute commands, I can write methods to handle them, or I can create a thread for each command, forward the message to the appropriate thread, and have that thread handle the command.
Are there any criteria for choosing one design over the other, such as real-time requirements or concurrency issues? And can the number of threads become a problem if it grows too large?
Thank you.
I need to write a simple function which does the following in linux:
Create two processes.
Have thread1 in Process1 do some small operation, and send a message to Process2 via thread2 once operation is completed.
Process2 shall acknowledge the received message.
I have no idea where to begin
I have written two simple functions which simply count from 0 to 1000 in a loop (the loop is run in a function called by a thread) and I have compiled them to get the binaries.
I am executing these one after the other (both running in the background) from a shell script.
Once process1 reaches 1000 in its loop, I want it to send a "Complete" message to the other process.
I am not sure if my approach is correct on the process front and I have absolutely no idea how to communicate between these two.
Any help will be appreciated.
LostinSpace
You'd probably want to use pipes for this. Depending on how the processes are started, you either want named or anonymous pipes:
Use named pipes (aka fifo, man mkfifo) if the processes are started independently of each other.
Use anonymous pipes (man 2 pipe) if the processes are started by a parent process through forking. The parent process would create the pipes, the child processes would inherit them. This is probably the "most beautiful" solution.
In both cases, the end points of the pipes are used just like any other file descriptor (but more like sockets than files).
If you aren't familiar with pipes yet, I recommend getting a copy of Marc Rochkind's book "Advanced UNIX Programming", where these techniques are explained in great detail, with easy-to-understand example code. The book also presents other inter-process communication methods (the only other really useful IPC method on POSIX systems is shared memory, but just for fun/completeness he presents some hacks).
Since you create the processes (I assume you are using fork()), you may want to look at eventfd().
eventfd() provides a lightweight mechanism for sending events from one process or thread to another.
More information on eventfd() and a small example can be found at http://man7.org/linux/man-pages/man2/eventfd.2.html.
Signals or named pipes (since you're starting the two processes separately) are probably the way to go here if you're just looking for a simple solution. For signals, your client process (the one sending "Done") will need to know the process id of the server, and for named pipes they will both need to know the location of a pipe file to communicate through.
However, I want to point out a neat IPC/networking tool that can make your job a lot easier if you're designing a larger, more robust system: 0MQ can make this kind of client/server interaction dead simple, and allows you to start up the programs in whatever order you like (if you structure your code correctly). I highly recommend it.
The Twisted documentation led me to believe that it was OK to combine techniques such as reactor.spawnProcess() and threads.deferToThread() in the same application, that the reactor would handle this elegantly under the covers. Upon actually trying it, I found that my application deadlocks. Using multiple threads by themselves, or child processes by themselves, everything is fine.
Looking into the reactor source, I find that the SelectReactor.spawnProcess() method simply calls os.fork() without any consideration for multiple threads that might be running. This explains the deadlocks, because starting with the call to os.fork() you will have two processes with multiple concurrent threads running and doing who knows what with the same file descriptors.
My question for SO is, what is the best strategy for solving this problem?
What I have in mind is to subclass SelectReactor, so that it is a singleton and calls os.fork() only once, immediately when instantiated. The child process will run in the background and act as a server for the parent (using object serialization over pipes to communicate back and forth). The parent continues to run the application and may use threads as desired. Calls to spawnProcess() in the parent will be delegated to the child process, which will be guaranteed to have only one thread running and can therefore call os.fork() safely.
Has anyone done this before? Is there a faster way?
What is the best strategy for solving this problem?
File a ticket (perhaps after registering) describing the issue, preferably with a reproducible test case (for maximum accuracy). Then there can be some discussion about the best way (or ways - different platforms may demand different solutions) to implement it.
The idea of immediately creating a child process to help with further child process creation has been raised before, to solve the performance issue surrounding child process reaping. If that approach now resolves two issues, it starts to look a little more attractive. One potential difficulty with this approach is that spawnProcess synchronously returns an object which supplies the child's PID and allows signals to be sent to it. This is a little more work to implement if there is an intermediate process in the way, since the PID will need to be communicated back to the main process before spawnProcess returns. A similar challenge will be supporting the childFDs argument, since it will no longer be possible to merely inherit the file descriptors in the child process.
An alternate solution (which may be somewhat more hackish, but which may also have fewer implementation challenges) might be to call sys.setcheckinterval with a very large number before calling os.fork, and then restore the original check interval in the parent process only. This should suffice to avoid any thread switching in the process until the os.execvpe takes place, destroying all the extra threads. This isn't entirely correct, since it will leave certain resources (such as mutexes and conditions) in a bad state, but your use of these with deferToThread isn't very common, so maybe that doesn't affect your case.
The advice Jean-Paul gives in his answer is good, but this should work (and does in most cases).
First, Twisted uses threads for hostname resolution as well, and I've definitely used subprocesses in Twisted processes that also make client connections. So this can work in practice.
Second, fork() does not create multiple threads in the child process. According to the standard describing fork(),
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread ...
Now, that's not to say that there are no potential multithreading issues with spawnProcess; the standard also says:
... to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called ...
and I don't think there's anything to ensure that only async-signal-safe operations are used.
So, please be more specific as to your exact problem, since it isn't that threads are being cloned into the subprocess.
Returning to this issue after some time, I found that if I do this:
reactor.callFromThread(reactor.spawnProcess, *spawnargs)
instead of this:
reactor.spawnProcess(*spawnargs)
then the problem goes away in my small test case. There is a remark in the Twisted documentation "Using Processes" that led me to try this: "Most code in Twisted is not thread-safe. For example, writing data to a transport from a protocol is not thread-safe."
I suspect that the other people Jean-Paul mentioned as having this problem may be making a similar mistake. The responsibility is on the application to ensure that reactor and other API calls are made from the correct thread. And apparently, with very narrow exceptions, the "correct thread" is nearly always the main reactor thread.
fork() on Linux definitely leaves the child process with only one thread.
I assume you are aware that, when using threads in Twisted, the ONLY Twisted API that threads are permitted to call is callFromThread? All other Twisted APIs must only be called from the main, reactor thread.
I am writing a Perl script involving multithreading. It has a GUI, and the number of threads to be used is taken as user input. Depending on this number, the script should spawn threads which all run the same sub. I want the n threads to work in parallel, but when I create them in a loop, the parallel processing is lost. Any idea how to overcome this issue?
I believe the simplest way to answer would be to recommend that you look at something like POE. The framework's cookbook webpage provides many examples that will surely be a good starting point for your original issue.
Depending on your GUI platform, you may also want to spend time on event loops provided by the framework itself.
You probably need to call the threads->yield() function occasionally in your processing loops. The yield() function gives the scheduler a "hint" that the thread is willing to give up the CPU.
I am experimenting with OpenMP for large data processing and am relatively new to it. My input data set is huge, so I divided it into multiple subsets, with each thread working on one subset. Each data item in a subset is acted upon by its thread. If one of the threads fails during its operation on any data item, I would like to terminate the other threads and return a failure. Using a shared variable is an option, but is there a better way to do this?
What do you want to happen if one of your threads chokes on its input ? Do you want to bring the program to a sudden halt ? Or do you want to stop the parallel execution and have a serial part of the program tidy up ?
OpenMP is not really designed for either kind of operation, so you're going to be struggling against it rather than with it, as most beginners do. You certainly could, as you suggest, use a shared variable and write your own logic to stop the program or jump to the end of the parallel region if one of the threads fails. But there are no intrinsic OpenMP mechanisms for these operations.
You might want to investigate the new task capabilities of OpenMP 3.0. Using these you can certainly implement a system whereby tasks are dispatched to threads which return either success or failure, and have other tasks which deal with failures.
Finally, one might argue that bad input should not cause a program to crash, but that's another topic.
I have read many answers given here for questions related to thread safety and re-entrancy, but thinking about them raised some more questions, hence this post.
1.) I have an executable program, say some *.exe. If I run this program at a command prompt, and while it is executing I run the same program at another command prompt, under what conditions could the results be corrupted? That is, does the code of this program need to be re-entrant, or is being thread-safe alone enough?
2.) When defining re-entrancy, we say that a routine can be re-entered while it is already running. In what situations can a function be re-entered (apart from a recursive routine; I am not talking about recursive execution here)? There has to be some thread to execute the same code again, or how else can that function be entered again?
3.) In a practical case, will two threads execute the same code, i.e. perform the same functionality? I thought the idea of multithreading was to execute different functionality concurrently (on different cores/processors).
Sorry if these queries seem unrelated, but they all occurred to me at the same time, when I read the thread-safe vs. re-entrant post on SO, hence I put them together.
Any pointers, reading material will be appreciated.
thanks,
-AD.
I'll try to explain these, in order:
Each program runs in its own process, and gets its own isolated memory space. You don't have to worry about thread safety in this situation. (However, if the processes are both accessing some other shared resource, such as a file, you may have different issues. For example, process 1 may "lock" the data file, preventing process 2 from being able to open it).
The idea here is that two threads may try to run the same routine at the same time. This is not always safe - it takes special care to define a class or a process in a way that multiple threads can use the same instance of the same class, or the same static function, without errors occurring. This typically requires synchronization in the class.
Two threads often execute the same code. There are two different conceptual ways to partition your work when threading. You can either think in terms of tasks - i.e., one thread does task A while another does task B. Alternatively, you can think in terms of decomposing the problem based on data. In this case, you work with a large collection, and each element is processed using the same routine, but the processing happens in parallel. For more info, you can read this blog post I wrote on Decomposition for Parallelism.
Two processes cannot share memory. So thread-safety is moot here.
Re-entrancy means that a method can be safely executed by two threads at the same time. This doesn't require recursion - threads are separate units of execution, and there is nothing keeping them both from attempting to run the same method simultaneously.
The benefits of threading can come in two ways. One is when you perform different types of operations concurrently (like running cpu-intensive code and I/O-intensive code at the same time). The other is when you can divide a long-running operation among multiple processors. In this latter case, two threads may be executing the same function at the same time on different input data sets.
First of all, I strongly suggest you look at some basics of computer systems, especially how a process/thread executes on the CPU and is scheduled by the operating system: for example, virtual addresses, context switching, and process/thread concepts (e.g., each thread has its own stack and register set, while the heap is shared by threads; a thread is the unit of execution and scheduling, so it maintains its own control flow through the code), and so on. All of these questions come down to understanding how your program actually works on the CPU.
1) and 2) are already answered.
3) Multithreading is just the concurrent execution of arbitrary threads. The same code can be executed by multiple threads. These threads can share data, and can even create data races, which are very hard to find. Of course, threads often execute separate code as well (we call that thread-level parallelism).
In this context, I have used "concurrent" with two meanings: (a) on a single processor, multiple threads share the one physical processor, but the operating system gives a sort of illusion that the threads are running concurrently; (b) on a multicore machine, two or more threads really can execute concurrently in the physical sense.
Gaining a concrete understanding of concurrent/parallel execution takes quite a long time. But you already have a solid understanding!