Do all threads really run the same program at the same time? - multithreading

I'm studying system programming and following the book "The Linux programming interface".
In section 29.9 and 29.10, the author said that
In a multithreaded application, all threads must be running the same program
(although perhaps in different functions). In a multiprocess application, different processes can run different programs.
and
In a multithreaded process, multiple threads are concurrently executing the same
program.
So what does it mean by saying the same program in this context? After reading this statement, it makes me feel like, all threads are doing the same mission at the same time !?

So what does it mean by saying the same program in this context?
It means the same executable code running in the same address space. All threads in a process share the processes address space. This means that they share:
the executable code,
the global variables,
the heap,
any memory segment shared with the file system or other processes.
That is basically the executing program.
(The threads even share all thread stacks; i.e. one thread is not prevented (by the OS) from looking at another thread's stack. However, I can't think of any good reason for one thread to look at another thread's stack.)
After reading this statement, it makes me feel like, all threads are doing the same mission at the same time !?
Well ... ummm ... yes and no.
They could literally be doing exactly the same thing, but they probably aren't. It possible that they are doing similar computations on different parts of a large problem. Or processing different requests. But if they had unrelated "missions" it probably wouldn't make sense for them to be part of the same program and/or process.
The way that I think about it is that threads and processes are both ways of doing more than one thing at a time.
Processes can't share objects, variables, etcetera. (Unless you use shared memory segments.) They typically don't trust each other or cooperate with each other.
Threads can (and typically do) share share objects, variables, etcetera. They need to trust each other and cooperate with each other.

In an operating system:
a thread is the unit of execution,
a process is the unit of isolation.
The program here, represents all the executable code that is loaded into memory and mapped into a process's virtual address space. Each thread can be executing a different part of that program, and maintains its own set of CPU register values and has its own stack -- although even its stack is technically stored in memory that all threads in the same process can access. But a thread does not have access to the code loaded into a different process, nor can it access memory from another process or use handles to any files or kernel objects (without explicitly sharing such things first by making a special request of the operating system to share them). Therefore the author is claiming that all threads in a process are executing the same program -- but they are not necessarily each on the exact same instruction of that program. Each thread maintains its own instruction pointer, which is just a virtual address of the memory location at which the next bytes of machine code to be executed are stored. One thread could be currently at instruction #3 of a program and another could be at instruction #103 of that same program -- which could be code in completely different functions.
Allow me to use an extended metaphor
Each thread is like a baker, preparing some food creation by following along step-by-step on a particular recipe. The process is like the kitchen. Since all the bakers are in the same kitchen, they have to share ovens, bowls, utensils, etc. And they actually have easy access to the entire kitchen, including things other bakers are using. A misbehaving baker could potentially put an ingredient into a bowl another baker was using. Or modify the temperature setting on the common oven. Bakers in the same kitchen have to coordinate access to shared resources and be careful not to accidentally interfere with another baker's work.
Bakers in different kitchens are isolated from each other. If in one kitchen they're making seafood, but someone has a life-threatening seafood allergy, then it would be best to have their food prepared in a completely different kitchen, where there are no seafood ingredients in the refrigerator. This is really important if you can't trust the other bakers. This is how we isolate bakers from each other, with different kitchens (different processes). A fire in one kitchen (like a program crash) won't affect any baking going on in other kitchens.
Think of the program like anything in the kitchen with instructions printed on it. It's the collection of all the recipe books, all the manuals for all the appliances and even the emergency procedures written on the signs on the walls of the kitchen. The bakers can only follow instructions on literature that is actually in the kitchen. Although they can follow instructions on any page of any book or manual in the kitchen.
Different bakers can be following instructions on different pages of the same book, or completely different books, or can even be currently following the exact same instruction of the same recipe. But they cannot suddenly start following instructions in a book that is not in the kitchen (a different program). Like, for example, if there are no recipe books with recipes for seafood in the kitchen, then the bakers won't be able to follow those instructions and won't be able to make any seafood. Bakers can only bake recipes that are in the kitchen with them.
So imagine a kitchen that has only a book of Chef Pierre's French Cuisine Recipes. If that's the only book in the kitchen, then that's all the bakers can do is bake French Cuisine Recipes because there are simply no other recipes in the kitchen to be followed. All the bakers are using the same book of recipes (running the same program). But one baker may be just finishing preparing the Ratatouille while two other bakers are both putting the finishing touches on a soufflé recipe from the book. Meanwhile, down the street, there may be another kitchen outfitted with a completely different set of recipes -- our seafood restaurant from before, perhaps. The two kitchens would be utilizing different books of recipes (running different programs). There could conceivably be another kitchen somewhere where they also have a copy of Chef Pierre's book. In that case you would have two different kitchens capable of preparing the same recipes (two different processes running the same program.)
So:
Single-threaded = one baker.
Multi-threaded = more than one baker in the same kitchen.
Single-process - one kitchen (any number of bakers).
Multi-process = more than one kitchen (each with any number of bakers)

Related

How are threads made?

I was reading up on threads and as I understand it, they are a set of values for an execution context. From what I understand, a thread is comprised of values (registers, PC, stack, etc.) that allow a CPU to continue running a set of instructions.
However, my question is: how are these threads made? I hear some of my professors throw around the word thread as a way to break up a process into multiple (mostly) independent parts of code (ie. multithreading). How does this work? Is there another section of memory that stores specifically what a thread can run, as well as it's state?
First of all you have to understand that operating systems vary greatly in their general working as well as in their implementations of seemingly identical functions.
so don't go into these kind of questions thinking that if one operating system does something in some way then other operating systems would do that in similar manner.
Now to your question
how are these threads made?
I will answer it using Linux as an example. When creating a new process Linux lets you specify which data structures (file descriptors, IO context etc) new process would share with its parent process. you can do this using the clone system call.
you can see in the documentation of clone that it takes some parameters specifying the sharing properties.
Now you can call a task_struct thread if it shares all sharable data structures with its parent ( because this property is consistent with the conventional definition of a thread). and if it shares none then you would call it a process.
But as far as Linux is concerned there is no notion of a thread or a process, all you have is a task_struct which may share certain resources with its parent.

What does it mean to say Apache spawns a thread per request, but node.js does not?

I have read about node.js and other servers such as Apache, where the threading is different. I simply do not understand what the threading means.
If I have a webpage that runs SQL to hit a database, say three different databases in the one server side page, what does that mean for threading in node.js? Apache? What does "thread" mean here?
Or as an article I saw, "start a new thread to handle each request."
What does it mean to say Apache spawns a thread per request, but node.js does not?
EDIT: I am hoping for an example that I can grasp. I'm used to having a server side page that hits a database(s). Several connections inside that file.
A thread is a context of program execution. Programs that are single-threaded can only do one thing at once, where multi-threaded programs can do many things at once.
Think of it like a kitchen at a restaurant. A single chef can really only do one task at a time, be that chopping onions or putting something in an oven. If an order comes in that requires lots of work from the chef (such as making salads vs. putting stuff in the oven and waiting) some meals may get delayed because that chef is busy. On the other hand, if that chef just has to bake a bunch of stuff, there isn't much work for him to do and he can make other meals while waiting for the food in the oven to be done.
With multiple chefs, many of these tasks can be done simultaneously. Many meals can be prepared simultaneously.
Apache's threading model is like hiring a fixed number of chefs (regardless of how many customers your restauarant has that night) and each chef can only work on one meal at a time. That means that if a meal order comes in, a dedicated chef is assigned to that meal. There will be times when that chef is busy chopping up ingredients and mixing cake batter, but there will also be times when he's just standing around waiting for the potatoes to boil. At any given time, you could have most of your chefs sitting idle, waiting on potatoes to boil and cake to bake and no more orders will be worked on, since each chef is dedicated to one order at a time.
To make matters worse, your kitchen is only as big as you can afford to make it. Each chef takes up space and resources, and you may have a situation where a bunch of chefs standing around holding the only spoons available are preventing other chefs from getting their food made.
Nginx is another web server (often used as a proxy) that you didn't ask about, but I'm including it to explain another threading model. It also hires a fixed number of chefs, but it hires fewer of them. Each chef can work on multiple meals at a time. So, if they're waiting on potatoes to boil while an order comes in for a chopped salad, they can go work on that salad instead of standing around idle. You can have a smaller kitchen (relative to the size of restaurant/number of customers) and get the same number of meals out, or more. It's a tight crew that is effective at not wasting time and resources.
Node.js is a bit different. It is single-threaded from a JavaScript perspective, but other tasks like disk and network IO are handled on separate threads automatically. It's like having a kitchen with only one chef, but that makes sense in some cases. If your kitchen has a lot of busy work for that chef, perhaps it makes sense to hire more chefs to do work. (To do this in Node.js, you can only spawn more processes, which is effectively like building a bunch of small kitchens right next to each other. You can have one guy standing out front coordinating the orders for all those kitchens.) However, if you're just a bakery (mainly just IO, with little busy-work for the chef), maybe you only need one chef.
To sum all this up, different threading models are used to divide work and process it effectively. Which threading model makes sense depends on your needs, and the other characteristics of the server you are choosing.
Node.js is single threaded in that it can only do one thing at once. You can run multiple instances of the node process on pretty much all cloud service providers, though. The apache process can multi-task on threads.
If the node process hangs for some reason, nothing else can happen. That's why its important to write node in an asyncronous way so that if a database query hangs, node can still take requests.
Without getting too technical, a thread can be thought of as a lane in the highway of the program. Its a specific channel of execution. In the lifetime of a request, a lot of things have to happen. All of those things are in one box.
Node doesn't have threads! You can think of it like a one lane road. But the way node is deployed you get many instances of that one lane road. They don't share anything though. If you a value gets added to an array in one, its not in the other. Anything that needs to be shared has to be shared in a cache or database layer.
What people confuse between is Threads, Process & Async, Non-blocking I/O.
Threads are child level 'runnable' to a process. All the execution environment is set up for a thread. Right from the Stack to Addressable memory locations it's allocated to a thread. If a child-level thread has to communicate back to the the main process thread, it has to use safe-messaging,notification models. There are multiple ways to do this, based on the language.
Node.js is Single Threaded and obviously single Process based. It's not meant for high CPU intensive blocking calls. But if you still want to use, You could consider Node clustering. So instead of creating threads, it creates multiple "process" that works like a thread.
Async - All the code that carries a callback functions are not actually Async.
Okay in other words, Literally, they are Asynchrounous as they don't block the call.
But in Node.js context, When someone says, Node is Async, it's completely linked to the OS interfacing. The capability of Node depends on the Non-blocking I/O capabilities of the underlying OS. So whatever objects the OS supports Non-blocking I/O for example, Sockets, Files, Pipes, Node utilizes them to maximum.
And btw, when you talk about Apache, you should ideally be comparing Nginx. Not Node.js.
Node.js is not meant to serve as a Web Server. It's a basically a Process that puts effective use of Async I/O.

How does the Linux kernel realize reentrancy?

All Unix kernels are reentrant: several processes may be executing in kernel
mode at the same time. How can I realize this effect in code? How should I handle the situation whereby many processes invoke system calls, pending in kernel mode?
[Edit - the term "reentrant" gets used in a couple of different senses. This answer uses the basic "multiple contexts can be executing the same code at the same time." This usually applies to a single routine, but can be extended to apply to a set of cooperating routines, generally routines which share data. An extreme case of this is when applied to a complete program - a web server, or an operating system. A web-server might be considered non-reentrant if it could only deal with one client at a time. (Ugh!) An operating system kernel might be called non-reentrant if only one process/thread/processor could be executing kernel code at a time.
Operating systems like that occurred during the transition to multi-processor systems. Many went through a slow transition from written-for-uniprocessors to one-single-lock-protects-everything (i.e. non-reentrant) through various stages of finer and finer grained locking. IIRC, linux finally got rid of the "big kernel lock" at approx. version 2.6.37 - but it was mostly gone long before that, just protecting remnants not yet converted to a multiprocessing implementation.
The rest of this answer is written in terms of individual routines, rather than complete programs.]
If you are in user space, you don't need to do anything. You call whatever system calls you want, and the right thing happens.
So I'm going to presume you are asking about code in the kernel.
Conceptually, it's fairly simple. It's also pretty much identical to what happens in a multi-threaded program in user space, when multiple threads call the same subroutine. (Let's assume it's a C program - other languages may have differently named mechanisms.)
When the system call implementation is using automatic (stack) variables, it has its own copy - no problem with re-entrancy. When it needs to use global data, it generally needs to use some kind of locking - the specific locking required depends on the specific data it's using, and what it's doing with that data.
This is all pretty generic, so perhaps an example might help.
Let's say the system call want to modify some attribute of a process. The process is represented by a struct task_struct which is a member of various linked lists. Those linked lists are protected by the tasklist_lock. Your system call gets the tasklist_lock, finds the right process, possibly gets a per-process lock controlling the field it cares about, modifies the field, and drops both locks.
One more detail, which is the case of processes executing different system calls, which don't share data with each other. With a reasonable implementation, there are no conflicts at all. One process can get itself into the kernel to handle its system call without affecting the other processes. I don't remember looking specifically at the linux implementation, but I imagine it's "reasonable". Something like a trap into an exception handler, which looks in a table to find the subroutine to handle the specific system call requested. The table is effectively const, so no locks required.

Is this a correct Analogy for Process and Threads?

I am trying to get the hang of Process, Threads, MultiCore etc...
So here's my analogy from what I have learnt..
A society is like a Computer.
A building is like a Chip.
A floor on a particular building is a Core on that Chip.
So a building can be one floor (single core on single chip),
One building multiple floors (multicore on a single chip),
Many building, many floors each(multiprocessor with multicore)
A flat in a particular floor is a Process.
A person living in a room in a particular flat is a thread.
People living in a particular flat, share the same space. i.e. each thread in a process share the same address space.
Each person shares few common things in the room like a kitchen, shower area etc i.e. each thread in a process share code, data, files
Shower room in a particular flat needs synchronisation between people (threads) living in that flat. As only one can use at a time.
Each person has its own personal set of things which he does not share with others, his underwear for example ;) (a thread has its own stack, registers)
A person can call new friends to his flat. i.e. A thread may spawn new threads for his her wishes (maybe the thread is bored lol)
The owner of the building, can shut down any room, create new rooms, or assign new tenants or chuck them out. Kernel can assign new process, and create destroy threads too.
......
Guys let me know if I got the analogy right. If anything else I can add to make it more clear or make it correct.
The only problem I see is when a process having many threads run on different cores... Any ideas how to include that in the analogy..?
Let me know. Thanks a lot. :)
Reference - https://www.his.se/PageFiles/4854/2010/threads_6slides.pdf?epslanguage=sv
In your example, Shower would be a limited resource possibly memory resource or device, which is why it must be shared. You didn't give the equivalent thing.
A thread doesn't have it's own registers, unless you mean something other than the registers in one of the cores of one of the CPUs. Threads share cores, so they are actually operating under time splices. I would compare it more to an apartment room has occupants move in and out, but then your shower idea falls apart. At which point I would say that it's the storage lockers that they must share, but only one tenant can use a particular locker at a time.
A process executes within the boundaries of a core, but it doesn't physically exist on the core, but I think for this case the metaphor is clear enough.
If you're worried about threads running on different cores, you can compare it to a prison shower room. Where guys(threads) move in and out of different showers(cores) even though they belong to different gangs(processes), and they have to share a set of lockers like before.
No matter what your analogy, it's hard to explain the time splices and rapid entering and exiting of different process/threads on a core.
And, the real problem I have with this, threads don't get bored :)

Threads vs Processes in Linux [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed last year.
The community reviewed whether to reopen this question last year and left it closed:
Original close reason(s) were not resolved
Improve this question
I've recently heard a few people say that in Linux, it is almost always better to use processes instead of threads, since Linux is very efficient in handling processes, and because there are so many problems (such as locking) associated with threads. However, I am suspicious, because it seems like threads could give a pretty big performance gain in some situations.
So my question is, when faced with a situation that threads and processes could both handle pretty well, should I use processes or threads? For example, if I were writing a web server, should I use processes or threads (or a combination)?
Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task. *
On Linux, the system call clone clones a task, with a configurable level of sharing, among which are:
CLONE_FILES: share the same file descriptor table (instead of creating a copy)
CLONE_PARENT: don't set up a parent-child relationship between the new task and the old (otherwise, child's getppid() = parent's getpid())
CLONE_VM: share the same memory space (instead of creating a COW copy)
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing). **
forking costs a tiny bit more than pthread_createing because of copying tables and creating COW mappings for memory, but the Linux kernel developers have tried (and succeeded) at minimizing those costs.
Switching between tasks, if they share the same memory space and various tables, will be a tiny bit cheaper than if they aren't shared, because the data may already be loaded in cache. However, switching tasks is still very fast even if nothing is shared -- this is something else that Linux kernel developers try to ensure (and succeed at ensuring).
In fact, if you are on a multi-processor system, not sharing may actually be beneficial to performance: if each task is running on a different processor, synchronizing shared memory is expensive.
* Simplified. CLONE_THREAD causes signals delivery to be shared (which needs CLONE_SIGHAND, which shares the signal handler table).
** Simplified. There exist both SYS_fork and SYS_clone syscalls, but in the kernel, the sys_fork and sys_clone are both very thin wrappers around the same do_fork function, which itself is a thin wrapper around copy_process. Yes, the terms process, thread, and task are used rather interchangeably in the Linux kernel...
Linux (and indeed Unix) gives you a third option.
Option 1 - processes
Create a standalone executable which handles some part (or all parts) of your application, and invoke it separately for each process, e.g. the program runs copies of itself to delegate tasks to.
Option 2 - threads
Create a standalone executable which starts up with a single thread and create additional threads to do some tasks
Option 3 - fork
Only available under Linux/Unix, this is a bit different. A forked process really is its own process with its own address space - there is nothing that the child can do (normally) to affect its parent's or siblings address space (unlike a thread) - so you get added robustness.
However, the memory pages are not copied, they are copy-on-write, so less memory is usually used than you might imagine.
Consider a web server program which consists of two steps:
Read configuration and runtime data
Serve page requests
If you used threads, step 1 would be done once, and step 2 done in multiple threads. If you used "traditional" processes, steps 1 and 2 would need to be repeated for each process, and the memory to store the configuration and runtime data duplicated. If you used fork(), then you can do step 1 once, and then fork(), leaving the runtime data and configuration in memory, untouched, not copied.
So there are really three choices.
That depends on a lot of factors. Processes are more heavy-weight than threads, and have a higher startup and shutdown cost. Interprocess communication (IPC) is also harder and slower than interthread communication.
Conversely, processes are safer and more secure than threads, because each process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process at all, whereas if a thread crashes, it takes down all of the other threads in the process, and if a thread has a buffer overrun, it opens up a security hole in all of the threads.
So, if your application's modules can run mostly independently with little communication, you should probably use processes if you can afford the startup and shutdown costs. The performance hit of IPC will be minimal, and you'll be slightly safer against bugs and security holes. If you need every bit of performance you can get or have a lot of shared data (such as complex data structures), go with threads.
Others have discussed the considerations.
Perhaps the important difference is that in Windows processes are heavy and expensive compared to threads, and in Linux the difference is much smaller, so the equation balances at a different point.
Once upon a time there was Unix and in this good old Unix there was lots of overhead for processes, so what some clever people did was to create threads, which would share the same address space with the parent process and they only needed a reduced context switch, which would make the context switch more efficient.
In a contemporary Linux (2.6.x) there is not much difference in performance between a context switch of a process compared to a thread (only the MMU stuff is additional for the thread).
There is the issue with the shared address space, which means that a faulty pointer in a thread can corrupt memory of the parent process or another thread within the same address space.
A process is protected by the MMU, so a faulty pointer will just cause a signal 11 and no corruption.
I would in general use processes (not much context switch overhead in Linux, but memory protection due to MMU), but pthreads if I would need a real-time scheduler class, which is a different cup of tea all together.
Why do you think threads are have such a big performance gain on Linux? Do you have any data for this, or is it just a myth?
I think everyone has done a great job responding to your question. I'm just adding more information about thread versus process in Linux to clarify and summarize some of the previous responses in context of kernel. So, my response is in regarding to kernel specific code in Linux. According to Linux Kernel documentation, there is no clear distinction between thread versus process except thread uses shared virtual address space unlike process. Also note, the Linux Kernel uses the term "task" to refer to process and thread in general.
"There are no internal structures implementing processes or threads, instead there is a struct task_struct that describe an abstract scheduling unit called task"
Also according to Linus Torvalds, you should NOT think about process versus thread at all and because it's too limiting and the only difference is COE or Context of Execution in terms of "separate the address space from the parent " or shared address space. In fact he uses a web server example to make his point here (which highly recommend reading).
Full credit to linux kernel documentation
If you want to create a pure a process as possible, you would use clone() and set all the clone flags. (Or save yourself the typing effort and call fork())
If you want to create a pure a thread as possible, you would use clone() and clear all the clone flags (Or save yourself the typing effort and call pthread_create())
There are 28 flags that dictate the level of resource sharing. This means that there are over 268 million flavours of tasks that you can create, depending on what you want to share.
This is what we mean when we say that Linux does not distinguish between a process and a thread, but rather alludes to any flow of control within a program as a task. The rationale for not distinguishing between the two is, well, not uniquely defining over 268 million flavours!
Therefore, making the "perfect decision" of whether to use a process or thread is really about deciding which of the 28 resources to clone.
How tightly coupled are your tasks?
If they can live independently of each other, then use processes. If they rely on each other, then use threads. That way you can kill and restart a bad process without interfering with the operation of the other tasks.
To complicate matters further, there is such a thing as thread-local storage, and Unix shared memory.
Thread-local storage allows each thread to have a separate instance of global objects. The only time I've used it was when constructing an emulation environment on linux/windows, for application code that ran in an RTOS. In the RTOS each task was a process with it's own address space, in the emulation environment, each task was a thread (with a shared address space). By using TLS for things like singletons, we were able to have a separate instance for each thread, just like under the 'real' RTOS environment.
Shared memory can (obviously) give you the performance benefits of having multiple processes access the same memory, but at the cost/risk of having to synchronize the processes properly. One way to do that is have one process create a data structure in shared memory, and then send a handle to that structure via traditional inter-process communication (like a named pipe).
In my recent work with LINUX is one thing to be aware of is libraries. If you are using threads make sure any libraries you may use across threads are thread-safe. This burned me a couple of times. Notably libxml2 is not thread-safe out of the box. It can be compiled with thread safe but that is not what you get with aptitude install.
I'd have to agree with what you've been hearing. When we benchmark our cluster (xhpl and such), we always get significantly better performance with processes over threads. </anecdote>
The decision between thread/process depends a little bit on what you will be using it to.
One of the benefits with a process is that it has a PID and can be killed without also terminating the parent.
For a real world example of a web server, apache 1.3 used to only support multiple processes, but in in 2.0 they added an abstraction so that you can swtch between either. Comments seems to agree that processes are more robust but threads can give a little bit better performance (except for windows where performance for processes sucks and you only want to use threads).
For most cases i would prefer processes over threads.
threads can be useful when you have a relatively smaller task (process overhead >> time taken by each divided task unit) and there is a need of memory sharing between them. Think a large array.
Also (offtopic), note that if your CPU utilization is 100 percent or close to it, there is going to be no benefit out of multithreading or processing. (in fact it will worsen)
Threads -- > Threads shares a memory space,it is an abstraction of the CPU,it is lightweight.
Processes --> Processes have their own memory space,it is an abstraction of a computer.
To parallelise task you need to abstract a CPU.
However the advantages of using a process over a thread is security,stability while a thread uses lesser memory than process and offers lesser latency.
An example in terms of web would be chrome and firefox.
In case of Chrome each tab is a new process hence memory usage of chrome is higher than firefox ,while the security and stability provided is better than firefox.
The security here provided by chrome is better,since each tab is a new process different tab cannot snoop into the memory space of a given process.
Multi-threading is for masochists. :)
If you are concerned about an environment where you are constantly creating threads/forks, perhaps like a web server handling requests, you can pre-fork processes, hundreds if necessary. Since they are Copy on Write and use the same memory until a write occurs, it's very fast. They can all block, listening on the same socket and the first one to accept an incoming TCP connection gets to run with it. With g++ you can also assign functions and variables to be closely placed in memory (hot segments) to ensure when you do write to memory, and cause an entire page to be copied at least subsequent write activity will occur on the same page. You really have to use a profiler to verify that kind of stuff but if you are concerned about performance, you should be doing that anyway.
Development time of threaded apps is 3x to 10x times longer due to the subtle interaction on shared objects, threading "gotchas" you didn't think of, and very hard to debug because you cannot reproduce thread interaction problems at will. You may have to do all sort of performance killing checks like having invariants in all your classes that are checked before and after every function and you halt the process and load the debugger if something isn't right. Most often it's embarrassing crashes that occur during production and you have to pore through a core dump trying to figure out which threads did what. Frankly, it's not worth the headache when forking processes is just as fast and implicitly thread safe unless you explicitly share something. At least with explicit sharing you know exactly where to look if a threading style problem occurs.
If performance is that important, add another computer and load balance. For the developer cost of debugging a multi-threaded app, even one written by an experienced multi-threader, you could probably buy 4 40 core Intel motherboards with 64gigs of memory each.
That being said, there are asymmetric cases where parallel processing isn't appropriate, like, you want a foreground thread to accept user input and show button presses immediately, without waiting for some clunky back end GUI to keep up. Sexy use of threads where multiprocessing isn't geometrically appropriate. Many things like that just variables or pointers. They aren't "handles" that can be shared in a fork. You have to use threads. Even if you did fork, you'd be sharing the same resource and subject to threading style issues.
If you need to share resources, you really should use threads.
Also consider the fact that context switches between threads are much less expensive than context switches between processes.
I see no reason to explicitly go with separate processes unless you have a good reason to do so (security, proven performance tests, etc...)

Resources