How about using shared memory to share data between the processes of a program? (Linux)

I want to write an "embedded control system" on Linux.
To make it easy to update in the future, I think multi-process is better than multi-thread,
so the system may be separated into 3 programs:
"process": reads input data from other sources, does some calculations, then saves the results to shared memory
"display": reads live data from shared memory and chooses some of it to display on the UI (written in Qt)
"database": reads live data from shared memory and saves it periodically; data will be saved in binary files at first, and SQLite may be used instead in the future
In addition, I may add a web server that reads the live data and shows it in the browser.
Here are my questions:
Is multi-process really better than multi-thread?
If I use multi-process, how about using shared memory? Are there any disadvantages?

Is multi-process really better than multi-thread?
It depends on what you want to do. Multiprocessing enforces a strict separation between system components and allows different parts to run with different credentials. It does require a more complicated communication mechanism than multithreading, and it incurs some overhead.
If I use multi-process, how about using shared memory? Are there any disadvantages?
The main disadvantage, compared to the obvious alternative of using sockets, is that it limits your entire system to running on a single host. No distributed computing.
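A minimal sketch of the shared-memory arrangement from the question, written in Python for brevity. The record layout (a sequence number plus two readings) and the names publish and read_latest are assumptions made up for illustration. Both handles are opened in one process here only to keep the example short; in the real system each program would attach to the segment by name.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: sequence number, temperature, pressure.
RECORD = struct.Struct("<Idd")


def publish(buf, seq, temperature, pressure):
    """Writer side ("process"): store the latest values in the segment."""
    RECORD.pack_into(buf, 0, seq, temperature, pressure)


def read_latest(buf):
    """Reader side ("display", "database"): decode the current record."""
    return RECORD.unpack_from(buf, 0)


def demo():
    # The writer creates the segment; the OS picks a unique name.
    writer = shared_memory.SharedMemory(create=True, size=RECORD.size)
    try:
        publish(writer.buf, 1, 21.5, 101.3)
        # A reader attaches by name (same process here, for brevity only).
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            return read_latest(reader.buf)
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # remove the segment once nobody needs it


if __name__ == "__main__":
    print(demo())
```

Note that this sketch has no synchronization at all: a reader running concurrently could observe a half-written record, so a real design would add a lock or semaphore around the update.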

You're already using Qt, which is suitable for embedded systems. Great.
It has good APIs for TCP sockets and Unix domain sockets, and good data streaming protocols. Unless you demand absolutely scorching performance, I would err on the side of secure, separate processes.
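As a rough illustration of the socket alternative, here is a sketch using Python's standard socket module as a stand-in for Qt's socket classes. The 4-byte length prefix and the helper names send_msg and recv_msg are assumptions of this example, not something the answer prescribes. A stream socket has no message boundaries, so some framing like this is needed.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length (assumed framing)


def send_msg(sock, payload: bytes):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo():
    # socketpair() returns two connected sockets (Unix domain on Linux).
    a, b = socket.socketpair()
    with a, b:
        send_msg(a, b"temperature=21.5")
        return recv_msg(b)


if __name__ == "__main__":
    print(demo())
```

The same framing works unchanged over TCP, which is what makes it possible to move a process to another host later.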

The upside of three processes is that the protected address space prevents unwanted communications between the processes.
The downside of three processes is that the same protected address space also hinders wanted communications between the processes.
Inter-process communication is slower and more difficult to manage than inter-thread comms.

Related

Performance implications of using inter-process communication (IPC)

What type of usage is IPC intended for, and is it OK to send larger chunks of JSON (hundreds of characters) between processes using IPC? Should I be trying to send as tiny a message as possible using IPC, or would the performance gains from reducing message size not be worth the effort?

What type of usage is IPC intended for, and is it OK to send larger chunks of JSON (hundreds of characters) between processes using IPC?
At its core, IPC is what it says on the tin. It's a tool to use when you need to communicate information between processes, whatever that may be. The topic is very broad, and technically includes allocating shared memory and doing the communication manually, but given the tone of the question, and the tags, I'm assuming you're talking about the OS-provided facilities.
Wikipedia does a pretty good job discussing how IPC is used, and I don't think I can do much better, so I'll concentrate on the second question.
Should I be trying to send as tiny a message as possible using IPC, or would the performance gains from reducing message size not be worth the effort?
This smells a bit like a micro-optimization. I can't say definitively, because I'm not privy to the source code at Microsoft and Apple, and I really don't want to dig through the Linux kernel's implementation of IPC, but here are a couple of points:
1. IPC is a common operation, so OS designers are likely to optimize it for efficiency. There are teams of engineers that have considered the problem and figured out how to make this fast.
2. The bottleneck in communication across processes/threads is almost always synchronization. Delays are bad, but race conditions and deadlocks are worse. There are, however, lots of creative ways that OS designers can speed up the procedure, since the system controls the process scheduler and memory manager.
3. There's lots of ways to make the data transfer itself fast. For the OS, if the data needs to cross process boundaries, then there is some copying that may need to take place, but the OS copies memory all over the place all the time. Think about a command line utility, like netstat. When that executable is run, memory needs to be allocated, the process needs to be loaded from disk, and any address fixing that the OS needs to do is done, before the process can even start. This is done so quickly that you hardly even notice. On Windows netstat is about 40k, and it loads into memory almost instantly. (Notepad, another fast loader, is 10 times that size, but it still launches in a tiny amount of time.)
4. The big exception to #2 above is if you're talking about IPC between processes that aren't on the same computer. (Think Windows RPC) Then you're really bound by the speed of the networking/communication stack, but at that point a few kb here or there isn't going to make a whole lot of difference. (You could consider AJAX to be a form of IPC where the 'processes' are the server and your browser. Now consider how fast Google Docs operates.)
If the IPC is between processes on the same system, I don't think that it's worth a ton of effort shaving bytes from your message. Make your message easy to debug.
If the communication is happening between processes on different machines, then you may have something to think about. Even so, having spent a lot of time debugging issues that would have been simple with a better data format, I'd say that saving a few dozen milliseconds of transit time isn't worth making the data harder to parse and debug. Remember the three rules of optimization [1]:
1. Don't.
2. Don't... yet. (For experts)
3. Profile before you do.
[1] The first two rules are usually attributed to Michael A. Jackson (the computer scientist, not the singer).
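In the spirit of "profile before you do", here is a rough sketch for measuring what a few hundred characters of JSON cost over a pipe. The function name roundtrip and the sample message are assumptions for illustration. Both ends of the pipe live in one process, so this only captures the system call and copy cost, not scheduling between two real processes; treat the number as a lower bound.

```python
import json
import os
import time


def roundtrip(message: dict, repeats: int = 1000):
    """Push a JSON message through a pipe `repeats` times.

    Returns the last decoded message and the average seconds per message.
    The payload must stay well below the pipe buffer size (64 KiB by
    default on Linux), otherwise the write would block.
    """
    payload = json.dumps(message).encode("utf-8")
    read_fd, write_fd = os.pipe()
    try:
        start = time.perf_counter()
        for _ in range(repeats):
            os.write(write_fd, payload)
            received = os.read(read_fd, len(payload))
        elapsed = time.perf_counter() - start
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return json.loads(received), elapsed / repeats


if __name__ == "__main__":
    # Hypothetical message of a few hundred characters.
    msg = {"sensor": "boiler-1", "values": list(range(100))}
    decoded, per_message = roundtrip(msg)
    print(f"{per_message * 1e6:.1f} microseconds per message")
```

Running it with a much smaller message and comparing the two figures is a quick way to see whether shaving bytes matters on your machine.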

Do all types of interprocess/interthread communication need system calls?

In Linux,
do all types of interprocess communication need system calls?
Types of interprocess communication include:
Pipes
Signals
Message Queues
Semaphores
Shared Memory
Sockets
Do all types of interthread communication need system calls?
I would like to know whether all interprocess and interthread communications involve switching from user mode to kernel mode, so that the OS kernel runs to perform the communication. Since system calls all involve such a switch, I am asking whether the communications need system calls.
For example, "Shared memory" can be used for both interprocess and interthread communications, but I am not sure if it requires system calls or involvement of the OS kernel to take over the CPU to perform something.
Thanks.

For interprocess communication I am pretty sure you cannot avoid system calls.
For interthread communication I cannot give you a definitive answer, but my educated guess would be "yes-and-no". You see, you can communicate between threads using thread-safe queues, and the only thing that a thread-safe queue needs in order to work is a lock. If a lock is unavailable at the moment that a thread wants to obtain it, then of course the system must be involved in order to put the thread in a waiting mode. But if the lock is available to obtain, then the thread should be able to proceed without the need for any system call.
That's what I would guess, and I would be quite disappointed to find out that things do not actually work this way, because that would mean that code which I have up until now been considering pretty innocent in fact has a tremendous additional hidden overhead.
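The thread-safe queue described above can be sketched as follows, using Python's standard queue and threading modules. The function name run_pipeline and the None sentinel are choices made for this example. Whether a given put() or get() enters the kernel depends on the runtime and on contention; on Linux, locks are typically built on futexes, whose uncontended path stays in user space.

```python
import queue
import threading


def run_pipeline(items):
    """Pass items from a producer (this thread) to a consumer thread."""
    q = queue.Queue()  # internally protected by a lock
    results = []

    def consumer():
        while True:
            item = q.get()      # blocks only when the queue is empty
            if item is None:    # sentinel: producer is done
                break
            results.append(item * 2)

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        q.put(item)
    q.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```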

Yes, every IPC mechanism is set up by some system calls (see syscalls(2)).
The IPC may have been set up by a previous program (e.g. the program that ran in the same process before execve). For example, when running a pipeline like ls | ./yourprog, it is the shell that called pipe(2), not yourprog.
Since threads in the same process (by definition) share a common address space, they can communicate using shared data. However, they often need a syscall for synchronization (e.g. with mutexes; see futex(7)), because you want to avoid spinlocks (i.e. wasting CPU power while waiting). In practice you should use pthreads(7).
In practice you cannot use shared memory (see shm_overview(7)) without synchronization (e.g. with semaphores; see sem_overview(7)). Note that cache coherence is tricky and sometimes makes the memory model non-intuitive (and processor-specific).
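To make the point about synchronization concrete, here is a small sketch in which several processes update one integer in shared memory under a lock. It uses Python's multiprocessing module rather than raw shm_open and sem_open calls, and it explicitly requests the "fork" start method, which is an assumption that the code runs on a POSIX system. The names add_many and demo are made up for this example.

```python
import multiprocessing as mp


def add_many(counter, n):
    """Increment the shared counter n times, holding its lock each time."""
    for _ in range(n):
        with counter.get_lock():   # serializes the read-modify-write
            counter.value += 1


def demo(workers=4, n=1000):
    ctx = mp.get_context("fork")   # assumption: POSIX system with fork()
    counter = ctx.Value("i", 0)    # an int in shared memory plus a lock
    procs = [ctx.Process(target=add_many, args=(counter, n))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(demo())
```

Without the lock, the increments from different processes could interleave and some updates would be lost.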

At least you do not need a system call for each read or write to shared memory. Setting up shared memory certainly involves system calls, and synchronizing threads or processes often does too.
You could use flags in shared memory for synchronization, but note that reads and writes of flags may not be atomic.
(For example, you could initialize a location in shared memory to 0 and have one process poll it until it becomes non-zero, while the other process sets it to non-zero when it is ready for something.)

Is it feasible to implement Linux concurrency primitives that give better isolation than threads but comparable performance?

Consider the following application: a web search server that, on startup, creates a large in-memory index of web pages based on data read from disk. Once initialized, the in-memory index cannot be modified, and multiple threads are started to serve user queries. Assume the server is compiled to native code and uses OS threads.
Now, the threading model gives no isolation between threads. A buggy thread, or any non-thread-safe code, can corrupt the index or corrupt memory that was allocated by and logically belongs to some other thread. Such problems are difficult to detect and debug.
Theoretically, Linux allows better isolation to be enforced. Once the index is initialized, the memory it occupies can be marked read-only. Threads can be replaced with processes that share the index (shared memory) but otherwise have separate heaps and cannot corrupt each other. Illegal operations are automatically detected by the hardware and the operating system. No mutexes or other synchronization primitives are needed. Memory-related data races are completely eliminated.
Is such a model feasible in practice? Are you aware of any real-life applications that do such things? Or are there fundamental difficulties that make such a model impractical? Do you think such an approach would introduce a performance overhead compared to traditional threads? Theoretically, the memory used is the same, but are there implementation-related issues that would make things slower?

The obvious solution is to not use threads at all. Use separate processes. Since each process has much in common with code and readonly structures, making the readonly data shared is trivial: format it as needed for in-memory use within a file and map the file to memory.
Using this scheme, only the variable per-process data would be independent. The code would be shared and statically initialized data would be shared until written. If a process croaks, there is zero impact on other processes. No concurrency issues at all.
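A minimal sketch of the "write the index to a file and map it" idea, using Python's mmap module. The file name index.bin and the helper names are assumptions for illustration. Each worker process would call something like open_readonly on the same file, and the kernel would back all of those mappings with the same physical pages.

```python
import mmap
import os
import tempfile


def build_index_file(path, data: bytes):
    """Write the index once, already in its in-memory format."""
    with open(path, "wb") as f:
        f.write(data)


def open_readonly(path):
    """Map the whole file read-only; the mapping outlives the file object."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "index.bin")  # hypothetical file name
        build_index_file(path, b"hello index")
        view = open_readonly(path)
        try:
            content = bytes(view[:5])
            try:
                view[0] = 0        # any write attempt is rejected
                writable = True
            except TypeError:
                writable = False
            return content, writable
        finally:
            view.close()


if __name__ == "__main__":
    print(demo())
```

In Python the write is refused by the mmap object itself with a TypeError. In C, writing through a mapping created with PROT_READ would instead fault and deliver SIGSEGV, which is the hardware-enforced detection the question asks about.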

You can use mprotect() to make your index read-only. On a 64-bit system you can map the local memory for each thread at a random address (see this Wikipedia article on address space randomization) which makes the odds of memory corruption from one thread touching another astronomically small (and of course any corruption that misses mapped memory altogether will cause a segfault). Obviously you'll need to have different heaps for each thread.

I think you might find memcached interesting. Also, you can create a shared memory segment, open it read-only, and then create your threads. This should not cause much performance degradation.

Concurrency: Processes vs Threads

What are the main advantages of using a model for concurrency based on processes over one based on threads, and in what contexts is the latter appropriate?

Fault-tolerance and scalability are the main advantages of using Processes vs. Threads.
A system that relies on shared memory or some other kind of technology that is only available when using threads, will be useless when you want to run the system on multiple machines. Sooner or later you will need to communicate between different processes.
When using processes you are forced to deal with communication via messages, for example, this is the way Erlang handles communication. Data is not shared, so there is no risk of data corruption.
Another advantage of processes is that they can crash and you can feel relatively safe in the knowledge that you can just restart them (even across network hosts). However, if a thread crashes, it may crash the entire process, which may bring down your entire application. To illustrate: If an Erlang process crashes, you will only lose that phone call, or that webrequest, etc. Not the whole application.
Having said all this, OS processes also have many drawbacks that can make them harder to use, like the fact that spawning a new process is comparatively expensive. However, Erlang has its own notion of processes, which are extremely lightweight.
With that said, this discussion is really a topic of research. If you want to get into more of the details, you can give Joe Armstrong's paper on fault-tolerant systems a read; it explains a lot about Erlang and the philosophy that drives it.
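The "let it crash and restart" property of OS processes can be shown with a small sketch. It uses Python's subprocess module; the child programs and the names run_child and supervise are invented for this example, and a real supervisor would add restart limits and logging. The first child kills itself with SIGSEGV to simulate a crash, and the parent carries on and starts a replacement.

```python
import signal
import subprocess
import sys

# Hypothetical child programs: one simulates a crash, one works normally.
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
HEALTHY_CHILD = "print('ok')"


def run_child(code):
    """Run `code` in a separate Python process; return (returncode, stdout)."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def supervise():
    first = run_child(CRASHING_CHILD)   # the child dies from a signal
    second = run_child(HEALTHY_CHILD)   # the parent is unaffected
    return first, second


if __name__ == "__main__":
    print(supervise())
```

On POSIX systems a negative return code means the child was terminated by that signal number.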

The disadvantage of using a process-based model is that it will be slower. You will have to copy data between the concurrent parts of your program.
The disadvantage of using a thread-based model is that you will probably get it wrong. It may sound mean, but it's true-- show me code based on threads and I'll show you a bug. I've found bugs in threaded code that has run "correctly" for 10 years.
The advantages of using a process-based model are numerous. The separation forces you to think in terms of protocols and formal communication patterns, which means it's far more likely that you will get it right. Processes communicating with each other are easier to scale out across multiple machines. Multiple concurrent processes allow one process to crash without necessarily crashing the others.
The advantage of using a thread-based model is that it is fast.
It may be obvious which of the two I prefer, but in case it isn't: processes, every day of the week and twice on Sunday. Threads are too hard: I haven't ever met anybody who could write correct multi-threaded code; those that claim to be able to usually don't know enough about the space yet.

In this case Processes are more independent of each other, while Threads share some resources, e.g. memory. But in the general case Threads are more lightweight than Processes.
Erlang Processes are not the same thing as OS Processes. Erlang Processes are very lightweight, and Erlang can have many Erlang Processes within the same OS Thread. See "Technically why is processes in Erlang more efficient than OS threads?"

First and foremost, processes differ from threads mostly in the way their memory is handled:
Process = n*Thread + memory region (n>=1)
Processes have their own isolated memory.
Processes can have multiple threads.
Processes are isolated from each other on the operating system level.
Threads share their memory with their peers in the process.
(This is often undesirable. There are libraries and methods out there to remedy this, but that is usually an artificial layer over operating system threads.)
The memory thing is the most important discerning factor, as it has certain implications:
Exchanging data between processes is slower than between threads. Breaking the process isolation always requires some involvement of kernel calls and memory remapping.
Threads are more lightweight than processes. The operating system has to allocate resources and do memory management for each process.
Using processes gives you memory isolation and synchronization. Common problems with access to memory shared between threads do not concern you. Since you have to make a special effort to share data between processes, you will most likely sync automatically with that.
Using processes gives you good (or ultimate) encapsulation. Since inter process communication needs special effort, you will be forced to define a clean interface. It is a good idea to break certain parts of your application out of the main executable. Maybe you can split dependencies like that.
e.g. Process_RobotAi <-> Process_RobotControl
The AI will have vastly different dependencies compared to the control component. The interface might be simple: Process_RobotAI --DriveXY--> Process_RobotControl.
Maybe you change the robot platform. You only have to implement a new RobotControl executable with that simple interface. You don't have to touch or even recompile anything in your AI component.
It will also, for the same reasons, speed up compilation in most cases.
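A sketch of what such a narrow interface could look like on the wire. The DriveXY name comes from the example above; the field names x and y, the encoding as two little-endian doubles, and the encode and decode methods are assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

DRIVE_XY = struct.Struct("<dd")  # assumed encoding: two little-endian doubles


@dataclass(frozen=True)
class DriveXY:
    """The only message Process_RobotAI sends to Process_RobotControl."""
    x: float
    y: float

    def encode(self) -> bytes:
        return DRIVE_XY.pack(self.x, self.y)

    @classmethod
    def decode(cls, data: bytes) -> "DriveXY":
        return cls(*DRIVE_XY.unpack(data))


if __name__ == "__main__":
    wire = DriveXY(1.5, -2.0).encode()   # what the AI side would send
    print(DriveXY.decode(wire))          # what the control side would see
```

As long as the byte layout stays the same, either side can be rewritten or replaced without touching the other.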
Edit: Just for completeness I will shamelessly add what the others have reminded me of :
A crashing process does not (necessarily) crash your whole application.
In General:
1. If you want to create something highly concurrent or synchronous, like an algorithm with n>>1 instances running in parallel and sharing data, use threads.
2. If you have a system with multiple components that neither share data or algorithms nor exchange data too often, use processes. If you use an RPC library for the inter-process communication, you get a network-distributable solution at no extra cost.
1 and 2 are the extreme, no-brainer scenarios; everything in between must be decided individually.
For a good (or awesome) example of a system that uses IPC/RPC heavily, have a look at ROS.

What are the thread limitations when working on Linux compared to processes for network/IO-bound apps?

I've heard that under Linux on a multicore server it is impossible to reach top performance with just 1 process and multiple threads, because Linux has some limitations on IO, so that 1 process with 8 threads on an 8-core server might be slower than 8 processes.
Any comments? Are there other limitations which might slow down the application?
The application is a network C++ application, serving 100s of clients, with some disk IO.
Update: I am concerned that there are some more IO-related issues other than the locking I implement myself... Aren't there any issues with doing simultaneous network/disk IO in several threads?

Drawbacks of Threads
Threads:
Serialize on memory operations. That is, the kernel, and in turn the MMU, must service operations such as mmap() that perform page allocations.
Share the same file descriptor table. There is locking involved in making changes to and performing lookups in this table, which stores things like file offsets and other flags. Every system call that uses this table, such as open(), accept(), or fcntl(), must lock it to translate the fd to an internal file handle, and again when making changes.
Share some scheduling attributes. Processes are constantly evaluated to determine the load they're putting on the system, and scheduled accordingly. Lots of threads implies a higher CPU load, which the scheduler typically dislikes, and it will increase the response time on events for that process (such as reading incoming data on a socket).
May share some writable memory. Any memory being written to by multiple threads (especially slow if it requires fancy locking), will generate all kinds of cache contention and convoying issues. For example heap operations such as malloc() and free() operate on a global data structure (that can to some degree be worked around). There are other global structures also.
Share credentials, this might be an issue for service-type processes.
Share signal handling, these will interrupt the entire process while they're handled.
Processes or Threads?
If you want to make debugging easier, use threads.
If you are on Windows, use threads. (Processes are extremely heavyweight in Windows).
If stability is a huge concern, try to use processes. (One SIGSEGV/PIPE is all it takes...).
If threads aren't available, use processes. (Not so common now, but it did happen).
If your threads share resources that can't be used from multiple processes, use threads. (Or provide an IPC mechanism to allow communicating with the "owner" thread of the resource).
If you use resources that are only available on a one-per-process basis (and you need one per context), obviously use processes.
If your processing contexts share absolutely nothing (such as a socket server that spawns and forgets connections as it accept()s them), and CPU is a bottleneck, use processes and single-threaded runtimes (which are devoid of all kinds of intense locking such as on the heap and other places).
One of the biggest differences between threads and processes is this: Threads use software constructs to protect data structures, processes use hardware (which is significantly faster).
Links
pthreads(7)
About Processes and Threads (MSDN)
Threads vs. Processes

It really should make no difference; the choice is probably more about design.
A multi process app may have to do less locking but may use more memory. Sharing data between processes may be harder.
On the other hand multi process can be more robust. You can call exit() and quit the child safely mostly without affecting others.
It depends how dependent the clients are. I usually recommend the simplest solution.
