We are still in the design phase of our project, but we are thinking of having three separate processes on an embedded Linux kernel. One of the processes will be a communications module which handles all communications to and from the device through various mediums.
The other two processes will need to be able to send/receive messages through the communication process. I am trying to evaluate the IPC techniques that Linux provides; the messages the other processes will be sending will vary in size, from debug logs to streaming media at a rate of ~5 Mbit/s. Also, the media could be streaming in and out simultaneously.
Which IPC technique would you suggest for this application?
http://en.wikipedia.org/wiki/Inter-process_communication
Processor is running at around 400-500 MHz if that changes anything.
Does not need to be cross-platform, Linux only is fine.
Implementation in C or C++ is required.
When selecting your IPC you should consider causes for performance differences including transfer buffer sizes, data transfer mechanisms, memory allocation schemes, locking mechanism implementations, and even code complexity.
Of the available IPC mechanisms, the choice for performance often comes down to Unix domain sockets or named pipes (FIFOs). I read a paper on Performance Analysis of Various Mechanisms for Inter-process Communication that indicates Unix domain sockets for IPC may provide the best performance. I have seen conflicting results elsewhere which indicate pipes may be better.
When sending small amounts of data, I prefer named pipes (FIFOs) for their simplicity. This requires a pair of named pipes for bi-directional communication. Unix domain sockets take a bit more overhead to set up (socket creation, initialization and connection), but are more flexible and may offer better performance (higher throughput).
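To give an idea of how little code a one-way FIFO needs, here is a minimal sketch. The helper name fifo_roundtrip, the /tmp scratch directory and the FIFO name to_comms are all assumptions made up for the illustration; the forked child merely stands in for a second process.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a forked child writes msg into a FIFO and the
 * parent reads it back. Returns the number of bytes read, or -1. */
ssize_t fifo_roundtrip(const char *msg, char *out, size_t cap)
{
    char dir[] = "/tmp/fifo-demo-XXXXXX";    /* assumed scratch location */
    char path[64];
    ssize_t n = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/to_comms", dir);
    if (mkfifo(path, 0600) == -1) {
        rmdir(dir);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {                          /* child: the sending process */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd == -1 || write(wfd, msg, strlen(msg)) == -1)
            _exit(1);
        close(wfd);
        _exit(0);
    }
    if (pid > 0) {                           /* parent: the receiving process */
        int rfd = open(path, O_RDONLY);      /* blocks until a writer opens */
        if (rfd != -1) {
            n = read(rfd, out, cap);
            close(rfd);
        }
        waitpid(pid, NULL, 0);
    }
    unlink(path);
    rmdir(dir);
    return n;
}
```

For traffic in both directions you would create a second FIFO and swap the roles, which is the "pair of named pipes" mentioned above.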
You may need to run some benchmarks for your specific application/environment to determine what will work best for you. From the description provided, it sounds like Unix domain sockets may be the best fit.
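As a starting point for such a benchmark, here is a rough sketch of a round trip over a Unix domain socket. The function name uds_roundtrip is hypothetical, the forked child only stands in for the communication process, and SOCK_SEQPACKET is my own choice (it preserves message boundaries, which seems handy for variable-sized messages); none of that comes from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sends msg to a forked "communication process"
 * over a Unix domain socket pair and reads back its echo.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t uds_roundtrip(const char *msg, char *reply, size_t cap)
{
    int sv[2];
    /* SOCK_SEQPACKET keeps message boundaries; SOCK_STREAM would not. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: stands in for the comms process */
        char buf[4096];
        close(sv[0]);
        ssize_t got = recv(sv[1], buf, sizeof buf, 0);
        if (got > 0)
            send(sv[1], buf, (size_t)got, 0);   /* echo the message back */
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                        /* parent: stands in for a client process */
    ssize_t n = -1;
    if (send(sv[0], msg, strlen(msg), 0) != -1)
        n = recv(sv[0], reply, cap, 0);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

Unrelated processes cannot share a socketpair(); they would instead use socket(), bind() and connect() with a filesystem path, but the send/recv part stays the same.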
Beej's Guide to Unix IPC is good for getting started with Linux/Unix IPC.
I would go for Unix Domain Sockets: less overhead than IP sockets (i.e. no inter-machine comms) but same convenience otherwise.
Can't believe nobody has mentioned dbus.
http://www.freedesktop.org/wiki/Software/dbus
http://en.wikipedia.org/wiki/D-Bus
Might be a bit over the top if your application is architecturally simple, in which case - in a controlled embedded environment where performance is crucial - you can't beat shared memory.
If performance really becomes a problem you can use shared memory - but it's a lot more complicated than the other methods - you'll need a signalling mechanism to signal that data is ready (semaphore etc) as well as locks to prevent concurrent access to structures while they're being modified.
The upside is that you can transfer a lot of data without having to copy it in memory, which will definitely improve performance in some cases.
Perhaps there are usable libraries which provide higher level primitives via shared memory.
Shared memory is generally obtained by mmapping the same file with MAP_SHARED (the file can live on a tmpfs if you don't want it persisted); a lot of apps also use System V shared memory (IMHO for stupid historical reasons; it's a much less nice interface to the same thing).
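A minimal sketch of the mmap approach, under some loud assumptions: the names map_shared and shm_demo, the 4096-byte region and the /tmp path are invented for the example, and waiting for the child to exit is only a crude stand-in for a real "data ready" signal (a semaphore or similar).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE 4096   /* assumed size of the shared region */

/* Maps the file at path read/write and shared; returns NULL on failure. */
static char *map_shared(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return NULL;
    void *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                             /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical demo: a child process writes through its own mapping of
 * the file, the parent reads the same bytes through a separate mapping.
 * Returns 0 on success, -1 on failure. */
int shm_demo(char *out, size_t cap)
{
    char path[] = "/tmp/shm-demo-XXXXXX";  /* a tmpfs path would avoid disk I/O */
    int rc = -1;
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, REGION_SIZE) == 0) {
        pid_t pid = fork();
        if (pid == 0) {                    /* child: producer */
            char *w = map_shared(path);
            if (w == NULL)
                _exit(1);
            strcpy(w, "frame 0001");
            _exit(0);
        }
        if (pid > 0) {                     /* parent: consumer */
            int status;
            char *r = map_shared(path);
            waitpid(pid, &status, 0);      /* crude "data ready" signal */
            if (r != NULL && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
                strncpy(out, r, cap - 1);
                out[cap - 1] = '\0';
                rc = 0;
            }
            if (r != NULL)
                munmap(r, REGION_SIZE);
        }
    }
    close(fd);
    unlink(path);
    return rc;
}
```

Note that nothing is copied through the kernel here: the consumer sees the producer's bytes directly, which is where the performance benefit comes from.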
As of this writing (November 2014), Kdbus and Binder have left the staging branch of the Linux kernel. There is no guarantee at this point that either will make it in, but the outlook is somewhat positive for both. Binder is a lightweight IPC mechanism used in Android; Kdbus is a D-Bus-like IPC mechanism in the kernel which reduces context switches, thus greatly speeding up messaging.
There is also "Transparent Inter-Process Communication" or TIPC, which is robust and useful for clustering and multi-node setups; http://tipc.sourceforge.net/
Unix domain sockets will address most of your IPC requirements. You don't really need a dedicated communication process in this case, since the kernel provides this IPC facility. Also, look at POSIX message queues, which in my opinion are one of the most under-utilized IPC mechanisms in Linux but come in very handy in many cases where n:1 communication is needed.
In order to develop a highly network-intensive server application on Linux, what sort of architecture is preferred? The idea is that this app would typically run on machines with multiple cores (either virtual or physical). Considering that performance is the key criterion, is it better to go for a multi-threaded application or one with a multi-process design? I do know that sharing resources and synchronizing access to such resources from multiple processes is a lot of programming overhead, but as mentioned earlier overall performance is the key requirement and so we can ignore those things. The programming language would be C/C++.
I have heard that even multi-threaded applications (single process) can take advantage of multiple cores and run each thread on a different core independently (as long as there are no sync issues), and that this scheduling is done by the kernel. If so, is there not much difference in performance between multi-threaded applications and multi-process applications? Nginx uses a multi-process architecture and is really quick, but can one get the same performance with multi-threaded applications?
Thanks.
Processes and threads on Linux are very similar to each other - the main difference is that threads share the whole virtual address space, and certain things like signal handling differ.
This makes for cheaper context switches between threads (no need for costly MMU reloads etc.) but doesn't necessarily cause much difference in speed (especially outside of thread creation).
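To make the memory-sharing difference concrete, here is a small sketch of the process half of the comparison (the names counter and fork_isolation_demo are made up for the example). A forked child gets its own copy of the address space, so its write never reaches the parent; a thread doing the same assignment would change the parent's variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0;   /* lives in each process's private address space */

/* Hypothetical demo: the child changes counter in its own copy of the
 * address space. Returns the parent's view of counter afterwards,
 * or -1 if something went wrong. */
int fork_isolation_demo(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child process */
        counter = 42;               /* only modifies the child's copy */
        _exit(counter == 42 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return counter;                 /* parent's copy was never touched */
}
```

The child confirms it saw 42 through its exit status, while the parent still reads its original value.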
For designing a highly network-intensive application, basically the only solution is to use an evented architecture (otherwise you'll bog down the system with a huge number of processes/threads and spend more time on their management than on actually running work code), where you react to I/O on sockets and perform the appropriate operations based on which sockets exhibit activity.
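On Linux the usual building block for this is epoll. Below is a rough sketch of one pass of such a loop; handle_ready, watch and epoll_demo are hypothetical names, and the two socket pairs just stand in for client connections.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* One pass of a hypothetical event loop: waits for activity on every
 * registered descriptor and reads from whichever ones are ready.
 * Returns the total number of bytes consumed, or -1 on error. */
long handle_ready(int ep, int timeout_ms)
{
    struct epoll_event ready[16];
    long total = 0;
    int n = epoll_wait(ep, ready, 16, timeout_ms);
    if (n == -1)
        return -1;
    for (int i = 0; i < n; i++) {
        char buf[512];
        ssize_t got = read(ready[i].data.fd, buf, sizeof buf);
        if (got > 0)
            total += got;           /* a real server would dispatch buf here */
    }
    return total;
}

/* Registers fd for "readable" notifications. */
int watch(int ep, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Two fake "connections" with pending data, served by a single thread. */
long epoll_demo(void)
{
    int a[2], b[2];
    long total = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) == -1)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int ep = epoll_create1(0);
    if (ep != -1 && watch(ep, a[0]) == 0 && watch(ep, b[0]) == 0 &&
        write(a[1], "hello", 5) == 5 && write(b[1], "abc", 3) == 3)
        total = handle_ready(ep, 1000);
    if (ep != -1)
        close(ep);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return total;
}
```

A real server would call handle_ready in a loop and register each accepted socket with watch; the point is that one thread services every connection that has work pending.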
A famous writeup about the problems faced in such situations is "The C10k problem", available from http://www.kegel.com/c10k.html - it describes different I/O approaches, so despite being a bit dated, it's a very good introduction.
Be careful before jumping deeply into reactor-like designs, though - they can get unwieldy and complex, so see if you can't use a library or language that provides a nicer abstraction over it (Erlang is my personal favourite in this, languages with coroutines like Go can be useful too).
If your threads do their jobs independently of one another, then under Linux there is simply no reason not to go with multiple processes instead. Multiple processes would increase your memory usage, as each process has its own private memory space, but on the other hand sharing the memory space between independent threads is the worse decision. Context switching is usually handled better for processes than for threads, although this is somewhat architecture and code dependent. Processes are safe from being serialized by locks and mutexes. Processes are easier to manage and interact with in Linux. Here is a good document you might find interesting (http://elinux.org/images/1/1c/Ben-Yossef-GoodBadUgly.pdf).
In Xenomai's API of Posix skin, I find the following:
POSIX skin.
Clocks and timers services.
Condition variables services.
Interruptions management services.
Message queues services.
Mutex services.
Semaphores services.
Shared memory services.
Signals services.
Threads management services.
Thread cancellation.
Threads scheduling services.
Thread creation attributes.
Thread-specific data.
I can't see anything regarding file handling or socket programming, so I am guessing that perhaps file handling and sockets are not meant to be dealt with in real time? Is that guess wrong?
Please guide.
Xenomai and its origin, RTAI, both take control of your scheduler, handling the Linux kernel itself as a non-real-time thread.
They provide many services, most of which, as you can see, are related to threads and synchronization and do NOT call the Linux API (in kernel space) or system calls (in user space). As you know, real-time is all about "guaranteeing deadlines", and calling Linux violates that (because Linux doesn't guarantee anything).
Since drivers are also important in real-time systems, they have implemented the real-time driver model, or RTDM, which helps with both implementing and using device drivers in a real-time context.
File handling in the kernel is strongly frowned upon. If you are talking about user-space real-time applications, then you have access to any driver that is implemented in RTDM. If you don't find one for file handling or sockets, then no, you can't use them. Note that even a printf uses Linux system calls and is forbidden.
Note that if you do use them, nothing breaks, you just lose your real-time-ness! I personally do use files for logging, but only call them in case of an error that means real-time is already ruined anyway.
I don't know about Xenomai, but at least in RTAI, if you call a Linux system call, then you get a warning like "RTAI: LXRT changed mode: syscall ..." in your kernel logs.
Real-time is a property of the ENTIRE SYSTEM. To achieve real-time behaviour, all of the system's components (including hardware, operating system, drivers, libraries, and applications) should be designed with the requirements of real-time systems in mind. Components such as an RTOS can be used to build a real-time system, but their usage doesn't automatically mean that the final system will be a real-time system. In fact, if even one component of your system doesn't meet the requirements of real-time systems, your entire system won't be real-time!
Real-time systems usually have resources significantly exceeding the average requirements of the real-time tasks. Unconsumed resources can be used for performing useful but non-critical background tasks, such as logging, monitoring of the system state, statistics collection and analysis, etc. Applications that perform these tasks can be designed as non-real-time components which run atop the real-time components. This design is safe if you are sure that all components participating in real-time tasks meet the real-time requirements. Hence the direct answer to your question:
It depends completely on the application. In general, all code that is not used in handling real-time tasks CAN BE written as non-real-time. All code which is used in handling real-time tasks MUST BE written as real-time.
What Xenomai does is isolate the non-real-time Linux, and its activities used for handling non-real-time tasks, in a special container, which runs atop the RTOS kernel and in parallel with RTOS-based real-time tasks. To build a real-time system on top of Xenomai, your application should rely only on the Xenomai API and on other libraries and APIs which are known and proven to be real-time. All background activities which are useful but completely uncritical can be written as ordinary Linux applications.
Systems and services such as storage and networking are usually not used in real-time tasks, because the commonly used hardware is very non-deterministic and thus doesn't fit well into the real-time concept. It is hard to say a priori how much time it will take to send five packets over the network or write a file to the HDD. Due to this, real-time interfaces for such systems are not commonplace. But again, the application dictates what real-time services it needs. I can imagine real-time tasks which involve storage and network actions. For such tasks the designer is forced to find system components which provide real-time storage and network services. As you can see, Xenomai is not a candidate.
What are the disadvantages of RPC with respect to message passing?
Are you talking about RPC vs Messaging? As in (typically) asynchronous messaging? If that's what you're talking about, then Messaging tends to be more robust at the cost of complexity and extra infrastructure.
The simplest example is if you have a scenario where you RPC->RPC->RPC, you end up having a call stack that's 3 processes/machines deep. Any one of those processes/machines could fail during processing, and the entire stack unwinds.
If you were messaging, the actual connectivity between the processes is much less. You hand the message off, and you're on your way. Now if one of the processes fails, there's a good chance of it being restarted where it left off, since, typically, the message is still sitting on a queue somewhere waiting for a new process to fetch it. The overall time may be longer, but it's a much more robust system.
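The "message is still sitting on a queue" idea can be illustrated with a toy in-memory queue. This is purely a sketch: struct queue, q_put, q_peek, q_ack and worker_step are hypothetical names, and a real messaging system would persist the queue outside the worker process. The key point is that a message is only removed once the worker acknowledges it.

```c
#include <stddef.h>

#define QCAP 8   /* assumed capacity, for illustration only */

struct queue {
    const char *msgs[QCAP];
    size_t head, tail;      /* msgs[head..tail) are still pending */
};

/* Appends a message; returns -1 when the queue is full. */
int q_put(struct queue *q, const char *m)
{
    if (q->tail == QCAP)
        return -1;
    q->msgs[q->tail++] = m;
    return 0;
}

/* Looks at the oldest pending message without removing it. */
const char *q_peek(const struct queue *q)
{
    return q->head < q->tail ? q->msgs[q->head] : NULL;
}

/* Removes the oldest message; called only after it was fully processed. */
void q_ack(struct queue *q)
{
    if (q->head < q->tail)
        q->head++;
}

/* One worker step. crash != 0 simulates the worker dying before it can
 * acknowledge. Returns 1 if a message was processed and acknowledged. */
int worker_step(struct queue *q, int crash)
{
    const char *m = q_peek(q);
    if (m == NULL)
        return 0;
    if (crash)
        return 0;           /* died mid-processing: message stays queued */
    q_ack(q);
    return 1;
}
```

If the first worker crashes, q_peek still returns the same message, so a restarted worker picks it up; with a chain of RPC calls the equivalent failure would unwind the whole call stack.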
Mind it's no panacea, there are a lot of pitfalls with an asynchronous architecture, but this robustness is a prime distinction between RPC and Messaging systems.
As a general rule, RPC provides a higher level of abstraction than some other means of interprocess communication. This makes it, perhaps, easier to use than lower level primitives. For this abstraction you may pay some penalty in performance due to marshaling/unmarshaling and may have to deal with added complexity in configuration for simple scenarios.
You might be interested in this thesis (pdf) by Jackie Silcock which discusses differences between message passing, RPC, and distributed shared memory with respect to several different measures of performance and implementation. You can also read one of the papers based on the thesis: Message Passing, Remote Procedure Calls and Distributed Shared Memory as Communication Paradigms for Distributed Systems (pdf)
The idea of automatic memory management has gained big support with new programming languages. I'm interested in whether concepts exist for automatic management of other resources like files, network sockets etc.?
For single threaded applications, the pattern of a resource being available for the extent of a block of code, with clean-up at the end, exists in several languages. Examples are the use of RAII in C++, or with-open-file in Common Lisp (and equivalent in newer Lisp-influenced languages - the same in Dylan, C#, Python and in Ruby you can pass a block to a file object).
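Even plain C can approximate this block-scoped pattern if you accept a compiler extension. The sketch below uses the cleanup variable attribute supported by GCC and Clang (it is not standard C); count_lines, close_file and the open_handles counter are hypothetical names added only for the illustration.

```c
#include <stdio.h>

int open_handles = 0;   /* demo-only counter of files currently open */

/* Cleanup handler: the compiler calls it with a pointer to the variable
 * when that variable goes out of scope. */
static void close_file(FILE **fp)
{
    if (*fp != NULL) {
        fclose(*fp);
        open_handles--;
    }
}

/* Counts newline characters in the file at path, or returns -1 if it
 * cannot be opened. The file is closed on every exit path without an
 * explicit fclose() call. */
long count_lines(const char *path)
{
    /* GCC/Clang extension, not standard C */
    FILE *f __attribute__((cleanup(close_file))) = fopen(path, "r");
    if (f == NULL)
        return -1;
    open_handles++;

    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    return lines;           /* close_file(&f) runs here automatically */
}
```

This gives the same "resource lives for the extent of a block" behaviour as RAII or with-open-file, at the cost of portability.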
I'm not aware of anything better suited for the multithreaded environments where modern garbage collection shines, short of combining RAII and reference counting or auto_ptr in C++, which isn't always a trivial combination.
One important distinction between automatic management of resources and automatic memory management is that memory management can often afford to be non-deterministic and only reclaimed when the process requires it, whereas often a resource is limited at an OS level, so should be reclaimed as soon as it is no longer used. Hence the choice of smart pointers rather than garbage collection as the management implementation. There's an intermediate level of resource - GDI objects, temporary file handles, threads - where an application wants to limit the total it uses, but doesn't care so much about releasing them to other processes - these are often pooled, which gets you some way towards automatic management.
One of the reasons we can automatically manage memory allocation now is we have so much of it.
Back in the days when memory was tight you had to squeeze the most out of every byte the system had.
Other resources such as file handles and sockets are far fewer, and still need to be handled by hand (pun intended).
Consider also the .NET Compact Framework: it's not uncommon for Windows Mobile devices to have 32 MB or 64 MB of volatile memory to play with, which - when you think about it - is still "lots".
I wonder what the footprint of the .NET Compact Framework is, and how it would perform on a Nokia phone with 4 MB of volatile memory.
Anyone any ideas?
(This is a wiki answer, feel free to correct or add more detail)
So, IMHO we can afford to be slow reclaiming memory, because we're not going to run out of it in a hurry, which isn't the case with other resources.
Object persistence and caching subsystems can be considered an automatic allocation of files and resources. If you apply a caching subsystem to a network connection you don't have to care about file opening, file deleting, and so on.
Network connections can be managed automatically in a parallel computing environment (e.g. MPI), where you can set the shape of the processor interconnections programmatically. Then you just send a message from one process to another, almost ignoring the way it's implemented. Sometimes those messages are translated into sockets.
If you have a function that lets you get a page from its URL, would you consider it a sort of automatic socket management?
We've all read the benchmarks and know the facts - event-based asynchronous network servers are faster than their threaded counterparts. Think lighttpd or Zeus vs. Apache or IIS. Why is that?
I think event based vs thread based is not the question - it is a non-blocking, multiplexed I/O (selectable sockets) solution vs a thread pool solution.
In the first case you are handling all input that comes in regardless of what is using it, so there is no blocking on the reads: a single 'listener'. The single listener thread passes data to what can be worker threads of different types, rather than one for each connection. Again, there is no blocking on writing any of the data, so the data handler can just run with it separately. Because this solution is mostly I/O reads/writes it doesn't occupy much CPU time, so your application can use that time to do whatever it wants.
In a thread pool solution you have individual threads handling each connection, so they have to share time and context switch in and out, each one 'listening'. In this solution the CPU and I/O operations are in the same thread, which gets a time slice, so you end up waiting per thread on I/O operations to complete (blocking), which could traditionally be done without using CPU time.
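For what "non-blocking" and "multiplexed" mean at the system-call level, here is a small sketch (set_nonblocking and nonblocking_demo are hypothetical names, and a pipe stands in for a client socket). A read on an empty non-blocking descriptor returns immediately with EAGAIN instead of parking the thread, and poll() tells the single listener when there is something to read.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Puts fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if (a) reading the empty pipe failed immediately with EAGAIN
 * and (b) after data arrived, poll() reported the descriptor readable
 * and the read succeeded. Returns 0 otherwise, -1 on setup errors. */
int nonblocking_demo(void)
{
    int p[2];
    char c = 0;
    if (pipe(p) == -1)
        return -1;
    if (set_nonblocking(p[0]) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Nothing written yet: a blocking read would stall the thread here. */
    ssize_t n = read(p[0], &c, 1);
    int would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    ssize_t sent = write(p[1], "x", 1);          /* data "arrives" */

    struct pollfd pfd = { .fd = p[0], .events = POLLIN };
    int ready = poll(&pfd, 1, 1000);             /* multiplexing point */
    n = read(p[0], &c, 1);

    close(p[0]);
    close(p[1]);
    return would_block && sent == 1 && ready == 1 && n == 1 && c == 'x';
}
```

With many descriptors in the pollfd array, the same single thread can wait on all connections at once, which is the first case described above.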
Google for non-blocking I/O for more detail, and you can probably find some comparisons vs. thread pools too.
(if anyone can clarify these points, feel free)
Event-driven applications are not inherently faster.
From Why Events Are a Bad Idea (for High-Concurrency Servers):
We examine the claimed strengths of events over threads and show that the
weaknesses of threads are artifacts of specific threading implementations
and not inherent to the threading paradigm. As evidence, we present a
user-level thread package that scales to 100,000 threads and achieves
excellent performance in a web server.
This was in 2003. Surely the state of threading on modern OSs has improved since then.
Writing the core of an event-based server means re-inventing cooperative multitasking (Windows 3.1 style) in your code, most likely on an OS that already supports proper pre-emptive multitasking, and without the benefit of transparent context switching. This means that you have to manage state on the heap that would normally be implied by the instruction pointer or stored in a stack variable. (If your language has them, closures ease this pain significantly. Trying to do this in C is a lot less fun.)
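To show what "managing state on the heap" looks like in C, here is a sketch of a per-connection state machine for an invented protocol: one length byte followed by that many payload bytes. The names struct conn, conn_init and conn_feed are hypothetical. In a threaded server the same logic would be two blocking reads in a row, with the position in the code remembering where you are; here the enum has to remember it between events.

```c
#include <stddef.h>
#include <string.h>

/* Explicit replacement for "where was I in the code?" */
enum conn_state { WANT_LEN, WANT_BODY, DONE };

struct conn {
    enum conn_state state;
    size_t need;            /* payload length announced by the peer */
    size_t have;            /* payload bytes received so far */
    char body[256];         /* holds up to 255 bytes plus terminator */
};

void conn_init(struct conn *c)
{
    memset(c, 0, sizeof *c);
    c->state = WANT_LEN;
}

/* Called from the event loop with whatever bytes happened to arrive.
 * Returns 1 once a complete message is buffered, 0 if more is needed. */
int conn_feed(struct conn *c, const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n && c->state != DONE; i++) {
        switch (c->state) {
        case WANT_LEN:
            c->need = buf[i];
            c->have = 0;
            c->state = c->need ? WANT_BODY : DONE;
            break;
        case WANT_BODY:
            c->body[c->have++] = (char)buf[i];
            if (c->have == c->need)
                c->state = DONE;
            break;
        case DONE:
            break;
        }
    }
    if (c->state == DONE)
        c->body[c->have] = '\0';
    return c->state == DONE;
}
```

The handler must return to the event loop after every partial read, so everything it needs next time (state, need, have, body) has to live in the struct rather than on the stack.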
This also means you gain all of the caveats cooperative multitasking implies. If one of your event handlers takes a while to run for any reason, it stalls that event thread. Totally unrelated requests lag. Even lengthy CPU-intensive operations have to be sent somewhere else to avoid this. When you're talking about the core of a high-concurrency server, 'lengthy operation' is a relative term, on the order of microseconds for a server expected to handle 100,000 requests per second. I hope the virtual memory system never has to pull pages from disk for you!
Getting good performance from an event-based architecture can be tricky, especially when you consider latency and not just throughput. (Of course, there are plenty of mistakes you can make with threads as well. Concurrency is still hard.)
A couple of important questions for the author of a new server application:
How do threads perform on the platforms you intend to support today? Are they going to be your bottleneck?
If you're still stuck with a bad thread implementation: why is nobody fixing this?
It really depends on what you're doing; event-based programming is certainly tricky for nontrivial applications. A web server is really a very trivial, well-understood problem, and both event-driven and threaded models work pretty well on modern OSs.
Correctly developing more complex server applications in an event model is generally pretty tricky - threaded applications are much easier to write. This may be the deciding factor rather than performance.
It isn't really about the threads. It is about the way the threads are used to service requests. For something like lighttpd you have a single thread that services multiple connections via events. For older versions of Apache you had a process per connection, and the process woke up on incoming data, so you ended up with a very large number of processes when there were lots of requests. Now, however, with the event MPM, Apache is event based as well; see the Apache event MPM.