We have used rpcgen to create a rpc server on Linux machine (c language).
When there are many calls to our program it still results in a single
threaded request.
I see that it's common problem from 2004, there is a new rpcgen (or other genarator) that solved this problem?
Thanks,
Kobi
rpcgen will simply generate the serialization routines. Your server might be coded to have several threads. Learn more about pthreads.
You probably should not have too many threads (e.g. at most a dozen, not thousands). You could design your program to use some thread pool, or simply to have a fixed set of worker threads which are continuously handling RPC requests (with the main thread just in charge of accepting connections, etc).
Read rpc(3). You might consider not using svc_run in your server, but instead doing it your own way with threads. Beware that if you use threads, you'll need to synchronize, perhaps with mutex.
You could also consider JSONRPC, or perhaps making your C program some specialized HTTP server (e.g. using libonion) and have your clients do HTTP requests (maybe with libcurl). See also this. And you might consider a message passing architecture, perhaps with Open-MPI.
Beware sun version is being abandoned, look for tirpc
Related
We have a DLL that provides an API for a USB device we make that can appear as a USB CDC com port. We actually use a custom driver on windows for best performance along with async i/o, but we have also used serial port async file i/o in the past with reasonable success as well.
Latency is very important in this API when it is communicating with our device, so we have structured our library so that when applications make API calls to execute commands on the device, those commands turn directly into writes on the API caller's thread so that there is no waiting for a context switch. The library also maintains a listening thread which is always waiting using wait objects on an async read for new responses. These responses get parsed and inserted into thread-safe queues for the API user to read at their convenience.
So basically, we do most of our writing in the API caller's thread, and all of our reading in a listening thread. I have tried porting a version of our code over to using QSerialPort instead of native serial file i/o for Windows and OSX, but I am running into an error whenever I try to write() from the caller's thread (the QSerialPort is created in the listening thread):
QObject: Cannot create children for a parent that is in a different thread.
which seems to be due to the creation of another QObject-based WriteOverlappedCompletionNotifier for the notifiers pool used by QSerialPortPrivate::startAsyncWrite().
Is the current 5.2 version of QSerialPort limited to only doing reads and writes on the same thread? This seems very unfortunate as the underlying operating systems do not have any such thread limitations for serial port file i/o. As far as I can tell, the issue mainly has to do with the fact that all of QSerialPort's notifier classes are based on QObject.
Does anyone have a good work around to this? I might try building my own QSerialPort that uses notifiers not based on QObject to see how far that gets me. The only real advantage QObject seems to be giving here is in the destruction of the notifiers when the port closes.
Minimal Impact Solution
You're free to inspect the QSerialPort and QIODevice code and see what would need to change to make the write method(s) thread-safe for access from one thread only. The notifiers don't need to be children of the QSerialPort at all, they could be added to a list of pointers that's cleaned up upon destruction.
My guess is that perhaps no other changes are necessary to the mainline code, and only mutex protection is needed for access to error state, but you'd need to confirm that. This would have lowest impact on your code.
If you care about release integrity, you should be compiling Qt yourself anyway, and you should be having it as a part of your own source code repository, too. So none of this should be any problem at all.
On the Performance
"those commands turn directly into writes on the API caller's thread so that there is no waiting for a context switch" Modern machines are multicore and multiple threads can certainly run in parallel without any context switching. The underlying issue is, though: why bother? If you need hard-realtime guarantees, you need a hard-realtime system. Otherwise, nothing in your system should care about such minuscule latency. If you're doing this only to make the GUI feel responsive, there's really no point to such overcomplication.
A Comms Thread Approach
What I do, with plenty of success, and excellent performance, is to have the communications protocol and the communications port in the same, dedicated thread, and the users in either the GUI thread, or yet other thread(s). The communications port is generally a QIODevice, like QTcpSocket, QSerialPort, QLocalSocket, etc. Since the communications protocol object is "just" a QObject, it can also live, with the port, in the GUI thread for demostration purposes - it's designed fully asynchronously anyway, and doesn't block for anything but most trivial of computations.
The communications protocol is queuing multiple requests for execution. Even on a single-core machine, once the GUI thread is done submitting all of the requests, the further execution is all in the communications thread.
The QSerialPort implementation uses asynchronous OS APIs. There's little to no benefit to further processing those async replies on separate threads. Those operations have very low overhead and you will not gain anything measurable in your latency by trying to do so. Remember: this is not your code, but merely code that pushes bytes between buffers. Yes, the context switch overhead may be there on heavily loaded or single-core systems, but unless you can measure the difference between its presence and absence, you're fighting imaginary problems.
It is possible to use any QObject from multiple threads, of course, as long as you serialize the access to it via the event queue mutex. This is done for you whenever you use the QMetaObject::invokeMethod or signal-slot connections.
So, add a trivial wrapper around QSerialPort that exposes the write as a thread-safe method. Internally, it should use a signal-slot connection. You can call this thread-safe write from any thread. The overhead in such a call is a mutex lock and 2+n malloc/free calls, where n is the non-zero number of arguments.
In your wrapper, you can also process the readyRead signal, and emit a signal with received data. That signal can be processed by a QObject living in another thread.
Overall, if you do the measurements correctly, and if your port thread's implementation is correct, you should find no benefit whatsoever to all this complication.
If your communications protocol does heavy data processing, this should be factored out. It could go into a separate QObject that can then run on its own thread. Or, it can be simply done using dedicated functors that are executed by QtConcurrent::run.
What if you use QSerialPort to open and configure the serial port, and QSocketNotifier to monitor for read activity (and other QSocketNotifier instances for write completion and error handling, if necessary)?
QSerialPort::handle should give you the file descriptor you need. On Windows, if that function returns a Windows HANDLE, you can use _open_osfhandle to get a file descriptor.
As a follow up, shortly after this discussion I did implement my own thread-safe serial port code for POSIX systems using select() and the like and it is working well on multiple threads in conjunction with Qt and non-Qt applications alike. Basically, I have abandoned using QtSerialPort at all.
The following two diagrams are my understanding on how threads work in a event-driven web server (like Node.js + JavaScript) compared to a non-event driven web server (like IIS + C#)
From the diagram is easy to tell that on a traditional web server the number of threads used to perform 3 long running operations is larger than on a event-driven web server (3 vs 1.)
I think I got the "traditional web server" counts correct (3) but I wonder about the event-driven one (1). Here are my questions:
Is it correct to assume that only one thread was used in the event-driven scenario? That can't be correct, something must have been created to handle the I/O tasks. Right?
How did the evented server handled the I/O? Let's say that the I/O was to read from a database. I suspect that the web server had to create a thread to hand off the job of connecting to the database? Right?
If the event-driven web server indeed created threads to handle the I/O where is the gain?
A possible explanation for my confusion could be that on both scenarios, traditional and event-driven, three separate threads were indeed created to handle the I/O (not shown in the pictures) but the difference is really on the number of threads on the web server per-se, not on the I/O threads. Is that accurate?
Node may use threads for IO. The JS code runs in a single thread, but all the IO requests are running in parallel threads. If you want some JS code to run in parallel threads, use thread-a-gogo or some other packages out there which mitigate that behaviour.
Same as 1., threads are created by Node for IO operations.
You don't have to handle threading, unless you want to. Easier to develop. At least that's my point of view.
A node application can be coded to run like another web server. Typically, JS code runs in a single thread, but there are ways to make it behave differently.
Personally, I recommend threads-a-gogo (the package name isn't that revealing, but it is easy to use) if you want to experiment with threads. It's faster.
Node also supports multiple processes, you may run a completely separate process if you also want to try that out.
The best way to picture NodeJS is like a furious squirrel (i.e. your thread) running in a wheel with an infinite number of pigeons (your I/O) available to pass messages around.
I/O in node is "free". Your squirrel works to set up the connection and send the pigeon off, then can go on to do other things while the pigeon retrieves the data, only dealing with the data when the pigeon returns.
If you write bad code, you can end up having the squirrel waiting for each pigeon.
So always write non-blocking i/o code.
If you can encourage your Pigeons to promise to come back ;)
Promises and generators are probably the best approach you can take to this.
HOWEVER you can always use Node cluster to establish a master squirrel that will procreate child squirrels based on the number of CPUs the master squirrel can find to dole out the work.
Hope this helps and note the complete lack of a car analogy.
I'm following the boost-asio tutorial and don't know how to make a multi-threaded server using boost. I've compiled and tested the daytime client and daytime synchronous server and improved the communication (server asks the client for a command, processes it and then returns the result to the client). But this server can handle only one client at one time.
I would like to use boost to make a multi-threaded server. There is also daytime asynchronous server which executes
boost::asio::io_service io_service;
tcp_server server(io_service);
io_service.run();
in the main program function. The question is - is boost creating a thread for each client somewhere inside? Is this a multi-threaded solution? If not - how to make a multi-threaded server with boost? Thanks for any advice.
have a look at this tutorial. in short terms:
io_service.run() in multiple threads gives a thread pool
multiple io_services give completely separated threads
You don't need to explicitly work with threads when you want to support multiple clients. But for that you should use asynchronous calls (as opposed to synchronous, which are used in the tutorials you listed). Have a look at the asynchronous echo tcp server example, it serves multiple clients without using threads.
is boost creating a thread for each client somewhere inside?
When working with asynchronous calls, boost asio is doing these things behind the scenes. It could use threads, but it usually doesn't because there are other, preferred mechanisms for working with multiple sockets at once. For example on linux you have epoll, select and poll (in order of preference). I'm not sure what the situation is on windows, there might be other mechanisms or the preference order might be different. But in any case, boost asio takes care of this, chooses the best mechanism there is for your platform and hides it behind those asynchronous calls.
There is a lot on multithreading on the Corba server side, but I'm interested about the client side. We have a multithreaded client (Solaris, Orbix 6.3) with a Corba singleton "manager" that initialises the ORB. During runtime 'lsof' shows only one TCP connection to the Corba server, so all synchronous calls made from the client worker threads should be serialised.
Would like to change this arrangement to take advantage of parallelism: each thread to manage its own connection. I've changed the setup so that instead of a singleton each worker thread calls ORB_init(), etc.
Totally puzzled now: 'lsof' shows now 2 TCP connections but there are 6 worker threads.
Something is not right, would have expected as many TCP connections as the number of worker threads. May be that the approach is naive - does it makes sense for example to call ORB_init() per thread?
I'd need someones opinion on this. Sample code for a multithreaded client would greatly help. Again, using Orbix 6.3 on Solaris.
Kind regards,
Adrian
The management of connections is implementation specific for plain CORBA. Each vendor has its own proprietary way of configuration their behavior. If you check the RTCORBA specification, that has a standardized way to configure how connections between client and server will be used.
I don't know how Orbix works and whether it supports RTCORBA, that is something you could get from their manuals probably. I do know that TAO has a lot of support for threading at the client side. By default when multiple threads make an invocation to the same server multiple tcpip transports can be opened at the same moment.
Thank you guys for your answers. I found, as Johnny says that this is indeed implementation specific.
omniORB has for example maxGIOPConnectionPerServer - default 5. That's:
The maximum number of concurrent connections the ORB will open to a single server. If multiple threads on the client call the same server, the ORB opens additional connections to the server, up to the maximum specified by this parameter. If the maximum is reached, threads are blocked until a connection becomes free for them to use.
Unfortunately I haven't yet found out what's the equivalent (if any) for Orbix. It's definitely defaulting to 1 connection. Still googling...
Found out though that as part of Solaris -> Linux migration will be moving from Orbix to TAO in a number of months. Hoping TAO would be more friendly and customizable.
Orbix internally uses a lot of optimization routines to ensure that connections are used efficiently. Specifically, it's not going to open up multiple connections to the same server endpoint because it's able to multiplex multiple concurrent GIOP requests over the same TCP connection. CORBA deliberately hides connection management from client and server programmers.
I don't believe this is controllable through configuration. Send a support ticket to Progress Support to confirm. You might be able to force it to happen if you move away from the singleton model and initialize a different ORB for each client (each with their own unique ID), but that would be a very heavy-handed and costly solution to a problem that is a little vague. The underlying ORB is already build to optimize for concurrent requests, so I'm not sure what problem it is you're trying to solve.
In my honest opinion I don't think there is such a concept called multi threaded client for CORBA applications. Because in the server side, there is only one object that is registered with the naming service which is available for all the clients. If you look at the IOR of the object, it will be same for all the clients. So it can establishes at most only one connection to that object. It also leads to thinking that you can not get more than one remote object (which means how much ever you do look-up for the object from different clients, they all get the same reference) for any number of clients. So, in order to support mutli-threading ,the server actually has to support different thread policies. POA the server can have different thread policies. Please go through JAVA PROGRAMMING WITH CORBA for more.
I don't know how exactly Orbix works, but normally ORB initialization in done only once even for a multithreaded setup. The multithreaded (server side) ORB will start an amount of worker threads (on demand or if needed or if configured, a fixed number) to handle incomming connection. These connections are handled by a worker. This worker looks up the servant that can handle this request. Normally this (the real call to the servant) is performed in an extra thread also. But you won't see this thread with lsof. Try so use ps -eLf or top -H with thread support enabled.
EDIT:
On the client side it depends on how many object do you want to call. For each object a caller thread is possible. It is also possible to have more than one caller thread per remote object, but only if called from different threads on the client side logic. (Imagine to have multiple threads and the remote object is shared across the threads)
I know non-blocking receive is not used as much in message passing, but still some intuition tells me, it is needed. Take for example GUI event driven applications, you need some way to wait for a message in a non-blocking way, so your program can execute some computations. One of the ways to solve this is to have a special thread with message queue. Is there some use case, where you would really need non-blocking receive even if you have threads?
Threads work differently than non-blocking asynchronous operations, although you can usually achieve the same effect by having threads that does synchronous operations. However, in the end, it boils down on how to handle doing things more efficiently.
Threads are limited resources, and should be used to process long running, active operations. If you have something that is not really active doing things, but need to wait idly for some time for the result (think some I/O operation over the network like calling web services or database servers), then it is better to use the provided asynchronous alternative for it instead of wasting threads unnecessarily by putting the synchronous call on another thread.
You can have a good read on this issue here for more understanding.
One thread per connection is often not a good idea (wasted memory, not all OS are very good with huge thread counts, etc)
How do you interrupt the blocking receive call? On Linux, for example (and probably on some other POSIX OS) pthreads + signals = disaster. With a non-blocking receive you can multiplex your wait on the receiving socket and some kind of IPC socket used to communicate between your threads. Also maps to the Windows world relatively easily.
If you need to replace your regular socket with something more complex (e.g. OpenSSL) relying on the blocking behavior can get you in trouble. OpenSSL, for example, can get deadlocked on a blocking socket, because SSL protocol has sender/receive inversion scenarios where receive can not proceed before some sending is done.
My experience has been -- "when in doubt use non-blocking sockets".
With blocking IO, it's challenging on many platforms to get your application to do a best effort orderly shutdown in the face of slow, hung, or disconnected clients/services.
With non-blocking IO, you can kill the in-flight operation as soon as the system call returns, which is immediately. If your code is written with premature termination in mind - which is comparatively simple with non-blocking IO - this can allow you to clean up your saved state gracefully.
I can't think of any, but sometimes the non-blocking APIs are designed in a way that makes them easier/more intuitive to use than an explicitly multi-threaded implementation.
Here goes a real situation I have faced recently. Formerly I had a script that would run every hour, managed by crontab, but sometimes users would log to the machine and run the script manually. This had some problems, for example concurrent execution by crontab and user could cause problems, and sometimes users would log in as root - I know, bad pattern, not under my control - and run script with wrong permissions. So we decided to have the routine running as daemon, with proper permissions, and the command users were used to run would now just trigger the daemon.
So, this user executed command would basically do two things: trigger the daemon and wait for it to finish the task. But it also needed a timeout and to keep dumping daemon logs to user while waiting.
If I understand the situation you proposed, I had the case you want: I needed to keep listening from daemon while still interacting with user independently. The solution was asynchronous read.
Lucky for me, I didn't think about using threads. I probably would have thought so if I were coding in Java, but this was Python code.
My point is, that when we consider threads and messaging being perfect, the real trade-off is about writing scheduler for planning the non-blocking receive operations and writing synchronizations codefor threads with shared state (locks etc.). I would say, that both can be sometime easy and sometime hard. So an use case would be when there are many messages asynchronous messages to be received and when there is much data to be operated on based on the messages. This would be quite easy in one thread using non-blocking receive and would ask for much synchronization with many threads and shared state.... I am also thinking about some real life example, I will include it probably later.