How to set up communication between two processes? - linux

I have the following situation:
(1) A daemon that performs a privileged operation on data that is kept in memory.
(2) A multithreaded server, currently running on about 30 cores, that handles user requests.
The daemon (1) would receive queries from (2), process them one by one, and return an answer. A query never blocks on (1) and takes only a fraction of a microsecond to process, so responses are guaranteed to come back quickly unless (1) is overrun by too much load.
Essentially, I would like (1) to listen on a UNIX domain socket while (2) writes requests and reads responses. However, I would like each thread of (2) to be able to read and write concurrently. My idea is to have one UNIX socket per thread for communication between (1) and (2), and to have (1) block in epoll_wait on these sockets, processing requests one by one. Each thread on (2) would then read and write independently on its own socket.
The problem I see with this approach is that I can't easily grow the number of threads on (2) dynamically. Is there a way to accomplish this that is flexible with respect to runtime configuration? I guess one approach would be to have a large number of sockets; a thread on (2) would pick one socket at random, take a mutex on it, write a query, block waiting for a response, and release the mutex once it gets the response back from (1).
Anyone have better ideas?

I would suggest going with your own proposal and having each thread create its own socket for communicating with the daemon. Stream sockets (TCP, or equally a UNIX domain socket of type SOCK_STREAM) make it easy to add more threads dynamically:
1. The daemon listens on a particular port (or socket path), using socket(), bind() and listen(). The listening socket is initially the only descriptor in its epoll_wait set.
2. Each client thread connects to this port with connect().
3. The daemon accepts the incoming connection with accept(), which creates a new socket; the daemon adds it to its epoll_wait set with epoll_ctl().
The above procedure can be used to arbitrarily add as many sockets as you need, all with a single epoll_wait loop on the daemon side.
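The procedure above can be sketched roughly as follows, assuming a UNIX domain stream socket as in the question. The names make_listener, watch and serve_round, the socket path, and the echo reply are all placeholders invented for illustration; a real daemon would run its privileged operation where the echo is and would call serve_round in a loop.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening UNIX-domain stream socket bound to `path` (hypothetical path). */
int make_listener(const char *path)
{
    struct sockaddr_un addr;
    assert(strlen(path) < sizeof addr.sun_path);   /* caller error otherwise */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                                  /* remove a stale socket file */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 64) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Register `fd` for readability events on the epoll instance `epfd`. */
int watch(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One pass of the daemon loop; returns the number of events handled, or -1. */
int serve_round(int epfd, int listen_fd, int timeout_ms)
{
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == listen_fd) {                     /* a new client thread connected */
            int client = accept(listen_fd, NULL, NULL);
            if (client >= 0 && watch(epfd, client) < 0)
                close(client);
        } else {                                   /* request on an existing connection */
            char buf[256];
            ssize_t len = read(fd, buf, sizeof buf);
            if (len <= 0)
                close(fd);                         /* peer closed or read failed */
            else if (write(fd, buf, (size_t)len) < 0)
                close(fd);                         /* placeholder: echo instead of real work */
        }
    }
    return n;
}
```

Each client thread would simply connect() to the same path and then read and write on its own descriptor, so the number of threads can change at runtime without any configuration on the daemon side.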

Related

epoll: must I use multi-threading

I have a basic understanding of epoll: I know it can monitor multiple file descriptors and handle events on them.
My question is: can a heavy event block the server, so that I must use multithreading?
For example, the server's epoll instance is monitoring two sockets, A and B. A starts to send a lot of messages to the server, so the server starts to read them. One second later, B starts to send messages too, while A is still sending. In this case, do I need to create a thread for these reads? If I don't, does it mean that the server has no chance to get the messages from B until A finishes sending?
If you can process incoming messages fast enough (no blocking calls, no heavy computations), you don't need a separate thread. Otherwise, you would benefit from going multi-threaded.
In any case, it helps to understand what happens when you have only one thread and you can't process messages fast enough. With TCP, the receive buffer fills up and the machines sending you data will simply reduce their transmission rate. With UDP, some incoming packets will get dropped.
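One way a single-threaded loop can keep a busy socket A from starving socket B is to read only a bounded amount per readiness event and then move on. The sketch below assumes level-triggered epoll (so leftover data on A is reported again on the next epoll_wait); the function name drain_bounded and the byte budget are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read at most `budget` bytes from `fd` without blocking.
 * Returns the number of bytes consumed, or -1 on EOF or a real error. */
long drain_bounded(int fd, size_t budget)
{
    assert(fd >= 0);
    char buf[1024];
    size_t total = 0;
    while (total < budget) {
        size_t want = budget - total;
        if (want > sizeof buf)
            want = sizeof buf;
        ssize_t n = recv(fd, buf, want, MSG_DONTWAIT);
        if (n > 0) {
            total += (size_t)n;        /* a real server would parse buf here */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                     /* socket is empty for now */
        } else if (n < 0 && errno == EINTR) {
            continue;                  /* interrupted by a signal; retry */
        } else {
            return total > 0 ? (long)total : -1;   /* EOF or error */
        }
    }
    return (long)total;
}
```

With a budget per event, the loop returns to epoll_wait after each slice of A's data and picks up B's messages in between.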

Is a mutex needed on a listener socket shared between child processes?

I'm developing a server application in C++. I designed it in such a way that there is a main process responsible for maintaining child processes (workers). Workers accept() new connections and create threads to handle them individually.
Suppose I create a listener socket in the main process and each worker monitors it (using kqueue, epoll, etc.) for new connections. After researching a bit, I found some claims that a mutex is needed on the listener socket to prevent concurrent accept()s, which would lead to workers accept()ing the same connection at the same time.
Being aware of that need, I'm not sure what the best way to distribute client connections among workers is, since the result would be the same as accept()ing them in the main process and somehow sending just the new socket FD to the workers (handling of new connections becomes serialized: one accept() at a time).
My question is: is a mutex on the listening socket really needed? And am I right about its side effect, namely that accept() becomes serialized (one new connection accept()ed at a time)?
I'm concerned about this single detail because this application must scale up to thousands of new connections per second (the exact number may vary, as this application is intended to be used on networks with from 100s to 1000s of clients).
A long time ago there were operating systems that had race conditions if multiple processes performed an accept concurrently on the same socket. Apache used to have an optional accept mutex to resolve this.
This problem has long since been solved on every operating system you're likely to use and it's perfectly reasonable to use a shared socket that workers call accept on. If you want each worker to handle only one connection at a time, an idle worker can block in accept on a shared socket.
"I'm concerned about this single detail because this application must scale to up to hundreds of thousands or even millions of new connections per second. I want to avoid the work of writing two complex applications for the sole purpose of comparing both methods' performance. Also, I've no way to simulate real world simultaneous connections."
You can't have it both ways. Either you abandon such ambitious scaling plans or you accept that you will have numerous major efforts on your hands. Just simulating that kind of connection load for testing would be a major effort.
I can't answer the part of your question about how thread-safe the listen() and accept() calls are, because I would never even consider trying that. What I would do is have the main thread do the listen() and accept(), and spawn a new thread (or fork a child process) when accept() returns, passing the socket off to it.
Similarly, you could have a bunch of running threads and use a mutex to protect a variable that carries the socket notification. This is basically the same as above, but rather than creating a thread at accept time, you notify an already running thread of the socket descriptor. General pseudocode for the first approach, using fork() to create a child process per connection, might be:
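#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

void DoMyThing(int client_fd);        /* per-connection handler, defined elsewhere */

/* listen_fd is already bound and listening; reap children via SIGCHLD elsewhere. */
void accept_loop(int listen_fd)
{
    while (1)
    {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;                 /* accept failed (e.g. EINTR); try again */
        if (fork() == 0)
        {
            close(listen_fd);         /* the child never accepts */
            DoMyThing(client_fd);
            _exit(EXIT_SUCCESS);      /* keep the child out of the accept loop */
        }
        close(client_fd);             /* the parent drops its copy */
    }
}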

Multi threaded Linux Socket programming design

I am trying to write a server program which so far supports one client, and over the few days I have been developing it I concluded that I need threads. The reason is that I take input from a Wi-Fi socket, process it, and finally write it to a file. The processing is slow, so I need an input thread -> circular buffer -> output thread pattern with a producer-consumer model, which is quite common in network programming.
Now the situation becomes complicated, as I need to manage client disconnection and reconnection. I thought of using pthread_exit(), cleaning up all the semaphores, and then reinitializing them each time the single client reconnects.
My question is: is this an efficient approach, i.e. killing the threads and semaphores and recreating them every time? Are there any better solutions?
Thanks.
"My question is: is this an efficient approach, i.e. killing the threads and semaphores and recreating them every time? Are there any better solutions?"
Learn how to use non-blocking sockets and an event loop, or use a library that provides TCP sessions for you using non-blocking sockets under the hood, such as boost::asio.
Learn how to use multi-threading without polluting your code with any synchronization primitives by using message passing to communicate between threads, not shared state. The event loop library you use for non-blocking I/O should also provide means for cross-thread message passing.
Some comments and suggestions.
1-In TCP, detecting that the other side has silently disconnected is very difficult, if not impossible. A client can disconnect by sending a TCP RST or FIN segment to the server; this is the good case. Sometimes the client disconnects without notice (crash, cable disconnection, etc.).
One suggestion here is that you consider the way client and server will communicate. For example, you can use the function “select” to set a timeout for receiving a message from the client and so detect a silent client.
Additionally, depending on the programming language and operating system, you may need to handle the broken pipe signal (SIGPIPE; on Linux, with C/C++), which is raised when a server tries to send a message through a connection closed by the client.
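The two suggestions in point 1 can be sketched like this, assuming Linux and C. The helper names wait_readable and send_quietly are invented for illustration, and MSG_NOSIGNAL is only one way to deal with SIGPIPE (ignoring the signal process-wide with signal(SIGPIPE, SIG_IGN) is another).

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until `fd` is readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);   /* select() cannot watch larger descriptors */
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    return rc > 0 ? 1 : rc;
}

/* Send without raising SIGPIPE; a closed peer shows up as a return value of -1. */
ssize_t send_quietly(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}
```

A server could treat a 0 from wait_readable as "client has been silent for too long" and a -1 from send_quietly as "connection is gone", and close the descriptor in both cases.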
2-Regarding semaphores, you shouldn’t need to clean them up in any special way when a client disconnects. Applying the common good practices of locking and unlocking mutexes should be enough. Resources like file descriptors also need to be released before the thread ends, whether it ends by returning from the thread start function or by calling pthread_exit. Maybe I didn’t understand this part of the question.
3-Regarding threads: if you work with multiple threads, the optimum is to have a pool of pre-created consumer/worker threads that check the circular buffer to consume the next available connection. Creating and destroying threads is costly for the operating system.
Threads are resource consuming and you may exhaust operating system resources if you need to create 1,000 threads for example.
Another alternative is to have a single consumer thread that manages all connections (sockets) asynchronously: a) Each connection has its own state. b) The main thread goes through all connections and uses the function “select” to detect when a connection is ready for reading or writing. c) Non-blocking sockets can be used, but this is not essential, because select tells you which sockets are ready and should not block.
You can use functions select, poll, epoll.
One link about select and non-blocking sockets: Using select() for non-blocking sockets
Other link with an example: http://linux.die.net/man/2/select

UNIX socket magic. Recommended for high performance application?

I'm looking at using sendmsg() to transfer an accept()ed socket between processes. In short, I'm trying to build a simple load balancer that can deal with a large number of connections without having to buffer the stream data.
Is this a good idea when dealing with a large number (let's say hundreds) of concurrent TCP connections? If it matters, my system is Gentoo Linux.
You can share the file descriptor as per the previous answer here.
Personally, I've always implemented servers using pre-fork. The parent sets up the listening socket, spawns (pre-forks) children, and each child does a blocking accept. I used pipes for parent <-> child communication.
Until someone does a benchmark and establishes how "hard" it is to send a file descriptor, this remains speculation (someone might pop up: "Hey, sending the descriptor like that is dirt-cheap"). But here goes.
You will (likely, read above) be better off if you just use threads. You can have the following workflow:
1. Start a pool of threads that just wait around for work. Alternatively, you can just spawn a new thread when a request arrives (it's cheaper than you think).
2. Use epoll(7) to wait for traffic (new connections plus interesting traffic on existing ones).
3. When interesting traffic arrives, dispatch a "job" to one of the threads.
Now, this does circumvent the whole descriptor sending part. So what's the catch? The catch is that if one of the threads crashes, the whole process crashes. So it is up to you to benchmark and decide what's best for your server.
Personally I would do it the way I outlined it above. Another point: if the workers are children of the process doing the accept, sending the descriptor is unnecessary.
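For reference, the descriptor passing the question asks about can be sketched as below. This assumes the two processes are already connected by a UNIX domain socket; the helper names send_fd and recv_fd are invented here, and the single data byte is sent alongside the ancillary data for portability. It says nothing about how the cost compares with the threaded design.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over the UNIX-domain socket `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    assert(fd >= 0);
    char byte = 'F';                           /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                                    /* union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;              /* payload is an array of descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open connection, so the sender can close its own copy once sendmsg() has succeeded.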

Can Threads Share the same Client Socket?

I'm using TClientSocket or Indy's TIdTCPClient (depending on the project).
I have a few threads, each processing items, that sometimes need to send data over the connected client socket. (Data read from the socket is NOT used in the processing threads.)
Basically my question is...
Is this possible?
Is it "safe"?
Or should I
have a client socket per thread, or
use some kind of marshalling/critical sections?
delphi-7 indy-9
Multiple threads can read from and write to the same socket. On the server side, each call to accept() extracts the first connection from the queue of pending connections, creates a new socket with the same properties as the listening socket, and allocates a new file descriptor for it.
So the usual arrangement is one thread per accepted connection.
If you are asking whether several threads can read and write on one accepted connection, you will need locking, which costs you some of the benefit of parallelism. If you want to run a long computation in a thread and then write the result to the socket, use synchronization so that the writes happen in the correct order.
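The question is about Delphi, where a critical section would play this role, but the locking idea can be sketched in C with a POSIX mutex. The struct shared_sock type and the locked_send function are hypothetical names, and the sketch does not retry on EINTR.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>

/* A connected socket plus the lock that serializes writers (hypothetical type). */
struct shared_sock {
    int fd;
    pthread_mutex_t write_lock;
};

/* Send the whole message while holding the lock, so that messages from
 * different threads are never interleaved. Returns 0 on success, -1 on error. */
int locked_send(struct shared_sock *s, const void *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    const char *p = buf;
    int rc = 0;
    pthread_mutex_lock(&s->write_lock);
    while (len > 0) {
        ssize_t n = send(s->fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            rc = -1;                   /* give up on the first error */
            break;
        }
        p += n;                        /* a short write: send the remainder */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&s->write_lock);
    return rc;
}
```

Worker threads would call locked_send with a complete message each time; the lock is held only for the duration of the write, so the processing itself still runs in parallel.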
