Why doesn't CPU load balance when multiple processes epoll on the same socket? - multithreading

I have a high-performance server receiving incoming connections. The master process binds and listens on the TCP port, then forks itself into several workers.
The workers then use epoll to watch for incoming connection events and try to accept a connection when an event happens.
It works well, but when I count the connections each worker handles (or the CPU each worker consumes), I find it is not balanced at all.
For example:
One busy worker: handling 10k connections and consuming 20% CPU;
One idle worker: handling 300 connections and consuming 4% CPU.
My server runs on RHEL 6.5 (2.6.32 kernel).
Could anyone help me with this issue?
EDITED:
Why
After digging into some kernel code (2.6.32.x), I found why the imbalance occurs.
1 * MasterProcess: creates and binds the listen socket;
n * WorkerProcess: creates an epfd and monitors the listen socket from the master.
When a WorkerProcess calls epoll_ctl(..., listen_sock, ...), the kernel adds the watched file to an rbtree in the epoll struct (#see fs/eventpoll.c ep_insert) and adds the epoll struct to a wait queue of the listen_sock via a callback (ep_ptable_queue_proc):
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
                                 poll_table *pt)
{
    ...
    add_wait_queue(whead, &pwq->wait);
    ...
}
// whead is the wait queue of the listen socket,
// and *pt is a container of the epfd's related resources.
When a new connection comes in (SYN_RECV), the listen socket's state changes, and the kernel iterates over the wait queue to notify every epoll instance monitoring the socket via a callback provided by epoll. The callback is ep_poll_callback (#see fs/eventpoll.c), and it wakes up the process (or thread) waiting in the epoll_wait system call.
The order of the listen socket's wait queue does not change after this notification, so the processes waiting on the events are notified in a fixed order. The process that is woken up earlier gets more connections to handle than the process notified last. That causes the imbalance.
FIX
1 * MasterProcess: creates one epfd for all WorkerProcesses;
n * WorkerProcess: waits on that same epfd with epoll_wait.
In this case we have only one epoll struct in the kernel. When an event occurs, only that one epoll struct's wake-up callback will be called.
The epoll struct's wake-up callback is:
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
Now every WorkerProcess's boss thread waits in epoll_wait, and ep_poll_callback will wake up only one of them.
When a WorkerProcess is woken up by epoll, it removes itself from the wait queue, and re-adds itself to the tail of the wait queue the next time epoll_wait is called. So the WorkerProcesses are woken up one by one.
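Below is a minimal sketch of this fix, assuming listen_fd is a non-blocking socket that is already bound and listening; NUM_WORKERS, run_master, and worker_loop are illustrative names only, not part of my real server:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_WORKERS 4

static void worker_loop(int epfd, int listen_fd)
{
    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);   /* all workers block on the same epfd */
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                /* may return -1/EAGAIN if another worker won the race */
                int conn = accept(listen_fd, NULL, NULL);
                if (conn >= 0) {
                    /* this worker now owns the connection */
                }
            }
        }
    }
}

void run_master(int listen_fd)
{
    int epfd = epoll_create1(0);    /* one epoll struct shared by every worker */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (fork() == 0) {          /* child inherits epfd and listen_fd */
            worker_loop(epfd, listen_fd);
            _exit(0);
        }
    }
    /* master: wait for the workers, handle restarts, etc. (omitted) */
}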

There is no problem.
The purpose of having multiple workers is for at least one to be available, even if the others are busy.
But when multiple workers are waiting, it does not matter which one gets the event. A process/thread does not wear out simply because it's running for a longer time.
There is no balance because the kernel does not care. Neither should you.

Related

How to use poll/epoll with GLFW events polling

I need to use epoll on top of the GLFW event polling. My first try was to add the X11 socket descriptor to the epoll set and wait on events. If the descriptor becomes readable, I use glfwPollEvents() to drain the X11 events.
But, to my surprise, the X11 file descriptor is readable all the time, which creates a busy loop.
The question is: how do I use GLFW event polling with some outer event polling interface?
Since glfwPollEvents is non-blocking, and poll and epoll_wait are non-blocking when their timeout is set to zero, why not call them in sequence? GLFW will handle the X11 events, while poll or epoll_wait will handle the IO events,
e.g.
while (active) {
    glfwPollEvents();       // handle queued display events only
    epoll_wait(..., 0);     // handle io events
    nanosleep(...);         // instead of sleep, use the timeout functionality
                            // of the addressed functions (e.g. glfwWaitEventsTimeout)
}
If you have to populate an event queue with your own event structures, build them in the event handlers and push them onto the queue. If you call glfw and poll in sequence, there will be no danger of any asynchronicity.

How to gracefully exit from a blocking kernel_recvmsg()?

I have the following running in a kernel thread:
size = kernel_recvmsg(kthread->sock, &msg, &iov, 1, bufsize, MSG_DONTWAIT);
I receive UDP packets very frequently (every 10 ms, say). In order to process them as fast as possible, I don't have any sleep in the loop where the kernel_recvmsg() call is. Unfortunately, I observe very high CPU consumption while UDP packets are not coming in.
If I make the socket blocking (remove MSG_DONTWAIT), is there some indirect way to unblock and exit from kernel_recvmsg()?
What would happen if I do an unexpected sock_release()? Would kernel_recvmsg() unblock and return some error that I could handle accordingly (exit from the loop and the thread)?
It's easy to unblock a UDP wait from another thread: sendto() it a datagram on the local stack to unblock it (an empty one would be fine). You can use an 'abort' boolean flag to tell the post-kernel_recvmsg code to return or perform some other unusual action.
That said, there are few comms stacks on mature OSes that do not unblock a socket wait if the socket is closed from another thread. I've heard on SO that this is unreliable, but I've never seen ANY case where an error return or exception is not immediately raised if the underlying socket is freed.
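For illustration, a user-space sketch of that sendto() wake-up; the port number 5000 and the loopback address are assumptions, and the receive loop is assumed to check an 'abort' flag after waking:

#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void wake_blocked_receiver(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5000);                     /* assumed listening port */
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* local stack only */

    /* a zero-length payload is enough; the receiver only needs to wake up */
    sendto(s, "", 0, 0, (struct sockaddr *)&dst, sizeof(dst));
    close(s);
}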

Send Recv on a socket from Multiple threads

I have a process ProcessA that starts 2 threads, ThreadA and ThreadB. Both threads send and receive data from ProcessB using the same socket descriptor.
So essentially:
int s;

void thread_fnA(void*)
{
    while (1) {
        sendto(s);
        recvfrom(s);
    }
}

void thread_fnB(void*)
{
    while (1) {
        sendto(s);
        recvfrom(s);
    }
}

int main()
{
    s = socket(AF_UNIX, SOCK_DGRAM, 0);
    bind(s);
    dispatch_thread(A);
    dispatch_thread(B);
}
Is there a possibility that a message destined for thread B could be received by thread A?
So the sequence of events would be:
Thread A prepares a message and calls sendto();
Thread B starts executing, prepares a message, and calls sendto();
Thread B calls recvfrom() simultaneously with Thread A.
However, the message content expected by the two threads is different.
Can the messages be swapped, i.e. can a message destined for ThreadB be received by ThreadA?
Should the send and receive be protected by some locks (mutex)?
I would suggest another design, in which you have a single thread doing the sending and receiving, plus message queues for the other threads.
When the send/receive thread receives a message, it checks what kind of message it is and adds it to the (protected) queue of the correct processing thread. The processing threads (your current threads A and B) get the messages from their respective message queues and process them in any way they please. If thread A or B wants to send a message, it passes it to the send/receive thread through another queue, which the send/receive thread polls.
Alternatively, the processing threads (A and B in your example) could send directly over the socket, or each could have a different socket used only for sending.
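A minimal sketch of the queue part of that design; the struct msg layout, the queue helpers, and the names below are assumptions for illustration, not a real library API:

#include <pthread.h>
#include <stdlib.h>

struct msg { int kind; char payload[256]; struct msg *next; };

struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    struct msg     *head, *tail;
};

static struct queue queue_a = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL, NULL };

static void queue_push(struct queue *q, struct msg *m)
{
    pthread_mutex_lock(&q->lock);
    m->next = NULL;
    if (q->tail) q->tail->next = m; else q->head = m;
    q->tail = m;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
}

static struct msg *queue_pop(struct queue *q)   /* blocks until a message arrives */
{
    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->ready, &q->lock);
    struct msg *m = q->head;
    q->head = m->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return m;
}

/* The send/receive thread would recvfrom() on the socket, build a struct msg,
 * and queue_push() it to queue_a or queue_b depending on its kind; threads A
 * and B would each loop on queue_pop() of their own queue. */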
Since you are using the same socket in both threads, it is possible that one thread reads the message that is destined for the other thread. Even if you use a mutex, the design would be very difficult to get right. You can open two sockets (or even pipes) instead:
One socket for communication in the direction A->B;
The second socket for the direction B->A.
A second possibility is to have one socket with one writer (thread A) and one reader (thread B). When the reader receives a datagram, it decides, perhaps based on the datagram payload, what task to do. Or it can hand the task to another set of workers that will process the datagram.

How do I "disengage" from `accept` on a blocking socket when signalled from another thread?

I am in the same situation as this guy, but I don't quite understand the answer.
The problem:
Thread 1 calls accept on a socket, which is blocking.
Thread 2 calls close on this socket.
Thread 1 continues blocking. I want it to return from accept.
The solution:
what you should do is send a signal to the thread which is blocked in
accept. This will give it EINTR and it can cleanly disengage - and
then close the socket. Don't close it from a thread other than the one
using it.
I don't get what to do here -- when the signal is received in Thread 1, accept is already blocking, and will continue to block after the signal handler has finished.
What does the answer really mean I should do?
If the Thread 1 signal handler can do something which will cause accept to return immediately, why can't Thread 2 do the same without signals?
Is there another way to do this without signals? I don't want to increase the caveats on the library.
Instead of blocking in accept(), block in select(), poll(), or one of the similar calls that allows you to wait for activity on multiple file descriptors and use the "self-pipe trick". All of the file descriptors passed to select() should be in non-blocking mode. One of the file descriptors should be the server socket that you use with accept(); if that one becomes readable then you should go ahead and call accept() and it will not block. In addition to that one, create a pipe(), set it to non-blocking, and check for the read side becoming readable. Instead of calling close() on the server socket in the other thread, send a byte of data to the first thread on the write end of the pipe. The actual byte value doesn't matter; the purpose is simply to wake up the first thread. When select() indicates that the pipe is readable, read() and ignore the data from the pipe, close() the server socket, and stop waiting for new connections.
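A rough sketch of that arrangement; the wake_pipe name, accept_loop, and the assumption that listen_fd is already a non-blocking listening socket are all illustrative:

#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int wake_pipe[2];                   /* [0] = read end, [1] = write end */

void accept_loop(int listen_fd)
{
    pipe(wake_pipe);
    fcntl(wake_pipe[0], F_SETFL, O_NONBLOCK);

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        FD_SET(wake_pipe[0], &rfds);
        int maxfd = listen_fd > wake_pipe[0] ? listen_fd : wake_pipe[0];

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
            continue;

        if (FD_ISSET(wake_pipe[0], &rfds)) {            /* wake-up byte: shut down */
            char c;
            read(wake_pipe[0], &c, 1);
            close(listen_fd);
            return;
        }
        if (FD_ISSET(listen_fd, &rfds)) {
            int conn = accept(listen_fd, NULL, NULL);   /* won't block: fd is ready */
            if (conn >= 0) {
                /* hand the new connection off */
            }
        }
    }
}

/* From the other thread, instead of close(): write(wake_pipe[1], "x", 1); */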
The accept() call will return with error code EINTR if a signal is caught before a connection is accepted. So check the return value and error code then close the socket accordingly.
If you wish to avoid the signal mechanism altogether, use select() to determine if there are any incoming connections ready to be accepted before calling accept(). The select() call can be made with a timeout so that you can recover and respond to abort conditions.
I usually call select() with a timeout of 1000 to 3000 milliseconds from a while loop that checks for an exit/abort condition. If select() returns with a ready descriptor I call accept() otherwise I either loop around and block again on select() or exit if requested.
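A sketch of that loop, assuming a stop_requested flag set from elsewhere and an illustrative 2-second timeout:

#include <sys/select.h>
#include <sys/socket.h>

volatile int stop_requested;

void accept_with_timeout(int listen_fd)
{
    while (!stop_requested) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        struct timeval tv = { .tv_sec = 2, .tv_usec = 0 }; /* re-check the flag every 2s */

        int n = select(listen_fd + 1, &rfds, NULL, NULL, &tv);
        if (n > 0 && FD_ISSET(listen_fd, &rfds)) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn >= 0) {
                /* handle the new connection */
            }
        }
        /* n == 0: timeout expired, loop around and check stop_requested again */
    }
}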
Call shutdown() from Thread 2. accept will return with "invalid argument".
This seems to work but the documentation doesn't really explain its operation across threads -- it just seems to work -- so if someone can clarify this, I'll accept that as an answer.
Just close the listening socket, and handle the resulting error or exception from accept().
I believe signals can be used without increasing "the caveats on the library". Consider the following:
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

static pthread_t thread;
static volatile sig_atomic_t sigCount;

/**
 * Executes a concurrent task. Called by `pthread_create()`.
 */
static void* startTask(void* arg)
{
    for (;;) {
        // calls to `select()`, `accept()`, `read()`, etc.
    }
    return NULL;
}

/**
 * Starts the concurrent task. Doesn't return until the task completes.
 */
void start()
{
    (void)pthread_create(&thread, NULL, startTask, NULL);
    (void)pthread_join(thread, NULL);
}

static void noop(const int sig)
{
    sigCount++;
}

/**
 * Stops the concurrent task. Causes `start()` to return.
 */
void stop()
{
    struct sigaction oldAction;
    struct sigaction newAction;

    (void)sigemptyset(&newAction.sa_mask);
    newAction.sa_flags = 0;
    newAction.sa_handler = noop;
    (void)sigaction(SIGTERM, &newAction, &oldAction);

    (void)pthread_kill(thread, SIGTERM);        // system calls return with EINTR
    (void)sigaction(SIGTERM, &oldAction, NULL); // restores previous handling

    if (sigCount > 1)                  // externally-generated SIGTERM was received
        oldAction.sa_handler(SIGTERM); // call previous handler

    sigCount = 0;
}
This has the following advantages:
It doesn't require anything special in the task code other than normal EINTR handling; consequently, it makes reasoning about resource leakage easier than using pthread_cancel(), pthread_cleanup_push(), pthread_cleanup_pop(), and pthread_setcancelstate().
It doesn't require any additional resources (e.g. a pipe).
It can be enhanced to support multiple concurrent tasks.
It's fairly boilerplate.
It might even compile. :-)

Where does Linux kernel do process and TCP connections cleanup after process dies?

I am trying to find the place in the Linux kernel where it does cleanup after a process dies. Specifically, I want to see if/how it handles open TCP connections after a process is killed with the -9 signal. I am pretty sure it closes all connections, but I want to see the details, and whether there is any chance that connections are not closed properly.
Pointers to linux kernel sources are welcome.
The meat of process termination is handled by exit.c:do_exit(). This function calls exit_files(), which in turn calls put_files_struct(), which calls close_files().
close_files() loops over all file descriptors the process has open (which includes all sockets), calling filp_close() on each one, which calls fput() on the struct file object. When the last reference to the struct file has been put, fput() calls the file object's .release() method, which for sockets, is the sock_close() function in net/socket.c.
I'm pretty sure the socket cleanup is more of a side effect of releasing all the file descriptors after the process dies, and not directly done by the process cleanup.
I'm going to go out on a limb, though, and assume you're hitting a common pitfall of network programming. If I am correct in guessing that your problem is an "Address in use" error (EADDRINUSE) when trying to bind to an address after a process is killed, then you are running into the socket's TIME_WAIT state.
If this is the case, you can either wait for the timeout, usually 60 seconds, or you can modify the socket to allow immediate reuse, like so:
int sock, ret, on;
struct sockaddr_in servaddr;

sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Enable address reuse */
on = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on) );
[EDIT]
From your comments, it sounds like you are having issues with half-open connections and don't fully understand how TCP works. TCP has no way of knowing whether a client is dead or just idle. If you kill -9 a client process, the four-way closing handshake never completes. This shouldn't be leaving open connections on your server, though, so you may still need to get a network dump to be sure of what's going on.
I can't say for sure how you should handle this without knowing exactly what you are doing, but you can read about TCP keepalive here. A couple of other options are sending empty or null messages to the client periodically (which may require modifying your protocol), or setting hard timers on idle connections (which may result in dropping valid connections).
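If keepalive is the route you take, a sketch of enabling it on a connected socket under Linux might look like the following; the timing values are only illustrative assumptions:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

void enable_keepalive(int fd)
{
    int on    = 1;
    int idle  = 60;   /* seconds of idle time before the first probe */
    int intvl = 10;   /* seconds between probes */
    int cnt   = 5;    /* unanswered probes before the connection is dropped */

    setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
}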
