File Descriptor Sharing between Parent and Pre-forked Children - linux

In Unix Network Programming there is an example of a pre-forked server which uses message passing over a Unix domain stream pipe to instruct child processes to handle an incoming connection:
for ( ; ; ) {
    rset = masterset;
    if (navail <= 0)
        FD_CLR(listenfd, &rset);    /* turn off if no available children */
    nsel = Select(maxfd + 1, &rset, NULL, NULL, NULL);

    /* check for new connections */
    if (FD_ISSET(listenfd, &rset)) {
        clilen = addrlen;
        connfd = Accept(listenfd, cliaddr, &clilen);

        for (i = 0; i < nchildren; i++)
            if (cptr[i].child_status == 0)
                break;              /* available */

        if (i == nchildren)
            err_quit("no available children");
        cptr[i].child_status = 1;   /* mark child as busy */
        cptr[i].child_count++;
        navail--;

        n = Write_fd(cptr[i].child_pipefd, "", 1, connfd);
        Close(connfd);
        if (--nsel == 0)
            continue;               /* all done with select() results */
    }
As you can see, the parent writes the file descriptor for the connected socket to the pipe and then closes it. When the pre-forked children finish with the socket they also call close on the descriptor. The thing that is throwing me for a loop is that, because these children are pre-forked, I would assume that only file descriptors which existed at the time the children were forked would be shared. However, if that were true, this example would fail spectacularly, yet it works.
Can someone shed some light on how file descriptors created by the parent after the fork end up being shared with the child processes?

Take a look at the Write_fd implementation. It uses something like
union {
    struct cmsghdr cm;
    char control[CMSG_SPACE(sizeof(int))];
} control_un;
struct cmsghdr *cmptr;
struct msghdr msg = { 0 };              /* filled in and passed to sendmsg()       */

msg.msg_control = control_un.control;
msg.msg_controllen = sizeof(control_un.control);

cmptr = CMSG_FIRSTHDR(&msg);
cmptr->cmsg_len = CMSG_LEN(sizeof(int));
cmptr->cmsg_level = SOL_SOCKET;
cmptr->cmsg_type = SCM_RIGHTS;          /* "rights": pass an open file descriptor  */
*((int *) CMSG_DATA(cmptr)) = sendfd;   /* the descriptor being handed over        */
That is, sending a control message of type SCM_RIGHTS over a Unix domain socket is how Unix systems can share a file descriptor with an unrelated process: the kernel duplicates the passed descriptor into the receiving process's descriptor table, so the child ends up with its own valid descriptor referring to the same open socket.
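The child side does roughly the reverse with recvmsg(). A minimal sketch of a receiving helper (error handling trimmed; the name recv_fd is illustrative, not the book's exact code):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one byte of ordinary data plus a descriptor passed via SCM_RIGHTS. */
int recv_fd(int pipefd)
{
    union {
        struct cmsghdr cm;
        char control[CMSG_SPACE(sizeof(int))];
    } control_un;
    struct msghdr msg;
    struct iovec iov;
    char c;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &c;                        /* the 1-byte payload sent by Write_fd */
    iov.iov_len  = 1;
    msg.msg_iov    = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control    = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);

    if (recvmsg(pipefd, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmptr = CMSG_FIRSTHDR(&msg);
    if (cmptr != NULL && cmptr->cmsg_level == SOL_SOCKET && cmptr->cmsg_type == SCM_RIGHTS)
        return *((int *) CMSG_DATA(cmptr));   /* a brand-new descriptor in this process */
    return -1;
}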

You can send (most) arbitrary file descriptors to a potentially unrelated process using the FD passing mechanism in Unix sockets.
This is a little-used mechanism and rather tricky to get right: both processes need to cooperate.
Most prefork servers do NOT do this. Instead, they have the child processes call accept() on a shared listening socket, so each child creates its own connected socket. Other processes cannot see this connected socket, and there is only one copy of it, so when the child closes it, it is gone.
One disadvantage is that the process cannot tell what the client is going to request BEFORE calling accept(), so you cannot route different types of requests to different children; once one child has accept()ed a connection, another child cannot.
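For comparison, a minimal sketch of that more common prefork pattern, where the listening socket is simply inherited across fork() (handle_client is a hypothetical request handler):

/* Each pre-forked child blocks in accept() on the inherited listening socket. */
for (i = 0; i < nchildren; i++) {
    if (fork() == 0) {                    /* child */
        for (;;) {
            int connfd = accept(listenfd, NULL, NULL);
            if (connfd < 0)
                continue;                 /* e.g. interrupted by a signal */
            handle_client(connfd);        /* hypothetical request handler */
            close(connfd);                /* only this child ever held it */
        }
    }
}
/* The parent keeps listenfd open and simply monitors its children. */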

Related

What's the exact Linux equivalent of Windows' WaitOnAddress()?

Using shared memory obtained with the shmget() system call, the aim of my C++ program is to fetch a bid price from the Internet through a server written in Rust, so that each time the value changes I perform a financial transaction.
Server pseudocode
Shared_struct.price = new_price
Client pseudocode
Infinite_loop_label:
Wait until memory address pointed by Shared_struct.price changes.
Launch_transaction(Shared_struct.price*1.13)
Goto Infinite_loop_label
Since launching a transaction involves paying transaction fees, I want to create a transaction only once per buy-price change.
Using a semaphore or a futex I can do the reverse, i.e. wait for a variable to reach a specific value, but how do I wait until a variable is no longer equal to its current value?
Whereas on Windows I can do something like this on the address of the shared segment:
ULONG g_TargetValue;    // global, accessible to all processes
ULONG CapturedValue;
ULONG UndesiredValue;

UndesiredValue = 0;
CapturedValue = g_TargetValue;
while (CapturedValue == UndesiredValue) {
    WaitOnAddress(&g_TargetValue, &UndesiredValue, sizeof(ULONG), INFINITE);
    CapturedValue = g_TargetValue;
}
Is there a way to do this on Linux? Or a straight equivalent?
You can use a futex (this assumes that var lives in the shared memory segment):
/* Client */
int prv;
while (1) {
    prv = var;
    int ret = futex(&var, FUTEX_WAIT, prv, NULL, NULL, 0);
    /* Spurious wake-up: woken, but the value has not actually changed */
    if (!ret && var == prv)
        continue;
    doTransaction();
}

/* Server */
int prv = NOT_CACHED;
while (1) {
    var = updateVar();
    if (var != prv || prv == NOT_CACHED)
        futex(&var, FUTEX_WAKE, 1, NULL, NULL, 0);
    prv = var;
}
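Note that glibc does not export a futex() function; the calls above assume a small wrapper around the raw system call, along the lines of the one shown in the futex(2) man page:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrapper; glibc only exposes futex through syscall(2). */
static long futex(int *uaddr, int futex_op, int val,
                  const struct timespec *timeout, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}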
It requires the server side to call futex as well to notify client(s).
Note that the same holds true for WaitOnAddress.
According to MSDN:
Any thread within the same process that changes the value at the address on which threads are waiting should call WakeByAddressSingle to wake a single waiting thread or WakeByAddressAll to wake all waiting threads.
(Added) A higher-level synchronization mechanism for this problem is a condition variable, which on Linux is itself implemented on top of futex. See link.
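A sketch of that higher-level approach, assuming both processes map the same shared memory segment and can both use POSIX process-shared primitives (the struct layout and function names are illustrative):

#include <pthread.h>

/* Lives in the shared memory segment visible to both processes. */
struct shared_data {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    double          price;
};

/* One-time initialisation with process-shared attributes. */
void shared_init(struct shared_data *s)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mtx, &ma);
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->cond, &ca);
}

/* Client: block until the price differs from the last value seen. */
double wait_for_change(struct shared_data *s, double last_seen)
{
    pthread_mutex_lock(&s->mtx);
    while (s->price == last_seen)          /* loop guards against spurious wake-ups */
        pthread_cond_wait(&s->cond, &s->mtx);
    double p = s->price;
    pthread_mutex_unlock(&s->mtx);
    return p;
}

/* Server: publish a new price and wake any waiters. */
void publish_price(struct shared_data *s, double p)
{
    pthread_mutex_lock(&s->mtx);
    s->price = p;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mtx);
}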

non-blocking socket vs. select() driven approach

I'm writing a user-space application which, among other functionality, uses netlink sockets to talk to the kernel. I use the simple API provided by the open-source library libmnl.
My application sets certain options over netlink and also subscribes to netlink events (notifications), parses them, etc. This second feature (event notifications) is asynchronous; currently I have implemented a simple select()-based loop:
...
fd_set rfd;
struct timeval tv;
int ret;

while (1) {
    tv.tv_sec = 1;
    tv.tv_usec = 0;
    FD_ZERO(&rfd);
    /* fd - is a netlink socket */
    FD_SET(fd, &rfd);

    ret = select(fd + 1, &rfd, NULL, NULL, &tv);
    if (ret < 0) {
        perror("select()");
        continue;
    } else if (ret == 0) {
        printf("Timeout on fd %d\n", fd);
    } else if (FD_ISSET(fd, &rfd)) {
        /*
         * count = recv(fd, buf ...)
         * while (count > 0) {
         *     parse 'buf' for netlink messages, validate etc.
         *     count = recv(fd, buf)
         * }
         */
    }
}
So I'm observing that the code inside the else if (FD_ISSET(fd, &rfd)) branch blocks at the second recv() call.
Now I'm trying to understand whether I need to set the netlink socket to non-blocking (SOCK_NONBLOCK, for example), but then I probably don't need select() at all; I could simply have a recv() -> parse message -> recv() loop and it wouldn't block.
... if I need to set the netlink socket to non-blocking ..., but then I probably don't need select() at all ...
This is exactly the purpose of a non-blocking socket: instead of doing the if (FD_ISSET(...)) you call recv() and evaluate the return value.
If you use blocking sockets, you must not call recv() more than once after calling select(); that way the program is "effectively" non-blocking.
HOWEVER,
... as user "kaylum" already suggested in his comment, you'll have another problem in any case:
It is not guaranteed that one complete "message" is available at the same time. The other end of the socket might send the first part of the message, wait some seconds and then send the second part of the message.
However, select() will tell you that there is at least one byte available; it will not tell you if the complete message is available.
If you want to wait for the complete message in the inner loop (while (count > 0)), you will always have to wait, which means that your program "effectively" has blocking behavior even if the socket is non-blocking.
If you simply want to process all bytes already available in the inner loop, then the condition count > 0 is wrong. Instead, you should do something like this if you are working with blocking sockets:
else if (FD_ISSET(...))
{
    while (FD_ISSET(...))
    {
        count = recv(...);
        if (count > 0)
        {
            ...
            select(...);
        }
        else FD_ZERO(...);
    }
}
However, in most cases this will not be necessary and you can simply process the "remaining" data bytes in the next "outer" loop.
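For completeness, a sketch of the inner loop with a non-blocking socket, draining until EAGAIN (the buffer size and parsing step are placeholders):

#include <errno.h>
#include <sys/socket.h>

char buf[8192];                     /* buffer size is a placeholder */
for (;;) {
    ssize_t count = recv(fd, buf, sizeof(buf), 0);
    if (count > 0) {
        /* parse the netlink messages in buf[0..count) */
    } else if (count < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        break;                      /* socket drained; go back to select()/poll() */
    } else if (count < 0 && errno == EINTR) {
        continue;                   /* interrupted by a signal; retry */
    } else {
        break;                      /* count == 0 (socket closed) or a real error */
    }
}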

Server crashes after closesocket

I have a multithreaded application which periodically polls a few hundred devices.
Each thread serves one device; its socket and other descriptors are encapsulated in an individual object, so there are no shared descriptors.
Occasionally the application crashes after closesocket(fSock), when I try to set the descriptor fSock to 0.
I assume I should not set fSock = 0 if closesocket(fSock) returns SOCKET_ERROR.
Or is there some other reason?
My code:
bool _EthDev::Connect()
{
    int sockErr, ret, i, j;
    int szOut = sizeof(sockaddr_in);

    // create socket
    if ((fSock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        sockErr = GetLastError();
        Log("Invalid socket err %d", sockErr);
        fSock = 0;
        return false;
    }

    // set fast closing socket (by RST)
    linger sLinger;
    sLinger.l_onoff = 1;
    sLinger.l_linger = 0;
    if (sockErr = setsockopt(fSock, SOL_SOCKET, SO_LINGER, (const char FAR*)&sLinger, sizeof(linger)))
    {
        sockErr = WSAGetLastError();
        Log("Setsockopt err %d", sockErr);
        closesocket(fSock);
        fSock = 0; // here crashes
        return false;
    }

    // connect to device
    fSockaddr.sin_port = htons((u_short)(baseport));
    if (connect(fSock, (struct sockaddr*)&fSockaddr, szOut))
    {
        closesocket(fSock);
        fSock = 0;
        return false;
    }
    ...
    return true;
}
I have multithreading application, ... [it] occasionally crashes
A multithreaded application that occasionally crashes is a classic symptom of a race condition. I think that to prevent the crashes you need to figure out what the race condition in your code is, and fix that.
I assume, I should not set fSock = 0, if closesocket(fSock) returns
SOCKET_ERROR. Or is there any other reason?
I doubt the problem is actually related to closesocket() or to setting fSock to 0. Keep in mind that sockets are really just integers, and setting an integer to 0 isn't likely to cause a crash on its own. What could cause a crash is a write to invalid memory, and fSock = 0 does write to the memory location where the member variable fSock is (or was) located.
Therefore, a more likely hypothesis is that the _EthDev object got deleted by thread B while thread A was still in the middle of calling Connect() on it. This would most likely happen while the connect() call was executing, because a blocking connect() call can take a relatively long time to return. So if another thread rudely deleted the _EthDev object during the connect() call, then as soon as connect() returned, the next line of code to write to the location where the (now deleted) _EthDev object used to be would be the "fSock = 0;" line, and that could cause a crash.
I suggest you review your code that deletes _EthDev objects, and if it isn't careful to first shut down any thread(s) using those objects (and also to wait for the threads to exit!) before deleting the _EthDev objects, you should rewrite it so that it does so reliably. Deleting an object while another thread might still be using it is asking for trouble.
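As a sketch of that ordering, assuming a Win32 worker thread handle and a stop flag on the object (the names stopRequested, hThread and DestroyEthDev are illustrative, not from the question):

/* Shut the worker down and wait for it BEFORE freeing the device object. */
dev->stopRequested = TRUE;                    /* flag polled by the device thread's loop */
shutdown(dev->fSock, SD_BOTH);                /* unblocks a blocking recv() in the worker */
WaitForSingleObject(dev->hThread, INFINITE);  /* wait for the worker thread to exit      */
CloseHandle(dev->hThread);
closesocket(dev->fSock);
DestroyEthDev(dev);                           /* free the object only after the join     */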

Spurious epoll (edge triggered) notifications

My understanding of epoll with edge-triggered behaviour was that you are notified when a state change occurs for a given file descriptor. That is, when data becomes available on the fd you get notified, but you only get notified again after you have drained all the data (received EAGAIN) and data becomes present again.
Based on this understanding, I'd implemented a server which does the following:
Listens for client connections
For each connections, add fd into epoll
When data is available on an fd, spawn a separate thread to read until EAGAIN
Repeat until data reads are done
Semi pseudo-code:
n = epoll_wait(efd, events, MAXEVENTS, -1);
for (i = 0; i < n; ++i) {
    check for errors;
    if (server_sock == events[i].data.fd) {
        conn = accept( ... );
        make_nonblocking(conn);
        event.data.fd = conn;
        event.events = EPOLLIN | EPOLLRDHUP;
        epoll_ctl(efd, EPOLL_CTL_ADD, conn, &event);
    } else {
        spawn_reader(events[i].data.fd);
    }
}
With the above code, my expectation was that until the reader thread hits EAGAIN, epoll will not spawn another thread for this fd. This doesn't seem to be the case and I get many threads being spawned.
Am I using epoll incorrectly? I shouldn't need to remove the fd via EPOLL_CTL_DEL each time I spawn the reader thread, correct?
Yes, it's working as it should. epoll allows asynchronous I/O programming in a single thread of control (event-driven). Handing the fd to a reader thread does not remove it from the set of monitored fds, so while unread data remains on the socket the next epoll_wait() will report the fd again and another thread gets spawned.
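If the goal is to get exactly one notification and then silence until the reader thread has finished, one common approach (not mentioned in the answer above, offered as a sketch) is to register the descriptor with EPOLLONESHOT and re-arm it once the thread has drained the socket to EAGAIN:

/* Registration: the fd is disabled after one reported event. */
event.data.fd = conn;
event.events = EPOLLIN | EPOLLRDHUP | EPOLLET | EPOLLONESHOT;
epoll_ctl(efd, EPOLL_CTL_ADD, conn, &event);

/* In the reader thread, after recv() has returned EAGAIN: re-arm the fd. */
struct epoll_event ev;
ev.data.fd = conn;
ev.events = EPOLLIN | EPOLLRDHUP | EPOLLET | EPOLLONESHOT;
epoll_ctl(efd, EPOLL_CTL_MOD, conn, &ev);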

Dynamic pool of processes

I'm writing a client-server (TCP) program in C on a Unix system. The client sends some information and the server answers. There's only one connection per child process. New connections use pre-running processes from a pool, and the pool size is dynamic, so if the number of free processes (processes not servicing a client) drops too low, it should create new processes, and likewise if it gets too high extra processes should be terminated.
This is my server code. Every connection creates a new child process with fork(), so each connection runs in its own process. How can I make a dynamic pool like the one described above?
int main(int argc, char * argv[])
{
    int cfd;
    int listener = socket(AF_INET, SOCK_STREAM, 0); // create listener socket
    if (listener < 0) {
        perror("socket error");
        return 1;
    }

    struct sockaddr_in addr;
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    int binding = bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    if (binding < 0) {
        perror("binding error");
        return 1;
    }

    listen(listener, 1); // listen for new clients
    signal(SIGCHLD, handler);

    int pid;
    for (;;) // infinite loop on server
    {
        cfd = accept(listener, NULL, NULL); // client socket descriptor
        pid = fork(); // make child proc
        if (pid == 0) // in child proc...
        {
            close(listener); // close listener socket descriptor
            ... // some server actions that I do (receive or send)
            close(cfd); // close client fd
            return 0;
        }
        close(cfd);
    }
}
If you have several processes blocked in accept() on the same listening socket, then a new connection that comes in will be delivered to one of them. (Depending on the OS, several may wake up, but only one will actually get the connection.) So you need to fork several children after listen(), but before accept(). After handling a request, the child goes back to accept() instead of exiting. That handles the first two requirements: a pool of pre-running processes, one connection per process.
The third requirement, resizing the pool, is harder. You need some form of IPC. Typically, you'd have a parent process that just manages having the right number of children. Your child processes need to use IPC to tell the parent how busy they are. The parent can then either fork more children (which go into the accept loop above) or send signals to children to tell them to finish up and exit. It should also handle waiting on children, handle unexpected deaths, etc.
The IPC you want to use is probably shared memory. Your two options are SysV (shmget) and POSIX (shm_open) shared memory. You probably want the latter if it is available. You'll have to deal with synchronizing access (both POSIX and SysV provide semaphores to help with this; again, prefer POSIX) or use atomic access only.
(You probably don't actually want a process to exit the instant there are more than X free children; that will lead to repeatedly reaping and spawning them, which is expensive. Instead you probably want some measure of how utilized they were over the last second or so, which means your data is more complicated than a bitmap of in-use/free.)
There are a lot of daemons that work like this, so you can fairly easily find code examples. Of course, if you go look at Apache, you'll probably find it more complicated, to get good performance and be portable everywhere.
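As a sketch of the bookkeeping part under those suggestions, assuming POSIX shared memory and a C11 atomic counter (the segment name and the parent's thresholds are illustrative; a real server would also track utilization over time, as noted above):

#include <fcntl.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Lives in POSIX shared memory; created by the parent before fork() so that
 * every child inherits the MAP_SHARED mapping. Link with -lrt on older glibc. */
struct pool_state {
    atomic_int busy;    /* children currently serving a client */
    atomic_int total;   /* children currently alive            */
};

struct pool_state *pool_create(void)
{
    int fd = shm_open("/connpool", O_CREAT | O_RDWR, 0600);  /* name is illustrative */
    ftruncate(fd, sizeof(struct pool_state));
    struct pool_state *ps = mmap(NULL, sizeof(*ps), PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    close(fd);
    return ps;
}

/* Child: the accept loop from the answer, wrapped with busy accounting. */
void child_loop(int listener, struct pool_state *ps)
{
    for (;;) {
        int cfd = accept(listener, NULL, NULL);
        if (cfd < 0)
            continue;
        atomic_fetch_add(&ps->busy, 1);
        /* ... serve the client ... */
        close(cfd);
        atomic_fetch_sub(&ps->busy, 1);
    }
}

/* Parent (run periodically): keep the number of idle children within a band.  */
/*   idle = ps->total - ps->busy;                                              */
/*   if (idle < MIN_SPARE)  fork() another child that calls child_loop();      */
/*   if (idle > MAX_SPARE)  signal one idle child to finish up and exit.       */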
