Server crashes after closesocket

Server crashes after closesocket - multithreading

I have multithreading application, it's periodically polling a few hundred devices.
Each thread serves one device, its socket and other descriptors are encapsulated at individual object, so no shared descriptors.
Occasionally application crashes after closesocket(fSock), when I try set descriptor fSock to 0.
I assume, I should not set fSock = 0, if closesocket(fSock) returns SOCKET_ERROR.
Or is there any other reason?
My code:
bool _EthDev::Connect()
{
int sockErr, ret, i, j;
int szOut = sizeof(sockaddr_in);
// create socket
if ((fSock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
{
sockErr = GetLastError();
Log("Invalid socket err %d", sockErr);
fSock = 0;
return false;
}
// set fast closing socket (by RST)
linger sLinger;
sLinger.l_onoff = 1;
sLinger.l_linger = 0;
if (sockErr = setsockopt(fSock, SOL_SOCKET, SO_LINGER, (const char FAR*)&sLinger, sizeof(linger)))
{
sockErr = WSAGetLastError();
Log("Setsockopt err %d", sockErr);
closesocket(fSock);
fSock = 0; // here crashes
return false;
}
// connect to device
fSockaddr.sin_port = htons((u_short)(baseport));
if (connect(fSock, (struct sockaddr*)&fSockaddr, szOut))
{
closesocket(fSock);
fSock = 0;
return false;
}
...
return true;
}

I have multithreading application, ... [it] occasionally crashes
A multithreading application that occasionally crashes is a classic symptom of a race condition. I think to prevent the crashes you need to figure out what the race condition is in your code, and fix that.
I assume, I should not set fSock = 0, if closesocket(fSock) returns
SOCKET_ERROR. Or is there any other reason?
I doubt the problem is actually related to closesocket() or to setting fSock to 0. Keep in mind that sockets are really just integers, and setting an integer to 0 isn't likely to cause a crash on its own. What could cause a crash is a write to invalid memory -- and fSock = 0 does write to the memory location where the member variable fSock is (or was) located at.
Therefore, a more likely hypothesis is that the _EthDev object got deleted by thread B while thread A was still in the middle of calling Connect() on it. This would be most likely happen while the connect() call was executing, because a blocking connect() call can take a relatively long time to return. So if there was another thread out there that rudely deleted the _EthDev object during the connect() call, then as soon as connect() returned, the next line of code that would write to the location where the (now deleted) _EthDev object used to be would be the "fSock = 0;" line, and that could cause a crash.
I suggest you review your code that deletes _EthDev objects, and if it isn't careful to first shut down any thread(s) using those objects (and also to wait for the threads to exit!) before deleting the _EthDev objects, you should rewrite it so that it does so reliably. Deleting an object while another thread might still be using it is asking for trouble.

Related

What s the Windows exact equivalent of WaitOnAddress() on Linux?

Using shared memory with the shmget() system call, the aim of my C++ program, is to fetch a bid price from the Internet through a server written in Rust so that each times the value changes, I m performing a financial transaction.
Server pseudocode
Shared_struct.price = new_price
Client pseudocode
Infinite_loop_label:
Wait until memory address pointed by Shared_struct.price changes.
Launch_transaction(Shared_struct.price*1.13)
Goto Infinite_loop
Since launching a transaction involve paying transaction fees, I want to create a transaction only once per buy price change.
Using a semaphore or a futex, I can do the reverse, I m meaning waiting for a variable to reachs a specific value, but how to wait until a variable is no longer equal to current value?
Whereas on Windows I can do something like this on the address of the shared segment:
ULONG g_TargetValue; // global, accessible to all process
ULONG CapturedValue;
ULONG UndesiredValue;
UndesiredValue = 0;
CapturedValue = g_TargetValue;
while (CapturedValue == UndesiredValue) {
WaitOnAddress(&g_TargetValue, &UndesiredValue, sizeof(ULONG), INFINITE);
CapturedValue = g_TargetValue;
}
Is there a way to do this on Linux? Or a straight equivalent?

You can use futex. (I assumed "var" is in shm mem)
/* Client */
int prv;
while (1) {
int prv = var;
int ret = futex(&var, FUTEX_WAIT, prv, NULL, NULL, 0);
/* Spurious wake-up */
if (!ret && var == prv) continue;
doTransaction();
}
/* Server */
int prv = NOT_CACHED;
while(1) {
var = updateVar();
if (var != prv || prv = NOT_CACHED)
futex(&var, FUTEX_WAKE, 1, NULL, NULL, 0);
prv = var;
}
It requires the server side to call futex as well to notify client(s).
Note that the same holds true for WaitOnAddress.
According to MSDN:
Any thread within the same process that changes the value at the address on which threads are waiting should call WakeByAddressSingle to wake a single waiting thread or WakeByAddressAll to wake all waiting threads.
(Added)
More high level synchronization method for this problem is to use condition variable.
It is also implemented based on futex.
See link

Embedded Linux poll() returns constantly

I have a particular problem. Poll keeps returning when I know there is nothing to read.
So the setup it as follows, I have 2 File Descriptors which form part of a fd set that poll watches. One is for a Pin high to low change (GPIO). The other is for a proxy input. The problem occurs with the Proxy Input.
The order of processing is: start main functions; it will then poll; write data to proxy; poll will break; accept the data; send the data over SPI; receiving slave device, signals that it wants to send ack, by Dropping GPIO low; poll() senses this drop and reacts;
Infinite POLLINs :(
IF I have no timeout on the Poll function, the program works perfectly. The moment I include a timeout on the Poll. The Poll returns continuously. Not sure what I am doing wrong here.
while(1)
{
memset((void*)fdset, 0, sizeof(fdset));
fdset[0].fd = gpio_fd;
fdset[0].events = POLLPRI; // POLLPRI - There is urgent data to read
fdset[1].fd = proxy_rx;
fdset[1].events = POLLIN; // POLLIN - There is data to read
rc = poll(fdset, nfds, 1000);//POLL_TIMEOUT);
if (rc < 0) // Error
{
printf("\npoll() failed/Interrupted!\n");
}
else if (rc == 0) // Timeout occurred
{
printf(" poll() timeout\n");
}
else
{
if (fdset[1].revents & POLLIN)
{
printf("fdset[1].revents & POLLIN\n");
if ((resultR =read(fdset[1].fd,command_buf,10))<0)
{
printf("Failed to read Data\n");
}
if (fdset[0].revents & POLLPRI)
//if( (gpio_fd != -1) && (FD_ISSET(gpio_fd, &err)))
{
lseek(fdset[0].fd, 0, SEEK_SET); // Read from the start of the file
len = read(fdset[0].fd, reader, 64);
}
}
}
}
So that is the gist of my code.
I have also used GDB and while debugging, I found that the GPIO descriptor was set with revents = 0x10, which means that an error occurred and that POLLPRI also occurred.
In this question, something similar was addressed. But I do read all the time whenever I get POLLIN. It is a bit amazing, that this problem only occurs when I include the timeout, if I replace the poll timeout with -1, it works perfectly.

When poll fails (returning -1) you should do something with errno, perhaps thru perror; and your nfds (the second argument to poll) is not set, but it should be the constant 2.
Probably the GCC compiler would have given a warning, at least with all warnings enabled (-Wall), about nfds not being set.
(I'm guessing that nfds being uninitialized might be some "random" large value.... So the kernel might be polling other "random" file descriptors, those in your fdset after index 2...)
BTW, you could strace your program. And using the fdset name is a bit confusing (it could refer to select(2)).

Assuming I fixed your formatting properly in your question, it looks like you have a missing } after the POLLIN block and the next if() that checks the POLLPRI. It would possibly work better this way:
if (fdset[1].revents & POLLIN)
{
printf("fdset[1].revents & POLLIN\n");
if ((resultR =read(fdset[1].fd,command_buf,10))<0)
{
printf("Failed to read Data\n");
}
}
if (fdset[0].revents & POLLPRI)
//if( (gpio_fd != -1) && (FD_ISSET(gpio_fd, &err)))
{
lseek(fdset[0].fd, 0, SEEK_SET); // Read from the start of the file
len = read(fdset[0].fd, reader, 64);
}
Although you can do whatever you want with indentation in C/C++/Java/JavaScript, not doing it right can bite you really hard. Hopefully, I'm wrong and your original code was correct.
Another one I often see: People not using the { ... } at all and end up writing code like:
if(expr) do_a; do_b;
and of course, do_b; will be executed all the time, whether expr is true or false... and although you could fix the above with a comma like so:
if(expr) do_a, do_b;
the only safe way to do it right is to use the brackets:
if(expr)
{
do_a;
do_b;
}
Always make sure your indentation is perfect and write small functions so you can see that it is indeed perfect.

CoGetInterfaceAndReleaseStream let my thread hangs

UINT __stdcall CExternal::WorkThread( void * pParam)
{
HRESULT hr;
CTaskBase* pTask;
CComPtr<IHTMLDocument3> spDoc3;
CExternal* pThis = reinterpret_cast<CExternal*>(pParam);
if (pThis == NULL)
return 0;
// Init the com
::CoInitializeEx(0,COINIT_APARTMENTTHREADED);
hr = ::CoGetInterfaceAndReleaseStream(
pThis->m_pStream_,
IID_IHTMLDocument3,
(void**)&spDoc3);
if(FAILED(hr))
return 0;
while (pThis->m_bShutdown_ == 0)
{
if(pThis->m_TaskList_.size())
{
pTask = pThis->m_TaskList_.front();
pThis->m_TaskList_.pop_front();
if(pTask)
{
pTask->doTask(spDoc3); //do my custom task
delete pTask;
}
}
else
{
Sleep(10);
}
}
OutputDebugString(L"start CoUninitialize\n");
::CoUninitialize(); //release com
OutputDebugString(L"end CoUninitialize\n");
return 0;
}
The above the code that let my thread hang, the only output is "start CoUninitialize".
m_hWorker_ = (HANDLE)_beginthreadex(NULL, 0, WorkThread, this, 0, 0);
This code starts my thread, but the thread can't exit safely, so it waits. What the problem with this code?

The problem is not in this code, although it violates core COM requirements. Which says that you should release interface pointers when you no longer use them, calling IUnknown::Release(), and that an apartment-threaded thread must pump a message loop. Especially the message loop is important, you'll get deadlock when the owner thread of a single-threaded object (like a browser) is not pumping.
CoUninitialize() is forced to clean up the interface pointer wrapped by spDoc3 since you didn't do this yourself. It is clear from the code that the owner of the interface pointer actually runs on another thread, something to generally keep in mind since that pretty much defeats the point of starting your own worker thread. Creating your own STA thread doesn't fix this, it is still the wrong thread.
So the proxy needs to context switch to the apartment that owns the browser object. With the hard requirement that this apartment pumps a message loop so that the call can be dispatched on the right thread in order to safely call the Release() function. With very high odds that this thread isn't pumping messages anymore when your program is shutting down. Something you should be able to see in the debugger, locate the owner thread in the Debug + Windows + Threads window and see what it is doing.
Deadlock is the common outcome. The only good way to fix it is to shut down threads in the right order, this one has to shut down before the thread that owns the browser object. Shutting down a multi-threaded program cleanly can be quite difficult when threads have an interdependency like this. The inspiration behind the C++11 std::quick_exit() addition.

How to handle all possible errors on async socket?

I have decided to use async io for my project and simply do a single threaded loop where I try to read some data each frame from each open socket. This worked quite well and overall I'm happy with it for now. The problem is weird problems I'm having with the async sockets.
I have code like this:
accept a connection...
fcntl(O_NONBLOCK) on the client socket...
int rc;
if((rc = recv(socket))>0)
process data
if rc == 0
close socket and cleanup
The problem is that I get rc == 0 sometimes even though I know that the connection is not closed. If I don't clean up then my app works as normal. But if I do cleanup then the client receives a disconnect before the connection is even established.
So my question is: Do I have to check somehow whether the socket is ready before doing a recv in order to get the correct return value from it?
Most of the information I have been able to find was inconclusive. I found a references to select() but it seems to block until there is a status change on the socket - but I need the socket to be nonblocking.
What I'm looking for is just the intuitive behavior that if there is data, it is read to the buffer and recv returns number of bytes read, if there is no data it returns -1 and if the socket is disconnected then it should return 0.
Do I have to do anything else to the socket before calling recv to make it work as expected?

First, taking on the heavy lifting of going "all asynchronous" with a socket server is a good start for a design and will enable scalability very easily.
As for your question.
recv() will return the following values:
A postive value returned by recv() indicates the number of bytes
copied to your buffer.(i.e you actually received these bytes)
recv() will return 0 when the socket was closed by the remote side.
For async sockets, recv() will return -1 and set errno to either
EAGAIN or EWOULDBLOCK if the connection is still valid, but there's
no new data to be consumed. Call select() or poll() on the socket to
wait for data.
Otherwise, any general connection failure will result in -1 being returned by recv(). (And the only thing you can do is close the socket).
So when you say, "rc == 0 sometimes even though I know that the connection is not closed", I suspect your pseudocode is not checking the return value, but instead checking the result of (rc > 0).
This is closer to the logic you want:
int rc;
rc = recv(s, buf, buffersize, 0);
if (rc == 0)
{
/* socket closed by remote end */
close(s); s=-1;
}
else if ((rc == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)) )
{
// need to wait. Call select() or poll()
}
else if (rc == -1)
{
close(s); s=-1;
}
else
{
ProcessNewData(s, buffer, rc);
}

React on client disconnect in linux

I am writing a simple socket daemon witch listens to a port and reads the incoming data. It works fine until i choose to disconnect a client from the server...then it enters in a infinte loop recv() returns the last packet never gets to -1. My question is how can i detect that the client had been disconnected and close the thread/ socket el
My thread is as follows :
void * SocketHandler(void* lp){
int * csock = (int*)lp;
int test = 0;
char buffer[1024];
int buffer_len = 1024;
int bytecount,ierr;
memset(buffer,0,buffer_len);
while (test == 0)
{
if ((bytecount = recv(*csock, buffer, buffer_len, 0))== -1){
close(csock);
free(csock);
test++;
return 0;
}
else
{
syslog(LOG_NOTICE,"%s",buffer);
}
}
return 0;
};

A cleanly closed socket will end up in a ZERO read, while a broken connection is an error state returning -1. You need to catch the 0 return of your recv.

What happens here is that your end may not detect the fact the socket is dead (especially, if you are just reading from it).
What you can do is set keepalive on the socket. This will turn on periodic checks of the socket liveness. But don't expect fast reactions, the default timeout is something like 20 minutes.
i = 1;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (char *)&i, sizeof(i));
Another option is to do your own keep-alive communication.

recv() will indicate a proper shutdown of the socket by returning 0 (see the manpage for details). It will return -1 if and only if an error occurred. You should check errno for the exact error, since it may or may not indicate that the connection failed (EINTR, EAGAIN or EWOULDBLOCK [non-blocking sockets assumed] would both be recoverable errors).
Side note: there's no need to pass the fd of the socket as pointer and since you're returning a void * you may want to change return 0 to return NULL (just for readability).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string