Boost ASIO as an event loop with boost lockfree queue for socket write - multithreading

I am using boost ASIO for a TCP client. For the most part, ASIO is a glorified event loop for reads and writes. There is only one client managed by ASIO.
The architecture is like this -
The TCP server streams continuous messages. The client reads the messages, processes them, and acks back with the proper code.
My code runs on the client side. There is one thread running the io_service. The io_service thread reads messages and distributes them to N worker threads using a boost lockfree SPSC queue. After processing, the workers post their replies to the io_service thread.
The most important concern for me is the rate of reads and writes, so I am using synchronous reads and writes.
Read Code:
void read ()
{
    if (_connected && !_readInProgress) {
        // null_buffers() only waits for readability; the bytes are read later with direct system calls
        _socket.async_read_some(boost::asio::null_buffers(),
            make_boost_alloc_handler(_readAllocator,
                [self = shared_from_this(), this] (ErrorType err, unsigned a)
                {
                    connection()->handleRead(err);
                    _readInProgress = false;
                    if (err) disconnect();
                    else asyncRead();
                }));
        _readInProgress = true;
    }
}
Basically I use async_read_some with null_buffers() and then use Unix system calls directly to read the messages. Each read gives N messages, which are enqueued to the worker threads in a loop.
I want to use the boost SPSC queue in the reverse direction, for writes to the socket from the workers.
Write:
// Get the queue to post writes
auto getWriteQ ()
{
    static thread_local auto q =
        std::make_shared< LFQType >(_epoch);
    return q;
}
So each thread gets a thread-local queue from getWriteQ. A write to the queue looks like this:
void write (Buf& buf) override
{
    auto q = getWriteQ();
    // Spin until the message is enqueued or the connection drops
    while (!q->enqueue(buf) && _connected);
    if (!_connected) return;
    _ioService.post( [self = shared_from_this(), this, q]()
    {
        writeHelper(q);
    });
}
Now this is inefficient, as we do an io_service post for each write. The write handler actually writes up to 32 messages at a time in a single system call using sendmmsg().
So I am looking for help with 2 things:
Is the design any good?
Is there any foolproof way to minimize the number of posts? I was thinking of keeping an atomic enqueue count.
The writing (worker) thread does this (pseudocode):
bool post = false;
if(enqueue_count == 0) post = true
// enqueue the message
++enqueue_count
if(post)
// post the queue event
The io-service thread does this -
enqueue_count -= num_processed;
if (enqueue_count)
// repost the queue for further processing
Would this work if enqueue_count is atomic?
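One point worth checking in the pseudocode above: the check (`enqueue_count == 0`) and the increment are two separate steps, so even with an atomic counter a worker can read a non-zero count just before the io_service thread drops it to zero, and then nobody posts. A common way around this is to fold the check into the increment with a single `fetch_add`. The following is only a sketch under that assumption; `PendingCounter`, `onEnqueued` and `onProcessed` are hypothetical names, not part of the original code, and a signed counter is used because the handler may briefly dequeue a message before its producer has counted it.

```cpp
#include <atomic>

// Hypothetical helper (not from the original post): counts messages that are
// enqueued but not yet written, so only the 0 -> 1 transition triggers a post.
class PendingCounter {
public:
    // Worker side: call AFTER the message has been enqueued.
    // Returns true if the caller must post the write handler.
    bool onEnqueued() {
        return pending_.fetch_add(1) == 0;
    }

    // io_service side: call after writing n messages.
    // Returns true if messages remain and the handler should re-post itself.
    bool onProcessed(long n) {
        return pending_.fetch_sub(n) != n;
    }

private:
    // Signed on purpose: the value can dip below zero for a moment when the
    // handler drains a message whose producer has not incremented yet.
    std::atomic<long> pending_{0};
};
```

With this shape the worker would call `enqueue(buf)` first and post only when `onEnqueued()` returns true, while the write handler would re-post itself when `onProcessed(n)` returns true. An occasional redundant post is still possible, which is harmless; what the single atomic step is meant to rule out is a missed one.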

Related

Two threads, how to implement communication between them. (Pseudocode)

I have two threads:
Main thread :
Main thread is listening for HTTP requests,
Main thread registers handler to HTTP request used for long polling
Second thread:
Getting data from different socket.
If it finds something in the data from the socket, it updates thread-local storage,
and if the Main thread has an HTTP request pending, it sends the data to it somehow.
How do I make Main thread and second thread communicate to each other?
Main Thread http handler pseudocode:
function mg_handler(request){
    var handle = parseHandle(request.data);
    var storage_name = parseStorageName(request.data);
    var response = WaitForResponse(handle,storage_name);
    mg_printf(response);
    return;
}
Second thread pseudocode:
var storage
function t_run()
{
    var buf
    while(1){
        recvfrom(socket,buf);
        var result;
        bool found_something = search(buf,result);
        if(found_something){
            update(storage,result);
            //if WaitForResponse is waiting let it continue by sending storage to it somehow.
            //???
        }
    }
cleanup:
    return;
}
In your scenario there is no communication between any threads. Communication would mean for example one thread controls the execution of another thread. In your case you are simply sharing data (or the memory that holds the data to be more precise).
Note, that you have to provide a synchronization mechanism in order to synchronize the concurrent access to the shared memory. Otherwise you will experience all the concurrency issues that usually come with multithreading, especially when shared memory is involved.
Most languages that support multithreading also provide synchronization primitives at the language or standard-library level.
// Shared resources.
// Requires some synchronization mechanism for thread-safe access...
shared_pending_http_request_flag;
shared_data_variable;
function main()
{
    // Main thread context
    startBackgroundThread();
    listenToHttpRequests();
}
function startBackgroundThread()
{
    // Background thread context
    result_data := readDataFromSocket();
    IF
        main thread has set shared variable shared_pending_http_request_flag
    THEN
        shared_data_variable := result_data ;
}
function listenToHttpRequests()
{
    // Main thread context
    WHILE
        listening to HTTP requests
    DO
        IF
            there is a pending request
        THEN
            set the shared_pending_http_request_flag;
        IF
            shared_pending_http_request_flag AND shared_data_variable are both set
        THEN
            handle value of shared_data_variable;
}
For an object oriented language or any language that supports callbacks or method references, you should implement the Observer pattern. This removes the need to poll for changes, as the observable can simply notify the observer about changes. Generally the observable invokes a callback that was registered by the observer.
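As a rough illustration of "notify instead of poll", here is a minimal C++ sketch that uses a mutex and a condition variable so the HTTP handler can sleep until the socket thread publishes a result. Everything in it is an assumption made for illustration: `Mailbox`, `publish` and `waitForResponse` are hypothetical names, the payload is a plain string, and a timeout is included because a long-polling request normally has to give up at some point.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical mailbox shared by the socket thread (producer) and the
// HTTP handler (consumer). Names are illustrative, not from the post.
class Mailbox {
public:
    // Socket thread: store the new result and wake any waiting handler.
    void publish(std::string result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            latest_ = std::move(result);
        }
        changed_.notify_all();
    }

    // HTTP handler: block until a result is available or the timeout expires.
    // Returns std::nullopt on timeout; otherwise takes the stored result.
    std::optional<std::string> waitForResponse(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!changed_.wait_for(lock, timeout,
                               [this] { return latest_.has_value(); })) {
            return std::nullopt;
        }
        std::optional<std::string> out = std::move(latest_);
        latest_.reset();  // leave the mailbox empty for the next request
        return out;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::optional<std::string> latest_;
};
```

In terms of the question's pseudocode, the second thread would call `publish` where it currently has the `//???` comment, and `WaitForResponse` in the HTTP handler would map onto `waitForResponse`.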

Process websocket incoming messages using multiple threads in tomcat

From what I understand (please correct me if I am wrong), in tomcat incoming websocket messages are processed sequentially. Meaning that if you have 100 incoming messages in one websocket, they will be processed using only one thread one-by-one from message 1 to message 100.
But this does not work for me. I need to concurrently process incoming messages in a websocket in order to increase my websocket throughput. The messages coming in do not depend on each other hence do not need to be processed sequentially.
The question is how to configure tomcat such that it would assign multiple worker threads per websocket to process incoming messages concurrently?
Any hint is appreciated.
This is the place in the tomcat code where I think it blocks per websocket connection (which makes sense):
/**
 * Called when there is data in the ServletInputStream to process.
 *
 * @throws IOException if an I/O error occurs while processing the available
 * data
 */
public void onDataAvailable() throws IOException {
    synchronized (connectionReadLock) {
        while (isOpen() && sis.isReady()) {
            // Fill up the input buffer with as much data as we can
            int read = sis.read(
                    inputBuffer, writePos, inputBuffer.length - writePos);
            if (read == 0) {
                return;
            }
            if (read == -1) {
                throw new EOFException();
            }
            writePos += read;
            processInputBuffer();
        }
    }
}
You can't configure Tomcat to do what you want. You need to write a message handler that consumes the message, passes it to an Executor (or similar for processing) and then returns.

Efficient consumer thread with multiple producers

I am trying to make a producer/consumer thread situation more efficient by skipping expensive event operations if necessary with something like:
//cas(variable, compare, set) is atomic compare and swap
//queue is already lock free
running = false
// add item to queue – producer thread(s)
if(cas(running, false, true))
{
    // We effectively obtained a lock on signalling the event
    add_to_queue()
    signal_event()
}
else
{
    // Most of the time if things are busy we should not be signalling the event
    add_to_queue()
    if(cas(running, false, true))
        signal_event()
}
...
// Process queue, single consumer thread
reset_event()
while(1)
{
    wait_for_auto_reset_event() // Preferably IOCP
    for(int i = 0; i < SpinCount; ++i)
        process_queue()
    cas(running, true, false)
    if(queue_not_empty())
        if(cas(running, false, true))
            signal_event()
}
Obviously trying to get these things correct is a little tricky(!) so is the above pseudo code correct? A solution that signals the event more than is exactly needed is ok but not one that does so for every item.
This falls into the sub-category of "stop messing about and go back to work" known as "premature optimisation". :-)
If the "expensive" event operations are taking up a significant portion of time, your design is wrong, and rather than use a producer/consumer you should use a critical section/mutex and just do the work from the calling thread.
I suggest you profile your application if you are really concerned.
Updated:
Correct answer:
Producer
ProducerAddToQueue(pQueue,pItem){
    EnterCriticalSection(pQueue->pCritSec)
    if(IsQueueEmpty(pQueue)){
        SignalEvent(pQueue->hEvent)
    }
    AddToQueue(pQueue, pItem)
    LeaveCriticalSection(pQueue->pCritSec)
}
Consumer
nCheckQuitInterval = 100; // Every 100 ms consumer checks if it should quit.
ConsumerRun(pQueue)
{
    while(!ShouldQuit())
    {
        Item* pCurrentItem = NULL;
        EnterCriticalSection(pQueue->pCritSec);
        if(IsQueueEmpty(pQueue))
        {
            ResetEvent(pQueue->hEvent)
        }
        else
        {
            pCurrentItem = RemoveFromQueue(pQueue);
        }
        LeaveCriticalSection(pQueue->pCritSec);
        if(pCurrentItem){
            ProcessItem(pCurrentItem);
            pCurrentItem = NULL;
        }
        else
        {
            // Wait for items to be added.
            WaitForSingleObject(pQueue->hEvent, nCheckQuitInterval);
        }
    }
}
Notes:
The event is a manual-reset event.
The operations protected by the critical section are quick. The event is only set or reset when the queue transitions to/from empty state. It has to be set/reset within the critical section to avoid a race condition.
This means the critical section is only held for a short time, so contention will be rare.
Critical sections don't block unless they are contended. So context switches will be rare.
Assumptions:
This is a real problem not homework.
Producers and consumers spend most of their time doing other stuff, i.e. getting the items ready for the queue, processing them after removing them from the queue.
If they are spending most of the time doing the actual queue operations, you shouldn't be using a queue. I hope that is obvious.
I went through a bunch of cases and can't see an issue, but it's kind of complicated. I thought maybe you would have an issue with queue_not_empty / add_to_queue racing, but it looks like the post-dominating CAS in both paths covers this case.
CAS is expensive (not as expensive as signal). If you expect skipping the signal to be common, I would code the CAS as follows:
bool cas(variable, old_val, new_val) {
    if (variable != old_val) return false   // cheap read first; skip the locked instruction
    asm cmpxchg
}
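In C++ the same "read first, CAS only if it might succeed" idea could be written with `std::atomic`. This is a minimal sketch, not the poster's code: `trySetFlag` is a hypothetical name, and it uses the default sequentially consistent operations. Whether the early return is safe in a given design also depends on how the queue operations are ordered relative to the flag, which this sketch does not try to prove.

```cpp
#include <atomic>

// Hypothetical helper: attempt the false -> true transition, but do a plain
// load first so the common "already set" case avoids the locked instruction.
inline bool trySetFlag(std::atomic<bool>& flag) {
    if (flag.load()) {
        return false;  // already set: somebody else has signalled or will
    }
    bool expected = false;
    // Only one caller can win this exchange; the winner should signal.
    return flag.compare_exchange_strong(expected, true);
}
```

A producer would then call `signal_event()` only when `trySetFlag(running)` returns true.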
Lock-free structures like this are the stuff that Jinx (the product I work on) is very good at testing. So you might want to use an eval license to test the lock-free queue and signal optimization logic.
Edit: maybe you can simplify this logic.
running = false
// add item to queue – producer thread(s)
add_to_queue()
if (cas(running, false, true)) {
    signal_event()
}
// Process queue, single consumer thread
reset_event()
while(1)
{
    wait_for_auto_reset_event() // Preferably IOCP
    for(int i = 0; i < SpinCount; ++i)
        process_queue()
    cas(running, true, false) // this could just be a memory barriered store of false
    if(queue_not_empty())
        if(cas(running, false, true))
            signal_event()
}
Now that the cas/signal are always next to each other they can be moved into a subroutine.
Why not just associate a bool with the event? Use cas to set it to true, and if the cas succeeds then signal the event, because the event must have been clear. The waiter can then just clear the flag before it waits:
bool flag=false;
// producer
add_to_queue();
if(cas(flag,false,true))
{
    signal_event();
}
// consumer
while(true)
{
    while(queue_not_empty())
    {
        process_queue();
    }
    cas(flag,true,false); // clear the flag
    if(queue_is_empty())
        wait_for_auto_reset_event();
}
This way, you only wait if there are no elements on the queue, and you only signal the event once for each batch of items.
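To make the flag-plus-event idea concrete, here is a portable C++ sketch. It rests on several assumptions: `AutoResetEvent` is a hand-rolled stand-in for a Win32 auto-reset event, a mutex-protected `std::deque` stands in for the lock-free queue, and `SignalledQueue`, `push` and `popBatch` are hypothetical names. `push` returns whether it signalled, purely so the batching is observable.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Stand-in for an auto-reset event: wait() consumes the signal.
class AutoResetEvent {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signalled_ = true;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signalled_; });
        signalled_ = false;  // auto-reset
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signalled_ = false;
};

class SignalledQueue {
public:
    // Producer: enqueue, then signal only on the flag's false -> true edge.
    bool push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
        }
        bool expected = false;
        if (flag_.compare_exchange_strong(expected, true)) {
            event_.signal();
            return true;
        }
        return false;
    }

    // Single consumer: returns a non-empty batch, waiting only when the
    // queue is still empty after the flag has been cleared.
    std::size_t popBatch(std::vector<int>& out) {
        for (;;) {
            const std::size_t n = drain(out);
            if (n > 0) return n;
            flag_.store(false);            // clear the flag before waiting
            if (isEmpty()) event_.wait();  // re-check closes the race window
        }
    }

private:
    std::size_t drain(std::vector<int>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t n = items_.size();
        out.insert(out.end(), items_.begin(), items_.end());
        items_.clear();
        return n;
    }
    bool isEmpty() {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.empty();
    }

    std::mutex mutex_;
    std::deque<int> items_;
    std::atomic<bool> flag_{false};
    AutoResetEvent event_;
};
```

The consumer may occasionally wake on a stale signal and find nothing to do, which the question already accepts ("signals the event more than is exactly needed is ok"); the re-check after clearing the flag is what guards against waiting while an item is already queued.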
I believe, you want to achieve something like in this question:
WinForms Multithreading: Execute a GUI update only if the previous one has finished. It is specific on C# and Winforms, but the structure may well apply for you.

TcpClient and StreamReader blocks on Read

Here's my situation:
I'm writing a chat client to connect to a chat server. I create the connection using a TcpClient and get a NetworkStream object from it. I use a StreamReader and StreamWriter to read and write data back and forth.
Here's what my read looks like:
public string Read()
{
    StringBuilder sb = new StringBuilder();
    try
    {
        int tmp;
        while (true)
        {
            tmp = StreamReader.Read();
            if (tmp == 0)
                break;
            else
                sb.Append((char)tmp);
            Thread.Sleep(1);
        }
    }
    catch (Exception ex)
    {
        // log exception
    }
    return sb.ToString();
}
That works fine and dandy. In my main program I create a thread that continually calls this Read method to see if there is data. An example is below.
private void Listen()
{
    try
    {
        while (IsShuttingDown == false)
        {
            string data = Read();
            if (!string.IsNullOrEmpty(data))
            {
                // do stuff
            }
        }
    }
    catch (ThreadInterruptedException ex)
    {
        // log it
    }
}
...
Thread listenThread = new Thread(new ThreadStart(Listen));
listenThread.Start();
This works just fine. The problem comes when I want to shut down the application. I receive a shut down command from the UI, and tell the listening thread to stop listening (that is, stop calling this read function). I call Join and wait for this child thread to stop running. Like so:
// tell the thread to stop listening and wait for a sec
IsShuttingDown = true;
Thread.Sleep(TimeSpan.FromSeconds(1.00));
// if we've reached here and the thread is still alive
// interrupt it and tell it to quit
if (listenThread.IsAlive)
    listenThread.Interrupt();
// wait until thread is done
listenThread.Join();
The problem is it never stops running! I stepped into the code and the listening thread is blocking because the Read() method is blocking. Read() just sits there and doesn't return. Hence, the thread never gets a chance to sleep that 1 millisecond and then get interrupted.
I'm sure if I let it sit long enough I'd get another packet and the thread would get a chance to sleep (if it's an active chatroom or I get a ping from the server). But I don't want to depend on that. If the user says shut down, I want to shut it down!
One alternative I found is to use the DataAvailable property of NetworkStream so that I could check it before calling StreamReader.Read(). This didn't work because it was unreliable and I lost data when reading packets from the server. (Because of that I wasn't able to log in correctly, etc.)
Any ideas on how to shutdown this thread gracefully? I'd hate to call Abort() on the listening thread...
Really the only answer is to stop using Read and switch to using asynchronous operations (i.e. BeginRead). This is a harder model to work with, but means no thread is blocked (and you don't need to dedicate a thread—a very expensive resource—to each client even if the client is not sending any data).
By the way, using Thread.Sleep in concurrent code is a bad smell (in the Refactoring sense), it usually indicates deeper problems (in this case, should be doing asynchronous, non-blocking, operations).
Are you actually using System.IO.StreamReader and System.IO.StreamWriter to send and receive data from the socket? I wasn't aware this was possible. I've only ever used the Read() and Write() methods on the NetworkStream object returned by the TcpClient's GetStream() method.
Assuming this is possible, StreamReader.Read() returns -1 when the end of the stream is reached, not 0. So it looks to me like your Read() method is in an infinite loop.

"window procedure" of a newly created thread without window

I want to create a thread for some db writes that should not block the ui in case the db is not there. For synchronizing with the main thread, I'd like to use windows messages. The main thread sends the data to be written to the writer thread.
Sending is no problem, since CreateThread returns the handle of the newly created thread. I thought about creating a standard windows event loop for processing the messages. But how do I get a window procedure as a target for DispatchMessage without a window?
Standard windows event loop (from MSDN):
while( (bRet = GetMessage( &msg, NULL, 0, 0 )) != 0)
{
    if (bRet == -1)
    {
        // handle the error and possibly exit
    }
    else
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}
Why windows messages? Because they are fast (windows relies on them) and thread-safe. This case is also special as there is no need for the second thread to read any data. It just has to receive data, write it to the DB and then wait for the next data to arrive. But that's just what the standard event loop does. GetMessage waits for the data, then the data is processed and everything starts again. There's even a defined signal for terminating the thread that is well understood - WM_QUIT.
Other synchronizing constructs block one of the threads every now and then (critical section, semaphore, mutex). As for the events mentioned in the comment - I don't know them.
It might seem contrary to common sense, but for messages that don't have windows, it's actually better to create a hidden window with your window proc than to manually filter the results of GetMessage() in a message pump.
The fact that you have an HWND means that as long as the right thread has a message pump going, the message is going to get routed somewhere. Consider that many functions, even internal Win32 ones, have their own message pumps (for example MessageBox()). And the code for MessageBox() isn't going to know to invoke your custom code after its GetMessage(), unless there's a window handle and window proc that DispatchMessage() will know about.
By creating a hidden window, you're covered by any message pump running in your thread, even if it isn't written by you.
EDIT: but don't just take my word for it, check these articles from Microsoft's Raymond Chen.
Thread messages are eaten by modal loops
Why do messages posted by PostThreadMessage disappear?
Why isn't there a SendThreadMessage function?
NOTE: Use this code only when the thread needs no UI-related or COM-related code. Outside such corner cases, this code works correctly; it is especially good for a purely computation-bound worker thread.
DispatchMessage and TranslateMessage are not necessary if the thread has no window, so simply leave them out. HWND has nothing to do with your scenario; you don't need to create any window at all. Note that those two *Message functions are needed only to handle Windows UI-related messages such as WM_KEYDOWN and WM_PAINT.
I also prefer Windows Messages to synchronize and communicate between threads by using PostThreadMessage and GetMessage, or PeekMessage. I wanted to cut and paste from my code, but I'll just briefly sketch the idea.
#define WM_MY_THREAD_MESSAGE_X (WM_USER + 100)
#define WM_MY_THREAD_MESSAGE_Y (WM_USER + 101)
// Worker Thread: No Window in this thread
DWORD WINAPI WorkerThread(LPVOID data)
{
    // Get the master thread's ID
    DWORD master_tid = ...;
    MSG msg;
    BOOL bRet;
    // GetMessage returns 0 when it retrieves WM_QUIT, which ends the loop
    while( (bRet = GetMessage( &msg, NULL, 0, 0 )) != 0)
    {
        if (bRet == -1)
        {
            // handle the error and possibly exit
        }
        else
        {
            if (msg.message == WM_MY_THREAD_MESSAGE_X)
            {
                // Do your task
                // If you want to respond,
                PostThreadMessage(master_tid, WM_MY_THREAD_MESSAGE_X, ... ...);
            }
            //...
        }
    }
    return 0;
}
// In the Master Thread
//
// Spawn the worker thread
CreateThread( ... WorkerThread ... &worker_tid);
// Send message to worker thread
PostThreadMessage(worker_tid, WM_MY_THREAD_MESSAGE_X, ... ...);
// If you want the worker thread to quit, post WM_QUIT to it
// (PostQuitMessage would post to the calling thread instead)
PostThreadMessage(worker_tid, WM_QUIT, 0, 0);
// If you want to receive message from the worker thread, it's simple
// You just need to write a message handler for WM_MY_THREAD_MESSAGE_X
LRESULT OnMyThreadMessage(WPARAM, LPARAM)
{
    ...
}
I'm not sure this is exactly what you wanted, but I think the code is easy to understand. In general, a thread is created without a message queue; the queue is initialized once the thread calls a window-message-related function. Note again that no window is necessary to post or receive window messages.
You don't need a window procedure in your thread unless the thread has actual windows to manage. Once the thread has called Peek/GetMessage(), it already has the same message that a window procedure would receive, and thus can act on it immediately. Dispatching the message is only necessary when actual windows are involved. It is a good idea to dispatch any messages that you do not care about, in case other objects used by your thread have their own windows internally (ActiveX/COM does, for instance). For example:
while( (bRet = GetMessage(&msg, NULL, 0, 0)) != 0 )
{
    if (bRet == -1)
    {
        // handle the error and possibly exit
    }
    else
    {
        switch( msg.message )
        {
            case ...: // process a message
                ...
                break;
            case ...: // process a message
                ...
                break;
            default: // everything else
                TranslateMessage(&msg);
                DispatchMessage(&msg);
                break;
        }
    }
}

Resources