Best way to wait for infiniband receive completions on Linux?

We're porting Isis2 (isis2.codeplex.com) to make better use of Infiniband verbs and have our code running. However, IB is oriented around an asynchronous receive model in which you post a bunch of receive buffers and then, as receives complete, you process the received data.
Polling is slow: if I use a blocking wait for, say, 2ms, I might delay as long as 2ms before seeing the IB data. So that's a solution, but a poor one. What I really want is a way to wait until an IB completion record is finalized and then to have my thread wake up instantly (on Windows this is easy... on Linux it isn't as natural). Does anyone know how one does this? When using Verbs, there isn't any IB file descriptor, so obviously I can't use select().

Never mind; we just realized that they offer a method (ibv_req_notify_cq) for this. We'll try that. Not the world's best documented API...

Related

Esper UpdateListener's concurrency

My boss wants me to learn Esper, the open source library for CEP, so I need some help.
I want many UpdateListeners to subscribe to one event stream and run concurrently. That is, if one listener has a long, heavy processing step, the other listeners should keep running concurrently. We receive many events in a short time, so I need faster processing.

The UpdateListener code can simply use a Java threadpool to do its work. For an example there is http://www.javacodegeeks.com/2013/01/java-thread-pool-example-using-executors-and-threadpoolexecutor.html.
In Esper you can also configure threading.
http://esper.codehaus.org/esper-5.1.0/doc/reference/en-US/html_single/index.html#api-threading-advanced

Node Background Threads - When Do These Get Created?

I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sure this happens on a function call/callback, but certainly not all of them.
In my specific case, it's an Express based app. At app start-up it does several things including instantiating a RabbitMQ based "bus", an object with a method which will write to the bus (objA) and object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an express callback
app.put((req,res) => {
objA.methodWhichWritesToBus();
});
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is I/O intensive (it calls an Elasticsearch service on another box and brings back tens to hundreds of thousands of records) with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Finally, are there ways to effect/force a method etc. to "run in the background"?
I've been noodling this, testing it, for awhile now but all on one machine so it's difficult to tell what's going on.
Who can clarify this for me?

Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing" as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing" so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some IO, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop and you can process the results.
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
You call it "asynchronously" and it happens whenever you do IO, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Nope.
Finally, are there ways to effect/force a method etc. to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
More precisely, all networking is asynchronous. End of story. Specifically, the networking APIs available in node core are all asynchronous, and there's simply no synchronous networking API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs must only be used under special limited circumstances, and if you don't know confidently that it's OK in this specific case to make a synchronous IO API call, you should use the asynchronous API so you don't break the linchpin that makes node perform as it does.
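
To make the "one thread, overlapping I/O waits" model above concrete, here is a small sketch. It is written in Python's asyncio rather than Node, purely as an analogy: asyncio also runs callbacks on a single event-loop thread. The names fake_io and demo are made up for illustration, and asyncio.sleep stands in for a real network call.

```python
import asyncio
import threading
import time


async def fake_io(name, delay, log):
    # asyncio.sleep stands in for a network request: while it is pending,
    # control returns to the event loop so other tasks can make progress.
    await asyncio.sleep(delay)
    # The continuation runs on the event-loop thread, not a worker thread.
    log.append((name, threading.get_ident()))


async def main():
    log = []
    start = time.monotonic()
    # Three "requests" are in flight at once; their waits overlap.
    await asyncio.gather(
        fake_io("a", 0.2, log),
        fake_io("b", 0.2, log),
        fake_io("c", 0.2, log),
    )
    return log, time.monotonic() - start


def demo():
    log, elapsed = asyncio.run(main())
    same_thread = all(tid == threading.get_ident() for _, tid in log)
    return len(log), same_thread, elapsed
```

Calling demo() should report three completed operations, all on the calling thread, in roughly the time of one wait rather than three. The waiting overlaps, but the code that handles each result still runs one piece at a time.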

Producer/Consumer in the kernel space - Linux

I would like to have one thread to queue some requests in a request queue and another to serve these requests. The producer should wake up the consumer when there is a new request queued.
Is there anyone who has done this already or knows how to do it?
I have tried several tutorials on the internet and none of them really worked cleanly. They either miss a request, cause a system lockup/instability, or they just do not terminate.
Note: My question in essence is similar to this one. However, I wont be specific like the one who asked that question. Anyone who can/willing to help can just throw his two cents and may be we can work something out.
Thanks!

You can use Work Queues. Work Queues are simple: once you set up your work queue, you use something like the following:
DECLARE_WORK(name, void (*function)(void *), void *data);
Your function call will be scheduled and called later, take a look at this article.
I also highly recommend you this book: Linux Device Drivers
edit: I just saw you already linked an SO post where they use work queues. Have you tried it out? Did you run into any issues? I suggest you start with a really simple example, just to check that it works. Implement your core functionality later.
Update:
From the official Documentation:
Some users depend on the strict execution ordering of ST wq. The
combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
behavior. Work items on such wq are always queued to the unbound
worker-pools and only one work item can be active at any given time
thus achieving the same ordering property as ST wq.
That way you will have guaranteed FIFO execution of your work items. But be aware that the work may be executed on different CPUs. You have to use memory barriers to ensure visibility (e.g. wmb()).
Update:
As @user2009594 mentioned, a single threaded wq can be created using the following macro defined in linux/workqueue.h:
#define create_singlethread_workqueue(name) \
        alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))

Multicast Netlink sockets can work very well here. I recently did the same thing; the only difference was that my consumer was in the kernel while the producers were in user space. The same approach can be used entirely within kernel space.
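
The failure modes mentioned in the question (missed requests, a consumer that never terminates) usually come from the wake-up logic rather than from the queue itself. The sketch below shows the shape of a correct wait/wake loop. It is user-space Python, not kernel code, and RequestQueue, serve and demo are hypothetical names. In the kernel the analogous building blocks would be a wait queue (wait_event / wake_up) or the work queues described above.

```python
import collections
import threading


class RequestQueue:
    """FIFO queue with an explicit shutdown signal."""

    def __init__(self):
        self._items = collections.deque()
        self._cond = threading.Condition()
        self._closed = False

    def put(self, request):
        with self._cond:
            self._items.append(request)
            self._cond.notify()          # wake the consumer

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()      # wake the consumer so it can exit

    def get(self):
        """Return the next request, or None once closed and drained."""
        with self._cond:
            # Re-check the condition in a loop: a wake-up is only a hint,
            # and checking under the lock means no request is missed.
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return self._items.popleft()
            return None


def serve(q, handled):
    while True:
        request = q.get()
        if request is None:              # queue closed and empty
            return
        handled.append(request)


def demo():
    q = RequestQueue()
    handled = []
    consumer = threading.Thread(target=serve, args=(q, handled))
    consumer.start()
    for i in range(1000):
        q.put(i)
    q.close()
    consumer.join()
    return handled
```

The two details that matter are that the "is there work?" test happens while holding the lock, and that shutdown is itself a condition the consumer waits on, so pending requests are drained before it exits.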

Outputting console data from a process to gui in wxwidgets

I'm running a long process in the background. I've managed to output the console data to the GUI, but the problem is that the data is returned only after the process has finished. I need to display the data in real time, i.e. every time the process produces some output on the console. I'm running the process within my GUI from a separate thread.
I mean, it would be like building a GUI for the ping command, where output is displayed on the console after each packet is sent, i.e. in real time. I just need to redirect that to the GUI, in real time. I'm implementing the GUI in wxWidgets. Any help would be greatly appreciated.
Thanking You..
Jvc

Is the output you wish to display generated in a separate process from the process running the GUI? Or in a separate thread in the same process?
I ask because most people, when they ask this question, mean a separate thread. Since you have tagged your question with "process" I will assume that is what you mean.
You need some inter-process communication. There is a bewildering variety of techniques to do this. Personally, I always use sockets.
wxWidgets has simple, easy to use socket classes wxSocketClient and wxSocketServer.
The background process is probably not running wxWidgets, so you will need something else there. I recommend boost::asio. I know it looks intimidating, but in fact the tutorial code can be used as is.
There is a lot more to be said, but I risk straying away from the point, since there are so few details in your question.

You can have an output queue protected by a wxMutex. The thread doing the computation writes to the queue, then signals the GUI thread using wxQueueEvent with a custom event to let it know that the queue is not empty. The GUI thread then reads the queue and outputs the data.
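
The queue-plus-notification design can be sketched outside wxWidgets. The following is an illustration in Python, not wxWidgets code: queue.Queue plays the role of the mutex-protected queue, the notify callback is a stand-in for wxQueueEvent, and pump_output and run_demo are hypothetical names. The child command is also made up. The key point is that the worker forwards each line as it arrives instead of waiting for the process to exit.

```python
import queue
import subprocess
import sys
import threading


def pump_output(cmd, out_queue, notify):
    """Worker thread body: forward each output line as soon as it arrives."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,                      # line-buffered pipe on our side
    )
    for line in proc.stdout:            # yields once per line written by the child
        out_queue.put(line.rstrip("\n"))
        notify()                        # stand-in for wxQueueEvent(...)
    proc.stdout.close()
    proc.wait()
    out_queue.put(None)                 # sentinel: the process has finished
    notify()


def run_demo():
    # Hypothetical child process; "-u" makes it flush after every line.
    child = [sys.executable, "-u", "-c",
             "for i in range(3): print('reply', i)"]
    lines = queue.Queue()               # thread-safe, like a mutex-protected queue
    wakeups = threading.Semaphore(0)    # stands in for the GUI event queue
    worker = threading.Thread(target=pump_output,
                              args=(child, lines, wakeups.release))
    worker.start()
    shown = []
    while True:
        wakeups.acquire()               # a GUI thread would receive an event here
        item = lines.get()
        if item is None:
            break
        shown.append(item)              # a GUI would append to a text control
    worker.join()
    return shown
```

If output still arrives only at the end, the child process is probably buffering its own stdout when it is not attached to a terminal. That has to be fixed on the child's side (here with "-u"), since the reader cannot force it.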

winsock 2. thread safety for simultaneous send's. tcp

is it possible to have multiple threads sending on the same socket? will there be interleaving of the streams or will the socket block on the first thread (assuming tcp)? the majority of opinions i've found seems to warn against doing this for obvious fears of interleaving, but i've also found a few comments that state the opposite. are interleaving fears a carryover from winsock1 and are they well-founded for winsock2? is there a way to setup a winsock2 socket that would allow for lack of local synchronization?
two of the contrary opinions below... who's right?
comment 1
"Winsock 2 implementations should be completely thread safe. Simultaneous reads / writes on different threads should succeed, or fail with WSAEINPROGRESS, depending on the setting of the overlapped flag when the socket is created. Anyway by default, overlapped sockets are created; so you don't have to worry about it. Make sure you don't use NT SP6, if ur on SP6a, you should be ok !"
source
comment 2
"The same DLL doesn't get accessed by multiple processes as of the introduction of Windows 95. Each process gets its own copy of the writable data segment for the DLL. The "all processes share" model was the old Win16 model, which is luckily quite dead and buried by now ;-)"
source
looking forward to your comments!
jim
~edit1~
to clarify what i mean by interleaving. thread 1 sends the msg "Hello" thread 2 sends the msg "world!". recipient receives: "Hwoel lorld!". this assumes both messages were NOT sent in a while loop. is this possible?

I'd really advise against doing this in any case. The send functions might send less than you tell them to for various very legit reasons, and if another thread might enter and try to also send something, you're just messing up your data.
Now, you can certainly write to a socket from several threads, but you've no longer any control over what gets on the wire unless you've proper locking at the application level.
consider sending some data:
WSASend(sock, buf, buflen, &sent, 0, 0, 0);
the sent parameter will hold the no. of bytes actually sent - similar to the return value of the send() function. To send all the data in buf you will have to loop doing a WSASend until all the data actually gets sent.
If, say, the first WSASend sends all but the last 4 bytes, another thread might go and send something while you loop back and try to send the last 4 bytes.
With proper locking to ensure that can't happen, it should be no problem sending from several threads - I wouldn't do it anyway just for the pure hell it will be to debug when something does go wrong.
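
As a rough illustration of "proper locking at the application level", here is a sketch in Python rather than Winsock. The lock is held for the whole send loop, so a partial send by one thread cannot be followed by another thread's bytes. The 4-byte length prefix is an assumption added so the receiver can split the stream back into messages, and send_message, recv_exact, recv_message and demo are hypothetical names.

```python
import socket
import struct
import threading

send_lock = threading.Lock()            # one lock per shared socket


def send_message(sock, payload):
    """Send one length-prefixed message atomically with respect to other senders."""
    data = struct.pack("!I", len(payload)) + payload   # assumed framing
    with send_lock:
        sent = 0
        while sent < len(data):
            # send() may accept fewer bytes than requested, so keep looping
            # while still holding the lock.
            sent += sock.send(data[sent:])


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo():
    a, b = socket.socketpair()
    messages = [b"Hello " * 1000, b"world! " * 1000]
    threads = [threading.Thread(target=send_message, args=(a, m))
               for m in messages]
    for t in threads:
        t.start()
    received = [recv_message(b) for _ in messages]
    for t in threads:
        t.join()
    a.close()
    b.close()
    # Arrival order depends on scheduling, but each message must be intact.
    return sorted(received) == sorted(messages)
```

Without the lock, the two payloads could be spliced together whenever a send call returns early, which is exactly the "Hwoel lorld!" case from the question.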

is it possible to have multiple threads sending on the same socket?
Yes - although, depending on implementation this can be more or less visible. First, I'll clarify where I am coming from:
C# / .Net 3.5
System.Net.Sockets.Socket
The overall visibility (i.e. required management) of threading and the headaches incurred will be directly dependent on how the socket is implemented (synchronously or asynchronously). If you go the synchronous route then you have a lot of work to manually manage connecting, sending, and receiving over multiple threads. I highly recommend that this implementation be avoided. The efforts to correctly and efficiently perform the synchronous methods in a threaded model simply are not worth the comparable efforts to implement the asynchronous methods.
I have implemented an asynchronous Tcp server in less time than it took for me to implement the threaded synchronous version. Async is much easier to debug - and if you are intent on Tcp (my favorite choice) then you really have few worries in lost messages, missing data, or whatever.
will there be interleaving of the streams or will the socket block on the first thread (assuming tcp)?
I had to research interleaved streams (from wiki) to ensure that I was accurate in my understanding of what you are asking. To further understand interleaving and mixed messages, refer to these links on wiki:
Real Time Messaging Protocol
Transmission Control Protocol
Specifically, the power of Tcp is best described in the following section:
Due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets can be
lost, duplicated, or delivered out of order. TCP detects these problems, requests retransmission of lost
packets, rearranges out-of-order packets, and even helps minimize network congestion to reduce the
occurrence of the other problems. Once the TCP receiver has finally reassembled a perfect copy of the data
originally transmitted, it passes that datagram to the application program. Thus, TCP abstracts the application's
communication from the underlying networking details.
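What this means is that TCP delivers bytes to the receiver in the same order in which they were handed to the socket by the sender. Note that this ordering applies to the byte stream as a whole, not to individual messages: TCP has no notion of message boundaries, so it will not separate writes that were already interleaved at the sender. It is expected that threading is or would be involved in developing a performance-driven Tcp client/server mechanism - whether through async or sync methods.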
In order to keep a socket from blocking, you can set its Blocking property to false.
I hope this gives you some good information to work with. Heck, I even learned a little bit...
