In LMAX Disruptor, how does the producer know the consumer has finished the job?

Is there any way for consumers to inform the producer that a particular event has been processed successfully?

Not directly, no. And given that the Disruptor is intended to decouple processing, doing so would negate all the performance gains it provides.

You can wire this up yourself if you like. Something like:
class MyEvent {
    String payload;
    Runnable onComplete;
}
Then have your event handler/consumer/whatever run onComplete after it handles the payload. Replace runnable with a Future or any other thing you might like, but this pattern works best for asynchronous things (maybe the runnable sets a done flag in your UI?)
Disruptor tends to be a poor fit for things where the caller needs to block for the result.
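If you do wire it up, the consumer side might look roughly like the sketch below; only EventHandler comes from the Disruptor API, while MyEvent and MyEventHandler are illustrative names.

import com.lmax.disruptor.EventHandler;

// The event from above, repeated so the sketch is self-contained.
class MyEvent {
    String payload;
    Runnable onComplete;
}

// Illustrative consumer: process the payload, then notify the producer side.
class MyEventHandler implements EventHandler<MyEvent> {
    @Override
    public void onEvent(MyEvent event, long sequence, boolean endOfBatch) {
        System.out.println("processing " + event.payload);
        if (event.onComplete != null) {
            event.onComplete.run(); // e.g. complete a future or set a done flag in the UI
        }
    }
}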

Could you not just maintain counters in your producer(s) and consumer(s)? The producer increments its counter before publishing, the consumer increments its counter after processing, and the producer then compares the produced_count with the consumed_count.
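For what it's worth, a minimal sketch of that counter idea, using plain java.util.concurrent atomics rather than anything from the Disruptor API (all names are illustrative):

import java.util.concurrent.atomic.AtomicLong;

// Shared progress counters: producers bump 'produced' before publishing,
// consumers bump 'consumed' after processing, and the producer compares them.
class ProgressCounters {
    private final AtomicLong produced = new AtomicLong();
    private final AtomicLong consumed = new AtomicLong();

    void beforePublish() { produced.incrementAndGet(); }
    void afterProcess()  { consumed.incrementAndGet(); }

    // True once every published event has been processed.
    boolean caughtUp()   { return consumed.get() >= produced.get(); }
}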


Why does the Disruptor hold lots of data when the producer is much faster than the consumer?

I'm learning about the LMAX Disruptor and have a problem: when I have a very large ring buffer, like 1024, and my producer is much faster than my consumer, the ring buffer will hold lots of data, but the events will not be processed until my application ends, which means my application will lose lots of data (my application is not a daemon).
I've tried slowing down the producer, which works, but I can't use this approach in my application because it would reduce my application's performance greatly.
val ringBufferSize = 1024
val disruptor = new Disruptor[util.Map[String, Object]](new MessageEventFactory, ringBufferSize, new MessageThreadFactory, ProducerType.MULTI, new BlockingWaitStrategy)
disruptor.handleEventsWith(new MessageEventHandler(batchSize, this))
disruptor.setDefaultExceptionHandler(new MessageExceptionHandler)
val ringBuffer = disruptor.start
val producer = new MessageEventProducer(ringBuffer)
part.foreach { row =>
    // Thread.sleep(2000)
    accm.add(1)
    producer.onData(row)
    // flush(row)
}
I want to find a way to control the batch size of the Disruptor myself. Is there any way to consume the remaining data held in the ring buffer when my application ends?
If you let your application end abruptly, your consumers will of course end abruptly too. There is no need to slow down the producer; you simply need to block your application from exiting until all consumers (i.e. event handlers) have finished working on the outstanding events.
The normal way to do this is to invoke Disruptor.shutdown() on the main thread, thus blocking the application from exiting until Disruptor.shutdown() has returned.
In your code snippet above, you'd add that call after the part.foreach statement, before you exit the routine, and it would block until everything has been processed. That would ensure that all events are properly handled to completion.
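For illustration, a minimal Java sketch of that ordering; the Event class and handler here are stand-ins rather than the classes from the question, and only the Disruptor calls themselves are real API:

import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class ShutdownSketch {
    // Simple mutable event carried by the ring buffer.
    static class Event { long value; }

    public static void main(String[] args) {
        Disruptor<Event> disruptor =
            new Disruptor<>(Event::new, 1024, DaemonThreadFactory.INSTANCE);
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
            System.out.println("consumed " + event.value));
        disruptor.start();

        // Producer side: publish everything that is available.
        for (long i = 0; i < 10_000; i++) {
            final long value = i;
            disruptor.getRingBuffer().publishEvent((event, sequence) -> event.value = value);
        }

        // Block here until all outstanding events have been handled,
        // then stop the consumer threads and let the application exit.
        disruptor.shutdown();
    }
}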
The Disruptor excels mainly at buffering (smoothing out) bursts of data coming from a single (extremely fast) producer thread or multiple (still pretty fast) producer threads, feeding that data to consumers that perform in a predictable manner, and thereby eliminating as much latency and lock-contention overhead as possible. If your producers really are much faster than your consumers, you may find that simply invoking the consumer code from within your lambda yields similar or better results, unless you use advanced techniques such as batching or setting up the Disruptor to run multiple instances of the same consumer in parallel threads, which requires the event handler implementation to be modified (see the Disruptor FAQ).
In your example, it seems that all you are trying to accomplish is to feed an already available set of data (your "part" collection) into a single event handler (MessageEventHandler). In such a use case, you might be better off with something like parts.stream().parallel().forEach(... messageEventHandler.onEvent(event) ...).
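As a rough sketch of that alternative, with a placeholder row type and handler instead of the question's MessageEventHandler:

import java.util.List;
import java.util.function.Consumer;

public class ParallelFeedSketch {
    public static void main(String[] args) {
        // Stand-ins for the "part" collection and the event handler.
        List<String> part = List.of("row-1", "row-2", "row-3", "row-4");
        Consumer<String> handler = row ->
            System.out.println(Thread.currentThread().getName() + " handled " + row);

        // Feed the already-available data straight to the handler on the
        // common fork-join pool instead of routing it through a ring buffer.
        part.stream().parallel().forEach(handler);
    }
}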

Does RxJava Parallelization Break the Observable Contract?

Ben Christensen posted here that the best way to currently achieve parallelism in RxJava is to create another Observable and subscribe it on a scheduler as shown below.
streamOfItems.flatMap(item ->
    doStuffWithItem(item).subscribeOn(Schedulers.io()));
However, the Observable Contract says that an onNext() call may be called any number of times, as long as the calls do not overlap. Well, any operators in the rest of the chain following the one above could now easily break that rule (unless they explicitly do some sort of synchronization/serialization).
My impression is that RxJava prefers to keep a stream of emissions on one thread at a time, switching a steady, sequential stream from one thread to another at specific operators, but never running it in parallel (as depicted below).
observeOn() thread   -------------------------Y----Y----Y-------------
subscribeOn() thread ----X----X----X----X-----------------------------
With a parallel approach, I understand the chart may look something like this and that looks pretty overlapped to me.
par subscribeOn() thread 3    -------------------------Y-----Y---------------
par subscribeOn() thread 2    ---------------------------Y---Y---------------
par subscribeOn() thread 1    -------------------------Y-------------Y-------
initial subscribeOn() thread  ----X----X----X----X---------------------------
Did I misunderstand anything or make broad assumptions? Is parallelism not breaking the Observable contract? Does that make it not preferable in some way?
If you are using standard operators, nothing will break the Observable contract, because whenever concurrency may happen, the operators serialize their output. In your example, flatMap does this, so its output is guaranteed to be sequential (although the receiving thread may switch back and forth).
This is, however, not generally true for different stages of the same pipeline if those are separated by an asynchronous boundary or an operator that may do thread arbitration.
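A small RxJava 1.x sketch of that guarantee; the map call stands in for doStuffWithItem, and the printed thread name is only there to show that delivery downstream of flatMap never overlaps even though the inner work runs on several io() threads:

import rx.Observable;
import rx.schedulers.Schedulers;

public class FlatMapSerializationSketch {
    public static void main(String[] args) throws InterruptedException {
        Observable.range(1, 20)
            // Each item becomes its own Observable doing its work on an io() thread...
            .flatMap(item -> Observable.just(item)
                .map(i -> i * 10)                    // stand-in for doStuffWithItem(item)
                .subscribeOn(Schedulers.io()))
            // ...but flatMap serializes the merged results before they reach onNext.
            .subscribe(result -> System.out.println(
                Thread.currentThread().getName() + " -> " + result));

        Thread.sleep(1000); // keep the JVM alive long enough for the io() threads to finish
    }
}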

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it, a serial dispatch queue only starts executing the next task in the queue once the preceding task has completed. If this is the case, I can't see what you would gain by submitting some blocks asynchronously: the act of submission may not block the thread (since it returns straight away), but the task won't be executed until the preceding task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
    private var data = [T]();
    // Create a serial dispatch queue
    private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
    private let semaphore: dispatch_semaphore_t;

    init(items: [T]) {
        data.reserveCapacity(items.count);
        for item in items {
            data.append(item);
        }
        semaphore = dispatch_semaphore_create(items.count);
    }

    func getFromPool() -> T? {
        var result: T?;
        if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
            dispatch_sync(arrayQ, {() in
                result = self.data.removeAtIndex(0);
            })
        }
        return result;
    }

    func returnToPool(item: T) {
        dispatch_async(arrayQ, {() in
            self.data.append(item);
            dispatch_semaphore_signal(self.semaphore);
        });
    }
}
Because there's no need to make the caller of returnToPool() block. It could perhaps continue on doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool. It presumably has other stuff it could be doing. That stuff could be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.
Yes, there are reasons why you'd add tasks to a serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
Another common example is the one you've demonstrated: using a GCD queue to synchronize interaction with some shared object. You can dispatch updates asynchronously to this synchronization queue (why make the current thread wait when it can carry on instead?). You'll do reads synchronously (because you're obviously going to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on the concurrent queue with dispatch_sync, but writes are performed asynchronously with a barrier using dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
The choice of synchronous vs. asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether the current thread has to block until the other queue finishes its task or not.
Regarding your sample code, it is correct. getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code wait on semaphores if it might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose, offering reasonably efficient synchronization of this pool object, while getFromPool blocks, if the pool is empty, until something is returned to the pool.

QThread execution freezes my GUI

I'm new to multithreaded programming. I wrote this simple multithreaded program with Qt, but when I run it, it freezes my GUI, and when I click inside my window it says that my program is not responding.
Here is my widget class. My thread counts up an integer and emits it whenever the number is divisible by 1000. In my widget I simply catch this number with the signal-slot mechanism and show it in a label and a progress bar.
Widget::Widget(QWidget *parent) :
    QWidget(parent),
    ui(new Ui::Widget)
{
    ui->setupUi(this);
    MyThread *th = new MyThread;
    connect(th, SIGNAL(num(int)), this, SLOT(setNum(int)));
    th->start();
}

void Widget::setNum(int n)
{
    ui->label->setNum(n);
    ui->progressBar->setValue(n % 101);
}
and here is my thread's run() function:
void MyThread::run()
{
    for (int i = 0; i < 10000000; i++) {
        if (i % 1000 == 0)
            emit num(i);
    }
}
thanks!
The problem is that your thread code produces an event storm. The loop counts very fast, so fast that the fact that you emit a signal only every 1000 iterations is pretty much immaterial. On modern CPUs, doing 1000 integer divisions takes on the order of 10 microseconds IIRC. If the loop were the only limiting factor, you'd be emitting signals at a peak rate of about 100,000 per second. This is not the case because the performance is limited by other factors, which we shall discuss below.
Let's understand what happens when you emit signals in a different thread from where the receiver QObject lives. The signals are packaged in a QMetaCallEvent and posted to the event queue of the receiving thread. An event loop running in the receiving thread -- here, the GUI thread -- acts on those events using an instance of QAbstractEventDispatcher. Each QMetaCallEvent results in a call to the connected slot.
The access to the event queue of the receiving GUI thread is serialized by a QMutex. On Qt 4.8 and newer, the QMutex implementation got a nice speedup, so the fact that each signal emission results in locking of the queue mutex is not likely to be a problem. Alas, the events need to be allocated on the heap in the worker thread, and then deallocated in the GUI thread. Many heap allocators perform quite poorly when this happens in quick succession if the threads happen to execute on different cores.
The biggest problem comes in the GUI thread. There seem to be a bunch of hidden O(n^2) complexity algorithms! The event loop has to process 10,000 events. Those events will most likely be delivered very quickly and end up in a contiguous block in the event queue. The event loop will have to deal with all of them before it can process further events. A lot of expensive operations happen when you invoke your slot. Not only is the QMetaCallEvent deallocated from the heap, but the label schedules an update() (repaint), and this internally posts a compressible event to the event queue. Compressible event posting has to, in the worst case, iterate over the entire event queue. That's one potential O(n^2) complexity action. Another such action, probably more important in practice, is the progress bar's setValue internally calling QApplication::processEvents(). This can recursively call your slot to deliver the subsequent signal from the event queue. You're doing way more work than you think you are, and this locks up the GUI thread.
Instrument your slot and see if it's called recursively. A quick-and-dirty way of doing it is:
void Widget::setNum(int n)
{
    static int level = 0, maxLevel = 0;
    level++;
    maxLevel = qMax(level, maxLevel);
    ui->label->setNum(n);
    ui->progressBar->setValue(n % 101);
    if (level > 1 && level == maxLevel - 1) {
        qDebug("setNum recursed up to level %d", maxLevel);
    }
    level--;
}
What is freezing your GUI thread is not QThread's execution but the huge amount of work you make the GUI thread do, even though your code looks innocuous.
Side Note on processEvents and Run-to-Completion Code
I think it was a very bad idea to have QProgressBar::setValue invoke processEvents(). It only encourages the broken way people code things (continuously running code instead of short run-to-completion code). Since the processEvents() call can recurse into the caller, setValue becomes persona non grata, and possibly quite dangerous.
If one wants to code in a continuous style yet keep the run-to-completion semantics, there are ways of dealing with that in C++. One is simply leveraging the preprocessor; for example code, see my other answer.
Another way is to use expression templates to get the C++ compiler to generate the code you want. You may want to leverage a template library here; Boost Spirit has a decent starting point for an implementation that can be reused even though you're not writing a parser.
The Windows Workflow Foundation also tackles the problem of how to write sequential-style code yet have it run as short run-to-completion fragments. They resort to specifying the flow of control in XML; there's apparently no direct way of reusing standard C# syntax. They only provide it as a data structure, à la JSON. It'd be simple enough to implement both the XML-based and code-based WF approaches in Qt, if one wanted to. All that in spite of .NET and C# providing ample support for programmatic generation of code.
The way you implemented your thread, it does not have its own event loop (because it does not call exec()). I'm not sure if your code within run() is actually executed within your thread or within the GUI thread.
Usually you should not subclass QThread. You probably did so because you read the Qt Documentation which unfortunately still recommends subclassing QThread - even though the developers long ago wrote a blog entry stating that you should not subclass QThread. Unfortunately, they still haven't updated the documentation appropriately.
I recommend reading "You're doing it wrong" on Qt Blog and then use the answer by "Kari" as an example of how to set up a basic multi-threaded system.
But when I run this program it freezes my GUI and when I click inside my window,
it responds that your program is not responding.
Yes, because in my opinion you're doing so much work in the thread that it exhausts the CPU. Generally, the "program is not responding" message pops up when a process shows no progress in handling its application event queue, and in your case that is what happens.
So you should find a way to divide the work. Just for the sake of example, have the thread process chunks of 100 and repeat until it reaches 10000000.
Also, you should have a look at QCoreApplication::processEvents() when you're performing a lengthy operation.

Is there such a thing as a lockless queue for multiple read or write threads?

I was thinking: is it possible to have a lockless queue when more than one thread is reading or writing? I've seen a lockless queue implementation that worked with one read thread and one write thread, but never more than one of either. Is it possible? I don't think it is. Does anyone want to prove it one way or the other?
There are multiple algorithms available. I ended up implementing An Optimistic Approach to Lock-Free FIFO Queues, which avoids the ABA problem via pointer tagging (it needs the CMPXCHG8B instruction on x86), and it runs fine in a production app (written in Delphi). (There is another version with Java code.)
Nevertheless, to be truly lockless you would also need a lock-free memory allocator; see Scalable Lock-Free Dynamic Memory Allocation (implemented in Concurrent Building Block) or NBMalloc (though so far I haven't gotten around to using either of these).
You may also want to look at the answers to "Optimistic lock-free FIFO queues impl?".
Java's lockless queue implementation allows both concurrent reads and writes. The work is done with a compare-and-set operation (a single CPU instruction).
The ConcurrentLinkedQueue uses a method in which threads help each other read (poll) objects from the queue. Since it is linked, the tail of the queue can accept writes while the head accepts reads; all of this can happen in parallel and is completely thread-safe.
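For illustration, a small sketch with several threads offering to and polling from a single ConcurrentLinkedQueue at the same time (the thread and item counts are arbitrary):

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockFreeQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger consumed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);

        // Four writers offer items while four readers poll them; no locks involved.
        for (int w = 0; w < 4; w++) {
            final int writer = w;
            pool.submit(() -> {
                for (int i = 0; i < 1_000; i++) {
                    queue.offer(writer * 1_000 + i);
                }
            });
        }
        for (int r = 0; r < 4; r++) {
            pool.submit(() -> {
                while (consumed.get() < 4_000) {
                    if (queue.poll() != null) {
                        consumed.incrementAndGet();
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("consumed " + consumed.get() + " items");
    }
}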
With .NET 4.0, there is the ConcurrentQueue<T> class.
According to C# 4.0 in a Nutshell, this is a lock-free implementation. See also this blog entry.
You don't specifically need a lock, just an atomic way of removing things from the queue; this is possible without a lock, using an atomic test-and-set instruction.
There is a dynamic lock free queue in the OmniThreadLibrary by Primoz Gabrijelcic (the Delphi Geek): http://www.thedelphigeek.com/2010/02/omnithreadlibrary-105.html
With .NET 4.0, there is the ConcurrentQueue<T> class.
Sample:
https://dotnetfiddle.net/ehLZCm
public static void Main()
{
    PopulateQueueParallel(new ConcurrentQueue<string>(), 500);
}

static void PopulateQueueParallel(ConcurrentQueue<string> queue, int queueSize)
{
    Parallel.For(0, queueSize, (i) => queue.Enqueue(string.Format("my message {0}", i)));
    Parallel.For(0, queueSize,
        (i) =>
        {
            string message;
            bool success = queue.TryDequeue(out message);
            if (!success)
                throw new Exception("Error!");
            Console.WriteLine(message);
        });
}
