Can consumer publish events to the same ring buffer? - disruptor-pattern

Sure it can. But does that correct usage of Disruptor pattern?
My consumer prepare some data for next consumer. As far as I can see I can organize second consumer to wait for the first one. But how to pass calculated data between them?
Thanks for all in advance!

EventHandlers do not generally claim & publish to new slots. However they are perfectly entitled to write back to the slot they are consuming from and hence have subsequent consumers see their results.
The canonical example would be un-marshalling. EventHandler1 reads the bytes from the slot, and writes the un-marhalled object to a different field in the ring buffer entry.

Related

QSemaphore - implementing overwrite policy

I want to implement ring buffer for classic Producer--Consumer interaction. In the future both P and C will be implemented as permanent threads running during data processing task, and GUI will be the third thread only for displaying actual data and coordinate starts and stops of data processing by user interaction. C can be quite slow to be able to fully process all incoming data, but only a bit and for a short periods of time. So I want to just allocate ring buffer of several P's MTUs in size, but in any case, if C will be too slow to process existing data it's okay to loose old data in favor of new one (overwrite policy).
I've read QSemaphore example in Qt help and realized that by usage of semaphore's acquires and releases I can only implement discard policy, because acquiring of specified chunk in queue will block until there are no free space.
Are there any ways of implementing overwrite policy together with QSemaphore or I just need to go and implement another approach?
I've came to this solution. If we should push portion of the src data to the ring buffer at any costs (it's ok to drop possible newly incoming data) we should use acquire() in Producer part - that would provide us discard policy. In case we need overwrite policy we should use tryAcquire() in Producer - thus at the very first possible moment of time only the newest data will be pushed to the ring buffer

Is my design for sending data to clients at various intervals correct?

The code should be written in C++. I'm mentioning this just in case someone will suggest a solution that won't work efficient when implementing in C++.
Objective:
Producer that runs on thread t1 inserts images to Consumer that runs on thread t2. The consumer has a list of clients that he should send the images to at various intervals. E.g. client1 requires images every 1sec, client2 requires images every 5sec and etc.
Suggested implementation:
There is one main queue imagesQ in Consumer to which Producer enqueues images to. In addition to the main queue, the Consumer manages a list of vector of queues clientImageQs of size as number of clients. The Consumer creates a sub-consumer, which runs on its own thread, for each client. Each such sub-consumer dequeues the images from a relevant queue from clientImageQs and sends images to its client at its interval.
Every time a new image arrives to imagesQ, the Consumer duplicates it and enqueus to each queue in clientImageQs. Thus, each sub-consumer will be able to send the images to its client at its own frequency.
Potential problem and solution:
If Producer enqueues images at much higher rate than one of the sub-consumers dequeues, the queue will explode. But, the Consumer can check the size of the queue in clientImageQs before enqueuing. And, if needed, Consumer will dequeue a few old images before enqueuing new ones.
Question
Is this a good design or there is a better one?
You describe the problem within a set of already determined solution limitations. Your description is complex, confusing, and I dare say, confused.
Why have a consumer that only distributes images out of a shared buffer? Why not allow each "client" as you call it read from the buffer as it needs to?
Why not implement the shared buffer as a single-image buffer. The producer writes at its rate. The clients perform non-destructive reads of the buffer at their own rate. Each client is ensured to read the most recent image in the buffer whenever the client reads the buffer. The producer simply over-writes the buffer with each write.
A multi-element queue offers no benefit in this application. In fact, as you have described, it greatly complicates the solution.
See http://sworthodoxy.blogspot.com/2015/05/shared-resource-design-patterns.html Look for the heading "unconditional buffer".
The examples in the posting listed above are all implemented using Ada, but the concepts related to concurrent design patterns are applicable to all programming languages supporting concurrency.

Application design for parallel collection processing

I'm experimenting with the System.Collections.Concurrent namespace but I have a problem implementing my design.
My input queue (ConcurrentQueue) is getting populated fine from a Thread which is doing some I/O at startup to read and parse.
Next I kick off a Parallel.ForEach() on the input queue. I'm doing some I/O bound work on each item.
A log item is created for each item processed in the ForEach() and is dropped into a result queue.
What I would like to do is kick off the logging I start reading the input because I may not be able to fit all of the log items in memory. What is the best way to wait for items to land in the result queue? Are there design patterns or examples that I should be looking at?
I think the pattern you're looking for is the producer/consumer pattern. More specifically, you can have a producer/consumer implementation built around TPL and BlockingCollection.
The main concepts you want to read about are:
Task,
BlockingCollection,
TaskFactory.ContinueWhenAll(will allow you to perform some action when a set of tasks/threads is finished running).
Bounding and Blocking in BlockingCollection. This allows you to set a maximum size for your output collection (for memory reasons) and producer thread(s) will wait for consumers to pick up elements in case the maximum size you specify is reached.
BlockingCollection.CompleteAdding and BlockingCollection.IsCompleted which can be used to synchronize producers and consumers (producer can say when it's finished, consumer can check for that and keep running until the producer(s) are finised).
A more complete sample is in the second article I linked.
In your case I think you want the consumer to just pick up things from the result queue and dispose of them as soon as possible (write them to a logging store, or similar).
So your final collection, where you dump log items should be a BlockingCollection, not a ConcurrentQueue.

How does one determine if all messages in an Azure Queue have been processed?

I've just begun tinkering with Windows Azure and would appreciate help with a question.
How does one determine if a Windows Azure Queue is empty and that all work-items in it have been processed? If I have multiple worker processes querying a work-item queue, GetMessage(s) returns no messages if the queue is empty. But there is no guarantee that a currently invisible message will not be pushed back into the queue.
I need this functionality since follow-up behavior of my workflow depends on completion of all work-items in that particular queue. A possible way of tackling this problem would be to count the number of puts and deletes. But this will again require synchronization at a shared storage level and I would like to avoid it if possible.
Any ideas?
Take a look at the ApproximateMessageCount method. This should return the number of messages on the queue, including invisible messages (e.g. the ones being processed).
Mike Wood blogged about this subtlety, along with a tidbit about the queue's Clear method, here.
That said: you might want to choose a different mechanism for workflow management. Maybe a table row, where you have your rowkey equal to some multi-queue-item transation id, and individual properties being status flags. This allows you to track failed parts of the transaction (say, 9 out of 10 queue items process ok, the 10th fails; you can still delete the 10th queue item, but set its status flag to failed, then letting you deal with this scenario accordingly). Also: let's say you use the same queue to process another 'transaction' (meaning the queue is again non-zero in length). By using a separate object like a Table Row, you can still determine that your 'transaction' is complete even though there are additional queue messages.
The best way is to have another queue, call it termination indicator queue, and put a message in that queue for every message your process from your main queue. That is how it is done in research projects too. Check this out http://www.cs.gsu.edu/dimos/content/gis-vector-data-overlay-processing-azure-platform.html

Patterns to azure idempotent operations?

anybody know patterns to design idempotent operations to azure manipulation, specially the table storage? The more common approach is generate a id operation and cache it to verify new executions, but, if I have dozen of workers processing operations this approach will be more complicated. :-))
Thank's
Ok, so you haven't provided an example, as requested by knightpfhor and codingoutloud. That said, here's one very common way to deal with idempotent operations: Push your needed actions to a Windows Azure queue. Then, regardless of the number of worker role instances you have, only one instance may work on a specific queue item at a time. When a queue message is read from the queue, it becomes invisible for the amount of time you specify.
Now: a few things can happen during processing of that message:
You complete processing after your timeout period. When you go to delete the message, you get an exception.
You realize you're running out of time, so you increase the queue message timeout (today, you must call the REST API to do this; one day it'll be included in the SDK).
Something goes wrong, causing an exception in your code before you ever get to delete the message. Eventually, the message becomes visible in the queue again (after specified invisibility timeout period).
You complete processing before the timeout and successfully delete the message.
That deals with concurrency. For idempotency, that's up to you to ensure you can repeat an operation without side-effects. For example, you calculate someone's weekly pay, queue up a print job, and store the weekly pay in a Table row. For some reason, a failure occurs and you either don't ever delete the message or your code aborts before getting an opportunity to delete the message.
Fast-forward in time, and another worker instance (or maybe even the same one) re-reads this message. At this point, you should theoretically be able to simply re-perform the needed actions. If this isn't really possible in your case, you don't have an idempotent operation. However, there are a few mechanisms at your disposal to help you work around this:
Each queue message has a DequeueCount. You can use this to determine if the queue message has been processed before and, if so, take appropriate action (maybe examine the Table row for that employee, for example).
Maybe there are stages of your processing pipeline that can't be repeated. In that case: you now have the ability to modify the queue message contents while the queue message is still invisible to others and being processed by you. So, imagine appending something like |SalaryServiceCalled . Then a bit later, appending |PrintJobQueued and so on. Now, if you have a failure in your pipeline, you can figure out where you left off, the next time you read your message.
Hope that helps. Kinda shooting in the dark here, not knowing more about what you're trying to achieve.
EDIT: I guess I should mention that I don't see the connection between idempotency and Table Storage. I think that's more of a concurrency issue, as idempotency would need to be dealt with whether using Table Storage, SQL Azure, or any other storage container.
I believe you can use Reply log storage way to solve this problem

Resources