Node.js: When exactly one should emit a 'drain' event? - node.js

I'm the author of this node.js lib: async-kit.
I'm currently coding a worker/queue feature.
By default, the worker process one job at a time, but the concurrency level can be adjusted to anything.
Suppose I create a worker with a concurrency level of 3, so up to 3 async job can be pending at any time. At the time of writing, my lib will emit the 'drain' event when no more jobs are pending, i.e. when all jobs have triggered their callback. Let's call that the 'no-more-pending-job' event for instance.
However, one may want an event to be emitted anytime the number of pending jobs drop below 3, meaning that if a new job is added to the queue, it will run immediately. Let's call it temporarily the 'ready-to-process-new-job-right-now' event. In fact, this event is way more useful than the 'no-more-pending-job' event.
So... What is exactly the standard in node.js for the 'drain' event? Is it more like 'no-more-pending-job' or like 'ready-to-process-new-job-right-now'?
I tend toward the first one, also the second one seems way more useful, and I can't find a lib that implement the two types of event, so I can't find either a standardized name for the 'ready-to-process-new-job-right-now'.
Also it is important to notice that for the "one-job-at-a-time" default behavior, those two events would fire at the same time anyway.
Any advice?

Related

Event sourcing concurrency issue across multiple instances

I am new to Event sourcing concept so there are a couple of moments I don't understand. One of them is how to handle following scenario:
I've got 2 instances of a service. Both of them listen to a event queue. There are two messages: CreateUser and UpdateUser. First instance picks up CreateUser and second instance picks up UpdateUser. For some reason second instance will handle its command quicker but there will be no User to update, since it was not created.
What am I getting wrong here?
What am I getting wrong here?
Review: Race Conditions Don't Exist
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
In other words, what you want is logic such that the order of the messages doesn't change the final result, and a first writer wins policy (aka compare-and-swap), so that when you have two processes trying to update the same resource, the loser of the data race has to start over.
As a general rule, events should be understood to support multiple observers - all subscribers get to see all events. So a queue with competing consumers isn't the usual approach unless you are trying to distribute a specific subscriber across multiple processes.
You do not have a concurrency issue you can solve. This totally runs down to either using bad tools or not reading the documentation.
Both of them listen to a event queue.
And that queue should support that. Example are azure queues, where I Can listen AND TELL THE QUEUE not to show the event to anyone else for X seconds (which is enough for me to decide whether i handled it or not). If I do not answer -> event is reinserted after that time. If I kill it first, there is no concurrency.
So, you need a backend queue that can handle this.

Event Sourcing with Side-Effects

I'm building a service using the familiar event sourcing pattern:
A request is received.
The aggregate's history is loaded.
The aggregate is rebuilt (from its history).
New events are prepared and the aggregate is updated in response to the incoming request from Step 1.
These events are written to the log, and are made available (published) to any subscribers.
In my case, Step 5 is accomplished in two parts. The events are written to the event log. A background process reads from the event log and publishes all events starting from an offset.
In some cases, I need to publish side effects in addition to events related to the aggregate. As far as the system is concerned, these are events too because they are consumed by and affect the state of other services. However, they don't affect the history of the aggregate in this service and are not needed to rebuild it.
How should I handle these in the code?
Option 1-
Don't write side-effecting events to the event log. Publish these in the main process prior to Step 5.
Option 2-
Write everything to the event log and ignore side-effecting events when the history is loaded. (These aren't part of the history!)
Option 3-
Write side-effecting events to a dummy aggregate so they are published, but never loaded.
Option 4-
?
In the first option, there may be trouble if there is a concurrency violation. If the write fails in Step 5, the side effect cannot be easily rolled back. The second option write events that are not part of the aggregate's history. When loading in Step 2, these side-effecting events would have to be ignored. The 3rd option feels like a hack.
Which of these seems right to you?
Name events correctly
Events are "things that happened". So if you are able to name the events that only trigger side effects in a "X happened" fashion, they become a natural part of the event history.
In my experience, this is always possible, because side-effects don't happen out of thin air. Sometimes the name becomes a bit artificial, but it is still better to name events that way than to call them e.g. "send email to that client event".
In terms of your list of alternatives, this would be option 2.
Example
Instead of calling an event "send status email to customer event", call it "status email triggered event". Of course, if there is a better name for the actual trigger, use that one :-)
Option 4 - Have some other service subscribe to the events and produce the side effects, and any additional events related to them.
Events should be fine-grained.
Option 1- Don't write side-effecting events to the event log. Publish
these in the main process prior to Step 5.
What if you later need this part of the history by building a new bounded context?
Option 2- Write everything to the event log and ignore side-effecting
events when the history is loaded. (These aren't part of the history!)
How to ignore the effect of something which does not have any effect? :D
Option 3- Write side-effecting events to a dummy aggregate so they are
published, but never loaded.
Why do you need consistency boundary around something which you will never change?
What you are talking about is the most common form of domain events, which you use to communicate with other BC-s. Ofc. you need to save them.

Cancel a scheduled task

I have a Windows Delphi application that receives events, on each of these events i'd like to run a task in a parallel way (so i can be ready for the following event). There is many way to do this through omnithread library's abstractions.
The issue is that part of my code needs to be executed immediately after the reception of the event (basically to "decode" the events params), and another part needs to be executed a few seconds after only under the condition of nothing new happend for the same context.
This behaviour should respond to "only store this new value if it last longer than 3000ms, otherwise just cancel it".
So what I need would be to "cancel" a running task (the one waiting 3000ms) if a new event arrives with the same context.
I cannot use a pipeline abstraction because when the first stage ends, it automatically fills the second stage queue without asking me if i want to cancel it or not.
Is that possible?
Thank you.
Sounds like you need a Dictionary<Context, Event> where the events also carry a "created" timestamp property, and a background tread which continuously checks if there are event entries in this dictionary with elapsed time > 3000ms.
Incoming events update the timestamp and event params, until the thread detects an entry which matches the condition and then extracts the entry from the dictionary.

Multithreading Task Library, Threading.Timer or threads?

Hi we are building an application that will have the possibility to register scheduled tasks.
Each task has an time interval when it should be executed
Each task should have an timeout
The amount of tasks can be infinite but around 100 in normal cases.
So we have an list of tasks that need to be executed in intervals, which are the best solution?
I have looked at giving each task their timer and when the timer elapses the work will be started, another timer keeps tracks on the timeout so if the timeout is reached the other timer stops the thread.
This feels like we are overusing timers? Or could it work?
Another solution is to use timers for each task, but when the time elapses we are putting the task on a queue that will be read with some threads that executes the work?
Any other good solutions I should look for?
There is not too much information but it looks like that you can consider RX as well - check more at MSDN.com.
You can think about your tasks as generated events which should be composed (scheduled) in some way. So you can do the following:
Spawn cancellable tasks with Observable.GenerateWithDisposable and your own Scheduler - check more at Rx 101 Sample
Delay tasks with Observable.Delay
Wait for tasks with 'Observable.Timeout
Compose tasks in any preferable way
Once again you can check more at specified above links.
You should check out Quartz.NET.
Quartz.NET is a full-featured, open
source job scheduling system that can
be used from smallest apps to large
scale enterprise systems.
I believe you would need to implement your timeout requirement by yourself but all the plumbing needed to schedule tasks could be handled by Quartz.NET.
I have done something like this before where there were a lot of socket objects that needed periodic starts and timeouts. I used a 'TimedAction' class with 'OnStart' and 'OnTimeout' events, (socket classes etc. derived from this), and one thread that handled all the timed actions. The thread maintained a list of TimedAction instances ordered by the tick time of the next action required, (delta queue). The TimedAction objects were added to the list by queueing them to the thread input queue. The thread waitied on this input queue with a timeout, (this was Windows, so 'WaitForSingleObject' on the handle of the semaphore that managed the queue), set to the 'next action required' tick count of the first item in the list. If the queue wait timed out, the relevant action event of the first item in the list was called and the item removed from the list - the next queue wait would then be set by the new 'first item in the list', which would contain the new 'nearest action time'. If a new TimedAction arrived on the queue, the thread calculated its timeout tick time, (GetTickCount + ms interval from the object), and inserted it in the sorted list at the correct place, (yes, this sometimes meant moving a lot of objects up the list to make space).
The events called by the timeout handler thread could not take any lengthy actions in order to prevent delays to the handling of other timeouts. Typically, the event handlers would set some status enumeration, signal some synchro object or queue the TimedAction to some other P-C queue or IO completion port.
Does that make sense? It worked OK, processing thousands of timed actions in my server in a reasonably timely and efficient manner.
One enhancement I planned to make was to use multiple lists with a restricted set of timeout intervals. There were only three const timeout intervals used in my system, so I could get away with using three lists, one for each interval. This would mean that the lists would not need sorting explicitly - new TimedActions would always go to the end of their list. This would eliminate costly insertion of objects in the middle of the list/s. I never got around to doing this as my first design worked well enough and I had plenty other bugs to fix :(
Two things:
Beware 32-bit tickCount rollover.
You need a loop in the queue timeout block - there may be items on the list with exactly the same, or near-same, timeout tick count. Once the queue timeout happens, you need to remove from the list and fire the events of every object until the newly claculated timeout time is >0. I fell foul of this one. Two objects with equal timeout tick count arrived at the head of the list. One got its events fired, but the system tick count had moved on and so the calcualted timeout tick for the next object was -1: INFINITE! My server stopped working properly and eventually locked up :(
Rgds,
Martin

How to implement an asynchronous timer on a *nix system using pthreads

I have 2 questions :
Q1) Can i implement an asynchronous timer in a single threaded application i.e i want a functionality like this.
....
Timer mytimer(5,timeOutHandler)
.... //this thread is doing some other task
...
and after 5 seconds, the timeOutHandler function is invoked.
As far as i can think this cannot be done for a single threaded application(correct me if i am wrong). I don't know if it can be done using select as the demultiplexer, but even if select could be used, the event loop would require one thread ? Isn't it ?
I also want to know whether i can implement a timer(not timeout) using select.
Select only waits on set of file descriptors, but i want to have a list of timers in ascending order of their expiry timeouts and want select to tell me when the first timer expires and so on. So the question boils down to can a asynchronous timer be implemented using select/poll or some other event demultiplexer ?
Q2) Now lets come to my second question. This is my main question.
Now i am using a dedicated thread for checking timeouts i.e i have a min heap of timers(expiry times) and this thread sleeps till the first timer expires and then invokes the callback.
i.e code looks something like this
lock the mutex
check the time of the first timer
condition timed wait for that time(and wake up if some other thread inserts a timer with expiry time less than the first timer) Condition wait unlocks the lock.
After the condition wait ends we have the lock. So unlock it, remove the timer from the heap and invoke the callback function.
go to 1
I want the time complexity of such asynchronous timer. From what i see
Insertion is lg(n)
Expiry is lg(n)
Cancellation
:( this is what makes me dizzy ) the problem is that i have a min heap of timers according to their times and when i insert a timer i get a unique id. So when i need to cancel the timer, i need to provide this timer id and searching for this timer id in the heap would take in the worst case O(n)
Am i wrong ?
Can cancellation be done in O(lg n)
Please do take care of some multithreading issues. I would elaborate on what i mean by my previous sentence once i get some responses.
It's definitely possible (and usually preferable) to implement timers using a single thread, if we can assume that the thread will be spending most of its time blocking in select().
You could check out using signal() and SIGALRM to implement the functionality under POSIX, but I'd recommend against it (Unix signals are ugly hacks, and when the signal callback function runs there is very little that you can do inside it safely, since it is running asynchronously to your app thread)
Your idea about using select()'s timeout to implement your timer functionality is a good one -- that is a very common technique and it works well. Basically you keep a list of pending/upcoming events that is sorted by timestamp, and just before you call select() you subtract the current time from the first timestamp in the list, and pass in that time-delta as the timeout value to select(). (note: if the time-delta is negative, pass in zero as the timeout value!) When select() returns, you compare the current time with the time of the first item in the list; if the current time is greater than or equal to the event time, handle the timer-event, pop the first item off the head of the list, and repeat.
As for efficiency, your big-O times will depend entirely on the data structure you use to store your ordered list of timers. If you use a priority queue (or a similar ordered tree type structure) you can have O(log N) times for all of your operations. You can even go further and store the events-list in both a hash table (keyed on the event IDs) and a linked list (sorted by time stamp), and that can give you O(1) times for all operations. O(log N) is probably sufficiently efficient though, unless you plan to have a really large number of events pending at once.
man pthread_cond_timedwait
man pthread_cond_signal
If you are a windows App, you can trigger a WM_TIMER message to be sent to you at some point in the future, which will work even if your app is single threaded. However, the accuracy of the timing will not be great.
If your app runs in a constant loop (like a game, rendering at 60Hz), you can simply check each time around the loop to see if triggered events need to be called.
If you want your app to basically be interrupted, your function to be called, then execution to return to where it was, then you may be out of luck.
If you're using C#, System.Timers.Timer will do what you want. You specify an event handler method that the timer calls when it expires, which can be in the class that you invoke the timer from. Note that when the timer calls the event handler, it will do it on a separate thread, which you need to take into account if you're updating the user interface, or use its SynchronizingObject property to run it on the UI thread.

Resources