Node - Set maximum time per asynchronous callback

Node - Set maximum time per asynchronous callback - node.js

I would like to be able to tune the Node event loop to abort or throw an exception if ever a piece of code listening for an event takes too long to execute.
Using the Async Hooks API, it is possible to monitor the time that a callback runs for. However, I cannot find a way to take control in any way.
Ideally I would like to be able to tune Node so that no synchronous code runs for too long. Code running for too long will cause the application to become blocked and unresponsive. (see example). So if I could set a limit to how long any callback is allowed to run for, that would be good.
From my research I think that what I want is not possible, but I would love to be wrong :)
EDIT: Any documentation about what tuning options are available would close this issue. So far I have only heard of garbage collection and heap size.

Node.js is single-threaded by design. For this reason it's not a good choice for CPU-intensive apps in the first place.
It's generally unsafe to interrupt a running thread asynchronously, regardless of the programming language, and JavaScript is no exception. A thread might be in a state when it locked some resources, allocated memory, is modifying global data, etc. so interrupting it can lead to deadlock, memory leaks, data corruption, etc.
A process that can be safely interrupted needs to implement that explicitly, for example by periodically checking for some flag and then stopping in a safe manner. Games often use time-based compute loops in which the elapsed time is checked periodically and the loop exits when it reaches e.g. 20ms, for the process to be continued later during the next invocation.
When possible, break down a long-running computation into a series of shorter steps/callbacks.

Related

Async and scheduling - how do libraries avoid blocking at the lowest level?

I've been using various concurrency constructs for a while now without much consideration for how all the magic happens, which has recently made me increasingly uneasy.
In an attempt to remedy this ... feeling, I have been reading up on how async works under the hood. When I say async, in this case I'm referring to userland / greenthread / cooperative multitasking, although I assume some of the concepts will also apply to traditional OS managed threads insofar as a scheduler and workers are involved.
I see how a worker can suspend itself and let other workers execute, but at the lowest level in non-blocking library code, how does the scheduler know when a previously suspended worker's job is done and to wake up that worker?
For example if you fire up a worker in some sort of async block and perform an operation that would normally block (e.g. HTTP request, SQL query, other I/O), then even though your calling code is async, that operation (library code) better play nice with your async framework or you've effectively defeated the purpose of using it and blocked your scheduler from calling other waiting operations (the, What Color is Your Function problem) while it waits for your blocking call, which was executed inside your non-blocking calling code, to complete.
So now we've got async code calling other async library code, and now I'm asking myself the question all over again - how does the async library code know when to suspend and resume operation?
The idea of firing off a HTTP request, moving on, and returning later to check for results is weird to think about for me - not conceptually but from an implementation standpoint.
How do you perform a partial operation, e.g. sending TCP packets and then continuing with the rest of the program execution, only to come back later and check if results have been delivered. Delivered to what? A socket?
Now we're another layer deep and you are using socket selects to avoid creating threads and blocking, but, again...
how do those sockets start an operation, move on before completion, and then how does select know when data is available?
Are you continually checking some buffer to see if bytes have been delivered in an infinite loop and moving on if not?
Anyhow - I think you see where I'm going here....
I focused mainly on HTTP as a motivating example, but the same question applies for any normally blocking operations - how does it all work at the bottom?
Here are some of the resources I found helpful while researching the topic and which informed this question:
David Beazley's excellent video Build Your Own Async where he walks you through a simple implementation of a scheduler which fire callbacks and suspend execution by sleeping on a waiting queue. I found this video tremendously instructive, but it stops a bit short in that it shows you how using an async sleep frees up the scheduler to execute other workers, but doesn't really go into what would happen when you call code in those workers that itself must be non-blocking so it plays nice with the scheduler.
How does non-blocking IO work under the hood - This got me further along in my understanding, but still left with a few uncertainties.

Can code running in a background thread be faster than in the main VCL thread in Delphi?

If anybody has had a lot of experience timing code running on the main VCL thread vs a background thread, I'd like to get an opinion. I have some code that does some heavy string processing running in my Delphi 6 application on the main thread. Each time I run an operation, the time for each operation hovers around 50 ms on a single thread on my i5 Quad core. What makes me really suspicious is that the same code running on an old Pentium 4 that I have, shows the same time for the operation when usually I see code running about 4 times slower on the Pentium 4 than the Quad Core. I am beginning to wonder if the code might be consuming significantly less time than 50 ms but that there's something about the main VCL thread, perhaps Windows message handling or executing Windows API calls, that is creating an artificial "floor" for the operation. Note, an operation is triggered by an incoming request on a socket if that matters, but the time measurement does not take place until the data is fully received.
Before I undertake the work of moving all the code on to a background thread for testing, I am wondering if anyone has any general knowledge in this area? What have your experiences been with code running on and off the main VCL thread? Note, the timing measurements are being done when there is absolutely no user triggered activity going on during the tests.
I'm also wondering if raising the priority of the thread to just below real-time would do any good. I've never seen much improvement in my run times when experimenting with those flags.
-- roschler

Given all threads have the same priority, as they normally do, there can't be a difference, for the following reasons. If you're seeing a difference, re-evaluate the code (make sure you run the same thing in both VCL and background threads) and make sure you time it properly:
The compiler generates the exact same code, it doesn't care if the code is going to run in the main thread or in a background thread. In fact you can put the whole code in a procedure and call that from both your worker thread's Execute() and from the main VCL thread.
For the CPU all cores, and all threads, are equal. Unless it's actually a Hyper Threading CPU, where not all cores are real, but then see the next bullet.
Even if not all CPU cores are equal, your thread will very unlikely run on the same core, the operating system is free to move it around at will (and does actually schedule your thread to run on different cores at different times).
Messaging overhead doesn't matter for the main VCL thread, because unless you're calling Application.ProcessMessages() manually, the message pump is simply stopped while your procedure does it's work. The message pump is passive, your thread needs to request messages from the queue, but since the thread is busy doing your work, it's not requesting any messages so no overhead there.
There's just one place where threads are not equal, and this can change the perceived speed of execution: It's the operating system that schedules threads to execution units (cores), and for the operating system threads have different priorities. You can tell the OS a certain thread needs to be treated differently using the SetThreadPriority() API (which is used by the TThread.Priority property).

Without simple source code to reproduce the issue, and how you are timing your threads, it will be difficult to understand what occurs in your software.
Sounds definitively like either:
An Architecture issue - how are your threads defined?
A measurement issue - how are you timing your threads?
A typical scaling issue of both the memory manager and the RTL string-related implementation.
About the latest point, consider this:
The current memory manager (FastMM4) is not scaling well on multi-core CPU; try with a per-thread memory manager, like our experimental SynScaleMM - note e.g. that the Free Pascal Compiler team has written a new scaling MM from scratch recently, to avoid such issue;
Try changing the string process implementation to avoid memory allocation (use static buffers), and string reference-counting (every string reference counting access produces a LOCK DEC/INC which do not scale so well on multi-code CPU - use per-thread char-level process, using e.g. PChar on static buffers instead of string).
I'm sure that without string operations, you'll find that all threads are equivalent.
In short: neither the current Delphi MM, neither the current string implementation scales well on multi-core CPU. You just found out a known issue of the current RTL. Read this SO question.

When your code has control of the VCL thread, for instance if it is in one method and doesn't call out to any VCL controls or call Application.ProcessMessages, then the run time will not be affected just because it's in the main VCL thread.
There is no overhead, since you "own" the whole processing power of the thread when you are in your own code.
I would suggest that you use a profiling tool to find where the actual bottleneck is.

Performance can't be assessed statically. For that you need to get AQTime, or some other performance profiler for Delphi. I use AQtime, and I love it, but I'm aware it's considered expensive.
Your code will not magically get faster just because you moved it to a background thread. If anything, your all-inclusive-time until you see results in your UI might get a little slower, if you have to send a lot of data from the background thread to the foreground thread via some synchronization mechanisms.
If however you could execute parts of your algorithm in parallel, that is, split your work so that you have 2 or more worker threads processing your data, and you have a quad core processor, then your total time to do a fixed load of work, could decrease. That doesn't mean the code would run any faster, but depending on a lot of factors, you might achieve a slight benefit from multithreading, up to the number of cores in your computer. It's never ever going to be a 2x performance boost, to use two threads instead of one, but you might get 20%-40% better performance, in your more-than-one-threaded parallel solutions, depending on how scalable your heap is under multithreaded loads, and how IO/memory/cache bound your workload is.
As for raising thread priorities, generally all you will do there is upset the delicate balance of your Windows system's performance. By raising the priorities you will achieve (sometimes) a nominal, but unrepeatable and non-guaranteeable increase in performance. Depending on the other things you do in your code, and your data sources, playing with priorities of threads can introduce subtle problems. See Dining Philosophers problem for more.
Your best bet for optimizing the speed of string operations is to first test it and find out exactly where it is using most of its time. Is it heap operations? Memory Copy and move operations? Without a profiler, even with advice from other people, you will still be comitting a cardinal sin of programming; premature optimization. Be results oriented. Be science based. Measure. Understand. Then decide.
Having said that, I've seen a lot of horrible code in my time, and there is one killer thing that people do that totally kills their threaded app performance; Using TThread.Synchronize too much.
Here's a pathological (Extreme) case, that sadly, occurs in the wild fairly frequently:
procedure TMyThread.Execute;
begin
while not Terminated do
Synchronize(DoWork);
end;
The problem here is that 100% of the work is really done in the foreground, other than the "if terminated" check, which executes in the thread context. To make the above code even worse, add a non-interruptible sleep.
For fast background thread code, use Synchronize sparingly or not at all, and make sure the code it calls is simple and executes quickly, or better yet, use TThread.Queue or PostMessage if you could really live with queueing main thread activity.

Are you asking if a background thread would be faster? If your background thread would run the same code as the main thread and there's nothing else going on in the main thread, you don't stand to gain anything with a background thread. Threads should be used to split and distribute processing loads that would otherwise contend with one another and/or block one another when running in the main thread. Since you seem to be dealing with a case where your main thread is otherwise idle, simply spawning a thread to run slow code will not help.
Threads aren't magic, they can't speed up slow code or eliminate processing bottlenecks in a particular segment not related to contention on the main thread. Make sure your code isn't doing something you don't know about and that your timing methodology is correct.
My first hunch would be that your interaction with the socket is affecting your timing in a way you haven't detected... (I know you said you're sure that's not involved - but maybe check again...)

Pseudo real time threading

So I have built a small application that has a physics engine and a display. The display is attached to a controller which handles the physics engine(well, actually a view model that handles the controller, but details).
Currently the controller is a delegate that gets activated by a begin-invoke and deactivated by a cancellation token, and then reaped by an endinvoke. Inside the lambda brushes PropertyChanged(hooked into INotifyPropertyChanged) which keeps the UI up to date.
From what I understand the BeginInvoke method activates a task rather than another thread(which on my computers does activate another thread, but this isn't a guarantee from the reading I have done,it's up to the thread pool how it wants to get the task completed), which is fine from all the testing I have done. The lambda doesn't complete until a CancellationToken is killed. It has a sleep and an update(so it is sort of simulating a real-time physics engine...it's crude, but I don't need real precision on the real time, just enough to get a feel)
The question I have is, will this work on other computers, or should I switch over to explicit threads that I start and cancel? The scenario I am thinking of is on a 1 core processor, is it possible the second task will get massively less processor time and thereby make my acceptably inaccurate model into something unacceptably inaccurate(i.e. waiting for milliseconds before switching rather than microseconds?). Or is their some better way of doing this that I haven't come up with?

In my experience, using the threadpool in the way you described will pretty much guarantee reasonably optimal performance on most computers, without you having to go to the trouble to figure out how to divvy up the threads.
A thread is not the same thing as a core; you will still get multiple threads on a single-core machine, and those threads will each take part of the processing load. You won't get the "deadlock" condition you describe, unless you do something unusual with the threads, like give one of them real-time priority.
That said, microseconds is not a lot of time for context switching between threads, so YMMV. You'll have to try it, and see how well it works; there may be some tweaking required.

Thread vs async execution. What's different?

I believed any kind of asynchronous execution makes a thread in invisible area. But if so,
Async codes does not offer any performance gain than threaded codes.
But I can't understand why so many developers are making many features async form.
Could you explain about difference and cost of them?

The purpose of an asynchronous execution is to prevent the code calling the asynchronous method (the foreground code) from being blocked. This allows your foreground code to go on doing useful work while the asynchronous thread is performing your requested work in the background. Without asynchronous execution, the foreground code must wait until the background task is completed before it can continue executing.
The cost of an asynchronous execution is the same as that of any other task running on a thread.
Typically, an async result object is registered with the foreground code. The async result object can either raise an event when the background task is completed, or the foreground code can periodically check the async result object to see if its completion flag has been set.

Concurrency does not necessarily require threads.
In Linux, for example, you can perform non-blocking syscalls. Using this type of calls, you can for instance start a number of network reads. Your code can keep track of the reads manually (using handles in a list or similar) and periodically ask the OS if new data is available on any of the connections. Internally, the OS also keeps a list of ongoing reads. Using this technique, you can thus achieve concurrency without any (extra) threads, neither in your program nor in the OS.
If you use threads and blocking IO, you would typically start one thread per read. In this scenario, the OS will instead have a list of ongoing threads, which it parks when the tread tries to read data when there is none available. Threads are resumed as data becomes available.
Having the OS switch between threads might involve slightly more overhead in the form of context switching - switching program counter and register content. But the real deal breaker is usually stack allocation per thread. This size is a couple of megabytes by default on Linux. If you have a lot of concurrency in your program, this might push you in the direction of using non-blocking calls to handle more concurrency per thread.
So it is possible to do async programming without threads. If you want to do async programming using only blocking OS-calls you need to dedicate a thread to do the blocking while you continue. But if you use non-blocking calls you can do a lot of concurrent things with just a single thread. Have a look at Node.js, which have great support for many concurrent connections while being single-threaded for most operations.
Also check out Golang, which achieve a similar effect using a sort of green threads called goroutines. Multiple goroutines run concurrently on the same OS thread and they are restrictive in stack memory, pushing the limit much further.

Async codes does not offer any performance gain than threaded codes.
Asynchornous execution is one of the traits of multi-threaded execution, which is becoming more relevant as processors are packing in more cores.
For servers, multi-core only mildly relevant, as they are already written with concurrency in mind and will scale natrually, but multi-core is particularly relevant for desktop apps, which traditionally do only a few things concurrently - often just one foreground task with a background thread. Now, they have to be coded to do many things concurrently if they are to take advantage of the power of the multi-core cpu.
As to the performance - on single-core - the asynchornous tasks slow down the system as much as they would if run sequentially (this a simplication, but true for the most part.) So, running task A, which takes 10s and task B which takes 5s on a single core, the total time needed will be 15s, if B is run asynchronously or not. The reason is, is that as B runs, it takes away cpu resources from A - A and B compete for the same cpu.
With a multi-core machine, additional tasks run on otherwise unused cores, and so the situation is different - the additional tasks don't really consume any time - or more correctly, they don't take away time from the core running task A. So, runing tasks A and B asynchronously on multi-core will conume just 10s - not 15s as with single core. B's execution runs at the same time as A, and on a separate core, so A's execution time is unaffected.
As the number of tasks and cores increase, then the potential improvements in performance also increase. In parallel computing, exploiting parallelism to produce an improvement in performance is known as speedup.
we are already seeing 64-core cpus, and it's esimated that we will have 1024 cores commonplace in a few years. That's a potential speedup of 1024 times, compared to the single-threaded synchronous case. So, to answer your question, there clearly is a performance gain to be had by using asynchronous execution.

I believed any kind of asynchronous execution makes a thread in invisible area.
This is your problem - this actually isn't true.
The thing is, your whole computer is actually massively asynchronous - requests to RAM, communication via a network card, accessing a HDD... those are all inherently asynchronous operations.
Modern OSes are actually built around asynchronous I/O. Even when you do a synchronous file request, for example (e.g. File.ReadAllText), the OS sends an asynchronous request. However, instead of giving control back to your code, it blocks while it waits for the response to the asynchronous request. And this is where proper asynchronous code comes in - instead of waiting for the response, you give the request a callback - a function to execute when the response comes back.
For the duration of the asynchronous request, there is no thread. The whole thing happens on a completely different level - say, the request is sent to the firmware on your NIC, and given a DMA address to fill the response. When the NIC finishes your request, it fills the memory, and signals an interrupt to the processor. The OS kernel handles the interrupt by signalling the owner application (usually an IOCP "channel") the request is done. This is still all done with no thread whatsoever - only for a short time right at the end, a thread is borrowed (in .NET this is from the IOCP thread pool) to execute the callback.
So, imagine a simple scenario. You need to send 100 simultaneous requests to a database engine. With multi-threading, you would spin up a new thread for each of those requests. That means a hundred threads, a hundread thread stacks, the cost of starting a new thread itself (starting a new thread is cheap - starting a hundred at the same time, not so much), quite a bit of resources. And those threads would just... block. Do nothing. When the response comes, the threads are awakened, one after another, and eventually disposed.
On the other hand, with asynchronous I/O, you can simply post all the requests from a single thread - and register a callback when each of those is finished. A hundred simultaneous requests will cost you just your original thread (which is free for other work as soon as the requests are posted), and a short time with threads from the thread pool when the requests are finished - in "worst" case scenario, about as many threads as you have CPU cores. Provided you don't use blocking code in the callback, of course :)
This doesn't necessarily mean that asynchronous code is automatically more efficient. If you only need a single request, and you can't do anything until you get a response, there's little point in making the request asynchronous. But most of the time, that's not your actual scenario - for example, you need to maintain a GUI in the meantime, or you need to make simultaneous requests, or your whole code is callback-based, rather than being written synchronously (a typical .NET Windows Forms application is mostly event-based).
The real benefit from asynchronous code comes from exactly that - simplified non-blocking UI code (no more "(Not Responding)" warnings from the window manager), and massively improved parallelism. If you have a web server that handles a thousand requests simultaneously, you don't want to waste 1 GiB of address space just for the completely unnecessary thread stacks (especially on a 32-bit system) - you only use threads when you have something to do.
So, in the end, asynchronous code makes UI and server code much simpler. In some cases, mostly with servers, it can also make it much more efficient. The efficiency improvements come precisely from the fact that there is no thread during the execution of the asynchronous request.
Your comment only applies to one specific kind of asynchronous code - multi-threaded parallelism. In that case, you really are wasting a thread while executing a request. However, that's not what people mean when saying "my library offers an asynchronous API" - after all, that's a 100% worthless API; you could have just called await Task.Run(TheirAPIMethod) and gotten the exact same thing.

Threads or asynch?

How do you make your application multithreaded ?
Do you use asynch functions ?
or do you spawn a new thread ?
I think that asynch functions are already spawning a thread so if your job is doing just some file reading, being lazy and just spawning your job on a thread would just "waste" ressources...
So is there some kind of design when using thread or asynch functions ?

If you are talking about .Net, then don't forget the ThreadPool. The thread pool is also what asynch functions often use. Spawning to much threads can actually hurt your performance. A thread pool is designed to spawn just enough threads to do the work the fastest. So do use a thread pool instead of spwaning your own threads, unless the thread pool doesn't meet your needs.
PS: And keep an eye out on the Parallel Extensions from Microsoft

Spawning threads is only going to waste resources if you start spawning tons of them, one or two extra threads isn't going to effect the platforms proformance, infact System currently has over 70 threads for me, and msn is using 32 (I really have no idea how a messenger can use that many threads, exspecialy when its minimised and not really doing anything...)
Useualy a good time to spawn a thread is when something will take a long time, but you need to keep doing something else.
eg say a calculation will take 30 seconds. The best thing to do is spawn a new thread for the calculation, so that you can continue to update the screen, and handle any user input because users will hate it if your app freezes untill its finished doing the calculation.
On the other hand, creating threads to do something that can be done almost instantly is nearly pointless, since the overhead of creating (or even just passing work to an existing thread using a thread pool) will be higher than just doing the job in the first place.
Sometimes you can break your app into a couple of seprate parts which run in their own threads. For example in games the updates/physics etc may be one thread, while grahpics are another, sound/music is a third, and networking is another. The problem here is you really have to think about how these parts will interact or else you may have worse proformance, bugs that happen seemingly "randomly", or it may even deadlock.

I'll second Fire Lancer's answer - creating your own threads is an excellent way to process big tasks or to handle a task that would otherwise be "blocking" to the rest of synchronous app, but you have to have a clear understanding of the problem that you must solve and develope in a way that clearly defines the task of a thread, and limits the scope of what it does.
For an example I recently worked on - a Java console app runs periodically to capture data by essentially screen-scraping urls, parsing the document with DOM, extracting data and storing it in a database.
As a single threaded application, it, as you would expect, took an age, averaging around 1 url a second for a 50kb page. Not too bad, but when you scale out to needing to processes thousands of urls in a batch, it's no good.
Profiling the app showed that most of the time the active thread was idle - it was waiting for I/O operations - opening of a socket to the remote URL, opening a connection to the database etc. It's this sort of situation that can easily be improved with multithreading. Rewriting to be multi-threaded and with just 5 threads instead of one, even on a single core cpu, gave an increase in throughput of over 20 times.
In this example, each "worker" thread was explicitly limited to what it did - open the remote a remote url, parse the data, store it in the db. All the "high level" processing - generating the list of urls to parse, working out which next, handling errors, all remained with the control of the main thread.

The use of threads makes you think more about the way your application needs threading and can in the long run make it easier to improve / control your performance.
Async methods are faster to use but they are a bit magic - a lot of things happen to make them possible - so it's probable that at some point you will need something that they can't give you. Then you can try and roll some custom threading code.
It all depends on your needs.

The answer is "it depends".
It depends on what you're trying to achieve. I'm going to assume that you're aiming for more performance.
The simplest solution is to find another way to improve your performance. Run a profiler. Look for hot spots. Reduce unnecessary IO.
The next solution is to break your program into multiple processes, each of which can run in their own address space. This is easiest because there is no chance of the individual processes messing each other up.
The next solution is to use threads. At this point you're opening a major can of worms, so start small, and only multi-thread the critical path of the code.
The next solution is to use asynch IO. Generally only recommended for people writing some of very heavily loaded server, and even then I would rather re-use one of the existing frameworks that abstract away the details e.g. the C++ framework ICE, or an EJB server under java.
Note that each of these solutions has multiple sub-solutions - there are different breeds of threads and different kinds of asynch IO, each with slightly different performance characteristics, but again, it's generally best to let the framework handle it for you.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string