libuv uses blocking file system calls internally – Why? How?

I just learned that Node.js's crown jewel, libuv, uses blocking system calls for file operations. The asynchronous behavior is implemented with threads! That raises two questions (I only care about Unix):
Why is it not using the non-blocking filesystem calls like it does for networking?
If there are one million outstanding file reads, it probably does not launch one million threads... What does libuv do??

Most likely to support synchronous operations such as fs.renameSync() vs fs.rename().
It uses a thread pool, as explained in the "Note" at the link you provided.
[...] but invoke these functions in a thread pool and notify watchers registered with the event loop when application interaction is required.
So, it creates a limited number of threads and reuses them as they become available.
Also, regarding the quip of "crown jewel": Node.js and libuv aren't magic. They're good tools to have at your disposal, but they certainly have their limitations.
Though, the hyperbole of "one million file reads" would be a stretch for any platform to manage without constraint.
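As a rough illustration of the thread-pool idea described above, here is a minimal Python sketch (not libuv's actual implementation). Blocking reads run on a small, fixed pool of workers, and a single "loop" thread receives completion notifications through a queue. The names `POOL_SIZE`, `read_files_async` and `demo` are made up for this example, and the pool size of four is an assumption.

```python
import os
import queue
import tempfile
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # assumption for this sketch: a small, fixed number of workers


def blocking_read(path):
    """Ordinary blocking read; this is what runs on a worker thread."""
    with open(path, "rb") as f:
        return f.read()


def read_files_async(paths, on_done):
    """Run blocking reads on a small pool; deliver results on the caller's thread."""
    completions = queue.Queue()  # workers post finished requests here

    def work(path):
        try:
            completions.put((path, blocking_read(path)))
        except OSError as exc:
            completions.put((path, exc))  # report failures instead of hanging the loop

    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        for path in paths:
            pool.submit(work, path)  # requests beyond POOL_SIZE simply wait in line
        # The "event loop": one thread draining completion notifications.
        for _ in paths:
            path, result = completions.get()
            on_done(path, result)


def demo():
    """Create 20 small files and read them all through a pool of 4 threads."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(20):
            p = os.path.join(tmp, f"file{i}.txt")
            with open(p, "wb") as f:
                f.write(f"contents {i}".encode())
            paths.append(p)
        read_files_async(paths, lambda p, d: results.update({os.path.basename(p): d}))
    return results
```

With a million outstanding reads, the same structure applies: the requests queue up and at most `POOL_SIZE` of them are being served at any moment.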

The same non-blocking API cannot be used, as O_NONBLOCK and friends don’t work on regular files! On Linux, AIO is available, but it has its own quirks (e.g. it depends on the filesystem and silently turns blocking for some operations).
I have no idea.
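The first point of this answer (O_NONBLOCK has no useful effect on regular files) can be checked with a small Python sketch on a Unix system. The two helper functions are invented for this example. They compare an empty regular file with an empty pipe: the file is always reported as "ready" and a read never fails with EAGAIN, whereas the pipe behaves as non-blocking descriptors are expected to.

```python
import os
import select
import tempfile


def regular_file_is_always_ready():
    """Return (reported readable?, data read) for an empty file opened O_NONBLOCK."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDONLY | os.O_NONBLOCK)
        try:
            readable, _, _ = select.select([fd], [], [], 0)
            data = os.read(fd, 1024)  # plain EOF; no BlockingIOError is raised
            return fd in readable, data
        finally:
            os.close(fd)


def empty_pipe_is_not_ready():
    """Return (reported readable?, did read raise EAGAIN?) for an empty pipe."""
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)
        readable, _, _ = select.select([r], [], [], 0)
        try:
            os.read(r, 1024)
            would_block = False
        except BlockingIOError:
            would_block = True
        return r in readable, would_block
    finally:
        os.close(r)
        os.close(w)
```

Because readiness notification says nothing useful about regular files, a read from a slow disk can still stall the calling thread, which is why such reads are handed to worker threads.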


In Node.js, what is libuv and does it use all cores?

As far as I know, all I/O requests and other asynchronous tasks in Node.js are done by libuv.
I want to know if libuv uses threads. If it does, does it use all available cores or not?
First of all, what is libuv? As mentioned in the documentation, it's a multi-platform support library with a focus on asynchronous I/O.
libuv doesn't use threads for tasks that are asynchronous by nature, only for those that aren't.
As an example, it doesn't use threads to deal with sockets, but it does use threads to make synchronous fs calls asynchronous.
When threads are involved, libuv uses a thread pool whose size you can change with the UV_THREADPOOL_SIZE environment variable, which is read when the pool is first created (the default is 4).
node.js ships with a bundled version of libuv, so the variable has to be set before the pool is first used, most reliably in the environment that launches node.
It goes without saying that the pool size has nothing to do with the number of cores of your chip.
I'm tempted to affirm that you can safely ignore the topic, for libuv and thus node.js don't use threads intensively for their purposes (unless you are using them in a really perverse way or you are running a high number of libuv work requests).
Feel free to run an instance of node.js per core if you need to, as most users do.
The design overview section of libuv is also clear enough about this point:
The I/O (or event) loop is the central part of libuv. It establishes the content for all I/O operations, and it’s meant to be tied to a single thread. One can run multiple event loops as long as each runs in a different thread.
libuv has a responsibility that is relevant for some particular functions in the standard library. For some standard library function calls, the Node C++ side and libuv decide to do expensive calculations outside of the event loop entirely. They use something called a thread pool: a set of four threads that can be used for running computationally intensive tasks such as hashing functions.
By default libuv creates four threads in this thread pool, and libuv manages the pool. That means that, in addition to the thread used for the event loop, there are four other threads that can be used to offload expensive calculations that need to occur inside our application. Many of the functions included in the Node standard library automatically make use of this thread pool.
Network I/O covers things like API requests, and file I/O is the fs module. Node.js's single thread delegates that heavy work to libuv.
If you have many such function calls in flight, they can end up using all of the cores. Extra CPU cores do not speed up an individual function call; they just allow for some amount of concurrency in the work that you are doing.
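To make the "four threads" point concrete, here is a small Python sketch (an analogy, not Node.js or libuv code). It submits eight blocking jobs to a pool of four workers and records how many ever ran at once. The function name `run_jobs` and its parameters are invented for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_jobs(pool_size=4, jobs=8, job_seconds=0.05):
    """Submit `jobs` blocking tasks to a pool and record the peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job(i):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(job_seconds)  # stand-in for a blocking call (hashing, fs work, ...)
        with lock:
            running -= 1
        return i

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(job, range(jobs)))  # results come back in order
    return results, peak
```

However many jobs are submitted, the peak never exceeds the pool size; the remaining jobs wait until a worker is free.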
From here:
A single instance of Node.js runs in a single thread. To take
advantage of multi-core systems the user will sometimes want to launch
a cluster of Node.js processes to handle the load.
The cluster module allows easy creation of child processes that all
share server ports.
Multiple processes could be better than multithreading in some cases. Some people even think threads are evil. Maybe node.js is designed in such a way as to take advantage of processes better than threads.

QSerialPort - Is it possible to read() and write() on separate threads?

We have a DLL that provides an API for a USB device we make that can appear as a USB CDC com port. We actually use a custom driver on windows for best performance along with async i/o, but we have also used serial port async file i/o in the past with reasonable success as well.
Latency is very important in this API when it is communicating with our device, so we have structured our library so that when applications make API calls to execute commands on the device, those commands turn directly into writes on the API caller's thread so that there is no waiting for a context switch. The library also maintains a listening thread which is always waiting using wait objects on an async read for new responses. These responses get parsed and inserted into thread-safe queues for the API user to read at their convenience.
So basically, we do most of our writing in the API caller's thread, and all of our reading in a listening thread. I have tried porting a version of our code over to using QSerialPort instead of native serial file i/o for Windows and OSX, but I am running into an error whenever I try to write() from the caller's thread (the QSerialPort is created in the listening thread):
QObject: Cannot create children for a parent that is in a different thread.
which seems to be due to the creation of another QObject-based WriteOverlappedCompletionNotifier for the notifiers pool used by QSerialPortPrivate::startAsyncWrite().
Is the current 5.2 version of QSerialPort limited to only doing reads and writes on the same thread? This seems very unfortunate as the underlying operating systems do not have any such thread limitations for serial port file i/o. As far as I can tell, the issue mainly has to do with the fact that all of QSerialPort's notifier classes are based on QObject.
Does anyone have a good workaround for this? I might try building my own QSerialPort that uses notifiers not based on QObject to see how far that gets me. The only real advantage QObject seems to be giving here is in the destruction of the notifiers when the port closes.
Minimal Impact Solution
You're free to inspect the QSerialPort and QIODevice code and see what would need to change to make the write method(s) thread-safe for access from one thread only. The notifiers don't need to be children of the QSerialPort at all, they could be added to a list of pointers that's cleaned up upon destruction.
My guess is that perhaps no other changes are necessary to the mainline code, and only mutex protection is needed for access to error state, but you'd need to confirm that. This would have lowest impact on your code.
If you care about release integrity, you should be compiling Qt yourself anyway, and you should be having it as a part of your own source code repository, too. So none of this should be any problem at all.
On the Performance
"those commands turn directly into writes on the API caller's thread so that there is no waiting for a context switch" Modern machines are multicore and multiple threads can certainly run in parallel without any context switching. The underlying issue is, though: why bother? If you need hard-realtime guarantees, you need a hard-realtime system. Otherwise, nothing in your system should care about such minuscule latency. If you're doing this only to make the GUI feel responsive, there's really no point to such overcomplication.
A Comms Thread Approach
What I do, with plenty of success, and excellent performance, is to have the communications protocol and the communications port in the same, dedicated thread, and the users in either the GUI thread, or yet other thread(s). The communications port is generally a QIODevice, like QTcpSocket, QSerialPort, QLocalSocket, etc. Since the communications protocol object is "just" a QObject, it can also live, with the port, in the GUI thread for demonstration purposes - it's designed fully asynchronously anyway, and doesn't block for anything but the most trivial of computations.
The communications protocol is queuing multiple requests for execution. Even on a single-core machine, once the GUI thread is done submitting all of the requests, the further execution is all in the communications thread.
The QSerialPort implementation uses asynchronous OS APIs. There's little to no benefit to further processing those async replies on separate threads. Those operations have very low overhead and you will not gain anything measurable in your latency by trying to do so. Remember: this is not your code, but merely code that pushes bytes between buffers. Yes, the context switch overhead may be there on heavily loaded or single-core systems, but unless you can measure the difference between its presence and absence, you're fighting imaginary problems.
It is possible to use any QObject from multiple threads, of course, as long as you serialize the access to it via the event queue mutex. This is done for you whenever you use QMetaObject::invokeMethod or signal-slot connections.
So, add a trivial wrapper around QSerialPort that exposes the write as a thread-safe method. Internally, it should use a signal-slot connection. You can call this thread-safe write from any thread. The overhead in such a call is a mutex lock and 2+n malloc/free calls, where n is the non-zero number of arguments.
In your wrapper, you can also process the readyRead signal, and emit a signal with received data. That signal can be processed by a QObject living in another thread.
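The wrapper idea can be sketched in Python rather than Qt, as a language-neutral analogy: the port is touched only by its owning thread, and `write()` merely posts a request to a queue, much as a queued signal-slot call would post an event. `PortThread` and `demo` are hypothetical names, and `io.BytesIO` stands in for a real serial port.

```python
import io
import queue
import threading


class PortThread:
    """Owns a file-like `port`; only this object's thread ever touches it."""

    _STOP = object()

    def __init__(self, port):
        self._port = port
        self._requests = queue.Queue()  # plays the role of the event queue
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, data):
        """Safe to call from any thread: the write is queued, not performed here."""
        self._requests.put(bytes(data))

    def close(self):
        """Let queued writes finish, then stop the port thread."""
        self._requests.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is self._STOP:
                return
            self._port.write(item)  # the real I/O happens on the owning thread


def demo():
    port = io.BytesIO()  # hypothetical stand-in for a serial port
    wrapper = PortThread(port)
    callers = [
        threading.Thread(target=wrapper.write, args=(b"cmd%d;" % i,)) for i in range(5)
    ]
    for t in callers:
        t.start()
    for t in callers:
        t.join()
    wrapper.close()
    return port.getvalue()
```

The cost per call is one queue operation plus a hand-off to the port thread, which is the overhead the answer argues is rarely measurable.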
Overall, if you do the measurements correctly, and if your port thread's implementation is correct, you should find no benefit whatsoever to all this complication.
If your communications protocol does heavy data processing, this should be factored out. It could go into a separate QObject that can then run on its own thread. Or, it can be simply done using dedicated functors that are executed by QtConcurrent::run.
What if you use QSerialPort to open and configure the serial port, and QSocketNotifier to monitor for read activity (and other QSocketNotifier instances for write completion and error handling, if necessary)?
QSerialPort::handle should give you the file descriptor you need. On Windows, if that function returns a Windows HANDLE, you can use _open_osfhandle to get a file descriptor.
As a follow up, shortly after this discussion I did implement my own thread-safe serial port code for POSIX systems using select() and the like and it is working well on multiple threads in conjunction with Qt and non-Qt applications alike. Basically, I have abandoned using QtSerialPort at all.

multithread boost-asio server (vs boost async server tutorial)

I'm following the boost-asio tutorial and don't know how to make a multi-threaded server using boost. I've compiled and tested the daytime client and daytime synchronous server and improved the communication (server asks the client for a command, processes it and then returns the result to the client). But this server can handle only one client at one time.
I would like to use boost to make a multi-threaded server. There is also daytime asynchronous server which executes
boost::asio::io_service io_service;
tcp_server server(io_service);
io_service.run();
in the main program function. The question is - is boost creating a thread for each client somewhere inside? Is this a multi-threaded solution? If not - how to make a multi-threaded server with boost? Thanks for any advice.
Have a look at this tutorial. In short:
io_service.run() in multiple threads gives a thread pool
multiple io_services give completely separated threads
You don't need to explicitly work with threads when you want to support multiple clients. But for that you should use asynchronous calls (as opposed to synchronous, which are used in the tutorials you listed). Have a look at the asynchronous echo tcp server example, it serves multiple clients without using threads.
is boost creating a thread for each client somewhere inside?
When working with asynchronous calls, boost asio is doing these things behind the scenes. It could use threads, but it usually doesn't because there are other, preferred mechanisms for working with multiple sockets at once. For example, on Linux you have epoll, select and poll (in order of preference). I'm not sure what the situation is on Windows; there might be other mechanisms or the preference order might be different. But in any case, boost asio takes care of this, chooses the best mechanism there is for your platform and hides it behind those asynchronous calls.
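The same idea, one thread serving several connections through a readiness mechanism, can be sketched with Python's standard `selectors` module. This is an analogy, not Boost.Asio code. The function names are invented, and `socket.socketpair()` is used in place of a listening socket so the example runs without binding a port.

```python
import selectors
import socket


def echo_until_closed(server_ends):
    """Echo data on every connection from one thread until all peers disconnect."""
    sel = selectors.DefaultSelector()  # picks epoll, kqueue, poll or select as available
    for conn in server_ends:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    open_conns = len(server_ends)
    while open_conns:
        for key, _ in sel.select():
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data)  # fine for tiny messages; real code would buffer writes
            else:
                # Empty read means the peer finished sending.
                sel.unregister(conn)
                conn.close()
                open_conns -= 1
    sel.close()


def demo():
    """Three 'clients' talk to one single-threaded echo loop."""
    pairs = [socket.socketpair() for _ in range(3)]
    clients = [c for c, _ in pairs]
    for i, c in enumerate(clients):
        c.sendall(b"hello from client %d" % i)
        c.shutdown(socket.SHUT_WR)  # signal end of input to the server side
    echo_until_closed([s for _, s in pairs])
    replies = [c.recv(4096) for c in clients]
    for c in clients:
        c.close()
    return replies
```

No thread is created per client; the loop only touches a connection when the selector reports that it has something to read.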

Is there a use case for non-blocking receive when I have threads?

I know non-blocking receive is not used as much in message passing, but still some intuition tells me, it is needed. Take for example GUI event driven applications, you need some way to wait for a message in a non-blocking way, so your program can execute some computations. One of the ways to solve this is to have a special thread with message queue. Is there some use case, where you would really need non-blocking receive even if you have threads?
Threads work differently than non-blocking asynchronous operations, although you can usually achieve the same effect by having threads that do synchronous operations. However, in the end, it boils down to how to do things more efficiently.
Threads are limited resources, and should be used to process long running, active operations. If you have something that is not really active doing things, but need to wait idly for some time for the result (think some I/O operation over the network like calling web services or database servers), then it is better to use the provided asynchronous alternative for it instead of wasting threads unnecessarily by putting the synchronous call on another thread.
You can have a good read on this issue here for more understanding.
One thread per connection is often not a good idea (wasted memory, not all OS are very good with huge thread counts, etc)
How do you interrupt the blocking receive call? On Linux, for example (and probably on some other POSIX OS) pthreads + signals = disaster. With a non-blocking receive you can multiplex your wait on the receiving socket and some kind of IPC socket used to communicate between your threads. Also maps to the Windows world relatively easily.
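The multiplexing trick mentioned above can be sketched in Python: the receiver waits in `select()` on both the data socket and a private "wakeup" socket, so another thread can interrupt the wait by writing a single byte. All names here are invented for the example, and socket pairs stand in for a real network connection.

```python
import select
import socket
import threading


def wait_for_data_or_wakeup(data_sock, wakeup_sock, timeout=5.0):
    """Wait in select() on both sockets and report which one fired."""
    readable, _, _ = select.select([data_sock, wakeup_sock], [], [], timeout)
    if wakeup_sock in readable:
        wakeup_sock.recv(1)  # drain the wakeup byte
        return "interrupted", b""
    if data_sock in readable:
        return "data", data_sock.recv(4096)
    return "timeout", b""


def demo():
    data_rx, data_tx = socket.socketpair()  # stands in for the network socket
    wake_rx, wake_tx = socket.socketpair()  # IPC channel between threads
    # Another thread asks the receiver to stop by writing one byte.
    stopper = threading.Timer(0.05, wake_tx.send, args=(b"x",))
    stopper.start()
    outcome = wait_for_data_or_wakeup(data_rx, wake_rx)
    stopper.join()
    for s in (data_rx, data_tx, wake_rx, wake_tx):
        s.close()
    return outcome
```

No signals are involved, so the approach sidesteps the pthreads-plus-signals problem entirely.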
If you need to replace your regular socket with something more complex (e.g. OpenSSL), relying on the blocking behavior can get you in trouble. OpenSSL, for example, can get deadlocked on a blocking socket, because the SSL protocol has send/receive inversion scenarios where a receive cannot proceed before some sending is done.
My experience has been -- "when in doubt use non-blocking sockets".
With blocking IO, it's challenging on many platforms to get your application to do a best effort orderly shutdown in the face of slow, hung, or disconnected clients/services.
With non-blocking IO, you can kill the in-flight operation as soon as the system call returns, which is immediately. If your code is written with premature termination in mind - which is comparatively simple with non-blocking IO - this can allow you to clean up your saved state gracefully.
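A minimal Python sketch of the "returns immediately" behavior: with a non-blocking socket, a receive with nothing pending fails at once instead of parking the thread, so the caller is free to check a shutdown flag or clean up. `try_recv` and `demo` are hypothetical helper names.

```python
import socket


def try_recv(sock, size=4096):
    """Return received bytes, or None if nothing is available right now."""
    try:
        return sock.recv(size)
    except BlockingIOError:
        return None  # would have blocked; the caller can do cleanup instead


def demo():
    a, b = socket.socketpair()
    a.setblocking(False)
    first = try_recv(a)  # nothing has been sent yet
    b.sendall(b"bye")
    second = try_recv(a)  # data is now pending
    a.close()
    b.close()
    return first, second
```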
I can't think of any, but sometimes the non-blocking APIs are designed in a way that makes them easier/more intuitive to use than an explicitly multi-threaded implementation.
Here is a real situation I faced recently. I used to have a script that would run every hour, managed by crontab, but sometimes users would log in to the machine and run the script manually. This caused some problems: for example, concurrent execution by crontab and a user could conflict, and sometimes users would log in as root - I know, bad pattern, not under my control - and run the script with the wrong permissions. So we decided to have the routine run as a daemon, with proper permissions, and the command users were used to running would now just trigger the daemon.
So this user-executed command would basically do two things: trigger the daemon and wait for it to finish the task. But it also needed a timeout and had to keep dumping the daemon's logs to the user while waiting.
If I understand the situation you proposed, I had the case you want: I needed to keep listening from daemon while still interacting with user independently. The solution was asynchronous read.
Lucky for me, I didn't think about using threads. I probably would have thought so if I were coding in Java, but this was Python code.
My point is that, when we consider threads and messaging to be perfect, the real trade-off is between writing a scheduler for planning the non-blocking receive operations and writing synchronization code for threads with shared state (locks etc.). I would say that both can be sometimes easy and sometimes hard. So a use case would be when there are many asynchronous messages to be received and when there is much data to be operated on based on the messages. This would be quite easy in one thread using non-blocking receive and would require a lot of synchronization with many threads and shared state. I am also thinking about a real-life example; I will probably include it later.

What Use are Threads Outside of Parallel Problems on MultiCore Systems?

Threads make the design, implementation and debugging of a program significantly more difficult.
Yet many people seem to think that every task in a program that can be threaded should be threaded, even on a single core system.
I can understand threading something like an MPEG2 decoder that's going to run on a multicore cpu ( which I've done ), but what can justify the significant development costs threading entails when you're talking about a single core system or even a multicore system if your task doesn't gain significant performance from a parallel implementation?
Or more succinctly, what kinds of non-performance related problems justify threading?
Edit
Well I just ran across one instance that's not CPU limited but threads make a big difference:
TCP, HTTP and the Multi-Threading Sweet Spot
Multiple threads are pretty useful when trying to max out your bandwidth to another peer over a high latency network connection. Non-blocking I/O would use significantly less local CPU resources, but would be much more difficult to design and implement.
Performing a CPU intensive task without blocking the user interface, for example.
Any application in which you may be waiting around for a resource (for example, blocking I/O from network sockets or disk devices) can benefit from threading.
In that case the thread blocking on the slow operation can be put to sleep while other threads continue to run (including, under some operating systems, the GUI thread which, if the OS cannot contact it for a while, will offer the user the chance to destroy it, thinking it's deadlocked somehow).
So it's not just for multi-core machines at all.
An interesting example is a webserver - you need to be able to handle multiple incoming connections that have nothing to do with each other.
what kinds of non-performance related problems justify threading?
Web applications are the classic example. Each user request is conceptually a new thread. Nothing to do with performance, it's just a natural fit for the design.
Blocking code is usually much simpler to write and easier to read (and therefore maintain) than non-blocking code. Yet, using blocking code limits you to a single execution path and also locks out things like user interface (mentioned) and other IO ports. Threading is an elegant solution in these cases.
Another case when multithreading is to be considered is when you have several near-synchronous IO channels that should be managed: using multiple threads (and usually a local message queue) allows for much clearer code.
Here are a couple of specific and simple scenarios where I have launched threads...
A long running report request by the user. When the report is submitted, it is placed in a queue to be processed by a separate thread. The user can then go on within the application and check back later to see the status of their report, they aren't left with a "Processing..." page or icon.
A thread that iterates cache storage, removing data that has expired or no longer needed. The thread's job within the application is independent of any processing for a specific user, but part of the overall application run-time maintenance.
Although not specifically a threading scenario, logging within our web site is handed off to a parallel process, so the throughput of the web site isn't hindered by the time it takes to record log data.
I agree that threading just for threading's sake isn't a good idea, and it can introduce problems within your application if it isn't done properly, but it is an extremely useful tool for solving some problems.
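The first scenario above (a report queue served by a separate thread) might look roughly like this in Python. `ReportQueue`, its methods and the status strings are invented for this sketch, and `build_report` is whatever function actually produces the report.

```python
import queue
import threading


class ReportQueue:
    """Accepts report requests immediately; a background thread does the work."""

    def __init__(self, build_report):
        self._build = build_report
        self._jobs = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()  # guards the shared status table
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, report_id, params):
        """Return right away; the user can carry on and check back later."""
        with self._lock:
            self._status[report_id] = "queued"
        self._jobs.put((report_id, params))

    def status(self, report_id):
        with self._lock:
            return self._status.get(report_id, "unknown")

    def wait_all(self):
        """Block until every submitted report has been processed."""
        self._jobs.join()

    def _worker(self):
        while True:
            report_id, params = self._jobs.get()
            with self._lock:
                self._status[report_id] = "running"
            try:
                self._build(params)
                outcome = "done"
            except Exception:
                outcome = "failed"
            with self._lock:
                self._status[report_id] = outcome
            self._jobs.task_done()
```

The only shared state is the status table, so a single lock is enough. That is the sort of low-synchronization threading this answer has in mind.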
Whenever you need to call some external component (be it a database query, a third-party library, an operating system primitive etc.) that only provides a synchronous/blocking interface, or where using the asynchronous interface is not worth the extra trouble and pain - and you also need some form of concurrency, e.g. serving multiple clients in a server or keeping the GUI responsive.
Well, how do you know if your app is going to run on a multi-core system or not?
Beyond that, there are a lot of processes that take up time, but don't require the CPU. Such as writing to a disk or networking. Who wants to push a button in a GUI and then have to sit there and wait for a network connection. Even on a single core machine, having a separate IO thread greatly improves user experience. You always at least want a separate thread for the UI.
Yet many people seem to think that every task in a program that can be threaded should be threaded, even on a single core system.
"Many people"... Who?
Also, in my experience, many programs that should be multithreaded aren't (especially games: I have an i7 and yet most games still use only 1 of my cores), so I'm not sure what you're talking about. Programs like calc.exe are definitely not multithreaded (or, if they are, 1 thread does 99% of the work).
Performing a CPU intensive task without blocking the user interface, for example.
Yes, this is true but this is fairly easy to implement and it's not what the OP is referring to (since, in this case, 1 thread does almost all the work and you only need very few mutexes)
