What is the performance of piping a stream in node.js?

I have a TCP server written in Node.js. When a socket is received on its server socket, the process passes off the socket to a process pool. It either forks or reuses a previously forked process. It then passes the received socket to that other process using ChildProcess.send(). This gives complete control of the socket to the child process.
I am considering taking a different approach, but I'm concerned about the potential performance trade-offs. I would like instead to pipe the socket to the child process either through stdin or a unix domain socket or maybe a pipe. There are a number of reasons why this approach would be preferable in my particular domain, but I won't belabor this question with those details.
So I am left to wonder about the performance characteristics of the pipe() method on a Node.js stream. Is the piping of the stream handled at the system level, or does Node.js have to read every byte from one stream and write it to the destination? There are a few system calls (e.g. splice()) that provide some level of zero-copy streaming between file descriptors. Does Node.js use some sort of mechanism like that, or is it manual?

I recommend having a read of this blog post, which highlights how, when it comes to streams, things are only as fast as the slowest stream in the workflow.
Also, have a read of this answer explaining how to benchmark streams in node.
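One rough way to get numbers for your own setup is to time pipe() directly. The sketch below is only an illustration using in-memory streams; `measurePipe` and its parameters are hypothetical names, not part of Node's API, and a real measurement would substitute the socket and the child-process stream in question.

```javascript
const { Readable, Writable } = require('node:stream');

// Hypothetical helper: pushes `totalBytes` of zero-filled chunks through
// pipe() and reports throughput once the sink has consumed everything.
function measurePipe(totalBytes, chunkSize, onDone) {
  let produced = 0;
  let consumed = 0;

  const source = new Readable({
    read() {
      if (produced >= totalBytes) {
        this.push(null); // signal end of stream
        return;
      }
      const size = Math.min(chunkSize, totalBytes - produced);
      produced += size;
      this.push(Buffer.alloc(size));
    },
  });

  const sink = new Writable({
    write(chunk, _encoding, callback) {
      consumed += chunk.length;
      callback();
    },
  });

  const start = process.hrtime.bigint();
  sink.on('finish', () => {
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    onDone({ bytes: consumed, seconds, mbPerSec: consumed / 1e6 / seconds });
  });

  source.pipe(sink);
}
```

Calling `measurePipe(64 * 1024 * 1024, 64 * 1024, console.log)` prints the byte count, elapsed time, and MB/s. Making the sink's `write` slower (for example by deferring `callback`) should show the whole pipeline slowing down with it.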

Related

In Node.js, is writing to a TCP connection blocking?

I have a node.js process which has several entry points, including a tcp server, websocket server, and named pipe server. I am wondering if any interactions with these connections will be blocking.
Example: for a given connection, if there isn't anything in the buffer because the client hasn't sent anything yet, will this block all other code from running in the Node.js process until the client sends data?
My understanding is that Node will offload I/O operations like these to the system kernel, so it wouldn't hold up the call stack.
Most likely I am getting something wrong here so please let me know! Thank you.
This is a very interesting question!
I recommend starting by understanding what the event loop is (reading https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/) and then the difference between blocking and non-blocking calls (reading https://nodejs.org/en/docs/guides/blocking-vs-non-blocking/).
With that, we know a bit more about how Node works behind the scenes and what blocking and non-blocking operations are, so we're equipped to spot what will or won't block the loop.
Will a TCP connection block it? There may be a module out there that will, it really depends on each case, library, implementation.
Regarding TCP on the "native" implementation, if you're using the node.js Net module you'll find that it is a:
module [that] provides an asynchronous network API for creating stream-based TCP or IPC servers
Therefore, in principle, it will be non-blocking.
As an example, if we look at the socket.write documentation itself, we'll find that this function:
Returns true if the entire data was flushed successfully to the kernel buffer. Returns false if all or part of the data was queued in user memory. 'drain' will be emitted when the buffer is again free.
Therefore it should not block.
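To make the write()/'drain' contract quoted above concrete, here is a minimal sketch. `writeAll` and `makeSlowSink` are hypothetical helpers, and the slow sink is only a stand-in for a socket whose kernel buffer is full; a net.Socket exposes the same write() and 'drain' behaviour because it is a Writable stream.

```javascript
const { Writable } = require('node:stream');

// Writes every chunk to `stream`, stopping whenever write() returns false
// and continuing on 'drain'. Nothing here blocks the event loop: while we
// wait for 'drain', other callbacks are free to run.
function writeAll(stream, chunks, onDone) {
  let index = 0;
  let waits = 0; // how many times we had to wait for 'drain'

  (function next() {
    while (index < chunks.length) {
      const flushed = stream.write(chunks[index++]);
      if (!flushed && index < chunks.length) {
        waits++;
        stream.once('drain', next);
        return;
      }
    }
    stream.end(() => onDone(waits));
  })();
}

// Stand-in for a slow peer: a 4-byte buffer and a deferred write callback.
function makeSlowSink(received) {
  return new Writable({
    highWaterMark: 4,
    write(chunk, _encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback);
    },
  });
}
```

With a real connection you would pass the socket itself, e.g. `writeAll(socket, chunks, done)`.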
PS: Another interesting article on this subject is https://medium.com/@hnasr/when-nodejs-i-o-blocks-327f8a36fbd4
Happy reading, and keep an eye out for blocking function calls!

When to use sockets over Fifo for Client-Server IPC?

Having looked at code for some time now, I see most coders have used sockets for IPC rather than pipes (or FIFOs, to be specific).
Considering that there's only one client and one server, isn't it better to use FIFO over sockets?
Please educate me in this matter.
FIFOs have the following benefits:
writes are atomic if the data length is no more than PIPE_BUF
splice() with SPLICE_F_MOVE is almost guaranteed to work (no user-space data copy; the kernel moves the data between pipes)
they are easier to set up than sockets
But on the other hand a FIFO is unidirectional, i.e. you most likely need two separate FIFOs:
one for writing data from client to server
one for writing data from server to client
Or use a single FIFO and reopen it as needed: once the server has received the data, reopen the FIFO with O_WRONLY on the server and O_RDONLY on the client, send the reply to the client, and do the reverse for the next request.
http://man7.org/linux/man-pages/man7/pipe.7.html
Socket is bi-directional, pipe/fifo is uni-directional.
Socket can be stream or datagram, pipe/fifo are always streams.
Socket and fifo do not require related processes, whereas unnamed pipes do.
Socket can handle more than one peer.
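As an illustration of the bidirectional point in Node.js terms (the language used elsewhere on this page), the sketch below serves a Unix domain socket through the net module. The names `ipcPath` and `startUpperCaseServer` are hypothetical, and the endpoint location under the temp directory is an assumption made for the demo.

```javascript
const net = require('node:net');
const os = require('node:os');
const path = require('node:path');

// Hypothetical endpoint name. On Windows, net IPC uses a named-pipe path
// instead of a file-system path.
function ipcPath(name) {
  return process.platform === 'win32'
    ? `\\\\.\\pipe\\${name}`
    : path.join(os.tmpdir(), `${name}.sock`);
}

// A single connection carries traffic in both directions, so one endpoint
// replaces the pair of FIFOs described above.
function startUpperCaseServer(endpoint, onListening) {
  const server = net.createServer((socket) => {
    socket.setEncoding('utf8');
    // Read from the client and reply on the same socket.
    socket.on('data', (text) => socket.write(text.toUpperCase()));
  });
  server.listen(endpoint, () => onListening(server));
  return server;
}
```

A client connects with `net.connect(endpoint)`, writes its request, and reads the reply from the same socket object.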

Node.JS Unbounded Concurrency / Stream backpressure over TCP

As I understand it, one of the consequences of Node's evented IO model is the inability to tell a Node process that is (for example) receiving data over a TCP socket, to block, once you've hooked up your receiving event handlers (or otherwise started listening for data).
If the receiver can't process the incoming data fast enough, "unbounded concurrency" can result, whereby Node under the hood continues to read data off the socket as fast as it can, scheduling new data events on the event loop instead of blocking on the socket, until the process eventually runs out of memory and dies.
The receiver can't tell node to slow its reading, which would otherwise allow TCP's inbuilt flow control mechanisms to kick in and indicate to the sender that it needs to slow down.
Firstly, is what I've described so far accurate? Is there something I've missed that allows node to avoid this situation?
One of the much touted features of Node Streams is the automatic handling of backpressure.
AFAIK, the only way a writable stream (of a tcp socket) can tell if it needs to slow down or not is by looking at socket.bufferSize (indicating the amount of data written to the socket but not yet sent). Given that Node at the receiving end always reads as fast as it can, this can only indicate a slow network connection between sender and receiver, and NOT whether the receiver can't keep up.
So secondly, can Node Streams' automatic backpressure somehow work in this situation to deal with a receiver that can't keep up?
It also seems that this problem affects browsers receiving data via websockets, for the similar reason that the websockets API doesn't provide a mechanism to tell the browser to slow its reading from the socket.
Is the only solution to this problem for Node (and browsers using websockets) to implement a manual flow control mechanism at the application level, to explicitly tell the sending process to slow down?
To answer your first question, I believe your understanding is not accurate -- at least not when piping data between streams. In fact, if you read the documentation for the pipe() function you'll see that it explicitly says that it automatically manages the flow so that "destination is not overwhelmed by a fast readable stream."
The underlying implementation of pipe() is taking care of all of the heavy lifting for you. The input stream (a Readable stream) will continue to emit data events until the output stream (a Writable stream) is full. As an aside, if I remember correctly, the Writable's write() call will return false when it is handed data that it cannot currently process. At this point, the pipe will pause() the Readable stream, which will prevent it from emitting further data events. Thus, the event loop isn't going to fill up and exhaust your memory, nor is it going to emit events that are simply lost. Instead, the Readable will stay paused until the Writable stream emits a drain event. At that point, the pipe will resume() the Readable stream.
The secret sauce is piping one stream into another, which is managing the back pressure for you automatically. This hopefully answers your second question, which is that Node can and does automatically manage this by simply piping streams.
And finally, there is really no need to implement this manually (unless you are writing a new stream from scratch) since it is already provided for you. :)
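For illustration only, the pause/drain/resume cycle described above can be written out by hand. This is a simplified sketch of the idea, not Node's actual pipe() implementation (which also handles errors, unpiping, and multiple destinations); `manualPipe` and the `stats` counter are hypothetical names.

```javascript
// Forwards data from a Readable to a Writable while respecting backpressure.
function manualPipe(source, dest, stats) {
  source.on('data', (chunk) => {
    // write() returns false once the destination's buffer is full.
    if (!dest.write(chunk)) {
      stats.pauses++;
      source.pause(); // stop 'data' events
      dest.once('drain', () => source.resume()); // continue when there is room
    }
  });
  // Close the destination once the source has been fully consumed.
  source.on('end', () => dest.end());
}
```

In application code, `source.pipe(dest)` does all of this for you.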
Handling all of this is not easy, as admitted on the Node blog post that announced the streams2 API in Node. It's a great resource and certainly provides much more information than I could here. One little gotcha that isn't entirely obvious that you should know however, from the docs here and for backwards compatibility reasons:
If you attach a data event listener, then it will switch the stream into flowing mode, and data will be passed to your handler as soon as it is available.
So just be aware that attaching the data event listener in an attempt to observe something in the stream will fundamentally alter the stream to the old way of doing things. Ask me how I know.
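The mode switch is easy to observe through the `readableFlowing` property. A small sketch, using a bare Readable with a no-op `read()` as a stand-in for a real source:

```javascript
const { Readable } = require('node:stream');

// A Readable that never produces data on its own; enough to inspect its mode.
const stream = new Readable({ read() {} });

// null: no consumer has been attached yet.
const before = stream.readableFlowing;

// Merely attaching a 'data' listener flips the stream into flowing mode.
stream.on('data', () => {});
const after = stream.readableFlowing;

console.log({ before, after });
```

Here `before` is `null` and `after` is `true`, even though the listener does nothing with the data.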

Winsock application and multithreading - listening to socket event from another thread

Assume we have an application which uses Winsock to implement TCP communication.
For each socket we create a thread that blocks on receive.
When data arrives, we would like to notify other threads (listening threads).
I was wondering what the best way to implement this is:
Move away from this design and use non-blocking sockets; the listening thread then has to iterate constantly and call a non-blocking receive, which makes it thread safe (no extra threads for the sockets).
Use asynchronous procedure calls to notify the listening threads, which again have to enter an alertable wait for APCs to be queued to them.
Implement a thread-safe message queue: each socket thread posts messages to it, and the listener, again, polls it at every interval and pulls data from it.
Also, I read about WSAAsyncSelect, but I saw that it is used to send messages to a window. Isn't there something similar for other threads? (Well, I guess APCs are...)
Thanks!
Use I/O completion ports. See the CreateIoCompletionPort() and the GetQueuedCompletionStatus() functions of the Win32 API (under File Management functions). In this instance, the socket descriptors are used in place of file handles.
You'll always be better off abstracting the mechanics of socket API (listening, accepting, reading & writing) in a separate layer from the application logic. Have an object that captures the state of a connection, which is created during an incoming connection and you can maintain buffers in this object for the incoming and outgoing traffic. This will allow your network interface layer to be independent of the application code. This will also make the code cleaner by separating the application functionality from the underlying communication mechanism.
The choice between blocking and non-blocking sockets depends on the level of scalability that your application needs to achieve. If your application needs to support hundreds of incoming connections, adopting a thread-per-socket approach is not going to be very wise. You'll be better off going for an I/O completion ports based implementation, which will make your app immensely scalable at the cost of added code complexity. However, if you only foresee a few tens of connections at any point in time, you can go for an asynchronous sockets model using Win32 events or messages. The Win32 events based approach doesn't scale very well beyond a certain limit, as you would have to manage multiple threads if the number of concurrent sockets exceeds 63 (WaitForMultipleObjects can only wait on a maximum of 64 handles). The Windows message based mechanism doesn't have this limitation. OTOH, the Win32 event based approach does not require a GUI window to work.
Check out WSAEventSelect along with WSAAsyncSelect API documentation in MSDN.
You might want to take a look at boost::asio package as well. It provides a neat (though a little complex) C++ abstraction over sockets API.

How does an asynchronous socket server work?

I should state that I'm not asking about specific implementation details (yet), but just a general overview of what's going on. I understand the basic concept behind a socket, and need clarification on the process as a whole. My (probably very wrong) understanding is currently this:
A socket is constantly listening for clients that want to connect (in its own thread). When a connection occurs, an event is raised that spawns another thread to perform the connection process. During the connection process the client is assigned its own socket with which to communicate with the server. The server then waits for data from the client, and when data arrives an event is raised which spawns a thread to read the data from a stream into a buffer.
My questions are:
How off is my understanding?
Does each client socket require its own thread to listen for data on?
How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?
In this threaded environment, what kind of data is typically being shared, and what are the points of contention?
Any clarifications and additional explanation would be greatly appreciated.
EDIT:
Regarding the question about what data is typically shared and points of contention, I realize this is more of an implementation detail than it is a question regarding general process of accepting connections and sending/receiving data. I had looked at a couple implementations (SuperSocket and Kayak) and noticed some synchronization for things like session cache and reusable buffer pools. Feel free to ignore this question. I've appreciated all your feedback.
One thread per connection is bad design (not scalable, overly complex) but unfortunately way too common.
A socket server works more or less like this:
A listening socket is set up to accept connections, and added to a socket set
The socket set is checked for events
If the listening socket has pending connections, new sockets are created by accepting the connections, and then added to the socket set
If a connected socket has events, the relevant IO functions are called
The socket set is checked for events again
This happens in one thread. You can easily handle thousands of connected sockets in a single thread, and there are few valid reasons for making this more complex by introducing threads.
while running
    select on socketset
    for each socket with events
        if socket is listener
            accept new connected socket
            add new socket to socketset
        else if socket is connection
            if event is readable
                read data
                process data
            else if event is writable
                write queued data
            else if event is closed connection
                remove socket from socketset
            end
        end
    done
done
The IP stack takes care of all the details of which packets go to what "socket" in which order. Seen from the application's point of view, a socket represents a reliable ordered byte stream (TCP) or an unreliable unordered sequence of packets (UDP).
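In Node.js, the language used elsewhere on this page, the select loop above is run by the runtime's event loop, so the application only supplies the per-event handlers. The following is a minimal single-threaded echo server sketch; `createEchoServer` is a hypothetical name and echoing stands in for the "process data" step.

```javascript
const net = require('node:net');

// One thread, one event loop, many sockets.
function createEchoServer() {
  const connections = new Set(); // plays the role of the socket set

  const server = net.createServer((socket) => {
    // The listener had a pending connection: it has been accepted for us.
    connections.add(socket);

    // Readable event: read the data, process it (here: echo it back).
    // Queued writes are flushed by the runtime when the socket is writable.
    socket.on('data', (chunk) => socket.write(chunk));

    socket.on('error', () => socket.destroy());

    // Closed connection: remove the socket from the set.
    socket.on('close', () => connections.delete(socket));
  });

  return { server, connections };
}
```

Starting it takes one call, e.g. `createEchoServer().server.listen(7000)`, where the port number is arbitrary.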
EDIT: In response to updated question.
I don't know either of the libraries you mention, but on the concepts you mention:
A session cache typically keeps data associated with a client, and can reuse this data for multiple connections. This makes sense when your application logic requires state information, but it's a layer higher than the actual networking end. In the above sample, the session cache would be used by the "process data" part.
Buffer pools are also an easy and often effective optimization for a high-traffic server. The concept is very easy to implement: instead of allocating/deallocating space for storing data you read/write, you fetch a preallocated buffer from a pool, use it, then return it to the pool. This avoids the (sometimes relatively expensive) backend allocation/deallocation mechanisms. This is not directly related to networking; you can just as well use buffer pools for, e.g., something that reads chunks of files and processes them.
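A buffer pool along those lines can be sketched in a few lines. The class below is a hypothetical illustration, not taken from either library mentioned in the question; falling back to a fresh allocation when the pool is empty and zeroing buffers on release are design choices assumed here.

```javascript
// Fixed-size buffer pool: acquire() hands out a preallocated buffer,
// release() returns it for reuse.
class BufferPool {
  constructor(count, size) {
    this.size = size;
    this.free = Array.from({ length: count }, () => Buffer.alloc(size));
  }

  acquire() {
    // Fall back to a fresh allocation when the pool is exhausted.
    return this.free.pop() ?? Buffer.alloc(this.size);
  }

  release(buffer) {
    if (buffer.length !== this.size) return; // ignore buffers of another size
    buffer.fill(0); // avoid leaking old data to the next user
    this.free.push(buffer);
  }
}
```

Typical use is `const buf = pool.acquire();`, followed by the read or write work, then `pool.release(buf);` once the data has been processed.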
How off is my understanding?
Pretty far.
Does each client socket require its own thread to listen for data on?
No.
How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?
TCP/IP is a number of layers of protocol. There's no "kernel" to it. It's pieces, each with a separate API to the other pieces.
The IP address is handled in one place.
The port # is handled in another place.
The IP addresses are matched up with MAC addresses to identify a particular host. The port # is what ties a TCP (or UDP) socket to a particular piece of application software.
In this threaded environment, what kind of data is typically being shared, and what are the points of contention?
What threaded environment?
Data sharing? What?
Contention? The physical channel is the number one point of contention. (Ethernet, for example depends on collision-detection.) After that, well, every part of the computer system is a scarce resource shared by multiple applications and is a point of contention.
