Can std::io::BufReader on a TcpStream lead to data loss? - linux

Can a single instance of std::io::BufReader on a tokio::net::TcpStream lead to data loss when the BufReader is used to read_until a given (byte) delimiter?
That is, is there any possibility that after I use the BufReader for:
let buffer = Vec::new();
let reader = BufReader::new(tcp_stream);
tokio::io::read_until(reader, delimiter, buffer)
    .map(move |(s, _)| s.into_inner())
a following tokio::io::read using the same stream would return data starting from beyond the byte right after the delimiter, therefore causing data loss?
I have an issue (and complete reproducible example on Linux) that I have trouble explaining if the above assumption isn't correct.
I have a TCP server that is supposed to send the content of a file to multiple TCP clients following multiple concurrent requests.
Sometimes, always using the same inputs, the data received by the client is less than expected and the transfer therefore fails.
The error is not raised 100% of the time (that is, some of the client requests still succeed), but with the 100 tries defined in tcp_client.rs it was always reproducible for at least one of them.
The sequence of data transferred between client and server consists of:
1. the client sends a request
2. the server reads the request and sends a response
3. the client reads the response
4. the server sends the file data
5. the client reads the file data
This issue is reproducible only if steps 1, 2 and 3 are involved; otherwise it works as expected.
The error is raised when this tokio::io::read (used to read the file content) returns 0, as if the server had closed the connection, even though the server is actually up and running and all the data has been sent (there is an assertion after tokio::io::copy and I checked the TCP packets with a packet sniffer). On a side note, in all my runs the amount of data read before the error was always > 95% of the expected amount.
Most importantly, the common.rs module defines 2 different read_* functions:
read_until currently used.
read_exact not used.
The logic of the 2 is the same, they need to read the request/response (and both client and server can be updated to use one or the other). What is surprising is that the bug presents itself only when tokio::io::read_until is used, while tokio::io::read_exact works as expected.
Unless I misused tokio::io::read_until or there is a bug in my implementation, I expected both versions to work without any issue. What I am seeing instead is this panic being raised because some clients cannot read all the data sent by the server.

Yes. This is described in the documentation for BufReader (emphasis mine):
When the BufReader is dropped, the contents of its buffer will be discarded.
The next sentence is correct but not extensive enough:
Creating multiple instances of a BufReader on the same stream can cause data loss.
The BufReader has read data from the underlying source and put it in the buffer, then you've thrown away the buffer. The data is gone.
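A minimal synchronous sketch of both the pitfall and the fix (the tokio future-based version is analogous; the newline delimiter, function name and error handling here are simplified assumptions): do all subsequent reads through the same BufReader instead of unwrapping it with into_inner(), as the question's .map(|(s, _)| s.into_inner()) does, because into_inner() discards whatever is still sitting in the buffer.

use std::io::{BufRead, BufReader, Read};
use std::net::TcpStream;

fn read_request_then_body(stream: TcpStream) -> std::io::Result<Vec<u8>> {
    let mut reader = BufReader::new(stream);

    // read_until may pull more bytes than the delimited request into the
    // internal buffer in a single read from the socket.
    let mut request = Vec::new();
    reader.read_until(b'\n', &mut request)?;

    // Correct: keep reading through the SAME BufReader, so any bytes it has
    // already buffered past the delimiter are returned first.
    //
    // Incorrect: `let stream = reader.into_inner();` followed by reads on
    // `stream` directly - the bytes left in the BufReader's buffer are
    // discarded, which is exactly the data loss described above.
    let mut body = Vec::new();
    reader.read_to_end(&mut body)?;
    Ok(body)
}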

Related

NodeJs - TCP/IP Socket send/receive serially?

The TCP server I am hitting (trying to use the built-in node TLS Socket) expects a handshaking process of sends/receives in a certain order (send, on receipt of success, send more messages, on success, send more, etc.). The response messages do not contain anything to let me know which send they are responding to, so I am not able to easily use the streaming nature of the built-in TCP Node library.
Any ideas of what the best way to handle this case in Node?
Here is an example (Python) of the process:
s.send("hello")
s.send("send this 1")
reply = s.recv()
message = reply[0]
if message == OK:
print('Got OK for hello')
s.send("send this 2")
reply = s.recv()
message = reply[0]
if message == OK:
print('Got it')
else:
raise Exception('Failed to send hello')
When you have non-blocking I/O and you want to do something such as send data and then read the specific response to that send, you need to set up some appropriate state so that when the next set of data comes in, you know exactly what it belongs to and therefore what to do with it.
There are a number of ways to do that I can think of:
Create a general purpose state machine where you send data and read data and whenever you read data, you can tell what state the socket is in and therefore what you are supposed to do with the data you read.
Create a temporary set of listeners where you send data, then add a temporary listener (you can use .once()) for incoming data that is specifically designed to process the response you are expecting. When the data arrives, you make sure that listener is removed.
Your pseudo-code example does not show enough info for anyone to make a more concrete suggestion. TCP, by its very nature, is stream driven. It doesn't have any built-in sense of a message or a packet. So, what you show doesn't even cover the most basic aspect of any TCP protocol, which is how to know when you've received an entire response.
Even your reply = s.recv() shown in some other language isn't practical in TCP (no matter the language) because s.recv() needs to know when it's got a complete message/chunk/whatever it is that you're waiting to receive. TCP delivers data in order sent, but does not have any sense of a particular packet of information that goes together. You have to supply that on top of the TCP layer. Common techniques used for delineating messages are:
Some message delimiter (like a carriage return or line feed or a zero byte or some other tag - all of which are known not to occur inside the message itself)
Sending a length first so the reader knows exactly how many bytes to read.
Wrapping messages in some sort of container where the start and end of the container are made clear by the structure of the container (note options 1 and 2 above are just specific implementations of such a container). For example, the webSocket protocol uses a very specific container model that includes some length data and other info.
I was thinking of showing you an example using socket.once('data', ...) to listen for the specific response, but even that won't work properly without knowing how to delineate an incoming message so one knows when you've received a complete incoming message.
So, one of your first steps would be to implement a layer on top of TCP that reads data and knows how to break it into discrete messages (knows both when a complete message has arrived and how to break up multiple messages that might be arriving) and then emits your own event on the socket when a whole message has arrived. Then, and only then, can you start to implement the rest of your state machine using the above techniques.
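To make the first framing technique from the list above concrete, here is a small sketch of that "break the stream into discrete messages" layer (written in Rust only to match the main question at the top of this page; the function name and the newline delimiter are illustrative assumptions): accumulate incoming bytes in a buffer and only emit complete, delimiter-terminated messages, keeping any trailing partial message for the next read.

// Append each chunk read from the socket to `buf`, then call this to pull
// out every complete newline-delimited message; a partial trailing message
// stays in `buf` until more data arrives.
fn split_messages(buf: &mut Vec<u8>) -> Vec<Vec<u8>> {
    let mut messages = Vec::new();
    while let Some(pos) = buf.iter().position(|&b| b == b'\n') {
        let mut msg: Vec<u8> = buf.drain(..=pos).collect();
        msg.pop(); // drop the trailing delimiter itself
        messages.push(msg);
    }
    messages
}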

Node JS Streams: Understanding data concatenation

One of the first things you learn when you look at node's http module is this pattern for concatenating all of the data events coming from the request read stream:
let body = [];
request.on('data', chunk => {
    body.push(chunk);
}).on('end', () => {
    body = Buffer.concat(body).toString();
});
However, if you look at a lot of streaming library implementations they seem to gloss over this entirely. Also, when I inspect the request.on('data',...) event it almost always emits only once for a typical JSON payload with a few to a dozen properties.
You can do things with the request stream like pipe it through some transforms in object mode and through to some other read streams. It looks like this concatenating pattern is never needed.
Is this because the request stream in handling POST and PUT bodies pretty much only ever emits one data event which is because their payload is way below the chunk partition size limit? In practice, how large would a JSON encoded object need to be to be streamed in more than one data chunk?
It seems to me that objectMode streams don't need to worry about concatenating because if you're dealing with an object it is almost always no larger than one data emitted chunk, which atomically transforms to one object? I could see there being an issue if a client were uploading something like a massive collection (which is when a stream would be very useful as long as it could parse the individual objects in the collection and emit them one by one or in batches).
I find this to probably be the most confusing aspect of really understanding the node.js specifics of streams; there is a weird disconnect between streaming raw data and dealing with atomic chunks like objects. Do objectMode stream transforms have internal logic for automatically concatenating up to object boundaries? If someone could clarify this it would be very appreciated.
The job of the code you show is to collect all the data from the stream into one buffer so when the end event occurs, you then have all the data.
request.on('data',...) may emit only once or it may emit hundreds of times. It depends upon the size of the data, the configuration of the stream object and the type of stream behind it. You cannot ever reliably assume it will only emit once.
You can do things with the request stream like pipe it through some transforms in object mode and through to some other read streams. It looks like this concatenating pattern is never needed.
You only use this concatenating pattern when you are trying to get the entire data from this stream into a single variable. The whole point of piping to another stream is that you don't need to fetch the entire data from one stream before sending it to the next stream. .pipe() will just send data as it arrives to the next stream for you. Same for transforms.
Is this because the request stream in handling POST and PUT bodies pretty much only ever emits one data event which is because their payload is way below the chunk partition size limit?
It is likely because the payload is below some internal buffer size and the transport is sending all the data at once and you aren't running on a slow link and .... The point here is you cannot make assumptions about how many data events there will be. You must assume there can be more than one and that the first data event does not necessarily contain all the data or data separated on a nice boundary. Lots of things can cause the incoming data to get broken up differently.
Keep in mind that a readStream reads data until there's momentarily no more data to read (up to the size of the internal buffer) and then it emits a data event. It doesn't wait until the buffer fills before emitting a data event. So, since all data at the lower levels of the TCP stack is sent in packets, all it takes is a momentary delivery delay with some packet and the stream will find no more data available to read and will emit a data event. This can happen because of the way the data is sent, because of things that happen in the transport over which the data flows or even because of local TCP flow control if lots of stuff is going on with the TCP stack at the OS level.
In practice, how large would a JSON encoded object need to be to be streamed in more than one data chunk?
You really should not know or care because you HAVE to assume that any size object could be delivered in more than one data event. You can probably safely assume that a JSON object larger than the internal stream buffer size (which you could find out by studying the stream code or examining internals in the debugger) WILL be delivered in multiple data events, but you cannot assume the reverse because there are other variables such as transport-related things that can cause it to get split up into multiple events.
It seems to me that objectMode streams don't need to worry about concatenating because if you're dealing with an object it is almost always no larger than one data emitted chunk, which atomically transforms to one object? I could see there being an issue if a client were uploading something like a massive collection (which is when a stream would be very useful as long as it could parse the individual objects in the collection and emit them one by one or in batches).
Object mode streams must do their own internal buffering to find the boundaries of whatever objects they are parsing so that they can emit only whole objects. At some low level, they are concatenating data buffers and then examining them to see if they yet have a whole object.
Yes, you are correct that if you were using an object mode stream and the object themselves were very large, they could consume a lot of memory. Likely this wouldn't be the most optimal way of dealing with that type of data.
Do objectMode stream transforms have internal logic for automatically concatenating up to object boundaries?
Yes, they do.
FYI, the first thing I do when making http requests is to use the request-promise library so I don't have to do my own concatenating. It handles all this for you. It also provides a promise-based interface and about 100 other features which I find helpful.

Streaming output from program to an arbitrary number of programs under Linux?

How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream, but the programs reading the stream do block if there's no output from the first-mentioned program?
I've been trying to Google around for a while now, but all I find is methods where the program does block if nothing is reading the stream.
How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream
Your requirements as stated cannot possibly be satisfied without some form of a buffer.
The most straightforward option is to write the output to a file and let consumers read that file.
Another option is to have a ring-buffer in a form of a memory mapped file. As the capacity of a ring-buffer is normally fixed there needs to be a policy for dealing with slow consumers. Options are: block the producer; terminate the slow consumer; let the slow consumer somehow recover when it missed data.
Many years ago I wrote something like what you describe for an audio stream processing app (http://hewgill.com/nwr/). It's on github as splitter.cpp and has a small man page.
The splitter program currently does not support dynamically changing the set of output programs. The output programs are fixed when the command is started.
Without knowing exactly what sort of data you are talking about (how large is the data, what format is it, etc.) it is hard to come up with a concrete answer. Let's say for example you want a "ticker-tape" application that sends out information for share purchases on the stock exchange; you could quite easily have a server that accepts a socket from each application, starts a thread and sends the relevant data as it appears from the recorder at the stock market. I'm not aware of any "multiplexer" that exists today (but Greg's one may be a starting point). If you use (for example) XML to package the data, a client could receive the second half of a packet, detect that it's not complete, and throw it away.
If, on the other hand, you are sending out high detail live update weather maps for the whole country, the data is probably large enough that you don't want to wait for a full new one to arrive, so you need some sort of lock'n'load protocol that sets the current updated map, and then sends that one out until (say) 1 minute later you have a new one. Again, it's not that complex to write some code to do this, but it's quite a different set of code to the "ticker tape" solution above, because the packet of data is larger, and getting "half a packet" is quite wasteful and completely useless.
If you are streaming live video from the 2016 Olympics in Brazil, then you probably want a further different solution, as timing is everything with video, and you need the client to buffer, pick up key-frames, throw away "stale" frames, etc., and the server will have to be different.

Linux: send whole message or none of it on TCP socket

I'm sending various custom message structures down a non-blocking TCP socket. I want to send either the whole structure in one send() call, or return an error with no bytes sent if there's only room in the send buffer for part of the message (i.e. send() returns EWOULDBLOCK). If there's not enough room, I will throw away the whole structure and report overflow, but I want to be recoverable after that, i.e. the receiver only ever receives a sequence of valid complete structures. Is there a way of either checking the send buffer free space, or telling the send() call to behave as described? Datagram-based sockets aren't an option; it must be connection-based TCP. Thanks.
Linux provides a SIOCOUTQ ioctl() to query how much data is in the TCP output buffer:
http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html
You can use that, plus the value of SO_SNDBUF, to determine whether the outgoing buffer has enough space for any particular message. So strictly speaking, the answer to your question is "yes".
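A hedged sketch of that combination in Rust using the libc crate (Linux/glibc only; the helper name is made up, and the SIOCOUTQ ioctl number 0x5411 - the same value as TIOCOUTQ - is hard-coded for x86/x86_64, so other architectures may differ):

use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

// On Linux, SIOCOUTQ is defined as TIOCOUTQ; 0x5411 is the x86/x86_64 value.
const SIOCOUTQ: libc::c_ulong = 0x5411;

fn send_buffer_free_space(stream: &TcpStream) -> std::io::Result<libc::c_int> {
    let fd = stream.as_raw_fd();

    // Bytes currently queued in the kernel send buffer (not yet acknowledged).
    let mut queued: libc::c_int = 0;
    if unsafe { libc::ioctl(fd, SIOCOUTQ, &mut queued as *mut libc::c_int) } == -1 {
        return Err(std::io::Error::last_os_error());
    }

    // Total size of the send buffer, as reported by SO_SNDBUF.
    let mut sndbuf: libc::c_int = 0;
    let mut len = std::mem::size_of::<libc::c_int>() as libc::socklen_t;
    let rc = unsafe {
        libc::getsockopt(
            fd,
            libc::SOL_SOCKET,
            libc::SO_SNDBUF,
            &mut sndbuf as *mut libc::c_int as *mut libc::c_void,
            &mut len,
        )
    };
    if rc == -1 {
        return Err(std::io::Error::last_os_error());
    }

    // A message "fits" only if its size is at most this rough free-space estimate.
    Ok((sndbuf - queued).max(0))
}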
But there are two problems with this approach. First, it is Linux-specific. Second, what are you planning to do when there is not enough space to send your whole message? Loop and call select again? But that will just tell you the socket is ready for writing again, causing you to busy-loop.
For efficiency's sake, you should probably bite the bullet and just deal with partial writes; let the network stack worry about breaking your stream up into packets for optimal throughput.
TCP has no support for transactions; this is something which you must handle on layer 7 (application).

TCP Message framing + recv() [linux]: Good conventions?

I am trying to create a p2p application on Linux, which I want to run as efficiently as possible.
The issue I have is with managing packets. As we know, there may be more than one packet in the recv() buffer at any time, so there is a need to have some kind of message framing system to make sure that multiple packets are not treated as one big packet.
So at the moment my packet structure is:
(u16int Packet Length):(Packet Data)
Which requires two calls to recv(); one to get the packet size, and one to get the packet.
There are two main problems with this:
1. A malicious peer could send a packet with a size header of something large, but not send any more data. The application will hang on the second recv(), waiting for data that will never come.
2. Assuming that calling recv() has a noticeable performance penalty (I actually have no idea, correct me if I am wrong), calling recv() twice will slow the program down.
What is the best way to structure the packet/receiving system for both the best efficiency and stability? How do other applications do it? What do you recommend?
Thank you in advance.
I think your "framing" of messages within a TCP stream is right on.
You could consider putting a "magic cookie" in front of each frame (e.g. write the 32-bit int "0xdeadbeef" at the top of each frame header in addition to the packet length) such that it becomes obvious that you are reading a frame header on the first of each recv() pair. If the magic integer isn't present at the start of the message, you have gotten out of sync and need to tear the connection down.
Multiple recv() calls will not likely be a performance hit. As a matter of fact, because TCP messages can get segmented, coalesced, and stalled in unpredictable ways, you'll likely need to call recv() in a loop until you get all the data you expected. This includes your two byte header as well as for the larger read of the payload bytes. It's entirely possible you call "recv" with a 2 byte buffer to read the "size" of the message, but only get 1 byte back. (Call recv again, and you'll get the subsequent bytes). What I tell the developers on my team - code your network parsers as if it was possible that recv only delivered 1 byte at a time.
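A sketch of that read-in-a-loop advice (in Rust rather than C, to match the main question on this page; the framing follows the question's (u16 length):(payload) layout and the function name is made up): the 2-byte length header is read in an explicit loop to show the partial-read case, and read_exact, which loops internally, is used for the payload.

use std::io::{self, Read};
use std::net::TcpStream;

fn read_frame(stream: &mut TcpStream) -> io::Result<Vec<u8>> {
    // Read exactly 2 length bytes, even if they arrive one at a time.
    let mut len_buf = [0u8; 2];
    let mut got = 0;
    while got < len_buf.len() {
        let n = stream.read(&mut len_buf[got..])?;
        if n == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "peer closed"));
        }
        got += n;
    }
    let len = u16::from_be_bytes(len_buf) as usize;

    // Read exactly `len` payload bytes; read_exact keeps calling read()
    // until the buffer is full or the peer closes the connection.
    let mut payload = vec![0u8; len];
    stream.read_exact(&mut payload)?;
    Ok(payload)
}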
You can use non-blocking sockets and the "select" call to avoid hanging. If the data doesn't arrive within a reasonable amount of time (or more data arrives than expected - such that syncing on the next message becomes impossible), you just tear the connection down.
I'm working on a P2P project of my own. Would love to trade notes. Follow up with me offline if you like.
I disagree with the others: TCP is a reliable protocol, so a packet magic header is useless unless you fear that your client code isn't stable or that unsolicited clients connect to your port number.
Create a buffer for each client and use non-blocking sockets and select/poll/epoll/kqueue. If there is data available from a client, read as much as you can, it doesn't matter if you read more "packets". Then check whether you've read enough so the size field is available, if so, check that you've read the whole packet (or more). If so, process the packet. Then if there's more data, you can repeat this procedure. If there is partial packet left, you can move that to the start of your buffer, or use a circular buffer so you don't have to do those memmove-s.
Client timeout can be handled in your select/... loop.
That's what I would use if you're doing something complex with the received packet data. If all you do is write the results to a file (in bigger chunks) then sendfile/splice yields better performance. Just read the packet length (could be multiple reads), then use multiple calls to sendfile until you've read the whole packet (keep track of how much is left to read).
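A possible shape for the per-client buffer described two paragraphs up, again sketched in Rust for consistency with the main question (the function name and the big-endian (u16 length):(payload) framing are assumptions): after every read you append to the buffer and then peel off as many complete packets as are available.

// `buf` is the per-client accumulation buffer; whatever was just read from
// the socket has already been appended to it.
fn extract_packets(buf: &mut Vec<u8>) -> Vec<Vec<u8>> {
    let mut packets = Vec::new();
    loop {
        // Not even the 2-byte size field is available yet.
        if buf.len() < 2 {
            break;
        }
        let len = u16::from_be_bytes([buf[0], buf[1]]) as usize;
        // The packet is not complete yet; wait for the next read.
        if buf.len() < 2 + len {
            break;
        }
        // A whole packet is available: take it and shift the rest forward
        // (a circular buffer would avoid this copy, as noted above).
        packets.push(buf[2..2 + len].to_vec());
        buf.drain(..2 + len);
    }
    packets
}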
You can use non-blocking calls to recv() (by setting SOCK_NONBLOCK on the socket), and wait for them to become ready for reading data using select() (with a timeout) in a loop.
Then if a file descriptor is in the "waiting for data" state for too long, you can just close the socket.
TCP is a stream-oriented protocol - it doesn't actually have any concept of packets. So, in addition to receiving multiple application-layer packets in one recv() call, you might also receive only part of an application-layer packet, with the remainder coming in a future recv() call.
This implies that robust receiver behaviour is obtained by receiving as much data as possible at each recv() call, then buffering that data in an application-layer buffer until you have at least one full application-layer packet. This also avoids your two-calls-to-recv() problem.
To always receive as much data as possible at each recv(), without blocking, you should use non-blocking sockets and call recv() until it returns -1 with errno set to EWOULDBLOCK.
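For illustration, a minimal non-blocking drain loop in Rust (assuming a std::net::TcpStream; the function name is made up): keep reading until the kernel reports WouldBlock, which is how EWOULDBLOCK/EAGAIN surfaces in Rust, appending everything to the application-layer buffer mentioned above.

use std::io::{ErrorKind, Read};
use std::net::TcpStream;

// Returns Ok(true) while the connection is still open, Ok(false) once the
// peer has closed it.
fn drain_nonblocking(stream: &mut TcpStream, app_buf: &mut Vec<u8>) -> std::io::Result<bool> {
    stream.set_nonblocking(true)?;
    let mut chunk = [0u8; 4096];
    loop {
        match stream.read(&mut chunk) {
            Ok(0) => return Ok(false),                                      // peer closed
            Ok(n) => app_buf.extend_from_slice(&chunk[..n]),                // buffer the data
            Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(true), // drained for now
            Err(e) => return Err(e),
        }
    }
}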
As others said, a leading magic number (OT: man file) is a good (99.999999%) solution to identify datagram boundaries, and timeout (using non-blocking recv()) is good for detecting missing/late packet.
If you expect attackers, you should put a CRC in your packets. If a professional attacker really wants to, he/she will figure out - sooner or later - how your CRC works, but it is still harder than forging a packet without a CRC. (Also, if safety is critical, you will find SSL libs/examples/code on the Net.)
