error detection/correction/recovery in serial protocols

I have some designing to do for a serial protocol and am running into some questions that I figure must have been considered elsewhere.
So I'm wondering if there are some recommendations for best practices in designing serial protocols. (Please either state a fact that is easily verifiable, or cite a reputable source if you make a claim.) General recommendations for websites/books are also welcome.
In particular I have to deal with issues like
parsing a stream of bytes into packets
verifying a packet is correct (easy with a CRC, for instance)
identifying reasonable types of errors that can occur (e.g. in a point-to-point serial stream, sporadic single bit errors, and dropped series of bytes, are both likely, but extra phantom bytes are unlikely; whereas with a record stored in flash memory or on a disk drive the types of errors that predominate are different)
error correction or recovery (if I detect an error in a packet, can I correct it? If not, can I resync to the boundary of the next packet?)
how to make variable-length packets robust to error correction / recovery.
Any suggestions?

Packet delimiting
For syncing to packet boundaries, typically you have a byte or byte sequence that identifies the packet boundary, which cannot occur within the packet itself. If the packet data happens to contain that identifier, then you have to "escape" (aka byte stuff) it.
Examples:
PPP Encapsulation
Consistent Overhead Byte Stuffing (COBS), or maybe COBS/R, which encodes data packets so no zero bytes are present, thus you can use zero bytes for packet delimiting
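As a concrete illustration of byte stuffing, here is a minimal PPP/HDLC-style encoder in C. This is only a sketch under assumed conventions: the flag byte 0x7E, the escape byte 0x7D, the XOR-with-0x20 rule and the name frame_encode are the usual PPP choices, not something taken from the answer above.

#include <stddef.h>
#include <stdint.h>

#define FLAG 0x7E   /* frame delimiter, never allowed inside the encoded payload */
#define ESC  0x7D   /* escape marker; the escaped byte is XORed with 0x20 */

/* Encode 'len' payload bytes into 'out' and return the encoded length.
   Worst case output is 2*len + 2 bytes, so size 'out' accordingly. */
size_t frame_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    out[n++] = FLAG;                    /* opening delimiter */
    for (size_t i = 0; i < len; i++) {
        if (in[i] == FLAG || in[i] == ESC) {
            out[n++] = ESC;
            out[n++] = in[i] ^ 0x20;    /* stuffed byte */
        } else {
            out[n++] = in[i];
        }
    }
    out[n++] = FLAG;                    /* closing delimiter */
    return n;
}

The receiver treats anything between two FLAG bytes as a frame and undoes the ESC/XOR on the way in. COBS reaches the same goal with a fixed one-byte overhead per frame instead of the worst-case doubling you get with escaping.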
Packet verification
Various options are:
Checksum
Adler-32
Fletcher
CRC (the more bits the better the check)
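As an example of the CRC option, here is a bitwise CRC-16-CCITT in C (the "CCITT-FALSE" variant: polynomial 0x1021, initial value 0xFFFF, no reflection). This is a standard textbook formulation, shown only as a sketch of what a packet check might look like.

#include <stddef.h>
#include <stdint.h>

/* CRC-16-CCITT ("CCITT-FALSE"): polynomial 0x1021, init 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

The sender appends the CRC of the payload to the packet; the receiver recomputes it over the received payload and discards the packet on a mismatch. A table-driven version trades 512 bytes of ROM for speed if the bit loop is too slow.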
Error correction etc
Good questions. I've not had much experience with that.

Have you considered FEC (Forward Error Correction)?
This technique is very often used in physical-layer communication protocols such as WDM (Wavelength Division Multiplexing) / OTN (Optical Transport Network).

Related

maximizing linux raw socket throughput

I have a simple application which receives packets of a fixed EtherType via a raw socket (the transport is Ethernet), and sends two duplicates out over another interface (also via a raw socket):
recvfrom() //blocking
//make duplicate
//add tail
sendto(packet1);
sendto(packet2);
I want to increase throughput. I need at least 4000 frames/second, and I can't change the packet size. How can I achieve this? The system is embedded (AM335x SoC), the kernel is 4.14.40... How can I increase the performance?
A few considerations:
I know you said you can't change "packet size", but using buffered writes and infrequent flushes might really help performance
You could enable jumbo frames
You could disable Nagle
Usually people don't want to change their packet sizes because they expect a 1-to-1 correspondence between their send()'s and recv()'s. Relying on that is not a good idea, because TCP specifically does not ensure that your send()'s and recv()'s will have a 1-to-1 correspondence. They usually will be 1-to-1, but that is not guaranteed. Transmitting data with Nagle enabled, or over many router hops, or without Path MTU Discovery enabled makes the 1-to-1 relationship less likely.
So if you use buffering and frame your data somehow (e.g. terminate messages with a NUL byte if your data cannot otherwise contain a NUL, or transfer lengths as network-order shorts so you know how much to read), you'll likely be killing two birds with one stone - that is, you'll get better speed and better reliability.
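To make the length-prefix option concrete, here is a sketch of sending one framed message over a TCP socket in C. The helper names send_all and send_message and the 2-byte network-order length prefix are my own assumptions for illustration, not anything from the question.

#include <arpa/inet.h>    /* htons */
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until the whole buffer is out; send() may be partial. */
static int send_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame a message as a 2-byte network-order length followed by the payload. */
int send_message(int fd, const void *payload, uint16_t len)
{
    uint16_t netlen = htons(len);
    if (send_all(fd, &netlen, sizeof netlen) < 0)
        return -1;
    return send_all(fd, payload, len);
}

The receiver reads exactly 2 bytes, converts them with ntohs(), then reads exactly that many payload bytes, so the message boundaries survive whatever segmentation the transport applies.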

Why does TCP/IP speed depend on the size of the sent data?

When I sent small data (16 bytes and 128 bytes) continuously (using a 100-iteration loop without any inserted delay), the throughput with the TCP_NODELAY setting seemed no better than with the normal setting. Additionally, TCP slow-start appeared to affect the transmission at the beginning.
The reason is that I want to control a device from a PC via Ethernet. The processing time of this device is around several microseconds, but the huge latency of sending a command affects the entire system. Could you share some ways to solve this problem? Thanks in advance.
Previously, I measured the transfer performance between a Windows PC and an embedded Linux board. To verify TCP_NODELAY, I set up a system with two Linux PCs connected directly to each other, i.e. Linux PC <--> Router <--> Linux PC. The router was used only by the two PCs.
The performance without TCP_NODELAY is shown as follows. It is easy to see that the throughput increased significantly when the data size was >= 64 KB. Additionally, when the data size was 16 B, the receive time sometimes dropped to 4.2 us. Do you have any explanation for this observation?
The performance with TCP_NODELAY seems unchanged, as shown below.
The full code can be found in https://www.dropbox.com/s/bupcd9yws5m5hfs/tcpip_code.zip?dl=0
Please share with me your thinking. Thanks in advance.
I am doing socket programming to transfer a binary file between a Windows 10 PC and an embedded Linux board. The socket libraries are winsock2.h and sys/socket.h for Windows and Linux, respectively. The binary file is copied to an array on Windows before sending, and the received data are stored in an array on Linux.
Windows: socket_send(sockfd, &SOPF->array[0], n);
Linux: socket_recv(&SOPF->array[0], connfd);
I could receive all data properly. However, it seems to me that the transfer time depends on the size of sending data. When data size is small, the received throughput is quite low, as shown below.
Could you please show me some documents explaining this problem? Thank you in advance.
To establish a TCP connection, you need a 3-way handshake: SYN, SYN-ACK, ACK. Then the sender will start to send some data. How much depends on the initial congestion window (configurable on Linux; I don't know about Windows). As long as the sender receives timely ACKs, it will continue to send, as long as the receiver's advertised window has the space (use the socket option SO_RCVBUF to set it). Finally, closing the connection also requires a FIN, FIN-ACK, ACK.
So my best guess, without more information, is that the overhead of setting up and tearing down the TCP connection has a huge effect on the cost of sending a small number of bytes. Nagle's algorithm (disabled with TCP_NODELAY) shouldn't have much effect as long as the writer is effectively writing quickly. It only prevents sending less-than-full-MSS segments, which should increase transfer efficiency in this case, where the sender is simply sending data as fast as possible. The only effect I can see is that the final less-than-full-MSS segment might need to wait for an ACK, which again would have more impact on the short transfers than on the longer transfers.
To illustrate this, I sent one byte using netcat (nc) on my loopback interface (which isn't a physical interface, and hence the bandwidth is "infinite"):
$ nc -l 127.0.0.1 8888 >/dev/null &
[1] 13286
$ head -c 1 /dev/zero | nc 127.0.0.1 8888 >/dev/null
And here is a network capture in wireshark:
It took a total of 237 microseconds to send one byte, which is a measly 4.2 KB/second. I think you can guess that if I sent 2 bytes, it would take essentially the same amount of time, for an effective rate of about 8.4 KB/second - a 100% improvement!
The best way to diagnose performance problems in networks is to get a network capture and analyze it.
When you run your test with a significant amount of data, for example your biggest test (512 MiB, about 536 million bytes), the following happens.
The data is sent by the TCP layer, which breaks it into segments of a certain length. Let's assume segments of 1460 bytes, so there will be about 367,000 segments.
For every segment transmitted there is overhead (control and management data added to ensure good transmission): in your setup, there are 20 bytes for TCP, 20 for IP, and 16 for Ethernet, for a total of 56 bytes per segment. Note that this number is a minimum, not accounting for the Ethernet preamble, for example; moreover, the IP and TCP overhead can sometimes be larger because of optional fields.
Well, 56 bytes for every one of the 367,000 segments means that when you transmit 512 MiB, you also transmit about 56 * 367,000 = 20 million bytes on the line. The total number of bytes becomes 536 + 20 = 556 million bytes, or about 4,448 million bits. If you divide this number of bits by the time elapsed, 4.6 seconds, you get a bitrate of about 966 megabits per second, which is higher than what you calculated without taking the overhead into account.
From the above calculation, it seems that your Ethernet link is gigabit. Its maximum transfer rate should be 1,000 megabits per second, and you are getting really close to it. The rest of the time is due to more overhead we didn't account for, and to latencies that are always present and tend to be amortized as more data is transferred (but they will never be eliminated completely).
I would say that your setup is fine. But this holds for big data transfers. As the size of the transfer decreases, the per-packet overhead, the latencies of the protocol and other fixed costs become more and more important. For example, if you transmit 16 bytes in 165 microseconds (the first of your tests), the result is 0.78 Mbps; if it took 4.2 us, about 40 times less, the bitrate would be about 31 Mbps (40 times higher). These numbers are lower than expected.
In reality, you don't transmit 16 bytes, you transmit at least 16 + 56 = 72 bytes, which is 4.5 times more, so the real transfer rate of the link is also higher. But, you see, transmitting 16 bytes over a TCP/IP link is like measuring the flow rate of an empty aqueduct by dropping a few drops of water into it: the drops get lost before they reach the other end. This is because TCP/IP and Ethernet are designed to carry much more data, with reliability.
Comments and answers on this page point out many of the mechanisms that trade bitrate and reactivity for reliability: the 3-way TCP handshake, the Nagle algorithm, checksums and other overhead, and so on.
Given the design of TCP+IP and Ethernet, it is entirely normal that performance is not optimal for small amounts of data. From your tests you can see that the transfer rate climbs steeply when the data size reaches 64 Kbytes. This is not a coincidence.
From a comment you left above, it seems that you are looking for low-latency communication rather than high bandwidth. It is a common mistake to confuse these two kinds of performance. Moreover, in this respect, I must say that TCP/IP and Ethernet are completely non-deterministic. They are quick, of course, but nobody can say how quick because there are too many layers in between. Even in your simple setup, if a single packet gets lost or corrupted, you can expect delays of seconds, not microseconds.
If you really want something with low latency, you should use something else, for example CAN. Its design is exactly what you want: it transmits small amounts of data with high speed, low latency and deterministic timing (just microseconds after you transmit a packet, you know if it has been received or not; to be more precise, exactly at the end of the transmission of a packet you know whether it reached the destination).
TCP sockets typically have an internal buffer. In many implementations, the stack will wait a little while before sending a packet, to see if it can fill up the remaining space in the buffer first. This is called Nagle's algorithm. I assume that the times you report above are not due to overhead in the TCP packet, but to the fact that TCP waits for you to queue up more data before actually sending.
Most socket implementations therefore have a parameter or function called something like TcpNoDelay, which can be false (the default) or true. I would try toggling that and seeing if it affects your throughput. Essentially this flag enables/disables Nagle's algorithm.
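On POSIX systems that flag is the TCP_NODELAY socket option; a minimal sketch for an already-connected socket looks like this (on Windows the same option is set through setsockopt() from winsock2.h, with a char-typed argument):

#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_NODELAY */
#include <stdio.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm so small writes go out immediately. */
void disable_nagle(int sockfd)
{
    int one = 1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one) < 0)
        perror("setsockopt(TCP_NODELAY)");
}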

Securely transmit commands between PIC microcontrollers using nRF24L01 module

I have created a small wireless network using a few PIC microcontrollers and nRF24L01 wireless RF modules. One of the PICs is a PIC18F46K22, used as the main controller which sends commands to all the other PICs. All the other (slave) microcontrollers are PIC16F1454; there are 5 of them so far. These slave controllers are attached to various devices (mostly lights). The main microcontroller is used to transmit commands to those devices, such as turning lights on or off. The slaves also report the status of the attached devices back to the main controller, which then displays it on an LCD screen. This whole setup is working perfectly fine.
The problem is that anybody who has these cheap nRF24L01 modules could simply listen to the commands which are being sent by the main controller and then repeat them to control the devices.
Encrypting the commands alone wouldn't help: these are simple instructions, so the same command always produces the same ciphertext, and an attacker does not need to decrypt a message to be able to retransmit it.
So how would I implement a level of security in this system?
What you're trying to do is to prevent replay attacks. The general solution to this involves two things:
Include a timestamp and/or a running message number in all your messages. Reject messages that are too old or that arrive out of order.
Include a cryptographic message authentication code in each message. Reject any messages that don't have the correct MAC.
The MAC should be at least 64 bits long to prevent brute force forgery attempts. Yes, I know, that's a lot of bits for small messages, but try to resist the temptation to skimp on it. 48 bits might be tolerable, but 32 bits is definitely getting into risky territory, at least unless you implement some kind of rate limiting on incoming messages.
If you're also encrypting your messages, you may be able to save a few bytes by using an authenticated encryption mode such as SIV that combines the MAC with the initialization vector for the encryption. SIV is a pretty nice choice for encrypting small messages anyway, since it's designed to be quite "foolproof". If you don't need encryption, CMAC is a good choice for a MAC algorithm, and is also the MAC used internally by SIV.
Most MACs, including CMAC, are based on block ciphers such as AES, so you'll need to find an implementation of such a cipher for your microcontroller. A quick Google search turned up this question on electronics.SE about AES implementations for microcontrollers, as well as this blog post titled "Fast AES Implementation on PIC18F4550". There are also small block ciphers specifically designed for microcontrollers, but such ciphers tend to be less thoroughly analyzed than AES, and may harbor security weaknesses; if you can use AES, I would. Note that many MAC algorithms (as well as SIV mode) only use the block cipher in one direction; the decryption half of the block cipher is never used, and so need not be implemented.
The timestamp or message number should be long enough to keep it from wrapping around. However, there's a trick that can be used to avoid transmitting the entire number with each message: basically, you only send the lowest one or two bytes of the number, but you also include the higher bytes of the number in the MAC calculation (as associated data, if using SIV). When you receive a message, you reconstruct the higher bytes based on the transmitted value and the current time / last accepted message number and then verify the MAC to check that your reconstruction is correct and the message isn't stale.
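A sketch of that reconstruction, assuming a 32-bit counter of which only the low 16 bits are transmitted (the names here are hypothetical):

#include <stdint.h>

/* Guess the full 32-bit counter from its transmitted low 16 bits, given the
   last counter value we accepted. The guess is the smallest full counter that
   is greater than 'last_accepted' and whose low 16 bits match 'received_low';
   the MAC check afterwards confirms or rejects the guess. */
uint32_t reconstruct_counter(uint32_t last_accepted, uint16_t received_low)
{
    uint32_t candidate = (last_accepted & 0xFFFF0000u) | received_low;
    if (candidate <= last_accepted)
        candidate += 0x10000u;          /* the low bits must have wrapped */
    return candidate;
}

If the sender actually used a different counter, the MAC computed over the reconstructed value will not verify, and the message is rejected as stale or forged.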
If you do this, it's a good idea to have the devices regularly send synchronization messages that contain the full timestamp / message number. This allows them to recover e.g. from prolonged periods of message loss causing the truncated counter to wrap around. For schemes based on sequential message numbering, a typical synchronization message would include both the highest message number sent by the device so far as well as the lowest number they'll accept in return.
To guard against unexpected power loss, the message numbers should be regularly written to permanent storage, such as flash memory. Since you probably don't want to do this after every message, a common solution is to only save the number every, say, 1000 messages, and to add a safety margin of 1000 to the saved value (for the outgoing messages). You should also design your data storage patterns to avoid directly overwriting old data, both to minimize wear on the memory and to avoid data corruption if power is lost during a write. The details of this, however, are a bit outside the scope of this answer.
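A sketch of the "checkpoint every 1000 with a safety margin" idea, with hypothetical flash_read_counter / flash_write_counter placeholders standing in for whatever non-volatile storage API your part provides:

#include <stdint.h>

#define COUNTER_SAVE_INTERVAL 1000u

/* Hypothetical helpers -- replace with your part's flash/EEPROM routines. */
extern uint32_t flash_read_counter(void);
extern void     flash_write_counter(uint32_t value);

static uint32_t tx_counter;

/* After power-up, resume beyond anything that might already have been sent:
   the stored checkpoint is never more than COUNTER_SAVE_INTERVAL behind. */
void counter_init(void)
{
    tx_counter = flash_read_counter() + COUNTER_SAVE_INTERVAL;
}

/* Returns the counter value to stamp on the next outgoing message. */
uint32_t counter_next(void)
{
    if (tx_counter % COUNTER_SAVE_INTERVAL == 0)
        flash_write_counter(tx_counter);   /* periodic checkpoint */
    return tx_counter++;
}

Some counter values are skipped after a power cycle, which is harmless; what matters is that no value is ever reused.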
Ps. Of course, the MAC calculation should also always include the identities of the sender and the intended recipient, so that an attacker can't trick the devices by e.g. echoing a message back to its sender.

How long should a message header/prefix be?

I've worked with a few protocols, and written my own. I have written some message formats with only 1 char to identify the message, and some with 4 chars. I don't feel that I'm experienced enough to tell which is better, so I'm looking for an answer which describes in which scenario one might be better than the other.
For performance, you would imagine that sending 2 bytes (A%1i) is faster than sending 5 bytes (ABCD%1i). However, I have noticed that when writing the protocol with the 1-byte prefix, if you have a bug which causes your code to not read enough data from the socket, you might get garbage data coming into your system.
So is the purpose of a 4-byte prefix just to provide a guarantee that your message is clean? Is it worth it for the performance you sacrifice? Do you really sacrifice any performance at all? Maybe it's better to have a 2- or 3-byte prefix?
I'm not sure if this question should be specific to TCP, or whether it applies to all transport protocols. Advice on this would be interesting.
Update: For interest, I will mention that Synergy uses 4-byte message prefixes, so for a mouse move delta the header is the same size as the actual data. Some have suggested just having a 1 or 2 byte prefix to improve efficiency. I wonder what drawbacks this would have?
Update: Also, I wonder if only the handshake really matters, if you're worried about garbage data. Synergy has a long handshake (a few bytes), so are the 4-byte message prefixes needed? I made a protocol recently that has only a 1-byte handshake, and that turned out to be a bad idea, since incompatible protocols were spamming the system with bad data (off the back of this, I might recommend at least having a long handshake).
The purpose of the header is to make it easier to solve the frame synchronization problem (byte alignment in serial communication).
To synchronize, the receiver looks for anything in the data stream that "looks like" a start-of-message header.
If you have lots of different kinds of valid start-of-message headers, and all of them are 1 byte long, then you will inevitably get a lot of "false frame synchronizations" -- garbage from something that "looks like" a start-of-message header, but isn't.
It would be better to pick some other header that makes it "unlikely" that anything in the serial data stream "looks like" a valid start-of-message header.
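For example, a receiver can hunt for a two-byte sync word before it even tries to parse a message. The 0xAA 0x55 value and the get_byte() helper below are assumptions for illustration only:

#include <stdint.h>

#define SYNC_HI 0xAA
#define SYNC_LO 0x55

extern int get_byte(void);   /* hypothetical blocking read of one byte from the link */

/* Discard bytes until the two-byte sync word 0xAA 0x55 has been seen. A
   "false sync" on payload bytes is still possible, but far rarer than with a
   single-byte marker, and the checksum on the rest of the message catches it. */
void wait_for_sync(void)
{
    int prev = -1;
    for (;;) {
        int cur = get_byte();
        if (prev == SYNC_HI && cur == SYNC_LO)
            return;
        prev = cur;
    }
}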
It is inevitable that you will get garbage data coming into your system, no matter how you design the packet header.
Whatever you use to handle these other problems (such as occasional bit errors in the middle of the message) should also be adequate to handle the occasional "false frame synchronization" garbage.
In some systems, any bad data is quickly overwritten by fresh new good data, and if you blink you might never see the bad data.
Other systems need at least some sort of error detection in the footer to reject the bad data.
Yet other systems need to not only detect such errors, but somehow keep re-sending that message -- until both sides are convinced that an error-free version of that message has been successfully received.
As Oleksi implied, in some systems the latency is not significantly different between sending a single binary bit (100 ms) and sending 10 bytes (102.4 ms).
So the advantages of using a tiny header (2.4% less latency!) may not be worth it compared to the advantages of using a more verbose header (easier debugging; easier to make backward-compatible and forward-compatible; easier to test the effect of minor changes "in isolation" without upgrading both sides in lockstep to the new protocol which is completely incompatible with the old protocol).
Perhaps you could get the best of both worlds by (a) keeping the verbose, easy-to-debug headers on messages that are so rarely used that the effect of tiny headers is too small to measure (which I suspect is nearly all messages), and (b) introducing a "tiny header" format for any kind of message where the effect of tiny headers is "noticeably better" or at least measurable.
It looks like the Synergy protocol is flexible enough to add such a "tiny header" format in a way that is easily distinguishable from the other kinds of message headers.
I use Synergy between my laptop and a few desktop machines. I am glad someone is trying to make it even better.
The performance will depend on the content of the message you are sending. If your content is several kilobytes, it doesn't really matter how many bytes your header is. For now, I would choose the scheme that's easiest to work with, because the performance difference between sending one byte, or four bytes is going to be negligible compared to the actual data that you're sending.

RTP packet combining

I have a bunch of RTP packets that I'd like to re-assemble into an audio stream. For each packet, I have the sequence number, SSRC, timestamp, and a byte array representing the data itself.
Currently I'm taking each subset of packets by their SSRC, then ordering them by timestamp and combining the byte arrays in that order. Afterwards, I'm mixing the byte arrays. The resulting audio data sounds great (by great, I mean everything is in time), but I'm worried that it's due to not having much packet loss.
So, a couple questions...
For missing packets, a missing sequence number shows where I need to add a bit of empty audio. I believe the sequence number "wraps around" quite often, so I need to use timestamp to break them up into subsets. Then I can look for missing sequence numbers in those subsets and add as needed. Does that sound like the right thing to do?
I haven't quite figured out what else the timestamp is good for. Since I'm recording already existing packets and filling in the missing ones, maybe I don't need to worry about this as much?
1) Avoid using timestamps in your algorithm. Your algorithm will fail if you receive a stream from badly behaved clients (improper timestamps), and the timestamp increment per packet changes with the codec type, so you might need different subsets for different codecs. There are no such limitations on the sequence number: sequence numbers are incremented monotonically, so using them you can track lost packets easily (see the sketch after point 2).
2) The timestamp is used for synchronization between audio and video, mainly for lip sync. A relationship between the audio and video timestamps is established to achieve synchronization. In your case it's audio only, so you can avoid using the timestamp.
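Here is a sketch of the lost-packet check using only the 16-bit sequence number; unsigned wraparound does the modulo-65536 arithmetic for us (the names are mine, not from RTP itself):

#include <stdint.h>

/* Number of packets missing between the previously accepted sequence number
   and the newly received one. 0 means the packet is the next one in order;
   a result >= 0x8000 is best treated as a late or duplicated packet rather
   than a huge loss. */
uint16_t rtp_gap(uint16_t last_seq, uint16_t new_seq)
{
    return (uint16_t)(new_seq - (uint16_t)(last_seq + 1));
}

A small positive gap tells you how many frames of silence (or concealment) to insert before playing the new packet.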
Edit: According to RFC 3389 (Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN))
RTP allows discontinuous transmission (silence suppression) on any
audio payload format. The receiver can detect silence suppression
on the first packet received after the silence by observing that
the RTP timestamp is not contiguous with the end of the interval
covered by the previous packet even though the RTP sequence number
has incremented only by one. The RTP marker bit is also normally
set on such a packet.
1) I don't think the sequence number wraps around quickly. It is a 16-bit value, so it wraps every 65,536 packets, and even if a packet is sent every 10 milliseconds that gives more than 10 minutes of transmission. It is very unlikely that packets would be lost for that long, so in my opinion you should only check the sequence number; checking the timestamp is pointless.
2) I think you shouldn't worry much about the timestamp. I know that some protocols don't even fill in this value and rely only on the sequence number.
I think what Zulijn is getting at in his answer above is that if your packets are stored in the order they were captured then you can use some simple rules to find out-of-order packets - e.g. look back 50 packets and forward 50 packets. If it is not there then it counts as a lost packet.
This should avoid any issues with the sequence number having wrapped around. To handle any lost packets there are many techniques you can use, so it would be useful to google 'Audio packet loss' or 'VOIP packet loss concealment'. As Adam mentions timestamp will vary with codec so you need to understand this if you are going to use it.
You don't mention what the actual application is but if you are trying to understand what the received audio actually sounded like, you really need some more info, in particular the jitter buffer size - this effectively determines how long the receiver will wait for an out of sequence packet before deciding it is lost. What this means to you is that there may be out-of-sequence packets in your file which the 'real world' receiver would have given up and not played back - i.e. your reconstruction from the file may give a higher quality than the 'real time' experience.
If it is a two way transmission, then delay is very important also (even if it is a constant delay and hence does not affect jitter and packet loss). This is the type of effect you used to get on some radio telephones and still do on some satellite phones (or VoIP phones), and it can significantly impact the user experience.
Finally, different codecs and clients may apply different techniques to correct lost packets, insert 'silent tones' for any gaps in the audio (e.g. pauses in conversation), suppress background noise etc.
To get a proper feel for the user experience you would have to try to 'replay' your captured packets as accurately as possible using the same codec, jitter buffer and any error correction/packet loss techniques the receiver used also.
