Packet delimiting options - protocols

I am looking to send data through a UART communication link that is very noisy (BER can reach up to 1E-13). I was thinking of sending the data in packets that are 64 bytes long.
However, these packets are of varying lengths and should be self-decodable using Reed-Solomon, with X FEC bytes that are set by another function in the program.
The packet should be divided according to the following scheme:
byte    | 1          | 2..64 | 65..64+X     |
meaning | Seq number | DATA  | RS FEC bytes |
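For concreteness, here is a minimal sketch of that layout in C (the field names are illustrative, not from the original post, and it assumes the X Reed-Solomon parity bytes are simply appended after the data):

#include <stdint.h>

#define MAX_DATA_LEN 63   /* bytes 2..64 in the scheme above */

/* Illustrative frame layout: one sequence byte, up to 63 data bytes,
   then X Reed-Solomon parity bytes appended by the encoder. */
typedef struct {
    uint8_t seq;                   /* byte 1: sequence number          */
    uint8_t data[MAX_DATA_LEN];    /* bytes 2..64: payload             */
    uint8_t data_len;              /* actual payload length in use     */
    uint8_t rs_parity_len;         /* X, set elsewhere in the program  */
} frame_t;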
What I am now trying to work out is how to delimit the packet so that the receiver can recognize it. I have thought about two major options:
COBS
Using COBS sounds like a good option; however, since it's a noisy channel, I am afraid that an error affecting the delimiting characters would break the whole packet.
Add a header
Adding a header that states how big the packet is feels somewhat wrong: since it would be only one byte long, there is no way to protect it with Reed-Solomon, and writing another error-correcting scheme just for it seems like overkill.
What other options do I have at my disposal for this problem?

Related

Twincat 3 PLC: UDP data frame is being cut from its original length

We are trying to process a 152-byte UDP data frame from a remote service. Following the Beckhoff InfoSys PeerToPeer example (https://infosys.beckhoff.com/content/1033/tf6310_tc3_tcpip/18014398593720075.html?id=9052404215823027436), we are not able to see the entire 152-byte message, just a couple of bytes.
Is it possible that the String variable is only showing the characters up to the first 00 byte (null delimiter)?
In the image below you can see the full UDP frame and what we get as message.
Thanks in advance.
You are correct: the Beckhoff PeerToPeer example will not work with binary data, because it uses strings, which get cut off at the first zero byte. So it cannot handle the UDP data you are giving it.
Instead you should use function blocks such as ReceiveData, which work with a data array and pointers and therefore accept any byte value. A Google search for 'Beckhoff ReceiveData' will turn up the precise information.
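To illustrate the underlying issue in plain C (not TwinCAT structured text): a binary payload treated as a null-terminated string is truncated at the first zero byte, so the receive buffer has to be handled as a raw byte array with an explicit length.

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* A binary payload containing a zero byte in the middle. */
    unsigned char payload[] = { 'A', 'B', 0x00, 'C', 'D' };
    size_t payload_len = sizeof payload;                              /* 5 bytes  */

    /* String handling stops at the first 0x00 ...                               */
    printf("as a string:     %zu bytes\n", strlen((char *)payload)); /* prints 2 */

    /* ... whereas a byte array with an explicit length keeps everything.        */
    printf("as a byte array: %zu bytes\n", payload_len);             /* prints 5 */
    return 0;
}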

Live audio streaming container formats

When I start receiving a live audio (radio) stream (e.g. MP3 or AAC), I think the received data is not a raw bitstream (i.e. raw encoder output) but is always wrapped in some container format. If this assumption is correct, then I guess I cannot start decoding from an arbitrary place in the stream, but have to wait for some sync byte. Is that right? Is it usual to have sync bytes? Is there a header following the sync byte from which I can determine the codec used, number of channels, sample rate, etc.?
When I connect to a live stream, will I receive data starting at the nearest sync byte, or will I get it from the current position so that I have to search for the sync byte first?
Some streams, like Icecast, include stream-related information in the HTTP response headers, but I think I can skip those and deal directly with the stream format.
Is that correct?
Regards,
STeN
When you look at SHOUTcast/Icecast, the data that comes across is pure MPEG Layer III audio data, and nothing more. (Provided you haven't requested metadata.)
It can be cut at an arbitrary place, so you need to sync to the stream. This is usually done by finding a potential header, and using the data in that header to find sequential headers. Once you have found a few frame headers, you can safely assume you have synced up to the stream and start decoding for playback.
Again, there is no "container format" for these. It's just raw data.
Now, if you want metadata, you have to request it from the server. The data is then just injected into the stream every x number of bytes. See http://www.smackfu.com/stuff/programming/shoutcast.html.
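A rough sketch in C of the sync-up step described above: scan for the 11-bit MPEG audio sync word (an 0xFF byte followed by a byte whose top three bits are set) and treat it as a candidate header. A real decoder would then check that the next few computed frame boundaries also hold valid headers before trusting the position; that verification is omitted here.

#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first candidate MPEG audio frame header
   (11-bit sync word), or -1 if none is found in the buffer.  The
   caller should still verify a few consecutive frames from here.  */
long find_mpeg_sync(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xFF && (buf[i + 1] & 0xE0) == 0xE0)
            return (long)i;
    }
    return -1;
}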
Doom9 has great starting info about both the MPEG and AAC frame formats. SHOUTcast will add some 'metadata' now and then, and it's really trivial. The thing I want to share with you is this: I have an application that can capture all kinds of streams, and SHOUTcast, both AAC and MP3, is among them. The first versions cut their files at an arbitrary point in time, for example every 5 minutes, regardless of the MP3/AAC frames. That was somewhat OK for MP3 (the files were playable) but very bad for aacPlus.
The thing is, the aacPlus decoder ISN'T that forgiving about bad data, and I saw everything from access violations to mysterious software shutdowns with no errors of any kind.
Anyway, if you want to capture a stream: open a socket to the server and read the response; you'll find a header there, then use that info to strip the metadata that gets injected now and then. Use the frame header information for both aacPlus and MP3 to determine frame boundaries, and try to honor them and split the file at the right place.
MP3 frame header:
http://www.mp3-tech.org/programmer/frame_header.html
aacPlus frame header:
http://wiki.multimedia.cx/index.php?title=Understanding_AAC
also this:
aacplus frame alignment problems
Unfortunately it's not always that easy, check the format and notes here:
MPEG frame header format
I will continue the discussion by answering myself (even though we are discouraged from doing that):
I was also looking into the streamed data, and I found that the sequence ff f3 82 70 is frequently repeated. I suggest this is the MPEG frame header, so I tried to work out what it means:
ff f3 82 70 (hex) = 11111111 11110011 10000010 01110000 (bin)
Analysis
11111111111 | SYNC
10 | MPEG version 2
01 | Layer III
1 | No CRC
1000 | 64 kbps
00 | 22050Hz
1 | Padding
0 | Private
01 | Joint stereo
11 | ...
Any comments to that?
When I start receiving the streaming data, should I discard everything prior to this header before handing the buffer to the class that deals with the DSP? I know this can be implementation specific, but I would like to know what the usual practice is here...
BR
STeN
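To connect the bit analysis above to code, here is a hedged sketch that extracts the same fields from the 4-byte header ff f3 82 70. Only the raw index values are pulled out; mapping the bitrate and sample-rate indices to actual values needs the tables from the frame-header reference linked above (the commented values are the ones derived in the analysis).

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint8_t h[4] = { 0xFF, 0xF3, 0x82, 0x70 };

    unsigned version    = (h[1] >> 3) & 0x3;  /* 10 -> MPEG Version 2     */
    unsigned layer      = (h[1] >> 1) & 0x3;  /* 01 -> Layer III          */
    unsigned no_crc     =  h[1]       & 0x1;  /* 1  -> no CRC             */
    unsigned bitrate    = (h[2] >> 4) & 0xF;  /* index 8 -> 64 kbps here  */
    unsigned samplerate = (h[2] >> 2) & 0x3;  /* index 0 -> 22050 Hz here */
    unsigned padding    = (h[2] >> 1) & 0x1;  /* 1  -> padding            */
    unsigned mode       = (h[3] >> 6) & 0x3;  /* 01 -> joint stereo       */

    printf("ver=%u layer=%u no_crc=%u br_idx=%u sr_idx=%u pad=%u mode=%u\n",
           version, layer, no_crc, bitrate, samplerate, padding, mode);
    return 0;
}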

RTP packet combining

I have a bunch of RTP packets that I'd like to re-assemble into an audio stream. For each packet, I have the sequence number, SSRC, timestamp, and a byte array representing the data itself.
Currently I'm grouping the packets by SSRC, then ordering each group by timestamp and combining the byte arrays in that order. Afterwards, I'm mixing the byte arrays. The resulting audio data sounds great (by great, I mean everything is in time), but I'm worried that this is only because there hasn't been much packet loss.
So, a couple questions...
For missing packets, a missing sequence number shows where I need to add a bit of empty audio. I believe the sequence number "wraps around" quite often, so I need to use timestamp to break them up into subsets. Then I can look for missing sequence numbers in those subsets and add as needed. Does that sound like the right thing to do?
I haven't quite figured out what else the timestamp is good for. Since I'm recording already existing packets and filling in the missing ones, maybe I don't need to worry about this as much?
1) Avoid using timestamps in your algorithm. Your algorithm will fail if you receive a stream from badly behaved clients (improper timestamps). Also, the timestamp increment changes with the codec type, so you might need different subsets for different codecs. There are no such limitations on the sequence number: sequence numbers increment monotonically, so you can track lost packets easily using them.
2) The timestamp is used for synchronization between audio and video, mainly for lip sync: a relationship between the audio and video timestamps is established to achieve synchronization. In your case it's audio only, so you can avoid using the timestamp.
Edit: According to RFC 3389 (Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN))
RTP allows discontinuous transmission (silence suppression) on any
audio payload format. The receiver can detect silence suppression
on the first packet received after the silence by observing that
the RTP timestamp is not contiguous with the end of the interval
covered by the previous packet even though the RTP sequence number
has incremented only by one. The RTP marker bit is also normally
set on such a packet.
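A hedged sketch of the check the RFC describes: if the sequence number advanced by exactly one but the timestamp jumped by more than one packet's worth of samples, the gap is silence suppression rather than packet loss. The samples_per_packet value is codec-dependent and assumed to be known here.

#include <stdbool.h>
#include <stdint.h>

/* Detect discontinuous transmission (silence suppression) as described
   in RFC 3389: the sequence number incremented by exactly one, but the
   timestamp is not contiguous with the previous packet's interval.
   Unsigned arithmetic keeps both comparisons correct across wraparound. */
bool is_silence_gap(uint16_t prev_seq, uint32_t prev_ts,
                    uint16_t cur_seq,  uint32_t cur_ts,
                    uint32_t samples_per_packet)
{
    return (uint16_t)(cur_seq - prev_seq) == 1 &&
           (uint32_t)(cur_ts - prev_ts) > samples_per_packet;
}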
1) I don't think the sequence number "wraps around" quickly. It is a 16-bit value, so it wraps every 65536 messages, and even if a message is sent every 10 milliseconds that gives more than 10 minutes of transmission. It is very unlikely that packets will be lost for that long. So in my opinion you should only check the sequence number; checking the timestamp is pointless.
2) I think you shouldn't worry much about the timestamp. I know that some implementations don't even fill in this value and rely only on the sequence number.
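A minimal sketch of the sequence-number bookkeeping suggested above: taking the difference with unsigned 16-bit subtraction keeps the arithmetic modulo 2^16, so wraparound is handled automatically, and a gap larger than one means packets were lost in between (reordered packets are not handled here).

#include <stdint.h>

/* Number of packets missing between two packets received back to back.
   The 16-bit subtraction wraps modulo 65536, so a jump from 65535 to 0
   still yields a difference of 1, i.e. nothing lost.                   */
unsigned lost_between(uint16_t prev_seq, uint16_t cur_seq)
{
    uint16_t diff = (uint16_t)(cur_seq - prev_seq);
    return diff > 0 ? (unsigned)(diff - 1) : 0u;
}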
I think what Zulijn is getting at in his answer above is that if your packets are stored in the order they were captured, then you can use some simple rules to find out-of-order packets, e.g. look back 50 packets and forward 50 packets. If a packet is not there, it counts as lost.
This should avoid any issues with the sequence number having wrapped around. To handle any lost packets there are many techniques you can use, so it would be useful to google 'Audio packet loss' or 'VOIP packet loss concealment'. As Adam mentions timestamp will vary with codec so you need to understand this if you are going to use it.
You don't mention what the actual application is but if you are trying to understand what the received audio actually sounded like, you really need some more info, in particular the jitter buffer size - this effectively determines how long the receiver will wait for an out of sequence packet before deciding it is lost. What this means to you is that there may be out-of-sequence packets in your file which the 'real world' receiver would have given up and not played back - i.e. your reconstruction from the file may give a higher quality than the 'real time' experience.
If it is a two way transmission, then delay is very important also (even if it is a constant delay and hence does not affect jitter and packet loss). This is the type of effect you used to get on some radio telephones and still do on some satellite phones (or VoIP phones), and it can significantly impact the user experience.
Finally, different codecs and clients may apply different techniques to correct lost packets, insert 'silent tones' for any gaps in the audio (e.g. pauses in conversation), suppress background noise etc.
To get a proper feel for the user experience you would have to try to 'replay' your captured packets as accurately as possible using the same codec, jitter buffer and any error correction/packet loss techniques the receiver used also.

error detection/correction/recovery in serial protocols

I have some designing to do for a serial protocol and am running into some questions that I figure must have been considered elsewhere.
So I'm wondering if there are some recommendations for best practices in designing serial protocols. (Please either state a fact that is easily verifiable, or cite a reputable source if you make a claim.) General recommendations for websites/books are also welcome.
In particular I have to deal with issues like
parsing a stream of bytes into packets
verifying a packet is correct (easy with a CRC, for instance)
identifying the types of errors that can reasonably occur (e.g. in a point-to-point serial stream, sporadic single-bit errors and dropped runs of bytes are both likely, but extra phantom bytes are unlikely; whereas for a record stored in flash memory or on a disk drive, the predominant error types are different)
error correction or recovery (if I detect an error in a packet, can I correct it? If not, can I resync to the boundary of the next packet?)
how to make variable-length packets robust enough that error correction / recovery still works.
Any suggestions?
Packet delimiting
For syncing to packet boundaries, typically you have a byte or byte sequence that identifies the packet boundary, which cannot occur within the packet itself. If the packet data happens to contain that identifier, then you have to "escape" (aka byte stuff) it.
Examples:
PPP Encapsulation
Consistent Overhead Byte Stuffing (COBS), or maybe COBS/R, which encodes data packets so no zero bytes are present, thus you can use zero bytes for packet delimiting
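A minimal COBS encoder sketch in C, following the reference algorithm from the COBS paper (decoding and the trailing 0x00 frame delimiter are left out), just to show how zero bytes are stuffed out of the payload:

#include <stddef.h>
#include <stdint.h>

/* Encode len bytes from in[] into out[] with COBS so the result contains
   no zero bytes.  out[] needs room for len + len/254 + 1 bytes.  Returns
   the encoded length; the 0x00 delimiter that normally follows the frame
   is not appended here.                                                  */
size_t cobs_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t out_idx  = 1;   /* where the next data byte goes                */
    size_t code_idx = 0;   /* where the current block's length code goes   */
    uint8_t code    = 1;   /* distance from the code byte to the next zero */

    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0) {                 /* zero byte: close the block    */
            out[code_idx] = code;
            code_idx = out_idx++;
            code = 1;
        } else {
            out[out_idx++] = in[i];
            if (++code == 0xFF) {         /* block full (254 bytes): restart */
                out[code_idx] = code;
                code_idx = out_idx++;
                code = 1;
            }
        }
    }
    out[code_idx] = code;                 /* close the final block         */
    return out_idx;
}

For example, encoding the bytes 11 22 00 33 yields 03 11 22 02 33, which contains no zeros, so a following 0x00 can unambiguously mark the packet boundary.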
Packet verification
Various options are:
Checksum
Adler-32
Fletcher
CRC (the more bits the better the check)
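For reference, a bitwise CRC-16-CCITT sketch (polynomial 0x1021, initial value 0xFFFF, no reflection); table-driven versions are faster, but this shows the idea:

#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16-CCITT over a byte buffer. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;            /* fold the next byte in */
        for (int bit = 0; bit < 8; bit++)         /* process 8 bits        */
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

The sender appends the CRC of the packet contents; the receiver recomputes it over the same bytes and discards the packet on a mismatch.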
Error correction etc
Good questions. I've not had much experience with that.
Have you considered FEC (Forward Error Correction)?
This procedure is very often used in "physical" level communication protocols such as WDM (Wavelength Division Multiplexing) / OTN (Optical Transport Network).

To pad or not to pad - creating a communication protocol

I am creating a protocol so that two applications can talk over a TCP/IP stream, and I am figuring out how to design a header for my messages. Using the TCP header as an initial guide, I am wondering whether I will need padding. I understand that when we're dealing with a cache, we want the data being stored to fit in a cache line so that it can be retrieved efficiently. However, I do not understand how it makes sense to pad a header, considering that an application will parse a stream of bytes and store it however it sees fit.
For example: I want to send over a message header consisting of a 3 byte field followed by a 1 byte padding field for 32 bit alignment. Then I will send over the message data.
In this case, the receiver will just take 3 bytes from the stream, throw away the padding byte, and then start reading the message data. As I see it, he will not necessarily store the 3 bytes and the message data the way they arrived on the wire; he will store them however he wants. The whole point of byte alignment is that data can be retrieved in an efficient manner, but if the receiver doesn't care about the padding, how is it retrieved efficiently?
Without the padding, the retriever just takes the 3 header bytes from the stream and then takes the data bytes. Since the retriever stores these bytes however he wants, how does it matter whether or not the padding is done?
Maybe I'm missing the point of padding.
It's slightly hard to extract a question from this post, but with what I've said you guys can probably point out my misconceptions.
Please let me know what you guys think.
Thanks,
jbu
If word alignment of the message body is of some use, then by all means, pad the message to avoid other contortions. The padding will be of benefit if most of the message is processed as machine words with decent intensity.
If the message is a stream of bytes, for instance xml, then padding won't do you a whole heck of a lot of good.
As far as actually designing a wire protocol, you should probably consider using a plain text protocol with compression (including the header), which will probably use less bandwidth than any hand-designed binary protocol you could possibly invent.
I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
If I'm a receiver, I might pass a buffer (i.e. an array of bytes) to the protocol driver (i.e. the TCP stack) and say, "give this back to me when there's data in it".
What I (the application) get back, then, is an array of bytes which contains the data. Using C-style tricks like "casting" and so on I can treat portions of this array as if it were words and double-words (not just bytes) ... provided that they're suitably aligned (which is where padding may be required).
Here's an example of a statement which reads a DWORD from an offset in a byte buffer:
DWORD getDword(const byte* buffer)
{
    // we want the DWORD which starts at byte-offset 8
    buffer += 8;
    // dereference as if it were pointing to a DWORD
    // (this would fail on some machines if the pointer
    // weren't pointing to a DWORD-aligned boundary)
    return *((DWORD*)buffer);
}
Here's the corresponding access in Intel assembly; note that it's a single opcode, i.e. quite an efficient way to access the data, more efficient than reading and accumulating separate bytes:
mov eax,DWORD PTR [esi+8]
One reason to consider padding is if you plan to extend your protocol over time. Some of the padding can be intentionally set aside for future assignment.
Another reason to consider padding is to save a couple of bits in length fields: if the length is always a multiple of 4 or 8, you can drop 2 or 3 bits from the length field.
One other good reason that TCP has padding (which probably does not apply to you) is it allows dedicated network processing hardware to easily separate the data from the header. As the data always starts on a 32 bit boundary, it's easier to separate the header from the data when the packet gets routed.
If you have a 3-byte header and align it to 4 bytes, then designate the unused byte as 'reserved for future use' and require its bits to be zero (rejecting messages where they are not as malformed). That leaves you some extensibility. Or you might decide to use the byte as a version number: initially zero, then incremented if (when) you make incompatible changes to the protocol. Don't let the value be 'undefined' and "don't care"; you'll never be able to use it if you start out that way.
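A sketch of the padded header this answer describes (the field names are mine, for illustration): the fourth byte starts out as a version/reserved field that must be zero, which keeps the header 32-bit aligned and leaves room to extend the protocol later.

#include <stdint.h>

/* Illustrative 4-byte message header: the 3-byte field from the question
   plus one byte that is 'version / reserved' today and must be zero.  A
   receiver rejects messages where it is non-zero as malformed, keeping
   the value available for future, incompatible protocol revisions.      */
typedef struct {
    uint8_t field[3];   /* the 3-byte field being sent                   */
    uint8_t version;    /* must be 0 in the initial protocol version     */
} msg_header_t;         /* 4 bytes, so the payload starts 32-bit aligned */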
