Understanding the spec of the ogg header format - audio

For writing my own ogg-container-class (not using libogg), I try to understand the needed header format. According to the spec, at byte 27 of the stream (starting to count at 0) starts the "segment_table (containing packet lacing values)". This is the red marked byte 13. Concerning the Opus-data that I want to include, the Opus data must start with OpusHead (4F 70 75 73) on its beginning. Why doesn't it start on position 27 where the red 13 is placed? A 13 is a "device control 3" symbol that neither occurs in the Ogg spec, nor in the Opus spec.
EDIT: I found this link that describes the spec a little. There it becomes clear (which it is not from the first link imho) that the 13 (byte 27) is the size of the following segment.

That appears to be a single byte giving the length of the following segment_table data. So there is 13(hex) bytes (16 decimal) bytes of segment_table data.

RFC 3533 is a more verbose description of the format header.
Byte 26 says how many bytes the segment table occupies, so you read that, add 27, and that tells you where the first packet starts (or continues).
The segment table tells you the length(s) of the encapsulated packet(s). Basically you read through the table, adding together the values in each successive byte. If the value you just added is < 255 then that marks a packet boundary, so record the current value of the accumulator, reset it to zero, then continue until you reach the end of the table.
In your example, the segment table size in byte 26 is 1, so the data starts at 27+1 or byte 28, which is the start of the 'OpusHead' string. The value in the 1 byte segment table is 0x13, so the packet is 19 bytes long. 28+19 is 47 (or 0x2f) which is the start of the 'OggS' capture pattern at the start of the next header.
This slightly complicated algorithm is designed to store framing data for many small packets with bounded overhead while still allowing arbitrarily large packets. Note also that packets can be continued between pages, spanning 2 or more segment tables.

Related

Parsing a PCAP file - Why does this packet header timestamp contains SOH \01?

I'm extracting the first 4 bytes from a pcap packet header, which should represent a quantity of seconds. Here they are, in order of appearance in the ByteStream (I'm using Haskell):
\192 (192)
\166 (166)
x (120)
SOH (01) (Start of Header)
My understanding is that the four bytes can be read as a 32-bit integer. However, the presence of SOH is throwing me off. If I interpret the 4 bytes as one integer, I get 2 billion, which is invalid (2 billion seconds = 63 years => invalid because UNIX times starts in 1970, about 45 years ago).
The packet header also ends with NUL (00).
I'm also not sure why the four bytes are reversed, maybe a side-effect of how I'm pulling bytes from the stream (using a Get function and getInt32). Shouldn't the SOH come first?
Did you check the magic number at the very beginning of the pcap file? Its purpose is 1. identify the file format; 2. allow you to determine the byte order. Here's a handy reference: https://wiki.wireshark.org/Development/LibpcapFileFormat#File_Format

Why is the data subchunk of some wav files 2 bytes off?

I have been trying out a wav joiner program in vb.net to join wav files and although it's working fine some of the time, often the resultant wav file doesn't play. After peeking into the original wav files, I noticed that the data subchunk where the word 'data' is was starting at offset 38 instead of 36. That is what's messing up the joiner which assumes offset 36. When I reexported that wav file from audacity, it fixed it up and the data subchunk starts at 36. All programs play the original file fine so I guess it's valid. Why are there two extra 00 bytes values right before the word 'data' in those wav files?
This is a guess, but have you looked at the four-byte number which is at offset 16 in the files where data starts at offset 38?
The fmt sub-chunk is of variable size, and its size is specified in a dword at offset 16 relative to the chunk ID, which is at zero in your files. That dword value is the size of the remainder of the sub-chunk, exclusive of the ID field and of the size field itself. My guess is that if you look there, the ones with the two extra bytes will say that their fmt sub-chunk is 18 bytes long rather than 16 (thanks ooga for catching my error on that).
https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
When there's a size field, always use it. There's no need to jump to fixed offsets in the file on faith if the file format will tell you how big things are. And if it is telling you the size of things, take that as a warning that the size may change.

bitshift large strings for encoding QR Codes

As an example, suppose a QR Code data stream contains 55 data words (each one byte in length) and 15 error correction words (again one byte). The data stream begins with a 12 bit header and ends with four 0 bits. So, 12 + 4 bits of header/footer and 15 bytes of error correction, leaves me 53 bytes to hold 53 alphanumeric characters. The 53 bytes of data and 15 bytes of ec are supplied in a string of length 68 (str68). The problem seems simple enough - concatenate 2 bytes of (right-shifted) header data with str68 and then left shift the entire 70 bytes by 4 bits.
This is the first time in many years of programming that I have ever needed to do something like this, I am a c and bit shifting noob, so please be gentle... I have done a little investigation and so far have not been able to figure out how to bitshift 70 bytes of data; any help would be greatly appreciated.
Larger QR codes can hold 2000 bytes of data...
You need to look at this 4 bits at a time.
The first 4 bits you need to worry about are the lower bits of the first byte. Fortunately this is an easy case because they need to end up in the upper bits of the first byte.
The next 4 bits you need to worry about are the upper bits of the second byte. These need to end up as the lower bits of the first byte.
The next 4 bits you need to worry about are the lower bits of the second byte. But fortunately you already know how to do this because you already did it for the first byte.
You continue in this vein until you have dealt with the lower bytes of the 70th byte.

Reduce length of decimal variable (algorithm)

I have a string of decimal digits like:
965854242113548732659745896523654789653244879653245794444524
length : 60 character
I want to send it to a function, but first I want reduce the length of it as much as possible. How can I do that?
I think about convert it to base-34, that will be 1RG7EEWTN7NW60EWIWMASEWWMEOSWC2SS8482WQE. That is 40 characters in length. Can I reduce it more some way?
Your number fits into 70 bits - for such a small payload compression seems nonsensical. Assuming that the server API supports arbitrary binary data, I would simply encode the value in binary and prefix it with the number of bytes needed.
1 byte length information - for 854657986453156789675, the example you gave initially, this would be 9
9 bytes of binary payload
→ 10 bytes of data transferred for your example.
Your example in hex:
09 2e 54 c3 1e 81 cf 05 fd ab
With the length given in bytes, this of course supports only decimals up to 255 bytes length, but I suppose this is sufficient. If your transport protocol has a built in concept of length of a packet, you could even skip the initial length byte.
Important: ensure that all sides use the same endianness. As you are transmitting your data over the network, network byte order (big endian) would be natural.
If you want to transmit very large numbers, keep in mind that you can use any compression algorithm you like on the binary representation of your data. However, your payload must be significantly larger in order to make compression feasible - for example, using zLib compression for the above 9 byte payload results in an 18 byte payload due to the overhead for the zLib datastructures.
If (and only if) you cannot use arbitrary bytes for your payload, you can encode your data (possibly after compression). Most modern libraries have built in support for Base64, so this would be a natual way of representing the data.

To pad or not to pad - creating a communication protocol

I am creating a protocol to have two applications talk over a TCP/IP stream and am figuring out how to design a header for my messages. Using the TCP header as an initial guide, I am wondering if I will need padding. I understand that when we're dealing with a cache, we want to make sure that data being stored fits in a row of cache so that when it is retrieved it is done so efficiently. However, I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
For example: I want to send over a message header consisting of a 3 byte field followed by a 1 byte padding field for 32 bit alignment. Then I will send over the message data.
In this case, the receiver will just take 3 bytes from the stream and throw away the padding byte. And then start reading message data. As I see it, he will not be storing the 3 bytes and the message data the way he wants. The whole point of byte alignment is so that it will be retrieved in an efficient manner. But if the retriever doesn't care about the padding how will it be retrieved efficiently?
Without the padding, the retriever just takes the 3 header bytes from the stream and then takes the data bytes. Since the retriever stores these bytes however he wants, how does it matter whether or not the padding is done?
Maybe I'm missing the point of padding.
It's slightly hard to extract a question from this post, but with what I've said you guys can probably point out my misconceptions.
Please let me know what you guys think.
Thanks,
jbu
If word alignment of the message body is of some use, then by all means, pad the message to avoid other contortions. The padding will be of benefit if most of the message is processed as machine words with decent intensity.
If the message is a stream of bytes, for instance xml, then padding won't do you a whole heck of a lot of good.
As far as actually designing a wire protocol, you should probably consider using a plain text protocol with compression (including the header), which will probably use less bandwidth than any hand-designed binary protocol you could possibly invent.
I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
If I'm a receiver, I might pass a buffer (i.e. an array of bytes) to the protocol driver (i.e. the TCP stack) and say, "give this back to me when there's data in it".
What I (the application) get back, then, is an array of bytes which contains the data. Using C-style tricks like "casting" and so on I can treat portions of this array as if it were words and double-words (not just bytes) ... provided that they're suitably aligned (which is where padding may be required).
Here's an example of a statement which reads a DWORD from an offset in a byte buffer:
DWORD getDword(const byte* buffer)
{
//we want the DWORD which starts at byte-offset 8
buffer += 8;
//dereference as if it were pointing to a DWORD
//(this would fail on some machines if the pointer
//weren't pointing to a DWORD-aligned boundary)
return *((DWORD*)buffer);
}
Here's the corresponding function in Intel assembly; note that it's a single opcode i.e. quite an efficient way to access the data, more efficient that reading and accumulating separate bytes:
mov eax,DWORD PTR [esi+8]
Oner reason to consider padding is if you plan to extend your protocol over time. Some of the padding can be intentionally set aside for future assignment.
Another reason to consider padding is to save a couple of bits on length fields. I.e. always a multiple of 4, or 8 saves 2 or 3 bits off the length field.
One other good reason that TCP has padding (which probably does not apply to you) is it allows dedicated network processing hardware to easily separate the data from the header. As the data always starts on a 32 bit boundary, it's easier to separate the header from the data when the packet gets routed.
If you have a 3 byte header and align it to 4 bytes, then designate the unused byte as 'reserved for future use' and require the bits to be zero (rejecting messages where they are not as malformed). That leaves you some extensibility. Or you might decide to use the byte as a version number - initially zero, and then incrementing it if (when) you make incompatible changes to the protocol. Don't let the value be 'undefined' and "don't care"; you'll never be able to use it if you start out that way.

Resources