How are SPS/PPS in H.264 converted to base64? I couldn't find this in the specs.
Thanks
RFC 6184 covers this in Section 8.1 (Media Type Registration):
sprop-parameter-sets:
This parameter MAY be used to convey any sequence and picture
parameter set NAL units (herein referred to as the initial
parameter set NAL units) that can be placed in the NAL unit
stream to precede any other NAL units in decoding order. The
parameter MUST NOT be used to indicate codec capability in any
capability exchange procedure. The value of the parameter is a
comma-separated (',') list of base64 [7] representations of
parameter set NAL units as specified in Sections 7.3.2.1 and
7.3.2.2 of [1]. Note that the number of bytes in a parameter
set NAL unit is typically less than 10, but a picture parameter
set NAL unit can contain several hundred bytes.
Base64 is used in a straightforward way: it takes binary data on input (the raw SPS/PPS byte arrays, with no Annex B start codes) and outputs text that can be embedded in, for instance, an SDP description.
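For illustration, a minimal Python sketch; the SPS/PPS bytes and the payload type 96 below are made-up placeholders, not output from a real encoder:

    import base64

    # Raw parameter-set NAL units (placeholder bytes, not from a real
    # encoder). Each begins with its NAL header byte (0x67 = SPS,
    # 0x68 = PPS) and carries no Annex B start code.
    sps = bytes([0x67, 0x42, 0x00, 0x1E, 0x9A, 0x74, 0x05, 0x01, 0xEC, 0x80])
    pps = bytes([0x68, 0xCE, 0x38, 0x80])

    # Plain base64 of each NAL unit, joined with a comma per RFC 6184.
    sprop = ",".join(base64.b64encode(n).decode("ascii") for n in (sps, pps))
    print("a=fmtp:96 packetization-mode=1;sprop-parameter-sets=" + sprop)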
I'm in the process of writing a JPEG file decoder to learn about the workings of JPEG files. In working through ITU-T T.81, which specifies JPEG, I ran into the following regarding the DQT segment for quantization tables:
In many of JPEG's segments there is an n parameter, read from the segment, which indicates how many iterations of the following item there are. In the DQT case, however, it just says "multiple", and it's not defined how many multiples there are. One can presumably infer it from Lq, but the way this "multiple" is defined is a bit of an anomaly compared to the other segments.
For anyone who is familiar with this specification, what is the right way to determine how many multiples, or n, of (Pq, Tq, Q0..Q63) there should be?
Take the length field (Lq) and subtract the two bytes of the length field itself; what remains is table data. Consume it one table at a time: each table is one Pq/Tq byte followed by 64 bytes of values if Pq = 0 (8-bit precision) or 128 bytes if Pq = 1 (16-bit precision). The number of tables you read before the Lq bytes run out is your n.
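A minimal Python sketch of that loop, assuming segment starts at the Lq field (the two bytes right after the 0xFFDB marker):

    def parse_dqt(segment: bytes):
        """Parse a DQT segment body into (Pq, Tq, values) tuples."""
        lq = int.from_bytes(segment[0:2], "big")  # Lq counts itself
        pos, end, tables = 2, lq, []
        while pos < end:
            pq = segment[pos] >> 4       # precision: 0 = 8-bit, 1 = 16-bit
            tq = segment[pos] & 0x0F     # table destination id (0..3)
            pos += 1
            count = 128 if pq == 1 else 64
            tables.append((pq, tq, segment[pos:pos + count]))
            pos += count
        return tables                    # len(tables) is the implicit n

    # One 8-bit table (id 0), all entries 16: Lq = 2 + 1 + 64 = 67.
    seg = (67).to_bytes(2, "big") + bytes([0x00]) + bytes([16] * 64)
    print(len(parse_dqt(seg)))           # 1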
This is probably a very silly question, but after searching for a while, I couldn't find a straight answer.
If a source filter (such as the LAV Audio codec) is processing a 24-bit integral audio stream, how are individual audio samples delivered to the graph?
(For simplicity, let's consider a monophonic stream.)
Are they stored individually on a 32-bit integer with the most-significant bits unused, or are they stored in a packed form, with the least significant bits of the next sample occupying the spare, most-significant bits of the current sample?
The format is similar to 16-bit PCM: the values are signed integers, little endian.
With 24-bit audio you normally define the format with the WAVEFORMATEXTENSIBLE structure, as opposed to WAVEFORMATEX (the latter is also accepted by certain filters, but in general you are expected to use the former).
The structure has two relevant fields: the number of bits per sample (wBitsPerSample) and the number of valid bits per sample (wValidBitsPerSample). So it's possible to have the 24-bit data represented as plain 24-bit values, and also as 24 meaningful bits inside 32-bit values. The payload data should match the format.
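A trimmed-down ctypes sketch of just those two fields (the real structure in mmreg.h has more members; this only shows the two container/valid-bits combinations):

    import ctypes

    class WaveFormatSketch(ctypes.Structure):
        # Only the two fields discussed here; not the full SDK struct.
        _fields_ = [("wBitsPerSample", ctypes.c_ushort),       # container size
                    ("wValidBitsPerSample", ctypes.c_ushort)]  # meaningful bits

    packed_24 = WaveFormatSketch(24, 24)  # 24-bit samples, 3-byte containers
    padded_24 = WaveFormatSketch(32, 24)  # 24 valid bits in 4-byte containers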
There is no mix of bits of different samples within a byte:
However, wBitsPerSample is the container size and must be a multiple of 8, whereas wValidBitsPerSample can be any value not exceeding the container size. For example, if the format uses 20-bit samples, wBitsPerSample must be at least 24, but wValidBitsPerSample is 20.
To the best of my knowledge, it's typical to have just 24-bit values, that is, three bytes per PCM sample.
Non-PCM formats might define different packing and use the "unused" bits more efficiently, so that, for example, two samples of 20-bit audio consume 5 bytes.
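A small Python sketch of reading the typical packed layout (three bytes per sample, signed, little endian), assuming a mono stream as in the question:

    import struct

    def unpack_s24le(data: bytes):
        """Expand packed signed 24-bit little-endian PCM into ints."""
        out = []
        for i in range(0, len(data) - 2, 3):
            # Widen each 3-byte sample to 4 bytes by sign-extension.
            b = data[i:i + 3] + (b"\xff" if data[i + 2] & 0x80 else b"\x00")
            out.append(struct.unpack("<i", b)[0])
        return out

    # Two samples: +1 and -1 in 24-bit little-endian form.
    print(unpack_s24le(b"\x01\x00\x00\xff\xff\xff"))  # [1, -1]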
A random string should be incompressible.
pi = "31415..."
pi.size # => 10000
XZ.compress(pi).size # => 4540
A random hex string also gets significantly compressed. A random byte string, however, does not get compressed.
The pi string contains only the bytes 48 through 57 (ASCII '0' through '9'). With a prefix code on the integers, this string could be heavily compressed. Essentially, I'm wasting space by representing my 10 distinct characters as whole bytes (or 16 distinct characters, in the case of the hex string). Is this what's going on?
Can someone explain to me what the underlying method is, or point me to some sources?
It's a matter of information density. Compression is about removing redundant information.
In the string "314159", each character occupies 8 bits, and can therefore have any of 2^8, or 256, distinct values, but only 10 of those values are actually used. Even a painfully naive compression scheme could represent the same information using 4 bits per digit; this is known as Binary Coded Decimal. More sophisticated compression schemes can do better than that (a decimal digit effectively carries log2(10), or about 3.32, bits), but at the expense of storing some extra information that allows for decompression.
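A sketch of that naive scheme in Python, packing two decimal digits per byte:

    def bcd_pack(digits: str) -> bytes:
        """Binary Coded Decimal: two digits per byte, 4 bits each."""
        if len(digits) % 2:
            digits += "0"                     # pad to an even count
        return bytes((int(a) << 4) | int(b)
                     for a, b in zip(digits[::2], digits[1::2]))

    pi = "31415926535897932384"
    print(len(pi), len(bcd_pack(pi)))         # 20 10 -- exactly half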
In a random hexadecimal string, each 8-bit character has 4 meaningful bits, so compression by nearly 50% should be possible. The longer the string, the closer you can get to 50%. If you know in advance that the string contains only hexadecimal digits, you can compress it by exactly 50%, but of course that loses the ability to compress anything else.
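For hex, that exact 50% packing is a one-liner, since two hex digits are precisely one byte:

    import os

    hex_str = os.urandom(5000).hex()   # 10000 hex characters
    packed = bytes.fromhex(hex_str)    # exactly half: 5000 bytes
    print(len(hex_str), len(packed))   # 10000 5000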
In a random byte string, there is no opportunity for compression; you need the entire 8 bits per character to represent each value. If it's truly random, attempting to compress it will probably expand it slightly, since some additional information is needed to indicate that the output is compressed data.
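Both effects are easy to observe with Python's stdlib lzma module (which produces the same XZ format used above); exact sizes will vary from run to run:

    import lzma, os

    digits = bytes(48 + b % 10 for b in os.urandom(10000))  # random digits as text
    noise = os.urandom(10000)                               # truly random bytes

    print(len(lzma.compress(digits)))  # well under 10000: small alphabet
    print(len(lzma.compress(noise)))   # slightly over 10000: overhead, no gain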
Explaining the details of how compression works is beyond both the scope of this answer and my expertise.
In addition to Keith Thompson's excellent answer, there's another point that's relevant to LZMA (the compression algorithm that the XZ format uses). The digits of pi do not consist of a single repeating string, but neither are they completely random: they contain substrings that recur within the larger sequence. LZMA can detect these and store only a single copy of each repeated substring, reducing the size of the compressed data.
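The effect of such matches is easy to see in isolation; a string built from one repeated substring collapses to almost nothing:

    import lzma

    repeats = b"31415926535897932384" * 500  # 10000 bytes, long exact matches
    print(len(lzma.compress(repeats)))       # tiny: repeats become back-references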
I figured out that the default setting on my device for audio is kAudioFormatLinearPCM.
I get 4 bytes per sample in mData in the AudioBuffer.
Is each value an absolute amplitude value? Is it always a positive number?
You need to know the stream format. If the format is unsigned then the value is always positive. If the sample format is signed, then the value can be either positive or negative.
Depending on the endianness of the format, the endianness of the processor (little-endian on ARM iOS), and how the value is read from the stream, the value may also need to be byte-swapped before it can be interpreted as a linear amplitude.
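A quick Python illustration of how much the declared format matters; the same two bytes decode to three different samples (16-bit values used for brevity):

    import struct

    raw = b"\x00\x80"                   # one 16-bit sample

    print(struct.unpack("<h", raw)[0])  # signed, little endian: -32768
    print(struct.unpack("<H", raw)[0])  # unsigned, little endian: 32768
    print(struct.unpack(">h", raw)[0])  # signed, big endian: 128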
Is each value an absolute amplitude value?
Yes.
Is it always a positive number?
That varies across the APIs and implementations you will encounter. You will have to refer to the other fields of the AudioStreamBasicDescription to determine the precise sample and stream format.
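For instance, the mFormatFlags field encodes signedness, float vs. integer, and endianness. A small sketch using the flag bit values from CoreAudioTypes.h (decoding them from Python is only for illustration):

    # Flag bits as defined in CoreAudioTypes.h.
    kAudioFormatFlagIsFloat = 1 << 0
    kAudioFormatFlagIsBigEndian = 1 << 1
    kAudioFormatFlagIsSignedInteger = 1 << 2

    def describe(flags: int, bits: int) -> str:
        kind = ("float" if flags & kAudioFormatFlagIsFloat
                else "signed int" if flags & kAudioFormatFlagIsSignedInteger
                else "unsigned int")
        endian = "big" if flags & kAudioFormatFlagIsBigEndian else "little"
        return f"{bits}-bit {kind}, {endian} endian"

    print(describe(kAudioFormatFlagIsSignedInteger, 16))  # 16-bit signed int, little endian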
What do the values in the mData member represent? It looks like each value is a 4-byte integer...
I guess my question is, what is each sample supposed to represent, and what does the mNumberChannels member represent?
If I had to apply some sort of transform to the sound pattern, could I treat these samples as discrete samples in time? If so, what time period does a buffer of 512 samples represent?
Thanks
Deshawn
The mData buffer array elements can represent 16-bit signed integers, stereo pairs of 16-bit signed integers, 32-bit 8.24/s7.24 scaled-integer or fixed-point values, or 32-bit floating-point values, etc., depending on the Audio Unit and how it was configured.
The buffer duration will be its length in frames divided by the audio sample rate; for instance, 512/44100 is about 11.61 milliseconds.
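In code, the duration and the common 8.24 fixed-point conversion look like this (a sketch; the 8.24 layout means a signed 32-bit integer with an implied binary point after the top 8 bits):

    frames, sample_rate = 512, 44100.0
    print(1000.0 * frames / sample_rate)    # ~11.61 ms per buffer

    def fixed_8_24_to_float(x: int) -> float:
        """Convert an 8.24 fixed-point sample to a float amplitude."""
        return x / float(1 << 24)

    print(fixed_8_24_to_float(1 << 24))     # 1.0
    print(fixed_8_24_to_float(-(1 << 23)))  # -0.5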