Working with 24-bit audio samples

What is the "standard way" of working with 24-bit audio? Well, there are no 24-bit data types available, really. Here are the methods that come into my mind:
1. Represent 24-bit audio samples as 32-bit ints and ignore the upper eight bits.
2. Just like (1) but ignore the lower eight bits.
3. Represent 24-bit audio samples as 32-bit floats.
4. Represent the samples as structs of 3 bytes (acceptable for C/C++, but bad for Java).
How do you work this out?

Store them as 32- or 64-bit signed ints, or as float or double, unless you are space conscious and care about packing them into the smallest space possible.
Audio samples often appear as 24 bits to and from audio hardware, since this is commonly the resolution of the DACs and ADCs, although on most computer hardware, don't be surprised to find the bottom 3 or 4 bits banging away randomly with noise.
Digital signal processing operations, which is what usually happens downstream from the acquisition of samples, all involve weighted sums of samples. A sample stored in an integer type can be considered fixed-point binary with an implied binary point at some arbitrary position, which you can choose strategically to maintain as many bits of precision as possible.
For instance, the sum of two 24-bit integers yields a result of 25 bits. Each doubling of the number of summed samples costs another bit, so a 32-bit type has 8 bits of headroom: beyond 256 full-scale samples it can overflow, and you would need to re-normalize by rounding and shifting right.
Therefore, if you're using integer types to store your samples, use the largest you can and start with the samples in the least significant 24 bits.
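The headroom argument can be checked with a short sketch. This is only an illustration under stated assumptions: the helper names bits_for_sum and renormalize are made up for this example, and Python's unbounded ints stand in for a fixed-width accumulator.

```python
def bits_for_sum(n_samples, sample_bits=24):
    """Smallest two's-complement width that holds any sum of n signed samples."""
    lo = -n_samples * (1 << (sample_bits - 1))        # all samples at negative full scale
    hi = n_samples * ((1 << (sample_bits - 1)) - 1)   # all samples at positive full scale
    width = sample_bits
    while lo < -(1 << (width - 1)) or hi > (1 << (width - 1)) - 1:
        width += 1
    return width


def renormalize(acc, shift):
    """Round to nearest, then shift right to drop `shift` low-order bits."""
    return (acc + (1 << (shift - 1))) >> shift


print(bits_for_sum(2))     # two 24-bit samples need 25 bits
print(bits_for_sum(256))   # 256 samples still fit in 32 bits
print(bits_for_sum(257))   # one more sample needs 33 bits

# Sum 256 full-scale samples, then scale the total back down to 24 bits.
full_scale = (1 << 23) - 1
acc = sum([full_scale] * 256)
print(renormalize(acc, 8) == full_scale)
```

Keeping the samples in the least significant 24 bits is what leaves those 8 upper bits free as headroom.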
Floating point types of course take care of this detail for you, although you get less choice about when renormalisation takes place. They are the usual choice for audio processing where hardware support is available. A single precision float has a 24-bit mantissa, so can hold a 24-bit sample without loss of precision.
Usually floating point samples are stored in the range -1.0f < x < 1.0f.
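As a rough sketch of the float representation, the following converts between signed 24-bit integers and floats. Dividing by 2^23 is one common convention and is an assumption here (it maps negative full scale to exactly -1.0); the function names are hypothetical.

```python
import struct

SCALE = float(1 << 23)  # 2**23: magnitude of negative full scale for 24-bit samples


def int24_to_float(sample):
    """Map a signed 24-bit integer to a float in [-1.0, 1.0)."""
    return sample / SCALE


def float_to_int24(x):
    """Map a float back to a signed 24-bit integer, clamping out-of-range input."""
    value = int(round(x * SCALE))
    return max(-(1 << 23), min((1 << 23) - 1, value))


def as_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Every 24-bit sample tried here survives a trip through single precision unchanged.
for sample in (-(1 << 23), -1, 0, 1, 12345, (1 << 23) - 1):
    assert float_to_int24(as_float32(int24_to_float(sample))) == sample
```

The round trip is exact because a 24-bit sample divided by a power of two still fits in the 24-bit significand of a single-precision float.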

Related

Is bfloat16 ever used for graphics?

Bfloat16 is a 16-bit floating point format that has the same 8-bit exponent as single precision, but only 7 (plus 1 implied) bits of significand. Surprisingly, this turns out to be adequate precision for many machine learning applications, so a lot of resources are being put into making arithmetic in this format run fast.
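To make the layout concrete, here is a minimal sketch that derives a bfloat16 bit pattern from a single-precision value by keeping only its upper 16 bits. Truncation is an assumption made for brevity (real hardware typically rounds), and the function names are made up for this example.

```python
import struct


def float_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x, obtained by truncation."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 significand bits


def bfloat16_bits_to_float(bits16):
    """Expand a bfloat16 pattern back to a float by zero-filling the low 16 bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return x


print(hex(float_to_bfloat16_bits(1.0)))                          # 0x3f80
print(bfloat16_bits_to_float(float_to_bfloat16_bits(3.14159)))   # 3.140625
```

The second line shows the cost of the short significand: only about two to three decimal digits survive.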
Given that, it would seem to make sense to also try to use it for graphics. Using it for RGB components during calculation, for example, would allow a much wider dynamic range of light sources to be rendered, compared to just trying to calculate with 8-bit integers. At the same time, it could potentially be faster than using single precision floating point for RGB components.
Are any existing graphics rendering systems actually using it for such purposes?

How is a 24-bit audio stream delivered to the graph?

This is probably a very silly question, but after searching for a while, I couldn't find a straight answer.
If a source filter (such as the LAV Audio codec) is processing a 24-bit integral audio stream, how are individual audio samples delivered to the graph?
(For simplicity, let's consider a monophonic stream.)
Are they stored individually in a 32-bit integer with the most-significant bits unused, or are they stored in a packed form, with the least significant bits of the next sample occupying the spare, most-significant bits of the current sample?

The format is similar to 16-bit PCM: the values are signed integers, little endian.
With 24-bit audio you normally define the format with the WAVEFORMATEXTENSIBLE structure, as opposed to WAVEFORMATEX (the latter is also accepted by certain filters, but in general you are expected to use the former).
The structure has two relevant values: the number of bits per sample and the number of valid bits per sample. So it's possible to have the 24-bit data represented as 24-bit values, and also as 24 meaningful bits within 32-bit values. The payload data should match the format.
Bits of different samples are never mixed within a byte. As the WAVEFORMATEXTENSIBLE documentation says of wValidBitsPerSample:
"However, wBitsPerSample is the container size and must be a multiple of 8, whereas wValidBitsPerSample can be any value not exceeding the container size. For example, if the format uses 20-bit samples, wBitsPerSample must be at least 24, but wValidBitsPerSample is 20."
To the best of my knowledge, it's typical to have just 24-bit values, that is, three bytes per PCM sample.
Non-PCM formats might define different packing and use "unused" bits more efficiently, so that, for example, two samples of 20-bit audio consume 5 bytes.
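The two PCM layouts described above can be decoded along these lines. This is a sketch rather than anything DirectShow provides: decode_pcm is a hypothetical helper, and it assumes the valid bits sit in the most significant part of the container, which is my reading of the WAVEFORMATEXTENSIBLE documentation.

```python
def decode_pcm(payload, container_bits, valid_bits):
    """Decode little-endian signed PCM into a list of ints scaled to `valid_bits`.

    Assumes the valid bits are left-justified (most significant) in the container.
    """
    step = container_bits // 8
    if len(payload) % step:
        raise ValueError("payload is not a whole number of samples")
    pad = container_bits - valid_bits
    samples = []
    for i in range(0, len(payload), step):
        value = int.from_bytes(payload[i:i + step], "little", signed=True)
        samples.append(value >> pad)  # arithmetic shift drops the padding bits
    return samples


# The same two samples, -2 and 1, in both layouts.
packed = bytes([0xFE, 0xFF, 0xFF, 0x01, 0x00, 0x00])               # 3 bytes each
in_32 = bytes([0x00, 0xFE, 0xFF, 0xFF, 0x00, 0x01, 0x00, 0x00])    # 4 bytes each

print(decode_pcm(packed, 24, 24))   # [-2, 1]
print(decode_pcm(in_32, 32, 24))    # [-2, 1]
```

In both cases each sample occupies a whole number of bytes, so the stride through the payload is simply the container size.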

svg viewbox resolution limits

I was wondering if there are any hard limits to the viewBox of an svg element. I see weird clipping when I reach very low values (say, when the viewBox width is around .002), and with very large ones Firefox starts to play funny around a width of 200000000.
Is there a rule or a spec somewhere where I can find the current limits?
Fiddle here:
var dim = 0.00002;
http://jsfiddle.net/7v36sLj8/13/
You'll see funny things starting to happen from that dim onwards; you can decrease or increase it by a factor of 10 as needed.
Thanks for the answers. I'll just take the min/max values for the lowest common denominator, which seems to be Firefox for now. (Rob's answer also explains why Firefox has a much lower threshold on Linux / OS X.)

Firefox originally used a graphics library called cairo to do cross platform graphics rendering. Cairo only allows units to have 32 bits of fixed point binary precision so Firefox chose 24 bits before the binary point and 8 bits of binary fraction. So the maximum co-ordinates are then 2^24 and the smallest deltas 1/256.
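A small sketch shows what 8 bits of binary fraction do to small coordinates. It is not cairo's code: to_fixed and from_fixed are hypothetical names, and round-to-nearest is an assumption about how values get quantised.

```python
FRAC_BITS = 8            # bits after the binary point, as described above
ONE = 1 << FRAC_BITS     # 256 raw units per user unit


def to_fixed(x):
    """Convert a coordinate to raw 24.8 fixed point (an integer count of 1/256ths)."""
    return round(x * ONE)


def from_fixed(raw):
    """Convert a raw 24.8 value back to a float coordinate."""
    return raw / ONE


print(from_fixed(to_fixed(1.5)))       # 1.5 is a multiple of 1/256, so it survives
print(from_fixed(to_fixed(0.002)))     # snaps to 1/256 = 0.00390625
print(from_fixed(to_fixed(0.00002)))   # smaller than half a step, collapses to 0.0
```

Anything finer than 1/256 of a unit is lost at this stage, which would be consistent with the odd results at very small sizes.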
Firefox has been replacing cairo with more direct platform rendering e.g. Direct2D on Windows which is used in preference to cairo now if you have a hardware acceleration capable graphics chip and have hardware acceleration enabled. The platform libraries don't have the cairo range limitation but do seem to have their own bugs with large co-ordinates.

The spec requires that browsers support single-precision floating point numbers. Browsers are encouraged to use double-precision numbers for some calculations, mostly matrix transformations, where small decimals are often important, but the general rule is standard C++ "float" data type.
From http://www.w3.org/TR/SVG11/types.html#Precision:
4.3 Real number precision
Unless stated otherwise for a particular attribute or property, a <number> has the capacity for at least a single-precision floating point number and has a range (at a minimum) of -3.4e+38F to +3.4e+38F.
It is recommended that higher precision floating point storage and computation be performed on operations such as coordinate system transformations to provide the best possible precision and to prevent round-off errors.
Conforming High-Quality SVG Viewers are required to use at least double-precision floating point for intermediate calculations on certain numerical operations.
How does that relate to your issue?
A value of 0.002 shouldn't be a problem at all. Numbers like 200000000 would only be a problem if you then needed fine decimals. If your viewbox was "200000000 200000000 0.002 0.002" -- in other words, a very small range of very large numbers -- then floating point precision would likely be the problem. However, if there's a problem with low-precision large numbers or with decimals that can be exactly encoded by a float, then the browser isn't living up to the specs.
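The "small range of very large numbers" case is easy to reproduce. The sketch below rounds values to single precision with Python's struct module; as_float32 is a helper name made up for this example.

```python
import struct


def as_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


origin = 200000000.0
width = 0.002

# Both numbers are fine on their own ...
print(as_float32(origin) == origin)              # True: exactly representable
print(abs(as_float32(width) - width) < 1e-9)     # True: tiny relative error

# ... but their sum is not: near 2e8, adjacent floats are 16 apart.
print(as_float32(origin + width) == origin)      # True: the 0.002 is lost
```

So the far edge of such a viewBox would land on the same float as its origin, leaving a box of zero width.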
It could be that the browser is trying to smooth shapes, but is rounding to the nearest user unit instead of rounding to the nearest display pixel. Can you put together a simple example that demonstrates the specific problems you're seeing?

How can an audio wave be represented in a long array of floats?

In my application I'm using the sound library Beads (this question isn't specifically about that library).
In the library there's a class WavePlayer. It takes a Buffer, and produces a sound wave by iterating over the Buffer.
Buffers simply wrap a float[].
For example, here's a beginning of a buffer:
0.0 0.0015339801 0.0030679568 0.004601926 0.0061358847 0.007669829 0.009203754 0.010737659 0.012271538 0.0138053885 0.015339206 0.016872987 0.01840673 0.019940428 0.02147408 ...
Its size is 4096 float values.
Iterating over it with a WavePlayer creates a smooth "sine wave" sound. (This buffer is actually a ready-made 'preset' in the Buffer class, i.e. Buffer.SINE).
My question is:
What kind of data does a buffer like this represent? What kind of information does it contain that allows one to iterate over it and produce an audio wave?

Read this post: What's the actual data in a WAV file?
Sound is just a curve. You can represent this curve using integers or floats.
There are two important aspects: bit depth and sample rate. First let's discuss bit depth. Each number in your list (ints/floats) represents the height of the sound curve at a given point in time. For simplicity, when using floats the values typically vary from -1.0 to +1.0, whereas integers may vary from, say, 0 to 2^16 - 1. Importantly, each of these numbers must be stored in a sound file or an audio buffer in memory; the resolution/fidelity you choose to represent each point of this curve influences the audio quality and the resulting sound file size. A low fidelity recording may use 8 bits of information per curve height measurement. As you climb the fidelity spectrum, 16 bits, 24 bits, and so on are dedicated to storing each curve height measurement. More bits equate to more significant digits for floats or a broader range of integers (16 bits means you have 2^16 integers, 0 to 65535, to represent the height of any given curve point).
Now to the second aspect, sample rate. As you capture/synthesize sound, in addition to measuring the curve height, you must decide how often you measure (sample) it. Typical CD quality samples the curve height 44100 times per second, so the sample rate would be 44.1 kHz. Lower fidelity would sample less often; ultra fidelity would sample at, say, 96 kHz or more. So the combination of curve height measurement fidelity (bit depth) and how often you perform this measurement (sample rate) together define the quality of sound synthesis/recording.
As with many things, these two attributes should be in balance: if you change one, you should change the other. If you lower the sample rate you reduce the information load and so lower the audio fidelity; once you have done this, you can then lower the bit depth as well without further compromising fidelity.
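The numbers quoted in the question match one cycle of a sine wave sampled at 4096 points (the second value is sin(2π/4096)). Here is a minimal sketch of how such a table can be built and read back at a chosen pitch. It is not Beads' implementation: the function names, the 440 Hz tone, and the nearest-lower-entry lookup are all assumptions made for illustration.

```python
import math

TABLE_SIZE = 4096


def make_sine_table(size=TABLE_SIZE):
    """One full cycle of a sine wave, sampled at `size` evenly spaced points."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]


def play_table(table, frequency, sample_rate, n_samples):
    """Read the table with a phase accumulator to get `n_samples` output samples."""
    out = []
    phase = 0.0                       # position in the cycle, 0.0 <= phase < 1.0
    step = frequency / sample_rate    # cycles advanced per output sample
    for _ in range(n_samples):
        out.append(table[int(phase * len(table))])   # nearest-lower entry, no interpolation
        phase = (phase + step) % 1.0
    return out


table = make_sine_table()
print(table[:3])                                      # starts 0.0, 0.00153..., 0.00306...
samples = play_table(table, 440.0, 44100.0, 44100)    # one second of a 440 Hz tone
print(len(samples), min(samples), max(samples))
```

The table itself only stores the shape of one cycle; the pitch comes from how fast the reader steps through it, and the sample rate says how many of those steps make up one second.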

What do the values in AudioBuffer in the CoreAudio framework represent?

What do the values in the mData member represent? It looks like each value is a 4 byte integer...
I guess my question is: what is each sample supposed to represent, and what does the mNumberChannels member represent?
If I had to apply some sort of transform on the sound pattern, can I treat these samples as discrete samples in time? If so, what time period does each 512 samples represent?
Thanks
Deshawn

The mData buffer array elements can represent 16-bit signed integers, stereo pairs of 16-bit signed integers, 32-bit 8.24/s7.24 scaled-integer or fixed-point values, or 32-bit floating-point values, etc., depending on the Audio Unit and how it was configured.
The buffer duration will be its length in frames divided by the audio sample rate, for instance 512/44100 is about 11.61 milliseconds.
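Both points can be written down in a few lines. This sketch does not call any Core Audio API; the function names are hypothetical, and the 8.24 interpretation applies only if the unit is actually configured for that format.

```python
def buffer_duration_ms(n_frames, sample_rate):
    """Duration of a buffer in milliseconds: frames divided by frames per second."""
    return 1000.0 * n_frames / sample_rate


def fixed_8_24_to_float(raw):
    """Interpret a signed 32-bit integer as 8.24 fixed point (24 fractional bits)."""
    return raw / float(1 << 24)


print(round(buffer_duration_ms(512, 44100), 2))   # 11.61
print(fixed_8_24_to_float(1 << 24))               # 1.0
print(fixed_8_24_to_float(-(1 << 23)))            # -0.5
```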
