Is bfloat16 ever used for graphics? - graphics

Bfloat16 is a half precision floating point format that has the same 8-bit exponent as single precision, but only 7 (plus 1 implied) bits of significand. Surprisingly, this turns out to be adequate precision for many machine learning applications, so a lot of resources are being put into making arithmetic in this format run fast.
Given that, it would seem to make sense to also try to use it for graphics. Using it for RGB components during calculation, for example, would allow a much wider dynamic range of light sources to be rendered, compared to just trying to calculate with 8-bit integers. At the same time, it could potentially be faster than using single precision floating point for RGB components.
Are any existing graphics rendering systems actually using it for such purposes?

Related

svg viewbox resolution limits

I was wondering if there are any hard limits to the viewbox of a svg element. I see weird clipping when I reach very low values ( say when vb width is around .002 ) or very large ones firefox starts to play funny around 200000000 width.
Is there a rule, a spec somewhere where I can find the current limits ?
Fiddle here:
var dim = 0.00002;
http://jsfiddle.net/7v36sLj8/13/
You'll see funny things starting to happen from that dim onwards, you can decrease by a factor 10 or increase by a factor 10 as fitting.
Thanks for the answers, I'll just take the min/maxes for the lowest common denominator which seems to be ffox for now. ( thanks for answers, also Rob's answer explains why ffox has a much lower threshold on linux / osx).
Firefox originally used a graphics library called cairo to do cross platform graphics rendering. Cairo only allows units to have 32 bits of fixed point binary precision so Firefox chose 24 bits before the binary point and 8 bits of binary fraction. So the maximum co-ordinates are then 2^24 and the smallest deltas 1/256.
Firefox has been replacing cairo with more direct platform rendering e.g. Direct2D on Windows which is used in preference to cairo now if you have a hardware acceleration capable graphics chip and have hardware acceleration enabled. The platform libraries don't have the cairo range limitation but do seem to have their own bugs with large co-ordinates.
The spec requires that browsers support single-precision floating point numbers. Browsers are encouraged to use double-precision numbers for some calculations, mostly matrix transformations, where small decimals are often important, but the general rule is standard C++ "float" data type.
From http://www.w3.org/TR/SVG11/types.html#Precision:
4.3 Real number precision
Unless stated otherwise for a particular attribute or property, a has the capacity for at least a single-precision floating point number and has a range (at a minimum) of -3.4e+38F to +3.4e+38F.
It is recommended that higher precision floating point storage and computation be performed on operations such as coordinate system transformations to provide the best possible precision and to prevent round-off errors.
Conforming High-Quality SVG Viewers are required to use at least double-precision floating point for intermediate calculations on certain numerical operations.
How does that relate to your issue?
A value of 0.002 shouldn't be a problem at all. Numbers like 200000000 would only be a problem if you then needed fine decimals. If your viewbox was "200000000 200000000 0.002 0.002" -- in other words, a very small range of very large numbers -- then floating point precision would likely be the problem. However, if there's a problem with low-precision large numbers or with decimals that can be exactly encoded by a float, then the browser isn't living up to the specs.
It could be that the browser is trying to smooth shapes, but is rounding to the nearest user unit instead of rounding to the nearest display pixel. Can you put together a simple example that demonstrates the specific problems you're seeing?

Working with 24-bit audio samples

What is the "standard way" of working with 24-bit audio? Well, there are no 24-bit data types available, really. Here are the methods that come into my mind:
Represent 24-bit audio samples as 32-bit ints and ignore the upper eight bits.
Just like (1) but ignore the lower eight bits.
Represent 24-bit audio samples as 32-bit floats.
Represent the samples as structs of 3 bytes (acceptable for C/C++, but bad for Java).
How do you work this out?
Store them them as 32- or 64-bit signed ints or float or double unless you are space conscious and care about packing them into the smallest space possible.
Audio samples often appear as 24-bits to and from audio hardware since this is commonly the resolution of the DACs and ADCs - although on most computer hardware, don't be surprised to find the bottom 3 of 4 bits banging away randomly with noise.
Digital signal processing operations - which is what usually happens downstream from the acquisition of samples - all involve addition of weighted sums of samples. A sample stored in an integer type can be considered to be fixed-point binary with an implied binary point at some arbitrary point - the position of which you can chose strategically to maintain as many bits of precision as possible.
For instance, the sum of two 24-bit integer yields a result of 25 bits. After 8 such additions, the 32-bit type would overflow and you would need to re-normalize by rounding and shifting right.
Therefore, if you're using integer types to store your samples, use the largest you can and start with the samples in the least significant 24 bits.
Floating point types of course take care of this detail for you, although you get less choice about when renormalisation takes place. They are the usual choice for audio processing where hardware support is available. A single precision float has a 24-bit mantissa, so can hold a 24-bit sample without loss of precision.
Usually floating point samples are stored in the range -1.0f < x < 1.0f.

Number of polygons in a 3D object and the rendering workload?

Is there any relation (preferably an equation) between the number of polygons in a 3D object and the rendering workload? I want to see how much the rendering workload would be increased if for instance the number of polygons doubles.
There is no clear connection between the arbitrary number of polygons and the mythical "workload".
See the following samples:
You render a cube with 6 faces composed of 12 triangles. You get, say, 1000fps (without vsync). When you tesselate the cube into 120 triangles, most likely the fps counter remains 1000.
You render a single fullscreen-sized quad with a heavy fragment shader with a lot of calculation. You get 0.5fps (or more, but I hope you get the point).
Another extreme. You are rendering a thousand of similar cubes, each with different texture. The rendering state change will take most of the time, not the actual rendering.
So, polygons may have different screen area and they may be rendered not within a single primitive. If you're talking about one big vertex array with a large number of polygons, then for some certain scenarios the performance change must be something like linear. "Something" because the videocard and the drivers are clipping the invisible polys and perfrom the early-out tests for each pixel being rendered.
Could you define 'workload'? – Erno yesterday
Well, I mean working
calculations. I want to see how much overhead (for GPU, CPU,
memory,...) would be increased. Actually I want to conclude the energy
usage of the device – user1196937 2 hours ago
If that is the actual question, a comparison of energy usage:
You will have to pick specific configurations and test those. Energy usage is very different from GPU to GPU and machine to machine.
Some GPU manufactures give very detailed information on the performance of their processors but when you want to compare those you will need an actual machine.

Antialiasing and gamma compensation

The luminence of pixels on a computer screen is not usually linearly related to the digital RGB triplet values of a pixel. The nonlinear response of early CRTs required a compensating nonlinear encoding and we continue to use such encodings today.
Usually we produce images on a computer screen and consume them there as well, so it all works fine. But when we antialias, the nonlinearity — called gamma — means that we can't just add an alpha value of 0.5 to a 50% covered pixel and expect it to look right. An alpha value of 0.5 is only 0.5^2.2=22% as bright as an alpha of 1.0 with a typical gamma of 2.2.
Is there any widely established best practice for antialiasing gamma compensation? Do you have a pet method you use from day to day? Has anyone seen any studies of the results and human perceptions of the quality of the graphic output with different techniques?
I've thought of doing standard X^(1/2.2) compensation but that is pretty computationally intense. Maybe I can make it faster with a 256 entry lookup table, though.
Lookup tables are used quite often for work like that. They're small and fast.
But whether look-up or some formula, if the end result is an image file, and the format permits, it's best to save a color profile or at least the gamma value in the file for later viewing, rather than try adjusting RGB values yourself.
The reason: for typical byte-valued R, G, B channels, you have 256 unique values in each channel at each pixel. That's almost good enough to look good to the human eye (I wish "byte" had been defined as nine bits!) Any kind of math, aside from trivial value inversion, would map many-to-one for some of those values. The output won't have 256 values to pick from for each pixel for R, G, or B, but far fewer. That can lead to contouring, jaggies, color noise and other badness.
Precision issues aside, if any kind of decent quality is desired, all composting, mixing, blending, color correction, fake lens flare addition, chroma-keying and whatever, should be done in linear RGB space, where the values of R, G and B are in proportion to physical light intensity. The image math mimics physical light math. But where ultimate speed is vital, there are ways to cheat.
Jim Blinns - "Dirty Pixels" book outlines a fast and good compositing calculation by using 16 bit math plus lookup tables to accurately go back and forward to linear color space. This guy worked on NASAs visualisations, he knows his stuff.
I'm trying to answer, though mainly for reference now, to the actual questions:
First, there are the recommendations from ITU (http://www.itu.int/rec/T-REC-H.272-200701-I/en) which can be applied to programming (but you have to know your stuff).
In Jim Blinn's "Notation, Notation, Notation", Chapter 9, has a very detailed mathematical and perceptual error analysis, although he only covers compositing (many other graphics tasks are affected too).
The notation he establishes can also be used to derive a way of dealing with gamma, or to check if a given way of doing so is actually correct. Very handy, my pet method (mainly as I discovered it independently but later found his book).
When generating images, one typically works in a linear color space (like linear RGB or one of the CIE color spaces) and then converts to a non-linear RGB space at the end. That conversion can be accelerated in hardware or via lookup tables or even through tricky math. (See the other answers' references.)
When performing an alpha blend (e.g., render this icon onto this background), this kind of precision is often elided in favor of speed. The results are computed directly in the non-linear RGB-space by lerping with the alpha as the parameter. This is not "correct", but it's good enough in most cases. Especially for things like icons on desktops.
If you're trying to do more correct blending, you treat it like an original render. Work in linear space (which may require an initial conversion) and then convert to your non-linear display space at the end.
A lot of graphics nowadays use sRGB as the non-linear display color space. If I recall correctly, sRGB is very similar to a gamma of 2.2, but there are adjustments made to values at the low end.

8 bit audio samples to 16 bit

This is my "weekend" hobby problem.
I have some well-loved single-cycle waveforms from the ROMs of a classic synthesizer.
These are 8-bit samples (256 possible values).
Because they are only 8 bits, the noise floor is pretty high. This is due to quantization error. Quantization error is pretty weird. It messes up all frequencies a bit.
I'd like to take these cycles and make "clean" 16-bit versions of them. (Yes, I know people love the dirty versions, so I'll let the user interpolate between dirty and clean to whatever degree they like.)
It sounds impossible, right, because I've lost the low 8 bits forever, right? But this has been in the back of my head for a while, and I'm pretty sure I can do it.
Remember that these are single-cycle waveforms that just get repeated over and over for playback, so this is a special case. (Of course, the synth does all kinds of things to make the sound interesting, including envelopes, modulations, filters cross-fading, etc.)
For each individual byte sample, what I really know is that it's one of 256 values in the 16-bit version. (Imagine the reverse process, where the 16-bit value is truncated or rounded to 8 bits.)
My evaluation function is trying to get the minimum noise floor. I should be able to judge that with one or more FFTs.
Exhaustive testing would probably take forever, so I could take a lower-resolution first pass. Or do I just randomly push randomly chosen values around (within the known values that would keep the same 8-bit version) and do the evaluation and keep the cleaner version? Or is there something faster I can do? Am I in danger of falling into local minimums when there might be some better minimums elsewhere in the search space? I've had that happen in other similar situations.
Are there any initial guesses I can make, maybe by looking at neighboring values?
Edit: Several people have pointed out that the problem is easier if I remove the requirement that the new waveform would sample to the original. That's true. In fact, if I'm just looking for cleaner sounds, the solution is trivial.
You could put your existing 8-bit sample into the high-order byte of your new 16-bit sample, and then use the low order byte to linear interpolate some new 16 bit datapoints between each original 8-bit sample.
This would essentially connect a 16 bit straight line between each of your original 8-bit samples, using several new samples. It would sound much quieter than what you have now, which is a sudden, 8-bit jump between the two original samples.
You could also try apply some low-pass filtering.
Going with the approach in your question, I would suggest looking into hill-climbing algorithms and the like.
http://en.wikipedia.org/wiki/Hill_climbing
has more information on it and the sidebox has links to other algorithms which may be more suitable.
AI is like alchemy - we never reached the final goal, but lots of good stuff came out along the way.
Well, I would expect some FIR filtering (IIR if you really need processing cycles, but FIR can give better results without instability) to clean up the noise. You would have to play with it to get the effect you want but the basic problem is smoothing out the sharp edges in the audio created by sampling it at 8 bit resolutions. I would give a wide birth to the center frequency of the audio and do a low pass filter, and then listen to make sure I didn't make it sound "flat" with the filter I picked.
It's tough though, there is only so much you can do, the lower 8 bits is lost, the best you can do is approximate it.
It's almost impossible to get rid of noise that looks like your signal. If you start tweeking stuff in your frequency band it will take out the signal of interest.
For upsampling, since you're already using an FFT, you can add zeros to the end of the frequency domain signal and do an inverse FFT. This completely preserves the frequecy and phase information of the original signal, although it spreads the same energy over more samples. If you shift it 8bits to be a 16bit samples first, this won't be a too much of a problem. But I usually kick it up by an integer gain factor before doing the transform.
Pete
Edit:
The comments are getting a little long so I'll move some to the answer.
The peaks in the FFT output are harmonic spikes caused by the quantitization. I tend to think of them differently than the noise floor. You can dither as someone mentioned and eliminate the amplitude of the harmonic spikes and flatten out the noise floor, but you loose over all signal to noise on the flat part of your noise floor. As far as the FFT is concerned. When you interpolate using that method, it retains the same energy and spreads over more samples, this reduces the amplitude. So before doing the inverse, give your signal more energy by multipling by a gain factor.
Are the signals simple/complex sinusoids, or do they have hard edges? i.e. Triangle, square waves, etc. I'm assuming they have continuity from cycle to cycle, is that valid? If so you can also increase your FFT resolution to more precisely pinpoint frequencies by increasing the number of waveform cycles fed to your FFT. If you can precisely identify the frequencies use, assuming they are somewhat discrete, you may be able to completely recreate the intended signal.
The 16-bit to 8-bit via truncation requirement will produce results that do not match the original source. (Thus making finding an optimal answer more difficult.) Typically you would produce a fixed point waveform by attempting to "get the closest match" that means rounding to the nearest number (trunking is a floor operation). That is most likely how they were originally generated. Adding 0.5 (in this case 0.5 is 128) and then trunking the output would allow you to generate more accurate results. If that's not a worry then ok, but it definitely will have a negative effect on accuracy.
UPDATED:
Why? Because the goal of sampling a signal is to be able to as close a possible reproduce the signal. If conversion threshold is set poorly on the sampling all you're error is to one side of signal and not well distributed and centered about zero. On such systems you typically try to maximize the use the availiable dynamic range, particularly if you have low resolution such as an 8-bit ADC.
Band limited versions? If they are filtered at different frequencies, I'd suspect it was to allow you to play the same sound with out distortions when you went too far out from the other variation. Kinda like mipmapping in graphics.
I suspect the two are the same signal with different aliasing filters applied, this may be useful in reproducing the original. They should be the same base signal with different convolutions applied.
There might be a simple approach taking advantange of the periodicity of the waveforms. How about if you:
Make a 16-bit waveform where the high bytes are the waveform and the low bytes are zero - call it x[n].
Calculate the discrete Fourier transform of x[n] = X[w].
Make a signal Y[w] = (dBMag(X[w]) > Threshold) ? X[w] : 0, where dBMag(k) = 10*log10(real(k)^2 + imag(k)^2), and Threshold is maybe 40 dB, based on 8 bits being roughly 48 dB dynamic range, and allowing ~1.5 bits of noise.
Inverse transform Y[w] to get y[n], your new 16 bit waveform.
If y[n] doesn't sound nice, dither it with some very low level noise.
Notes:
A. This technique only works in the original waveforms are exactly periodic!
B. Step 5 might be replaced with setting the "0" values to random noise in Y[w] in step 3, you'd have to experiment a bit to see what works better.
This seems easier (to me at least) than an optimization approach. But truncated y[n] will probably not be equal to your original waveforms. I'm not sure how important that constraint is. I feel like this approach will generate waveforms that sound good.

Resources