GPU Pixel and Texel write speed - graphics

Many embedded/mobile GPUs provide access to performance registers called Pixel Write Speed and Texel Write Speed. Could you explain how those terms can be interpreted and defined from the actual GPU hardware point of view?

I would assume the difference between a pixel and a texel is pretty clear to you. Anyway, just to make this answer a little more "universal":
A pixel is the fundamental unit of screen space.
A texel, or texture element (also texture pixel), is the fundamental unit of texture space.
Textures are represented by arrays of texels, just as pictures are represented by arrays of pixels. When texturing a 3D surface (a process known as texture mapping) the renderer maps texels to appropriate pixels in the output picture.
BTW, it is more common to speak of fill rate than write speed, and since this terminology is quite old and widely used, you can easily find all the required information.
Answering your question
All fill-rate numbers (whatever definition is used) are expressed in Mpixels/sec or Mtexels/sec.
Well, the original idea behind fill rate was the number of finished pixels written to the frame buffer. This fits with the definition of theoretical peak fill rate, so in the good old days it made sense to express that number in Mpixels.
However, with the second generation of 3D accelerators a new feature was added: rendering to an off-screen surface and using that as a texture in the next frame. So the values written to the buffer are not necessarily on-screen pixels anymore; they might be texels of a texture. This enables several cool special effects: imagine rendering a room and storing that picture of the room as a texture. Now you don't show the picture of the room directly; instead you use it as the texture for a mirror, or even a reflection map.
Another reason to use Mtexels is that games started using several layers of multi-texture effects, meaning an on-screen pixel is constructed from various sub-pixel results that end up being blended together to form the final pixel. So it makes more sense to express the fill rate in terms of these sub-results, and you could refer to them as texels.
Read the whole article - Fill Rate Explained
Additional details can be found here - Texture Fill Rate
Update
Texture Fill Rate = (# of TMUs) x (Core Clock)
where TMU stands for texture mapping unit. This is the number of textured pixels the card can render to the screen every second. Obviously, a card with more TMUs will be faster at processing texture information.

The performance registers/counters Pixel Write Speed and Texel Write Speed maintain statistics (operation counts) on the pixels and texels processed/written. I will explain the peak (maximum possible) fill rates.
Pixel Rate
A picture element is a physical point in a raster image, the smallest element of a display device's screen.
Pixel rate is the maximum number of pixels the GPU could possibly write to local memory in one second, measured in millions of pixels per second (Mpixels/sec). The actual pixel output rate also depends on quite a few other factors, most notably memory bandwidth: the lower the memory bandwidth, the harder it is to reach the maximum fill rate.
The pixel rate is calculated by multiplying the number of ROPs (Raster Operations Pipelines, aka Render Output Units) by the core clock speed.
Render Output Units : The pixel pipelines take pixel and texel information and process it, via specific matrix and vector operations, into a final pixel or depth value. The ROPs perform the transactions between the relevant buffers in the local memory.
Importance: the higher the pixel rate, the higher the screen resolution the GPU can drive.
Texel Rate
A texture element is the fundamental unit of texture space (a tile of a 3D object's surface).
Texel rate is the maximum number of texture map elements (texels) that can be processed per second, measured in millions of texels per second (Mtexels/sec).
This is calculated by multiplying the total number of texture units by the core speed of the chip.
Texture Mapping Units : Textures need to be addressed and filtered. This job is done by TMUs that work in conjunction with pixel and vertex shader units. It is the TMU's job to apply texture operations to pixels.
Importance: the higher the texel rate, the faster the card renders texture-heavy scenes, so demanding games run fluently.
Example: not an NVIDIA fan, but here are the specs for the GTX 680 (I could not find much for embedded GPUs):
Model Geforce GTX 680
Memory 2048 MB
Core Speed 1006 MHz
Shader Speed 1006 MHz
Memory Speed 1502 MHz (6008 MHz effective)
Unified Shaders 1536
Texture Mapping Units 128
Render Output Units 32
Bandwidth 192256 MB/sec
Texel Rate 128768 Mtexels/sec
Pixel Rate 32192 Mpixels/sec
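As a sanity check, here is a minimal Python sketch of the two formulas above (the function names are mine); it reproduces the GTX 680 numbers from the spec list exactly:

```python
# Peak fill rates, using the formulas from this answer:
#   pixel rate = ROPs x core clock
#   texel rate = TMUs x core clock

def pixel_rate_mpixels(rops: int, core_clock_mhz: float) -> float:
    """Peak pixel fill rate in Mpixels/sec."""
    return rops * core_clock_mhz

def texel_rate_mtexels(tmus: int, core_clock_mhz: float) -> float:
    """Peak texel fill rate in Mtexels/sec."""
    return tmus * core_clock_mhz

# GTX 680: 32 ROPs, 128 TMUs, 1006 MHz core clock
print(pixel_rate_mpixels(32, 1006))   # 32192.0 Mpixels/sec
print(texel_rate_mtexels(128, 1006))  # 128768.0 Mtexels/sec
```

Remember these are theoretical peaks; as noted above, memory bandwidth usually keeps the achieved rate below them.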

Related

DX11: how to handle a large number of textures

I have to load about 600 different textures, each 4096 x 4096. These textures should be updated every frame, which means I need to load at least half of them at the same time. My laptop has 8 GB of memory, so when I test the game, the computer is very laggy and memory is almost used up. What can I do to handle the problem? Thanks for all replies!
The function used to load the textures is CreateWICTextureFromFile, and the file format is PNG.
Any reply would be helpful, thanks!
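A rough back-of-the-envelope estimate (a sketch assuming the PNGs decode to uncompressed 32-bit RGBA, 4 bytes per texel, and no mipmaps) shows why memory runs out:

```python
# Rough memory estimate: 600 textures of 4096x4096, assuming they
# decode to uncompressed 32-bit RGBA (4 bytes per texel), no mipmaps.
width = height = 4096
bytes_per_texel = 4
count = 600

per_texture_mib = width * height * bytes_per_texel / 2**20
total_gib = count * width * height * bytes_per_texel / 2**30

print(f"{per_texture_mib:.0f} MiB per texture")  # 64 MiB
print(f"{total_gib:.1f} GiB in total")           # 37.5 GiB
```

37.5 GiB is far beyond 8 GB, which is why compressed GPU formats (e.g. BC formats in DDS files) and streaming textures on demand are the usual ways out.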

Calculate pixel generated via rate and display size

Consider two raster systems, with resolutions of 640x480 and 1280x1024. How many pixels could be accessed per second in each of these systems by a display controller that refreshes the screen at a rate of 60 frames per second? What is the access time per pixel in each system?
Please explain in detail.
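A worked calculation, as a minimal sketch: pixels accessed per second is just resolution times refresh rate, and the access time per pixel is its reciprocal.

```python
# Pixels accessed per second = width * height * refresh rate;
# access time per pixel is the reciprocal.
for width, height in [(640, 480), (1280, 1024)]:
    pixels_per_sec = width * height * 60        # 60 frames per second
    access_time_ns = 1e9 / pixels_per_sec
    print(f"{width}x{height}: {pixels_per_sec / 1e6:.3f} Mpixels/sec, "
          f"{access_time_ns:.2f} ns per pixel")

# 640x480:   18.432 Mpixels/sec, 54.25 ns per pixel
# 1280x1024: 78.643 Mpixels/sec, 12.72 ns per pixel
```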

framebuffer size

Does the framebuffer contain depth buffer information, or just color buffer information, in graphics applications? What about GUIs on Windows: is there a frame buffer, and does it hold both color and depth info, or just color info?
If you're talking about the kernel-level framebuffer in Linux, it sets both the resolution and the color depth. Here's a list of common framebuffer modes; notice that the modes are determined by both the resolution and the color depth. You can override the framebuffer by passing a command line parameter to the kernel in your bootloader (vga=...).
Unlike Linux, on Windows the graphics subsystem is part of the OS. I don't think (and please, someone correct me if I'm wrong) there is support for non-VGA output devices in the latest Windows, so the framebuffer is deprecated/unavailable there.
As for real-time 3D, the "standard" buffers are the color buffer, in RGBA format with 1 byte per component, and the depth buffer, with 3 bytes (24 bits) per sample. There is one sample per fragment (i.e., if you have 8x antialiasing, you will have 8 color and 8 depth samples per pixel).
Now, many applications use additional custom buffers, usually called g-buffers. These can include: object ID, material ID, albedo, shininess, normals, tangent, binormal, AO factor, the most influential lights impacting the fragment, etc. At 1080p with 4x MSAA and double or triple buffering, this can take huge amounts of memory, so all this information is usually packed as tightly as possible.
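To put rough numbers on that, a small sketch assuming 1080p with 4x MSAA and the "standard" per-sample sizes given above (g-buffer layouts vary per engine, so only color + depth are counted here):

```python
# Memory for the "standard" buffers at 1920x1080 with 4x MSAA:
# 4-byte RGBA color + 3-byte depth, one of each per sample.
width, height, samples = 1920, 1080, 4
color_bytes, depth_bytes = 4, 3

total_bytes = width * height * samples * (color_bytes + depth_bytes)
print(f"{total_bytes / 2**20:.0f} MiB for color + depth")  # ~55 MiB

# Every extra g-buffer plane (normals, albedo, ...) adds a comparable
# amount, and double/triple buffering multiplies the color part again.
```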

how to obtain ambient noise level in j2me/Blackberry

I want to measure the noise level and automatically increase or decrease the device's volume according to the surrounding area's noise level.
Is it possible? If it is possible, then how do I get the ambient noise level?
Can anybody give me an idea of how to proceed in the right direction?
The RMS value of a signal (root mean square) is fast to calculate and is a standard measure of the overall energy of a signal.
Tip:
Taking the "noise" signal as the signal from the microphone when the user is listening, not talking (ambient noise). I would not set the volume based solely on the RMS value of the ambient noise signal, but rather set the volume based on the ratio between the RMS of the incoming call, and the RMS of the ambient noise.
Volume = k * RMS(IncomingCallSignal) / RMS(AmbientNoiseSignal)
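A minimal sketch of the RMS computation and the formula above (NumPy for brevity; k is a tuning constant you would pick empirically, and on J2ME/BlackBerry you would write the equivalent loop in Java):

```python
import numpy as np

def rms(signal: np.ndarray) -> float:
    """Root mean square: sqrt of the mean of the squared samples."""
    samples = signal.astype(np.float64)
    return float(np.sqrt(np.mean(samples ** 2)))

def suggested_volume(call_signal: np.ndarray,
                     noise_signal: np.ndarray,
                     k: float = 1.0) -> float:
    """Volume = k * RMS(incoming call) / RMS(ambient noise)."""
    return k * rms(call_signal) / rms(noise_signal)
```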

Deciding on length of FFT

I am working on a tool to compare two wave files for similarity in their waveforms. For example, I take a wave file of duration 1 min and make another wave file from it in which every alternate 5-second stretch of data is set to 0.
My software will then report a waveform difference at the time intervals 5 sec to 10 sec, 15 sec to 20 sec, 25 sec to 30 sec, and so on...
As of now, with initial development, this is working fine.
Following are 3 test sets:

1. I have two wave files with a sampling rate of 960 Hz, mono, with 138551 data samples (around 1 min 12 sec files). I am using a 128-point FFT (splitting the file into 128-sample chunks) and the results are good.
2. When I use the same algorithm on wave files with a sampling rate of 48 kHz, 2 channels, with 6927361 data samples per channel (around a 2 min 24 sec file), the process becomes too slow. When I use a 4096-point FFT, the processing is better.
3. But the 4096-point FFT on files of 22050 Hz, 2 channels, with 55776 data samples per channel (around a 0.6 sec file) gives very poor results. In this case a 128-point FFT gives good results.
So, I am confused about how to decide the length of the FFT so that my results are good in each case.
I guess the length should depend on the number of samples and the sampling rate.
Please give your inputs on this.
Thanks
The length of the FFT, N, will determine the resolution in the frequency domain:
resolution (Hz) = sample_rate (Hz) / N
So for example in case (1) you have resolution = 960 / 128 = 7.5 Hz. So each bin in the resulting FFT (or presumably the power spectrum derived from it) will be 7.5 Hz wide, and you will be able to differentiate between frequency components which are at least this far apart.
Since you don't say what kind of waveforms these are, or what the purpose of your application is, it's hard to know what kind of resolution you need.
One important further point - many people using FFT for the first time are unaware that in general you need to apply a window function prior to the FFT to avoid spectral leakage.
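To illustrate both points (bin resolution and windowing), a minimal NumPy sketch using the numbers from case (1):

```python
import numpy as np

fs, n = 960, 128                       # sample rate and FFT length, case (1)
print(fs / n)                          # 7.5 -> each FFT bin is 7.5 Hz wide

chunk = np.random.randn(n)             # stand-in for one 128-sample chunk
windowed = chunk * np.hanning(n)       # Hann window tapers the edges to
spectrum = np.fft.rfft(windowed)       # reduce spectral leakage
freqs = np.fft.rfftfreq(n, d=1/fs)     # bin centres: 0, 7.5, 15, ... Hz
```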
I have to say I have found your question very cryptic. I think you should look into the short-time Fourier transform (STFT). The reason I say this is that you are looking at quite a large number of samples if you use a sampling frequency of 44.1 kHz over 2 min with 2 channels. One FFT across the entire signal will take quite a while indeed, not to mention that the estimate will be biased, as the signal's mean and variance change drastically over the whole duration.

To avoid this you want to frame the time-domain signal first. These frames can be as small as 20 ms-40 ms (commonly used for speech) and often overlap (the Welch method of spectral estimation). Then you apply a window function such as a Hamming or Hann window to reduce spectral leakage, and calculate an N-point FFT for each frame, where N is the next power of two above the number of samples in that frame.
For example:

Fs = 8 kHz, single channel
time = 120 sec
no_samples = time * Fs = 960000
frame length T_length = 20 ms
frame length in samples N_length = 160
frame overlap T_overlap = 10 ms
frame overlap in samples N_overlap = 80
number of frames N_frames = (no_samples - (N_length - N_overlap)) / N_overlap = 11999
FFT length = 256
So you will be processing 11999 frames in total, but your FFT length will be small; you only need an FFT length of 256 (the next power of two above the frame length of 160). Most algorithms that implement the FFT require the signal length and the FFT length to be the same, so all you have to do is append zeros to each framed signal up to 256 samples, i.e., pad each frame with x zeros, where x = FFT_length - N_length. My latest Android app does this on recorded speech and uses the short-time FFT data to display the spectrogram of the speech; it also performs various spectral modification and filtering. It's called Speech Enhancement for Android.
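A minimal sketch of that frame/window/zero-pad pipeline in NumPy, using the 8 kHz example parameters above (the random signal is just a stand-in for real audio):

```python
import numpy as np

fs = 8000                                   # 8 kHz, single channel
signal = np.random.randn(120 * fs)          # stand-in for the 120 s recording

n_length, n_overlap, n_fft = 160, 80, 256   # 20 ms frames, 10 ms hop
window = np.hamming(n_length)

frames = []
for start in range(0, len(signal) - n_length + 1, n_overlap):
    frame = signal[start:start + n_length] * window
    # rfft's n argument zero-pads the 160-sample frame up to 256.
    frames.append(np.fft.rfft(frame, n=n_fft))

stft = np.array(frames)
print(stft.shape)                           # (11999, 129): frames x bins
```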
