I have to load about 600 different textures, each 4096 x 4096. The textures are updated every frame, which means I need at least half of them loaded at the same time. My laptop has 8 GB of memory, so when I test the game the computer is very laggy and memory is almost used up. What can I do to handle the problem? Thanks for all replies!
The function used to load the textures is
CreateWICTextureFromFile
and the file format is PNG.
Any reply would be helpful, thanks!
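For a rough sense of scale (assuming each PNG decodes to uncompressed 32-bit RGBA, which is what CreateWICTextureFromFile typically produces), the numbers work out like this:
BYTES_PER_PIXEL = 4                                  # 32-bit RGBA, assumed
one_texture = 4096 * 4096 * BYTES_PER_PIXEL          # 67,108,864 bytes = 64 MB
half_the_set = 300 * one_texture                     # roughly half of the 600 textures resident at once
print(one_texture // 2**20, "MB per texture")        # 64 MB
print(round(half_the_set / 2**30, 1), "GB for 300")  # ~18.8 GB
So even half of the textures, kept uncompressed in memory, need more than twice the 8 GB of RAM available.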
I'm creating a Raspberry Pi Zero W security camera and am attempting to integrate motion detection using Node.js. Images are being taken with the Pi camera module at 8 megapixels (3280x2464 pixels, roughly 5 MB per image).
On a Pi Zero, resources are limited, so loading an entire image from file into Node.js may limit how fast I can capture and then evaluate large photographs. Surprisingly, I capture about two 8MB images per second in a background time-lapse process and hope to continue to capture the largest-sized images roughly once per second at least. One resource that could help with this is extracting the embedded thumbnail from the large image (the thumbnail size is customizable in the raspistill application).
Do you have thoughts on how I could quickly extract the thumbnail from a large image without loading the full image in Node.js? So far I've found a partial answer here. I'm guessing I would manage this through a buffer somehow?
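One way to sketch the buffer idea (shown here in Python rather than Node.js, and assuming raspistill writes a standard EXIF thumbnail, itself a small JPEG, near the start of the file; the function name and the 64 KB head size are illustrative): read only the first few tens of kilobytes and cut the embedded thumbnail out by its JPEG start/end markers. In Node.js the same logic would read into a Buffer with fs.read.
def extract_thumbnail(path, head_size=64 * 1024):
    with open(path, "rb") as f:
        head = f.read(head_size)         # only the first 64 KB, never the full ~5 MB image
    # The file starts with its own SOI marker (FF D8); the embedded thumbnail is a
    # second complete JPEG inside the EXIF block, so look for the next SOI after it.
    start = head.find(b"\xff\xd8\xff", 2)
    if start == -1:
        return None
    end = head.find(b"\xff\xd9", start)  # EOI marker closing the thumbnail
    if end == -1:
        return None
    return head[start:end + 2]
thumb = extract_thumbnail("capture.jpg")
if thumb:
    with open("thumb.jpg", "wb") as out:
        out.write(thumb)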
I am trying to get pairs of images out of a Minoru stereo webcam, currently through opencv on linux.
It works fine when I force a low resolution:
import cv2

left = cv2.VideoCapture(0)
left.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 320)
left.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 240)
right = cv2.VideoCapture(1)  # second camera of the stereo pair
right.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 320)
right.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 240)
while True:
    _, left_img = left.read()
    _, right_img = right.read()
    ...
However, I'm using the images to create depth maps, and a higher resolution would be good. But if I leave the default, or force the resolution to 640x480, I hit this error:
libv4l2: error turning on stream: No space left on device
I have read about USB bandwidth limitations, but:
this happens on the first iteration (first read() from right)
I don't need anywhere near 60 or even 30 FPS, but couldn't manage to reduce "requested FPS" via VideoCapture parameters (if this makes sense)
adding sleeps doesn't seem to help, even between the left/right reads
strangely, if I do much processing in the while loop, I start noticing "lag": things happening in the real world get shown much later in the images read back. This suggests that there actually is a buffer somewhere that can and does accumulate several images (a lot)
I tried a workaround of creating and releasing a separate VideoCapture for each image read, but this is a bit too slow overall (< 1 FPS), and more importantly, the images are too far out of sync for stereo matching to work.
I'm trying to understand why this fails, in order to find solutions. It looks like v4l is allocating a single, too-small global buffer that is somehow shared by the two capture objects.
Any help would be appreciated.
I had the same problem and found this answer - https://superuser.com/questions/431759/using-multiple-usb-webcams-in-linux
Since both Minoru cameras report the format as 'YUYV' (i.e. uncompressed frames), this is likely a USB bandwidth issue. I lowered the frame rate to 20 FPS (it didn't work at 24) and I can now see both 640x480 images.
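For reference, the frame rate can be requested through the same set() interface used for the resolution (in newer OpenCV the constants live directly on cv2, e.g. cv2.CAP_PROP_FPS); this is only a sketch, and whether the driver honours the value depends on the camera and the V4L2 driver:
import cv2
left = cv2.VideoCapture(0)
right = cv2.VideoCapture(1)
for cam in (left, right):
    cam.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 640)
    cam.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 480)
    cam.set(cv2.cv.CV_CAP_PROP_FPS, 20)  # 20 FPS lets both YUYV streams fit in the shared USB bandwidth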
Many embedded/mobile GPUs provide access to performance registers called Pixel Write Speed and Texel Write Speed. Could you explain how those terms can be interpreted and defined from the actual GPU hardware point of view?
I would assume the difference between a pixel and a texel is pretty clear to you. Anyway, just to make this answer a little more "universal":
A pixel is the fundamental unit of screen space.
A texel, or texture element (also texture pixel) is the fundamental unit of texture space.
Textures are represented by arrays of texels, just as pictures are represented by arrays of pixels. When texturing a 3D surface (a process known as texture mapping) the renderer maps texels to appropriate pixels in the output picture.
BTW, it is more common to use fill rate instead of write speed, and you can easily find all the required information, since this terminology is quite old and widely used.
Answering your question
All fill-rate numbers (whatever definition is used) are expressed in Mpixels/sec or Mtexels/sec.
Well, the original idea behind fill rate was the number of finished pixels written to the frame buffer. This fits with the definition of theoretical peak fill rate. So in the good old days it made sense to express that number in Mpixels.
However, with the second generation of 3D accelerators a new feature was added. This feature allows you to render to an off-screen surface and to use that as a texture in the next frame. So the values written to the buffer are not necessarily on-screen pixels anymore; they might be texels of a texture. This process allows several cool special effects: imagine rendering a room, then storing this picture of the room as a texture. Now you don't show this picture of the room, but you use the picture as a texture for a mirror or even a reflection map.
Another reason to use Mtexels is that games are starting to use several layers of multi-texture effects. This means that an on-screen pixel is constructed from various sub-pixels that end up being blended together to form the final pixel. So it makes more sense to express the fill rate in terms of these sub-results, and you could refer to them as texels.
Read the whole article - Fill Rate Explained
Additional details can be found here - Texture Fill Rate
Update
Texture Fill Rate = (# of TMU - texture mapping unit) x (Core Clock)
The number of textured pixels the card can render to the screen every second.
It is obvious that the card with more TMUs will be faster at processing texture information.
The performance registers/counters Pixel Write Speed and Texel Write Speed keep counts/statistics of the pixel and texel operations processed and written. I will explain the peak (maximum possible) fill rates.
Pixel Rate
A picture element is a physical point in a raster image, the smallest element of a display device's screen.
Pixel rate is the maximum amount of pixels the GPU could possibly write to the local memory in one second, measured in millions of pixels per second. The actual pixel output rate also depends on quite a few other factors, most notably the memory bandwidth - the lower the memory bandwidth is, the lower the ability to get to the maximum fill rate.
The pixel rate is calculated by multiplying the number of ROPs (Raster Operations Pipelines, aka Render Output Units) by the core clock speed.
Render Output Units: The pixel pipelines take pixel and texel information and process it, via specific matrix and vector operations, into a final pixel or depth value. The ROPs perform the transactions between the relevant buffers in the local memory.
Importance: the higher the pixel rate, the higher the screen resolution the GPU can handle.
Texel Rate
A texture element is the fundamental unit of texture space (a tile of a 3D object's surface).
Texel rate is the maximum number of texture map elements (texels) that can be processed per second, measured in millions of texels per second.
This is calculated by multiplying the total number of texture units by the core speed of the chip.
Texture Mapping Units: Textures need to be addressed and filtered. This job is done by TMUs, which work in conjunction with pixel and vertex shader units. It is the TMU's job to apply texture operations to pixels.
Importance: the higher the texel rate, the faster texture information is processed and the more fluently demanding games render.
Example: I'm not an NVIDIA fan, but here are the specs for the GTX 680 (I could not find much for embedded GPUs):
Model Geforce GTX 680
Memory 2048 MB
Core Speed 1006 MHz
Shader Speed 1006 MHz
Memory Speed 1502 MHz (6008 MHz effective)
Unified Shaders 1536
Texture Mapping Units 128
Render Output Units 32
Bandwidth 192256 MB/sec
Texel Rate 128768 Mtexels/sec
Pixel Rate 32192 Mpixels/sec
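As a quick check, plugging these numbers into the two formulas above reproduces the quoted rates:
core_clock_mhz = 1006
tmus = 128                              # texture mapping units
rops = 32                               # render output units
texel_rate = tmus * core_clock_mhz      # 128768 Mtexels/sec, matches the spec above
pixel_rate = rops * core_clock_mhz      # 32192 Mpixels/sec, matches the spec above
print(texel_rate, pixel_rate)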
Does the framebuffer contain depth buffer information, or just color buffer information, in graphics applications? What about GUIs on Windows: is there a framebuffer, and does it hold both color and depth info, or just color info?
If you're talking about the kernel-level framebuffer in Linux, it sets both the resolution and the color depth. Here's a list of common framebuffer modes; notice that the modes are determined by both the resolution and the color depth. You can override the framebuffer mode by passing a command line parameter to the kernel in your bootloader (vga=...; for example, vga=791 selects 1024x768 at 16-bit color).
Unlike Linux, on Windows the graphics subsystem is a part of the OS. I don't think (and, please, someone correct me if I'm wrong) there is support for non-VGA output devices in the latest Windows, so the framebuffer is deprecated/unavailable there.
As for real-time 3D, the "standard" buffers are the color buffer, in RGBA format with 1 byte per component, and the depth buffer, typically 3 bytes (24 bits) per sample. There is one sample per fragment (i.e., if you have 8x antialiasing, you will have 8 color and 8 depth samples per pixel).
Now, many applications use additional, custom buffers, usually called g-buffers. These can include: object ID, material ID, albedo, shininess, normals, tangent, binormal, AO factor, the most influential lights affecting the fragment, etc. At 1080p with 4x MSAA and double or triple buffering, this can take huge amounts of memory, so all this information is usually packed as tightly as possible.
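To put a rough number on that for just the standard color and depth attachments (the g-buffer layers come on top of this; the byte sizes below are common defaults, not a universal rule):
width, height, msaa = 1920, 1080, 4
color_bytes = 4                                     # RGBA8
depth_bytes = 4                                     # 24-bit depth + 8-bit stencil packed into 4 bytes
one_msaa_target = width * height * msaa * (color_bytes + depth_bytes)
swap_chain = 3 * width * height * color_bytes       # resolved, triple-buffered swap-chain images
print(one_msaa_target / 2**20, "MB")                # ~63.3 MB for the multisampled color + depth target
print(swap_chain / 2**20, "MB")                     # ~23.7 MB for the three presentable images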
I am using OpenGL, FFmpeg and SDL to play videos and am currently optimizing the process of getting frames, decoding them, converting them from YUV to RGB, uploading them to a texture and displaying the texture on a quad. Each of these stages is performed by a separate thread, and they write to shared buffers which are controlled by SDL mutexes and conditions (except for the upload and display of the textures, as they need to be in the same context).
I have the player working fine with the decode, convert and OpenGL context on separate threads, but realised that because the video is 25 frames per second, I only get a converted frame from the buffer, upload it to OpenGL and bind/display it every 40 milliseconds in the OpenGL thread. Due to this 40 ms gap, the render loop goes around about 6-10 times without showing the next frame for every frame it shows.
Therefore I decided it might be a good idea to have a buffer for the textures too, and set up an array of textures created and initialised with glGenTextures() and the glTexParameter settings I needed, etc.
When it hasn't been 40 ms since the last frame refresh, a method is run which grabs the next converted frame from the convert buffer and uploads it to the next free texture in the texture buffer by binding it and then calling glTexSubImage2D(). When it has been 40 ms since the last frame refresh, a separate method is run which grabs the next GLuint texture from the texture buffer and binds it with glBindTexture(). So effectively, I am just splitting up what was being done before (grab from convert buffer, upload, display) into separate methods (grab from convert buffer, upload to texture buffer | and | grab from texture buffer, display) to make use of the wasted time between 40 ms refreshes.
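A minimal sketch of that ring-buffer idea (illustrative names, written in Python/PyOpenGL for brevity, and assuming each texture was already allocated at the frame size with glTexImage2D):
from collections import deque
from OpenGL.GL import *
NUM_TEXTURES = 10
texture_ids = glGenTextures(NUM_TEXTURES)
free_q = deque(texture_ids)   # textures ready to receive a new frame
ready_q = deque()             # textures holding a frame that is waiting to be shown
def upload_converted_frame(frame_bytes, w, h):
    # Runs whenever there is spare time before the next 40 ms refresh.
    if not free_q:
        return
    tex = free_q.popleft()
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, frame_bytes)
    ready_q.append(tex)
def show_next_frame():
    # Runs once every 40 ms; binds the oldest uploaded texture for the quad.
    if not ready_q:
        return
    tex = ready_q.popleft()
    glBindTexture(GL_TEXTURE_2D, tex)
    # ... draw the textured quad and swap buffers here ...
    free_q.append(tex)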
Does this sound reasonable? Because when run, the video halts all the time in a sporadic manner: sometimes about 4 frames are played when they are supposed to be (every 40 ms), but then there is a 2 second gap, then 1 frame is shown, then a 3 second gap, and the video is totally unwatchable.
The code is near identical to how I manage the convert thread grabbing decoded frames from the decode buffer, converting them from YUV to RGB and then putting them into the convert buffer, so I can't see where the massive bottleneck could be.
Could the bottleneck be on the OpenGL side of things? Is the fact that I am storing new image data in 10 different textures the problem, since when a new texture is grabbed from the texture buffer, its raw data could be a million miles away from the last one in terms of its location in video memory? That's my only attempt at an answer, but I don't know much about how OpenGL works internally, so that's why I am posting here.
Anybody have any ideas?
I'm no OpenGL expert, but my guess is that the bottleneck is that the textures are initialized properly in system memory but are sent to video memory at the 'wrong' time (like all at once instead of as soon as possible), stalling the pipeline. When using glTexSubImage2D you have no guarantee about when a texture arrives in video memory until you bind it.
Googling around, it seems pixel buffer objects give you more control over when textures arrive in video memory: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=262523
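A minimal sketch of the PBO path (Python/PyOpenGL for brevity; tex, w, h and frame_bytes are assumed to exist already, and the equivalent C calls are the same):
import ctypes
from OpenGL.GL import *
# Create one pixel buffer object sized for a full RGB frame.
pbo = glGenBuffers(1)
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo)
glBufferData(GL_PIXEL_UNPACK_BUFFER, w * h * 3, None, GL_STREAM_DRAW)
# Copy the converted frame into the PBO; the driver can now schedule the
# transfer to video memory on its own timeline.
ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY)
ctypes.memmove(ptr, frame_bytes, w * h * 3)
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER)
# With a PBO bound, the last argument of glTexSubImage2D is an offset into the
# buffer rather than a client pointer, so the call can return without waiting
# for the upload to finish.
glBindTexture(GL_TEXTURE_2D, tex)
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, ctypes.c_void_p(0))
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0)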