How does the GPU render to the correct window? - linux

I don't understand how rendering to an OpenGL window works. Let's say I want my graphics card to draw a scene onto the screen. My OpenGL window is at location, let's say, 50, 50, 100, 100 (x, y, width, height). The GPU knows to draw the scene at this location. This is called the viewport, and the GPU gets this viewport information in the form of a matrix.
What happens when I move my OpenGL window from location 50, 50, 100, 100 to location 150, 150, 100, 100? How does the GPU get this new viewport information while it is still rendering the scene (maybe it is now in the fragment shader stage)? It will paint the wrong pixels at location 50, 50, 100, 100. Please explain.

My OpenGL window is at location, let's say, 50, 50, 100, 100 (x, y, width, height). The GPU knows to draw the scene at this location.
Does it? You are making assumptions about what the internals of the rendering system, OS, and GPU are doing.
In many composition-based rendering systems, the rendering operation you generate when you perform OpenGL commands renders to a piece of memory, some image data. When you do a swap-buffers call, that image gets copied into the actual screen area that you see. So when you move the window around, all that happens is that the image you render to gets copied into a different location. While the copy is indeed performed by the GPU, the rendering operations you directly caused aren't aware of what location their results will ultimately end up in.
Even in rendering systems where you render directly to display memory, who's to say that the "viewport" you set is not manipulated by the internal OpenGL driver? When you move the window from one location to another, the driver can simply offset your viewport settings, and you will be none the wiser.
Basically, there are many ways for it to happen, and which way gets selected depends entirely on how the OpenGL implementation (and attendant OS/rendering system) chooses to handle screen locations.
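To make the composition case concrete, here is a toy model in plain Python. It involves no GPU and no OpenGL; the function names are invented for illustration. It only shows the idea that the application renders into its own buffer, and a separate step decides where that buffer lands on screen.

```python
# Toy model of a compositing window system (no real GPU or OpenGL involved).
# All names here are hypothetical and exist only for illustration.

def render_scene(width, height):
    """The "application" renders into its own off-screen buffer.

    Coordinates are relative to the buffer; the window position is unknown here.
    """
    return [[(x + y) % 10 for x in range(width)] for y in range(height)]


def composite(screen, window_buffer, window_x, window_y):
    """The "compositor" copies the finished buffer to wherever the window is."""
    for row_index, row in enumerate(window_buffer):
        for col_index, pixel in enumerate(row):
            screen[window_y + row_index][window_x + col_index] = pixel


def blank_screen(width, height):
    """Create an empty screen where None marks an untouched pixel."""
    return [[None] * width for _ in range(height)]


frame = render_scene(4, 3)               # rendered once, position-independent

screen_before = blank_screen(20, 20)
composite(screen_before, frame, 5, 5)    # window at (5, 5)

screen_after = blank_screen(20, 20)
composite(screen_after, frame, 15, 15)   # window moved to (15, 15)
```

The same frame is used for both window positions; only the copy destination changes, which is why the rendering commands never need to learn the new location.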

I don't know the entire internals of the graphics stack, but when you request a viewport, you are getting a pixel buffer from a display server such as the X server or Wayland. The GPU does computations to fill a frame buffer with colors. The display server then presents the frame buffer in the window. This is an oversimplification, and I may have some details wrong. On Linux, a complex pipeline of events takes place between leaving user space, entering the kernel, and finally seeing your frame buffer on the screen.

Related

Python script skyrockets size of pagefile.sys

I wrote a Python script that tends to crash sometimes with a Memory Allocation Error. I noticed that the pagefile.sys of my Win10 64 system skyrockets in this script and exceeds the free memory.
My current solution is to run the script in steps, so that every time the script runs through, the pagefile empties.
I would like the script to run through all at once, though.
Moving the pagefile to another drive is not an option, unfortunately, because I only have this one drive and moving the pagefile to an external drive does not seem to work.
During my research, I found out about the module gc but that is not working:
import gc
and after every iteration I use
gc.collect()
Am I using it wrong or is there another (python-based!) option?
[Edit:]
The script is very basic and only iterates over image files (using Pillow). The script only checks for width, height and resolution of the image, calculates the dimensions in cm.
If height > width, the image is rotated 90° counterclockwise.
The images are meant to be enlarged or shrunk to A3 size (42 x 29.7cm), so I use the width/height ratio to calculate whether I can enlarge the width to 42cm and the height remains < 29.7cm and in case the height is > 29.7cm, I enlarge the height to 29.7 cm.
For the moment, I do the actual enlargement/shrinking still in Photoshop. Based on whether it is a width/height enlargement, the file is moved to a certain folder that contains either one of those file types.
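The sizing rule described above could be written as a small function. This is only a sketch under assumptions: the names pixels_to_cm and fit_to_a3 are hypothetical, the dpi value is passed in as a plain number rather than read from a file, and an image that lands exactly on 29.7 cm is treated as fitting.

```python
# Sketch only: function and constant names are hypothetical.
A3_WIDTH_CM = 42.0
A3_HEIGHT_CM = 29.7
CM_PER_INCH = 2.54


def pixels_to_cm(pixels, dpi):
    """Convert a pixel count to centimetres at the given resolution."""
    return pixels / dpi * CM_PER_INCH


def fit_to_a3(width_px, height_px, dpi):
    """Return (limiting_side, scale, new_width_cm, new_height_cm) for a landscape A3 fit."""
    # Portrait images are rotated by 90 degrees first, which swaps the sides.
    if height_px > width_px:
        width_px, height_px = height_px, width_px
    width_cm = pixels_to_cm(width_px, dpi)
    height_cm = pixels_to_cm(height_px, dpi)
    # Try scaling the width to 42 cm; fall back to the height if that overshoots.
    scale = A3_WIDTH_CM / width_cm
    limiting_side = "width"
    if height_cm * scale > A3_HEIGHT_CM:
        scale = A3_HEIGHT_CM / height_cm
        limiting_side = "height"
    return limiting_side, scale, width_cm * scale, height_cm * scale
```

The first element of the result ("width" or "height") is the value that would decide which folder a file is moved to.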
Anyways, the memory explosion happens in the iteration that only reads the file dimensions.
For that I use
from PIL import Image

with Image.open(imgOri) as pic:
    widthPX = pic.size[0]
    heightPX = pic.size[1]
    resolution = pic.info["dpi"][0]
    widthCM = float(widthPX) / resolution * 2.54
    heightCM = float(heightPX) / resolution * 2.54
I also calculate whether the shrinking would be too strong; if so, the image gets divided in half and re-evaluated.
Even though it is unnecessary, I still added pic.close to the with open() statement, because I thought Python might be keeping the image files open, but that didn't help.
Once the iteration finished, the pagefile.sys goes back to its original size, so in case that error occurs, I take some files out and do them gradually.

How to update texture for every frame in vulkan?

As my question title says, I want to update a texture every frame.
I have an idea:
create a VkImage as a texture buffer with the configuration below:
initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED
usage = VK_IMAGE_USAGE_SAMPLED_BIT
and its memory type is VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
In the draw loop:
First frame:
Map the texture data to the VkImage (using vkMapMemory).
Change the VkImage layout from VK_IMAGE_LAYOUT_PREINITIALIZED to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
Use this VkImage as the texture buffer.
Second frame:
The layout will be VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the first frame. Can I map the next texture data to this VkImage directly without changing its layout? If I cannot do that, which layout can I change this VkImage to?
The Vulkan spec, section 11.4, says:
The new layout used in a transition must not be VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED
So I cannot change the layout back to _PREINITIALIZED.
Any help will be appreciated.
For your case you do not need LAYOUT_PREINITIALIZED. That would only complicate your code (forcing you to provide separate code for the first frame).
LAYOUT_PREINITIALIZED is indeed a very special layout intended only for the start of the life of the Image. It is more useful for static textures.
Start with LAYOUT_UNDEFINED and use LAYOUT_GENERAL when you need to write the Image from CPU side.
I propose this scheme:
before render loop
1. Create your VkImage with UNDEFINED
1st to Nth frame (aka render loop)
1. Transition image to GENERAL
2. Synchronize (likely with VkFence)
3. Map the image, write it, unmap it (well, the mapping and unmapping can perhaps be outside the render loop)
4. Synchronize (potentially done implicitly)
5. Transition image to whatever layout you need next
6. Do your rendering and whatnot
7. start over at 1.
It is a naive implementation but should suffice for ordinary hobbyist uses.
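The ordering of that loop can be sketched with a toy model in Python. This is not the Vulkan API: ToyImage and run_frames are made-up names, synchronization is left out because everything here runs sequentially, and the model accepts host writes only in GENERAL, as the scheme above does.

```python
# Toy model of the image-layout rules discussed above. This is NOT the Vulkan API;
# the class and function names are made up for illustration.

FORBIDDEN_NEW_LAYOUTS = {"UNDEFINED", "PREINITIALIZED"}


class ToyImage:
    def __init__(self, initial_layout="UNDEFINED"):
        # An image can only start its life in one of these two layouts.
        if initial_layout not in ("UNDEFINED", "PREINITIALIZED"):
            raise ValueError("initialLayout must be UNDEFINED or PREINITIALIZED")
        self.layout = initial_layout
        self.data = None

    def transition(self, new_layout):
        # A transition may never target UNDEFINED or PREINITIALIZED.
        if new_layout in FORBIDDEN_NEW_LAYOUTS:
            raise ValueError(f"cannot transition to {new_layout}")
        self.layout = new_layout

    def host_write(self, data):
        # Simplification: this model only accepts CPU writes in GENERAL.
        if self.layout != "GENERAL":
            raise RuntimeError(f"host write in layout {self.layout}")
        self.data = data


def run_frames(image, frames):
    """Run the per-frame steps: to GENERAL, write, to a shader-readable layout, render."""
    shown = []
    for texels in frames:
        image.transition("GENERAL")
        image.host_write(texels)
        image.transition("SHADER_READ_ONLY_OPTIMAL")
        shown.append(image.data)  # stands in for "do your rendering"
    return shown
```

Because the loop always passes through GENERAL before writing, the first frame needs no special case, which is the point of starting from UNDEFINED rather than PREINITIALIZED.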
Double-buffered access can be implemented, e.g. with a VkBuffer for CPU access and a matching VkImage for GPU access. A vkCmdCopy* command must then be used for the data hand-off.
It is not that much more complicated than the above approach and there can be some performance benefits (if you need those at your stage of your project). You usually want your resources in device local memory, which often is not also host visible.
It would go something like:
before render loop
1. Create your VkBuffer b backed by HOST_VISIBLE memory and map it (a buffer has no layout, so there is nothing to set to UNDEFINED)
2. Create your VkImage i with UNDEFINED backed by DEVICE_LOCAL memory
3. Prepare your synchronization primitives between i and b: e.g. two semaphores, or events could be used, or barriers if the transfer is in the same queue
1st to Nth frame (aka render loop)
Operations on b and i can be pretty detached (they can even be on different queues), so:
For b:
1. Make b accessible to the host (a VkBuffer has no layout, so no layout transition is involved)
2. Synchronize to CPU (likely waiting on VkFence or vkQueueWaitIdle)
3. invalidate (if non-coherent), write it, flush (if non-coherent)
4. Synchronize to GPU (done implicitly if 3. happens before queue submission)
5. Make b readable by the transfer (again without a layout transition)
6. Synchronize to make sure i is not in use (likely waiting on a VkSemaphore)
7. Transition i to TRANSFER_DST_OPTIMAL
8. Do vkCmdCopy* from b to i
9. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
10. start over at 1.
(The fence at 2. and semaphore at 6. have to be pre-signalled or skipped for the first frame to work)
For i:
1. Synchronize to make sure i is free to use (likely waiting on a VkSemaphore)
2. Transition i to whatever is needed
3. Do your rendering
4. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
5. start over at 1.
You have a number of problems here.
First:
create a VkImage as a texture buffer
There's no such thing. The equivalent of an OpenGL buffer texture is a Vulkan buffer view. This does not use a VkImage of any sort. VkBufferViews do not have an image layout.
Second, assuming that you are working with a VkImage of some sort, you have recognized the layout problem. You cannot modify the memory behind the texture unless the texture is in the GENERAL layout (among other things). So you have to force a transition to that, wait until the transition command has actually completed execution, then do your modifications.
Third, Vulkan is asynchronous in its execution, and unlike OpenGL, it will not hide this from you. The image in question may still be accessed by the shader when you want to change it. So usually, you need to double buffer these things.
On frame 1, you set the data for image 1, then render with it. On frame 2, you set the data for image 2, then render with it. On frame 3, you overwrite the data for image 1 (using events to ensure that the GPU has actually finished frame 1).
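As a rough illustration of that alternation, the following Python sketch computes which image each frame writes to and which earlier frame has to be finished first. It is a bookkeeping model only, not Vulkan code, and schedule_frames is a hypothetical name.

```python
# Toy model of double buffering (not Vulkan API; all names are hypothetical).

def schedule_frames(frame_count, image_count=2):
    """For each frame, return (frame, image_index, frame_to_wait_for).

    frame_to_wait_for is the earlier frame that last used the same image and
    must have finished on the GPU before the CPU overwrites it; None means
    the image has not been used yet.
    """
    plan = []
    for frame in range(1, frame_count + 1):
        image_index = (frame - 1) % image_count + 1
        previous_user = frame - image_count
        wait_for = previous_user if previous_user >= 1 else None
        plan.append((frame, image_index, wait_for))
    return plan


plan = schedule_frames(4)
```

With two images, frame 3 lands on image 1 again and has to wait for frame 1, matching the description above. In real code the wait would be whatever event or fence was signalled when that earlier frame completed.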
Alternatively, you can use double-buffering without possible CPU waiting, by using staging buffers. That is, instead of writing to images directly, you write to host-visible memory. Then you use a vkCmdCopyBufferToImage command to copy that data into the image. This way, the CPU doesn't have to wait on events or fences to make sure that the image is in the GENERAL layout before sending data.
And BTW, Vulkan is not OpenGL. Mapping of memory is always persistent; there's no reason to unmap a piece of memory if you're going to map it every frame.

Use of PBO and VBO

My application (Qt/OpenGL) needs to upload, at 25 fps, a bunch of videos from IP cameras, and then process them by applying:
for each video, a demosaic filter, sharpening filter, LUT and distortion correction.
Then I need to render in OpenGL (texture projection, etc.), picking one or more of the frames processed earlier.
Then I need to show the result in some widgets (QGLWidget) and read the pixels back to write them into a movie file.
I am trying to understand the pros and cons of PBO and FBO, and I picture the following architecture, which I want to validate with your help:
I create one thread per video to capture into a buffer (array of images). There is one buffer per video.
I create an upload-filter-render thread which aims to: a) upload the frames to the GPU, b) apply the filters on the GPU, c) apply the composition and render to a texture.
I let the GUI thread render the texture created in the previous step in my widget.
For the upload-frames-to-GPU process, I guess the best way is to use PBOs (maybe two PBOs) for each video, to load the frames asynchronously.
For the apply-filter-on-GPU step, I want to use an FBO, which seems the best way to do render-to-texture. I will first bind the texture uploaded by the PBO, and then render the filtered image to another texture. I am not sure whether to use only one FBO and change the input texture binding and the target texture binding according to the video being uploaded, or to use as many FBOs as there are videos to upload.
Finally, in order to show the result in a widget, I use the final texture rendered by the FBO. For writing into a movie file, I use a PBO to copy the pixels back asynchronously from GPU to CPU.
Does it seem correct?

Texture Buffer for OpenGL Video Player

I am using OpenGL, FFmpeg and SDL to play videos and am currently optimizing the process of getting frames, decoding them, converting them from YUV to RGB, uploading them to a texture and displaying the texture on a quad. Each of these stages is performed by a separate thread, and they write to shared buffers which are controlled by SDL mutexes and conditions (except for the upload and display of the textures, as they need to be in the same context).
I have the player working fine with the decode, convert and OpenGL context on separate threads, but realised that because the video is 25 frames per second, I only get a converted frame from the buffer, upload it to OpenGL and bind/display it every 40 milliseconds in the OpenGL thread. For every frame it shows, the render loop goes round about 6-10 times without showing a new frame, due to this 40 ms gap.
Therefore I decided it might be a good idea to have a buffer for the textures too, and set up an array of textures created and initialised with glGenTextures() and the glTexParameter settings I needed, etc.
When it hasn't been 40 ms since the last frame refresh, a method is run which grabs the next converted frame from the convert buffer and uploads it to the next free texture in the texture buffer by binding it and then calling glTexSubImage2D(). When it has been 40 ms since the last frame refresh, a separate method is run which grabs the next GLuint texture from the texture buffer and binds it with glBindTexture(). So effectively, I am just splitting up what was being done before (grab from convert buffer, upload, display) into separate methods (grab from convert buffer, upload to texture buffer | and | grab from texture buffer, display) to make use of the wasted time between 40 ms refreshes.
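The intended split between uploading and displaying can be sketched as a small simulation in Python. This is a model of the bookkeeping only, with no OpenGL calls; TextureRing and simulate are hypothetical names, slot ids stand in for GLuint texture names, and the loop period of 5 ms is an assumption.

```python
from collections import deque


class TextureRing:
    """Fixed pool of texture slots: free ones take uploads, ready ones get displayed."""

    def __init__(self, slot_count):
        self.free = deque(range(slot_count))  # slot ids standing in for GLuint names
        self.ready = deque()                  # (slot id, frame) in display order
        self.on_screen = None                 # slot currently shown; must not be overwritten

    def upload(self, frame):
        """Store a converted frame in the next free slot; False if the ring is full."""
        if not self.free:
            return False
        self.ready.append((self.free.popleft(), frame))
        return True

    def display(self):
        """Show the oldest uploaded frame; None if nothing is ready."""
        if not self.ready:
            return None
        slot, frame = self.ready.popleft()
        if self.on_screen is not None:
            self.free.append(self.on_screen)  # the previously shown slot is reusable now
        self.on_screen = slot
        return frame


def simulate(frames, slot_count=3, loop_ms=5, frame_interval_ms=40):
    """Run the render loop: display every frame_interval_ms, upload in between."""
    ring = TextureRing(slot_count)
    pending = deque(frames)
    shown = []
    now = 0
    last_refresh = 0
    while pending or ring.ready:
        if now - last_refresh >= frame_interval_ms:
            frame = ring.display()
            if frame is not None:
                shown.append((now, frame))
            last_refresh = now
        elif pending and ring.upload(pending[0]):
            pending.popleft()
        now += loop_ms
    return shown


shown = simulate(["f0", "f1", "f2", "f3"])
```

In this model the frames come out in order, one per 40 ms, and an upload never touches the slot that is currently on screen. If the real player stalls while the bookkeeping behaves like this, the delay is more likely in the uploads themselves or in thread synchronization than in the ring logic.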
Does this sound reasonable? Because when run, the video halts all the time in a sporadic manner: sometimes about 4 frames are played when they are supposed to be (every 40 ms), but then there is a 2 second gap, then 1 frame is shown, then a 3 second gap, and the video is totally unwatchable.
The code is nearly identical to how I manage the convert thread grabbing decoded frames from the decode buffer, converting them from YUV to RGB and then putting them into the convert buffer, so I can't see where the massive bottleneck could be.
Could the bottleneck be on the OpenGL side of things? Is the fact that I am storing new image data to 10 different textures the problem, since when a new texture is grabbed from the texture buffer, the raw data could be a million miles away from the last one in terms of location in video memory? That's my only attempt at an answer, but I don't know much about how OpenGL works internally, so that's why I am posting here.
Anybody have any ideas?
I'm no OpenGL expert, but my guess at the bottleneck is that the textures are initialized properly in system memory but are sent to video memory at the 'wrong' time (for example all at once instead of as soon as possible), stalling the pipeline. When using glTexSubImage2D you have no guarantees about when a texture arrives in video memory until you bind it.
Googling around, it seems pixel buffer objects give you more control over when the data is in video memory: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=262523

glDrawElements flicker and crash

I'm getting a fishy error when using glDrawElements(). I'm trying to render simple primitives (mainly rectangles) to speed up drawing of text and so forth, but when I call glDrawElements() the WHOLE screen blinks black (not just my window area) for one frame or so. The next frame it turns back to the same "Windows colors" as before. And so it flickers for a couple of seconds, ending up in a message box saying
The NVIDIA OpenGL Driver encountered an unrecoverable error
and must close this application.
Error 12
Is there any setting for GL which I need to reset before calling glDrawElements()? I know it's not some dangling glEnableClientState(), I checked it (I used to have one of those, but then glDrawElements() crashed instead).
Come to think of it, it almost looks like some back buffer error... Any ideas on what to try?
Obviously you are mixing VBO mode and VA (vertex array) mode. This is perfectly possible but must be used with care.
When you call:
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
This means that the next time you render something with glDrawElements(..., ..., ..., x), it will use x as a pointer to the index data, and the last call to glVertexPointer points to the vertex data.
If you don't unbind the current VBO and IBO (with the above two glBindBuffer calls), then when rendering with the same glDrawElements call, x will be used as an offset into the index data in the IBO, and the pointer from the last call to glVertexPointer as an offset into the vertex data in the VBO.
Depending on the values of x and glVertexPointer, you can make the driver crash because the offsets go out of bounds and/or the underlying data is of the wrong type (NaN).
So to answer your question, when drawing in VA mode after drawing in VBO mode:
unbind the current VBO
unbind the current IBO
specify the right vertices address with glVertexPointer
specify the right indices address with glDrawElements
and then it will be fine.
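The binding rule can be mimicked with a toy model in Python. It is not OpenGL: ToyContext is a hypothetical class, offsets are counted in elements rather than bytes, and where this model raises an exception a real driver would read out of bounds or crash.

```python
# Toy model of how the last argument of glDrawElements is interpreted.
# Not OpenGL: the class below is hypothetical and only mimics the binding rule.

class ToyContext:
    def __init__(self):
        self.element_buffer = None  # contents of the bound index buffer, if any

    def bind_element_buffer(self, data):
        """Passing None plays the role of glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0)."""
        self.element_buffer = data

    def draw_elements(self, count, indices):
        """Return the indices that would be drawn."""
        if self.element_buffer is not None:
            # A buffer is bound: "indices" is treated as an offset into it.
            if not isinstance(indices, int):
                raise TypeError("client-side array passed while an index buffer is bound")
            if indices + count > len(self.element_buffer):
                raise IndexError("offset runs past the end of the index buffer")
            return self.element_buffer[indices:indices + count]
        # Nothing is bound: "indices" is the client-side index array itself.
        return list(indices[:count])


ctx = ToyContext()
ctx.bind_element_buffer([0, 1, 2, 2, 3, 0])
from_buffer = ctx.draw_elements(3, 3)           # offset 3 into the bound buffer

ctx.bind_element_buffer(None)                   # unbind before using client arrays
from_client = ctx.draw_elements(3, [0, 1, 2])   # now the argument is the array
```

Forgetting the unbind step and passing the client array anyway is the mistake the model rejects with TypeError; in OpenGL the pointer value would silently be taken as an offset instead.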
Bah! Found it. When I did
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
before rendering, the flickering and crashing stopped. Is this the expected behavior? Sorry for wasting time and space.

Resources