glDrawElements flicker and crash - visual-c++

I'm getting a fishy error when using glDrawElements(). I'm trying to render simple primitives (mainly rectangles) to speed up drawing of text and so forth, but when I call glDrawElements() the WHOLE screen blinks black (not just my window area) for one frame or so. The next frame it turns back to the same "Windows colors" as before. And so it flickers for a couple of seconds, ending up in a message box saying
The NVIDIA OpenGL Driver encountered an unrecoverable error
and must close this application.
Error 12
Is there any setting for GL which I need to reset before calling glDrawElements()? I know it's not some dangling glEnableClientState(), I checked it (I used to have one of those, but then glDrawElements() crashed instead).
Come to think of it, it almost looks like some back buffer error... Any ideas on what to try?

Obviously you are mixing VBO mode and VA (vertex array) mode. This is perfectly possible but must be used with care.
When you call:
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
This means that the next time you render something with glDrawElements(..., ..., ..., x), x is used as a pointer to the index data, and the address given in the last call to glVertexPointer is used as a pointer to the vertex data.
If you don't unbind the current VBO and IBO (with the two glBindBuffer calls above), then for the same glDrawElements call x is used as an offset into the index data in the IBO, and the value given in the last call to glVertexPointer is used as an offset into the vertex data in the VBO (the VBO binding is captured at the moment glVertexPointer is called).
Depending on the values of x and of the glVertexPointer argument, you can make the driver crash because the offsets go out of bounds and/or the underlying data is of the wrong type (NaN).
So, to answer your question: after drawing in VBO mode, and before drawing in VA mode:
unbind the current VBO
unbind the current IBO
specify the right vertices address with glVertexPointer
specify the right indices address with glDrawElements
and then it will be fine.
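The rule above can be illustrated with a toy model. This is a sketch, not real OpenGL: BufferBindings, resolveIndices and offsetInBounds are hypothetical names made up for the illustration, and 16-bit indices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of the two bindings that matter here (hypothetical, not the GL API).
struct BufferBindings {
    unsigned arrayBuffer = 0;         // set by glBindBuffer(GL_ARRAY_BUFFER, id)
    unsigned elementArrayBuffer = 0;  // set by glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, id)
};

enum class Source { ClientMemory, BufferOffset };

struct Resolved {
    Source source;
    std::uintptr_t value;  // an address, or a byte offset into the bound buffer
};

// How the last argument of glDrawElements is interpreted:
// binding 0 means "real pointer", any other binding means "offset into that buffer".
Resolved resolveIndices(const BufferBindings& bindings, const void* x) {
    const auto value = reinterpret_cast<std::uintptr_t>(x);
    return {bindings.elementArrayBuffer == 0 ? Source::ClientMemory
                                             : Source::BufferOffset,
            value};
}

// Would reading `count` 16-bit indices at `offset` stay inside a buffer of `bufferBytes`?
bool offsetInBounds(std::uintptr_t offset, std::size_t count, std::size_t bufferBytes) {
    return offset <= bufferBytes &&
           count * sizeof(std::uint16_t) <= bufferBytes - offset;
}
```

With an IBO still bound, the address of a client-side index array is treated as a byte offset, which lands far outside any realistically sized buffer; that matches the out-of-bounds case described above.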

Bah! Found it. When I did
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
before rendering the flickering+crashing stopped. Is this the expected behavior? Sorry for wasting time and space.

Related

Direct2D/DirectWrite Does IDWriteFactory::CreateTextLayout need to release every time when update the text in CreateTextLayout?

I am using Direct2D to show some text (FPS, resolution, etc.) on a Direct3D surface. The weird thing is this: my window class has a method called CalculateFrameStats() that runs on every loop iteration, calculates the FPS and related information, and uses IDWriteFactory::CreateTextLayout to create a new text layout from the latest FPS strings. The 3DFrameDraw() function then calls BeginDraw(), DrawTextLayout() and EndDraw(). After that I don't release the text layout pointer. On the next iteration CalculateFrameStats() calls CreateTextLayout again with the newly updated strings, 3DFrameDraw() draws the text layout again, and so on, over and over. Yet when I run the program there seem to be no memory leaks at all: the memory usage stays low and constant.
But when I put IDWriteFactory::CreateTextLayout in the 3DFrameDraw() function, so that every frame I create a new text layout with the updated FPS string at the beginning, do some 3D manipulations, and call BeginDraw(), DrawTextLayout() and EndDraw() just before the D3D present (the same place as in the previous 3DFrameDraw()), the memory leaks, and I can see the memory usage keep growing as time passes. If I add a Release() call on the text layout pointer after BeginDraw(), DrawTextLayout() and EndDraw(), the leak is gone.
I don't understand why, in the first scenario, the text layout pointer is never released until the program closes and yet the memory never leaks. Does the text layout need to be released every time (every frame) its text string is updated?

How to update texture for every frame in vulkan?

As my question title says, I want to update a texture every frame.
I have an idea:
Create a VkImage as a texture buffer with the configuration below:
initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED
usage= VK_IMAGE_USAGE_SAMPLED_BIT
and its memory type is VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
In draw loop :
first frame :
map texture data to the VkImage (using vkMapMemory).
change VkImage layout from VK_IMAGE_LAYOUT_PREINITIALIZED to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
use this VkImage as texture buffer.
second frame:
The layout will be VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the first frame. Can I map the next texture data to this VkImage directly without changing its layout? If I cannot do that, which layout can I change this VkImage to?
In vkspec 11.4 it says :
The new layout used in a transition must not be VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED
So , I can not change layout back to _PREINITIALIZED .
Any help will be appreciated.
For your case you do not need LAYOUT_PREINITIALIZED. That would only complicate your code (forcing you to provide separate code for the first frame).
LAYOUT_PREINITIALIZED is indeed a very special layout intended only for the start of the life of the Image. It is more useful for static textures.
Start with LAYOUT_UNDEFINED and use LAYOUT_GENERAL when you need to write the Image from CPU side.
I propose this scheme:
Before the render loop:
1. Create your VkImage with UNDEFINED
1st to Nth frame (aka the render loop):
1. Transition the image to GENERAL
2. Synchronize (likely with a VkFence)
3. Map the image, write it, unmap it (well, the mapping and unmapping can perhaps be done outside the render loop)
4. Synchronize (potentially done implicitly)
5. Transition the image to whatever layout you need next
6. Do your rendering and whatnot
7. Start over at 1.
It is a naive implementation but should suffice for ordinary hobbyist uses.
Double-buffered access can also be implemented: for example, a VkBuffer for CPU access and a VkImage holding the same data for GPU access. The data hand-off between them must then be done with a vkCmdCopy* command.
It is not that much more complicated than the above approach and there can be some performance benefits (if you need those at your stage of your project). You usually want your resources in device local memory, which often is not also host visible.
It would go something like:
Before the render loop:
1. Create your VkBuffer b with UNDEFINED, backed by HOST_VISIBLE memory, and map it
2. Create your VkImage i with UNDEFINED, backed by DEVICE_LOCAL memory
3. Prepare your synchronization primitives between i and b: e.g. two semaphores, or events, or barriers if the transfer is on the same queue
1st to Nth frame (aka the render loop):
Operations on b and i can be pretty detached (they can even be on different queues), so:
For b:
1. Transition b to GENERAL
2. Synchronize to CPU (likely waiting on a VkFence or vkQueueWaitIdle)
3. Invalidate (if non-coherent), write it, flush (if non-coherent)
4. Synchronize to GPU (done implicitly if 3. happens before queue submission)
5. Transition b to TRANSFER
6. Synchronize to make sure i is not in use (likely waiting on a VkSemaphore)
7. Transition i to TRANSFER
8. Do vkCmdCopy* from b to i
9. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
10. Start over at 1.
(The fence at 2. and the semaphore at 6. have to be pre-signalled or skipped for the first frame to work)
For i:
1. Synchronize to make sure i is free to use (likely waiting on a VkSemaphore)
2. Transition i to whatever is needed
3. Do your rendering
4. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
5. Start over at 1.
You have a number of problems here.
First:
create a VkImage as a texture buffer
There's no such thing. The equivalent of an OpenGL buffer texture is a Vulkan buffer view. This does not use a VkImage of any sort. VkBufferViews do not have an image layout.
Second, assuming that you are working with a VkImage of some sort, you have recognized the layout problem. You cannot modify the memory behind the texture unless the texture is in the GENERAL layout (among other things). So you have to force a transition to that, wait until the transition command has actually completed execution, then do your modifications.
Third, Vulkan is asynchronous in its execution, and unlike OpenGL, it will not hide this from you. The image in question may still be accessed by the shader when you want to change it. So usually, you need to double buffer these things.
On frame 1, you set the data for image 1, then render with it. On frame 2, you set the data for image 2, then render with it. On frame 3, you overwrite the data for image 1 (using events to ensure that the GPU has actually finished frame 1).
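The alternation in the frame 1 / frame 2 / frame 3 description can be sketched as plain index arithmetic. This is only bookkeeping under stated assumptions: frames are numbered from 1 as in the text, two images are used, and imageForFrame and frameToWaitFor are hypothetical helper names. The actual waiting would still be done with whatever Vulkan synchronization primitive you choose.

```cpp
#include <cassert>
#include <cstdint>

// Number of images being alternated (2 = double buffering, as in the text).
constexpr std::uint64_t kImageCount = 2;

// Which image (1-based) a given frame (1-based) writes and then renders with.
std::uint64_t imageForFrame(std::uint64_t frame) {
    return (frame - 1) % kImageCount + 1;
}

// Which earlier frame must have finished on the GPU before `frame` may
// overwrite its image. Returns 0 when there is nothing to wait for,
// i.e. the image has not been used by any earlier frame.
std::uint64_t frameToWaitFor(std::uint64_t frame) {
    if (frame <= kImageCount) {
        return 0;
    }
    return frame - kImageCount;
}
```

For frame 3 this yields image 1 and a wait on frame 1, which is the case spelled out above; raising kImageCount gives the CPU more slack at the cost of more memory.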
Alternatively, you can use double-buffering without possible CPU waiting, by using staging buffers. That is, instead of writing to images directly, you write to host-visible memory. Then you use a vkCmdCopyBufferToImage command to copy that data into the image. This way, the CPU doesn't have to wait on events or fences to make sure that the image is in the GENERAL layout before sending data.
And BTW, Vulkan is not OpenGL. Mapping of memory is always persistent; there's no reason to unmap a piece of memory if you're going to map it every frame.

VTK efficiency of vtkRenderWindow::GetZbufferData and vtkWindowToImageFilter::Update

I have a snippet that converts a VTK (off-screen) rendering to 1) a point cloud and 2) a color image. The implementation is correct; the only issue is speed/efficiency.
At the beginning of every iteration, I update my rendering by calling:
renderWin->Render ();
For point cloud, I get the depth using following line and then convert it to point cloud (code not posted).
float *depth = new float[width * height];
renderWin->GetZbufferData (0, 0, width - 1, height - 1, &(depth[0]));
For color image, I use vtkWindowToImageFilter to get current color rendered image:
windowToImageFilter->Modified(); // Must have this to get updated rendered image
windowToImageFilter->Update(); // this line takes a lot of time
render_img_vtk = windowToImageFilter->GetOutput();
The above program runs sequentially in a single thread. The render window size is about 1000x1000. There is not much polydata to render. VTK was compiled with OpenGL2 support.
Issue:
This code only runs at about 15-20Hz. When I disable/comment out the windowToImageFilter part (vtkWindowToImageFilter::Update() takes a lot of time), the frame rate goes to about 30Hz.
When I disable/comment out vtkRenderWindow::GetZbufferData, it goes up to 50Hz (which is how fast I call my loop and update the rendering).
I had a quick look at the VTK source for these two functions and saw that they copy data using GL commands. I am not sure how I can speed this up.
Update:
After some searching, I found that the glReadPixels call in GetZbufferData causes the delay, as it tries to synchronize the data. Please see this post: OpenGL read pixels faster than glReadPixels.
That post suggests using a PBO. VTK has a class vtkPixelBufferObject, but I can find no example of using it to avoid blocking the pipeline when calling glReadPixels().
So how can I do this within the VTK pipeline?
My answer is just about the GetZbufferData portion.
From what I can tell, vtkOpenGLRenderWindow already uses glReadPixels with little overhead.
What happens after that is, I believe, what can introduce overhead. The main thing to note is that vtkOpenGLRenderWindow has 3 method overloads for GetZbufferData. You are using the overload with the same signature as the one used in vtkWindowToImageFilter.
I believe you are copying that part of the implementation in vtkWindowToImageFilter, which makes total sense. What do you do with the float pointer depthBuffer after you get it? Looking at the vtkWindowToImageFilter implementation, I see that they have a for loop that calls memcpy. I believe their memcpy has to be in a for loop in order to deal with spacing, because of the variables inIncrY and outIncrY. For your situation you should only have to call memcpy once and then free the array pointed to by depthBuffer, unless you are just using the pointer directly. In that case you have to think about who has to delete that float array, because it was created with new.
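The difference between the per-row memcpy loop and a single memcpy can be sketched as follows. This is plain C++ rather than VTK code; copyRows and copyPacked are hypothetical helpers, and the strides are counted in floats, standing in for the role that inIncrY and outIncrY play in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Row-by-row copy: needed when source or destination rows are padded,
// i.e. when a stride is larger than the row width.
void copyRows(const float* src, std::size_t srcStride,
              float* dst, std::size_t dstStride,
              std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + y * dstStride, src + y * srcStride,
                    width * sizeof(float));
    }
}

// Single copy: enough when both buffers are tightly packed
// (stride == width), as for a plain width * height depth array.
void copyPacked(const float* src, float* dst,
                std::size_t width, std::size_t height) {
    std::memcpy(dst, src, width * height * sizeof(float));
}
```

Holding the depth values in a std::vector<float> instead of an array created with new also settles the question of who deletes it, since the vector frees its storage when it goes out of scope.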
I think the better option is to use the method with this signature: int GetZbufferData( int x1, int y1, int x2, int y2, vtkFloatArray* z )
In Python that looks like this:
import vtk
# create render pipeline (not shown)
# define image bounds (not shown)
vfa = vtk.vtkFloatArray()
ib = image_bounds
render_window.GetZbufferData(ib[0], ib[1], ib[2], ib[3], vfa)
The major benefit is that the pointer for the vtkFloatArray gets handed straight to glReadPixels. Also, VTK will take care of garbage collection of the vtkFloatArray if you create it with vtkSmartPointer (not needed in Python).
My Python implementation runs at about 150Hz for a single pass on a 640x480 render window.
edit: Running at 150Hz

cairo surface flushes only 1 fps

I have constructed a cairo (v1.12.16) image surface with:
surface = cairo_image_surface_create (CAIRO_FORMAT_ARGB32, size.width, size.height);
and, 60 times per second, cleared it, drew stuff and flushed with:
cairo_surface_flush(surface);
then, got the resulting canvas with:
unsigned char * data = cairo_image_surface_get_data(surface);
but the resulting data variable was only modified (approximately) every second, not 60 times a second. I got the same (unexpected) result even when using cairo's quartz backend... Are there any flush/refresh rate settings in cairo that I am not (yet) aware of?
Edit: I am just trying to draw some filled (random and/or calculated) rectangles; tested 100 to 10K rects in each frame. All related code is run in the same (display?) thread. I am not caching the 'data' variable. I even modified one corner of it to flicker and I could see flickers in 60fps (for 100 rects) and 2-3 fps (for 10K rects); meaning the 'data' variable returned is not refreshed!? In a different project using cairo's quartz backend, I got the same 1 fps result!??
Edit2: The culprit turned out to be the time() function; when used in srand(time(NULL)) it was producing the same random variables in the same second; used srand(std::clock()) instead. Thanks to the quick response/reply (and it still answers my question!!)..
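The effect described in Edit2 can be reproduced in isolation. The sketch below first shows that reseeding with the same value replays the same sequence, which is what happens when srand(time(NULL)) runs many times within one second. It then shows one possible alternative, which is not the fix used above: seeding a std::mt19937 engine once and reusing it across frames. reseedAndDraw and drawFromEngine are hypothetical names, and the 0 to 999 range is arbitrary.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>
#include <vector>

// Reseeding before every frame: the same seed replays the same sequence,
// so every frame drawn within the same second gets identical "random" values.
std::vector<int> reseedAndDraw(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(std::rand());
    }
    return values;
}

// Alternative: one engine, seeded once, whose state advances on every call.
std::vector<int> drawFromEngine(std::mt19937& engine, int count) {
    std::uniform_int_distribution<int> dist(0, 999);
    std::vector<int> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(dist(engine));
    }
    return values;
}
```

In a render loop the engine would be created once before the loop and passed to each frame, so the seed source no longer needs sub-second resolution.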
No, there are no such flush/refresh rate settings. Cairo draws everything you tell it to and then just returns control.
I have two ideas:
Either cairo is drawing fast enough and something else is slowing things down (e.g. you are copying the result of the drawing somewhere). You should measure the time that elapses between when you begin drawing and your call to cairo_surface_flush().
You are drawing something really, really complex and cairo really does need a second to render this (However, I have no idea how one could accidentally cause such a complex rendering).
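The measurement suggested in the first idea could look roughly like this. It is a minimal sketch using std::chrono; elapsedMilliseconds is a hypothetical helper, and the callable passed to it stands in for the code between the start of drawing and the call to cairo_surface_flush().

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns how long it took, in milliseconds.
// steady_clock is used because it never jumps backwards.
template <typename Work>
double elapsedMilliseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

If the drawing itself takes well under 16 ms per frame, the slowdown is more likely somewhere else than in cairo.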

Does Direct3D Mobile have a limit on the number of Draw*Primitive calls that can be made each frame?

I have Windows CE 6 running an Atom N270 with an Intel 945GSE chipset. I wrote a small test application for Direct3D Mobile and have been experiencing some strange behaviour. I can only call Draw*Primitive once per frame. Calling multiple times either results in a white screen as though Present failed (even though no error is given) or only the first call seems to be processed.
With the message handling omitted the body of the render loop is as follows:
HandleD3DMError(pD3DMobileDevice->Clear(0,0,D3DMCLEAR_TARGET | D3DMCLEAR_ZBUFFER ,0xA50AB0F,1.0f,0),_T("Clear"));
if(HandleD3DMError(pD3DMobileDevice->BeginScene(),_T("BeginScene")))
{
HandleD3DMError(pD3DMobileDevice->SetTexture(0,pTexture), _T("SetTexture"));
HandleD3DMError(pD3DMobileDevice->SetStreamSource(0,m_pPPVVB[0],sizeof(PrimitiveVertex)),_T("SetStreamSource Failed"));
HandleD3DMError(pD3DMobileDevice->SetIndices(pIndexBuffer),_T("SetIndices failed"));
HandleD3DMError(pD3DMobileDevice->DrawIndexedPrimitive(D3DMPT_TRIANGLELIST,0,0,NUM_TRIS*3,0,NUM_TRIS/2),_T("Draw Primitive 1"));
//HandleD3DMError(pD3DMobileDevice->DrawIndexedPrimitive(D3DMPT_TRIANGLELIST,0,0,NUM_TRIS*3,NUM_TRIS/2 * 3,NUM_TRIS/2),_T("Draw Primitive 2"));
HandleD3DMError(pD3DMobileDevice->SetStreamSource(0,0,0),_T("SetStreamSource Null Failed"));
HandleD3DMError(pD3DMobileDevice->SetIndices(0),_T("SetIndices Null failed"));
HandleD3DMError(pD3DMobileDevice->EndScene(),_T("EndScene"));
}
HandleD3DMError(pD3DMobileDevice->Present(0,0,0,0),_T("Present Failed"));
You can switch the two DrawIndexedPrimitive() lines and both render four triangles each, which is correct. However when they are both present nothing is rendered. The HandleD3DMError() function displays a message box when an error occurs. This is done all through initialisation too. No errors are displayed at any time.
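For reference, the start index and primitive count used by the two calls above can be derived as in the sketch below. It is plain C++ with no Direct3D Mobile dependency; DrawRange and splitTriangleList are hypothetical names, and NUM_TRIS is assumed to be 8, since each call is said to render four triangles.

```cpp
#include <cassert>
#include <vector>

// Start index and primitive count for one DrawIndexedPrimitive call
// on a triangle list (three indices per triangle).
struct DrawRange {
    unsigned startIndex;
    unsigned primitiveCount;
};

// Splits `numTriangles` into `numCalls` consecutive ranges.
// The last range absorbs any remainder.
std::vector<DrawRange> splitTriangleList(unsigned numTriangles, unsigned numCalls) {
    std::vector<DrawRange> ranges;
    if (numCalls == 0) {
        return ranges;
    }
    const unsigned perCall = numTriangles / numCalls;
    for (unsigned i = 0; i < numCalls; ++i) {
        const unsigned firstTriangle = i * perCall;
        const unsigned count =
            (i + 1 == numCalls) ? numTriangles - firstTriangle : perCall;
        ranges.push_back({firstTriangle * 3, count});
    }
    return ranges;
}
```

With 8 triangles and 2 calls this gives (0, 4) and (12, 4), which matches the NUM_TRIS/2 * 3 start index and NUM_TRIS/2 primitive count in the commented-out second call, so the arguments themselves look consistent.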
I have tried drawing different primitive types and drawing non-indexed vertex buffers with no success. I am able to draw 10,000 triangles using a single buffer but trying to use multiple buffers fails (I assume it is related to the multiple Draw calls issue).
The documentation on MSDN does not mention any limitation on Draw calls. They even mention cases where you would need to make multiple Draw calls. The official samples I've looked at only ever call Draw*() once per frame too.
If I try to call BeginScene() and EndScene() multiple times in one frame, nothing is rendered, not even the clear colour.
I can provide all of the source if needed.
I appreciate any help anyone can give me.
Cheers
