Direct2D/DirectWrite: does the layout from IDWriteFactory::CreateTextLayout need to be released every time the text is updated? - memory-leaks

I am using Direct2D to show some text (FPS, resolution, etc.) on a Direct3D surface. In my window class there is a method called CalculateFrameStats() that runs on every loop iteration to calculate the FPS and related information, and it calls IDWriteFactory::CreateTextLayout to create a new text layout from the latest FPS string. The 3DFrameDraw() function then calls BeginDraw(), DrawTextLayout(), and EndDraw(). I do not release the text layout pointer afterwards. On the next iteration, CalculateFrameStats() calls CreateTextLayout again with the newly updated string, 3DFrameDraw() draws the layout again, and so on. The weird thing is that when I run the program there seems to be no memory leak at all: memory usage stays low and constant.
But when I put the IDWriteFactory::CreateTextLayout call in 3DFrameDraw(), so that at the beginning of every 3D frame I create a new text layout with the updated FPS string, do some 3D work, and then call BeginDraw(), DrawTextLayout(), and EndDraw() just before the D3D present (the same place as in the previous 3DFrameDraw()), memory leaks: I can see usage keep growing over time. If I add a Release() call on the text layout pointer after BeginDraw(), DrawTextLayout(), and EndDraw(), the leak is gone.
I don't understand why, in the first scenario, the text layout pointer is never released until the program closes and yet memory never appears to leak. Does the text layout need to be released every time (every frame) its text string is updated?
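As a rough illustration of the two scenarios described above, the sketch below uses a mock reference-counted object in portable C++ rather than the real DirectWrite API. MockTextLayout, CreateMockTextLayout, g_liveLayouts, and the two RunFrames functions are hypothetical stand-ins. The only behaviour carried over from COM is that each create call hands back a new object holding one reference that the caller owns and must release.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical live-object counter so that a leak becomes visible.
static int g_liveLayouts = 0;

// Hypothetical stand-in for a COM object such as a text layout.
class MockTextLayout {
public:
    explicit MockTextLayout(std::string text)
        : text_(std::move(text)), refs_(1) { ++g_liveLayouts; }

    unsigned long AddRef() { return ++refs_; }

    // Destroys the object when the last reference goes away.
    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }

private:
    ~MockTextLayout() { --g_liveLayouts; }
    std::string text_;
    unsigned long refs_;
};

// Hypothetical factory: like a COM create call, it returns a new object
// with one reference owned by the caller.
void CreateMockTextLayout(const std::string& text, MockTextLayout** out) {
    *out = new MockTextLayout(text);
}

// Creates a layout every frame and overwrites the pointer without releasing.
// Returns how many objects are left alive (deliberately leaked here).
int RunFramesWithoutRelease(int frames) {
    int before = g_liveLayouts;
    MockTextLayout* layout = nullptr;
    for (int i = 0; i < frames; ++i) {
        // The previous object is orphaned when the pointer is overwritten.
        CreateMockTextLayout("FPS: " + std::to_string(i), &layout);
    }
    return g_liveLayouts - before;
}

// Releases the previous layout before creating the next one.
int RunFramesWithRelease(int frames) {
    int before = g_liveLayouts;
    MockTextLayout* layout = nullptr;
    for (int i = 0; i < frames; ++i) {
        if (layout) { layout->Release(); layout = nullptr; }
        CreateMockTextLayout("FPS: " + std::to_string(i), &layout);
    }
    if (layout) layout->Release();
    return g_liveLayouts - before;
}
```

In this mock, every frame that overwrites the pointer without calling Release() leaves one more object alive, while the releasing variant leaves none. In real code a smart pointer such as Microsoft::WRL::ComPtr is commonly used so that the release happens automatically.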

Related

Why GBuffers need to be created for each frame in D3D12?

I have experience with D3D11 and want to learn D3D12. I am reading the official D3D12 multithread example and don't understand why the shadow map (generated in the first pass as a DSV, consumed in the second pass as SRV) is created for each frame (actually only 2 copies, as the FrameResource is reused every 2 frames).
The code that creates the shadow map resource is here, in the FrameResource class, instances of which are created here.
There is actually another resource that is created for each frame: the constant buffer. I more or less understand the constant buffer. It is written by the CPU (D3D11 dynamic usage) and needs to remain unchanged until the GPU finishes using it, so there need to be 2 copies. However, I don't understand why the shadow map needs the same treatment, because it is only modified by the GPU (D3D11 default usage), and there are fence commands to separate reading from and writing to that texture anyway. As long as the GPU follows the fence, a single texture should be enough for the GPU to work correctly. Where am I wrong?
Thanks in advance.
EDIT
According to the comment below, the "fence" I mentioned above should more accurately be called "resource barrier".
The key issue is that you don't want to stall the GPU for best performance. Double-buffering is a minimal requirement, but typically triple-buffering is better for smoothing out frame-to-frame rendering spikes, etc.
FWIW, the default behavior of DXGI Present is to stall only after you have submitted THREE frames of work, not two.
Of course, there's a trade-off between triple-buffering and input responsiveness, but if you are maintaining 60 Hz or better then it's likely not noticeable.
With all that said, typically you don't need to double-buffer depth/stencil buffers for rendering, although if you wanted to make the initial write of the depth buffer overlap with the read of the previous depth-buffer passes, then you would want distinct buffers per frame for performance and correctness.
The 'writes' are all complete before the 'reads' in DX12 because of the injection of the 'Resource Barrier' into the command-list:
void FrameResource::SwapBarriers()
{
    // Transition the shadow map from writeable to readable.
    m_commandLists[CommandListMid]->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(), D3D12_RESOURCE_STATE_DEPTH_WRITE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
}

void FrameResource::Finish()
{
    m_commandLists[CommandListPost]->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_DEPTH_WRITE));
}
Note that this sample is a port/rewrite of the older legacy DirectX SDK sample MultithreadedRendering11, so it may be just an artifact of convenience to have two shadow buffers instead of just one.
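The "don't stall the GPU" point can be sketched with a small CPU-side simulation. It applies to per-frame resources that the CPU rewrites (such as the constant buffer in the question), makes no D3D12 calls, and uses a deliberately simplified model: the GPU is assumed to have finished every submitted frame except the most recent gpuLag ones. CountCpuStalls and all of its parameters are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts how often the CPU would have to wait before reusing a per-frame
// resource copy. Each copy remembers the fence value signalled after its
// last use; it can be rewritten only once the GPU has reached that value.
int CountCpuStalls(int frameCount, int resourceCopies, int gpuLag) {
    // Fence value of the last submission that used each copy (0 = never used).
    std::vector<std::uint64_t> lastFence(static_cast<std::size_t>(resourceCopies), 0);
    const std::uint64_t lag = static_cast<std::uint64_t>(gpuLag);
    std::uint64_t submitted = 0;  // fence value of the most recent submission
    std::uint64_t completed = 0;  // fence value the GPU has reached
    int stalls = 0;

    for (int frame = 0; frame < frameCount; ++frame) {
        const std::size_t slot = static_cast<std::size_t>(frame % resourceCopies);

        // Simplified GPU progress model: the GPU trails the CPU by `lag` frames.
        completed = submitted > lag ? submitted - lag : 0;

        if (lastFence[slot] > completed) {
            // The GPU may still be reading this copy, so the CPU must block.
            ++stalls;
            completed = lastFence[slot];
        }

        // Record and "submit" the frame, then remember its fence value.
        ++submitted;
        lastFence[slot] = submitted;
    }
    return stalls;
}
```

Under this model a single copy stalls on almost every frame, two copies are enough when the GPU trails by one frame, and three copies are needed to avoid waits when it trails by two, which is the trade-off between double- and triple-buffering mentioned above.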

How to update texture for every frame in vulkan?

As the title says, I want to update a texture every frame.
I have an idea:
create a VkImage as a texture buffer with the configuration below:
initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED
usage = VK_IMAGE_USAGE_SAMPLED_BIT
and its memory type is VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
In the draw loop:
First frame:
Map the texture data to the VkImage (using vkMapMemory).
Change the VkImage layout from VK_IMAGE_LAYOUT_PREINITIALIZED to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
Use this VkImage as the texture buffer.
Second frame:
The layout will be VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the first frame. Can I map the next texture data to this VkImage directly without changing its layout? If I cannot do that, which layout can I change this VkImage to?
Section 11.4 of the Vulkan spec says:
The new layout used in a transition must not be VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED
So I cannot change the layout back to _PREINITIALIZED.
Any help will be appreciated.
For your case you do not need LAYOUT_PREINITIALIZED. That would only complicate your code (forcing you to provide separate code for the first frame).
LAYOUT_PREINITIALIZED is indeed a very special layout intended only for the start of the life of the Image. It is more useful for static textures.
Start with LAYOUT_UNDEFINED and use LAYOUT_GENERAL when you need to write the Image from CPU side.
I propose this scheme:
Before the render loop:
    Create your VkImage with UNDEFINED
1st to Nth frame (aka the render loop):
    1. Transition the image to GENERAL
    2. Synchronize (likely with a VkFence)
    3. Map the image, write it, unmap it (well, the mapping and unmapping can perhaps be done outside the render loop)
    4. Synchronize (potentially done implicitly)
    5. Transition the image to whatever layout you need next
    6. Do your rendering and whatnot
    7. Start over at 1.
It is a naive implementation but should suffice for ordinary hobbyist uses.
Double-buffered access can be implemented too: for example, a VkBuffer for CPU access and a VkImage with the same contents for GPU access. A vkCmdCopy* command must then do the data hand-off.
It is not that much more complicated than the above approach, and there can be some performance benefits (if you need those at this stage of your project). You usually want your resources in device-local memory, which often is not also host visible.
It would go something like:
Before the render loop:
    Create your VkBuffer b with UNDEFINED backed by HOST_VISIBLE memory and map it
    Create your VkImage i with UNDEFINED backed by DEVICE_LOCAL memory
    Prepare your synchronization primitives between i and b: e.g. two Semaphores, or Events could be used, or Barriers if the transfer is in the same queue
1st to Nth frame (aka the render loop):
Operations on b and i can be pretty detached (they can even be on different queues), so:
For b:
    1. Transition b to GENERAL
    2. Synchronize to CPU (likely waiting on a VkFence or vkQueueWaitIdle)
    3. Invalidate (if non-coherent), write it, flush (if non-coherent)
    4. Synchronize to GPU (done implicitly if 3. happens before queue submission)
    5. Transition b to TRANSFER
    6. Synchronize to make sure i is not in use (likely waiting on a VkSemaphore)
    7. Transition i to TRANSFER
    8. Do vkCmdCopy* from b to i
    9. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
    10. Start over at 1.
(The fence at 2. and the semaphore at 6. have to be pre-signalled or skipped for the first frame to work.)
For i:
    1. Synchronize to make sure i is free to use (likely waiting on a VkSemaphore)
    2. Transition i to whatever is needed
    3. Do your rendering
    4. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
    5. Start over at 1.
You have a number of problems here.
First:
create a VkImage as a texture buffer
There's no such thing. The equivalent of an OpenGL buffer texture is a Vulkan buffer view. This does not use a VkImage of any sort. VkBufferViews do not have an image layout.
Second, assuming that you are working with a VkImage of some sort, you have recognized the layout problem. You cannot modify the memory behind the texture unless the texture is in the GENERAL layout (among other things). So you have to force a transition to that, wait until the transition command has actually completed execution, then do your modifications.
Third, Vulkan is asynchronous in its execution, and unlike OpenGL, it will not hide this from you. The image in question may still be accessed by the shader when you want to change it. So usually, you need to double buffer these things.
On frame 1, you set the data for image 1, then render with it. On frame 2, you set the data for image 2, then render with it. On frame 3, you overwrite the data for image 1 (using events to ensure that the GPU has actually finished frame 1).
Alternatively, you can use double-buffering without possible CPU waiting, by using staging buffers. That is, instead of writing to images directly, you write to host-visible memory. Then you use a vkCmdCopyBufferToImage command to copy that data into the image. This way, the CPU doesn't have to wait on events or fences to make sure that the image is in the GENERAL layout before sending data.
And BTW, Vulkan is not OpenGL. Mapping of memory is always persistent; there's no reason to unmap a piece of memory if you're going to map it every frame.

cairo surface flushes only 1 fps

I have constructed a cairo (v1.12.16) image surface with:
surface = cairo_image_surface_create (CAIRO_FORMAT_ARGB32, size.width, size.height);
and then, 60 times per second, cleared it, drew stuff, and flushed with:
cairo_surface_flush(surface);
then, got the resulting canvas with:
unsigned char * data = cairo_image_surface_get_data(surface);
but the resulting data variable was only modified (approximately) every second, not 60 times a second. I got the same (unexpected) result even when using cairo's quartz backend... Are there any flush/refresh rate settings in cairo that I am not (yet) aware of?
Edit: I am just trying to draw some filled (random and/or calculated) rectangles; tested 100 to 10K rects in each frame. All related code is run in the same (display?) thread. I am not caching the 'data' variable. I even modified one corner of it to flicker and I could see flickers in 60fps (for 100 rects) and 2-3 fps (for 10K rects); meaning the 'data' variable returned is not refreshed!? In a different project using cairo's quartz backend, I got the same 1 fps result!??
Edit2: The culprit turned out to be the time() function: when used in srand(time(NULL)), it produced the same random values throughout any given second. I used srand(std::clock()) instead. Thanks for the quick reply (and it still answers my question!).
No, there are no such flush/refresh rate settings. Cairo draws everything you tell it to and then just returns control.
I have two ideas:
Either cairo is drawing fast enough and something else is slowing things down (e.g. you're copying the result of the drawing somewhere). You should measure the time that elapses between when you begin drawing and your call to cairo_surface_flush().
Or you are drawing something really, really complex and cairo really does need a second to render it (however, I have no idea how one could accidentally cause such a complex rendering).
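The cause reported in Edit2 of the question can be reproduced without cairo. The sketch below assumes the rectangles were generated with rand() after reseeding once per frame from a timestamp with one-second resolution; DrawAfterSeeding and FramesInSameSecondMatch are hypothetical helper names, and the timestamp is passed in as a plain number so the behaviour is deterministic.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>
#include <vector>

// Seeds the C random generator and draws `count` values from it.
std::vector<int> DrawAfterSeeding(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(std::rand());
    }
    return values;
}

// Simulates `frames` frames that all fall within the same wall-clock second
// and reseed at the start of each frame, as srand(time(NULL)) would.
// Returns true when every frame produced exactly the same values.
bool FramesInSameSecondMatch(std::time_t second, int frames, int valuesPerFrame) {
    const unsigned seed = static_cast<unsigned>(second);
    const std::vector<int> first = DrawAfterSeeding(seed, valuesPerFrame);
    for (int f = 1; f < frames; ++f) {
        if (DrawAfterSeeding(seed, valuesPerFrame) != first) {
            return false;
        }
    }
    return true;
}
```

Because the same seed always yields the same rand() sequence, all 60 frames inside one second draw identical "random" rectangles, so the image only appears to change once per second. Seeding once at startup instead of once per frame is a common way to avoid this.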

Xna Xbox framedrops when GC kicks in

I'm developing an app (an XNA game) for the Xbox, which is a pretty simple app. The start page contains tiles with moving GIF images. Those GIF images are actually all PNG images, which get loaded once by every tile and put in an array. Then, using a defined delay, these images are played (using a counter which increases every time a delay passes).
This all works well, however, I noticed some small lag every x seconds in the movement of the GIF images. I then started to add some benchmarking stuff:
http://gyazo.com/f5fe0da3ff81bd45c0c52d963feb91d8
As you can see, the FPS is pretty low for such a simple program (This is in debug, when running the app from the Xbox itself, I get an avg of 62fps).
2 important settings:
Graphics.SynchronizeWithVerticalRetrace = false;
IsFixedTimeStep = false;
Changing IsFixedTimeStep to true increases the lag. The settings tile has wheels which rotate, and you can see the wheels go back a little every x seconds. The same goes for SynchronizeWithVerticalRetrace: enabling it also increases the lag.
I noticed a connection between the lag and the moment the garbage collector kicks in: every time it kicks in, there is a lag...
Don't mind the MAX HMU (heap memory usage), as it reflects the amount at startup; the average is more realistic.
Here is another screen from the performance monitor, however I don't understand much from this tool, first time I'm using it... Hope it helps:
http://gyazo.com/f70a3d400657ac61e6e9f2caaaf17587
After a little research I found the culprit.
I have custom components that all derive from GameComponent and get added to the Components list of the main Game class.
This was one of two major problems: it caused everything to be updated, including things that did not need an update. (The draw method was the only one that kept the page state in mind and only drew if needed.)
I fixed this by using different "screens" (or pages, as I called them), which are the only components that derive from GameComponent.
Then I only update the page which is active, and the custom components on that page also get updated. Problem fixed.
The second big problem is the following:
I made a class which helps me position stuff on the screen relatively, with percentages and the like: parent containers, aligns and v-aligns, etc.
That class had properties for sizes and vectors, but instead of saving the calculated value in a backing field, I recalculated it every time I accessed a property. Calculating complex stuff like that uses references (to parent and child containers, for example), which made it very hard on the CLR because it had a lot of work to do.
I have now rebuilt the whole positioning class into a fully functional, optimized class with different flags for recalculating when necessary, and instead of drops of 20 fps, I now get an average of 170+ fps!
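The backing-field-plus-flag idea from the second fix can be sketched as follows. The original project is C#/XNA; this is a simplified C++ illustration, not the author's class. RelativeWidth, its members, and RecalcsAfterReads are hypothetical names, and the layout rule (a width given as a percentage of a parent width) is an assumption made to keep the example small.

```cpp
#include <cassert>

// A width expressed as a percentage of a parent width. The computed value is
// cached in a backing field and recomputed only after an input has changed.
class RelativeWidth {
public:
    RelativeWidth(float parentWidth, float percent)
        : parentWidth_(parentWidth), percent_(percent) {}

    // Changing an input only marks the cached value as stale.
    void SetParentWidth(float width) { parentWidth_ = width; dirty_ = true; }
    void SetPercent(float percent) { percent_ = percent; dirty_ = true; }

    // Returns the cached value, recalculating it only when it is stale.
    float Value() {
        if (dirty_) {
            cached_ = parentWidth_ * percent_ / 100.0f;
            dirty_ = false;
            ++recalcCount_;
        }
        return cached_;
    }

    // Number of real recalculations, exposed to make the saving observable.
    int RecalcCount() const { return recalcCount_; }

private:
    float parentWidth_;
    float percent_;
    float cached_ = 0.0f;
    bool dirty_ = true;
    int recalcCount_ = 0;
};

// Reads the property `reads` times and reports how often it was recomputed.
int RecalcsAfterReads(int reads) {
    RelativeWidth width(1280.0f, 50.0f);
    for (int i = 0; i < reads; ++i) {
        width.Value();
    }
    return width.RecalcCount();
}
```

Reading the property a thousand times per frame then costs one calculation instead of a thousand, and a new calculation happens only after SetParentWidth or SetPercent has been called.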

Texture Buffer for OpenGL Video Player

I am using OpenGL, FFmpeg and SDL to play videos and am currently optimizing the process of getting frames, decoding them, converting them from YUV to RGB, uploading them to a texture and displaying the texture on a quad. Each of these stages is performed by a separate thread, and they write to shared buffers which are controlled by SDL mutexes and conditions (except for the upload and display of the textures, as they need to be in the same context).
I have the player working fine with the decode, convert and OpenGL context on separate threads, but realised that because the video is 25 frames per second, I only get a converted frame from the buffer, upload it to OpenGL and bind/display it every 40 milliseconds in the OpenGL thread. Because of this 40 ms gap, the render loop goes round about 6-10 times without showing a new frame for every frame it shows.
Therefore I decided it might be a good idea to have a buffer for the textures too, and set up an array of textures created and initialised with glGenTextures() and the texture parameters I needed.
When it hasn't been 40 ms since the last frame refresh, a method is run which grabs the next converted frame from the convert buffer and uploads it to the next free texture in the texture buffer by binding it and then calling glTexSubImage2D(). When it has been 40 ms since the last frame refresh, a separate method is run which grabs the next GLuint texture from the texture buffer and binds it with glBindTexture(). So effectively, I am just splitting up what was being done before (grab from convert buffer, upload, display) into separate methods (grab from convert buffer, upload to texture buffer | and | grab from texture buffer, display) to make use of the wasted time between 40 ms refreshes.
Does this sound reasonable? When run, the video halts all the time in a sporadic manner: sometimes about 4 frames are played when they are supposed to be (every 40 ms), but then there is a 2 second gap, then 1 frame is shown, then a 3 second gap, and the video is totally unwatchable.
The code is near identical to how I manage the convert thread grabbing decoded frames from the decode buffer, converting them from YUV to RGB and then putting them into the convert buffer so can't see where the massive bottlenecking could be.
Could the bottlenecking be on the OpenGL side of things? Is the fact that I am storing new image data to 10 different textures the problem as when a new texture is grabbed from the texture buffer, the raw data could be a million miles away from the last one in terms of memory location on the video memory? That's my only attempt at an answer, but I don't know much about how OpenGL works internally so that's why I am posting here.
Anybody have any ideas?
I'm no OpenGL expert, but my guess at the bottleneck is that the textures are initialized properly in system memory but are sent to video memory at the 'wrong' time (like all at once instead of as soon as possible), stalling the pipeline. When using glTexSubImage2D you have no guarantees about when a texture arrives in video memory until you bind it.
Googling around it seems pixelbuffer objects give you more control about when they are in video memory: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=262523
