Size of an OpenGL context - Linux

Is there a way to get the size of an OpenGL context, or at least to estimate its size? If yes, how?
I have an application in GLUT which creates several windows. Since GLUT doesn't share the OpenGL context between windows, every window is going to create a new one. I am trying to reduce the memory needed, since this is for an embedded system. But if an OpenGL context is small enough to be negligible, then I am not going to see a big reduction in memory usage.
I have found this patch to create windows with a shared OpenGL context:
A small addendum for Windows users (by Misbah Qidwai): I added this subroutine to glut_win.c. I use this routine to call wglShareLists().
//MQ
/* CENTRY */
GLXContext APIENTRY
glutGetWindowRenderContext(int win)
{
    GLUTwindow *window;

    if (win < 1 || win > __glutWindowListSize) {
        __glutWarning("glutGetWindowRenderContext attempted on bogus window.");
        return NULL;
    }
    window = __glutWindowList[win - 1];
    if (!window) {
        __glutWarning("glutGetWindowRenderContext attempted on bogus window.");
        return NULL;
    }
    return window->renderCtx;
}

An OpenGL context is an abstract thing. The amount of data backing a particular context can be as small as a pointer or as big as a few megabytes. The context itself is not some kind of data structure; it's merely a handle shared by your program and the graphics system so that each side knows what the other is talking about.
The only way to know for a particular configuration is to measure it.
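For example, on Linux you can get a rough empirical answer by sampling the process's resident set size before and after creating a window (and thus a context). A minimal sketch, assuming GLUT; the delta is a ballpark at best, since it also includes the window itself and any driver state loaded lazily on first use:
#include <GL/glut.h>
#include <cstdio>
#include <cstring>

// Resident set size of this process in kB, parsed from /proc/self/status.
static long residentKb()
{
    long kb = 0;
    char line[256];
    FILE* f = fopen("/proc/self/status", "r");
    while (f && fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb);
            break;
        }
    }
    if (f) fclose(f);
    return kb;
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    long before = residentKb();
    glutCreateWindow("context probe"); // creates a window plus a fresh context
    long after = residentKb();
    printf("approximate window + context cost: %ld kB\n", after - before);
    return 0;
}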

Why do GBuffers need to be created for each frame in D3D12?

I have experience with D3D11 and want to learn D3D12. I am reading the official D3D12 multithreading example and don't understand why the shadow map (generated in the first pass as a DSV, consumed in the second pass as an SRV) is created for each frame (actually only 2 copies, as the FrameResource is reused every 2 frames).
The code that creates the shadow map resource is here, in the FrameResource class, instances of which are created here.
There is actually another resource that is created for each frame: the constant buffer. I kind of understand the constant buffer: because it is written by the CPU (D3D11 dynamic usage) and needs to remain unchanged until the GPU finishes using it, there need to be 2 copies. However, I don't understand why the shadow map needs the same treatment, because it is only modified by the GPU (D3D11 default usage), and there are fence commands to separate reading and writing to that texture anyway. As long as the GPU respects the fence, a single texture should be enough for the GPU to work correctly. Where am I wrong?
Thanks in advance.
EDIT
According to the comment below, the "fence" I mentioned above should more accurately be called "resource barrier".
The key issue is that, for best performance, you don't want to stall the GPU. Double-buffering is a minimal requirement, but typically triple-buffering is better for smoothing out frame-to-frame rendering spikes, etc.
FWIW, the default behavior of DXGI Present is to stall only after you have submitted THREE frames of work, not two.
Of course, there's a trade-off between triple-buffering and input responsiveness, but if you are maintaining 60 Hz or better then it's likely not noticeable.
With all that said, you typically don't need to double-buffer depth/stencil buffers for rendering, although if you wanted the initial write of the depth buffer to overlap with the reads of the previous frame's depth-buffer passes, then you would want distinct buffers per frame for performance and correctness.
The 'writes' are all complete before the 'reads' in DX12 because of the injection of the 'Resource Barrier' into the command-list:
void FrameResource::SwapBarriers()
{
    // Transition the shadow map from writeable to readable.
    m_commandLists[CommandListMid]->ResourceBarrier(1,
        &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(),
            D3D12_RESOURCE_STATE_DEPTH_WRITE,
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
}

void FrameResource::Finish()
{
    // Transition the shadow map back from readable to writeable.
    m_commandLists[CommandListPost]->ResourceBarrier(1,
        &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(),
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
            D3D12_RESOURCE_STATE_DEPTH_WRITE));
}
Note that this sample is a port/rewrite of the older legacy DirectX SDK sample MultithreadedRendering11, so it may be just an artifact of convenience to have two shadow buffers instead of just one.
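To make the overlap concrete, here is a hedged sketch (names like kFramesInFlight and g_fenceValues are illustrative, not from the sample) of the per-frame fence check that lets the CPU record frame N+1 while the GPU is still executing frame N; any resource the CPU rewrites per frame therefore needs one copy per frame in flight:
#include <d3d12.h>
#include <windows.h>

static const UINT kFramesInFlight = 2;
static UINT64 g_fenceValues[kFramesInFlight] = {};

void WaitForFrameSlot(UINT frameIndex, ID3D12Fence* fence, HANDLE fenceEvent)
{
    // Block only if the GPU has not yet finished the frame that last
    // used this FrameResource slot; otherwise reuse it immediately.
    if (fence->GetCompletedValue() < g_fenceValues[frameIndex])
    {
        fence->SetEventOnCompletion(g_fenceValues[frameIndex], fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
    // From here on, the slot's command allocator, constant buffers and
    // shadow map are no longer referenced by in-flight GPU work.
}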

How to find TLS segments of the current thread on linux amd64?

I'm looking for a way to find out the memory addresses of TLS segments for the current thread on Linux, amd64. Bonus point for a solution that works on OSX.
Looked into various language runtimes and GCs (like Boehm), but couldn't get through the multiple layers of abstraction needed to support all kinds of systems so far. Any help appreciated.
Did you have a look at the solution Martin and I came up with in druntime?
What we do there boils down to scanning the segments in the corresponding dl_phdr_info (obtained by looking for the correct one using dl_iterate_phdr) for the segment with type PT_TLS, and storing its module id and size.
You can then get the start of the address range in the current thread by calling __tls_get_addr with offset 0 and the module ID (there is an additional offset on some archs), and the end by simply adding the size you determined. If you do not need to support shared libraries, you can also simply use fs/gs on x86 (which might be required anyway if you want to link a static executable).
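A minimal sketch of the PT_TLS scan on Linux/glibc (findTls is an illustrative name; the __tls_get_addr lookup is left out):
#include <link.h>
#include <stdio.h>

// Visit every loaded object and report its PT_TLS segment, if any.
static int findTls(struct dl_phdr_info* info, size_t size, void* data)
{
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)* ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS)
            printf("module \"%s\": TLS template %zu bytes, in-memory size %zu bytes\n",
                   info->dlpi_name, (size_t)ph->p_filesz, (size_t)ph->p_memsz);
    }
    return 0; // returning 0 continues the iteration
}

int main(void)
{
    dl_iterate_phdr(findTls, NULL);
    return 0;
}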
This works for Linux and FreeBSD (and probably other ELF platforms), but not OS X. There, the best I could come up with so far is this:
void _d_dyld_getTLSRange(void* arbitraryTLSSymbol, void** start, size_t* size)
{
    dyld_enumerate_tlv_storage(
        ^(enum dyld_tlv_states state, const dyld_tlv_info* info) {
            assert(state == dyld_tlv_state_allocated);
            if (info->tlv_addr <= arbitraryTLSSymbol &&
                arbitraryTLSSymbol < (info->tlv_addr + info->tlv_size))
            {
                // Found the range we are looking for.
                *start = info->tlv_addr;
                *size = info->tlv_size;
            }
        }
    );
}
The naive implementation currently used in LDC's druntime does not quite handle shared libraries, though, and dyld_enumerate_tlv_storage is from dyld_priv.h, which might or might not be a problem for App Store publishing.
On Linux, the thread-specific segment is set up via the arch_prctl(ARCH_SET_FS, <addr>) call. You can find out what it was set to in the current thread via arch_prctl(ARCH_GET_FS, ...).
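A hedged sketch of the ARCH_GET_FS query on x86-64 Linux, going through syscall() since older glibc versions ship no arch_prctl wrapper:
#include <asm/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    unsigned long fsBase = 0;
    // Ask the kernel for this thread's FS base, i.e. the address of the
    // thread control block that glibc lays the TLS blocks out around.
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &fsBase) == 0)
        printf("FS base of current thread: %#lx\n", fsBase);
    return 0;
}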
Bonus point for a solution that works on OSX.
OSX is a completely different OS and uses a completely different mechanism for its TLS support.

Does OpenGL ES have a maximum speed limit?

I implemented a render loop aiming for up to 100 fps, but I cannot get above 63 fps.
What I believe is that the thread that runs the OpenGL drawing method has a speed limit:
@Override
public void onDrawFrame(GL10 gl)
It depends on whether or not your rendering context has vertical sync enabled. Most LCD devices refresh at 60 Hz, and the system may be waiting for the next refresh before calling onDrawFrame(). That's one reason you'd be seeing that number.
The other possibility is that your draw is just taking long enough that it can't run any faster.
You should read the spec for eglSwapInterval. It has to be implemented in your driver (I presume this is an Android device) for the setting to have any effect. You can use it in any OpenGL ES 2 based application.
http://www.khronos.org/registry/egl/sdk/docs/man/xhtml/eglSwapInterval.html
A gist showing the usage here:
https://gist.github.com/prabindh/8467984
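For reference, a minimal sketch of the call itself on the native side; whether the driver honors an interval of 0 (no vsync throttling) is implementation-dependent:
#include <EGL/egl.h>

void setSwapInterval(EGLDisplay display, int interval)
{
    // interval = 1 waits for one vertical retrace per swap (the usual
    // ~60 fps cap on a 60 Hz panel); interval = 0 asks the driver to
    // present as fast as possible.
    eglSwapInterval(display, interval);
}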

malloc/realloc/free capacity optimization

When you have a dynamically allocated buffer that varies its size at runtime in unpredictable ways (for example a vector or a string), one way to optimize its allocation is to only resize its backing store at powers of 2 (or some other set of boundaries/thresholds) and leave the extra space unused. This helps to amortize the cost of searching for new free memory and copying the data across, at the expense of a little extra memory use. The interface specifications (reserve vs. resize vs. trim) of many C++ STL containers have such a scheme in mind, for example.
My question is does the default implementation of the malloc/realloc/free memory manager on Linux 3.0 x86_64, GLIBC 2.13, GCC 4.6 (Ubuntu 11.10) have such an optimization?
void* p = malloc(N);
... // time passes, stuff happens
void* q = realloc(p,M);
Put another way, for what values of N and M (or in what other circumstances) will p == q?
From the realloc implementation in glibc trunk at http://sources.redhat.com/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=12d2211b0d6603ac27840d6f629071d1c78586fe;hb=HEAD
First, if the memory has been obtained via mmap() instead of sbrk(), which glibc malloc does for large requests, >= 128 kB by default IIRC:
if (chunk_is_mmapped(oldp))
{
    void* newmem;

#if HAVE_MREMAP
    newp = mremap_chunk(oldp, nb);
    if(newp) return chunk2mem(newp);
#endif
    /* Note the extra SIZE_SZ overhead. */
    if(oldsize - SIZE_SZ >= nb) return oldmem; /* do nothing */
    /* Must alloc, copy, free. */
    newmem = public_mALLOc(bytes);
    if (newmem == 0) return 0; /* propagate failure */
    MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
    munmap_chunk(oldp);
    return newmem;
}
(Linux has mremap(), so in practice this is what is done).
For smaller requests, a few lines below we have
newp = _int_realloc(ar_ptr, oldp, oldsize, nb);
where _int_realloc is a bit too big to copy-paste here, but you'll find it starting at line 4221 in the link above. AFAICS, it does NOT do the constant-factor size increase that e.g. C++ std::vector does, but rather allocates exactly the amount requested by the user (rounded up to the next chunk boundary, plus alignment and so on).
I suppose the idea is that if the user wants a factor-of-2 size increase (or any other constant-factor increase, in order to guarantee logarithmic efficiency when resizing multiple times), then the user can implement it on top of the facility provided by the C library.
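A minimal sketch of that caller-side amortization (buf_t and buf_reserve are illustrative names, not from any library): track a capacity next to the pointer and only call realloc when a request exceeds it, growing by a constant factor:
#include <stdlib.h>

typedef struct { void* data; size_t cap; } buf_t;

// Ensure b->data holds at least 'need' bytes, doubling the capacity so
// that n successive appends cost O(n) amortized realloc work in total.
int buf_reserve(buf_t* b, size_t need)
{
    if (need <= b->cap) return 0;       // still fits, no realloc at all
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need) cap *= 2;        // constant-factor growth
    void* p = realloc(b->data, cap);
    if (!p) return -1;                  // old block stays valid on failure
    b->data = p;
    b->cap  = cap;
    return 0;
}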
Perhaps you can use malloc_usable_size (google for it) to find the answer experimentally. This function, however, seems to be undocumented, so you will need to check whether it is still available on your platform.
See also How to find how much space is allocated by a call to malloc()?
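A small experimental probe along those lines, assuming glibc's <malloc.h> declares malloc_usable_size:
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void* p = malloc(100);
    // glibc rounds the request up to its chunk granularity, so the
    // usable size can exceed the 100 bytes asked for.
    printf("requested 100, usable %zu\n", malloc_usable_size(p));

    // Growing only within the existing slack should keep the block in place.
    void* q = realloc(p, malloc_usable_size(p));
    printf("p == q after realloc within the slack: %s\n", p == q ? "yes" : "no");

    free(q);
    return 0;
}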

Does Cg 3.0 leak?

I'm finding that Cg appears to have a memory leak. I submitted a report via nvidia.com, but you can try it here:
If you remove the line that says
cgD3D11SetTextureParameter( g.theTexture, g.sharedTex ) ;
The leak stops.
Does CG 3.0 really leak?
Using ATI Radeon 5850 GPU / Windows 7 64-bit.
Yes, it leaks. Internally it creates a ShaderResourceView on every call and never releases it. I think the API is ill-designed: they should have taken a ShaderResourceView* as a parameter to this function instead of just a Resource*.
I posted about this on the NVIDIA forums about 6 months ago and never got a response.
Is your report posted publicly? Or some kind of private support ticket?
Yes, Cg 3.0 leaks every time you call cgD3D11SetTextureParameter(), causing your application's memory usage to climb. Unfortunately this makes Cg 3.0 with D3D11 completely unusable. One symptom is that, after your application has been running for a while, it will stop rendering and the screen will just go black. I wasted a lot of time trying to determine the cause before discovering the Cg bug.
If anybody is wondering why this isn't apparent in the Cg D3D11 demos, it's because the few that actually use textures are so simple that they can get away with calling cgD3D11SetTextureParameter() only once at the start.
This same bug remains with Cg Toolkit 3.1 (April 2012).
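A hedged workaround sketch in that spirit (the caching wrapper is illustrative, not part of the Cg API, and a real fix would key the cache per parameter): since every call leaks an internal SRV, only re-bind when the texture actually changes:
#include <Cg/cg.h>
#include <Cg/cgD3D11.h>

// Re-bind only on change, so the leaky call happens once per texture
// rather than once per frame.
void setTextureParameterOnce(CGparameter param, ID3D11Resource* tex)
{
    static ID3D11Resource* lastBound = NULL;
    if (tex != lastBound)
    {
        cgD3D11SetTextureParameter(param, tex);
        lastBound = tex;
    }
}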
jmp [UPDATE] ;; skip obsolete text segment
Could it be that Cg is destroyed after D3D, so it doesn't release the reference in time? Or vice versa? For example, the function might acquire the texture but not release it before D3D closes; when you set a texture on a shader, the texture is held until the shader's resources are released somehow. You are destroying the D3D context here:
SAFE_RELEASE( g.d3d );
SAFE_RELEASE( g.gpu );
Later on, you free the shaders in CleanupCg():
cgDestroyProgram( g.v_vncShader );
checkForCgError( "destroying vertex program" );
cgDestroyProgram( g.px_vncShader );
checkForCgError( "destroying fragment program" );
Try to change the order of the calls so that you first release all resources from both Cg and D3D. Also, cgD3D11SetDevice( g.cgContext, NULL ); should be called before releasing the D3D context, just in case.
UPDATE:
The ordering inside WinMain() is the problem. Currently it is:
initD3D11();  // << FIRST you init D3D
initCg();     // << SECOND you init Cg with the D3D pointers
initD2D1();
initVBs();

// Main message loop
while( WM_QUIT != msg.message ){ /* loop code */ }

CleanupDevice(); // << FIRST you release all of D3D, while Cg is still referencing it (why?)
CleanupCg();     // << SECOND if anything in the Cg runtime depends on the D3D context you just destroyed, it will crash, leak, or do whatever it wants
so you should swap them to ensure Cg releases any D3D pointers:
CleanupCg();     // << FIRST release Cg to ensure it's not referencing D3D anymore
CleanupDevice(); // << SECOND D3D is neither referencing nor referenced by Cg now, so just release it all
You could also provide the debugger output and the other info I asked for below. You're basically saying "Cg seems to be broken, this is the whole code, look at line ###, is it broken?", but there are more than a thousand lines (1012) of C, C++, and shader code in your file. You provide almost no info yet readily point to a Cg bug (based on... what?). And of course, if you're so sure, why would anyone look at the code if the code is fine? Which it isn't, by the way. Not that I don't like it, but it has these little things, such as the call ordering, that are silly mistakes yet can make debugging a real hell. It's a clear bug, and if I just looked into WinMain and found a bug, well, there is a long way up from there to the render call and the Cg implementation, isn't there? I can't run the app on WinXP, but these errors are in the most predictable places :)
So... when your code is clean of any bug... oh! Look what I've just found:
~VertexBuffer()
{
    SAFE_RELEASE( vb );
    SAFE_RELEASE( layout );
}
It turns out that in the VertexBuffer constructor you call iD3D->GetImmediateContext( &gpu ); and store the pointer in a private member, so... shouldn't you add:
SAFE_RELEASE( gpu ); // ? there are 3 VertexBuffer instances, so that's another memory leak.
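Putting that together, a sketch of the corrected destructor (assuming gpu is the ID3D11DeviceContext* taken via GetImmediateContext() in the constructor):
~VertexBuffer()
{
    SAFE_RELEASE( vb );
    SAFE_RELEASE( layout );
    SAFE_RELEASE( gpu ); // balances the reference taken by GetImmediateContext()
}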
OK, so there are some things you should fix in your code that cause memory leaks, and I found them after just one look, so you didn't really try. On the other hand, your code is clear and full of explanations, and I need to learn some DX11, so actually I should thank you for it. The downvote was somewhat rude though :P, especially because I'm probably right, and other people will avoid reading your code as soon as the page displays.
