what is colorAttachment[n] in Metal?

Very new to graphics in general, starting off with Metal for immediate needs, will try out OpenGL soon enough.
I was wondering what this means in layman's terms. Also, what is the extent of 'n'? I have just used 0 in the 2D triangle I made.

In general, the color attachment(s) are where rendered images are stored (at least temporarily) during a render pass. It is common to only use one color attachment at index 0, so what you're doing is fine. It is also possible to render to multiple color attachments simultaneously, which is why there's an array. It's an advanced technique that you don't have to worry about until you see the need, at which point it should be straightforward how to do it.
There are two places where colorAttachments[n] appears in Metal. First is in MTLRenderPipelineDescriptor. The other is in MTLRenderPassDescriptor.
In both cases, the extent is given in the Metal Implementation Limits table, in the "Render targets" section, in the row labelled "Maximum number of color render targets per render pass descriptor".
For MTLRenderPipelineDescriptor, colorAttachments[n] is a reference to a MTLRenderPipelineColorAttachmentDescriptor. Here, you configure the pixel format, write mask, and color blending operation.
For MTLRenderPassDescriptor, colorAttachments[n] is a reference to a MTLRenderPassColorAttachmentDescriptor. This is a subclass of MTLRenderPassAttachmentDescriptor, which is where most of its properties are defined. Here, you configure which part of which texture you will render to, what should happen to that texture's data when the render pass starts and ends, and, if it's to be cleared, what color it should be cleared to.
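As a rough Objective-C sketch of configuring index 0 on both descriptors (the shader functions, texture, formats, and blend settings here are placeholders, not taken from the question):
// Pipeline side: fixed-function state for color attachment 0, baked into the pipeline state.
MTLRenderPipelineDescriptor *pipelineDesc = [MTLRenderPipelineDescriptor new];
pipelineDesc.vertexFunction = vertexFunction;      // assumed to exist
pipelineDesc.fragmentFunction = fragmentFunction;  // assumed to exist
pipelineDesc.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;
pipelineDesc.colorAttachments[0].writeMask = MTLColorWriteMaskAll;
pipelineDesc.colorAttachments[0].blendingEnabled = YES;
pipelineDesc.colorAttachments[0].rgbBlendOperation = MTLBlendOperationAdd;
pipelineDesc.colorAttachments[0].sourceRGBBlendFactor = MTLBlendFactorSourceAlpha;
pipelineDesc.colorAttachments[0].destinationRGBBlendFactor = MTLBlendFactorOneMinusSourceAlpha;
// Pass side: which texture to render into and what to do with it when the pass begins and ends.
MTLRenderPassDescriptor *passDesc = [MTLRenderPassDescriptor renderPassDescriptor];
passDesc.colorAttachments[0].texture = drawable.texture;  // e.g. the CAMetalDrawable's texture
passDesc.colorAttachments[0].loadAction = MTLLoadActionClear;
passDesc.colorAttachments[0].clearColor = MTLClearColorMake(0.0, 0.0, 0.0, 1.0);
passDesc.colorAttachments[0].storeAction = MTLStoreActionStore;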
Information about color attachments is split across those two objects based on how expensive it is to change. The render pipeline state object is fairly expensive to create from the pipeline descriptor. You would typically create all of your pipeline state objects once in a run and reuse them for the rest of your app's lifetime.
By contrast, you will create render command encoders from render pass descriptors fairly often; at least once per frame. They are relatively inexpensive to create, so you can change the descriptor and create a new one to render elsewhere.
The pixel format of the render pipeline state's color attachment has to match the pixel format of the texture set as the color attachment at the same index on the render command encoder's pass descriptor.
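To make the cost split concrete, here is a hedged sketch of how the two objects are typically used (device, commandBuffer, and the descriptors sketched above are assumed to exist):
// Done once, typically at startup: building the pipeline state is relatively expensive.
NSError *error = nil;
id<MTLRenderPipelineState> pipelineState =
    [device newRenderPipelineStateWithDescriptor:pipelineDesc error:&error];
// Done every frame: encoders are cheap to create from a (possibly updated) pass descriptor.
id<MTLRenderCommandEncoder> encoder =
    [commandBuffer renderCommandEncoderWithDescriptor:passDesc];
[encoder setRenderPipelineState:pipelineState];  // attachment pixel formats must match here
// ... draw calls ...
[encoder endEncoding];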

OpenSceneGraph: Don't update the z-buffer when drawing semi-transparent objects

Question
Is it possible to tell OpenSceneGraph to use the Z-buffer but not update it when drawing semi-transparent objects?
Motivation
When drawing semitransparent objects, the order in which they are drawn is important, as surfaces that should be visible might be occluded if they are drawn in the wrong order. In some cases, OpenSceneGraph's own intuition about the order in which the objects should be drawn fails—semitransparent surfaces become occluded by other semitransparent surfaces, and "popping" (if that word can be used in this way) may occur, when OSG thinks the order of the object centers' distance to the camera has changed and decides to change the render order. It therefore becomes necessary to manually control the render order of semitransparent objects by manually specifying the render bin for each object using the setRenderBinDetails method on the state set.
However, even this might not always work, as in the general case it is impossible to choose a render order for the objects (even if the individual triangles in the scene were ordered) such that all fragments are drawn correctly (see e.g. the painter's problem), and one might still get occlusion. An alternative is to use depth peeling or some other order-independent transparency method, but, frankly, I don't know how difficult that is to implement in OpenSceneGraph or how much it would slow the application down.
In my case, as a trade-off between aesthetics and algorithmic complexity and speed, I would ideally always want to draw a fragment of a semi-transparent surface, even if another fragment of another semi-transparent surface that (in that pixel) is closer to the camera has already been drawn. This would prevent both popping and occlusion of semi-transparent surfaces by other semi-transparent surfaces, and would effectively be achieved if, for every semi-transparent object that was rendered, the Z-buffer was used to test visibility but was not updated when the fragment was drawn.
You're totally on the right track.
Yes, it's possible to leave Z-test enabled but turn off Z-writes with setWriteMask() during drawing:
// Disable Z-writes
osg::ref_ptr<osg::Depth> depth = new osg::Depth;
depth->setWriteMask(false);
myNode->getOrCreateStateSet()->setAttributeAndModes(depth, osg::StateAttribute::ON);
// Enable Z-test (needs to be done after Z-writes are disabled, since the latter
// also seems to disable the Z-test)
myNode->getOrCreateStateSet()->setMode(GL_DEPTH_TEST, osg::StateAttribute::ON);
https://www.mail-archive.com/osg-users#openscenegraph.net/msg01119.html
http://public.vrac.iastate.edu/vancegroup/docs/OpenSceneGraphReferenceDocs-2.8/a00206.html#a2cef930c042c5d8cda32803e5e832dae
You may wish to check out the osgTransparencyTool nodekit we wrote for a CAD client a few years ago: https://github.com/XenonofArcticus/OSG-Transparency-Tool
It includes several transparency methods that you can test with your scenes and examine the source implementation of, including an Order Independent Transparency Depth Peeling implementation and a Delayed Blend method inspired by Open Inventor. Delayed Blend is a high-performance, single-pass, unsorted approximation that probably checks all the boxes you want if absolute transparency accuracy is not the most important criterion.
Here's a paper discussing the various approaches in excruciating detail, if you haven't read it:
http://lips.informatik.uni-leipzig.de/files/bathesis_cbluemel_digital_0.pdf

How to keep NSBitmapImageRep from creating lots of intermediate CGImages?

I have a generative art application which starts with a small set of points, grows them outwards, and checks the growth to make sure it doesn't intersect with anything. My first naive implementation was to do it all on the main UI thread, with the expected consequences. As the size grows, there are more points to check, so it slows down and eventually blocks the UI.
I did the obvious thing and moved the calculations to another thread so the UI could stay responsive. This helped, but only a little. I accomplished this by having an NSBitmapImageRep that I wrap an NSGraphicsContext around so I can draw into it. But I needed to ensure that I'm not trying to draw it to the screen on the main UI thread while I'm also drawing to it on the background thread. So I introduced a lock. The drawing can take a long time as the data gets larger, too, so even this was problematic.
My latest revision has 2 NSBitmapImageReps. One holds the most recently drawn version and is drawn to the screen whenever the view needs updating. The other is drawn to on the background thread. When the drawing on the background thread is done, it's copied to the other one. I do the copy by getting the base address of each and simply calling memcpy() to actually move the pixels from one to the other. (I tried swapping them rather than copying, but even though the drawing ends with a call to [-NSGraphicsContext flushContext], I was getting partially-drawn results drawn to the window.)
The calculation thread looks like this:
BOOL done = NO;
while (!done)
{
    self->model->lockBranches();
    self->model->iterate();
    done = (!self->model->moreToDivide()) || (!self->keepIterating);
    self->model->unlockBranches();
    [self drawIntoOffscreen];
    dispatch_async(dispatch_get_main_queue(), ^{
        self.needsDisplay = YES;
    });
}
This works well enough for keeping the UI responsive. However, every time I copy the drawn image into the blitting image, I call -[NSBitmapImageRep bitmapData] to get the base addresses. Looking at a memory profile in Instruments, each call to that method causes a CGImage to be created. Furthermore, that CGImage isn't released until the calculations finish, which can be several minutes. This causes memory to grow pretty large. I'm seeing around 3-4 GB of CGImages in my process, even though I never need more than 2 of them. After the calculations finish and the cache is emptied, my app's memory goes down to only 350-500 MB. I hadn't thought to use an autorelease pool in the calculation loop for this, but will give it a try.
It appears that the OS is caching the images it creates. However, it doesn't clear out the cache until the calculations are finished, so it grows without bound until then. Is there any way to keep this from happening?
Don't use -bitmapData and memcpy() to copy the image. Draw the one image into the other.
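For the two bitmaps described in the question, that could look roughly like this (the rep names are placeholders):
// Draw 'sourceRep' into 'destinationRep' instead of memcpy'ing their pixel buffers.
[NSGraphicsContext saveGraphicsState];
[NSGraphicsContext setCurrentContext:
    [NSGraphicsContext graphicsContextWithBitmapImageRep:destinationRep]];
[sourceRep drawInRect:NSMakeRect(0, 0, destinationRep.pixelsWide, destinationRep.pixelsHigh)];
[NSGraphicsContext restoreGraphicsState];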
I often recommend that developers read the section "NSBitmapImageRep: CoreGraphics impedance matching and performance notes" from the 10.6 AppKit release notes:
NSBitmapImageRep: CoreGraphics impedance matching and performance notes
Release notes above detail core changes at the NSImage level for
SnowLeopard. There are also substantial changes at the
NSBitmapImageRep level, also for performance and to improve impedance
matching with CoreGraphics.
NSImage is a fairly abstract representation of an image. It's pretty
much just a thing-that-can-draw, though it's less abstract than NSView
in that it should not behave differently based on aspects of the context
it's drawn into except for quality decisions. That's kind of an opaque
statement, but it can be illustrated with an example: If you draw a
button into a 100x22 region vs a 22x22 region, you can expect the
button to stretch its middle but not its end caps. An image should not
behave that way (and if you try it, you'll probably break!). An image
should always linearly and uniformly scale to fill the rect in which
it's drawn, though it may choose representations and such to optimize
quality for that region. Similarly, all the image representations in
an NSImage should represent the same drawing. Don't pack some totally
different image in as a rep.
That digression past us, an NSBitmapImageRep is a much more concrete
object. An NSImage does not have pixels, an NSBitmapImageRep does. An
NSBitmapImageRep is a chunk of data together with pixel format
information and colorspace information that allows us to interpret the
data as a rectangular array of color values.
That's the same, pretty much, as a CGImage. In SnowLeopard an
NSBitmapImageRep is natively backed by a CGImageRef, as opposed to
directly a chunk of data. The CGImageRef really has the chunk of data.
While in Leopard an NSBitmapImageRep instantiated from a CGImage would
unpack and possibly process the data (which happens when reading from
a bitmap file format), in SnowLeopard we try hard to just hang onto
the original CGImage.
This has some performance consequences. Most are good! You should see
less encoding and decoding of bitmap data as CGImages. If you
initialize a NSImage from a JPEG file, then draw it in a PDF, you
should get a PDF of the same file size as the original JPEG. In
Leopard you'd see a PDF the size of the decompressed image. To take
another example, CoreGraphics caches, including uploads to the
graphics card, are tied to CGImage instances, so the more the same
instance can be used the better.
However: To some extent, the operations that are fast with
NSBitmapImageRep have changed. CGImages are not mutable,
NSBitmapImageRep is. If you modify an NSBitmapImageRep, internally it
will likely have to copy the data out of a CGImage, incorporate your
changes, and repack it as a new CGImage. So, basically, drawing
NSBitmapImageRep is fast, looking at or modifying its pixel data is
not. This was true in Leopard, but it's more true now.
The above steps do happen lazily: If you do something that causes
NSBitmapImageRep to copy data out of its backing CGImageRef (like call
bitmapData), the bitmap will not repack the data as a CGImageRef until
it is drawn or until it needs a CGImage for some other reason. So,
certainly accessing the data is not the end of the world, and is the
right thing to do in some circumstances, but in general you should be
thinking about drawing instead. If you think you want to work with
pixels, take a look at CoreImage instead - that's the API in our
system that is truly intended for pixel processing.
This coincides with safety. A problem we've seen with our SnowLeopard
changes is that apps are rather fond of hardcoding bitmap formats. An
NSBitmapImageRep could be 8, 32, or 128 bits per pixel, it could be
floating point or not, it could be premultiplied or not, it might or
might not have an alpha channel, etc. These aspects are specified with
bitmap properties, like -bitmapFormat. Unfortunately, if someone wants
to extract the bitmapData from an NSBitmapImageRep instance, they
typically just call bitmapData, treat the data as (say) premultiplied
32 bit per pixel RGBA, and if it seems to work, call it a day.
Now that NSBitmapImageRep is not processing data as much as it used
to, random bitmap image reps you may get ahold of may have different
formats than they used to. Some of those hardcoded formats might be
wrong.
The solution is not to try to handle the complete range of formats
that NSBitmapImageRep's data might be in, that's way too hard.
Instead, draw the bitmap into something whose format you know, then
look at that.
That looks like this:
NSBitmapImageRep *bitmapIGotFromAPIThatDidNotSpecifyFormat;
NSBitmapImageRep *bitmapWhoseFormatIKnow =
    [[NSBitmapImageRep alloc] initWithBitmapDataPlanes:NULL
                                             pixelsWide:width
                                             pixelsHigh:height
                                          bitsPerSample:bps
                                        samplesPerPixel:spp
                                               hasAlpha:alpha
                                               isPlanar:isPlanar
                                         colorSpaceName:colorSpaceName
                                           bitmapFormat:bitmapFormat
                                            bytesPerRow:rowBytes
                                           bitsPerPixel:pixelBits];
[NSGraphicsContext saveGraphicsState];
[NSGraphicsContext setCurrentContext:
    [NSGraphicsContext graphicsContextWithBitmapImageRep:bitmapWhoseFormatIKnow]];
[bitmapIGotFromAPIThatDidNotSpecifyFormat draw];
[NSGraphicsContext restoreGraphicsState];
unsigned char *bitmapDataIUnderstand = [bitmapWhoseFormatIKnow bitmapData];
This produces no more copies of the data than just accessing
bitmapData of bitmapIGotFromAPIThatDidNotSpecifyFormat, since that
data would need to be copied out of a backing CGImage anyway. Also
note that this doesn't depend on the source drawing being a bitmap.
This is a way to get pixels in a known format for any drawing, or just
to get a bitmap. This is a much better way to get a bitmap than
calling -TIFFRepresentation, for example. It's also better than
locking focus on an NSImage and using -[NSBitmapImageRep
initWithFocusedViewRect:].
So, to sum up: (1) Drawing is fast. Playing with pixels is not. (2) If
you think you need to play with pixels, (a) consider if there's a way
to do it with drawing or (b) look into CoreImage. (3) If you still
want to get at the pixels, draw into a bitmap whose format you know
and look at those pixels.
In fact, it's a good idea to start at the earlier section with a similar title — "NSImage, CGImage, and CoreGraphics impedance matching" — and read through to the later section.
By the way, there's a good chance that swapping the image reps would work, but you just weren't synchronizing them properly. You would have to show the code where both reps were used for us to know for sure.
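Purely as an illustration of what properly synchronized swapping could look like (the displayRep and scratchRep property names are hypothetical), the swap and the main-thread read would both take the same lock:
// Background thread, after -drawIntoOffscreen finishes a frame:
@synchronized (self) {
    NSBitmapImageRep *finished = self.scratchRep;  // just drawn to
    self.scratchRep = self.displayRep;
    self.displayRep = finished;                    // what -drawRect: will blit
}
// -drawRect: on the main thread reads self.displayRep inside the same @synchronized block.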

3D entity always visible, even when behind another entity

Is there an easy way to show a 3D entity at all times, even when that entity is hidden behind another entity? For example, I want lines to always be shown, even when they are behind a mesh surface.
I use Qt3D framework.
Assuming that you are talking about the Qt3D framework, I want to extend the answer of Rabbid76.
To disable depth testing in the Qt3D framework, add a QRenderStateSet to the framegraph branch that renders things (the one that has a QViewport, for example) and add a QDepthTest to it. Then set the depth function of the QDepthTest to Always. This way the depth test always passes and entities in the back will also be drawn, depending on the drawing order. You can use QSortPolicy to adjust the drawing order to back-to-front.
But this won't work when the camera position changes and your entity that you always want to be drawn is in the front. I'd suggest you add another framegraph branch and use a QLayerFilter to only deactivate depth-testing for this one entity.
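A hedged C++ sketch of such a dedicated branch (the parent framegraph node, the layer, and the entity are assumed to exist already; Qt3DRender headers omitted):
// Extra framegraph branch that draws only entities tagged with 'alwaysOnTopLayer',
// with the depth test forced to pass.
auto *layerFilter = new Qt3DRender::QLayerFilter(mainFrameGraphNode);  // e.g. under the QViewport
layerFilter->addLayer(alwaysOnTopLayer);                               // a Qt3DRender::QLayer
auto *stateSet = new Qt3DRender::QRenderStateSet(layerFilter);
auto *depthTest = new Qt3DRender::QDepthTest(stateSet);
depthTest->setDepthFunction(Qt3DRender::QDepthTest::Always);
stateSet->addRenderState(depthTest);
// The entity to keep visible carries the layer as a component:
alwaysVisibleEntity->addComponent(alwaysOnTopLayer);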
If your entity looks weird with depth testing deactivated (which is likely for complex objects), you could replace the QDepthTest with a QClearBuffers and simply clear the depth buffer.
Have a look at my answer here, where I showed an example of a custom framegraph with depth test.
If the depth test is disabled, then the geometry (like a line) is always drawn on top of the previously drawn geometry. The depth test can be disabled by:
glDisable(GL_DEPTH_TEST)
See glEnable
As an alternative, the depth test function can be set so that a fragment always passes the depth test. In Qt this can be done with the class QDepthTest, using the enumerator constant Qt3DRender::QDepthTest::Always.
In this case, you have to take care of the order in which the geometry is drawn.
You have to find a way to render the polygons (the opaque geometry) first, using the depth test function Qt3DRender::QDepthTest::Less.
After that, you have to render the lines on top, using the depth test function Qt3DRender::QDepthTest::Always.

Multithreaded shading pass Vulkan

I'm currently implementing a basic deferred renderer with multithreading in Vulkan. Since my G-Buffer should have the same resolution as the final image I want to do it in a single render-pass with multiple sub-passes, according to this presentation, on slide 44 (page 138). It says:
vkCmdBeginCommandBuffer
vkCmdBeginRenderPass
vkCmdExecuteCommands
vkCmdNextSubpass
vkCmdExecuteCommands
vkCmdEndRenderPass
vkCmdEndCommandBuffer
I get that in the first sub-pass, you iterate the scene graph and record one secondary command buffer for each entity/mesh. What I don't get is how you are supposed to do the shading pass with secondary command buffers. Do you somehow split the screen into parts and render each part in a separate thread, or just record one secondary command buffer for the entire second sub-pass?
As you said, you may want to multithread command buffer recording for the G-buffer-building subpass. For the shading pass, it depends on how you are doing things; in my opinion you do not need to multithread the shading subpass. However, take into consideration that you can use a "by region" dependency between the subpasses.
So I encourage you to proceed this way.
Before beginning your render pass, use a compute shader to splat all your lights onto the screen (this gives you a kind of array of quads).
By splatting I mean the following: for a point light, say, you compute the screen-space quad affected by the light. That gives you 4 vertices (representing a quad) that you put into an SSBO and can then use as a vertex buffer in the shading subpass.
Now you begin the render pass.
Multithread the scene graph rendering if needed, and submit it with vkCmdExecuteCommands().
Call vkCmdNextSubpass().
Use the "array of quads" you created with the earlier compute shader (do not forget a VK_SUBPASS_EXTERNAL dependency).
Call vkCmdNextSubpass(), and so on (see the sketch below).
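In terms of API calls, the primary command buffer for those two subpasses might look roughly like this (all handles, info structs, and counts are placeholders):
/* Both subpasses consume pre-recorded secondary command buffers, so the render pass
 * must be begun with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS. */
vkBeginCommandBuffer(primaryCmd, &beginInfo);
vkCmdBeginRenderPass(primaryCmd, &renderPassBeginInfo,
                     VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
vkCmdExecuteCommands(primaryCmd, gbufferCmdCount, gbufferCmds);   /* G-buffer fill, recorded on N threads */
vkCmdNextSubpass(primaryCmd, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
vkCmdExecuteCommands(primaryCmd, 1, &shadingCmd);                 /* one shading secondary is usually enough */
vkCmdEndRenderPass(primaryCmd);
vkEndCommandBuffer(primaryCmd);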
However, you said
you iterate the scene graph and record one secondary command buffer for each entity/mesh.
I am not sure I really understand what you meant, but if you intend to have one secondary command buffer per mesh, I really advise you to change that approach. You should use batching. Let's say you have 64,000 different meshes to draw. You could, for example, create 64 secondary command buffers (which you dispatch on 4 threads), each of which draws 1,000 meshes. (The numbers are picked arbitrarily, so profile your application.)
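To illustrate that batching idea, a rough C++ sketch (the chunk/thread counts and the recordSecondaryForChunk() helper are hypothetical; needs <thread> and <vector>):
// Partition the meshes into 64 chunks and record one secondary command buffer per chunk,
// spreading the work over 4 worker threads. recordSecondaryForChunk() stands in for the
// code that records one chunk's draw calls; each thread must use its own VkCommandPool.
const size_t chunkCount = 64, threadCount = 4;
std::vector<std::thread> workers;
for (size_t t = 0; t < threadCount; ++t) {
    workers.emplace_back([&, t] {
        for (size_t c = t; c < chunkCount; c += threadCount)
            recordSecondaryForChunk(secondaryCmds[c], meshes, c);
    });
}
for (auto &w : workers) w.join();
// The primary command buffer then submits all 64 at once with a single
// vkCmdExecuteCommands(primaryCmd, chunkCount, secondaryCmds.data()) call.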
So, to answer your question about the shading subpass: I would not use secondary command buffers there, or only very few (one per kind of light: punctual, directional).
What I don't get is how you are supposed to do the shading pass with secondary command buffers.
The shading pass (presumably the second subpass) would typically take the G-buffers created by the first subpass as input attachments. Then it would draw a full-screen quad, using data from the G-buffers plus a set of lights (or whatever your deferred shader tries to defer).
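A hedged sketch of how that wiring might be declared when creating the render pass (the attachment indices are placeholders: 0 and 1 are G-buffer targets, 2 is the final color target):
/* Subpass 0 writes the G-buffer; subpass 1 reads it back as input attachments
 * while writing the final color. */
VkAttachmentReference gbufWrite[] = {
    { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL },
    { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL },
};
VkAttachmentReference gbufRead[] = {
    { 0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL },
    { 1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL },
};
VkAttachmentReference finalColor = { 2, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
VkSubpassDescription subpasses[2] = {
    { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
      .colorAttachmentCount = 2, .pColorAttachments = gbufWrite },
    { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
      .inputAttachmentCount = 2, .pInputAttachments = gbufRead,
      .colorAttachmentCount = 1, .pColorAttachments = &finalColor },
};
/* The "by region" dependency between the two subpasses. */
VkSubpassDependency dep = {
    .srcSubpass = 0, .dstSubpass = 1,
    .srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
    .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
    .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
};
In the shading subpass's fragment shader, the G-buffer is then read with subpassLoad() from subpassInput uniforms.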
The presentation you link tries to hint at this structure style starting at page 13 (marked "Page 107").
The first step would be to get it working; use e.g. this SW example. Then the next step of optimizing it into a single render pass should be easier.

OpenGL ES 2.0: Efficient Rendering of Static and Dynamic Vertex Data

I am writing an iOS/Android game and looking for the most performant way to render my vertex data with OpenGL ES 2.0. I have two different kinds of data: dynamic data that changes its attributes every frame, for example the player or animated background objects, and static data such as the static background or the terrain. I have googled a lot since yesterday, but I could not find a clear and unique answer to the question of what is the best way to render such data.
There are basically three options for rendering such data (If I do not miss one. If so, feel free to correct me.):
Vertex Arrays Only:
Just fill your vertex arrays on the CPU every frame (including the dynamic data).
Vertex Buffer Objects Only:
Allocate a VBO on the GPU with GL_DYNAMIC_DRAW in which both the dynamic and the static data are stored. The dynamic data is then updated every frame via glBufferSubData.
Use both:
Static data is stored in and rendered with a VBO, and the dynamic data is rendered with a vertex array. With this option, we need two rendering passes: one for rendering the VBO and one for rendering the vertex array.
Since the first option does not exploit the immutability of the static data and since the third option requires two rendering passes, my guess is that I should go with the second option. However, I am absolutely not sure about this and I hope you can clarify my confusion.
Allocate two Vertex Buffer Objects. One with hint GL_DYNAMIC_DRAW that will be updated frequently. Allocate a second VBO for immutable data and use the hint GL_STATIC_DRAW. According to the API documentation, GL_STATIC_DRAW should be used for data that "will be modified once and used many times"; just what you need.
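As a rough GL ES 2.0 sketch of that setup (the sizes and vertex pointers are placeholders):
/* Setup, done once: one static and one dynamic VBO. */
GLuint vbo[2];
glGenBuffers(2, vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);                                       /* static geometry  */
glBufferData(GL_ARRAY_BUFFER, staticBytes, staticVertices, GL_STATIC_DRAW);  /* uploaded once    */
glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);                                       /* dynamic geometry */
glBufferData(GL_ARRAY_BUFFER, dynamicBytes, NULL, GL_DYNAMIC_DRAW);          /* allocate storage */
/* Per frame: refresh only the dynamic buffer, then issue one draw per buffer. */
glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
glBufferSubData(GL_ARRAY_BUFFER, 0, dynamicBytes, dynamicVertices);
/* glVertexAttribPointer(...); glDrawArrays(...); for each buffer */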
Speaking of two rendering passes here is probably a misuse of the term: what you do is render your scene with two separate drawing commands. Since drawing commands run asynchronously, you should not experience any performance hit by doing so.
A second rendering pass, on the other hand, is when you render the entire scene twice (see for example here) with different settings, or when you do some image processing effects on outputs of previous rendering passes.
