Is it OK to read from an input attachment and write to the same attachment in the same drawcall?


I was wondering: if an attachment is used as both an input attachment and a color or depth/stencil attachment, and a draw call reads from the input attachment and then writes to the same color or depth/stencil attachment, is that allowed?
If the next draw call does the same thing, I see from the spec that I need a vkCmdPipelineBarrier for the next draw call to fetch correct results, but I'm not sure about the same-draw-call case.
Another question: can I use an input attachment in the first subpass? For example, can I attach the depth texture generated by a pre-z pass as both the depth attachment and an input attachment?

It is possible to perform a read/modify/write (RMW) for the same image through color/input attachments in a shader, so long as:
You ensure that exactly one fragment shader will perform the RMW for a particular output value in the color attachment. This basically boils down to "no overdraw".
If you need overdraw (i.e., multiple fragment shader invocations doing repeated RMW operations on the same input/output), then between each set of overdrawing operations within a subpass, you must have a pipeline barrier. So you have to break up your rendering commands into small chunks. Note that for the barrier to work, you have to have a subpass self-dependency as part of this subpass's dependency graph, and the barrier needs to invoke it. Also, your self-dependency ought to be per-region (VK_DEPENDENCY_BY_REGION_BIT), since you only care about the dependency between individual locations on the screen. You can't random-access input attachments, after all.
You can use any attachment as an input attachment on any subpass, so long as it makes sense to do so. If your loadOp said that you don't want to load data, then obviously it doesn't make sense to read from an image that has undefined values.

Using an attachment as both input attachment and color or depth/stencil attachment is known as a feedback loop, and essentially you get undefined results if you both read and write to the same parts of it without a pipeline barrier in between. Since you can't have a pipeline barrier within a draw call, you're out of luck.
You can use feedback loops in a well-defined way if all accesses are reads (e.g. depth test enabled but depth writes disabled) or for color attachments if the reads and writes access disjoint components (using color write mask).
For your second question, yes, an input attachment doesn't have to have been written earlier in the same renderpass. Though in your example, it might be best to do the z pre-pass in a first subpass and then use it as input attachment and read-only depth test in the second subpass. On tiled architectures, this might save bandwidth since the depth buffer would never have to be written to memory.

Related

Is it possible to have non-interleaved vertex data in libgdx?

From the code, it looks like vertex data is always stored interleaved.
I am trying to avoid interleaving because I want to update one attribute every frame while 3-4 other attributes stay the same. Perhaps separating that attribute into its own tightly packed buffer would allow buffer subregion changes to be sent to the GPU faster. I guess interleaved data always requires transmitting the entire buffer?
Same issue applies to instance attributes, which are technically vertex attributes.

Why do GBuffers need to be created for each frame in D3D12?

I have experience with D3D11 and want to learn D3D12. I am reading the official D3D12 multithread example and don't understand why the shadow map (generated in the first pass as a DSV, consumed in the second pass as SRV) is created for each frame (actually only 2 copies, as the FrameResource is reused every 2 frames).
The code that creates the shadow map resource is here, in the FrameResource class, instances of which are created here.
There is actually another resource that is created for each frame: the constant buffer. I kind of understand the constant buffer. Because it is written by the CPU (D3D11 dynamic usage) and needs to remain unchanged until the GPU finishes using it, there need to be 2 copies. However, I don't understand why the shadow map needs the same treatment, because it is only modified by the GPU (D3D11 default usage), and there are fence commands to separate reading from and writing to that texture anyway. As long as the GPU follows the fence, a single texture should be enough for the GPU to work correctly. Where am I wrong?
Thanks in advance.
EDIT
According to the comment below, the "fence" I mentioned above should more accurately be called "resource barrier".

The key issue is that you don't want to stall the GPU for best performance. Double-buffering is a minimal requirement, but typically triple-buffering is better for smoothing out frame-to-frame rendering spikes, etc.
FWIW, the default behavior of DXGI Present is to stall only after you have submitted THREE frames of work, not two.
Of course, there's a trade-off between triple-buffering and input responsiveness, but if you are maintaining 60 Hz or better then it's likely not noticeable.
With all that said, typically you don't need to double-buffer depth/stencil buffers for rendering, although if you wanted the initial write of the depth buffer to overlap with the reads from the previous frame's depth-buffer passes, then you would want distinct buffers per frame for performance and correctness.
The 'writes' are all complete before the 'reads' in DX12 because of the injection of the 'Resource Barrier' into the command-list:
void FrameResource::SwapBarriers()
{
    // Transition the shadow map from writeable to readable.
    // A named local is used because taking the address of a temporary is not standard C++.
    auto barrier = CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(), D3D12_RESOURCE_STATE_DEPTH_WRITE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE);
    m_commandLists[CommandListMid]->ResourceBarrier(1, &barrier);
}
void FrameResource::Finish()
{
    // Transition the shadow map back from readable to writeable.
    auto barrier = CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_DEPTH_WRITE);
    m_commandLists[CommandListPost]->ResourceBarrier(1, &barrier);
}
Note that this sample is a port/rewrite of the older legacy DirectX SDK sample MultithreadedRendering11, so it may be just an artifact of convenience to have two shadow buffers instead of just one.

piping node.js object streams to multiple destinations is producing bizarre results -- why?

When piping one transform stream to two other transform streams, I occasionally get a few objects from one destination stream appearing in place of the proper objects in the other destination stream. In a stream of 90,000 objects, in about 1 out of 3 runs, about 10 objects starting around sequence number 10,000 are from the wrong stream (the start position and number of anomalous objects vary). What in the world could account for such bizarre results?
The setup:
    sourceStream.pipe(processingStream1).pipe(check1);
    processingStream1.pipe(check2).pipe(destinationStream1);
    processingStream1.pipe(processingStream2).pipe(destinationStream2);
The sourceStream is a transform stream fed by a file read. The two destination streams are transform streams leading to file writes. Both the file read and file write are through the fs streaming API. All the streams rely on node.js automatic backpressure in piping.
Occasionally objects from processingStream2 are leaking into destinationStream1, as described above.
The checking streams (check1 a sink, check2 a passthrough) show the anomalous objects exist in the stream through check2 but not in the stream into check1.
The file reads and writes are of text (csv) files. I'm using Node.js version 8.6 on Windows 7 (though deserved, please don't throw rocks at me for the latter).
Suggestions on how to better isolate the problem are also welcome. The anomaly is structured enough that it doesn't seem like a generic memory leak, but not consistent enough to be a code error. I'm mystified.

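Ugh! processingStream2 modifies the object in the stream coming through it (actually, it modifies a property of a sub-object). Apparently you can't count on the order of the pipes to control the order of changes to the streamed objects. When one stream is piped to several destinations, every destination receives a reference to the same object, not a copy, so a change made in processingStream2 is visible in the other branch whenever that branch has not yet finished with the object. Very occasionally that is what happened here: an object already modified by processingStream2 showed up in the branch leading to destinationStream1, presumably depending on how the objects happened to be buffered internally.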
Lesson learned: don't change the input streamed object when piping to multiple destinations, even if you think you're making the change downstream. May you never have to learn this lesson the hard way!
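To make the lesson concrete, here is a minimal sketch of the shared-reference effect and of one way around it (cloning before modifying). It is not the original pipeline: the names collector, tagger, runFanOut and the meta.tag property are made up for illustration, and it assumes a Node.js version that has Readable.from (newer than the 8.6 mentioned in the question).

```javascript
const { Readable, Transform, Writable } = require('stream');

// Writable sink that stores every object it receives (by reference).
function collector(sink) {
  return new Writable({
    objectMode: true,
    write(obj, _encoding, done) {
      sink.push(obj);
      done();
    },
  });
}

// Hypothetical stand-in for processingStream2: it changes a property of a
// sub-object. With cloneFirst = false it mutates the incoming object in place;
// with cloneFirst = true it copies the object and the sub-object first.
function tagger(cloneFirst) {
  return new Transform({
    objectMode: true,
    transform(obj, _encoding, done) {
      const out = cloneFirst ? { ...obj, meta: { ...obj.meta } } : obj;
      out.meta.tag = 'branch2';
      done(null, out);
    },
  });
}

// Pipes one source into two branches and reports the tags each branch holds
// once both branches have finished.
function runFanOut(cloneFirst) {
  const source = Readable.from(
    [1, 2, 3].map((seq) => ({ seq, meta: { tag: 'original' } }))
  );
  const branch1 = [];
  const branch2 = [];
  const sink1 = collector(branch1);
  const sink2 = collector(branch2);

  source.pipe(sink1);                          // branch 1: no changes intended
  source.pipe(tagger(cloneFirst)).pipe(sink2); // branch 2: tags each object

  const finished = (w) => new Promise((resolve) => w.once('finish', resolve));
  return Promise.all([finished(sink1), finished(sink2)]).then(() => ({
    branch1: branch1.map((o) => o.meta.tag),
    branch2: branch2.map((o) => o.meta.tag),
  }));
}

runFanOut(false).then((r) => console.log('mutate in place:', r));
runFanOut(true).then((r) => console.log('clone first:', r));
```

With in-place mutation, the objects held by branch 1 end up carrying the branch 2 tag, because both branches hold the very same objects. With the clone, branch 1 keeps the original tag. In a real pipeline the leak is intermittent rather than total, since it depends on whether the other branch has already consumed (e.g. serialized) the object by the time it is modified.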

How to update texture for every frame in vulkan?

As my question title says, I want to update a texture every frame.
I have an idea:
create a VkImage as a texture buffer with the configuration below:
initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED
usage = VK_IMAGE_USAGE_SAMPLED_BIT
and its memory type is VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
In the draw loop:
First frame:
Map the texture data to the VkImage (using vkMapMemory).
Change the VkImage layout from VK_IMAGE_LAYOUT_PREINITIALIZED to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
Use this VkImage as the texture buffer.
Second frame:
The layout will be VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the first frame. Can I map the next texture data to this VkImage directly without changing its layout? If I cannot do that, which layout can I change this VkImage to?
Section 11.4 of the Vulkan spec says:
The new layout used in a transition must not be VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED
So I cannot change the layout back to _PREINITIALIZED.
Any help will be appreciated.

For your case you do not need LAYOUT_PREINITIALIZED. That would only complicate your code (forcing you to provide separate code for the first frame).
LAYOUT_PREINITIALIZED is indeed a very special layout intended only for the start of the life of the Image. It is more useful for static textures.
Start with LAYOUT_UNDEFINED and use LAYOUT_GENERAL when you need to write the Image from CPU side.
I propose this scheme:
Before the render loop:
Create your VkImage with UNDEFINED
1st to Nth frame (aka render loop):
1. Transition the image to GENERAL
2. Synchronize (likely with a VkFence)
3. Map the image, write it, unmap it (well, the mapping and unmapping can perhaps be outside the render loop)
4. Synchronize (potentially done implicitly)
5. Transition the image to whatever layout you need next
6. Do your rendering and whatnot
7. Start over at 1.
It is a naive implementation but should suffice for ordinary hobbyist uses.
Double-buffered access can be implemented too, e.g. a VkBuffer for CPU access and a VkImage holding the same data for GPU access, with a vkCmdCopy* command doing the data hand-off.
It is not that much more complicated than the above approach and there can be some performance benefits (if you need those at your stage of your project). You usually want your resources in device local memory, which often is not also host visible.
It would go something like:
Before the render loop:
Create your VkBuffer b with UNDEFINED backed by HOST_VISIBLE memory and map it
Create your VkImage i with UNDEFINED backed by DEVICE_LOCAL memory
Prepare your synchronization primitives between i and b: e.g. two semaphores, or events could be used, or barriers if the transfer is in the same queue
1st to Nth frame (aka render loop):
Operations on b and i can be pretty detached (they can even be on different queues), so:
For b:
1. Transition b to GENERAL
2. Synchronize to CPU (likely waiting on a VkFence or vkQueueWaitIdle)
3. Invalidate (if non-coherent), write it, flush (if non-coherent)
4. Synchronize to GPU (done implicitly if 3. happens before queue submission)
5. Transition b to TRANSFER
6. Synchronize to make sure i is not in use (likely waiting on a VkSemaphore)
7. Transition i to TRANSFER
8. Do vkCmdCopy* from b to i
9. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
10. Start over at 1.
(The fence at 2. and semaphore at 6. have to be pre-signalled or skipped for the first frame to work)
For i:
1. Synchronize to make sure i is free to use (likely waiting on a VkSemaphore)
2. Transition i to whatever is needed
3. Do your rendering
4. Synchronize to make known I am finished with i (likely signalling a VkSemaphore)
5. Start over at 1.

You have a number of problems here.
First:
create a VkImage as a texture buffer
There's no such thing. The equivalent of an OpenGL buffer texture is a Vulkan buffer view. This does not use a VkImage of any sort. VkBufferViews do not have an image layout.
Second, assuming that you are working with a VkImage of some sort, you have recognized the layout problem. You cannot modify the memory behind the texture unless the texture is in the GENERAL layout (among other things). So you have to force a transition to that, wait until the transition command has actually completed execution, then do your modifications.
Third, Vulkan is asynchronous in its execution, and unlike OpenGL, it will not hide this from you. The image in question may still be accessed by the shader when you want to change it. So usually, you need to double buffer these things.
On frame 1, you set the data for image 1, then render with it. On frame 2, you set the data for image 2, then render with it. On frame 3, you overwrite the data for image 1 (using events to ensure that the GPU has actually finished frame 1).
Alternatively, you can use double-buffering without possible CPU waiting, by using staging buffers. That is, instead of writing to images directly, you write to host-visible memory. Then you use a vkCmdCopyBufferToImage command to copy that data into the image. This way, the CPU doesn't have to wait on events or fences to make sure that the image is in the GENERAL layout before sending data.
And BTW, Vulkan is not OpenGL. Mapping of memory is always persistent; there's no reason to unmap a piece of memory if you're going to map it every frame.

Objective C For loops with @autorelease and ARC

As part of an app that allows auditors to create findings and associate photos to them (Saved as Base64 strings due to a limitation on the web service) I have to loop through all findings and their photos within an audit and set their sync value to true.
Whilst I perform this loop I see a memory spike jumping from around 40MB up to 500MB (for roughly 350 photos and 255 findings) and this number never goes down. On average our users are creating around 1000 findings and 500-700 photos before attempting to use this feature. I have attempted to use @autorelease pools to keep the memory down but it never seems to get released.
for (Finding * __autoreleasing f in self.audit.findings) {
    @autoreleasepool {
        [f setToSync:@YES];
        NSLog(@"%@", f.idFinding);
        for (FindingPhoto * __autoreleasing p in f.photos) {
            @autoreleasepool {
                [p setToSync:@YES];
                p = nil;
            }
        }
        f = nil;
    }
}
The relationships and retain cycles look like this
Audit has a strong reference to Finding
Finding has a weak reference to Audit and a strong reference to FindingPhoto
FindingPhoto has a weak reference to Finding
What am I missing in terms of being able to effectively loop through these objects and set their properties without causing such a huge spike in memory. I'm assuming it's got something to do with so many Base64 strings being loaded into memory when looping through but never being released.

So, first, make sure you have a batch size set on the fetch request. Choose a relatively small number, but not too small because this isn't for UI processing. You want to batch a reasonable number of objects into memory to reduce loading overhead while keeping memory usage down. Try 50 or 100 and see how it goes, then consider upping the batch size a little.
If all of the objects you're loading are managed objects then the correct way to evict them during processing is to turn them into faults. That's done by calling refreshObject:mergeChanges: on the context. BUT - that discards any changes, and your loop is specifically there to make changes.
So, what you should really be doing is batch saving the objects you've modified and then turning those objects back into faults to remove the data from memory.
So, in your loop, keep a counter of how many you've modified and save the context each time you hit that count and refresh all the objects that were processed so far. The batch on the fetch and the batch size to save should be the same number.

There's probably a big difference in size between your "Finding" objects and the associated images. So your primary aim should be to redesign your database in a way so that unfaulting (loading) a Finding object does not automatically load the base64 encoded image.
That's actually one of the major strengths of Core Data: loading only part of an object hierarchy. Just try to move the base64-encoded data into its own (managed) object so that Core Data does not load it along with the Finding. It will still be loaded as needed when the reference is touched.
