Access framebuffers from compute shaders - graphics

I have a compute shader which renders an image. The image is basically a finished frame.
I would like to display this image on the screen. The most obvious way to do that would be to skip the intermediate image and render straight to the framebuffer. I have been told, however, that this requires the storage bit to be set on the framebuffer image, which is not the case on my machine.
The next best thing is to copy the image into the framebuffer. This requires the transfer destination bit to be set on the framebuffer image, which luckily happens to be the case on my machine.
However, when I try to copy into the framebuffer, Vulkan gives an access error, saying the framebuffer image is not initialised.
AccessError {
    error: ImageNotInitialized { requested: PresentSrc },
    command_name: "vkCmdCopyImage",
    command_param: "destination",
    command_offset: 4
}
If it matters, I am using the Rust bindings to Vulkan (vulkano). The code is quite bulky; the entire thing is available on GitLab. The command buffer is created as follows:
let command_buffer = AutoCommandBufferBuilder::new(queue.device().clone(), queue.family())?
    .clear_color_image(image.clone(), ClearValue::Float([1.0, 0.0, 0.0, 1.0]))?
    .dispatch(
        [(WIDTH / 16) as u32, (HEIGHT / 16) as u32, 1],
        compute_pipeline.clone(),
        set.clone(),
        (),
    )?
    .copy_image(
        image.clone(),
        [0, 0, 0],
        0,
        0,
        render_manager.images[next_image_index].clone(),
        [0, 0, 0],
        0,
        0,
        [WIDTH as u32, HEIGHT as u32, 1],
        1,
    )?
    .build()?;
I know I could just render the image onto a screen-filling quad, which is what I am doing now, but that is a lot of bulky code that doesn't do much.

The documentation says this about ImageNotInitialized:
Trying to use an image without transitioning it from the "undefined" or "preinitialized" layouts first.
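For context, this is roughly what that transition looks like at the raw Vulkan level (a hypothetical C++ sketch using the C API rather than the vulkano builder above; vulkano normally records an equivalent barrier on your behalf):

#include <vulkan/vulkan.h>

// Hypothetical helper: move a freshly acquired swapchain image out of the
// UNDEFINED layout so it can legally be the destination of vkCmdCopyImage.
void transitionForCopy(VkCommandBuffer cmd, VkImage swapchainImage)
{
    VkImageMemoryBarrier barrier{};
    barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.oldLayout           = VK_IMAGE_LAYOUT_UNDEFINED;            // previous contents are discarded
    barrier.newLayout           = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image               = swapchainImage;
    barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
    barrier.srcAccessMask       = 0;
    barrier.dstAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, // nothing earlier to wait on
                         VK_PIPELINE_STAGE_TRANSFER_BIT,    // must happen before the copy
                         0, 0, nullptr, 0, nullptr, 1, &barrier);

    // After the copy, a second barrier (TRANSFER_DST_OPTIMAL -> PRESENT_SRC_KHR)
    // is needed before the image can be presented.
}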

Related

What may be wrong about my use of SetGraphicsRootDescriptorTable in D3D12?

For the 7 meshes that I would like to draw, I load 7 textures and create the corresponding SRVs in a descriptor heap. Then there is another SRV for ImGui, and there are also 3 CBVs for triple buffering. So the heap should look like: | srv x7 | srv x1 | cbv x3 |.
The problem is that when I call SetGraphicsRootDescriptorTable on range 0, which should be an SRV (the texture, actually), something goes wrong. Here's the code:
ID3D12DescriptorHeap* ppHeaps[] = { pCbvSrvDescriptorHeap, pSamplerDescriptorHeap };
pCommandList->SetDescriptorHeaps(_countof(ppHeaps), ppHeaps);
pCommandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
pCommandList->IASetIndexBuffer(pIndexBufferViewDesc);
pCommandList->IASetVertexBuffers(0, 1, pVertexBufferViewDesc);
CD3DX12_GPU_DESCRIPTOR_HANDLE srvHandle(pCbvSrvDescriptorHeap->GetGPUDescriptorHandleForHeapStart(), indexMesh, cbvSrvDescriptorSize);
pCommandList->SetGraphicsRootDescriptorTable(0, srvHandle);
pCommandList->SetGraphicsRootDescriptorTable(1, pSamplerDescriptorHeap->GetGPUDescriptorHandleForHeapStart());
If indexMesh is 5, SetGraphicsRootDescriptorTable causes the following error, though the render output still looks good. When indexMesh is 6, the same error still occurs, and there is another identical error except that the offset 8 turns into 9.
D3D12 ERROR: CGraphicsCommandList::SetGraphicsRootDescriptorTable: Specified GPU Descriptor Handle (ptr = 0x400750000002c0 at 8 offsetInDescriptorsFromDescriptorHeapStart) of type CBV, for Root Signature (0x0000020A516E8BF0:'m_rootSignature')'s Descriptor Table (at Parameter Index [0])'s Descriptor Range (at Range Index [0] of type D3D12_DESCRIPTOR_RANGE_TYPE_SRV) have mismatching types. All descriptors of descriptor ranges declared STATIC (not-DESCRIPTORS_VOLATILE) in a root signature must be initialized prior to being set on the command list. [ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]
That is really weird, because I suppose the only thing that could cause this is cbvSrvDescriptorSize being wrong. It is 64, and it is set by m_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV), which I think should work. Besides, if I set it to another value such as 32, the application crashes.
So if cbvSrvDescriptorSize is right, why would a correct indexMesh cause the wrong descriptor handle offset? The consequence of this error is that it seems to affect my CBV, which breaks the render output. Any suggestion would be appreciated, thanks!
Thanks for Chuck's suggestion; here's the code for the root signature:
CD3DX12_DESCRIPTOR_RANGE1 ranges[3];
ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 4, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_DATA_STATIC);
ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER, 1, 0);
ranges[2].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_DATA_STATIC);
CD3DX12_ROOT_PARAMETER1 rootParameters[3];
rootParameters[0].InitAsDescriptorTable(1, &ranges[0], D3D12_SHADER_VISIBILITY_PIXEL);
rootParameters[1].InitAsDescriptorTable(1, &ranges[1], D3D12_SHADER_VISIBILITY_PIXEL);
rootParameters[2].InitAsDescriptorTable(1, &ranges[2], D3D12_SHADER_VISIBILITY_ALL);
CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC rootSignatureDesc;
rootSignatureDesc.Init_1_1(_countof(rootParameters), rootParameters, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
ComPtr<ID3DBlob> signature;
ComPtr<ID3DBlob> error;
ThrowIfFailed(D3DX12SerializeVersionedRootSignature(&rootSignatureDesc, featureData.HighestVersion, &signature, &error));
ThrowIfFailed(m_device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&m_rootSignature)));
NAME_D3D12_OBJECT(m_rootSignature);
And here's some declarations in the pixel shader:
Texture2DArray g_textures : register(t0);
SamplerState g_sampler : register(s0);
cbuffer cb0 : register(b0)
{
    float4x4 g_mWorldViewProj;
    float3 g_lightPos;
    float3 g_eyePos;
    ...
};
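As an illustration of the offset arithmetic behind that debug-layer message (a hypothetical stand-alone snippet, using only the numbers quoted above): CD3DX12_GPU_DESCRIPTOR_HANDLE(start, index, increment) resolves to start.ptr + index * increment, and a STATIC range is validated over its whole declared size, so a 4-descriptor SRV range whose table starts at heap offset 5 also reaches offset 8.

#include <cstdint>
#include <cstdio>

// Hypothetical illustration only: reproduce the heap offsets the debug layer inspects.
int main()
{
    const std::uint32_t incrementSize = 64; // GetDescriptorHandleIncrementSize(CBV_SRV_UAV)
    const std::uint32_t indexMesh     = 5;  // offset used when building srvHandle
    const std::uint32_t rangeSize     = 4;  // ranges[0] above declares 4 SRV descriptors

    std::printf("table base = heap offset %u (byte offset %u)\n",
                indexMesh, indexMesh * incrementSize);

    // A STATIC (non-DESCRIPTORS_VOLATILE) range is validated over its full declared
    // size, so these heap slots are checked when the table is set:
    for (std::uint32_t i = 0; i < rangeSize; ++i)
        std::printf("  heap offset %u\n", indexMesh + i); // 5, 6, 7, 8 -> offset 8 holds a CBV in the layout described above
}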
It's not very often I come across the exact problem I'm experiencing (my code is almost verbatim) and it's an in-progress post! Let's suffer together.
My problem turned out to be the calls to CreateConstantBufferView()/CreateShaderResourceView() - I was passing srvHeap->GetCPUDescriptorHandleForHeapStart() as the destDescriptor handle. These need to be offset to match your table layout (the offsetInDescriptorsFromTableStart param of CD3DX12_DESCRIPTOR_RANGE1).
I found it easier to just maintain one D3D12_CPU_DESCRIPTOR_HANDLE for the heap and increment handle.ptr after every call to CreateSomethingView() which uses that heap.
CD3DX12_DESCRIPTOR_RANGE1 rangesV[1] = {{}};
CD3DX12_DESCRIPTOR_RANGE1 rangesP[1] = {{}};
// Vertex
rangesV[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_NONE, 0); // b0 at desc offset 0
// Pixel
rangesP[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_NONE, 1); // t0 at desc offset 1
CD3DX12_ROOT_PARAMETER1 rootParameters[2] = {{}};
rootParameters[0].InitAsDescriptorTable(1, &rangesV[0], D3D12_SHADER_VISIBILITY_VERTEX);
rootParameters[1].InitAsDescriptorTable(1, &rangesP[0], D3D12_SHADER_VISIBILITY_PIXEL);
D3D12_CPU_DESCRIPTOR_HANDLE srvHeapHandle = srvHeap->GetCPUDescriptorHandleForHeapStart();
// ----
device->CreateConstantBufferView(&cbvDesc, srvHeapHandle);
srvHeapHandle.ptr += device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
// ----
device->CreateShaderResourceView(texture, &srvDesc, srvHeapHandle);
srvHeapHandle.ptr += device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
Perhaps an enum would help keep it tidier and more maintainable, though. I'm still experimenting.
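For instance, a hypothetical slot enum for the layout used in the snippet above (one CBV at descriptor 0, one SRV at descriptor 1); the enumerator names are invented, and d3dx12.h plus the device, srvHeap, texture and srvDesc variables from the snippet are assumed:

// Hypothetical named slots instead of manually incremented handle.ptr values.
enum SrvHeapSlot : unsigned
{
    HeapSlot_SceneCbv = 0, // matches offsetInDescriptorsFromTableStart 0 (b0)
    HeapSlot_Texture  = 1, // matches offsetInDescriptorsFromTableStart 1 (t0)
    HeapSlot_Count
};

// Write a view into a named slot:
CD3DX12_CPU_DESCRIPTOR_HANDLE slot(
    srvHeap->GetCPUDescriptorHandleForHeapStart(),
    HeapSlot_Texture,
    device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV));
device->CreateShaderResourceView(texture, &srvDesc, slot);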

Why does DirectX 9 AddDirtyRect cause a memory violation on Intel integrated graphics?

I'm using the following code to lock an IDirect3DTexture9 for update. This code works fine on my NVIDIA graphics card (NVIDIA GeForce GTX 970M) but causes a memory access violation on an Intel integrated graphics card (Intel HD Graphics 530), even if the texture is immediately unlocked and no data is written to the locked region. The x, y, w and h parameters are far from boundary conditions, so the locked rect is entirely inside the texture.
void InnerLock(int x, int y, int w, int h, D3DLOCKED_RECT *lr)
{
    int left, right, top, bottom;
    left = clamp(x, 0, Width() - 1);
    top = clamp(y, 0, Height() - 1);
    right = clamp(x + w, left + 1, Width());
    bottom = clamp(y + h, top + 1, Height());
    RECT rc = { left, top, right, bottom };
    texture->LockRect(0, lr, &rc, D3DLOCK_NO_DIRTY_UPDATE);
    // this line returns zero but causes an exception in igdumdim32.dll later
    texture->AddDirtyRect(&rc);
    // everything becomes all right when I set the whole texture as the dirty region
    //RECT fc = { 0, 0, Width(), Height() };
    //texture->AddDirtyRect(&fc);
}
The AddDirtyRect call returns a success code, but the error occurs later in igdumdim32.dll (I'm not sure where exactly; maybe in the draw call).
I first found the error when using LockRect with a zero flag. The program crashed for some rect parameters (in my case the error occurred when the y value of the rect was large enough, yet still smaller than the texture height). Then I used D3DLOCK_NO_DIRTY_UPDATE and manually added the dirty rect. The error only occurs when AddDirtyRect is called.
This error was reproduced by another user with an Intel graphics card. My operating system is Windows 10. All drivers are updated to the latest version. If you need any more information, please tell me. Thank you!
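For reference, here is a sketch of the workaround already hinted at in the commented-out lines above, kept behind a flag so the partial rect is still used where it works; fallbackToFullDirtyRect is a hypothetical member, everything else comes from the code in the question:

void InnerLock(int x, int y, int w, int h, D3DLOCKED_RECT *lr)
{
    const int left   = clamp(x, 0, Width() - 1);
    const int top    = clamp(y, 0, Height() - 1);
    const int right  = clamp(x + w, left + 1, Width());
    const int bottom = clamp(y + h, top + 1, Height());
    RECT rc = { left, top, right, bottom };

    texture->LockRect(0, lr, &rc, D3DLOCK_NO_DIRTY_UPDATE);

    if (fallbackToFullDirtyRect) {
        // Workaround from the question: on the affected Intel driver, marking the
        // whole level dirty avoids the later crash in igdumdim32.dll.
        RECT full = { 0, 0, Width(), Height() };
        texture->AddDirtyRect(&full);
    } else {
        texture->AddDirtyRect(&rc);
    }
}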

X11 - Setting Cursor Position Not Working

I'm trying to set my X11 cursor position. I tried calling XWarpPointer with the destination window set to None, to the root window (DefaultRootWindow(display)), and to the previously created window (from XCreateWindow). The function IS being called, and the mouse slows down a bit, but it does not physically move. Why could this be?
void GameWindow::ResetCursor() {
    SetCursor(resX / 2, resY / 2);
}

void GameWindow::SetCursor(int x, int y) {
    // Window root = DefaultRootWindow(display);
    XWarpPointer(display, None, root, 0, 0, 0, 0, x, y);
    XFlush(display);
}
EDIT: Here's the entire X11 Windowing file in case you can't find the reason here. https://gist.github.com/KarimIO/7db1f50778fda63a36c10242989baab6
The answer to this was relatively silly. I was using Gnome on Wayland, assuming it supported X11 as well. I assumed wrong.
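In case someone else hits the same wall, a small sketch (not tied to the code above) that checks the session type before relying on XWarpPointer; WAYLAND_DISPLAY and XDG_SESSION_TYPE are the standard environment variables, the rest is hypothetical:

#include <cstdlib>
#include <cstring>
#include <cstdio>

// Rough check for whether the process is running in a Wayland session,
// where XWarpPointer issued through XWayland may simply be ignored.
bool runningUnderWayland()
{
    const char *waylandDisplay = std::getenv("WAYLAND_DISPLAY");
    const char *sessionType    = std::getenv("XDG_SESSION_TYPE");
    return (waylandDisplay && *waylandDisplay) ||
           (sessionType && std::strcmp(sessionType, "wayland") == 0);
}

int main()
{
    if (runningUnderWayland())
        std::puts("Wayland session: XWarpPointer will likely not move the pointer.");
    else
        std::puts("X11 session: XWarpPointer should work.");
}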

Linux framebuffers with ARGB32. Alpha? How does a framebuffer support Alpha?

After looking at the source for Qt, it seems that it, and framebuffers in general, support alpha transparency.
static QImage::Format determineFormat(const fb_var_screeninfo &info, int depth)
{
    const fb_bitfield rgba[4] = { info.red, info.green,
                                  info.blue, info.transp };
    QImage::Format format = QImage::Format_Invalid;
    switch (depth) {
    case 32: {
        const fb_bitfield argb8888[4] = {{16, 8, 0}, {8, 8, 0},
                                         {0, 8, 0}, {24, 8, 0}};
        const fb_bitfield abgr8888[4] = {{0, 8, 0}, {8, 8, 0},
                                         {16, 8, 0}, {24, 8, 0}};
        if (memcmp(rgba, argb8888, 4 * sizeof(fb_bitfield)) == 0) {
            format = QImage::Format_ARGB32;
        } else if (memcmp(rgba, argb8888, 3 * sizeof(fb_bitfield)) == 0) {
            format = QImage::Format_RGB32;
        } else if (memcmp(rgba, abgr8888, 3 * sizeof(fb_bitfield)) == 0) {
            format = QImage::Format_RGB32;
            // pixeltype = BGRPixel;
        }
        break;
    }
    // code omitted
}
What does it mean if a framebuffer supports alpha? Don't framebuffers typically represent monitors?
I am investigating the possibility of sending the alpha channel out over HDMI for video overlay on an FPGA chip, similar to this user's question.
I am wondering: if I have an external monitor that somehow registers itself within Linux as having a depth of 32 bits with an alpha channel, will the alpha get sent out over HDMI?
The alpha component is not transmitted to the monitor. But:
Alpha might be used by the compositor, allowing a window on screen to be transparent. For example, you can use the alpha channel in a WebGL framebuffer to show the document underneath the WebGL canvas.
You might use the alpha component in your application, even if the compositor doesn't use it.
It is more convenient to waste a byte of memory per pixel than it is to have an odd-sized pixel. Hardware framebuffers support a variety of 1, 2, and 4-channel formats, but only a few 3-channel formats.
The HDMI cable itself can carry a small variety of different video formats, such as RGB and YCbCr, with variations in subsampling and bit depth. The advantage to even-sized pixel formats does not apply to streamed data.
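To see what your own framebuffer device reports, here is a small sketch along the lines of what Qt's determineFormat() inspects, assuming the standard fbdev interface on /dev/fb0:

#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdio>

// Query /dev/fb0 and print the per-channel bitfields, including the "transp"
// (alpha) field that determineFormat() compares against ARGB8888.
int main()
{
    int fd = open("/dev/fb0", O_RDONLY);
    if (fd < 0) { perror("open /dev/fb0"); return 1; }

    fb_var_screeninfo info{};
    if (ioctl(fd, FBIOGET_VSCREENINFO, &info) != 0) {
        perror("FBIOGET_VSCREENINFO");
        close(fd);
        return 1;
    }

    std::printf("%ux%u, %u bpp\n", info.xres, info.yres, info.bits_per_pixel);
    std::printf("red:    offset %u, length %u\n", info.red.offset,    info.red.length);
    std::printf("green:  offset %u, length %u\n", info.green.offset,  info.green.length);
    std::printf("blue:   offset %u, length %u\n", info.blue.offset,   info.blue.length);
    std::printf("transp: offset %u, length %u\n", info.transp.offset, info.transp.length);

    close(fd);
    return 0;
}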

Am I doing something wrong, or do Intel graphics cards suck that bad?

I have a
VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10)
on Ubuntu 10.10 Linux.
I'm rendering one static VBO per frame. This VBO has 30,000 triangles, with 3 lights and one texture, and I'm getting 15 FPS.
Are Intel cards really that bad, or am I doing something wrong?
The drivers are the standard open-source drivers from Intel.
My code:
void init() {
    glGenBuffersARB(4, vbos);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[0]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(GLfloat) * verticesNum * 3, vertXYZ, GL_STATIC_DRAW_ARB);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[1]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(GLfloat) * verticesNum * 4, colorRGBA, GL_STATIC_DRAW_ARB);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[2]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(GLfloat) * verticesNum * 3, normXYZ, GL_STATIC_DRAW_ARB);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[3]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(GLfloat) * verticesNum * 2, texXY, GL_STATIC_DRAW_ARB);
}

void draw() {
    glPushMatrix();
    const Vector3f O = ps.getPosition();
    glScalef(scaleXYZ[0], scaleXYZ[1], scaleXYZ[2]);
    glTranslatef(O.x() - originXYZ[0], O.y() - originXYZ[1], O.z() - originXYZ[2]);

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[0]);
    glVertexPointer(3, GL_FLOAT, 0, 0);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[1]);
    glColorPointer(4, GL_FLOAT, 0, 0);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[2]);
    glNormalPointer(GL_FLOAT, 0, 0);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[3]);
    glTexCoordPointer(2, GL_FLOAT, 0, 0);

    texture->bindTexture();
    glDrawArrays(GL_TRIANGLES, 0, verticesNum);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0); // disabling VBO
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glPopMatrix();
}
EDIT: maybe it's not clear, but initialization is in a different function and is only called once.
A few hints:
With that number of vertices you should interleave the arrays. Vertex caches usually don't hold more than 1000 entries. Interleaving the data of course implies that the data is held by a single VBO.
Using glDrawArrays is suboptimal if there are a lot of shared vertices, which is likely the case for a (static) terrain. Instead, draw using glDrawElements (see the sketch after these hints). You can use the index array to implement some cheap LOD.
Experiment with the number of vertices in the index buffer given to glDrawElements. Try batches of at most 2^14, 2^15 or 2^16 indices. This is again to relieve cache pressure.
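Here is a minimal sketch of the first two hints combined (interleaved attributes in one VBO plus an element buffer drawn with glDrawElements), using the same legacy client-state API as the question's code; the Vertex struct, the function names and the assumption that the usual GL/ARB headers are included are all mine:

#include <cstddef> // offsetof

// Hypothetical interleaved layout: one VBO holds position, color, normal and
// texture coordinates per vertex, and an index buffer lets vertices be shared.
struct Vertex {
    GLfloat pos[3];
    GLfloat color[4];
    GLfloat normal[3];
    GLfloat tex[2];
};

void initInterleaved(const Vertex *vertices, GLsizei vertexCount,
                     const GLuint *indices, GLsizei indexCount,
                     GLuint *vboOut, GLuint *iboOut)
{
    glGenBuffersARB(1, vboOut);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, *vboOut);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertexCount * sizeof(Vertex),
                    vertices, GL_STATIC_DRAW_ARB);

    glGenBuffersARB(1, iboOut);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, *iboOut);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, indexCount * sizeof(GLuint),
                    indices, GL_STATIC_DRAW_ARB);
}

void drawInterleaved(GLuint vbo, GLuint ibo, GLsizei indexCount)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo);

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    // The stride is sizeof(Vertex); the last argument is a byte offset into the bound VBO.
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, pos));
    glColorPointer(4, GL_FLOAT, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, color));
    glNormalPointer(GL_FLOAT, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, normal));
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, tex));

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, (const GLvoid *)0);

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}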
Oh and in your code the last two lines
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
I think you meant those to be glDisableClientState.
Make sure your system has OpenGL acceleration enabled:
$ glxinfo | grep rendering
direct rendering: Yes
If you get 'no', then you don't have OpenGL acceleration.
Thanks for the answers.
Yeah, I have direct rendering on, according to glxinfo. In glxgears I get something like 150 FPS, and games like Warzone or Glest work fast enough, so the problem is probably in my code.
I'll buy a real graphics card eventually anyway, but I wanted my game to work on integrated graphics too; that's why I posted this question.
