I set a texture2D as the device's render target. After drawing, how can I read and write the render target's pixels directly, and then present it?
If your intent is to use it as an input to another shader:
- Simply ensure you created the texture2D with both the D3D11_BIND_RENDER_TARGET and D3D11_BIND_SHADER_RESOURCE bind flags.
- Bind the texture as a render target and render to it.
- Unbind it as a render target, bind it as a shader resource, and use it in the next shader.
Just note that a texture cannot be bound as a target and a resource at the same time.
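For illustration, a minimal sketch of creating such a texture, assuming an existing ID3D11Device* device (the size and format here are placeholders):
D3D11_TEXTURE2D_DESC desc = {};
desc.Width            = 1024;   // placeholder size
desc.Height           = 1024;
desc.MipLevels        = 1;
desc.ArraySize        = 1;
desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage            = D3D11_USAGE_DEFAULT;
// Both bind flags, so the texture can be rendered to and then sampled.
desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
ID3D11Texture2D* texture = nullptr;
HRESULT hr = device->CreateTexture2D(&desc, nullptr, &texture);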
If your intent is to access the texture on the CPU using C++ as an array of pixels, then you have to do some work. Unfortunately, due to current GPU architectures, it is not possible to directly access a texture2D's pixels, since the pixels actually live in GPU memory, potentially in a special (swizzled) format.
1. Create a texture the GPU can render into.
2. Create a staging texture (D3D11_USAGE_STAGING) that will be used to receive the output of the GPU.
3. Render to the GPU texture.
4. Issue an ID3D11DeviceContext::CopyResource() or ID3D11DeviceContext::CopySubresourceRegion() to copy the rendered texture into the staging texture.
5. Call ID3D11DeviceContext::Map() on the staging resource to get access to the pixels.
6. Call ID3D11DeviceContext::Unmap() on the staging resource.
7. Call ID3D11DeviceContext::UpdateSubresource() to update the version of the resource the GPU has, if you modified the pixels and want to write them back.
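A condensed sketch of these steps, assuming an existing device, context and gpuTexture (error handling omitted):
// Steps 1-2: describe a staging copy of the render target with CPU read access.
D3D11_TEXTURE2D_DESC desc;
gpuTexture->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_STAGING;
desc.BindFlags      = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags      = 0;
ID3D11Texture2D* staging = nullptr;
device->CreateTexture2D(&desc, nullptr, &staging);
// Steps 3-4: after rendering, copy the GPU texture into the staging texture.
context->CopyResource(staging, gpuTexture);
// Step 5: map to read (and optionally modify) the pixels on the CPU.
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);
// mapped.pData points at row 0; rows are mapped.RowPitch bytes apart.
// Step 6: unmap when done.
context->Unmap(staging, 0);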
As you can see, this is certainly not a trivial set of operations, and it goes against what today's GPU architectures are optimized to do. I certainly would not recommend it.
If you do end up going down this path, be sure to also read about all the performance concerns that GPU memory read-back comes with: http://msdn.microsoft.com/en-us/library/windows/desktop/bb205132(v=vs.85).aspx#Performance_Considerations
In the official documentation on how to minimize shader jank, they explain how to do that for Android and iOS, but mention nothing about Linux. So I tried running the command they mention, flutter run --profile --cache-sksl --purge-persistent-cache, interacted with the app (as they request), and pressed M in the command line to save the captured shaders into a file. But when I do this, I get this output:
No data was received. To ensure SkSL data can be generated use a physical device then:
1. Pass "--cache-sksl" as an argument to flutter run.
2. Interact with the application to force shaders to be compiled.
My question: Is this shader optimization technique available for Linux, and if so, why is no data being received?
Note: I am sure that there are slow frames due to shader jank in my app, because I can spot them in the DevTools.
And on the page above they say:
Definitive evidence for the presence of shader compilation jank is to
see GrGLProgramBuilder::finalize in the tracing with --trace-skia
enabled.
How can I get a screenshot of a graphical application programmatically? The application draws its window using the EGL API via DRM/KMS.
I use Ubuntu Server 16.04.3, and the graphical application is written using Qt 5.9.2 with the EGLFS QPA backend. It is started from the first virtual terminal (if that matters), then it switches the display to full HD graphical mode.
When I use utilities (e.g. fb2png) which operate on /dev/fb?, only the text-mode contents of the first virtual terminal (Ctrl+Alt+F1) are saved as the screenshot.
It is unlikely that there is an EGL API to get the contents of a buffer from another process's context (that would be insecure), but maybe there is some mechanism (and library) to get access to the final output of the GPU?
One way would be to take a screenshot from within your application, reading the contents of the back buffer with glReadPixels(). Or use QQuickWindow::grabWindow(), which internally uses glReadPixels() in the correct way. This seems not to be an option for you, as you need to take a screenshot when the Qt app is frozen.
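For reference, a minimal sketch of the glReadPixels() approach, assuming a current EGL/GLES context and a known surface size (the function and parameter names are mine):
#include <GLES2/gl2.h>
#include <vector>
void grabBackBuffer(int width, int height, std::vector<unsigned char>& out)
{
    out.resize(static_cast<size_t>(width) * height * 4);
    // Call after rendering but before eglSwapBuffers(); rows come back bottom-up.
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, out.data());
}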
The other way would be to use the DRM API to map the framebuffer and then memcpy the mapped pixels. This is implemented in Chromium OS with Python and can be translated to C easily, see https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/367611. The DRM API can also be used by another process than the Qt UI process that does the rendering.
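A rough sketch of that DRM approach, assuming the framebuffer id was already obtained (e.g. from drmModeGetCrtc()). This works for dumb buffers; GPU-allocated buffers may need driver-specific mapping, and error handling is omitted:
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
void* mapFramebuffer(int fd, uint32_t fbId, size_t* sizeOut)
{
    drmModeFB* fb = drmModeGetFB(fd, fbId);
    struct drm_mode_map_dumb map = {};
    map.handle = fb->handle;
    // Ask the kernel for an mmap()-able offset for this buffer.
    ioctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &map);
    *sizeOut = static_cast<size_t>(fb->pitch) * fb->height; // rows are fb->pitch bytes apart
    drmModeFreeFB(fb);
    return mmap(nullptr, *sizeOut, PROT_READ, MAP_SHARED, fd, map.offset);
}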
This is a very interesting question, and I have fought this problem from several angles.
The problem is quite complex and dependent on the platform. You seem to be running on EGL, which means embedded, and there you have few options unless your platform offers them.
The options you have are:
glGetTexImage
glGetTexImage can copy several kinds of buffers from OpenGL textures to CPU memory. Unfortunately it is not supported in GLES 2/3, but your embedded provider might support it via an extension. This is nice because you can either render to an FBO or get the pixels from the specific texture you need. It also needs minimal code intervention.
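For example, a minimal desktop-GL sketch (names are mine; on GLES you would need a vendor extension instead):
#include <GL/glew.h>
#include <vector>
std::vector<unsigned char> readTexture(GLuint tex, int width, int height)
{
    std::vector<unsigned char> pixels(static_cast<size_t>(width) * height * 4);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Downloads mip level 0 of the texture into CPU memory.
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    return pixels;
}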
glReadPixels
glReadPixels is the most common way to download all or part of the pixels the GPU has already rendered. Albeit slow, it works on GLES and desktop. On desktop with a decent GPU it is bearable up to interactive framerates, but beware: on embedded it might be really slow, as it stalls your render thread to get the data (horrible framedrops ensue). It can be made to work with minimal code modifications.
Pixel Buffer Objects (PBOs)
Once you start doing real research, PBOs appear here and there because they can be made to work asynchronously. They are generally not supported on embedded, but can work really well on desktop, even on mediocre GPUs. They are also a bit tricky to set up and require specific render modifications.
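A minimal desktop-GL sketch of the idea (names are mine; width/height are the surface size, assumed known, and the PBO would be created once and reused):
#include <GL/glew.h>
GLuint pbo = 0;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);
// With a pack buffer bound, glReadPixels returns immediately and the
// transfer happens in the background.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
// One or more frames later, map the buffer to read the data without stalling.
void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ... use data ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);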
Framebuffer
On embedded, sometimes you already render to the framebuffer, so go there and fetch the pixels; this also works on desktop. You can even mmap() the buffer to a file and get partial contents easily. But beware: on many embedded systems EGL does not render to the framebuffer but to a different 'overlay', so you might be snapshotting its background. Also note that some multimedia applications run the UI on EGL and the media player on the framebuffer, so if you only need to capture the video player, this might work for you. In other cases EGL targets a texture which is copied to the framebuffer, and that will also work just fine. A sketch of the mapping follows below.
As far as I know, render-to-texture streamed to the framebuffer is the way they made the sweet Qt UI you see on the Ableton Push 2.
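A minimal Linux fbdev sketch of fetching those pixels (assuming the output really goes through /dev/fb0; error handling omitted):
#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
int fd = open("/dev/fb0", O_RDONLY);
struct fb_var_screeninfo vinfo;
struct fb_fix_screeninfo finfo;
ioctl(fd, FBIOGET_VSCREENINFO, &vinfo);  // resolution and bits per pixel
ioctl(fd, FBIOGET_FSCREENINFO, &finfo);  // buffer length and row stride
void* pixels = mmap(nullptr, finfo.smem_len, PROT_READ, MAP_SHARED, fd, 0);
// Rows are finfo.line_length bytes apart; format is given by vinfo.bits_per_pixel.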
More exotic Dispmanx/OpenWF
On some embedded systems (notably the Raspberry Pi and most Broadcom VideoCores) you have DispmanX, which is really interesting:
This is fun:
The lowest level of accessing the GPU seems to be by an API called Dispmanx[...]
It continues...
Just to give you total lack of encouragement from using Dispmanx there are hardly any examples and no serious documentation.
Basically DispmanX is very near to bare metal, so it is even deeper down than the framebuffer or EGL. Really interesting stuff, because you can use vc_dispmanx_snapshot() and get a snapshot of everything really fast. And by fast I mean I got 30 FPS RGBA32 screen capture with no noticeable stutter on screen and about 4~6% of extra CPU overhead on a Raspberry Pi. Night and day, because glReadPixels was producing very noticeable framedrops even for a 1x1 pixel capture.
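A sketch of the vc_dispmanx_snapshot() path on the Raspberry Pi (error handling omitted; the row pitch should be a multiple of 32 bytes, which it is for common resolutions):
#include "bcm_host.h"
#include <cstdint>
#include <vector>
bcm_host_init();
DISPMANX_DISPLAY_HANDLE_T display = vc_dispmanx_display_open(0);
DISPMANX_MODEINFO_T info;
vc_dispmanx_display_get_info(display, &info);
uint32_t imageHandle = 0;
DISPMANX_RESOURCE_HANDLE_T resource =
    vc_dispmanx_resource_create(VC_IMAGE_RGBA32, info.width, info.height, &imageHandle);
// Capture the final composited output into the resource.
vc_dispmanx_snapshot(display, resource, DISPMANX_NO_ROTATE);
int pitch = info.width * 4;
std::vector<uint8_t> pixels(static_cast<size_t>(pitch) * info.height);
VC_RECT_T rect;
vc_dispmanx_rect_set(&rect, 0, 0, info.width, info.height);
vc_dispmanx_resource_read_data(resource, &rect, pixels.data(), pitch);
vc_dispmanx_resource_delete(resource);
vc_dispmanx_display_close(display);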
That's pretty much what I've found.
I want to get the address space layout from Intel Pin on Linux.
At first, I tried to read the file /proc/PID/maps to get the address space layout. But when should that part of the code execute?
If you put it before PIN_StartProgram, the maps file will not contain some regions, like the heap.
If you put it in Fini and hook it with PIN_AddFiniFunction(Fini, 0);, it should be good. However, when you just trace one ls execution, you cannot see any output related to the address space layout. That's weird.
Perhaps not the best solution, but it worked for me. The main problem is that when the tool starts, the address space is not prepared yet. You can wait until all of the images are loaded and then read the contents of procfs.
So you should add an instrumentation function for each image. For example, add the following statement to the main function:
IMG_AddInstrumentFunction(Image, 0);
Then you should read procfs every time an image is loaded. This is because you do not know which image is the last one loaded (of course, if you do know which image is the last one, you can simply read the file only once, after that image is loaded):
// Needs <fstream>, <iostream>, <string>.
VOID Image(IMG img, VOID *v)
{
    // The Pin tool runs inside the traced process, so /proc/self/maps
    // shows the application's current address space layout.
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line))
        std::cerr << line << std::endl;   // or store/parse as needed
}
During the execution of the program, you will always have the latest mappings of the address space, and everything will be fine. However, you should always be careful with runtime layout modifications, such as heap growth via the brk() system call.
Pin has a more fine-grained approach to address space layout. You can get callbacks for image loads using IMG_AddInstrumentFunction(), and get callbacks for heap allocations by instrumenting malloc() and free() calls using RTN_Replace() or even instrumenting mmap(), brk() and other syscalls for heap allocation with PIN_AddSyscallEntryFunction().
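For example, a minimal sketch of the syscall route (the callback name is mine; SYS_mmap/SYS_brk come from <sys/syscall.h> on Linux):
#include "pin.H"
#include <sys/syscall.h>
#include <iostream>
VOID SyscallEntry(THREADID tid, CONTEXT *ctxt, SYSCALL_STANDARD scStd, VOID *v)
{
    ADDRINT num = PIN_GetSyscallNumber(ctxt, scStd);
    if (num == SYS_mmap || num == SYS_brk)  // address-space-changing syscalls
        std::cerr << "layout may have changed (syscall " << num << ")" << std::endl;
}
// In main(), after PIN_Init():
//     PIN_AddSyscallEntryFunction(SyscallEntry, 0);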
You can find examples for the use of these APIs in the Pin tutorial and the examples in the Pin kit.
To transfer my static data into the GPU, I'm thinking of having a single staging VkMemory object (ballpark 64MB) and using it as a rotating queue. However, I have multiple threads producing content (e.g. rendering glyphs, loading files, procedural generation) and I'd like it if they could upload their data entirely by themselves (i.e. write plus submit Vulkan transfer commands).
I'm intending to keep the entire staging VkMemory permanently mapped (if this is dumb please say so) at least during loading (but perhaps longer if I want to stream data).
To achieve the above, once a thread's data is fully written/flushed to staging I'd like it to be able to immediately submit GPU transfer commands.
However, that means the GPU will be reading from one part of the VkMemory while other threads may be writing/flushing to it.
AFAIK I will also need to use image memory barriers for the transition from VK_IMAGE_LAYOUT_PREINITIALIZED to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL.
I couldn't find anything in the spec explicitly saying whether this is legal or illegal, only that care should be taken to ensure synchronization. However, I didn't find enough detail to be sure one way or the other.
NOTE: The staging queue will need to ensure transfers have been completed before overwriting anything - I intend to keep a complementary queue of VkFences for this.
Questions:
Is this OK?
Do I need to align each separate object to a page boundary? Or something else.
Am I correct in assuming that the image memory barrier (above) won't require the device to write to the staging memory?
Yes, this is OK; the spec only requires that the regions being read from and written to are properly synchronized.
If the memory is not coherent, then you must align the blocks being read from or written to to VkPhysicalDeviceLimits::nonCoherentAtomSize.
Source: the Vulkan spec, in the note after the declaration of vkMapMemory:
vkMapMemory does not check whether the device memory is currently in
use before returning the host-accessible pointer. The application must
guarantee that any previously submitted command that writes to this
range has completed before the host reads from or writes to that
range, and that any previously submitted command that reads from that
range has completed before the host writes to that region (see here
for details on fulfilling such a guarantee). If the device memory was
allocated without the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT set, these
guarantees must be made for an extended range: the application must
round down the start of the range to the nearest multiple of
VkPhysicalDeviceLimits::nonCoherentAtomSize, and round the end of the
range up to the nearest multiple of
VkPhysicalDeviceLimits::nonCoherentAtomSize.
A layout transition may write to the memory; however, barriers do their own synchronization with regard to previous and subsequent memory accesses.
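To illustrate the rounding rule for non-coherent memory, a minimal sketch of flushing a written sub-range of a persistently mapped allocation (names are mine; 'atom' is VkPhysicalDeviceLimits::nonCoherentAtomSize, and the rounded range must also stay within the allocation size):
#include <vulkan/vulkan.h>
void flushStagingRange(VkDevice device, VkDeviceMemory memory,
                       VkDeviceSize offset, VkDeviceSize size, VkDeviceSize atom)
{
    VkMappedMemoryRange range = {};
    range.sType  = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
    range.memory = memory;
    // Round the start down and the end up to nonCoherentAtomSize.
    range.offset = (offset / atom) * atom;
    VkDeviceSize end = ((offset + size + atom - 1) / atom) * atom;
    range.size   = end - range.offset;
    vkFlushMappedMemoryRanges(device, 1, &range);
}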
I'm using OpenGL via OpenTK and I'm rendering to a FramebufferObject in a background thread.
Now after each rendered frame, I want to display (part(s) of) the FBO in one or more OpenGL controls in my UI.
How does that work?
You have two choices:
If your drivers support context sharing, you can bind the FBO texture on your OpenGL controls and display that directly (bind texture, render quad, done). Simple and fast - just make sure to synchronize your rendering with the display.
If your drivers don't, you'll have to read back the results of the rendering into a Bitmap object (or equivalent) via GL.ReadPixels. You can then re-upload them to your other OpenGL controls as textures, or display them directly on non-OpenGL controls.
By default, OpenTK will always try to share contexts. Unfortunately, Intel drivers don't support context sharing, so you cannot use the first approach there.