This is an experimental project of mine related to remote browser isolation. I'm trying to intercept Skia draw commands in a running Chromium instance and later replay them on the client side, in a different browser instance, via CanvasKit, a WebAssembly build of Skia.
However, I'm having a hard time figuring out where and how to intercept those draw commands within Chromium source code. Any advice on how to approach my problem is much appreciated!
In Chromium, all draw operations are recorded into a DisplayItemList, which you can find in the definition of the GraphicsContext class in the Blink module. These recorded operations are then replayed later, when CC decides it is the right time.
On Blink's side, the related code lives mostly in blink/renderer/platform/graphics/graphics_context.cc and its neighboring files. Looking at Chromium as a whole, though, all graphics work is driven by CC (the Chrome Compositor), which maintains a state machine and runs a draw-frame loop triggered by the system's vsync signal (on Android). At the start of this loop, Blink's recorded draw operations are pushed in; at the end of it, the composited frame is translated into a series of GPU operations, which are executed by calling the system's GPU APIs. The compositor-related code can be found in components/viz/; the Display class is a good starting point.
This is based on version 68, and as you know, Chromium's code changes frequently, so I cannot confirm that the files and locations are still correct.
In the official documentation on how to minimize shader jank, they mention how to do that for Android and iOS, but say nothing about Linux. So I tried running the command they mention, flutter run --profile --cache-sksl --purge-persistent-cache, interacted with the app (as they request), and pressed M in the command line to save the captured shaders into a file. But when I do this, I get this output:
No data was received. To ensure SkSL data can be generated use a physical device then:
1. Pass "--cache-sksl" as an argument to flutter run.
2. Interact with the application to force shaders to be compiled.
My question: is this shader-optimization technique available for Linux, and if so, why is no data being received?
Note: I am sure that there are slow frames due to shader-compilation jank in my app, because I can spot this in DevTools.
and on the page above they say
Definitive evidence for the presence of shader compilation jank is to see GrGLProgramBuilder::finalize in the tracing with --trace-skia enabled.
I am implementing a simple Vulkan renderer according to a popular Vulkan tutorial (https://vulkan-tutorial.com/Introduction), and I've run into an interesting issue with the presentation mode and the desktop environment performance.
I wrote the triangle demo on Windows, and it performed well; however, I ported it to my Ubuntu installation (running MATE 1.20.1) and discovered a curious problem with the performance of the entire desktop environment while running it; certain swapchain presentation modes seem to wreak utter havoc with the desktop environment.
When setting up a Vulkan swapchain with presentMode set to VK_PRESENT_MODE_FIFO_KHR and running the application, the entire desktop environment grinds to a halt whenever any window on the desktop is dragged, slowing to a crawl at roughly 4-5 fps. However, when I replace the presentMode with VK_PRESENT_MODE_IMMEDIATE_KHR, the desktop environment is immune to this issue and suffers no performance problems when dragging windows.
When I researched this before asking here, I saw that several people discovered that they experienced this behavior when their application was delivering frames as fast as possible (not vsync'd), and that properly synchronizing with vsync resolved this stuttering. However, in my case, it's the opposite; when I use VK_PRESENT_MODE_IMMEDIATE_KHR, i.e., not waiting for vsync, the dragging performance is smooth, and when I synchronize with vsync with VK_PRESENT_MODE_FIFO_KHR, it stutters.
VK_PRESENT_MODE_FIFO_RELAXED_KHR produces identical (catastrophic) results as the standard FIFO mode.
I tried using the Compton compositor instead of Compiz; the effect was still there (regardless of which window was being dragged, the desktop still became extremely slow) but was slightly less pronounced than with Compiz.
I have fully implemented the VkSemaphore-based frame/image/swapchain synchronization scheme as defined in the tutorial, and I verified that while using VK_PRESENT_MODE_FIFO_KHR the application is only rendering frames at the target 60 frames per second. (When using IMMEDIATE, it runs at 7,700 fps.)
Most interestingly, when I measured the frametimes (using glfwGetTime()), during the periods when the window is being dragged, the frametime is extremely short. The screenshot shows this; you can see the extremely short/abnormal frame time when a window is being dragged, and then the "typical" frametime (locked to 60fps) while the window is still.
In addition, only while using VK_PRESENT_MODE_FIFO_KHR, while this extreme performance degradation is being observed, Xorg pegs the CPU to 100% on one core, while the running Vulkan application uses a significant amount of CPU time as well (73%) as shown in the screenshot below. This spike is only observed while dragging windows in the desktop environment, and is not observed at all if VK_PRESENT_MODE_IMMEDIATE_KHR is used.
I am curious if anyone else has experienced this and if there is a known fix for this window behavior.
System info: Ubuntu 18.04, Mate 1.20.1 w/ Compiz, Nvidia proprietary drivers.
Edit: This Reddit thread seems to have a similar description of an issue; the VK_PRESENT_MODE_FIFO_KHR causing extreme desktop performance issues under Nvidia proprietary drivers.
Edit 2: This bug can be easily reproduced using vkcube from vulkan-tools. Compare the desktop performance of vkcube using --present-mode 0 vs --present-mode 2.
My question is about the delay between calling the present method in DirectX9 and the update appearing on the screen.
On a Windows system, I have a window opened using DirectX9 and update it in a simple way (change the color of the entire window, then call the IDirect3DSwapChain9's present method). I call the swapchain's present method with the flag D3DPRESENT_DONOTWAIT during a vertical blank interval. There is only one buffer associated with the swapchain.
I also obtain an external measurement of when the CRT screen I use actually changes color through a photodiode connected to the center of the screen. I obtain this measurement with sub-millisecond accuracy and delay.
What I found was that the changes appear exactly in the third refresh after the call to present(). Thus, when I call present() at the end of the vertical blank, just before the screen refresh, the change appears on the screen exactly 2*refresh_duration + 0.5*refresh_duration after the call to present().
My question is a general one:
how far can I rely on this delay (changes appearing in the third refresh) being the same on different systems ...
... or does it vary with monitors (leaving aside the response times of LCD and LED monitors)?
... or with graphics cards?
Are there other factors influencing this delay?
An additional question:
does anybody know a way of determining, within DirectX9, when a change appeared on the screen (without external measurements)?
There's a lot of variables at play here, especially since DirectX 9 itself is legacy and is effectively emulated on modern versions of Windows.
You might want to read Accurately Profiling Direct3D API Calls (Direct3D 9), although that article doesn't directly address presentation.
On Windows Vista or later, once you call Present to flip the front and back buffers, the frame is passed off to the Desktop Window Manager for composition and eventual display. There are a lot of factors at play here, including GPU vendor, driver version, OS version, Windows settings, 3rd-party driver 'value add' features, full-screen vs. windowed mode, etc.
In short: Your Mileage May Vary (YMMV) so don't expect your timings to generalize beyond your immediate setup.
If your application requires knowing exactly when present happens instead of just "best effort" as is more common, I recommend moving to DirectX9Ex, DirectX 11, or DirectX 12 and taking advantage of the DXGI frame statistics.
In case somebody stumbles upon this with a similar question: I found out why my screen update appears exactly on the third refresh after calling present(). As it turns out, Windows by default queues exactly 3 frames before presenting them, so changes appear on the third refresh. This can only be changed from within the application starting with DirectX 10 (and Direct3D 9Ex); for DirectX 9 and earlier, one has to reduce the queue depth through the graphics-card driver or the Windows registry.
How can I take a screenshot of a graphical application programmatically? The application draws its window using the EGL API via DRM/KMS.
I use Ubuntu Server 16.04.3, and the graphical application is written with Qt 5.9.2 using the EGLFS QPA backend. It is started from the first virtual terminal (if that matters), then it switches the display to full-HD graphical mode.
When I use utilities (e.g. fb2png) that operate on /dev/fb?, only the text-mode contents of the first virtual terminal (Ctrl+Alt+F1) are saved in the screenshot.
It is unlikely that there is an EGL API to get the contents of a buffer from another process's context (it would be insecure), but maybe there is some mechanism (and library) to get access to the final output of the GPU?
One way would be to take the screenshot from within your application, reading the contents of the back buffer with glReadPixels(). Or use QQuickWindow::grabWindow(), which internally uses glReadPixels() in the correct way. This seems not to be an option for you, though, as you need to take the screenshot while the Qt app is frozen.
The other way would be to use the DRM API to map the framebuffer and then memcpy the mapped pixels. This is implemented in Chromium OS with Python and can be translated to C easily, see https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/367611. The DRM API can also be used by another process than the Qt UI process that does the rendering.
This is a very interesting question, and I have fought this problem from several angles.
The problem is quite complex and platform-dependent. You seem to be running on EGL, which means embedded, and there you have few options unless your platform offers them.
The options you have are:
glGetTexImage
glGetTexImage can copy several kinds of buffers from OpenGL textures to CPU memory. Unfortunately it is not supported in GLES 2/3, but your embedded provider might support it via an extension. This is nice because you can either render to an FBO or get the pixels from the specific texture you need. It also needs only minimal code intervention.
glReadPixels
glReadPixels is the most common way to download all or part of the pixels the GPU has already rendered. Albeit slow, it works on both GLES and desktop. On desktop with a decent GPU it is bearable up to interactive framerates, but beware: on embedded it can be really slow, as it stalls your render thread while fetching the data (horrible framedrops ensue). It can also be made to work with minimal code modifications.
Pixel Buffer Objects (PBO's)
Once you start doing real research, PBOs appear here and there because they can be made to work asynchronously. They are generally not supported on embedded but can work really well on desktop, even on mediocre GPUs. They are also a bit tricky to set up and require specific render modifications.
Framebuffer
On embedded, sometimes you already render to the framebuffer, so you can go there and fetch the pixels. This also works on desktop. You can even mmap() the buffer to a file and get partial contents easily. But beware: on many embedded systems EGL does not render to the framebuffer but to a different 'overlay', so you might be snapshotting only the background behind it. Note also that some multimedia applications run the UI on EGL and the media player on the framebuffer, so if you only need to capture the video player this might work for you. In other cases EGL targets a texture which is then copied to the framebuffer, and that will also work just fine.
As far as I know, render-to-texture streamed to the framebuffer is how they made the sweet Qt UI you see on the Ableton Push 2.
More exotic Dispmanx/OpenWF
On some embedded systems (notably the Raspberry Pi and most Broadcom VideoCore chips) you have DispmanX, which is really interesting:
This is fun:
The lowest level of accessing the GPU seems to be by an API called Dispmanx[...]
It continues...
Just to give you total lack of encouragement from using Dispmanx there are hardly any examples and no serious documentation.
Basically DispmanX is very near to bare metal, so it sits even deeper than the framebuffer or EGL. Really interesting stuff, because you can call vc_dispmanx_snapshot() and get a snapshot of everything really fast. And by fast I mean I got 30 FPS RGBA32 screen capture with no noticeable stutter on screen and only about 4-6% of extra CPU overhead on a Raspberry Pi. Night and day compared to glReadPixels, which produced very noticeable framedrops even for a 1x1 pixel capture.
That's pretty much what I've found.
I am thinking of creating an arcade machine for fun, something like this one. I wonder if it's possible to get events from some game, e.g. Super Mario. Assume I finish a level and I want to get that event, along with the score and some other data, and perform some actions with that data. I am thinking of running the emulator on Windows. Did anybody work on something like this? Are there reasonably easy ways to get events and data out of old NES games? Or should I run Linux instead of Windows for that? Please share your thoughts about how to do the software part of it.
Modern emulators such as FCEUX make it possible to interact with the running ROM through Lua scripts (see example video). Using this API you could write a Lua script to:
monitor a certain memory location
wait for it to hold some special value (such as level_just_finished)
read out the current score from memory
do something with the score
In order to know which memory locations to check, you will either need to disassemble the ROM or run it through a debugger, or both. As for Super Mario Bros, there's already a commented disassembly available. The FCEUX emulator also has a built-in debugger/disassembler that you can use.
All of this takes a lot of effort, and you would need to know Lua, 6502 assembly, and the inner workings of an NES. For your arcade machine, you might be better off just using an emulator such as UberNES, which can automatically track your high score for many popular titles.
Classic NES games don't have standard hooks for achievement reporting. The only options I can think of are the following:
Rebuild the ROMs in question, with your own hooks (which a custom emulator could handle).
Watch the ROM memory footprint directly, and parse the state continually, triggering when you observe some known state.
Both options require that you really understand the internals of a NES ROM.
IRQs: look into interrupt requests; they trigger an interrupt. I have read and seen code about this somewhere. Even x86 uses IRQs for communication with various devices. A simple example: when a key is pressed on the keyboard, a call is made to the PIC and an IRQ is generated, so the system knows which key was pressed. The same mechanism is used in the NES.