Tips to reduce OpenGL 3 & 4 frame-rate stuttering under Linux

In recent years I've developed several small games and applications for OpenGL 2 and ES.
I'm now trying to build a scene graph based on OpenGL 3+ for casual “3D” graphics on desktop systems. (Nothing as complex as the Unreal Engine or CryEngine in mind.)
I started my development on OS X 10.7 and was impressed by Apple's recent OpenGL 3.2 release, which achieves results equivalent to those on Windows systems.
The results on Linux, by contrast, are a disappointment. Even the most basic animation stutters, which destroys the impression of reality. The results did not differ between the windowing toolkits freeglut and GLFW. (The extensions are loaded with GLEW 1.7.)
I would like to mention that I'm talking about the new OpenGL core profile, not the old OpenGL 2 render path, which works fine under Linux but uses the CPU instead of the GPU for complex operations.
After watching professional demos like the “Unigine Heaven demo”, I think there is a general problem with using modern real-time 3D graphics on Linux.
Any suggestions to overcome this problem are very welcome.
I'm using:
AMD Phenom II X6, Radeon HD57XX with latest proprietary drivers (11.8) and Unity(64Bit).
You can take my render loop from the toolkit documentation:
do {
    /* per-frame rendering and the buffer swap go here */
} while (!glfwGetKey(GLFW_KEY_ESC) && glfwGetWindowParam(GLFW_OPENED));
I'm using VBOs, and all transformation stages are done with shaders. Animation timing is done with glfwGetTime(). The problem occurs in both windowed and full-screen mode. I don't know whether a compositing manager interferes with full-screen applications, but it is also unreasonable to demand that users disable it.
Update 2:
Typo: I'm using a HD57XX card.
PCI Dump: 01:00.0 VGA compatible controller: ATI Technologies Inc Juniper [Radeon HD 5700 Series]
X Info:
Update 3:
Disabling the compositing manager reduced, but did not completely remove, the stuttering.
(I replaced the standard window manager with "ubuntu classic without extensions".)
Once a second the animation freezes and ugly distortions appear, although vertical synchronisation is enabled in the driver and checked in the application.

Since you're running Linux we require a bit of detailed information:
Which hardware do you use?
Only NVidia, AMD/ATI and Intel offer 3D acceleration so far.
Which drivers?
For NVidia and AMD/ATI there are proprietary (nvidia-glx, fglrx) and open source drivers (nouveau, radeon). For Intel there are only the open source drivers.
Of all open source 3D drivers, the Intel drivers offer the best quality.
The open source AMD/ATI drivers ("radeon") have reached an acceptable state, but are still not on par performance-wise.
For NVidia GPUs, the only drivers that make sense to use productively are the proprietary ones. The open source "nouveau" drivers simply don't cut it yet.
Do you run a compositing window manager?
Compositing creates a whole bunch of synchronization and timing issues. Also, (some of) the OpenGL code you can find in the compositing WMs brings tears to the eyes of a seasoned OpenGL coder in places, especially one with experience writing realtime 3D (game) engines.
KDE4 and GNOME3 use compositing by default, if available. The same holds for the Ubuntu Unity desktop shell. Also, for some non-compositing WMs the default scripts start xcompmgr for transparency and shadow effects.
And last but not least: How did you implement your rendering loop?
A mistake often found is that a timer is used to issue redisplay events at "regular" intervals. This is not how it's done properly. Timer events can be delayed arbitrarily, and the standard timers are not very accurate by themselves, either.
The proper way is to call the display function in a tight loop and measure the time elapsed between rendering iterations, then use this timing to advance the animation accordingly. A truly elegant method is to use one of the VSync extensions that give you the display refresh frequency and the refresh counter. That way, instead of using a timer, you are told exactly how much time advanced between frames, in units of display refresh periods.
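To make the tight-loop idea concrete, here is a minimal sketch in Python rather than C/GLFW, so it runs without a GL context. The names run_loop, advance_angle and dt_from_refresh_counter are made up for illustration, and the refresh counter is assumed to come from whatever VSync extension your platform offers. The point is only that the animation advances by the measured time, so uneven frames still produce time-correct motion.

```python
import time


def advance_angle(angle, dt, speed_deg_per_s=90.0):
    """Advance a rotation by the time that actually elapsed, not by a fixed step."""
    return (angle + speed_deg_per_s * dt) % 360.0


def dt_from_refresh_counter(previous_count, current_count, refresh_hz):
    """Elapsed seconds derived from a display refresh counter (hypothetical source)."""
    return (current_count - previous_count) / refresh_hz


def run_loop(render, frames, clock=time.perf_counter):
    """Call render(angle) in a tight loop, timing each iteration."""
    angle = 0.0
    previous = clock()
    for _ in range(frames):
        now = clock()
        dt = now - previous   # seconds since the previous iteration
        previous = now
        angle = advance_angle(angle, dt)
        render(angle)         # a real loop would draw and swap buffers here
    return angle


# Simulated clock with uneven frame times (16 ms, 34 ms, 16 ms).
fake_times = iter([0.0, 0.016, 0.050, 0.066])
angles = []
final_angle = run_loop(angles.append, frames=3, clock=lambda: next(fake_times))
```

In a real application the clock would be glfwGetTime() (or the refresh counter), and the buffer swap inside the loop is what paces it when VSync is on.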


Why does VK_PRESENT_MODE_FIFO_KHR cause catastrophic performance issues in Ubuntu MATE?

I am implementing a simple Vulkan renderer according to a popular Vulkan tutorial, and I've run into an interesting issue with the presentation mode and the desktop environment performance.
I wrote the triangle demo on Windows, and it performed well; however, I ported it to my Ubuntu installation (running MATE 1.20.1) and discovered a curious problem with the performance of the entire desktop environment while running it; certain swapchain presentation modes seem to wreak utter havoc with the desktop environment.
When setting up a Vulkan swapchain with presentMode set to VK_PRESENT_MODE_FIFO_KHR and subsequently running the application, the entire desktop environment grinds to a halt whenever any window is dragged. When literally any window on the entire desktop is dragged, the entire desktop environment slows to a crawl, appearing to run at roughly 4-5 fps. However, when I replace the presentMode with VK_PRESENT_MODE_IMMEDIATE_KHR, the desktop environment is immune to this issue and does not suffer the performance issues when dragging windows.
When I researched this before asking here, I saw that several people discovered that they experienced this behavior when their application was delivering frames as fast as possible (not vsync'd), and that properly synchronizing with vsync resolved this stuttering. However, in my case, it's the opposite; when I use VK_PRESENT_MODE_IMMEDIATE_KHR, i.e., not waiting for vsync, the dragging performance is smooth, and when I synchronize with vsync with VK_PRESENT_MODE_FIFO_KHR, it stutters.
VK_PRESENT_MODE_FIFO_RELAXED_KHR produces the same (catastrophic) results as the standard FIFO mode.
I tried using the Compton GPU compositor instead of Compiz; the effect was still there (regardless of what window was being dragged, the desktop still became extremely slow) but was slightly less pronounced than when using Compiz.
I have fully implemented the VkSemaphore-based frame/image/swapchain synchronization scheme as defined in the tutorial, and I verified that while using VK_PRESENT_MODE_FIFO_KHR the application is only rendering frames at the target 60 frames per second. (When using IMMEDIATE, it runs at 7,700 fps.)
Most interestingly, when I measured the frametimes (using glfwGetTime()), during the periods when the window is being dragged, the frametime is extremely short. The screenshot shows this; you can see the extremely short/abnormal frame time when a window is being dragged, and then the "typical" frametime (locked to 60fps) while the window is still.
In addition, only while using VK_PRESENT_MODE_FIFO_KHR, while this extreme performance degradation is being observed, Xorg pegs the CPU to 100% on one core, while the running Vulkan application uses a significant amount of CPU time as well (73%) as shown in the screenshot below. This spike is only observed while dragging windows in the desktop environment, and is not observed at all if VK_PRESENT_MODE_IMMEDIATE_KHR is used.
I am curious if anyone else has experienced this and if there is a known fix for this window behavior.
System info: Ubuntu 18.04, MATE 1.20.1 w/ Compiz, Nvidia proprietary drivers.
Edit: This Reddit thread seems to have a similar description of an issue; the VK_PRESENT_MODE_FIFO_KHR causing extreme desktop performance issues under Nvidia proprietary drivers.
Edit 2: This bug can be easily reproduced using vkcube from vulkan-tools. Compare the desktop performance of vkcube using --present-mode 0 vs --present-mode 2.
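For anyone reproducing this: the numbers passed to --present-mode are the VkPresentModeKHR enum values, so 0 is IMMEDIATE and 2 is FIFO. The sketch below just writes that mapping down and shows one way an application might pick a mode. The helper names choose_present_mode and vkcube_command are made up for illustration, and preferring a non-FIFO mode is only a workaround idea under the assumptions above, not a confirmed fix.

```python
from enum import IntEnum


class PresentMode(IntEnum):
    """Numeric values of VkPresentModeKHR as defined in the Vulkan headers."""
    IMMEDIATE = 0      # VK_PRESENT_MODE_IMMEDIATE_KHR: no vsync wait, may tear
    MAILBOX = 1        # VK_PRESENT_MODE_MAILBOX_KHR
    FIFO = 2           # VK_PRESENT_MODE_FIFO_KHR: vsync'd queue
    FIFO_RELAXED = 3   # VK_PRESENT_MODE_FIFO_RELAXED_KHR


def choose_present_mode(available, preferred):
    """Return the first preferred mode the surface reports, else fall back to FIFO.

    FIFO is the one mode the Vulkan specification requires every surface to support.
    """
    for mode in preferred:
        if mode in available:
            return mode
    return PresentMode.FIFO


def vkcube_command(mode):
    """Build the vkcube invocation from Edit 2 for a given present mode."""
    return ["vkcube", "--present-mode", str(int(mode))]


smooth = vkcube_command(PresentMode.IMMEDIATE)   # desktop stays responsive here
stutter = vkcube_command(PresentMode.FIFO)       # desktop stutters here
```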

AMD GPU won't work with Blender's "Cycles Render" on Linux

I've been working a lot with Blender and its "Cycles Render" on Fedora lately. But Blender keeps getting a lot slower while rendering, and I discovered that my Blender is only capable of rendering with my CPU. I tried running Blender from the terminal so I could see any errors. If I set "Device" to "GPU Compute" in the rendering settings, I get this output:
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
My machine's specifications are:
Operating system: Fedora GNU/Linux 27
Blender version: 2.79
Graphics card: AMD Radeon RX 480 using "amdgpu" driver (default open-source driver)
So it seems like Blender's Cycles Render won't work with my AMD GPU...
Any ideas?
As far as I've seen in the release docs, the Blender Cycles engine is not yet fully optimized for all AMD graphics cards; currently it only supports AMD cards with GCN architecture 2.0 and above. The dev team focuses mostly on NVIDIA cards (also, Blender is most optimized for Windows).
However, you might as well try changing the settings. First, make sure you are using OpenCL and not CUDA in your User Preferences, under the System tab, Compute Device(s). Then, if your card is not supported, enable the experimental features in the render properties of your workspace; this comes with a warning that it will make everything unstable, but it usually makes most AMD GPUs selectable as a render device. The render properties are also where you select the compute device you want to use for each scene.
Also, using an official AMD driver would make rendering faster (it is also a requirement by Blender for using AMD cards), but it's not available for Fedora as far as I know. I suggest changing your distro to Ubuntu.
EDIT: You MUST use an official AMD driver for the desired card. I have checked that the card you have is on the list of supported cards in the Blender documentation; it is just that it IS a requirement to have the AMD driver and not the open source one.
The driver must also be one from the list of supported drivers given in the Blender documentation.
Now, if that doesn't solve the issue, then it must be a hardware issue or a Blender bug. You could try running it on Windows to rule out a hardware issue, if you are willing to do a dual-boot or USB-boot test.

Get screenshot of EGL DRM/KMS application

How can I get a screenshot of a graphical application programmatically? The application draws its window using the EGL API via DRM/KMS.
I use Ubuntu Server 16.04.3 and a graphical application written using Qt 5.9.2 with the EGLFS QPA backend. It is started from the first virtual terminal (if that matters), then it switches the display output to full HD graphical mode.
When I use utilities (e.g. fb2png) that operate on /dev/fb?, only the text-mode contents of the first virtual terminal (Ctrl+Alt+F1) are saved as the screenshot.
It is unlikely that there is an EGL API to get the contents of a buffer from the context of another process (it would be insecure), but maybe there is some mechanism (and library) to get access to the final output of the GPU?
One way would be to get a screenshot from within your application, reading the contents of the back buffer with glReadPixels(). Or use QQuickWindow::grabWindow(), which internally uses glReadPixels() in the correct way. This seems to be not an option for you, as you need to take a screenshot when the Qt app is frozen.
The other way would be to use the DRM API to map the framebuffer and then memcpy the mapped pixels. This is implemented in Chromium OS in Python and can be translated to C easily. The DRM API can also be used from a process other than the Qt UI process that does the rendering.
This is a very interesting question, and I have fought this problem from several angles.
The problem is quite complex and platform-dependent. You seem to be running on EGL, which means embedded, and there you have few options unless your platform offers them.
The options you have are:
glGetTexImage can copy several kinds of buffers from OpenGL textures to CPU memory. Unfortunately it is not supported in GLES 2/3, but your embedded provider might support it via an extension. This is nice because you can either render to an FBO or get the pixels from the specific texture you need. It also needs minimal code intervention.
glReadPixels is the most common way to download all or part of the GPU pixels that are already rendered. Albeit slow, it works on GLES and desktop. On desktop with a decent GPU it is bearable up to interactive framerates, but beware: on embedded it might be really slow, as it stops your render thread to get the data (horrible framedrops ensured). It can be made to work with minimal code modifications.
Pixel Buffer Objects (PBOs)
Once you start doing real research, PBOs appear here and there because they can be made to work asynchronously. They are generally not supported on embedded but can work really well on desktop, even on mediocre GPUs. They are also a bit tricky to set up and require specific render modifications.
On embedded, sometimes you already render to the framebuffer, so go there and fetch the pixels. This also works on desktop. You can even mmap() the buffer to a file and get partial contents easily. But beware: in many embedded systems EGL does not work on the framebuffer but on a different 'overlay', so you might be snapshotting the background of it. Also note that some multimedia applications run their UIs on EGL and their media players on the framebuffer, so if you only need to capture the video players this might work for you. In other cases EGL targets a texture which is copied to the framebuffer, and this will also work just fine.
As far as I know render to texture and stream to a framebuffer is the way they made the sweet Qt UI you see on the Ableton Push 2
More exotic Dispmanx/OpenWF
On some embedded systems (notably the Raspberry Pi and most Broadcom VideoCores) you have DispmanX, which is really interesting.
This is fun:
The lowest level of accessing the GPU seems to be by an API called Dispmanx[...]
It continues...
Just to give you total lack of encouragement from using Dispmanx there are hardly any examples and no serious documentation.
Basically DispmanX is very close to bare metal, so it is even deeper down than the framebuffer or EGL. It is really interesting stuff because you can use vc_dispmanx_snapshot() and get a snapshot of everything really fast. And by fast I mean I got 30FPS RGBA32 screen capture with no noticeable stutter on screen and about 4~6% of extra CPU overhead on a Raspberry Pi. That is night and day, because glReadPixels was producing very noticeable framedrops even for a 1x1 pixel capture.
That's pretty much what I've found.
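One detail for the glReadPixels route that is easy to miss when saving the result: OpenGL returns the rows starting from the bottom of the image, while most image formats expect the top row first. A minimal sketch of the flip, in Python for brevity, assuming a tightly packed buffer (no row padding) such as RGBA with 4 bytes per pixel; flip_rows is a made-up helper name, not part of any GL or Qt API.

```python
def flip_rows(pixels, width, height, bytes_per_pixel=4):
    """Reverse the row order of a tightly packed pixel buffer.

    pixels is the raw byte string as read back with the bottom row first;
    the result has the top row first, ready for an image writer.
    """
    stride = width * bytes_per_pixel
    if len(pixels) != stride * height:
        raise ValueError("buffer size does not match width * height * bytes_per_pixel")
    rows = [pixels[y * stride:(y + 1) * stride] for y in range(height)]
    return b"".join(reversed(rows))


# 2x2 single-byte image: bottom row is 01 02, top row is 03 04.
flipped = flip_rows(b"\x01\x02\x03\x04", width=2, height=2, bytes_per_pixel=1)
```

The same flip applies to pixels obtained through a PBO, since the data layout is the same.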

Graphics development on ARM

I am planning to make a small OS and run a Tetris clone on it using an ARM Cortex-M3. Unfortunately, I am not able to buy any development boards as of now, so I will have to use simulators.
I have actually looked into QEMU which has LM3S6965EVB support, which contains an ARM Cortex-M3 processor. But apparently newer revisions of the board are not compatible with the model in QEMU as none of the examples I have downloaded from TI seem to work. Even the OLED display is different.
Another problem is graphics development, as the OLED display for the LM3S6965EVB has a really low resolution. I was able to get it up to 640x480 by editing the QEMU source, but as I could not get any examples to work, I don't know if that works either. Using the debug parameters for SSD0323, all I can see is that it accepts some of the data that is sent to initialize the device, then hangs...
I have considered choosing another board in QEMU but that would mean redoing many things from scratch when I get my hands on a real device, as the other ones are too powerful for something as simple as this.
What should I do? Are there any other simulators out there that can help me accomplish what I am trying to do? I want to develop a small OS and some small games.
Thanks in advance. I have been searching for a solution for days and I am really stuck.
How much you have to redo depends, in part, on your software/system engineering: you can abstract where needed and only have to rewrite the abstraction layer, not the entire package. In fact, you can do much of your software design and testing on your host system without ever cross compiling, and only later cross compile for a simulator or real hardware.
For example, I assume you would construct the next video screen somewhere in RAM and then, depending on the hardware, change some bits in a register and page flip, or copy from this frame buffer to the video/LCD in whatever form it wants. Using thumbulator you could build your screen in RAM somewhere (in the simulated memory space), then add code to the simulator so that when the simulation writes to such-and-such register, it takes these bytes from RAM and displays them on the host computer (running the simulation); basically, you simulate some hardware. Use SDL or basic X calls or whatever you prefer. I normally take snapshots to .bmp files (very easy to write) and look at them later.
Later, on hardware your abstracted update_screen() function would have hardware specific code to display that screen.
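To show how little is needed for the .bmp snapshot route, here is a minimal host-side sketch. It is written in Python for brevity, although the simulator itself is C; encode_bmp is a made-up helper, and the frame is assumed to be a list of rows of (r, g, b) tuples with the top row first. update_screen here is just the host stand-in for the hardware-specific version.

```python
import struct


def encode_bmp(rgb_rows, width, height):
    """Encode rows of (r, g, b) tuples (top row first) as an uncompressed 24-bit BMP."""
    row_pad = (-width * 3) % 4                   # each BMP row is padded to 4 bytes
    image_size = (width * 3 + row_pad) * height
    header_size = 14 + 40                        # file header + info header
    out = bytearray()
    # BITMAPFILEHEADER: signature, file size, two reserved fields, pixel data offset
    out += struct.pack("<2sIHHI", b"BM", header_size + image_size, 0, 0, header_size)
    # BITMAPINFOHEADER: size, width, height, planes, bpp, compression,
    # image size, x/y pixels per metre, palette colours used/important
    out += struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                       image_size, 2835, 2835, 0, 0)
    for row in reversed(rgb_rows):               # positive height means bottom-up rows
        for r, g, b in row:
            out += bytes((b, g, r))              # BMP stores each pixel as B, G, R
        out += b"\x00" * row_pad
    return bytes(out)


def update_screen(frame, width, height, path):
    """Host-side stand-in for update_screen(): dump the frame to a .bmp file."""
    with open(path, "wb") as f:
        f.write(encode_bmp(frame, width, height))


# A 2x1 frame: one red pixel, one green pixel.
snapshot = encode_bmp([[(255, 0, 0), (0, 255, 0)]], width=2, height=1)
```

On real hardware only update_screen() changes; the code that builds the frame stays the same.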
thumbulator only runs Thumb instructions, not ARM and not Thumb2, Thumb being the common denominator between the ARM processors (ARMv4T and newer, except for the Cortex-M) and those that support the Thumb2 extensions (Cortex-M). Other than startup code, the compiling and programming experience is the same across the ARM family. The code (other than startup code and, of course, hardware-specific accesses) will run across the ARM family as well as on the simulator. If you go to a Cortex-M, then adding an architecture specification to the command line will change the build from Thumb only to Thumb+Thumb2 instructions, giving you some performance boost. If you surf around my other projects on GitHub you will find this idea repeated over and over again; I have many simple Cortex-M examples where I use gcc and llvm and build the same C code with Thumb instructions and with Thumb+Thumb2.
Another completely different answer is to get a GBA (Nintendo Game Boy Advance). You can get a GBA SP (it has a backlit display, which makes the whole experience better) for about $30 or so on eBay. You can buy flash cartridges that take SD cards for about the same amount. It has an ARM7TDMI and runs Thumb code much faster than ARM code, giving you that Thumb experience in preparation for other/newer cores like the Cortex-M. For another $30 you can get a game link cable, chop it up, attach an RS-232 level shifter (I can talk you through all of this), and make a GBA serial cable. My preferred setup is to have a flash cartridge that I have pre-programmed with a serial bootloader; I download the program over serial into RAM, then run from RAM. This avoids having to yank the flash cartridge and/or SD card every time you recompile the program, which is doable, and a cheaper solution, but gets tiring fast.
If you have a Nintendo DS, for $12 to $15 you can get an SD-based flash cartridge that you can likewise use for development. I recommend learning the GBA first, which you can do on the NDS if you buy a GBA-side memory cartridge (you need a DS Lite, not a DSi or 3DS) supported by the software on the cartridge. (The EZ-Flash 3-in-1 in GBA size, for example, is a good one; as well as serving as memory, you can flash it from the NDS and carry it over to the GBA, which is how I put my serial bootloader on it.) These loaders let you put your .gba file on the NDS cartridge's SD card and then load it into the GBA cartridge; the NDS then switches into GBA mode and runs as a GBA.
There are lots of other solutions; there are likely a number of ARM-based boards that can drive LCDs and/or come with LCDs. You can go to earthlcd and get one of the serial-based LCD panels that make for rapid development; later, of course, a cheaper solution is desired. Along the same lines, you could instead simulate an earthlcd-like device using your host computer: have the embedded microcontroller send screen updates over serial to the host, and have the host display the graphics. Later, replace that screen update with something else.
For this latter solution, about $20 gets you an STM32F4 Discovery board. It has a Cortex-M4, runs at up to 168MHz, and has a number of serial ports, of which at least two have pins not used by something else, so you could easily have one port for debug messages and the other for this virtual serial screen. In the stm32f4 directory in my stm32vld repo on GitHub I have a number of getting-started examples for that board (as well as for the stm32vld, which is a few bucks cheaper but not as powerful as the stm32f4). Likewise, your host application can take keystrokes and turn them into user control/game control commands sent back to the game software on the microcontroller.
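As a rough idea of what the virtual serial screen could look like on the wire, here is a sketch of a tiny packet format with the host-side decoder. Everything about the format (the 0xA5 magic byte, the field layout, one byte per pixel, the additive checksum) is an assumption made up for illustration; the microcontroller side would build the same bytes in C.

```python
import struct

MAGIC = 0xA5                         # hypothetical start-of-packet marker
HEADER = struct.Struct("<BHHHH")     # magic, x, y, width, height (little-endian)


def encode_update(x, y, width, height, pixels):
    """Pack one rectangular screen update: header, one byte per pixel, checksum."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    body = HEADER.pack(MAGIC, x, y, width, height) + bytes(pixels)
    return body + bytes([sum(body) & 0xFF])


def decode_update(packet):
    """Validate and unpack a packet; returns (x, y, width, height, pixels)."""
    if len(packet) < HEADER.size + 1:
        raise ValueError("packet too short")
    body, checksum = packet[:-1], packet[-1]
    if sum(body) & 0xFF != checksum:
        raise ValueError("bad checksum")
    magic, x, y, width, height = HEADER.unpack_from(body)
    if magic != MAGIC:
        raise ValueError("bad magic byte")
    pixels = body[HEADER.size:]
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match header")
    return x, y, width, height, pixels


# A 2x1 update at position (1, 2), as the host would receive it.
packet = encode_update(1, 2, 2, 1, [7, 9])
update = decode_update(packet)
```

Sending only the changed rectangle keeps the traffic small enough for a plain UART; keystrokes can travel back over the same link in a similar packet.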
There is of course the BeagleBoard or Hawkboard, or the Raspberry Pi when it comes out, or the OpenRD (I don't like the plug computer but do like the OpenRD), which have video processing and video output direct to a monitor and/or TV using composite or whatever. They cost about $150 to $200 and just work; run with it. You definitely don't need to run Linux on these platforms; you can make your own OS or whatever you like and run that, very simple.
There are more solutions than you probably have time and/or money to pursue. You need to find one that fits within your comfort or happiness zone for how you like to do development, and try that path.

No touches after adding touchscreen driver to CE 6 in platform builder

I have added a TSHARC touchscreen driver to my Windows CE project, but the touch does not work. The dll is there, as is the touchscreen calibration executable. I have no visibility into which drivers are loaded and when. Any guidance would be appreciated.
You're going to have to do some debugging, and touchscreen drivers tend to be challenging because they get loaded into GWES and because the electrical characteristics of touchpanels change dramatically based on size and manufacturer. It's very rare for a driver to just work right out of the box - you almost always have to adjust sample timings and the like based on panel characteristics, and that's best done using an oscilloscope.
Things to check:
Is the driver getting loaded at all? A RETAILMSG/DEBUGMSG would tell you that.
Are you getting touch interrupts?
After a down interrupt, is your code getting back to a state where it can receive an up?
If you look at the timings from panel signals themselves, are you sampling when the signals are stable (i.e. you're not sampling too soon after the interrupt)?
Turns out it was a conflict between the OHCI driver and another USB driver already installed.
