Why is my Windows application/game capped at 30FPS when running on an Intel GPU?

I'm working on a fullscreen Windows desktop application that's moderately graphics-intensive; it uses OpenGL but only renders 2D content. Nothing fancy: mostly pushing pixels to the screen (up to 4K, single monitor) and uploading textures. We're using VSync to control the rendering framerate, i.e. calling SwapBuffers() at the end of rendering to block until the next VBlank.
The main requirement we have is that the app runs at a solid 60FPS as it's used with a touchscreen, and interactions need to be as fluid as possible.
Because it's pretty basic, the app runs just fine on an 8th-gen Intel i7 CPU with an integrated Intel HD Graphics 630 GPU. Neither the CPU nor the GPU is anywhere near peak usage, and we can see that we're hitting a comfortable 60FPS through our in-app FPS meter. I also have it running with similar results on my Surface Book 2, with an Intel i7 and an integrated Intel UHD Graphics 620 GPU.
However, I've recently started noticing that the app sometimes drops to 30FPS and stays there, either for long periods of time or sometimes even permanently. Through our FPS meter, I can tell that we're not actually spending any time rendering; it's just our SwapBuffers() call that arbitrarily blocks for 2 frames, capping us at 30FPS. The only way to get back to 60FPS is to alt-tab to another app and back to ours, or simply to bring the Windows menu up and then go back to the app.
Because the app goes back to 60FPS afterwards, I'm positive this is intended behavior of the Intel driver, probably meant for gaming (gamers prefer a stable 30FPS over irregular, occasionally dropped frames that make the game look choppy).
In our case, however, dropping an occasional frame isn't a big deal, whereas being capped at 30FPS makes our UI and interactions far less pleasing to the eye, especially when the app could easily render at a smooth 60FPS instead.
Is there any way to switch the driver behavior to prefer pushing 60FPS with occasional drops rather than capping at 30FPS?

OK, so I was able to figure this out with a bit of tweaking and reverse-engineering: yes, this is an intended but unfortunate default behavior of the Intel driver, and it can be fixed via the Intel HD Graphics Control Panel app if available, or directly in the registry otherwise (which is the only way to fix the issue on the Surface Book and other Surface devices, where the custom Intel driver no longer exposes the Intel HD Graphics Control Panel app).
Starting with the simple solution: in the Intel HD Graphics Control Panel app, go to "3D", then "Application Settings". You'll first need to create an application profile by selecting the file on disk for the process that creates the OpenGL window. Once that's done, the setting you want to adjust is "Vertical Sync". By default, "Use Application Default Settings" is selected; this is the setting that causes the capping at 30FPS. Select "Use Driver Settings" instead to disable that behavior and always target 60FPS.
This would have been pretty obvious if it weren't for Intel's horrible choice of terms and incomprehensible documentation. To me, the choices for the setting look inverted: I would expect the capping to happen when I select "Use Driver Settings", which implies the driver is free to adjust buffer swapping as it sees fit. Similarly, "Use Application Default Settings" implies that the app decides when to push frames, which is precisely the opposite of what the setting does. Even the little help bubbles in the app seem to contradict what these settings do...
PS: I'll post the registry-based solution in a separate answer to keep this one short.

Here is the registry-based answer, for cases where your driver does not expose the Intel HD control panel (such as the driver used on the Surface Book and possibly other Surface laptops), or where you want to apply the fix programmatically via regedit.exe or the Win32 API:
The application profiles created by the Intel HD control panel are saved in the registry under HKCU\Software\Intel\Display\igfxcui\3D, as an entry named after the process file (e.g. my_game.exe) whose REG_BINARY value is a 536-byte data blob divided like this:
Bytes 0-3: Anisotropic Filtering (0 = use app default, 2-10 = multiplier setting)
Bytes 4-7: Vertical Sync (0 = use app default, 1 = use driver setting)
Bytes 8-11: MSAA (0 = use app default, 1 = force off)
Bytes 12-15: CMAA (0 = use app default, 1 = override, 2 = enhance)
Bytes 16-535: Application Display Name (wide chars, for use in the control panel's application list)
Note: all values are stored in little-endian byte order.
In addition, you need to make sure that the Global value under the same key has its first byte set to 1; this is a sort of global toggle (the control panel sets it to 1 when one or more entries are added to the applications list, then back to 0 when the last entry is deleted from the list).
The Global value is also REG_BINARY, with 8 bytes encoded like this:
Bytes 0-3: Global toggle for application entries (0 = no entries, 1 = entries)
Bytes 4-7: Application Optimal mode (0 = enabled, 1 = disabled)
For example:
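As a concrete illustration, here is a minimal sketch that writes such a profile programmatically. Assumptions on my part: it uses Python's standard winreg module rather than regedit.exe or raw Win32 calls, my_game.exe and the display name are placeholders, and the byte layout is exactly the one described above (note that it also overwrites any existing Global value):

import struct
import winreg

KEY_PATH = r"Software\Intel\Display\igfxcui\3D"
EXE_NAME = "my_game.exe"      # placeholder: process that creates the OpenGL window
DISPLAY_NAME = "My Game"      # placeholder: name shown in the control panel's list

# 4 little-endian DWORDs + 520 bytes of UTF-16LE display name = 536 bytes
blob = struct.pack(
    "<4I",
    0,  # Anisotropic Filtering: 0 = use app default
    1,  # Vertical Sync:         1 = use driver setting (disables the 30FPS cap)
    0,  # MSAA:                  0 = use app default
    0,  # CMAA:                  0 = use app default
) + DISPLAY_NAME.encode("utf-16-le").ljust(520, b"\x00")

with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
    winreg.SetValueEx(key, EXE_NAME, 0, winreg.REG_BINARY, blob)
    # Global value: first DWORD = 1 (one or more entries exist),
    # second DWORD = 0 (Application Optimal mode enabled)
    winreg.SetValueEx(key, "Global", 0, winreg.REG_BINARY, struct.pack("<2I", 1, 0))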

Related

ESP32: BLE transmission speed is very slow

I am trying to build an Android app that interfaces with the ESP32 using BLE. I am using the RxBluetoothKotlin library from Vincent Masselis for the Android side. For the ESP32 side, I am using the default Kolban libraries that are included in the Arduino IDE. My phone is a OnePlus 5T and my ESP32 is a MH ET Live ESP32DevKIT. My Android app can be found here, and my ESP32 program here.
The whole system works pretty much perfectly for me in terms of pure functionality. That is to say, every button does what it's supposed to do, and I get the exact behaviour I had expected to get. However, the communication itself is very slow. Around 200 bytes/second. My test button in the Android app requests a bunch of text data from the ESP32, and displays this in a dialog. It also lists a number which represents the time between request and reception in milliseconds. Using this, I get around 2 seconds for 440 bytes of data. When I send less data, the time decreases approximately linearly with data size. 40 bytes of data will take around 200ms, and 20 bytes or under typically takes less than 100ms.
This seems rather slow to me. From what I understand, I should be able to get at least a few kilobytes per second. I have tried to check the speed using nRF Connect, but I get the same 2-second timespan for my data transfer. This suggests that the problem is not in my app, since I also see it with a completely different app. I also moved the code from my main loop into callbacks instead (which I probably should have done in the first place), but this didn't change things at all. I have tried taking the microcontroller and my phone to a few different locations, hoping to eliminate interference. I have tried to mess with BLEDevice::setPower and BLEDevice::setMTU, as well as setting RxBluetoothGatt.requestMtu(500) on the Android side. Everything so far seems to have had little to no effect. The only thing that did anything was adding the line "pServer->updatePeerMTU(0,500);" in my loop during the connection phase. This caused the first 23 bytes of data to be repeated whenever I pressed the test button in my app, and made the data transfer take about 3 seconds. If I'm lucky, I can get maybe a bit under 1.8 seconds for 440 bytes, but this is a very small change when I'm expecting an order of magnitude of difference, and it might even be down to pure chance rather than anything I did.
Does anyone have an idea of how to increase my transfer speed?
The data transmission speed is mainly influenced by the Bluetooth LE connection interval (between 7.5 ms and 4 seconds), which is negotiated between the master (the central device) and the peripheral. The master establishes the connection with a parameter set, and the peripheral can propose changes to it; in the end, however, the central device decides which parameter set is used.
The connection interval cannot be changed directly by an Android application, which normally acts in the central role. Instead, the app can request a connection priority, which is known to influence the connection interval.
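To get a feel for how much the connection interval alone can limit throughput, here is a rough, hedged back-of-the-envelope sketch (the 20-byte payload and one-packet-per-connection-event figures are assumptions, not measurements; real stacks often send several packets per event):

# Hedged estimate: throughput when one ~20-byte ATT payload is sent per
# connection event (the default ATT_MTU of 23 leaves 20 bytes of payload).
payload_per_event = 20    # bytes, assumption
packets_per_event = 1     # assumption; many stacks queue several per event
for interval_ms in (7.5, 15, 30, 50, 100):
    throughput = payload_per_event * packets_per_event / (interval_ms / 1000.0)
    print(f"{interval_ms:>5} ms interval -> ~{throughput:,.0f} bytes/s")

With those assumptions, an interval around 100 ms yields roughly 200 bytes/s, which is in the same ballpark as the speeds observed in the question.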

How to identify the primary video card on Linux programmatically?

The primary video card can usually be set in the BIOS (the "Primary VGA card" option), and it will be the first card used by the system.
My question is how can I programmatically identify (using a shell script and utilities is preferable) which of my two video cards is the primary card?
Edit: I was hoping that I wouldn't have to elaborate why I need this, because it is a bit complex, but I will explain the problem below.
I have a configuration wizard which allows the dynamic configuration of a multiseat system in a LiveCD, with two independent displays, keyboards and mice. My wizard works this way:
1. Start one Xorg server per video card (after generating xorg.conf with the correct PCI bus ID).
2. Start a UI in each of the Xorg servers with messages for configuration (press a key and press a mouse button). One seat can be configured at a time.
3. After the first seat is configured, the wizard closes the first Xorg and starts the definitive Xorg for this seat with the desktop environment (already with the correct keyboard and mouse set).
4. The second seat can now be configured (press a key and a mouse button); after this, step 3 is repeated for seat two.
The problem is: if I start the first Xorg on the primary video card, everything works; but if I start the first Xorg on the secondary card, this is what happens:
Steps 1, 2 and 3 work, but at the end of step 3, when the Xorg of the first seat is closed, the Xorg of the second seat blinks and doesn't come back; it just shows a blank screen with a _ cursor at the top. The desktop of the first seat loads, but I don't see the wizard screen on the second seat. The first Xorg only comes back if I execute a kill -HUP, and I then need to start the UI from it again.
So that is why I need to identify the primary video card before I can start Xorg (sorry I didn't mention this before). I tested this system on different computers with different video cards, and the behavior is the same. I also tested with the latest kernel and Xorg packages in Ubuntu 12.04 (packages from the raring release).
Assuming X11 is running, you could suppose that the primary card is the one used by Xorg... then you might try
ls -l /proc/$(pidof X)/fd |grep /dev/dri
On my Debian/Sid/x86-64 system with a Linux 3.12 kernel (an Nvidia card plus an Intel 3770K, which also has its own VGA output), I'm getting /dev/dri/card0 etc...
But you should really explain why you are asking and what problem you want to solve. Why does that matter to you?
I am not at all sure that Linux has a notion of a primary graphics card the way the BIOS does.
And hwinfo probably tells you everything about your graphics cards.
There are several command-line tools in Linux that give you human-readable information from the BIOS. Maybe you can find your video boards in there and see which one is made primary. From what I see in my output, nothing explicitly says "this is the primary video", but I do see quite a lot of information. You could output that information to a file when video card A is primary, again when B is primary, then compare those two files and see whether there is a difference.
The command I used, which gives me a lot of information, is dmidecode:
sudo dmidecode | less
If you look at the manual page:
man dmidecode
You will notice that the programmers offer a few other similar tools such as biosdecode and vpddecode.
From those you learn that the BIOS information is available from the /dev/mem device. Although you need to be root to read it, if you knew the address (I don't) you could go in there and peek and poke as required to find out where the primary video card is recorded.
Running dmidecode, there are some details about my motherboard:
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: Supermicro
Product Name: X9SCI/X9SCA
Version: 1.01
Serial Number: ZM25U44192
Asset Tag: To be filled by O.E.M.
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: To be filled by O.E.M.
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
Here I have one video entry:
Handle 0x000E, DMI type 10, 6 bytes
On Board Device Information
Type: Video
Status: Enabled
Description: To Be Filled By O.E.M.
Then the other entry looks like this:
Handle 0x0036, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard IGD
Type: Video
Status: Enabled
Type Instance: 1
Bus Address: 0000:00:02.0
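If you want to pull just those video entries out programmatically, a small sketch along these lines could work (an assumption on my part: it simply runs dmidecode and filters records, so it must run as root, and it only tells you which video devices the BIOS reports, not which one is primary):

import subprocess

# Run dmidecode (requires root) and keep only the records describing video devices
output = subprocess.run(["dmidecode"], capture_output=True, text=True, check=True).stdout

for record in output.split("\n\n"):      # dmidecode separates records with blank lines
    if "Type: Video" in record:
        print(record.strip())
        print("-" * 40)

Dumping this once per BIOS setting and diffing the two outputs is then a one-liner.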
It could also be something you need to read from the Flash memory your BIOS uses. This is done with flashrom (that you may need to install):
sudo flashrom --programmer internal --read my-flash.bin
In my case the ROM on my computer holds 2Gb of data, so it is quite large. However, you can be sure that the information you are looking for exists within that data block, since that is the only means the BIOS has to save data that stays around when the computer is turned off.
I have found a way to check which GPU is primary when they are from different vendors, or at least have different names.
In KDE, go to Info Center, then open Graphics -> OpenGL. Under Direct Rendering (GLX) and Direct Rendering (EGL) you can see a Driver block with the Vendor and Device; it names the GPU that is primary.
In this screenshot you can see that the AMD GPU is primary:
Also, you can get that Vendor value programmatically by running
glxinfo | grep "OpenGL vendor string" | cut -f2 -d":" | xargs.
I guess this method will stop working when KDE switches to Vulkan for rendering (in KDE 6), but for now I do not know of another method for determining the primary GPU.

Calculate latency for a touchscreen UI on an ARM controller board running Linux

I have an embedded board with an ARM controller that runs Linux as its OS and has a touch-based screen. The data for the screen is taken from the framebuffer (/dev/fb0). Is there any way to measure the response time between two UI screens switching when an option is selected by touch?
There are 3 latencies involved in the above scenario:
1. Time taken for the touchscreen to register the finger and raise an input event.
Usually a few milliseconds.
Enable FTRACE and log the following with timestamps:
-- the ISR
-- entry of the bottom half
-- the call to input_report()
2. Time taken by the app responsible for the GUI to update it.
Depending upon the app/framework, usually the most significant contributor to latency.
Add normal console logs with timestamps in the GUI app's code:
-- upon receiving the input event
-- just before the command to modify the GUI
3. Time taken by the display to update.
Usually within 15-30 milliseconds.
The final latency is the sum of the above 3 latencies.
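For point 2, a hedged sketch of the application-side logging could look like the following (assumptions: the touchscreen is /dev/input/event0, the platform uses the usual struct input_event layout with a native timeval, and the kernel stamps input events with the realtime clock, which is the common default):

import struct
import time

EVENT_FORMAT = "llHHi"                    # timeval sec, usec, type, code, value
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

with open("/dev/input/event0", "rb") as dev:   # assumed touchscreen device node
    while True:
        data = dev.read(EVENT_SIZE)
        sec, usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, data)
        kernel_ts = sec + usec / 1e6           # when the kernel raised the event
        now = time.time()                      # when this process received it
        print(f"type={ev_type} code={code} value={value} "
              f"delivery latency ~ {(now - kernel_ts) * 1000:.2f} ms")

A second timestamped print just before the call that redraws the GUI then gives you the point-2 portion of the latency.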

Programmatically painting pixels on the screen at (or next to) the max refresh rate (e.g. 60 Hz) on Linux?

I'm looking into rendering frames at a high rate (ideally close to the monitor's max refresh rate), and I was wondering if anyone had any idea at what level I should start looking: kernel/driver level (OS space)? X11 level? svgalib (userspace)?
On a modern computer, you can do it using the ordinary tools and APIs for graphics. Even with full frames of random pixels, a simple bit blit from an in-memory buffer will perform more than adequately. Without any optimization work, I found that I could generate more than 500 frames per second on Windows XP, using 2008-era PCs.
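As a rough sketch of what "a simple bit blit from an in-memory buffer" means in practice (not the answerer's code; pygame/SDL and numpy are just assumed example tooling, and the numbers will vary wildly with hardware):

import numpy as np
import pygame

W, H, FRAMES = 800, 600, 300
pygame.init()
screen = pygame.display.set_mode((W, H))
clock = pygame.time.Clock()
frame = pygame.Surface((W, H))

for _ in range(FRAMES):
    pygame.event.pump()                        # keep the window responsive
    # Fill the off-screen surface with random pixels, then blit it to the screen
    pixels = np.random.randint(0, 256, (W, H, 3), dtype=np.uint8)
    pygame.surfarray.blit_array(frame, pixels)
    screen.blit(frame, (0, 0))
    pygame.display.flip()
    clock.tick()                               # uncapped; only used for timing

print(f"average frame rate: {clock.get_fps():.1f} fps")
pygame.quit()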

Linux: Screen desktop video capture over network, and VNC framerate

Sorry for the wall of text - TL;DR:
What is the framerate of a VNC connection (in frames/sec) - or rather, who determines it: the client or the server?
Any other suggestions for desktop screen capture - but "correctly timecoded"/with an unjittered framerate (i.e. with a stable period), and with the possibility of obtaining it as an uncompressed (or lossless) image sequence?
Briefly - I have a typical problem that I am faced with: I sometimes develop hardware, and want to record a video that shows both commands entered on the PC ('desktop capture'), and responses of the hardware ('live video'). A chunk of an intro follows, before I get to the specific detail(s).
Intro/Context
My strategy, for now, is to use a video camera to record the process of hardware testing (as 'live' video) - and do a desktop capture at the same time. The video camera produces a 29.97 (30) FPS MPEG-2 .AVI video, and I want to get the desktop capture as an image sequence of PNGs at the same frame rate as the video. The idea, then, would be: if the frame rate of the two videos is the same, then I could simply
align the time of start of the desktop capture, with the matching point in the 'live' video
Set up a picture-in-picture, where a scaled down version of the desktop capture is put - as overlay - on top of the 'live' video
(where a portion of the screen on the 'live' video, serves as a visual sync source with the 'desktop capture' overlay)
Export a 'final' combined video, compressed appropriately for the Internet
In principle, I guess one could use a command line tool like ffmpeg for this process; however I would prefer to use a GUI for finding the alignment start point for the two videos.
Eventually, what I also want to achieve, is to preserve maximum quality when exporting the 'final' video: the 'live' video is already compressed when out of the camera, which means additional degradation when it passes through the Theora .ogv codec - which is why I'd like to keep the original videos, and use something like a command line to generate a 'final' video anew, if a different compression/resolution is required. This is also why I like to have the 'desktop capture' video as a PNG sequence (although I guess any uncompressed format would do): I take measures to 'adjust' the desktop, so there aren't many gradients, and lossless encoding (i.e. PNG) would be appropriate.
Desktop capture options
Well, there are many troubles in this process under Ubuntu Lucid, which I currently use (and you can read about some of my ordeals in 10.04: Video overlay/composite editing with Theora ogv - Ubuntu Forums). However, one of the crucial problems is the assumption, that the frame rate of the two incoming videos is equal - in reality, usually the desktop capture is of a lower framerate; and even worse, very often frames are out of sync.
This, then, requires the hassle of sitting in front of a video editor, and manually cutting and editing less-than-a-second clips on frame level - requiring hours of work for what will be in the end a 5 minute video. On the other hand, if the two videos ('live' and 'capture') did have the same framerate and sync: in principle, you wouldn't need more than a couple of minutes for finding the start sync point in a video editor - and the rest of the 'merged' video processing could be handled by a single command line. Which is why, in this post, I would like to focus on the desktop capture part.
As far as I can see, there are only a few viable (as opposed to 5 Ways to Screencast Your Linux Desktop) alternatives for desktop capture in Linux / Ubuntu (note, I typically use a laptop as the target for desktop capturing):
Have your target PC (laptop) clone the desktop on its VGA output; use a VGA-to-composite or VGA-to-S-video hardware to obtain a video signal from VGA; use video capture card on a different PC to grab video
Use recordMyDesktop on the target PC
Set up a VNC server (vino on Ubuntu; or vncserver) on the target PC to be captured; use VNC capture software (such as vncrec) on a different PC to grab/record the VNC stream (which can, subsequently, be converted to video).
Use ffmpeg with x11grab option
*(use some tool on the target PC, that would do a DMA transfer of a desktop image frame directly - from the graphics card frame buffer memory, to the network adapter memory)
Please note that the usefulness of the above approaches is limited by my context of use: the target PC that I want to capture typically runs software (utilizing the tested hardware) that moves around massive amounts of data; the best you could say to describe such a system is "barely stable" :) I'd guess this is similar to the problems gamers face when wanting to obtain a video capture of a demanding game. And as soon as I start using something like recordMyDesktop, which also uses quite a bit of resources and wants to capture to the local hard disk - I immediately get severe kernel crashes (often with no vmcore generated).
So, in my context, I typically do assume involvement of a second computer - to run the capture and recording of the 'target' PC desktop. Other than that, the pros and cons I can see so far with the above options, are included below.
(Desktop preparation)
For all of the methods discussed below, I tend to "prepare" the desktop beforehand:
Remove desktop backgrounds and icons
Set the resolution down to 800x600 via System/Preferences/Monitors (gnome-desktop-properties)
Change color depth down to 16 bpp (using xdpyinfo | grep "of root" to check)
... in order to minimize the load on desktop capture software. Note that changing color depth on Ubuntu requires changes to xorg.conf; however, "No xorg.conf (is) found in /etc/X11 (Ubuntu 10.04)" - so you may need to run sudo Xorg -configure first.
In order to keep graphics resource use low, I also usually had Compiz disabled - or rather, I'd have 'System/Preferences/Appearance/Visual Effects' set to "None". However, after I tried enabling Compiz by setting 'Visual Effects' to "Normal" (which doesn't get saved), I noticed that windows on the LCD screen are redrawn much faster; so I keep it like this, also for desktop capture. I find this a bit strange: how could more effects cause a faster screen refresh? It doesn't look like it's due to a proprietary driver (the card is an "Intel Corporation N10 Family Integrated Graphics Controller", and no proprietary driver option is offered by Ubuntu upon switching to Compiz) - although it could be that all the blurring and effects just cheat my eyes :)
Cloning VGA
Well, this is the most expensive option (as it requires the additional purchase of not just one, but two pieces of hardware: a VGA converter, and a video capture card); and it is applicable mostly to laptops (which have both a screen and an additional VGA output - for desktops, one may also have to invest in an additional graphics card, or in VGA cloning hardware).
However, it is also the only option that requires no additional software on the target PC whatsoever (and thus uses 0% of the target CPU's processing power) - AND the only one that will give a video with a true, unjittered framerate of 30 fps (as the capture is performed by separate hardware - with the assumption that clock-domain misalignment between the individual pieces of hardware is negligible).
Actually, as I already own something like a capture card, I have already invested in a VGA converter - in the expectation that it will eventually allow me to produce final "merged" videos with only 5 minutes spent looking for the alignment point, and a single command line; but I have yet to see whether this process will work as intended. I'm also wondering whether it will be possible to capture the desktop as uncompressed video @ 800x600, 30 fps.
recordMyDesktop
Well, if you run recordMyDesktop without any arguments, it starts by capturing (what looks like) raw image data into a folder like /tmp/rMD-session-7247; after you press Ctrl-C to interrupt it, it encodes this raw image data into an .ogv. Obviously, grabbing large amounts of image data on the same hard disk as my test software (which also moves large amounts of data) is usually a cause for an instacrash :)
Hence, what I tried doing is to setup Samba to share a drive on the network; then on the target PC, I'd connect to this drive - and instruct recordMyDesktop to use this network drive (via gvfs) as its temporary files location:
recordmydesktop --workdir /home/user/.gvfs/test\ on\ 192.168.1.100/capture/ --no-sound --quick-subsampling --fps 30 --overwrite -o capture.ogv
Note that, while this command will use the network location for temporary files (and thus makes it possible for recordMyDesktop to run in parallel with my software) - as soon as you hit Ctrl-C, it will start encoding and saving capture.ogv directly on the local hard drive of the target (though, at that point, I don't really care :) )
The first of my nags with recordMyDesktop is that you cannot instruct it to keep the temporary files and avoid encoding them at the end: you can use Ctrl+Alt+p to pause - or you can hit Ctrl-C quickly after the first one, to cause it to crash, which will then leave the temporary files behind (if you don't hit Ctrl-C quickly enough the second time, the program will go on "Cleanning up cache..."). You can then run, say:
recordmydesktop --rescue /home/user/.gvfs/test\ on\ 192.168.1.100/capture/rMD-session-7247/
... in order to convert the raw temporary data. However, more often than not, recordMyDesktop will itself segfault in the midst of performing this "rescue". The reason I want to keep the temp files, though, is to have an uncompressed source for the picture-in-picture montage. Note that "--on-the-fly-encoding" would avoid using temp files altogether - at the expense of using more CPU processing power (which, for me, is again a cause for crashes).
Then there is the framerate - obviously, you can set the requested framerate using the '--fps N' option; however, that is no guarantee that you will actually obtain that framerate; for instance, I'd get:
recordmydesktop --fps 25
...
Saved 2983 frames in a total of 6023 requests
...
... for a capture with my test software running; which means that the actually achieved rate is more like 25*2983/6032 = 12.3632 fps!
Obviously, frames are dropped - and mostly that shows as video playback is too fast. However, if I lower the requested fps to 12 - then according to saved/total reports, I achieve something like 11 fps; and in this case, video playback doesn't look 'sped up'. And I still haven't tried aligning such a capture with a live video - so I have no idea if those frames that actually have been saved, also have an accurate timestamp.
VNC capture
The VNC capture, for me, consists of running a VNC server on the 'target' PC and running vncrec (twibright edition) on the 'recorder' PC. As the VNC server, I use vino, which is "System/Preferences/Remote Desktop (Preferences)". And apparently, even if vino configuration may not be the easiest thing to manage, vino as a server seems not too taxing on the 'target' PC, as I haven't experienced crashes when it runs in parallel with my test software.
On the other hand, when vncrec is capturing on the 'recorder' PC, it also raises a window showing you the 'target' desktop as it is seen in 'realtime'; when there are large updates (i.e. whole windows moving) on the 'target' - one can, quite visibly, see problems with the update/refresh rate on the 'recorder'. But, for only small updates (i.e. just a cursor moving on a static background), things seem OK.
This makes me wonder about one of my primary questions with this post - what is it, that sets the framerate in a VNC connection?
I haven't found a clear answer to this, but from bits and pieces of info (see refs below), I gather that:
The VNC server simply sends changes (screen changes + clicks etc.) as fast as it can when it receives them, limited by the maximum network bandwidth available to the server
The VNC client receives those change events delayed and jittered by the network connection, and attempts to reconstruct the desktop "video" stream, again as fast as it can
... which means, one cannot state anything in terms of a stable, periodic frame rate (as in video).
As far as vncrec as a client goes, the end videos I get usually are declared as 10 fps, although frames can be rather displaced/jittered (which then requires the cutting in video editors). Note that the vncrec-twibright/README states: "The sample rate of the movie is 10 by default or overriden by VNCREC_MOVIE_FRAMERATE environment variable, or 10 if not specified."; however, the manpage also states "VNCREC_MOVIE_FRAMERATE - Specifies frame rate of the output movie. Has an effect only in -movie mode. Defaults to 10. Try 24 when your transcoder vomits from 10.". And if one looks into "vncrec/sockets.c" source, one can see:
void print_movie_frames_up_to_time(struct timeval tv)
{
  static double framerate;
  ....
  memcpy(out, bufoutptr, buffered);
  if (appData.record)
    {
      writeLogHeader (); /* Writes the timestamp */
      fwrite (bufoutptr, 1, buffered, vncLog);
    }
... which shows that some timestamps are written - but whether those timestamps originate from the "original" 'target' PC, or the 'recorder' one, I cannot tell.
EDIT: thanks to the answer by @kanaka, I checked through vncrec/sockets.c again, and can see that it is the writeLogHeader function itself that calls gettimeofday; so the timestamps it writes are local - that is, they originate from the 'recorder' PC (and hence these timestamps do not accurately describe when the frames originated on the 'target' PC).
In any case, it still seems to me, that the server sends - and vncrec as client receives - whenever; and it is only in the process of encoding a video file from the raw capture afterwards, that some form of a frame rate is set/interpolated.
I'd also like to note that on my 'target' laptop the wired network connection is broken, so wireless is my only option for reaching the router and the local network - at a far lower speed than the 100MB/s that the router could handle from wired connections. However, if the jitter in captured frames is caused by wrong timestamps due to load on the 'target' PC, I don't think good network bandwidth will help too much.
Finally, as far as VNC goes, there could be other alternatives to try - such as VNCast server (promising, but requires some time to build from source, and is in "early experimental version"); or MultiVNC (although, it just seems like a client/viewer, without options for recording).
ffmpeg with x11grab
I haven't played with this much, but I've tried it in connection with netcat; this:
# 'target'
ffmpeg -f x11grab -b 8000k -r 30 -s 800x600 -i :0.0 -f rawvideo - | nc 192.168.1.100 5678
# 'recorder'
nc -l 0.0.0.0 5678 > raw.video
... does capture a file, but ffplay cannot read the captured file properly; while:
# 'target'
ffmpeg -f x11grab -b 500k -r 30 -s 800x600 -i :0.0 -f yuv4mpegpipe -pix_fmt yuv444p - | nc 192.168.1.100 5678
# 'recorder'
nc -l 0.0.0.0 5678 | ffmpeg -i - /path/to/samplimg%03d.png
does produce .png images - but with compression artifacts (result of the compression involved with yuv4mpegpipe, I guess).
Thus, I'm not liking ffmpeg+x11grab too much currently - but maybe I simply don't know how to set it up for my needs.
*( graphics card -> DMA -> network )
I am, admittedly, not sure something like this exists - in fact, I would wager it doesn't :) And I'm no expert here, but I speculate:
if DMA memory transfer can be initiated from the graphics card (or its buffer that keeps the current desktop bitmap) as source, and the network adapter as destination - then in principle, it should be possible to obtain an uncompressed desktop capture with a correct (and decent) framerate. The point in using DMA transfer would be, of course, to relieve the processor from the task of copying the desktop image to the network interface (and thus, reduce the influence the capturing software can have on the processes running on the 'target' PC - especially those dealing with RAM or hard-disk).
A suggestion like this, of course, assumes that: there are massive amounts of network bandwidth (for 800x600 at 30 fps, at least 800*600*3*30 = 43,200,000 bytes/s, i.e. roughly 41 MiB/s, which should be OK for local 100 MB/s networks); plenty of hard disk space on the other PC that does the 'recording'; and finally, software that can afterwards read that raw data and generate image sequences or videos based on it :)
The bandwidth and hard disk demands I could live with - as long as there is guarantee both for a stable framerate and uncompressed data; which is why I'd love to hear if something like this already exists.
-- -- -- -- --
Well, I guess that was it - as brief as I could put it :) Any suggestions for tools - or process(es) - that can result in a desktop capture
in uncompressed format (ultimately convertible to uncompressed/lossless PNG image sequence), and
with a "correctly timecoded", stable framerate
..., that will ultimately lend itself to 'easy', single command-line processing for generating 'picture-in-picture' overlay videos - will be greatly appreciated!
Thanks in advance for any comments,
Cheers!
References
Experiences Producing a Screencast on Linux for CryptoTE - idlebox.net
The VideoLAN Forums • View topic - VNC Client input support (like screen://)
VNCServer throttles user inpt for slow client - Kyprianou, Mark - com.realvnc.vnc-list - MarkMail
Linux FAQ - X Windows: How do I Display and Control a Remote Desktop using VNC
How much bandwidth does VNC require? RealVNC - Frequently asked questions
x11vnc: a VNC server for real X displays
HowtoRecordVNC (an X11 session) - Debian Wiki
Alternative To gtk-RecordMyDesktop in Ubuntu
(Ffmpeg-user) How do I use pipes in ffmpeg
(ffmpeg-devel) (PATCH) Fix segfault in x11grab when drawing Cursor on Xservers that don't support the XFixes extension
You should get a badge for such a long, well-thought-out question. ;-)
In answer to your primary question, VNC uses the RFB protocol, which is a remote frame buffer protocol (thus the acronym), not a streaming video protocol. The VNC client sends a FrameBufferUpdateRequest message to the server, which contains the viewport region that the client is interested in and an incremental flag. If the incremental flag is not set, then the server will respond with a FrameBufferUpdate message that contains the content of the region requested. If the incremental flag is set, then the server may respond with a FrameBufferUpdate message that contains whatever parts of the requested region have changed since the last time the client was sent that region.
The definition of how requests and updates interact is not crisply defined. The server won't necessarily respond to every request with an update if nothing has changed. If the server has multiple requests queued from the client it is also allowed to send a single update in response. In addition, the client really needs to be able to respond to an asynchronous update message from the server (not in response to a request) otherwise the client will fall out of sync (because RFB is not a framed protocol).
Often clients are simply implemented to send incremental update requests for the entire frame buffer viewport at a periodic interval and handle any server update messages as they arrive (i.e. no attempt is made to tie requests and updates together).
Here is a description of FrameBufferUpdateRequest messages.
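For reference, here is a minimal sketch of what such a request looks like on the wire, based on the message layout just described (my assumption is the standard RFB encoding: message type 3, big-endian multi-byte fields):

import struct

def framebuffer_update_request(x, y, width, height, incremental=True):
    # U8 message-type (3), U8 incremental, U16 x, U16 y, U16 width, U16 height
    return struct.pack("!BBHHHH", 3, int(incremental), x, y, width, height)

# Typical client behaviour described above: periodically ask for an
# incremental update of the whole framebuffer viewport (e.g. 800x600).
msg = framebuffer_update_request(0, 0, 800, 600, incremental=True)
assert len(msg) == 10    # the request is a fixed 10-byte message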
