Play audio file over VoIP

I want to implement a simple VoIP system which can do the following:
The user uploads an MP3 or WAV file and gives a phone number.
The given phone number is dialed; when the phone is picked up, the uploaded MP3/WAV file is played. Once the whole file has been played, the call is hung up.
I want to know: is there any open-source library which supports this, or open-source software with which I can achieve this?

I do similar testing for my job.
I have a test framework on my box in my office using FreeSWITCH, and I've created some users with passwords on the FreeSWITCH box.
Then I use a SIP testing tool/client to manage the connection to the SIP proxy, to another user.
For example... say my FreeSWITCH is at IP 120.0.0.7.
I am registered on that FreeSWITCH as user 5000 and I want to call user 4000, who is also registered.
I use either SIPP (Linux) or SIPCLI (Windows).
SIPP
The benefit of SIPP is that it's truly robust and can do a myriad of performance testing and whatnot. Sending audio is a bit more challenging, but it's doable: you're basically sending pcaps of recorded audio in some codec (G.711, G.729, etc.), so you run a command like:
sudo sipp -s [the phone number/ user] [your freeswitch] -sn uac_pcap -mi [your ip] -l 1 -m 1
The last two parameters (-l and -m) set how much load is generated; by default SIPP will send 10 calls per second, which you probably don't want. -l limits the number of simultaneous calls, and -m stops the run after that many calls in total.
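If you want SIPP to play your own recording rather than the bundled sample, one route (a sketch from memory, so verify against your SIPP version) is to dump the embedded scenario with sipp -sd uac_pcap > myscenario.xml and point its play action at your own capture; the relevant XML fragment looks like this, with the pcap path as a placeholder:
<nop>
  <action>
    <exec play_pcap_audio="pcap/my_g711a_audio.pcap"/>
  </action>
</nop>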
SIPCLI
The much easier method is SIPCLI (but it's a Windows-only tool).
In SIPCLI, you can basically send a WAV file, as well as text to speech. I love it. It has a built-in library that will dial the number, and you can pass something like -t "This is a test of the test harness for sip and v o i p testing." and it will convert that to audio on the call, on the fly. You can also build out scenarios that point to WAV files you've recorded.
SIPCLI uses a command similar to SIPP's to connect:
sipcli [user/phone number] -d [domain or proxy] -t "This is text I want converted to speech on the phone call"
You could also pass in a link to a WAV file.
SIPCLI can also send DTMF tones, or you can point it at WAVs of DTMF tones.
The scenario editor is a bit complex at first and takes some getting used to, but once you get the hang of making scenario files, it's pretty easy.
Benefits of SIPP
SIPP can capture performance metrics (the overall time in ms between your configured start and end points)
SIPP can drive thousands of calls at your desired rate
SIPP can ramp calls up or down on the fly
SIPP can generate statistics and CSV files for analysis (see the example command after this list)
SIPP scenarios you write build the packets themselves, so you have more control over what your packets send on the INVITE
SIPP is open source
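For example, the statistics dumping is switched on with the trace flags; from memory (double-check against sipp -h), something like this writes a CSV snapshot every second:
sipp -sn uac -trace_stat -fd 1 [your freeswitch]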
Negatives of SIPP
SIPP can NOT send a WAV file
SIPP can NOT generate its own DTMF tones (it uses pcaps, which can be problematic)
SIPP can NOT generate text to speech
SIPP is somewhat complicated to get going
Benefits of SIPCLI
SIPCLI can convert text to speech on the fly
SIPCLI can use recorded WAVs to send to the recipient
SIPCLI is easy to use
SIPCLI can also act as a receiver (i.e. an IVR playing a greeting and taking input)
SIPCLI has some logic to validate data received (like user pressed #3, then #4.)
Negatives of SIPCLI
SIPCLI doesn't give you access to the SIP headers it sends (so less control over the test)
SIPCLI doesn't do load or performance metrics
SIPCLI's editor is kind of difficult at first, but it's not as hard as learning SIPP's advanced features
SIPCLI is NOT open source... its trial is 90% useful. To get the other 10% (longer phone calls) you need to purchase it for $70.
I've also tried other tools like PJSUA, but these two are my bread and butter for testing the scenarios you are talking about.
Regarding the framework/softswitch/proxy: I use FreeSWITCH.

Yes, you can use Asterisk, FreeSWITCH (my personal preference), or a number of other similar platforms.
Once you have FreeSWITCH set up, check out this link to get it going:
http://wiki.freeswitch.org/wiki/Javascript_QuickStart
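For what it's worth, the exact upload-dial-play-hangup behavior asked about maps to a one-liner in FreeSWITCH once a SIP gateway is configured. From fs_cli, something like the following (gateway name, number, and file path are placeholders) dials out, plays the file on answer, and hangs up when playback ends:
originate sofia/gateway/gw1/15551234567 &playback(/tmp/uploaded.wav)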

Use ivrworx for simple testing; see the streamer example.

Related

Intercepting Sound From Other Programs

I want to do a couple of things:
- I want to hear sound from all other programs through Max, and Max only.
- I want to edit that sound in real time and hear only the edited sound.
- I want to slow down the sound, while stacking the non-slowed incoming input onto a buffer, which I can then speed through to catch up.
Is this possible in Max? I have had a lot of difficulty getting even step 1 working. Even if I use my speakers as an input device, I am unable to monitor the sound, let alone edit it. I am using Max for Live, for what it's worth.
Steps 1 and 2
On Mac, you can use Loopback.
You can set your system output to the Loopback driver, then set the Loopback driver as the input in Max, and the speakers as the output.
On Windows you would do the same, but with a different internal audio-routing system, such as JACK.
Step 3
You can do that with the buffer~ object. Of course the buffer will have a finite size, and storing hours of audio might be problematic, but minutes shouldn't be a problem on a decent computer. The buffer~ help file will show you the first steps needed to store and read audio from it.

Easiest way to play mp3 files in python to a specific device

I'm converting an ESP32 project to a Raspberry Pi Zero. One of the project's behaviors is to play back sound effects based on specific events or triggers. I prefer to use the MP3 format so I can store information about the contents of each file in its ID3 tags, to make the files themselves easier to manage (there are a lot of them!).
I can find examples of using any number of libraries to play MP3s in Python, and I found an example of selecting a device using sounddevice, but it seems to want NumPy arrays to play sound data.
I'm wondering what the easiest and quickest way is to play MP3 files (or should I go to some other file format, with a data stub file for each, to do my file management?).
Since these sounds are played as responses, they need to at least start playback quickly (i.e. not wait for a format conversion to take place). And in some cases, other behaviors (such as voice-recognition triggers) already add to the device's potential latency in its total response time.
EDIT: additional info
Quickest means processor speed (Pi Zeros slow down quickly under heavy load).
These are real-time responses, so any lag from converting defeats the purpose of the playback.
Also, the device from Seeed is configured as an ALSA (asound) device.
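One low-overhead sketch of the kind of thing being asked for, assuming the mpg123 player and the mutagen library are available (both are assumptions, and this is untested on a Pi Zero; the device string and paths are placeholders):
import subprocess
from mutagen.id3 import ID3  # assumed library for the ID3 housekeeping

def play(path, device="hw:1,0"):  # device string is a placeholder for the Seeed card
    # mpg123 decodes MP3 natively and starts output almost immediately;
    # -o alsa -a selects the output device, -q suppresses console output.
    return subprocess.Popen(["mpg123", "-q", "-o", "alsa", "-a", device, path])

tags = ID3("sounds/chime.mp3")   # e.g. tags.get("TIT2") for the title frame
proc = play("sounds/chime.mp3")  # returns immediately; proc.wait() blocks until done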

Is the G.729 codec 32 kbps or 8 kbps?

I'm building a VoIP app for iPhone and Android. I'm currently using the GSM codec (I chose it arbitrarily) in both versions of my app and on my Asterisk server.
Now that I'm fine-tuning my app, I'd like to try different audio codecs. I'm considering G.729. I did some research and wasn't sure why some sites say the G.729 codec uses about 32 kbps, as this site does:
http://voip.about.com/od/voipbandwidth/f/How-Much-Of-My-Mobile-Data-Plan-Does-Voip-Consume.htm
while others say it is 8 kbps, like this site:
http://www.javvin.com/protocolG7xx.html
I did some tests and it seems that one minute of conversation with the G.729 codec uses up 0.5 MB of data. So it seems like the first link is correct. But I've seen other sites list similar stats of 8 kbps... why the discrepancy?
If you look towards the bottom of the first link you posted, it hints at the reason: the 8 kbps is how much is used to encode the speech itself. You then need to send that encoded speech out over the network to the other end of the VoIP call, and hence need to pack it into IP packets, typically using the RTP protocol.
The actual number of bits transmitted will depend on the number of samples taken per second, the number of samples packed into each IP packet, the protocol headers, and so on. Much of this is influenced by the codec chosen; the following link gives a good overview (see the table in the section titled 'VoIP - Per Call Bandwidth'):
http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
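Working through the arithmetic, under the usual assumption of 20 ms packetization (two 10 ms G.729 frames per packet):
G.729 payload: 8 kbps x 20 ms = 160 bits = 20 bytes per packet
IP + UDP + RTP headers: 20 + 8 + 12 = 40 bytes per packet
50 packets/s x (20 + 40) bytes = 3,000 bytes/s = 24 kbps at the IP layer
Ethernet framing adds roughly 18 bytes per packet: 50 x 78 bytes/s is about 31.2 kbps, hence the "about 32 kbps" figure.
Your measurement fits this too: 0.5 MB per minute is about 8,333 bytes/s, or roughly 67 kbps, which is close to two 32 kbps streams (one per direction).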

Bi-directional sniffing/snooping on an ALSA MIDI SysEx exchange

Does anyone know of a good way to get a bi-directional dump of MIDI SysEx data on Linux? (between a Yamaha PSR-E413 MIDI keyboard and a copy of the Yamaha MusicSoft Downloader running in Wine)
I'd like to reverse-engineer the protocol used to copy MIDI files to and from my keyboard's internal memory and, to do that, I need to do some recording of valid exchanges between the two.
The utility does work in Wine (with a little nudging) but I don't want to have to rely on a cheap, un-scriptable app in Wine when I could be using a FUSE filesystem.
Here's the current state of things:
My keyboard connects to my PC via a built-in USB-MIDI bridge. USB dumpers/snoopers are a possibility, but I'd prefer to avoid them if possible. I don't want to have to decode yet another layer of protocol encoding before I even get started.
I run only Linux. However, if there really is no other option than a Windows-based dumper/snooper, I can try getting the USB 1.1 pass-through working on my WinXP VirtualBox VM.
I run bare ALSA for my audio system with dmix for waveform audio mixing.
If a sound server is necessary, I'm willing to experiment with JACK.
No PulseAudio please. It took long enough to excise it from my system.
If the process involves ALSA MIDI routing:
a virtual pass-through device I can select from inside the Downloader is preferred, because the Downloader's port often only appears in an ALSA patch-bay GUI like patchage an instant before it starts communicating with the keyboard.
Neither KMIDIMon nor GMIDIMonitor support snooping bi-directionally as far as I can tell.
virmidi isn't relevant and I haven't managed to get snd-seq-dummy working.
I suppose I could patch ALSA to get dumps if I really must, but it's really an option of last resort.
The vast majority of my programming experience is in Python, PHP, Javascript, and shell script.
I have almost no experience programming in C.
I've never even seen a glimpse of kernel-mode code.
I'd prefer to keep my system stable and my uptime high.
This question has been unanswered for some time, and while I do not have an exact answer to your problem, I may have something that can push you in the right direction (or help others with similar problems).
I had a similar, albeit less complex, problem when I wanted to sniff the data used to set and read presets on an Akai LPK25 MIDI keyboard. As in your setup, the software to configure the keyboard runs in Wine, and I also had no luck in finding a sniffer setup for Linux.
For lack of an existing solution I rolled my own, using ALSA MIDI routing over virmidi ports. I understand why you see them as useless: without additional software they cannot help with sniffing MIDI traffic.
My solution was to program a MIDI relay/bridge in Java, where I read input from a virmidi port, display the data, and send it on to the keyboard. The answer from the keyboard (if any) is also read, displayed, and finally transmitted back to the virmidi port. The application in Wine can be set up to use the virmidi port for communication, and in theory this process is completely transparent (except for potential latency issues). The application is written in a generic way and is not hardcoded to my problem.
I was only dealing with SysEx messages of about 20 bytes in length, so I am not sure how well the software works for sniffing the transfer of large amounts of data. But maybe you can modify it / write your own program following the example.
Sources available here: https://github.com/hiben/MIDISpy
(Java 1.6, ant build file included, source is under BSD license)
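For what it's worth, the same relay idea can be sketched in a few lines of Python with the mido library (an assumption on my part, not something from the project above); the port names are placeholders you would look up with mido.get_input_names():
import threading
import mido  # requires a backend, e.g. python-rtmidi

def relay(src_name, dst_name, tag):
    # Open both ends, log every message with its direction, and forward it.
    with mido.open_input(src_name) as src, mido.open_output(dst_name) as dst:
        for msg in src:  # blocking iterator; yields SysEx as well as channel messages
            print(tag, msg)
            dst.send(msg)

# One thread per direction gives the bi-directional tap.
threading.Thread(target=relay, args=("VirMIDI 2-0", "PSR-E413 MIDI 1", "app->kbd"),
                 daemon=True).start()
relay("PSR-E413 MIDI 1", "VirMIDI 2-0", "kbd->app")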
I like using aseqdump for that.
http://www.linuxcommand.org/man_pages/aseqdump1.html
You can use virtual MIDI devices for this purpose. You have to load snd_seq_dummy so that it creates at least two ports:
$ sudo modprobe -r snd_seq_dummy
$ sudo modprobe snd_seq_dummy ports=1 duplex=1
Then you should have a device named Midi Through:
$ aconnect -i -o -l
client 0: 'System' [type=kernel]
0 'Timer '
1 'Announce '
client 14: 'Midi Through' [type=kernel]
0 'Midi Through Port-0:A'
1 'Midi Through Port-0:B'
client 131: 'VMPK Input' [type=user,pid=50369]
0 'in '
client 132: 'VMPK Output' [type=user,pid=50369]
0 'out '
I will take the port and device numbers from this example; you have to look them up yourself according to your setup.
Now you plug your favourite MIDI device into the Midi Through ports:
$ aconnect 132:0 14:0
$ aconnect 14:0 131:0
At this point you have a connection where you can spy on both devices at the same time. You can use aseqdump to spy on the MIDI conversation. There are different possibilities; I suggest spying on the connection between the loopback devices and the real device, since this also allows rawmidi connections to the loopback devices.
$ aseqdump -p 14:0,132:0 | tee dump.log
Now everything is set up for use. You just have to be careful about port names in your MIDI application: it should read MIDI data from Midi Through Port-0:B and write data to Midi Through Port-0:B.
An additional hint: you can use the graphical frontend patchage to connect and inspect the MIDI connections via drag and drop. If you do this, you will see that every Midi Through port occurs twice, once as input and once as output. Both have to be connected in order to make this setup work.
If you want to use GMidiMonitor or some other application, you can spy on both streams intermixed (without direction information) using aconnect. Suppose 129:0 is the MIDI monitor port:
$ aconnect 14:0 129:0
$ aconnect 132:0 129:0
If you want exact direction information, you can add another GMidiMonitor instance connected to only one of the ports; the missing messages come from the other port.
What about using gmidimonitor? See http://home.gna.org/gmidimonitor/

Linux: Screen desktop video capture over network, and VNC framerate

Sorry for the wall of text - TL;DR:
What is the framerate of a VNC connection (in frames/sec)? Or rather, who determines it: the client or the server?
Any other suggestions for desktop screen capture, but "correctly timecoded"/with an unjittered framerate (a stable period), and with the possibility of obtaining it as an uncompressed (or lossless) image sequence?
Briefly, I have a typical problem: I sometimes develop hardware, and want to record a video that shows both the commands entered on the PC ('desktop capture') and the responses of the hardware ('live video'). A chunk of an intro follows, before I get to the specific details.
Intro/Context
My strategy, for now, is to use a video camera to record the process of hardware testing (as 'live' video), and do a desktop capture at the same time. The video camera produces a 29.97 (30) fps MPEG-2 .AVI video, and I want to get the desktop capture as an image sequence of PNGs at the same frame rate as the video. The idea, then, would be: if the frame rate of the two videos is the same, then I could simply
align the start time of the desktop capture with the matching point in the 'live' video
set up a picture-in-picture, where a scaled-down version of the desktop capture is put, as an overlay, on top of the 'live' video
(where a portion of the screen in the 'live' video serves as a visual sync source with the 'desktop capture' overlay)
export a 'final' combined video, compressed appropriately for the Internet
In principle, I guess one could use a command line tool like ffmpeg for this process; however I would prefer to use a GUI for finding the alignment start point for the two videos.
Eventually, what I also want to achieve is to preserve maximum quality when exporting the 'final' video: the 'live' video is already compressed coming out of the camera, which means additional degradation when it passes through the Theora .ogv codec. This is why I'd like to keep the original videos and use something like a command line to generate a 'final' video anew if a different compression/resolution is required. It is also why I like to have the 'desktop capture' video as a PNG sequence (although I guess any uncompressed format would do): I take measures to 'adjust' the desktop so there aren't many gradients, and lossless encoding (i.e. PNG) would be appropriate.
Desktop capture options
Well, there are many troubles in this process under Ubuntu Lucid, which I currently use (and you can read about some of my ordeals in "10.04: Video overlay/composite editing with Theora ogv" on the Ubuntu Forums). However, one of the crucial problems is the assumption that the frame rates of the two incoming videos are equal; in reality, the desktop capture is usually of a lower framerate, and even worse, the frames are very often out of sync.
This, then, requires the hassle of sitting in front of a video editor and manually cutting and editing less-than-a-second clips at frame level, requiring hours of work for what will in the end be a five-minute video. On the other hand, if the two videos ('live' and 'capture') did have the same framerate and sync, then in principle you wouldn't need more than a couple of minutes to find the start sync point in a video editor, and the rest of the 'merged' video processing could be handled by a single command line. Which is why, in this post, I would like to focus on the desktop capture part.
As far as I can see, there are only a few viable alternatives (as opposed to the "5 Ways to Screencast Your Linux Desktop" kind of list) for desktop capture on Linux/Ubuntu (note that I typically use a laptop as the target for desktop capturing):
Have your target PC (laptop) clone the desktop to its VGA output; use VGA-to-composite or VGA-to-S-video hardware to obtain a video signal from the VGA; use a video capture card on a different PC to grab the video
Use recordMyDesktop on the target PC
Set up a VNC server (vino on Ubuntu; or vncserver) on the target PC to be captured; use VNC capture software (such as vncrec) on a different PC to grab/record the VNC stream (which can, subsequently, be converted to video).
Use ffmpeg with x11grab option
*(use some tool on the target PC, that would do a DMA transfer of a desktop image frame directly - from the graphics card frame buffer memory, to the network adapter memory)
Please note that the usefulness of the above approaches is limited by my context of use: the target PC that I want to capture typically runs software (utilizing the tested hardware) that moves around massive amounts of data; the best you could say in describing such a system is "barely stable" :) I'd guess this is similar to the problems gamers face when wanting to obtain a video capture of a demanding game. And as soon as I start using something like recordMyDesktop, which also uses quite a bit of resources and wants to capture to the local hard disk, I immediately get severe kernel crashes (often with no vmcore generated).
So, in my context, I typically do assume the involvement of a second computer, to run the capture and recording of the 'target' PC desktop. Other than that, the pros and cons I can see so far with the above options are included below.
(Desktop preparation)
For all of the methods discussed below, I tend to "prepare" the desktop beforehand:
Remove desktop backgrounds and icons
Set the resolution down to 800x600 via System/Preferences/Monitors (gnome-desktop-properties)
Change color depth down to 16 bpp (using xdpyinfo | grep "of root" to check)
... in order to minimize the load on desktop capture software. Note that changing color depth on Ubuntu requires changes to xorg.conf; however, "No xorg.conf (is) found in /etc/X11 (Ubuntu 10.04)" - so you may need to run sudo Xorg -configure first.
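For reference, the xorg.conf fragment in question is just a DefaultDepth entry in the Screen section (the identifier below is a placeholder; the rest of the generated section can stay as Xorg -configure wrote it):
Section "Screen"
    Identifier "Screen0"
    DefaultDepth 16
EndSection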
In order to keep graphics resource use low, I also usually had Compiz disabled; or rather, I'd have 'System/Preferences/Appearance/Visual Effects' set to "None". However, after I tried enabling Compiz by setting 'Visual Effects' to "Normal" (which doesn't get saved), I noticed that windows on the LCD screen are redrawn much faster; so I keep it like this, also for desktop capture. I find this a bit strange: how could more effects cause a faster screen refresh? It doesn't look like it's due to a proprietary driver (the card is an "Intel Corporation N10 Family Integrated Graphics Controller", and no proprietary driver option is offered by Ubuntu upon switching to Compiz), although it could be that all the blurring and effects just cheat my eyes :)
Cloning VGA
Well, this is the most expensive option, as it requires the additional purchase of not just one but two pieces of hardware: a VGA converter and a video capture card. And it is applicable mostly to laptops (which have both a screen and an additional VGA output; for desktops one may also have to invest in an additional graphics card or VGA-cloning hardware).
However, it is also the only option that requires no additional software on the target PC whatsoever (and thus uses 0% of the target CPU's processing power), AND the only one that will give a video with a true, unjittered framerate of 30 fps (as it is performed by separate hardware, under the assumption that the clock-domain misalignment between the individual pieces of hardware is negligible).
Actually, as I already own something like a capture card, I have already invested in a VGA converter, in the expectation that it will eventually allow me to produce final "merged" videos with only five minutes of looking for an alignment point and a single command line; but I have yet to see whether this process will work as intended. I'm also wondering how feasible it will be to capture the desktop as uncompressed video @ 800x600, 30 fps.
recordMyDesktop
Well, if you run recordMyDesktop without any arguments, it starts by capturing (what looks like) raw image data, in a folder like /tmp/rMD-session-7247; after you press Ctrl-C to interrupt it, it encodes this raw image data into an .ogv. Obviously, grabbing large image data on the same hard disk as my test software (which also moves large amounts of data) is usually a cause for an instacrash :)
Hence, what I tried doing was to set up Samba to share a drive on the network; then on the target PC I'd connect to this drive, and instruct recordMyDesktop to use this network drive (via gvfs) as its temporary files location:
recordmydesktop --workdir /home/user/.gvfs/test\ on\ 192.168.1.100/capture/ --no-sound --quick-subsampling --fps 30 --overwrite -o capture.ogv
Note that, while this command will use the network location for temporary files (and thus makes it possible for recordMyDesktop to run in parallel with my software), as soon as you hit Ctrl-C it will start encoding and saving capture.ogv directly on the local hard drive of the target (though at that point, I don't really care :) )
The first of my nags with recordMyDesktop is that you cannot instruct it to keep the temporary files and skip encoding them on exit: you can use Ctrl+Alt+p to pause, or you can hit Ctrl-C quickly after the first one to cause it to crash, which will then leave the temporary files behind (if you don't hit the second Ctrl-C quickly enough, the program will start "Cleanning up cache..."). You can then run, say:
recordmydesktop --rescue /home/user/.gvfs/test\ on\ 192.168.1.100/capture/rMD-session-7247/
... in order to convert the raw temporary data. However, more often than not, recordMyDesktop will itself segfault in the middle of performing this "rescue". The reason I want to keep the temp files, though, is to have an uncompressed source for the picture-in-picture montage. Note that "--on-the-fly-encoding" would avoid using temp files altogether, at the expense of using more CPU processing power (which, for me, is again a cause of crashes).
Then there is the framerate. Obviously, you can set the requested framerate using the '--fps N' option; however, that is no guarantee that you will actually obtain that framerate; for instance, I'd get:
recordmydesktop --fps 25
...
Saved 2983 frames in a total of 6023 requests
...
... for a capture with my test software running; which means that the actually achieved rate is more like 25*2983/6023 = 12.38 fps!
Obviously frames are dropped, and mostly that shows up as the video playback being too fast. However, if I lower the requested fps to 12, then according to the saved/total reports I achieve something like 11 fps, and in this case the video playback doesn't look sped up. And I still haven't tried aligning such a capture with a live video, so I have no idea whether the frames that actually were saved also have accurate timestamps.
VNC capture
The VNC capture, for me, consists of running a VNC server on the 'target' PC and running vncrec (twibright edition) on the 'recorder' PC. As the VNC server, I use vino, which is "System/Preferences/Remote Desktop (Preferences)". And apparently, even if vino configuration is not the easiest thing to manage, vino as a server seems not too taxing for the 'target' PC, as I haven't experienced crashes when it runs in parallel with my test software.
On the other hand, when vncrec is capturing on the 'recorder' PC, it also raises a window showing the 'target' desktop as seen in 'realtime'; when there are large updates (i.e. whole windows moving) on the 'target', one can quite visibly see problems with the update/refresh rate on the 'recorder'. But for small updates only (i.e. just a cursor moving on a static background), things seem OK.
This makes me wonder about one of my primary questions with this post - what is it, that sets the framerate in a VNC connection?
I haven't found a clear answer to this, but from bits and pieces of info (see refs below), I gather that:
The VNC server simply sends changes (screen changes + clicks, etc.) as fast as it can, when it receives them, limited by the maximum network bandwidth available to the server
The VNC client receives those change events delayed and jittered by the network connection, and attempts to reconstruct the desktop "video" stream, again as fast as it can
... which means, one cannot state anything in terms of a stable, periodic frame rate (as in video).
As far as vncrec as a client goes, the end videos I get are usually declared as 10 fps, although frames can be rather displaced/jittered (which then requires the cutting in video editors). Note that the vncrec-twibright/README states: "The sample rate of the movie is 10 by default or overriden by VNCREC_MOVIE_FRAMERATE environment variable, or 10 if not specified."; the manpage, however, also states "VNCREC_MOVIE_FRAMERATE - Specifies frame rate of the output movie. Has an effect only in -movie mode. Defaults to 10. Try 24 when your transcoder vomits from 10.". And if one looks into the vncrec/sockets.c source, one can see:
void print_movie_frames_up_to_time(struct timeval tv)
{
  static double framerate;
  ....
  memcpy(out, bufoutptr, buffered);
  if (appData.record)
  {
    writeLogHeader (); /* Writes the timestamp */
    fwrite (bufoutptr, 1, buffered, vncLog);
  }
... which shows that some timestamps are written - but whether those timestamps originate from the "original" 'target' PC, or the 'recorder' one, I cannot tell.
EDIT: thanks to the answer by @kanaka, I checked through vncrec/sockets.c again and can see that it is the writeLogHeader function itself that calls gettimeofday; so the timestamps it writes are local, that is, they originate from the 'recorder' PC (and hence these timestamps do not accurately describe when the frames originated on the 'target' PC).
In any case, it still seems to me that the server sends, and vncrec as client receives, whenever; and it is only in the process of encoding a video file from the raw capture afterwards that some form of frame rate is set/interpolated.
I'd also like to note that on my 'target' laptop the wired network connection is broken, so wireless is my only option to get access to the router and the local network, at a far lower speed than the 100 Mbit/s the router could handle over wired connections. However, if the jitter in captured frames is caused by wrong timestamps due to load on the 'target' PC, I don't think good network bandwidth would help much.
Finally, as far as VNC goes, there could be other alternatives to try, such as the VNCast server (promising, but it requires some time to build from source and is at an "early experimental version"); or MultiVNC (although it just seems to be a client/viewer, without options for recording).
ffmpeg with x11grab
I haven't played with this much, but I've tried it in connection with netcat; this:
# 'target'
ffmpeg -f x11grab -b 8000k -r 30 -s 800x600 -i :0.0 -f rawvideo - | nc 192.168.1.100 5678
# 'recorder'
nc -l 0.0.0.0 5678 > raw.video
... does capture a file, but ffplay cannot read the captured file properly; while:
# 'target'
ffmpeg -f x11grab -b 500k -r 30 -s 800x600 -i :0.0 -f yuv4mpegpipe -pix_fmt yuv444p - | nc 192.168.1.100 5678
# 'recorder'
nc -l 0.0.0.0 5678 | ffmpeg -i - /path/to/samplimg%03d.png
does produce .png images, but with compression artifacts (a result of the compression involved with yuv4mpegpipe, I guess).
Thus, I'm not liking ffmpeg+x11grab too much currently - but maybe I simply don't know how to set it up for my needs.
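One guess about the first (rawvideo) attempt: a raw stream carries no header at all, so the reader has to be told the pixel format, frame size, and rate explicitly rather than probing for them. Something along these lines (untested here) should round-trip without a lossy intermediate:
# 'target'
ffmpeg -f x11grab -r 30 -s 800x600 -i :0.0 -f rawvideo -pix_fmt rgb24 - | nc 192.168.1.100 5678
# 'recorder', afterwards
ffmpeg -f rawvideo -pix_fmt rgb24 -s 800x600 -r 30 -i raw.video /path/to/sample%03d.png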
*( graphics card -> DMA -> network )
I am, admittedly, not sure something like this exists; in fact, I would wager it doesn't :) And I'm no expert here, but I speculate:
if a DMA memory transfer could be initiated with the graphics card (or its buffer that keeps the current desktop bitmap) as source and the network adapter as destination, then in principle it should be possible to obtain an uncompressed desktop capture with a correct (and decent) framerate. The point of using a DMA transfer would be, of course, to relieve the processor of the task of copying the desktop image to the network interface (and thus reduce the influence the capturing software has on the processes running on the 'target' PC, especially those dealing with RAM or the hard disk).
A suggestion like this, of course, assumes that there are massive amounts of network bandwidth (for 800x600 at 30 fps, at least 800*600*3*30 = 43,200,000 bytes/s, i.e. about 41 MiB/s or roughly 346 Mbit/s, so really a job for a gigabit-class local network); plenty of hard disk on the other PC that does the 'recording'; and finally, software that can afterwards read that raw data and generate image sequences or videos based on it :)
The bandwidth and hard disk demands I could live with - as long as there is guarantee both for a stable framerate and uncompressed data; which is why I'd love to hear if something like this already exists.
-- -- -- -- --
Well, I guess that was it, as brief as I could put it :) Any suggestions for tools, or process(es), that can result in a desktop capture
in uncompressed format (ultimately convertible to uncompressed/lossless PNG image sequence), and
with a "correctly timecoded", stable framerate
..., and that will ultimately lend itself to 'easy', single-command-line processing for generating 'picture-in-picture' overlay videos, will be greatly appreciated!
Thanks in advance for any comments,
Cheers!
References
Experiences Producing a Screencast on Linux for CryptoTE - idlebox.net
The VideoLAN Forums • View topic - VNC Client input support (like screen://)
VNCServer throttles user inpt for slow client - Kyprianou, Mark - com.realvnc.vnc-list - MarkMail
Linux FAQ - X Windows: How do I Display and Control a Remote Desktop using VNC
How much bandwidth does VNC require? RealVNC - Frequently asked questions
x11vnc: a VNC server for real X displays
HowtoRecordVNC (an X11 session) - Debian Wiki
Alternative To gtk-RecordMyDesktop in Ubuntu
(Ffmpeg-user) How do I use pipes in ffmpeg
(ffmpeg-devel) (PATCH) Fix segfault in x11grab when drawing Cursor on Xservers that don't support the XFixes extension
You should get a badge for such a long, well-thought-out question. ;-)
In answer to your primary question: VNC uses the RFB protocol, which is a remote frame buffer protocol (thus the acronym), not a streaming video protocol. The VNC client sends a FramebufferUpdateRequest message to the server, which contains a viewport region that the client is interested in and an incremental flag. If the incremental flag is not set, then the server will respond with a FramebufferUpdate message that contains the content of the region requested. If the incremental flag is set, then the server may respond with a FramebufferUpdate message that contains whatever parts of the requested region have changed since the last time the client was sent that region.
The definition of how requests and updates interact is not crisply defined. The server won't necessarily respond to every request with an update if nothing has changed. If the server has multiple requests queued from the client, it is also allowed to send a single update in response. In addition, the client really needs to be able to handle an asynchronous update message from the server (not in response to a request); otherwise the client will fall out of sync (because RFB is not a framed protocol).
Often, clients are simply implemented to send incremental update requests for the entire framebuffer viewport at a periodic interval, and to handle any server update messages as they arrive (i.e. no attempt is made to tie requests and updates together).
Here is a description of FrameBufferUpdateRequest messages.
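For concreteness, a FramebufferUpdateRequest is only ten bytes; per RFC 6143 (section 7.5.3) it packs as message type 3, an incremental flag, then x, y, width, and height as big-endian 16-bit values. A minimal Python sketch:
import struct

def framebuffer_update_request(x, y, width, height, incremental=True):
    # message-type (3), incremental flag, then four big-endian u16 fields
    return struct.pack(">BBHHHH", 3, int(incremental), x, y, width, height)

# e.g. poll an 800x600 viewport incrementally:
msg = framebuffer_update_request(0, 0, 800, 600)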
