Sending a webcam input to zoom using a recorded clip - linux

I have an idea that I have been working on, but there are some technical details that I would love to understand before I proceed.
From what I understand, Linux communicates with the underlying hardware through the /dev/. I was messing around with my video cam input to zoom and I found someone explaining that I need to create a virtual device and mount it to the output of another program called v4loop.
My questions are
1- How does Zoom detect the webcams available for input. My /dev directory has 2 "files" called video (/dev/video0 and /dev/video1), yet zoom only detects one webcam. Is the webcam communication done through this video file or not? If yes, why does simply creating one doesn't affect Zoom input choices. If not, how does zoom detect the input and read the webcam feed?
2- can I create a virtual device and write a kernel module for it that feeds the input from a local file. I have written a lot of kernel modules, and I know they have a read, write, release methods. I want to parse the video whenever a read request from zoom is issued. How should the video be encoded? Is it an mp4 or a raw format or something else? How fast should I be sending input (in terms of kilobytes). I think it is a function of my webcam recording specs. If it is 1920x1080, and each pixel is 3 bytes (RGB), and it is recording at 20 fps, I can simply calculate how many bytes are generated per second, but how does Zoom expect the input to be Fed into it. Assuming that it is sending the strean in real time, then it should be reading input every few milliseconds. How do I get access to such information?
Thank you in advance. This is a learning experiment, I am just trying to do something fun that I am motivated to do, while learning more about Linux-hardware communication. I am still a beginner, so please go easy on me.

Apparently, there are two types of /dev/video* files. One for the metadata and the other is for the actual stream from the webcam. Creating a virtual device of the same type as the stream in the /dev directory did result in Zoom recognizing it as an independent webcam, even without creating its metadata file. I did finally achieve what I wanted, but I used OBS Studio virtual camera feature that was added after update 26.0.1, and it is working perfectly so far.

Related

Easiest way to play mp3 files in python to a specific device

I'm converting an ESP32 project to a Raspberry Pi zero. One of the project behaviors is to play back sound effects based on specific events or triggers. I prefer to use MP3 format so I can store information about the contents of the file in the ID3TAGs to make the files themselves easier to manage. (there are a lot of them!)
I can find examples of using any number of libraries to play mp3s in python, and I found an example of selecting a device using 'sounddevice' but it seems to want numpy arrays to play sound data.
I'm wondering what the easiest and quickest way is to play mp3 files (or should I go to some other file format with a data stub file for each to do my file management?).
Since these behaviors are played as responses, they need to at least start playback quickly (i.e. not wait for a format conversion to take place). And in some cases, other behaviors (such as voice recognition triggers) are already going to add to potential latency on the device in it's total response time.
EDIT: additional info
quickest means processor speed (pi zeros slow down quick under heavy load)
These are real time responses so any 'lag' converting defeats the purpose of the playback.
Also, the device from seeed is configured as an alsa (asound) device

How to simultaneously play two PCM audio streams in one DMA

I am working on a project on the stm32f769i-disco board, which has to run a video game made by me. The graphical part is solved, where I have problems is when I need to play two or more wav files, I do not know which is the correct way to add them in a single stream.
What is the algorithm that I must follow for more than one stream?
PD: I know manage a single stream which is played through dma with double buffer.

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the pipeline of audio processing even through I've read much about WASAPI, XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing
raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the mp3 will be decoded on the CPU by your managed code, you can interfere / process however you like, then feed the PCM into the media stream source. Main downside - battery life: the hardware-provided codecs should be much more power-efficient.

low latency sounds on key presses

I am trying to write an application(I'm a gui first timer) for my son, he has autism. There is a video player in the top half and a text entry area in the bottom. When letters are typed sounds are produced to mimic the words in the video.
There have been other posts on this site in regard to playing sounds on key presses, using gstreamer as a system call. I have also tried libcanberra but both seem to have significant delays between sounds. I can write the app in python or C but will likely do at least some of it in C.
I also want to mention that the video portion is being played by gstreamer. I tried to create two instances of gstreamer, to avoid expensive system calls but the audio instance seemed to kill the app when called.
If anyone has any tips on creating faster responding sounds I would really appreciate it.
You can upload a raw audio sample directly to PulseAudio so there will be no decoding and (perhaps save) extra switches by using the following function from Canberra:
http://developer.gnome.org/libcanberra/unstable/libcanberra-canberra.html#ca-context-cache
The next ca_context_play() will use it.
However, the biggest problem you'll encounter with this scenario (with simultaneous video playback) is that the audio device might be configured with large latency with PulseAudio (up to 1/2s or more for normal playback). It may be reasonable to file a bug to libcanberra to support a LOW_LATENCY flag, as it currently doesn't attempt to minimize delay for sound events afaik. That would be great to have.
GStreamer pulsesink could probably get low latency too (it has some properties for that), but I am afraid it won't be as lightweight as libcanberra, and you won't be able to cache a sample for instance. Ideally, GStreamer could also learn to cache samples, or pre-fill PulseAudio...

Can v4l2 be used to read audio and video from the same device?

I have a capture card that captures SDI video with embedded audio. I have source code for a Linux driver, which I am trying to enhance to add video4linux2 support. My changes are based on the vivi example.
The problem I've come up against is that all the example I can find deal with only video or only audio. Even on the client side, everything seems to assume v4l is just video, like ffmpeg's libavdevice.
Do I need to have my driver create two separate devices, a v4l2 device and an alsa device? It seems like this makes the job of keeping audio and video in sync much more difficult.
I would prefer some way for each buffer passed between the driver and the app (through v4l2's mmap interface) contain a frame, plus some audio that matches up (with respect to time) with that frame.
Or perhaps have each buffer contain a flag indicating if it is a video frame, or a chunk of audio. Then the time stamps on the buffers could be used to sync things up.
But I don't see a way to do this with the V4L2 API spec, nor do I see any examples of v4l2-enabled apps (gstreamer, ffmpeg, transcode, etc) reading both audio and video from a single device.
Generally, the audio capture part of a device shows up as a separate device. It's usually a different physical device (posibly sharing a card), which makes sense. I'm not sure how much help that is, but it's how all of the software I'm familiar with works...
There are some spare or reserved fields in the v4l2 buffers that can be used to pass audio or other data from the driver to the calling application via pointers to mmaped buffers.
I modified the BT8x8 driver to use this approach to pass data from an A/D card synchronized to the video on Ubuntu 6.06.
It worked OK, but the effort of maintaining my modified driver caused me to abandon this approach.
If you are still interested I could dig out the details.
IF you want your driver to play with gstreamer etc. a separate audio device generally is what is expected.
Most of the cheap v4l2 capture card's audio is only an analog pass through with a volume control requiring a jumper to capture the audio via the sound card's line input.

Resources