How to only list audio devices using FFmpeg? - audio

I am doing an audio recording project using FFmpeg 5.0.1. To list all the available audio input devices for the user to select and use, I used the function avdevice_list_input_sources(). My code goes like this:
avdevice_list_input_sources(av_find_input_format("dshow"), NULL, NULL, &device_list);
Where device_list is declared as
AVDeviceInfoList* device_list;
But this way, all the input devices supported, including audio and video, are listed. This could compromise the program's stability, for the user could select a video device listed and cause the program to crash. So I need to find a way to only list audio input devices or delete the video devices from device_list. But so far, I have not found a feasible away.
Does anyone know how to do so?

Related

Sending a webcam input to zoom using a recorded clip

I have an idea that I have been working on, but there are some technical details that I would love to understand before I proceed.
From what I understand, Linux communicates with the underlying hardware through the /dev/. I was messing around with my video cam input to zoom and I found someone explaining that I need to create a virtual device and mount it to the output of another program called v4loop.
My questions are
1- How does Zoom detect the webcams available for input. My /dev directory has 2 "files" called video (/dev/video0 and /dev/video1), yet zoom only detects one webcam. Is the webcam communication done through this video file or not? If yes, why does simply creating one doesn't affect Zoom input choices. If not, how does zoom detect the input and read the webcam feed?
2- can I create a virtual device and write a kernel module for it that feeds the input from a local file. I have written a lot of kernel modules, and I know they have a read, write, release methods. I want to parse the video whenever a read request from zoom is issued. How should the video be encoded? Is it an mp4 or a raw format or something else? How fast should I be sending input (in terms of kilobytes). I think it is a function of my webcam recording specs. If it is 1920x1080, and each pixel is 3 bytes (RGB), and it is recording at 20 fps, I can simply calculate how many bytes are generated per second, but how does Zoom expect the input to be Fed into it. Assuming that it is sending the strean in real time, then it should be reading input every few milliseconds. How do I get access to such information?
Thank you in advance. This is a learning experiment, I am just trying to do something fun that I am motivated to do, while learning more about Linux-hardware communication. I am still a beginner, so please go easy on me.
Apparently, there are two types of /dev/video* files. One for the metadata and the other is for the actual stream from the webcam. Creating a virtual device of the same type as the stream in the /dev directory did result in Zoom recognizing it as an independent webcam, even without creating its metadata file. I did finally achieve what I wanted, but I used OBS Studio virtual camera feature that was added after update 26.0.1, and it is working perfectly so far.

How to read video file using v4l2

I want to read a video file using v4l2, say an AVI file. And read it frame by frame.
As far as I can tell I need to use the read() function. But how isn't very clear to me. There are also hardly any examples available. So maybe a simple example on how to do this would help.
This is not what the Video4Linux2 (V4L2) API is for. It is not designed for reading multimedia files from disk, decoding them and playing them. Rather, it is designed to interface to assorted multimedia input devices (like webcams, microphones, TV tuners, and video capture devices), capture A/V data, and play it.
Take it from the V4L2 API introduction:
Video For Linux Two is [...] a kernel interface for analog radio and
video capture and output drivers.
For reading an AVI file and decoding/playing it (programmatically) on Linux, look into FFmpeg or GStreamer.

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the pipeline of audio processing even through I've read much about WASAPI, XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing
raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS, but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. This way the mp3 will be decoded on the CPU by your managed code, you can interfere / process however you like, then feed the PCM into the media stream source. Main downside - battery life: the hardware-provided codecs should be much more power-efficient.

I hear clicking in audio with a DirectShow graph created with Graph Edit yet player software on my PC plays audio smoothly

I have a DirectShow application that I built with Delphi 6 using the DSPACK component library. For two days I have been trying to solve a problem with audio playback. When I run the filter graph I create I hear repetitive clicks in the playback. What was really confusing was that the audio file I created simultaneously with my filter graph had clean continuous audio, not gaps. So I knew that the audio buffers were being delivered properly but something I was doing was "jamming up" the "live" playback. Or so I thought. I spent two days diagnosing the problem looking for semaphores being held too long (locks) or perhaps timestamp problems, which I documented in this other Stack Overflow post:
Getting stuttering during rendering of my DirectShow filter despite output file being "smooth"
A few minutes ago I decided to try a test with the Graph Edit utility. I created a dead simple graph consisting of just the capture device I was using (VOIP phone microphone), and the renderer device I was using (HD ATI Rear Audio output to headphones). Two filters total. Much to my surprise I heard the same clicking. So here was a case that did not involve my code at all and I heard clicking.
Then I changed the audio renderer in the Graph Edit created filter graph to the VOIP phone ear piece. The clicking went away.
Now I know there's a way to get smooth audio on ut the ATI Rear Audio device since its the preferred audio output device and everything from videos I play on my PC to wave files I play on it sound flawless. So are the other software programs doing something different than just connecting filters? I am wondering if perhaps the default mode for the HD ATI Rear Audio is without double-buffering and perhaps those other software programs know how to enable that feature? Or are they doing something else, perhaps using another DirectShow or DirectSound filter or technique for example, to make the audio play smoothly on the HD ATI Rear Audio renderer?
What you possibly having (depends on actual stuttering though) is that when you are using capture and playback devices backed by different hardware, their sampling rates slightly differ. For example, you capture 22050 Hz at actual rate of (22050 - 2%) Hz and you play it back with hardware consuming bytes at (22050 + 2%) Hz.
Now obviously this won't work out smooth: eventually playback will experience data underlow... If you save into file and play back from file, it will go smooth as the file will be able to supply data at the rate of playback device. If capture and playback devices are the same hardware, they are likely to use shared "hardware" clock and rates match.
The problem is known as "rate matcing" and is discussed on MSDN in Live Sources section.

Can v4l2 be used to read audio and video from the same device?

I have a capture card that captures SDI video with embedded audio. I have source code for a Linux driver, which I am trying to enhance to add video4linux2 support. My changes are based on the vivi example.
The problem I've come up against is that all the example I can find deal with only video or only audio. Even on the client side, everything seems to assume v4l is just video, like ffmpeg's libavdevice.
Do I need to have my driver create two separate devices, a v4l2 device and an alsa device? It seems like this makes the job of keeping audio and video in sync much more difficult.
I would prefer some way for each buffer passed between the driver and the app (through v4l2's mmap interface) contain a frame, plus some audio that matches up (with respect to time) with that frame.
Or perhaps have each buffer contain a flag indicating if it is a video frame, or a chunk of audio. Then the time stamps on the buffers could be used to sync things up.
But I don't see a way to do this with the V4L2 API spec, nor do I see any examples of v4l2-enabled apps (gstreamer, ffmpeg, transcode, etc) reading both audio and video from a single device.
Generally, the audio capture part of a device shows up as a separate device. It's usually a different physical device (posibly sharing a card), which makes sense. I'm not sure how much help that is, but it's how all of the software I'm familiar with works...
There are some spare or reserved fields in the v4l2 buffers that can be used to pass audio or other data from the driver to the calling application via pointers to mmaped buffers.
I modified the BT8x8 driver to use this approach to pass data from an A/D card synchronized to the video on Ubuntu 6.06.
It worked OK, but the effort of maintaining my modified driver caused me to abandon this approach.
If you are still interested I could dig out the details.
IF you want your driver to play with gstreamer etc. a separate audio device generally is what is expected.
Most of the cheap v4l2 capture card's audio is only an analog pass through with a volume control requiring a jumper to capture the audio via the sound card's line input.

Resources