I want to programmatically detect whether a (local) computer (not a mobile device) is playing any sound or music. Preferably via some high-level API from Java, Python, or a similar language.
I have never done it, but as a first approach I would open a fake recording stream (opening it as input, not output) on the master Windows playback/line-out device, instead of the normal approach of opening the mic or line-in device for recording.
I would then monitor the captured frames. If, over a certain time window, there are values above some small threshold, I would infer there is sound.
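Something like this minimal sketch in Python with the sounddevice package illustrates the idea. It assumes a loopback-capable capture device (such as Windows "Stereo Mix" or a WASAPI loopback endpoint) is available; the device name and threshold are placeholders you would adjust:

```python
# Minimal sketch: record short blocks from a loopback device and check their RMS level.
# "Stereo Mix" is a hypothetical device name; pick whatever loopback / "what you hear"
# device your system exposes (see sd.query_devices()).
import numpy as np
import sounddevice as sd

DEVICE_NAME = "Stereo Mix"   # placeholder loopback capture device
SAMPLE_RATE = 44100
BLOCK_SECONDS = 0.5
THRESHOLD = 0.01             # empirical RMS level treated as "silence"

def is_sound_playing(observe_seconds=3.0):
    """Return True if any half-second block within observe_seconds exceeds the threshold."""
    blocks = int(observe_seconds / BLOCK_SECONDS)
    with sd.InputStream(device=DEVICE_NAME, channels=2,
                        samplerate=SAMPLE_RATE) as stream:
        for _ in range(blocks):
            frames, _overflowed = stream.read(int(SAMPLE_RATE * BLOCK_SECONDS))
            rms = np.sqrt(np.mean(np.square(frames)))
            if rms > THRESHOLD:
                return True
    return False

if __name__ == "__main__":
    print("Sound detected" if is_sound_playing() else "Silence")
```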
I have an idea that I have been working on, but there are some technical details that I would love to understand before I proceed.
From what I understand, Linux communicates with the underlying hardware through device files in /dev. I was messing around with my webcam input to Zoom, and I found someone explaining that I need to create a virtual device and connect it to the output of another program called v4l2loopback.
My questions are:
1- How does Zoom detect the webcams available for input? My /dev directory has two "files" called video (/dev/video0 and /dev/video1), yet Zoom only detects one webcam. Is the webcam communication done through this video file or not? If yes, why does simply creating one not affect Zoom's input choices? If not, how does Zoom detect the input and read the webcam feed?
2- Can I create a virtual device and write a kernel module for it that feeds the input from a local file? I have written a lot of kernel modules, and I know they have read, write, and release methods. I want to parse the video whenever Zoom issues a read request. How should the video be encoded? Is it MP4, a raw format, or something else? And how fast should I be sending input (in terms of kilobytes per second)? I think it is a function of my webcam's recording specs: if it is 1920x1080, each pixel is 3 bytes (RGB), and it is recording at 20 fps, I can simply calculate how many bytes are generated per second, but how does Zoom expect the input to be fed to it? Assuming it is sending the stream in real time, it should be reading input every few milliseconds. How do I get access to such information?
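As a quick sanity check on the numbers in question 2, here is the raw-bandwidth arithmetic in a few lines of Python (assuming uncompressed 24-bit RGB; real V4L2 webcam streams are more commonly packed YUV such as YUYV, or compressed MJPEG):

```python
# Back-of-the-envelope bandwidth for uncompressed 1920x1080 RGB at 20 fps.
width, height, bytes_per_pixel, fps = 1920, 1080, 3, 20

bytes_per_frame = width * height * bytes_per_pixel    # 6,220,800 bytes per frame
bytes_per_second = bytes_per_frame * fps              # ~124 MB/s
print(f"{bytes_per_frame / 1e6:.1f} MB per frame, "
      f"{bytes_per_second / 1e6:.1f} MB/s, "
      f"one frame every {1000 / fps:.0f} ms")
# -> 6.2 MB per frame, 124.4 MB/s, one frame every 50 ms
```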
Thank you in advance. This is a learning experiment, I am just trying to do something fun that I am motivated to do, while learning more about Linux-hardware communication. I am still a beginner, so please go easy on me.
Apparently, there are two types of /dev/video* files: one is for metadata and the other is for the actual stream from the webcam. Creating a virtual device of the same type as the stream in the /dev directory did result in Zoom recognizing it as an independent webcam, even without creating its metadata file. I did finally achieve what I wanted, but I used OBS Studio's virtual camera feature, which was added in update 26.0.1, and it is working perfectly so far.
I have audio coming from a radio transceiver on my sound card's microphone input. What I want to make is a simple software-based parrot repeater using Linux CLI tools such as the sox suite and arecord. For it to work, I think a flow similar to the following must take place:
1. The audio that comes in on the microphone subdevice gets recorded into a buffer (file- or RAM-based).
2. When the buffer stops filling (audio stopped), start playing its content on the audio output device (which is connected to the radio's microphone input).
3. When playback is over, empty the buffer and wait for step 1 to occur again.
I'm looking for an elegant way to implement the logic behind step 2. Is there a CLI tool I can use for that, so I can pipe the microphone audio taken with arecord into it and play the output of the buffer with sox?
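For reference, here is a rough sketch of the flow I have in mind, driving the sox tools from Python via subprocess; the silence-effect thresholds, the 2-second hang time, and the /tmp path are placeholders to tune:

```python
# Rough sketch: use sox's "silence" effect to approximate "the buffer stopped
# filling", then play the captured buffer back out. Assumes sox/rec/play are
# installed and the default capture/playback devices are wired to the radio.
import subprocess

BUFFER = "/tmp/parrot.wav"   # placeholder buffer file

while True:
    # Steps 1+2: rec starts writing once the level rises above 1% and stops
    # after 2.0 seconds below 1% (i.e. the incoming audio has ended).
    subprocess.run(["rec", BUFFER,
                    "silence", "1", "0.1", "1%",    # start on sound
                    "1", "2.0", "1%"],              # stop after 2 s of silence
                   check=True)
    # Step 3: play the buffer back out to the radio's microphone input, then loop.
    subprocess.run(["play", BUFFER], check=True)
```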
Try looking at this. I did this on a Raspberry Pi a little while ago, only I made a voice changer.
https://www.instructables.com/Halloween-Voice-Changer-With-Raspberry-Pi/
Basically, play "|rec --buffer 2048 -d" takes the recorded sound and puts it in a buffer that is passed to play in 4096-bit (byte?) chunks. The -d option tells rec to use the default audio device, and the command will keep running until killed. If you want to play with the options, there is some helpful info in the links.
Good luck with your project!
I have a program written in C++ that uses RtAudio (DirectSound) to capture and play back audio at a 48 kHz sample rate.
The input capture uses a callback option. The callback writes data to a ringbuffer.
The output is a blocking write function in a separate thread that reads from the ringbuffer.
If the input and output devices are the same, the audio loops through perfectly.
Now I want to get audio from device 1 and play it back on device 2. Each device has its own sample clock set to 48 kHz, but they are not in sync. After a couple of seconds the input and output are out of sync.
Is it possible to sync two independent audio devices?
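For reference, the same structure sketched in Python with the sounddevice package rather than my actual C++/RtAudio code (device indices 1 and 2 are placeholders):

```python
# Capture callback -> queue (ring buffer stand-in) -> blocking write in a playback thread.
import queue
import threading
import sounddevice as sd

RATE, CHANNELS, BLOCK = 48000, 2, 1024
ring = queue.Queue(maxsize=32)            # stand-in for the ring buffer

def capture_callback(indata, frames, time, status):
    try:
        ring.put_nowait(indata.copy())    # the callback only writes, never blocks
    except queue.Full:
        pass                              # device 1 is producing faster than device 2 consumes

def playback_thread(out_device):
    with sd.OutputStream(device=out_device, samplerate=RATE,
                         channels=CHANNELS, blocksize=BLOCK) as out:
        while True:
            out.write(ring.get())         # blocking write, paced by device 2's clock

threading.Thread(target=playback_thread, args=(2,), daemon=True).start()
with sd.InputStream(device=1, samplerate=RATE, channels=CHANNELS,
                    blocksize=BLOCK, callback=capture_callback):
    sd.sleep(10_000)  # run 10 s; clock drift shows up as the queue slowly filling or draining
```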
There are two challenges you face:
1. getting the two devices to start at the same time.
2. getting the two devices to stay in sync.
Both of these tasks are difficult. In the pro audio world, #2 is accomplished with special hardware that syncs the word clocks of multiple devices. It can also be done with a high-quality video signal. I believe it can also be done with FireWire devices, but I'm not sure how that works. In practice, I have used devices with no sync ("wild") and gotten very reasonable sync for up to an hour or two. Depending on what you are trying to do, the sync should not drift more than a few milliseconds over the course of a few minutes. If it does, you can consider your hardware broken (of course, cheap hardware is often broken).
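To put rough numbers on that drift, here is a small Python calculation, assuming both devices nominally run at 48 kHz and their clocks differ by some parts-per-million error (the 100 ppm figure is only an illustrative assumption):

```python
nominal_rate = 48_000        # Hz
mismatch_ppm = 100           # assumed clock error between the two devices

samples_per_second_drift = nominal_rate * mismatch_ppm / 1_000_000   # 4.8 samples/s
for minutes in (1, 5, 60):
    drift_ms = samples_per_second_drift * 60 * minutes / nominal_rate * 1000
    print(f"after {minutes:3d} min: {drift_ms:7.1f} ms of drift")
# 100 ppm -> about 6 ms per minute; without resampling or dropping/duplicating
# samples, the offset keeps growing for as long as the streams run.
```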
As for #1, I'm not sure this is possible in any reliable sense with DirectSound. To the extent that it's possible with any audio API, it is difficult at best: both cards have streams that require some time to set up, open, and start playing. In general, the solution is to use an API where this time is super low (ASIO, for example). This works reasonably well for applications like video, but I don't know if it really solves the problem in general.
If you really need to solve this problem, you could open both cards, start playing silence, and use the timing information generated by the cards to establish the delay between putting data into a card and its eventual playback (this will be different for each card, and probably different each time you run), then use that data to calculate when to start actual playback. I don't know if RtAudio supplies the necessary timing information, but PortAudio does. This document may help.
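If it helps, here is a very rough sketch of that silence-and-measure idea using the timing info PortAudio hands to each callback, shown via python-sounddevice rather than RtAudio; "Card A" and "Card B" are placeholder device names:

```python
import sounddevice as sd

def make_callback(label, delays):
    def callback(outdata, frames, time, status):
        outdata.fill(0)  # keep playing silence
        # Time between "now" and when this buffer reaches the DAC approximates
        # this card's current output delay.
        delays[label] = time.outputBufferDacTime - time.currentTime
    return callback

delays = {}
streams = [
    sd.OutputStream(device="Card A", samplerate=48000, channels=2,
                    callback=make_callback("A", delays)),
    sd.OutputStream(device="Card B", samplerate=48000, channels=2,
                    callback=make_callback("B", delays)),
]
for s in streams:
    s.start()
sd.sleep(500)       # let a few callbacks run while both cards play silence
print(delays)       # offset the start of real audio on each card by the measured difference
for s in streams:
    s.stop()
    s.close()
```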
I have a DirectShow application that I built with Delphi 6 using the DSPACK component library. For two days I have been trying to solve a problem with audio playback. When I run the filter graph I create, I hear repetitive clicks in the playback. What was really confusing was that the audio file I created simultaneously with my filter graph had clean, continuous audio with no gaps. So I knew that the audio buffers were being delivered properly, but something I was doing was "jamming up" the "live" playback. Or so I thought. I spent two days diagnosing the problem, looking for semaphores being held too long (locks) or perhaps timestamp problems, which I documented in this other Stack Overflow post:
Getting stuttering during rendering of my DirectShow filter despite output file being "smooth"
A few minutes ago I decided to try a test with the Graph Edit utility. I created a dead simple graph consisting of just the capture device I was using (VOIP phone microphone), and the renderer device I was using (HD ATI Rear Audio output to headphones). Two filters total. Much to my surprise I heard the same clicking. So here was a case that did not involve my code at all and I heard clicking.
Then I changed the audio renderer in the Graph Edit created filter graph to the VOIP phone ear piece. The clicking went away.
Now I know there's a way to get smooth audio out of the ATI Rear Audio device, since it's the preferred audio output device and everything from videos I play on my PC to wave files I play on it sounds flawless. So are the other software programs doing something different than just connecting filters? I am wondering if perhaps the default mode for the HD ATI Rear Audio is without double-buffering, and perhaps those other software programs know how to enable that feature? Or are they doing something else, perhaps using another DirectShow or DirectSound filter or technique, to make the audio play smoothly on the HD ATI Rear Audio renderer?
What you are possibly experiencing (it depends on the actual stuttering) is that when you use capture and playback devices backed by different hardware, their sampling rates differ slightly. For example, you capture 22050 Hz audio at an actual rate of (22050 - 2%) Hz and play it back on hardware that consumes samples at (22050 + 2%) Hz.
Obviously this won't work out smoothly: eventually the playback will experience a data underflow. If you save to a file and play back from the file, it plays smoothly because the file can supply data at whatever rate the playback device needs. If the capture and playback devices are the same hardware, they likely share a hardware clock and the rates match.
The problem is known as "rate matching" and is discussed on MSDN in the Live Sources section.
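A quick illustration of the 2% example above, showing how fast a playback buffer drains when the renderer consumes samples faster than the capture side supplies them (the 100 ms buffer size is just an assumption):

```python
nominal = 22050
capture_rate = nominal * 0.98     # capture actually delivers 2% slow
playback_rate = nominal * 1.02    # playback actually consumes 2% fast

deficit_per_second = playback_rate - capture_rate   # samples lost from the buffer each second
buffer_ms = 100
buffer_samples = nominal * buffer_ms / 1000
seconds_to_underflow = buffer_samples / deficit_per_second
print(f"deficit: {deficit_per_second:.0f} samples/s; "
      f"a {buffer_ms} ms buffer underflows after ~{seconds_to_underflow:.1f} s")
# ~882 samples/s deficit -> a 100 ms (2205-sample) buffer empties in about 2.5 s,
# producing a periodic gap/click each time playback catches up.
```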
I have a capture card that captures SDI video with embedded audio. I have source code for a Linux driver, which I am trying to enhance to add video4linux2 support. My changes are based on the vivi example.
The problem I've come up against is that all the examples I can find deal with only video or only audio. Even on the client side, everything seems to assume V4L is just video, like ffmpeg's libavdevice.
Do I need to have my driver create two separate devices, a v4l2 device and an alsa device? It seems like this makes the job of keeping audio and video in sync much more difficult.
I would prefer some way for each buffer passed between the driver and the app (through v4l2's mmap interface) to contain a frame, plus some audio that matches up (with respect to time) with that frame.
Or perhaps have each buffer contain a flag indicating whether it is a video frame or a chunk of audio. Then the timestamps on the buffers could be used to sync things up.
But I don't see a way to do this in the V4L2 API spec, nor do I see any examples of v4l2-enabled apps (gstreamer, ffmpeg, transcode, etc.) reading both audio and video from a single device.
Generally, the audio capture part of a device shows up as a separate device. It's usually a different physical device (possibly sharing a card), which makes sense. I'm not sure how much help that is, but it's how all of the software I'm familiar with works...
There are some spare or reserved fields in the v4l2 buffers that can be used to pass audio or other data from the driver to the calling application via pointers to mmapped buffers.
I modified the BT8x8 driver to use this approach to pass data from an A/D card synchronized to the video on Ubuntu 6.06.
It worked OK, but the effort of maintaining my modified driver caused me to abandon this approach.
If you are still interested I could dig out the details.
If you want your driver to play nicely with gstreamer etc., a separate audio device is generally what is expected.
On most cheap V4L2 capture cards, the audio is only an analog pass-through with a volume control, requiring a jumper cable to capture the audio via the sound card's line input.