Compare the volume of two audio files

I want to test the performance of a hand-made microphone, so I recorded the same audio source with and without the microphone and got two files. Is there a way to compare the volume of the two files so that I know the mic actually works?
Could the solution be a Python package, or something in Audacity?

You will want to compare by loudness. The simplest reasonably accurate measure for this is A-weighted RMS. RMS is root-mean-square, i.e. the square root of the mean of the squares of all the sample values. A plain RMS reading is significantly thrown off by low-frequency energy, so you need to apply a frequency weighting; the A curve is the one commonly used.
The answer here explains how to do this with Python, though it doesn't go into detail on how to apply the weighting curve: Using Python to measure audio "loudness"
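Since that answer stops short of the weighting step, here is a rough sketch of one way to do the whole thing in Python: weight the spectrum by the A curve, transform back, and take the RMS. It assumes the soundfile and numpy packages, uses placeholder file names, and reports dB relative to full scale, so compare the two numbers rather than reading them as absolute levels.
import numpy as np
import soundfile as sf

def a_weighting_db(freqs):
    # IEC 61672 A-weighting curve in dB, evaluated at frequencies in Hz
    f2 = np.asarray(freqs, dtype=float) ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * np.log10(np.maximum(ra, 1e-12)) + 2.00

def a_weighted_rms_db(path):
    samples, rate = sf.read(path)
    if samples.ndim > 1:                                 # mix multi-channel down to mono
        samples = samples.mean(axis=1)
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    gains = 10 ** (a_weighting_db(freqs) / 20.0)         # dB -> linear gain per bin
    weighted = np.fft.irfft(spectrum * gains, n=len(samples))
    return 20 * np.log10(np.sqrt(np.mean(weighted ** 2)))

print(a_weighted_rms_db("with_mic.wav"))                 # placeholder file names
print(a_weighted_rms_db("without_mic.wav"))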
There doesn't seem to be a built-in function to do this with Audacity, but viable plugins might be available, eg: http://forum.audacityteam.org/viewtopic.php?f=39&t=38134&p=99454#p99454
Another promising route might be ffmpeg, but all the options I found either normalise or tag the files, rather than simply printing a measurement. You might look into http://r128gain.sourceforge.net/ (it uses LUFS, a more sophisticated loudness measure).
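(For what it's worth, newer ffmpeg builds also include an ebur128 filter that prints an integrated LUFS figure, e.g. ffmpeg -nostats -i input.wav -filter_complex ebur128 -f null - , if you want a loudness number rather than a plain RMS reading.)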
Update: for a quick and dirty un-weighted RMS reading, looks like you can use the following command from https://trac.ffmpeg.org/wiki/AudioVolume :
ffmpeg -i input.wav -filter:a volumedetect -f null /dev/null
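volumedetect prints mean_volume and max_volume (in dB) to stderr; run it on both recordings and compare the mean_volume figures.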
This question might be best migrated to Sound Design Stack Exchange.

Related

How to change pitch and tempo together, reliably with ffmpeg

I know how to change tempo with atempo, but the audio file becomes distorted a bit, and I can't find a reliable way to change pitch. (say, increase tempo and pitch together 140%)
Sox has a speed option, but it truncates the volume AND isn't as widely available as ffmpeg. mplayer has a speed option which works perfectly, but I can't output with it without additional libraries.
As far as I can tell, ffmpeg doesn't have a way to change pitch (maybe it does now?), but is there a way to change the frequency or some other flag to emulate a pitch change? I've looked quite far and can't find a decent solution.
Edit: asetrate:48k*1.4 (assuming the original is 48k) doesn't seem to work; there's still distortion and the pitch doesn't really change much.
Edit2: https://superuser.com/a/1076762 this answer sort of works, but the quality is much lower than with sox's speed 1.4 option.
ffmpeg -i <input file name> -filter:a "asetrate=<new frequency>" -y <output file name> seems to be working for me. I checked the properties of both the input and output files with ffprobe, and there don't seem to be any differences that could affect quality. That said, a few of my runs produced files with some artifacts even though the command line was the same, so it may be an ffmpeg bug; try running it again if you aren't satisfied with the quality.
As of 2022 (though the filter was contributed in 2015), FFmpeg has a rubberband filter that works out of the box, without any of the ugly, allegedly slow, poor-quality, and unintuitive workarounds mentioned above.
To change the pitch with the rubberband filter, you specify it as a frequency ratio computed from semitones, using the formula 2^(x/12), where x is the number of semitones you want to transpose.
For example, to transpose up by one semitone you would use the following command:
ffmpeg -i my.mp3 -filter:a "rubberband=pitch=1.0594630943592953" my-up.mp3
(Note that the audio has to be re-encoded here; ffmpeg won't combine an audio filter with -acodec copy.)
To transpose down, simply use a negative number for x.
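If it helps, the ratio is easy to compute yourself (a minimal sketch in Python; pick whatever semitone count you need):
# frequency ratio for transposing by x semitones: 2 ** (x / 12)
x = 1                                       # +1 = one semitone up, -1 = one semitone down
ratio = 2 ** (x / 12)
print(f"rubberband=pitch={ratio:.16f}")     # x=1 -> 1.0594630943592953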
To alter both properties simultaneously, specify tempo and pitch values. The tempo value is specified as a multiple of the original speed.
The following command transposes down by one semitone and bumps the speed up 4x:
ffmpeg -i slow.mp3 -filter:a "rubberband=pitch=0.9438743126816935:tempo=4" fast.mp3
Quality degradation is imperceptible unless measured statistically.

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I have a video capture of HD content via HDMI, with audio from a sound board that goes through an impedance drop into a microphone input on a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively, I can capture the audio via USB from the sound board, which is probably the best plan, but it carries the same issue.
The point is that the line-in or USB capture will be much higher quality than the one on the HDMI, because the line out -> impedance change -> mic in path gives inferior quality: simply brushing the mic jack on the camera while trying to change the zoom (they're in close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera-captured sound, load each into Audacity, and fairly quickly use the time-shift tool to fit the good audio exactly against the questionable audio from the HDMI capture, then cut the good audio to the exact length of the video. Then I can use ffmpeg or other video editing software to replace the questionable audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect, but have yet to confirm, that the system timestamp of the start time may be recorded both in audio captured with something like Audacity (or with the USB capture tool from the sound board) and in the HDMI MPEG-2 video. I tried ffprobe on a couple of Audacity-captured .wav files but didn't see anything about such a timecode in the results; perhaps other audio formats or other probing tools include this information. Can anyone advise whether this is common with any particular capture tools or file formats?
If so, I think I could get the best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync the two sources reliably in one ffmpeg call. This is all theoretical for me right now; I've never tried either of these filters. I'm just trying to avoid blind alleys by asking for advice up front.
If such timestamps are not embedded, I could possibly use the file-system timestamps for the same idea, but I suspect the file open of the two capture tools may have different inherent delays. Possibly those delays will turn out to be nearly constant, so the approach could work with a built-in constant offset, but that sounds messier and less reliable than the embedded-timestamp idea. Still, I'd take it if it turns out reasonably reliable.
Are there any ffmpeg or general digital-audio experts out there who know of filters that can be applied to the actual data to look for similarities? For example: normalize the peak amplitudes (or normalize both to some RMS value), then step through a short 10-second snippet, sliding one stream 0.01 s at a time against the other, subtracting the two and looking for a minimum. It sounds like it could take a while, but if it could do this in under a minute and be reliable, I suspect it could work. I have only rudimentary knowledge of audio streams, and perhaps what I suggest is just not plausible, but since each stream starts from the same source I think there should be a chance. I'm just way out of my depth on how to go down this road, so if someone knows such magic or can throw me some filter names and example calls, I can explore whether I can make it work; a rough sketch of this idea is included after these points.
Any hardware-level suggestions for taking a line-level output down to a mic-level input without the problems I am seeing with a simple in-line impedance-drop module, so that I can simply rely on the audio from the HDMI?
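For the correlation idea above, here is a rough sketch of what I have in mind (assuming both captures are first decoded to mono WAV at the same sample rate, e.g. with ffmpeg -i capture.mp4 -ac 1 -ar 48000 hdmi.wav, and that numpy/scipy are available; file names are placeholders):
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

rate_a, good = wavfile.read("line_in.wav")     # placeholder file names
rate_b, hdmi = wavfile.read("hdmi.wav")
assert rate_a == rate_b, "resample to a common rate first"

n = rate_a * 10                                # correlate a 10-second snippet
a = good[:n].astype(float)
b = hdmi[:n].astype(float)
a -= a.mean()
b -= b.mean()

corr = fftconvolve(a, b[::-1], mode="full")    # cross-correlation via FFT
lag = int(corr.argmax()) - (len(b) - 1)        # peak position -> offset in samples
print(f"offset: {lag} samples = {lag / rate_a:.3f} s")
# (worth verifying the sign convention on a clip with a known offset)
From there, an adelay or atrim/-ss offset in the final ffmpeg call should line the two streams up.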
Thanks in advance for any pointers or suggestions!

Detect if video file contains movement

I have a bunch of video clips from a webcam (duration is 5, 10, 60 seconds), and I'm looking for a way to detect "does this video clip have movement", to decide whether the file should be saved or discarded in a future processing phase.
I've looked into motion and OpenCV, but motion seems to only want to work on the raw video stream, and OpenCV seems to be way too advanced for my use.
My ideal solution would be a linux command-line tool that I can feed video files into, and get a simple "does/doesn't contain movement" answer back, so I can discard the irrelevant files. False positives (in a reasonable quantity) are perfectly acceptable for my use.
Does such a tool exist? Or any simple examples of doing this with other tools?
You can check dvr-scan, which is a simple cross-platform command-line tool based on OpenCV.
To just list motion events in csv format (scan only):
dvr-scan -i some_video.mp4 -so
To extract just the motion into a single video:
dvr-scan -i some_video.mp4 -o some_video_motion_only.avi
For more examples and various other parameters see:
https://dvr-scan.readthedocs.io/en/latest/guide/examples/
I had the same problem and wrote the solution: https://github.com/jooray/motion-detection
Should be fairly easy to use from command-line.
If you would like to post-process already-captured video, then motion can be useful.
VLC allows you to stream or convert your media for use locally, on your private network, or on the Internet, so an already-captured video can be streamed over HTTP, RTSP, etc. and motion can handle it as a network camera.
Furthermore:
How to Stream using VLC Media Player
If OpenCV is too advanced for you, maybe you should consider something easier, namely SimpleCV (a wrapper for OpenCV): "This is computer vision made easy". There is even an example of motion detection using SimpleCV - https://github.com/sightmachine/simplecv-examples/blob/master/code/motion-detection.py . Unfortunately I can't test it (my OpenCV version isn't compatible with SimpleCV), but generally it looks fine and isn't complicated: it just subtracts the previous frame from the current one and calculates the mean of the result. If this value is bigger than some threshold (which you will most likely have to adjust), we can assume there was some motion between those two frames. Note that setting the threshold to 0 is a really bad idea, because there is always some difference between two consecutive frames (changes in lighting, noise, etc.).
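If SimpleCV turns out to be a dead end, the same frame-differencing idea is only a few lines with OpenCV itself. A rough sketch (the threshold is something you would have to tune for your camera, and the blur kernel size is just a guess to suppress sensor noise):
import sys
import cv2

THRESHOLD = 5.0                      # mean absolute difference per pixel; tune this

cap = cv2.VideoCapture(sys.argv[1])  # video file path as first argument
prev = None
has_motion = False
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)      # smooth away sensor noise
    if prev is not None and cv2.absdiff(gray, prev).mean() > THRESHOLD:
        has_motion = True
        break
    prev = gray
cap.release()
print("motion" if has_motion else "no motion")
sys.exit(0 if has_motion else 1)                    # exit code usable in shell scripts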

Motion detection in compressed domain (JPEG/Mpeg4/H264)

Hi everyone!
I process video from IP cameras and have written a motion detection algorithm based on analysing the decompressed video, but I really need something faster. I've found several papers about compressed-domain analysis but have failed to find any implementations.
Can anyone recommend me some code?
Materials found so far:
http://www.ist-live.org/intranet/school-of-informatics-university-of-bradford001-7/41410206.pdf/view
http://doc.rero.ch/lm.php?url=1000,43,4,20061128120121-NA/Bracamonte_Javier_-_A_Low_Complexity_Change_Detection_Algorithm_20061128.pdf
I had to detect motion in H.264-video, and for me the frame size was a really good indicator.
I used ffprobe (from the ffmpeg project) to export frame sizes like this:
./ffprobe -show_frames -pretty video.mp4 | grep 'size' | grep -o '[0-9]*' > sizes.txt
In my case no movement meant larger I-frames (for me, every 30th frame was an I-frame) and smaller sizes for some of the frames in between.
I'm new to video encoding, so I guess these things might depend heavily on the encoding and the type of video signal, but it's worth a look since it's very fast to try out: export the frame sizes and have a look in e.g. Matlab.
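For what it's worth, the same quick look works in Python if Matlab isn't handy (a small sketch, assuming the sizes.txt produced by the ffprobe line above and numpy/matplotlib installed):
import numpy as np
import matplotlib.pyplot as plt

sizes = np.loadtxt("sizes.txt")      # one packet size per line, from the ffprobe | grep pipeline
plt.plot(sizes)
plt.xlabel("frame")
plt.ylabel("packet size")
plt.show()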
Edit:
In the end I re-encoded the video so that every second frame was an I-frame, as this gave better time resolution. One idea I did not test was to reverse the video and do the same thing, this should give more accurate estimations of when the motion started/ended, akin to removing the phase delay by forward-backward filtering.
https://github.com/Breakthrough/DVR-Scan
DVR-Scan is a cross-platform command-line (CLI) application that automatically detects motion events in video files (e.g. security camera footage). In addition to locating both the time and duration of each motion event, DVR-Scan will save the footage of each motion event to a new, separate video clip. Not only is DVR-Scan free and open-source software (FOSS), written in Python, and based on Numpy and OpenCV, it was built to be extendable and hackable.
I can confirm that it works perfectly with MPEG4 (H264) AVI files.
Scanning speed is about 30 fps on my laptop with an i5-4300U CPU for 1200x900 video.
You can check the sources for the algorithm used.
And here are some explanatory tutorial links from the same author:
https://github.com/Breakthrough/python-scene-detection-tutorial
See also Python scene change detection.

Frequency differences from MP3 to mic

I'm trying to compare sound clips based on microphone recordings. Simply put, I play an MP3 file while recording from the speakers, then attempt to match the two files. I have algorithms in place that work, but I'm seeing a slight difference I'd like to sort out to get better accuracy.
The microphone seems to favor some frequencies (adding amplitude) and to be slightly off on others (the peaks are wider on the mic recording).
I'm wondering what the cause of this difference is, and how to compensate for it.
Background:
Because of speed issues in how I'm doing the comparison, I select certain frequencies with certain characteristics. The problem is that a high percentage of these (depending on how many I choose) don't match between the MP3 and the mic recording.
This is the frequency response characteristic of the microphone. Unfortunately, you can't easily get around it without buying a different, presumably more expensive, microphone.
If you can measure the actual microphone frequency response by some method (which generally requires a reference ('etalon') acoustic system and an anechoic chamber), you can compensate for it by applying an equaliser tuned to exactly the inverse characteristic, as discussed here. But in practice, as Kilian says, it's much simpler to get a more precise microphone. I'd recommend a condenser or an electrostatic one.
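As a rough illustration of the inverse-equaliser idea (assuming you already have the reference signal and the mic recording time-aligned as mono arrays at the same sample rate; variable names are placeholders, and note this estimates the whole speaker + room + mic chain, not the microphone alone):
import numpy as np
from scipy.signal import welch

def estimated_response(reference, recorded, rate):
    # ratio of averaged power spectra gives |H(f)|^2 of the playback/recording chain
    f, p_ref = welch(reference, fs=rate, nperseg=4096)
    _, p_rec = welch(recorded, fs=rate, nperseg=4096)
    return f, np.sqrt(p_rec / np.maximum(p_ref, 1e-12))

# f, h = estimated_response(reference, recorded, rate)
# inverse_gain = np.clip(1.0 / np.maximum(h, 1e-6), 0.0, 10.0)   # cap the boost to limit noise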
