How do I De-Ess a sound file with SoX? - audio

I am using SoX to create slow but pitch corrected audio files. The resulting files sound pretty good, but often have a very hard "S" sound that I would like to filter out. Many desktop programs include a "De-Essing" filter that works well, but I would like to have a filter that works on the server side.
What SoX filter and parameters should I use to De-Ess an audio file?
Edit: I should add that this needs to work on Linux.

There is a LADSPA DeEsser plugin that can be used from SoX. You need to have the TAP plugins installed and properly configured on your system. On Arch Linux this can be done with
pacman -S tap-plugins
You can specify the threshold and frequency as the first and second arguments. I successfully used a variant of the following command:
# -30: threshold (dB)
# 6200: hiss frequency (Hz)
sox from.wav to.wav ladspa tap_deesser tap_deesser -30 6200
The filter has a handful of other options that I did not look into. More details can be found here.

While far from perfect, you may be able to get sufficient results with a suitable low-pass filter. That should not affect other parts of a speech signal too much.

You could use a de-esser VST such as spitfish and a command-line VST host such as MissWatson. SoX has very limited plugin support, so if you need something more specific, you're better off going the VST route.

Related

Getting multiple audio clips to same level

I am working on a project that involves using a lot of found audio clips (some new, some very old archival and poor quality etc).
I am trying to figure out a way to make all the audio clips a similar quality (if this is possible) and play at a similar volume.
I have access to both Audacity and Ableton. Any suggestions would be great.
What you are asking for is commonly called normalization. There are several tools that can do it, including command-line tools and also Audacity.
You'll find the tool in Audacity under Effect > Normalize...
You can select multiple audio tracks.
You could also consider using a limiter and/or a compressor on your track. Have a look in the Live effect reference for more info on these: https://www.ableton.com/en/manual/live-audio-effect-reference/
The results will not be as good as applying normalization by hand, but it will be a lot quicker.

Compare the volume of two audio files

I want to test the performance of a hand-made microphone, so I recorded the same audio source with and without the microphone and got two files. Is there a way to compare the volume of the two files so that I know the mic actually works?
Could the possible solution be a package in Python or Audacity?
You will want to compare by loudness. The simplest reasonably accurate measure for this is A-weighted RMS. RMS is root-mean-square, i.e. the square root of the mean of the squares of all the sample values. Plain RMS is significantly thrown off by low-frequency energy, so you need to apply a frequency weighting. The A curve is commonly used.
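As a rough illustration of the RMS part only, here is a minimal sketch in plain Python. It is un-weighted (no A curve is applied), and it assumes the samples have already been decoded into floats in the range -1.0 to 1.0; the function names `rms` and `rms_dbfs` are made up for this example.

```python
import math

def rms(samples):
    """Square root of the mean of the squared sample values."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_dbfs(samples, full_scale=1.0):
    """Un-weighted RMS level in dB relative to full scale (assumed 1.0)."""
    level = rms(samples)
    if level == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(level / full_scale)

# Two synthetic "recordings": the same 440 Hz tone, one at half the amplitude.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
quiet = [0.5 * s for s in tone]

# Halving the amplitude lowers the level by about 6 dB.
print(round(rms_dbfs(tone) - rms_dbfs(quiet), 2))  # → 6.02
```

Comparing the two dB figures, rather than the raw RMS values, gives a difference that is independent of the absolute recording level.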
The answer here explains how to do this with Python, though it doesn't go into detail on how to apply the weighting curve: Using Python to measure audio "loudness"
There doesn't seem to be a built-in function to do this in Audacity, but viable plugins might be available, e.g. http://forum.audacityteam.org/viewtopic.php?f=39&t=38134&p=99454#p99454
Another promising route might be ffmpeg, but all the options I found either normalise or tag the files, rather than simply printing a measurement. You might look into http://r128gain.sourceforge.net/ (it uses LUFS, a more sophisticated loudness measure).
Update: for a quick and dirty un-weighted RMS reading, looks like you can use the following command from https://trac.ffmpeg.org/wiki/AudioVolume :
ffmpeg -i input.wav -filter:a volumedetect -f null /dev/null
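To compare two files without reading the log by eye, the `mean_volume` and `max_volume` lines that volumedetect writes to stderr can be pulled out with a small script. This is only a sketch: the helper names `parse_volumedetect` and `measure` are invented here, it assumes `ffmpeg` is on the PATH, and it uses `-f null -` in place of `/dev/null`.

```python
import re
import subprocess

# Matches lines such as "[Parsed_volumedetect_0 @ 0x...] mean_volume: -27.0 dB"
_VOLUME_RE = re.compile(
    r"(mean_volume|max_volume):\s*([-+]?(?:\d+(?:\.\d+)?|inf))\s*dB"
)

def parse_volumedetect(log_text):
    """Extract mean_volume and max_volume (in dB) from ffmpeg's log output."""
    return {name: float(value) for name, value in _VOLUME_RE.findall(log_text)}

def measure(path):
    """Run ffmpeg's volumedetect filter on one file and return its readings."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
         "-filter:a", "volumedetect", "-f", "null", "-"],
        stderr=subprocess.PIPE, text=True, check=True,
    )
    return parse_volumedetect(result.stderr)

# Usage (hypothetical file names):
#   with_mic = measure("with_mic.wav")
#   without_mic = measure("without_mic.wav")
#   print(with_mic["mean_volume"] - without_mic["mean_volume"], "dB difference")
```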
This question might be best migrated to Sound Design Stack Exchange.

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I have a video capture of HD content via HDMI with audio from a sound board that goes through an impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively I can capture the audio via USB from the soundboard, which is probably the best plan, but carries with it the same issue.
The point is that the line in or USB capture will be much higher quality than the one on HDMI, because the line out -> impedance change -> mic in path generates inferior quality: simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera captured sound and load each into
Audacity and pretty quickly use the Time Shift tool to perfectly fit
the good audio to the questionable audio from the HDMI capture and
cut the good audio to the exact size of the video. Then I can use
ffmpeg or other video editing software to replace the questionable
audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect but have yet to confirm that the system timestamp of the start time may be recorded both in audio captured with something like Audacity or the USB capture tool from the sound board, and in the HDMI MPEG-2 video. I tried ffprobe on a couple of Audacity-captured .wav files but didn't see anything in the results about such a time code, but perhaps other audio formats or other probing tools may include this info. Can anyone advise if this is common with any particular capture tools or file formats?
If so, I think I could get the best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably, directly from the two sources, in one ffmpeg call. This is all theoretical for me right now. I've never tried either of these filters; I'm just trying to avoid blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherent delays. Possibly these delays will be found to be nearly constant and the approach can work with a built-in constant anticipation delay, but that sounds messy and less reliable than idea 1. Still, I'd take it if it turns out reasonably reliable.
Are there any ffmpeg or general digital audio experts out there that know of particular filters that can be employed on the actual data to look for similarities like normalizing the peak amplitudes or normalizing the amplification of the two to some RMS value and then stepping through a short 10 second snippet of audio, moving one time stream .01s left against the other repeatedly and subtracting the two and looking for a minimum? Sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. But I have only rudimentary knowledge of audio streams and perhaps what I suggest is just not plausible-- but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore if I can make it work.
Are there any hardware-level suggestions for taking a line level output down to a mic level input without the problems I am seeing using a simple in-line impedance drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestions!
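The sliding comparison described above (shift one stream against the other, subtract, and look for a minimum) can be sketched in plain Python as follows. This is an illustration of the idea rather than a tested workflow: the function name `best_offset` is made up, both inputs are assumed to be mono sample lists at the same sample rate and roughly the same level, and a real implementation would more likely use an FFT-based cross-correlation for speed.

```python
import random

def best_offset(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means `other` starts late by that many samples,
    i.e. other[n + lag] lines up with reference[n].
    """
    best_lag, best_err = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        err, count = 0.0, 0
        for n, ref in enumerate(reference):
            m = n + lag
            if 0 <= m < len(other):
                diff = ref - other[m]
                err += diff * diff
                count += 1
        # Mean squared difference over the overlapping region only.
        if count and err / count < best_err:
            best_lag, best_err = lag, err / count
    return best_lag

# Synthetic check: the "camera" copy is the good audio delayed by 37 samples.
rng = random.Random(0)
good = [rng.uniform(-1.0, 1.0) for _ in range(500)]
camera = [0.0] * 37 + good

lag = best_offset(good, camera, max_lag=100)
print(lag)  # → 37
```

Dividing the lag by the sample rate gives the offset in seconds, which is the figure an adelay or atrim step would need.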

Detect if video file contains movement

I have a bunch of video clips from a webcam (duration is 5, 10, 60 seconds), and I'm looking for a way to detect "does this video clip have movement", to decide whether the file should be saved or discarded in a future processing phase.
I've looked into motion and OpenCV, but motion seems to only want to work on the raw video stream, and OpenCV seems to be way too advanced for my use.
My ideal solution would be a Linux command-line tool that I can feed video files into, and get a simple "does/doesn't contain movement" answer back, so I can discard the irrelevant files. False positives (in a reasonable quantity) are perfectly acceptable for my use.
Does such a tool exist? Or any simple examples of doing this with other tools?
You can check dvr-scan, which is a simple cross-platform command-line tool based on OpenCV.
To just list motion events in CSV format (scan only):
dvr-scan -i some_video.mp4 -so
To extract the motion into a single video:
dvr-scan -i some_video.mp4 -o some_video_motion_only.avi
For more examples and various other parameters see:
https://dvr-scan.readthedocs.io/en/latest/guide/examples/
I had the same problem and wrote the solution: https://github.com/jooray/motion-detection
Should be fairly easy to use from command-line.
If you would like to post-process already-captured video, then motion can still be useful.
VLC allows you to stream or convert your media for use locally, on your private network, or on the Internet. So an already-captured video can be streamed over HTTP, RTSP, etc., and motion can handle it as a network camera.
Furthermore:
How to Stream using VLC Media Player
If OpenCV is too advanced for you, maybe you should consider something easier: SimpleCV (a wrapper for OpenCV), "This is computer vision made easy". There is even an example of motion detection using SimpleCV: https://github.com/sightmachine/simplecv-examples/blob/master/code/motion-detection.py Unfortunately I can't test it (because my OpenCV version isn't compatible with SimpleCV), but generally it looks fine and isn't complicated: it just subtracts the previous frame from the current one and calculates the mean of the result. If this value is bigger than some threshold (which you will most likely have to adjust), then we can assume that there was some motion between those two frames. Note that setting the threshold to 0 is a really bad idea, because there is always some difference between two consecutive frames (changes in lighting, noise, etc.).
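The frame-differencing idea itself fits in a few lines. The sketch below is not the SimpleCV example; it is a plain-Python illustration that assumes each frame has already been decoded into rows of grayscale pixel values (0 to 255). The names `mean_abs_diff` and `has_motion` and the default threshold of 5.0 are arbitrary choices for this example.

```python
def mean_abs_diff(prev_frame, frame):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total, count = 0, 0
    for prev_row, row in zip(prev_frame, frame):
        for a, b in zip(prev_row, row):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count

def has_motion(frames, threshold=5.0):
    """True if any pair of consecutive frames differs by more than threshold."""
    frames = iter(frames)
    prev = next(frames, None)
    if prev is None:
        return False
    for frame in frames:
        if mean_abs_diff(prev, frame) > threshold:
            return True
        prev = frame
    return False

# Tiny 4x4 test clips: sensor noise only, versus a bright block appearing.
noisy = [[[100 + (i % 2)] * 4 for _ in range(4)] for i in range(3)]
empty = [[0] * 4 for _ in range(4)]
block = [[255, 255, 0, 0], [255, 255, 0, 0], [0] * 4, [0] * 4]

print(has_motion(noisy))           # → False (difference of 1 is below 5.0)
print(has_motion([empty, block]))  # → True
```

The non-zero threshold is what keeps the noise-only clip from being reported as motion.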

How do I play a wav file from a Free Pascal application running on Linux?

I have a multi-platform application written in Free Pascal. This application plays a short sound on some event. On Windows, I can do this by MMSystem and sndPlaySound('sound.wav'). However, I don't know how to do this on Linux without external libraries.
I have a solution to play it with SDL and OpenAL, but I don't want any dependency on these libraries to play one short sound. Does there exist a Linux command line player that exists on most distros by default? The file format doesn't matter; I will convert it.
mplayer is both command-line and graphical. You can start it on a tty or a pty.
You could try aplay, but that has a dependency on ALSA. Maybe sox?
The program mplayer ("the movie player") gives you the option to use a graphical user interface or the console, so I would imagine it has a solution to your problem.
Are you looking to BEEP, BLEEP, BOOP and BOP (and low frequency fart)? Use sox. If you're looking to play a file, use sox or SDL.
You need a for loop over an array to get a sort-of piano effect, like a song. It's ugly and messy, and it can't be tweaked much, like the old PC speaker, but it's passable.
Beep is probably what you want, though. Install the package, put a speaker on your motherboard (no hookup? use sox), and enable the pcspkr module. (On Ubuntu it's blacklisted by default.) If BEEP produces nothing, try sox.
At least you'll have something. Yes, you can check for loaded modules and installed packages. I believe I've done both.
