How can I Remove a Wandering DC Offset from an Audio Clip? - audio

I've licensed some audio clips, but some of them come with what I have learned is a "DC Offset" that should normally have been removed during production.
Audacity's "Normalize" filter is able to fix a static DC Offset, but after applying it to my audio clips, I noticed that their DC offset varies (within 0.5 seconds it could go from 0.05 to 0.03 along a normalized amplitude range). For example:
To the left, silence is at 0.02, to the right, it's at 0.00 - this is after normalization by Audacity.
With me not being an audio engineer and not having any professional tools, is there a way to fix this?

A DC offset is a frequency component at 0 Hz. A "wandering" DC offset is made up of very low-frequency components, so you should be able to remove it with a high-pass filter with a cutoff of around 15 Hz. That removes the subsonic content without altering the audible frequency range.
Use a filter with a steep rolloff. Seeing as you're doing this offline, you can use a simple IIR type and filter the signal in both forward and reverse directions to remove any phase distortion that would otherwise be imposed by the filtering.
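If you use MATLAB, the operation would look something like this:
% Read the clip (recent MATLAB releases provide audioread in place of wavread)
[x, fs] = wavread('myfile.wav');
% 8th-order Butterworth high-pass, 15 Hz cutoff normalised to the Nyquist frequency
[b,a] = butter(8, 15/(fs/2), 'high');
% Filter forwards and then backwards so the phase distortion cancels out
y = filtfilt(b,a,x);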
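For readers without MATLAB, here is a minimal Python sketch of the same forward-and-reverse idea using only the standard library. It is a simplification: it uses a first-order high-pass (a gentle 6 dB/oct per pass) rather than the steep 8th-order Butterworth above, and the function names are made up for illustration. It operates on a plain list of float samples; reading and writing the audio file is left out.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = []
    prev_x = samples[0]  # start "settled" on the first sample to limit the start-up transient
    prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def zero_phase_highpass(samples, cutoff_hz, sample_rate):
    """Run the filter forwards, then backwards, so the phase shifts cancel."""
    forward = one_pole_highpass(samples, cutoff_hz, sample_rate)
    backward = one_pole_highpass(forward[::-1], cutoff_hz, sample_rate)
    return backward[::-1]
```

Calling `zero_phase_highpass(samples, 15.0, sample_rate)` mirrors the 15 Hz cutoff suggested above. A short transient can remain at the very start and end of the clip.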

From the command line, you can try sox:
sox fileIn.wav fileOut.wav highpass 10
This applies a high-pass filter with a cutoff frequency of 10 Hz.
It should remove the DC offset, though perhaps not at the very beginning of the file, where the filter needs a moment to settle.
The sox manual has a little more information, though not much.

As @learnvst explains in his answer, what looks like "wandering DC offset" is actually just content at very low frequencies. You can remove this LF content with a high pass filter. Since frequencies below 20 Hz are generally inaudible, you should be able to take out the "wandering DC" without actually changing how the file sounds.
The latest version of Audacity (2.0.5) includes a high pass filter. Select Effect > High pass filter ... and adjust the cutoff frequency and rolloff parameters. A cutoff of around 15 Hz and a rolloff of 6 dB/oct should do the trick.

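# For every WAV file: measure its overall DC offset with ffprobe/astats,
# then shift the waveform by the negated value with ffmpeg's dcshift filter.
for f in *.wav; do
    mv "$f" /tmp/dc1.wav
    # Keep only the "Overall" part of the astats report and pick its DC line
    dc=$(ffprobe -f lavfi "amovie=/tmp/dc1.wav,astats=metadata=1" 2>&1 | sed '/Overall/,$!d' | grep DC)
    # The offset value is the sixth whitespace-separated field of that line
    dc=$(echo "$dc" | awk '{ print $6 }')
    # Negate it to obtain the correction
    dc=$(echo "$dc * -1" | bc)
    echo "bc" "$dc"
    ffmpeg -hide_banner -loglevel error -y -i "/tmp/dc1.wav" -af "dcshift=$dc:limitergain=0.02" "$f"
done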
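The script above measures one overall DC offset per file and shifts the waveform by its negative. A minimal Python sketch of that idea (the function name is hypothetical, and it works on a plain list of floats rather than going through ffmpeg) also illustrates why a single shift cannot fully correct an offset that drifts within the clip:

```python
def remove_static_dc(samples):
    """Subtract the overall mean so the clip as a whole is centred on zero."""
    offset = sum(samples) / len(samples)
    return [s - offset for s in samples], offset

# Silence that sits at 0.02 in the first half and at 0.00 in the second half,
# as in the question's example.
clip = [0.02] * 1000 + [0.00] * 1000
centred, measured_offset = remove_static_dc(clip)
# After the shift the two halves sit on opposite sides of zero:
# the average is corrected, but the drift between the halves is still there.
```

A high-pass filter, as suggested in the other answers, is what removes the drift itself.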

Related

How to warp/shift a pitch so I can hear a bat

I am looking for a way to "hear" a bat.
I have a 192 kHz sound recording of a bat and want to hear it. So, can I "transform" it into a 0-12 kHz recording?
I saw what I thought might be similar:
change pitch of multiple audio files with Sox
And tried using something like:
log(12/192) / log(2) * 1200 == -4800
sox 331817.flac 331817_warp.wav pitch -4800
You can see the whole spectrogram here (192 kHz):
sox 331817.flac -n rate 192.0k spectrogram -l -m -X 160 -z 95 -Z 0 -r -Y 257 -o spectro.png
You can see my warped spectro here:
sox 331817_warp.wav -n rate 12.0k spectrogram -l -m -X 160 -z 95 -Z 0 -r -Y 257 -o spectro_warp.png
Any help would be appreciated.
Here's a video which encouraged me that it's possible:
https://www.youtube.com/watch?v=qJOloliWvB8
Not really a programming question, but intriguing nevertheless, so here's my two cents...
Try speed -4800c; it lowers both pitch and tempo. This is the least intrusive way of lowering pitch as it does not need to resample the sound. It will make the entire sound fragment a factor 16 longer, so take your time listening to it. Trim it down if possible; I suspect this is also what they did in the video.
Keep in mind that even a sample rate of 192 kHz may not be enough to accurately capture the full spectrum of a bat's voice. The Nyquist frequency is half of the sample rate, so any content above 96 kHz cannot be represented and will have been filtered out or aliased at recording time. No post-processing is going to fix that.
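The -4800 figure and the factor of 16 both follow from the definition of a cent (1200 cents per octave, one octave per halving of frequency). A small Python sketch of the arithmetic, with a hypothetical helper name:

```python
import math

def cents_between(source_hz, target_hz):
    """Pitch shift in cents needed to move source_hz to target_hz."""
    return 1200.0 * math.log2(target_hz / source_hz)

# Map the 192 kHz range of the recording onto 12 kHz.
cents = cents_between(192_000, 12_000)

# With a speed change, every octave down doubles the duration.
slowdown = 2 ** (-cents / 1200.0)
```

Here `cents` is the value passed to `speed` (with the `c` suffix), and `slowdown` is how many times longer the clip becomes.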

ffmpeg: cut the video and get the accurate beginning time of the result

I do the cut via:
ffmpeg -i long_clip.mp4 -ss 00:00:10.0 -c copy -t 00:00:04.0 short_clip.mp4
I need to know the precise time at which ffmpeg made the cut (the time of the closest keyframe before 00:00:10.0).
Currently, I'm using the following ffprobe command to list all the keyframes and then select the closest one before 00:00:10.0:
ffprobe -show_frames -skip_frame nokey long_clip.mp4
It is extremely slow: I run it on a Jetson Nano, and it takes a few minutes to list the keyframes of a 30-second video, although the cut itself is done in 0.2 seconds.
I hope there is a much faster way to find the time of the keyframe where ffmpeg makes the cut, since ffmpeg itself seeks to this keyframe and cuts the video in less than half a second.
In other words, the question is: how can I get the time of the keyframe where ffmpeg makes the cut without listing all the keyframes?
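For reference, the selection step described in the question (picking the closest keyframe at or before the seek point from a list of keyframe times) can be sketched in Python as follows. The function name and the timestamps are made up for illustration, and the sketch assumes the times have already been parsed from the ffprobe output into a sorted list of seconds; it does not make the listing itself any faster.

```python
from bisect import bisect_right

def keyframe_at_or_before(keyframe_times, seek_time):
    """Return the last keyframe time <= seek_time, or None if there is none.

    keyframe_times must be sorted in ascending order (seconds).
    """
    index = bisect_right(keyframe_times, seek_time)
    if index == 0:
        return None
    return keyframe_times[index - 1]

# Hypothetical keyframe timestamps, not taken from a real file.
example_keyframes = [0.0, 4.2, 8.4, 12.6]
cut_point = keyframe_at_or_before(example_keyframes, 10.0)
```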
I think this is not possible. The most information you can get from a program is obtained when you use the verbosity level of debugging. For ffmpeg I just used
ffmpeg -v debug -i "Princess Chelsea - Frack.mp4" -ss 00:03:00.600 -c copy -to 00:03:03.800 3.mkv 2> out.txt
One has to redirect output, because there is too much of it with debug, it doesn't fit the terminal.
Unfortunately, it gives only some cryptic/internal messages, like
Automatically inserted bitstream filter 'vp9_superframe'; args=''
[matroska @ 0x55987904cac0] Starting new cluster with timestamp 5 at offset 885 bytes
[matroska @ 0x55987904cac0] Writing block of size 375 with pts 5, dts 5, duration 23 at relative offset 9 in cluster at offset 885. TrackNumber 2, keyframe 1
With less verbosity it gives less information. Therefore I think this is not possible. However, what is your actual question? Maybe you need something different apart from just knowing the time of cuts?..
For those looking for how to actually cut at the exact time (as I was): you have to re-encode the video instead of using -c copy.

Frequency response of ffmpeg filters

I'm using ffmpeg to decode and encode signal. It works perfectly and I added filters. For example, I'm using such a command :
ffmpeg -re -i /home/dr_click/live.wav -af "anequalizer=c0 f=200 w=100 g=-5 t=0|c1 f=200 w=100 g=-5 t=0, anequalizer=c0 f=1000 w=100 g=3 t=0|c1 f=1000 w=100 g=3 t=0" -acodec pcm_s16be -ar 44100 -ac 2 -f rtp rtp://127.0.0.1:1234
I'm streaming my file, adding 2 filters with 200 Hz and 1000 Hz as central frequency and 100 Hz width and it works.
With such a filter, I know my gain will be -5 dB at 200 Hz. But what is the gain at 250 Hz? Still -5 dB? -4.5 dB? -3 dB? And the same question applies at 350 Hz or any other frequency.
What I'm looking for and haven't found is a way to get the frequency response of such a filter over a bandwidth from 20 Hz to 20 kHz. In other words, for any frequency I'd like to know gain = f(frequency) for a given ffmpeg filter.
Thank you for your help,
Dr_Click
I'm working on a quite similar issue. Mine is to replace the system-wide 15-band graphical LADSPA equalizer (mbeq_1197, controlled by JACK Rack) with an ffmpeg filter. As it is AFAIK impossible to adjust ffmpeg filter parameters at runtime, I have to rely on my already generated JACK EQ settings and need to transfer them to the ffmpeg EQ. Alas, I could not find any two "comparable" EQs: ffmpeg only offers an 18-band "superequalizer". My previous EQ has 15 bands, so I decided to do some interpolation and compare the frequency responses of the old and the new EQ.
Now to answer your question: I'm not an audio engineer, and I'm sure there are more professional ways. But what I found out for now is my current workflow:
Generate some white noise. On Linux you can use e.g. sox or Audacity. In Audacity do Generate -> Built-in -> Noise... => White noise (1 min should be enough)
Save the file as WAV.
Apply your filter to this WAV: ffmpeg -i whitenoise.wav -af "<your filter>" whitenoise_filtered.wav
Load the filtered file into Audacity and do Analyze -> Plot Spectrum...
The output will be a little scattered because the white noise is not perfect, but this should be negligible.
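If neither sox nor Audacity is at hand, the first two steps of the workflow above (generate white noise, save it as WAV) can also be done with a few lines of standard-library Python. This is only a sketch: the function name and defaults are my own choices, the noise is uniformly distributed 16-bit PCM, and it defaults to two channels on the assumption that the filter under test addresses c0 and c1 as in the question.

```python
import random
import struct
import wave

def write_white_noise(path, seconds=60, sample_rate=44100, channels=2,
                      amplitude=0.5, seed=None):
    """Write uniformly distributed 16-bit white noise to a WAV file."""
    rng = random.Random(seed)
    peak = int(amplitude * 32767)
    frame_count = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        remaining = frame_count
        while remaining > 0:
            # Write at most one second at a time to keep memory use small.
            frames = min(remaining, sample_rate)
            values = [rng.randint(-peak, peak) for _ in range(frames * channels)]
            wav_file.writeframes(struct.pack("<%dh" % len(values), *values))
            remaining -= frames
    return frame_count
```

For example, `write_white_noise("whitenoise.wav")` produces a one-minute file that can then be passed through the ffmpeg filter and inspected with Plot Spectrum as described above.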
Good luck!
Flittermice

Remove start of new track from end of audio file

I've a couple of recordings where at the end of the track there's silence and the start of a new track (fraction).
I tried to remove the start of the new track from the end of the file.
My command (*nux)
sox file_in.mp3 -C 320 file_out.mp3 silence 1 0.75 0.2% -1 0.75 0.2%
Preferably keep the silence at the end or just add some new. Any help appreciated
You can probably use the below-periods parameter on its own, without the above-periods one.
A bit of silence can optionally be kept with the -l parameter (or you can just use pad with a fixed amount).
For example:
sox input.mp3 -C 320 output.mp3 silence -l 0 1 0.5 0.5%
Note that if the beginning of a new track itself contains long periods of silence, it may be tricky, though usually possible, to find appropriate duration and threshold parameters. The general case does not appear to be tractable with simple processing; reliably marking the inner silences that belong to a track would need something like a machine learning approach.
Also, if it is just a couple of files, opening them in an editor such as Audacity is the quickest approach and gives the best result.
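To make the duration and threshold parameters more concrete, here is a rough Python sketch of the underlying idea: scan for the first run of quiet samples that lasts long enough and drop everything after it. This is an illustration only, not SoX's actual algorithm; the function name is hypothetical, samples are assumed to be floats in the range -1 to 1, and the defaults mirror the 0.5 second and 0.5% values used in the command above.

```python
def cut_at_first_silence(samples, sample_rate, min_silence_s=0.5,
                         threshold=0.005, keep_silence=True):
    """Return the samples up to the first sufficiently long quiet run.

    With keep_silence=True, min_silence_s worth of that silence is kept
    at the end; otherwise the result stops where the silence begins.
    """
    needed = int(min_silence_s * sample_rate)
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= needed:
                end = run_start + needed if keep_silence else run_start
                return samples[:end]
        else:
            run_start = None  # the quiet run was interrupted
    return samples  # no long enough silence found
```

Raising `min_silence_s` makes the cut less likely to trigger on short pauses inside the track; raising `threshold` makes it tolerate noisier "silence".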

Using SoX to change the volume level of a range of time in an audio file

I’d like to change the volume level of a particular time range/slice in an audio file using SoX.
Right now, I’m having to:
Trim the original file three times to get: the part before the audio effect change, the part during (where I’m changing the sound level), and the part after
Perform the effect to change the sound level on the extracted “middle” chunk of audio, in its own file
Splice everything back together, taking into account the fading/crossfading 5ms overlaps that SoX recommends
Is there a better way to do this that doesn’t involve writing a script to do the above?
For anyone who stumbles across this highly ranked thread, searching for a way to duck the middle of an audio file:
I've been playing with SoX for ages and the method I built uses pipes to process each part without creating all those temporary files!
The result is a single line solution, though you will need to set timings and so, unless your fade timings will be the same for all files, it may be useful to generate the line with an algorithm.
I was pleased to get piping working, as I know this aspect has proved difficult for others. The command line options can be difficult to get right. However I really didn't like the messy additional files as an alternative.
By using mix functionality and positioning each part using pad, then giving each section trim & fade we can also avoid use of 'splice' here. I really wasn't a fan.
A working single line example, tested in SoX 14.4.2 Windows:
It fades (ducks) by -6dB at 2 seconds, returning to 0dB at 5 seconds (using linear fades of 0.4 seconds):
sox -m -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8" -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8" outputfile.wav gain 9.542
Let's make that a little more readable here by breaking it down into sections:
Section 1 = full volume, Section 2 = ducked, Section 3 = full volume
sox -m
-t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4"
-t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8"
-t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8"
outputfile.wav gain 9.542
Now, to break it down, very thoroughly
'-m' .. says we're going to mix (this automatically reduces gain, see last parameter)
'-t wav' .. says the piped command that follows will return a WAV (it seems the WAV header is being lost in the pipeline)
Then.. the FIRST piped part (full volume before duck)
'-V1' .. says ignore warnings - there will be a warning about not knowing length of output file for this specific section as it's piping out, but there should be no other warning from this operation
then the input filename
'-t wav' .. forces the output type
'-' .. is the standard name for a piped output which will return to SoX command line
'fade t 0 2.2 0.4' .. fades out the full volume section. t = linear. 0 fade in. Then (as we want the crossfade's halfway point to be at 2 seconds) we fade out by 2.2 seconds, with a 0.4 second fade (the fadeout parameter is for when the fade ENDS!)
'-t wav' .. to advise type of next part - as above
Then.. the SECOND piped part (the ducked section)
'-V1' .. again, to ignore output length warning - see above
then the same input filename
'-t wav' .. forces output type, as above
'-' .. for piped output, see above
'trim 1.8' .. because this middle section will hit the middle of the transition at 2 seconds, so (with a 0.4 second crossfade) the ducked audio file will start 0.2 seconds before that
'fade t 0.4 3.4 0.4' .. to fade in the ducked section & fade back out again. So a 0.4 fade in. Then (the most complicated part) as the next crossfade will end at 5.2 seconds we must take that figure minus trimmed amount for this section, so 5.2-1.8=3.4 (again this is because fadeout position deals with the end timing of the fadeout)
'gain -6' .. is the amount, in dB, by which we should duck
'pad 1.8' .. must match the trim figure above, so that amount of silence is inserted at the start to make it synch when sections are mixed
'-t wav' .. to advise type of next part - as above
Then.. the THIRD piped part (return to full level)
'-V1' .. again - see above
then the same input filename
'-t wav' .. to force output type, as above
'-' .. for piped output, see above
'trim 4.8' .. this final section will start at 5 seconds, but (with a 0.4 second crossfade) the audio will start 0.2 seconds before that
'fade t 0.4 0 0' .. just fade in to this full volume section. No fade out
'pad 4.8' .. must match the trim figure above, as explained above
then output filename
'gain 9.542' .. looks tricky, but basically when you "-m" to mix 3 files the volume is reduced to 1/3 (one third) by SoX to give headroom.
Rather than defeating that, we boost to 300%. We get the dB amount of 9.542 with this formula 20*log(3)/log(10)
If you copy & paste the single line somewhere you can see it all easily, it's a lot less scary than the explanation!
Final thought - I was initially concerned about whether the crossfades needed to be logarithmic rather than linear, but in my case, from listening to the results, linear has definitely given the sound I expected.
You may like to try longer crossfades, or have the point of transition happening earlier or later but I hope that single line gives hope to anyone who thought many temporary files would be required!
Let me know if more clarification would help!
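Since the timings differ from file to file, generating the line programmatically may be convenient, as suggested earlier in this answer. The following Python sketch derives the trim, fade and pad figures from the duck start and end times using the arithmetic explained above, and computes the make-up gain for a three-way mix as 20*log10(3). The function names are hypothetical, and the sketch does no shell quoting, so it assumes file names without spaces or special characters.

```python
import math

def fmt(value):
    """Format a number compactly, rounding away floating-point noise."""
    return format(round(value, 3), "g")

def duck_command(infile, outfile, duck_start, duck_end, fade=0.4, duck_db=-6):
    """Build the single-line SoX mix command that ducks one time range."""
    half = fade / 2
    first_end = duck_start + half                 # where section 1 finishes fading out
    mid_start = duck_start - half                 # where section 2 starts fading in
    mid_fade_end = (duck_end + half) - mid_start  # fade-out end, relative to the trim
    last_start = duck_end - half                  # where section 3 starts fading in
    makeup_db = 20 * math.log10(3)                # undo the 1/3 scaling of a 3-input mix

    def part(effects):
        return f'-t wav "|sox -V1 {infile} -t wav - {effects}"'

    parts = [
        part(f"fade t 0 {fmt(first_end)} {fmt(fade)}"),
        part(f"trim {fmt(mid_start)} fade t {fmt(fade)} {fmt(mid_fade_end)} "
             f"{fmt(fade)} gain {fmt(duck_db)} pad {fmt(mid_start)}"),
        part(f"trim {fmt(last_start)} fade t {fmt(fade)} 0 0 pad {fmt(last_start)}"),
    ]
    return f"sox -m {' '.join(parts)} {outfile} gain {fmt(makeup_db)}"
```

Calling `duck_command("inputfile.wav", "outputfile.wav", 2, 5)` reproduces the single-line example given above.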
Okay, with ffmpeg and filters it's all quite simple.
Imagine that you have 2 tracks, A and B, and you want to crop them and adjust the volume. The solution would be:
ffmpeg -y -i 1.mp3 -i 2.mp3 \
-filter_complex "[0]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,129.00,129.20),0.15000*(t - 129.00) + 0.03,1)':eval=frame,volume='if(between(t,129.20,181.50),-0.00057*(t - 129.20) + 0.06,1)':eval=frame,volume='if(between(t,181.50,181.60),0.40000*(t - 181.50) + 0.03,1)':eval=frame,volume='if(between(t,181.60,183.50),-0.03684*(t - 181.60) + 0.07,1)':eval=frame,volume='if(between(t,183.50,188.00),0.00000*(t - 183.50) + 0.00,1)':eval=frame,atrim=0.00:56.00,adelay=129000|129000|129000|129000,apad[0:o];[1]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,0.00,134.00),0.00000*(t - 0.00) + 0.06,1)':eval=frame,atrim=0.00:134.00,apad[1:o];[0:o][1:o]amix=inputs=2,atrim=duration=185.00" -shortest -ac 2 output.mp3
which will take 2 input files, transform both of the streams to the appropriate aformat and then apply volume filters.
The syntax for volume is simple: if time t is between some start and end time, apply a volume equal to the desired start level plus a coefficient multiplied by the difference between the current time t and the start time.
This will increase the volume linearly from initial volume to desired value on a range.
atrim will trim the audio chunk after the volume has been adjusted on all ranges.
ffmpeg is just amazing: the expressions can be very complex, and many math functions may be used in them.
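To see what one of those volume expressions evaluates to, here is the same piecewise-linear rule written as a small Python function. The function name is hypothetical; the numbers in the example are taken from the first volume filter in the command above. Outside its time range each expression yields 1, which leaves the level untouched, so several such filters can be chained.

```python
def ramp_gain(t, start, end, start_gain, slope):
    """Mirror of: if(between(t,start,end), slope*(t - start) + start_gain, 1)."""
    if start <= t <= end:  # between() includes both end points
        return slope * (t - start) + start_gain
    return 1.0

# First range in the command: 129.00 s to 129.20 s, starting at 0.03, slope 0.15 per second.
halfway = ramp_gain(129.10, 129.00, 129.20, 0.03, 0.15)
outside = ramp_gain(10.0, 129.00, 129.20, 0.03, 0.15)
```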
