Envelope pattern in SoX (Sound eXchange) or ffmpeg - audio

I've been using SoX to generate white noise. I'm after a way of modulating the volume across the entire track in a way that will create a pattern similar to this:
White noise envelope effect
I've experimented with fade, but that fades in to 100% volume and fades out to 0% volume, which is just a pain in this instance.
The tremolo effect isn't quite what I'm after either, as the frequency of the pattern will be changing over time.
The only other alternative is to split the white noise file into separate files, apply fade and then apply trim to either end so it doesn't fade all the way, but this seems like a lot of unnecessary processing.
I've been checking out this example Using SoX to change the volume level of a range of time in an audio file, but I don't think it's quite what I'm after.
I'm using the command-line in Ubuntu with SoX, but I'm open to suggestions with ffmpeg, or any other Linux based command-line solution.

With ffmpeg, you could use the volume filter
ffmpeg -i input.wav -af \
"volume='if(lt(mod(t\,5)/5\,0.5), 0.2+0.8*mod(2*t\,5)/5\, 1.0-0.8*mod(t-(5/2)\,5)/(5/2))':eval=frame" \
output.wav
The expression in the filter above, increases the volume from 0.2 to 1.0 over t=0 to t=2.5 seconds, then gradually back down to 0.2 at t=5 seconds. The period of the envelope here is 5 seconds.

Related

Frequency response of ffmpeg filters

I'm using ffmpeg to decode and encode signal. It works perfectly and I added filters. For example, I'm using such a command :
ffmpeg -re -i /home/dr_click/live.wav -af "anequalizer=c0 f=200 w=100 g=-5 t=0|c1 f=200 w=100 g=-5 t=0, anequalizer=c0 f=1000 w=100 g=3 t=0|c1 f=1000 w=100 g=3 t=0" -acodec pcm_s16be -ar 44100 -ac 2 -f rtp rtp://127.0.0.1:1234
I'm streaming my file, adding 2 filters with 200 Hz and 1000 Hz as central frequency and 100 Hz width and it works.
With such a filter, I know my gain will be -5db at 200Hz. But what is the gain for frequencies at 250 Hz ? Still -5db ? -4.5db ? -3db ? And same question at 350Hz or any other frequency.
What I'm looking for and didn't found is the way to get the frequency response of such a filter for a bandwith from 20Hz to 20kHz. In other words, what I'd like to know for any frequency is : gain = f (frequency) with a given ffmpeg filter
Thank you for your help,
Dr_Click
i'm working on a quite similar issue. Mine is to replace the system wide 15 band graphical LADSPA equalizer (mbeq_1197, controlled by JACK Rack) with an ffmpeg filter. As it is AFAIK impossible to adjust ffmpeg filter parameters during runtime, I have to rely on my already generated JACK EQ settings and need to transfer them to the ffmpeg EQ. Alas, I could not find any two "comparable" EQs: ffmpeg only offers a 18 band "superequalizer". My previous EQ has 15 bands, so I decided to do some interpolations and compare the frequency responses of the old and the new EQ.
Now to answer your question: I'm not an audio engineer, and I'm sure there are more professional ways. But what I found out for now is my current workflow:
Generate some white noise. In Linux you can e.g. use sox oder Audacity. In Audacity do Generate -> Built-in -> Noise... => White noise (1 min should be enough)
Save the file as WAV.
Apply your filter to this WAV: ffmpeg -i whitenoise.wav -af "<your filter>" whitenoise_filtered.wav
Load the filtered file into Audacity and do Analyze -> Plot Spectrum...
The output will be a little scattered because the white noise is not perfect, but this should be negligible.
Good luck!
Flittermice

SoX detecting volume is always near max

I'm trying to detect speech volume above a threshold in short, 2-3 second, audio files with sox but it's always coming out about 90% max volume regardless of silence or noise.
This is the command i'm using (i've tried varying the scale option):
sox noise.wav -n stats -s 99
If i shout and have the microphone in my mouth or bash it i can get a detectable difference of about 95% volume but it is a desktop style microphone. Playing back the audio files there is an audible silence recorded but there is still a big distinction when speaking from a distance.
Is there a setting i'm missing or has anyone else encountered this?

How can low frame rate video be made to look more smooth?

I am trying to clean up a video that was recorded in 2003 in low-light conditions on what was possibly a cameraphone. The video has been cleaned up somewhat (cropped, logos removed and stabilized), but it remains quite jerky, due in large part to its low frame rate. What are some tricks that might clean up the video in this regard? I feel that I am asking for something a bit like tweening in flash animations, but for pixels, whereby additional frames are generated using nearby frames of the video. Does such a trick exist? Is there another way to approach this problem?
To reproduce the video processing so far, take the following steps:
# get video
wget http://www.anwarweb.net/saddamdown.wmv
# crop
ffmpeg -i saddamdown.wmv -filter:v "crop=292:221:14:10" -c:a copy saddamdown_crop.wmv
# remove logo 1
ffmpeg -i saddamdown_crop.wmv -vf delogo=x=17:y=77:w=8:h=54 -c:a copy saddamdown_crop_delogo_1.wmv
# remove logo 2
ffmpeg -i saddamdown_crop_delogo_1.wmv -vf delogo=x=190:y=174:w=54:h=8 -c:a copy saddamdown_crop_delogo_1_delogo_2.wmv
# stabilize
ffmpeg -i saddamdown_crop_delogo_1_delogo_2.wmv -vf deshake saddamdown_crop_delogo_1_delogo_2_deshake.wmv
Note: The video is of the Saddam Hussein execution.
You could try with slowmoVideo: https://github.com/slowmoVideo/slowmoVideo
It's an open source software to create smooth slow motion effects from pixel motion analysis (Windows, Linux, OSX with wine or crossover. Read and write with ffmpeg).
First calculate the slow down ratio: for example if the original video is 18fps and the desired output is 24fps, set the speed of slowmo to 75% (18/24=0.75).
The result depends a lot on the video content, obviously the more fixed are the shots the better.
Anyway you can tweak what they call "Optical Flow", that is the analysis part of the process.
Good luck ;)

Using sox for voice detection and streaming

Currently, I use sox like this:
sox -d -e u-law --endian little -b 8 -c 1 -r 8000 -t ul - silence 1 0.3 1% 1 0.3 1%
For reference, this is recording audio from the default microphone and outputting little endian, ulaw formatted audio at 8 bits and a 8k rate. The effects filter trims audio until the noise hits a threshold for 0.3 seconds, then continues to record until there is 0.3 seconds of silence. All of this streams to stdout which I use to stream to a remote server.
I am using all of this to record a bit of voice and finish when I am done speaking. To trigger sox, I use specialized hardware to trigger the start of the recording. I can switch to using almost any audio format or codec as long as it supports on the fly formatting/encoding. My target platform is raspbian on the raspberry pi 2 B.
My ideal solution would be to use vad to stop the recording when the user is finished speaking. My hope is that this would work even with background chatter. However, the sox documentation on the vad effect states this:
The use of the norm effect is recommended, but remember that neither
reverse nor norm is suitable for use with streamed audio.
I haven't been able to piece parameters together to get vad and streaming working. Is it possible to use the vad effect to stop the recording of audio while still maintaining the stdin->sox->stdout piping? Are there better alternatives?
Is it possible to use the vad effect to stop the recording of audio while still maintaining the stdin->sox->stdout piping?
No. The vad effect can trim silence only from the front of the audio. So you could only use it to detect recording start, and not ending and pauses.
The reverse and norm filters need all the input data before they produce any data on output, that is why they cannot be used with streaming.
The key is to select a good threshold for silence filter so it takes "background chatter" as silence.
You could use also noisered (with a profile based on previous recordings) before silence to reduce noise triggering the recording, but this will also affect output and probably will not take "background chatter" as noise.

Using SoX to change the volume level of a range of time in an audio file

I’d like to change the volume level of a particular time range/slice in an audio file using SoX.
Right now, I’m having to:
Trim the original file three times to get: the part before the audio effect change, the part during (where I’m changing the sound level), and the part after
Perform the effect to change the sound level on the extracted “middle” chunk of audio, in its own file
Splice everything back together, taking into account the fading/crossfading 5ms overlaps that SoX recommends
Is there a better way to do this that doesn’t involve writing a script to do the above?
For anyone who stumbles across this highly ranked thread, searching for a way to duck the middle of an audio file:
I've been playing with SoX for ages and the method I built uses pipes to process each part without creating all those temporary files!
The result is a single line solution, though you will need to set timings and so, unless your fade timings will be the same for all files, it may be useful to generate the line with an algorithm.
I was pleased to get piping working, as I know this aspect has proved difficult for others. The command line options can be difficult to get right. However I really didn't like the messy additional files as an alternative.
By using mix functionality and positioning each part using pad, then giving each section trim & fade we can also avoid use of 'splice' here. I really wasn't a fan.
A working single line example, tested in SoX 14.4.2 Windows:
It fades (ducks) by -6dB at 2 seconds, returning to 0dB at 5 seconds (using linear fades of 0.4 seconds):
sox -m -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8" -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8" outputfile.wav gain 9.542
Let's make that a little more readable here by breaking it down into sections:
Section 1 = full volume, Section 2 = ducked, Section 3 = full volume
sox -m
-t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4"
-t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8"
-t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8"
outputfile.wav gain 9.542
Now, to break it down, very thoroughly
'-m' .. says we're going to mix (this automatically reduces gain, see last parameter)
'-t wav' .. says the piped command that follows will return a WAV (it seems the WAV header is being lost in the pipeline)
Then.. the FIRST piped part (full volume before duck)
'-V1' .. says ignore warnings - there will be a warning about not knowing length of output file for this specific section as it's piping out, but there should be no other warning from this operation
then the input filename
'-t wav' .. forces the output type
'-' .. is the standard name for a piped output which will return to SoX command line
'fade t 0 2.2 0.4' .. fades out the full volume section. t = linear. 0 fade in. Then (as we want the crossfade's halfway point to be at 2 seconds) we fade out by 2.2 seconds, with a 0.4 second fade (the fadeout parameter is for when the fade ENDS!)
'-t wav' .. to advise type of next part - as above
Then.. the SECOND piped part (the ducked section)
'-V1' .. again, to ignore output length warning - see above
then the same input filename
'-t wav' .. forces output type, as above
'-' .. for piped output, see above
'trim 1.8' .. because this middle section will hit the middle of the transition at 2 seconds, so (with a 0.4 second crossfade) the ducked audio file will start 0.2 seconds before that
'fade t 0.4 3.4 0.4' .. to fade in the ducked section & fade back out again. So a 0.4 fade in. Then (the most complicated part) as the next crossfade will end at 5.2 seconds we must take that figure minus trimmed amount for this section, so 5.2-1.8=3.4 (again this is because fadeout position deals with the end timing of the fadeout)
'gain -6' .. is the amount, in dB, by which we should duck
'pad 1.8' .. must match the trim figure above, so that amount of silence is inserted at the start to make it synch when sections are mixed
'-t wav' .. to advise type of next part - as above
Then.. the THIRD piped part (return to full level)
'-V1' .. again - see above
then the same input filename
-t wav' .. to force output type, as above
-' .. for piped output, see above
trim 4.8' .. this final section will start at 5 seconds, but (with a 0.4 second crossfade) the audio will start 0.2 seconds before that
'fade t 0.4 0 0' .. just fade in to this full volume section. No fade out
'pad 4.8' .. must match the trim figure above, as explained above
then output filename
'gain 9.542' .. looks tricky, but basically when you "-m" to mix 3 files the volume is reduced to 1/3 (one third) by SoX to give headroom.
Rather than defeating that, we boost to 300%. We get the dB amount of 9.542 with this formula 20*log(3)/log(10)
If you copy & paste the single line somewhere you can see it all easily, it's a lot less scary than the explanation!
Final though - I was initially concerned about whether the crossfades needed to be logarithmic rather than linear, but in my case from listening to the results linear has definitely given the sound I expected.
You may like to try longer crossfades, or have the point of transition happening earlier or later but I hope that single line gives hope to anyone who thought many temporary files would be required!
Let me know if more clarification would help!
audacity waveform
Okay, with ffmpeg and filters it's all quite simple.
Imagine that you have 2 tracks, A and B. And you want to crop ones and do something about the volume. So the solution would be:
ffmpeg -y -i 1.mp3 -i 2.mp3 i f454495482c151aea8761dda.mp3 -i f5544954796af4a171f11b57.mp3 -i f754495448788e35e6123679.mp3 -i f754495448788e35e6123679.mp3 -i f85449545e646dea98e5dd19.mp3 \
-filter_complex "[0]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,129.00,129.20),0.15000*(t - 129.00) + 0.03,1)':eval=frame,volume='if(between(t,129.20,181.50),-0.00057*(t - 129.20) + 0.06,1)':eval=frame,volume='if(between(t,181.50,181.60),0.40000*(t - 181.50) + 0.03,1)':eval=frame,volume='if(between(t,181.60,183.50),-0.03684*(t - 181.60) + 0.07,1)':eval=frame,volume='if(between(t,183.50,188.00),0.00000*(t - 183.50) + 0.00,1)':eval=frame,atrim=0.00:56.00,adelay=129000|129000|129000|129000,apad[0:o];[1]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,0.00,134.00),0.00000*(t - 0.00) + 0.06,1)':eval=frame,atrim=0.00:134.00,apad[1:o];[0:o][1:o]amix=inputs=28,atrim=duration=185.00" -shortest -ac 2 output.mp3
which will take 2 input files, transform both of the streams to the appropriate aformat and then apply volume filters.
The syntax for volume is simple: if time t is between some start and end time - then apply the volume filter, based on the desired start volume level plus by some coefficient multiplied by difference between the start time and current time t.
This will increase the volume linearly from initial volume to desired value on a range.
atrim will trim the audio chunk after the volume has been adjusted on all ranges.
ffmpeg is just amazing, the expressions could be very complex and many of math functions may be used in the expressions.

Resources