Sox mixing with a delay - audio

I have a problem with mixing sounds with delay.
I run this
sox -M f1.wav f1.wav f1.wav f1.wav out.wav delay 3 3 4 4 5 5
In the final file, the volume of the sound decreases. How can I avoid this?

You can also control the volume of each input signal with -v. You didn't ask for this, but in my experience you will want it at some point, and it took me a while to find the option online.
sox -m -v 1 file1.wav -v 0.5 file2.wav out.wav
Hope someone finds this useful.

The sox man page does not describe any automatic attenuation for merge (-M), so this behaviour surprises me. The following applies to mix mode (-m):
Unlike the other methods, 'mix' combining has the potential to cause
clipping in the combiner if no balancing is performed. So here, if
manual volume adjustments are not given, to ensure that clipping does
not occur, SoX will automatically adjust the volume (amplitude) of
each input signal by a factor of 1/n, where n is the number of input
files. If this results in audio that is too quiet or otherwise
unbalanced then the input file volumes can be set manually as
described above; using the norm effect on the mix is another
alternative.
So even though the manual explicitly excludes all modes other than mix, I would still try the norm effect or specify volume adjustments manually, since I cannot see how one could combine signals while avoiding clipping without attenuating somewhere.
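For illustration, here is a small sketch of making the mix-mode attenuation explicit instead of automatic. The build_mix_cmd helper is hypothetical, and the 1/n gains and file names mirror the question:

```shell
#!/bin/sh
# Hypothetical helper: build a sox -m command that applies the 1/n
# attenuation explicitly instead of relying on sox's automatic balancing.
build_mix_cmd() {
  # awk computes the per-input gain 1/n, n = number of input files
  gain=$(awk -v n="$#" 'BEGIN { printf "%.3f", 1 / n }')
  cmd="sox -m"
  for f in "$@"; do
    cmd="$cmd -v $gain $f"
  done
  echo "$cmd out.wav"
}

build_mix_cmd f1.wav f1.wav f1.wav f1.wav
# -> sox -m -v 0.250 f1.wav -v 0.250 f1.wav -v 0.250 f1.wav -v 0.250 f1.wav out.wav
```

Raising the gains above 1/n buys back loudness at the risk of clipping, which is where norm comes in as the alternative.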

Remove start of new track from end of audio file

I have a couple of recordings where, at the end of the track, there is silence followed by a fragment of the start of the next track.
I am trying to remove that fragment from the end of the file.
My command (*nix):
sox file_in.mp3 -C 320 file_out.mp3 silence 1 0.75 0.2% -1 0.75 0.2%
Preferably I would like to keep the silence at the end, or just add some new silence. Any help is appreciated.
You can probably use just the below-periods parameters, without the above-periods trimming.
A bit of silence can optionally be kept with the -l option (or you can simply re-append a fixed amount with pad).
For example:
sox input.mp3 -C 320 output.mp3 silence -l 0 1 0.5 0.5%
Note that if the beginning of some new tracks contains long internal periods of silence, it may be tricky, but usually possible, to find appropriate duration and threshold parameters. (The general case is apparently not tractable by simple processing at all; reliably marking internal periods of silence that belong to a track would need a machine-learning approach.)
Also, if it is just a couple of files, opening an editor such as Audacity is the quickest approach with the best outcome.
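As a combined sketch, trimming from the first long-enough silence and then padding a fixed amount of silence back on could look like the following. The build_trim_cmd helper is hypothetical, and the file names, the 320 kbps bitrate, and the one second of fresh silence are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: build a sox command that cuts everything from the
# first detected silence onward, then appends a fixed amount of silence.
build_trim_cmd() {
  in=$1
  out=$2
  pad=$3
  # silence -l 0 1 0.5 0.5%  : no start trimming, stop at >=0.5s below 0.5%
  # pad 0 $pad               : append $pad seconds of fresh silence
  echo "sox $in -C 320 $out silence -l 0 1 0.5 0.5% pad 0 $pad"
}

build_trim_cmd file_in.mp3 file_out.mp3 1
# -> sox file_in.mp3 -C 320 file_out.mp3 silence -l 0 1 0.5 0.5% pad 0 1
```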

Crossfade many audio files into one with FFmpeg?

Using FFmpeg, I am trying to combine many audio files into one long one, with a crossfade between each of them. To keep the numbers simple, let's say I have 10 input files, each 5 minutes, and I want a 10 second crossfade between each. (Resulting duration would be 48:30.) Assume all input files have the same codec/bitrate.
I was pleasantly surprised to find how simple it is to crossfade two files:
ffmpeg -i 0.mp3 -i 1.mp3 -vn -filter_complex acrossfade=d=10:c1=tri:c2=tri out.mp3
But the acrossfade filter does not allow 3+ inputs. So my naive solution is to repeatedly run ffmpeg, crossfading the previous intermediate output with the next input file. It's not ideal. It leads me to two questions:
1. Does acrossfade losslessly copy the streams? (Except where they're actively crossfading, of course.) Or do the entire input streams get reencoded?
If the input streams are entirely reencoded, then my naive approach is very bad. In the example above (calling acrossfade 9 times), the first 4:50 of the first file would be reencoded 9 times! If I'm combining 50 files, the first file gets reencoded 49 times!
2. To avoid multiple runs and the reencoding issue, can I achieve the many-crossfade behavior in a single ffmpeg call?
I imagine I would need some long filtergraph, but I haven't figured it out yet. Does anyone have an example of crossfading just 3 input files? From that I could automate the filtergraphs for longer chains.
Thanks for any tips!
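For question 2, the crossfades can be chained inside a single filter_complex, feeding each acrossfade stage's labelled output into the next. A sketch that generates such a filtergraph for n inputs (build_graph is a hypothetical helper; d=10 and the tri curves come from the question):

```shell
#!/bin/sh
# Hypothetical helper: chain n-1 acrossfade stages into one filtergraph.
# Stage i consumes the previous result and input [i], labelling its
# output [a<i>]; the final stage leaves its output implicit.
build_graph() {
  n=$1
  graph=""
  prev="[0]"
  i=1
  while [ "$i" -lt "$n" ]; do
    out="[a$i]"
    if [ "$i" -eq $((n - 1)) ]; then out=""; fi
    if [ -n "$graph" ]; then graph="$graph;"; fi
    graph="$graph$prev[$i]acrossfade=d=10:c1=tri:c2=tri$out"
    prev="[a$i]"
    i=$((i + 1))
  done
  echo "$graph"
}

build_graph 3
# -> [0][1]acrossfade=d=10:c1=tri:c2=tri[a1];[a1][2]acrossfade=d=10:c1=tri:c2=tri
```

With three inputs this would be used roughly as `ffmpeg -i 0.mp3 -i 1.mp3 -i 2.mp3 -vn -filter_complex "$(build_graph 3)" out.mp3`. As for question 1: filters operate on decoded PCM, so the audio is re-encoded either way; the benefit of the single run is that each sample is re-encoded only once rather than once per intermediate file.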

Envelope pattern in SoX (Sound eXchange) or ffmpeg

I've been using SoX to generate white noise. I'm after a way of modulating the volume across the entire track in a way that will create a pattern similar to this:
White noise envelope effect
I've experimented with fade, but that fades in to 100% volume and fades out to 0% volume, which is just a pain in this instance.
The tremolo effect isn't quite what I'm after either, as the frequency of the pattern will be changing over time.
The only other alternative is to split the white noise file into separate files, apply fade and then apply trim to either end so it doesn't fade all the way, but this seems like a lot of unnecessary processing.
I've been checking out this example Using SoX to change the volume level of a range of time in an audio file, but I don't think it's quite what I'm after.
I'm using the command-line in Ubuntu with SoX, but I'm open to suggestions with ffmpeg, or any other Linux based command-line solution.
With ffmpeg, you could use the volume filter:
ffmpeg -i input.wav -af \
"volume='if(lt(mod(t\,5)/5\,0.5), 0.2+0.8*mod(2*t\,5)/5\, 1.0-0.8*mod(t-(5/2)\,5)/(5/2))':eval=frame" \
output.wav
The expression in the filter above increases the volume from 0.2 to 1.0 over t=0 to t=2.5 seconds, then brings it gradually back down to 0.2 at t=5 seconds. The period of the envelope here is 5 seconds.
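As a quick sanity check, the same piecewise function can be evaluated outside ffmpeg. This is an awk sketch; envelope is a hypothetical stand-in for the filter expression, not part of ffmpeg:

```shell
#!/bin/sh
# Evaluate the filter's piecewise envelope at time t (seconds).
# awk's % operator works on floats, matching ffmpeg's mod().
envelope() {
  awk -v t="$1" 'BEGIN {
    if ((t % 5) / 5 < 0.5)
      v = 0.2 + 0.8 * ((2 * t) % 5) / 5      # rising half of the period
    else
      v = 1.0 - 0.8 * ((t - 2.5) % 5) / 2.5  # falling half
    printf "%.2f\n", v
  }'
}

for t in 0 1.25 2.5 3.75 5; do
  printf '%s -> %s\n' "$t" "$(envelope "$t")"
done
# prints, one per line: 0 -> 0.20, 1.25 -> 0.60, 2.5 -> 1.00,
#                       3.75 -> 0.60, 5 -> 0.20
```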

Sox: concatenate multiple audio files without a gap in between

I am concatenating multiple (max 25) audio files using SoX with
sox first.mp3 second.mp3 third.mp3 result.mp3
which does what it is supposed to: it concatenates the given files into one file. But unfortunately there is a small time gap between the files in result.mp3. Is there a way to remove this gap?
I am creating first.mp3, second.mp3 and so on before concatenating them, by merging multiple audio files (same length/format/rate):
sox -m drums.mp3 bass.mp3 guitar.mp3 first.mp3
How can I check and ensure that no time gap is added to any of those files (merged and concatenated)?
I need to achieve seamless playback of all the concatenated files (playing them in the browser one after another works fine).
Thank you for any help.
EDIT:
The exact example (without real file-names) of a command I am running is now:
sox "|sox -m file1.mp3 file2.mp3 file3.mp3 file4.mp3 -p" "|sox -m file1.mp3 file6.mp3 file7.mp3 -p" "|sox -m file5.mp3 file6.mp3 file4.mp3 -p" "|sox -m file0.mp3 file2.mp3 file9.mp3 -p" "|sox -m file1.mp3 file15.mp3 file4.mp3 -p" result.mp3
This merges files and pipes them directly into concatenation command. The resulting mp3 (result.mp3) has an ever so slight delay between concatenated files. Any ideas really appreciated.
The best (though least helpful) way to do this is not to use MP3 files as your source files. WAV, FLAC or M4A files don't have this problem.
MP3s aren't made up of fixed-rate samples, so cropping out a section of an arbitrary length will not work as you expect. Unless the encoder was smart (like lame), there will often be a gap at the start or end of the MP3 file's audio. I did a test with a sample 0.98s long (which is precisely 73½ CDDA frames, and many MP3 encoders use frames for minimum sample lengths). I then encoded the sample with three different MP3 encoders (lame, sox, and the ancient shine), then decoded those files with three decoders (lame, sox, and madplay). Here's how the sample lengths compare to the original:
Enc.→Dec.        Length  Samples  CDDA Frames
---------------  ------  -------  -----------
shine→lame       0.95"   42095    71.5901
shine→madplay    0.97"   42624    72.4898
shine→sox        0.97"   42624    72.4898
lame→lame        0.98"   43218    73.5000
*Original        0.98"   43218    73.5000
sox→sox          0.99"   43776    74.4490
sox→lame         1.01"   44399    75.5085
lame→madplay     1.02"   44928    76.4082
lame→sox         1.02"   44928    76.4082
sox→madplay      1.02"   44928    76.4082
Only the file encoded and decoded by lame ended up the same length (mostly because lame inserts a length tag to correct for these too-short samples, and knows how to decode it). Everything encoded by sox ended up with a tiny gap, no matter what decoder I used. So joining the files will result in tiny clicks.
Your browser is likely mixing and overlapping the source files very slightly so you don't hear the clicks. Gapless playback is hard to do correctly.
This is my guess for your issue:
sox does not add a time gap during concatenation;
however, it can add one in other operations, for instance if you do a conversion before the concatenation.
To find out what happens, I suggest you check the duration of every file at each step (you can use soxi, for instance) to see what's going on.
If that doesn't help (i.e. the time gap really is added during concatenation), let me make another guess:
sox adds a gap because the samples at the beginning or end of your files are not close to zero.
To solve this, you could apply a very short fade-in and fade-out to your files.
Moreover, to force sox to output files with a well-defined length, you could use the trim effect like this:
sox filein.mp3 fileout.mp3 trim 0 duration
First, you really need to check whether the start and end of your files contain silence. I don't know whether sox can do this directly, but you need to measure the energy (RMS, dB) at the start and end of the signal and cut any leading and trailing silence. To join audio files without gaps, apply a window function to the signal so it works like a fade-in/fade-out, and then crossfade the beginning of one file with the end of the other.
sox provides a splice effect to crossfade:
splice [-h|-t|-q] { position[,excess[,leeway]] }
Splice together audio sections. This effect provides two things over simple audio concatenation: a (usually short) cross-fade is applied at the join, and a wave similarity comparison is made to help determine the best place at which to make the join.
Check the SoX documentation for details.
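A sketch of applying splice at a join follows. The build_splice_cmd helper is hypothetical, the 0.1 s excess and file names are illustrative, and in practice the join position would come from `soxi -D part1.wav`:

```shell
#!/bin/sh
# Hypothetical helper: crossfade two files at their join with splice.
# $1 = duration of the first file in seconds, i.e. the splice position
#      within the concatenated audio.
build_splice_cmd() {
  echo "sox part1.wav part2.wav joined.wav splice -q $1,0.1"
}

# In practice: build_splice_cmd "$(soxi -D part1.wav)"
build_splice_cmd 12.5
# -> sox part1.wav part2.wav joined.wav splice -q 12.5,0.1
```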

glReadPixels is slower than X11-based screenshot

I am working on an OpenGL-based simulation application in which I need to take multiple screenshots per second. I have tried two ways of doing this in my application:
1) use glReadPixels
2) use an X11-based screenshot, e.g.: ffmpeg -f x11grab -s 1024x768 -i :0.0 output.png
I found that the second solution is about 3 times faster than the first. I expected the first solution to be faster, but in practice it is slower. I am curious why glReadPixels is slower.
glReadPixels (...) is a synchronous round-trip operation (when it is not used with a Pixel Buffer Object). You send it to GL, it has to finish all of the commands it has buffered up to that point and then it returns the result of that operation.
On the other hand, if you use a method defined by a window system to grab the contents of a window, the window system is free to implement the operation a number of different ways. Often you will get a copy of the last thing the window system actually presented, which may be 1 or more frames older than what you would get if you called glReadPixels (...) and waited for GL to finish drawing.
