Split audio file but only in pauses - audio

I've been playing around with sox, and with the trim command it should be fairly simple to split the whole audio into n parts (with a fixed length per part).
However, as I intend to split spoken recordings, a simple fixed-length split may cut in the middle of a word.
Is there a way to prevent that and make sure that parts contain "whole words"?

Take a look at the sox silence command on the sox webpage.
sox original.wav new.wav silence 1 0.5 2% 1 2.0 2% : newfile : restart
Parameters:
original.wav - the audio file to be split.
new.wav - the base name of the new audio files; a number is appended for each slice (new001.wav, new002.wav, new003.wav...).
silence - name of the effect.
1 0.5 2% - above_periods, duration, threshold.
1 2.0 2% - below_periods, duration, threshold.
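As a minimal sketch of how the command above could be driven from a script, the following Python builds the same argument list. The function name build_split_command and its keyword arguments are illustrative assumptions, not part of SoX; only the argument order mirrors the command shown above.

```python
import subprocess


def build_split_command(src, dst, start_dur="0.5", stop_dur="2.0", threshold="2%"):
    """Return the argv list for splitting `src` at pauses with SoX.

    Hypothetical helper: it only assembles the command shown above.
    """
    return [
        "sox", src, dst,
        "silence",
        "1", start_dur, threshold,       # above_periods, duration, threshold
        "1", stop_dur, threshold,        # below_periods, duration, threshold
        ":", "newfile", ":", "restart",  # open a new output file, then repeat the chain
    ]


cmd = build_split_command("original.wav", "new.wav")
print(" ".join(cmd))
# To actually run it (SoX must be installed and on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.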

Related

How to detect delay or silence in an audio file?

I want to detect silence or delay in audio for a given duration file and remove it. For example, if someone started speaking and then paused for some duration to think.
There's this question, but it only detects the silence at the end and doesn't remove it. My colleague suggested sox, but I'm not sure it's the best tool for the job, nor frankly how to use it; moreover, the project died in 2015.
The Sox man page describes this in detail.
silence [-l] above-periods [duration threshold[d|%]
[below-periods duration threshold[d|%]]
If we start with a sample command:
sox input.mp3 out.mp3 -S silence -l 1 0.2 1% -1 0.2 1%
`-S` - show progress
`silence` - the filter
`-l` - leave x amount of each silence intact
`1` - trim from 1st silence [above-periods]
`0.2` - amount of each silence to leave untouched [duration]
`1%` - test for near absolute (0%) silence [threshold]
`-1` - trim silence from the middle of the file [below-periods]
`0.2` - amount of each silence to leave untouched [duration]
`1%` - test for near absolute (0%) silence [threshold]
The detail courtesy of Sox:
silence [-l] above-periods [duration threshold[d|%]
[below-periods duration threshold[d|%]]
Removes silence from the beginning, middle, or end of the audio.
`Silence' is determined by a specified threshold.
The above-periods value is used to indicate if audio should be trimmed at
the beginning of the audio. A value of zero indicates no silence should
be trimmed from the beginning. When specifying a non-zero above-periods,
it trims audio up until it finds non-silence. Normally, when trimming
silence from beginning of audio the above-periods will be 1 but it can be
increased to higher values to trim all audio up to a specific count of
non-silence periods. For example, if you had an audio file with two songs
that each contained 2 seconds of silence before the song, you could
specify an above-period of 2 to strip out both silence periods and the first
song.
When above-periods is non-zero, you must also specify a duration and
threshold. duration indicates the amount of time that non-silence must be
detected before it stops trimming audio. By increasing the duration,
bursts of noise can be treated as silence and trimmed off.
threshold is used to indicate what sample value you should treat as
silence. For digital audio, a value of 0 may be fine but for audio
recorded from analog, you may wish to increase the value to account for
background noise.
When optionally trimming silence from the end of the audio, you specify a
below-periods count. In this case, below-period means to remove all
audio after silence is detected. Normally, this will be a value of 1 but
it can be increased to skip over periods of silence that are wanted. For
example, if you have a song with 2 seconds of silence in the middle and 2
seconds at the end, you could set below-period to a value of 2 to skip
over the silence in the middle of the audio.
For below-periods, duration specifies a period of silence that must exist
before audio is not copied any more. By specifying a higher duration,
silence that is wanted can be left in the audio. For example, if you
have a song with an expected 1 second of silence in the middle and 2
seconds of silence at the end, a duration of 2 seconds could be used to skip
over the middle silence.
Unfortunately, you must know the length of the silence at the end of your
audio file to trim off silence reliably. A workaround is to use the
silence effect in combination with the reverse effect. By first reversing
the audio, you can use the above-periods to reliably trim all audio from
what looks like the front of the file. Then reverse the file again to
get back to normal.
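The reverse workaround described in the paragraph above can be sketched as a single effects chain: trim the leading silence, reverse, trim what is now the leading silence, and reverse back. The helper name build_trim_both_ends and the default duration and threshold are assumptions chosen for illustration, not values from the manual.

```python
def build_trim_both_ends(src, dst, duration="0.1", threshold="1%"):
    """Return a SoX argv list that trims silence from both ends of `src`.

    Hypothetical helper. Only above-periods is used, applied once on the
    original audio and once on the reversed audio.
    """
    lead_trim = ["silence", "1", duration, threshold]  # above_periods=1
    return (
        ["sox", src, dst]
        + lead_trim      # strip silence at the start
        + ["reverse"]    # the old end is now the start
        + lead_trim      # strip what used to be trailing silence
        + ["reverse"]    # restore the original direction
    )


print(" ".join(build_trim_both_ends("in.wav", "out.wav")))
```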
To remove silence from the middle of a file, specify a below-periods that
is negative. This value is then treated as a positive value and is also
used to indicate that the effect should restart processing as specified
by the above-periods, making it suitable for removing periods of silence
in the middle of the audio.
The option -l indicates that below-periods duration length of audio
should be left intact at the beginning of each period of silence. For
example, if you want to remove long pauses between words but do not want
to remove the pauses completely.
duration is a time specification with the peculiarity that a bare number
is interpreted as a sample count, not as a number of seconds. For
specifying seconds, either use the t suffix (as in `2t') or specify minutes,
too (as in `0:02').
threshold numbers may be suffixed with d to indicate the value is in
decibels, or % to indicate a percentage of maximum value of the sample
value (0% specifies pure digital silence).
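To relate the two threshold notations, here is a small worked conversion. It assumes that the percentage refers to linear amplitude relative to full scale and that the decibel value follows the usual 20*log10 amplitude convention; the function names are illustrative only.

```python
import math


def percent_to_db(percent):
    """Convert a percentage of full-scale amplitude to decibels (assumed 20*log10)."""
    if percent <= 0:
        return float("-inf")  # 0% is pure digital silence
    return 20.0 * math.log10(percent / 100.0)


def db_to_percent(db):
    """Convert decibels relative to full scale back to a percentage of amplitude."""
    return 100.0 * 10.0 ** (db / 20.0)


for pct in (2.0, 1.0, 0.1):
    print(f"{pct}% is about {percent_to_db(pct):.1f} dB")
```

Under these assumptions a 1% threshold corresponds to roughly -40 dB, which helps when comparing SoX percentages with the dB thresholds used by other tools.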
Finally, a Python sample that enables monitoring of the output:
from subprocess import Popen, PIPE, STDOUT

try:
    # -S makes sox report progress; stderr is merged into stdout so both can be read from one pipe
    self.comm = Popen(['sox', self.orig_file, self.new_file, '-S', 'silence',
                       '-l', '1', '0.1', '1%', '-1', str(self.secs.GetValue()), '1%'],
                      stdout=PIPE, stderr=STDOUT, universal_newlines=True)
except Exception as e:
    ......
It's worth pointing out that ffmpeg has the filters silencedetect and silenceremove.
I do use silencedetect, e.g.:
ffmpeg -hide_banner -stats -i interview.wav -af silencedetect=noise=0dB:d=3 -vn -sn -dn -f null -
However, I've found silenceremove to be less dependable, e.g.:
ffmpeg -hide_banner -v quiet -i interview.wav -af silenceremove=stop_periods=-1:stop_duration=2:stop_threshold=-3dB -vn -sn -dn -f wav - | ffplay -hide_banner -v quiet -autoexit -i -
It should also be pointed out that silence is notoriously difficult to pin down, except on an individual/ad-hoc basis, due to background noise.

Remove start of new track from end of audio file

I have a couple of recordings where the end of the track is followed by silence and then a fraction of the start of a new track.
I tried to remove the start of the new track from the end of the file.
My command (*nix):
sox file_in.mp3 -C 320 file_out.mp3 silence 1 0.75 0.2% -1 0.75 0.2%
Preferably I would keep the silence at the end, or just add some new silence. Any help appreciated.
You can probably use the below-periods parameter without above-periods.
A bit of silence can optionally be kept with the -l parameter (or you can just use pad with a fixed amount).
For example:
sox input.mp3 -C 320 output.mp3 silence -l 0 1 0.5 0.5%
Note that if the beginning of some new tracks has long silent periods inside, it may be tricky, but usually possible, to find appropriate duration and threshold parameters. The general case is apparently not tractable with simple processing: one would need a machine learning approach to reliably mark inner periods of silence that belong to a track.
Also, if it is just a couple of files, opening an editor such as Audacity is the quickest approach and gives the best result.

Trim audio file and get the part between silence

My goal is to get the parts of an audio file that contain non-noise sounds by using SoX. I have read about the effects of SoX and found noisered and silence, which I consider helpful. The problem is that I have not found a command that can trim the audio file based on the silent pauses in it.
I believe that what you are looking for can be achieved with a sox silence command. It allows you to remove the silence from any part of the audio given a threshold, durations above it, etc.
For a detailed manual please refer to the sox webpage, the silence section is very well written.
If you want to split at silence and not to "squeeze" everything together, then you might want to try something like:
sox input.wav slice.wav silence 1 1.0 2% 1 3.0 2% : newfile : restart
Parameters are:
input.wav - input audio file
slice.wav - output audio files name (numbers will be appended to each slice)
silence - effect name
1 1.0 2% - above_periods, duration, threshold
1 3.0 2% - below_periods, duration, threshold

How to batch split audio files wherever there is silence?

I am using the following command in SoX to split many large audio files at each place where there is silence longer than 0.3 seconds:
sox -V3 input.wav output.wav silence 1 0.50 0.1% 1 0.3 0.1% : newfile : restart
This however ends up occasionally creating files that are entirely silent and trimming the audio before each break.
I found better results with Audacity, but I need to split hundreds of WAV files and Audacity cannot even open 10 files simultaneously without freezing.
How can I use SoX or similar software to split the files at the end of the 0.3 second periods of silence, such that the silent portion is still affixed to the end of the speaking, but not before and there are no clips that are entirely silent, unless they come from the beginning of input.wav?
If you change 0.5 to 3.0, it works fine:
sox -V3 input.wav output.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart
You didn't specify any programming language, so I assume that you're not especially looking for a way to program it yourself (which makes it a bit off-topic here). It wouldn't be very hard to do by the way.
Anyway, maybe this does the trick for you:
http://www.nch.com.au/splitter/
You can set a threshold in dB to split. I guess that when you set it to 0dB, you'll get all the audio that you need per slice.
sox -V3 orig.wav p.wav silence -l 0 1 0.5 0.1% : newfile : restart
This works for me.
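For the batch part of the question (hundreds of WAV files), a script can generate one such command per input file. This is only a sketch: the function name build_batch_commands, the output naming scheme, and the directory names are assumptions, and the effect parameters are copied from the last command above without any claim that they suit every recording.

```python
from pathlib import Path


def build_batch_commands(input_paths, output_dir):
    """Build one SoX split command per input file (hypothetical helper).

    Each input `name.wav` gets the output base `output_dir/name_.wav`,
    to which SoX appends a running number for every slice.
    """
    commands = []
    for src in sorted(Path(p) for p in input_paths):
        dst = Path(output_dir) / f"{src.stem}_.wav"
        commands.append([
            "sox", "-V3", str(src), str(dst),
            "silence", "-l", "0", "1", "0.5", "0.1%",
            ":", "newfile", ":", "restart",
        ])
    return commands


# Example: collect every WAV file in the current directory.
for cmd in build_batch_commands(Path(".").glob("*.wav"), "slices"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True) would execute it (requires `import subprocess`).
```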

Detecting and printing timestamps of periods of silence using SoX

I am trying to output the begin-timestamps of periods of silence (since there is background noise, by silence I mean a threshold) in a given audio file. Eventually, I want to split the audio file into smaller audio files, given these timestamps. It is important that no part of the original file be discarded.
I tried
sox in.wav out.wav silence 1 0.5 1% 1 2.0 1% : newfile : restart
(courtesy http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/)
Although it somewhat did the job, it also trimmed and discarded the periods of silence, which I do not want to happen.
Is 'silence' the right option, or is there a simpler way to accomplish what I need to do?
Thanks.
Unfortunately not Sox, but ffmpeg has a silencedetect filter that does exactly what you're looking for:
ffmpeg -i in.wav -af silencedetect=noise=-50dB:d=1 -f null -
(detection threshold of -50 dB, for a minimum of 1 second, cribbed from the ffmpeg documentation)
...this would print a result like this:
Press [q] to stop, [?] for help
[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612
size=N/A time=00:04:29.53 bitrate=N/A
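Log lines of this shape can be turned into (start, end) pairs with a short parser. The sketch below assumes the log text has already been captured (ffmpeg writes it to stderr); the function name parse_silencedetect is made up for illustration.

```python
import re

# Match the numeric value after each key, regardless of the log prefix.
START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silencedetect(log_text):
    """Return a list of (start, end) times in seconds from silencedetect output."""
    intervals = []
    start = None
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = END_RE.search(line)
        if match and start is not None:
            intervals.append((start, float(match.group(1))))
            start = None
    return intervals


sample = """\
[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612
size=N/A time=00:04:29.53 bitrate=N/A
"""
print(parse_silencedetect(sample))
```

A silence_start with no matching silence_end (for example when the file ends in silence) is simply dropped by this sketch.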
There is (currently, at least) no way to make the silence effect output the position where it has detected silence, or to retain all of the silent audio.
If you are able to recompile SoX yourself, you could add an output statement yourself to find out about the cut positions, then use trim in a separate invocation to split the file. With the stock version, you are out of luck.
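Once cut positions are known from some other source, the "use trim in a separate invocation" step can be sketched as follows. Because each slice starts exactly where the previous one ends, no part of the original is discarded. The function name, the output naming, and the example cut points are assumptions for illustration.

```python
def build_trim_commands(src, cut_points, prefix="part"):
    """Build SoX trim commands that split `src` at the given times (in seconds).

    Hypothetical helper: the slices are contiguous, so concatenating them
    should reproduce the original audio.
    """
    bounds = [0.0] + sorted(cut_points)
    commands = []
    for i, start in enumerate(bounds):
        out = f"{prefix}{i + 1:03d}.wav"
        cmd = ["sox", src, out, "trim", f"{start:.3f}"]
        if i + 1 < len(bounds):
            # trim takes a start position followed by a length
            cmd.append(f"{bounds[i + 1] - start:.3f}")
        # the last slice has no length, so it runs to the end of the file
        commands.append(cmd)
    return commands


for cmd in build_trim_commands("in.wav", [10.0, 25.5]):
    print(" ".join(cmd))
```

A natural choice of cut point is somewhere inside each detected pause, for instance its midpoint, so that neither neighbouring slice starts or ends mid-word.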
SoX can easily give you the timestamp of every sample, silent or not, in a text file. It does not report periods of silence as such, but you can calculate those with a simple script. From the SoX documentation:
.dat Text Data files. These files contain a textual representation of the sample data. There is one line at the beginning that contains the sample
rate, and one line that contains the number of channels. Subsequent lines contain two or more numeric data items: the time since the beginning of
the first sample and the sample value for each channel.
Values are normalized so that the maximum and minimum are 1 and -1. This file format can be used to create data files for external programs such as
FFT analysers or graph routines. SoX can also convert a file in this format back into one of the other file formats.
Example containing only 2 stereo samples of silence:
; Sample Rate 8012
; Channels 2
0 0 0
0.00012481278 0 0
So you can do sox in.wav out.dat, then parse the text file and treat a sequence of rows with values close to 0 (depending on your threshold) as a silence.
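A minimal sketch of that parsing step is shown below. The function name find_silences, the default threshold, and the minimum duration are assumptions; a minimum duration is needed because ordinary speech also passes through sample values near zero.

```python
def find_silences(dat_text, threshold=0.01, min_duration=0.5):
    """Return (start, end) times of quiet runs in SoX .dat text.

    A row counts as quiet when every channel value is within `threshold`
    of zero; runs shorter than `min_duration` seconds are ignored.
    """
    runs = []
    run_start = None
    last_time = None
    for line in dat_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and the "; Sample Rate" / "; Channels" header
        fields = [float(x) for x in line.split()]
        t, samples = fields[0], fields[1:]
        if all(abs(s) <= threshold for s in samples):
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start >= min_duration:
                runs.append((run_start, t))
            run_start = None
        last_time = t
    # close a quiet run that lasts until the end of the file
    if run_start is not None and last_time - run_start >= min_duration:
        runs.append((run_start, last_time))
    return runs


demo = """\
; Sample Rate 4
; Channels 1
0 0.5
0.25 0
0.5 0
0.75 0
1.0 0.4
1.25 0
"""
print(find_silences(demo))
```

For long recordings the .dat file gets large, so reading it line by line from the file rather than loading it as one string may be preferable.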
necroposting:
You can run a separate script that iterates over all of the sox output files (for f in *.wav) and uses the command soxi -D $f to obtain the duration of each sound clip.
Then get the system time in seconds with date "+%s" and subtract to find the time the recording starts.

Resources