Currently, I use sox like this:
sox -d -e u-law --endian little -b 8 -c 1 -r 8000 -t ul - silence 1 0.3 1% 1 0.3 1%
For reference, this records audio from the default microphone and outputs little-endian, u-law formatted audio at 8 bits and an 8 kHz sample rate. The silence effect trims audio until the level stays above a threshold for 0.3 seconds, then continues to record until there are 0.3 seconds of silence. All of this streams to stdout, which I forward to a remote server.
I am using all of this to record a bit of voice and finish when I am done speaking. Specialized hardware triggers sox to start the recording. I can switch to almost any audio format or codec as long as it supports on-the-fly formatting/encoding. My target platform is Raspbian on the Raspberry Pi 2 B.
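To make the start/stop behaviour described above concrete, here is a minimal Python sketch of a threshold gate. It is only an illustration of the logic, not how sox implements the silence effect; the function name `gate` and its parameters are hypothetical. At 8000 samples per second, 0.3 seconds would correspond to runs of 2400 samples.

```python
def gate(levels, threshold, start_run, stop_run):
    """Return (start, end) sample indices of the kept region, or None.

    levels:    absolute amplitudes in [0, 1], one per sample.
    start_run: consecutive samples above threshold needed to start.
    stop_run:  consecutive samples at or below threshold needed to stop.
    """
    start = None
    run = 0
    for i, level in enumerate(levels):
        if start is None:
            # Waiting for sound: count consecutive loud samples.
            run = run + 1 if level > threshold else 0
            if run == start_run:
                start = i - start_run + 1  # keep the loud run itself
                run = 0
        else:
            # Recording: count consecutive quiet samples.
            run = run + 1 if level <= threshold else 0
            if run == stop_run:
                return start, i - stop_run + 1  # drop the trailing silence
    # Input ended before enough silence was seen (or sound never started).
    return (start, len(levels)) if start is not None else None
```

Note that a quiet stretch shorter than `stop_run` resets the counter as soon as the level rises again, so brief pauses do not end the recording.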
My ideal solution would be to use vad to stop the recording when the user is finished speaking. My hope is that this would work even with background chatter. However, the sox documentation on the vad effect states this:
The use of the norm effect is recommended, but remember that neither
reverse nor norm is suitable for use with streamed audio.
I haven't been able to piece parameters together to get vad and streaming working. Is it possible to use the vad effect to stop the recording of audio while still maintaining the stdin->sox->stdout piping? Are there better alternatives?
Is it possible to use the vad effect to stop the recording of audio while still maintaining the stdin->sox->stdout piping?
No. The vad effect can trim silence only from the front of the audio, so you could use it only to detect the start of the recording, not pauses or the end.
The reverse and norm effects need all of the input data before they produce any output, which is why they cannot be used with streaming.
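A small Python sketch shows why this is so for peak normalization. This is an illustration of the general technique, not SoX's code, and the name `normalize` is hypothetical: the gain depends on the loudest sample anywhere in the signal, so nothing can be written until the last sample has been read.

```python
def normalize(samples, target_peak=1.0):
    """Scale samples so that the loudest one reaches target_peak."""
    # The peak is a property of the whole signal, so every sample
    # must be available before the first output sample can be produced.
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Reversal has the same constraint for a simpler reason: the first output sample is the last input sample.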
The key is to choose a threshold for the silence effect that is high enough to treat "background chatter" as silence.
You could also use noisered (with a profile based on previous recordings) before silence, to reduce the chance of noise triggering the recording, but this will also affect the output and probably will not treat "background chatter" as noise.
Related
My machine is running Ubuntu 20 LTS. I want to manipulate live input audio in real time. I have achieved pitch shifting using sox with this command:
sox -t pulseaudio default -t pulseaudio null pitch +1000
and then routing the audio from "Monitor of Nullsink".
What I actually want to do is silence random parts of the input audio, with the duration chosen from a range. That is, randomly mute 1-2 s stretches of the input audio.
The final goal of this project is to write a script that manipulates my voice and makes it seem like my network is bad.
There is no restriction on the method. We may use any language, make an extension, directly manipulate the input audio with sox, ffmpeg, etc. Anything goes.
Found the solution by using trim in sox. The project can be found at
https://github.com/TathagataRoy1278/Bad_Internet_Audio_Modulator
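For readers who want to see the idea without opening the project, here is a minimal Python sketch of random muting. It is not taken from the linked repository and does not use sox trim; the function names and the gap lengths between mutes (`min_gap`, `max_gap`) are assumptions made for illustration. It first picks random 1-2 s mute spans, then zeroes the matching samples.

```python
import random


def mute_schedule(total_s, min_mute=1.0, max_mute=2.0,
                  min_gap=2.0, max_gap=6.0, seed=None):
    """Return a list of (start, end) times in seconds to mute."""
    rng = random.Random(seed)
    spans = []
    t = rng.uniform(min_gap, max_gap)  # audible stretch before the first mute
    while t < total_s:
        end = min(t + rng.uniform(min_mute, max_mute), total_s)
        spans.append((t, end))
        t = end + rng.uniform(min_gap, max_gap)  # audible stretch after it
    return spans


def apply_mutes(samples, rate, spans):
    """Return a copy of samples with every span replaced by zeros."""
    out = list(samples)
    for start, end in spans:
        lo = int(start * rate)
        hi = min(int(end * rate), len(out))
        for i in range(lo, hi):
            out[i] = 0
    return out
```

For example, `apply_mutes(samples, 8000, mute_schedule(60, seed=1))` would mute a reproducible set of spans in one minute of 8 kHz audio held in a list.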
I'm trying to detect speech volume above a threshold in short (2-3 second) audio files with sox, but the result always comes out at about 90% of max volume, regardless of whether the file contains silence or noise.
This is the command I'm using (I've tried varying the scale option):
sox noise.wav -n stats -s 99
If I shout with the microphone in my mouth, or bash it, I can get a detectable difference of about 95% volume, but it is a desktop-style microphone. Playing back the audio files, I can hear that silence was recorded, and there is still a big audible difference when I speak from a distance.
Is there a setting I'm missing, or has anyone else encountered this?
I want to mix about 20 audio streams with ffmpeg amix. However, as described here, amix has a weird way of making the input streams quieter the more of them you mix together:
"amix scales each input's volume by 1/n where n = no. of active inputs. This is evaluated for each audio frame. So when an input drops out, the volume of the remaining inputs is scaled by a smaller amount, hence their volumes increase"
How can I get rid of this annoying behaviour?
I just want the audio streams to keep the same loudness, since only one of them has actual audio in it at any given time anyway.
At the moment I end up with a file that is about 1/20 the loudness of the original, making it effectively unusable.
Adjust the volume of each stream by multiplying it by n (20 in your case), using the volume filter:
https://ffmpeg.org/ffmpeg-filters.html#volume
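The arithmetic behind this can be sketched in plain Python. This is a simplified model that assumes all n inputs stay active for the whole mix, so the 1/n factor is constant; it is not ffmpeg's implementation, and the function names are hypothetical.

```python
def amix_like(streams):
    """Mix equal-length streams by scaling each by 1/n and summing."""
    n = len(streams)
    return [sum(frame) / n for frame in zip(*streams)]


def compensate(mixed, n):
    """Undo the 1/n scaling by multiplying every sample by n."""
    return [s * n for s in mixed]


# One stream carries audio, the other 19 are silent.
voice = [0.5, -0.5, 0.25]
silent = [0.0, 0.0, 0.0]
mixed = amix_like([voice] + [silent] * 19)   # about 1/20 of the original level
restored = compensate(mixed, 20)             # back to about the original level
```

Because only one stream is non-silent at a time in your case, the multiplication should not clip. If several inputs were loud at once, multiplying by n could push the sum past full scale.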
I've been using SoX to generate white noise. I'm after a way of modulating the volume across the entire track in a way that will create a pattern similar to this:
White noise envelope effect
I've experimented with fade, but that fades in to 100% volume and fades out to 0% volume, which is just a pain in this instance.
The tremolo effect isn't quite what I'm after either, as the frequency of the pattern will be changing over time.
The only other alternative is to split the white noise file into separate files, apply fade and then apply trim to either end so it doesn't fade all the way, but this seems like a lot of unnecessary processing.
I've been checking out this example, "Using SoX to change the volume level of a range of time in an audio file", but I don't think it's quite what I'm after.
I'm using the command-line in Ubuntu with SoX, but I'm open to suggestions with ffmpeg, or any other Linux based command-line solution.
With ffmpeg, you could use the volume filter:
ffmpeg -i input.wav -af \
"volume='if(lt(mod(t\,5)/5\,0.5), 0.2+0.8*mod(2*t\,5)/5\, 1.0-0.8*mod(t-(5/2)\,5)/(5/2))':eval=frame" \
output.wav
The expression in the filter above increases the volume linearly from 0.2 to 1.0 between t=0 and t=2.5 seconds, then brings it gradually back down to 0.2 at t=5 seconds. The period of the envelope here is 5 seconds.
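As a sanity check on the arithmetic, the same triangular envelope can be written in plain Python. This is only a sketch for experimenting with values before putting them into the ffmpeg expression; the function name `envelope` and its parameters are mine, not part of ffmpeg.

```python
import math


def envelope(t, period=5.0, low=0.2, high=1.0):
    """Volume multiplier at time t (seconds) for a triangular envelope."""
    phase = math.fmod(t, period)   # position inside the current period
    half = period / 2
    if phase < half:
        # Rising half: low -> high over the first half of the period.
        return low + (high - low) * phase / half
    # Falling half: high -> low over the second half of the period.
    return high - (high - low) * (phase - half) / half
```

With the defaults, the rising branch equals 0.2+0.8*mod(2*t,5)/5 and the falling branch equals 1.0-0.8*mod(t-(5/2),5)/(5/2), matching the filter expression. Changing `period` over time would give an envelope whose frequency varies, which is what tremolo could not do.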
To detect speech I'm playing with this sox command:
rec voice.wav silence 1 5 30% 1 0:00:02 30%
It should start recording whenever the input volume rises above the threshold of 30%, and stop 2 seconds after the audio falls below the same threshold.
It works. But it would be much better if it could be "retriggerable". I mean: if the audio falls below the threshold and then rises again, it should continue recording (i.e. the user is still speaking).
It should stop only when it detects silence for a whole 2 seconds.
Or do you recommend any other "VOX" tool?
I've spent a lot of time experimenting with SoX to do VOX and have gotten it to work reasonably well. I've been using Audacity to view the resulting waveform, and have settled on the following SoX command...
rec snd.wav silence 1 .5 2.85% 1 1.0 3.0% vad gain -n : newfile : restart
This will:
wait until it hears activity above the threshold for a half second, then start recording (silence 1 .5 2.85%)
stop recording when audible activity stays below the threshold for one second (... 1 1.0 3.0%)
trim off any initial silence up to voice detection (vad)
normalize the gain (gain -n)
store the result into a new file (snd001.wav, snd002.wav)
restart the process
Getting the "silence" numbers correct involved a lot of trial and error, and will depend on ambient noise as well as the sensitivity of your microphone. I'm using the microphone in the Logitech QuickCam IM on a Raspberry Pi through USB.
On a side note, this whole thing complains with the following...
rec FAIL formats: can't open input `default': snd_pcm_open error: No such file or directory
... until I created this variable in the environment:
export AUDIODEV=hw:1,0
Again - this involved a lot of experimentation with the values for "silence", and it WILL need some tweaking for your environment.