I need to concat multiple mp3 files together, then adjust their volume, and then play the result via aplay. I currently do this using the following 3 commands:
sox file1.mp3 file2.mp3 file3.mp3 out1.wav
sox -v 0.5 out1.wav out2.wav
aplay -D plughw:1,0 out2.wav
This works correctly; the only minor issue is that it creates temporary files, and I know it should be possible to pipe all of these commands together somehow, something like:
sox file1.mp3 file2.mp3 file3.mp3 | sox -v 0.5 | aplay -D plughw:1,0
But I can't seem to get the piping to work (I am not really a Linux user). Any help would be much appreciated :)
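From the sox documentation it looks like each stage needs an explicit - filename plus a -t wav format hint so it knows it is reading/writing a pipe; something like the following (untested, and it also assumes aplay will read the WAV stream from stdin when no filename is given):
sox file1.mp3 file2.mp3 file3.mp3 -t wav - | sox -v 0.5 -t wav - -t wav - | aplay -D plughw:1,0
It might even collapse into a single sox call by using the vol effect (sox file1.mp3 file2.mp3 file3.mp3 -t wav - vol 0.5 | aplay -D plughw:1,0), but I haven't been able to verify either variant.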
I am using the SoX command-line tool on Linux inside a Makefile to interleave two raw (32-bit float) input audio files into one file:
make_combine:
sox \
--bits 32 --channels 1 --rate 48000 signal_1.f32 \
--bits 32 --channels 1 --rate 48000 signal_2.f32 \
--type raw --channels 2 --combine merge signal_mixed.f32
I ran into problems when signal_1 and signal_2 are of different lengths. How would I limit the mixed output to the shorter of the two inputs?
Use soxi -s to find the shortest file, e.g.:
samps=$(soxi -s signal_1.f32 signal_2.f32 | sort -n | head -n1)
Then use the trim effect to shorten the files, e.g. (untested):
sox --combine merge \
"| sox signal_1.f32 -p trim 0 ${samps}s" \
"| sox signal_2.f32 -p trim 0 ${samps}s" \
signal_mixed.f32
Note: If you want me to test it, provide some sample data.
I'm creating one of those "Brady Bunch" videos for a choir using a C# application I'm writing that uses ffmpeg for all the heavy lifting. For the most part it's working great, but I'm having trouble getting the audio levels just right.
What I'm doing right now is first "normalizing" the audio from the individual singers like this:
Extract audio into a WAV file using ffmpeg
Load the WAV file into my application using NAudio
Find the maximum 16-bit value
When I create the merged video, specify a volume for this stream that boosts the maximum value to 32767
So, for example, if I have 3 streams: stream A's maximum audio is 32767 already, stream B's maximum audio is 32000, and stream C's maximum audio is 16000, then when I merge these videos I will specify
[0:a]volume=1.0,aresample=async=1:first_pts=0[aud0]
[1:a]volume=1.02,aresample=async=1:first_pts=0[aud1]
[2:a]volume=2.05,aresample=async=1:first_pts=0[aud2]
[aud0][aud1][aud2]amix=inputs=3[a]
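(Outside the C# code, the peak measurement from steps 1–3 could probably be approximated from the command line with ffmpeg's astats filter; a rough sketch, with clipA.mp4 as a placeholder for one input video:
ffmpeg -y -i clipA.mp4 -vn -acodec pcm_s16le clipA.wav
ffmpeg -i clipA.wav -filter:a astats -f null - 2>&1 | grep "Max level"
The first command extracts the audio to 16-bit WAV, and the second prints astats' "Max level" line for each channel and overall.)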
(I have an additional "volume tweak" that lets me adjust the volume level of individual singers as necessary, but we can ignore that for this question)
I am reading the ffmpeg wiki on Audio Volume Manipulation, and I will implement that next, but I don't know what to do with the output it generates. It looks like I'm going to get mean and max volume levels in dB, and while I understand decibels in a "yeah, I learned about those in college 30 years ago" kind of way, I don't know how to use those values to normalize the audio of my input videos.
The problem is that in the ffmpeg output video, the audio level is quite low. If I do the same process of extracting the audio from the merged video that ffmpeg generated and looking at the WAV file, the maximum value is only 4904.
How do I implement an algorithm that automatically sets the output volume to a "reasonable" level? I realize I can simply add a manual volume filter and have the human set the level, but that's going to be a lot of back & forth of generating the merged video, listening to it, adjusting the level, merging again, etc. I want a way where my application figures out an appropriate output volume (possibly with human adjustment allowed).
EDIT
Asking ffmpeg to determine the mean and max volume of each clip does provide mean and max volume in dB, and I can then use those values to scale each input clip:
[0:a]volume=3.40dB,aresample=async=1:first_pts=0[aud0]
[1:a]volume=3.90dB,aresample=async=1:first_pts=0[aud1]
[2:a]volume=4.40dB,aresample=async=1:first_pts=0[aud2]
[3:a]volume=-0.00dB,aresample=async=1:first_pts=0[aud3]
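(For reference, the per-clip numbers above come from the volumedetect filter described on that wiki page, run along these lines, with clip0.mp4 standing in for one of my input videos:
ffmpeg -i clip0.mp4 -map 0:a -filter:a volumedetect -f null -
which logs mean_volume and max_volume values in dB.)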
But my final video is still strangely quiet. For now, I've added a manually-entered volume factor that gets applied at the very end:
[aud0][aud1][aud2]amix=inputs=3[a]
[a]volume=volume=3.00[b]
So my question is, in effect, how do I determine algorithmically what this final volume factor needs to be?
MORE EDIT
There's something deeper going on here: I just set the volume filter to 100 and the output is only slightly louder. Here are my filters and the relevant portions of the command line:
color=size=1920x1080:c=0x0000FF [base];
[0:v] scale=576x324 [clip0];
[0:a]volume=1.48,aresample=async=1:first_pts=0[aud0];
[1:v] crop=808:1022:202:276,scale=384x486 [clip1];
[1:a]volume=1.57,aresample=async=1:first_pts=0[aud1];
[2:v] crop=1160:1010:428:70,scale=558x486 [clip2];
[2:a]volume=1.66,aresample=async=1:first_pts=0[aud2];
[3:v] crop=1326:1080:180:0,scale=576x469 [clip3];
[3:a]volume=1.70,aresample=async=1:first_pts=0[aud3];
[4:a]volume=0.20,aresample=async=1:first_pts=0[aud4];
[5:a]volume=0.73,aresample=async=1:first_pts=0[aud5];
[6:v] crop=1326:1080:276:0,scale=576x469 [clip4];
[6:a]volume=1.51,aresample=async=1:first_pts=0[aud6];
[base][clip0] overlay=shortest=1:x=32:y=158 [tmp0];
[tmp0][clip1] overlay=shortest=1:x=768:y=27 [tmp1];
[tmp1][clip2] overlay=shortest=1:x=1321:y=27 [tmp2];
[tmp2][clip3] overlay=shortest=1:x=32:y=625 [tmp3];
[tmp3][clip4] overlay=shortest=1:x=672:y=625 [tmp4];
[aud0][aud1][aud2][aud3][aud4][aud5][aud6]amix=inputs=7[a];
[a]adelay=delays=200:all=1[b];
[b]volume=volume=100.00[c];
[c]asplit[a1][a2];
ffmpeg -y ....
-map "[tmp4]" -map "[a1]" -c:v libx264 "D:\voutput.mp4"
-map "[a2]" "D:\aoutput.mp3""
When I do this, the audio I want is louder (loud enough to clip and get distorted), but definitely not 100x louder.
After mixing the audio, run:
ffmpeg -i output.mp3 -filter:a volumedetect -map 0:a -f null /dev/null
and read the value from the line that looks like this:
[Parsed_volumedetect_0 @ 0x...] max_volume: -16.5 dB
Then add that value, with the sign flipped to positive, to the filter chain after the mix: ...]amix=inputs=7,volume=16.5dB[a]
[edit]
To be clear: do the measurement after mixing the audio, not on the individual inputs.
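Put together, a two-pass version might look like this (a rough sketch with hypothetical input names in0.mp4 ... in2.mp4 and a minimal filter graph; measure the peak of the plain mix first, then re-run with the inverse as make-up gain):
#!/bin/bash
# Pass 1: mix without gain correction and measure the peak of the result
ffmpeg -y -i in0.mp4 -i in1.mp4 -i in2.mp4 \
  -filter_complex "[0:a][1:a][2:a]amix=inputs=3[a]" -map "[a]" premix.wav
MAX=$(ffmpeg -hide_banner -i premix.wav -filter:a volumedetect -f null - 2>&1 \
  | grep max_volume | awk '{print $5}')
# Pass 2: apply the measured headroom (-max_volume) as make-up gain after amix
GAIN=$(awk -v m="$MAX" 'BEGIN { printf "%.1f", -m }')
ffmpeg -y -i in0.mp4 -i in1.mp4 -i in2.mp4 \
  -filter_complex "[0:a][1:a][2:a]amix=inputs=3,volume=${GAIN}dB[a]" \
  -map "[a]" final_mix.wav
In the real command the amix/volume chain would sit inside the full filter graph from the question rather than this minimal one.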
[update]
I did some investigation:
[update 2]
#!/bin/bash
# Mix an increasing number of MP3s and watch how the peak level of the mix drops.
f="input 1.mp3"
INP=("-ss" "30" "-i" "$f")
FCT=1
FLA="[0:a:0]aresample=async=1:first_pts=0[0a0]; "
AUD="[0a0]"
# Peak level of the first source file
MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $4, $5, $6}')
echo " $FCT $MAX"
for f in /mnt/sklad/Музыка/*.mp3; do
    # Add one more input and extend the filter graph accordingly
    INP+=("-ss" "30" "-i" "$f")
    FLA+="[${FCT}:a:0]aresample=async=1:first_pts=0[${FCT}a0]; "
    AUD+="[${FCT}a0]"
    ((FCT++))
    printf -v OUT "%02d" $FCT
    # Mix everything gathered so far into out_NN.mkv
    ffmpeg -v error -hide_banner "${INP[@]}" -filter_complex "${FLA} ${AUD}amix=inputs=${FCT}[a]" -map [a] -c:a aac -q:a 4 -t 30 -y "out_${OUT}.mkv"
    # Peak level of the source file that was just added
    MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $5, $6}')
    echo " $FCT $MAX"
done
# Peak level of each mixed output
for f in out_*.mkv; do
    MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $5, $6}')
    echo " $f $MAX"
done
output:
1 max_volume: -1.1 dB
2 -0.2 dB
3 0.0 dB
4 -1.9 dB
5 -0.1 dB
6 -0.9 dB
7 0.0 dB
8 0.0 dB
9 0.0 dB
10 0.0 dB
11 0.0 dB
12 0.0 dB
13 -0.5 dB
14 -1.1 dB
15 0.0 dB
16 0.0 dB
17 -0.0 dB
out_02.mkv -4.4 dB
out_03.mkv -5.0 dB
out_04.mkv -6.8 dB
out_05.mkv -7.1 dB
out_06.mkv -8.3 dB
out_07.mkv -8.9 dB
out_08.mkv -8.9 dB
out_09.mkv -8.8 dB
out_10.mkv -8.9 dB
out_11.mkv -9.7 dB
out_12.mkv -10.3 dB
out_13.mkv -11.1 dB
out_14.mkv -11.3 dB
out_15.mkv -10.6 dB
out_16.mkv -10.9 dB
out_17.mkv -11.2 dB
I get a different result, but there is still no strong pattern.
I am trying to convert a few .wav files to .mp3 format.
The desired .mp3 format is:
I tried with FFmpeg with this command:
ffmpeg -i input.wav -vn -ac 2 -b:a 160k output1.mp3
This is the output of this command on one .wav file:
I am getting a result, but two things are different: the overall bit rate mode and the writing library.
Writing library: LAME3.99.5 vs LAME3.100 (I think this shouldn't be a problem?)
Bit rate mode: Constant vs Variable
How do I change the bit rate mode from variable to constant? And do I need to convert using the same writing library?
Thanks!
The output using ffmpeg -i input.wav -vn -ac 2 -b:a 160k output1.mp3 is constant bit rate; however, ffmpeg writes a header with the title Xing, and MediaInfo infers that to indicate VBR. Disable writing that header if you want MediaInfo to detect constant bit rate:
ffmpeg -i input.wav -vn -ac 2 -b:a 160k -write_xing 0 output1.mp3
Note that the actual MP3 encoding won't change.
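To double-check how MediaInfo classifies the result, you can query the bit rate mode directly (assuming the mediainfo command-line tool is installed):
mediainfo --Inform="Audio;%BitRate_Mode%" output1.mp3
With the Xing header disabled it should report CBR.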
I ended up using sox instead of FFmpeg :
sox -t wav -r 48000 -b 16 -c 2 file.wav -C 160 -t mp3 sock33.mp3
Sample rate of 48 kHz (-r 48000)
16-bit depth (-b 16)
Two channels (-c 2)
160 kbps MP3 bitrate (-C 160)
So I'm currently trying to stream my microphone input from my Raspberry Pi (Raspbian) to some sort of network stream in order to receive it later on my phone.
To do this I use arecord -D plughw:1,0 -f dat -r 44100 to pipe the sound stream from my USB microphone to stdout, which works fine as far as I can see, but I need it to be a bit louder so I can understand people standing far away from it.
So I piped it to the sox play command like this:
arecord -D plughw:1,0 -f dat -r 44100 | play -t raw -b 16 -e signed -c 2 -v 7 -r 44100 - test.wav
(test.wav is just some random wav file; it doesn't work without it, and there is a whitespace between the - after 44100 and test.wav because I think - is a separate parameter:
SPECIAL FILENAMES (infile, outfile):
- Pipe/redirect input/output (stdin/stdout); may need -t
-d, --default-device Use the default audio device (where available))
I figured out that by using the -v parameter I can increase the volume.
This plays the recorded stream through the speakers I connected to the Raspberry Pi 3.
Final goal: pipe the volume-increased sound stream to stdout (or some FIFO pipe file) so I can read it from stdin inside another script and send it to my phone.
However, I'm very confused by the man page of the play command: http://sox.sourceforge.net/sox.html
I need to select the output device to be a pipe or stdout or something.
If you know a better way to just increase the volume of the (I think) "Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo" sound stream, let me know.
As far as I'm aware you can't pipe the output from play; you'll have to use the regular sox command for that.
For example:
# example sound file
sox -n -r 48k -b 16 test16.wav synth 2 sine 200 gain -9 fade t 0 0 0.1
# redundant piping
sox test16.wav -t wav - | sox -t wav - gain 8 -t wav - | play -
In the case of the command in your question, it should be sufficient to change play to sox and add -t wav to let sox know in what format you want to pipe the sound:
arecord -D plughw:1,0 -f dat -r 44100 | \
sox -t raw -b 16 -e signed -c 2 -v 7 -r 44100 - -t wav -
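The whole chain toward your final goal would then be something along these lines, with send_to_phone.sh standing in for whatever script consumes the WAV stream on its stdin:
arecord -D plughw:1,0 -f dat -r 44100 | \
sox -t raw -b 16 -e signed -c 2 -v 7 -r 44100 - -t wav - | \
./send_to_phone.sh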