sox - how to create mp3 file with bit rate 16kbps - audio

Currently the command used is
`sox input.wav -G -t mp3 -r 16k test.mp3`
But this is creating a file with bit rate 24.0 kbps.
How to make the bit rate of the out put file to 16.0 kbps?

In the sox formats manual you find, that it is the -C option. Below I quote the whole section because you may find it interesting.
However, if I call sox test.wav -C 16.01 test.mp3 my testfile (48kHz/16bit) is converted to 32kbps. If I call lame test.wav -b 16 -q 0 test.mp3 I get 16kbps but test.mp3 is converted to a samplerate of 8kHz. But if I really want to keep my 48kHz with lame test.wav -b 16 -q 0 --resample 48000 test.mp3 I also get 32kbps. So we see, there is a compromise between a high samplerate and a high compression ratio.
MP3 compressed audio; MP3 (MPEG Layer 3) is a part of the patent-encumbered MPEG standards for audio and video compression. It is a lossy compression format that achieves good compression rates with little quality loss.
Because MP3 is patented, SoX cannot be distributed with MP3 support without incurring the patent holder’s fees. Users who require SoX with MP3 support must currently compile and build SoX with the MP3 libraries (LAME & MAD) from source code, or, in some cases, obtain pre-built dynamically loadable libraries.
When reading MP3 files, up to 28 bits of precision is stored although only 16 bits is reported to user. This is to allow default behavior of writing 16 bit output files. A user can specify a higher precision for the output file to prevent lossing this extra information. MP3 output files will use up to 24 bits of precision while encoding.
MP3 compression parameters can be selected using SoX’s −C option as follows (note that the current syntax is subject to change):
The primary parameter to the LAME encoder is the bit rate. If the value of the −C value is a positive integer, it’s taken as the bitrate in kbps (e.g. if you specify 128, it uses 128 kbps).
The second most important parameter is probably "quality" (really performance), which allows balancing encoding speed vs. quality. In LAME, 0 specifies highest quality but is very slow, while 9 selects poor quality, but is fast. (5 is the default and 2 is recommended as a good trade-off for high quality encodes.)
Because the −C value is a float, the fractional part is used to select quality. 128.2 selects 128 kbps encoding with a quality of 2. There is one problem with this approach. We need 128 to specify 128 kbps encoding with default quality, so 0 means use default. Instead of 0 you have to use .01 (or .99) to specify the highest quality (128.01 or 128.99).
LAME uses bitrate to specify a constant bitrate, but higher quality can be achieved using Variable Bit Rate (VBR). VBR quality (really size) is selected using a number from 0 to 9. Use a value of 0 for high quality, larger files, and 9 for smaller files of lower quality. 4 is the default.
In order to squeeze the selection of VBR into the the −C value float we use negative numbers to select VRR. -4.2 would select default VBR encoding (size) with high quality (speed). One special case is 0, which is a valid VBR encoding parameter but not a valid bitrate. Compression value of 0 is always treated as a high quality vbr, as a result both -0.2 and 0.2 are treated as highest quality VBR (size) and high quality (speed).

Related

Using FFmpeg or Similar to Normalize audio in a video to EBU R128 standard

This is my first time here on stack overflow asking question.
I am stuck and really struggling with this. I am trying to make some of my MXF video files to be EBU r128 standard for its audio.
This means that it has to be -23 and not higher than 0.5.
My current process
Watch_folder > Encoding to MXF > Output_folder
I need to makesure when its comes to output folder, those MXF files are EBU R128 Loudness compliant.
What I have done so Far:
FFMPEG:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:print_format=json -f null -
got the result:
Input Integrated: -15.1 LUFS
Input True Peak: +0.0 dBTP
Input LRA: 17.1 LU
Input Threshold: -26.2 LUFS
Output Integrated: -17.1 LUFS
Output True Peak: -1.5 dBTP
Output LRA: 5.3 LU
Output Threshold: -27.6 LUFS
Normalization Type: Dynamic
Target Offset: +1.1 LU
then i did
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=-15.1:measured_LRA=17.1:measured_tp=0:measured_thresh=-27.6:offset=1.1 -ar 48k -y output.mxf
However, when i put it through the software Eff, it says that its not EBU compliant.
*EDIT:
This also reduces the quality. for example; my 6 Gb becomes 250 MB and you can tell the quality downgraded
ffmpeg-normalize
I did the following
ffmpeg-normalize input.mxf -c:a pcm_s32le -ar 48000 -o output.mxf
but this gives me errors.
if i do it without the output file type, i get a mkv which will not work for me. i need it to be mxf.
OK, a few issues here.
Firstly, if your file is measured at -26.2 LUFS, you'd need to add 3.2 dB to get it to -23. But you can't do that, because your true peak is too high (you'd be over full scale). You'll need to compress (dynamic audio compression, not file/rate compression) the audio or use at least a limiter to achieve this.
A good R128 audio track should be mixed properly rather than just run through a normaliser, otherwise you risk it either failing the standard or unwanted audio effects.
If you don't have access to audio editing software or someone who can do this for you, then FFMPEG does include an audio limiter, which will give you enough headroom to raise the level to -23 LUFS.
You can do that with something like this:
-filter_complex alimiter=level_in=1:level_out=1:limit=1.5:attack=7:release=100:level=disabled
However, tuning a limiter well depends on what the video file is of (music, speech, etc) and it is something that's worth taking some time over. Alter the attack and release values until you get the result you want.
Secondly, the reason that FFMPEG has produced a smaller file of lower quality is because you didn't specify anything in the video section. FFMPEG's default action with video is (usually) to encode to h264, so whatever your codec here is (I am assuming DNxHD from the fact that you're using an MXF wrapper) needs to be specified. FFMPEG will copy the video stream though and leave it alone if you include the option -c:v copy (which means copy video codec, basically).
Post your results once you have tried these...!

Difference between sampling rate, bit rate and bit depth

This is kind of a basic question which might sound too obvious to many of you , but I am getting confused so bad.
Here is what a Quora user says. Now It is clear to me what a Sampling rate is - The number of samples you take of a sound signal (in one second) is it's sampling rate.
Now my doubt here is - This rate should have nothing to do with the quantisation, right?
About bit-depth, Is the quantisation dependant on bit-depth? As in 32-bit (2^32 levels) and 64-bit (2^64 levels). Or is it something else?
and the bit-rate, is number of bits transferred in one second? If I an audio file says 320 kbps what does that really mean?
I assume the readers have got some sense on how I am panicking on where does the bit rate, and bit depth have significance?
EDIT: Also find this question if you have worked with linux OS and gstreamer framework.
Now my doubt here is - This rate should have nothing to do with the
quantisation, right?
Wrong. Sampling is a process that results in quantisation. Sampling, as the name implies, means taking samples (amplitudes) of a (usually) continuous signal (e.g audio) at regular time intervals and converting them to a different represantation thereof. In digital signal processing, this represantation is discrete (not continuous). An example of this process is a wave file (e.g recording your own voice and saving it as a wav).
About bit-depth, Is the quantisation dependant on bit-depth? As in
32-bit (2^32 levels) and 64-bit (2^64 levels). Or is it something
else?
Yes. The CD format, for example, has a bit depth of 16 (16 bits per sample). Bit depth is a part of the format of a sound (wave) file (along with the number of channels and sampling rate).
Since sound (think of a pure sine tone) has both positive and negative parts, I'd argue that you can represent (2^16 / 2) amplitude levels using 16 bits.
and the bit-rate, is number of bits transferred in one second? If I an
audio file says 320 kbps what does that really mean?
Yes. Bit rates are usually meaningful in the context of network transfers. 320 kbps == 320 000 bits per second. (for kilobit you multiply by 1000, rather than 1024)
Let's take a worked example 'Red-book' CD audio
The Bit depth is 16-bit. This is the number of bits used to represent each sample. This is intimately coupled with quantisation.
The Smaple-rate is 44.1kHz
The Frame-rate is 44.1kHz (two audio channels make up a stereo pair)
The Bit-rate is therefore 16 * 44100 * 2 = 1411200 bits/sec
There are a few twists with compressed audio streams such such as MP3 or AAC. In these, there is a non-linear relationship between bit-rate, sample-rate and bit-depth. The bit-rate is generally the maximum rate per-second and the efficiency of the codec is content dependant.

Batch amplification of PCM audio using sox

I have a large number of .PCM files (248 total) that are all encoded as:
Encoding: Signed 16-bit uncompressed PCM
Byte order: Little-endian
Channels: 2 channel (stereo)
Sample rate: 44100 Hz
8 Byte header
I need to apply a -7.5 db amplification (deamplification?) to every single one of these files.
The problem I have is that all of these tracks are looped, and I need to preserve the loop data (contained in the 8-byte header).
I've yet to see a batch audio editing problem that sox couldn't handle, so I'm hoping someone on here would know how to use sox to accomplish this, or failing that, know of a program that can do this for me.
Thanks for the help!
*Edit- A bit of research got me the exact encoding of the PCM audio I need to edit:
"The audio tracks are 44.1 kilohertz, 16-bit stereo uncompressed unsigned PCM files in little-endian order, left channel first, with a simple eight-byte header. The first four bytes spell out “MSU1” in ASCII. This is followed by a 32-bit unsigned integer used as the loop point, measured in samples (a sample being four bytes) – if the repeat bit is set in the audio state register, this value is used to determine where to seek the audio track to."
*Edit2-I've managed to develop the needed sox command, I just have no idea how to turn it into a batch. Also, turns out the files were 16-bit signed, not unsigned, PCM.
sox -t raw -e signed -b 16 -r 44100 -c 2 -L [filename].pcm -t raw -L [filename].raw vol -7.5dB
I'm fine with either a .BAT I drag and drop files onto or a .BAT that just converts every .PCM file in the folder.
Help appreciated, because I don't even know where to start looking for this one...

Audio format where silence would not affect file size

I'm looking for an audio format where a silence of a couple of hours at the beginning does not affect the overall file size. Has anyone any idea which one to use and what settings I have to use? I tried m4a, ogg and mp3 so far with no luck. An audio sample with 4 hours of silence in the beginning leads to a 400 MB file in some formats.
Of course, dealing with it programmatically would be the more sensible and SO way, something like SoX and the silence/pad effects. After all, any bit of silence is identical to any other bit of silence, trying to compress it is a bit of waste of effort.
Having said that, I was a little curious about this myself so I had a go at comparing how well the different codecs fared at compressing pure digital silence.
I created two test files. The first was a 44.1kHz 16bit 30 minutes long stereo WAVE file containing uncorrelated brown noise at -10.66 dBFS RMS. The second file was the same, except padded with 210 minutes of silence, making the total duration 240 minutes (or 4 hours). Next I encoded the files to various lossy and lossless codecs and looked at the size difference between the padded and unpadded files to gauge how efficiently the silence was encoded.
codec noise noise.silence diff ratio
wav 317.5 2540.0 2222.5 8.0
he-aac 14.6 116.5 101.9 8.0
vorbis 36.4 237.1 200.7 6.5
mp3 38.2 217.2 179.0 5.7
opus 27.0 81.6 54.6 3.0
tta 213.8 544.1 330.3 2.5
aac 54.0 131.7 77.7 2.4
wv 211.3 444.1 232.8 2.1
alac 212.5 393.7 181.2 1.9
flac 211.5 404.8 193.3 1.9
als 209.7 384.2 174.5 1.8
ofr 209.3 356.9 147.6 1.7
Codecs used:
Lossless
wav: WAVE
tta: True Audio v3.4.1
wv: WavPack v4.80.0 (wavpack -x)
alac: Apple Lossless
ofr: OptimFROG v5.100 (ofr --preset 2)
als: MPEG-4 Audio Lossless Coding v23 (mp4alsRM23 -a -b -o50)
flac: Free Lossless Audio Codec v1.3.1 (flac -8)
Lossy vbr
mp3: LAME MP3 v3.99.5 (lame -h -V2)
opus: Opus v1.1.2 (opusenc --bitrate 128 --framesize 40)
aac: Advanced Audio Codec v2.0 (afconvert -f 'm4af' -d aac -q 127 -s 3 -u vbrq 100)
vorbis: Vorbis aoTuV b5.5 (oggenc -q 5)
Lossy cbr
he-aac: High-Efficiency AAC v1 (afconvert -f 'm4af' -d aach -q 127 -s 0 -b 64000)
If you encode your audio file in .wav format, according to the "Multimedia Programming Interface and Data Specifications 1.0" at pages 56-60 you can encode, instead of the usual single "data" chunk, a "LIST" chunk of type 'wavl' alternating "data" and "slnt" chunks. For an interpretation of the obscure (and buggy) specification refer to the wikipedia page on the WAV format.
I'm not sure whether this helps, but if the size causes problems in storage or transfer, you can simply ZIP the wav and voilá! all the empty bytes disappear.
For usage you have to unpack it again though.
You might consider hacking the encoder to "pause" when it encounters more than a second or so of silence. Any of the codecs out there can be hacked to do this, though you will need to understand how they work before starting on changes like that...
Another option is to pipe the output of an MP3 encoder through a program that strips out "extra" silent frames. That might be less overall work (though you're still going to have to understand how MP3 framing & the Layer III bit reservoir work).

What exactly does bitrate mean in an video/audio file?

I use ffmpeg to convert videos from one format to another.
Is bitrate the only parameter which decides the output size of a video/audio file?
Yes, bitrate is essentially what will control the file size (for a given playback duration). It is the number of bits used to represent each second of material.
However, there are some subtleties, e.g. :
a video file encoded at a certain video bitrate probably contains a separate audio stream, with a separately-specified bitrate
most file formats will contain some metadata that won't be counted towards the basic video stream bitrate
sometimes the algorithm will not actually aim to achieve the specified bitrate - for example, using the CRF factor. http://trac.ffmpeg.org/wiki/x264EncodingGuide explains how two-pass would be preferred if targeting a specific file size.
So you may want to do a little experimenting with a particular set of options for a particular file format.
Bitrate describes the quality of an audio or video file.
For example, an MP3 audio file that is compressed at 192 Kbps will have a greater dynamic range and may sound slightly more clear than the same audio file compressed at 128 Kbps. This is because more bits are used to represent the audio data for each second of playback.
Similarly, a video file that is compressed at 3000 Kbps will look better than the same file compressed at 1000 Kbps. Just like the quality of an image is measured in resolution, the quality of an audio or video file is measured by the bitrate.

Resources