Audio format where silence would not affect file size - audio

I'm looking for an audio format where a silence of a couple of hours at the beginning does not affect the overall file size. Has anyone any idea which one to use and what settings I have to use? I tried m4a, ogg and mp3 so far with no luck. An audio sample with 4 hours of silence in the beginning leads to a 400 MB file in some formats.

Of course, dealing with it programmatically would be the more sensible (and more SO-appropriate) way, with something like SoX and its silence/pad effects. After all, any bit of silence is identical to any other bit of silence, so trying to compress it is a bit of a waste of effort.
Having said that, I was a little curious about this myself so I had a go at comparing how well the different codecs fared at compressing pure digital silence.
I created two test files. The first was a 44.1kHz 16bit 30 minutes long stereo WAVE file containing uncorrelated brown noise at -10.66 dBFS RMS. The second file was the same, except padded with 210 minutes of silence, making the total duration 240 minutes (or 4 hours). Next I encoded the files to various lossy and lossless codecs and looked at the size difference between the padded and unpadded files to gauge how efficiently the silence was encoded.
codec     noise (MB)  noise+silence (MB)  diff (MB)  ratio
wav            317.5              2540.0     2222.5    8.0
he-aac          14.6               116.5      101.9    8.0
vorbis          36.4               237.1      200.7    6.5
mp3             38.2               217.2      179.0    5.7
opus            27.0                81.6       54.6    3.0
tta            213.8               544.1      330.3    2.5
aac             54.0               131.7       77.7    2.4
wv             211.3               444.1      232.8    2.1
alac           212.5               393.7      181.2    1.9
flac           211.5               404.8      193.3    1.9
als            209.7               384.2      174.5    1.8
ofr            209.3               356.9      147.6    1.7
Codecs used:
Lossless
wav: WAVE
tta: True Audio v3.4.1
wv: WavPack v4.80.0 (wavpack -x)
alac: Apple Lossless
ofr: OptimFROG v5.100 (ofr --preset 2)
als: MPEG-4 Audio Lossless Coding v23 (mp4alsRM23 -a -b -o50)
flac: Free Lossless Audio Codec v1.3.1 (flac -8)
Lossy vbr
mp3: LAME MP3 v3.99.5 (lame -h -V2)
opus: Opus v1.1.2 (opusenc --bitrate 128 --framesize 40)
aac: Advanced Audio Coding v2.0 (afconvert -f 'm4af' -d aac -q 127 -s 3 -u vbrq 100)
vorbis: Vorbis aoTuV b5.5 (oggenc -q 5)
Lossy cbr
he-aac: High-Efficiency AAC v1 (afconvert -f 'm4af' -d aach -q 127 -s 0 -b 64000)
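As a sanity check on the wav row, the size of uncompressed PCM can be worked out from the stated format (44.1 kHz, 16 bit, stereo). This is only a sketch: the helper name is made up, and it assumes decimal megabytes and ignores the small WAVE header.

```python
def wav_payload_mb(minutes, sample_rate=44100, bits=16, channels=2):
    """Size of raw PCM audio in decimal megabytes (header ignored)."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return bytes_per_second * minutes * 60 / 1_000_000


noise = wav_payload_mb(30)    # the 30 minute noise file
padded = wav_payload_mb(240)  # the same file padded to 4 hours
print(f"noise: {noise:.2f} MB, padded: {padded:.2f} MB, ratio: {padded / noise:.1f}")
```

The result lands within rounding of the wav figures in the table, which is what you would expect for a format that stores silence sample by sample.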

If you encode your audio file in .wav format, then according to the "Multimedia Programming Interface and Data Specifications 1.0" (pages 56-60) you can write, instead of the usual single "data" chunk, a "LIST" chunk of type 'wavl' that alternates "data" and "slnt" chunks. For an interpretation of the obscure (and buggy) specification, refer to the Wikipedia page on the WAV format.
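To make the layout concrete, here is a rough sketch that writes one "slnt" chunk followed by one "data" chunk inside a 'wavl' LIST. It is based on my reading of the spec, so treat the details as assumptions: the function names are invented, "slnt" is taken to hold a single 32-bit count of silent samples, and whether that count is per channel is exactly the kind of thing the spec leaves unclear. I would not expect common players to understand the result.

```python
import struct


def chunk(tag, payload):
    """One RIFF chunk: 4-byte tag, little-endian size, payload, padded to even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return tag + struct.pack("<I", len(payload)) + payload + pad


def wav_with_leading_silence(pcm, silent_samples, rate=44100, channels=2, bits=16):
    """Build a WAVE file whose audio is a 'wavl' list: silence first, then PCM data."""
    block_align = channels * bits // 8
    # Standard PCM format chunk (format tag 1 = PCM).
    fmt = struct.pack("<HHIIHH", 1, channels, rate, rate * block_align, block_align, bits)
    # Assumption: 'slnt' carries one DWORD, the number of silent samples.
    slnt = chunk(b"slnt", struct.pack("<I", silent_samples))
    wavl = chunk(b"LIST", b"wavl" + slnt + chunk(b"data", pcm))
    body = b"WAVE" + chunk(b"fmt ", fmt) + wavl
    return b"RIFF" + struct.pack("<I", len(body)) + body


# 210 minutes of silence at 44.1 kHz, followed by two stereo frames of placeholder audio.
blob = wav_with_leading_silence(b"\x01\x00\x02\x00\x03\x00\x04\x00", 210 * 60 * 44100)
print(len(blob), "bytes")
```

The point is that the silence costs a fixed 12 bytes regardless of its duration.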

I'm not sure whether this helps, but if the size causes problems in storage or transfer, you can simply ZIP the WAV and voilà, all the empty bytes disappear.
For usage you have to unpack it again though.
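A quick way to see how well this works is to run zero bytes through zlib, which uses the same DEFLATE algorithm as ZIP. A minimal sketch, where the 10 MiB buffer stands in for 16-bit digital silence:

```python
import zlib


def compressed_size(raw):
    """Number of bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(raw, 9))


silence = bytes(10 * 1024 * 1024)  # 10 MiB of zero bytes
size = compressed_size(silence)
print(f"{len(silence)} bytes -> {size} bytes")
```

Real recordings of "silence" usually contain low-level noise, so they will not shrink anywhere near as much as pure digital zeros.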

You might consider hacking the encoder to "pause" when it encounters more than a second or so of silence. Any of the codecs out there can be hacked to do this, though you will need to understand how they work before starting on changes like that...
Another option is to pipe the output of an MP3 encoder through a program that strips out "extra" silent frames. That might be less overall work (though you're still going to have to understand how MP3 framing & the Layer III bit reservoir work).
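Whichever route you take, the first step is locating the long silent stretches. The sketch below does that on decoded PCM samples, not on MP3 frames, so it only illustrates the detection part. The function name and the "exactly zero" definition of silence are my assumptions; real material would need a threshold.

```python
def silent_runs(samples, min_len):
    """Return (start, length) for every run of at least min_len zero-valued samples."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value == 0:
            if start is None:
                start = i  # a new silent run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the input.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples) - start))
    return runs


print(silent_runs([0, 0, 0, 5, 0, 7, 0, 0], min_len=2))
```

With one second as the cut-off, `min_len` would be the sample rate (times the channel count for interleaved data).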

Related

Using FFmpeg or Similar to Normalize audio in a video to EBU R128 standard

This is my first time asking a question here on Stack Overflow.
I am stuck and really struggling with this. I am trying to make the audio of some of my MXF video files comply with the EBU R128 standard.
This means that the integrated loudness has to be -23 LUFS, within a tolerance of 0.5 LU.
My current process
Watch_folder > Encoding to MXF > Output_folder
I need to make sure that when the MXF files reach the output folder, they are EBU R128 loudness compliant.
What I have done so far:
FFMPEG:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:print_format=json -f null -
got the result:
Input Integrated: -15.1 LUFS
Input True Peak: +0.0 dBTP
Input LRA: 17.1 LU
Input Threshold: -26.2 LUFS
Output Integrated: -17.1 LUFS
Output True Peak: -1.5 dBTP
Output LRA: 5.3 LU
Output Threshold: -27.6 LUFS
Normalization Type: Dynamic
Target Offset: +1.1 LU
Then I did:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=-15.1:measured_LRA=17.1:measured_tp=0:measured_thresh=-27.6:offset=1.1 -ar 48k -y output.mxf
However, when I put it through the software Eff, it says that it's not EBU compliant.
EDIT:
This also reduces the quality. For example, my 6 GB file becomes 250 MB and you can tell the quality has been downgraded.
ffmpeg-normalize
I did the following
ffmpeg-normalize input.mxf -c:a pcm_s32le -ar 48000 -o output.mxf
But this gives me errors.
If I do it without the output file type, I get an MKV, which will not work for me. I need it to be MXF.
OK, a few issues here.
Firstly, if your file is measured at -26.2 LUFS, you'd need to add 3.2 dB to get it to -23. But you can't do that, because your true peak is too high (you'd be over full scale). You'll need to compress (dynamic audio compression, not file/rate compression) the audio or use at least a limiter to achieve this.
A good R128 audio track should be mixed properly rather than just run through a normaliser, otherwise you risk it either failing the standard or unwanted audio effects.
If you don't have access to audio editing software or someone who can do this for you, then FFMPEG does include an audio limiter, which will give you enough headroom to raise the level to -23 LUFS.
You can do that with something like this:
-filter_complex alimiter=level_in=1:level_out=1:limit=1.5:attack=7:release=100:level=disabled
However, tuning a limiter well depends on what the video file is of (music, speech, etc) and it is something that's worth taking some time over. Alter the attack and release values until you get the result you want.
Secondly, the reason that FFMPEG has produced a smaller file of lower quality is because you didn't specify anything in the video section. FFMPEG's default action with video is (usually) to encode to h264, so whatever your codec here is (I am assuming DNxHD from the fact that you're using an MXF wrapper) needs to be specified. FFMPEG will copy the video stream though and leave it alone if you include the option -c:v copy (which means copy video codec, basically).
Post your results once you have tried these...!
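For the watch-folder setup, the two-pass loudnorm run from the question can be scripted so the measured values are not copied by hand. This is only a sketch under a few assumptions: the function names are mine, the first pass was run with print_format=json so ffmpeg prints a flat JSON object at the end of its log, and the key names (input_i, input_lra, input_tp, input_thresh, target_offset) are the ones I know loudnorm to emit. Note that, as far as I understand the filter, measured_thresh should be the input threshold, not the output one.

```python
import json


def extract_loudnorm_json(log_text):
    """Pull the last {...} block out of ffmpeg's log and parse it."""
    start = log_text.rindex("{")
    end = log_text.rindex("}") + 1
    return json.loads(log_text[start:end])


def second_pass_filter(measured, target_i=-23.0, target_lra=7.0, target_tp=-2.0):
    """Build the loudnorm filter string for the second pass from first-pass measurements."""
    return (
        f"loudnorm=I={target_i}:LRA={target_lra}:TP={target_tp}"
        f":measured_I={measured['input_i']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
    )


def second_pass_command(src, dst, measured):
    """Argument list for ffmpeg; -c:v copy leaves the video stream untouched."""
    return ["ffmpeg", "-i", src, "-af", second_pass_filter(measured),
            "-c:v", "copy", "-ar", "48k", "-y", dst]


# Values taken from the first-pass output in the question.
sample = {"input_i": "-15.10", "input_lra": "17.10", "input_tp": "0.00",
          "input_thresh": "-26.20", "target_offset": "1.10"}
print(" ".join(second_pass_command("input.mxf", "output.mxf", sample)))
```

The resulting list can be handed to subprocess.run; whether the output then passes an R128 check still depends on the material, as discussed above.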

Remove audio streams from a .m2ts video file

I have a video which has 3 audio streams in the file. The first one is English and the other ones are in different languages. How can I get rid of these other audio streams without losing the quality of the video and the English stream?
I think ffmpeg should be used but I don't know how to do it.
Video
Bit rate mode: Variable
Overall bit rate: 38.6 Mb/s
Chroma subsampling: 4:2:0
Audio
Format: DTS-HD
Compression mode: Lossless

mkv file out of sync with linear drift

I have a bunch of mkv files, with FLAC as the audio codec and FFV1 as the video one.
The files were created using an EasyCap acquisition dongle from an analog VCR source. Specifically, I used VLC's "open acquisition device" prompt and selected PAL. Then, I converted the files (audio PCM, video raw YUV) to (FLAC, FFV1) using
ffmpeg.exe -i input.avi -acodec flac -vcodec ffv1 -level 3 -threads 4 -coder 1 -context 1 -g 1 -slices 24 -slicecrc 1 output.mkv
Now, the files are progressively out of sync. It may be due to the fact that while (maybe) the video has a constant framerate, the FLAC track has variable framerate. So, is there a way to sync the track to audio, or something alike? Can FFmpeg do this? Thanks
EDIT
On Mulvya's hint, I plotted the difference in sync at various times; the first column shows the seconds elapsed, the second shows the difference in seconds. The plot seems to behave linearly, with 0.0078 as a constant slope. NOTE: the measurements were taken by hand, with a stopwatch.
EDIT 2
Playing around with VirtualDub, I found that changing the framerate to 25 fps from the original 24.889 (Video->Frame rate...->Change frame rate to) and using the track converted to wav definitely does work. Two problems, though: VirtualDub crashes when importing the original FFV1-FLAC mkv file, so I had to convert the video to H264 to try it out; moreover, I find it difficult to use an external encoder to save VirtualDub's output.
So, could I avoid using VirtualDub, and simply use ffmpeg for it? Here's the exported vdscript:
VirtualDub.audio.SetSource("E:\\4_track2.wav", "");
VirtualDub.audio.SetMode(0);
VirtualDub.audio.SetInterleave(1,500,1,0,0);
VirtualDub.audio.SetClipMode(1,1);
VirtualDub.audio.SetEditMode(1);
VirtualDub.audio.SetConversion(0,0,0,0,0);
VirtualDub.audio.SetVolume();
VirtualDub.audio.SetCompression();
VirtualDub.audio.EnableFilterGraph(0);
VirtualDub.video.SetInputFormat(0);
VirtualDub.video.SetOutputFormat(7);
VirtualDub.video.SetMode(3);
VirtualDub.video.SetSmartRendering(0);
VirtualDub.video.SetPreserveEmptyFrames(0);
VirtualDub.video.SetFrameRate2(25,1,1);
VirtualDub.video.SetIVTC(0, 0, 0, 0);
VirtualDub.video.SetCompression();
VirtualDub.video.filters.Clear();
VirtualDub.audio.filters.Clear();
The first line imports the wav-converted audio track.
Can I set an equivalent pipe in ffmpeg (possibly, using FLAC - not wav)? SetFrameRate2 is maybe the key, here.

sox - how to create mp3 file with bit rate 16kbps

Currently the command used is
`sox input.wav -G -t mp3 -r 16k test.mp3`
But this creates a file with a bit rate of 24.0 kbps.
How can I make the bit rate of the output file 16.0 kbps?
In the SoX formats manual you will find that this is done with the -C option. Below I quote the whole section because you may find it interesting.
However, if I call `sox test.wav -C 16.01 test.mp3`, my test file (48kHz/16bit) is converted to 32kbps. If I call `lame test.wav -b 16 -q 0 test.mp3`, I get 16kbps but test.mp3 is converted to a sample rate of 8kHz. But if I really want to keep my 48kHz with `lame test.wav -b 16 -q 0 --resample 48000 test.mp3`, I also get 32kbps. So we see there is a trade-off between a high sample rate and a high compression ratio.
MP3 compressed audio; MP3 (MPEG Layer 3) is a part of the patent-encumbered MPEG standards for audio and video compression. It is a lossy compression format that achieves good compression rates with little quality loss.
Because MP3 is patented, SoX cannot be distributed with MP3 support without incurring the patent holder’s fees. Users who require SoX with MP3 support must currently compile and build SoX with the MP3 libraries (LAME & MAD) from source code, or, in some cases, obtain pre-built dynamically loadable libraries.
When reading MP3 files, up to 28 bits of precision is stored although only 16 bits is reported to the user. This is to allow default behavior of writing 16 bit output files. A user can specify a higher precision for the output file to prevent losing this extra information. MP3 output files will use up to 24 bits of precision while encoding.
MP3 compression parameters can be selected using SoX’s -C option as follows (note that the current syntax is subject to change):
The primary parameter to the LAME encoder is the bit rate. If the value of the -C value is a positive integer, it’s taken as the bitrate in kbps (e.g. if you specify 128, it uses 128 kbps).
The second most important parameter is probably "quality" (really performance), which allows balancing encoding speed vs. quality. In LAME, 0 specifies highest quality but is very slow, while 9 selects poor quality, but is fast. (5 is the default and 2 is recommended as a good trade-off for high quality encodes.)
Because the -C value is a float, the fractional part is used to select quality. 128.2 selects 128 kbps encoding with a quality of 2. There is one problem with this approach. We need 128 to specify 128 kbps encoding with default quality, so 0 means use default. Instead of 0 you have to use .01 (or .99) to specify the highest quality (128.01 or 128.99).
LAME uses bitrate to specify a constant bitrate, but higher quality can be achieved using Variable Bit Rate (VBR). VBR quality (really size) is selected using a number from 0 to 9. Use a value of 0 for high quality, larger files, and 9 for smaller files of lower quality. 4 is the default.
In order to squeeze the selection of VBR into the -C value float we use negative numbers to select VBR. -4.2 would select default VBR encoding (size) with high quality (speed). One special case is 0, which is a valid VBR encoding parameter but not a valid bitrate. Compression value of 0 is always treated as a high quality vbr, as a result both -0.2 and 0.2 are treated as highest quality VBR (size) and high quality (speed).
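Since the float encoding is easy to get wrong, here is a small sketch that builds the -C value from its parts, following only the rules in the manual text quoted above. The function name and its arguments are made up for illustration; it does not call SoX.

```python
def sox_mp3_compression(rate, quality=None, vbr=False):
    """Build the value for SoX's -C option for MP3 output.

    rate:    bitrate in kbps, or the VBR level 0-9 when vbr is True
    quality: LAME quality 0-9, or None to keep the encoder default
    """
    if quality is not None and not 0 <= quality <= 9:
        raise ValueError("quality must be between 0 and 9")
    if quality is None:
        fraction = ""
    elif quality == 0:
        fraction = ".01"  # a fraction of 0 means "default", so .01 selects quality 0
    else:
        fraction = f".{quality}"
    sign = "-" if vbr else ""  # negative values select VBR
    return f"{sign}{rate}{fraction}"


print(sox_mp3_compression(16, quality=0))           # 16 kbps CBR, highest quality
print(sox_mp3_compression(4, quality=2, vbr=True))  # default VBR level, quality 2
```

The string goes straight after -C on the command line, e.g. the first call gives the 16.01 used in the example above.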

How can I validate a video file from a script?

I have a server with lots of video files. After a restore, I noticed that the checksum of a couple of files changed. Since I don't have checksums for all files, I wanted to write a script to verify the file integrity. It's simple for archives (tar t, unzip -t, rar t, etc) or images (convert image.jpg /tmp/test.png).
Which options do I need to pass to mplayer or vlc or any other video tool on Linux to achieve the same effect (i.e. validate the file contents without having to watch the whole video)?
It sounds like what you want to do is:
mplayer -vo null -ao null input.file
and then parse the output and return value to see if it could actually play & decode the stream. This will take some time (but be faster than realtime). If you want something even faster, here are some more suggestions:
One easy thing is going to be to do an
mplayer -identify -vo null -ao null
on the file, and then parse the output and look at the return value for something that looks reasonable.
With respect to the checksums being incorrect, it's going to be hard to know if this is an issue for your media player or not (mplayer, vlc, totem, etc.). A good media player will tolerate many bit or byte level errors with little impact on the resulting playback. A very strict media player will exit when it sees malformed or incorrect codec & wrapper bytes.
To verify the wrapper (container) bytes, you could do something like
mencoder -ovc copy -oac copy input.file -o output.file
The problem is that mencoder will want to create an .avi file for output. If your inputs are .avi, then this will work great.
You can run a similar ffmpeg commandline, like this:
ffmpeg -i input.file -acodec copy -vcodec copy output.file
If the files are .mp4 files, you might want to take a look at mp4box ( http://www.videohelp.com/tools/mp4box ) for doing a similar task. The matroska tools are also good for this kind of thing. ( http://www.matroska.org/ )
If you are working with MP4 files you may want to have a look at the mpeg4ip project, specifically the tools like mp4videoinfo or mp4info. This may be enough to meet your needs, and is very quick.
From the front page:
mp4dump Utility to dump MP4 file meta-information in text form
mp4trackdump Utility to dump MP4 file track information in text form
mp4info Utility to display MP4 file summary
mp4videoinfo Utility to dump information about MP4 file video tracks
avidump Utility to display AVI file summary
yuvdump Utility to display a raw video file on the screen
mpeg_ps_info Utility to display streams in an mpeg program stream or vob file
mpeg_ps_extract Utility to extract elementary streams in an mpeg program stream or vob file
Here is some sample output of a MP4 taken on my Nokia N95:
manoa:Movies stu$ mp4info 20081017001.mp4
mp4info version 1.5.0.1
20081017001.mp4:
Track Type Info
1 video MPEG-4 Unknown Profile(4), 3.620 secs, 2700 kbps, 640x480 @ 23.480663 fps
2 audio MPEG-4 AAC LC, 3.797 secs, 97 kbps, 48000 Hz
manoa:Movies stu$
manoa:Movies stu$
manoa:Movies stu$ mp4videoinfo 20081017001.mp4
mp4videoinfo version 1.5.0.1
tracks 1
mp4file 20081017001.mp4, track 1, samples 85, timescale 30000
sampleId 1, size 24110 time 0(0) VOP-I
sampleId 2, size 9306 time 4076(135) VOP-P
sampleId 3, size 13071 time 5104(170) VOP-P
... (a bunch more frames and a bit of info) ...
sampleId 59, size 8702 time 64975(2165) VOP-P
sampleId 60, size 8826 time 65980(2199) VOP-P
sampleId 61, size 9819 time 66966(2232) GOV VOP-I
sampleId 62, size 5591 time 67986(2266) VOP-P
... (a bunch more frames and a bit of info) ...
sampleId 83, size 10188 time 105546(3518) VOP-P
sampleId 84, size 6533 time 106585(3552) VOP-P
sampleId 85, size 6032 time 107601(3586) VOP-P
manoa:Movies stu$
Short of watching all the videos, there's no "perfect" way to do this.
Video files are quite robust - as an experiment I took a random MPEG-4 video file, opened it in a hex-editor and started changing bytes.. mplayer and Quicktime still played it back without errors.
I had to delete thousands of bytes before getting any error from mplayer:
...
[mpeg4 @ 0x6762b0]marker does not match f_code
[mpeg4 @ 0x6762b0]marker does not match f_code
[mpeg4 @ 0x6762b0]concealing 852 DC, 852 AC, 852 MV errors
[mpeg4 @ 0x6762b0]header damaged: 0.055 16/ 16 15% 1% 3.5% 0 0
Error while decoding frame!
It wouldn't be difficult to write a script that runs mplayer on each video and checks the output for error messages or warnings, but unless the changed bytes are in the file header, or a lot of data was changed, you'll never find them all.
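A script along those lines might look like the sketch below. The marker strings come from the mplayer log quoted above and are certainly not exhaustive; the function names are mine, and the command is the `mplayer -vo null -ao null` call suggested earlier in this thread. It needs Python 3.7 or later.

```python
import subprocess

# Substrings seen in the damaged-file log above; extend the list as needed.
ERROR_MARKERS = (
    "Error while decoding frame",
    "header damaged",
    "concealing",
    "does not match",
)


def find_errors(log_text):
    """Return the log lines that contain any known error marker."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in ERROR_MARKERS)]


def check_video(path, command=("mplayer", "-vo", "null", "-ao", "null")):
    """Decode one file without displaying it; return (looks_ok, suspicious_lines)."""
    result = subprocess.run(
        [*command, path],
        stdin=subprocess.DEVNULL,  # keep the player from reading the keyboard
        capture_output=True,
        text=True,
        errors="replace",
    )
    problems = find_errors(result.stdout + "\n" + result.stderr)
    return result.returncode == 0 and not problems, problems
```

Calling `check_video` for every file and collecting the ones that come back as not ok gives a list of candidates to inspect, with the caveat already mentioned: a clean run does not prove the file is undamaged.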
Since mplayer's companion tool mencoder can convert from one video format to another, that might be good enough for such a test, assuming mencoder returns an error if it cannot decode the input file (I have not tested that). This would work similarly to the image test you mentioned (convert image.jpg /tmp/test.png).
I recommend you use sha1sum, a command line tool that you probably already have (and if not, you probably also have md5sum, which would be fine for this job)... All you need to do is compare the stdout of sha1sum before and after the restore...
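If you would rather do the comparison from a script than diff the sha1sum output by hand, something like this sketch would do. The function names are mine; the idea is to build one manifest before and one after the restore and compare them.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file, read in chunks so large videos fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-1 digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha1_of_file(full)
    return manifest


def changed_files(before, after):
    """Paths whose digest differs, or which are missing, in the second manifest."""
    return sorted(path for path in before if after.get(path) != before[path])
```

This only helps for files that were hashed before something went wrong, which is the limitation the question already ran into.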
