Hi, is it possible to turn off dithering in ffmpeg while doing an audio format conversion? I came across a switch called dither_scale, which can be set to 0 while resampling, but I don't know how to turn off dithering while converting to a different file format (16-bit PCM).
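For example, the resampling route I came across would look roughly like this (file names are placeholders, and I am not sure it is the right approach for a plain format conversion):
ffmpeg -i input.flac -af aresample=dither_scale=0 -c:a pcm_s16le output.wav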
I have used FFmpeg to extract decibel (or RMS? I am not familiar with the units) values of the audio volume from an MP4. I have 20 samples per frame.
How can I use these values (which are negative in almost all frames) to determine whether a frame is silent or has audio (music, speech, etc.)?
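For concreteness, classifying one frame against a fixed cutoff could look like this (the -50 dB cutoff and the readings are made-up values):

# volume readings for one frame, in dB (negative values, 0 dB = full scale)
frame_db_values = [-62.1, -58.4, -71.0, -64.9]  # hypothetical data
SILENCE_THRESHOLD_DB = -50.0                    # assumed cutoff

# treat the frame as silent if every reading falls below the cutoff
is_silent = all(v < SILENCE_THRESHOLD_DB for v in frame_db_values)
print("silent" if is_silent else "has audio")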
I'm currently developing a new file format: a video with a custom color representation. There is a constant array of RGBA colors, and every byte of a frame is an index into this array, so each pixel fits in a single byte.
So I'm looking for a way to compress videos in such a format. My first idea was to create this video format myself (which unfortunately failed); the second idea is H.264, but I don't know whether there's any way to use H.264 this way. So is there? Or maybe there's another way to compress such video data (other than general-purpose compressors like gzip, lzma, bzip2, 7zip and so on)?
Please don't close this question; I'll add any requested details if needed.
The best I can suggest for such an idea would be to encode it in the 4:0:0 (monochrome) colorspace in lossless mode in H.264. For x264 this would mean the options --input-csp i400 --output-csp i400 --qp 0. But I doubt motion compensation would work well in such a palette colorspace.
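With ffmpeg and libx264, a rough equivalent (frame size, rate and file names are assumptions, and gray/4:0:0 support depends on the x264 build) would be to feed the index bytes as raw 8-bit grayscale frames and encode them losslessly:
ffmpeg -f rawvideo -pix_fmt gray -video_size 640x480 -framerate 25 -i indices.raw -c:v libx264 -qp 0 -pix_fmt gray output.mkv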
I currently have the idea of coding a small audio converter application (e.g. FLAC to MP3 or M4A) in C# or Python, but my problem is that I do not know how audio conversion works at all.
After some research, I came across analog-to-digital / digital-to-analog converters, but I guess this would be digital-to-digital conversion or something like that, wouldn't it?
If someone could precisely explain how it works, it would be greatly appreciated.
Thanks.
Digital audio in its raw form is called PCM, the uncompressed format fundamental to any audio processing system. It is just a series of integers representing the height of the audio curve at each sample point (the Y axis, where time is the X axis along the curve).
This PCM audio can be compressed with some codec and then bundled inside a container, often together with video or metadata channels. So to convert audio from A to B, you first need to understand A's container spec as well as its compressed audio codec so you can decompress audio A into PCM, and then do the reverse: compress the PCM with B's codec and bundle it into B's container.
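In practice a tool like ffmpeg handles the demux, decode, encode and mux steps for you; as a rough sketch (file names and quality settings are just examples), a FLAC-to-MP3 conversion can be split into an explicit decode-to-PCM step and an encode step:
ffmpeg -i input.flac -c:a pcm_s16le intermediate.wav
ffmpeg -i intermediate.wav -c:a libmp3lame -q:a 2 output.mp3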
Before venturing further, I suggest you master the art of WAVE audio files. The beauty of WAVE is that it is just a 44-byte header followed by the uncompressed integers of the audio curve. Write some code to read a WAVE file and parse the header (identify bit depth, sample rate, channel count, endianness) so you can iterate across each audio sample of each channel. Prove that it works by sending your bytes into an output WAVE file, then diff the input WAVE against the output WAVE; they should be identical. Once you have mastered that, you are ready to venture into your stated goal. Do not skip over the notion of interleaving stereo audio, nor of spreading a single 16-bit sample across two bytes of storage and the reverse, namely stitching multiple bytes back into a single integer with a bit depth of 16, 24 or even 32 bits while keeping endianness squared away. This may sound scary at first, but all the necessary details are on the net; that is how I taught myself this level of detail.
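Here is a minimal sketch of that round-trip exercise in Python, using only the standard-library wave module (file names are placeholders; the copy may differ from the source if the source carries extra header chunks):

import wave

# read the input WAVE file and inspect its header fields
with wave.open("input.wav", "rb") as src:
    channels = src.getnchannels()    # channel count
    sampwidth = src.getsampwidth()   # bytes per sample (2 = 16-bit)
    framerate = src.getframerate()   # sample rate in Hz
    nframes = src.getnframes()       # frames (one frame = one sample per channel)
    pcm = src.readframes(nframes)    # raw interleaved little-endian PCM bytes

# stitch the first two bytes into one signed 16-bit sample (assuming sampwidth == 2)
first_sample = int.from_bytes(pcm[0:2], byteorder="little", signed=True)
print(channels, sampwidth * 8, framerate, nframes, first_sample)

# write the same samples back out, then diff input.wav against copy.wav
with wave.open("copy.wav", "wb") as dst:
    dst.setnchannels(channels)
    dst.setsampwidth(sampwidth)
    dst.setframerate(framerate)
    dst.writeframes(pcm)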
Modern audio compression algorithms leverage knowledge of how people perceive sound to discard information that is indiscernible (lossy), as opposed to lossless algorithms, which retain all the informational load of the source. Opus (http://opus-codec.org/) is a current favorite codec: unencumbered by patents and open source.
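As a quick example of handing that compression step to ffmpeg (the bitrate is just an illustrative value), a WAVE file can be encoded to Opus with:
ffmpeg -i input.wav -c:a libopus -b:a 96k output.opus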
I have received a request to encode DPX files to MOV/MJPEG rather than MOV/H.264 (which ffmpeg picks by default if you convert to output.mov). This is to review compositing renders (in motion), so color accuracy is critical.
Comparing a sample "ideal" MOV to the current (H.264) output I can see:
resolution: the same
ColorSpace/Primaries: Rec601 (SD) versus Rec709 (HD)
YUV: 4:2:0 versus 4:4:4
filesize: smaller (for the H.264 output)
The ffmpeg default seems to give better quality and results in a smaller filesize. Is there something I'm missing?
Maybe it's because MJPEG frames are independent of each other, so any snippet of video can be decoded or copied in isolation. With an inter-frame compression algorithm like H.264, the decoder may have to read data from many other frames to reconstruct any given one.
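If the MJPEG deliverable is still required, something along these lines (the sequence pattern, frame rate and quality value are assumptions to be matched against the sample file) keeps full 4:4:4 chroma:
ffmpeg -framerate 24 -i render_%04d.dpx -c:v mjpeg -pix_fmt yuvj444p -q:v 1 output.mov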
I want to identify sections in an .mp4 (H.264 + AAC) video that are both silent and made of unchanged frames, and cut them out.
Of course there would be some fine-tuning regarding thresholds and algorithms to measure unchanged frames.
My question is more general: how would I go about automating this?
Is it possible to solve this with ffmpeg (preferably driven from C or Python)?
How can I programmatically analyse the audio?
How can I programmatically analyse video frames?
For audio silence, see this.
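For example, ffmpeg's silencedetect filter logs silent ranges; the noise floor and minimum duration below are just starting values to tune:
ffmpeg -i test.mp4 -af silencedetect=noise=-40dB:d=1 -f null -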
For detecting still video scenes, ffmpeg might not be the ideal tool.
You could use scene change detection with a low threshold to find the specific frames, then extract those frames and compare them with something like ImageMagick's compare function:
ffprobe -show_frames -print_format compact -f lavfi "movie=test.mp4,select=gt(scene\,.1)"
compare -metric RMSE frame1.png frame0.png
I don't expect this to work very well.
Your best bet is to use something like OpenCV to find differences between frames.
OpenCV Simple Motion Detection
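A minimal sketch of that frame-differencing idea in Python with OpenCV (the file name and the "unchanged" threshold are assumptions to tune):

import cv2

cap = cv2.VideoCapture("test.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
frame_index = 1

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # mean absolute pixel difference against the previous frame
    diff = cv2.absdiff(gray, prev_gray).mean()
    if diff < 2.0:  # assumed threshold for "unchanged"
        print("still frame at index", frame_index)
    prev_gray = gray
    frame_index += 1

cap.release()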