I'm making an mp3 from a flac file with ffmpeg. This is usually humdrum for me.
Tonight, for some reason, the converted audio is distorted even though I'm using the same commands I've always used. After troubleshooting, it appears the problem is the "-out_sample_rate" option.
My command:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k -b:a 320k output.mp3
The audio in the mp3 is then badly distorted: the gain is pushed up so far that it causes digital clipping.
I've tried updating ffmpeg, and the problem remains. I've tried converting various sample rates (44.1k, 48k and 96k source files) to both 44.1k and 48k mp3s, and the problem appears whenever there's a conversion.
I'm on macOS, and I installed ffmpeg via homebrew.
Any ideas?
Are you sure the distortion comes from resampling?
Even the poorest resampling algorithm doesn't distort. More typical artifacts of poor resampling are harsh high frequencies caused by aliasing, and quantisation noise.
FFmpeg's resampler isn't the very best, but it isn't bad at all and is good enough for average use. It shouldn't cause distortion.
How much headroom does the source file have?
If there isn't enough, the resampling or the MP3 conversion may lead to clipping. The MP3 encoder removes frequencies from the signal (even at 320 kbps), so the waveform changes and its peaks can rise above the original level.
So re-import the encoded MP3 into an audio editor and look for clipping.
If you're not sure which step the distortion comes from, split the command up and check which step leads to clipping:
ffmpeg -i input.flac -dither_method modified_e_weighted -out_sample_rate 44.1k intermediate.flac
ffmpeg -i intermediate.flac -write_id3v1 1 -id3v2_version 3 -b:a 320k output.mp3
There should be at least 1 dB of headroom left before the audio is converted to MP3. If not, lower the gain first.
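One way to check the headroom without an audio editor is FFmpeg's volumedetect filter, which logs the peak level as max_volume. The sketch below assumes bash and awk; gain_for_headroom is a hypothetical helper name, and the -1 dB target simply mirrors the 1 dB figure above. It only computes how much to attenuate; the actual gain change is done with FFmpeg's volume filter, shown in the usage comments.

```shell
# Hypothetical helper: reads ffmpeg's volumedetect log on stdin and prints the
# gain (in dB) needed to bring the peak down to the target (default -1 dB).
# Prints 0.0 if there is already enough headroom. If no max_volume line is
# found, the peak is treated as 0 dBFS.
gain_for_headroom() {
  awk -v target="${1:--1}" '
    /max_volume:/ {
      for (i = 1; i <= NF; i++) if ($i == "max_volume:") peak = $(i + 1)
    }
    END {
      g = target - peak
      if (g > 0) g = 0   # never boost, only attenuate
      printf "%.1f\n", g
    }'
}

# Usage (requires ffmpeg):
# gain=$(ffmpeg -hide_banner -i intermediate.flac -af volumedetect -f null - 2>&1 | gain_for_headroom -1)
# ffmpeg -i intermediate.flac -af "volume=${gain}dB" intermediate_lowered.flac
```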
If resampling to intermediate.flac leads to a significant increase in amplitude, the original input.flac is poorly mastered. If so (and the quality really matters), do the sample-rate conversion in an audio editor (e.g. Audacity, which does a better resampling job than FFmpeg) and apply a limiter between the resampling and dithering steps to tame the few strong peaks.
If that doesn't help: what exactly does input.flac contain? Music? Noise? Speech? And is it self-made or taken from something else?
Can you provide a link to, or an explanation of, the -q:v 1 argument that deals with video/image quality and compression in ffmpeg?
Let me explain...
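for f in *
do
  # Skip directories and anything else that isn't a regular file
  [ -f "$f" ] || continue
  extension="${f##*.}"
  filename="${f%.*}"
  # Delete the original only if ffmpeg succeeded
  ffmpeg -i "$f" -q:v 1 "${filename}_lq.${extension}" && rm -f "$f"
done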
The ffmpeg for loop above compresses all images and videos in your working directory and replaces the originals. It basically lowers the quality, which results in smaller file sizes (the desired outcome).
I'm most interested in the -q:v 1 argument of this loop. The 1 in -q:v 1 is what controls the amount of compression, but I can't find any documentation describing how to change this value or what it does. Is it a percentage? A multiplier? How do I adjust this knob? Can/should I use negative values? Integers only? Min/max values? etc.
I started with the official documentation, but the best I could find was a section on video quality, and the -q flag description is sparse.
-frames[:stream_specifier] framecount (output,per-stream)
Stop writing to the stream after framecount frames.
.
-q[:stream_specifier] q (output,per-stream)
-qscale[:stream_specifier] q (output,per-stream)
Use fixed quality scale (VBR). The meaning of q/qscale is codec-dependent. If qscale is used without a stream_specifier then it applies only to the video stream, this is to maintain compatibility with previous behavior and as specifying the same codec specific value to 2 different codecs that is audio and video generally is not what is intended when no stream_specifier is used.
-q:v is probably being ignored
You are outputting MP4, so it is most likely that you are using the encoder libx264 which outputs H.264 video.
-q:v / -qscale:v is ignored by libx264.
The console output even provides a warning about this: -qscale is ignored, -crf is recommended.
For more info on -crf see FFmpeg Wiki: H.264.
When can I use -q:v?
The MPEG* encoders (mpeg4, mpeg2video, mpeg1video, mjpeg, libxvid, msmpeg4) can use -q:v / -qscale:v.
See How can I extract a good quality JPEG image from a video file with ffmpeg? for more info on this option.
-q:v is an alias for -qscale:v, which might be why you didn't encounter it during your research (even though it came up as the first result when I searched Google for "ffmpeg q:v").
This link explains that the qscale option is neither a multiplier nor a percentage: it selects a fixed-quality, variable-bitrate (VBR) mode, so it indirectly controls the bitrate. For a given encoder, the lower the number, the higher the bitrate and quality. It usually spans 1-31, but some encoders accept only a subset of this range.
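To make the rule above concrete, here is a small bash sketch of which quality option to pass for which encoder. quality_args is a hypothetical helper; the encoder list is taken from the answer above, and enforcing hard 1-31 bounds is an assumption, since some encoders accept only part of that range. libx264's -crf value is passed through without validation.

```shell
# Hypothetical helper: prints the quality option appropriate for an encoder.
# MPEG-family encoders take a fixed quality scale (-q:v, lower = better,
# usually 1-31); libx264 ignores -q:v and needs -crf instead.
# Expects an integer value.
quality_args() {
  local encoder=$1 value=$2
  case "$encoder" in
    mpeg4|mpeg2video|mpeg1video|mjpeg|libxvid|msmpeg4)
      if [ "$value" -lt 1 ] || [ "$value" -gt 31 ]; then
        echo "q:v value must be between 1 and 31" >&2
        return 1
      fi
      echo "-q:v $value" ;;
    libx264)
      echo "-crf $value" ;;
    *)
      echo "no known quality option for $encoder" >&2
      return 1 ;;
  esac
}

# Usage (requires ffmpeg; left unquoted on purpose so the option and value split):
# ffmpeg -i input.mp4 -c:v mjpeg $(quality_args mjpeg 2) frame_%03d.jpg
# ffmpeg -i input.mp4 -c:v libx264 $(quality_args libx264 23) output.mp4
```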
I want to make a sound that is too high to be detected by the human ear. From my understanding, humans can hear sounds between 20 Hz and 44000 Hz.
With sox, I am making a sound that is 50000 Hz. The problem is I can still hear it. The command I am using is this:
sox -n -r 50000 output.wav rate -L -s 50050 synth 3 sine
Either I have super good hearing or I am doing something wrong. How can I make this sound undetectable with SoX or FFmpeg?
Human hearing is generally considered to range between 20 Hz and 20 kHz, although most people don't hear much above 16 kHz. Digital signals can only represent frequencies up to half of their sampling rate, known as the Nyquist frequency, so in order to accurately reproduce audio for the human ear, a sampling rate of at least 40 kHz is needed. In practice, a sampling rate of 44.1 kHz or 48 kHz is almost always used, leaving plenty of space for an inaudible sound somewhere in the 20-22 kHz range.
For example, this command generates a WAV file with a sampling rate of 48kHz containing a sine wave at 22kHz that is completely inaudible to me:
sox -n -r 48000 output.wav synth 3 sine 22000
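I think part of your problem was that you were using the wrong syntax to specify the pitch to sox: in your command, 50050 is the target sample rate passed to the rate effect, not a frequency for synth, so no 50 kHz tone is ever requested (and at -r 50000, nothing above 25 kHz could be represented anyway). This question has some good information about using SoX to generate simple tones.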
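The Nyquist argument above can be turned into a quick sanity check before calling sox. This is a bash sketch with hypothetical function names; treating 20 kHz as the audibility limit is an approximation (many people stop hearing well below that), and the sox invocation uses the same synth syntax as the answer.

```shell
# Hypothetical check: a tone can only be represented if it lies below the
# Nyquist frequency (half the sample rate). 20 kHz is used here as an
# approximate upper limit of human hearing.
is_inaudible_tone() {
  local rate=$1 freq=$2
  local nyquist=$(( rate / 2 ))
  [ "$freq" -ge 20000 ] && [ "$freq" -lt "$nyquist" ]
}

# Generate the tone only if the parameters make sense (requires sox).
make_inaudible_tone() {
  local rate=$1 freq=$2 out=$3
  if ! is_inaudible_tone "$rate" "$freq"; then
    echo "$freq Hz is not between 20 kHz and Nyquist ($(( rate / 2 )) Hz)" >&2
    return 1
  fi
  sox -n -r "$rate" "$out" synth 3 sine "$freq"
}

# Usage: make_inaudible_tone 48000 22000 output.wav
```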
I am converting audio files of several different formats to mp3 using SoX. According to the docs, you can use the -C argument to specify compression options like the bitrate and quality, the quality being after the decimal point, for example:
sox input.wav -C 128.01 output.mp3 (highest quality, slower)
sox input.wav -C 128.99 output.mp3 (lowest quality, faster)
I expected the second one to sound terrible; however, the two sound exactly the same. If that is the case, I do not understand why one is so much slower or what I would gain by setting the compression to a higher "quality".
Can someone please tell me if there is a real difference or advantage to using higher quality compression versus lower quality?
P.S. I also checked the file size of each output file and both are exactly the same size. But when hashed, each file comes out with a different hash.
The parameters are passed on to LAME. According to the LAME documentation (section “algorithm quality selection”/-q), the quality value has an effect on noise shaping and the psychoacoustic model used. They recommend a quality of 2 (i.e. -C 128.2 in SoX), saying that 0 and 1 are much slower, but hardly better.
However, the main factor determining the quality remains the bit rate. It is therefore not too surprising that there is no noticeable difference in your case.
For me, it is much faster with a plain -C 128 (no quality suffix):
time sox input.mp3 -C 128 output.mp3
real 0m7.417s user 0m7.334s sys 0m0.057s
time sox input.mp3 -C 128.02 output.mp3
real 0m39.805s user 0m39.430s sys 0m0.205s
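To reproduce this comparison on your own files, a sketch like the following times the same encode with no quality suffix and with the quality 2 setting recommended above (-C 128.2). The function name and output file names are hypothetical; bash's time keyword prints the timings to stderr.

```shell
# Hypothetical helper: encode the same input twice and time each run.
# "128" leaves LAME's quality at SoX's default; "128.2" requests quality 2.
compare_lame_quality() {
  local in=$1 c
  for c in 128 128.2; do
    echo "== -C $c"
    time sox "$in" -C "$c" "out_$c.mp3"
  done
}

# Usage (requires sox built with MP3 support):
# compare_lame_quality input.wav
# ls -l out_128.mp3 out_128.2.mp3
```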
Is there a case where a video file could contain both MJPEG frames and an audio track? I know that originally people used to put an 8 kHz uncompressed PCM track alongside their MJPEG movie, since it is streamed/decoded/played frame by frame with no motion prediction needed. Can some decoders accept an MJPEG with a more recent audio format?
[EDIT 1]
What I'll try first is checking whether ffmpeg can convert audio/video movies to MJPEG with audio, and then I'll explore the header and the layers with a hex editor.
[EDIT 2]
OK. I've studied an MJPEG file with audio:
ffmpeg -i some_movie_with_music.mp4 -f avi -acodec mp3 -vcodec mjpeg mjpegWithSound.avi
And there's the MP3 stream split into chunks, one under each JPEG frame, plus some changes in the header. So it's easy to implement in a context where a mobile application would let the user add an MP3 file to a series of JPEGs or to a movie. That's one more reason to use MJPEG when a platform has no encoder yet.
It's fun to watch your application take shape. :-) I'm going to assume this is a follow-on to your last question and that you want to write C# code to accomplish this task. Are you still writing this into an AVI container? AVI stands for "audio/video interleaved" and is designed to transport both audio and video.
So, yes, you should be able to write both MJPEG and audio into an AVI file.
Guess what! You have lots of options for audio codecs too. We haven't cataloged quite as many audio codecs as video codecs (but close). Good news, though: Implementing a basic audio encoder in pure C# should be much simpler than trying to port even an MPEG-1 video encoder. Alternatively, check around to see if you can find an MP3 encoder written in pure C#. AVI accommodates MP3. If not, try IMA ADPCM. It's easy to implement and gives you 4:1 compression. Thus, if you have a monophonic, 44100 Hz, 16-bit stream, that requires 88200 bytes/sec. IMA ADPCM will give you roughly 22050 bytes/sec (plus small overhead).
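The arithmetic in the last paragraph can be checked with a couple of bash functions (hypothetical names). The ADPCM figure assumes exactly 4 bits per sample and ignores block-header overhead, as the answer does. The commented ffmpeg line is only a way to hear IMA ADPCM before writing an encoder; it assumes your FFmpeg build's adpcm_ima_wav encoder and AVI muxer accept these settings.

```shell
# Data rate of uncompressed PCM, in bytes per second.
pcm_bytes_per_sec() {
  local rate=$1 channels=$2 bits=$3
  echo $(( rate * channels * bits / 8 ))
}

# IMA ADPCM stores 4 bits per sample instead of 16, i.e. roughly 4:1
# (block headers add a small overhead that is ignored here).
ima_adpcm_bytes_per_sec() {
  local rate=$1 channels=$2
  echo $(( rate * channels * 4 / 8 ))
}

# pcm_bytes_per_sec 44100 1 16     -> 88200
# ima_adpcm_bytes_per_sec 44100 1  -> 22050
#
# Try the format out (requires ffmpeg):
# ffmpeg -i input.mp4 -c:v mjpeg -c:a adpcm_ima_wav -ac 1 -ar 44100 output.avi
```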
When merging audio and video with ffmpeg, the quality of the resulting video goes down. How do we improve the quality (especially the audio quality)?
The audio quality gets degraded to quite an extent with random screeches in between.
I've tried converting the audio to mp3 and video to mp4 before merging them, tried various audio-video parameters like bitrate, sample-rate, qscale etc but still unsuccessful.
Any help would be greatly appreciated!
The -acodec copy option copies the audio stream without re-encoding it; -vcodec copy does the same for the video stream. You can also use the -sameq option for the video stream (note that -sameq has since been removed from FFmpeg; use -q:v or, for libx264, -crf instead).
See this answer for a little more detail.
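A minimal bash sketch of the stream-copy approach: both inputs are mapped explicitly and neither stream is re-encoded, so no additional quality is lost. merge_av is a hypothetical helper, and this only works if the existing audio and video codecs are allowed in the output container (e.g. H.264 video with AAC audio in MP4).

```shell
# Hypothetical helper: take the first video stream of $1 and the first audio
# stream of $2 and copy both into $3 without re-encoding.
merge_av() {
  local video=$1 audio=$2 out=$3
  ffmpeg -i "$video" -i "$audio" \
    -map 0:v:0 -map 1:a:0 \
    -c:v copy -c:a copy \
    "$out"
}

# Usage (requires ffmpeg): merge_av video.mp4 audio.m4a merged.mp4
```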
I tried using -sameq; it does give good video quality, but I noticed that the file size increases.
Using -q:v 1 gives the best possible quality (not resolution). The value usually ranges from 1 to 31; 1 is the best quality.