Making a Sound Too High to Hear or Undetectable with SoX/FFmpeg - audio

I want to make a sound that is too high to be detected by the human ear. From my understanding, humans can hear sounds between 20 Hz and 44000 Hz.
With SoX, I am making a sound that is 50000 Hz. The problem is I can still hear it. The command I am using is this:
sox -n -r 50000 output.wav rate -L -s 50050 synth 3 sine
Either I have super good hearing or I am doing something wrong. How can I make this sound undetectable with SoX or FFmpeg?

Human hearing is generally considered to range between 20Hz and 20kHz, although most people don't hear much above 16kHz. Digital signals can only represent frequencies up to half of their sampling rate, known as the Nyquist frequency, and so, in order to accurately reproduce audio for the human ear, a sampling rate of at least 40kHz is needed. In practice, a sampling rate of 44.1kHz or 48kHz is almost always used, leaving plenty of space for an inaudible sound somewhere in the 20-22kHz range.
For example, this command generates a WAV file with a sampling rate of 48kHz containing a sine wave at 22kHz that is completely inaudible to me:
sox -n -r 48000 output.wav synth 3 sine 22000
I think part of your problem was that you were using the wrong syntax to specify the pitch to sox. This question has some good information about using SoX to generate simple tones.
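Since the question also asks about FFmpeg: its lavfi sine source should be able to produce the same kind of tone (the 22 kHz / 48 kHz / 3 second values simply mirror the SoX example above):
ffmpeg -f lavfi -i "sine=frequency=22000:sample_rate=48000:duration=3" output.wav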

Related

Audio clip sounds different after importing to AE (audio is a hearing test, 20 kHz to 20 Hz)

I'm trying to create a quick video in After Effects on audible frequencies.
I'm using this audio clip (20 seconds). The clip starts from 20 kHz and goes down to 20 Hz.
As you can see (or hear), we can't hear anything in the beginning at 20 kHz. Most of us start hearing the frequency at 16-15 kHz.
But when I import this audio clip into AE, it sounds completely different. The sound starts playing from the beginning and it's very loud and sounds nothing like the clip I downloaded.
Here's how the audio clip sounds after export: https://www.mboxdrive.com/soundonly.mp3
What's going on here and how do I fix it?
Two things here.
First off, the sound is exhibiting "aliasing" artifacts. This happens when there is pitch content in the sound that is higher in frequency than the Nyquist frequency for the given sample rate. So, either you are using a low sample rate, or the pitch you are generating has harmonics that are causing the aliasing.
Check to make sure you are using a sine wave (if done correctly there should be no additional harmonic content to the tone besides the fundamental pitch), and that your sample rate is above 40K in order to play a 20K sound without aliasing. The most common sampling rate these days is 44100 samples per second, but you may be using 8000, which is also still employed and would not work for your application.
Second point, you probably want to change the rate at which you travel through the range of pitches. It sounds like you are going along linearly, but the ear perceives pitch logarithmically. The difference in pitch from 100 to 200 Hz, for example, is the same (in terms of our perception) as the difference from 1000 to 2000 Hz. So you might want to make your descent exponential rather than linear if the goal is to spend equal time at every perceived pitch level.
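If you want to generate such a sweep yourself, here is a minimal sketch in Python with NumPy/SciPy (the 48 kHz rate, the 20-second length and the sweep.wav file name are my choices, not from the question); it produces a logarithmic sweep from 20 kHz down to 20 Hz so each octave gets equal time:

import numpy as np
from scipy.signal import chirp
from scipy.io import wavfile

rate = 48000                      # sample rate comfortably above twice 20 kHz (Nyquist)
duration = 20.0                   # seconds, matching the clip in the question
t = np.linspace(0, duration, int(rate * duration), endpoint=False)

# Logarithmic sweep: equal time per octave, matching how we perceive pitch.
sweep = chirp(t, f0=20000, t1=duration, f1=20, method='logarithmic')

wavfile.write("sweep.wav", rate, (0.5 * sweep * 32767).astype(np.int16))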

FFMPEG distorting when resampling audio

I'm making an mp3 from a flac file with ffmpeg. This is usually hum-dum for me.
Tonight, for some reason, the converted audio is distorting when I use the same commands I've always used. After troubleshooting, it appears the problem is the "-out_sample_rate" flag.
My command:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k -b:a 320k output.mp3
The audio in the mp3 is then incredibly distorted by a jacked gain resulting in digital clipping.
I've tried updating ffmpeg, and the problem remains. I've tried converting various sample rates (44.1k source files, 48k source files, 96k source files) to both 44.1k and 48k mp3s; the problem remains whenever there's a conversion.
I'm on macOS, and I installed ffmpeg via homebrew.
Any ideas?
Are you sure the distortion comes from resampling?
Even the poorest resampling algorithm doesn't distort. More typical artifacts from poor resampling are harsh high frequencies due to aliasing and quantisation noise.
FFmpeg's resampler isn't the very best, but it isn't bad at all and is fine for average use; it shouldn't lead to distortion at all.
How much headroom does the source file have?
If not enough, the resampling or the MP3 conversion may lead to clipping. The MP3 encoder removes frequencies from the signal (even at 320kbps) so the waveform will alter.
So reimport the encoded MP3 into an audio editor and look for clipping.
If you're not sure which step the distortion comes from, split the command up and check which step introduces the clipping:
ffmpeg -i input.flac -write_id3v1 1 -id3v2_version 3 -dither_method modified_e_weighted -out_sample_rate 44.1k intermediate.flac
ffmpeg -i intermediate.flac -b:a 320k output.mp3
There should be at least 1 dB of headroom left before it gets converted to MP3. If not, lower the gain first.
If the resampling to intermediate.flac leads to a significant gain in amplitude, the original input.flac is poorly mastered. If so (and the quality is really important), do the sample-rate conversion in an audio editor (e.g. Audacity, which does a better resampling job than FFmpeg) and apply a limiter between the resampling and dithering steps to tame the few strong peaks nicely.
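If you would rather check peak levels from the command line than reimport into an editor, FFmpeg's volumedetect filter can report them (shown here on the intermediate.flac produced by the split command above):
ffmpeg -i intermediate.flac -af volumedetect -f null -
A max_volume close to 0 dB in its output means the resampled file is already hitting full scale before the MP3 stage.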
If that doesn't help: what exactly does input.flac contain? Music? Noise? Speech? And is it self-made or taken from something else?

How to recognize if an audio sample has been compressed and then decompressed?

Some years ago I made a music audio recording, and I can't find the original WAV files, I have only compressed MP3s. Now I found an audio CD, but I don't know if it was made using the original, uncompressed WAVs, or if it was made from compressed MP3 or OGG files.
Is there a way to detect whether an audio sample has been compressed and decompressed using lossy compression such as MP3, OGG, ..., without having the original to compare to?
Update:
Trying @MisterHenson's suggestion, I plotted the spectra of the two samples, with obvious differences in the graphs:
The sample from the CD:
The sample from the MP3:
This practically solves my current problem, but still I have these open questions:
If the spectra were visually indistinguishable, I wouldn't know whether there is a real difference, or whether I just can't distinguish them (i.e. the compression would be of better quality). What else could I try?
Similarly what would I do if I didn't have the MP3 file to compare to, just a single audio sample?
Is there an automated method, that'd answer the question with a reasonable probability?
I made an example to show the spectral signature shared by MP3 transcodes, the source material being a Chopin nocturne. MP3 on top, lossless on bottom. All recordings have background noise of some amplitude, and that noise is faintly visible here. What the MP3 transcode (LAME's V2 preset in this case) does is create a hard limit at ~16kHz. On a 320kbps, 44.1kHz sample rate MP3, this hard limit appears at around 20kHz, but it would still be visibly different in this image.
You can pick out this shelf without having the original lossless file for comparison. I'm willing to say all music has amplitude at frequencies above even 19kHz. Here's an example for which I do not have the lossless source file, just a 320kbps MP3. You can see the very hard limit at 20kHz as well as a milder cutoff at 19kHz. Were it lossless, that red blob in the middle would extend all the way up to 22kHz since the sample rate is 44.1kHz.
I would say this process is probably automatable, but I do not know of any attempts to automate it. If this were automated, though, I'd say it could pick Lossy from Lossless with much higher accuracy than you or I, by virtue of it being able to analyze the entire spectrum as opposed to just the high frequency cutoffs.
Full res images:
http://i.imgur.com/dezONol.jpg
http://i.imgur.com/1qokxAN.jpg
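For reference, spectrograms like these can be rendered directly with SoX (assuming your SoX build can decode the input format):
sox input.flac -n spectrogram -o spectrum.png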
The above approaches sound very promising, although maybe a little complicated -- you might first try something easy, like checking the distribution of the least significant bit. In a natural sample, the LSB should show an almost exact 50/50 distribution between zeroes and ones (strictly, across many samples it would show some variance following a binomial distribution, but with millions or billions of bits it will be ridiculously close to 50/50 in any given sample). In a lossy sample, you will find an unlikely distribution in the LSB.
Something like this:
1 -- extract the LSB from each data point
2 -- apply a chi-squared test to judge whether the distribution is unusual
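A minimal sketch of that check in Python (NumPy/SciPy assumed, and sample.wav is just a placeholder name for an integer-PCM WAV):

import numpy as np
from scipy.io import wavfile
from scipy.stats import chisquare

rate, data = wavfile.read("sample.wav")      # integer PCM expected
lsb = (data.astype(np.int64) & 1).ravel()    # least significant bit of every sample
counts = np.bincount(lsb, minlength=2)       # how many zeroes and ones
stat, p = chisquare(counts)                  # chi-squared test against a 50/50 split
print("LSB counts:", counts, "p-value:", p)  # a tiny p-value flags an unusual distribution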
Here is the deal.
A raw sample (or a raw piece of sound) is encoded at a certain quality.
Some sound cards can go further, with 64-bit sampling.
But let's assume that we have sound files of a certain KNOWN quality.
CD quality is fine for the human ear.
A studio, though, would use higher-quality samples, with 24-bit as a standard.
So you have a waveform filename.wav that really has a sample rate of 44100 Hz.
What does that mean?
It means the computer takes a huge number of samples per second to represent the sound almost exactly.
Is the sound original? That depends on how it was made.
If it was made by your computer and a piece of software using a default 16-bit sound card, yes it is.
If it came from an analogue recording, though, it loses some quality in the digitization at 44100 Hz, fortunately not significantly for the human ear.
NOTE THAT mp3 is a bad idea for professional recording.
But since mp3 recordings do exist... this adds complexity to your question. :P
So some sound quality is lost on digitization with a 16-bit sound card.
A similar thing happens when you encode something to mp3.
Check out your picture. Above 17000 Hz there is no sound. It was cut off to make the sound file significantly smaller, without doing any significant damage to the audio quality. Is it the same piece of sound? No. It sounds the same, though. But a sound engineer LOVES original, good-quality samples, because of the information that is NOT cut.
Imagine me making an original sound so balanced and compressed that even after an mp3 conversion it is hard to tell whether it is the original sound or not. Imagine me using equalizers to cut any sharp edges, and gate effects to normalize it heavily. Also, my sound generators are some 8-bit oscillators passing through some effects and filters.
If I convert it back to a waveform, there might be no difference.
For instance:
Case 1 - the original has full-range content that the encoder cuts:
                     [UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform (original): =================================
mp3:                 =======================
Waveform (from mp3): =======================
Case 2 - the original is already so limited that the encoder has nothing to cut:
                     [UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform (original): =================
mp3:                 =================
Waveform (from mp3): =================
The following seems impossible to me (except if the converter has bugs that can be heard):
                     [UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform (original): =========================
mp3:                 =======================
Waveform (from mp3): =============================
So your question depends on the original source used for the first waveform.
The good news is that a sample is RARELY that limited and compressed.
So it seems to me that the CD you have probably sounds like the original waveform,
while, as you can see, the mp3 has cut-off frequencies.
To be sure, of course, you need a frequency analyzer and a spectrum, as MischaNix has already shown.
There are many mp3 encodings too. Some use a constant bit rate, some a variable one; some cut more sound information and some cut less. Some are also bigger than others for that reason.
There are lossless formats too.
And then there is Ogg, which is small enough and also has great quality.
So this question can become a huge topic for no reason here; I will not go into all of that.
If the issue is identifying an original sample, your pictures show me significant differences between the two samples. A waveform made from the mp3-cut variation should look like that cut variation; you cannot get information out of nothing.
Burn the mp3 to a CD, then rip the wave and compare the new waveform with the old one and with the mp3 waveform. It will probably not be the same, so you might hit the jackpot here: it is possible you have an original backup on your hands.
From now on, though, try to keep raw material and store it on a CD or DVD before discarding it.
Or at least keep good uncompressed samples in a backup.
Open questions:
If the spectra were visually indistinguishable, I wouldn't know if there is a real difference, or that I just can't distinguish them.
Correct. But this would seldom occur unless the sampling was done that way on purpose.
Why ask such a question? :) Do you have steganography in mind?
If yes, make sure to keep in mind the nature of the piece of sound you are going to use. Samples are not appropriate. "Finished songs" are!
Similarly what would I do if I didn't have the MP3 file to compare to, just a single audio sample?
Since there are many mp3 encoding settings of different qualities, you can check whether the lowest quality was used. If it wasn't, there is uncertainty because of the compression capabilities. If this applies to the whole sample, then you have to consider whether compression was even needed. That's why you cannot be certain with a song: you don't record with such hard compression in the first place. I guess this is another meta-reason why you need a natural sound. So if it's about a recording, you might be lucky.
Now, about a finished, mastered song... things get rough once again. It depends on the nature, the type, of the sound. A recording is easier to figure out if you know it was recorded as a plain waveform; an mp3 recording, of course, is a waste of time. A finished song, on the other hand, nowadays usually makes compressors, limiters, gates and chained compressors burn out; the amount these techniques are used in modern mastering is enormous. So... you will really need luck to find out whether the original piece was compressed before you even get an original waveform to begin with.
Is there an automated method, that'd answer the question with a reasonable probability?
None that I know. Sorry. :(
But that doesn't mean that nobody can make one.
BUT!
A stereo sample is usually split into two channels, left and right.
Now, if you have a spectrum analyzer in a Digital Audio Workstation and look only at the left channels of two different samples, you can see on the fly whether they are the same or not, I guess.
In order to understand what I mean, take a look at THIS link.
Go to 05:00 and just watch the interface.
Phew. Hope this will help you further, since it took some time. :P
Cheers.
Edit: Fixing some stuff here and there.
I found a description of the problem, a solution and an implementation in Python by Maurits van der Schee, though it works with FLAC files.
From the sample only the first 30 seconds are analyzed. For every
second the frequency spectrum of the sample is computed by applying a
Hanning Window and doing a Fast Fourier Transform. These spectrums are
added, so that eventually you end up with 30 stacked spectrums. These
are divided by 30 to get the average spectrum. Then the spectrum is
normalized using log10. After that we applied a rolling average on the
spectrum with a window size of 1/100th of the frequency, being
44100/100=441 samples.
If there is an unnatural cutoff in the frequency spectrum, this cutoff
is the thing we need to find. We sweep the spectrum from 44100th back
to the 1st frequency, where the variable frequency is f. As soon as
the magnitude at f-220 is more than 1.25 higher than the magnitude at
f and the magnitude at f is no bigger than 1.1x the magnitude at 44100
we have found the cutoff point. The cutoff point is multiplied by 100
and divided by the frequency to get to the percentage of the spectrum
not cut off.
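For reference, a rough re-implementation of that description in Python/NumPy might look like the sketch below; the file name, the 1e-12 floor and the way the 1.25 / 1.1 thresholds are applied are my interpretation, so the original implementation by Maurits van der Schee may differ in detail:

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("sample.wav")          # decoded sample, nominally 44.1 kHz
if data.ndim > 1:
    data = data.mean(axis=1)                     # mix stereo down to mono

window = np.hanning(rate)                        # one-second Hanning window
n_frames = min(30, len(data) // rate)            # analyse at most the first 30 seconds
assert n_frames > 0, "file shorter than one second"
acc = np.zeros(rate // 2 + 1)
for i in range(n_frames):
    frame = data[i * rate:(i + 1) * rate]
    acc += np.abs(np.fft.rfft(frame * window))   # stack the per-second spectra

avg = acc / n_frames                             # average spectrum, roughly 1 Hz per bin
spec = np.log10(avg + 1e-12)                     # normalise with log10
width = rate // 100                              # rolling-average window (~441 bins)
smooth = np.convolve(spec, np.ones(width) / width, mode="same")

# Sweep down from the top of the spectrum looking for an unnatural drop,
# following the 220-bin offset and the 1.25 / 1.1 factors from the description above.
reference = smooth[-1]
cutoff_hz = rate // 2
for f in range(len(smooth) - 1, 220, -1):
    if smooth[f - 220] > 1.25 * smooth[f] and smooth[f] <= 1.1 * reference:
        cutoff_hz = f
        break

print("estimated cutoff: %d Hz (%.0f%% of the spectrum survives)"
      % (cutoff_hz, 100.0 * cutoff_hz / (rate / 2)))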
Things to look for:
Cut-off frequency changing on frame boundaries (not going to be a 100% hard cut, but look for "audible" to "inaudible" and vice versa)
Frequencies disappearing or appearing on frame boundaries (again, not 100%)
Noise levels changing on frame boundaries (actually pretty solid for lossy codecs)
For MP3, the frame boundaries are precisely every 1152 samples, though you might be able to "see" the granules every 576 samples.
For Vorbis, the frame boundaries are typically every 128 or 1024 samples depending on transients the encoder "saw". You can probably get away with doing every 128 samples...
You'll have to research the other formats to know their frame sizes (I don't know them offhand).
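As an illustration of the frame-boundary idea, the sketch below (Python/NumPy, with sample.wav a placeholder for the suspect file already decoded to WAV) measures the energy above 16 kHz in consecutive 1152-sample blocks and reports the largest jumps; for a lossy source those jumps tend to line up with codec frames:

import numpy as np
from scipy.io import wavfile

FRAME = 1152                                     # MP3 frame length in samples
rate, data = wavfile.read("sample.wav")          # decode the suspect file to WAV first
if data.ndim > 1:
    data = data.mean(axis=1)                     # mix stereo down to mono

freqs = np.fft.rfftfreq(FRAME, 1.0 / rate)
hf = freqs > 16000                               # band where lossy cutoffs usually live
window = np.hanning(FRAME)

levels = []
for start in range(0, len(data) - FRAME + 1, FRAME):
    mag = np.abs(np.fft.rfft(data[start:start + FRAME] * window))
    levels.append(np.sum(mag[hf] ** 2))          # high-frequency energy per block
levels = np.array(levels)

# Large relative changes between consecutive blocks hint at codec frame boundaries.
jumps = np.abs(np.diff(np.log10(levels + 1e-12)))
print("blocks with the biggest high-frequency level changes:", np.argsort(jumps)[-10:])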

Programmatic mix analysis of stereo audio files - is bass panned to one channel?

I want to analyze my music collection, which is all CD audio data (stereo 16-bit PCM, 44.1kHz). What I want to do is programmatically determine if the bass is mixed (panned) only to one channel. Ideally, I'd like to be able to run a program like this
mono-bass-checker music.wav
And have it output something like "bass is not panned" or "bass is mixed primarily to channel 0".
I have a rudimentary start on this, which in pseudocode looks like this:
binsize = 2^N  # define a window or FFT bin size as a power of 2
while not end of audio file:
    read binsize samples from audio file
    de-interleave channels into two separate arrays
    chan0_fft_result = fft on channel 0 array
    chan1_fft_result = fft on channel 1 array
    for each index i in (number of items in chanX_fft_result / 2):
        frequency_bin = i * 44100 / binsize
        # define bass as below 150 Hz (and above 30 Hz, since I can't hear it)
        if frequency_bin > 150 or frequency_bin < 30: ignore
        magnitude = sqrt(chanX_fft_result[i].real^2 + chanX_fft_result[i].imag^2)
I'm not really sure where to go from here. Some concepts I've read about but are still too nebulous to me:
Window function. I'm currently not using one, just naively reading samples 0 to 1023, 1024 to 2047, etc. from the audio file (for example with binsize=1024). Is this something that would be useful to me? And if so, how does it get integrated into the program?
Normalizing and/or scaling of the magnitude. Lots of people do this for the purpose of making pretty spectrograms, but do I need to do that in my case? I understand human hearing roughly works on a log scale, so perhaps I need to massage the magnitude result in some way to filter out what I wouldn't be able to hear anyway? Is something like A-weighting relevant here?
binsize. I understand that a bigger binsize gets me more frequency bins... but I can't decide if that helps or hurts in this case.
I can generate a "mono bass song" using sox like this:
sox -t null /dev/null --encoding signed-integer --bits 16 --rate 44100 --channels 1 sine40hz_mono.wav synth 5.0 sine 40.0
sox -t null /dev/null --encoding signed-integer --bits 16 --rate 44100 --channels 1 sine329hz_mono.wav synth 5.0 sine 329.6
sox -M sine40hz_mono.wav sine329hz_mono.wav sine_merged.wav
In the resulting "sine_merged.wav" file, one channel is pure bass (40Hz) and one is non-bass (329 Hz). When I compute the magnitude of bass frequencies for each channel of that file, I do see a significant difference. But what's curious is that the 329Hz channel has non-zero sub-150Hz magnitude. I would expect it to be zero.
Even then, with this trivial sox-generated file, I don't really know how to interpret the data I'm generating. And obviously, I don't know how I'd generalize to my actual music collection.
FWIW, I'm trying to do this with libsndfile and fftw3 in C, based on help from these other posts:
WAV-file analysis C (libsndfile, fftw3)
Converting an FFT to a spectogram
How do I obtain the frequencies of each value in an FFT?
Not using a window function (the same as using a rectangular window) will splatter some of the high frequency content (anything not exactly periodic in your FFT length) into all other frequency bins of an FFT result, including low frequency bins. (Sometimes this is called spectral "leakage".)
To minimize this, try applying a window function (von Hann, etc.) before the FFT, and expect to have to use some threshold level, instead of expecting zero content in any bins.
Also note that the bass notes from many musical instruments can generate some very powerful high frequency overtones or harmonics that will show up in the upper bins of an FFT, so a lot of high frequency content in a channel doesn't rule out a strong bass mix in that same channel.
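To make that concrete, here is a minimal sketch of the windowed, per-channel bass-energy comparison in Python with NumPy/SciPy (not the libsndfile/fftw3 C code the question mentions; the input file name comes from the sox example above, and what ratio counts as "panned" is left open):

import numpy as np
from scipy.io import wavfile

N = 4096                                          # FFT size; bin width ~10.8 Hz at 44.1 kHz
rate, data = wavfile.read("sine_merged.wav")      # stereo file from the question's sox example
window = np.hanning(N)                            # von Hann window, as suggested above
freqs = np.fft.rfftfreq(N, 1.0 / rate)
bass = (freqs >= 30) & (freqs <= 150)             # the question's definition of "bass"

energy = np.zeros(2)
for start in range(0, len(data) - N + 1, N):
    block = data[start:start + N]
    for ch in range(2):
        mag = np.abs(np.fft.rfft(block[:, ch] * window))
        energy[ch] += np.sum(mag[bass] ** 2)      # accumulate bass-band energy per channel

ratio = energy / energy.sum()
print("bass energy share: L=%.2f R=%.2f" % (ratio[0], ratio[1]))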

What is the effect of the "quality" option in SoX mp3 compression?

I am converting audio files of several different formats to mp3 using SoX. According to the docs, you can use the -C argument to specify compression options like the bitrate and quality, the quality being after the decimal point, for example:
sox input.wav -C 128.01 output.mp3 (highest quality, slower)
sox input.wav -C 128.99 output.mp3 (lowest quality, faster)
I expected the second one to sound terrible; however, the audio quality of the two sounds exactly the same to me. If that is the case, I do not understand why one performs so much slower or what I would gain by setting the compression to higher "quality".
Can someone please tell me if there is a real difference or advantage to using higher quality compression versus lower quality?
P.S. I also checked the file size of each output file and both are exactly the same size. But when hashed, each file comes out with a different hash.
The parameters are passed on to LAME. According to the LAME documentation (section “algorithm quality selection”/-q), the quality value has an effect on noise shaping and the psychoacoustic model used. They recommend a quality of 2 (i.e. -C 128.2 in SoX), saying that 0 and 1 are much slower, but hardly better.
However, the main factor determining the quality remains the bit rate. It is therefore not too surprising that there is no noticeable difference in your case.
For me, the plain setting is much faster. A simple timing comparison:
time sox input.mp3 -C 128 output.mp3
real 0m7.417s  user 0m7.334s  sys 0m0.057s
time sox input.mp3 -C 128.02 output.mp3
real 0m39.805s  user 0m39.430s  sys 0m0.205s
