Meaning of sample values in a wav file - audio

For a school project, I am supposed to analyze a short sound recording in WAV format. I am done with the project: I DFT'd it, filtered out the unwanted frequencies, and got the correct result. What eludes me, though, is the meaning of the values of the individual samples in my WAV file. I have tens of thousands of samples that look like this:
[ 0.06234258 0.16020246 0.14122963 ... -0.01704375 -0.08993937 -0.09293508]
However, no matter what number I multiply these values by, the resulting sound is the same. If I multiply every sample by 1000, it sounds just as it did before. The same goes for dividing. So what do these samples mean, if not volume?
EDIT:
Here is the code I'm using:
import soundfile as sf
import IPython
samples, sampling_freq = sf.read('recording.wav')  # samples are floats in [-1.0, 1.0]
IPython.display.display(IPython.display.Audio(samples, rate=sampling_freq))  # displays a playable bar

The samples (basically a long array of floating-point numbers) in the file are the pulse-code-modulated (PCM) data representing the audio.
Given that audio players use this data to recreate the original audio wave, multiplying every sample by some factor should increase the volume. However, some audio players scale the samples back down (re-normalize them) to prevent clipping, which is likely why it sounds the same.
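In fact, the playback in your code goes through IPython.display.Audio, which by default re-normalizes the data to full scale before playing, masking any scaling. A minimal sketch to hear the difference is to write the scaled samples to a new file and play it in an ordinary media player ('quieter.wav' is just an assumed output name):

import soundfile as sf

samples, sampling_freq = sf.read('recording.wav')
quieter = samples * 0.1                           # scale down by 20 dB
sf.write('quieter.wav', quieter, sampling_freq)   # open this file in a media player

Recent IPython versions also accept normalize=False in IPython.display.Audio, which plays the data as-is (the values must then stay within [-1, 1]).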
The ideal way to visualize the audio is with Audacity, which can show the audio waveform in real time.

Related

Why does a .WAV file sound exactly the same as its complement?

So here it is: I've been playing around with some C# code to see how different a .WAV audio file sounds as I invert more and more bits of each sample. When I got to the step where I flipped the bits of the whole file, I expected more noise, or even that the original audio would no longer be heard clearly. Instead, when I played the complement of the original audio (all bits inverted), I was surprised that the noise from the previous steps (with fewer bits inverted) disappeared and the file sounded just like the original again.
I would like to know what's the reason behind that.
The reason for this is in fact not related to C# at all, but to human perception of sound.
The human ear responds only to the intensity I of the sound it receives (more specifically, to the intensity distribution over the different frequencies) and this goes more or less like the square of the amplitude,
I~A^2.
Changing the sign of the waveform changes the sign of A, which has no effect on I.
This is a direct quote from a post on Physics Stack Exchange.
My guess is that the WAV file values are stored as signed integers.
Look at this page: https://en.wikipedia.org/wiki/Signed_number_representations
If you invert all the bits of a signed integer, you get (essentially) its opposite value: in two's complement, ~x equals -x - 1.
So you have just created a mirrored waveform, which sounds the same as the original.
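You can reproduce the experiment in a few lines; a minimal sketch, assuming 16-bit signed PCM and an input file named 'original.wav' (both assumptions):

import numpy as np
import soundfile as sf

samples, rate = sf.read('original.wav', dtype='int16')
inverted = np.invert(samples)              # bitwise NOT: ~x == -x - 1 in two's complement
sf.write('inverted.wav', inverted, rate)   # sounds just like the original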
You need to approach this from first principles: consider a sampling frequency of 44 kHz and a one-bit sample resolution; a partial stream of a pure 22 kHz tone will read as:
010101010101
If you invert all of the bits, that becomes:
101010101010
That stream is still a pure 22 kHz tone (albeit with a phase difference).
In the real world, sound waves are closer in shape to sine waves, but the principle still holds: the opposite of what was recorded will sound more or less identical to the original. It would be more noticeable if you played the inverted version against the original on different stereo channels. (There's a Pitchshifter song that does this on "2nd Hand" on their www.pitchshifter.com album, and explains it at the same time, around 2:20-2:30, if you fancy a listen.)

How to recognize if an audio sample has been compressed and then decompressed?

Some years ago I made a music audio recording, and I can't find the original WAV files; I have only compressed MP3s. Now I have found an audio CD, but I don't know whether it was made from the original uncompressed WAVs or from compressed MP3 or OGG files.
Is there a way to detect whether an audio sample has been compressed and decompressed using a lossy codec such as MP3, OGG, etc., without having the original to compare to?
Update:
Trying @MisterHenson's suggestion, I plotted the spectra of the two samples, with obvious differences in the graphs:
The sample from the CD:
The sample from the MP3:
This practically solves my current problem, but I still have these open questions:
If the spectra were visually indistinguishable, I wouldn't know whether there is a real difference or whether I just can't distinguish them (i.e. the compression is of high quality). What else could I try?
Similarly, what would I do if I didn't have the MP3 file to compare to, just a single audio sample?
Is there an automated method that would answer the question with reasonable probability?
I made an example to show the characteristic signature of MP3 transcodes, the source material being a Chopin nocturne: MP3 on top, lossless on bottom. All recordings have background noise of some amplitude, and that noise is faintly visible here. What the MP3 transcode (LAME's V2 preset in this case) does is create a hard limit at ~16 kHz. On a 320 kbps, 44.1 kHz MP3, this hard limit appears at around 20 kHz, but it would still be visibly different in this image.
You can pick out this shelf without having the original lossless file for comparison. I'm willing to say all music has energy at frequencies above even 19 kHz. Here's an example for which I do not have the lossless source, just a 320 kbps MP3: you can see the very hard limit at 20 kHz as well as a milder cutoff at 19 kHz. Were it lossless, that red blob in the middle would extend all the way up to 22 kHz, since the sample rate is 44.1 kHz.
I would say this process is probably automatable, but I do not know of any attempts to automate it. If it were automated, though, I'd say it could tell lossy from lossless with much higher accuracy than you or I, by virtue of being able to analyze the entire spectrum rather than just the high-frequency cutoff.
Full res images:
http://i.imgur.com/dezONol.jpg
http://i.imgur.com/1qokxAN.jpg
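To look for this shelf yourself, you can plot a spectrogram; here is a minimal sketch using scipy and matplotlib (the filename 'sample.wav' is an assumption):

import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt
from scipy import signal

samples, rate = sf.read('sample.wav')
if samples.ndim > 1:
    samples = samples.mean(axis=1)         # mix stereo down to mono

f, t, Sxx = signal.spectrogram(samples, fs=rate, nperseg=4096)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))   # power in dB
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()   # a lossy transcode shows a hard horizontal shelf around 16-20 kHz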
The above approaches sound very promising, although maybe a little complicated. You might first try something easy, like checking the distribution of the least significant bit (LSB). In a natural sample, the LSB should be almost exactly 50/50 between zeroes and ones. (Strictly, across many samples there would be some variance following a binomial distribution, but with millions or billions of bits this will be extremely close to 50/50 in any given sample.) In a lossy sample, you will find an unlikely distribution in the LSB.
Something like this:
1 -- extract LSB from each data point
2 -- apply chi-squared test to judge if distribution is unusual
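A minimal sketch of that test, assuming 16-bit PCM in a file named 'sample.wav' (the filename is an assumption):

import soundfile as sf
from scipy.stats import chisquare

samples, rate = sf.read('sample.wav', dtype='int16')
lsb = samples.ravel() & 1                  # step 1: extract the LSB of every sample
counts = [int((lsb == 0).sum()), int((lsb == 1).sum())]
stat, p = chisquare(counts)                # step 2: test against a uniform 50/50 split
print('p-value: %g' % p)                   # a very small p-value flags an unusual LSB distribution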
Here is the deal.
A raw sample (a raw piece of sound) is encoded at a certain quality. CD quality (16-bit, 44100 Hz) is fine for the human ear. A studio, though, would use higher-quality samples, with 24-bit as a standard, and some sound cards can go further, with 64-bit sampling. But let's assume that we have sound files of a certain KNOWN quality.
So you've got a waveform, filename.wav, with a sample rate of 44100 Hz.
What does that mean?
It means the file holds 44100 samples for every second of audio, enough for the computer to represent the original sound almost exactly (frequencies up to half the sample rate can be captured).
Is the sound original? That depends on how it was made.
If it was made on your computer by a piece of software through a default 16-bit sound card, yes it is.
If it came from an analogue recording, though, it loses some of its quality in the digitization at 44100 Hz, fortunately not so significantly for the human ear.
NOTE THAT mp3 recording is a bad idea for professional work. But since mp3 recordings do exist... this adds complexity to your question. :P
So some sound quality is lost in digitization, even with a 16-bit sound card.
A similar thing happens when you encode something to mp3.
Check out your picture: above 17000 Hz there is no sound. It was cut off to make the sound file significantly smaller without doing significant damage to the audio quality. Is it the same piece of sound? No. It sounds the same, though. But a sound engineer LOVES original, good-quality samples, because of the information that is NOT cut.
Imagine me making an original sound so balanced and compressed that even after mp3 conversion it is hard to tell whether it is the original or not: using equalizers to cut any sharp edges, and gate effects to heavily normalize it, with sound generators that are 8-bit oscillators passed through some FX and filters.
If I then convert it back to WAV, there might be no difference.
For instance, transcoding a full-range waveform is detectable, because the new waveform inherits the mp3's cut:

[UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform: =================================
mp3:      =======================
Waveform: =======================

But if the original never contained the high frequencies to begin with, there is nothing to cut and the transcode is undetectable:

[UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform: =================
mp3:      =================
Waveform: =================

The following seems impossible to me (except if the converter has bugs that can be heard):

[UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform: =========================
mp3:      =======================
Waveform: =============================
So your answer depends on the original source used for the first waveform.
The good news is that a sample is RARELY that band-limited and compressed.
So it seems to me that the CD you used probably sounds like the original waveform, while, as you can see, the mp3 has cut-off frequencies.
To be sure, of course, you need a frequency analyzer and a spectrum, as MischaNix has already shown.
There are many mp3 encodings, too: some use constant bitrates, some variable ones; some cut more sound information, some less; and some files are bigger than others for that reason. There are lossless formats as well, and then there is ogg, which is small and also has great quality. So this question can become a huge topic for no reason; I will not go into all of that here.
If the issue is identifying an original sample, your pictures show significant differences between the two samples. A waveform made from the cut mp3 should look like that cut variation: you cannot get information out of nothing.
Burn the mp3 to a CD, rip the wave, and compare the new waveform with the old one and with the mp3 waveform. They will probably not be the same, so you might hit the jackpot here: it is possible you have an original backup on your hands.
From now on, though, sample raw material and store it on a CD or DVD before discarding it, or at least keep good uncompressed samples as a backup.
Open questions:
If the spectra were visually indistinguishable, I wouldn't know if there is a real difference, or if I just can't distinguish them.
Correct. But this would seldom occur unless it was done intentionally during sampling. Why ask such a question? :) Do you have steganography in mind? If so, keep in mind the nature of the piece of sound you are going to use: samples are not appropriate, "finished songs" are!
Similarly what would I do if I didn't have the MP3 file to compare to, just a single audio sample?
Since there are many mp3 encoding settings of different qualities, you can check whether the lowest quality was used. If not, there is uncertainty because of the compression capabilities. If this applies to the whole sample, then you have to ask whether compression was needed at all. That's why you cannot be certain about a song: you don't record with such hard compression in the first place. I guess this is another meta-reason why you need a natural sound. So if it is a plain recording, you might be lucky.
Now, about a finished, mastered song... things get rough once again. It comes down to the nature, the type, of the sound. A recording is easier to figure out if you know it was recorded as a waveform; an mp3 recording, of course, is a waste of time. A finished song, on the other hand, nowadays makes compressors, limiters, gates, and compressor chains burn out; the amount of such processing in modern mastering is enormous. So... you will really need luck to find out whether the original piece was compressed at all without having an original waveform to begin with.
Is there an automated method, that'd answer the question with a reasonable probability?
None that I know of, sorry. :(
But that doesn't mean nobody could make one.
BUT!
A stereo sample is usually split into two channels, left and right.
Now, if you have a spectrum analyzer in a Digital Audio Workstation and look only at the left channels of two different samples, you can see on the fly whether they are the same or not, I guess.
To understand what I mean, take a look at THIS link; go to 05:00 and just watch the interface.
Phew. Hope this helps you further, since it took some time. :P
Cheers.
Edit: fixed some stuff here and there.
I found a description of the problem, a solution, and an implementation in Python by Maurits van der Schee, though it works on FLAC files.
From the sample only the first 30 seconds are analyzed. For every second, the frequency spectrum of the sample is computed by applying a Hanning window and doing a Fast Fourier Transform. These spectrums are added, so that eventually you end up with 30 stacked spectrums. These are divided by 30 to get the average spectrum. Then the spectrum is normalized using log10. After that, a rolling average is applied to the spectrum with a window size of 1/100th of the frequency, being 44100/100 = 441 samples.

If there is an unnatural cutoff in the frequency spectrum, this cutoff is the thing we need to find. We sweep the spectrum from the 44100th frequency back to the 1st, where the variable frequency is f. As soon as the magnitude at f-220 is more than 1.25 times higher than the magnitude at f, and the magnitude at f is no bigger than 1.1x the magnitude at 44100, we have found the cutoff point. The cutoff point is multiplied by 100 and divided by the frequency to get the percentage of the spectrum not cut off.
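A minimal sketch of that procedure (this is not van der Schee's actual code; the filename, and exactly how the thresholds from the description are applied, are assumptions):

import numpy as np
import soundfile as sf

samples, rate = sf.read('sample.flac')
if samples.ndim > 1:
    samples = samples.mean(axis=1)         # mix stereo down to mono

seconds = min(30, len(samples) // rate)    # analyze at most the first 30 seconds
spectrum = np.zeros(rate // 2)
for i in range(seconds):
    frame = samples[i * rate:(i + 1) * rate] * np.hanning(rate)   # Hanning window
    spectrum += np.abs(np.fft.rfft(frame))[:rate // 2]            # stack the spectrums
spectrum = np.log10(spectrum / seconds + 1e-12)                   # average, then log10-normalize

window = rate // 100                       # rolling average over 1/100th of the bins (441 at 44.1 kHz)
smooth = np.convolve(spectrum, np.ones(window) / window, mode='same')

# Sweep from the highest frequency downward, looking for the unnatural shelf.
cutoff = len(smooth) - 1
for f in range(len(smooth) - 1, window, -1):
    if smooth[f - window // 2] > 1.25 * smooth[f] and smooth[f] <= 1.1 * smooth[-1]:
        cutoff = f
        break
print('spectrum intact up to %.0f%% of the top frequency' % (100.0 * cutoff / len(smooth)))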
Things to look for:
Cut-off frequency changing on frame boundaries (not going to be a 100% hard cut, but look for "audible" to "inaudible" and vice versa)
Frequencies disappearing or appearing on frame boundaries (again, not 100%)
Noise levels changing on frame boundaries (actually pretty solid for lossy codecs)
For MP3, the frame boundaries are precisely every 1152 samples, though you might be able to "see" the granules every 576 samples.
For Vorbis, the frame boundaries are typically every 128 or 1024 samples depending on transients the encoder "saw". You can probably get away with doing every 128 samples...
You'll have to research the other formats to know their frame sizes (I don't know them offhand).
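As a rough starting point for the noise-level idea, here is a minimal sketch that measures level jumps across MP3-aligned 1152-sample frames (the filename is an assumption, and real detectors are considerably more careful than this):

import numpy as np
import soundfile as sf

samples, rate = sf.read('sample.wav')
if samples.ndim > 1:
    samples = samples.mean(axis=1)

FRAME = 1152                               # MP3 frame size in samples
n = len(samples) // FRAME
frames = samples[:n * FRAME].reshape(n, FRAME)
rms = np.sqrt((frames ** 2).mean(axis=1))  # level per aligned frame
jumps = np.abs(np.diff(20 * np.log10(rms + 1e-12)))
print('largest frame-to-frame level jump: %.1f dB' % jumps.max())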

Changing duration of Guitar pluck in PCM data

Folks,
I am struggling with a simple concept related to the duration of play of PCM data. I would appreciate your feedback.
The application I am developing plays guitar notes from a music sheet.
I have implemented Jaffe-Smith Algorithm for guitar plucking.
https://ccrma.stanford.edu/~jos/Mohonk05/Extended_Karplus_Strong_EKS_Algorithm.html.
Let's say I compute samples for note A (440 Hz) for one second.
At a sample rate of 11025 Hz, I will be storing 11025 samples that can be sent to the computer speakers as PCM audio.
For all the unique notes on the guitar, it takes quite some time to compute the samples. I am thinking I will pre-compute them, save them as binary data, and simply load them when the application runs.
So far so good.
Now, let's say I want to play a song (a list of various notes) at 100 beats per minute, and I have to play note A for one beat, i.e. 0.6 seconds (60/100).
Recalculating the samples for 0.6 seconds may take quite some time.
Can I simply play (11025 * 0.6) samples? Will this create any side effect?
Is there a better way to achieve what I am trying to do?
Thank you in advance for your help.
Regards,
Peter
What you're basically trying to do is create a synthesized guitar, yes? I might suggest that you go the sampler route instead.
By sample, I mean a small clip of audio (not a single sample in the ADC/DAC sense).
Basically, you can break what you need into four parts:
Attack
Decay
Sustain
Release
These four parts work in that order and are generally referred to as an ADSR envelope. The attack of the note is the initial sound; for a guitar, you are going to hear a pluck and the start of a pitch. The decay is the sample of the string as it starts to fade away. The sustain is a sample repeated over and over until you release the key. The release sample is what plays when you release the key; for a guitar, you might hear a sample of fingers lightly touching the strings to stop their vibration.
Now, you could generate all of these samples in real time, but that will likely be very CPU-intensive.
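A minimal sketch of shaping a pre-computed clip with such an envelope, assuming float samples at 11025 Hz (the function name, envelope times, and the stand-in sine note are all made up for illustration):

import numpy as np

RATE = 11025

def adsr(n_total, attack=0.02, decay=0.05, sustain_level=0.7, release=0.1):
    # Build an ADSR amplitude envelope of n_total samples.
    a, d, r = int(attack * RATE), int(decay * RATE), int(release * RATE)
    s = max(n_total - a - d - r, 0)            # sustain fills whatever time remains
    return np.concatenate([
        np.linspace(0.0, 1.0, a),              # attack: ramp up from silence
        np.linspace(1.0, sustain_level, d),    # decay: fall to the sustain level
        np.full(s, sustain_level),             # sustain: hold until release
        np.linspace(sustain_level, 0.0, r),    # release: fade back to silence
    ])[:n_total]

note = np.sin(2 * np.pi * 440 * np.arange(RATE) / RATE)   # 1 s of A440 as a stand-in
shaped = note * adsr(len(note))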
Regarding your question, "Can I simply play (11025 * 0.6) samples?": yes, at a sample rate of 11025 Hz, that is 0.6 seconds of audio. Keep in mind, though, that you should send a continuous stream of data to the sound card, filling any gaps with 0 (for signed PCM).
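A minimal sketch of that truncation (the names are made up; one side effect to watch for is that cutting mid-cycle leaves a discontinuity that is audible as a click, which a short fade-out removes):

import numpy as np

RATE = 11025
note = np.sin(2 * np.pi * 440 * np.arange(RATE) / RATE)   # 1 s of precomputed A440

beat = 60.0 / 100                          # 100 bpm -> 0.6 s per beat
cut = note[:int(RATE * beat)].copy()       # simply keep the first 0.6 s

fade = int(0.01 * RATE)                    # 10 ms linear fade-out to avoid a click
cut[-fade:] *= np.linspace(1.0, 0.0, fade)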

How do I analyze an audio file for output frequency and duration?

I have a function that can play set frequencies. The function's inputs are frequency and duration. How can I analyze a sound file so that I have the output frequency for every millisecond of audio?
e.g.:
MS, Frequency
1, 400
2, 401
3, 402
etc.
If there is Mac-based software that can do this, I'd be fine with preprocessing the audio files and only inputting the frequency/duration combos.
Thanks!
Most sound files (recordings of anything other than a simple sine wave) do not have a single output frequency, so what you're trying to do is essentially impossible. It is possible to determine the dominant or fundamental frequency of a sound file, but this becomes more difficult (and less accurate) the shorter the file is. A one-millisecond snippet of CD-quality (mono) sound consists of only about 44 samples, far too few for a meaningful estimate.
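If longer windows are acceptable, here is a minimal sketch of estimating the dominant frequency per window via an FFT peak (the filename 'tune.wav' and the 50 ms window are assumptions):

import numpy as np
import soundfile as sf

samples, rate = sf.read('tune.wav')
if samples.ndim > 1:
    samples = samples.mean(axis=1)         # mix stereo down to mono

win = int(0.05 * rate)                     # 50 ms windows give ~20 Hz resolution
for start in range(0, len(samples) - win, win):
    frame = samples[start:start + win] * np.hanning(win)
    mags = np.abs(np.fft.rfft(frame))
    peak = np.argmax(mags[1:]) + 1         # skip the DC bin
    print('%d, %.0f' % (1000 * start / rate, peak * rate / win))   # "MS, Frequency"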
Are you perhaps trying to convert a recording into its component notes, and then reproduce the tune with your function that plays frequencies? To see how fundamentally difficult this task is, try googling "wav-to-midi".

Where can I learn how to work with audio data formats?

I'm working on an openGL project that involves a speaking cartoon face. My hope is to play the speech (encoded as mp3s) and animate its mouth using the audio data. I've never really worked with audio before so I'm not sure where to start, but some googling led me to believe my first step would be converting the mp3 to pcm.
I don't really anticipate the need for any Fourier transforms, though that could be nice. The mouth really just needs to move around when there's audio (I was thinking of basing it on volume).
Any tips on how to implement something like this, or pointers to resources, would be much appreciated. Thanks!
-S
Whatever you do, you're going to need to decode the MP3s into PCM data first. There are a number of third-party libraries that can do this for you. Then, you'll need to analyze the PCM data and do some signal processing on it.
Automatically generating realistic lipsync data from audio is a very hard problem, and you're wise to not try to tackle it. I like your idea of simply basing it on the volume. One way you could compute the current volume is to use a rolling window of some size (e.g. 1/16 second), and compute the average power in the sound wave over that window. That is, at frame T, you compute the average power over frames [T-N, T], where N is the number of frames in your window.
Thanks to Parseval's theorem, we can easily compute the power in a wave without having to take the Fourier transform or anything complicated -- the average power is just the sum of the squares of the PCM values in the window, divided by the number of frames in the window. Then, you can convert the power into a decibel rating by dividing it by some base power (which can be 1 for simplicity), taking the logarithm, and multiplying by 10.
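A minimal sketch of that computation, assuming the decoded PCM is already a float numpy array (the stand-in signal, the rate, and the mouth mapping are assumptions):

import numpy as np

RATE = 44100
pcm = np.sin(2 * np.pi * 220 * np.arange(RATE) / RATE)           # stand-in for decoded MP3 data

win = RATE // 16                                                 # 1/16-second rolling window, as suggested above
power = np.convolve(pcm ** 2, np.ones(win) / win, mode='same')   # mean square per window
db = 10 * np.log10(np.maximum(power, 1e-12))                     # decibels relative to base power 1
mouth_open = np.clip((db + 40.0) / 40.0, 0.0, 1.0)               # e.g. map -40..0 dB to 0..1 (assumption)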
