When we want to encrypt an audio file (wav/mp3), why is the encryption done in the frequency domain? I looked at some audio encryption methods and they use the Fourier Transform and then do some encryption in the frequency domain. Why don't we just take the data (int/float) from the wav/mp3 file, encrypt it, and then write it back as a wav/mp3 file? Is there any advantage to encrypting in the frequency domain?
Some audio encryption algorithms that I found:
http://ijcsit.com/docs/Volume%205/vol5issue03/ijcsit20140503393.pdf
No doubt for the same reason the majority of audio codecs also use the frequency domain representation: it is more information efficient. Each frequency-domain bin only needs to store magnitude and phase, or even more concisely a + bi in the complex plane (the frequency itself is implied by the bin index), and an arbitrary audio curve in the time domain can be reconstructed from just a handful of those frequency bins, so it becomes compelling to perform the encryption on the more informationally dense representation. Once in the frequency domain it is also easier to discard frequencies humans cannot perceive, reducing the load. A knock-on benefit is reduced compute demand in the frequency domain for both compression and encryption.
So a typical data flow would be:
raw audio in PCM format (time domain) -> FFT -> frequency domain -> encryption -> decryption -> frequency domain again -> inverse FFT -> reconstructed raw audio
If you are free of those constraints it's perfectly feasible to do audio encryption directly in the time domain. Keep in mind that once you scramble the time-domain signal, its frequency-domain representation will require ever greater information (space + compute) per unit of time, and hence be harder to compress.
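To make that flow concrete, here is a minimal sketch in Python, assuming numpy is available. The "encryption" is just a keyed permutation of the FFT bins, which is not real cryptography; it only shows where an encryption step would sit in the pipeline, and the function names are made up for the example.

    # Toy sketch of the pipeline: FFT -> scramble bins with a keyed permutation
    # -> unscramble -> inverse FFT. Not real cryptography, just placement.
    import numpy as np

    def scramble(pcm, key):
        spectrum = np.fft.rfft(pcm)            # time domain -> frequency domain
        perm = np.random.default_rng(key).permutation(len(spectrum))
        return spectrum[perm], perm            # "encrypted" spectrum plus the permutation

    def unscramble(scrambled, perm):
        restored = np.empty_like(scrambled)
        restored[perm] = scrambled             # invert the permutation
        return np.fft.irfft(restored)          # frequency domain -> time domain

    pcm = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)   # 1 s of a 440 Hz tone
    encrypted, perm = scramble(pcm, key=1234)
    decrypted = unscramble(encrypted, perm)
    print(np.allclose(pcm, decrypted))         # True: the round trip recovers the audio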
Related
I need to create software that can capture sound (from a NOAA satellite with an RTL-SDR). The problem is not capturing the sound; the problem is how to convert the audio or waves into an image. I have read about many things, the Fast Fourier Transform, the Hilbert Transform, etc., but I don't know how.
If you can give me an idea it would be fantastic. Thank you!
Over the past year I have been writing code which makes FFT calls and have amassed 15 pages of notes, so the topic is vast, however I can boil it down
Open up your WAV file ... parse the 44 byte header and note the given bit depth and endianness attributes ... then read across the payload which is everything after that header ... understand the notion of bit depth as well as endianness ... typically a WAV file has a bit depth of 16 bits so each point on the audio curve will be stored across two bytes ... typically a WAV file is little endian not big endian ... knowing what that means, you take the next two bytes, bit shift the second (most significant) byte 8 bits to the left (if little endian), then bit OR that pair of bytes into an integer ... interpret that integer as a signed 16-bit value, which varies from -32768 to +32767, and divide by 32768 to get its floating point equivalent so your audio curve points now vary from -1 to +1 ... do that conversion for each pair of bytes, which corresponds to each sample of your payload buffer
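As a rough illustration of that byte handling in Python (assuming a plain 16-bit little-endian PCM WAV with the canonical 44 byte header; real files can carry extra chunks):

    # Read a 16-bit little-endian PCM WAV by hand: skip the 44 byte header,
    # combine each pair of bytes into a signed 16-bit integer, normalise to [-1, 1).
    def wav_to_floats(path):
        with open(path, "rb") as f:
            f.seek(44)                        # skip the canonical 44 byte header
            payload = f.read()
        samples = []
        for i in range(0, len(payload) - 1, 2):
            lo, hi = payload[i], payload[i + 1]
            value = lo | (hi << 8)            # little endian: second byte is the high byte
            if value >= 32768:                # reinterpret as signed 16-bit (-32768..+32767)
                value -= 65536
            samples.append(value / 32768.0)   # float in the range -1.0 .. +1.0
        return samples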
Once you have the WAV audio curve as a buffer of floats, which is called raw audio or PCM audio, perform your FFT API call ... all languages have such libraries ... the output of the FFT call will be a set of complex numbers ... pay attention to the notion of the Nyquist limit ... this will influence how you make use of the output of your FFT call
Now you have a collection of complex numbers ... the index from 0 to N of that collection corresponds to frequency bins ... the size of your PCM buffer determines how granular your frequency bins are ... the bin spacing is sample_rate / number_of_samples ... in general, the more samples in the PCM buffer you send to the FFT API call, the finer the granularity of the output frequency bins ... essentially this means that as you walk across this collection of complex numbers, each index increments the frequency assigned to that index
To visualize this, just feed it into a 2D plot where the X axis is frequency and the Y axis is magnitude ... calculate the magnitude for each complex number using

    curr_mag = 2.0 * math.sqrt(curr_real*curr_real + curr_imag*curr_imag) / number_of_samples
For simplicity we will sweep under the carpet the phase shift information available to you in your complex number buffer
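Putting those steps together, a minimal sketch with numpy (bin k corresponds to k * sample_rate / number_of_samples Hz, and the 2.0 / N scaling matches the magnitude formula above):

    # PCM floats -> real FFT -> (frequency, magnitude) pairs ready for a 2D plot.
    # Phase is ignored here, as noted above.
    import numpy as np

    def spectrum(pcm, sample_rate):
        n = len(pcm)
        bins = np.fft.rfft(pcm)                           # one complex number per frequency bin
        freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)   # bin index -> frequency in Hz
        mags = 2.0 * np.abs(bins) / n                     # same scaling as the formula above
        return freqs, mags                                # X axis, Y axis

    # e.g. with the wav_to_floats() sketch above:
    # freqs, mags = spectrum(wav_to_floats("capture.wav"), sample_rate=44100)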
This only scratches the surface of what you need to master to properly render a WAV file into a 2D plot of its frequency domain representation ... there are libraries which perform parts or all of this, however now you can appreciate some of the magic involved when the rubber hits the road
A great explanation of the trade-offs between frequency resolution and the number of audio samples fed into your FFT call: https://electronics.stackexchange.com/questions/12407/what-is-the-relation-between-fft-length-and-frequency-resolution
Do yourself a favor and check out https://www.sonicvisualiser.org/ which is one of many audio workstations that can perform what I described above. Just use the menu: File -> Open -> choose a local WAV file -> Layer -> Add Spectrogram ... and it will render a visual representation of the Fourier Transform of your input audio file.
I'd like to build an audio visualizer display using LED strips to be used at parties. Building the display and programming the rendering engine is fairly straightforward, but I don't have any experience in signal processing, aside from rendering PCM samples.
The primary feature I'd like to implement would be animation driven by audible frequency. To keep things super simple and get the hang of it, I'd like to start by simply rendering a color according to audible frequency of the input signal (e.g. the highest audible frequency would be rendered as white).
I understand that reading input samples as PCM gives me the amplitude of air pressure (intensity) with respect to time and that using a Fourier transform outputs the signal as intensity with respect to frequency. But from there I'm lost as to how to resolve the actual frequency.
Would the numeric frequency need to be resolved as the inverse transform of the Fourier transform (e.g. the intensity is the argument and the frequency is the result)?
I understand there are different types of Fourier transforms that are suitable for different purposes. Which is useful for such an application?
You can transform the samples from the time domain to the frequency domain using a DFT or FFT. It outputs frequencies and their intensities. Actually you get a whole set of frequencies, not just one. Based on that, the LED strips can be lit. See DFT spectrum tracer
"The frequency", as in a single numeric audio frequency spectrum value, does not exist for almost all sounds. That's why an FFT gives you all N/2 frequency bins of the full audio spectrum, up to half the sample rate, with a resolution determined by the length of the FFT.
How can I calculate A-weighted and C-weighted dB sound levels from the microphone on iOS?
Here is what I have tried, but the reading I get is far below the sound level meter I have next to my iPhone:
Using the Novocain library, which I have slightly modified to set the audio session mode to Measurement.
Using the Maximilian audio library to run the incoming audio frames through an FFT and converting the amplitudes into dB.
Using the Maximilian audio library's Octave Analyser to place the FFT output into octave bins from 10 Hz to 20480 Hz.
For each octave bin I apply the gain of the relevant dB-weighting (e.g. apply a -70 dB gain to the dB value stored in the 10 Hz bin to get an A-weighted value).
Added the dB values of each bin together by reducing each dB bin to an amplitude, and the gain to an amplitude, making the addition, and converting back to a dB value again (the combining step is sketched after this question).
Is this on the correct track? I have my doubts. Can someone outline an approach, or suggest a library and/or other example (I have looked)?
To note: I would like approximate dB(A) and dB(C) values; this does not need to be scientific. I am not sure how to compensate for the frequency response of the microphone; could the above technique be correct if it were compensating for the response of the microphone?
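The combining step described above would look roughly like the Python sketch below; band levels are normally summed as powers rather than raw dB values. The band levels here are made up and the weights are approximate A-weighting gains for the 125/250/500 Hz octave bands.

    # Apply a per-band weighting, convert each band level from dB to power,
    # sum the powers, convert the total back to dB.
    import math

    def combine_band_levels_db(band_levels_db, weights_db):
        total_power = sum(10 ** ((level + weight) / 10.0)
                          for level, weight in zip(band_levels_db, weights_db))
        return 10.0 * math.log10(total_power)

    print(combine_band_levels_db([60.0, 65.0, 70.0], [-16.1, -8.6, -3.2]))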
I don't think you can measure physical Sound Pressure Levels from a device. In step 2 you "convert the amplitudes into dB." However the amplitude that you record from the device has arbitrary units. When recording 16-bit audio, the audio is represented as numbers in the range -32768 to +32767. If you are working with floating point data then this is normalised by 32768 so that it has a range of (approximately) -1 to +1.
The device's microphone has to cope with a wide variety of sound levels. Generally devices will have some form of Automatic Gain Control which adapts to the current average sound level. This means that if you measure a peak value of 1.0 then you have no way of knowing the actual SPL it corresponded to. You can convert a recording to a series of dBs, but this uses a different definition of dB: as a power ratio. This has no correlation with SPL measurements such as dB(A).
It may be possible to produce an approximate dB(A) measurement if you are able to turn off the AGC and calibrate your device against your sound meter.
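A rough sketch of that calibration idea, with AGC assumed to be off and a fixed input gain: compute the block level in dBFS (relative to full scale) and add an offset measured once against a real meter. The offset value here is a made-up placeholder.

    # RMS level of a block of floats in dBFS, plus a fixed calibration offset.
    import math

    CALIBRATION_OFFSET_DB = 95.0       # placeholder: measure once against a real SPL meter

    def dbfs(samples):
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return 20.0 * math.log10(max(rms, 1e-12))   # dB relative to full scale

    def approximate_spl(samples):
        return dbfs(samples) + CALIBRATION_OFFSET_DB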
EDIT: JASA have published a paper with a detailed comparison of existing SPL measurement apps with stats for the comparison of different generations of iPhone: Evaluation of smartphone sound measurement applications
I've been hunting all over the web for material about vocoders and autotune, but haven't got any satisfactory answers. Could someone please explain, in a simple way, how you autotune a given sound file using a carrier sound file?
(I'm familiar with FFTs, windowing, overlap, etc.; I just don't get what we do once we have the FFTs of the carrier and of the original sound file which has to be modulated.)
EDIT: After looking around a bit more, I finally got to know exactly what I was looking for -- a channel vocoder. The way it works is, it takes two inputs: a voice signal, and a musical signal rich in frequencies. The musical signal is modulated by the envelope of the voice signal, and the output signal sounds like the voice singing in the musical tone.
Thanks for your help!
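For what it's worth, here is a very rough channel vocoder sketch along the lines of that description, with scipy assumed; the band count, band edges and envelope smoothing are arbitrary choices for illustration.

    # Split voice and carrier into the same band-pass bands, take the envelope of
    # the voice in each band, multiply it onto the carrier band, and sum.
    import numpy as np
    from scipy.signal import butter, sosfilt

    def bandpass(x, low, high, fs, order=4):
        sos = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band", output="sos")
        return sosfilt(sos, x)

    def envelope(x, fs, cutoff=50.0):
        sos = butter(2, cutoff / (fs / 2), output="sos")     # low-pass the rectified signal
        return sosfilt(sos, np.abs(x))

    def channel_vocoder(voice, carrier, fs, n_bands=16, f_lo=100.0, f_hi=8000.0):
        edges = np.geomspace(f_lo, f_hi, n_bands + 1)        # log-spaced band edges
        n = min(len(voice), len(carrier))
        voice, carrier, out = voice[:n], carrier[:n], np.zeros(n)
        for low, high in zip(edges[:-1], edges[1:]):
            env = envelope(bandpass(voice, low, high, fs), fs)
            out += bandpass(carrier, low, high, fs) * env
        return out / np.max(np.abs(out))                     # normalise to avoid clipping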
Using a phase vocoder to adjust pitch is basically pitch estimation plus interpolation in the frequency domain.
A phase vocoder reconstruction method might resample the frequency spectrum at a new FFT bin spacing to shift all the frequencies up or down by some ratio. The phase vocoder algorithm additionally uses information shared between adjacent FFT frames to make sure this interpolation result creates continuous waveforms across frame boundaries, e.g. it adjusts the phases of the interpolation results so that successive sine wave reconstructions are continuous rather than having breaks, discontinuities or phase cancellations between frames.
How much to shift the spectrum up or down is determined by pitch estimation: calculating the ratio between the estimated pitch of the source and the target pitch. Again, phase vocoders use information about the phase differences between FFT frames to better estimate pitch. This is possible because a bit more global information is available than from a single local FFT frame.
Of course, this frequency and phase changing can smear out transient detail and cause various other distortions, so actual phase vocoder products may additionally do all kinds of custom (often proprietary) special case tricks to try and fix some of these problems.
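A small sketch of the frame-to-frame phase idea (numpy assumed; hop size, FFT size and window are illustrative): the phase change of a bin between two overlapping frames refines that bin's frequency estimate well beyond the raw bin spacing.

    # Refine each bin's frequency from the phase difference between two frames
    # that start `hop` samples apart.
    import numpy as np

    def refined_bin_frequencies(frame_a, frame_b, hop, fs):
        n = len(frame_a)
        window = np.hanning(n)
        spec_a = np.fft.rfft(frame_a * window)
        spec_b = np.fft.rfft(frame_b * window)
        k = np.arange(len(spec_a))
        expected = 2 * np.pi * k * hop / n                    # phase advance of a bin-centre tone
        delta = np.angle(spec_b) - np.angle(spec_a) - expected
        delta = np.mod(delta + np.pi, 2 * np.pi) - np.pi      # wrap to [-pi, pi)
        return (k + delta * n / (2 * np.pi * hop)) * fs / n   # refined frequency per bin, in Hz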
The first step is pitch detection. There are a number of pitch detection algorithms, introduced briefly in wikipedia: http://en.wikipedia.org/wiki/Pitch_detection_algorithm
Pitch detection can be implemented in either frequency domain or time domain. Various techniques in both domains exist with various properties (latency, quality, etc.) In the F domain, it is important to realize that a naive approach is very limiting because of the time/frequency trade-off. You can get around this limitation, but it takes work.
Once you've identified the pitch, you compare it with a desired pitch and determine how much you need to actually pitch shift.
Last step is pitch shifting, which, like pitch detection, can be done in the T or F domain. The "phase vocoder" method other folks mentioned is the F domain method. T domain methods include (in increasing order of quality) OLA, SOLA and PSOLA, some of which you can read about here: http://www.scribd.com/doc/67053489/60/Synchronous-Overlap-and-Add-SOLA
Basically you do an FFT, then in the frequency domain you move the detected frequencies to the nearest perfect semitone pitch.
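A worked example of the "compare with a desired pitch" step, assuming equal temperament with A4 = 440 Hz: snap the detected pitch to the nearest semitone and return the ratio the pitch shifter should apply.

    # Detected pitch -> nearest equal-tempered semitone -> pitch-shift ratio.
    import math

    def shift_ratio_to_nearest_semitone(detected_hz, a4=440.0):
        semitones = 12.0 * math.log2(detected_hz / a4)        # distance from A4 in semitones
        target_hz = a4 * 2.0 ** (round(semitones) / 12.0)
        return target_hz / detected_hz                        # 1.0 means already in tune

    print(shift_ratio_to_nearest_semitone(451.0))   # a sharp A4 -> ratio slightly below 1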
I have a group of related questions regarding FFTW and audio analysis on Linux.
What is the easiest-to-use, most comprehensive audio library on Linux/Ubuntu that will allow me to decode any of a variety of audio formats (MP3, etc.) and acquire a buffer of raw 16-bit PCM values? GStreamer?
I intend on taking that raw buffer and feeding it to FFTW to acquire frequency-domain data (without complex information or phase information). I think I should use one of their "r2r" methods, probably the DHT. Is this correct?
It seems that FFTW's output frequency axis is discretized in linear increments that are based on the buffer length. It further seems that I can't change this discretization within FFTW so I must do it after the DHT. Instead of a linear frequency axis, I need an exponential axis that follows 2^(i/12). I think I'll have to take the DHT output and run it through some custom anti-aliasing function. Is there a Linux library to do such anti-aliasing? If not, would a basic cosine-based anti-aliasing function work?
Thanks.
This is an age-old problem with FFTs and working with audio - ideally we want a log frequency scale for audio, but the DFT/FFT has a linear scale. You will need to choose an FFT size that gives sufficient resolution at the low end of your frequency range, and then accumulate bins across the frequency range of interest to give yourself a pseudo-logarithmic representation. There are more complex schemes, but essentially it all boils down to the same thing.
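A sketch of that bin accumulation (numpy assumed; the base frequency and band count are arbitrary choices): sum the linear FFT magnitudes into bands whose edges follow 2^(i/12), one band per semitone.

    # Linear FFT magnitudes -> semitone-spaced bands with edges at f0 * 2^(i/12).
    import numpy as np

    def semitone_bands(pcm, fs, f0=55.0, n_bands=96):
        mags = np.abs(np.fft.rfft(pcm))
        freqs = np.fft.rfftfreq(len(pcm), d=1.0 / fs)
        edges = f0 * 2.0 ** (np.arange(n_bands + 1) / 12.0)   # exponential band edges
        bands = np.zeros(n_bands)
        for i in range(n_bands):
            in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
            bands[i] = mags[in_band].sum()                    # accumulate the linear bins
        return bands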
I've seen libsndfile used all over the place:
http://www.mega-nerd.com/libsndfile/
It's LGPL too. It can read pretty much all the open source and lossless audio formats you would care about. It doesn't do MP3, however, because of licensing costs.