Match a target spectrum - audio

I'm working on rolling noise emissions from cars.
I have a model for rolling noise emissions which gives me sound pressure levels in third octave bands (29 in total, between 20Hz and 8kHz), depending on vehicle speed, and road/tyre combinaison.
I'd like to fit a real recording of tire/road noise to the model, while keeping the spectral properties of the recording. The signal of rolling noise is mainly stochastic
In blue, the spectrum of the recording using Welch's method, and in red, the levels i'd like to reach.
What kind of methods can i use in this particular case?
Thank you for your help!

This is quite dependent on the structure of the model you're trying to fit, but you should be able to use an optimization method (such as fmincon in MATLAB) to adjust the model parameters until you reach an acceptable level of error between your measurements and the model.

Related

Bias/Dirft compensation for integration of linear accelerometer data using Kalman filtering

I have become a part of this infinite question of how to estimate position from accelerometer data achieved by an Inertial measurement unit. I am wondering how to compensate for integration ''drift'' during linear movement using Kalman filtering.
At this moment I got my acceleration in a fixed coordinate system and all movements are in know directions with no change in angular position.
So at this point we got acceleration in 3D (x-y-z) in known directions, an acceleration in x will yield for zero acceleration in y and z and so on. Assuming perfect conditions, which are not the case, of course some noise with be added to the other directions when moving in one direction but lets ''leave'' this out at this point. In addition, It is important to note that the system only has to estimate a limited period, approximately about 1 second using a sampling freq of 512 Hz.
It also important to note that I have compensated for the offset (gravity and misalignment of the accelerometer in the IMU) and bias of the acceleromter data when static. Meaning when the sensor is non-moving all my readings are constant zero before going into the Kalman filter.
To more characterize my problem I have this graph to illustrate my problem with drift. This is estimations on 5 seconds to more show what I'm struggling with.
Position-estimation-drift-problem
Here we are looking into a movement in one direction, the movement are 20cm movement in y direction which in my case are forward relative to my starting position.
Is there a way to reduce/eliminate this drift when integrating my signal. For instance assume something about drifting when my sensor is non-moving. Or to compute using some correction in my Kalman algorithm to subtract or add to my estimated velocity and position. The system does not have to run in real time so any tuning bias compensation can be adjusted for looking back into the data. But I would be preferable if it was possible to take new measurements with slightly different movements and not tune more then needed.
Finally where/how can I compensate for this, in the Kalman algorithm or before/after, or should I be in for a disappointment already?
If I left out some important information please ask so i can elaborate more, an at last any thoughts/ideas are welcome!
Remember I do only need to estimate for second’s worth of time so my hope is that this makes it more achievable, but i might be wrong?
I can only guess/suggest few tricks, but you will probably get some significant error if you only based on accelerometer.
seems that detecting motionless is not resetting the speed, just acceleration (according to your graph) so this should be an easy fix
if we are talking an a car/other type of surface motion with contact / friction, your motionless can be set by characterizing the noise of in motion/self sensor noise
kalman parameters may be off
run multiple kernels and average results (may also try particle filter)
if its not for online application you can also try fitting offsets/drift and reduce them by assuming there is not motion in constant speed or other approaches that can replace the kalman filter which is designed for real time best estimation.
error seems a-symmetric in time, just run it in both directions (:
what are you measuring at 512 Hz??? maybe you can better model it
I can go on and on but if you supply data and code, it would be much easier.
Good luck,
Lev

Camera calibration algorithm evaluation

Recently, I am developing algorithm to improvement the camera calibration algorithm in my research group. I would like to ask are there any method to evaluation the camera calibration algorithm so that I can compare the result among difference algorithm?
The most easiest way I can think of is taking the mean square average of different between the calibrated one and the original one pixel-wise. Are there any other suggestions?
Are you talking about geometric camera calibration (focal length, optical center, etc.), or color calibration?
For geometric camera calibration, the main criterion is reprojection errors. Presumably, you are using some sort of calibration pattern, like a checkerboard, where you can detect a set of points. To evaluate calibration accuracy, you look at the distances between the detected points and the reprojected points.
This is what a calibration algorithm typically tries to minimize. Depending on the calibration software you use, you may also be able to look at the uncertainty of the estimated camera parameters.
See this example in MATLAB.
Alternatively, you can use your calibrated camera to measure an object of a known size, and see how precise your measurement is.

Theory behind Autotune/vocoder

I've been hunting all over the web for material about vocoder or autotune, but haven't got any satisfactory answers. Could someone in a simple way please explain how do you autotune a given sound file using a carrier sound file?
(I'm familiar with ffts, windowing, overlap etc., I just don't get the what do we do when we have the ffts of the carrier and the original sound file which has to be modulated)
EDIT: After looking around a bit more, I finally got to know exactly what I was looking for -- a channel vocoder. The way it works is, it takes two inputs, one a voice signal and the other a musical signal rich in frequency. The musical signal is modulated by the envelope of the voice signal, and the output signal sounds like the voice singing in the musical tone.
Thanks for your help!
Using a phase vocoder to adjust pitch is basically pitch estimation plus interpolation in the frequency domain.
A phase vocoder reconstruction method might resample the frequency spectrum at, potentially, a new FFT bin spacing to shift all the frequencies up or down by some ratio. The phase vocoder algorithm additionally uses information shared between adjacent FFT frames to make sure this interpolation result can create continuous waveforms across frame boundaries. e.g. it adjusts the phases of the interpolation results to make sure that successive sinewave reconstructions are continuous rather than having breaks or discontinuities or phase cancellations between frames.
How much to shift the spectrum up or down is determined by pitch estimation, and calculating the ratio between the estimated pitch of the source and that of the target pitch. Again, phase vocoders use information about any phase differences between FFT frames to help better estimate pitch. This is possible by using more a bit more global information than is available from a single local FFT frame.
Of course, this frequency and phase changing can smear out transient detail and cause various other distortions, so actual phase vocoder products may additionally do all kinds of custom (often proprietary) special case tricks to try and fix some of these problems.
The first step is pitch detection. There are a number of pitch detection algorithms, introduced briefly in wikipedia: http://en.wikipedia.org/wiki/Pitch_detection_algorithm
Pitch detection can be implemented in either frequency domain or time domain. Various techniques in both domains exist with various properties (latency, quality, etc.) In the F domain, it is important to realize that a naive approach is very limiting because of the time/frequency trade-off. You can get around this limitation, but it takes work.
Once you've identified the pitch, you compare it with a desired pitch and determine how much you need to actually pitch shift.
Last step is pitch shifting, which, like pitch detection, can be done in the T or F domain. The "phase vocoder" method other folks mentioned is the F domain method. T domain methods include (in increasing order of quality) OLA, SOLA and PSOLA, some of which you can read about here: http://www.scribd.com/doc/67053489/60/Synchronous-Overlap-and-Add-SOLA
Basically you do an FFT, then in the frequency domain you move the signals to the nearest perfect semitone pitch.

Note Onset Detection using Spectral Difference

Im fairly new to onset detection. I read some papers about it and know that when working only with the time-domain, it is possible that there will be a large number of false-positives/negatives, and that it is generally advisable to work with either both the time-domain and frequency-domain or the frequency domain.
Regarding this, I am a bit confused because, I am having trouble on how the spectral energy or the results from the FFT bin can be used to determine note onsets. Because, aren't note onsets represented by sharp peaks in amplitude?
Can someone enlighten me on this? Thank you!
This is the easiest way to think about note onset:
think of a music signal as a flat constant signal. When and onset occurs you look at it as a large rapid CHANGE in signal (a positive or negative peak)
What this means in the frequency domain:
the FT of a constant signal is, well, CONSTANT! and flat
When the onset event occurs there is a rapid increase in spectrial content.
While you may think "Well you're actually talking about the peak of the onset right?" not at all. We are not actually interested in the peak of the onset, but rather the rising edge of the signal. When there is a sharp increase in the signal, the high frequency content increases.
one way to do this is using the spectrial difference function:
take your time domain signal and cut it up into overlaping strips (typically 50% overlap)
apply a hamming/hann window (this is to reduce spectrial smudging) (remember cutting up the signal into windows is like multiplying it by a pulse, in the frequency domain its like convolving the signal with a sinc function)
Apply the FFT algorithm on two sucessive windows
For each DFT bin, calculate the difference between the Xn and Xn-1 bins if it is negative set it to zero
square the results and sum all th bins together
repeat till end of signal.
look for peaks in signal using median thresholding and there are your onset times!
Source:
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
and
http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
You can look at sharp differences in amplitude at a specific frequency as suspected sound onsets. For instance if a flute switches from playing a G5 to playing a C, there will be a sharp drop in amplitude of the spectrum at around 784 Hz.
If you don't know what frequency to examine, the magnitude of an FFT vector will give you the amplitude of every frequency over some window in time (with a resolution dependent on the length of the time window). Pick your frequency, or a bunch of frequencies, and diff two FFTs of two different time windows. That might give you something that can be used as part of a likelihood estimate for a sound onset or change somewhere between the two time windows. Sliding the windows or successive approximation of their location in time might help narrow down the time of a suspected note onset or other significant change in the sound.
"Because, aren't note onsets represented by sharp peaks in amplitude?"
A: Not always. On percussive instruments (including piano) this is true, but for violin, flute, etc. notes often "slide" into each other as frequency changes without sharp amplitude increases.
If you stick to a single instrument like the piano onset detection is do-able. Generalized onset detection is a much more difficult problem. There are about a dozen primitive features that have been used for onset detection. Once you code them, you still have to decide how best to use them.

Pitch recognition of musical notes on a smart phone

With limited resources such as slower CPUs, code size and RAM, how best to detect the pitch of a musical note, similar to what an electronic or software tuner would do?
Should I use:
Kiss FFT
FFTW
Discrete Wavelet Transform
autocorrelation
zero crossing analysis
octave-spaced filters
other?
In a nutshell, what I am trying to do is to recognize a single musical note, two octaves below middle-C to two octaves above, played on any (reasonable) instrument. I'd like to be within 20% of the semitone - in other words, if the user plays too flat or too sharp, I need to distinguish that. However, I will not need the accuracy required for tuning.
If you don't need that much accuracy, an FFT could be sufficient. Window the chunk of audio first so that you get well-defined peaks, then find the first significant peak.
Bin width = sampling rate / FFT size:
Fundamentals range from 20 Hz to 7 kHz, so a sampling rate of 14 kHz would be enough. The next "standard" sampling rate is 22050 Hz.
The FFT size is then determined by the precision you want. FFT output is linear in frequency, while musical tones are logarithmic in frequency, so the worst case precision will be at low frequencies. For 20% of a semitone at 20 Hz, you need a width of 1.2 Hz, which means an FFT length of 18545. The next power of two is 215 = 32768. This is 1.5 seconds of data, and takes my laptop's processor 3 ms to calculate.
This won't work with signals that have a "missing fundamental", and finding the "first significant" peak is somewhat difficult (since harmonics are often higher than the fundamental), but you can figure out a way that suits your situation.
Autocorrelation and harmonic product spectrum are better at finding the true fundamental for a wave instead of one of the harmonics, but I don't think they deal as well with inharmonicity, and most instruments like piano or guitar are inharmonic (harmonics are slightly sharp from what they should be). It really depends on your circumstances, though.
Also, you can save even more processor cycles by computing only within a specific frequency band of interest, using the Chirp-Z transform.
I've written up a few different methods in Python for comparison purposes.
If you want to do pitch recognition in realtime (and accurate to within 1/100 of a semi-tone), your only real hope is the zero-crossing approach. And it's a faint hope, sorry to say. Zero-crossing can estimate pitch from just a couple of wavelengths of data, and it can be done with a smartphone's processing power, but it's not especially accurate, as tiny errors in measuring the wavelengths result in large errors in the estimated frequency. Devices like guitar synthesizers (which deduce the pitch from a guitar string with just a couple of wavelengths) work by quantizing the measurements to notes of the scale. This may work for your purposes, but be aware that zero-crossing works great with simple waveforms, but tends to work less and less well with more complex instrument sounds.
In my application (a software synthesizer that runs on smartphones) I use recordings of single instrument notes as the raw material for wavetable synthesis, and in order to produce notes at a particular pitch, I need to know the fundamental pitch of a recording, accurate to within 1/1000 of a semi-tone (I really only need 1/100 accuracy, but I'm OCD about this). The zero-crossing approach is much too inaccurate for this, and FFT-based approaches are either way too inaccurate or way too slow (or both sometimes).
The best approach that I've found in this case is to use autocorrelation. With autocorrelation you basically guess the pitch and then measure the autocorrelation of your sample at that corresponding wavelength. By scanning through the range of plausible pitches (say A = 55 Hz thru A = 880 Hz) by semi-tones, I locate the most-correlated pitch, then do a more finely-grained scan in the neighborhood of that pitch to get a more accurate value.
The approach best for you depends entirely on what you're trying to use this for.
I'm not familiar with all the methods you mention, but what you choose should depend primarily on the nature of your input data. Are you analysing pure tones, or does your input source have multiple notes? Is speech a feature of your input? Are there any limitations on the length of time you have to sample the input? Are you able to trade off some accuracy for speed?
To some extent what you choose also depends on whether you would like to perform your calculations in time or in frequency space. Converting a time series to a frequency representation takes time, but in my experience tends to give better results.
Autocorrelation compares two signals in the time domain. A naive implementation is simple but relatively expensive to compute, as it requires pair-wise differencing between all points in the original and time-shifted signals, followed by differentiation to identify turning points in the autocorrelation function, and then selection of the minimum corresponding to the fundamental frequency. There are alternative methods. For example, Average Magnitude Differencing is a very cheap form of autocorrelation, but accuracy suffers. All autocorrelation techniques run the risk of octave errors, since peaks other than the fundamental exist in the function.
Measuring zero-crossing points is simple and straightforward, but will run into problems if you have multiple waveforms present in the signal.
In frequency-space, techniques based on FFT may be efficient enough for your purposes. One example is the harmonic product spectrum technique, which compares the power spectrum of the signal with downsampled versions at each harmonic, and identifies the pitch by multiplying the spectra together to produce a clear peak.
As ever, there is no substitute for testing and profiling several techniques, to empirically determine what will work best for your problem and constraints.
An answer like this can only scratch the surface of this topic. As well as the earlier links, here are some relevant references for further reading.
Summary of pitch detection algorithms (Wikipedia)
Pros and cons of Autocorrelation vs Harmonic Product Spectrum
A high-level overview of pitch detection methods
In my project danstuner, I took code from Audacity. It essentially took an FFT, then found the peak power by putting a cubic curve on the FFT and finding the peak of that curve. Works pretty well, although I had to guard against octave-jumping.
See Spectrum.cpp.
Zero crossing won't work because a typical sound has harmonics and zero-crossings much more than the base frequency.
Something I experimented with (as a home side project) was this:
Sample the sound with ADC at whatever sample rate you need.
Detect the levels of the short-term positive and negative peaks of the waveform (sliding window or similar). I.e. an envelope detector.
Make a square wave that goes high when the waveform goes within 90% (or so) of the positive envelope, and goes low when the waveform goes within 90% of the negative envelope. I.e. a tracking square wave with hysteresis.
Measure the frequency of that square wave with straight-forward count/time calculations, using as many samples as you need to get the required accuracy.
However I found that with inputs from my electronic keyboard, for some instrument sounds it managed to pick up 2× the base frequency (next octave). This was a side project and I never got around to implementing a solution before moving on to other things. But I thought it had promise as being much less CPU load than FFT.

Resources