Related
I have become a part of this infinite question of how to estimate position from accelerometer data achieved by an Inertial measurement unit. I am wondering how to compensate for integration ''drift'' during linear movement using Kalman filtering.
At this moment I got my acceleration in a fixed coordinate system and all movements are in know directions with no change in angular position.
So at this point we got acceleration in 3D (x-y-z) in known directions, an acceleration in x will yield for zero acceleration in y and z and so on. Assuming perfect conditions, which are not the case, of course some noise with be added to the other directions when moving in one direction but lets ''leave'' this out at this point. In addition, It is important to note that the system only has to estimate a limited period, approximately about 1 second using a sampling freq of 512 Hz.
It also important to note that I have compensated for the offset (gravity and misalignment of the accelerometer in the IMU) and bias of the acceleromter data when static. Meaning when the sensor is non-moving all my readings are constant zero before going into the Kalman filter.
To more characterize my problem I have this graph to illustrate my problem with drift. This is estimations on 5 seconds to more show what I'm struggling with.
Position-estimation-drift-problem
Here we are looking into a movement in one direction, the movement are 20cm movement in y direction which in my case are forward relative to my starting position.
Is there a way to reduce/eliminate this drift when integrating my signal. For instance assume something about drifting when my sensor is non-moving. Or to compute using some correction in my Kalman algorithm to subtract or add to my estimated velocity and position. The system does not have to run in real time so any tuning bias compensation can be adjusted for looking back into the data. But I would be preferable if it was possible to take new measurements with slightly different movements and not tune more then needed.
Finally where/how can I compensate for this, in the Kalman algorithm or before/after, or should I be in for a disappointment already?
If I left out some important information please ask so i can elaborate more, an at last any thoughts/ideas are welcome!
Remember I do only need to estimate for second’s worth of time so my hope is that this makes it more achievable, but i might be wrong?
I can only guess/suggest few tricks, but you will probably get some significant error if you only based on accelerometer.
seems that detecting motionless is not resetting the speed, just acceleration (according to your graph) so this should be an easy fix
if we are talking an a car/other type of surface motion with contact / friction, your motionless can be set by characterizing the noise of in motion/self sensor noise
kalman parameters may be off
run multiple kernels and average results (may also try particle filter)
if its not for online application you can also try fitting offsets/drift and reduce them by assuming there is not motion in constant speed or other approaches that can replace the kalman filter which is designed for real time best estimation.
error seems a-symmetric in time, just run it in both directions (:
what are you measuring at 512 Hz??? maybe you can better model it
I can go on and on but if you supply data and code, it would be much easier.
Good luck,
Lev
Hello kind people of the audio computing world,
I have an array of samples that respresent a recording. Let us say that it is 5 seconds at 44100Hz. How would I play this back at an increased pitch? And is it possible to increase and decrease the pitch dynamically? Like have the pitch slowly increase to double the speed and then back down.
In other words I want to take a recording and play it back as if it is being 'scratched' by a d.j.
Pseudocode is always welcomed. I will be writing this up in C.
Thanks,
EDIT 1
Allow me to clarify my intentions. I want to keep the playback at 44100Hz and so therefore I need to manipulate the samples before playback. This is also because I would want to mix the audio that has an increased pitch with audio that is running at a normal rate.
Expressed in another way, maybe I need to shrink the audio over the same number of samples somehow? That way when it is played back it will sound faster?
EDIT 2
Also, I would like to do this myself. No libraries please (unless you feel I could pick through the code and find something interesting).
EDIT 3
A sample piece of code written in C that takes 2 arguments (array of samples and pitch factor) and then returns an array of the new audio would be fantastic!
PS I've started a bounty on this not because I don't think the answers already given aren't valid. I just thought it would be good to get more feedback on the subject.
AWARD OF BOUNTY
Honestly I wish I could distribute the bounty over several different answers as they were quite a few that I thought were super helpful. Special shoutout to Daniel for passing me some code and AShelly and Hotpaw2 for putting in such detailed responses.
Ultimately though I used an answer from another SO question referenced by datageist and so the award goes to him.
Thanks again everyone!
Take a look at the "Elephant" paper in Nosredna's answer to this (very similar) SO question:
How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?
Sample implementations are provided starting on page 37, and for reference, AShelly's answer corresponds to linear interpolation (on that same page). With a little tweaking, any of the other formulas in the paper could be plugged into that framework.
For evaluating the quality of a given interpolation method (and understanding the potential problems with using "cheaper" schemes), take a look at this page:
http://www.discodsp.com/highlife/aliasing/
For more theory than you probably want to deal with (with source code), this is a good reference as well:
https://ccrma.stanford.edu/~jos/resample/
One way is to keep a floating point index into the original wave, and mix interpolated samples into the output wave.
//Simulate scratching of `inwave`:
// `rate` is the speedup/slowdown factor.
// result mixed into `outwave`
// "Sample" is a typedef for the raw audio type.
void ScratchMix(Sample* outwave, Sample* inwave, float rate)
{
float index = 0;
while (index < inputLen)
{
int i = (int)index;
float frac = index-i; //will be between 0 and 1
Sample s1 = inwave[i];
Sample s2 = inwave[i+1];
*outwave++ += s1 + (s2-s1)*frac; //do clipping here if needed
index+=rate;
}
}
If you want to change rate on the fly, you can do that too.
If this creates noisy artifacts when rate > 1, try replacing *outwave++ += s1 + (s2-s1)*frac; with this technique (from this question)
*outwave++ = InterpolateHermite4pt3oX(inwave+i-1,frac);
where
public static float InterpolateHermite4pt3oX(Sample* x, float t)
{
float c0 = x[1];
float c1 = .5F * (x[2] - x[0]);
float c2 = x[0] - (2.5F * x[1]) + (2 * x[2]) - (.5F * x[3]);
float c3 = (.5F * (x[3] - x[0])) + (1.5F * (x[1] - x[2]));
return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}
Example of using the linear interpolation technique on "Windows Startup.wav" with a factor of 1.1. The original is on top, the sped-up version is on the bottom:
It may not be mathematically perfect, but it sounds like it should, and ought to work fine for the OP's needs..
Yes, it is possible.
But this is not a small amount of pseudo code. You are asking for a time pitch modification algorithm, which is a fairly large and complicated amount of DSP code for decent results.
Here's a Time Pitch stretching overview from DSP Dimensions. You can also Google for phase vocoder algorithms.
ADDED:
If you want to "scratch", as a DJ might do with an LP on a physical turntable, you don't need time-pitch modification. Scratching changes the pitch and the speed of play by the same amount (not independently as would require time-pitch modification).
And the resulting array won't be of the same length, but will be shorter or longer by the amont of pitch/speed change.
You can change the pitch, as well as make the sound play faster or slower by the same ratio, by just resampling the signal using properly filtered interpolation. Just move each sample point, instead of by 1.0, by floating point addition by your desired rate change, then filter and interpolate the data at that point. Interpolation using a windowed Sinc interpolation kernel, with a low-pass filter transition frequency below the lower of the original and interpolated local sample rate, will work fairly well. Searching for "windowed Sinc interpolation" on the web returns lots of suitable result.
You need an interpolation method that includes a low-pass filter, or else you will hear horrible aliasing noise. (The exception to this might be if your original sound file is already severely low-pass filtered a decade or more below the sample rate.)
If you want this done easily, see AShelly's suggestion [edit: as a matter of fact, try it first anyway]. If you need good quality, you basically need a phase vocoder.
The very basic idea of a phase vocoder is to find the frequencies that the sound consists of, change those frequencies as needed and resynthesize the sound. So a brutal simplification would be:
run FFT
change all frequencies by a factor
run inverse FFT
If you're going to implement this yourself, you definitely should read a thorough explanation of how a phase vocoder works. The algorithm really needs many more considerations than the three-step simplification above.
Of course, ready-made implementations exist, but from the question I gather you want to do this yourself.
To decrease and increase the pitch is as simple as playing the sample back at a lower or higher rate than 44.1kHz. This will produce the slower/faster record sound but you'll need to add the 'scratchiness' of real records.
This helped me with resampling, which is same thing you need just looked from the opposite side.
If you can't find code, ping me, I have a nice C routine for this.
I'm happily drawing waveforms to screen from pcm data. I have a problem where occassionally the waveforms height will exceed the height of the display area height.
How can I ensure that the waveform plotting data will never exceed a determined height without having to rip through the entire set of pcm data and normalizing from the maximum value found?
using a normalized representation is exactly what you would do.
you could cheat and pre-calculate max values for a given range, if that's a constraint offered by the implementation.
Unfortunately, there is no good way to discover the actual maximum of a signal without going through sample by sample and finding it.
If you know the number of bits in the PCM samples, you can assume the scaling will be bounded by [-2^(bits-1), 2^(bits-1)-1]. That will be the absolute highest and lowest the signal can go. However, this is the most pessimistic scaling - if you have a 16-bit signal that never goes outside the range [-1024,1024], for instance, you're giving up a lot of display area (as well as ADC dynamic range, but that's another story).
If you are willing to dynamically scale the signal, you can simply make the graph scale bigger each time your signal would get clipped. A more sophisticated approach would be to upscale as necessary, but then slowly relax the max scale downwards over time. A good way to relax the max scale is exponential decay, like multiply the max scale by .98 (or some other number < 1) on each iteration.
I am trying to study how to use Kalman filter in tracking an object (ball) moving in a video sequence by myself so please explain it to me as I am a child.
By some algorithms (color analysis, optical flow...), I am able to get a binary image of each video frame in which there is the tracking object ( white pixels) and background (black pixels) -> I know the object size, object centroid, object position -> Just simple draw a bounding box around the object --> Finish. Why do I need to use Kalman filter here?
Ok, somebody told me that because I can not detect the object in each video frame because of noise, I need to use Kalman filter to estimate the position of the object. Ok, fine. But as I know, I need to provide the input to Kalman filter. They are previous state and measurement.
previous state ( so I think it is the position, the velocity, acceleration...of the object in the previous frame) -> Ok, this is fine to me.
measurement of current state: Here is what I can not understand. What can measurement be?
- The position of the object in the current frame? It is funny because if I know the position of the object, all I need is just to draw a simple boundingbox (rectangular) around the object. Why I need Kalman filter here anymore? Therefore, it is impossible to take the position of the object in the current frame as measurement value.
- "Kalman Filter Based Tracking in an Video Surveillance System" article says
The main role of the Kalman filtering block is to assign a tracking
filter to each of the measurements entering the system from the
optical flow analysis block.
If you read the full paper, you will see that the author takes the maximum number of blob and the minimum size of the blob as an input to the Kalman filter. How can those parameters be used as measurement?
I think I am in a loop now. I want to use Kalman filter to track the position of an object, but I need to know the position of that object as an input of Kalman filter. What is going on?
And 1 more question, I dont understand the term "number of Kalman filter". In a video sequence, if there are 2 objects need to track -> need to use 2 Kalman filter? Is that what it means?
You don't use the Kalman filter to give you an initial estimate of something; you use it to give you an improved estimate based on a series of noisy estimates.
To make this easier to understand, imagine you're measuring something that is not dynamic, like the height of an adult. You measure once, but you're not sure of the accuracy of the result, so you measure again for 10 consecutive days, and each measurement is slightly different, say a few millimeters apart. So which measurement should you choose as the best value? I think it's easy to see that taking the average will give you a better estimate of the person's true height than using any single measurement.
OK, but what has that to do with the Kalman filter?
The Kalman filter is essentially taking an average of a series of measurements, as above, but for dynamic systems. For instance, let's say you're measuring the position of a marathon runner along a race track, using information provided by a GPS + transmitter unit attached to the runner. The GPS gives you one reading per minute. But those readings are inaccurate, and you want to improve your knowledge of the runner's current position. You can do that in the following way:
Step 1) Using the last few readings, you can estimate the runner's velocity and estimate where he will be at any time in the future (this is the prediction part of the Kalman filter).
Step 2) Whenever you receive a new GPS reading, do a weighted average of the reading and of your estimate obtained in step 1 (this is the update part of the Kalman filter). The result of the weighted average is a new estimate that lies in between the predicted and measured position, and is more accurate than either by itself.
Note that you must specify the model you want the Kalman filter to use in the prediction part. In the marathon runner example you could use a constant velocity model.
The purpose of the Kalman filter is to mitigate the noise and other inaccuracies in your measurements. In your case, the measurement is the x,y position of the object that has been segmented out of the frame. If you can perfectly segmement out the ball and only the ball from the background for every frame, there is no need for the Kalman filter since your measurements in effect contain no noise.
In most applications, perfect measurements cannot be guaranteed for a number of reasons (change in lighting, change in background, other moving objects, etc.) so there needs to be a way of filtering the measurements to produce the best estimate of the true track.
What the Kalman Filter does is use a model to predict what the next position should be assuming the model holds true, and then compares that estimate to the actual measurement you pass in. The actual measurement is used in conjunction with the prediction and noise characteristics to form the final position estimate and update a characterization of the noise (measure of how much the measurements are differing from the model).
The model could be anything that models the system you are trying to track. A common model is a constant velocity model which just assumes that the object will continue to move with the same velocity as in the previous estimate. This is not to say that this model will not track something with a changing velocity since the measurements will reflect the change in velocity and affect the estimate.
There are a number of ways you can attack the problem of tracking multiple objects at once. The simplest way is to use an independent Kalman filter for each track. This is where the Kalman filter really starts to pay off because if you are using the simple approach of just using the centroid of a bounding box, what happens if the two objects cross one another? Can you again differentiate which object is which after they separate? With the Kalman filter, you have the model and prediction that will help keep the track correct when other objects are interfering.
There are also more advanced ways of tracking multiple objects jointly like a JPDAF.
Jason has given a good start on what Kalman filter is. In regard to your question as to how the paper can use the maximum number of blobs and the minimum size of the blob, this is exactly the power of Kalman filter.
A measurement needs not be a position, a velocity or an acceleration. A measurement can be any quantity that you can observe at a time instance. If you can define a model that predict your measurement in the next time instance given the current measurement, Kalman filter can help you mitigate the noise.
I would suggest you look into more introductory materials on Image Processing and Computer Vision. These materials will almost always cover Kalman filter.
Here is a SIGGRAPH course on trackers. It is not introductory but should give you a more in-depth look at the topic.
http://www.cs.unc.edu/~tracker/media/pdf/SIGGRAPH2001_CoursePack_08.pdf
In the case that you can find the ball exactly in every frame, you don't need a Kalman filter. Just because you find some blog which is likely the ball, it doesn't mean that the center of that blob will be the perfect center of the ball. Think of that as your measurement error. Also, if you happen to pick out the wrong blog, using a Kalman filter would help prevent you from trusting that one wrong measurement. Like you said before, if you can't find the ball in a frame, you can also use the filter to estimate where it is likely to be.
Here are some of the matrices you will need, and my guess at what they would be for you. Since the x and y position of the ball is independent, I think it is easier to have two filters, one for each. Both would look kinda like this:
x = [position ; velocity] //This is the output of the filter
P = [1, 0 ; 0 ,1] //This is the uncertainty of the estimation, I am not quite sure what you should have to start, but it will converge once the filter is running.
F = [ 1,dt ; 0,1] when you do x*F this will predict the next location of the ball. Notice that this assumes the ball keeps moving with the same velocity as before, and just updates the position.
Q = [ 0,0 ; 0,vSigma^2] This is the "process noise". This one of the matrices you tune to make the filter preform well. In your system, velocity can change at any time, but position will never change without the velocity being what changed it. This is confusing. The value should be the standard deviation of what those velocity changes might be.
z = [position in x or y] This is your measurement
H = [1,0 ; 0,0] This is how your measurement gets applied to your current state. Since you are only measuring position, you only have a 1 in the first row.
R = [?] I think you will only need a scalar for R, which is the error in your measurement.
With those matrices you should be able to plug them into the formulas that are everywhere for Kalman filters.
Some good things to read:
Kalman filtering demo
Another great into, read the page linked to in the third paragraph
I had this question few weeks ago. I hope this answer helps another people.
If you can get the a good segmentation at each frame (the whole ball), you don't need to use kalman filter. But segmentation can give you a set of unconected blobs (only few parts of the ball). The problem is to know what parts (blobs) belong to the object or are just noise. Using kalman filter we can assign blobs near of the estimated position as parts of the object. E.g. if the ball has 10 pixels of radius, blobs with a distance higher than 15 should not be considered as part of the object.
Kalman filter uses the previous state to predict the current state. But, uses the current measurement (current object position) to improve its next prediction. E.g. if a vehicle is at the position 10 (previous state) and goes with a velocity of 5 m/s, kalman filter predict the next position at the position 15. But if we measure the position of the object, we found the object is at position 18. In order to improve the estimation, kalman filter updates the velocity to 8 m/s.
As summary, kalman filter is mainly used to solve the data
association problem in video tracking. It is also good to estimate
the object position, because it take into account the noise in the
source and in the observation.
And for you final question, you are right. It corresponds to the number of
object to track (one kalman filter per object).
In vision application , it is common to use your results at each frame as measurement, for example location of ball in each frame is good measurement.
I'm implementing a 'filter sweep' effect (I don't know if it's called like that). What I do is basically create a low-pass filter and make it 'move' along a certain frequency range.
To calculate the filter cut-off frequency at a given moment I use a user-provided linear function, which yields values between 0 and 1.
My first attempt was to directly map the values returned by the linear function to the range of frequencies, as in cf = freqRange * lf(x). Although it worked ok it looked as if the sweep ran much faster when moving through low frequencies and then slowed down during its way to the high frequency zone. I'm not sure why is this but I guess it's something to do with human hearing perceiving changes in frequency in a non-linear manner.
My next attempt was to move the filter's cut-off frequency in a logarithmic way. It works much better now but I still feel that the filter doesn't move at a constant perceived speed through the range of frequencies.
How should I divide the frequency space to obtain a constant perceived sweep speed?
Thanks in advance.
The frequency sweep effect you're referring to is likely a wah-wah filter, named for the ubiquitous wah-wah pedal.
We hear frequency in terms of octaves, and sweeping through octaves with a logarithmic scale is the way to linearize it. Not to sound dismissive, but it sounds like what you're doing is physically and mathematically correct. (You should spent as much time between 200 and 400 Hz as you do between 2000 and 4000 Hz, etc.) You just don't like how it sounds. And that's quite okay on both counts -- audio is highly subjective.
To mix things up a bit, one option would be to try the Bark scale, which is based on psychoacoustics and the structure of the ear. As I understand it, this is designed to spend equal amounts of time in each of your ear's internal "bandpass filters".
You could always try a quadratic or cubic function between 0 and 1. Audio potentiometers often use a few piecewise quadratic or cubic sections to get their mapping.
Winging it, but try this:
http://en.wikipedia.org/wiki/Physics_of_music#Scales "The following table shows the ratios between the frequencies of all the notes of the just major scale and the fixed frequency of the first note of the scale."
There is then a chart showing fractional values between 1 and 2, and if you tweak your timing to match, you may get what you wish. While the overall progression is still logarithmic, the stepping between each one should divide up into equal stepped 8ths (a bit jumpy).
Put another way, every half second adjust one note up. Each octave (I think) will cover twice the frequency range of the prior octave.
EDIT: Also, you'll find the frequencies here: http://en.wikipedia.org/wiki/Middle_C#Designation_by_octave (doesn't the programmer in you wish that C0 was exactly 16hz?)