Position calculation of a small model car using accelerometer + gyroscope

I wish to calculate the position of a small remote-controlled car (relative to its starting position). The car moves on a flat surface, for example a room floor.
Now, I am using an accelerometer and a gyroscope. To be precise, this board: http://www.sparkfun.com/products/9623
As a first step I just took the accelerometer data in the x and y axes (since the car moves on a surface) and double integrated the data to get position. The formulae I used were:
vel_new = vel_old + ( acc_old + ( (acc_new - acc_old ) / 2.0 ) ) * SAMPLING_TIME;
pos_new = pos_old + ( vel_old + ( (vel_new - vel_old ) / 2.0 ) ) * SAMPLING_TIME;
vel_old = vel_new;
pos_old = pos_new;
acc_new = measured value from accelerometer
The above formulae are based on this document: http://perso-etis.ensea.fr/~pierandr/cours/M1_SIC/AN3397.pdf
However, this is giving a horrible error.
After reading other similar questions on this forum, I found out that I need to subtract the component of gravity from the above acceleration values (every time, from acc_new) by using the gyroscope somehow. This idea is very well explained in the Google Tech Talks video "Sensor Fusion on Android Devices: A Revolution in Motion Processing" at 23:49.
Now my problem is: how do I subtract that gravity component?
I get angular velocity from the gyroscope. How do I convert it into an acceleration so that I can subtract it from the output of the accelerometer?
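For what it's worth, the approach hinted at in that video amounts to tracking the board's tilt by integrating the gyro rates, expressing gravity in the sensor frame for that tilt, and subtracting it from each accelerometer sample. Below is a minimal sketch of that idea in C; the axis conventions, names, and small-tilt assumptions are mine, and note that a gyro-only orientation estimate itself drifts, which is why real implementations fuse it with the accelerometer (complementary or Kalman filter).

#include <math.h>

#define G 9.81f  /* gravity, m/s^2 */

/* Tilt angles obtained by integrating the gyro rates (rad). */
static float roll  = 0.0f;   /* rotation about the x axis */
static float pitch = 0.0f;   /* rotation about the y axis */

/* Call once per sample with gyro rates (rad/s) and raw accelerations
   (m/s^2); writes gravity-compensated accelerations to acc_lin_*.   */
void RemoveGravity(float gyro_x, float gyro_y, float dt,
                   float acc_x, float acc_y, float acc_z,
                   float *acc_lin_x, float *acc_lin_y, float *acc_lin_z)
{
    /* 1. Integrate angular rate into orientation (this drifts over time). */
    roll  += gyro_x * dt;
    pitch += gyro_y * dt;

    /* 2. Gravity as seen in the sensor frame at this orientation. */
    float g_x = -G * sinf(pitch);
    float g_y =  G * sinf(roll) * cosf(pitch);
    float g_z =  G * cosf(roll) * cosf(pitch);

    /* 3. Subtract it; what remains is (noisy) linear acceleration. */
    *acc_lin_x = acc_x - g_x;
    *acc_lin_y = acc_y - g_y;
    *acc_lin_z = acc_z - g_z;
}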

It won't work; these sensors are not accurate enough to calculate the position.
The reason is also explained in the video you are referring to.
The best you could do is to get the velocity based on the rpm of the wheels. If you also know the heading that belongs to the velocity, you can integrate the velocity to get position. Far from perfect but it should give you a reasonable estimate of the position.
I do not know how you could get the heading of the car, it depends on the hardware you have.
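To illustrate the wheel-based suggestion: if you can read the wheel RPM and some heading estimate (for example from a magnetometer or an integrated gyro yaw, depending on your hardware), a minimal dead-reckoning step could look like the sketch below. The names and the wheel-speed-to-velocity conversion are assumptions for illustration.

#include <math.h>

/* One dead-reckoning step from wheel speed and heading.
   rpm: wheel revolutions per minute, wheel_d: wheel diameter (m),
   heading: yaw angle in radians (0 = +x axis), dt: step time (s). */
void DeadReckonStep(float rpm, float wheel_d, float heading, float dt,
                    float *x, float *y)
{
    float v = rpm / 60.0f * 3.14159265f * wheel_d;  /* linear speed, m/s */
    *x += v * cosf(heading) * dt;
    *y += v * sinf(heading) * dt;
}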

I'm afraid Ali's answer is quite true when it comes to those devices. However, why don't you try searching for "arduino dead reckoning", which will turn up stories of people trying similar boards?
Here's a link that appeared after a search that I think may help you:
Converting IMU to INS
Even though it seems like all of them failed, you may come across workarounds that bring the errors down to acceptable amounts, or you can calibrate your algorithm with some other sensor to put it back on track, since the error from double-integrating acceleration, along with the gyro's white noise, destroys accuracy.

One reason you have a huge error is that the equation appears to be wrong.
Example: to get the updated velocity, use: vel_new = vel_old + ((acc_old + acc_new) / 2.0) * SAMPLING_TIME;
It looks like you had an extra acceleration term in the equation. Using this alone will not correct all the error; as you say, you still need to correct for the influence of gravity and other things.

Related

Possible to find velocity of person in video or camera using openpose

The question is: I want to calculate the speed of my arm for slap detection. I am using OpenPose to get the body keypoints (25 in total) with the body_25 model, and from these, along with the time, I want to deduce the speed of my arm. I googled through OpenPose, Stack Overflow, and GitHub, but could not succeed.
Velocity = Distance / Time = dx/dt
dx = frame3_bodypoints - frame_1_bodypoints;
dt = ?
I don't know how to find this from OpenPose; is there a way I can find it? Any thoughts would be a great help!
I've never used OpenPose. But Newtonian physics would indicate that a slap corresponds to a sudden change in velocity of the hand.
I think it's a reasonable first approximation to assume that the Δt between frames is constant. Instantaneous variation in frame rate is called jitter. I would expect jitter to be small for modern recording devices. In any case, I don't know how to get instantaneous frame rate with the tools (OpenCV, PIL) that I am familiar with. I couldn't find any references to frame rate or time in the OpenPose docs.
For calculating velocity and delta-velocity, you have choices. Straight-up linear velocity of the hand might be the easiest. For position changes, use the Euclidean distance between positions: Δs = sqrt((x2-x1)^2 + (y2-y1)^2).
You could also calculate an angular velocity between the hand and the elbow, but that would be a little more involved and prone to noise.
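As a rough sketch of the linear-velocity option, assuming a constant frame rate (so dt = 1/fps) and that you have already extracted the wrist keypoint from the body_25 output for two consecutive frames (the function and parameter names here are made up for illustration):

#include <math.h>

/* Speed of a keypoint between two consecutive frames, in pixels per second.
   (x1,y1), (x2,y2): keypoint positions; fps: assumed-constant frame rate. */
float KeypointSpeed(float x1, float y1, float x2, float y2, float fps)
{
    float dx = x2 - x1;
    float dy = y2 - y1;
    float ds = sqrtf(dx * dx + dy * dy);   /* Euclidean distance in pixels */
    float dt = 1.0f / fps;                 /* seconds between frames       */
    return ds / dt;
}

Converting pixels per second into a physical speed would additionally require a pixels-to-meters scale, which depends on the camera setup.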

Bias/Drift compensation for integration of linear accelerometer data using Kalman filtering

I have become part of this never-ending question of how to estimate position from accelerometer data obtained from an inertial measurement unit (IMU). I am wondering how to compensate for integration "drift" during linear movement using Kalman filtering.
At this moment I have my acceleration in a fixed coordinate system, and all movements are in known directions with no change in angular position.
So at this point we have acceleration in 3D (x-y-z) in known directions: an acceleration in x will mean zero acceleration in y and z, and so on. This assumes perfect conditions, which is of course not the case; some noise will leak into the other directions when moving in one direction, but let's leave that out at this point. In addition, it is important to note that the system only has to estimate over a limited period, approximately 1 second, using a sampling frequency of 512 Hz.
It is also important to note that I have compensated for the offset (gravity and misalignment of the accelerometer in the IMU) and the bias of the accelerometer data when static, meaning that when the sensor is not moving, all my readings are a constant zero before going into the Kalman filter.
To further characterize my problem, here is a graph to illustrate the drift. These are estimations over 5 seconds, to show what I'm struggling with:
(Figure: position estimation drift problem)
Here we are looking at a movement in one direction: a 20 cm movement in the y direction, which in my case is forward relative to my starting position.
Is there a way to reduce or eliminate this drift when integrating my signal? For instance, assume something about the drift when my sensor is not moving, or apply some correction in my Kalman algorithm to subtract from or add to my estimated velocity and position. The system does not have to run in real time, so any bias-compensation tuning can be done by looking back at the data. But it would be preferable if it were possible to take new measurements with slightly different movements and not tune more than needed.
Finally, where/how can I compensate for this: in the Kalman algorithm, or before/after it? Or should I be in for a disappointment already?
If I left out some important information, please ask so I can elaborate more, and lastly, any thoughts/ideas are welcome!
Remember, I only need to estimate for a second's worth of time, so my hope is that this makes it more achievable, but I might be wrong.
I can only guess/suggest a few tricks, but you will probably get some significant error if you rely only on the accelerometer.
- It seems that detecting motionlessness resets only the acceleration, not the speed (according to your graph), so this should be an easy fix (see the sketch after this list).
- If we are talking about a car or another type of surface motion with contact/friction, your motionless state can be detected by characterizing the in-motion noise versus the sensor's own noise.
- The Kalman parameters may be off.
- Run multiple kernels and average the results (you may also try a particle filter).
- If it is not for an online application, you can also try fitting the offsets/drift and reducing them, e.g. by assuming there is no motion or constant speed, or other approaches that can replace the Kalman filter, which is designed for real-time best estimation.
- The error seems asymmetric in time; just run it in both directions :)
- What are you measuring at 512 Hz? Maybe you can model it better.
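A minimal sketch of the zero-velocity update mentioned in the first point above, assuming stationary samples can be flagged by thresholding the variance of the acceleration over a short window; the window length and threshold are values you would have to tune for your sensor:

#define WIN 64                 /* samples in the stillness-detection window */
#define ACC_VAR_THRESH 1e-4f   /* tune to your sensor's noise floor         */

/* Returns 1 if the last WIN acceleration samples look stationary. */
static int IsStationary(const float *acc, int n)
{
    if (n < WIN) return 0;
    float mean = 0.0f, var = 0.0f;
    for (int i = n - WIN; i < n; ++i) mean += acc[i];
    mean /= WIN;
    for (int i = n - WIN; i < n; ++i) var += (acc[i] - mean) * (acc[i] - mean);
    var /= WIN;
    return var < ACC_VAR_THRESH;
}

/* One integration step with a zero-velocity update (ZUPT). */
void IntegrateStep(const float *acc, int n, float dt, float *vel, float *pos)
{
    *vel += acc[n - 1] * dt;
    if (IsStationary(acc, n))
        *vel = 0.0f;           /* reset velocity, not just acceleration */
    *pos += *vel * dt;
}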
I can go on and on but if you supply data and code, it would be much easier.
Good luck,
Lev

Distance between the camera and a recognized "object"

I would like to calculate the distance between my camera and a recognized "object".
The recognized "object" is a black rectangle sticker on a white board for example. I know the values of the rectangle (x,y).
Is there a method that I can use to calculate the distance with the values of my original rectangle, and the values of the picture of the rectangle I took with the camera?
I searched the forum for answers, but none of them were specific to calculating the distance from these attributes.
I am working on a robot called Nao from Aldebaran Robotics, and I am planning to use OpenCV to recognize the black rectangle.
If you can compute the angle taken up by the image of the target, then the distance to the target should be proportional to cot (i.e. 1/tan) of that angle. You should find that the number of pixels in the image corresponds roughly to the angle, but I doubt it is completely linear, especially up close.
The behaviour of your camera lens is likely to affect this measurement, so it will depend on your exact setup.
Why not measure the size of the target at several distances, and plot a scatter graph? You could then fit a curve to the data to get a size->distance function for your particular system. If your camera is close to an "ideal" camera, then you should find this graph looks like cot, and you should be able to find your values of a and b to match dist = a * cot (b * width).
If you try this experiment, why not post the answers here, for others to benefit from?
[Edit: a note about 'ideal' cameras]
For a camera image to look 'realistic' to us, the image should approximate projection onto a plane held in front of the eye (because camera images are viewed by holding a planar image in front of our eyes). Imagine holding a sheet of tracing paper up in front of your eye and sketching the object's silhouette on that paper. The second diagram on this page shows roughly what I mean. You might describe a camera which achieves this as an "ideal" camera.
Of course, in real life, cameras don't work via tracing paper, but with lenses. Very complicated lenses. Have a look at the lens diagram on this page. For various reasons which you could spend a lifetime studying, it is very tricky to create a lens which works exactly like the tracing paper example would work under all conditions. Start with this wiki page and read on if you want to know more.
So you are unlikely to be able to compute an exact relationship between pixel length and distance: you should measure it and fit a curve.
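As a concrete version of the measure-and-fit suggestion: under a pinhole approximation the pixel width of the target is roughly inversely proportional to its distance, so a simple first pass is to fit dist ≈ a / width_px by least squares from a handful of calibration measurements. This is only a sketch under that assumption; it ignores lens distortion, which is exactly why measuring your own camera matters.

#include <stddef.h>

/* Fit dist = a / width_px by least squares over calibration pairs. */
float FitInverseModel(const float *width_px, const float *dist, size_t n)
{
    float num = 0.0f, den = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float inv_w = 1.0f / width_px[i];
        num += dist[i] * inv_w;     /* sum of d_i / w_i      */
        den += inv_w * inv_w;       /* sum of 1 / w_i^2      */
    }
    return num / den;               /* the fitted constant a */
}

/* Estimate the distance for a newly measured pixel width. */
float EstimateDistance(float a, float width_px)
{
    return a / width_px;
}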
It is a big topic. If you want to proceed from a single image, take a look at this old paper by A. Criminisi. For an in-depth view, read his Ph.D. thesis. Then start playing with the OpenCV routines in the "projective geometry" section.
I have been working on image/object recognition as well. I just released a Python-programmed Android app (ported to Android) that recognizes objects, people, cars, books, logos, trees, flowers... anything :) It also shows its thought process as it "thinks" :)
I've put it out as a test for 99 cents on google play.
Here's the link if you're interested, there's also a video of it in action:
https://play.google.com/store/apps/details?id=com.davecote.androideyes
Enjoy!
:)

Programmatically increase the pitch of an array of audio samples

Hello kind people of the audio computing world,
I have an array of samples that represent a recording. Let us say that it is 5 seconds at 44100 Hz. How would I play this back at an increased pitch? And is it possible to increase and decrease the pitch dynamically? Like having the pitch slowly increase to double the speed and then back down.
In other words I want to take a recording and play it back as if it is being 'scratched' by a d.j.
Pseudocode is always welcomed. I will be writing this up in C.
Thanks,
EDIT 1
Allow me to clarify my intentions. I want to keep the playback at 44100Hz and so therefore I need to manipulate the samples before playback. This is also because I would want to mix the audio that has an increased pitch with audio that is running at a normal rate.
Expressed in another way, maybe I need to shrink the audio over the same number of samples somehow? That way when it is played back it will sound faster?
EDIT 2
Also, I would like to do this myself. No libraries please (unless you feel I could pick through the code and find something interesting).
EDIT 3
A sample piece of code written in C that takes 2 arguments (array of samples and pitch factor) and then returns an array of the new audio would be fantastic!
PS: I've started a bounty on this not because I think the answers already given aren't valid; I just thought it would be good to get more feedback on the subject.
AWARD OF BOUNTY
Honestly, I wish I could distribute the bounty over several different answers, as there were quite a few that I thought were super helpful. A special shoutout to Daniel for passing me some code, and to AShelly and Hotpaw2 for putting in such detailed responses.
Ultimately though I used an answer from another SO question referenced by datageist and so the award goes to him.
Thanks again everyone!
Take a look at the "Elephant" paper in Nosredna's answer to this (very similar) SO question:
How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?
Sample implementations are provided starting on page 37, and for reference, AShelly's answer corresponds to linear interpolation (on that same page). With a little tweaking, any of the other formulas in the paper could be plugged into that framework.
For evaluating the quality of a given interpolation method (and understanding the potential problems with using "cheaper" schemes), take a look at this page:
http://www.discodsp.com/highlife/aliasing/
For more theory than you probably want to deal with (with source code), this is a good reference as well:
https://ccrma.stanford.edu/~jos/resample/
One way is to keep a floating point index into the original wave, and mix interpolated samples into the output wave.
//Simulate scratching of `inwave`:
// `rate` is the speedup/slowdown factor.
// `inputLen` is the number of samples in `inwave`.
// result mixed into `outwave`
// "Sample" is a typedef for the raw audio type.
void ScratchMix(Sample* outwave, Sample* inwave, int inputLen, float rate)
{
    float index = 0;
    while (index < inputLen - 1)   // stop one short so inwave[i+1] stays in bounds
    {
        int i = (int)index;
        float frac = index - i;    // will be between 0 and 1
        Sample s1 = inwave[i];
        Sample s2 = inwave[i + 1];
        *outwave++ += s1 + (s2 - s1) * frac;   // do clipping here if needed
        index += rate;
    }
}
If you want to change rate on the fly, you can do that too.
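For example, a small variation of ScratchMix that ramps the rate linearly from one value to another over the course of the input, which gives the gradual speed-up/slow-down the question asks about. This is only a sketch using the same Sample type and mixing convention as above; how you schedule the rate is up to you.

/* Mix `inwave` into `outwave`, ramping the rate from rateStart to rateEnd. */
void ScratchRamp(Sample* outwave, Sample* inwave, int inputLen,
                 float rateStart, float rateEnd)
{
    float index = 0;
    while (index < inputLen - 1)
    {
        float t = index / inputLen;                        // progress, 0..1
        float rate = rateStart + (rateEnd - rateStart) * t;
        int i = (int)index;
        float frac = index - i;
        Sample s1 = inwave[i];
        Sample s2 = inwave[i + 1];
        *outwave++ += s1 + (s2 - s1) * frac;
        index += rate;
    }
}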
If this creates noisy artifacts when rate > 1, try replacing *outwave++ += s1 + (s2-s1)*frac; with this technique (from this question)
*outwave++ += InterpolateHermite4pt3oX(inwave + i - 1, frac);
where
// Requires i >= 1 and i + 2 < inputLen; clamp at the edges if needed.
static float InterpolateHermite4pt3oX(const Sample* x, float t)
{
    float c0 = x[1];
    float c1 = 0.5f * (x[2] - x[0]);
    float c2 = x[0] - (2.5f * x[1]) + (2.0f * x[2]) - (0.5f * x[3]);
    float c3 = (0.5f * (x[3] - x[0])) + (1.5f * (x[1] - x[2]));
    return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}
Example of using the linear interpolation technique on "Windows Startup.wav" with a factor of 1.1. The original is on top, the sped-up version is on the bottom:
It may not be mathematically perfect, but it sounds the way it should and ought to work fine for the OP's needs.
Yes, it is possible.
But this is not a small amount of pseudo code. You are asking for a time pitch modification algorithm, which is a fairly large and complicated amount of DSP code for decent results.
Here's a Time Pitch stretching overview from DSP Dimensions. You can also Google for phase vocoder algorithms.
ADDED:
If you want to "scratch", as a DJ might do with an LP on a physical turntable, you don't need time-pitch modification. Scratching changes the pitch and the speed of play by the same amount (not independently as would require time-pitch modification).
And the resulting array won't be of the same length, but will be shorter or longer by the amount of the pitch/speed change.
You can change the pitch, as well as make the sound play faster or slower by the same ratio, by just resampling the signal using properly filtered interpolation. Just advance the read position, instead of by 1.0 each sample, by your desired rate change (a floating-point addition), then filter and interpolate the data at that point. Interpolation using a windowed sinc interpolation kernel, with a low-pass filter transition frequency below the lower of the original and interpolated local sample rates, will work fairly well. Searching for "windowed sinc interpolation" on the web returns lots of suitable results.
You need an interpolation method that includes a low-pass filter, or else you will hear horrible aliasing noise. (The exception to this might be if your original sound file is already severely low-pass filtered a decade or more below the sample rate.)
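For reference, here is a minimal sketch of the windowed-sinc resampling described above, using a Hann window and lowering the filter cutoff when speeding up so the anti-aliasing stays below the new Nyquist limit. The tap count and window choice are illustrative, not tuned; `out` is assumed large enough to hold the resampled signal.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define TAPS 16   /* half-width of the sinc kernel, in samples */

static double hann(double x)   /* Hann window over [-TAPS, TAPS] */
{
    return 0.5 + 0.5 * cos(M_PI * x / TAPS);
}

static double sinc(double x)
{
    return (x == 0.0) ? 1.0 : sin(M_PI * x) / (M_PI * x);
}

/* Resample `in` (inLen samples) by `rate` (>1 = faster playback, higher pitch).
   Returns the number of samples written to `out`. */
long Resample(float *out, const float *in, long inLen, double rate)
{
    double cutoff = (rate > 1.0) ? 1.0 / rate : 1.0;   /* anti-alias low-pass */
    long outN = 0;
    for (double t = 0.0; t < (double)inLen; t += rate) {
        long center = (long)t;
        double acc = 0.0;
        for (long k = center - TAPS + 1; k <= center + TAPS; ++k) {
            if (k < 0 || k >= inLen) continue;          /* clamp at the edges */
            double d = t - (double)k;
            acc += in[k] * cutoff * sinc(cutoff * d) * hann(d);
        }
        out[outN++] = (float)acc;
    }
    return outN;
}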
If you want this done easily, see AShelly's suggestion [edit: as a matter of fact, try it first anyway]. If you need good quality, you basically need a phase vocoder.
The very basic idea of a phase vocoder is to find the frequencies that the sound consists of, change those frequencies as needed and resynthesize the sound. So a brutal simplification would be:
- run FFT
- change all frequencies by a factor
- run inverse FFT
If you're going to implement this yourself, you definitely should read a thorough explanation of how a phase vocoder works. The algorithm really needs many more considerations than the three-step simplification above.
Of course, ready-made implementations exist, but from the question I gather you want to do this yourself.
Decreasing and increasing the pitch is as simple as playing the sample back at a lower or higher rate than 44.1 kHz. This will produce the slower/faster record sound, but you'll need to add the 'scratchiness' of real records yourself.
This helped me with resampling, which is the same thing you need, just looked at from the opposite side.
If you can't find code, ping me, I have a nice C routine for this.

Use Kalman filter to track the position of an object, but need to know the position of that object as an input of Kalman filter. What is going on?

I am trying to teach myself how to use a Kalman filter to track an object (a ball) moving in a video sequence, so please explain it to me as if I were a child.
Using some algorithms (color analysis, optical flow...), I am able to get a binary image of each video frame in which the tracked object is white pixels and the background is black pixels -> I know the object size, centroid, and position -> just draw a bounding box around the object -> finished. Why do I need to use a Kalman filter here?
OK, somebody told me that because I cannot detect the object in every video frame due to noise, I need to use a Kalman filter to estimate the position of the object. OK, fine. But as far as I know, I need to provide inputs to the Kalman filter: the previous state and the measurement.
Previous state (so I think it is the position, velocity, acceleration... of the object in the previous frame) -> OK, this is fine with me.
Measurement of the current state: here is what I cannot understand. What can the measurement be?
- The position of the object in the current frame? That is funny, because if I already knew the position of the object, all I would need is to draw a simple (rectangular) bounding box around it. Why would I need a Kalman filter anymore? Therefore, it seems impossible to take the position of the object in the current frame as the measurement value.
- "Kalman Filter Based Tracking in an Video Surveillance System" article says
The main role of the Kalman filtering block is to assign a tracking
filter to each of the measurements entering the system from the
optical flow analysis block.
If you read the full paper, you will see that the author takes the maximum number of blob and the minimum size of the blob as an input to the Kalman filter. How can those parameters be used as measurement?
I think I am in a loop now. I want to use Kalman filter to track the position of an object, but I need to know the position of that object as an input of Kalman filter. What is going on?
And 1 more question, I dont understand the term "number of Kalman filter". In a video sequence, if there are 2 objects need to track -> need to use 2 Kalman filter? Is that what it means?
You don't use the Kalman filter to give you an initial estimate of something; you use it to give you an improved estimate based on a series of noisy estimates.
To make this easier to understand, imagine you're measuring something that is not dynamic, like the height of an adult. You measure once, but you're not sure of the accuracy of the result, so you measure again for 10 consecutive days, and each measurement is slightly different, say a few millimeters apart. So which measurement should you choose as the best value? I think it's easy to see that taking the average will give you a better estimate of the person's true height than using any single measurement.
OK, but what has that to do with the Kalman filter?
The Kalman filter is essentially taking an average of a series of measurements, as above, but for dynamic systems. For instance, let's say you're measuring the position of a marathon runner along a race track, using information provided by a GPS + transmitter unit attached to the runner. The GPS gives you one reading per minute. But those readings are inaccurate, and you want to improve your knowledge of the runner's current position. You can do that in the following way:
Step 1) Using the last few readings, you can estimate the runner's velocity and estimate where he will be at any time in the future (this is the prediction part of the Kalman filter).
Step 2) Whenever you receive a new GPS reading, do a weighted average of the reading and of your estimate obtained in step 1 (this is the update part of the Kalman filter). The result of the weighted average is a new estimate that lies in between the predicted and measured position, and is more accurate than either by itself.
Note that you must specify the model you want the Kalman filter to use in the prediction part. In the marathon runner example you could use a constant velocity model.
The purpose of the Kalman filter is to mitigate the noise and other inaccuracies in your measurements. In your case, the measurement is the x,y position of the object that has been segmented out of the frame. If you can perfectly segment out the ball, and only the ball, from the background for every frame, there is no need for the Kalman filter since your measurements in effect contain no noise.
In most applications, perfect measurements cannot be guaranteed for a number of reasons (change in lighting, change in background, other moving objects, etc.) so there needs to be a way of filtering the measurements to produce the best estimate of the true track.
What the Kalman Filter does is use a model to predict what the next position should be assuming the model holds true, and then compares that estimate to the actual measurement you pass in. The actual measurement is used in conjunction with the prediction and noise characteristics to form the final position estimate and update a characterization of the noise (measure of how much the measurements are differing from the model).
The model could be anything that models the system you are trying to track. A common model is a constant velocity model which just assumes that the object will continue to move with the same velocity as in the previous estimate. This is not to say that this model will not track something with a changing velocity since the measurements will reflect the change in velocity and affect the estimate.
There are a number of ways you can attack the problem of tracking multiple objects at once. The simplest way is to use an independent Kalman filter for each track. This is where the Kalman filter really starts to pay off because if you are using the simple approach of just using the centroid of a bounding box, what happens if the two objects cross one another? Can you again differentiate which object is which after they separate? With the Kalman filter, you have the model and prediction that will help keep the track correct when other objects are interfering.
There are also more advanced ways of tracking multiple objects jointly like a JPDAF.
Jason has given a good start on what a Kalman filter is. In regard to your question about how the paper can use the maximum number of blobs and the minimum blob size, this is exactly the power of the Kalman filter.
A measurement need not be a position, velocity, or acceleration. A measurement can be any quantity that you can observe at a time instant. If you can define a model that predicts your measurement at the next time instant given the current measurement, the Kalman filter can help you mitigate the noise.
I would suggest you look into more introductory materials on Image Processing and Computer Vision. These materials will almost always cover Kalman filter.
Here is a SIGGRAPH course on trackers. It is not introductory but should give you a more in-depth look at the topic.
http://www.cs.unc.edu/~tracker/media/pdf/SIGGRAPH2001_CoursePack_08.pdf
In the case that you can find the ball exactly in every frame, you don't need a Kalman filter. But just because you find some blob which is likely the ball, it doesn't mean that the center of that blob will be the perfect center of the ball. Think of that as your measurement error. Also, if you happen to pick out the wrong blob, using a Kalman filter would help prevent you from trusting that one wrong measurement. Like you said before, if you can't find the ball in a frame, you can also use the filter to estimate where it is likely to be.
Here are some of the matrices you will need, and my guess at what they would be for you. Since the x and y positions of the ball are independent, I think it is easier to have two filters, one for each. Both would look something like this:
x = [position ; velocity] // This is the output of the filter.
P = [1, 0 ; 0, 1] // This is the uncertainty of the estimate. I am not quite sure what you should start with, but it will converge once the filter is running.
F = [1, dt ; 0, 1] // When you compute F*x this predicts the next location of the ball. Notice that this assumes the ball keeps moving with the same velocity as before, and just updates the position.
Q = [0, 0 ; 0, vSigma^2] // This is the "process noise". It is one of the matrices you tune to make the filter perform well. In your system, velocity can change at any time, but position will never change except through the velocity. This is confusing, but the value should be the standard deviation of what those velocity changes might be.
z = [position in x or y] // This is your measurement.
H = [1, 0 ; 0, 0] // This is how your measurement gets applied to your current state. Since you are only measuring position, you only have a 1 in the first row.
R = [?] // I think you will only need a scalar for R, which is the error in your measurement.
With those matrices you should be able to plug them into the formulas that are everywhere for Kalman filters.
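Tying those matrices together, here is a minimal sketch of one predict/update cycle for a single axis (a position/velocity state with a position-only measurement), written with the same symbols as above. The values of vSigma and R are things you would tune; treat this as an illustration of the formulas rather than a finished tracker.

typedef struct {
    float x[2];      /* state: [position, velocity] */
    float P[2][2];   /* estimate covariance         */
} Kalman1D;

/* Predict: x = F*x, P = F*P*F' + Q, with F = [1 dt; 0 1] and
   Q = [0 0; 0 vSigma^2] as described above.                  */
void KalmanPredict(Kalman1D *k, float dt, float vSigma)
{
    k->x[0] += dt * k->x[1];

    float P00 = k->P[0][0], P01 = k->P[0][1];
    float P10 = k->P[1][0], P11 = k->P[1][1];
    k->P[0][0] = P00 + dt * (P01 + P10) + dt * dt * P11;
    k->P[0][1] = P01 + dt * P11;
    k->P[1][0] = P10 + dt * P11;
    k->P[1][1] = P11 + vSigma * vSigma;
}

/* Update with a scalar position measurement z and measurement noise R:
   y = z - H*x, S = H*P*H' + R, K = P*H'/S, x += K*y, P = (I - K*H)*P. */
void KalmanUpdate(Kalman1D *k, float z, float R)
{
    float y = z - k->x[0];            /* innovation            */
    float S = k->P[0][0] + R;         /* innovation covariance  */
    float K0 = k->P[0][0] / S;        /* Kalman gain            */
    float K1 = k->P[1][0] / S;

    k->x[0] += K0 * y;
    k->x[1] += K1 * y;

    float P00 = k->P[0][0], P01 = k->P[0][1];
    float P10 = k->P[1][0], P11 = k->P[1][1];
    k->P[0][0] = (1.0f - K0) * P00;
    k->P[0][1] = (1.0f - K0) * P01;
    k->P[1][0] = P10 - K1 * P00;
    k->P[1][1] = P11 - K1 * P01;
}

You would run one of these per axis (or per object), calling KalmanPredict once per frame and KalmanUpdate whenever a measurement is available.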
Some good things to read:
Kalman filtering demo
Another great intro; read the page linked to in the third paragraph.
I had this question a few weeks ago. I hope this answer helps other people.
If you can get a good segmentation at each frame (the whole ball), you don't need to use a Kalman filter. But segmentation can give you a set of unconnected blobs (only a few parts of the ball). The problem is knowing which parts (blobs) belong to the object and which are just noise. Using a Kalman filter, we can assign blobs near the estimated position as parts of the object. E.g. if the ball has a radius of 10 pixels, blobs at a distance greater than 15 should not be considered part of the object.
The Kalman filter uses the previous state to predict the current state, but it uses the current measurement (the current object position) to improve its next prediction. E.g. if a vehicle is at position 10 (previous state) and moves with a velocity of 5 m/s, the Kalman filter predicts the next position to be 15. But if we then measure the object's position and find it at position 18, the Kalman filter updates the velocity to 8 m/s in order to improve the estimation.
In summary, the Kalman filter is mainly used to solve the data association problem in video tracking. It is also good for estimating the object position, because it takes into account the noise in the source and in the observation.
And for your final question, you are right: it corresponds to the number of objects to track (one Kalman filter per object).
In vision applications, it is common to use your per-frame detection results as the measurement; for example, the location of the ball in each frame is a good measurement.
