Angular statistics doesn't make sense - statistics

I calculated the mean angle of the two angles below.
337.477792
324.8119785
I used the formula for the mean angle given on this page (see the section "Distribution of the mean"):
http://en.wikipedia.org/wiki/Directional_statistics
What I got was 28.85511475 / -28.8551147. The values don't look right ... Wonder if someone can explain this result for me? Thank you so much!!

The values don't look right
Angles that differ by 360 degrees are equivalent. So -28.8551147 is the same direction as -28.8551147 + 360 = 331.1448853, which is the arithmetic mean of the two values you provided. If you would like to ensure that your values are always in [0, 360), add 360 whenever a value is less than 0.
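As a minimal sketch of the point above (in Python, which the original post does not use; the function name `circular_mean_deg` is made up for illustration), the mean direction can be computed from the summed sines and cosines and then wrapped into [0, 360):

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees, wrapped into [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    raw = math.degrees(math.atan2(s, c))  # atan2 yields a value in (-180, 180]
    return raw % 360.0                    # a negative result gets 360 added

angles = [337.477792, 324.8119785]
print(circular_mean_deg(angles))  # roughly 331.145, i.e. -28.855 + 360
```

The modulo step is the "add 360 if negative" rule from the answer written as one operation.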

Related

Given a # of sigma, finding the correspondent percentage

I want to find the value 3 sigma below the mean, but for a highly positively skewed distribution. Is there a good way of doing this in Python?
I was thinking of just taking the corresponding percentage (e.g. a 3 sigma interval covers 99.7% of the data). So to calculate the value that is considered 3 sigma below the mean, I would take 3, convert it to the percentile 99.85% via some function $func(3)$, and then apply scipy.stats.scoreatpercentile to find the value at that position.
Any better ideas are welcome! My main problem with scipy.stats.scoreatpercentile is that it doesn't give me a value that actually exists in my distribution, which I need to be able to pull out. The closest existing value will do.
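One possible way to sketch this idea uses only the Python standard library. `sigma_to_fraction` and `nearest_rank_value` are hypothetical helper names, the sample is made up, and mapping sigma to a percentile through the normal CDF is only a convention when the data are skewed:

```python
import math
from statistics import NormalDist

def sigma_to_fraction(n_sigma):
    """Fraction of a standard normal distribution lying below n_sigma."""
    return NormalDist().cdf(n_sigma)

def nearest_rank_value(data, fraction):
    """Nearest-rank percentile: always returns an element that exists in data."""
    ordered = sorted(data)
    rank = math.ceil(fraction * len(ordered))
    rank = min(max(rank, 1), len(ordered))  # clamp to a valid 1-based rank
    return ordered[rank - 1]

sample = list(range(1, 1001))        # made-up data, for illustration only
fraction = sigma_to_fraction(-3)     # about 0.00135 for 3 sigma below the mean
print(nearest_rank_value(sample, fraction))
```

Because the nearest-rank rule picks an element of the sorted sample instead of interpolating between two neighbours, the result can be looked up directly in the original data.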

Distance between straight lines

I work in the oil & gas industry and I'm seeking advice about how to calculate the minimum distance between a set of wells (the wells are drawn as straight lines on a map). My goal is for each individual well to have a unique "spacing" value (measured in feet) which is basically the straight-line horizontal distance to the closest wellbore on a map. Below is a simple example of what I'm trying to accomplish (assume the pipe | symbol is a wellbore and the dashes are the distance between the wells)
|--|---|-|
In the drawing above we have 4 wells. The 1st well (starting from the far left) would have a spacing value of 2 (since there are 2 dashes to the closest well), the 2nd well would also have a value of 2 (since the closest well is the one to the far left which is two spaces away), the 3rd well would have a value of 1, and the 4th well would have a value of 1.
Now imagine that I have hundreds of these wells (each with latitude/longitude points that describe the start & end points of each well) and I have them all mapped in TIBCO Spotfire (scattered across Texas). Do you guys know if it would even be possible to automate a calculation like the above? I would also like to build in a rule that says the max distance between wells is 2640 ft (half of a mile).
Any ideas are appreciated!
I think you should be able to do this without any R or IronPython.
Within Spotfire, you can calculate the distance in miles between 2 points using the formula below (replace 3958.756 with 6371 to get the answer in kilometres, or multiply the result in miles by 5280 to get feet).
GreatCircleDistance([Lat 1],[Lon 1],[Lat 2],[Lon 2]) * 3958.756
For your use case, you could cross join your table of locations so that you have a row for every possible pair of locations, then calculate the distance for each pair using the formula above. After that, it should be pretty straightforward to find each well's closest neighbour by taking the minimum distance per well.
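The cross-join approach can be sketched outside Spotfire as follows. This is an illustration in Python, not Spotfire expression code. It assumes each well is reduced to a single representative point (for example its midpoint), uses the haversine formula as a stand-in for GreatCircleDistance, and reads the 2640 ft rule as a cap on the reported spacing. The well names and coordinates are invented.

```python
import math

EARTH_RADIUS_FT = 3958.756 * 5280  # radius in miles (from the answer) converted to feet
MAX_SPACING_FT = 2640.0            # half-mile limit from the question, used here as a cap

def great_circle_ft(lat1, lon1, lat2, lon2):
    """Haversine distance in feet between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))

def nearest_spacing(wells):
    """Map each well to the distance to its closest other well, capped at MAX_SPACING_FT.

    Assumes at least two wells.
    """
    spacing = {}
    for name, (lat, lon) in wells.items():
        nearest = min(
            great_circle_ft(lat, lon, olat, olon)
            for other, (olat, olon) in wells.items()
            if other != name
        )
        spacing[name] = min(nearest, MAX_SPACING_FT)
    return spacing

# Hypothetical well points: name -> (latitude, longitude)
wells = {"A": (31.000, -102.0), "B": (31.001, -102.0), "C": (31.010, -102.0)}
print(nearest_spacing(wells))
```

Comparing every well with every other well is quadratic in the number of wells, which should still be manageable for a few hundred of them.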

Excel - 3D cartesian points - euclidean distance for a large group of points

I have a large set of XYZ Cartesian points in Excel (some 40k actually) and was looking for a formula or macro to compare every point to every other point to get the distances between them.
The math to get the distance value between two 3D points is:
Distance=SQRT((X2 - X1)^2 + (Y2 - Y1)^2 + (Z2 - Z1)^2)
X1=the X value of the 1st point
X2=the X value of the 2nd point
Y1=the Y value of the 1st point
Y2=the Y value of the 2nd point
etc
Here is an example starting with 10 points:
http://i.imgur.com/U3lchMk.jpg
Would anyone know of a way to build this into Excel so that I can just copy the formula across the page to the horizontal limit? Or would you recommend a better way than using Excel?
As a secondary goal, I want to group the points into clusters that can connect by a distance lower than 2. But if I can accomplish the first goal, I can worry about the second later.
Actually, I was able to come up with the solution with a bit more research: i.imgur.com/9JL5Qni.jpg
=SQRT(((INDIRECT("A"&$D2))-(INDIRECT("A"&E$1)))^2+((INDIRECT("B"&$D2))-(INDIRECT("B"&E$1)))^2+((INDIRECT("C"&$D2))-(INDIRECT("C"&E$1)))^2)
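For anyone who prefers to do this outside Excel, here is a rough Python sketch of both goals: the pairwise distances, and the grouping of points that are linked by distances below 2. The function names and sample points are made up. For the full 40k points, the roughly 800 million pairs would probably call for a spatial index instead of this brute-force loop.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Return {(i, j): distance} for every unordered pair of 3D points."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }

def cluster(points, threshold=2.0):
    """Group point indices into clusters chained together by distances below threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), d in pairwise_distances(points).items():
        if d < threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

points = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (10, 10, 10)]  # made-up points
print(pairwise_distances(points)[(0, 1)])  # 1.0
print(cluster(points))                     # [[0, 1, 2], [3]]
```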

Averaging many curves with different x and y values

I have several curves that contain many data points. The x-axis is time and let's say I have n curves with data points corresponding to times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern is: The distribution amongst the x-values is not uniform. There are many more values closer to t=0, but at t=5 (for example), the frequency of data points is much less.
Another concern. What happens if two values fall within 1 bin? I assume I would need the average of these values before calculating the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, EXCEL etc) to accomplish my goal?
Since your series are not uniformly sampled, interpolating before computing the mean is one way to avoid biasing the result towards times where samples are more frequent. Note that interpolation will likely reduce the range of your values, because the interpolated points are unlikely to fall exactly at the times of your measured points. This affects the extreme statistics (e.g. the 5th and 95th percentiles) more than the mean. If you plan on going this route, you'll need the interp1 and mean functions.
An alternative is to do a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time then you can compute the weighted mean by:
timeStep = diff(t);
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".
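To illustrate the interpolation route, here is a small Python sketch (the answer itself refers to MATLAB's interp1 and mean; this is only an analogous illustration). It assumes linear interpolation, holds the end values constant outside each curve's time range, and uses made-up curves and a made-up common grid.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def interp(t_query, t, x):
    """Linearly interpolate x(t) at t_query; t must be sorted in ascending order."""
    if t_query <= t[0]:
        return x[0]   # hold the first value before the first sample
    if t_query >= t[-1]:
        return x[-1]  # hold the last value after the last sample
    i = bisect_right(t, t_query)  # t[i-1] <= t_query < t[i]
    frac = (t_query - t[i - 1]) / (t[i] - t[i - 1])
    return x[i - 1] + frac * (x[i] - x[i - 1])

def average_curves(curves, grid):
    """Return (means, stds) of all curves evaluated on a shared time grid."""
    means, stds = [], []
    for tq in grid:
        values = [interp(tq, t, x) for t, x in curves]
        means.append(mean(values))
        stds.append(pstdev(values))  # population standard deviation across curves
    return means, stds

# Made-up curves as (times, values) pairs sampled at different times
curves = [
    ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0]),
    ([0.0, 0.5, 2.0], [1.0, 2.0, 5.0]),
]
means, stds = average_curves(curves, [0.0, 1.0, 2.0])
print(means, stds)
```

Because every curve contributes exactly one value per grid point, the dense sampling near t=0 no longer dominates the average.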

Excel average every 0.5 meters, irregular distances between data points

I have a data set that has height values every so often, like topography data in a straight line with GPS coordinates. I used the GPS coordinates and trigonometry to make a cumulative distance column. However, the distance between points varies. Sometimes it's 10 cm, sometimes it's 13, sometimes it's 40.
I would like to take the average height every 0.5 meters, but sometimes the distance column doesn't even land on a multiple of 0.5! This would mean my output column would be significantly shorter than my raw data column.
I think my main problem is I do not know what this process is called in order to Google it. Another problem is that the distances are irregular as mentioned above. Things I think may have something to do with it:
averageif?
binning? I do not want a histogram though, just the data.
Thanks for the help and if you do not know the answer but at least know what I should be writing in the search bars that would be helpful as well. Thanks!
Perhaps this will work for you. I made up a series of distance vs height measurements and determined that a third-order polynomial curve fit pretty well. (A different curve might fit your real data better, so you would have to alter the formula accordingly.) I then used that formula to derive a set of new heights for the desired distances, which in my example are spaced five units apart.
The formula under Extrapolated heights is an ARRAY formula entered into all the cells at once. You select D2:D12, enter the formula in D2, and hold down CTRL-SHIFT while hitting ENTER. If you did this correctly, you will have the same formula in each cell, surrounded by curly braces {...}
Then you can decide how you want to "Average" the heights.
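The process the question describes is usually called binning (or resampling onto a regular grid). As an alternative to the curve fit above, a direct bin average can be sketched in Python. This is not part of the Excel answer, and the function name and sample values are invented for illustration.

```python
import math
from collections import defaultdict

def bin_average(distances, heights, width=0.5):
    """Average the heights that fall into consecutive distance bins of the given width.

    Returns a list of (bin_start, mean_height) for every bin that contains data.
    """
    bins = defaultdict(list)
    for d, h in zip(distances, heights):
        bins[math.floor(d / width)].append(h)
    return [(k * width, sum(v) / len(v)) for k, v in sorted(bins.items())]

distances = [0.00, 0.10, 0.23, 0.63, 0.75, 1.10]  # cumulative distance in metres, made up
heights = [10.0, 10.2, 10.4, 11.0, 11.2, 12.0]    # made-up heights
print(bin_average(distances, heights))
```

Each output row covers one 0.5 m interval, so the result is shorter than the raw data, as the question anticipates. Bins containing no measurements are simply omitted.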
