Excel average every 0.5 meters, irregular distances between data points - excel

I have a data set that has height values every so often, like topography data in a straight line with GPS coordinates. I used the GPS coordinates and trigonometry to make a cumulative distance column. However, the distance between points varies: sometimes it's 10 cm, sometimes 13, sometimes 40.
I would like to take the average height every 0.5 meters, but sometimes the distance column doesn't even land on a multiple of 0.5! This would mean my output column would be significantly shorter than my raw data column.
I think my main problem is I do not know what this process is called in order to Google it. Another problem is that the distances are irregular as mentioned above. Things I think may have something to do with it:
averageif?
binning? I do not want a histogram though, just the data.
Thanks for the help and if you do not know the answer but at least know what I should be writing in the search bars that would be helpful as well. Thanks!

Perhaps this will work for you. I made up a series of distance vs. height measurements and determined that a third-order polynomial curve fit them pretty well. (A different curve might fit your real data better, so you would have to alter the formula accordingly.) I then used that formula to derive a set of new heights at the desired distances, which in my example are five units apart.
The formula under Extrapolated heights is an ARRAY formula entered into all the cells at once. Select D2:D12, enter the formula in D2, and hold down CTRL-SHIFT while hitting ENTER. If you did this correctly, you will have the same formula in each cell, surrounded by curly braces {...}.
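Outside Excel, the same curve-fit-then-resample idea can be sketched in a few lines of Python with NumPy. This is only an illustration: the distances and heights below are made up (and exactly cubic, so the fit is perfect), and the names `distance`, `height`, and `regular_distance` are hypothetical.

```python
import numpy as np

# Hypothetical measurements: irregularly spaced distances (m) and heights (m).
distance = np.array([0.0, 0.10, 0.23, 0.63, 0.75, 1.10, 1.22, 1.62, 1.80, 2.10])
height = 0.5 * distance**3 - distance**2 + 2.0  # made-up data, exactly cubic

# Fit a third-order polynomial; coefficients come back highest power first.
coeffs = np.polyfit(distance, height, deg=3)

# Evaluate the fitted curve at regular 0.5 m steps inside the measured range.
regular_distance = np.arange(0.0, distance.max() + 1e-9, 0.5)
fitted_height = np.polyval(coeffs, regular_distance)
```

Real topography is unlikely to follow a single cubic over a long transect, so the polynomial order is something to check against a plot of the data.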
Then you can decide how you want to "Average" the heights.
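If what you want is a plain average of the raw heights inside each 0.5 m stretch (this is usually called binning, or a binned average), a minimal Python sketch could look like the one below. It assumes distances are non-negative; the function name `bin_average` and the sample numbers are hypothetical. Bins that contain no points come back as NaN rather than being guessed.

```python
import numpy as np

def bin_average(distance, height, width=0.5):
    """Mean height in each [k*width, (k+1)*width) distance bin; NaN where a bin is empty."""
    distance = np.asarray(distance, dtype=float)
    height = np.asarray(height, dtype=float)
    idx = np.floor(distance / width).astype(int)      # bin number of every point
    n_bins = idx.max() + 1
    sums = np.bincount(idx, weights=height, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.full(n_bins, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    starts = np.arange(n_bins) * width                # left edge of every bin
    return starts, means

# Made-up example: three points in the first bin, two in the second, none in the third.
starts, means = bin_average([0.0, 0.10, 0.23, 0.63, 0.75, 1.60],
                            [1.0, 2.0, 3.0, 4.0, 6.0, 10.0])
```

In Excel itself the same idea can be built with AVERAGEIFS, using a column of bin start values and two criteria on the distance column (greater than or equal to the bin start, less than the next bin start).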

Related

Calculate a plot and intersect on a curve in an excel chart based on a single value

I have some data that I'm using to plot a curve in excel. It uses a non-linear calculation.
The calculation is called the rule of twelfths - it is used to calculate changes in tidal height between a high and low tide. The rule states that in the first sixth (often approximated to an hour) of the time period, the tide will move 1/12th of the overall range. In the second sixth, the tide will move 2/12ths of the overall range. In the 3rd and 4th sixths, the tide will move 3/12ths (in each), and then it will move 2/12ths again in the fifth sixth, and 1/12th in the final sixth.
The maths for this is relatively straightforward - if I know the High Water Time and Low Water time, and their respective heights, I can calculate a data point for each sixth. That then plots to a nice even curve (and some fun pie chart shenanigans shows it on a clock face too).
This produces the following sheet:
What I am now after is the ability to overlay onto that the height for a given time of day. This would be used in a 'live' sense to display the height 'now', or perhaps where the user dragged their finger on the curve if it was in an app. I'm only using this for screenshot/flat file purposes, so I just need it to base the overlay on the data in one cell.
So, in the attached screenshot, if we had a time of day of 1128, (based on Cell J3), excel would take the time in J3, and wherever it intersected the curve, draw both a vertical and a horizontal line, so that the height of tide data (HOT) could be measured off that axis.
This would look something like this (I've circled cell J3 too):
Is that something that's possible? It might be that it needs to do a lookup in the table of calculated data points and then interpolate just between those two - that would probably get close enough.
A two stage question I guess - firstly calculating the intercept, secondly getting it to draw on (complete with the vertical and horizontal lines if possible!).
There's a widget on planetcalc which does almost the same thing - it only gives the calculated data points (and it uses hours rather than the range), but it gives a nice visual idea.
PlanetCalc Tide Calculator
Any thoughts? Is it possible?
I created a solution to this myself in the end.
Given that the two values were easily calculated, I produced a pair of values for the desired time/height, and plotted each as an additional line graph on top of the existing (styled correctly).
This gave the appearance of what I wanted, and intersected my curve perfectly.
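For anyone who wants to check the numbers outside Excel, here is a rough Python sketch of the calculation: it builds the seven rule-of-twelfths points and then interpolates linearly between the two that bracket the requested time, as suggested in the question. The function name, the choice of minutes after midnight as the time unit, and the example tide are all assumptions made for illustration.

```python
import numpy as np

# Fraction of the range covered after each sixth: 1, 2, 3, 3, 2, 1 twelfths, accumulated.
CUMULATIVE_TWELFTHS = np.cumsum([0, 1, 2, 3, 3, 2, 1]) / 12.0

def tide_height(t, t_start, t_end, h_start, h_end):
    """Height at time t, linearly interpolated between the rule-of-twelfths points."""
    knot_times = np.linspace(t_start, t_end, 7)                      # start plus six sixths
    knot_heights = h_start + (h_end - h_start) * CUMULATIVE_TWELFTHS
    return float(np.interp(t, knot_times, knot_heights))

# Made-up tide: high water 4.8 m at 10:00 (600 min), low water 1.2 m at 16:00 (960 min).
height_now = tide_height(690, 600, 960, 4.8, 1.2)
```

The marker lines can then be drawn from two-point series: one running from the time axis up to (time, height) and one running from the height axis across to the same point.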

Is there an EXCEL function to choose the closest points below a certain line?

Suppose that we have a scattered series of data X,Y randomly spaced (in the pic they are ordered, but this doesn't matter) and a line which shows the maximum limit we are considering for a sub-application.
Is there a combination of functions to choose the closest points below the orange line? I've tried with a MAXIFS + LOOKUP, but didn't solve anything.
The formula of your line is y=1.17*x, so create a helper column containing a formula like:
=IF(1.17*A3-B3>0;1.17*A3-B3;100000)
This means: calculate the difference between the line and the point if that difference is positive. If it's negative (which means that the point is above the line), show a value so large that it won't be taken into account when calculating the minimum.
Drag this formula down the whole column.
Then calculate the minimum of that column (one easy way to do this is with the autofilter); the row holding that minimum is the closest point below the line.
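The helper-column logic translates directly into a short Python sketch. The function name and sample points are hypothetical; the slope 1.17 is the one from the line above. Instead of a large dummy number, points on or above the line get an infinite gap so they can never win the minimum.

```python
import numpy as np

def closest_below_line(x, y, slope=1.17):
    """Index of the point with the smallest positive vertical gap below y = slope * x, or None."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gap = slope * x - y                       # positive when the point is below the line
    gap = np.where(gap > 0, gap, np.inf)      # ignore points on or above the line
    if not np.isfinite(gap).any():
        return None                           # nothing lies below the line
    return int(np.argmin(gap))

# Made-up points: the third one (index 2) sits just under the line.
best = closest_below_line([1, 2, 3, 4], [1.5, 2.0, 3.4, 1.0])
```

This measures the vertical gap. For a straight line the perpendicular distance is the vertical gap times a constant factor, so the same point wins either way.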

Excel, Determine where data takes a dive

I'm trying to determine where, in a set of measurement data, the data takes a dive...
... so I can plot a vertical line and
... plot a horizontal line in the graph.
I have no problem doing the 2nd and 3rd bullet points above on my own, so that's taken care of.
The problem I need help with is the first bullet point - determining WHERE the data takes a dive - WHERE the data crosses a threshold that basically says, "Whatever-it-is you're measuring, is no longer performing as it is expected to.".
Here's what I'm doing:
I am taking measurements using a measuring device and that device is logging the measurements in its internal memory and allowing me to download that measurement data to my computer into a csv when the test session is complete.
I pull that csv into an xls and plot the data on a graph. (see attached image)
Here's what I want to do:
If you look at the attached image I would like to find the value where the data DEFINITELY crosses BELOW the horizontal line so I can say, "Here is where the device being tested 'gave up the ghost' and was no longer able to perform as desired."
What the data roughly looks like:
Each measurement set will have the rough look and feel of the attached image but slightly different each time. (because each object I am testing will have roughly the same performance characteristics but they all have their own manufacturing defects and variations.)
The data set for the attached image is a data set of 7000 measurements.
I never really know where the horizontal line will be.
Examples of the data sets I have gotten in the past several tests look like this:
(394 to 0)
(390000 to 0)
(3.88 to 0)
(375000 to 0)
(39.55 to 0)
(59200 to 0)
and each data set will have about 1,000 to 7,000 measurements each.
Here's how I was trying to solve this issue:
I was using SLOPE() and trying to latch onto where the slope of the line took a dive / started to work its way toward a vertical line (a very steep negative slope), so when the slope becomes steeply negative then it MUST be taking a dive. That didn't really work.
I was looking at using STDEV.P() in Excel and feeding it the entire data set. Then I was looking at doing the same thing but feeding it only the first 10, 30, 60 measurements but then I thought - we never really know just how many measurements will come through. Then I thought I would use the first 10% of the measurements and feed that to STDEV.P().
Please let me know what you think of this and please let me know of any ideas you may have.
Thanks.
H
Something like this should work to flag when the decay rate increases.
To find what 'direction' your data is going in you need the derivative.
Excel doesn't have a derivative formula but you can set it up pretty easily by using the (change in y)/(change in x) as demonstrated here:
http://faculty.educ.ubc.ca/sanderson/lab/CLFbiom/demo/diff.htm
I would then add a formula that counts how many data rows you have (=COUNTA(A:A) or similar).
Then use that count to get a step of 10% of your data.
Then check the value of the derivative in a cell against a cell 10% further down. If it's still negative there (to account for the slight downhill at first), then you'll know the data has taken its dive.
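One possible reading of these steps, sketched in Python: take the sample-to-sample difference as the derivative (assuming evenly spaced samples), and flag the first row whose slope is negative both there and 10% of the data further down. The function name is hypothetical, and on noisy data the raw differences would need smoothing first, otherwise the plateau can trigger it early.

```python
import numpy as np

def first_sustained_drop(y, step_fraction=0.10):
    """First index where the finite-difference slope is negative both there and one step further on."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                  # change in y per sample (change in x taken as 1)
    step = max(1, int(len(y) * step_fraction))       # 10% of the number of rows
    for i in range(len(dy) - step):
        if dy[i] < 0 and dy[i + step] < 0:
            return i
    return None                                      # no sustained drop found

# Made-up data: ten flat readings, then a steady fall to zero.
readings = [100] * 10 + [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
drop_index = first_sustained_drop(readings)
```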
The right way to go about this is to model the data with an unknown discontinuity, something like "if time < break_time then (some constant plus noise) else (decaying exponential)". A maximum likelihood estimation for that model might require iteration or other operations which are clumsy in Excel -- maybe you should consider VB or Python or some other programming language. I.e. choose the tool to fit the problem and not the other way around.
See Seber and Wild, "Nonlinear Regression", for an extensive discussion of models with discontinuities.
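As a very reduced illustration of the idea, the sketch below tries every candidate break index and keeps the one with the smallest total squared error. Note the simplifications, which are mine: the second regime is a straight line rather than a decaying exponential (so each fit is a closed-form least-squares problem), and the noise is implicitly assumed Gaussian so that least squares stands in for maximum likelihood. The function name and data are hypothetical.

```python
import numpy as np

def find_break(y, min_points=3):
    """Break index k minimising the squared error of: mean of y[:k], straight line through y[k:]."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(min_points, len(y) - min_points + 1):
        left = y[:k]
        sse_left = np.sum((left - left.mean()) ** 2)            # constant plateau
        slope, intercept = np.polyfit(x[k:], y[k:], 1)          # straight-line decline
        sse_right = np.sum((y[k:] - (slope * x[k:] + intercept)) ** 2)
        if sse_left + sse_right < best_sse:
            best_k, best_sse = k, sse_left + sse_right
    return best_k

# Made-up data: 20 plateau samples at 100, then 85, 75, ..., 5.
samples = np.concatenate([np.full(20, 100.0), 85.0 - 10.0 * np.arange(9)])
break_index = find_break(samples)
```

Swapping the straight line for an exponential would need an iterative nonlinear fit inside the loop, which is exactly the kind of work that is clumsy in a spreadsheet.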
If your data can be generally characterized as having:
(A) a more or less flat plateau region, followed by
(B) a downward trending region
then a basic strategy could be to start at the end of the data and march towards the beginning one point at a time, checking to see that the values are increasing. Once they stop increasing, you've found the break point.
The strategy assumes (unwisely?) that the downward trending region is smooth/noiseless. To make the solution more robust to noise, you could compare values that are 5 apart, or 10 apart, or whatever interval works to filter out the noise. Or you could use a moving average.
This strategy could potentially be made more efficient by starting the search somewhere in the middle of the data but still in the downward-trending portion. If you know (based on experience) that any value that is (say) 0.5X the maximum is in the downward-trending portion, you could start the search there.
Hope that helps.
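A minimal Python sketch of the backward march, with a `lag` parameter for the "compare values that are 5 apart" variant. The function name and the sample data are made up for illustration.

```python
import numpy as np

def break_by_backward_march(y, lag=1):
    """Walk from the last sample toward the first while y[i - lag] > y[i]; return the index where that stops."""
    y = np.asarray(y, dtype=float)
    i = len(y) - 1
    while i - lag >= 0 and y[i - lag] > y[i]:
        i -= 1                      # still in the downward-trending region, keep marching
    return i

# Made-up data: ten flat readings, then a steady fall to zero.
readings = [100] * 10 + [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
break_index = break_by_backward_march(readings)
```

On this noiseless example both `lag=1` and `lag=5` stop at the last plateau sample; with real noise a larger lag (or a moving average applied first) makes it less likely to stop on a small bump.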
It appears as though you want to detect when the slope changes from something near zero to something negative. One way to detect this is to calculate the 2nd derivative of the values (calculate the slope of the slope). The 2nd derivative should be near zero in the flat portion of the data AND in the downward trending portion of the data. It should go negative at the break point. So finding the minimum (most negative) value of the 2nd derivative should locate the break point.
To implement this, you probably will need to filter noise. So calculate the first derivative (slope) over some suitable window of data:
=SLOPE(moving window of say 25 raw values)
Then calculate the second derivative (slope of slope):
=SLOPE(moving window of say 25 slope values)
Then look for the minimum.
Hope that helps.
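The two moving-window SLOPE columns can be sketched in Python as follows. The helper computes the least-squares slope of every window, assuming evenly spaced samples; the function names, the window of 25, and the test data are illustrative assumptions. It relies on `sliding_window_view`, which needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_slope(values, window):
    """Least-squares slope of each run of `window` consecutive values, with x = 0, 1, 2, ..."""
    values = np.asarray(values, dtype=float)
    x_centered = np.arange(window, dtype=float) - (window - 1) / 2.0
    windows = sliding_window_view(values, window)
    return windows @ x_centered / np.sum(x_centered ** 2)

def break_by_second_derivative(y, window=25):
    """Approximate break index: where the slope of the slope is most negative."""
    first = rolling_slope(y, window)         # first derivative, smoothed
    second = rolling_slope(first, window)    # second derivative, smoothed
    # second[j] summarises raw samples j .. j + 2*window - 2; report the centre of that span.
    return int(np.argmin(second)) + (window - 1)

# Made-up data: 100 flat samples at 100, then a fall of one unit per sample.
samples = np.concatenate([np.full(100, 100.0), 100.0 - np.arange(1, 101)])
break_index = break_by_second_derivative(samples)
```

The data must contain at least 2*window - 1 samples, and the result is only accurate to within a fraction of the window width.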

Averaging many curves with different x and y values

I have several curves that contain many data points. The x-axis is time and let's say I have n curves with data points corresponding to times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern is: The distribution amongst the x-values is not uniform. There are many more values closer to t=0, but at t=5 (for example), the frequency of data points is much less.
Another concern. What happens if two values fall within 1 bin? I assume I would need the average of these values before calculating the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, EXCEL etc) to accomplish my goal?
Since your series are not uniformly distributed, interpolating prior to computing the mean is one way to avoid biasing towards times where you have more frequent samples. Note that by definition, interpolation will likely reduce the range of your values, i.e. the interpolated points aren't likely to fall exactly at the times of your measured points. This has a greater effect on the extreme statistics (e.g. 5th and 95th percentiles) than on the mean. If you plan on going this route, you'll need MATLAB's interp1 and mean functions.
An alternative is to do a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time then you can compute the weighted mean by:
timeStep = diff(t);
% weight each value by the time interval that follows it, then normalise
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".
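For comparison, here is a rough Python equivalent of the interpolate-then-average route, which also gives the standard deviation asked for in the question. It assumes each curve's times are sorted in increasing order and that the common grid stays inside the time range shared by all curves (`np.interp` simply holds the end values outside a curve's range). The function name and the two tiny curves are made up.

```python
import numpy as np

def average_curves(curves, grid):
    """Interpolate each (t, x) curve onto `grid`, then return the pointwise mean and standard deviation."""
    resampled = np.array([np.interp(grid, t, x) for t, x in curves])
    return resampled.mean(axis=0), resampled.std(axis=0)

# Two made-up curves sampled at different times.
curves = [
    (np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0, 4.0])),
    (np.array([0.0, 0.5, 2.0]), np.array([0.0, 2.0, 8.0])),
]
grid = np.array([0.0, 1.0, 2.0])
mean_curve, std_curve = average_curves(curves, grid)
```

`std` here is the population standard deviation; pass `ddof=1` to it if the sample version is wanted.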

Excel Chart doesn't keep format

I have a table (came from a pivot table) where I have formatted the column 4 cells to show 1 billion as 1. But when I select the table and insert a chart, I am getting my units in millions. So the 14.8 billion number for Mexico is showing up as 14,800 on the chart. Why might this be happening and how can I fix this? This is also making all my other bars negligibly small. Note that the first three columns are not in billions and are totally different things. Some are percentages, some are other small numbers.
Table:
Chart:
You need a secondary horizontal axis and some formatting on the Axes.
In Excel 2013
First change the Chart Type to Combo and select Clustered Bar for both sets of data, then check Secondary Axis for the Percentage Series.
Then set up the axis limits so they match, e.g.
Percentage: min -.5 max 2
Billions: min -5e9 max 20e9
Then set the percentage format on the source data to a custom Number format of "";(0)%;0%
Then set the Billions format as 0,,,;"";0
You will get something like this:
EDIT
Now that we have the general principles, we can apply them to your specific data.
I will also switch to Excel 2010 to show the different menus.
The data selection looks like this
Select the non-Billion series (plural!) and check the secondary axis
If the larger data is always positive then you can use custom formatting to clean up the axis
Align the primary and secondary axes so that the grid lines match on both
The end result is clean and readable.
Mixing percentages and numbers for the smaller numbers is not handled by this but I would suggest that that would be confusing anyway?
The simplest way to fix this might be to plot cells containing the billions values divided by 10^9 rather than the billions themselves, though using a secondary axis may also be possible.
Using Excel 2007. For the purple bars, the example on the left uses ColumnE values, on the right ColumnF values. E1 contains =F1/10^9 and F1 contains =14800000000:
It appears that there are 3 questions here: 1) "Why might this be happening", 2) "how can I fix this", and 3) something like "how can I plot data which lie on two widely differing ranges, and make them all reasonably visible anyway", even if there was no explicit question on this.
There are several ways to solve issue #2 about the units (e.g., billions) and numbers (e.g., 14.8 vs. 14,800.0) shown in the axis, each one with its own pros and cons:
Use Format Axis -> Axis Options -> Display units.
This might be the answer to your issue #1 as well: you might have Display units -> Millions selected, with Show display units... unchecked. Otherwise, I wouldn't know why your chart shows what it shows.
Use faked tick marks, as indicated in the (excellent) site of Jon Peltier
http://peltiertech.com/Excel/Charts/ArbitraryAxis.html
It gives detailed instructions on how to create tick marks on an axis with arbitrary labels (which may be text, numbers, etc.), which is more generic than what the OP wants here. In this particular case, the labels will be the desired numbers.
Create new cells containing data that would be plotted exactly the way you want.
As for your issue #3, I guess the only option is to have a Secondary Axis (see the answer by pnuts).
Thus, to come up with the best final chart you might use a combination of one of the options I gave here and a secondary axis.

Resources