Rebinning data in Excel - excel

I have some wind data in excel that is binned by wind speed and direction. I want to re-bin it to cover different intervals (assuming the data is uniform across the original intervals). i.e I want to go from:
To
Can anyone give me some pointers? Struggling to think of an elegant way to do this.

Related

Excel data - cleaning data with multiple values for numerous instances

I have a data set which is related to force applied vs distance traveled.
When the data was created the measurement software has provided multiple values for distance traveled as the force increases, then in some cases the data has no values for distance at the force values.
I have several data sets which look like this.
The data looks like this
I want to 'clean the data so I can create a graph with all 3 samples in columns the same height so it is easy to edit and make scatter graphs from.
I tried to clean the data by using VLOOKUP to create a column of force values at each 0.5N, but when I do this I end up with a large table that has lots of missing data points, when I make the graph from this there are lots of blank areas which don't seem to plot correctly.
The VLOOKUP data looks like this
The graph looks like this
Is there a better way to do this which will give me a better looking data set which is better for creating a graph from?
I have about 30 sets of data, so any info that you have would be greatly appreciated.
Why make the columns equal length.
If you plot the three samples with the data as given, an XY graph should look OK:
If there's some other reason to make the columns equal length, I'd "fill in the blanks" using the FORECAST or GROWTH functions, or use a trendline.
You can use IFERROR to insert something in place of the #N/As. For example, you could use =IFERROR(VLOOKUP(A1,D:D,1,FALSE),0) to add a zero in place of the #N/As

How to use excel data to find period

I have three Excel columns of data from an experiment with a pendulum: time, angle displacement, and angular velocity. I was wondering if there is a way in Excel to calculate and then graph the period (and, if possible, display the function for the graph)... I realize it's kinda a dumb question. I'm still new at Excel.
Thanks for any pointers u can give!
In case the Analysis ToolPak is installed, one can use Tools->Data Analysis->Fourier Analysis. If the data is a superposition of harmonic functions (sin,cos), the corresponding frequencies (or inverse periods) will appear as peaks in the Fourier analysis.

Excel, Determine where data takes a dive

I'm trying to determine where, in a set of measurement data, the data takes a dive...
... so I can plot a vertical line and
... plot a horizontal line in the graph.
I have no problem doing the 2nd and 3rd bullet points above on my own, so that's taken care of.
The problem I need help with is the first bullet point - determining WHERE the data takes a dive - WHERE the data crosses a threshold that basically says, "Whatever-it-is you're measuring, is no longer performing as it is expected to.".
Here's what I'm doing:
I am taking measurements using a measuring device and that device is logging the measurements in its internal memory and allowing me to download that measurement data to my computer into a csv when the test session is complete.
I pull that csv into an xls and plot the data on a graph. (see attached image)
Here's what I want to do:
If you look at the attached image I would like to find the value where the data DEFINITELY crosses BELOW the horizontal line so I can say, "Here is where the device being tested 'gave up the ghost' and was no longer able to perform as desired."
What the data roughly looks like:
Each measurement set will have the rough look and feel of the attached image but slightly different each time. (because each object I am testing will have roughly the same performance characteristics but they all have their own manufacturing defects and variations.)
The data set for the attached image is a data set of 7000 measurements.
I never really know where the horizontal line will be.
Examples of the data sets I have gotten in the past several tests look like this:
(394 to 0)
(390000 to 0)
(3.88 to 0)
(375000 to 0)
(39.55 to 0)
(59200 to 0)
and each data set will have about 1,000 to 7,000 measurements each.
Here's how I was trying to solve this issue:
I was using SLOPE() and trying to latch onto where the slop of the line took a dive / started to work its way to a zero slope (which is a vertical line) so when it starts approaching a really small slope then it MUST be taking a dive. That didn't really work.
I was looking at using STDEV.P() in Excel and feeding it the entire data set. Then I was looking at doing the same thing but feeding it only the first 10, 30, 60 measurements but then I thought - we never really know just how many measurements will come through. Then I thought I would use the first 10% of the measurements and feed that to STDEV.P().
Please let me know what you think of this and please let me know of any ideas you may have.
Thanks.
H
Something like this should work to flag when the decay rate increases.
To find what 'direction' your data is going in you need the derivative.
Excel doesn't have a derivative formula but you can set it up pretty easily by using the (change in y)/(change in x) as demonstrated here:
http://faculty.educ.ubc.ca/sanderson/lab/CLFbiom/demo/diff.htm
I would then check a formula which counts how many datarows you have (=COUNTA(A:A) or similar)
Then uses that to get a step of 10% of your data
Then check the value of the derivative in a cell against a cell 10% further down. If it's still a negative (to account for the slight downhill at first) then you'll know
The right way to go about this is to model the data with an unknown discontinuity, something like "if time < break_time then (some constant plus noise) else (decaying exponential)". A maximum likelihood estimation for that model might require iteration or other operations which are clumsy in Excel -- maybe you should consider VB or Python or some other programming language. I.e. choose the tool to fit the problem and not the other way around.
See Seber and Wild, "Nonlinear Regression", for an extensive discussion of models with discontinuities.
If your data can be generally characterized as having:
(A) a more or less flat plateau region, followed by
(B) a downward trending region
then a basic strategy could be to start at then end of the data and march towards the beginning one point at a time, checking to see that the values are increasing. Once they stop increasing, you've found the break point.
The strategy assumes (unwisely?) that the downward trending region is smooth/noiseless. To make the solution more robust to noise, you could compare values that are 5 apart, or 10 apart, or whatever interval works to filter out the noise. Or you could use a moving average.
This strategy could potentially be made more efficient by starting the search somewhere in the middle of the data but still in downward trending portion. If you know (based on experience) that any value that is (say) 0.5X the maximum is in the downward trending portion, you could start the search there.
Hope that helps.
It appears as though you want to detect when the slope changes from something near zero to something negative. One way to detect this is to calculate the 2nd derivative of the values (calculate the slope of the slope). The 2nd derivative should be near zero in the flat portion of the data AND in the downward trending portion of the data. It should go negative at the break point. So finding the minimum (most negative) value of the 2nd should locate the break point.
To implement this, you probably will need to filter noise. So calculate the first derivative (slope) over some suitable window of data:
=SLOPE(moving window of say 25 raw values)
Then calculate the second derivative (slope of slope):
=SLOPE(moving window of say 25 slope values)
Then look for the minimum.
Hope that helps.

tableau waterfall chart with mixed colors

I started working with Tableau and found out how to do waterfall charts (Gantt-charts, rolling sum for y-axis and negative value for length of bar. See here: tableau tutorial waterfall charts)
Now my question would be if there is a possibility to split the color by some category? To show what I'm talking about I set up some waterfalls with excel powerpoint. The first one without a split by a category and the second one with a split by category.
I would appreciate any help
It is as simple as dragging the right field to Color, and adjusting the table calculation.
If it's a simple running sum (for the positioning of the bar), you need to go edit the table calculation, select Compute using Advanced..., then drag all fields to addressing. Then you need to sort your data by the master field (in your case the one that has Coffee, Coke,...) maximum (ascending or descending, doesn't matter).
This way you guarantee that the running sum is being applied to one category at a time (at not one color at a time or something like that).
It's really important to understand the concept of table calculations so you can understand what's going on, how Tableau is calculating stuff. Read this http://onlinehelp.tableausoftware.com/current/pro/online/en-us/help.htm#calculations_tablecalculations_understanding_addressing.html
And if you actually understand what you're doing, it's easier to find a solution. For instance, this hack of Gantt Chart to make a waterfall. What goes on rows the chart will understand at starting point of the data, and what goes on size is, well, the size of the chart. You put negative values cause you want the bar to go down.
That being said, dragging a field to color won't mess up with the size, but it will mess up with the running sum used to determine the starting points. How to solve it? make it calculate all the colors together. How to make it? Understand table calculations and reach my solution.
This is the simple approach, if your database or fields have some peculiarity, this might not work perfectly, and you'll need to explain so I can try to understand how to solve

Way to reduce geopoints?

Does anyone have any handy algorithms that could be used to reduce the number of geo-points ?
I am using a list of 2,000,000 postcodes which come with their own geo-point. I am using them to collect data from an API to be used offline. The program is written in C++.
I have to go through each postcode, calculate a bounding box based on the postcodes location, and then send it to the API which gives me some data near to that postcode.
However 2,000,000 is a lot to process and some of the postcodes are next to each other or close enough to each other that they would share some of the same data.
So far I've came up with two ways I could reduce them but I am not sure if they would work:
1 - Program uses data structure to record which postcode overlaps which and then run a routine a few time to removes the ones that have overlaps one by one until we are left without ones without overlapping postcodes.
Start at the top left geo point of the UK and slowly increment it the rough size of a postcode area until we have covered the entire UK.
Is there a easy way to reduce these number of postcodes so that I have few of them overlapping as possible ? whilst still making sure I get data covering as much of the UK as possible ? I was thinking there may be an algorithm handy for this, that people use else where.
You can use a quadtree especially a quadkey. A quadkey plot the points along a curve. It's similar to sort the points into a grid. Then you can traverse the grid to search deeper in the tree. You can also search around a center point. You can also use a database with a spatial index. It depends how much the data overlap but with a quadtree you can choose the size of the grid.

Resources