Excel how to find algebraic curve and fill in estimate? - excel

Given a generic "rank" column and some actual data in the SOLD-LAST-MONTH column, how can I fill in the blanks using Excel's basic algebra functions?
SALESRANK  SOLD-LAST-MONTH
171
433        2931
1104
1484       2691
1872       2108
2196
2762       495
2829
3211
6646
7132
10681
10804
Seems like the numbers on the left would form a curve and the numbers on the right would shape the curve.
I've forgotten my high school math. How do I accomplish this?

Fitting a curve requires much more than simple algebra.
Also, you don't have enough data to define a curve. Plotting the points you already have (using x-y scatter plot), the extrapolation from the last 3 points would be the red line, which runs into negatives very quickly.
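To make the problem with straight-line extrapolation concrete, here is a minimal sketch in Python. It assumes the "red line" is an ordinary least-squares line through the last three known points; the chart itself is not available, so that is a guess at how the line was produced. The helper name `fit_line` is made up for this illustration.

```python
# Last three known (salesrank, sold-last-month) pairs from the question.
points = [(1484, 2691), (1872, 2108), (2762, 495)]


def fit_line(pts):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(points)

# Blank ranks from the question that lie beyond the last known point.
missing_ranks = [2829, 3211, 6646, 7132, 10681, 10804]
estimates = {rank: slope * rank + intercept for rank in missing_ranks}

for rank, sales in estimates.items():
    print(f"rank {rank:>6}: {sales:10.1f}")
```

Under this assumption only rank 2829 still gets a positive estimate; every later rank comes out negative, which is not a usable sales figure.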
Sales obviously need to remain positive, so assuming a very small number of sales for the lowest salesrank and plotting that point as well shows what the curve should look more like.
To generate the green curve I just drew a smooth line over the known points. (Using drawing tools and adjusting the points and gradients until the curve looks reasonable. We can do this visually easily but programmatically it's very complicated.)
It would be easiest (and considering how little data you have, it's also about as accurate as you'll get) to just read values from the curve at each salesrank point.
While it's safe to assume sales are near zero at the lowest ranks, the top ranks can be unpredictable... in some situations the top few ranks are far greater than the rest. For a more accurate curve near the top ranks, you really need to know the number of sales for the top rank. That would allow you to get a far more accurate value for the 171 rank.
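If reading values off a drawn curve is not practical, a rough programmatic stand-in is straight-line interpolation between the known points. This is not the hand-drawn green curve, only a sketch of the same idea. The tail anchor of 1 sale at rank 10804 is an assumption introduced here, and ranks better than 433 are simply clamped to the first known value because nothing is known about the top of the curve.

```python
from bisect import bisect_right

# Known (salesrank, sold-last-month) pairs from the question.
known = [(433, 2931), (1484, 2691), (1872, 2108), (2762, 495)]

# ASSUMPTION: roughly one sale at the last rank in the list, to keep the tail positive.
ASSUMED_TAIL = (10804, 1)
curve = known + [ASSUMED_TAIL]


def estimate(rank, pts):
    """Piecewise-linear estimate; clamps outside the range covered by pts."""
    xs = [x for x, _ in pts]
    if rank <= xs[0]:
        return float(pts[0][1])
    if rank >= xs[-1]:
        return float(pts[-1][1])
    i = bisect_right(xs, rank)  # pts[i - 1] and pts[i] bracket the rank
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (rank - x0) / (x1 - x0)


blanks = [171, 1104, 2196, 2829, 3211, 6646, 7132, 10681, 10804]
estimates = {rank: estimate(rank, curve) for rank in blanks}

for rank, sales in estimates.items():
    print(f"rank {rank:>6}: {sales:8.1f}")
```

The value for rank 171 is the least trustworthy: it is just the figure for rank 433 repeated, for the reason given above about unpredictable top ranks.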

Related

Calculate a plot and intersect on a curve in an excel chart based on a single value

I have some data that I'm using to plot a curve in Excel. It uses a non-linear calculation.
The calculation is called the rule of twelfths - it is used to calculate changes in tidal height between a high and low tide. The rule states that in the first sixth (often approximated to an hour) of the time period, the tide will move 1/12th of the overall range. In the second sixth, the tide will move 2/12ths of the overall range. In each of the third and fourth sixths, the tide will move 3/12ths, and then it will move 2/12ths again in the fifth sixth, and 1/12th in the final sixth.
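As an illustration of the rule, here is a minimal Python sketch that produces the seven data points (start, five intermediate, end) for one tide. The function name and the example tide (high water 5.0 m at 09:00, low water 1.0 m at 15:00) are made up for this sketch; times are in minutes after midnight.

```python
# Cumulative twelfths of the range moved after 0, 1, ..., 6 sixths of the period:
# 1 + 2 + 3 + 3 + 2 + 1 = 12.
CUMULATIVE_TWELFTHS = [0, 1, 3, 6, 9, 11, 12]


def rule_of_twelfths(start_min, start_height, end_min, end_height):
    """Return seven (time_in_minutes, height) points between two tides."""
    step = (end_min - start_min) / 6
    tidal_range = end_height - start_height  # negative on a falling tide
    return [
        (start_min + i * step, start_height + tidal_range * twelfths / 12)
        for i, twelfths in enumerate(CUMULATIVE_TWELFTHS)
    ]


# Hypothetical tide: high water 5.0 m at 09:00, low water 1.0 m at 15:00.
points = rule_of_twelfths(9 * 60, 5.0, 15 * 60, 1.0)

for minute, height in points:
    print(f"{int(minute) // 60:02d}:{int(minute) % 60:02d}  {height:.2f} m")
```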
The maths for this is relatively straightforward - if I know the High Water Time and Low Water time, and their respective heights, I can calculate a data point for each sixth. That then plots to a nice even curve (and some fun pie chart shenanigans shows it on a clock face too).
This produces the following sheet:
What I am now after is the ability to overlay onto that the height for a given time of day. This would be used in a 'live' sense to display the height 'now', or perhaps where the user dragged their finger on the curve if it was in an app. I'm only using this for screenshot/flat file purposes, so I just need it to base the overlay on the data in one cell.
So, in the attached screenshot, if we had a time of day of 1128 (based on cell J3), Excel would take the time in J3, and wherever it intersected the curve, draw both a vertical and a horizontal line, so that the height of tide (HOT) could be read off that axis.
This would look something like this (I've circled cell J3 too):
Is that something that's possible? It might be that it needs to do a lookup in the table of calculated data points and then interpolate just between those two - that would probably get close enough.
A two stage question I guess - firstly calculating the intercept, secondly getting it to draw on (complete with the vertical and horizontal lines if possible!).
There's a widget on planetcalc which does almost the same thing - it only gives the calculated data points (and it uses hours rather than the range), but it gives a nice visual idea.
PlanetCalc Tide Calculator
Any thoughts? Is it possible?
I created a solution to this myself in the end.
Given that the two values were easily calculated, I produced a pair of values for the desired time/height, and plotted each as an additional line graph on top of the existing (styled correctly).
This gave the appearance of what I wanted, and intersected my curve perfectly.
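The approach described here can be sketched outside Excel as follows. This is an assumed reconstruction, not the poster's actual sheet: it interpolates linearly between the two neighbouring data points (as the question suggested) and builds the two extra series as pairs of (time, height) values. The curve values are a made-up falling tide from 5.0 m at 09:00 to 1.0 m at 15:00, and all names are hypothetical.

```python
# Hypothetical curve: (minutes after midnight, height in metres).
curve = [(540, 5.0), (600, 4.667), (660, 4.0), (720, 3.0),
         (780, 2.0), (840, 1.333), (900, 1.0)]


def to_minutes(hhmm):
    """Convert a time written like '1128' to minutes after midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def height_at(minute, pts):
    """Linear interpolation between the two data points that bracket `minute`."""
    for (t0, h0), (t1, h1) in zip(pts, pts[1:]):
        if t0 <= minute <= t1:
            return h0 + (h1 - h0) * (minute - t0) / (t1 - t0)
    raise ValueError("time is outside the plotted tide")


now = to_minutes("1128")      # the value that would sit in cell J3
hot = height_at(now, curve)   # height of tide at that time

# Two extra two-point series to plot on top of the curve.
vertical_line = [(now, 0.0), (now, hot)]            # from the time axis up to the curve
horizontal_line = [(curve[0][0], hot), (now, hot)]  # from the height axis across to the curve

print(f"HOT at 11:28 is about {hot:.2f} m")
```

Both series end at the same (time, height) point, which is why they meet exactly on the curve when plotted.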

Normalisation or Standardisation for detecting outlier?

When should I use min-max scaling (normalisation) and when should I use standardisation (the z-score) for data pre-processing?
I know that normalisation brings the range of a feature down to 0 to 1, and that the z-score brings it down to roughly -3 to 3, but I am unsure which of the two techniques to use for detecting outliers in data.
Let us briefly agree on the terms:
The z-score tells us how many standard deviations a given element of a sample is away from the mean.
Min-max scaling is the method of rescaling a range of measurements to the interval [0, 1].
By those definitions, the z-score usually spans an interval much larger than [-3,3] if your data follows a long-tailed distribution. On the other hand, a plain normalization does indeed limit the range of the possible outcomes, but will not help you to find outliers, since it just bounds the data.
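A small Python sketch of the difference, using made-up readings with one extreme value. The cut-off of 3 standard deviations is an assumption chosen for the illustration, not something the definitions above prescribe.

```python
from statistics import mean, pstdev

# Made-up sample: 19 ordinary readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13,
            12, 10, 11, 12, 13, 11, 12, 10, 11, 95]

# Standardisation: distance from the mean in units of standard deviation.
mu, sigma = mean(readings), pstdev(readings)
z_scores = [(x - mu) / sigma for x in readings]

# Min-max scaling: squeeze everything into [0, 1].
low, high = min(readings), max(readings)
min_max = [(x - low) / (high - low) for x in readings]

Z_THRESHOLD = 3.0  # ASSUMPTION: a commonly used cut-off, chosen for this example
outliers = [x for x, z in zip(readings, z_scores) if abs(z) > Z_THRESHOLD]

print("largest z-score:", round(max(z_scores), 2))
print("min-max range:  ", min(min_max), "to", max(min_max))
print("flagged by z:   ", outliers)
```

The extreme reading gets a z-score above 3, while after min-max scaling it is simply the value 1.0 and carries no indication of being unusual.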
What you need for outlier detection are thresholds above or below which you consider a data point to be an outlier. Many programming languages offer violin plots or box plots which nicely show your data distribution. The methods behind these plots implement a common choice of thresholds:
Box and whisker [of the box plot] plots quartiles, and the band inside the box is always the second quartile (the median). But the ends of the whiskers can represent several possible alternative values, among them:
the minimum and maximum of all of the data [...]
one standard deviation above and below the mean of the data
the 9th percentile and the 91st percentile
the 2nd percentile and the 98th percentile.
All data points outside the whiskers of the box plots are plotted as points and considered outliers.
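As a sketch of one of the whisker choices listed above (the 2nd and 98th percentiles), the following Python uses `statistics.quantiles` on made-up readings. The interpolation method ("inclusive") is an assumption; plotting libraries may compute percentiles slightly differently.

```python
from statistics import quantiles

# Made-up sample: 19 ordinary readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13,
            12, 10, 11, 12, 13, 11, 12, 10, 11, 95]

# quantiles(..., n=100) returns the 99 cut points between percentiles 1..99,
# so index 1 is the 2nd percentile and index 97 is the 98th.
cuts = quantiles(readings, n=100, method="inclusive")
lower_whisker, upper_whisker = cuts[1], cuts[97]

outliers = [x for x in readings if x < lower_whisker or x > upper_whisker]

print("whiskers:", lower_whisker, "to", upper_whisker)
print("outliers:", outliers)
```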

Unsupervised Outlier detection

I have 6 points in each row and around 20k such rows. The points in each row lie on a curve, and the nature of the curve is the same for every row (say a sigmoidal curve or a straight line, etc.). These 6 points may have different x-values in each row. I also know a point (a,b) for each row which that curve should pass through. How should I go about finding the rows which may be anomalous or behave differently from the other rows? I was thinking of curve fitting, but I only have 6 points for each curve. All I know is that the majority of the rows have the same nature of curve, so I could perhaps fit a general curve for all the rows and use a distance threshold for outlier detection.
What happens if you just treat the 6 points as a 12 dimensional vector and run any of the usual outlier detection methods such as LOF and LoOP?
It's trivial to see the relationship between Euclidean distance on the 12 dimensional vector, and the 6 Euclidean distances of the 6 points each. So this will compare the similarities of these curves.
You can of course also define a complex distance function for LOF.
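The relationship between the 12-dimensional distance and the six per-point distances can be checked numerically. The sketch below uses two made-up rows and only the standard library; it does not run LOF or LoOP, it only shows that the squared 12-dimensional Euclidean distance equals the sum of the six squared point-to-point distances.

```python
from math import dist, isclose, sqrt

# Two made-up rows, each with six (x, y) points.
row_a = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 5.0)]
row_b = [(0.1, 0.0), (1.1, 1.2), (2.0, 1.8), (2.9, 3.3), (4.1, 3.9), (5.2, 5.1)]


def flatten(row):
    """Turn six (x, y) points into one 12-dimensional vector."""
    return [coordinate for point in row for coordinate in point]


# Distance between the rows seen as 12-dimensional vectors.
d12 = dist(flatten(row_a), flatten(row_b))

# Distances between corresponding points, combined as a root of summed squares.
per_point = [dist(p, q) for p, q in zip(row_a, row_b)]
combined = sqrt(sum(d * d for d in per_point))

print(d12, combined)
```

Note that this pairs the points by position, so it is most meaningful when corresponding points in different rows have comparable x-values.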

Find contour of 2D unorganized pointcloud

I have a set of 2D points, unorganized, and I want to find the "contour" of this set (not the convex hull). I can't use alpha shapes because I have a speed objective (less than 10ms on an average computer).
My first approach was to compute a grid and find the outline squares (squares which have an empty square as a neighbor). That efficiently reduced my number of points (from roughly 22000 to 3000). But I still need to refine this new set.
My question is : how do I find the real outlines points among my green points ?
After a weekend of reflection, I may have found a convenient solution.
So we need a grid, we need to fill it with our points, no difficulty here.
We have to decide which squares are considered as "Contour". Our criterion is: at least one empty neighbor and at least 3 non-empty neighbors.
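The grid fill and the "Contour" criterion can be sketched in Python as follows. The original was presumably written in another language (it is compared against CGAL), so this is only an illustration. It assumes an 8-cell neighborhood, which the description does not state, and the function names are made up.

```python
from collections import defaultdict

# ASSUMPTION: "neighbor" means the 8 surrounding squares.
NEIGHBOR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]


def build_grid(points, cell_size):
    """Map each occupied grid square (ix, iy) to the points that fall in it."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return grid


def contour_cells(grid):
    """Squares with at least one empty neighbor and at least 3 non-empty ones."""
    result = set()
    for cx, cy in grid:
        filled = sum((cx + dx, cy + dy) in grid for dx, dy in NEIGHBOR_OFFSETS)
        empty = len(NEIGHBOR_OFFSETS) - filled
        if empty >= 1 and filled >= 3:
            result.add((cx, cy))
    return result


# Toy input: one point in the middle of each square of a 5 x 5 block.
points = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
grid = build_grid(points, cell_size=1.0)
contour = contour_cells(grid)

print(len(contour), "contour squares out of", len(grid))
```

On this toy block the 16 border squares are marked as "Contour" and the 9 interior squares are not.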
We lack connectivity information. So we choose a "Contour" square which has 2 "Contour" neighbors or fewer. We then pick one of its neighbors. From that, we can start the expansion. We just circle around the current square to find the next "Contour" square, knowing the previous "Contour" squares. Our contour criterion prevents us from reaching a dead end.
We now have vectors of connected squares, and normally if our shape doesn't have a hole, only one vector of connected squares !
Now for each square, we need to find the best point for the contour. We select the one which is farthest from the barycenter of our plane. It works for most shapes. Another technique is to compute the barycenter of the empty neighbors of the selected square and choose the nearest point.
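The first selection rule (the point farthest from the barycenter of the whole set) could look like this in Python. The input layout, a dictionary from grid square to its points plus a collection of contour squares, is an assumption made for this sketch, as are the toy data and the function name.

```python
from math import dist


def pick_contour_points(cells, contour):
    """For each contour square, keep the point farthest from the global barycenter."""
    all_points = [p for pts in cells.values() for p in pts]
    barycenter = (
        sum(x for x, _ in all_points) / len(all_points),
        sum(y for _, y in all_points) / len(all_points),
    )
    return {
        cell: max(cells[cell], key=lambda p: dist(p, barycenter))
        for cell in contour
    }


# Toy input: a 2 x 2 grid with two points per square; barycenter is about (1, 1).
cells = {
    (0, 0): [(0.2, 0.2), (0.8, 0.8)],
    (1, 0): [(1.2, 0.8), (1.8, 0.2)],
    (0, 1): [(0.2, 1.8), (0.8, 1.2)],
    (1, 1): [(1.2, 1.2), (1.8, 1.8)],
}
chosen = pick_contour_points(cells, contour=list(cells))

for cell, point in sorted(chosen.items()):
    print(cell, "->", point)
```

In each square the outer point is kept and the point nearer the centre is dropped.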
The red points are the contour of the green one. The technique used is the plane barycenter one.
For a set of 28000 points, this technique takes 8 ms. CGAL's alpha shapes would take an average of 125 ms for 28000 points.
PS : I hope I made myself clear, English is not my mothertongue :s
You really should use alpha shapes. Maybe use only the green points as input to the alpha shape algorithm.

How to draw a line parallel to the linear portion of the curve in excel?

I have stress-strain data from a tensile test. I have drawn a stress vs strain graph in Excel. I need to find the yield point. For that, I need to draw a line parallel to the linear (straight) portion of the curve with a 0.2% offset on the x axis and see where it intersects the original curve.
So, I tried to keep only the linear portion of the data and drew a trendline, which gives me the straight-line equation y=mx. Now, if I want a 0.2% offset, the equation of the line is y=mx + c.
I have the equation, but how do I draw this line in Excel from the equation? And how do I get the intersection point? Is my approach right? Please help.
A Stress-strain curve with data courtesy of YouTube:
It can be observed that this is linear (elastic deformation) for about the first nine data points. Hide the other data points and add a linear trend line to the chart, with Display Equation on chart. This shows y = 4232x + 0.701.
In, say, N2, enter =4232*D2+.701 and copy it down to N10 to provide the data for an additional data series. Copy N2:N10, select the chart, Paste and format the added series to see the match:
For a 0.2% offset, the line is shifted by 0.002 along the x axis. For a parallel line the slope (determined by 4232 in the formula) cannot be changed, so the constant must be: from +0.701 to 0.701 - 4232 × 0.002 = -7.763. A further data set can be created with the altered constant and added to the chart as before:
By observation the offset yield strength (0.2% proof strength) can be seen to be around 80 MPa in this example (where the green line intersects the blue curve).
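Instead of reading the intersection off the chart, it can be estimated numerically. The Python sketch below uses the trendline from this answer (slope 4232, constant 0.701) and a 0.002 strain offset, but the stress-strain points are invented for the illustration, since the original data is not reproduced here. It looks for the first segment where the curve drops below the offset line and interpolates linearly within it.

```python
SLOPE = 4232.0         # from the trendline equation in the answer
INTERCEPT = 0.701      # from the trendline equation in the answer
OFFSET_STRAIN = 0.002  # 0.2 % offset

# Same slope, shifted 0.002 along the strain axis.
offset_intercept = INTERCEPT - SLOPE * OFFSET_STRAIN  # about -7.763

# HYPOTHETICAL (strain, stress in MPa) data, not the data behind the chart.
curve = [(0.000, 0.7), (0.005, 21.9), (0.010, 43.0), (0.015, 64.2),
         (0.018, 74.0), (0.020, 78.0), (0.022, 80.5), (0.025, 82.5),
         (0.030, 84.0)]


def offset_yield(points, slope, intercept):
    """First crossing of the curve with the line y = slope * x + intercept."""
    def gap(p):
        return p[1] - (slope * p[0] + intercept)  # curve minus line

    for p0, p1 in zip(points, points[1:]):
        g0, g1 = gap(p0), gap(p1)
        if g0 > 0 >= g1:  # curve passes from above the line to on or below it
            t = g0 / (g0 - g1)
            return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))
    return None


yield_point = offset_yield(curve, SLOPE, offset_intercept)
if yield_point is not None:
    print(f"offset yield at strain {yield_point[0]:.4f}, stress {yield_point[1]:.1f} MPa")
```

With real test data, replace `curve` with the measured strain and stress columns; the result is only as fine as the spacing of the data points around the crossing.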
