Applying Quadratic Fit to Unknowns - excel

I'm trying to build a spreadsheet to find a quadratic fit for a set of control data, then apply that fit to a set of unknowns to get a calculated concentration.
For my quadratic curve calculation, I have this:
=LINEST(F28:F33,A28:A33^{1,2},TRUE,TRUE)
An example of the relevant control data (where 0-40 is in column A and 0.001-0.575 in column F) is:
0 0.001
2 0.030
5 0.076
10 0.156
20 0.310
40 0.575
This is giving me a curve solution that matches the software currently being used to analyze the data (SoftMax 4.7):
A: -5.1E-05
B: 0.016
C: -0.002
Using this formula to apply the curve to data (where E16 represents any individual datapoint I'm solving for and Blank1 is a set of negative controls):
=(-CurveB+SQRT((CurveB^2)-(4*CurveA*(CurveC-(E16-AVERAGE(Blank1))))))/(2*CurveA)
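As a sanity check, the same inversion can be written as a small Python function (an illustration only: `blank` stands in for AVERAGE(Blank1), and the coefficients below are the rounded displayed values, so outputs will differ slightly from the full-precision spreadsheet results):

```python
import math

# Fitted curve coefficients from the question (CurveA, CurveB, CurveC),
# rounded to the displayed precision -- the real spreadsheet values
# carry more digits.
A, B, C = -5.1e-05, 0.016, -0.002

def concentration(y, blank=0.0):
    """Positive-branch root of A*x^2 + B*x + (C - (y - blank)) = 0,
    i.e. the same inversion as the Excel formula."""
    return (-B + math.sqrt(B**2 - 4 * A * (C - (y - blank)))) / (2 * A)
```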
However, when I apply the curve to a set of data using that formula, e.g.:
0.275 0.269 0.266
0.217 0.193 0.194
0.011 0.013 0.011
0.004 0.006 0.003
I get output:
17.835 17.426 17.221
13.922 12.333 12.399
0.796 0.919 0.796
0.369 0.491 0.308
Compared to SoftMax's output:
17.827 17.405 17.215
13.918 12.333 12.393
0.785 0.950 0.797
0.353 0.487 0.298
My problem is that I can't find enough documentation on how SoftMax applies the quadratic fit to the data, so I don't know which set of results is more accurate. I've checked whether it's a rounding error (i.e. SoftMax rounding the displayed results but calculating with unrounded figures, or possibly the other way around). I've also tried throwing the whole mess through Solver, letting Excel change the curve variables and the blank factor (I also tried removing the blank factor, and adding independent blank factors for each column) and solving for a minimum total variance from the SoftMax results, but I cannot find a solution that reproduces the SoftMax output (or even gets closer than about 0.58% average variance from it).
Can anybody tell me whether this is an error in my calculations (I'm specifically skeptical of my formula for applying the curve to the data: is there a more graceful way to apply a quadratic fit to a set of unknowns in Excel?) or an error in the other program's calculations, e.g. solving with approximations or rounded values somewhere?

Summary: I think you're seeing rounding errors.
Details: I used your Excel equations and the data provided and reproduced your curve parameters, so that part seems OK. I then plugged the SoftMax Pro output (17.827, 17.405, 17.215, 13.918, ...) and your output (17.835, 17.426, 17.221, 13.922, ...) into y = Ax^2 + Bx + C and calculated the y-values. The pair-wise differences were in the 4th decimal place or smaller (the biggest absolute difference was about 0.0005), which is consistent with a rounding/truncation of the x-data that's hidden from you.
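For what it's worth, the fit itself can be reproduced outside Excel. Below is a pure-Python sketch (my own illustration, not SoftMax's or LINEST's actual code) that solves the same least-squares normal equations on the control data and recovers coefficients matching the rounded A, B, C above:

```python
# Least-squares quadratic fit y = A*x^2 + B*x + C to the control data,
# mirroring =LINEST(F28:F33, A28:A33^{1,2}).  Pure standard library.
xs = [0.0, 2.0, 5.0, 10.0, 20.0, 40.0]
ys = [0.001, 0.030, 0.076, 0.156, 0.310, 0.575]

# design-matrix columns for [A, B, C] and the 3x3 normal equations M*coef = v
cols = [[x * x for x in xs], xs, [1.0] * len(xs)]
M = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(3)] for i in range(3)]
v = [sum(c * y for c, y in zip(cols[i], ys)) for i in range(3)]

# solve by Gaussian elimination with partial pivoting
for k in range(3):
    p = max(range(k, 3), key=lambda r: abs(M[r][k]))
    M[k], M[p], v[k], v[p] = M[p], M[k], v[p], v[k]
    for r in range(k + 1, 3):
        f = M[r][k] / M[k][k]
        M[r] = [mr - f * mk for mr, mk in zip(M[r], M[k])]
        v[r] -= f * v[k]

coef = [0.0, 0.0, 0.0]
for k in (2, 1, 0):  # back-substitution
    coef[k] = (v[k] - sum(M[k][j] * coef[j] for j in range(k + 1, 3))) / M[k][k]
A, B, C = coef
```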
Final Comment: I suspect you should not subtract blanks. The standard curve appears to have been created from non-blank-subtracted data (at zero input the output is non-zero), so it seems you need to treat the samples the same way as the standards. It may not make much difference ...
Hope that helps.

Related

bollinger bands versus statistics: is 1 standard deviation supposed to be split into two halves by its mean? or top and bottom bands from mean?

I have a question about how Bollinger Bands are plotted in relation to statistics. In statistics, once a standard deviation is calculated from the mean of a set of numbers, shouldn't one standard deviation be interpreted by dividing that number in half and plotting each half above and below the mean? By doing so, you could then determine whether or not the data points fall within this one standard deviation.
Then, correct me if I am wrong, but aren't Bollinger Bands NOT calculated this way? Instead, they take one standard deviation (if you have set the multiple to 1) and plot the WHOLE value both above and below the mean (not splitting it in two), thereby doubling the width of this standard deviation?
Bollinger Bands loosely state that 68% of the data falls within the first band, i.e. one standard deviation (loosely, because the empirical rule in statistics requires normal distributions, which stock prices most often are not). But if this empirical rule comes from statistics where one standard deviation is split in half, doesn't that mean applying a 68% probability to an entire Bollinger Band is wrong? Is this correct?
You can modify the deviation multiple to suit your purpose; you can use 0.5, for example.
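For concreteness, here is a minimal Python sketch of the standard Bollinger construction (the `window` and `k` parameters are illustrative defaults): the full k*SD is added above and subtracted below the moving average, so the band spans 2*k standard deviations from bottom to top.

```python
import statistics

def bollinger(prices, window=20, k=1.0):
    """Middle band = rolling mean; upper/lower = middle +/- k * rolling SD.
    The FULL k*SD is applied on each side of the mean, so the whole band
    is 2*k standard deviations wide."""
    bands = []
    for i in range(window - 1, len(prices)):
        win = prices[i - window + 1 : i + 1]
        mid = statistics.mean(win)
        sd = statistics.stdev(win)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands
```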

My y-axis in my normal distribution curve is over 1. Is this okay?

I am trying to show the normal distribution of two sets of data. My goal is to see if dataset 1 differs from dataset 2 (the data are total eroded area in m2). When I make the normal distribution curves, I am aiming to fix or understand two problems:
firstly, I'm not sure how to interpret the negative values, as total eroded area (my variable) cannot be negative;
secondly, I'm not sure what the greater-than-1 y-values mean.
Dataset 1 is 0.180,0.063,0.65,0.43 and Dataset 2 is 0.148, 0.106, 0.39, 0.32 and the resulting normal distribution graph (based on the mean and standard deviation) is shown below.
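One way to see what a y-value above 1 means: the curve plots probability density, not probability, and a density can exceed 1 whenever the SD is small, while the total area under the curve still equals 1. A Python sketch using Dataset 1 (an illustration with the standard-library NormalDist; the integration grid is my own choice):

```python
from statistics import NormalDist, mean, stdev

data1 = [0.180, 0.063, 0.65, 0.43]               # Dataset 1 from the question
dist = NormalDist(mean(data1), stdev(data1))

peak = dist.pdf(dist.mean)                        # curve height at the mean (> 1 here)

# crude midpoint-rule check that the area under the density is still 1
lo, hi, n = dist.mean - 8 * dist.stdev, dist.mean + 8 * dist.stdev, 100_000
step = (hi - lo) / n
area = step * sum(dist.pdf(lo + (i + 0.5) * step) for i in range(n))
```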

How to model normal distribution curve based on a few datapoints in Excel?

I can't believe I'm not finding a simple answer on Google for this noob question.
I have a handful of datapoints (let's say 10) on scores and their respective percentile ranks, which are normally distributed; for example, see below:
Scores   Percentile rank
846      96.5
809      91.0
729      67.8
592      27.7
...      ...
I now want to use those datapoints to calculate the percentile ranks for scores for which I don't have datapoints. E.g. what would be the percentile rank for a score of 650?
I know how to do a linear regression in Excel, but for a normally distributed dataset this doesn't work obviously.
You have
NORM.S.INV(p) = (x - mean) / SD
where x is the value and p is the corresponding probability (percentile value/100),
so if you plotted NORM.S.INV(p) against x you would get a straight line with
slope = 1 / SD
and
intercept = -mean / SD,
so you could estimate
SD = 1 / slope
and
mean = -intercept / slope.
I simulated some data with a mean of 100 and an SD of 10. It works reasonably well, but that is with a fair number of points spread between -3 and +3 standard deviations, so it might not be very good with just a small number of points.
The estimated mean was 99.5 and the SD 9.75.
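The estimation above can be sketched in Python (standing in for the Excel version; NORM.S.INV corresponds to the standard-library NormalDist().inv_cdf), using the four example datapoints from the question:

```python
from statistics import NormalDist

scores = [846.0, 809.0, 729.0, 592.0]            # x
pct    = [96.5, 91.0, 67.8, 27.7]                # percentile ranks

# z = NORM.S.INV(p) should be linear in x: z = x/SD - mean/SD
z = [NormalDist().inv_cdf(p / 100) for p in pct]

n  = len(scores)
mx = sum(scores) / n
mz = sum(z) / n
slope = (sum((x - mx) * (zi - mz) for x, zi in zip(scores, z))
         / sum((x - mx) ** 2 for x in scores))
intercept = mz - slope * mx

sd   = 1 / slope                 # SD   = 1 / slope
mean = -intercept / slope        # mean = -intercept / slope

# percentile rank for an unseen score, e.g. 650
p650 = NormalDist(mean, sd).cdf(650) * 100
```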

Excel how to find algebraic curve and fill in estimate?

Given a generic "rank" column and some actual data in the SOLD-LAST-MONTH column, how can I fill in the blanks using Excel's basic algebra functions?
SALESRANK SOLD-LAST-MONTH
171
433 2931
1104
1484 2691
1872 2108
2196
2762 495
2829
3211
6646
7132
10681
10804
Seems like the numbers on the left would form a curve and the numbers on the right would shape the curve.
I'm forgetting my high school math here: how do I accomplish this?
Fitting a curve requires much more than simple algebra.
Also, you don't have enough data to define a curve. Plotting the points you already have (in an x-y scatter plot), the extrapolation from the last 3 points would be the red line, which runs into negatives very quickly.
Sales obviously need to remain positive, so assuming a very small number of sales for the lowest salesrank and plotting that point as well shows what the curve should look more like.
To generate the green curve I just drew a smooth line over the known points. (Using drawing tools and adjusting the points and gradients until the curve looks reasonable. We can do this visually easily but programmatically it's very complicated.)
It would be easiest (and considering how little data you have, it's also about as accurate as you'll get) to just read values from the curve at each salesrank point.
While it's safe to assume sales are near zero at the lowest ranks, the top ranks can be unpredictable... in some situations the top few ranks are far greater than the rest. For a more accurate curve near the top ranks, you really need to know the number of sales for the top rank. That would allow you to get a far more accurate value for the 171 rank.
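If you do want a formula rather than reading values off a hand-drawn curve, one simple (and heavily caveated) assumption is a power law, sold = exp(a) * rank^b, fitted in log-log space. As noted above, four points is far too little data for a reliable fit, so this Python sketch only illustrates the mechanics:

```python
import math

# known (rank, sold) pairs from the question
known = [(433, 2931), (1484, 2691), (1872, 2108), (2762, 495)]

# fit log(sold) = a + b*log(rank) by ordinary least squares
lx = [math.log(r) for r, _ in known]
ly = [math.log(s) for _, s in known]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
     / sum((x - mx) ** 2 for x in lx))
a = my - b * mx

def estimate(rank):
    """Estimated monthly sales at a given rank under the power-law assumption."""
    return math.exp(a + b * math.log(rank))
```

The blank ranks (171, 1104, 2196, ...) can then be filled with `estimate(rank)`; expect the top-rank estimates to be the least trustworthy, for the reasons given above.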

Averaging many curves with different x and y values

I have several curves that contain many data points. The x-axis is time and let's say I have n curves with data points corresponding to times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern: the distribution of the x-values is not uniform. There are many more values close to t=0; at t=5 (for example), data points are much less frequent.
Another concern: what happens if two values fall within one bin? I assume I would need to average those values before calculating the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, EXCEL etc) to accomplish my goal?
Since your series are not uniformly distributed, interpolating prior to computing the mean is one way to avoid biasing towards times where you have more frequent samples. Note that, by definition, interpolation will likely reduce the range of your values, i.e. the interpolated points aren't likely to fall exactly at the times of your measured points. This has a greater effect on the extreme statistics (e.g. the 5th and 95th percentiles) than on the mean. If you plan on going this route, you'll need the interp1 and mean functions.
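In Python terms (standing in for MATLAB's interp1 followed by mean), the interpolate-then-average idea might look like this, with two made-up curves sampled at different times:

```python
import statistics

def interp(xq, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at query point xq; xs sorted."""
    if xq <= xs[0]:
        return ys[0]
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if xq <= x1:
            return y0 + (y1 - y0) * (xq - x0) / (x1 - x0)
    return ys[-1]

# two hypothetical curves sampled at different times
t1, y1 = [0.0, 1.0, 3.0, 5.0], [0.0, 2.0, 4.0, 6.0]
t2, y2 = [0.0, 2.0, 4.0, 5.0], [1.0, 3.0, 5.0, 7.0]

grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]     # common time grid
curves = [[interp(t, ts, ys) for t in grid] for ts, ys in [(t1, y1), (t2, y2)]]

# pointwise mean and standard deviation across the resampled curves
mean_curve = [statistics.mean(v) for v in zip(*curves)]
sd_curve   = [statistics.stdev(v) for v in zip(*curves)]
```

With many real curves you would build `grid` to cover the common time range and plot `mean_curve` with `sd_curve` as error bars.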
An alternative is to do a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time then you can compute the weighted mean by:
timeStep = diff(t);                                          % interval between samples
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);  % duration-weighted mean
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".
