I want to generate a single column of 6000 numbers with a normal distribution, with a mean of 30.15, standard deviation of 49.8, minimum of -11.5, and maximum of 133.5.
I am a total newbie at this, so I tried to use the following formula in a cell and then just drag it down to cell 6000:
=NORMINV(RANDBETWEEN(-11.5,133.5)/100,30.15,49.8)
It returns a value but sometimes it returns #NUM! error. Thank you!
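Unfortunately NORMINV expects a probability for its first argument, which must be a value in the interval (0, 1). Any value outside that range will yield #NUM!. In your formula, RANDBETWEEN(-11.5,133.5)/100 can come out at or below 0, or above 1, which is why the error appears only some of the time.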
What you're asking cannot be done directly with a normal distribution since that has no constraints on the minimum and maximum values.
One approach is to use a primary column to generate the normally distributed numbers, then keep only the ones that fall within your range in the adjacent column. But this will cause even the mean (let alone higher moments) to shift considerably, because your minimum and maximum values are not equidistant from the mean. You could get round this by recentering the distribution and adjusting afterwards.
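The generate-then-filter idea can be sketched outside Excel to see how far the mean moves. This is only an illustration in Python, not a spreadsheet solution; the function name and the fixed seed are my own choices, and the parameters are the ones from the question.

```python
import random
import statistics

MEAN, SD = 30.15, 49.8      # parameters from the question
LOW, HIGH = -11.5, 133.5    # required minimum and maximum


def truncated_normal_sample(n, seed=0):
    """Draw normal values and keep only those inside [LOW, HIGH]."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    kept = []
    while len(kept) < n:
        x = rng.gauss(MEAN, SD)
        if LOW <= x <= HIGH:
            kept.append(x)
    return kept


values = truncated_normal_sample(6000)
print(min(values), max(values))   # both inside the requested range
print(statistics.mean(values))    # well above 30.15: the cut is asymmetric
```

The lower bound sits less than one standard deviation below the mean while the upper bound is about two above it, so far more low values are discarded than high ones and the sample mean ends up well above 30.15.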
I wrote a macro which, as a tiny task on the side, also calculates the average of around 39000 different values. I noticed that using WorksheetFunction.Average and calculating the average "step-wise" yield different results, but only at the 15th digit after the decimal point. By calculating "step-wise" I mean adding up each value to a total_sum variable, counting the number of values in another variable and then dividing the former by the latter.
The 15th digit after the decimal point might be considered negligible but I find it unsettling nonetheless. Shouldn't those two values be exactly the same? They are when I use fewer values, and as the macro might be applied to far more values than 39000 (100k+), I'm worried the error might increase.
So my questions are: What could cause the difference and more importantly which method is more precise?
What I tried was to declare all variables in the "step-wise" calculation as Variant to avoid using the wrong data type in any of those steps.
Thank you very much for your help!
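For what it's worth, a difference in the last digits is what you would expect whenever the same double-precision values are summed in two different ways, because every floating-point addition rounds its result. The following is a minimal Python sketch of that effect only. How WorksheetFunction.Average accumulates internally is not something I know, so the comparison with a correctly rounded sum (math.fsum) is an assumption made purely for illustration.

```python
import math

values = [0.1] * 10  # ten copies of a number with no exact binary representation

# "Step-wise" average: add each value to a running total, then divide.
total = 0.0
count = 0
for v in values:
    total += v      # each addition rounds to the nearest double
    count += 1
stepwise_avg = total / count

# Average computed from a correctly rounded sum of the same values.
exact_avg = math.fsum(values) / len(values)

print(stepwise_avg)
print(exact_avg)
print(stepwise_avg == exact_avg)        # False
print(abs(stepwise_avg - exact_avg))    # tiny, far below the 15th decimal
```

The two results disagree only in the last bits, and neither is "wrong" in the sense of using a bad data type; the running total simply rounds at every step.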
I'm currently working on an Excel evaluation sheet for force plate data (showing vertical force development in jumps over time) and stumbled upon a problem that I haven't managed to fix over the past few days. Basically there are two main columns over ~4000 rows and 1 extra cell:
Column A shows time [in ms]
Column B shows vertical force measured at the time point in Column A
C1 is the already calculated peak force value before takeoff
I am now trying to define the timepoint of takeoff in an extra cell using INDEX and MATCH functions (FYI: the time of takeoff is when the vertical force value is close to 0 for the first time [range of lookup must be starting from the peak force value though!!], but never exactly 0 due to force plate drift in measurement)
My idea was this:
=INDEX(A2:A4000;MATCH(0;INDEX(B2:B4000;MATCH(C1;B2:B4000;0)):B4000;-1))
so the range
INDEX(B2:B4000;MATCH(C1;B2:B4000;0)):B4000
should define a range of force values starting at the peak force value (C1).
Unfortunately Excel will show me a timepoint where the force value is far away from 0. I've tried the same formula within an easier (but for my purpose faulty) range (B2:B4000) and it worked perfectly, so I guess the problem I'm dealing with lies somewhere within the range defined with the INDEX function.
I'd be glad if someone could help me out with this!
You are certainly on the right track. It seems you've correctly adjusted the range in the nested INDEX function, but that MATCH function will return the position within the adjusted range, not within the full B2:B4000. You need to adjust A2:A4000 in the same way so that the position returned by MATCH points at the correct row.
=INDEX(INDEX(A2:A4000; MATCH(C1; B2:B4000; 0)):A4000; MATCH(0; INDEX(B2:B4000; MATCH(C1; B2:B4000; 0)):B4000; -1))
I don't have sample data to test that on but I believe it is correct.
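The offset issue is easier to see in ordinary code. Here is a rough Python sketch of the intended lookup: find the peak, then scan forward from it for the first near-zero force, and read the time at that same position. The function name, the threshold parameter and the sample numbers are all made up for illustration, and the peak is taken as the maximum of the force list rather than read from C1.

```python
def takeoff_time(times, forces, threshold):
    """Return the first time at or after the peak where force <= threshold."""
    peak_index = forces.index(max(forces))  # position of the peak force
    for i in range(peak_index, len(forces)):
        if forces[i] <= threshold:
            # i counts from the start of the full list, so it lines up with times.
            return times[i]
    return None  # the force never dropped to the threshold


# Hypothetical data: an early near-zero reading, peak at 30 ms, takeoff at 50 ms.
times = [0, 10, 20, 30, 40, 50, 60]
forces = [2.0, 800.0, 1500.0, 2100.0, 900.0, 3.5, 1.2]
print(takeoff_time(times, forces, threshold=5.0))  # 50
```

The early reading of 2.0 is ignored because the scan starts at the peak, which is what the shifted ranges achieve in the formula.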
I am trying to recreate the formula from a trendline on a graph. Basically my company is trying to predict the corn yields for next year. All of the actual programmers are out for the week so they passed it on to me (web developer :D). I've attempted the LINEST formula multiple times with no luck.
Basically in column B I have the years (1-15, trying to project 16) and in column C I have the actual trend data. I am probably doing this wrong, however.
EX =LINEST(C16:C30,B16:B30,FALSE,FALSE)
Any help would be appreciated. just tell me if you need the actual file or more information. Thanks in advance!
The fourth argument, concerning the return of additional regression statistics, is optional and is taken as FALSE if omitted, so seems not required for your purposes. The third argument, concerning the intercept with the Y-axis (the value of y when x is 0), is also optional but taken as TRUE if omitted. In your case TRUE seems appropriate so the third parameter seems not required for your purposes.
With your data spanning 15 years, if ending with the current year, it is conveniently 2001-2015 (both dates inclusive) and has no information about the value of y (production) in year 2000 (i.e. when x is 0), but this is unlikely to have been 0, as would be taken to be the case if the third argument is FALSE.
In a simplified example, take production of 50 in 2001, increasing by an (unrealistically!) constant 5 each year. By 2015 this has reached 120, so for 2016 at the same rate of increase production of 125 should be expected. Your formula returns 9.35 so would predict production of 129.35, though we know to expect 125, as given by:
=LINEST(C16:C30,B16:B30)
when added to the latest available (120).
The former is too high a predicted increase because it assumes growth was from 0 to 120 in sixteen years, rather than what I have taken to be from 50 to 120 in fifteen.
As has been mentioned by @Byron Wall, Excel has the TREND function that may be used for linear extrapolation to obtain the next (16th) value like so:
=TREND(C16:C30,B16:B30,16)
This directly returns 125 for the, simplified, sample data.
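The figures for the simplified sample can be checked with a few lines of Python. This is just a sketch of the least-squares arithmetic behind the two cases (intercept forced to 0 versus intercept fitted), with function names of my own choosing; it is not a description of how Excel implements LINEST or TREND.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope with the intercept forced to 0."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


years = list(range(1, 16))                 # x = 1..15
production = [45 + 5 * x for x in years]   # 50, 55, ..., 120

forced = slope_through_origin(years, production)
slope, intercept = fit_line(years, production)
forecast_16 = slope * 16 + intercept       # value on the fitted line at x = 16

print(round(forced, 2))       # 9.35
print(round(forecast_16, 2))  # 125.0
```

With the intercept forced to 0 the slope comes out at about 9.35, while the ordinary fit recovers the true slope of 5 and an intercept of 45, giving 125 for year 16.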
HOWEVER, all the above assumes growth is linear. Taking say Brazilian corn production (Million tons) over the period (offset one year) this has been roughly (based on USDA.gov):
The red line is the Linear trend and green a fourth order Polynomial. They happen both to end up at the same place for one year ahead (the hollow bar) but predict different results from the latest six years:
It may be worth charting the data you have, and adding different trend lines, before deciding whether linear extrapolation seems the most promising for forecasting purposes. ‘Wavy’ (cyclical) progress is evident in many datasets.
I'm sure this is the kind of problem others have solved many times before.
A group of people are going to do measurements (Home energy usage to be exact).
All of them will do that at different times and in different intervals.
So what I'll get from each person is a set of {date, value} pairs where there are dates missing in the set.
What I need is a complete set of {date, value} pairs where for each date within the range a value is known (either measured or calculated).
I expect that a simple linear interpolation would suffice for this project.
Assuming that it must be done in Excel, what is the best way to interpolate in such a dataset (so I have a value for every day)?
Thanks.
NOTE: When these datasets are complete I'll determine the slope (i.e. usage per day) and from that we can start doing home-to-home comparisons.
ADDITIONAL INFO After first few suggestions:
I do not want to manually figure out where the holes are in my measurement set (too many incomplete measurement sets!!).
I'm looking for something (existing) automatic to do that for me.
So if my input is
{2009-06-01, 10}
{2009-06-03, 20}
{2009-06-06, 110}
Then I expect to automatically get
{2009-06-01, 10}
{2009-06-02, 15}
{2009-06-03, 20}
{2009-06-04, 50}
{2009-06-05, 80}
{2009-06-06, 110}
Yes, I can write software that does this. I am just hoping that someone already has a "ready to run" software (Excel) feature for this (rather generic) problem.
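To pin down exactly what is being asked for, here is a minimal Python sketch of the gap-filling described above. It is not an Excel feature, just the logic written out; the function name is made up, and it assumes at least one measurement and no duplicate dates.

```python
from datetime import date, timedelta


def fill_daily(measurements):
    """Linearly interpolate (date, value) pairs so every day has a value."""
    points = sorted(measurements)
    filled = []
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        span = (d1 - d0).days
        for step in range(span):
            # Value on day d0 + step, on the straight line from v0 to v1.
            filled.append((d0 + timedelta(days=step), v0 + (v1 - v0) * step / span))
    filled.append(points[-1])  # the last measured point closes the series
    return filled


readings = [(date(2009, 6, 1), 10), (date(2009, 6, 3), 20), (date(2009, 6, 6), 110)]
for day, value in fill_daily(readings):
    print(day.isoformat(), value)
```

For the three sample measurements this produces the six daily values listed above (10, 15, 20, 50, 80, 110).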
I came across this and was reluctant to use an add-in because it makes it tough to share the sheet with people who don't have the add-in installed.
My officemate designed a clean formula that is relatively compact (at the expense of using a bit of magic).
Things to note:
The formula works by:
using the MATCH function to find the row in the inputs range just before the value being searched for (e.g. 3 is the value just before 3.5)
using OFFSETs to select the square of that line and the next (in light purple)
using FORECAST to build a linear interpolation using just those two points, and getting the result
This formula cannot do extrapolations; make sure that your search value is between the endpoints (I do this in the example below by having extreme values).
Not sure if this is too complicated for folks; but it had the benefit of being very portable (and simpler than many alternate solutions).
If you want to copy-paste the formula, it is:
=FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1))
(inputs being a named range)
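For anyone who finds the nested OFFSETs hard to read, the same three steps (find the row at or just before the search value, take that row and the next, interpolate between them) can be sketched in Python. This only mirrors the logic and is not a translation of Excel's functions; the names and sample numbers are hypothetical.

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Two-point linear interpolation; xs must be sorted ascending."""
    # Like an approximate MATCH: position of the last entry that is <= x.
    i = bisect_right(xs, x) - 1
    if i < 0 or i >= len(xs) - 1:
        # No bracketing pair of rows, so there is nothing to interpolate between.
        raise ValueError("x must be at least xs[0] and below xs[-1]")
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


xs = [1, 2, 3, 4]          # hypothetical inputs column
ys = [10, 20, 40, 80]      # hypothetical values column
print(interpolate(3.5, xs, ys))  # 60.0
```

As with the formula, the search value has to lie between the endpoints; here the last x value itself is rejected because there is no following row to pair it with.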
There are two functions, LINEST and TREND, that you can try. Both fit a straight line to sets of known Xs and Ys by least squares. The difference is in what they return: LINEST gives the coefficients of the line (slope and intercept), while TREND also takes new X values and returns the corresponding Y values on that line. Note that both fit one line through all of the data, so the results will generally not pass exactly through your measured points.
The easiest way to do it probably is as follows:
Download Excel add-on here: XlXtrFun™ Extra Functions for Microsoft Excel
Use the function Interpolate().
=Interpolate($A$1:$A$3,$B$1:$B$3,D1,FALSE,FALSE)
Columns A and B should contain your input, and column G should contain all your date values. Formula goes into the column E.
A nice graphical way to see how well your interpolated results fit:
Take your date,value pairs and graph them using the XY chart in Excel (not the Line chart). Right-click on the resulting line on the graph and click 'Add trendline'. There are lots of different options to choose which type of curve fitting is used. Then you can go to the properties of the newly created trendline and display the equation and the R-squared value.
Make sure that when you format the trendline Equation label, you set the numerical format to have a high degree of precision, so that all of the significant digits of the equation constants are displayed.
The answer above by YGA doesn't handle end of range cases where the desired X value is the same as the reference range's X value. Using the example given by YGA, the excel formula would return #DIV/0! error if an interpolated value at 9999 was asked for. This is obviously part of the reason why YGA added the extreme endpoints of 9999 and -9999 to the input data range, and then assumes that all forecasted values are between these two numbers. If such padding is undesired or not possible, another way to avoid a #DIV/0! error is to check for an exact input value match using the following formula:
=IF(ISNA(MATCH(F3,inputs,0)),FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1)),OFFSET(inputs,MATCH(F3,inputs)-1,1,1,1))
where F3 is the value where interpolated results are wanted.
Note: I would have just added this as a comment to the original YGA post, but I don't have enough reputation points yet.
Alternatively:
=INDEX(yVals,MATCH(J7,xVals,1))+(J7-INDEX(xVals,MATCH(J7,xVals,1)))*(INDEX(yVals,MATCH(J7,xVals,1)+1)-INDEX(yVals,MATCH(J7,xVals,1)))/(INDEX(xVals,MATCH(J7,xVals,1)+1)-INDEX(xVals,MATCH(J7,xVals,1)))
where J7 is the x value,
xVals is the range of x values (sorted ascending), and
yVals is the range of y values.
easier to put this into code.
You can find out which formula best fits your data using Excel's "trend line" feature. Using that formula, you can calculate y for any x:
Create a scatter (XY) chart for the data (Insert => Scatter);
Add a trend line of a type that has an equation (for example Polynomial) and check "Display Equation on chart" (right-click on series => Add Trend Line);
Copy the equation into a cell and replace the x's with your desired x value.
On screenshot below A12:A16 holds x's, B12:B16 holds y's, and C12 contains formula that calculates y for any x.
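Once the equation has been copied off the chart, evaluating it is plain arithmetic. A small Python sketch, assuming a polynomial trendline; the coefficients below are invented for illustration and are not taken from any real chart.

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate a polynomial; coefficients run from highest power to constant."""
    result = 0.0
    for c in coefficients:
        result = result * x + c  # Horner's method
    return result


# Hypothetical trendline equation read off a chart: y = 2x^2 + 3x + 1
coefficients = [2.0, 3.0, 1.0]
print(evaluate_polynomial(coefficients, 4.0))  # 45.0
```

If the chart label shows rounded constants, the computed y values inherit that rounding, so it is worth displaying the equation with plenty of digits before copying it.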
I first posted an answer here, but later found this question