I'm sure this is the kind of problem others have solved many times before.
A group of people are going to take measurements (home energy usage, to be exact).
All of them will do that at different times and at different intervals.
So what I'll get from each person is a set of {date, value} pairs where there are dates missing in the set.
What I need is a complete set of {date, value} pairs where a value is known (either measured or calculated) for each date within the range.
I expect that a simple linear interpolation would suffice for this project.
Assuming that it must be done in Excel, what is the best way to interpolate in such a dataset (so I have a value for every day)?
Thanks.
NOTE: When these datasets are complete I'll determine the slope (i.e. usage per day) and from that we can start doing home-to-home comparisons.
ADDITIONAL INFO After first few suggestions:
I do not want to manually figure out where the holes are in my measurement set (too many incomplete measurement sets!!).
I'm looking for something (existing) automatic to do that for me.
So if my input is
{2009-06-01, 10}
{2009-06-03, 20}
{2009-06-06, 110}
Then I expect to automatically get
{2009-06-01, 10}
{2009-06-02, 15}
{2009-06-03, 20}
{2009-06-04, 50}
{2009-06-05, 80}
{2009-06-06, 110}
Yes, I can write software that does this. I am just hoping that someone already has a "ready to run" software (Excel) feature for this (rather generic) problem.
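For reference, here is a minimal sketch of the expected behaviour outside Excel, written in Python. The function name `fill_daily` and the data layout are assumptions made for illustration only; it simply draws a straight line between each pair of neighbouring measurements.

```python
from datetime import date, timedelta

def fill_daily(readings):
    """Linearly interpolate (date, value) pairs so every day in the range has a value."""
    readings = sorted(readings)
    if not readings:
        return []
    filled = []
    # Walk over each pair of neighbouring measurements.
    for (d0, v0), (d1, v1) in zip(readings, readings[1:]):
        span = (d1 - d0).days
        # Emit the left measurement and every missing day up to (not including) the right one.
        for offset in range(span):
            filled.append((d0 + timedelta(days=offset), v0 + (v1 - v0) * offset / span))
    # The last measurement closes the series.
    filled.append(readings[-1])
    return filled

measured = [(date(2009, 6, 1), 10), (date(2009, 6, 3), 20), (date(2009, 6, 6), 110)]
for day, value in fill_daily(measured):
    print(day.isoformat(), value)
```

Run on the three input pairs from the example, this yields the six daily values 10, 15, 20, 50, 80 and 110.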
I came across this and was reluctant to use an add-in because it makes it tough to share the sheet with people who don't have the add-in installed.
My officemate designed a clean formula that is relatively compact (at the expense of using a bit of magic).
Things to note:
The formula works by:
using the MATCH function to find the row in the inputs range just before the value being searched for (e.g. 3 is the value just before 3.5)
using OFFSETs to select the square of that line and the next (in light purple)
using FORECAST to build a linear interpolation using just those two points, and getting the result
This formula cannot do extrapolations; make sure that your search value is between the endpoints (I do this in the example below by having extreme values).
Not sure if this is too complicated for folks; but it had the benefit of being very portable (and simpler than many alternate solutions).
If you want to copy-paste the formula, it is:
=FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1))
(inputs being a named range)
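The same idea (find the bracketing pair of points, then evaluate the line through them) can be sketched in Python. This is an illustration under assumptions, not a translation of the spreadsheet: `interpolate`, `xs` and `ys` are hypothetical names, `xs` is assumed to be sorted and strictly increasing with at least two entries, and the index is clamped so that a lookup exactly at the last x value still has a right-hand neighbour.

```python
from bisect import bisect_right

def interpolate(xs, ys, x):
    """Piecewise-linear lookup; xs must be sorted ascending with no duplicates."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the input range (no extrapolation)")
    # Like MATCH(x, xs): position of the last entry <= x,
    # clamped so that position i + 1 always exists.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    # Straight line through the two bracketing points.
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

print(interpolate([1, 2, 3, 4], [10, 20, 30, 50], 3.5))
```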
There are two functions, LINEST and TREND, that you can try. Both fit a straight line to sets of known Ys and known Xs by least squares. The difference is in what they return: LINEST returns the coefficients of the fitted line (slope and intercept, plus optional statistics), while TREND takes new X values and returns the corresponding Y values on that line. Note that both fit one line through all the points, so the result is a regression rather than a point-to-point interpolation.
The easiest way to do it probably is as follows:
Download the Excel add-in here: XlXtrFun™ Extra Functions for Microsoft Excel
Use the function Interpolate().
=Interpolate($A$1:$A$3,$B$1:$B$3,D1,FALSE,FALSE)
Columns A and B should contain your input, and column D should contain all your date values. The formula goes into column E.
A nice graphical way to see how well your interpolated results fit:
Take your date,value pairs and graph them using the XY chart in Excel (not the Line chart). Right-click on the resulting line on the graph and click 'Add trendline'. There are lots of different options to choose which type of curve fitting is used. Then you can go to the properties of the newly created trendline and display the equation and the R-squared value.
Make sure that when you format the trendline Equation label, you set the numerical format to have a high degree of precision, so that all of the significant digits of the equation constants are displayed.
The answer above by YGA doesn't handle end-of-range cases where the desired X value is the same as the last X value in the reference range. Using the example given by YGA, the Excel formula would return a #DIV/0! error if an interpolated value at 9999 was asked for. This is obviously part of the reason why YGA added the extreme endpoints of 9999 and -9999 to the input data range, and then assumes that all forecasted values are between these two numbers. If such padding is undesired or not possible, another way to avoid a #DIV/0! error is to check for an exact input value match using the following formula:
=IF(ISNA(MATCH(F3,inputs,0)),FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1)),OFFSET(inputs,MATCH(F3,inputs)-1,1,1,1))
where F3 is the value where interpolated results are wanted.
Note: I would have just added this as a comment to the original YGA post, but I don't have enough reputation points yet.
Alternatively:
=INDEX(yVals,MATCH(J7,xVals,1))+(J7-INDEX(xVals,MATCH(J7,xVals,1)))*(INDEX(yVals,MATCH(J7,xVals,1)+1)-INDEX(yVals,MATCH(J7,xVals,1)))/(INDEX(xVals,MATCH(J7,xVals,1)+1)-INDEX(xVals,MATCH(J7,xVals,1)))
where J7 is the x value,
xVals is the range of x values, and
yVals is the range of y values.
It is easier to put this into code.
You can find out which formula fits your data best using Excel's "trend line" feature. Using that formula, you can calculate y for any x:
Create a scatter (XY) chart for the data (Insert => Scatter);
Create a Polynomial or Moving Average trend line and check "Display Equation on chart" (right-click on the series => Add Trend Line);
Copy the equation into a cell and replace the x's with your desired x value.
On screenshot below A12:A16 holds x's, B12:B16 holds y's, and C12 contains formula that calculates y for any x.
I first posted an answer here, but later found this question
Stata uses the method of quantile calculation called R-2 (https://en.wikipedia.org/wiki/Quantile), whereas Excel uses R-7 with percentile.inc function. My goal is to find a correct formula in Excel that would give results identical to ones in Stata with the R-2 method.
For now, I can see that percentile.inc matches Stata results only for odd and discrete samples (I am dealing with discrete samples). However, the issue occurs with even samples shown here
Conceptually, using percentile.inc in Excel does not seem to be correct since it is an R-7 method, even though it matches with the R-2 method for odd and discrete samples.
My question is what is the simplest formula that would be correct to use in Excel to match Stata percentile results?
So a fairly literal translation of R-2 into Excel for N=4 would look like this (assuming sorted data):
=(INDEX(A$2:A$5,CEILING(C2*4,1))+INDEX(A$2:A$5,FLOOR(C2*4+1,1)))/2
It does indeed go wrong if you try and put in a quantile of zero so that would have to be a special case as would a quantile of 1. I assume Stata gives the lowest and highest values in the set in these two cases?
A more dynamic formula with all the checking would look like this:
=IFS(OR(C2<0,C2>1),"Out of range",C2=0,A$2,C2=1,INDEX(A:A,COUNT(A:A)+1),TRUE,(INDEX(A$2:INDEX(A:A,COUNT(A:A)+1),CEILING(C2*COUNT(A:A),1))+INDEX(A$2:INDEX(A:A,COUNT(A:A)+1),FLOOR(C2*COUNT(A:A)+1,1)))/2)
although you could make it shorter using the LET function in Microsoft 365.
It would probably be nice to implement this as a function in VBA which would sort the data as well as return the quantile value. Of course, you could do the sort in a Microsoft 365 formula as well:
=LET(N,COUNT(A:A),sortedRange,SORT(A$2:INDEX(A:A,N+1)),IFS(OR(C2<0,C2>1),"Out of range",C2=0,INDEX(sortedRange,1),C2=1,INDEX(sortedRange,N),TRUE,(INDEX(sortedRange,CEILING(C2*N,1))+INDEX(sortedRange,FLOOR(C2*N+1,1)))/2))
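For cross-checking the formulas above outside Excel, here is a small Python sketch of the same R-2 rule: the average of the values at positions CEILING(N*p) and FLOOR(N*p+1) in the sorted data. The function name `quantile_r2` is hypothetical, and returning the minimum for p = 0 and the maximum for p = 1 is the same assumption made above, not something confirmed against Stata.

```python
import math

def quantile_r2(data, p):
    """R-2 style quantile: average of the two order statistics around N*p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    # Assumed special cases: lowest and highest value at the extremes.
    if p == 0:
        return xs[0]
    if p == 1:
        return xs[-1]
    lower = math.ceil(n * p)       # 1-based position, like CEILING(p*N, 1)
    upper = math.floor(n * p + 1)  # 1-based position, like FLOOR(p*N + 1, 1)
    return (xs[lower - 1] + xs[upper - 1]) / 2

print(quantile_r2([4, 1, 3, 2], 0.5))
```

With an even sample such as 1, 2, 3, 4 the median comes out as the average of the two middle values, and with an odd sample both positions coincide, so an actual data value is returned.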
The Problem:
I need to calculate only the slope's standard error.
I have a large data set and I need to calculate more than 1200 different Slope's SE values (and use them for further calculations).
I couldn't find a way only to calculate the slope's standard error.
What I can do but prefer to avoid:
I know that the LINEST function can calculate it when entered as an array formula (CSE). However, I do not want to do and re-do that over 1200 times in order to extract just the one value (the slope's SE) that I need.
Data-set example:
I need to calculate the slope standard error of yx1 (B1:F1,B2:F2), yx2 (B1:F1,B3:F3), yx3 (B1:F1,B4:F4), etc.
Some ideas for solutions:
If anyone can help or recommend a way to get only the slope's standard error value (of 2 variables series) it would be great and save me tons of time.
It can be a formula, a combination of formulas or a way to use LINEST and extract only the slope's standard error value.
Thanks!
You can use INDEX with LINEST to pull one number. Use Absolute and Relative References:
=INDEX(LINEST($B$1:$F$1,B2:F2,TRUE,TRUE),2,1)
Put that in G2 and copy down. It will change the known x reference as it does so while keeping the known ys.
INDEX then returns only the SE of the slope, which sits in row 2, column 1 of the LINEST output.
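If you want to verify a few of the results independently, the number in that position is the usual standard error of the slope from simple linear regression: the square root of the residual sum of squares divided by (n - 2) and by the sum of squared deviations of x. Below is a minimal Python sketch of that calculation; the function name `slope_standard_error` is hypothetical, and which series counts as x and which as y is up to the caller.

```python
import math

def slope_standard_error(xs, ys):
    """Standard error of the least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2) / sxx)

print(slope_standard_error([1, 2, 3], [1, 2, 4]))
```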
What I have attempted:
AVERAGEIF(B11:V11,">+MEDIAN(B11:V11)")
What I am trying to do:
I would like to take the average of the upper half of the given data. To elaborate: I would like to find a formula that lets me cut off the data below a given lower fence and then average what remains. I would greatly prefer to keep this formula within one cell, i.e. not grabbing different results from formulas in multiple cells.
Update:
Following through, I found the solution... I think.
One thing I should have explained further:
The incoming data resembles a typical sqrt function.
What I wanted to achieve is to capture the mean of the "plateau" of the data.
The equation I used was:
=AVERAGEIF(B3:B62,(">"&+TRIMMEAN(B3:B62,0.8)),B3:B62)
This was something I just copied and pasted. Of course, "B3" and "B62" are significant only for my application.
My rough explanation of the equation:
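TRIMMEAN(B3:B62,0.8) discards 80% of the selected data points (40% from the top and 40% from the bottom) and averages the middle 20%. AVERAGEIF then averages only the values greater than (">") that trimmed mean. So for my application, this SHOULD give me a rough mean of the "plateau" of the data I would like to find the mean for.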
This formula calculates the Median() of the range, then AverageIf() uses the median and only grabs values that are greater than or equal to >= the median ~ giving you the average of the 'top-half' of your values.
=AVERAGEIF(A1:A10,">="&MEDIAN(A1:A10))
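To sanity-check the result on a small sample, the same rule (average everything at or above the median) can be sketched in Python. The function name `upper_half_mean` is just an illustrative choice.

```python
from statistics import mean, median

def upper_half_mean(values):
    """Average of the values that are greater than or equal to the median."""
    cutoff = median(values)
    return mean([v for v in values if v >= cutoff])

print(upper_half_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the values 1 to 10 the median is 5.5, so only 6 to 10 are averaged, which gives 8.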
Hope this helps!
I have got data points of a sound sample from Audacity which I exported to a .txt file and imported in Excel. Is it possible to plot an upper envelope function in Excel?
(In the end I have to determine the reverberation time, so the time in which the loudness decreases with 60dB.)
For a decaying oscillation such as this or a damped pendulum, the decay envelope can be found by looking at the difference between each sampled reading. You then need to see where there is a change in gradient from +ve to -ve (or zero on one side). To do this some logical operators are used.
Method:
Consider data starting at column A row 5 for time and Column B row 5 for first data value.
Create a column which has the value minus the previous value. [ C6=B6-B5 etc.]
Next do a column for the gradient change, giving a "flag" of 1 for a positive to negative (or zero) inflection [D7 =IF(AND(C6>0,C7<=0),1,0)]. With the differences defined as above, the flagged peak itself sits on row 6.
This should produce a column of data that corresponds to the peaks.
In the next columns use the flag to get the original co-ordinates to display
[E7 =IF(D7=1,A6,"")] for time and [F7 =IF(D7=1,B6,"")] for the value.
Copy this data using "Value" only in the paste option to yet another column.
Filter this to exclude the null data.
Beware of aliasing in the data set.
Not the most elegant of solutions (and some steps can be combined; they are shown separately for clarity) but it works.
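The method above can be sketched in a few lines of Python, which may help when checking the spreadsheet columns. This is only an illustration of the idea (keep a sample when the gradient before it is positive and the gradient after it is negative or zero); the function name `upper_envelope` and the sample data are made up.

```python
def upper_envelope(times, values):
    """Return (time, value) pairs at local peaks of a sampled signal."""
    peaks = []
    for i in range(1, len(values) - 1):
        rising = values[i] - values[i - 1] > 0            # gradient before the sample
        falling_or_flat = values[i + 1] - values[i] <= 0  # gradient after the sample
        if rising and falling_or_flat:
            peaks.append((times[i], values[i]))
    return peaks

times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 3, 0, -2, 0, 1.5, 0]
print(upper_envelope(times, values))
```

Plotting the returned pairs as an XY series gives the upper envelope of the decay.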
Tech99m
I've created a spreadsheet for choosing resistor combinations for an RC Operational Amplifier. I've used a list of available capacitors and resistors for my limiting values to produce values of one of the resistors based on the resistance and capacitance values of the available (standard) components. The values in my tables look like 7.23436793078690. I wish to apply a filter that will find the values closest to a whole number (for example, 1592.00188622182000). Then I wish to apply another filter that will compare those values to a list of available resistors and highlight resistors closest to the desired value. Many of the returned values of R2 are negative, so I also wish to filter out values of R2<0.
For this spreadsheet I've used the equation R2=(Req)(R1)/(R1-Req), which is an equation to determine Req, for parallel resistors, that is solved for R2. In Column 1, the Rows are populated with values for available (standard) resistors. All other columns are populated with the equation for R2. The value for Req is obtained from another table in the Workbook that uses available (standard) capacitor values. Therefore, Columns B and beyond are labeled R2(C=.47 uF), for example. Essentially, Columns B and beyond reference the available (standard) capacitor values.
I wish to highlight the values I discussed in the first paragraph so I can quickly scan the workbook for the best possible value of R2. Then I can quickly determine the values of R1 and C to complete my task and minimize the tolerance for the given op-amp application.
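To make the intended calculation concrete, here is a rough Python sketch of the two steps described above: solving the parallel-resistor relation for R2, discarding negative results, and picking the nearest available resistor. The function names and the component values are made-up placeholders, not values from my workbook.

```python
def r2_for_target(req, r1):
    """Solve 1/Req = 1/R1 + 1/R2 for R2, i.e. R2 = Req*R1/(R1 - Req)."""
    if r1 == req:
        return None  # R2 would have to be infinite
    return req * r1 / (r1 - req)

def nearest_standard(value, standard_values):
    """Return the available resistor value closest to the desired value."""
    return min(standard_values, key=lambda r: abs(r - value))

# Hypothetical list of available resistors, in ohms.
available = [1000, 1500, 1600, 2000, 2200]
req = 1000
for r1 in available:
    r2 = r2_for_target(req, r1)
    if r2 is None or r2 < 0:
        continue  # skip infinite and negative results
    print(r1, r2, nearest_standard(r2, available))
```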
I have some C++ programming knowledge and I have enough experience with Excel so I should be able to understand where and how to do what I wish to do but I wish to get some advice and direction from a more experienced Excel user.
***UPDATE***
Since my first post, I've done some research. It seems like the easiest approach would be to apply a "closest to" filter. I've attached a screenshot of a small portion of my workbook, which contains the equation for the "closest to" filter, a partial range of available resistor values, and the results for my filter. I have multiple tabs in my workbook.
I lied. I'm unable to post an image until I gain 10 reputation. I have 6 reputation. If you're reading this post and you're able to contribute to my reputation, please contribute.
This is my equation: =INDEX(A3:BZ26,MATCH(MIN(ABS(A3:BZ26-CB3)),ABS(A3:BZ26-CB3),0))
The equation format is: =INDEX(rng,MATCH(MIN(ABS(rng-value)),ABS(rng-value),0))
My formula seems to be correct but it returns "#VALUE!".