Combination of several INDEX and MATCH functions - excel

I'm currently working on an evaluation excel sheet for forceplate data (showing vertical force development in jumps over time) and stumbled upon a problem that I couldn't manage to fix for the past days. Basically there are two main columns over ~ 4000 rows and 1 extra cell:
Column A shows time [in ms]
Column B shows vertical force measured at the time point in Column A
C1 is the already calculated peak force value before takeoff
I am now trying to define the timepoint of takeoff in an extra cell using INDEX and MATCH functions (FYI: the time of takeoff is when the vertical force value is close to 0 for the first time [range of lookup must be starting from the peak force value though!!], but never exactly 0 due to force plate drift in measurement)
My idea was this:
=INDEX(A2:A4000;MATCH(0;INDEX(B2:B4000;MATCH(C1;B2:B4000;0)):B4000;-1))
so the range
INDEX(B2:B4000;MATCH(C1;B2:B4000;0)):B4000
should define a range of force values starting at the peak force value (C1).
Unfortunately Excel will show me a timepoint where the force value is far away from 0. I've tried the same formula within an easier (but for my purpose faulty) range (B2:B4000) and it worked perfectly, so I guess the problem I'm dealing with lies somewhere within the range defined with the INDEX function.
I'd be glad if someone could help me out with this!

You are certainly on the right track. It seems you've correctly adjusted the range in the nested INDEX function but that MATCH function will retunr the position within the adjusted B2:B4000. You need to adjust A2:A4000 in the same way so that the position returned by MATCH will be correct.
=INDEX(INDEX(A2:A4000; MATCH(C1; B2:B4000; 0)):A4000; MATCH(0; INDEX(B2:B4000; MATCH(C1; B2:B4000; 0)):B4000; -1))
I don't have sample data to test that on but I believe it is correct.

Related

Excel Group rows based on time interval

I have the following excel result:
I want to group the above result in groups based on sessions i.e. if the time gap between two successive timestamps is greater than 5 minutes, it must be a new row.
For example :
I need some formula to achieve this. As I'm fairly new to Excel this is causing to be a major headache for me. Please help me, if anyone knows how to do it or at least point me in a direction.
Thanks a ton !!!
Judging by your screenshot, it appears your timestamps are actually text values. Text by default is usually left aligned where as numbers are right aligned. You seem to have a space at the end of your time stamp suggesting that it is probably left aligned and therefore text. You can test it with the following formula which will return TRUE if its text.
=ISTEXT(P2)
where P2 is one of your time stamps.
CONVERT TIMESTAMPS TO TIME
There are a variety of ways to do this. Some will depend on system settings. Take a look at the following functions as each might be useable depending on your system. The first two are a guarantee, the last two are more dependent on system settings.
DATE
TIME
DATEVALUE
TIMEVALUE
Something important to remember here is that in excel dates are integers counting the days since 1900/01/01 with that date being 1. Time is stored as a decimal and represents fraction/percentage of a day. 24:00:00 is not a valid time in excel though some functions may work with it.
So in order to convert your time stamp in P2 I used the following formula to pull out the date:
=DATE(LEFT(P2,4),MID(P2,FIND("-",P2)+1,2),MID(P2,FIND(" ",P2)-2,2))
Basically it goes into the text and strips out the individual numbers for Year, Month and Day.
To pull out the time, I could have done the same procedure but elected to demonstrate the TIMEVALUE method which is a little more robust than DATEVALUE and not a subjective to system settings as much. With the following formula I stripped out the whole time code (MINUS"UTC"):
=TIMEVALUE(TRIM(MID(P2,FIND(" ",P2)+1,FIND("UTC",P2)-FIND(" ",P2)-1)))
I also made an assumption that you are not mixing and matching UTC with other time zones which means it can be ignored. Now to get DATE and TIME all in one cell, you just need to add the two formulas together to get:
=DATE(LEFT(P2,4),MID(P2,FIND("-",P2)+1,2),MID(P2,FIND(" ",P2)-2,2))+TIMEVALUE(TRIM(MID(P2,FIND(" ",P2)+1,FIND("UTC",P2)-FIND(" ",P2)-1)))
In the example at the end, I placed that formula in Q2 and copied down
DELTA TIME
Since you want to break your groups out based on a time difference between individual entries, I used a helper column to store the time difference. In my example at the end I stored this difference in Column S. The first entry is blank as there is no time before it. I used the following formula in S3 and copied it downward.
=Q3-Q2
I applied the custom formatting of [h]:mm:ss to the cell to get it to display as shown.
FIND GROUP BREAK POINTS
In my example I am using helper column T to hold breakpoint flags. At a minimum, you will have two break points. Your first time entry and your last time entry. To make like simple I simply hard coded my first breakpoint flag in T2 as 1. Stating in T3, Three checks need to be made. If any of them are TRUE then the next flag needs to be added with a value increase by one. the three checks are:
Is this the last entry
Is the next time delta greater than 5 minutes (means end of a group)
Is this time delta greater than 5 minutes (means start of a group)
Based on those three checks I placed the following formula in T3 and copied down:
=IF(OR(S4="",S4>TIME(0,5,0),S3>TIME(0,5,0)),MAX($T$2:T2)+1,"")
Note the $ on the first part of the range for the MAX function. This will lock the start of the range while the formula gets copied down while the end of the range increases accordingly.
Also the row after the last time entry must be blank. IF it is not blank and has a set value in it, change the S4="" to S4="set value".
GENERATE TABLE
There are multiple ways to reference the flags and pull the corresponding times. a couple of formulas you can look into are:
INDEX / MATCH
LOOKUP
In this example I elected to use LOOKUP though I believe INDEX and MATCH are more appropriate and robust. For starters we want to generate a list of ODD number and EVEN numbers. These represent the start and end of the groups and correspond to the flags set in column T. One way to generate ODD and EVEN numbers as you copy down is:
=ROW(A1)*2-1 (ODD)
=ROW(A1)*2 (EVEN)
The next step is to find the generated number in Column T and then pull its corresponding timestamp in Column Q. I did this with the following formula in V2 and copied down.
=LOOKUP(ROW(A1)*2-1,T:T,Q:Q)
And in W2
=LOOKUP(ROW(A1)*2,T:T,Q:Q)

Excel- Generating a set of numbers with normal distribution with MIN and MAX

I want to generate a single column of 6000 numbers with a normal distribution, with a mean of 30.15, standard deviation of 49.8, minium of -11.5, maximum 133.5.
I am a total newb at this so i tried to use the following formula in a cell and than just drag it down to cell 6000:
=NORMINV(RANDBETWEEN(-11.5,133.5)/100,30.15,49.8)
It returns a value but sometimes it returns #NUM! error. Thank you!
Unfortunately NORMINV expects a probability for the argument, which must be a value in the interval (0, 1). Any parameter outside that range will yield #NUM!.
What you're asking cannot be done directly with a normal distribution since that has no constraints on the minimum and maximum values.
One approach is to use a primary column to generate the normally distributed numbers, then filter out the ones you want in the adjacent column. But this will cause even the mean (let alone higher moments) to go off quite considerably due to your minimum and maximum values not being equidistant from the mean. You could get round this by recentering the distribution and adjusting afterwards.

Excel Rolling Mean of 3 Similar Consecutive Observations

I'm trying to find the rolling mean of time series while ignoring values that do not follow the trend.
x
869
1570
946
0
1136
So, what I would want the result to look like is...
x | y
869 | 0
1570 | 0
946 | 1128.33
3 | 0
1136 | 1217.33 ([1136+1570+946]/3)
900 | 2982 ([946+1136+900]/3)
860 | 2896
The tough part here is if the row I'm on is a trending value I want to take the 3 previous trending values and find them mean of them, but if it's a non-trending value I want it to just zero out. Sometimes I might have to skip 2 or 3 previous lines to get 3 trending values to take the average as well.
So far I've been using array, RC formulas in a VBA macro form, but I'm not sure I could use RC here or if it has to be something else completely. Any help would be greatly appreciated.
I believe I can help you with your problem. First three notes:
1) It appears to me that you are trying to do DCA on smoothed production profiles, ignoring months without a complete record or no data. I'm making this assumption since you mentioned this was time series data but didn't give a sample rate. 2) I've added some extra 'data' for the sake of demo-ing. 3) In your example you shared, the last two values in your 'Y' column it looks like you may have summed but have forgotten to divide.
The solution I came up with has three parts: 1) create a metric to identify 'outliers'; 2) flag 'outliers'; 3) smooth non-flagged data. Let's establish some worksheet infrastructure and say that your production values are in column B and the associated time is in column A as follows:
Part 1) In column 'C', estimate a rough data value based on a trend approximated from two points on either side of your current time step. Subtract the actual value from this approximation. The result will always be positive and quite large for a timestep with little or no production.
=(INTERCEPT(B1:B6,A1:A6)+(A4*SLOPE(B1:B6,A1:A6)))-B4
Part 2) In column 'D', add a condition for when the value computed above is larger than the actual data point. Have it use '0' to identify a point that shouldn't be included in your average. Copy this down to the end of your data as well.
=IF(C4>B4,0,1)
Our sheet now looks like this:
3) Your three element average can now be computed. In the last cell of column 'E', enter the following array formula. You have to accept this formula by pressing ctrl + shift + enter. Once that is done fill the column from bottom to top:
=IFERROR(IF(D17=1,AVERAGE(INDEX(B12:B17,MATCH(2,1/(FIND(1,D12:D17)))),INDEX(B12:B16,MATCH(2,1/(FIND(1,D12:D16)))-COUNTIF(D17,"=0")),INDEX(B12:B15,MATCH(2,1/(FIND(1,D12:D15)))-COUNTIF(D16:D17,"=0"))),0),"")
This takes averages the most recent three values and allows for a skip of up to three time steps of outlier data per your problem statement. For an idea of how the completed sheet looks:
This was a fun challenge, I have some ideas for a more efficient formula but this should get the job done. Please let me know how this works for you!
Cheers
[EDIT]
An alternative approach which allows the user to specify the number of previous entries to include is detailed below. This is a more general (preferred alternative) and picks up in place of the previously described step 3.
3Alt) In cell G2 enter a number of previous values to average, for this example I am sticking with 3. In cell E4 enter the following array expression (ctrl+shift+enter) and drag to the end of column E:
=IFERROR(IF(D4=1,SUM(INDEX(D:D,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):D4 * INDEX(B:B,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):B4)/$G$2,0),"")
This uses the LARGE function to find the 'nth' largest value, where n is the number of preceding values from the current time-step to average. Then it builds a range that extends from the found cell to the current time step. Then it multiplies the flags (0's and 1's) by each month's production value, sums them and divides by n. In this way months flagged as bad are set to 0 and not included in the sum.
This is a much cleaner way to achieve the desired result and has the flexibility to average different periods of time. See example of the final value below.

Removing Lower/Upper Fence of outliers from input data to then be evaluated

What I have attempted:
AVERAGEIF(B11:V11,">+MEDIAN(B11:V11)")
What I am trying to do:
I would like to take the average of the upper half of given data. Elaborating more. I would like to find a formula that will allow me to remove a given lower fence of outliers and dissect the data then given to me. I would greatly prefer to maintain this formula within one cell "not grabbing different results from formulas within multiple cells".
Update:
Following through I found the solution.. I think.
One thing I should have explained further:
The data coming in replicating a typical sqrt function.
What I wanted to achieve is to capture the mean of the "plateau" of the data.
The equation I used was:
=AVERAGEIF(B3:B62,(">"&+TRIMMEAN(B3:B62,0.8)),B3:B62)
This was something I just copied and pasted. of course "B3" and "B62" are significant only for my application.
My rough explanation of the equation:
TRIMMEAN will limit the AVERAGE to the top 20%(">")(0.8) of the data selected. So for my application, this SHOULD give me a rough mean of the "plateau" of the data i would like to find the mean for.
This formula calculates the Median() of the range, then AverageIf() uses the median and only grabs values that are greater than or equal to >= the median ~ giving you the average of the 'top-half' of your values.
AVERAGEIF(A1:A10,">="&MEDIAN(A1:A10))
Hope this help!

Interpolating data points in Excel

I'm sure this is the kind of problem other have solved many times before.
A group of people are going to do measurements (Home energy usage to be exact).
All of them will do that at different times and in different intervals.
So what I'll get from each person is a set of {date, value} pairs where there are dates missing in the set.
What I need is a complete set of {date, value} pairs where for each date withing the range a value is known (either measured or calculated).
I expect that a simple linear interpolation would suffice for this project.
If I assume that it must be done in Excel.
What is the best way to interpolate in such a dataset (so I have a value for every day) ?
Thanks.
NOTE: When these datasets are complete I'll determine the slope (i.e. usage per day) and from that we can start doing home-to-home comparisons.
ADDITIONAL INFO After first few suggestions:
I do not want to manually figure out where the holes are in my measurement set (too many incomplete measurement sets!!).
I'm looking for something (existing) automatic to do that for me.
So if my input is
{2009-06-01, 10}
{2009-06-03, 20}
{2009-06-06, 110}
Then I expect to automatically get
{2009-06-01, 10}
{2009-06-02, 15}
{2009-06-03, 20}
{2009-06-04, 50}
{2009-06-05, 80}
{2009-06-06, 110}
Yes, I can write software that does this. I am just hoping that someone already has a "ready to run" software (Excel) feature for this (rather generic) problem.
I came across this and was reluctant to use an add-in because it makes it tough to share the sheet with people who don't have the add-in installed.
My officemate designed a clean formula that is relatively compact (at the expensive of using a bit of magic).
Things to note:
The formula works by:
using the MATCH function to find the row in the inputs range just before the value being searched for (e.g. 3 is the value just before 3.5)
using OFFSETs to select the square of that line and the next (in light purple)
using FORECAST to build a linear interpolation using just those two points, and getting the result
This formula cannot do extrapolations; make sure that your search value is between the endpoints (I do this in the example below by having extreme values).
Not sure if this is too complicated for folks; but it had the benefit of being very portable (and simpler than many alternate solutions).
If you want to copy-paste the formula, it is:
=FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1
(inputs being a named range)
There are two functions, LINEST and TREND, that you can try to see which gives you the better results. They both take sets of known Xs and Ys along with a new X value, and calculate a new Y value. The difference is that LINEST does a simple linear regression, while TREND will first try to find a curve that fits your data before doing the regression.
The easiest way to do it probably is as follows:
Download Excel add-on here: XlXtrFunâ„¢ Extra Functions for Microsoft Excel
Use function intepolate().
=Interpolate($A$1:$A$3,$B$1:$B$3,D1,FALSE,FALSE)
Columns A and B should contain your input, and column G should contain all your date values. Formula goes into the column E.
A nice graphical way to see how well your interpolated results fit:
Take your date,value pairs and graph them using the XY chart in Excel (not the Line chart). Right-click on the resulting line on the graph and click 'Add trendline'. There are lots of different options to choose which type of curve fitting is used. Then you can go to the properties of the newly created trendline and display the equation and the R-squared value.
Make sure that when you format the trendline Equation label, you set the numerical format to have a high degree of precision, so that all of the significant digits of the equation constants are displayed.
The answer above by YGA doesn't handle end of range cases where the desired X value is the same as the reference range's X value. Using the example given by YGA, the excel formula would return #DIV/0! error if an interpolated value at 9999 was asked for. This is obviously part of the reason why YGA added the extreme endpoints of 9999 and -9999 to the input data range, and then assumes that all forecasted values are between these two numbers. If such padding is undesired or not possible, another way to avoid a #DIV/0! error is to check for an exact input value match using the following formula:
=IF(ISNA(MATCH(F3,inputs,0)),FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1)),OFFSET(inputs,MATCH(F3,inputs)-1,1,1,1))
where F3 is the value where interpolated results are wanted.
Note: I would have just added this as a comment to the original YGA post, but I don't have enough reputation points yet.
alternatively.
=INDEX(yVals,MATCH(J7,xVals,1))+(J7-MATCH(J7,xVals,1))*(INDEX(yVals,MATCH(J7,xVals,1)+1)-INDEX(yVals,MATCH(J7,xVals,1)))/(INDEX(xVals,MATCH(J7,xVals,1)+1)-MATCH(J7,xVals,1))
where j7 is the x value.
xvals is range of x values
yvals is range of y values
easier to put this into code.
You can find out which formula fits best your data, using Excel's "trend line" feature. Using that formula, you can calculate y for any x
Create linear scatter (XY) for it (Insert => Scatter);
Create Polynominal or Moving Average trend line, check "Display Equation on
chart" (right-click on series => Add Trend Line);
Copy the equation into cell and replace x's with your desired x value
On screenshot below A12:A16 holds x's, B12:B16 holds y's, and C12 contains formula that calculates y for any x.
I first posted an answer here, but later found this question

Resources