I'm trying to find the rolling mean of time series while ignoring values that do not follow the trend.
x
869
1570
946
0
1136
So, what I would want the result to look like is...
x | y
869 | 0
1570 | 0
946 | 1128.33
3 | 0
1136 | 1217.33 ([1136+1570+946]/3)
900 | 2982 ([946+1136+900]/3)
860 | 2896
The tough part here is if the row I'm on is a trending value I want to take the 3 previous trending values and find them mean of them, but if it's a non-trending value I want it to just zero out. Sometimes I might have to skip 2 or 3 previous lines to get 3 trending values to take the average as well.
So far I've been using array, RC formulas in a VBA macro form, but I'm not sure I could use RC here or if it has to be something else completely. Any help would be greatly appreciated.
I believe I can help you with your problem. First three notes:
1) It appears to me that you are trying to do DCA on smoothed production profiles, ignoring months without a complete record or no data. I'm making this assumption since you mentioned this was time series data but didn't give a sample rate. 2) I've added some extra 'data' for the sake of demo-ing. 3) In your example you shared, the last two values in your 'Y' column it looks like you may have summed but have forgotten to divide.
The solution I came up with has three parts: 1) create a metric to identify 'outliers'; 2) flag 'outliers'; 3) smooth non-flagged data. Let's establish some worksheet infrastructure and say that your production values are in column B and the associated time is in column A as follows:
Part 1) In column 'C', estimate a rough data value based on a trend approximated from two points on either side of your current time step. Subtract the actual value from this approximation. The result will always be positive and quite large for a timestep with little or no production.
=(INTERCEPT(B1:B6,A1:A6)+(A4*SLOPE(B1:B6,A1:A6)))-B4
Part 2) In column 'D', add a condition for when the value computed above is larger than the actual data point. Have it use '0' to identify a point that shouldn't be included in your average. Copy this down to the end of your data as well.
=IF(C4>B4,0,1)
Our sheet now looks like this:
3) Your three element average can now be computed. In the last cell of column 'E', enter the following array formula. You have to accept this formula by pressing ctrl + shift + enter. Once that is done fill the column from bottom to top:
=IFERROR(IF(D17=1,AVERAGE(INDEX(B12:B17,MATCH(2,1/(FIND(1,D12:D17)))),INDEX(B12:B16,MATCH(2,1/(FIND(1,D12:D16)))-COUNTIF(D17,"=0")),INDEX(B12:B15,MATCH(2,1/(FIND(1,D12:D15)))-COUNTIF(D16:D17,"=0"))),0),"")
This takes averages the most recent three values and allows for a skip of up to three time steps of outlier data per your problem statement. For an idea of how the completed sheet looks:
This was a fun challenge, I have some ideas for a more efficient formula but this should get the job done. Please let me know how this works for you!
Cheers
[EDIT]
An alternative approach which allows the user to specify the number of previous entries to include is detailed below. This is a more general (preferred alternative) and picks up in place of the previously described step 3.
3Alt) In cell G2 enter a number of previous values to average, for this example I am sticking with 3. In cell E4 enter the following array expression (ctrl+shift+enter) and drag to the end of column E:
=IFERROR(IF(D4=1,SUM(INDEX(D:D,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):D4 * INDEX(B:B,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):B4)/$G$2,0),"")
This uses the LARGE function to find the 'nth' largest value, where n is the number of preceding values from the current time-step to average. Then it builds a range that extends from the found cell to the current time step. Then it multiplies the flags (0's and 1's) by each month's production value, sums them and divides by n. In this way months flagged as bad are set to 0 and not included in the sum.
This is a much cleaner way to achieve the desired result and has the flexibility to average different periods of time. See example of the final value below.
Related
Let's say I have this:
A B
1 10 20
2 12 30
3 25 15
4 40 30
How do I find the row which have same value in column B and different value for column A when compared to all the rows above or below ?
I want to find this cell:
A2:B2
Update: NO revision necessary
Following feedback I have tested this equation (below) with 20k rows (link below) - happy to report back results as expected/all still in order. No changes necessary/warranted. This function works just fine/as expected. Beaut!
Explanation:
When testing large samples of data of type 'integer' (say) that range a common order of magnitude/size (i.e. have material probability of re-occurring), the probability of obtaining a unique value for field A (col B, below screenshot) reduces, due to the law of large numbers (variance is what leads to unique values, and this reduces as the sample size increases).
As a consequence, one may encounter results = !Calc# which simply means 'no unique values could be found in col A (or they could but only for when col C was also unique - although the probability of this is remote, it's mainly due to numerous other cells in respective columns containing identical data. Throw a negative 100 in column A (assuming all other values are positive real/integer number plane), and you should see my eqn. below return '-100' and whatever the corresponding 'col-C' data is (assuming that is not unique too, as I have mentioned)...
NOW - back to the solution already! :)
ORIGINAL SOLN:
This will give you back every such combination (besides {12,30} there is also {40,30}):
=FILTER(B2:B5&"-"&C2:C5,(COUNTIFS(B2:B5,"="&$B$2:$B$5)=1)*(COUNTIFS(C2:C5,"="&$C$2:$C$5)>1))
OneDrive excel-linked spreadsheet for your convenience here, taking careful note of restrictions per 1st comment to this proposed soln.
Screenshot
Notes
Assumes you have Office 365 version of Excel
So this is the simplified question I broke down from a former question I had here: Excel help on combination of Index - match and sumifs? .
For this one, I have Table1 (the black-gray one) with two or more columns for adjustments for various order numbers. See this image below:
What I want to achieve is to have total adjustments for those order numbers that contain the numbers in Total Adjustment column in the blue table, each of which will depend on the cell beside it.
Example: Order number 17051 has two products: 17051A (Apple) and 17051B (Orange).
Now what I want to achieve in cell C10 is the sum of adjustment for both 17051A and 17051B, which will be: Apple Adjustment (5000) + Orange Adjustment (4500) = 9500.
The formula I used below (and in the image) kept giving me error messages, and this happens even before I add the adjustment for Orange.
=SUMIF(Text(LEFT(Table1[Order Number],5),"00000"),text(B10,"00000"),Table1[Apple Adjustment])
I have spent the whole day looking for a solution for this and didn’t even come close to find any. Any suggestion is appreciated.
Assuming your headers always have the text "adjustment" in them, you could use:
=SUMPRODUCT((LEFT($B$4:$B$7,5)=B10&"")*(RIGHT($C$3:$F$3,10)="adjustment")*$C$4:$F$7)
In C10 you could add two sumproducts. This assumes that products are always 5 numbers long at the start. If not swop the 5 to use the length of the product reference part you are matching on.
=SUMPRODUCT(--(1*LEFT($B$4:$B$7,5)=$B10),$D$4:$D$7)+SUMPRODUCT(--(1*LEFT($B$4:$B$7,5)=$B10),$F$4:$F$7)
Which with table syntax is:
=SUMPRODUCT(--(1*LEFT(Table1[Order Number],5)=$B10),Table1[Apple Adjustment])+SUMPRODUCT(--(1*LEFT(Table1[Order Number],5)=$B10),Table1[Orange Adjustment])
Using LEN
=SUMPRODUCT(--(1*LEFT(Table1[Order Number],LEN($B10))=$B10),Table1[Apple Adjustment])+SUMPRODUCT(--(1*LEFT(Table1[Order Number],LEN($B10))=$B10),Table1[Orange Adjustment])
I am multiplying by 1 to ensure Left, 5 becomes numeric.
I am working on a model of charging load of electric vehicle. I am attaching a link to an excel workbook for your better understanding.
Column B contains random time values
Column G to P represents houses and each house can have 1 car. So the each time values needs to be distributed in one column. Now when a car is plugged in, its load stays constant for 3 cells.
I want excel to randomly distribute these cars e.g. 4 cars to 4 houses and leave others blank.
what i can think of is, to assign each time a random house then use IF formula with AND function to match random times with time series and second condition to match random houses with columns 1-10.
the problem i am facing is, the formula gives a value error and only works in the rows with has random generated time in front of them screenshot. I know there is a very small thing that i am missing. please help me find it
Regards
workbook
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(AND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))>=$F6,INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))<=$F6+TIME(0,30,0)),11,""))
The two elements in the AND find the house number in column C and return the corresponding time in column B.
The first element compares the time in F to that time. The second element compares the time + 30 minutes to F (three cells). If it's between those two times, it gets an 11.
The ISNA makes sure that the house in question is on the list. You could also use an IFERROR, but I prefer the precision of ISNA.
Update
If you want the values to wrap around, you need to OR compare to the next day.
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5))),11,""))
That formula structure looks like
=If(isna(),"",if(or(and(today,today),and(tomorrow,tomorrow)),11,"")
This formulas already getting too big. If you triple it for your three voltages, it will be huge. You should consider writing a UDF in VBA. It won't be as quick to calculate, but will probably be more maintainable.
If you want to stick with a formula, you could put the wattage in row 4 above the house number. Then in another table, list the wattages and minutes to charge. So in, say, B12:C14 you have
3.7 120
11 30
22 15
Now where you have 11 in your formula, you'd have G$4 and the two placed you have TIME(0,30,0), you'd have TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0). I re-arranged some stuff to make it more 'readable' (but it's still pretty tough) and here's the final formula
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5))),G$4,""))
I'm trying to perform an AVERAGEIFS formula on some data, but there are 2 possible results and as far as I can tell AVERAGEIFS doesn't deal with that situation.
I basically want to have an ELSE inside it.
At the moment I have 2 ranges of data:
The first column only contains values 'M-T' and 'F' (Mon-Thurs and Fri).
The second column contains a time.
The times on the rows with an 'F' value in column 1 are an hour behind the rest.
I want to take an average of all the times, adjusting for the hour delay on Fridays.
So for example I want it to take an average of all the times, but subtract 1 hour from the values which are in a row with an 'F' value in it.
The way I've been doing it so far is by having 2 separate results for each day, then averaging them again for a final one:
=AVERAGEIFS(G3:G172, B3:B172, "M-T")
=AVERAGEIFS(G3:G172, B3:B172, "F")
I want to combine this into just one result.
The closest I can get is the following:
=AVERAGE(IF(B3:B172="M-T",G3:G172,((G3:G172)-1/24)))
But this doesn't produce the correct result.
Any advice?
Try this
=(SUMPRODUCT(G3:G172)-(COUNTIF(B3:B172,"=F")/24))/COUNTIF(B3:B172,"<>""""")
EDIT
Explaining various steps in the formula as per sample data in the snapshot.
SUMPRODUCT(G3:G17) sums up all the value from G3 to G17. It gives a
value of 4.635416667. This after formatting to [h]:mm gives a value
of 111.15
OP desires that Friday time be one hour less. So I have kept one hour less for Friday's in the sample data. Similar SUMPRODUCT on H3:H17 leads to a value of 4.510416667. This after formatting to [h]:mm gives a value
of 108.15. Which is exactly three hours less for three occurrences of Fridays in the sample data.
=COUNTIF(B3:B17,"=F") counts the occurrences of Friday's in the B3:B17 range which are 3 occurrences.Hence 3 hours have to less. These hours are to be represented in terms of 24 hours hence the Function COUNTIF() value is divided by 24. This gives 0.125. Same is the difference of 4.635416667 and 4.510416667 i.e. 0.125
Demonstration column H is for illustrative purposes only. Infact Friday accounted values that is 108.15 in sample data has to be divided by total data points to get the AVERAGE. The occurrences of data points are calculated by =COUNTIF(B3:B17,"<>""""") with a check for empty columns.
Thus 108:15 divided by 15 data points give 7:13 in the answer.
Revised EDIT Based upon suggestions by #Tom Sharpe
#TomSharpe has been kind enough to point the shortcomings in the method proposed by me. COUNTIF(B3:B172,"<>""""") gives too many values and is not advised. Instead of it COUNTA(B3:B172) or COUNT(G3:G172) are preferable. Better Formula to get AVERAGE as per his suggestion gives very accurate results and is revised to:
=AVERAGE(IF(B3:B172="M-T",G3:G172,((G3:G172)-1/24)))
This is an Array Formula. It has to be entered with CSE and further cell to be formatted as time.
If your column of M-T and F is named Day and your column of times is named TIME then:
=SUMPRODUCT(((Day="M-T")*TIME + (Day="F")*(TIME-1/24)))/COUNT(TIME)
One simple solution would be to create a separate column that maps the time column and performs the adjustment there. Then average this new column.
Is that an option?
Ended up just combining the two averageifs. No idea why I didn't just do that from the start:
=((AVERAGEIFS(G$3:G171, $B$3:$B171, "F")-1/24)+AVERAGEIFS(G$3:G171, $B$3:$B171, "M-T"))/2
I'm involved with a youth football tournament on the referee side, with assessing/coaching the referees. I've just taken over doing the data entry for the referees assessment scores which we then use to determine who gets finals etc and am looking to extract more usable information from the data to help us identify trends.
I've got (up to) 200 referees, each receiving from none to two assessment scores each day for 5 days. The scores are entered as both the raw mark and the weighted mark based on match difficulty (along with a host of other data about the match that isn't relevant to this issue.
I can extract the average mark (raw and weighted) across all referees without issues and have done so using the below formula, which is the raw average mark:
=AVERAGE(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200)
But I also want to extract the average mark (raw and weighted) across two subsets - Academy and non academy referees, to help plot trends and determine where resources need to be utilised.
I've attempted to use an AVERAGEIF formula, but am getting a #VALUE! return. This is the formula that I've attempted to use to return the average raw mark for those referees in the academy:
=AVERAGEIF(Working!G4:G200,Working!G4:G200="Yes",(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200))
If I do the same formula as above, but without the brackets around the [average_range], I get a 'you've used too many arguments, and it highlights BK200.
From what I've been able to find so far online, it seems that the formula I'm trying to use would only work if ALL the cells in (Working!G4:G200) returned "Yes". However if there are only 50 academy referees as indicated by "Yes" in G column, then I want those specific scores to be averaged, and the inverse for the non-academy referees.
I thought about having another sheet, which would simply contain populate from Column G (a simple =G4 and then populated down to =G200 next to all of the scores), consolidated into a block of raw marks columned under Assessment 1, 2, 3, 4.... and then the same for all of the weighted marks which would populate from the equivalent cell on the working sheet, but there's a lot of filtering, and re-sorting that goes on on the working sheet, and I'm not 100% certain that that wouldn't cause issues.
Any feedback on how to work through this problem, so that I can display the overall average mark for academy and non-academy referees in both raw and weighted form would be much appreciated, and I apologize if this post is rather convoluted.
I don't think there is a neat solution if the scores are in several columns which are not consecutive.
My suggestion is:-
(1) Work out the sum for each column separately and total them up
(2) Work out the count for each column separately and total them up
(3) Divide Sum by Count to get Average.
In my small example below with 3 referees and 3 columns:-
(1) In K2:-
=SUMIF(H2:H4,"Yes",B2:B4)+SUMIF(H2:H4,"Yes",D2:D4)+SUMIF(H2:H4,"Yes",F2:F4)
(2) In K3:-
=COUNTIFS(B2:B4,">=0",H2:H4,"Yes")+COUNTIFS(D2:D4,">=0",H2:H4,"Yes")+COUNTIFS(F2:F4,">=0",H2:H4,"Yes")
(3) In K4:
=K2/K3
This would include any zero scores (if this is possible) but exclude any blanks.
You can then scale it up to your data.
Beyond this, you would have to change the data structure either
(1) Add a row to label the columns that you want to average e.g.
Score 1 Score 2 Score 3
3 0 3
so you could pick up only the columns labelled 3 say
Here's how it would be in my small example:-
In K3:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*B3:F5)
Which is an array formula and must be entered with Ctrl-Shift-Enter
In K4:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*(B3:F5<>""))
another array formula
In K5:-
=K3/K4
This is how the columns you want are labelled with a 3 in row 2, so it ignores the other columns:-
(2) Consolidate them into another sheet as you suggest.