Calculate Average based on multiple conditions in Excel - excel

I am back with a new Excel question.
Let's say I have a table like this.
| A | B
------------------------------------------
1 | ENV | Value
------------------------------------------
2 | ABC - 10/1/2014 1:38:32 PM | 4
3 | XYZ - 10/1/2014 1:38:32 PM | 6
4 | ABC - 9/1/2014 1:38:32 PM | 1
5 | XYZ - 10/1/2014 1:38:32 PM | 10
6 | ABC - 10/1/2014 1:38:32 PM | 7
7 | XYZ - 9/1/2014 1:38:32 PM | 1
8 | ABC - 9/1/2014 1:38:32 PM | 10
9 | ABC - 10/1/2014 1:38:32 PM | 7
10 | XYZ - 10/1/2014 1:38:32 PM | 7
Now, in Cell C2, I've selected ABC.
So in cell D2, I want the average (from col B) of all the "ABC" rows (col A) where Month = 10 (col A), and in cell E2, the Max (from col B) of all the "ABC" rows where Month = 10 (col A).
So, my result in cells D2 and E2 would be 6 and 7 respectively.
I hope my question and example make sense.
UPDATE:
Thank you all for all your help.
Now let's say I am not sure how many rows I'll have on this spreadsheet, so I came up with this formula, but it's not working; it gives me a #DIV/0! error.
*Note: I am using formulas to extract "ABC" and "10" from cell C2.
=AVERAGEIFS(
(OFFSET($A$1,1,1,COUNTA($B:$B)-1,1)),
OFFSET($A$1,1,0,COUNTA($A:$A)-1,1), (MID(C2,1,(FIND("-",C2))-2)),
OFFSET($A$1,1,0,COUNTA($A:$A)-1,1), (MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))
I even tried this, but I get the same error:
=SUMPRODUCT(((MID(A2:A10,1,(FIND("-",A2:A10))-1))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))*
(MONTH(DATEVALUE(MID(A2:A10,7,99)))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))*
(B2:B10))/SUMPRODUCT(((MID(A2:A10,1,(FIND("-",A2:A10))-1))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))*
(MONTH(DATEVALUE(MID(A2:A10,7,99)))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1)))))
Can you help me with this...?

Solution with Intermediary Values
To solve the issue (I tested the average only), I first used 2 intermediary columns. This solution is not optimal, and there are smarter ways to address the issue (e.g. pivot tables).
ENV                        | Value | Intermediary 1 | Intermediary 2
---------------------------+-------+----------------+---------------
ABC - 10/1/2014 1:38:32 PM | 4     | ABC            | 10
XYZ - 10/1/2014 1:38:32 PM | 6     | XYZ            | 10
ABC - 9/1/2014 1:38:32 PM  | 1     | ABC            | 9/
XYZ - 10/1/2014 1:38:32 PM | 10    | XYZ            | 10
The first intermediary column contains the first 3 characters of the ENV column (=LEFT(A9,3)), while the second contains the month (=MID(A9,7,2)). For a single-digit month, MID returns "9/" rather than "9", which does not matter here because only "10" is matched. This works only if your ENV records are fixed-size and homogeneous (e.g. your env name has exactly 3 characters).
With this layout, you can compute the average by putting the following formula in any cell:
=AVERAGEIFS(D9:D12, F9:F12,"=ABC", G9:G12, "=10")
Where D9:D12 is the values range, F9:F12 the first intermediary column and G9:G12 the second intermediary column.
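The helper-column approach can be sketched in Python (used here only as an illustration language, since the answer itself is Excel). This is a minimal sketch assuming the question's nine sample rows; `left`, `mid` and `averageifs` are hypothetical stand-ins for the Excel functions, not a library API. It also makes visible that MID(A,7,2) returns "9/" for a single-digit month, which only works out because the criterion is "10".

```python
# Sample ENV strings and values from the question's table (rows 2-10).
ROWS = [
    ("ABC - 10/1/2014 1:38:32 PM", 4),
    ("XYZ - 10/1/2014 1:38:32 PM", 6),
    ("ABC - 9/1/2014 1:38:32 PM", 1),
    ("XYZ - 10/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 9/1/2014 1:38:32 PM", 1),
    ("ABC - 9/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 10/1/2014 1:38:32 PM", 7),
]


def left(text: str, n: int) -> str:
    """Hypothetical stand-in for Excel LEFT(text, n)."""
    return text[:n]


def mid(text: str, start: int, n: int) -> str:
    """Hypothetical stand-in for Excel MID(text, start, n); start is 1-based."""
    return text[start - 1 : start - 1 + n]


def averageifs(rows, name: str, month: str) -> float:
    """Average the values whose helper columns equal name and month."""
    matches = [value for n, m, value in rows if n == name and m == month]
    if not matches:
        # Excel's AVERAGEIFS reports #DIV/0! when nothing matches.
        raise ZeroDivisionError("#DIV/0!")
    return sum(matches) / len(matches)


# Intermediary columns: =LEFT(A,3) and =MID(A,7,2), next to the value column.
HELPER = [(left(env, 3), mid(env, 7, 2), value) for env, value in ROWS]

if __name__ == "__main__":
    print(HELPER[2])                        # ('ABC', '9/', 1)
    print(averageifs(HELPER, "ABC", "10"))  # 6.0
```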
One Shot Compact Solution (Arrays)
An optimized solution can be found by relying on arrays. For instance, to calculate the average and the max of a range based on 2 "vectorial" conditions, you can write these one-liners:
=MAX(IF((LEFT(A9:A12,3)="ABC")*(MID(A9:A12,7,2)="10"),D9:D12))
=AVERAGE(IF((LEFT(A9:A12,3)="ABC")*(MID(A9:A12,7,2)="10"),D9:D12))
Here A9:A12 holds your original records and D9:D12 the values.
The advantages of this solution are that you don't need any intermediary column and that you can extend this approach to functions that have no ...IFS variant (as was the case for MAX at the time; Excel 2019 and Microsoft 365 later added MAXIFS).
NOTE: in versions of Excel without dynamic arrays, you have to confirm this formula with CTRL + SHIFT + RETURN, or it will fail with a #VALUE! error.
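The array formulas rely on IF returning FALSE for non-matching rows, which MAX and AVERAGE then skip. A rough Python sketch of that behaviour, assuming the same fixed-position layout (name in characters 1-3, month in characters 7-8) and the question's sample rows; `array_if` and `numbers_only` are hypothetical helpers:

```python
from statistics import mean

ROWS = [
    ("ABC - 10/1/2014 1:38:32 PM", 4),
    ("XYZ - 10/1/2014 1:38:32 PM", 6),
    ("ABC - 9/1/2014 1:38:32 PM", 1),
    ("XYZ - 10/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 9/1/2014 1:38:32 PM", 1),
    ("ABC - 9/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 10/1/2014 1:38:32 PM", 7),
]


def array_if(mask, values):
    """IF(mask, values) in an array formula: non-matching slots become FALSE."""
    return [value if keep else False for keep, value in zip(mask, values)]


def numbers_only(arr):
    """MAX and AVERAGE skip logical values (FALSE) produced by IF in an array."""
    return [x for x in arr if x is not False]


envs = [env for env, _ in ROWS]
values = [value for _, value in ROWS]

# (LEFT(A,3)="ABC")*(MID(A,7,2)="10"): multiplying booleans acts as a logical AND.
mask = [(env[:3] == "ABC") * (env[6:8] == "10") for env in envs]
filtered = array_if(mask, values)

result_max = max(numbers_only(filtered))   # =MAX(IF(...))
result_avg = mean(numbers_only(filtered))  # =AVERAGE(IF(...))

if __name__ == "__main__":
    print(result_max, result_avg)
```

The ABC row with value 10 is dated in month 9, so it is excluded; without the month condition the max would be 10 instead of 7.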

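You can start by splitting column A into the letters and the date using Data > Text to Columns with "-" as the delimiter (the Other box accepts a single character, so you may need to trim the surrounding spaces afterwards).
After you have the two new columns (let's say F and G), you can use AVERAGEIFS with two conditions: the cell in F is "ABC" and the month of the date in G is 10 (AVERAGEIFS cannot apply MONTH to a range directly, so a helper column with =MONTH(G2) is the simplest way to express that condition).
As for the max in E2, you can do the same with MAX(IF(...)) entered as an array formula.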

SUMPRODUCT will allow you to parse the left-most and date characters from your combined string. A pseudo-MAXIF() can be similarly constructed using MAX() and INDEX().
In D2 use =SUMPRODUCT((LEFT(A2:A10,3)="ABC")*(MONTH(DATEVALUE(MID(A2:A10,7,99)))=10)*(B2:B10))/SUMPRODUCT((LEFT(A2:A10,3)="ABC")*(MONTH(DATEVALUE(MID(A2:A10,7,99)))=10))
In E2 use =MAX(INDEX((LEFT(A2:A10,3)="ABC")*(MONTH(DATEVALUE(MID(A2:A10,7,99)))=10)*(B2:B10),,))
Both SUMPRODUCT and INDEX like to choke on anything remotely resembling an error when parsing text, so keep the cell ranges limited to your actual data and avoid blanks.
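The SUMPRODUCT version turns both conditions into 1/0 flags and multiplies them by the values. This Python sketch mirrors that, assuming the dates are in m/d/yyyy h:mm:ss AM/PM form (as DATEVALUE would interpret them on a US-locale system); `month_of` and `flag` are hypothetical helpers:

```python
from datetime import datetime

ROWS = [
    ("ABC - 10/1/2014 1:38:32 PM", 4),
    ("XYZ - 10/1/2014 1:38:32 PM", 6),
    ("ABC - 9/1/2014 1:38:32 PM", 1),
    ("XYZ - 10/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 9/1/2014 1:38:32 PM", 1),
    ("ABC - 9/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 10/1/2014 1:38:32 PM", 7),
]


def month_of(env: str) -> int:
    """MONTH(DATEVALUE(MID(env,7,99))), assuming an m/d/yyyy h:mm:ss AM/PM date."""
    return datetime.strptime(env[6:], "%m/%d/%Y %I:%M:%S %p").month


def flag(env: str) -> int:
    """(LEFT(env,3)="ABC")*(MONTH(...)=10) evaluates to 1 or 0."""
    return int(env[:3] == "ABC") * int(month_of(env) == 10)


flags = [flag(env) for env, _ in ROWS]
products = [f * value for f, (_, value) in zip(flags, ROWS)]

# D2: SUMPRODUCT(flags*values) / SUMPRODUCT(flags)
average = sum(products) / sum(flags)
# E2: MAX(INDEX(flags*values,,)); non-matching rows contribute 0, not a blank.
maximum = max(products)

if __name__ == "__main__":
    print(average, maximum)
```

Because non-matching rows contribute 0 rather than being skipped, this pseudo-MAXIF can return 0 instead of the true maximum if all matching values are negative.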
With the sample data, D2 should return 6 and E2 should return 7.

Related

Excel: Remove Duplicates based on time condition

I'm looking to remove duplicates from a 250,000 row excel sheet based on a 3 month rolling time condition.
We have a lot of user IDs and the dates on which they visited, but a lot of these visits are very far apart (sometimes over a year) and a lot of them are within the same day or couple of days.
The best way to explain what I want to do is with an example. So if they first visited on 1st Jan, 1st Jan, 3rd Jan, 8th Feb, 4th June, 5th June, 1st Dec, 1st Dec, 2nd Dec, I would want to grab that first date of 1st Jan, 4th June and 1st Dec.
If they visited 1st Jan, 1st Jan, 3rd Jan, 8th Feb, 9th Apr, then 1st August and 1st Sept, I would want 1st Jan and 1st August.
So we want to grab the first date, then see how often they visit within 3 months of each visit and if they leave for more than a 3 month period, grab the first date that they return. Sometimes they come back 4 or 5 times after 3 months and the data can span several years.
Is there a way for me to achieve this? It would be great to get some help as this is driving me mad.
Cheers
If the UserID is in column A and the VisitDate is in B, with the headings in row 1, a blank row in 2 and the data starting in row 3, then try this (explanation below):
Array Formula version:
sort the rows ascending by VisitDate
in B2 put 1/1/1900 so it won't match anything (but it has to be a date)
in C3 put this array formula (press control-shift-enter instead of just enter):
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1)
Copy the formula in C3 down to every row of data
Filter on Unique = TRUE
if you want to re-sort, you will need to copy column C and paste it back as values
New non-array formula version:
sort the rows ascending by VisitDate
in B2 put 1/1/1900 so it won't match anything (but it has to be a date)
in C3 put this normal formula (just press enter):
=COUNTIFS(B$2:B2,"<"&DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)),A$2:A2,A3)=COUNTIF(A$2:A2,A3)
Copy the formula in C3 down to every row of data
Filter on Unique = TRUE
if you want to re-sort, you will need to copy column C and paste it back as values
This produces the following with my sample data (array formulas may take a very long time to calculate for lots of rows):
| A | B | C
---+--------+------------+--------
1 | UserID | VisitDate | Unique
2 | | 1/01/1900 |
3 | a | 1/01/2017 | TRUE
4 | a | 1/01/2017 | FALSE
5 | b | 2/01/2017 | TRUE
6 | b | 2/01/2017 | FALSE
7 | a | 3/01/2017 | FALSE
8 | c | 3/01/2017 | TRUE
9 | c | 3/01/2017 | FALSE
10 | b | 4/01/2017 | FALSE
11 | c | 5/01/2017 | FALSE
12 | a | 8/02/2017 | FALSE
13 | b | 9/02/2017 | FALSE
14 | c | 10/02/2017 | FALSE
15 | a | 4/06/2017 | TRUE
16 | a | 5/06/2017 | FALSE
17 | b | 5/06/2017 | TRUE
18 | b | 6/06/2017 | FALSE
19 | c | 6/06/2017 | TRUE
20 | c | 7/06/2017 | FALSE
21 | a | 1/12/2017 | TRUE
22 | a | 1/12/2017 | FALSE
23 | a | 2/12/2017 | FALSE
24 | b | 2/12/2017 | TRUE
25 | b | 2/12/2017 | FALSE
26 | b | 3/12/2017 | FALSE
27 | c | 3/12/2017 | TRUE
28 | c | 3/12/2017 | FALSE
29 | c | 4/12/2017 | FALSE
Because the formula compares the current row with all the rows above, looking for rows with dates in the past, the data needs to be sorted with the oldest dates first.
How the array formula works:
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1)
DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)) is 3 months ago (even if it is 92 days)
(B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3))) is an array of TRUE/FALSE values which has a TRUE for every row above that is older than 3 months ago
(A$2:A2=A3) is an array of TRUE/FALSE values which has a TRUE for every row above that matches the user ID
(B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3) does an AND of the arrays so 1 is returned (TRUE*TRUE=1) for each row above that has the same name and a date that is older than 3 months ago
SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3)) adds all the TRUE rows above that have the same name and a date that is older than 3 months ago
SUM((A$2:A2=A3)*1) adds the number of rows above that have the same name (TRUE*1=1)
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1) compares the two sums and returns TRUE if all the rows above that have the same name are all older than 3 months ago
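The logic of the Unique column can be checked outside Excel with a short Python sketch. It assumes rows are already sorted oldest first, as the answer requires; `excel_date` is a hypothetical helper that imitates how Excel's DATE rolls over months below 1 (and days past the end of a month), and the year 2017 in the usage example is an assumption since the question gives no year:

```python
from datetime import date, timedelta


def excel_date(year: int, month: int, day: int) -> date:
    """Imitate Excel DATE(): out-of-range months and days roll over."""
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)


def unique_flags(rows):
    """rows: (user_id, visit_date) pairs sorted by visit_date, oldest first.

    A row is TRUE when every earlier visit by the same user is older than
    DATE(YEAR(d), MONTH(d)-3, DAY(d)), which is what the
    COUNTIFS(...)=COUNTIF(...) and SUM(...)=SUM(...) formulas compare.
    """
    flags = []
    for i, (user, visit) in enumerate(rows):
        cutoff = excel_date(visit.year, visit.month - 3, visit.day)
        earlier = [d for u, d in rows[:i] if u == user]
        older = sum(d < cutoff for d in earlier)
        flags.append(older == len(earlier))
    return flags


def first_dates(rows):
    """Keep only the rows flagged TRUE (the 'Filter on Unique = TRUE' step)."""
    return [visit for (_, visit), keep in zip(rows, unique_flags(rows)) if keep]


if __name__ == "__main__":
    # The question's first example, assuming all visits fall in 2017.
    pairs = [(1, 1), (1, 1), (1, 3), (2, 8), (6, 4), (6, 5), (12, 1), (12, 1), (12, 2)]
    visits = [date(2017, m, d) for m, d in pairs]
    print(first_dates([("user", v) for v in visits]))  # 1 Jan, 4 Jun, 1 Dec
```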
Methodology:
I originally just played with a column of dates, with no userID. I wanted to find a way to know if the date on a particular row was more than 3 months after all the dates before it (I implicitly assumed that the dates were sorted). I reasoned that if a count of the dates before the current row matched a count of the dates before the current row that were older than 3 months in the past, then I would have the answer I wanted. So I originally put this formula in C3 and copied it down:
=COUNTIF(B$2:B2,"<"&(B3-90))=COUNTA(B$2:B2)
Then I changed it to 3 months instead of 90 days:
=COUNTIF(B$2:B2,"<"&DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))=COUNTA(B$2:B2)
And then, to add the userID, we need a way to compare multiple criteria; this is where COUNTIFS comes in (if you have Excel 2007 or later):
=COUNTIFS(B$2:B2,"<"&DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)),A$2:A2,A3)=COUNTIF(A$2:A2,A3)
And then I converted it to this array formula:
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1)
In retrospect I don't know if giving the array formula was a good idea or not: I don't know whether the array formula would be better/faster than COUNTIFS or not. So use whichever you prefer.
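The methodology's move from B3-90 to DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)) changes the cutoff on some dates. A small Python sketch comparing the two, using a hypothetical `excel_date` helper that imitates Excel's DATE rollover (for example, DATE(2017,2,31) lands on 3 March 2017):

```python
from datetime import date, timedelta


def excel_date(year: int, month: int, day: int) -> date:
    """Imitate Excel DATE(): out-of-range months and days roll over."""
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)


def cutoff_90_days(d: date) -> date:
    """B3-90"""
    return d - timedelta(days=90)


def cutoff_3_months(d: date) -> date:
    """DATE(YEAR(B3),MONTH(B3)-3,DAY(B3))"""
    return excel_date(d.year, d.month - 3, d.day)


if __name__ == "__main__":
    for d in (date(2017, 6, 1), date(2017, 5, 31), date(2017, 8, 1)):
        print(d, cutoff_90_days(d), cutoff_3_months(d))
```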

Sum only values which fall Monday to Friday

I receive a statement (as a .xls) each month which lists a bunch of billable items with an associated date. I want to create a formula (using either =SUM() or =SUMIFS()) to total the billable items, but only those which fall Monday to Friday (i.e., not weekends). Is that possible?
      | A                 | B
------+-------------------+---------
1     | 05/12/2016        | $10.00
2     | 06/12/2016        | $10.00
3     | 07/12/2016        | $10.00
4     | 08/12/2016        | $10.00
5     | 09/12/2016        | $10.00
6     | 10/12/2016        | $10.00
7     | 11/12/2016        | $10.00
8     | 12/12/2016        | $10.00
------+-------------------+---------
      | Sum               | $80.00
      | Sum (no weekends) | $60.00
------+-------------------+---------
(Dates are formatted as dd/mm/yyyy.)
EDIT:
I've just looked closer at the excel doc, and it's actually a datetime field, e.g. 31/10/2016 12:44:00 pm (displayed as 31/10/16 12:44).
I'm also not looking for a formula which works line by line, I'd like something which I can just copy and paste into a single cell at the bottom of the doc each month which examines A:A.
You need to use this formula:
=SUMPRODUCT(B1:B8,--(WEEKDAY(A1:A8,2)<6))
This is a hack which behaves like SUMIF but lets you use a function in your criteria. Otherwise, you would need to create an auxiliary column with WEEKDAY (in C for example) and then use =SUMIF(C1:C8,"<6",B1:B8).
WEEKDAY by default returns 1-7 for SUN-SAT. As this doesn't help, you can change the return type to type 2 with the optional second parameter to make the function return 1-7 for MON-SUN, which lets you do the easy <6 comparison. You can also use type 3, which returns 0-6 for MON-SUN, and then obviously use <5 instead.
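The -- (double unary minus) coerces the TRUE/FALSE results of the WEEKDAY comparison to 1/0, so SUMPRODUCT can multiply them by the amounts.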
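Here is a sketch of what SUMPRODUCT(B1:B8,--(WEEKDAY(A1:A8,2)<6)) computes, written in Python with the question's sample rows. Python's `isoweekday()` uses the same 1 = Monday ... 7 = Sunday numbering as WEEKDAY type 2; the date format strings are assumptions about how the statement's text would be parsed. The time of day does not change the weekday, so the datetime values mentioned in the EDIT behave the same way:

```python
from datetime import datetime

# Statement rows from the question: dd/mm/yyyy dates and amounts.
ROWS = [
    ("05/12/2016", 10.00),
    ("06/12/2016", 10.00),
    ("07/12/2016", 10.00),
    ("08/12/2016", 10.00),
    ("09/12/2016", 10.00),
    ("10/12/2016", 10.00),
    ("11/12/2016", 10.00),
    ("12/12/2016", 10.00),
]


def weekday_type2(dt: datetime) -> int:
    """WEEKDAY(dt, 2): 1 = Monday ... 7 = Sunday, the same numbering as isoweekday()."""
    return dt.isoweekday()


def sum_weekdays(rows, fmt: str = "%d/%m/%Y") -> float:
    """SUMPRODUCT(B, --(WEEKDAY(A,2)<6)): total the amounts dated Monday to Friday."""
    total = 0.0
    for text, amount in rows:
        is_weekday = weekday_type2(datetime.strptime(text, fmt)) < 6
        total += amount * int(is_weekday)  # int() plays the role of the -- coercion
    return total


if __name__ == "__main__":
    print(sum_weekdays(ROWS))
```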

Need some kind of pivot table without aggregation in Excel

I'm stuck on the problem of getting reports from a table that looks like this:
C1| C2 | C3 | C4
A | 2015-05-15 | 34 | 4
A | 2015-03-12 | -4 | 5
A | 2014-03-12 | 24 | 8
B | 2015-11-10 | -4 | 5
B | 2015-06-12 | 3 | 5
C | 2013-05-12 | 3 | 5
...
600+ rows
...
So I need to make a chart of the different value columns (C3 and C4), grouped by the values in the first column. Usually this is done with a separate table which looks like this (e.g. for col3):
A | B | C | ....
34 | -4 | 3 | ....
-4 | 3 | | ....
24 | | | ....
For col4, I need a table with a similar layout. So, in short, I need to make a pivot table without aggregation of the values per term. Is it possible to get such a small table from the original table? If you can offer some other layout for the original data which would be more suitable (and easier, ideally with standard Excel functions) for this task, feel free to suggest it; I can re-save the data with a Python script.
I found a simple solution: just download Tableau. It lets you create very flexible graphs based on raw data and freely place any column as a row and vice versa; grouping is also available.
First, I will assume that C1,C2,C3,C4 are column headings in A1:D1 and the data is in A2:D1000
In F1:H1 place C3,C3,C3
In F2:H2 place A,B,C
In J1:L1 place C4,C4,C4
In J2:L2 place A,B,C
In F3 place the following array formula:
=IFERROR(INDEX(OFFSET($A$2:$A$1000,0,MATCH(F$1,$A$1:$D$1,0)-1),SMALL(IF($A$2:$A$1000=F$2,ROW($A$2:$A$1000)-ROW(A$2)+1),ROWS(F$2:F2))),"")
Don't forget to use Ctrl+Shift+Enter to make it an array formula instead of just using Enter for a normal formula.
Copy F3 to F4:F7, G3:H7 and J3:L7
You end up with something that looks like this:
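    | A  | B          | C  | D  | E | F  | G  | H  | I | J  | K  | L
----+----+------------+----+----+---+----+----+----+---+----+----+----
1   | C1 | C2         | C3 | C4 |   | C3 | C3 | C3 |   | C4 | C4 | C4
2   | A  | 15/05/2015 | 34 | 4  |   | A  | B  | C  |   | A  | B  | C
3   | A  | 12/03/2015 | -4 | 5  |   | 34 | -4 | 3  |   | 4  | 5  | 5
4   | A  | 12/03/2014 | 24 | 8  |   | -4 | 3  |    |   | 5  | 5  |
5   | B  | 10/11/2015 | -4 | 5  |   | 24 |    |    |   | 8  |    |
6   | B  | 12/06/2015 | 3  | 5  |   |    |    |    |   |    |    |
7   | C  | 12/05/2013 | 3  | 5  |   |    |    |    |   |    |    |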
Decomposing the formula in F3:
=IFERROR(INDEX(source column,SMALL(filtered row numbers,ranking)),"")
Where:
source column is OFFSET($A$2:$A$1000,0,MATCH(F$1,$A$1:$D$1,0)-1)
filtered row numbers is IF($A$2:$A$1000=F$2,row numbers)
row numbers is ROW($A$2:$A$1000)-ROW(A$2)+1
ranking is ROWS(F$2:F2)
The formula works in several parts:
IFERROR(formula,"") If the embedded formula produces an error then we display an empty cell instead. This is useful since we don't know how many results we will get.
INDEX(range,row) Get the value in the row we want to see. The row number here is dynamically generated based on whether the data matches the criteria.
SMALL(array,k) Extract the kth smallest value in the array. The array is the filtered values. k is just a number depending on which row in F we put the formula in.
IF(criteria,value) Based on the criteria either produces the value or FALSE.
ROW(cell) The row number of the cell
ROWS(range) The size of the range
So in the array formula:
ROW($A$2:$A$1000)-ROW(A$2)+1 is a list of all the row numbers in the range starting at 1
IF($A$2:$A$1000=F$2,ROW($A$2:$A$1000)-ROW(A$2)+1) is a list of just those row numbers that match the criteria
SMALL takes the filtered list and returns the kth smallest filtered row number which means that F3 gets the first one, F4 gets the second one, etc
INDEX grabs the data at that row number and displays it
I got the idea for this from http://www.exceltactics.com/make-filtered-list-sub-arrays-excel-using-small/ and on that page is a really good explanation of how it all works.
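The SMALL/INDEX filtering can be mirrored in Python to see exactly which values land in each cell. This is a minimal sketch assuming the six sample rows shown above; `filtered_cell` and `column_block` are hypothetical names, and the empty string plays the role of IFERROR(...,""):

```python
# The six sample rows from the question: (C1, C2, C3, C4).
HEADERS = ["C1", "C2", "C3", "C4"]
DATA = [
    ("A", "2015-05-15", 34, 4),
    ("A", "2015-03-12", -4, 5),
    ("A", "2014-03-12", 24, 8),
    ("B", "2015-11-10", -4, 5),
    ("B", "2015-06-12", 3, 5),
    ("C", "2013-05-12", 3, 5),
]


def filtered_cell(header, key, k):
    """Value shown in the k-th result row for (header, key), or "" once results run out."""
    col = HEADERS.index(header)  # MATCH(F$1,$A$1:$D$1,0)-1 picks the source column
    # IF($A$2:$A$1000=F$2, ROW(...)-ROW(A$2)+1): 1-based row numbers of matching rows
    matching_rows = [i + 1 for i, row in enumerate(DATA) if row[0] == key]
    if k > len(matching_rows):
        return ""  # SMALL would return #NUM!, which IFERROR turns into ""
    row_number = sorted(matching_rows)[k - 1]  # SMALL(filtered row numbers, k)
    return DATA[row_number - 1][col]  # INDEX(source column, row number)


def column_block(header, keys, height):
    """Fill a block like F3:H7, one list per key, with `height` rows each."""
    return {key: [filtered_cell(header, key, k) for k in range(1, height + 1)] for key in keys}


if __name__ == "__main__":
    print(column_block("C3", ["A", "B", "C"], 5))
```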

How to select the last cell in a range in Excel

I've got a spreadsheet that updates with data throughout the day. I need to be able to grab the last cell in a column, but for certain date ranges, not just the last cell in the column.
Column C contains the data I need; columns A and B update with the date and time (some cells in column A could be blank too). I can change column D to make column E display the latest data for the selected date.
Here's what I've got so far to put in column E:
=VLOOKUP(D1,$A:$C,3,FALSE)
I've managed to get data from my formula but only the first entry. For example if I enter the date 17/05/2016 it will return '5'. Whereas I need the more recent data '28'.
Example sheet:
A | B | C | D | E
16/05/2016 | 08:00:00 | 3 | date | data
16/05/2016 | 12:00:00 | 7
16/05/2016 | 18:00:00 | 15
16/05/2016 | 22:00:00 | 27
17/05/2016 | 08:00:00 | 5
17/05/2016 | 12:00:00 | 11
17/05/2016 | 18:00:00 | 21
17/05/2016 | 22:00:00 | 28
18/05/2016 | 08:00:00 | 4
18/05/2016 | 12:00:00 | 13
18/05/2016 | 18:00:00 | 19
18/05/2016 | 22:00:00 | 30
I've only just started getting my head around excel formulas so any help would be greatly appreciated!
=INDEX(C2:C13,MATCH(D3,A2:A13,1))
INDEX/MATCH is a very powerful combination. It can perform the same job as VLOOKUP and then a bit more. VLOOKUP is restricted to searching the first column and returning information to the right. With MATCH you can search any column, and you can return information from any column (even to the left, which VLOOKUP can't do).
Reading from the inside out, MATCH searches for the value in D3 within the range A2:A13 and returns an integer representing the position where D3 was found. The 1 at the end tells MATCH to do an approximate match: it looks for the last entry that is less than or equal to D3. This means that column A needs to be sorted in ASCENDING order.
INDEX uses the integer from MATCH and goes down that many rows in the specified range, so if MATCH returned 1, it would read C2.
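A Python sketch of INDEX/MATCH with match type 1 on the example data, assuming the data occupies rows 2-13 as the formula implies. `bisect_right` from the standard library returns the position just after the last entry that is less than or equal to the lookup value, which is the "last entry" behaviour this answer relies on; `match_type1`, `index` and `latest_for` are hypothetical helpers:

```python
from bisect import bisect_right
from datetime import date

# Column A (dates) and column C (values) from the question, assumed to be rows 2-13.
DATES = [date(2016, 5, 16)] * 4 + [date(2016, 5, 17)] * 4 + [date(2016, 5, 18)] * 4
VALUES = [3, 7, 15, 27, 5, 11, 21, 28, 4, 13, 19, 30]


def match_type1(lookup, sorted_range):
    """MATCH(lookup, range, 1) on ascending data: 1-based position of the last
    entry <= lookup. Excel returns #N/A when every entry is larger."""
    position = bisect_right(sorted_range, lookup)
    if position == 0:
        raise LookupError("#N/A")
    return position


def index(range_values, n):
    """INDEX(range, n) with a 1-based n."""
    return range_values[n - 1]


def latest_for(day):
    """=INDEX(C2:C13, MATCH(D3, A2:A13, 1))"""
    return index(VALUES, match_type1(day, DATES))


if __name__ == "__main__":
    print(latest_for(date(2016, 5, 17)))
```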

Excel formula to get ranking position

I have a table of people with points. The more points, the higher your position. If you have the same points you are equal first, second etc.
| A | B | C
1 | name | position | points
2 | person1 | 1 | 10
3 | person2 | 2 | 9
4 | person3 | 2 | 9
5 | person4 | 2 | 9
6 | person5 | 5 | 8
7 | person6 | 6 | 7
Using an Excel formula, how can I automatically determine the position? I'm currently using an IF statement that works fine for 5 or 6 matching positions, but I can't add 30+ if statements because there's a limit to the formula.
=IF(C7=C2,B2,IF(C7=C3,B2+5,IF(C7=C4,B3+4,....
So if the points are the same as in the row above, it's the same position value. If the points are lower than above, it drops a position, so it's the previous row's position + 1. But if the row above that is the same, then it's the previous position + 2, and so on.
You could also use the RANK function
=RANK(C2,$C$2:$C$7,0)
It would return data like your example:
| A | B | C
1 | name | position | points
2 | person1 | 1 | 10
3 | person2 | 2 | 9
4 | person3 | 2 | 9
5 | person4 | 2 | 9
6 | person5 | 5 | 8
7 | person6 | 6 | 7
The 'Points' column needs to be sorted into descending order, with 1 in B2.
Type this into B3, and then fill it down to the rest of the rows:
=IF(C3=C2,B2,B2+COUNTIF($C$1:$C3,C2))
What it does is:
If my points equals the previous points, I have the same position.
Otherwise, count the players with the same score as the previous one, and add that count to the previous player's position.
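The sorted-running formula can be traced in Python to confirm it reproduces the expected positions. This sketch assumes the points are already sorted in descending order and that the first position is 1; `running_positions` is a hypothetical name:

```python
def running_positions(points):
    """Positions for points sorted in descending order.

    Mirrors B2 = 1 and, for each later row,
    =IF(C3=C2, B2, B2+COUNTIF($C$1:$C3, C2)).
    """
    if not points:
        return []
    positions = [1]
    for i in range(1, len(points)):
        if points[i] == points[i - 1]:
            # Same points as the row above: same position.
            positions.append(positions[-1])
        else:
            # Count everyone (up to this row) who shares the previous score.
            tied = points[: i + 1].count(points[i - 1])
            positions.append(positions[-1] + tied)
    return positions


if __name__ == "__main__":
    print(running_positions([10, 9, 9, 9, 8, 7]))  # [1, 2, 2, 2, 5, 6]
```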
You can use the RANK function in Excel without necessarily sorting the data. Type =RANK(C2,$C$2:$C$7) in B2. Excel will find the relative position of the value in C2 among all the points and display the answer. Copy the formula down to B7 by dragging the fill handle (the small square at the bottom-right corner of the cell cursor).
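Try this in your fourth column (the points are in column C; depending on your locale, the argument separator is a comma or a semicolon):
=COUNTIF(C:C,">"&C2)+1
Fill it down; the relative reference C2 becomes C3 in the next row, and so on.
This counts how many records have more points than the current one and then adds 1 to get the current record's position (the +1 part).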
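The COUNTIF approach gives standard competition ranking (1 + the number of strictly higher scores), the same result as RANK with descending order, and it does not need sorted data. A short Python sketch; `countif_rank` is a hypothetical name:

```python
def countif_rank(points):
    """1 + the number of strictly higher scores, for every entry (COUNTIF(...,">"&x)+1)."""
    return [sum(other > p for other in points) + 1 for p in points]


if __name__ == "__main__":
    print(countif_rank([10, 9, 9, 9, 8, 7]))  # [1, 2, 2, 2, 5, 6]
    print(countif_rank([8, 10, 9, 7, 9, 9]))  # [5, 1, 2, 6, 2, 2]
```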
If your C column is sorted, you can check whether the current row is equal to the row above. If not, use the current row number minus 1 as the ranking position; otherwise, use the value from above (formula for B3):
=IF(C3=C2, B2, ROW()-1)
You can use the LARGE function to get the n-th highest value in case your C-column is not sorted:
=LARGE(C2:C7,3)
The way I've done this, which is a bit convoluted, is as follows:
Sort rows by the points in descending order
Create an additional column (D), starting at D2, with the numbers 1, 2, 3, ... up to the total number of positions
In the cell for the actual positions (B2), use the formula =IF(C2=C1, B1, D2). This checks whether the points in this row are the same as the points in the previous row. If they are, it gives you the position of the previous row; otherwise it uses the value from column D, and thus handles people with equal positions.
Copy this formula down the entire column
Copy the positions column (B), then Paste Special >> Values to overwrite the formulas with the positions
Resort the rows to their original order
That's worked for me! If there's a better way I'd love to know it!
