Count unique values in column A based on a moving range of criteria in column B - excel

Okay, so this is my first question, let's hope I can explain it well...
Essentially, I would like to count the number of unique values in column A, but from a subset of those which have, in column B, a value that falls within a specified range.
Here's an example:
ColumnA ColumnB
potato 29.1
potato 29.7
potato 30.3
potato 31.0
bean 31.6
apple 32.2
apple 32.8
bean 33.5
bean 34.0
apple 34.3
potato 35.0
Count b/w 29-31: 1
Count b/w 30-32: 2
Count b/w 31-33: 3
Count b/w 32-34: 2
Count b/w 33-35: 3
In other words, I want to know how many unique items are present within each range (as specified by column B), and I want to carry that down through a series of overlapping ranges.
So far, the best I've been able to come up with is a COUNTIFS formula that counts the total number of records in each range. e.g.:
=COUNTIFS(B1:B11,">=29",B1:B11,"<=31")
=COUNTIFS(B1:B11,">=30",B1:B11,"<=32")
=COUNTIFS(B1:B11,">=31",B1:B11,"<=33")
etc...
And this obviously doesn't even reference column A. I've tried a few different array formulas based on similar questions, but they're always solving a slightly different problem, so I've been largely unsuccessful.
Any help much appreciated! Thank you.

You would use this array formula:
=SUM(IF(($B$2:$B$12>=A16)*($B$2:$B$12<=B16),(1/COUNTIFS($A$2:$A$12,$A$2:$A$12,$B$2:$B$12,">=" & A16,$B$2:$B$12,"<=" & B16))))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter when exiting edit mode. If done correctly, Excel will put {} around the formula automatically.
It finds all the rows where the value in column B is between the bounds in A16 and B16, then uses 1/COUNTIFS() to count each unique value once: if a value appears n times within the range, each of its rows contributes 1/n, so together they add up to 1.
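To sanity-check the 1/COUNTIFS() idea outside Excel, here is a minimal Python sketch of the same logic run against the sample data from the question. The names ROWS and unique_in_range are made up for illustration; this is a cross-check, not a replacement for the worksheet formula.

```python
from collections import Counter

# Sample data from the question: (column A, column B).
ROWS = [
    ("potato", 29.1), ("potato", 29.7), ("potato", 30.3), ("potato", 31.0),
    ("bean", 31.6), ("apple", 32.2), ("apple", 32.8), ("bean", 33.5),
    ("bean", 34.0), ("apple", 34.3), ("potato", 35.0),
]


def unique_in_range(rows, low, high):
    """Count distinct column-A values whose column-B value lies in [low, high]."""
    in_range = [name for name, value in rows if low <= value <= high]
    counts = Counter(in_range)  # plays the role of COUNTIFS for each row
    # Each in-range row contributes 1/n, where n is how often its name occurs
    # in the range; round() absorbs floating-point error in the sum.
    return round(sum(1 / counts[name] for name in in_range))


for low in range(29, 34):
    print(f"Count b/w {low}-{low + 2}:", unique_in_range(ROWS, low, low + 2))
```

This should reproduce the counts listed in the question (1, 2, 3, 2, 3) for the five overlapping ranges.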

Related

Distributing students across classes based on marks

Name Marks Rank Class
Eddie 20 6 C
Tom 10 10 A
Jenny 30 4 A
Riva 40 3 C
Andy 50 2 B
Mark 5 11 B
Sally 78 1 A
Jack 15 8 B
Dick 15 8 C
Harry 20 6 A
Dom 30 4 B
The students are expected to be distributed across classes A, B and C, based on their marks in the above picture.
The student with the highest marks goes in A. The one with the next highest goes in B. The next highest goes in C. The next goes again to A and so on.
What should be the formula to be used in Excel 2013 and above for calculating the Class?
Sort the table by either Marks descending or Rank ascending.
D2: =CHOOSE(MOD(ROWS($1:1)-1,3)+1,"A","B","C")
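As a rough cross-check of the sort-then-cycle approach, the following Python sketch sorts the sample students by marks (descending) and deals out A, B, C in turn. STUDENTS and assign_classes are hypothetical names, and the sketch assumes that ties keep their original sheet order, which is what a stable sort gives you.

```python
# Sample data from the question, in original sheet order: (name, marks).
STUDENTS = [
    ("Eddie", 20), ("Tom", 10), ("Jenny", 30), ("Riva", 40), ("Andy", 50),
    ("Mark", 5), ("Sally", 78), ("Jack", 15), ("Dick", 15), ("Harry", 20),
    ("Dom", 30),
]


def assign_classes(students, classes=("A", "B", "C")):
    """Sort by marks descending, then cycle through the class codes."""
    # sorted() is stable, so students with equal marks keep their sheet order.
    ranked = sorted(students, key=lambda s: -s[1])
    # Position 0 -> A, 1 -> B, 2 -> C, 3 -> A, ... like MOD(ROWS($1:1)-1,3)+1.
    return {name: classes[i % len(classes)] for i, (name, _) in enumerate(ranked)}


for name, cls in assign_classes(STUDENTS).items():
    print(name, cls)
```

With the sample data this gives the same Class column as the table in the question, including the tied pairs (Jenny/Dom, Eddie/Harry, Jack/Dick) landing in different classes.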
If you are using Excel 365, you can use the SORTBY function to solve the question.
Assume the Name column is in a named range called List_Name, the Marks column is in a named range called List_Marks, your example dataset is in range A1:D12, and you want to return the class code in column D.
In cell D2, enter any one of the following formulas and drag it down:
=CHOOSE(MOD(MATCH(A2,SORTBY(List_Name,List_Marks,-1),0),3)+1,"C","A","B")
Alternatively, you can use the following in cell D2 instead:
=INDEX({"C";"A";"B"},MOD(MATCH(A2,SORTBY(List_Name,List_Marks,-1),0),3)+1)
If you cannot use the SORTBY function, then the answer provided by Ron Rosenfeld should do the job quite well.
Let me know if you have any questions.
Assuming that the table you provided is in cells A1:D12,
try making a small lookup table on the side (I’m using F2:G4) with 1...A, 2...B, 0...C,
and then put the formula in D2 as follows: =VLOOKUP(MOD(C2,3),$F$2:$G$4,2,FALSE)
You could even skip the whole C column if you wanted, writing =VLOOKUP(MOD(RANK(B2,B:B),3),$F$2:$G$4,2,FALSE)
But then you will have an issue of 2 people going to the same class if they rank the same.
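To see the tie problem concretely, here is a small Python sketch of the rank-modulo-3 lookup. The rank helper mimics Excel's default descending RANK (1 plus the number of larger values); MARKS, LOOKUP and class_by_rank are names invented for this illustration.

```python
# Marks from the question's sample table.
MARKS = {
    "Eddie": 20, "Tom": 10, "Jenny": 30, "Riva": 40, "Andy": 50, "Mark": 5,
    "Sally": 78, "Jack": 15, "Dick": 15, "Harry": 20, "Dom": 30,
}

# The side lookup table: remainder of rank / 3 -> class code.
LOOKUP = {1: "A", 2: "B", 0: "C"}


def rank(value, values):
    """Descending rank where ties share a rank, as Excel's RANK does by default."""
    return 1 + sum(v > value for v in values)


def class_by_rank(marks):
    """Assign each student the class looked up from rank modulo 3."""
    all_marks = list(marks.values())
    return {name: LOOKUP[rank(m, all_marks) % 3] for name, m in marks.items()}


classes = class_by_rank(MARKS)
print(classes["Jenny"], classes["Dom"])  # both rank 4, so both get the same class
```

Because Jenny and Dom share rank 4, the lookup sends both to class A, whereas the expected table puts Dom in B. A tie-breaker (for example, sorting first and using the row position) avoids this.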

How to create a dynamic formula to find the average of a set of values for a given vector

I am trying to create a formula that gives me the average of the last 12 entries in a given dataset depending on the associated vector.
Let's make an example:
Cells F2, G2, H2 and I2 contain the headers Date, Company1, Company2 and Company3 respectively. Rows 3 to 33 then hold monthly dates starting from May 2016.
Date Company1 Company2 Company3
May-16 2,453,845
Jun-16 13,099,823
Jul-16 14,159,037
Aug-16 38,589,050 8,866,101
Sep-16 63,290,285 13,242,522
Oct-16 94,005,364 14,841,793
Nov-16 123,774,792 7,903,600 41,489,883
Dec-16 93,355,037 12,449,604 69,117,105
Jan-17 47,869,982 13,830,712 83,913,764
Feb-17 77,109,905 10,361,555 68,176,643
The goal is to create a formula that, when I drag it down, correctly calculates the average of the last 12 values for a given company.
So, for example, I would have, say in range B2:C5:
Company1 76,856,345
Company2 11,120,859
Company3 65,674,349
And, if a new Company4 is added to the list, I just have to drag the formula down to calculate the average of the last 12 months for Company4.
So far, I have come up with this formula:
=AVERAGE(LOOKUP(LARGE(IF(ISNUMBER(G:G),ROW(G:G)),ROW(INDIRECT("1:"&MIN(12,COUNT(G:G))))),ROW(G:G),G:G ))
This formula correctly calculates the average of a given column, considering only the last 12 values. The last step would be to come up with a formula that includes all the columns and then calculates the average for the given company.
Thanks!
I recommend that you use a named range to define your data in columns G:I. When a company is added, just modify the named range's specs. I used the name Target. Of course, you can replace it with $G:$I if you feel so inclined but I would rather recommend reducing the number of rows in the range, which is easier to manage when it is named.
Use the formula below to extract the company names from the first row of Target into the first column of your averages table. This is to ensure that the names are spelled identically in both locations.
=INDEX(Target,1,ROW()-2)
The number 2 is the number of rows above the row containing the formula. The formula is copied here from cell M3, where ROW()-2 yields 1 and then counts up sequentially as the formula is copied down.
Now I have the formula below in my cell N3 and copied down.
=SUM(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0)))
The formula simply sums up the columns G, H, and I in 3 consecutive rows.
In the final step I inserted the range definition established above, meaning excluding the SUM() function, into your existing formula.
=AVERAGE(LOOKUP(LARGE(IF(ISNUMBER(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))),ROW(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0)))),ROW(INDIRECT("1:"&MIN(12,COUNT(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))))))),ROW(INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))),INDEX(Target,0,MATCH($M3,INDEX(Target,1,0),0))))
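To check what the long formula is meant to compute, here is a minimal Python sketch of "average of the last 12 numeric entries in the column whose header matches the company name". The TARGET dictionary and average_last_n are hypothetical stand-ins for the Target named range and the INDEX/MATCH column lookup. Only the Company3 values visible in the question's excerpt are used, with None for blank cells.

```python
# Stand-in for the Target range: header -> column values (None = blank cell).
# Company3 values are the four shown in the question (Nov-16 to Feb-17).
TARGET = {
    "Company3": [None] * 6 + [41_489_883, 69_117_105, 83_913_764, 68_176_643],
}


def average_last_n(column, n=12):
    """Average the last n numeric entries of a column, skipping blanks."""
    numeric = [v for v in column if isinstance(v, (int, float))]
    tail = numeric[-n:]  # fewer than n values is fine, like MIN(12, COUNT(...))
    return sum(tail) / len(tail) if tail else None


print(round(average_last_n(TARGET["Company3"])))
```

For Company3 this rounds to 65,674,349, the figure given in the question. Looking the column up by header name is what lets the same formula be dragged down for each company.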

Excel conditional SUMPRODUCT / SUMIFS / Array Formula for optional dimension

I have a sheet of data with multiple dimensions like this:
A B C D E
1 COUNTRY FLAVOUR SIZE DATE SALES ($)
2 Japan Strawberry 100ml 10/12/14 100
3 Japan Banana 100ml 10/03/15 100
4 China Orange 200ml 14/04/15 30
5 France Strawberry 200ml 11/04/15 400
6 UK 200ml 23/04/15 250
7 ....
I want to aggregate this data over a date range, where the summary sheet has each dimension (country & flavour), and if I do not specify a dimension it sums all rows for that dimension.
A B C
1 COUNTRY FLAVOUR SALES TOTAL
2 Japan Strawberry 100
3 Japan 200
4 Strawberry 500
I can do this if all the dimensions are present (i.e. row 2 above) using a SUMPRODUCT or SUMIFS:
=SUMPRODUCT((data!A$2:A$100=A2)*(data!B$2:B$100=B2)*(data!D$2:D$100>[start_date])*(data!D$2:D$100<[end_date])*(data!E$2:E$100))
However I have not been able to figure out how to include all rows for a dimension if that input cell is empty (e.g. row 3 above). I tried:
Adding an IF statement or OR statement within the criteria (e.g. OR(data!A$2:A$100=A1,isblank(A1))).
Using a + in a SUMPRODUCT as an OR statement, (per this answer https://stackoverflow.com/a/27536131/1450420)
One solution is to have different branches of the formula depending on which summary dimensions are present, but that would quickly get out of control if I extend this same behaviour to further dimensions like Size.
Any help appreciated!
(I'm running Excel Mac 2011).
EDIT
Per @BrakNicku's comment, one of the formulas I tried was =SUMPRODUCT(((data!A$2:A$100=A2)+ISBLANK(A2))*((data!B$2:B$100=B2)+ISBLANK(B2))*(data!E$2:E$100))
The reason this doesn't work is that sometimes my data has blank attributes (edited above). For some reason this formula double-counts rows where the attribute present matches (e.g. data!A6) but the other attribute is missing (e.g. data!B6).
EDIT 2
I can see why this double-counting is happening, because the + is summing the match because data!A$2:A$100=A2 (they match because they are both blank) and the match because ISBLANK(A2) (it is indeed blank). The question would remain how to achieve this without double counting. If needed a workaround could be to fill all blank cells on my data with some placeholder value.
The reason for double-counting values is here:
((data!A$2:A$100=A2)+ISBLANK(A2))
If a cell in column A of the data is blank and A2 is blank as well, both parts of this sum are equal to 1, so that row is weighted by 2. To get rid of this problem you can change it to:
(((data!A$2:A$100=A2)+ISBLANK(A2))>0)
Try this (I only included the first two, I left the dates out):
=SUMPRODUCT((((Data!$A$2:$A$5=A2)+(A2=""))>0)*(((Data!$B$2:$B$5=B2)+(B2=""))>0)*(Data!$E$2:$E$5))
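The effect of the >0 comparison can be reproduced in a short Python sketch. It uses the sample rows from the question, leaves out the dates (as the formula above does), and treats an empty string as a blank cell. DATA, total and the clamp flag are names made up for this illustration.

```python
# Sample rows from the question: (country, flavour, sales). "" = blank cell.
DATA = [
    ("Japan", "Strawberry", 100),
    ("Japan", "Banana", 100),
    ("China", "Orange", 30),
    ("France", "Strawberry", 400),
    ("UK", "", 250),
]


def total(country="", flavour="", clamp=True):
    """Sum sales where each criterion matches or is left blank.

    clamp=False mimics the formula without the >0 test, which double-counts.
    """
    result = 0
    for row_country, row_flavour, sales in DATA:
        c = (row_country == country) + (country == "")  # 0, 1 or 2
        f = (row_flavour == flavour) + (flavour == "")  # 0, 1 or 2
        if clamp:
            c, f = c > 0, f > 0  # collapse 2 back to a single match
        result += c * f * sales
    return result


print(total("Japan", "Strawberry"), total("Japan"), total(flavour="Strawberry"))
print(total("UK"), total("UK", clamp=False))
```

The first line matches the summary table in the question (100, 200, 500). The second line shows the double-counting: the UK row has a blank flavour, so without the >0 test it is counted twice (500 instead of 250).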

Formula returning Column A value for row containing MAX value of a range

Assume I have the following table:
A B C
1 Week 1 Week 2
2 Melissa 114.7 82.8
3 Mike 105.5 122.5
4 Andrew 102.3 87.5
5 Rich 105.3 65.2
The names are in column A, the Week values are in Row 1. (So A1 is blank, B1 = Week 1, and A2 = Melissa.)
I'm trying to build a formula that looks at all the values in a known range (in this example, B2:C5), chooses the highest value of the bunch (here, 122.5) and returns the name of the person from Column A that got that value. If I use this formula, it works for the values in range B2:B5:
=INDEX(A2:A5,MATCH(MAX(B2:B5),B2:B5,0))
That returns Melissa but if I expand the range to include more than just column B's values, I get an #N/A returned:
=INDEX(A2:A5,MATCH(MAX(B2:C5),B2:C5,0))
The weird part (to my simple brain) is that the MAX portion of the formula works fine. If I just put in this formula, it returns the highest value of 122.5 from C3:
=MAX(B2:C5,B2:C5,0)
So clearly something is going wrong when I'm using either the MATCH or INDEX commands.
Hopefully this makes sense and someone can point out my error?
Try this:
=INDEX(A:A,MAX((B2:C5=MAX(B2:C5))*ROW(B2:C5)))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
Note: MATCH can only search one vector at a time: a single row, a single column, or a one-dimensional array. It cannot search two or more rows or columns, or a 2D array.
Do it "twice"? Please try:
=INDEX(A2:A5,IFERROR(MATCH(MAX(B2:C5),B2:B5,0),MATCH(MAX(B2:C5),C2:C5,0)))
If you are going to have up to 52/53 weeks to cope with, I'd suggest instead inserting a helper column with the MAX for each row. Make that a new (inserted) ColumnA (say =MAX(C2:BC2) etc.) and a simple VLOOKUP should serve, say:
=VLOOKUP(MAX(A:A),A:B,2,0)
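For comparison, here is a minimal Python sketch of the same lookup on the sample table: find the overall maximum of the 2D block, then return the name from the row that contains it. NAMES, WEEKS and name_of_max are illustrative names. Like the MAX((range=MAX(range))*ROW(range)) formula, the sketch returns the last matching row if the maximum occurs more than once.

```python
# Sample table from the question: names in column A, weeks in columns B:C.
NAMES = ["Melissa", "Mike", "Andrew", "Rich"]
WEEKS = [
    [114.7, 82.8],
    [105.5, 122.5],
    [102.3, 87.5],
    [105.3, 65.2],
]


def name_of_max(names, table):
    """Return the name on the row holding the largest value in the 2D table."""
    overall = max(max(row) for row in table)
    # Take the highest row index that contains the overall maximum.
    last_row = max(i for i, row in enumerate(table) if overall in row)
    return names[last_row]


print(name_of_max(NAMES, WEEKS))  # Mike, who holds 122.5 in Week 2
```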

SUMIF for first 5 cells meeting criteria

Simple Excel Table such as
A B
1 John 5
2 John 7
3 John 9
4 Jill 25
5 John 21
6 John 22
7 Jill 50
8 John 100
9 John 2000
10 Jack 4
Using SUMIF, we can return the total assigned to John.
=SUMIF(A:A,"John",B:B)
Is there a way to return only the first 5 values that match the criteria? Or is there a way to return the 5 smallest values for John? Either would work.
Oh well. I'll go ahead and presume that you have Excel 2010 or later.
With e.g. "John" in D1, enter this formula in E1:
=SUMIFS($B$1:$B$10,$A$1:$A$10,D1,$B$1:$B$10,"<="&AGGREGATE(15,6,$B$1:$B$10/($A$1:$A$10=D1),5))
Copy down to give similar results for names in D2, D3, etc.
Regards
Formula:
=IF(COUNTIF($A$1:A1,A1)<=5,SUMIF($A$1:A1,A1,$B$1:B1),"")
The last value shown for each person will be the sum of the first (up to) 5 values for that person. Just copy and paste values, then sort.
Your sample data would show the same result for either the first 5 or the lowest 5, as John's numbers are in ascending order. If that is not always the case, or if you need to provide compatibility with versions of Excel earlier than 2010, I would offer the following. Note that in my sample image, I've re-sorted the numerical values in descending order to illustrate the difference.
For John's first 5 values (E2 in the sample image):
=SUM(INDEX(($B$2:$B$11)*($A$2:$A$11=D2)*(ROW($1:$10)<=SMALL(INDEX(ROW($1:$10)+($A$2:$A$11<>D2)*1E+99,,), 5)),,))
For John's lowest 5 values (F2 in the sample image):
=SUMPRODUCT(SMALL(INDEX(($B$2:$B$11)+($A$2:$A$11<>D2)*1E+99,,),ROW($1:$5)))
These are standard formulas. Any array processing is supplied by INDEX and/or SUMPRODUCT. Ctrl+Shift+Enter is not required. Some form of error control may be necessary when there are fewer than 5 matching values; a simple IF(COUNTIF(), <formula>) would suffice. When transcribing this type of formula it is important to note that ROW($1:$10) is the position within B2:B11 or A2:A11, not the actual row on the worksheet.
In C1 enter:
=IF(A1="John",1,0)
In C2 enter:
=IF(A2="John",1+MAX($C$1:C1),0)
and copy down. Then use:
=SUMPRODUCT((A:A="John")*(B:B)*(C:C<6))
Assuming John in D1 you can get the sum of the 5 smallest values for John with this array formula
=SUM(SMALL(IF(A$1:A$100=D1,B$1:B$100),{1,2,3,4,5}))
Confirm with CTRL+SHIFT+ENTER and copy down to make it work for all names in the list.
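The two readings of the question (first 5 matches versus 5 smallest matches) can be compared in a short Python sketch using the sample table. TABLE, sum_first and sum_smallest are hypothetical names. Unlike SMALL with {1,2,3,4,5}, list slicing here simply returns fewer items when a name has fewer than 5 rows.

```python
# Sample table from the question: (name, value), in sheet order.
TABLE = [
    ("John", 5), ("John", 7), ("John", 9), ("Jill", 25), ("John", 21),
    ("John", 22), ("Jill", 50), ("John", 100), ("John", 2000), ("Jack", 4),
]


def sum_first(table, name, n=5):
    """Sum the first n values for name, in sheet order."""
    return sum([value for who, value in table if who == name][:n])


def sum_smallest(table, name, n=5):
    """Sum the n smallest values for name, regardless of sheet order."""
    return sum(sorted(value for who, value in table if who == name)[:n])


print(sum_first(TABLE, "John"), sum_smallest(TABLE, "John"))
# With the rows reversed, the two results differ for John.
print(sum_first(TABLE[::-1], "John"), sum_smallest(TABLE[::-1], "John"))
```

On the sample data both functions give 64 for John, because his values are already in ascending order. With the rows reversed, the first five sum to 2152 while the five smallest still sum to 64.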
