Excel - count number of non-empty rows by unique column value

My data looks like this (about 10k rows and 26-odd columns):
FID Year I1 I2 I3 I4 I5
1 2009 a b c
1 2010 a d e g
1 2011 g f h
1 2011 h f j i k
1 2013 h t g k m
1 2014 a b c d
1 2014 e d f c l
1 2014 h i j k d
1 2014 a b d
I want the count of unique I values for each year, with output like this:
FID Year Count(Unique)
1 2009 3
1 2010 4
1 2011 6
1 2013 5
1 2014 11
What I have tried till now:
I tried =COUNTIF($C2:$G2,"*") and then summing the results for the same year, until I realized that the I values repeat within a year.
Then I tried concatenating all the I values for the same year, which failed.
Then I tried converting the data to long format with FID+Year as the key, so that I could remove duplicates, but that failed with a 'not enough memory' error.
Any help on how to solve this would be appreciated. Thanks.

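This is the formula that you need:
=SUM(IF(COUNTIF(A1:E1,A1:E1)=1,1,0))
It is an array formula, so you need to confirm it with Ctrl+Shift+Enter. (In locales that use semicolons as argument separators, replace the commas with semicolons.) Note that it counts the values that occur exactly once in the range, which differs from the number of distinct values when a value repeats.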
If your values were numbers, it would be easier, like this:
=SUM(IF(FREQUENCY(A1:E1,A1:E1)>0,1))
It is not an array formula.
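To see how the COUNTIF(range,range)=1 pattern can differ from a distinct count, here is a small Python sketch (not Excel; the function names are hypothetical) that mirrors it cell by cell on the 2011 values from the question. Blank cells are simply skipped, which is an assumption and not a model of how COUNTIF treats blank criteria.

```python
from collections import Counter

def count_exactly_once(cells):
    """Mirror SUM(IF(COUNTIF(r,r)=1,1,0)): cells whose value occurs once in r."""
    values = [c for c in cells if c != ""]  # assumption: blanks are skipped
    counts = Counter(values)
    return sum(1 for c in values if counts[c] == 1)

def count_distinct(cells):
    """Number of different non-blank values, which is what the question asks for."""
    return len({c for c in cells if c != ""})

# All I values for 2011 from the question (both rows combined).
year_2011 = ["g", "f", "h", "h", "f", "j", "i", "k"]
print(count_exactly_once(year_2011))  # counts g, j, i, k
print(count_distinct(year_2011))      # counts g, f, h, j, i, k
```

The first call prints 4 and the second prints 6, the expected count for 2011.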

I came up with the following. Assuming the years you want to summarize are listed in column J starting at J2, enter this formula in K2 and drag it down:
=IFERROR(SUMPRODUCT((OFFSET($C$1,MATCH(J2,$B$2:$B$10,0),0,COUNTIF($B$2:$B$10,J2),5)<>"")/COUNTIF(OFFSET($C$1,MATCH(J2,$B$2:$B$10,0),0,COUNTIF($B$2:$B$10,J2),5),OFFSET($C$1,MATCH(J2,$B$2:$B$10,0),0,COUNTIF($B$2:$B$10,J2),5)&"")),0)
Explanation:
It combines two basic formulas:
=SUMPRODUCT((range<>"")/COUNTIF(range,range&"")) counts the distinct values in a range while ignoring blanks.
=OFFSET(reference,rows,cols,height,width) returns the range to evaluate: the block of rows for the year in J2, five columns wide (C:G). This relies on the rows for each year being contiguous, i.e. the data being sorted by year.
Hope this solves your problem.
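The SUMPRODUCT/COUNTIF idea can be checked outside Excel. The Python sketch below (ROWS, block_by_key, sumproduct_distinct and unique_counts are made-up names for illustration) gives every non-blank cell a weight of 1/(number of times its value appears in the block), so each distinct value contributes exactly 1 in total. It groups cells by (FID, Year) with a dictionary instead of OFFSET, so it does not need the rows to be sorted; unlike COUNTIF, its comparisons are case-sensitive.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Sample data from the question: (FID, Year, I1..I5), blanks as "".
ROWS = [
    (1, 2009, ["a", "b", "c", "", ""]),
    (1, 2010, ["a", "d", "e", "g", ""]),
    (1, 2011, ["g", "f", "h", "", ""]),
    (1, 2011, ["h", "f", "j", "i", "k"]),
    (1, 2013, ["h", "t", "g", "k", "m"]),
    (1, 2014, ["a", "b", "c", "d", ""]),
    (1, 2014, ["e", "d", "f", "c", "l"]),
    (1, 2014, ["h", "i", "j", "k", "d"]),
    (1, 2014, ["a", "b", "d", "", ""]),
]

def block_by_key(rows):
    """Collect all I cells for each (FID, Year), like the OFFSET(...) block."""
    blocks = defaultdict(list)
    for fid, year, cells in rows:
        blocks[(fid, year)].extend(cells)
    return blocks

def sumproduct_distinct(cells):
    """Mirror SUMPRODUCT((r<>"")/COUNTIF(r,r&"")): each non-blank cell adds 1/its count."""
    counts = Counter(cells)
    return sum(Fraction(1, counts[c]) for c in cells if c != "")

def unique_counts(rows):
    return {key: sumproduct_distinct(cells) for key, cells in block_by_key(rows).items()}

for (fid, year), n in unique_counts(ROWS).items():
    print(fid, year, n)
```

On the sample rows it prints 3, 4, 6, 5 and 11 for 2009 to 2014, matching the expected output. Fraction is used so that the 1/n weights add up exactly.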

Related

Get number of unique values from a column with multiple criteria

I am working on an Excel problem. Here is my question:
name department year
a cs 5
b cs 8
c cs 2
d cs 3
a cs 1
b cs 10
a ma 7
f ma 8
h ma 2
The task is to count the names that occur only once among the rows with department = "cs" and year > 2. In this case the result is 2 ("a" and "d" each occur only once).
I know the formula below does this kind of count, but I don't know how to feed it the range filtered by department = "cs" and year > 2:
=SUM(IF(COUNTIF(range, range)=1,1,0))
Use SUMPRODUCT:
=SUMPRODUCT((COUNTIFS(A:A,A2:INDEX(A:A,MATCH("zzz",A:A)),B:B,"cs",C:C,">2")=1)*(B2:INDEX(B:B,MATCH("zzz",A:A))="cs")*(C2:INDEX(C:C,MATCH("zzz",A:A))>2))
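Read as logic, the SUMPRODUCT counts the rows that pass both filters and whose name passes the filters exactly once. A rough Python equivalent on the sample data (the function names are hypothetical) is:

```python
from collections import Counter

# (name, department, year) rows from the question
DATA = [
    ("a", "cs", 5), ("b", "cs", 8), ("c", "cs", 2), ("d", "cs", 3), ("a", "cs", 1),
    ("b", "cs", 10), ("a", "ma", 7), ("f", "ma", 8), ("h", "ma", 2),
]

def passes(dept, year):
    """The two criteria: department = "cs" and year > 2."""
    return dept == "cs" and year > 2

def names_occurring_once(rows):
    # Like COUNTIFS(A:A,name,B:B,"cs",C:C,">2") evaluated for each name
    counts = Counter(name for name, dept, year in rows if passes(dept, year))
    # Like the (B="cs")*(C>2) factors: only rows that pass the filters are summed
    return sum(1 for name, dept, year in rows if passes(dept, year) and counts[name] == 1)

print(names_occurring_once(DATA))
```

It prints 2 ("a" and "d"). The row filter matters: without it, each of the three "a" rows would be counted, because COUNTIFS returns 1 for every one of them.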

Count using a category and Median using a category

I have two problems here.
The data is as follows:
Col X Col Y
A 10
A 12
A
A 32
B 11
B 31
B 9
C 8
C 7
C 3
D 1
D 3
D
D 9
I need to do the following:
Count the entries in column Y for each category in column X. For example, A appears 4 times in column X but has only 3 corresponding numbers in column Y; I need that count of 3.
Calculate the median of those numbers for each category, excluding blanks (they must not be treated as 0). For example, the median for A is 12 and the median for D is 3.
Please help.
For 1:
=COUNTIFS(X:X,"A",Y:Y,"<>")
For 2:
=MEDIAN(IF(X:X="A",IF(NOT(ISBLANK(Y:Y)),Y:Y)))
Formula 2 is an array formula, so confirm it with Ctrl+Shift+Enter.
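For reference, here are the same two calculations in Python, treating blank Y cells as None so that they are neither counted nor used as 0 (a sketch; the names are hypothetical):

```python
from statistics import median

# (Col X, Col Y) pairs from the question; None marks a blank Y cell
DATA = [
    ("A", 10), ("A", 12), ("A", None), ("A", 32),
    ("B", 11), ("B", 31), ("B", 9),
    ("C", 8), ("C", 7), ("C", 3),
    ("D", 1), ("D", 3), ("D", None), ("D", 9),
]

def values_for(rows, category):
    """Non-blank Y values for one X category."""
    return [y for x, y in rows if x == category and y is not None]

def count_for(rows, category):
    # Like =COUNTIFS(X:X,category,Y:Y,"<>")
    return len(values_for(rows, category))

def median_for(rows, category):
    # Like =MEDIAN(IF(X:X=category,IF(NOT(ISBLANK(Y:Y)),Y:Y)))
    return median(values_for(rows, category))

print(count_for(DATA, "A"), median_for(DATA, "A"), median_for(DATA, "D"))
```

This prints 3 12 3, matching the examples in the question.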

Identify Rows with Same Values in 2 Different Columns

I have a data set of roughly 405,000 rows and 23 columns. I need the records where the value in column "D" is the same as the value in column "H" for that row.
So for this data:
A B C D E F G H
13 8 21 ok 3 S - of
51 7 22 no 3 A k no
24 3 23 by 3 S * we
24 4 24 we 3 S ! ok
24 9 25 by 3 S # we
75 2 26 ok 3 S 9 ok
etc...
I'd get back the 2nd row, the 6th row, etc...
A B C D E F G H
51 7 22 no 3 A k no
75 2 26 ok 3 S 9 ok
Based on other posts, such as "Formula to find matching row value based on cells in multiple columns", I tried using a PivotTable, but it complains that I can't put either of my two columns in the Columns area because there is too much data. With both columns in the Rows area, I get a relationship of D to H, but I can't find a way to filter on only the rows where D = H.
I've also looked into the COUNTIFS, VLOOKUP, and INDEX/MATCH functions, but I can't figure this out. Help please.
I would use a simple IF() formula in a new column.
For your example, add a new column I and use the following formula in the first data row (I2):
=IF(D2=H2,"Yes","No")
Fill it down to the end of the data.
Then use Excel's filters or COUNTIF to find or count the "Yes" and "No" rows.
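The helper-column idea is a per-row comparison of two fields. As a rough Python sketch over the sample rows (column D is index 3 and H is index 7; the names are illustrative), note that Excel's = comparison of text ignores case, while Python's == does not:

```python
# Sample rows from the question, columns A..H
ROWS = [
    ["13", "8", "21", "ok", "3", "S", "-", "of"],
    ["51", "7", "22", "no", "3", "A", "k", "no"],
    ["24", "3", "23", "by", "3", "S", "*", "we"],
    ["24", "4", "24", "we", "3", "S", "!", "ok"],
    ["24", "9", "25", "by", "3", "S", "#", "we"],
    ["75", "2", "26", "ok", "3", "S", "9", "ok"],
]
D, H = 3, 7  # zero-based positions of columns D and H

def flag(row):
    """Like =IF(D2=H2,"Yes","No") in the helper column (case-sensitive here)."""
    return "Yes" if row[D] == row[H] else "No"

matching = [row for row in ROWS if flag(row) == "Yes"]  # like filtering on "Yes"
print(len(matching))
for row in matching:
    print(" ".join(row))
```

It prints 2, followed by the 2nd and 6th rows, as in the expected result.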

excel formula depending on dynamic values in different columns

I am trying to create an Excel formula using SUM and SUMIF but can't work out how.
Column A holds the total time allocated to a task, and for each row, columns B, C, ... hold the time spent on that task each day.
For each day (column B, C, ...), the formula should return the sum of only those column A values whose tasks were completed on that day, i.e. where the sum of the row's cells up to and including that day first equals or exceeds the time allocated to the task.
Example for one 12-hour task:
A B C D E
12 4 6 2 0
Using the formula:
A B C D E
12 4 6 2 0
0 0 0 12 0
where 12 appears in column D because 4 + 6 + 2 = 12 (the value in column A).
Second example (3 tasks):
A B C D E
10 9 0 1 0
21 8 8 5 0
5 0 0 3 2
Using the formula:
A B C D E
10 9 0 1 0
21 8 8 5 0
5 0 0 3 2
0 0 0 31 5
Where:
31 (day D) = 10 (task 1 is finished that day) + 21 (task 2 is finished that day too)
5 (day E) = 5 (task 3 is finished that day)
I tried this formula (for day B):
SUMIF(B1:B3,">=A1:A3",A1:A3)
(Sum the values in column A if the cells in that row up to column B (in this case just B) are >= the corresponding column A value.)
Then for column C it would be:
SUMIF(C1:C3 + B1:B3,">=A1:A3",A1:A3)
Neither attempt worked (the first returns zero; the second is an invalid formula).
Any ideas?
Thank you.
The formulas below, given by user ServerS, work fine:
Col B:
=IF(SUM(B2)=A2,A2,0)+IF(SUM(B3)=A3,A3,0)+IF(SUM(B4)=A4,A4,0)+IF(SUM(B5)=A5,A5,0)
Col C:
=IF(SUM(B2:C2)=A2,A2,0)+IF(SUM(B3:C3)=A3,A3,0)+IF(SUM(B4:C4)=A4,A4,0)+IF(SUM(B5:C5)=A5,A5,0)
Col D
=IF(SUM(B2:D2)=A2,A2,0)+IF(SUM(B3:D3)=A3,A3,0)+IF(SUM(B4:D4)=A4,A4,0)+IF(SUM(B5:D5)=A5,A5,0)
However, there are two drawbacks:
If new rows are added, the formula has to be edited to include another IF(). A generic SUM of IFs would be better.
The formula can't simply be propagated to adjacent cells, because that shifts references such as those in "=A2,A2,0", which need to stay the same.
Any ideas for improving this would be appreciated.
You can avoid IF by using SUMPRODUCT, which also lets you insert rows wherever you want. Make sure the ranges are correct (e.g. A2:A5, where 5 is the last row used). The formulas go in row 6, below the data. I would go for this:
In column B:
=SUMPRODUCT(($A$2:$A$5)*($A$2:$A$5=(B2:B5)))
In column C:
=SUMPRODUCT(($A$2:$A$5)*($A$2:$A$5=(B2:B5+C2:C5)))-B6
In column D:
=SUMPRODUCT(($A$2:$A$5)*($A$2:$A$5=(B2:B5+C2:C5+D2:D5)))-C6-B6
In column E:
=SUMPRODUCT(($A$2:$A$5)*($A$2:$A$5=(B2:B5+C2:C5+D2:D5+E2:E5)))-D6-C6-B6
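The rule in the question (add a task's allocated time to the first day on which its running total reaches it) can be written as a short loop. This Python sketch uses >= as the question describes, whereas the SUMPRODUCT answer tests for exact equality and subtracts the earlier days' results so that a finished task is not counted again; here the break does that job. The function name and input shape are assumptions.

```python
def completion_totals(tasks):
    """tasks: list of (allocated_hours, [hours per day]).
    Returns, per day, the sum of allocations for tasks completed that day."""
    n_days = max((len(daily) for _, daily in tasks), default=0)
    totals = [0] * n_days
    for allocated, daily in tasks:
        running = 0
        for day, hours in enumerate(daily):
            running += hours
            if running >= allocated:
                totals[day] += allocated
                break  # a task is completed only once, on its first qualifying day
    return totals

# Second example from the question: columns B..E for three tasks
print(completion_totals([(10, [9, 0, 1, 0]), (21, [8, 8, 5, 0]), (5, [0, 0, 3, 2])]))
```

This prints [0, 0, 31, 5] for days B to E, matching the expected row.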

Excel using INDEX and multiple MATCH

I am having trouble using the INDEX and MATCH functions in Excel. Say I have the following data:
A B C D
Year Month Site Count
2004 3 X1 54
2006 6 X3 10
2005 10 X5 15
and I want to arrange it like this:
E     F     G     H     I     J     K
Year  Month X1    X2    X3    X4    X5
2004  1
2004  2
2004  3     54
2004  4
2004  5
2004  6
...
2005  10                            15
...
2006  6                 10
I have the following formula (I want to match the Site, Year and Month):
=IFERROR(INDEX($D$2:$D$4,MATCH(G$1,$C$2:$C$4,0),MATCH($E2,$A$2:$A$4,0),MATCH($F2,$B$2:$B$4,0)),"")
and it seems to work fine for the first column (G) but when I autofill the rest of the columns (H:K) it doesn't work. Any ideas? Thanks.
I'd take a different approach from all these nested MATCHes: create a helper column of unique keys, and MATCH against that column to get the row number to feed into INDEX.
Insert 2 columns between columns D and E, so that the second 'Year' heading moves to column G (because I like some whitespace).
Paste this formula in E2 and copy it to E3:E4
=CONCATENATE(C2,"-",A2,"-",B2)
If it bothers you, hide Column E.
Then paste this formula in I2 and copy it across to column M and down through all the Year/Month rows:
=IFERROR(INDEX($D$2:$D$4,MATCH(CONCATENATE(I$1,"-",$G2,"-",$H2),$E$2:$E$4,0),0),"")
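The helper-key approach is essentially a dictionary lookup. This Python sketch (the names are hypothetical) builds the same Site-Year-Month keys and fills the grid, using "" where IFERROR would return "". The "-" separator keeps the parts distinguishable, so different combinations can't run together into the same key. Like MATCH against a column of keys, it assumes each key is unique.

```python
# (Year, Month, Site, Count) rows from the question
DATA = [(2004, 3, "X1", 54), (2006, 6, "X3", 10), (2005, 10, "X5", 15)]
SITES = ["X1", "X2", "X3", "X4", "X5"]

def make_key(site, year, month):
    """Like =CONCATENATE(C2,"-",A2,"-",B2) in the helper column."""
    return f"{site}-{year}-{month}"

def build_grid(rows, year_months, sites=SITES):
    lookup = {make_key(site, year, month): count for year, month, site, count in rows}
    # Like INDEX/MATCH on the key, with "" standing in for IFERROR(..., "")
    return [[year, month] + [lookup.get(make_key(s, year, month), "") for s in sites]
            for year, month in year_months]

for line in build_grid(DATA, [(2004, 3), (2004, 4), (2005, 10), (2006, 6)]):
    print(line)
```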
