Sumif with only first value of each group in column - excel

I have a dataset similar to this, but really extensive:
Row  Levels  Level 1  Size  Department
1    1       AA       2.0   Dept 1
2    2       AA       0.8   Dept 1
3    3       AA       1.5   Dept 1
4    2       BB       3.0   Dept 1
5    3       BB       2.0   Dept 1
6    3       BB       2.5   Dept 2
7    2       CC       5.0   Dept 2
8    3       CC       1.5   Dept 2
9    3       DD       0.5   Dept 2
10   3       DD       3.0   Dept 2
11   2       EE       4.0   Dept 2
12   3       EE       2.0   Dept 2
I need the total Size per Department, but I want to sum only the first match for each Level 1 value within that department, i.e.:
Department 1 would be 2.0 (row 1) + 3.0 (row 4) = 5.0
Department 2 would be 2.5 (row 6) + 5.0 (row 7) + 0.5 (row 9) + 4.0 (row 11) = 12.0
Does anyone have any idea how to accomplish this in Excel?

Alternate solution to the same formula:
=SUM(XLOOKUP(UNIQUE(FILTER(C:C,(ROW(C:C)>1)*(E:E=#$F$2#))&#$F$2#),C:C&E:E,D:D))
Where F2 holds =UNIQUE(FILTER(E:E,(ROW(E:E)>1)*(E:E<>"")))

If you have Excel 365, you could try something like this:
=LET(FilteredLevel,FILTER(C$2:C$13,E$2:E$13=H2),
SUM(XLOOKUP(UNIQUE(FilteredLevel),FilteredLevel,FILTER(D$2:D$13,E$2:E$13=H2))))
Note
You can also use full-column references if you wish
=LET(FilteredLevel,FILTER(C:C,E:E=H2),
SUM(XLOOKUP(UNIQUE(FilteredLevel),FilteredLevel,FILTER(D:D,E:E=H2))))

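SUMIFS() alone will not do what you want. Use SUMPRODUCT() with some Boolean logic (the formula below assumes Level 1 is in column B, Size in C, Department in D, and the department to total in F2):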
=SUMPRODUCT($C$2:$C$13*($D$2:$D$13=F2)*(COUNTIFS(OFFSET($B$2,0,0,ROW($B$2:$B$13)-1),$B$2:$B$13,OFFSET($D$2,0,0,ROW($B$2:$B$13)-1),F2)=1))
One note: OFFSET() is a volatile function, so this formula recalculates with every change made in Excel. If there are too many such formulas, Excel becomes noticeably less responsive.
To avoid the volatility, use a helper column. In E2 put:
=COUNTIFS($D$2:D2,D2,$B$2:B2,B2)=1
And copy down. Then we can use SUMIFS():
=SUMIFS(C:C,D:D,F2,E:E,TRUE)
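
To sanity-check the expected totals outside Excel, the "first row per Department and Level 1" rule can be sketched in plain Python. This is only an illustration of the same logic as the helper column above, not part of any answer; the `rows` list is typed in by hand from the sample data and the variable names are made up for the example.

```python
# (Level 1, Size, Department) for the 12 sample rows, in sheet order
rows = [
    ("AA", 2.0, "Dept 1"), ("AA", 0.8, "Dept 1"), ("AA", 1.5, "Dept 1"),
    ("BB", 3.0, "Dept 1"), ("BB", 2.0, "Dept 1"), ("BB", 2.5, "Dept 2"),
    ("CC", 5.0, "Dept 2"), ("CC", 1.5, "Dept 2"), ("DD", 0.5, "Dept 2"),
    ("DD", 3.0, "Dept 2"), ("EE", 4.0, "Dept 2"), ("EE", 2.0, "Dept 2"),
]

seen = set()   # (Department, Level 1) pairs already counted
totals = {}    # Department -> sum of first-match sizes

for level1, size, dept in rows:
    key = (dept, level1)
    if key in seen:
        continue  # not the first row for this pair, skip it
    seen.add(key)
    totals[dept] = totals.get(dept, 0.0) + size

print(totals)
```

The `seen` set plays the role of the helper column: a row counts only the first time its Department/Level 1 pair appears.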

Related

Excel Lag Two Group

STUDENT  TIME  CLASS  SCORE  WANT
1        1     A      13     NULL
1        1     B      4      NULL
1        2     A      11     -2
1        2     B      9      5
1        3     A      8      -3
2        2     B      16     NULL
2        3     B      6      -10
2        4     A      7      NULL
2        4     B      6      0
I have an XLSX file with STUDENT, TIME, CLASS and SCORE columns. I want to calculate WANT as follows:
For every STUDENT and CLASS, calculate the difference in SCORE between TIME(X) and TIME(X-1).
For example, for STUDENT=1, TIME=2, CLASS=B, WANT equals 5 because it is 9-4.
I tried this with no success:
=IF(A3=A2 & C3=C2, OFFSET(D3, -1, 0), "")
I think you can try:
Formula in E2:
=IF(COUNTIFS(A$2:A2,A2,C$2:C2,C2)>1,D2-SUMIFS(D:D,A:A,A2,B:B,B2-1,C:C,C2),"Null")
It is far from the best approach, but it works.
If a helper column is not a problem, you can add a column for VLOOKUP (see column "Helper1") with the formula =TEXTJOIN("-",,A2:C2).
Now use VLOOKUP to find the value TEXTJOIN("-",,A2,B2-1,C2) in that column. With Helper1 in column D and SCORE in column E, the formula in the "WANT" column is: =IFNA(E2-VLOOKUP(TEXTJOIN("-",,A2,B2-1,C2),$D$2:$E$10,2,FALSE),"NULL")
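
As a cross-check of the expected WANT column, the same "look up the row with TIME-1 for the same STUDENT and CLASS" idea can be sketched in plain Python. This is only an illustration with the sample data typed in by hand; the names `records`, `scores` and `want` are made up for the example, and None stands in for NULL.

```python
# (STUDENT, TIME, CLASS, SCORE) for the sample rows
records = [
    (1, 1, "A", 13), (1, 1, "B", 4), (1, 2, "A", 11),
    (1, 2, "B", 9), (1, 3, "A", 8), (2, 2, "B", 16),
    (2, 3, "B", 6), (2, 4, "A", 7), (2, 4, "B", 6),
]

# Composite key -> SCORE, the counterpart of the helper column
scores = {(s, t, c): score for s, t, c, score in records}

# Difference to the previous TIME, or None when there is no such row
want = [
    score - scores[(s, t - 1, c)] if (s, t - 1, c) in scores else None
    for s, t, c, score in records
]

print(want)
```

Like the VLOOKUP version, this only finds a previous row when TIME-1 exists exactly, so a gap in TIME yields NULL.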

Deleting a row with VBA in Excel gives invalid reference in another table

I have an Excel Workbook with 2 sheets. Each sheet contains a formatted table.
The tables start at row 2 and have an internal numbering in column A produced by the formulas
=ROW()-ROW(table1)+1 and =ROW()-ROW(table2)+1
sheet2:
In table2/columnB I concatenate 2 of the internal row indices from table1 using the formulas
table2/internal row1: =CONCATENATE(sheet1!A2;";";sheet1!A3;";")
table2/internal row2: =CONCATENATE(sheet1!A4;";";sheet1!A5;";")
table2/internal row3: =CONCATENATE(sheet1!A6;";";sheet1!A7;";")
table2/internal row4: (...)
It looks like this:
sheet1/table1:        sheet2/table2:
    A    B    C           A    B      C
1                     1
2   1                 2   1    1;2
3   2                 3   2    3;4
4   3                 4   3    5;6
5   4                 5
6   5                 6
7   6                 7
8                     8
9                     9
If I "manually" delete the 2 rows with internal indices 3 and 4 in table1, the numbering automatically adjusts. Also, the references in internal row2/columnB of table2 become invalid, which makes total sense. The formula of internal row3 in table2 automatically adjusts to the deletion in table1 and remains valid.
It then looks like this:
sheet1/table1:        sheet2/table2:
    A    B    C           A    B        C
1                     1
2   1                 2   1    1;2
3   2                 3   2    invalid
4   3                 4   3    3;4
5   4                 5   4
6                     6   5
7                     7   6
8                     8
9                     9
Now comes the issue:
If I do the deletion described above using a VBA macro, the references in internal row3 of table2 become invalid as well! When I check the formula I see that it still references cells A6 and A7 which don't exist anymore in table1.
I used the code
sheet1.ListObjects("table1").DataBodyRange.Rows(4).Delete Shift:=xlUp
sheet1.ListObjects("table1").DataBodyRange.Rows(3).Delete Shift:=xlUp
It looks like this:
sheet1/table1:        sheet2/table2:
    A    B    C           A    B        C
1                     1
2   1                 2   1    1;2
3   2                 3   2    invalid
4   3                 4   3    invalid
5   4                 5   4
6                     6   5
7                     7   6
8                     8
9                     9
What is the explanation for the different behaviour of deleting a row in table1 manually or by VBA code?
I also recorded macros when deleting the rows manually and I could not detect anything that explains why my macro code leads to the invalid references in internal row3 of table2.

How to map/pull a column from one sheet onto repeating values in another sheet in Excel?

I have an Excel sheet with repeating ids:
id jun19
1 3
2 2
3 7
1 3
2 2
3 7
1 3
2 2
3 7
I want to append another column, 'jul19', from another sheet.
That jul19 sheet has all of these ids and more:
id jul19
1 4
2 6
3 45
4 7
5 9
It should take only the ids that appear in the first sheet and pull their values from column 'jul19'.
The end result should be this:
id jun19 jul19
1 3 4
2 2 6
3 7 45
1 3 4
2 2 6
3 7 45
1 3 4
2 2 6
3 7 45
How can I do this, i.e. pull the corresponding values from column "jul19" based on the id?
I tried to do this in pandas, but failed.
Assuming table1 is in A1:B10, table2 is in D1:E6, and table3 (the result) is in G1:I10, put
=INDEX(E:E,MATCH(G2,D:D,0)) in I2
and drag it down. Ref: https://exceljet.net/index-and-match
Hope it helps.
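
Since the question mentions pandas, here is a minimal sketch of the same lookup there. It assumes the two sheets have already been loaded into DataFrames; the names `jun` and `jul` are made up for the example, and the data is typed in by hand from the question.

```python
import pandas as pd

# Sheet with repeating ids
jun = pd.DataFrame({"id": [1, 2, 3] * 3, "jun19": [3, 2, 7] * 3})

# Sheet with one row per id (including ids not present in jun)
jul = pd.DataFrame({"id": [1, 2, 3, 4, 5], "jul19": [4, 6, 45, 7, 9]})

# An id-indexed Series works as a lookup table for Series.map,
# much like INDEX/MATCH: each id in jun is replaced by its jul19 value.
jun["jul19"] = jun["id"].map(jul.set_index("id")["jul19"])

print(jun)
```

Ids 4 and 5 are simply never looked up, so they do not appear in the result. An id missing from the jul19 sheet would come back as NaN.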

excel - sumifs formula with range on criteria

I have this data:
A B C D
1 lvl1 lvl2 lvl3 value
2 1 1.1 1.1.1 3
3 1 1.1 1.1.2 2
4 1 1.2 1.1.3 7
5 2 2.1 2.1.1 2
6 2 2.1 2.1.2 3
and I want the output of the formula to look like this:
7 Type LEVEL value
8 1 level1 12
9 1.1 level2 5
10 1.2 level2 7
11 2 level1 5
12 2.1.1 level3 2
I have already implemented this with SUMIFS (because I have more than one criterion in the original case) by putting nested IFs in the "criteria range":
sumifs(D2:D6,IF(B8="level1",A2:A6),IF(B8="level2",B2:B6),IF(B8="level3",C2:C6))))
Is there any other way (perhaps with index & match?) to have the same result?
If your LEVEL values exactly match your column headers (so lvl1 instead of level1), you can use this formula in cell C8 and copy it down:
=SUMIFS($D$2:$D$6,INDEX($A$2:$C$6,0,MATCH(B8,$A$1:$C$1,0)),A8)
Put these formulas into H2:I2 (they assume the Type values are in column G, starting at G2).
="level"&LEN(G2)-LEN(SUBSTITUTE(G2, ".", ""))+1
=SUMIFS(D:D, INDEX(A:C, 0, LEN(G2)-LEN(SUBSTITUTE(G2, ".", ""))+1), G2)
Fill down.
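
The dot-counting trick in the last formula (number of dots plus one gives the level column) can be checked with a short Python sketch. This is only an illustration on the sample data, typed in by hand; `data` and `level_total` are made-up names.

```python
# (lvl1, lvl2, lvl3, value) rows from the question
data = [
    ("1", "1.1", "1.1.1", 3),
    ("1", "1.1", "1.1.2", 2),
    ("1", "1.2", "1.1.3", 7),
    ("2", "2.1", "2.1.1", 2),
    ("2", "2.1", "2.1.2", 3),
]


def level_total(key):
    """Sum the values whose level column (chosen by dot count) equals key."""
    col = key.count(".")  # 0 dots -> lvl1, 1 dot -> lvl2, 2 dots -> lvl3
    return sum(row[3] for row in data if row[col] == key)


for key in ["1", "1.1", "1.2", "2", "2.1.1"]:
    print(key, "level" + str(key.count(".") + 1), level_total(key))
```

The index `key.count(".")` corresponds to the column number passed to INDEX, minus one because Python indexes from zero.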

Pandas dependent columns lookup

I have a dataset that has 2 conditions, 2 replicates and samples with corresponding values (amounts). I read this into a pandas dataframe:
condition replicate sample amount
0 1 1 a1 5
1 1 1 a2 2
2 1 2 a1 3
3 1 2 a2 1
4 2 1 b99 7
5 2 1 a2 4
6 2 2 a1 3
7 2 2 a2 2
I want to divide the amount from every sample in condition 1, by the amount from the corresponding sample in condition 2, if they belong to the same replicate (and have the same sample name).
In other words, I want to find the ratio between the amounts where the sample names and replicate numbers match between the conditions.
In this example, the output should be something like:
replicate sample amount
0 1 a1 0.714286
1 1 a2 NaN
2 2 a1 1.000000
3 2 a2 0.500000
I need advice if I should structure my data differently and if it is a good idea to go for pandas dataframes? Can anyone think of an elegant lookup solution?
You can use unstack to turn the conditions into columns, then divide the columns, and finally remove the all-NaN rows with dropna:
df = df.set_index(['sample','replicate','condition'])['amount'].unstack()
df['new'] = df[1].div(df[2])
df = df['new'].unstack().dropna(how='all').stack(dropna=False).reset_index(name='amount')
print (df)
sample replicate amount
0 a1 1 NaN
1 a1 2 1.0
2 a2 1 0.5
3 a2 2 0.5
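
An alternative sketch (not the method from the answer above) is to split the frame by condition and merge the two halves on replicate and sample. It assumes there are exactly two conditions, labelled 1 and 2; the names `cond1`, `cond2`, `merged` and `ratio` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "condition": [1, 1, 1, 1, 2, 2, 2, 2],
    "replicate": [1, 1, 2, 2, 1, 1, 2, 2],
    "sample": ["a1", "a2", "a1", "a2", "b99", "a2", "a1", "a2"],
    "amount": [5, 2, 3, 1, 7, 4, 3, 2],
})

cond1 = df[df["condition"] == 1]
cond2 = df[df["condition"] == 2]

# A left merge keeps every condition-1 row; rows without a matching
# (replicate, sample) pair in condition 2 get NaN for amount_2.
merged = cond1.merge(
    cond2, on=["replicate", "sample"], how="left", suffixes=("_1", "_2")
)
merged["ratio"] = merged["amount_1"] / merged["amount_2"]

result = merged[["replicate", "sample", "ratio"]]
print(result)
```

With the sample data, a1 in replicate 1 has no partner in condition 2 (that row is b99), so its ratio is NaN, which agrees with the output of the unstack approach.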
