Looking for the Max Sum, based on Criteria and Unique Values - excel

Col1 Col2 Col3
a 3 x
b 2 x
c 2 x
a 1 x
b 3 x
c 1 y
a 2 y
b 1 y
c 3 y
Using the table above, can anyone give me a formula to find:
The maximum, across the unique values in Col1, of the sum of Col2 where Col3 = x.
(The answer should be 5; for Col3 = y it would be 4.)

Create a PivotTable with Col3 as FILTERS (select x), Col1 for ROWS and Sum of Col2 for VALUES. Uncheck Show grand totals for Columns and then for whichever column contains Sum of Col2 take the maximum, say:
=MAX(F:F)

It's not ideal, but it works:
In column D, enter an array formula that sums Col2 for the rows sharing the same Col1 and Col3 values:
in D2: =MAX(IF($C$2:$C$10=C2,SUM(IF($A$2:$A$10=A2,IF($C$2:$C$10=C2,$B$2:$B$10)))))
Adjust the ranges to fit your data.
Then in E2 put this: =MAX(IF($C$2:$C$10=C2,$D$2:$D$10))
These are both array formulas, so after entering them you must press CTRL-SHIFT-ENTER, not just Enter.
Then drag both formulas down.
There may be a way to combine these into one formula, but my array formula knowledge is limited.
Here are the results:
Col1  Col2  Col3  D: sum per Col1 within Col3  E: max of D per Col3
a     3     x     4                            5
b     2     x     5                            5
c     2     x     2                            5
a     1     x     4                            5
b     3     x     5                            5
c     1     y     4                            4
a     2     y     2                            4
b     1     y     1                            4
c     3     y     4                            4
If you don't use CTRL-SHIFT-ENTER you will get 18 and 5 all the way down.
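As a cross-check on the expected answers (5 for x, 4 for y), the same "sum per Col1, then take the maximum" logic can be sketched outside Excel. This is only an illustrative plain-Python sketch; the function name max_sum_per_key is hypothetical and the rows are copied from the table in the question.

```python
from collections import defaultdict

# Rows copied from the question's table: (Col1, Col2, Col3)
rows = [
    ("a", 3, "x"), ("b", 2, "x"), ("c", 2, "x"),
    ("a", 1, "x"), ("b", 3, "x"), ("c", 1, "y"),
    ("a", 2, "y"), ("b", 1, "y"), ("c", 3, "y"),
]

def max_sum_per_key(rows, flag):
    """Sum Col2 per Col1 value for rows where Col3 == flag, then take the max."""
    sums = defaultdict(int)
    for col1, col2, col3 in rows:
        if col3 == flag:
            sums[col1] += col2
    return max(sums.values())

max_x = max_sum_per_key(rows, "x")
max_y = max_sum_per_key(rows, "y")
print(max_x, max_y)  # 5 4
```

The intermediate sums for x are a=4, b=5, c=2, which matches column D in the results table.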

Related

SUM based on list of categories

Consider the following Excel
    A    B  C   D
1   foo  7      whaa
2   bar  5  AA
3   baz  9  BB
4   bal  1  AA
5   oof  3      blah
6   aba  9  C
Extra:
Each row has either a value in column C OR in column D
The values in column C are categories (in this example AA, BB, C)
The values in column D can be anything
I need a SUM (based on column A) as follows:
SUM of column B for all rows that have any value in column D (called Rest)
SUM of column B for each category in column C. I have a list of the categories (see below)
So like this:
A B
1 Rest 10 <----- 7 + 3
2 AA 6 <----- 5 + 1
3 BB 9
4 C 9
What formulas do I need in column B above to get this result?
Alternatively, you can use SUMPRODUCT (adjust the ranges to your sheet):
H2=SUMPRODUCT(($D$4:$D$9=IF(G2="Rest","",G2))*$C$4:$C$9)
or the equivalent SUMIF:
H2=SUMIF($D$4:$D$9,IF(G2="Rest","",G2),$C$4:$C$9)
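To make the intended grouping explicit, here is a rough plain-Python sketch of the same totals. It assumes the layout described in the question (amount in column B, category in column C, free text in column D); the function name category_totals is made up for illustration.

```python
# Rows from the question: (A, B, C, D); an empty string stands for an empty cell.
rows = [
    ("foo", 7, "", "whaa"),
    ("bar", 5, "AA", ""),
    ("baz", 9, "BB", ""),
    ("bal", 1, "AA", ""),
    ("oof", 3, "", "blah"),
    ("aba", 9, "C", ""),
]

def category_totals(rows, categories):
    """Sum column B per category in C; rows with any value in D go to 'Rest'."""
    totals = {"Rest": 0}
    totals.update({category: 0 for category in categories})
    for _name, amount, category, other in rows:
        if other:  # any value in column D counts as "Rest"
            totals["Rest"] += amount
        elif category in totals:
            totals[category] += amount
    return totals

totals = category_totals(rows, ["AA", "BB", "C"])
print(totals)  # {'Rest': 10, 'AA': 6, 'BB': 9, 'C': 9}
```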

Grouping corresponding Rows based on One column

I have an Excel Sheet Dataframe with no fixed number of rows and columns.
eg.
Col1 Col2 Col3
A 1 -
A - 2
B 3 -
B - 4
C 5 -
I would like to group the rows that share the same value in Col1, like the following:
Col1 Col2 Col3
A 1 2
B 3 4
C 5 -
I am using pandas GroupBy, but not getting what I wanted.
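Try using groupby, treating '-' as missing and taking the first non-missing value per group:
import numpy as np  # pd.np is no longer available in recent pandas versions
print(df.replace('-', np.nan).groupby('Col1', as_index=False).first().fillna('-'))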
Output:
Col1 Col2 Col3
0 A 1 2
1 B 3 4
2 C 5 -
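The answer relies on GroupBy.first() returning the first non-null entry per column in each group. The plain-Python sketch below illustrates that idea only; it is not how pandas implements it, and the helper name collapse is hypothetical.

```python
# Rows from the question: (Col1, Col2, Col3), with "-" marking a missing value.
rows = [
    ("A", "1", "-"),
    ("A", "-", "2"),
    ("B", "3", "-"),
    ("B", "-", "4"),
    ("C", "5", "-"),
]

def collapse(rows, missing="-"):
    """Keep, for each key, the first non-missing value seen in every column."""
    merged = {}
    for key, *values in rows:
        current = merged.setdefault(key, [missing] * len(values))
        for i, value in enumerate(values):
            if current[i] == missing and value != missing:
                current[i] = value
    return [(key, *values) for key, values in merged.items()]

collapsed = collapse(rows)
print(collapsed)  # [('A', '1', '2'), ('B', '3', '4'), ('C', '5', '-')]
```

Note that this sketch keeps keys in order of first appearance, whereas pandas groupby sorts the keys by default; for this data the two orders coincide.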

how to sort a pandas dataframe according to elements of list [duplicate]

I have the following example of dataframe.
c1 c2
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
Given a template c1 = [3, 2, 5, 4, 1], I want to change the order of the rows based on the new order of column c1, so it will look like:
c1 c2
0 3 c
1 2 b
2 5 e
3 4 d
4 1 a
I found the following thread, but the shuffle there is random. Correct me if I'm wrong.
Shuffle DataFrame rows
If the values are unique both in the list and in the c1 column, use reindex:
df = df.set_index('c1').reindex(c1).reset_index()
print (df)
c1 c2
0 3 c
1 2 b
2 5 e
3 4 d
4 1 a
General solution working with duplicates in list and also in column:
c1 = [3, 2, 5, 4, 1, 3, 2, 3]
#create df from list
list_df = pd.DataFrame({'c1':c1})
print (list_df)
c1
0 3
1 2
2 5
3 4
4 1
5 3
6 2
7 3
# helper column that numbers repeated values within each c1 group
df['g'] = df.groupby('c1').cumcount()
list_df['g'] = list_df.groupby('c1').cumcount()
# merge on c1 and g, then remove the helper g column
df = list_df.merge(df).drop('g', axis=1)
print (df)
c1 c2
0 3 c
1 2 b
2 5 e
3 4 d
4 1 a
5 3 c
merge
You can create a dataframe with the column specified in the wanted order then merge.
One advantage of this approach is that it gracefully handles duplicates in either df.c1 or the list c1. If duplicates are not wanted, care must be taken to handle them before reordering.
d1 = pd.DataFrame({'c1': c1})
d1.merge(df)
c1 c2
0 3 c
1 2 b
2 5 e
3 4 d
4 1 a
searchsorted
This is less robust, but it will work if df.c1 is:
already sorted
a one-to-one mapping with the list c1
df.iloc[df.c1.searchsorted(c1)]
c1 c2
2 3 c
1 2 b
4 5 e
3 4 d
0 1 a
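To see why searchsorted needs a sorted, one-to-one column, the same lookup can be sketched with the standard library's bisect module, which also performs a binary search. This is an illustration under the question's data, not a replacement for the pandas call.

```python
from bisect import bisect_left

# Column values from the question's DataFrame; c1 is already sorted.
sorted_c1 = [1, 2, 3, 4, 5]
c2 = ["a", "b", "c", "d", "e"]
template = [3, 2, 5, 4, 1]

# Binary search gives the row position of each template value.
# This is only meaningful because sorted_c1 is sorted and every
# template value occurs exactly once in it.
positions = [bisect_left(sorted_c1, value) for value in template]
reordered = [(sorted_c1[p], c2[p]) for p in positions]

print(positions)  # [2, 1, 4, 3, 0]
print(reordered)  # [(3, 'c'), (2, 'b'), (5, 'e'), (4, 'd'), (1, 'a')]
```

The positions match the index shown in the output above (2, 1, 4, 3, 0). If a template value were absent from the column, the binary search would still return a position, just not a matching one, which is why this approach is called less robust.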

match multiple columns within the same row

Table 1. I have a table that looks like this:
X Y Z
1 a p
2 a p
6 b p
7 c p
9 c p
Table 2. I have a different table that looks like this:
       Col1  Col2  Col3  Col4
Row1         p     p     p
Row2         a     b     c
Row3   1
Row4   2
Row5   3
Row6   4
Row7   5
Row8   6
Row9   7
Row10  8
Row11  9
Row11 9
I want to mark "TRUE" in a cell of Table 2 when some row of Table 1 matches that cell's Col1 value together with its two column headers (Row1 and Row2). As a result, for example:
       Col1  Col2  Col3  Col4
Row1         p     p     p
Row2         a     b     c
Row3   1     TRUE
Row4   2     TRUE
Row5   3
Row6   4
Row7   5
Row8   6           TRUE
Row9   7                 TRUE
Row10  8
Row11  9                 TRUE
Here is what I have tried so far. This is the formula for Col2 Row3:
=IFERROR(IF(AND(AND(MATCH(Col1Row3,X:X,0), MATCH(Col2Row1,Z:Z,0)), MATCH(Col2Row2,Y:Y,0)), "TRUE", ""),"")
I think it's not working because I am not containing the matches within the same row. How can I achieve my result?
Also, I do not want to specify a specific row in the formula because I have thousands of rows in Table 1, and Table 2 has to select values among those thousands of rows.
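Use COUNTIFS, which tests all three conditions against the same row of Table 1. The formula below assumes Table 1 is in columns F:H and Table 2 starts in A1; enter it in B3 and fill across and down:
=IF(COUNTIFS($F:$F,$A3,$G:$G,B$2,$H:$H,B$1),TRUE,"")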
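The point of COUNTIFS here is that all conditions must hold in one and the same row of Table 1. A plain-Python way to picture this is a membership test on whole (X, Y, Z) rows; the sketch below uses the tables from the question and hypothetical variable names.

```python
# Table 1 rows as (X, Y, Z) tuples.
table1 = {(1, "a", "p"), (2, "a", "p"), (6, "b", "p"), (7, "c", "p"), (9, "c", "p")}

# Table 2 layout: two header rows and the Col1 values 1..9.
header_z = ["p", "p", "p"]   # Row1
header_y = ["a", "b", "c"]   # Row2
col1_values = list(range(1, 10))

# A cell is True only if its (Col1 value, Row2 header, Row1 header)
# combination exists as one complete row of Table 1.
grid = [
    [(x, y, z) in table1 for y, z in zip(header_y, header_z)]
    for x in col1_values
]

for x, flags in zip(col1_values, grid):
    print(x, ["TRUE" if flag else "" for flag in flags])
```

Checking each column separately, as the nested MATCH attempt does, would also accept combinations whose parts come from different rows of Table 1.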

iterate through rows and columns in excel using pandas-Python 3

I have an excel spreadsheet that I read with this code:
df=pd.ExcelFile('/Users/xxx/Documents/Python/table.xlsx')
ccg=df.parse("CCG")
With the sheet that I want inside the spreadsheet being CCG
The sheet looks like this:
col1 col2 col3
x a 1 2
x b 3 4
x c 5 6
x d 7 8
x a 9 10
x b 11 12
x c 13 14
y a 15 16
y b 17 18
y c 19 20
y d 21 22
y a 23 24
How would I write code that gets the values of col2 and col3 for rows that contain both a and x? The expected output for this table would be: col2=[1,9], col3=[2,10]
Try this:
df = pd.read_excel('/Users/xxx/Documents/Python/table.xlsx', 'CCG', index_col=0, usecols=['col1','col2']) \
.query("index == 'x' and col1 == 'a'")
Demo:
Excel file:
In [243]: fn = r'C:\Temp\.data\41718085.xlsx'
In [244]: pd.read_excel(fn, 'CCG', index_col=0, usecols=['col1','col2']) \
.query("index == 'x' and col1 == 'a'")
Out[244]:
col1 col2
x a 1
x a 9
You can do:
# the keyword is sheet_name in current pandas (older versions used sheetname)
df = pd.read_excel('/Users/xxx/Documents/Python/table.xlsx', sheet_name='CCG', index_col=0)
filtered = df[(df.index == 'x') & (df.col1 == 'a')]
Then from here, you can get the values as pandas Series (append .to_numpy() for a NumPy array) with:
filtered['col2']
filtered['col3']
I managed to solve it with a counter: it iterates over the rows, adding 1 for each row, and appends the current index to a list whenever the row contains a and falls within the range of rows that belong to x. Once I have the indices, I look them up in col2 and col3 and pull out the values.
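A loop-based approach like the one described above might look roughly as follows. This is a guess at its shape rather than the poster's actual code; the data is typed in by hand from the sheet shown in the question, and the variable names are illustrative.

```python
# Sheet rows as (index label, col1, col2, col3), copied from the question.
rows = [
    ("x", "a", 1, 2), ("x", "b", 3, 4), ("x", "c", 5, 6), ("x", "d", 7, 8),
    ("x", "a", 9, 10), ("x", "b", 11, 12), ("x", "c", 13, 14),
    ("y", "a", 15, 16), ("y", "b", 17, 18), ("y", "c", 19, 20),
    ("y", "d", 21, 22), ("y", "a", 23, 24),
]

# First pass: collect the positions of rows that contain both x and a.
indices = []
for position, (label, col1, _col2, _col3) in enumerate(rows):
    if label == "x" and col1 == "a":
        indices.append(position)

# Second pass: pull the col2 and col3 values at those positions.
col2_values = [rows[i][2] for i in indices]
col3_values = [rows[i][3] for i in indices]

print(indices)      # [0, 4]
print(col2_values)  # [1, 9]
print(col3_values)  # [2, 10]
```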
