Modify Dataframe based on Priority - python-3.x

I have a df like this:
ID   A1   A2   A3   A4   A5
1    1         2         3
2    1    2         3
3    2    1
4    3         1         2
5
For every ID I have five columns, A1 to A5 (in reality many more), and the values give the priority rank of each column for that ID, with 1 being the top priority.
For example, ID 1 has A1, A3 and A5 as priorities, ID 3 has only two (A2 and A1), and ID 5 has no priorities.
Resultant DF
ID Priority_1 Priority_2 Priority_3
1 A1 A3 A5
2 A1 A2 A4
3 A2 A1
4 A3 A5 A1
5
I have tried to do this with melt and pivot, following several related answers, but I cannot get exactly this resultant df.
Any help would be appreciated, and I can clarify anything that is unclear.

Use DataFrame.melt, sort with DataFrame.sort_values and remove the missing rows with DataFrame.dropna. Then add a counter column per ID and keep only the top N rows by boolean indexing with Series.le (less than or equal). Finally, use DataFrame.pivot with DataFrame.add_prefix, and DataFrame.reindex to add back the IDs that have no priorities:
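N = 3
# long format: one row per (ID, column), ordered by priority value, blanks dropped
df1 = df.melt('ID').sort_values(['ID','value']).dropna(subset=['value'])
# rank within each ID: 1, 2, 3, ...
df1['new'] = df1.groupby('ID').cumcount().add(1)
# keep only the top N priorities
df1 = df1[df1['new'].le(N)]
# DataFrame.pivot requires keyword arguments in pandas 2.0+
df2 = df1.pivot(index='ID', columns='new', values='variable').add_prefix('Priority_').reindex(df['ID'])
print(df2)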
new Priority_1 Priority_2 Priority_3
ID
1 A1 A3 A5
2 A1 A2 A4
3 A2 A1 NaN
4 A3 A5 A1
5 NaN NaN NaN
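
For anyone who wants to run this end to end, here is a self-contained sketch of the same melt/pivot approach. The sample frame is rebuilt from the question on the assumption that the blank cells are NaN, and the function name top_priorities is only an illustrative helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "A1": [1, 1, 2, 3, np.nan],
    "A2": [np.nan, 2, 1, np.nan, np.nan],
    "A3": [2, np.nan, np.nan, 1, np.nan],
    "A4": [np.nan, 3, np.nan, np.nan, np.nan],
    "A5": [3, np.nan, np.nan, 2, np.nan],
})


def top_priorities(frame, n=3):
    """Hypothetical helper: return the n highest-priority column names per ID."""
    # Long format: one row per (ID, column) pair that actually has a value.
    long = (frame.melt("ID")
                 .dropna(subset=["value"])
                 .sort_values(["ID", "value"]))
    # Position of each row within its ID, starting at 1.
    long["new"] = long.groupby("ID").cumcount() + 1
    long = long[long["new"] <= n]
    # Back to wide format; the cell values are the original column names.
    wide = long.pivot(index="ID", columns="new", values="variable")
    # reindex restores IDs that had no priorities at all (all-NaN rows).
    return wide.add_prefix("Priority_").reindex(frame["ID"])


result = top_priorities(df)
print(result)
```

Changing n gives more or fewer Priority_ columns. Ties in the priority values are not handled specially here.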

Related

Which is the best way to move cell values to column based on rows in Excel?

I have values in an Excel sheet in the following format:
code   Warehouse   Quantity
A5     G1          3
A2     G1          4
A2     G2          60
A3     G2          20
How can I move the quantities from the above rows to warehouse columns like this?
code   G1   G2
A5     3    0
A2     4    60
A3     0    20

Getting all columns of a Dataframe after using 'groupby' method

Not able to fetch all the columns of the Dataframe after applying groupby method of Pandas
I have a sample Dataframe as below.
col1 col2 day col4
0 a1 b1 monday c1
1 a2 b2 tuesday c2
2 a3 b3 wednesday c3
3 a1 b1 monday c5
Here 'a1 b1 monday' are repeated twice. So after groupby the output should be:
col1 col2 day col4 count
a1 b1 monday c1 2
a2 b2 tuesday c2 1
a3 b3 wednesday c3 1
I tried using df.groupby(['col1','day'],sort=False).size().reset_index(name='Count')
and
df.groupby(['col1','day']).transform('count')
and the output is always
col1 day count
a1 monday 2
a2 tuesday 1
a3 wednesday 1
whereas my original data has 14 columns, and it does not make sense to list all the column names in the groupby statement. Is there a better, more Pythonic way to achieve this?
First groupby with transform to make your count column.
Then use drop_duplicates to remove duplicate rows:
df['count'] = df.groupby(['col1','day'],sort=False)['col1'].transform('size')
df.drop_duplicates(['col1', 'day'], inplace=True)
print(df)
col1 col2 day col4 count
0 a1 b1 monday c1 2
1 a2 b2 tuesday c2 1
2 a3 b3 wednesday c3 1
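
A runnable sketch of the same idea on the sample data from the question, written without inplace so the original frame is left untouched. The variable names keys, counted and deduped are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["a1", "a2", "a3", "a1"],
    "col2": ["b1", "b2", "b3", "b1"],
    "day": ["monday", "tuesday", "wednesday", "monday"],
    "col4": ["c1", "c2", "c3", "c5"],
})

keys = ["col1", "day"]

# transform('size') returns one value per original row, so every other
# column survives; no need to list all 14 columns in the groupby.
counted = df.assign(count=df.groupby(keys, sort=False)["col1"].transform("size"))

# Keep the first row of each (col1, day) group.
deduped = counted.drop_duplicates(keys).reset_index(drop=True)
print(deduped)
```

Note that drop_duplicates keeps the first row of each group, which is why col4 shows c1 rather than c5 for the repeated a1/monday pair.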

how to count # of records depending on two conditions in excel

In other words: if one cell contains a given text and another cell contains another value, then count the record.
Let's say we have some data in table1 in sheet1:
C1 C2 C3 C4
ST 1 2 3
SR 3 2 1
CE 6 4 3
ST 1 9 3
ST 1 4 3
I want to count the records having ST in C1 and 3 in C4 and save the result in a cell.
Thank you.
Put this in an unused cell,
=COUNTIFS(A:A, "ST", D:D, 3)

Get the count of not null values before the cell

As shown in the image, in Col2 I need the count of non-null values in Col1 up to and including the current row.
For cell B2 there is only one value A hence 1.
For cell B4 it should be 2 as there are 2 values A & C.
Same way for B5, 3 (A,C,D)
Data:
A B
1 Col1 Col2
2 A 1
3
4 C 2
5 D 3
6
7 F 4
I have tried:
B1 Cell = COUNTA(A2:A2)
B2 Cell = COUNTA(A2:A3)
B3 Cell = COUNTA(A2:A4)
However, I cannot drag this formula down, because the start of the range changes along with the rest of the cell reference.
Can anyone suggest a single formula that can be applied to all the cells throughout the column?
Try this:
=IF(A2<>"",COUNTA($A$2:A2),"")
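
The formula works because $A$2 stays anchored while the end of the range moves as it is filled down, so each row counts the non-empty cells from the top to itself. As a rough cross-check of that logic outside Excel, here is a small pandas sketch. The series names col1 and col2 are made up for the example, and blank rows come out as NaN rather than an empty string.

```python
import pandas as pd

# Col1 from the question; None stands in for the empty cells.
col1 = pd.Series(["A", None, "C", "D", None, "F"])

# Running count of non-empty cells from the top down to each row,
# the same quantity as COUNTA($A$2:A2).
running = col1.notna().cumsum()

# Blank rows get no count, like the IF(A2<>"", ..., "") wrapper.
col2 = running.where(col1.notna())
print(pd.DataFrame({"Col1": col1, "Col2": col2}))
```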

Match two column values from 2 data sets, then find associated values

The issue I'm having was hard to title, and hard to search as well.
Here's some example data.
A B C D E F
B1 04/14/16 746 B1 04/25/16 2
B1 04/15/16 180 B1 04/30/16 4
B1 04/16/16 494 B1 05/01/16 5
B1 04/17/16 726 B2 04/01/16 1
B1 04/18/16 206 B2 04/03/16 1
B1 04/19/16 22 B2 04/04/16 2
B1 04/20/16 193 B2 04/05/16 2
B1 04/21/16 739 B2 04/12/16 8
B1 04/22/16 926 B2 04/13/16 1
B1 04/23/16 748 B2 04/14/16 2
B1 04/24/16 830 B2 04/15/16 1
B1 04/25/16 272 B2 04/18/16 9
B1 04/26/16 0 B2 04/19/16 1
B1 04/27/16 0 B2 04/26/16 9
B1 04/28/16 0 B2 04/27/16 3
B1 04/29/16 0 B2 04/30/16 1
B1 04/30/16 685 B2 05/02/16 5
B1 05/01/16 770 B2 05/03/16 2
B1 05/02/16 701 B3 04/03/16 3
B1 05/03/16 181 B3 04/04/16 1
B2 04/01/16 77 B3 04/06/16 2
B2 04/02/16 182 B3 04/07/16 1
B2 04/03/16 53 B3 04/09/16 1
B2 04/04/16 32 B3 04/16/16 7
What I'm trying to do is find the rows where column A matches column D and column B matches column E. Where both match, I would like to divide column F by column C.
If there is no match for a pair of A and B values, I would like those values returned with a zero.
So for a match:
B1 04/25/16 =2/272
For a non-match:
B1 04/14/16 0
Thank you.
Two INDEX/MATCH lookups will do it, with the name to look up in I2 and the date in J2:
=IFERROR(INDEX($F$1:$F$24,MATCH(1,INDEX(($E$1:$E$24=J2)*($D$1:$D$24=I2),),0))/INDEX($C$1:$C$24,MATCH(1,INDEX(($B$1:$B$24=J2)*($A$1:$A$24=I2),),0)),0)
This is an array-type formula. Full column references should be avoided, because every cell in the referenced ranges is evaluated and calculation time grows quickly with the range size.
If a more dynamic range is wanted then use this formula:
=IFERROR(INDEX($F$1:INDEX(F:F,MATCH(1E+99,F:F)),MATCH(1,INDEX(($E$1:INDEX(E:E,MATCH(1E+99,F:F))=J2)*($D$1:INDEX(D:D,MATCH(1E+99,F:F))=I2),),0))/INDEX($C$1:INDEX(C:C,MATCH(1E+99,C:C)),MATCH(1,INDEX(($B$1:INDEX(B:B,MATCH(1E+99,C:C))=J2)*($A$1:INDEX(A:A,MATCH(1E+99,C:C))=I2),),0)),0)
This will find the last cell with data and use that to set the extents of the range. So now as the data grows or shrinks it will only look at the data and not iterate through any more or any less than what is needed to cover the entire data set.
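
For comparison, the same two-key match can be sketched in pandas as a left merge. This is only an analogue of the formulas above, not a replacement for them. The column names name, date, total and part are invented for the example, and the rows are a small subset of the sample data.

```python
import numpy as np
import pandas as pd

# Columns A:C of the sample (subset); column names are hypothetical.
left = pd.DataFrame({
    "name": ["B1", "B1", "B1"],
    "date": ["04/14/16", "04/25/16", "04/30/16"],
    "total": [746, 272, 685],
})

# Columns D:F of the sample (subset).
right = pd.DataFrame({
    "name": ["B1", "B1", "B2"],
    "date": ["04/25/16", "04/30/16", "04/01/16"],
    "part": [2, 4, 1],
})

# Left merge on both keys keeps every row of `left`; rows without a
# match get NaN in `part`.
merged = left.merge(right, on=["name", "date"], how="left")

# F / C, with non-matches and divisions by zero turned into 0,
# mirroring the IFERROR(..., 0) wrapper.
ratio = merged["part"] / merged["total"]
merged["ratio"] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)
print(merged[["name", "date", "ratio"]])
```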

Resources