Not able to fetch all the columns of the Dataframe after applying groupby method of Pandas
I have a sample Dataframe as below.
col1 col2 day col4
0 a1 b1 monday c1
1 a2 b2 tuesday c2
2 a3 b3 wednesday c3
3 a1 b1 monday c5
Here 'a1 b1 monday' appears twice, so after the groupby the output should be:
col1 col2 day col4 count
a1 b1 monday c1 2
a2 b2 tuesday c2 1
a3 b3 wednesday c3 1
I tried using df.groupby(['col1','day'],sort=False).size().reset_index(name='Count')
and
df.groupby(['col1','day']).transform('count')
and the output is always
col1 day count
a1 monday 2
a2 tuesday 1
a3 wednesday 1
whereas my original data has 14 columns, and it does not make sense to list every column name in the groupby statement. Is there a better, more Pythonic way to achieve this?
First groupby with transform to make your count column.
Then use drop_duplicates to remove duplicate rows:
df['count'] = df.groupby(['col1','day'],sort=False)['col1'].transform('size')
df.drop_duplicates(['col1', 'day'], inplace=True)
print(df)
col1 col2 day col4 count
0 a1 b1 monday c1 2
1 a2 b2 tuesday c2 1
2 a3 b3 wednesday c3 1
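For reference, the same idea can be written without modifying df in place. This is only a sketch built on the sample data from the question; the names keys and out are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["a1", "a2", "a3", "a1"],
    "col2": ["b1", "b2", "b3", "b1"],
    "day": ["monday", "tuesday", "wednesday", "monday"],
    "col4": ["c1", "c2", "c3", "c5"],
})

keys = ["col1", "day"]  # hypothetical name for the grouping columns

out = (
    # transform("size") returns the group size aligned to every original row
    df.assign(count=df.groupby(keys, sort=False)[keys[0]].transform("size"))
      # keep only the first row of each group; all other columns survive
      .drop_duplicates(keys)
      .reset_index(drop=True)
)
print(out)
```

Because only the key columns are named, this should scale to the 14-column case without listing the remaining columns.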
So, I have a pandas data frame:
df =
a b c
a1 b1 c1
a2 b2 c1
a2 b3 c2
a2 b4 c2
I want to rename a2 to a1, then group by a and c and add up the corresponding values of b:
df =
a b c
a1 b1+b2 c1
a1 b3+b4 c2
So, with numeric values in place of b, something like this:
df =
a value c
a1 10 c1
a2 20 c1
a2 50 c2
a2 60 c2
should become
df =
a value c
a1 30 c1
a1 110 c2
How to do this?
What about
>>> res = df.replace({"a": {"a2": "a1"}}).groupby(["a", "c"], as_index=False).sum()
>>> res
a c value
0 a1 c1 30
1 a1 c2 110
This first replaces "a2" with "a1" in column a only, then groups by a and c and sums.
To get the original column order back, we can reindex:
>>> res.reindex(df.columns, axis=1)
a value c
0 a1 30 c1
1 a1 110 c2
Try this:
df.groupby([df['a'].replace({'a2':'a1'}),'c'])['value'].sum().reset_index()
I have values in an Excel sheet in the following format:
code Warehouse Quantity
A5 G1 3
A2 G1 4
A2 G2 60
A3 G2 20
How can I move the quantities from the above rows to warehouse columns like this
code G1 G2
A5 3 0
A2 4 60
A3 0 20
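If the sheet can be loaded into pandas (for example with pandas.read_excel), one possible approach is a pivot table. This is a sketch that swaps in pandas rather than an Excel-native solution; the frame below just retypes the sample rows, and the name wide is illustrative.

```python
import pandas as pd

# Sample rows retyped from the question; in practice they might come
# from pd.read_excel(...) on the real workbook.
df = pd.DataFrame({
    "code": ["A5", "A2", "A2", "A3"],
    "Warehouse": ["G1", "G1", "G2", "G2"],
    "Quantity": [3, 4, 60, 20],
})

wide = (
    # one row per code, one column per warehouse, missing pairs become 0
    df.pivot_table(index="code", columns="Warehouse", values="Quantity",
                   aggfunc="sum", fill_value=0)
      # pivot_table sorts the index; restore the order of first appearance
      .reindex(df["code"].unique())
)
print(wide)
```

Using aggfunc="sum" is an assumption: it only matters if the same code and warehouse pair appears more than once.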
I have a df like this:
ID  A1  A2  A3  A4  A5
1   1       2       3
2   1   2       3
3   2   1
4   3       1       2
5
For every ID I have 5 columns, A1 to A5 (in reality I have many more), and the values give the priority rank of each column for that ID.
For example: ID 1 has A1, A3 and A5 as priorities, ID 3 has only two, A2 and A1, and ID 5 has no priorities.
Resultant DF
ID Priority_1 Priority_2 Priority_3
1 A1 A3 A5
2 A1 A2 A4
3 A2 A1
4 A3 A5 A1
5
I have tried to do this with melt and pivot, following several related answers, but I cannot get exactly this resultant df.
Any help would be appreciated, and I can clarify anything that is unclear.
Use DataFrame.melt, sort with DataFrame.sort_values and remove the missing rows with DataFrame.dropna. Then add a counter column per ID and keep only the first N rows with boolean indexing and Series.le (less than or equal). Finally, reshape with DataFrame.pivot, add the column prefix with DataFrame.add_prefix, and use DataFrame.reindex to add back the IDs that have no priorities:
N = 3
df1 = df.melt('ID').sort_values(['ID','value']).dropna(subset=['value'])
df1['new'] = df1.groupby('ID').cumcount().add(1)
df1 = df1[df1['new'].le(N)]
df2 = df1.pivot(index='ID', columns='new', values='variable').add_prefix('Priority_').reindex(df['ID'])
print (df2)
new Priority_1 Priority_2 Priority_3
ID
1 A1 A3 A5
2 A1 A2 A4
3 A2 A1 NaN
4 A3 A5 A1
5 NaN NaN NaN
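To try the steps above end to end, here is a sketch that builds the input frame by hand. The cell positions are inferred from the expected result in the question, so treat them as an assumption; the names long, rank and wide are illustrative.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the question; empty cells become NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "A1": [1, 1, 2, 3, np.nan],
    "A2": [np.nan, 2, 1, np.nan, np.nan],
    "A3": [2, np.nan, np.nan, 1, np.nan],
    "A4": [np.nan, 3, np.nan, np.nan, np.nan],
    "A5": [3, np.nan, np.nan, 2, np.nan],
})

N = 3  # number of priority columns to keep

# Long format: one row per (ID, column) pair that actually has a rank.
long = df.melt("ID").dropna(subset=["value"]).sort_values(["ID", "value"])
# Position of each column within its ID, starting at 1.
long["rank"] = long.groupby("ID").cumcount().add(1)
long = long[long["rank"].le(N)]

wide = (
    long.pivot(index="ID", columns="rank", values="variable")
        .add_prefix("Priority_")
        # IDs without any priority were dropped by dropna; add them back
        .reindex(df["ID"])
)
print(wide)
```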
I have a spreadsheet of available samples across 45 boxes, arranged stacked with column headers from 1 to 10 and row headers from A to J. I'm looking for a way to fetch the box, row and cell number if I lookup an ID (prefixed with B).
Sheet 1 is a list of animal IDs that I want to know if a sample is available for
Sheet 2:
Box 1
1 2 3 4 5 ... 10
A B43 B12 B3 B6 B103
B B13 B14 B78 B51 B63
C B78 B33 B99 B43 B92
...
J
Box 2
1 2 3 4 5 ... 10
A B2 B6
I have tried doing nested if functions by columns:
if(match(A2, Sheet2!$B$2:$B$521,0),"1",if(match(A2,Sheet2!$C$2:$C$521,0),"2","")
...but I've been getting #N/A if A2 is in column C.
I've resorted to re-labelling the left-most column to Box 1 A, Box 1 B, Box 1 C... so on, and doing:
=index(Sheet2!$A$2:$A$521,match(A2,Sheet2!$B$2:$B$521,0),0)
...and duplicating the function for Columns 1 to 10.
Sheet 1:
Animal ID Col1 Col2 Col3 Col4 ...
B12 . Box 1 A . .
B43 . . . Box 1 C
...
Is there an easier way to fetch the location of a sample from an array?
If your Sheet2 looks like this.
and you want the data like this in Sheet1.
Then enter this formula in Cell B2 of Sheet1 and drag it down and across.
=IFERROR(CONCATENATE(INDEX(Sheet2!$A:$A,(MATCH($A2,Sheet2!B:B,0))-MOD(MATCH($A2,Sheet2!B:B,0)-1,12),)," - ",INDEX(Sheet2!$A:$A,MATCH($A2,Sheet2!B:B,0))),".")
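The row arithmetic inside the formula can be checked in isolation. The sketch below assumes what the constant 12 in the formula implies: each box occupies a block of 12 rows in Sheet2, with the first block starting at row 1 and the "Box n" label on the first row of its block. The function name box_header_row is hypothetical.

```python
BLOCK_ROWS = 12  # rows per box block, taken from the MOD(..., 12) term


def box_header_row(match_row: int, block_rows: int = BLOCK_ROWS) -> int:
    """Return the 1-based row holding the box label for a matched row.

    Mirrors MATCH(...) - MOD(MATCH(...) - 1, 12) from the formula:
    subtracting the offset within the block lands on the block's first row.
    """
    return match_row - (match_row - 1) % block_rows


# Rows 1..12 belong to the first block, rows 13..24 to the second.
for row in (3, 12, 13, 15, 24):
    print(row, "->", box_header_row(row))
```

If the real sheet uses a different block height, only BLOCK_ROWS (and the 12 in the formula) would need to change.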
In Col2 I need the count of non-empty values in Col1 up to and including the current row.
For cell B2 there is only one value A hence 1.
For cell B4 it should be 2 as there are 2 values A & C.
Same way for B5, 3 (A,C,D)
Data:
A B
1 Col1 Col2
2 A 1
3
4 C 2
5 D 3
6
7 F 4
I have tried:
B2 Cell = COUNTA(A2:A2)
B3 Cell = COUNTA(A2:A3)
B4 Cell = COUNTA(A2:A4)
However, I cannot drag this formula down, as that changes the cell references.
Can anyone suggest a single formula that can be applied to every cell in the column?
Try this:
=IF(A2<>"",COUNTA($A$2:A2),"")
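The formula works because $A$2 anchors the start of the range while the end (A2) moves as it is filled down, giving a running count that is blanked on empty rows. For comparison, the same logic in pandas might look like the sketch below; this is an equivalent under the assumption that empty cells are read as missing values, not part of the original answer.

```python
import pandas as pd

# Col1 from the question; None stands for an empty cell.
col1 = pd.Series(["A", None, "C", "D", None, "F"], name="Col1")

filled = col1.notna()
# Running count of non-empty cells, kept only where the cell has a value
# (mirrors IF(A2<>"", COUNTA($A$2:A2), "")); empty rows become NaN.
col2 = filled.cumsum().where(filled)

print(pd.DataFrame({"Col1": col1, "Col2": col2}))
```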