Order the DataFrame by changing column values to new rows - python-3.x

I have the following dataframe:
 Time Image  Mean
    0    A1     1
    1    A1     2
    0    B1     3
    1    B1     4
I want to reshape it as follows (remove the Image column, turn the Image values into column headers, and fill in the Mean values):
 Time  A1  B1
    0   1   3
    1   2   4

Try:
print(
    df.pivot(index="Time", columns="Image", values="Mean")
    .reset_index()
    .rename_axis("", axis=1)
)
Prints:
   Time  A1  B1
0     0   1   3
1     1   2   4
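For reference, a self-contained sketch of the same call, with the sample frame rebuilt from the question. Note that if a Time/Image pair could repeat, pivot would raise a ValueError and pivot_table with an aggregation function would be needed instead:
import pandas as pd

df = pd.DataFrame({"Time": [0, 1, 0, 1],
                   "Image": ["A1", "A1", "B1", "B1"],
                   "Mean": [1, 2, 3, 4]})

# spread the Image values into columns, keeping Time as an ordinary column
out = (df.pivot(index="Time", columns="Image", values="Mean")
         .reset_index()
         .rename_axis("", axis=1))
print(out)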

Related

Ungroup pandas dataframe column values separated by comma

Hello, I have a pandas dataframe whose column values are comma-separated strings, and I want to ungroup (explode) it. The dataframe looks like this:
       col1       col2 name                  exams
0     0,0,0     0,0,0,   A1        exm1,exm2, exm3
1  0,1,0,20   0,0,2,20   A2  exm1,exm2, exm4, exm5
2  0,0,0,30  0,0,20,20   A3  exm1,exm2, exm3, exm5
The output I want:
 col1  col2 name  exam
    0     0   A1  exm1
    0     0   A1  exm2
    0     0   A1  exm3
    0     0   A2  exm1
    1     0   A2  exm2
    0     2   A2  exm4
   20    20   A2  exm5
..............
   30    20   A3  exm5
I tried the approach from "Split (explode) pandas dataframe string entry to separate rows" but could not work out the right approach. Could anyone please suggest how I can get my output?
Try with explode; note that explode is a new function added in pandas 0.25.0:
df[['col1', 'col2', 'exams']] = df[['col1', 'col2', 'exams']].apply(lambda x: x.str.split(','))
df = df.join(pd.concat([df.pop(x).explode() for x in ['col1', 'col2', 'exams']], axis=1))
Out[62]:
  name col1 col2 exams
0   A1    0    0  exm1
0   A1    0    0  exm2
0   A1    0    0  exm3
1   A2    0    0  exm1
1   A2    1    0  exm2
1   A2    0    2  exm4
1   A2   20   20  exm5
2   A3    0    0  exm1
2   A3    0    0  exm2
2   A3    0   20  exm3
2   A3   30   20  exm5
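If you are on pandas 1.3 or newer, DataFrame.explode also accepts a list of columns, so the two steps collapse into a sketch like this (it assumes every row's lists have matching lengths across the three columns):
cols = ['col1', 'col2', 'exams']
df[cols] = df[cols].apply(lambda s: s.str.split(','))
# explode all three list columns in lockstep (requires pandas >= 1.3)
df = df.explode(cols).reset_index(drop=True)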

Dataframe: find duplicate values in a column based on other columns, then add a label

Given the following data frame:
import pandas as pd
d = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 2, 2],
                  'values': ['a', 'b', 'a', 'a', 'a', 'a', 'b', 'b']})
d
   ID values
0   1      a
1   1      b
2   1      a
3   1      a
4   2      a
5   2      a
6   2      b
7   2      b
The data I want to get is:
   ID values  count label(values + ID)
0   1      a      3                a11
1   1      b      1                b11
2   1      a      3                a12
3   1      a      3                a13
4   2      a      2                a21
5   2      a      2                a22
6   2      b      2                b21
7   2      b      2                b22
Thank you so much!
It seems you need transform with count plus cumcount:
d['count'] = d.groupby(['ID', 'values'])['values'].transform('count')
d['label'] = d['values'] + d.ID.astype(str) + d.groupby(['ID', 'values']).cumcount().add(1).astype(str)
d
Out[511]:
   ID values  count label
0   1      a      3   a11
1   1      b      1   b11
2   1      a      3   a12
3   1      a      3   a13
4   2      a      2   a21
5   2      a      2   a22
6   2      b      2   b21
7   2      b      2   b22
You want to group by ID and values. Within each group, you are interested in two things: the number of members in the group (count) and the occurrence within the group (order):
df['order'] = df.groupby(['ID', 'values']).cumcount() + 1
df['count'] = df.groupby(['ID', 'values'])['values'].transform('count')
You can then build the label by converting these columns to strings and concatenating them with sum:
df['label'] = df[['values', 'ID', 'order']].astype(str).sum(axis=1)
Which leads to:
   ID values  order  count label
0   1      a      1      3   a11
1   1      b      1      1   b11
2   1      a      2      3   a12
3   1      a      3      3   a13
4   2      a      1      2   a21
5   2      a      2      2   a22
6   2      b      1      2   b21
7   2      b      2      2   b22
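The two answers can also be condensed into one pass that reuses a single groupby object; a small sketch of that idea:
g = df.groupby(['ID', 'values'])
df['count'] = g['values'].transform('count')
# cumcount() numbers the occurrences within each group starting at 0
df['label'] = df['values'] + df['ID'].astype(str) + (g.cumcount() + 1).astype(str)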

Pandas dependent columns lookup

I have a dataset that has 2 conditions, 2 replicates and samples with corresponding values (amounts). I read this into a pandas dataframe:
   condition  replicate sample  amount
0          1          1     a1       5
1          1          1     a2       2
2          1          2     a1       3
3          1          2     a2       1
4          2          1    b99       7
5          2          1     a2       4
6          2          2     a1       3
7          2          2     a2       2
I want to divide the amount from every sample in condition 1, by the amount from the corresponding sample in condition 2, if they belong to the same replicate (and have the same sample name).
In other words, I want to find the ratio between the amounts where the sample names and replicate numbers match between the conditions.
In this example, the output should be something like:
   replicate sample    amount
0          1     a1  0.714286
1          1     a2       NaN
2          2     a1  1.000000
3          2     a2  0.500000
I would also like advice on whether I should structure my data differently and whether pandas dataframes are a good fit. Can anyone think of an elegant lookup solution?
You can use unstack to spread the conditions into columns, then divide the columns, and finally remove all-NaN rows with dropna:
df = df.set_index(['sample','replicate','condition'])['amount'].unstack()
df['new'] = df[1].div(df[2])
df = df['new'].unstack().dropna(how='all').stack(dropna=False).reset_index(name='amount')
print (df)
  sample  replicate  amount
0     a1          1     NaN
1     a1          2     1.0
2     a2          1     0.5
3     a2          2     0.5
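An alternative sketch (not from the original answer, using the question's column names) is a self-merge on replicate and sample instead of reshaping; unmatched samples such as b99 simply end up with NaN:
c1 = df[df['condition'] == 1]
c2 = df[df['condition'] == 2]
# align condition-1 rows with their condition-2 counterparts
out = c1.merge(c2, on=['replicate', 'sample'], how='outer', suffixes=('_1', '_2'))
out['amount'] = out['amount_1'] / out['amount_2']
print(out[['replicate', 'sample', 'amount']])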

Index value of the last matching row in a Python pandas DataFrame

I have a dataframe with a column "C1" holding 0 or 1 and a column "C2" holding 0 or 1. I would like to append, as a new column, the index of the last row where C1 = 1, but only for rows where C2 = 1. This might be easier to see than read:
d = {'C1': pd.Series([1, 0, 1, 0, 0], index=[1, 2, 3, 4, 5]),
     'C2': pd.Series([0, 0, 0, 1, 1], index=[1, 2, 3, 4, 5])}
df = pd.DataFrame(d)
print(df)
   C1  C2
1   1   0
2   0   0
3   1   0
4   0   1
5   0   1
# I've left out my attempts as they don't even get close
df['C3'] = IF C2 = 1: index of the last row where C1 = 1, ELSE 0   # pseudocode
This would result in this result set:
   C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3
I was trying to write a function for this because there are roughly 2 million rows in my data set but only ~10k where C2 = 1.
Thank you in advance for any help, I really appreciate it - I only started
programming with Python a few weeks ago.
It is not so straightforward; you have to take a few steps to get this result. The key here is the fillna method, which can fill forwards and backwards.
Pandas methods often do more than one thing, which makes it hard to figure out which method to use for what.
So let me talk you through this code.
First we need to set C3 to NaN, otherwise we cannot use fillna later.
Then we set C3 to the index value, but only where C1 == 1 (the mask does this).
After this we can use fillna with method='ffill' to propagate the last observation forwards.
Then we mask away all the values where C2 == 0, the same way we set the index earlier.
import numpy as np

df['C3'] = np.nan                           # pd.np is deprecated; use numpy directly
mask = df['C1'] == 1
df.loc[mask, 'C3'] = df.index[mask].copy()  # .loc avoids chained assignment
df['C3'] = df['C3'].fillna(method='ffill')
mask = df['C2'] == 0
df.loc[mask, 'C3'] = 0
df
   C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3
EDIT:
Added a .copy() to the index, otherwise we overwrite it and the index gets all full of zeroes.
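As a more compact sketch of the same idea (not from the original answer), the two masks can be expressed with where and ffill; note the result comes out as float because of the intermediate NaNs:
last_one = df.index.to_series().where(df['C1'] == 1).ffill()  # last index with C1 == 1
df['C3'] = last_one.where(df['C2'] == 1, 0)                   # keep it only where C2 == 1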

Count values in a range between two values

In column A I have these values:
1
0
3
2
0
5
1
1
1
0
2
1
1
1
0
2
1
1
1
0
0
3
0
2
0
0
3
1
This list grows everyday.
I need a formula to put in every cell of column B that counts how many values greater than 1 there are until the next value equal to 1 is found.
In other words, I need to count how many values larger than 1 fall between the 1's.
The intended result would be something like this:
1
0
3
2
0
5
1 3
1
0
2
1 1
1
0
2
1 1
1
0
0
3
0
2
0
0
3
1 3
Thanks in Advance
I would use a helper column, if this is acceptable.
So to create a running count of numbers greater than one which resets each time it encounters a '1', enter this starting in B2 and pull down (I'm assuming the data has a heading and the list starts with a 1):-
=IF(A2=1,0,B1+(A2>1))
Then to display the counts at each '1' value (but not for repeated ones) enter this in C2 and pull down:-
=IF(AND(A2=1,A1<>1,ISNUMBER(A1)),B1,"")
It's also possible to do it with an array formula, but I'm not sure if it's worth the effort:-
=IF(AND(A2=1,A1<>1),
COUNTIF(
OFFSET(
A$1,
MAX(ROW(A1:A$2)*(A1:A$2=1))-ROW(A$1)+1,,
MAX(ROW(A1))-MAX(ROW(A1:A$2)*(A1:A$2=1))),
">"&0),
"")
to be entered in B2 with Ctrl Shift Enter and pulled down.
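For comparison only (not part of the original answer), the same reset-and-count logic can be sketched in pandas; the series below is a truncated sample of column A:
import pandas as pd

a = pd.Series([1, 0, 3, 2, 0, 5, 1, 1, 1, 0, 2, 1])    # truncated sample of column A
run_id = (a == 1).cumsum()                              # each 1 starts a new run
b = a.gt(1).astype(int).groupby(run_id).cumsum()        # running count of values > 1 (column B)
c = b.shift().where((a == 1) & (a.shift() != 1))        # count shown at each fresh 1 (column C)
print(pd.concat({'A': a, 'B': b, 'C': c}, axis=1))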
