Update two dataframe columns based on condition - python-3.x

I have a dataframe with columns 'PK', 'Column1' and 'Column2'.
I want to update Column1 and Column2 as follows:
If Column1 > Column2 then Column1 = Column1 - Column2 and at the same time Column2 = 0.
Similarly,
if Column1 < Column2 then Column2 = Column2 - Column1 and at the same time Column1 = 0.
I have tried the following, but it is not giving the expected result:
df["Column1"] = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
df["Column2"] = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)

Use DataFrame.assign to avoid testing the already overwritten Column1 in the second line of your code:
df = pd.DataFrame({
'Column1':[4,5,4,5,5,4],
'Column2':[7,8,9,4,2,3],
})
print (df)
   Column1  Column2
0        4        7
1        5        8
2        4        9
3        5        4
4        5        2
5        4        3
a = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
b = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)
df = df.assign(Column1 = a, Column2 = b)
print (df)
   Column1  Column2
0        0        3
1        0        3
2        0        5
3        1        0
4        3        0
5        1        0
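As a side note, because at most one of the two columns stays non-zero per row, a minimal alternative sketch (assuming the same sample df as above) computes the difference once and clips it; this is not the answer's method, just an equivalent one-pass variant:
import pandas as pd

df = pd.DataFrame({
    'Column1':[4,5,4,5,5,4],
    'Column2':[7,8,9,4,2,3],
})

# Positive where Column1 is larger, negative where Column2 is larger.
diff = df['Column1'] - df['Column2']

# Clipping at 0 keeps each column's share of the difference; ties become 0
# in both columns, exactly as with the np.where version above.
df = df.assign(Column1=diff.clip(lower=0), Column2=(-diff).clip(lower=0))
print (df)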

Related

How to create column based on string position of other column in python?

df:
    Col_A
0  011011
1  000111
2  011000
3  011111
4  011010
How do I create a new column based on string positions? For each value in Col_A I need to check which positions hold '0' and record them in a new column Col_B.
Expected output:
    Col_A                Col_B
0  011011            pos1,pos4
1  000111       pos1,pos2,pos3
2  011000  pos1,pos4,pos5,pos6
3  011111                 pos1
4  011010       pos1,pos4,pos6
First convert the strings to a DataFrame and add column names with a function passed to rename:
f = lambda x: f'pos{x+1}'
df1 = pd.DataFrame([list(x) for x in df['Col_A']], index=df.index).rename(columns=f)
print (df1)
  pos1 pos2 pos3 pos4 pos5 pos6
0    0    1    1    0    1    1
1    0    0    0    1    1    1
2    0    1    1    0    0    0
3    0    1    1    1    1    1
4    0    1    1    0    1    0
Then compare against '0' with DataFrame.eq, build the new column with matrix multiplication via DataFrame.dot, and strip the trailing separator with Series.str.rstrip:
df['Col_B'] = df1.eq('0').dot(df1.columns + ',').str.rstrip(',')
print (df)
    Col_A                Col_B
0  011011            pos1,pos4
1  000111       pos1,pos2,pos3
2  011000  pos1,pos4,pos5,pos6
3  011111                 pos1
4  011010       pos1,pos4,pos6
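If the matrix-multiplication trick feels opaque, an equivalent (if slower) sketch, assuming the same sample data, builds the position list row by row with a plain list comprehension:
import pandas as pd

df = pd.DataFrame({'Col_A': ['011011', '000111', '011000', '011111', '011010']})

# For each string, collect 'pos<i>' (1-based) for every character equal to '0'.
df['Col_B'] = df['Col_A'].apply(
    lambda s: ','.join(f'pos{i}' for i, ch in enumerate(s, start=1) if ch == '0')
)
print (df)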

groupby and trim some rows based on condition

I have a data frame something like this:
df = pd.DataFrame({"ID":[1,1,2,2,2,3,3,3,3,3],
"IF_car":[1,0,0,1,0,0,0,1,0,1],
"IF_car_history":[0,0,0,1,0,0,0,1,0,1],
"observation":[0,0,0,1,0,0,0,2,0,3]})
I want to trim rows within each ID group, based on the condition "IF_car_history" == 1. I tried:
tried_df = df.groupby(['ID']).apply(lambda x: x.loc[:(x['IF_car_history'] == '1').idxmax(),:]).reset_index(drop = True)
Within each ID group, I want to drop the rows that come after the last row where ['IF_car_history'] == 1.
Expected output:
Thanks
First create a mask m by comparing values with Series.eq. Because rows after the last 1 in each group must be removed, reverse the mask with [::-1], take GroupBy.cumsum per ID, keep positions where the cumulative sum is not 0, reverse back, and filter with boolean indexing:
m = df['IF_car_history'].eq(1).iloc[::-1]
df1 = df[m.groupby(df['ID']).cumsum().ne(0).iloc[::-1]]
print (df1)
   ID  IF_car  IF_car_history  observation
2   2       0               0            0
3   2       1               1            1
5   3       0               0            0
6   3       0               0            0
7   3       1               1            2
8   3       0               0            0
9   3       1               1            3
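A more readable alternative sketch, assuming the same df: within each ID, a reversed cumulative maximum marks every row up to and including the last IF_car_history == 1, and the resulting mask is used for filtering:
import pandas as pd

df = pd.DataFrame({"ID":[1,1,2,2,2,3,3,3,3,3],
                   "IF_car":[1,0,0,1,0,0,0,1,0,1],
                   "IF_car_history":[0,0,0,1,0,0,0,1,0,1],
                   "observation":[0,0,0,1,0,0,0,2,0,3]})

# The reversed cummax is 1 for every row up to and including the last 1
# in the group, and 0 for the rows after it.
keep = (df.groupby('ID')['IF_car_history']
          .transform(lambda s: s[::-1].cummax()[::-1])
          .astype(bool))
print (df[keep])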

Dataset with maximal rows by userId indicated

I have a dataframe like this:
ID        date  var1  var2  var3
AB  22/03/2020     0     1     3
AB  29/03/2020     0     3     3
CD  22/03/2020     0     1     1
And I would like a new dataset that, for each row, keeps the original value in the columns that hold the row maximum (ties can happen too) and sets -1 in the columns that do not. So it would be:
ID        date  var1  var2  var3
AB  22/03/2020    -1    -1     3
AB  29/03/2020    -1     3     3
CD  22/03/2020    -1     1     1
But I cannot manage to do this at all. What can I try next?
Select only the numeric columns with DataFrame.select_dtypes:
df1 = df.select_dtypes(np.number)
Or select all columns except the first two by position with DataFrame.iloc:
df1 = df.iloc[:, 2:]
Or select the columns whose labels contain 'var' with DataFrame.filter:
df1 = df.filter(like='var')
Then set the new values with DataFrame.where, comparing each value against the row-wise max:
df[df1.columns] = df1.where(df1.eq(df1.max(1), axis=0), -1)
print (df)
   ID        date  var1  var2  var3
0  AB  22/03/2020    -1    -1     3
1  AB  29/03/2020    -1     3     3
2  CD  22/03/2020    -1     1     1
IIUC, use where and then update back:
s=df.loc[:,'var1':]
df.update(s.where(s.eq(s.max(1),axis=0),-1))
df
   ID        date  var1  var2  var3
0  AB  22/03/2020    -1    -1     3
1  AB  29/03/2020    -1     3     3
2  CD  22/03/2020    -1     1     1
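For completeness, a hedged sketch of the same idea with DataFrame.mask, which replaces values where the condition is True (assuming the sample frame from the question):
import pandas as pd

df = pd.DataFrame({'ID':['AB','AB','CD'],
                   'date':['22/03/2020','29/03/2020','22/03/2020'],
                   'var1':[0,0,0],
                   'var2':[1,3,1],
                   'var3':[3,3,1]})

cols = df.filter(like='var').columns
# mask replaces the cells that are NOT equal to their row-wise maximum with -1.
df[cols] = df[cols].mask(df[cols].ne(df[cols].max(axis=1), axis=0), -1)
print (df)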

Unstacking a pandas dataframe

Suppose I have a dataframe with two columns called 'column' and 'value' that looks like this:
Dataframe 1:
      column  value
0    column1      1
1    column2      1
2    column3      1
3    column4      1
4    column5      2
5    column6      1
6    column7      1
7    column8      1
8    column9      8
9   column10      2
10   column1      1
11   column2      1
12   column3      1
13   column4      3
14   column5      2
15   column6      1
16   column7      1
17   column8      1
18   column9      1
19  column10      2
20   column1      5
..       ...    ...
I want to transform this dataframe so that it looks like this:
Dataframe 2:
   column1 column2 column3 column4 column5 column6 column7 column8 column9 column10
0        1       1       1       1       2       1       1       1       8        2
1        1       1       1       3       2       1       1       1       1        2
2        5      ..      ..      ..      ..      ..      ..      ..      ..       ..
..      ..      ..      ..      ..      ..      ..      ..      ..      ..       ..
Now I know how to do it the other way around. If you have a dataframe called df that looks like dataframe 2 you can stack it with the following code:
df = (df.stack().reset_index(level=0, drop=True).rename_axis(['column']).reset_index(name='value'))
Unfortunately, I don't know how to go back!
Question: How do I manipulate dataframe 1 (unstack it, if that's a word) so that it looks like dataframe 2?
Create a MultiIndex with set_index, using a counter Series from GroupBy.cumcount, and reshape with unstack:
g = df.groupby('column').cumcount()
df1 = df.set_index([g, 'column'])['value'].unstack(fill_value=0)
print (df1)
column  column1  column10  column2  column3  column4  column5  column6  column7  column8  column9
0             1         2        1        1        1        2        1        1        1        8
1             1         2        1        1        3        2        1        1        1        1
2             5         0        0        0        0        0        0        0        0        0
Finally, if the columns need sorting by the numeric part of their names, use str.extract to pull out the integers, convert them to int, get the column positions with argsort, and reorder with iloc:
df1 = df1.iloc[:, df1.columns.str.extract(r'(\d+)', expand=False).astype(int).argsort()]
print (df1)
column  column1  column2  column3  column4  column5  column6  column7  column8  column9  column10
0             1        1        1        1        2        1        1        1        8         2
1             1        1        1        3        2        1        1        1        1         2
2             5        0        0        0        0        0        0        0        0         0
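An equivalent sketch uses pivot with the cumcount as the new index; the column order can then be fixed the same way. The small frame below only rebuilds the visible part of Dataframe 1 for illustration:
import pandas as pd

# Rebuild the visible rows of Dataframe 1 (two full cycles of column1..column10
# plus the lone trailing column1 row).
names = [f'column{i}' for i in range(1, 11)]
values = [1,1,1,1,2,1,1,1,8,2,
          1,1,1,3,2,1,1,1,1,2,
          5]
df = pd.DataFrame({'column': names * 2 + ['column1'], 'value': values})

# pivot needs unique (index, columns) pairs, which the per-column counter guarantees.
g = df.groupby('column').cumcount()
df1 = (df.assign(row=g)
         .pivot(index='row', columns='column', values='value')
         .fillna(0)
         .astype(int))
print (df1)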

How to add string to all values in a column of pandas DataFrame

Say you have a DataFrame with columns:
col_1 col_2
    1     a
    2     b
    3     c
    4     d
    5     e
how would you change the values of col_2 so that new value = current value + 'new'?
Use +:
df.col_2 = df.col_2 + 'new'
print (df)
   col_1 col_2
0      1  anew
1      2  bnew
2      3  cnew
3      4  dnew
4      5  enew
Thanks to hooy for another solution:
df.col_2 += 'new'
Or assign:
df = df.assign(col_2 = df.col_2 + 'new')
print (df)
   col_1 col_2
0      1  anew
1      2  bnew
2      3  cnew
3      4  dnew
4      5  enew
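One caveat worth noting: this only works if the column already holds strings. For a numeric column like col_1, a small hedged sketch converts first with astype(str), otherwise the + raises a TypeError:
import pandas as pd

df = pd.DataFrame({'col_1':[1,2,3,4,5],
                   'col_2':list('abcde')})

# Convert the numeric column to strings before concatenating.
df['col_1'] = df['col_1'].astype(str) + 'new'
print (df)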
