Update a pandas dataframe - python-3.x

I have a pandas dataframe with multiple columns, and I need to update one column with True or False based on a condition. For example, say the column names are price and result: if the price column has promotion as its value, then the result column should be set to True, otherwise False.
Please help me with this.

Given this df:
price result
0 promotion 0
1 1 0
2 4 0
3 3 0
You can do so with np.where:
import numpy as np

df['result'] = np.where(df['price'] == 'promotion', True, False)
Output:
price result
0 promotion True
1 1 False
2 4 False
3 3 False
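As a side note, the comparison itself already returns a boolean Series, so assigning it directly gives the same result; np.where is mainly useful when the branch values are something other than True and False:

df['result'] = df['price'] == 'promotion'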

Let's suppose the dataframe looks like this:
price result
0 0 False
1 1 False
2 2 False
3 promotion False
4 3 False
5 promotion False
You can create two boolean masks: the first is True at the indices where the result column should be set to True, and the second is True at the indices where it should be set to False.
Here is the code:
index_true = (df['price'] == 'promotion')
index_false = (df['price'] != 'promotion')
df.loc[index_true, 'result'] = True
df.loc[index_false, 'result'] = False
The resultant dataframe will look like this:
price result
0 0 False
1 1 False
2 2 False
3 promotion True
4 3 False
5 promotion True
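Since the second mask is just the complement of the first, one mask is enough; an equivalent sketch using ~ to invert it:

index_true = df['price'] == 'promotion'
df.loc[index_true, 'result'] = True
df.loc[~index_true, 'result'] = False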

Related

PANDAS/Python check if the value from 2 datasets is equal and change the 1&0 to True or False

I want to check whether the values in the two datasets are equal. The datasets are not in the same order, so I need to loop through them to match the rows.
Dataset 1 (contract):
Part number  H50  H51  H53
ID001        1    1    1
ID002        1    1    1
ID003        0    1    0
ID004        1    1    1
ID005        1    1    1
The part numbers are not in the same order, so to compare the values, the part number must match across the two files. If the part number is the same, check whether the H column (header) is the same too. If both the part number and the H header match, check whether the value is the same.
Dataset 2 (anx):
Part number  H50  H51  H53
ID001        1    1    1
ID003        0    0    1
ID004        0    1    1
ID002        1    0    1
ID005        1    1    1
Expected outcome:
If the value is equal in both datasets (1 == 1 or 0 == 0), change it to TRUE.
If the value is 1 in dataset 1 but 0 in dataset 2, change it to FALSE, and save all rows that contain a FALSE value to an Excel file named "Not in contract".
If the value is 0 in dataset 1 but 1 in dataset 2, change it to FALSE.
Example expected outcome
Part number  H50    H51    H53
ID001        TRUE   TRUE   TRUE
ID002        TRUE   FALSE  TRUE
ID003        TRUE   FALSE  FALSE
ID004        FALSE  TRUE   TRUE
ID005        TRUE   TRUE   TRUE
# merge aligns the rows on 'Part number'; identical column names get _x/_y suffixes
df_merged = df1.merge(df2, on='Part number')
# split the merged frame back into the df1 block (_x) and the df2 block (_y)
a = df_merged[df_merged.columns[df_merged.columns.str.contains('_x')]]
b = df_merged[df_merged.columns[df_merged.columns.str.contains('_y')]]
# elementwise comparison of the two blocks, relabelled with the original H columns
out = pd.concat([df_merged['Part number'],
                 pd.DataFrame(a.values == b.values, columns=df1.columns[1:4])],
                axis=1)
out
Part number H50 H51 H53
0 ID001 True True True
1 ID002 True False True
2 ID003 True False False
3 ID004 False True True
4 ID005 True True True
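The question also asks to save every row that contains a FALSE value to an Excel file named "Not in contract". A minimal sketch, assuming the H50/H51/H53 columns from the example and an installed Excel writer such as openpyxl (the .xlsx extension is an assumption):

# rows where at least one H column is False
mask = ~out[['H50', 'H51', 'H53']].all(axis=1)
out[mask].to_excel('Not in contract.xlsx', index=False)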

Python: Compare 2 pandas dataframe with unequal number of rows

I need to compare two pandas dataframes with an unequal number of rows and generate a new df with True for matching records and False for non-matching and missing records.
df1:
date x y
0 2022-11-01 4 5
1 2022-11-02 12 5
2 2022-11-03 11 3
df2:
date x y
0 2022-11-01 4 5
1 2022-11-02 11 5
expected df_output:
date x y
0 True True True
1 False False False
2 False False False
Code:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'date': ['2022-11-01', '2022-11-02', '2022-11-03'], 'x': [4, 12, 11], 'y': [5, 5, 3]})
df2 = pd.DataFrame({'date': ['2022-11-01', '2022-11-02'], 'x': [4, 11], 'y': [5, 5]})
df_output = pd.DataFrame(np.where(df1 == df2, True, False), columns=df1.columns)
print(df_output)
Error: ValueError: Can only compare identically-labeled DataFrame objects
You can use:
# cell to cell equality
# comparing by date
df3 = df1.eq(df1[['date']].merge(df2, on='date', how='left'))
# or to compare by index
# df3 = df1.eq(df2, axis=1)
# if you also want to turn a row to False if there is any False
df3 = (df3.T & df3.all(axis=1)).T
Output:
date x y
0 True True True
1 False False False
2 False False False
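As a side note on the original error: the == operator requires identically-labeled DataFrames, whereas DataFrame.eq aligns on index and columns and fills missing positions with NaN, which then compare as False. A minimal sketch of that behaviour:

# df1 == df2 raises ValueError, but .eq aligns by index;
# row 2 (missing from df2) comes out all False
df3 = df1.eq(df2)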

pandas dataframe groups check the number of unique values of a column is one but exclude empty strings

I have the following df,
id invoice_no
1 6636
1 6637
2 6639
2 6639
3
3
4 6635
4 6635
4 6635
the invoice_no values for id 3 are all empty strings or spaces; I want to use
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1
but also have groups whose invoice_no values are spaces or empty strings come out as same_invoice_no = False; I am wondering how to do that. The result should look like this:
id invoice_no same_invoice_no
1 6636 False
1 6637 False
2 6639 True
2 6639 True
3 False
3 False
4 6635 True
4 6635 True
4 6635 True
Empty strings count as a regular value for nunique, but NaNs are ignored, so a group of all NaNs gives nunique == 0. Replace empty strings with NumPy NaN:
import numpy as np

df.replace('', np.nan, inplace=True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1
id invoice_no same_invoice_no
0 1 6636.0 False
1 1 6637.0 False
2 2 6639.0 True
3 2 6639.0 True
4 3 NaN False
5 3 NaN False
6 4 6635.0 True
7 4 6635.0 True
8 4 6635.0 True
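Since the question says the values may be spaces as well as empty strings, a regex replace catches both; a minimal sketch, turning any whitespace-only string into NaN before the nunique check:

df.replace(r'^\s*$', np.nan, regex=True, inplace=True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1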

pandas create a column based on values in another column which selected as conditions

I have the following df,
id match_type amount negative_amount
1 exact 10 False
1 exact 20 False
1 name 30 False
1 name 40 False
1 amount 15 True
1 amount 15 True
2 exact 0 False
2 exact 0 False
I want to create a column 0_amount_sum that indicates (boolean) whether the amount sum is <= 0 for each id within a particular match_type; e.g. the following is the resulting df:
id match_type amount 0_amount_sum negative_amount
1 exact 10 False False
1 exact 20 False False
1 name 30 False False
1 name 40 False False
1 amount 15 True True
1 amount 15 True True
2 exact 0 True False
2 exact 0 True False
for id=1 and match_type=exact, the amount sum is 30, so 0_amount_sum is False. The code is as follows,
df = df.loc[df.match_type == 'exact']
df['0_amount_sum_'] = (df.assign(amount_n=df.amount * np.where(df.negative_amount, -1, 1))
                         .groupby('id')['amount_n']
                         .transform(lambda x: sum(x) <= 0))

df = df.loc[df.match_type == 'name']
df['0_amount_sum_'] = (df.assign(amount_n=df.amount * np.where(df.negative_amount, -1, 1))
                         .groupby('id')['amount_n']
                         .transform(lambda x: sum(x) <= 0))

df = df.loc[df.match_type == 'amount']
df['0_amount_sum_'] = (df.assign(amount_n=df.amount * np.where(df.negative_amount, -1, 1))
                         .groupby('id')['amount_n']
                         .transform(lambda x: sum(x) <= 0))
I am wondering if there is a better way/more efficient to do that, especially when the values of match_type is unknown, so the code can automatically enumerate all the possible values and then do the calculation accordingly.
I believe you need to group by two Series (columns) instead of filtering:
df['0_amount_sum_'] = ((df.amount * np.where(df.negative_amount, -1, 1))
                       .groupby([df['id'], df['match_type']])
                       .transform('sum')
                       .le(0))
id match_type amount negative_amount 0_amount_sum_
0 1 exact 10 False False
1 1 exact 20 False False
2 1 name 30 False False
3 1 name 40 False False
4 1 amount 15 True True
5 1 amount 15 True True
6 2 exact 0 False True
7 2 exact 0 False True
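An equivalent way to build the signed amount without np.where is Series.where, which stays entirely in pandas; a sketch of the same idea:

# keep amount where negative_amount is False, negate it where True,
# then sum per (id, match_type) group
signed = df['amount'].where(~df['negative_amount'], -df['amount'])
df['0_amount_sum_'] = signed.groupby([df['id'], df['match_type']]).transform('sum').le(0)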

delete rows based on first N columns

I have a datafame:
import pandas as pd

df = pd.DataFrame({'date': ['2017-12-31', '2018-02-01', '2018-03-01'],
                   'type': ['Asset', 'Asset', 'Asset'],
                   'Amount': [1, 0, 0],
                   'Amount1': [1, 0, 0],
                   'Ted': [1, 0, 0]})
df
I want to delete rows where the first three columns are all 0. I don't want to use the column names, as they change. In this case, I want to delete the 2nd and 3rd rows.
Use boolean indexing:
df = df[df.iloc[:, :3].ne(0).any(axis=1)]
#alternative solution with inverting mask by ~
#df = df[~df.iloc[:, :3].eq(0).all(axis=1)]
print (df)
Amount Amount1 Ted date type
0 1 1 1 2017-12-31 Asset
Detail:
First select N columns by iloc:
print (df.iloc[:, :3])
Amount Amount1 Ted
0 1 1 1
1 0 0 0
2 0 0 0
Compare by ne (!=):
print (df.iloc[:, :3].ne(0))
Amount Amount1 Ted
0 True True True
1 False False False
2 False False False
Get the rows with at least one True per row using any:
print (df.iloc[:, :3].ne(0).any(axis=1))
0 True
1 False
2 False
dtype: bool
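One caveat worth noting: the output above lists the columns alphabetically (Amount, Amount1, Ted, date, type), which is how older pandas versions ordered columns created from a dict. Modern pandas keeps insertion order, so iloc[:, :3] would select date, type and Amount instead. If the intent is "all numeric columns are 0", selecting by dtype is more robust; a minimal sketch:

# select only the numeric columns, regardless of their position
num = df.select_dtypes('number')
df = df[num.ne(0).any(axis=1)]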
