Blank cell equal to value zero in another cell should return True in Python/pandas - python-3.x

Below is my data in Excel format.
Where there is a blank cell, the cell format is General.
Where there are amounts, the format is "#,##0.00000;(#,##0.00000)".
I have written the code below, intending that a blank cell compared against a zero value in the other column should return True:
import pandas as pd
df=pd.read_excel('Book1.xlsx',dtype=str)
df.replace('nan','',inplace=True)
df['True']=''
df.loc[df['Amount_1'] == df['Amount_2'],'True'] = 'True'
df.loc[df['Amount_1'] != df['Amount_2'],'True'] = 'False'
df
Name Amount_1 Amount_2 True
0 A1 0 False
1 A2 0 False
2 A3 0.01 False
If I do this in Excel I get True for the first two rows, whereas here I get False.
My end/expected result is:
True for A1 and A2, but I am getting False instead.
Also, when writing back to Excel, blank cells should stay blank.

You could add more conditions to cover the blank-vs-zero cases, such as:
import pandas as pd
df=pd.read_csv('test.csv',dtype=str) # this is modified for my test.
df=df.fillna('')
df['True'] = ''
df.loc[df['Amount_1'] != df['Amount_2'], 'True'] = 'False'
df.loc[df['Amount_1'] == df['Amount_2'], 'True'] = 'True'
df.loc[(df['Amount_1'] == '') & (df['Amount_2'] == '0'), 'True'] = 'True'
df.loc[(df['Amount_2'] == '') & (df['Amount_1'] == '0'), 'True'] = 'True'
df
where the result is:
Name Amount_1 Amount_2 True
0 A1 0 True
1 A2 0 True
2 A3 0.01 False
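If the blanks should compare equal to zero regardless of number formatting (so that '0', '0.00000', and an empty cell all match), a more general sketch is to compare the columns numerically, coercing blanks to 0. The column contents below are assumed from the table above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A1', 'A2', 'A3'],
                   'Amount_1': ['', '0', ''],
                   'Amount_2': ['0', '', '0.01']})

# Coerce to numbers; blanks become NaN, which we then treat as 0.
a1 = pd.to_numeric(df['Amount_1'], errors='coerce').fillna(0)
a2 = pd.to_numeric(df['Amount_2'], errors='coerce').fillna(0)
df['True'] = a1.eq(a2)
print(df['True'].tolist())  # [True, True, False]
```

Because the comparison happens on a separate numeric copy, the original string columns keep their blanks when written back to Excel.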

I am finding it hard to understand Boolean Logic in code

Below are four Python snippets with Boolean expressions. Please help me understand the logic in them and how they produce different results.
# Code 1
for row in range(7):          # Code to print rows
    for col in range(5):      # Code to print columns
        if (col == 0 or col == 4) or row != 0:
            print("*", end="")
        else:
            print(end=" ")
    print()
# Code 2
for row in range(7):          # Code to print rows
    for col in range(5):      # Code to print columns
        if (col == 0 and col == 4) and row != 0:
            print("*", end="")
        else:
            print(end=" ")
    print()
# Code 3
for row in range(7):          # Code to print rows
    for col in range(5):      # Code to print columns
        if (col == 0 or col == 4) and row != 0:
            print("*", end="")
        else:
            print(end=" ")
    print()
# Code 4
for row in range(7):          # Code to print rows
    for col in range(5):      # Code to print columns
        if (col == 0 and col == 4) or row != 0:
            print("*", end="")
        else:
            print(end=" ")
    print()
The four codes have in common:
Two nested loops go through 7x5 = 35 different cases.
For each case, a Boolean expression is evaluated. Depending on the result, a '*' is printed for true and a gap/space for false.
In English, the four Boolean expressions can be described as follows:
1: ((col == 0 or col == 4) or row!=0)
This is true when either col is 0 or 4, or row is unequal to 0.
In other words: for the first row, only columns 0 and 4 print '*'.
For the remaining six rows, the outcome is '*' for all columns.
2: ((col == 0 and col == 4) and row!=0)
This can only be true for the last six rows.
But col cannot have two different values at the same time.
Therefore, the expression is always false. It is a contradiction.
3: ((col == 0 or col == 4) and row!=0)
This can only be true for two of the five columns.
It is false for the first row.
4: ((col == 0 and col == 4) or row!=0)
The col part is a contradiction and thus always false.
But the row part is true for the last six rows.
Therefore, one blank row followed by six rows of '*' is printed.
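As a quick check, the four expressions can be evaluated over all 7x5 = 35 cases and the '*' counts compared. This small sketch is just a verification aid, not part of the original exercise:

```python
def count_stars(expr):
    # Count how many of the 7x5 cases evaluate to True (i.e. print '*').
    return sum(expr(row, col) for row in range(7) for col in range(5))

c1 = count_stars(lambda row, col: (col == 0 or col == 4) or row != 0)    # 2 stars in row 0 + 6*5
c2 = count_stars(lambda row, col: (col == 0 and col == 4) and row != 0)  # contradiction: never true
c3 = count_stars(lambda row, col: (col == 0 or col == 4) and row != 0)   # 2 columns in 6 rows
c4 = count_stars(lambda row, col: (col == 0 and col == 4) or row != 0)   # all 5 columns in 6 rows
print(c1, c2, c3, c4)  # 32 0 12 30
```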

Dataframe - Create new column based on a given column's previous & current row value

I am dealing with a dataframe which has several thousand rows and several columns. The column of interest is called customer_csate_score.
The data looks like below
customer_csate_score
0.000
-0.4
0
0.578
0.418
-0.765
0.89
What I'm trying to do is create a new column in the dataframe called customer_csate_score_toggle_status, which will be True if the value changed from positive to negative or vice versa, and False if the polarity didn't reverse.
Expected Output for toggle status column
customer_csate_score_toggle_status
False
True
False
True
False
True
True
I've tried few different things but haven't been able to get this to work. Here's what I've tried -
Attempt - 1
def update_cust_csate_polarity(curr_val, prev_val):
    return True if (prev_val <= 0 and curr_val > 0) or (prev_val >= 0 and curr_val < 0) else False
data['customer_csate_score_toggle_status'] = data.apply(lambda x: update_cust_csate_polarity(data['customer_csate_score'], data['customer_csate_score'].shift()))
Attempt - 2
# Testing just one condition
data['customer_csate_score_toggle_status'] = data[(data['customer_csate_score'].shift() < 0) & (data['customer_csate_score']) > 0]
Could I please request help to get this right?
Calculate the sign change using np.sign(df.customer_csate_score)[lambda x: x != 0].diff() != 0:
Get the sign of the values;
Filter out 0s so a sequence like 5 0 1 won't get marked incorrectly;
Check if the sign has changed using diff.
import numpy as np
df['customer_csate_score_toggle_status'] = np.sign(df.customer_csate_score)[lambda x: x != 0].diff() != 0
df['customer_csate_score_toggle_status'] = df['customer_csate_score_toggle_status'].fillna(False)
df
customer_csate_score customer_csate_score_toggle_status
0 0.000 False
1 -0.400 True
2 0.000 False
3 0.578 True
4 0.418 False
5 -0.765 True
6 0.890 True
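Alternatively, the polarity test from Attempt 1 can be vectorised directly with shift. This sketch reproduces the expected output above (note it treats a move off zero as a toggle, which matches the expected rows):

```python
import pandas as pd

df = pd.DataFrame({'customer_csate_score': [0.000, -0.4, 0, 0.578, 0.418, -0.765, 0.89]})

curr = df['customer_csate_score']
prev = curr.shift()
# Comparisons against the NaN in the first shifted row yield False, so no fillna is needed.
df['customer_csate_score_toggle_status'] = (prev.le(0) & curr.gt(0)) | (prev.ge(0) & curr.lt(0))
print(df['customer_csate_score_toggle_status'].tolist())
# [False, True, False, True, False, True, True]
```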

In Python Pandas, how do I combine two columns containing strings using if/else statement or similar?

I have created a pandas dataframe from an excel file where first two columns are:
df = pd.DataFrame({'0':['','','Location Code','pH','Ag','Alkalinity'], '1':['Lab Id','Collection Date','','','µg/L','mg/L']})
which looks like this:
df[0] df[1]
Lab Id
Collection Date
Location Code
pH
Ag µg/L
Alkalinity mg/L
I want to merge these columns into one that looks like this:
df[0]
Lab Id
Collection Date
Location Code
pH
Ag (µg/L)
Alkalinity (mg/L)
I believe I need a conditional before combining df[0] and df[1], something like:
if there is a blank in either column:
    df[0] = df[0].astype(str)+df[1].astype(str)
else:
    df[0] = df[0].astype(str)+' ('+df[1].astype(str)+')'
but I am not sure how to write the condition. Could anyone please guide me here?
Thank you very much.
We can try np.select:
import numpy as np

cond = [(df['0']=='') & (df['1']!=''), (df['0']!='') & (df['1']==''), (df['0']!='') & (df['1']!='')]
val = [df['1'], df['0'], df['0']+'('+df['1']+')']
df['new'] = np.select(cond, val)
df
0 1 new
0 Lab Id Lab Id
1 Collection Date Collection Date
2 Location Code Location Code
3 pH pH
4 Ag µg/L Ag(µg/L)
5 Alkalinity mg/L Alkalinity(mg/L)
If the empty cells are NaN instead of empty strings, maybe:
df['result'] = df['0'].fillna(df['1'])
This works using numpy.where; the string-concatenation assumption is based on the data shared:
df.assign(
    merger=np.where(
        df["1"].str.endswith("/L"),
        df["0"].str.cat(df["1"], "(").add(")"),
        df["0"].str.cat(df["1"], ""),
    )
)
0 1 merger
0 Lab Id Lab Id
1 Collection Date Collection Date
2 Location Code Location Code
3 pH pH
4 Ag µg/L Ag(µg/L)
5 Alkalinity mg/L Alkalinity(mg/L)
Or, you could just assign it to "0", if that is what you are after:
df["0"] = np.where(
    df["1"].str.endswith("/L"),
    df["0"].str.cat(df["1"], "(").add(")"),
    df["0"].str.cat(df["1"], ""),
)
Here is another way:
First, wrap the unit values you are going to concatenate in parentheses:
df['1'].loc[df.replace('', np.nan).notnull().all(axis=1)] = '(' + df['1'] + ')'
Now fill in the missing values with bfill and ffill:
df = df.replace('', np.nan).bfill(axis=1).ffill(axis=1)
The only thing remaining is to merge values wherever we have brackets:
df.loc[:, 'merge'] = np.where(df['1'].str.endswith(')'), df['0'] + df['1'], df['1'])
Test if there is an empty value in at least one of columns 0 and 1 with DataFrame.eq and DataFrame.any, and then join both columns like in your answer inside numpy.where:
df = pd.DataFrame({0:['','','Location Code','pH','Ag','Alkalinity'],
                   1:['Lab Id','Collection Date','','',u'µg/L','mg/L']})
print (df[[0,1]].eq(''))
0 1
0 True False
1 True False
2 False True
3 False True
4 False False
5 False False
print (df[[0,1]].eq('').any(axis=1))
0 True
1 True
2 True
3 True
4 False
5 False
dtype: bool
df[0] = np.where(df[[0,1]].eq('').any(axis=1),
                 df[0].astype(str)+df[1].astype(str),
                 df[0].astype(str)+' ('+df[1].astype(str)+')')
print (df)
0 1
0 Lab Id Lab Id
1 Collection Date Collection Date
2 Location Code
3 pH
4 Ag (µg/L) µg/L
5 Alkalinity (mg/L) mg/L

selecting different columns each row

I have a dataframe which has 500K rows and 7 day columns, plus start-day and end-day columns.
I search for a value (equal to 0) in the range (startDay, endDay).
For example, for id_1, startDay=1 and endDay=7, so I should look for the value in columns D1 to D7.
For id_2, startDay=4 and endDay=7, so I should look in columns D4 to D7.
However, I couldn't search a different column range per row successfully.
As mentioned above:
if startDay > endDay, I should see "-999";
else, I need to find the first zero within the day range. For example, for id_3 the first zero is in column D2 (day 2), and the startDay of id_3 is 1, so I want to see 2-1=1 (D2 - startDay);
if I cannot find a 0, I want to see "8".
Here is my data;
data = {
    'D1':[0,1,1,0,1,1,0,0,0,1],
    'D2':[2,0,0,1,2,2,1,2,0,4],
    'D3':[0,0,1,0,1,1,1,0,1,0],
    'D4':[3,3,3,1,3,2,3,0,3,3],
    'D5':[0,0,3,3,4,0,4,2,3,1],
    'D6':[2,1,1,0,3,2,1,2,2,1],
    'D7':[2,3,0,0,3,1,3,2,1,3],
    'startDay':[1,4,1,1,3,3,2,2,5,2],
    'endDay':[7,7,6,7,7,7,2,1,7,6]
}
data_idx = ['id_1','id_2','id_3','id_4','id_5',
            'id_6','id_7','id_8','id_9','id_10']
df = pd.DataFrame(data, index=data_idx)
What I want to see;
df_need = pd.DataFrame([0,1,1,0,8,2,8,-999,8,1], index=data_idx)
You can create a boolean array that checks, for each row, which 'Dx' columns are at or after 'startDay', at or before 'endDay', and equal to 0. For the first two conditions, you can use np.ufunc.outer with the ufuncs np.less_equal and np.greater_equal, such as:
import numpy as np
arr_bool = ( np.less_equal.outer(df.startDay, range(1,8))     # which columns Dx are at or after startDay
             & np.greater_equal.outer(df.endDay, range(1,8))  # which columns Dx are at or before endDay
             & (df.filter(regex='D[0-9]').values == 0))       # which values of the Dx columns are 0
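As a quick illustration of what np.ufunc.outer does here, the following standalone sketch (with made-up startDay values 1 and 4) shows the 2-D boolean grid it produces, with one row per startDay value and one column per day:

```python
import numpy as np

# np.less_equal.outer compares every element of the first input against
# every element of the second, giving a len(first) x len(second) boolean grid.
grid = np.less_equal.outer([1, 4], range(1, 8))
print(grid)
# Row 0 (startDay=1): all seven days are at or after day 1.
# Row 1 (startDay=4): only days 4-7 qualify.
```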
Then you can use np.argmax to find the first True per row. Adding 1 and subtracting 'startDay' gives the values you are looking for. Finally, use np.select to handle the other conditions: replace the value with -999 if df.startDay >= df.endDay, or with 8 if there is no True in the row of arr_bool, such as:
df_need = pd.DataFrame((np.argmax(arr_bool, axis=1) + 1 - df.startDay).values,
                       index=data_idx, columns=['need'])
df_need.need = np.select(condlist=[df.startDay >= df.endDay, ~arr_bool.any(axis=1)],
                         choicelist=[-999, 8],
                         default=df_need.need)
print (df_need)
print (df_need)
need
id_1 0
id_2 1
id_3 1
id_4 0
id_5 8
id_6 2
id_7 -999
id_8 -999
id_9 8
id_10 1
One note: to get -999 for id_7, I used the condition df.startDay >= df.endDay in np.select rather than df.startDay > df.endDay as in your question. If you change it to the strict comparison, you get 8 instead of -999 in that case.

replace values in pandas based on two other columns

I have a problem replacing values in a column conditional on two other columns.
For example, we have three columns: A, B, and C.
Columns A and B are both booleans, containing True and False, and column C contains three values: "Payroll", "Social", and "Other".
Where columns A and B are both True, column C has the value "Payroll".
I want to change the values in column C where both columns A and B are True.
I tried the following code:
data1.replace({'C' : { 'Payroll', 'Social'}},inplace=True).where((data1['A'] == True) & (data1['B'] == True))
but it gives me this error: "'NoneType' object has no attribute 'where'".
What can be done about this problem?
I think you need all to check whether all values per row are True, and then assign the output on the DataFrame filtered by the boolean mask:
data1 = pd.DataFrame({
    'C': ['Payroll','Other','Payroll','Social'],
    'A': [True, True, True, False],
    'B': [False, True, True, False]
})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Payroll
3 False False Social
m = data1[['A', 'B']].all(axis=1)
#same output as
#m = data1['A'] & data1['B']
print (m)
0 False
1 True
2 True
3 False
dtype: bool
print (data1[m])
A B C
1 True True Other
2 True True Payroll
data1[m] = data1[m].replace({'C' : { 'Payroll':'Social'}})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Social
3 False False Social
You can also iterate over the rows, but note that assigning to the row object returned by iterrows does not modify the dataframe; write back through DataFrame.at instead:
def change_value(dataframe):
    for index, row in dataframe.iterrows():
        if row['A'] and row['B']:
            dataframe.at[index, 'C'] = 'Social'  # change to whatever value you want
        # else: leave the value as is, or assign something else here
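For completeness, the same replacement as in the accepted answer can also be written as a single .loc assignment. This is a sketch assuming 'Payroll' should become 'Social' where both flags are True:

```python
import pandas as pd

data1 = pd.DataFrame({
    'C': ['Payroll', 'Other', 'Payroll', 'Social'],
    'A': [True, True, True, False],
    'B': [False, True, True, False],
})
# Replace 'Payroll' with 'Social' only in rows where both A and B are True.
data1.loc[data1['A'] & data1['B'] & data1['C'].eq('Payroll'), 'C'] = 'Social'
print(data1['C'].tolist())  # ['Payroll', 'Other', 'Social', 'Social']
```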
