Replace values in a pandas column based on two other columns - python-3.x

I have a problem replacing values in a column conditional on two other columns.
For example, we have three columns: A, B, and C.
Columns A and B are both booleans, containing True and False, and column C contains three values: "Payroll", "Social", and "Other".
Where columns A and B are both True, column C holds the value "Payroll".
I want to change the values in column C where both column A and column B are True.
I tried the following code:
data1.replace({'C' : { 'Payroll', 'Social'}},inplace=True).where((data1['A'] == True) & (data1['B'] == True))
but it gives me this error: "'NoneType' object has no attribute 'where'".
What can be done about this problem?

I think you need all to check whether all values per row are True, and then assign the output to the DataFrame filtered by the boolean mask:
data1 = pd.DataFrame({
'C': ['Payroll','Other','Payroll','Social'],
'A': [True, True, True, False],
'B':[False, True, True, False]
})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Payroll
3 False False Social
m = data1[['A', 'B']].all(axis=1)
#same output as
#m = data1['A'] & data1['B']
print (m)
0 False
1 True
2 True
3 False
dtype: bool
print (data1[m])
A B C
1 True True Other
2 True True Payroll
data1[m] = data1[m].replace({'C' : { 'Payroll':'Social'}})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Social
3 False False Social
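The same update can also be written with DataFrame.loc and the mask directly, which avoids replacing on a slice. A minimal sketch, reusing the example data from above:

```python
import pandas as pd

data1 = pd.DataFrame({
    'C': ['Payroll', 'Other', 'Payroll', 'Social'],
    'A': [True, True, True, False],
    'B': [False, True, True, False],
})

# Rows where both A and B are True
m = data1['A'] & data1['B']

# Change 'Payroll' to 'Social' only inside the masked rows
data1.loc[m & (data1['C'] == 'Payroll'), 'C'] = 'Social'
print(data1)
```

As in the answer above, only row 2 changes: row 1 keeps "Other" because the mapping applies only to "Payroll".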

Well, you can also iterate over the rows with iterrows, but note that the row objects it yields are copies, so assigning to row['C'] does not modify the DataFrame; write back through df.at instead:
def change_value(df):
    for index, row in df.iterrows():
        if row['A'] and row['B']:
            df.at[index, 'C'] = new_value    # change to whatever value you want
        else:
            df.at[index, 'C'] = other_value  # change however you want

Related

how to get row index of a Pandas dataframe from a regex match

This question has been asked before, but I didn't find the answers complete. I have a dataframe that has unnecessary values in the first row, and I want to find the row index of the animals:
df = pd.DataFrame({'a':['apple','rhino','gray','horn'],
'b':['honey','elephant', 'gray','trunk'],
'c':['cheese','lion', 'beige','mane']})
a b c
0 apple honey cheese
1 rhino elephant lion
2 gray gray beige
3 horn trunk mane
ani_pat = r"rhino|zebra|lion"
That means I want to find "1" - the row index that matches the pattern. One solution I saw here was like this; applying to my problem...
import re

def findIdx(df, pattern):
    return df.apply(lambda x: x.str.match(pattern, flags=re.IGNORECASE)).values.nonzero()
animal = findIdx(df, ani_pat)
print(animal)
(array([1, 1], dtype=int64), array([0, 2], dtype=int64))
That output is a tuple of NumPy arrays. I've got the basics of NumPy and Pandas, but I'm not sure what to do with this or how it relates to the df above.
I altered that lambda expression like this:
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE))
a b c
0 False False False
1 True False True
2 False False False
3 False False False
That makes a little more sense, but I'm still trying to get the row index of the True values. How can I do that?
We can select from the DataFrame index the rows that have at least one True value in them:
idx = df.index[
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
]
idx:
Int64Index([1], dtype='int64')
any on axis 1 will take the boolean DataFrame and reduce it to a single dimension based on the contents of the rows.
Before any:
a b c
0 False False False
1 True False True
2 False False False
3 False False False
After any:
0 False
1 True
2 False
3 False
dtype: bool
We can then use these boolean values as a mask for index (selecting indexes which have a True value):
Int64Index([1], dtype='int64')
If needed we can use tolist to get a list instead:
idx = df.index[
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
].tolist()
idx:
[1]
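For completeness, the tuple returned by the original findIdx can also be used directly: its first array holds the row positions of the matches and the second the column positions, so deduplicating the first array gives the same row indices. A sketch of that route:

```python
import re
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['apple', 'rhino', 'gray', 'horn'],
                   'b': ['honey', 'elephant', 'gray', 'trunk'],
                   'c': ['cheese', 'lion', 'beige', 'mane']})
ani_pat = r"rhino|zebra|lion"

# nonzero() returns (row_positions, column_positions) of the True cells
rows, cols = df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).values.nonzero()
print(np.unique(rows))  # deduplicated row positions with at least one match
```

Here rows is array([1, 1]) because row 1 matches in two columns, and np.unique collapses that to a single row index.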

Blank cell equal to value zero in another cell should return True in python, pandas

Below is my data in Excel format.
Where there is a blank cell, Excel treats the format as General.
Where there are amounts, the format is "#,##0.00000;(#,##0.00000)".
I have written the code below so that a blank cell compared with a value of zero in the other column should return True:
import pandas as pd
df=pd.read_excel('Book1.xlsx',dtype=str)
df.replace('nan','',inplace=True)
df['True']=''
df.loc[df['Amount_1'] == df['Amount_2'],'True'] = 'True'
df.loc[df['Amount_1'] != df['Amount_2'],'True'] = 'False'
df
Name Amount_1 Amount_2 True
0 A1 0 False
1 A2 0 False
2 A3 0.01 False
If I do this in Excel I get True for the first two rows, whereas here I am getting False.
My expected result is True for A1 and A2, but I am getting False instead.
When writing back to Excel, blank cells should stay blank.
You could add more conditions, such as:
import pandas as pd
df=pd.read_csv('test.csv',dtype=str) # this is modified for my test.
df=df.fillna('')
df['True'] = ''
df.loc[df['Amount_1'] != df['Amount_2'], 'True'] = 'False'
df.loc[df['Amount_1'] == df['Amount_2'], 'True'] = 'True'
df.loc[(df['Amount_1'] == '') & (df['Amount_2'] == '0'), 'True'] = 'True'
df.loc[(df['Amount_2'] == '') & (df['Amount_1'] == '0'), 'True'] = 'True'
df
where the result is:
Name Amount_1 Amount_2 True
0 A1 0 True
1 A2 0 True
2 A3 0.01 False
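An alternative is to compare the amounts numerically instead of as strings: read the columns as floats, treat blanks (NaN) as zero with fillna, and compare. A sketch under the assumption that the data looks like the example above (which of the two amount columns holds each value is assumed here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['A1', 'A2', 'A3'],
                   'Amount_1': [np.nan, 0.0, 0.01],
                   'Amount_2': [0.0, np.nan, np.nan]})

# A blank (NaN) counts as zero before comparing
df['True'] = df['Amount_1'].fillna(0) == df['Amount_2'].fillna(0)
print(df)
```

This yields True for A1 and A2 (blank vs 0) and False for A3 (0.01 vs blank), without enumerating string-equality cases.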

pandas DataFrame: get cells in column that are NaN, None, empty string/list, etc

There seem to be different methods to check whether a cell is not set (NaN, by checking isnull) or whether it contains an empty string or list, but what is the most pythonic way to retrieve all cells that are NaN, None, empty string/list, etc. at the same time?
So far I got:
df = df[df['colname'].isnull() or df['colname'] == None or len(df['colname']) == 0]
Cheers!
One idea is to chain Series.isna with a length comparison via Series.str.len:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'a':[None,np.nan,[],'','aa', 0],
})
m = df['a'].isna() | df['a'].str.len().eq(0)
print (m)
0 True
1 True
2 True
3 True
4 False
5 False
Name: a, dtype: bool
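The mask can then be used either way round: df[m] selects the empty-ish cells and df[~m] keeps the rest. A quick usage sketch with the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [None, np.nan, [], '', 'aa', 0]})

# NaN/None, or something with length 0 (empty string or list)
m = df['a'].isna() | df['a'].str.len().eq(0)

print(df[~m])  # rows whose cell is actually set: 'aa' and 0
```

Note that str.len() returns NaN for non-string, non-list values like 0, so eq(0) is False there and 0 is correctly kept as a real value.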

How to check row is a sub-row of specified rows in pandas

I want to check whether a row in a data frame (pandas) is a sub-row of a specified row. All columns have True/False values.
E.g. with 5 columns, if the specified row is 11010, then 10000 is a sub-row and 11110 is not. Clearly, a sub-row contains True values only where the parent row has True values in the corresponding columns.
I have a data frame below:
A B C D E
1 True False True False False
2 True False False True False
3 True True False False True
Input: the specified row is True False True True True
The expected output is the first and second rows.
Thanks for the help!
Suppose df is your DataFrame and row is the input row as a Series:
row = pd.Series([True, False, True, True, True], index=df.columns)
You want the rows of df that are less than or equal to row in every column (for booleans, True counts as greater than False):
answer_index = (df <= row).all(axis=1)
df[answer_index]
# A B C D E
#1 True False True False False
#2 True False False True False

Cannot Reindex from a duplicate axis while subsetting

I have the following dataframe:
print(df)
Col Col Col Name
A B C Alex
B B C Jack
B A A Mark
I would like to get the following result, where at least one A appears:
Col Col Col Name
A B C Alex
B A A Mark
I tried:
final_df = df["Col"] == "A"
but it gives me "ValueError: cannot reindex from a duplicate axis".
The problem is that you have duplicated column names, so selecting df["Col"] returns all the columns called Col.
A possible solution is to compare all columns and use any to check for at least one True per row:
df = df[(df == 'A').any(axis=1)]
print (df)
Col Col Col Name
0 A B C Alex
2 B A A Mark
Details:
print ((df == 'A'))
Col Col Col Name
0 True False False False
1 False False False False
2 False True True False
print ((df == 'A').any(axis=1))
0 True
1 False
2 True
dtype: bool
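If you only need one of the duplicated Col columns, positional selection with iloc sidesteps the label ambiguity entirely. A sketch, with the column positions assumed from the example:

```python
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', 'Alex'],
                   ['B', 'B', 'C', 'Jack'],
                   ['B', 'A', 'A', 'Mark']],
                  columns=['Col', 'Col', 'Col', 'Name'])

# df['Col'] returns all three columns at once; iloc selects one by position
first_col = df.iloc[:, 0]
print(df[first_col == 'A'])  # rows where the first 'Col' is 'A'
```

This way the boolean mask is a single Series, so indexing with it does not trip over the duplicate axis.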
