how to fix None fields as false in a dataframe - python-3.x

       Id      0     1     2     3     4     5
0   apple   True  None  None  None  None  None
1  orange  False  None  True  None  None  None
2  banana   True  None  None  True  None  None
3   guava  False  None  None  None  True  None
4  leeche   None  True  None  None  None  None
The above dataframe contains boolean and None values.
If any of the columns 0-5 contains False, I want to omit that row from the updated dataframe. Treating None as a True value, my result should look like:
Id
0 apple
2 banana
4 leeche
I am not able to work out how to apply a combined filter across multiple columns.

Use:
cols = ['0','1','2','3','4','5']
df = df.loc[df[cols].ne('False').all(1), ['Id']]
#if False is boolean
#df = df.loc[df[cols].ne(False).all(1), ['Id']]
print (df)
Id
0 apple
2 banana
4 leeche
If you need to check all columns except the first:
df = df.loc[df.iloc[:, 1:].ne('False').all(1), ['Id']]
Explanation:
First select the columns by name:
#if strings
cols = ['0','1','2','3','4','5']
#if numeric
#cols = np.arange(6)
print (df[cols])
       0     1     2     3     4     5
0   True  None  None  None  None  None
1  False  None  True  None  None  None
2   True  None  None  True  None  None
3  False  None  None  None  True  None
4   None  True  None  None  None  None
Then check which values are not equal to False with DataFrame.ne:
#if boolean False
print(df[cols].ne(False))
#if string False
#print(df[cols].ne('False'))
       0     1     2     3     4     5
0   True  True  True  True  True  True
1  False  True  True  True  True  True
2   True  True  True  True  True  True
3  False  True  True  True  True  True
4   True  True  True  True  True  True
Then test whether all values in each row are True with DataFrame.all:
print(df[cols].ne('False').all(1))
0 True
1 False
2 True
3 False
4 True
dtype: bool
Finally, filter by boolean indexing; selecting Id with a list (['Id']) returns a one-column DataFrame:
print(df[df[cols].ne('False').all(1)])
       Id     0     1     2     3     4     5
0   apple  True  None  None  None  None  None
2  banana  True  None  None  True  None  None
4  leeche  None  True  None  None  None  None
print(df.loc[df[cols].ne('False').all(1), ['Id']])
Id
0 apple
2 banana
4 leeche
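
For reference, here is a minimal, self-contained sketch (my own reconstruction, not from the original answer) that rebuilds the sample dataframe and applies the boolean-False variant of the filter:

import pandas as pd

# hypothetical reconstruction of the sample data from the question
df = pd.DataFrame({
    'Id': ['apple', 'orange', 'banana', 'guava', 'leeche'],
    '0': [True, False, True, False, None],
    '1': [None, None, None, None, True],
    '2': [None, True, None, None, None],
    '3': [None, None, True, None, None],
    '4': [None, None, None, True, None],
    '5': [None, None, None, None, None],
})

cols = ['0', '1', '2', '3', '4', '5']
# None != False evaluates to True, so None cells are kept (treated as True)
print(df.loc[df[cols].ne(False).all(1), ['Id']])
#        Id
# 0   apple
# 2  banana
# 4  leeche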

Related

I need to match each row of one dataset against the matching columns of another dataset and produce a dataframe of the results.

Below is the dataframe example where id is the index
df:
id  A      B      C
1   False  False  NA
2   True   False  NA
3   False  True   True
df2:
A      B      C      D
True   False  NA     True
False  True   False  False
False  True   True   True
False  True   True   True
False  True   True   True
False  True   True   True
False  True   True   True
False  True   True   True
Output:
Here we match each row of df against the rows of df2 on the shared columns: count the rows of df2 whose values match (ignoring column D of df2), sum the matches per id of df, and return a dataframe with the same index:

id  A      B      C     Sum of matched true values in columns of df2
1   False  False  NA    0
2   True   False  NA    2
3   False  True   True  6
match_df = try_df.merge(df, on= list_new , how='outer',suffixes=('', '_y'))
match_df.drop(match_df.filter(regex='_y$').columns, axis=1, inplace=True)
df_grouped = match_df.groupby('CIS Sub Controls')[list_new].agg(['sum', 'count'])
df_final = pd.concat([df_grouped['col1']['sum'], df_grouped['col2']['sum'], df_grouped['col3']['sum'], df_grouped['col4']['sum'], df_grouped['col1']['count'], df_grouped['col2']['count'], df_grouped['col3']['count'], df_grouped['col4']['count']], axis=1).join(df_grouped.index)
This is my attempt, but it does not work as intended.
You can use value_counts and merge:
cols = df1.columns.intersection(df2.columns)
out = (df1.merge(df2[cols].value_counts(dropna=False).reset_index(name='sum'),
                 how='left')
          .fillna({'sum': 0}, downcast='infer'))
Output:
id A B C sum
0 1 False False NaN 0
1 2 True False NaN 1
2 3 False True True 6
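
A runnable sketch of the same approach (the DataFrame construction below is my own reconstruction of the tables above, so treat it as illustrative):

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'A': [False, True, False],
                    'B': [False, False, True],
                    'C': [np.nan, np.nan, True]})
df2 = pd.DataFrame({'A': [True, False] + [False] * 6,
                    'B': [False, True] + [True] * 6,
                    'C': [np.nan, False] + [True] * 6,
                    'D': [True, False] + [True] * 6})

# count identical (A, B, C) combinations in df2, then attach each count to
# the matching row of df1; rows of df1 with no match get 0
cols = df1.columns.intersection(df2.columns)
counts = df2[cols].value_counts(dropna=False).reset_index(name='sum')
out = df1.merge(counts, how='left').fillna({'sum': 0}, downcast='infer')
print(out)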

PANDAS/Python check if the value from 2 datasets is equal and change the 1&0 to True or False

I want to check whether the values in both datasets are equal. The datasets are not in the same order, so I need to align them while looping through.
Dataset 1 (contract):
Part number  H50  H51  H53
ID001        1    1    1
ID002        1    1    1
ID003        0    1    0
ID004        1    1    1
ID005        1    1    1
Dataset 2 (anx):
The part numbers are not in the same order, but values must be compared for equal part numbers from each file. If the part number is the same, check that the H column (header) is the same too; if both the part number and the H number match, compare the values.
Part number  H50  H51  H53
ID001        1    1    1
ID003        0    0    1
ID004        0    1    1
ID002        1    0    1
ID005        1    1    1
Expected outcome:
If the value is 1 in both datasets, or 0 in both, change it to TRUE.
If the value is 1 in dataset 1 but 0 in dataset 2, change it to FALSE, and save all rows that contain a FALSE value into an Excel file named "Not in contract".
If the value is 0 in dataset 1 but 1 in dataset 2, change it to FALSE.
Example expected outcome:

Part number  H50    H51    H53
ID001        TRUE   TRUE   TRUE
ID002        TRUE   FALSE  TRUE
ID003        TRUE   FALSE  FALSE
ID004        FALSE  TRUE   TRUE
ID005        TRUE   TRUE   TRUE
# merging on 'Part number' aligns the rows; shared columns get _x/_y suffixes
df_merged = df1.merge(df2, on='Part number')
a = df_merged[df_merged.columns[df_merged.columns.str.contains('_x')]]
b = df_merged[df_merged.columns[df_merged.columns.str.contains('_y')]]
# elementwise comparison of dataset 1 values against dataset 2 values
out = pd.concat([df_merged['Part number'],
                 pd.DataFrame(a.values == b.values, columns=df1.columns[1:4])],
                axis=1)
out
Part number H50 H51 H53
0 ID001 True True True
1 ID002 True False True
2 ID003 True False False
3 ID004 False True True
4 ID005 True True True
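
The question also asks to save the rows containing a FALSE to an Excel file, which the snippet above does not cover. A possible follow-up (my own sketch; to_excel needs an Excel writer such as openpyxl installed, and the file name comes from the question):

# rows where at least one comparison failed
not_in_contract = out[~out[['H50', 'H51', 'H53']].all(axis=1)]
not_in_contract.to_excel('Not in contract.xlsx', index=False)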

How to change a 'True' boolean to 'False' when there is only one 'True' boolean between two 'False' booleans in a dataframe column

I have an unknown number of dataframes. The number and positions of True values in the column "label_number_hours" are unknown, and there can be arbitrarily many True values between two False values. I want to change a True to False only when it is a single True between two False values; for example, False - True - False should become False - False - False.
This is an example of one of the dataframes I have:
df =
label_number_hours some_other_column
0 True 0.174998
1 False 0.235088
2 True 0.076127
3 True 0.817929
4 True 0.781144
5 False 0.904597
6 True 0.703006
7 False 0.923654
8 True 0.261100
9 True 0.803631
10 False 0.149026
This is the dataframe which I am looking for:
df =
label_number_hours some_other_column
0 True 0.174998
1 False 0.235088
2 True 0.076127
3 True 0.817929
4 True 0.781144
5 False 0.904597
6 False 0.703006
7 False 0.923654
8 True 0.261100
9 True 0.803631
10 False 0.149026
This is the code I tried:

import numpy as np

falses_idx, = np.where(~df["label_number_hours"])
if falses_idx.size > 0:
    df.iloc[falses_idx[0]:falses_idx[-1], df.columns.get_loc("label_number_hours")] = False
This is the result, which is not what I want:
label_number_hours some_other_column
0 True 0.174998
1 False 0.235088
2 False 0.076127
3 False 0.817929
4 False 0.781144
5 False 0.904597
6 False 0.703006
7 False 0.923654
8 False 0.261100
9 False 0.803631
10 False 0.149026
I really need your help.
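
No answer is recorded in this thread, so here is a minimal sketch of one possible approach (my own, under the assumption that only a True with a False immediately on both sides should flip; values at the start or end of the column are left alone, matching the desired output above):

import pandas as pd

s = df["label_number_hours"]
# a lone True has a False directly before and after it; padding the shifted
# neighbours with True means edge values are never flipped
lone = s & ~s.shift(1, fill_value=True) & ~s.shift(-1, fill_value=True)
df.loc[lone, "label_number_hours"] = False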

pandas dataframe groups: check that the number of unique values in a column is one, but exclude empty strings

I have the following df,
id invoice_no
1 6636
1 6637
2 6639
2 6639
3
3
4 6635
4 6635
4 6635
The invoice_no values for id 3 are all empty strings or spaces. I want to use
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1
but I also want groups whose invoice_no values are spaces or empty strings to get same_invoice_no = False; I am wondering how to do that. The result should look like:
id invoice_no same_invoice_no
1 6636 False
1 6637 False
2 6639 True
2 6639 True
3 False
3 False
4 6635 True
4 6635 True
4 6635 True
Empty strings count as a value for nunique (so a group of identical empty strings gives True), but NaNs don't. Replace empty strings with NumPy NaN:
import numpy as np

df.replace('', np.nan, inplace=True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1
id invoice_no same_invoice_no
0 1 6636.0 False
1 1 6637.0 False
2 2 6639.0 True
3 2 6639.0 True
4 3 NaN False
5 3 NaN False
6 4 6635.0 True
7 4 6635.0 True
8 4 6635.0 True
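
Since the question mentions values that are spaces as well as empty strings, a regex replace handles both in one pass (a sketch, not part of the original answer):

# replace empty and whitespace-only strings with NaN
df = df.replace(r'^\s*$', np.nan, regex=True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1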

How to delete the entire row if any of its value is 0 in pandas

In the example below I only want to retain rows 1 and 2.
I want to delete every row that has a 0 in any of its columns:
kt b tt mky depth
1 1 1 1 1 4
2 2 2 2 2 2
3 3 3 0 3 3
4 0 4 0 0 0
5 5 5 5 5 0
the output should read like below:
kt b tt mky depth
1 1 1 1 1 4
2 2 2 2 2 2
I have tried:
df.loc[(df!=0).any(axis=1)]
But it drops a row only if all of its columns are 0.
You are really close; you need DataFrame.all to check that all values per row are True:
df = df.loc[(df!=0).all(axis=1)]
print (df)
kt b tt mky depth
1 1 1 1 1 4
2 2 2 2 2 2
Details:
print (df!=0)
kt b tt mky depth
1 True True True True True
2 True True True True True
3 True True False True True
4 False True False False False
5 True True True True False
print ((df!=0).all(axis=1))
1 True
2 True
3 False
4 False
5 False
dtype: bool
An alternative solution uses any to check for at least one True per row on the opposite mask df == 0, inverted with ~:
df = df.loc[~(df==0).any(axis=1)]
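
For completeness, a self-contained sketch reproducing the example above (the DataFrame construction is my own):

import pandas as pd

df = pd.DataFrame({'kt':    [1, 2, 3, 0, 5],
                   'b':     [1, 2, 3, 4, 5],
                   'tt':    [1, 2, 0, 0, 5],
                   'mky':   [1, 2, 3, 0, 5],
                   'depth': [4, 2, 3, 0, 0]},
                  index=[1, 2, 3, 4, 5])

print(df.loc[(df != 0).all(axis=1)])   # keep rows that contain no zero
print(df.loc[~(df == 0).any(axis=1)])  # equivalent inverted form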
