How to check whether a row is a sub-row of a specified row in pandas - python-3.x

I want to check whether a row in a pandas DataFrame is a sub-row of a specified row. All columns hold True/False values.
For example, with 5 columns and the specified row 11010, the row 10000 is a sub-row, while 11110 is not. In other words, a sub-row may only contain True in columns where the father-row also has True.
I have a data frame below:
   A     B      C      D      E
1  True  False  True   False  False
2  True  False  False  True   False
3  True  True   False  False  True
Input (specified) row: True False True True True
The expected output is the first and second rows.
Thanks for help!

Suppose df is your DataFrame and row is the input row as a Series:
row = pd.Series([True, False, True, True, True], index=df.columns)
You need to find the rows in df that do not exceed row in any column (a sub-row never has True where row has False):
answer_index = (df <= row).all(axis=1)
df[answer_index]
# A B C D E
#1 True False True False False
#2 True False False True False
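Put together as one self-contained snippet (recreating the frame from the question):

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame(
    {
        "A": [True, True, True],
        "B": [False, False, True],
        "C": [True, False, False],
        "D": [False, True, False],
        "E": [False, False, True],
    },
    index=[1, 2, 3],
)

# The specified ("father") row
row = pd.Series([True, False, True, True, True], index=df.columns)

# A row is a sub-row when it never exceeds `row` in any column
# (False <= True elementwise), so the comparison must hold everywhere.
mask = (df <= row).all(axis=1)
sub_rows = df[mask]
print(sub_rows.index.tolist())  # [1, 2]
```

This works because booleans compare as 0/1, so `df <= row` is False exactly where a row has True against the father-row's False.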

Related

how to get row index of a Pandas dataframe from a regex match

This question has been asked but I didn't find the answers complete. I have a dataframe that has unnecessary values in the first row and I want to find the row index of the animals:
df = pd.DataFrame({'a': ['apple', 'rhino', 'gray', 'horn'],
                   'b': ['honey', 'elephant', 'gray', 'trunk'],
                   'c': ['cheese', 'lion', 'beige', 'mane']})
a b c
0 apple honey cheese
1 rhino elephant lion
2 gray gray beige
3 horn trunk mane
ani_pat = r"rhino|zebra|lion"
That means I want to find "1" - the row index that matches the pattern. One solution I saw here was like this; applying to my problem...
import re

def findIdx(df, pattern):
    return df.apply(lambda x: x.str.match(pattern, flags=re.IGNORECASE)).values.nonzero()
animal = findIdx(df, ani_pat)
print(animal)
(array([1, 1], dtype=int64), array([0, 2], dtype=int64))
That output is a tuple of NumPy arrays. I've got the basics of NumPy and Pandas, but I'm not sure what to do with this or how it relates to the df above.
I altered that lambda expression like this:
df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE))
a b c
0 False False False
1 True False True
2 False False False
3 False False False
That makes a little more sense. but still trying to get the row index of the True values. How can I do that?
We can select from the boolean DataFrame the index entries of the rows that have any True value in them:
idx = df.index[
    df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
]
idx:
Int64Index([1], dtype='int64')
any on axis 1 will take the boolean DataFrame and reduce it to a single dimension based on the contents of the rows.
Before any:
a b c
0 False False False
1 True False True
2 False False False
3 False False False
After any:
0 False
1 True
2 False
3 False
dtype: bool
We can then use these boolean values as a mask for index (selecting indexes which have a True value):
Int64Index([1], dtype='int64')
If needed we can use tolist to get a list instead:
idx = df.index[
    df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE)).any(axis=1)
].tolist()
idx:
[1]
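Putting the whole answer together as one runnable snippet:

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ['apple', 'rhino', 'gray', 'horn'],
                   'b': ['honey', 'elephant', 'gray', 'trunk'],
                   'c': ['cheese', 'lion', 'beige', 'mane']})
ani_pat = r"rhino|zebra|lion"

# Boolean frame of per-cell matches, reduced across columns with any,
# then used as a mask on the index
matches = df.apply(lambda x: x.str.match(ani_pat, flags=re.IGNORECASE))
idx = df.index[matches.any(axis=1)].tolist()
print(idx)  # [1]
```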

Python: How to set n previous rows as True if row x is True in a DataFrame

My df (using pandas):
Value Class
1 False
5 False
7 False
2 False
4 False
3 True
2 False
If a row has Class as True, I want to set all n previous rows as true as well. Let's say n = 3, then the desired output is:
Value Class
1 False
5 False
7 True
2 True
4 True
3 True
2 False
I've looked up similar questions but they seem to focus on adding new columns. I would like to avoid that and just change the values of the existing one. My knowledge is pretty limited so I don't know how to tackle this.
The idea is to replace False with missing values using Series.where, back-fill them with the limit parameter of Series.bfill, then replace the remaining missing values with False and convert back to boolean:
n = 3
df['Class'] = df['Class'].where(df['Class']).bfill(limit=n).fillna(0).astype(bool)
print (df)
Value Class
0 1 False
1 5 False
2 7 True
3 2 True
4 4 True
5 3 True
6 2 False
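As an alternative sketch (not part of the answer above): reverse the series, take a rolling max over a window of n + 1, then reverse back, so that "previous" rows become "next" rows under the reversal:

```python
import pandas as pd

df = pd.DataFrame({'Value': [1, 5, 7, 2, 4, 3, 2],
                   'Class': [False, False, False, False, False, True, False]})
n = 3

flipped = df['Class'].astype(int)[::-1]   # reverse so lookback becomes lookahead
rolled = flipped.rolling(n + 1, min_periods=1).max()
df['Class'] = rolled[::-1].astype(bool)   # restore the original order
print(df['Class'].tolist())
# [False, False, True, True, True, True, False]
```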

How to find the number of rows that has been updated in pandas

How can we find the number of rows that got updated in pandas?
New['Updated'] = np.where((New.Class == 'B') & (New.Flag == 'Y'), 'N',
                          np.where((New.Class == 'R') & (New.Flag == 'N'), 'Y', New.Flag))
data.Flag = data['Tracking_Nbr'].map(New.set_index('Tracking_Nbr').Updated)
You need to store the Flag before the change; here I use Flag1:
df2['Updated'] = np.where((df2.Class == 'B') & (df2.Flag == 'Y'), 'N',
                          np.where((df2.Class == 'R') & (df2.Flag == 'N'), 'Y', df2.Flag))
df1['Flag1'] = df1['Flag']
df1.Flag = df1['Tracking_Nbr'].map(df2.set_index('Tracking_Nbr').Updated)
df1[df1['Flag1'] != df1['Flag']]
More information
df1['Flag1']!=df1['Flag']
Out[716]:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
dtype: bool
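To turn that boolean mask into an actual count, sum it. A minimal sketch with hypothetical stand-in frames for df1/df2 (the column values below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1/df2 in the answer
df2 = pd.DataFrame({'Tracking_Nbr': [101, 102, 103],
                    'Class': ['B', 'R', 'B'],
                    'Flag': ['Y', 'N', 'N']})
df1 = pd.DataFrame({'Tracking_Nbr': [101, 102, 103],
                    'Flag': ['Y', 'N', 'N']})

df2['Updated'] = np.where((df2.Class == 'B') & (df2.Flag == 'Y'), 'N',
                          np.where((df2.Class == 'R') & (df2.Flag == 'N'), 'Y',
                                   df2.Flag))
df1['Flag1'] = df1['Flag']                     # keep the pre-change Flag
df1['Flag'] = df1['Tracking_Nbr'].map(df2.set_index('Tracking_Nbr').Updated)

# Number of rows whose Flag actually changed: True counts as 1
n_updated = (df1['Flag1'] != df1['Flag']).sum()
print(n_updated)  # 2
```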

pandas dataframe create a new column whose values are based on groupby sum on another column

I am trying to create a new column amount_0_flag for a df. Its values are based on a groupby on another column, key: if a group's signed amount sum is 0, assign True to amount_0_flag for that group, otherwise False. The df looks like:
key amount amount_0_flag negative_amount
1 1.0 True False
1 1.0 True True
2 2.0 False True
2 3.0 False False
2 4.0 False False
so when df.groupby('key') is applied, the group with key=1 gets True in amount_0_flag for each of its elements, since within that group one element has amount -1 and the other has +1 (the sign coming from negative_amount).
df.groupby('key')['amount'].sum()
only gives the sum of amount for each group without considering the values in negative_amount, and I am wondering how to find the groups (and their rows) whose amounts sum to 0 once negative_amount is taken into account, using pandas/numpy.
Let's try this where I created a 'new_column' showing the comparison to your 'amount_0_flag':
df['new_column'] = (df.assign(amount_n=df.amount * np.where(df.negative_amount, -1, 1))
                      .groupby('key')['amount_n']
                      .transform(lambda x: sum(x) == 0))
Output:
key amount amount_0_flag negative_amount new_column
0 1 1.0 True False True
1 1 1.0 True True True
2 2 2.0 False True False
3 2 3.0 False False False
4 2 4.0 False False False
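The same result can be sketched with a signed helper Series and transform('sum') instead of a lambda (an equivalent formulation, not taken from the answer above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2, 2, 2],
                   'amount': [1.0, 1.0, 2.0, 3.0, 4.0],
                   'negative_amount': [False, True, True, False, False]})

# Sign the amounts, sum per key, and flag the keys whose signed sum is zero
signed = df['amount'] * np.where(df['negative_amount'], -1, 1)
df['amount_0_flag'] = signed.groupby(df['key']).transform('sum').eq(0)
print(df['amount_0_flag'].tolist())  # [True, True, False, False, False]
```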

replace values in pandas based on other two column

I have a problem replacing values in a column conditional on two other columns.
For example, we have three columns: A, B, and C.
Columns A and B are both booleans, containing True and False, and column C contains three values: "Payroll", "Social", and "Other".
When columns A and B are both True, column C should hold the value "Payroll".
I want to change the values in column C where both columns A and B are True.
I tried the following code, but it gives me the error "'NoneType' object has no attribute 'where'" (replace with inplace=True returns None, so there is nothing to call .where on):
data1.replace({'C' : {'Payroll', 'Social'}}, inplace=True).where((data1['A'] == True) & (data1['B'] == True))
What can be done about this problem?
I think you need all to check whether all values per row are True, and then assign the output to the DataFrame filtered by the boolean mask:
data1 = pd.DataFrame({
    'C': ['Payroll', 'Other', 'Payroll', 'Social'],
    'A': [True, True, True, False],
    'B': [False, True, True, False]
})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Payroll
3 False False Social
m = data1[['A', 'B']].all(axis=1)
#same output as
#m = data1['A'] & data1['B']
print (m)
0 False
1 True
2 True
3 False
dtype: bool
print (data1[m])
A B C
1 True True Other
2 True True Payroll
data1[m] = data1[m].replace({'C' : { 'Payroll':'Social'}})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Social
3 False False Social
Well, you can loop over the rows to do this, but you must write back through .loc, because modifying the row object yielded by iterrows does not change the DataFrame:
def change_value(df):
    for index, row in df.iterrows():
        if row['A'] == row['B'] == True:
            df.loc[index, 'C'] = ...  # change to whatever value you want
        else:
            df.loc[index, 'C'] = ...  # change however you want
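A vectorized alternative to such a loop is boolean indexing with .loc (the replacement value 'Payroll' below is just an example):

```python
import pandas as pd

data1 = pd.DataFrame({'A': [True, True, True, False],
                      'B': [False, True, True, False],
                      'C': ['Payroll', 'Other', 'Payroll', 'Social']})

mask = data1['A'] & data1['B']    # rows where both flags are True
data1.loc[mask, 'C'] = 'Payroll'  # assign only on those rows
print(data1['C'].tolist())  # ['Payroll', 'Payroll', 'Payroll', 'Social']
```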
