Python: How to set n previous rows as True if row x is True in a DataFrame - python-3.x

My df (using pandas):
Value Class
1 False
5 False
7 False
2 False
4 False
3 True
2 False
If a row has Class set to True, I want to set the n previous rows to True as well. Let's say n = 3, then the desired output is:
Value Class
1 False
5 False
7 True
2 True
4 True
3 True
2 False
I've looked up similar questions but they seem to focus on adding new columns. I would like to avoid that and just change the values of the existing one. My knowledge is pretty limited so I don't know how to tackle this.

The idea is to replace False with missing values using Series.where, then back-fill at most n rows using Series.bfill with the limit parameter, and finally replace the remaining missing values with False and convert the column back to boolean:
n = 3
df['Class'] = df['Class'].where(df['Class']).bfill(limit=n).fillna(0).astype(bool)
print (df)
Value Class
0 1 False
1 5 False
2 7 True
3 2 True
4 4 True
5 3 True
6 2 False
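A minimal alternative sketch that avoids the round trip through missing values: OR the column with copies of itself shifted up by 1 to n rows. The sample data is copied from the question; the name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [1, 5, 7, 2, 4, 3, 2],
    "Class": [False, False, False, False, False, True, False],
})
n = 3

# Start from the original flags, then OR in the flags of the next 1..n rows.
# shift(-k) moves values k rows up; fill_value=False keeps the dtype boolean.
mask = df["Class"].copy()
for k in range(1, n + 1):
    mask = mask | df["Class"].shift(-k, fill_value=False)

df["Class"] = mask
print(df)
```

The loop runs n times regardless of the number of rows, so it stays cheap for small n.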

Related

Excel Formula based on previous rows

There are 3 columns:
Date, Name, Bonus_Point?
If a player scores 4 or lower in the Name column on three consecutive dates, Bonus_Point? should return 'Yes'; otherwise it should return 'No'.
For example, for 1/30/22, there would be a 'Yes' because there were 3 previous instances (including 1/30/22) where the score is less than or equal to 4.
But for 2/2/22, Bonus_Point? would be 'No' because on the third day, Name scored a 5.
Assuming your columns are A through C, row 1 is the header row, and your data starts in row 2, enter this formula in C4:
=AND(B2<=4,B3<=4,B4<=4)
Then fill down. (See further down for "yes" and "no")
Date      Name  Bonus_Point?
1/28/22   3
1/29/22   3
1/30/22   3     TRUE
1/31/22   3     TRUE
2/1/22    4     TRUE
2/2/22    5     FALSE
2/3/22    2     FALSE
2/4/22    5     FALSE
2/5/22    4     FALSE
2/6/22    3     FALSE
2/7/22    2     TRUE
2/8/22    3     TRUE
2/9/22    4     TRUE
2/10/22   3     TRUE
2/11/22   2     TRUE
2/12/22   2     TRUE
3/13/22   3     TRUE
If you want "Yes" and "No", you can do that through formatting or add it to the formula:
=IF(AND(B2<=4,B3<=4,B4<=4),"Yes","No")
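For readers doing the same check in pandas rather than Excel, here is a rough sketch of the equivalent logic: a rolling window of three rows in which every score must be 4 or lower. The column names follow the question and the scores are the first rows of the table above; everything else is an assumption made for illustration.

```python
import pandas as pd

scores = pd.DataFrame({"Name": [3, 3, 3, 3, 4, 5, 2, 5, 4, 3, 2]})

# 1 where the score qualifies, 0 otherwise
qualifies = (scores["Name"] <= 4).astype(int)

# A row gets the bonus when the current row and the two before it all qualify.
# The first two rows have no full window, so their rolling sum is NaN
# and the comparison yields False.
bonus = qualifies.rolling(3).sum() == 3

scores["Bonus_Point?"] = bonus.map({True: "Yes", False: "No"})
print(scores)
```

Unlike the spreadsheet, where the first two cells stay blank, this sketch reports "No" for rows that do not yet have three dates of history.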

How to find the number of rows that has been updated in pandas

How can we find the number of rows that were updated in pandas?
New['Updated']= np.where((New.Class=='B')&(New.Flag=='Y'),'N',np.where((New.Class=='R')&(New.Flag=='N'),'Y',New.Flag))
data.Flag=data['Tracking_Nbr'].map(New.set_index('Tracking_Nbr').Updated)
You need to store the Flag column before the change (here I am using Flag1) and then compare it with the new values:
df2['Updated']=np.where((df2.Class=='B')&(df2.Flag=='Y'),'N',np.where((df2.Class=='R')&(df2.Flag=='N'),'Y',df2.Flag))
df1['Flag1']=df1['Flag']
df1.Flag=df1['Tracking_Nbr'].map(df2.set_index('Tracking_Nbr').Updated)
df1[df1['Flag1']!=df1['Flag']]
More information
df1['Flag1']!=df1['Flag']
Out[716]:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
dtype: bool
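To get the actual count asked for in the question, the boolean comparison can be summed, since True counts as 1. The sketch below uses small made-up frames (the values are hypothetical, chosen only so that some rows change and some do not) and keeps the old flags in a separate Series instead of a Flag1 column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original question
df1 = pd.DataFrame({"Tracking_Nbr": [1, 2, 3, 4],
                    "Flag": ["Y", "N", "Y", "N"]})
df2 = pd.DataFrame({"Tracking_Nbr": [1, 2, 3, 4],
                    "Class": ["B", "R", "X", "B"],
                    "Flag": ["Y", "N", "Y", "N"]})

df2["Updated"] = np.where(
    (df2.Class == "B") & (df2.Flag == "Y"), "N",
    np.where((df2.Class == "R") & (df2.Flag == "N"), "Y", df2.Flag),
)

old_flag = df1["Flag"].copy()                       # snapshot before the update
df1["Flag"] = df1["Tracking_Nbr"].map(df2.set_index("Tracking_Nbr")["Updated"])

changed = old_flag != df1["Flag"]                   # True where the value changed
n_updated = int(changed.sum())                      # number of updated rows
print(n_updated)
```

Indexing with df1[changed] returns the updated rows themselves, as in the answer above.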

pandas dataframe create a new column whose values are based on groupby sum on another column

I am trying to create a new column amount_0_flag for a df. Its values are based on grouping by another column, key: if the amounts in a group sum to 0, amount_0_flag should be True for every row in that group, otherwise False. The df looks like this:
key amount amount_0_flag negative_amount
1 1.0 True False
1 1.0 True True
2 2.0 False True
2 3.0 False False
2 4.0 False False
So after df.groupby('key'), the cluster with key=1 should get True in amount_0_flag for each of its rows, since within that cluster one element has an amount of negative 1 and the other has an amount of positive 1.
df.groupby('key')['amount'].sum()
only gives the sum of amount for each cluster, without considering the values in negative_amount. I am wondering how to also find the clusters (and their rows) whose amounts sum to 0 once negative_amount is taken into account, using pandas/numpy.
Let's try this where I created a 'new_column' showing the comparison to your 'amount_0_flag':
df['new_column'] = (df.assign(amount_n = df.amount * np.where(df.negative_amount,-1,1))
                      .groupby('key')['amount_n']
                      .transform(lambda x: sum(x)<=0))
Output:
key amount amount_0_flag negative_amount new_column
0 1 1.0 True False True
1 1 1.0 True True True
2 2 2.0 False True False
3 2 3.0 False False False
4 2 4.0 False False False
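Note that the answer's lambda tests for a sum of at most 0, while the question asks for a sum of exactly 0. A variant sketch that tests for exactly 0 and uses the built-in "sum" aggregation instead of a Python lambda might look like this (the name signed is illustrative; the data is taken from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "key": [1, 1, 2, 2, 2],
    "amount": [1.0, 1.0, 2.0, 3.0, 4.0],
    "negative_amount": [False, True, True, False, False],
})

# Flip the sign of amount wherever negative_amount is True
signed = df["amount"].where(~df["negative_amount"], -df["amount"])

# Broadcast each group's signed sum back to its rows, then test for zero
df["amount_0_flag"] = signed.groupby(df["key"]).transform("sum").eq(0)
print(df)
```

With real-world floating-point amounts an exact comparison may be too strict; comparing with a tolerance (for example via numpy.isclose) is one option in that case.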

replace values in pandas based on other two column

I have a problem replacing values in one column conditional on two other columns.
For example, we have three columns: A, B, and C.
Columns A and B are both booleans, containing True and False, and column C contains three values: "Payroll", "Social", and "Other".
In some rows where columns A and B are both True, column C has the value "Payroll".
I want to change the values in column C where both column A and column B are True.
I tried the following code:
data1.replace({'C' : { 'Payroll', 'Social'}},inplace=True).where((data1['A'] == True) & (data1['B'] == True))
but it gives me this error: "'NoneType' object has no attribute 'where'".
What can be done about this problem?
I think you need DataFrame.all to check whether all values in each row are True, and then assign the replaced output back to the rows selected by that boolean mask:
data1 = pd.DataFrame({
'C': ['Payroll','Other','Payroll','Social'],
'A': [True, True, True, False],
'B':[False, True, True, False]
})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Payroll
3 False False Social
m = data1[['A', 'B']].all(axis=1)
#same output as
#m = data1['A'] & data1['B']
print (m)
0 False
1 True
2 True
3 False
dtype: bool
print (data1[m])
A B C
1 True True Other
2 True True Payroll
data1[m] = data1[m].replace({'C' : { 'Payroll':'Social'}})
print (data1)
A B C
0 True False Payroll
1 True True Other
2 True True Social
3 False False Social
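As a possible alternative to assigning back through data1[m], a single DataFrame.loc call can combine the row mask with the condition on C and write only the affected cells. This is a sketch on the same sample data, not the answerer's code:

```python
import pandas as pd

data1 = pd.DataFrame({
    "C": ["Payroll", "Other", "Payroll", "Social"],
    "A": [True, True, True, False],
    "B": [False, True, True, False],
})

# Rows where both A and B are True
m = data1["A"] & data1["B"]

# Replace 'Payroll' with 'Social' only in those rows
data1.loc[m & (data1["C"] == "Payroll"), "C"] = "Social"
print(data1)
```

The original error arises because replace(..., inplace=True) returns None, so there is nothing to chain .where onto.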
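You can also loop over the rows with iterrows, though this is much slower than the boolean mask above. Here new_value stands in for whatever replacement you want; rows that do not match are left unchanged, so add an else branch if they need a different value:
def change_value(dataframe, new_value):
    # Set C to new_value on every row where both A and B are True
    for index, row in dataframe.iterrows():
        if row['A'] and row['B']:
            # assigning to row would only change a copy, so write back with .at
            dataframe.at[index, 'C'] = new_value
    return dataframe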

How to check if pandas dataframe rows have certain values in various columns, scalability

I have implemented the CN2 classification algorithm, which induces rules of the following form to classify the data:
IF Attribute1 = a AND Attribute4 = b THEN class = class 1
My current implementation loops through a pandas DataFrame containing the training data using the iterrows() function and returns True or False for each row if it satisfies the rule or not, however, I am aware this is a highly inefficient solution. I would like to vectorise the code, my current attempt is like so:
DataFrame = df
age prescription astigmatism tear rate
1 1 2 1
2 2 1 1
2 1 1 2
rule = {'age':[1],'prescription':[1],'astigmatism':[1,2],'tear rate':[1,2]}
df.isin(rule)
This produces:
age prescription astigmatism tear rate
True True True True
False False True True
False True True True
I have coded the rule to be a dictionary which contains a single value for target attributes and the set of all possible values for non-target attributes.
The result I would like is a single True or False for each row, depending on whether the conditions of the rule are met, plus the index of the rows which evaluate to all True. Currently I can only get a DataFrame with a True/False for each value. To be concrete, in the example I have shown, I want the result to be the index of the first row, which is the only row that satisfies the rule.
To check whether at least one value per row is True, use DataFrame.any:
mask = df.isin(rule).any(axis=1)
print (mask)
0 True
1 True
2 True
dtype: bool
Or, to check whether all values in a row are True (which is what the rule requires), use DataFrame.all:
mask = df.isin(rule).all(axis=1)
print (mask)
0 True
1 False
2 False
dtype: bool
To filter the rows, you can use boolean indexing:
df = df[mask]
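Putting the pieces together, a self-contained sketch using the data and rule from the question could look like the following. It also extracts the index labels of the matching rows, which the question asks for; the name matching_index is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [1, 2, 2],
    "prescription": [1, 2, 1],
    "astigmatism": [2, 1, 1],
    "tear rate": [1, 1, 2],
})
rule = {"age": [1], "prescription": [1],
        "astigmatism": [1, 2], "tear rate": [1, 2]}

# With a dict, isin checks each column against its own list of allowed values
mask = df.isin(rule).all(axis=1)

# Index labels of the rows that satisfy every condition of the rule
matching_index = df.index[mask].tolist()
print(matching_index)
```

Note that when isin is given a dict, a column missing from the dict comes back as all False, so .all(axis=1) would reject every row; the rule therefore needs an entry for each column, as in the question.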
