Checking for specific value change between columns in pandas

I have four columns holding numeric values between 1 and 4, and I want to find the rows where a value changes from 1 to 4 at some point along the progression from column a to column d. Currently I compute the difference between each pair of columns and look for a value of 3. Is there a better way to do this?
Here's what I'm looking for (with 0s in place of NaN):
ID a b c d check
1 1 0 1 4 True
2 1 0 1 1 False
3 1 1 1 4 True
4 1 3 3 4 True
5 0 0 1 4 True
6 1 2 3 3 False
7 1 0 0 4 True
8 1 4 4 4 True
9 1 4 3 4 True
10 1 4 1 1 True

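You can use cummax along each row (pass axis as a keyword; recent pandas versions no longer accept it positionally in any):
col = ['a','b','c','d']
s = df[col].cummax(axis=1)
df['new'] = s[col[:3]].eq(1).any(axis=1) & s[col[-1]].eq(4)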
Out[523]:
0 True
1 False
2 True
3 True
4 True
5 False
6 True
7 True
8 True
dtype: bool
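
As a self-contained check of the cummax idea, here is a minimal sketch that rebuilds the ten rows from the question (with 0 standing in for NaN, as the question does) and applies the same two conditions. The frame construction is an assumption about how the data is stored; only the column names come from the question.

```python
import pandas as pd

# The ten sample rows from the question, with 0 standing in for NaN
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "a": [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 1, 3, 0, 2, 0, 4, 4, 4],
    "c": [1, 1, 1, 3, 1, 3, 0, 4, 3, 1],
    "d": [4, 1, 4, 4, 4, 3, 4, 4, 4, 1],
})

col = ["a", "b", "c", "d"]

# Running maximum from left to right within each row
s = df[col].cummax(axis=1)

# True when the running max equals 1 somewhere in a..c and reaches 4 by column d
df["new"] = s[col[:3]].eq(1).any(axis=1) & s[col[-1]].eq(4)
print(df)
```

On this sample the new column agrees with the check column from the question. Note that the running maximum only equals 1 if nothing larger precedes the 1, so a row such as 2, 1, 4, 4 would not be flagged by this approach.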

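You can also compare the positions of the first 4 and the first 1 in each row with apply:
cols = ['a', 'b', 'c', 'd']
def get_index(lst, num):
    # Position of the first occurrence of num, or -1 if it is absent
    return lst.index(num) if num in lst else -1
# Note: a row that contains a 4 but no 1 also evaluates to True, because -1 is returned for the missing 1
df['Check'] = df[cols].apply(lambda row: get_index(row.tolist(), 4) > get_index(row.tolist(), 1), axis=1)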
print(df)
ID a b c d check Check
0 1 1 0 1 4 True True
1 2 1 0 1 1 False False
2 3 1 1 1 4 True True
3 4 1 3 3 4 True True
4 5 0 0 1 4 True True
5 6 1 2 3 3 False False
6 7 1 0 0 4 True True
7 8 1 4 4 4 True True
8 9 1 4 3 4 True True

Related

what is the good way to add 1 in column values if value greater than 2 python

I want to add 1 to the flag column wherever column A is greater than 2.
Here is my dataframe:
df=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,5,1,1,0,1,1,1]})
Expected output:
df_out=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,6,1,1,0,1,1,1]})

Use DataFrame.loc to add 1 to the matching rows:
df.loc[df.A.gt(2), 'flag'] += 1
print (df)
A flag
0 1 1
1 1 1
2 1 0
3 1 1
4 1 1
5 1 1
6 3 6
7 2 1
8 2 1
9 2 0
10 2 1
11 2 1
12 2 1
Or:
df['flag'] = np.where(df.A.gt(2), df['flag'] + 1, df['flag'])
EDIT:
mean = df.groupby(pd.cut(df['x'], bins))['y'].transform('mean')
df['flag'] = np.where(mean.gt(2), df['y'] + 1, df['y'])
And then:
x= df.groupby(pd.cut(df['x'], bins))['y'].apply(lambda x:abs(x-np.mean(x)))
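
To compare the loc and np.where variants side by side, here is a minimal sketch on the dataframe from the question. The names via_loc and via_where are illustrative only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 2],
    "flag": [1, 1, 0, 1, 1, 1, 5, 1, 1, 0, 1, 1, 1],
})

# Variant 1: in-place increment on the rows selected by the boolean mask
via_loc = df.copy()
via_loc.loc[via_loc["A"].gt(2), "flag"] += 1

# Variant 2: rebuild the whole column, picking flag + 1 where the mask is True
via_where = df.copy()
via_where["flag"] = np.where(via_where["A"].gt(2), via_where["flag"] + 1, via_where["flag"])

print(via_loc["flag"].tolist())
print(via_where["flag"].tolist())
```

Both variants change only the row where A is 3. The loc form touches just the selected rows, while np.where recomputes the full column, which can be convenient when the two branches are different expressions.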

Remove rows from Dataframe where row above or below has same value in a specific column

Starting Dataframe:
A B
0 1 1
1 1 2
2 2 3
3 3 4
4 3 5
5 1 6
6 1 7
7 1 8
8 2 9
Desired result - eg. Remove rows where column A has values that match the row above or below:
A B
0 1 1
2 2 3
3 3 4
5 1 6
8 2 9

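You can use boolean indexing. shift() moves the values of A down by one row, so the following condition is True wherever the value of A is NOT equal to the value of A in the previous row, which keeps the first row of each run of equal values: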
new_df = df[df['A'].ne(df['A'].shift())]
A B
0 1 1
2 2 3
3 3 4
5 1 6
8 2 9
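
To make the comparison visible, here is a small sketch that rebuilds the starting dataframe and keeps the shifted column and the mask as separate variables. The names prev and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3, 1, 1, 1, 2],
    "B": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Value of A in the previous row (NaN for the first row)
prev = df["A"].shift()

# True where A differs from the previous row, i.e. at the start of each run
mask = df["A"].ne(prev)

new_df = df[mask]
print(new_df)
```

The first row is always kept because its shifted value is NaN, which never compares equal to a number.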

pandas dataframe groups check the number of unique values of a column is one but exclude empty strings

I have the following df,
id invoice_no
1 6636
1 6637
2 6639
2 6639
3
3
4 6635
4 6635
4 6635
The invoice_no values for id 3 are all empty strings or spaces. I want to compute
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1
but groups whose invoice_no values are only spaces or empty strings should get same_invoice_no = False. How can I do that? The result should look like this:
id invoice_no same_invoice_no
1 6636 False
1 6637 False
2 6639 True
2 6639 True
3 False
3 False
4 6635 True
4 6635 True
4 6635 True

nunique counts an empty string as a value but ignores NaN. Replace empty or whitespace-only strings with NumPy NaN first:
df.replace(r'^\s*$', np.nan, regex=True, inplace=True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1
id invoice_no same_invoice_no
0 1 6636.0 False
1 1 6637.0 False
2 2 6639.0 True
3 2 6639.0 True
4 3 NaN False
5 3 NaN False
6 4 6635.0 True
7 4 6635.0 True
8 4 6635.0 True
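
A minimal sketch of this idea, assuming invoice_no is stored as strings (the question does not state the dtype) and that blank entries may be either empty or whitespace-only. The name cleaned is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3, 4, 4, 4],
    # Assumed to be strings; id 3 has one empty and one whitespace-only entry
    "invoice_no": ["6636", "6637", "6639", "6639", "", " ", "6635", "6635", "6635"],
})

# Turn empty or whitespace-only strings into NaN so that nunique ignores them
cleaned = df["invoice_no"].replace(r"^\s*$", np.nan, regex=True)

# A group is flagged only if it has exactly one distinct non-missing invoice_no
df["same_invoice_no"] = cleaned.groupby(df["id"]).transform("nunique").eq(1)
print(df)
```

The group for id 3 ends up with zero distinct values after the replacement, so it is flagged False along with id 1, which has two distinct values.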

Select from a DataFrame based on several levels of the MultiIndex

How to extend the logic of selecting from a DataFrame based on the first N-1 levels when N > 2?
As an example, consider a DataFrame:
midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))
In[11]: df
Out[11]:
0 1
10 20 30 10 20 30
a b a b a b a b a b a b
0 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1
Here, it is easy to select columns where 0 or 1 are in the first level:
df[[0, 1]]
But the same logic does not extend to selecting columns with 0 or 1 in the first and 10 or 20 in the second level:
In[13]: df[[(0, 10), (0, 20), (1, 10), (1, 20)]]
ValueError: operands could not be broadcast together with shapes (4,2) (3,) (4,2)
The following works:
df.loc[:, pd.IndexSlice[[0, 1], [10, 20], :]]
but is cumbersome, especially when the selector needs to be extracted from another DataFrame with a 2-level MultiIndex:
idx = df.columns.droplevel(2)
In[16]: idx
Out[16]:
MultiIndex(levels=[[0, 1], [10, 20, 30]],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, ... 1, 2, 2]])
In[17]: df[idx]
ValueError: operands could not be broadcast together with shapes (12,2) (3,) (12,2)
EDIT: Ideally, I would also like to be able to order columns this way, not just select them — again, in the spirit of df[[1, 0]] being able to order columns based on the first level.

If possible, you can filter by boolean indexing with get_level_values and isin:
m1 = df.columns.get_level_values(0).isin([0,1])
m2 = df.columns.get_level_values(1).isin([10,20])
print (m1)
[ True True True True True True True True True True True True]
print (m2)
[ True True True True False False True True True True False False]
print (m1 & m2)
[ True True True True False False True True True True False False]
df1 = df.loc[:, m1 & m2]
print (df1)
0 1
10 20 10 20
a b a b a b a b
0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
df.columns = df.columns.droplevel(2)
print (df)
0 1
10 10 20 20 30 30 10 10 20 20 30 30
0 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1
df2 = df.loc[:, m1 & m2]
print (df2)
0 1
10 10 20 20 10 10 20 20
0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
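
When the selector is a set of (level 0, level 1) pairs rather than two independent lists, one possible variation is to drop the last level and test membership with isin on the resulting two-level index. This is a sketch, not part of the original answer; the list wanted is a hypothetical selector.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [10, 20, 30], ["a", "b"]])
df = pd.DataFrame(1, columns=midx, index=np.arange(3))

# Hypothetical selector: specific (level 0, level 1) pairs to keep
wanted = [(1, 20), (0, 10)]

# Compare only the first two levels of each column against the wanted pairs
mask = df.columns.droplevel(2).isin(wanted)

selected = df.loc[:, mask]
print(selected)
```

A boolean mask keeps the columns in their original order, so the result starts with the (0, 10) columns even though (1, 20) is listed first in wanted. It therefore covers selection but not the reordering asked for in the question's EDIT.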

How to delete the entire row if any of its value is 0 in pandas

In the example below I only want to keep rows 1 and 2.
I want to delete every row that has a 0 in any column:
kt b tt mky depth
1 1 1 1 1 4
2 2 2 2 2 2
3 3 3 0 3 3
4 0 4 0 0 0
5 5 5 5 5 0
the output should read like below:
kt b tt mky depth
1 1 1 1 1 4
2 2 2 2 2 2
I have tried:
df.loc[(df!=0).any(axis=1)]
But it only deletes a row if all of its columns are 0.

You are really close; you need DataFrame.all to check that every value in a row is True:
df = df.loc[(df!=0).all(axis=1)]
print (df)
kt b tt mky depth
1 1 1 1 1 4
2 2 2 2 2 2
Details:
print (df!=0)
kt b tt mky depth
1 True True True True True
2 True True True True True
3 True True False True True
4 False True False False False
5 True True True True False
print ((df!=0).all(axis=1))
1 True
2 True
3 False
4 False
5 False
dtype: bool
An alternative is to change the mask to df == 0, use any to flag rows with at least one True, and invert the result with ~:
df = df.loc[~(df==0).any(axis=1)]
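
The two forms can be checked against each other on the sample data. This sketch rebuilds the frame from the question, using the leading number of each row as the index; the names keep_all and keep_any are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "kt": [1, 2, 3, 0, 5],
        "b": [1, 2, 3, 4, 5],
        "tt": [1, 2, 0, 0, 5],
        "mky": [1, 2, 3, 0, 5],
        "depth": [4, 2, 3, 0, 0],
    },
    index=[1, 2, 3, 4, 5],
)

# Keep rows where every value is non-zero
keep_all = df.loc[(df != 0).all(axis=1)]

# Equivalent: drop rows where at least one value is zero
keep_any = df.loc[~(df == 0).any(axis=1)]

print(keep_all)
```

Both expressions select rows 1 and 2, since "all values are non-zero" is the logical negation of "any value is zero".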
