find index of row element in pandas - python-3.x

If you have a df:
apple banana carrot
a 1 2 3
b 2 3 1
c 0 0 1
To find the row index where a column equals 0, you can use df[df['apple']==0].index,
but can you transpose this so you find the column labels of row c where the value is 0?
Basically I need to drop the columns where row c is 0, and would like to do this in one line, by row rather than column by column.

If you want to test row c and get all columns where the value is 0:
c = df.columns[df.loc['c'] == 0]
print (c)
Index(['apple', 'banana'], dtype='object')
If you want to test all rows at once:
c1 = df.columns[df.eq(0).any()]
print (c1)
Index(['apple', 'banana'], dtype='object')
If you need to remove the columns that contain 0 in any row:
df = df.loc[:, df.ne(0).all()]
print (df)
carrot
a 3
b 1
c 1
Detail/explanation:
First, compare all values of the DataFrame with ne (!=):
print (df.ne(0))
apple banana carrot
a True True True
b True True True
c False False True
Then test, per column, whether all values are True with all:
print (df.ne(0).all())
apple False
banana False
carrot True
dtype: bool
Last, filter with DataFrame.loc:
print (df.loc[:, df.ne(0).all()])
carrot
a 3
b 1
c 1
If you need to test only row c, the solution is similar: first select row c with loc and omit all:
df = df.loc[:, df.loc['c'].ne(0)]
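For reference, a minimal runnable sketch that rebuilds the example frame and applies the one-liners above:
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'apple': [1, 2, 0],
                   'banana': [2, 3, 0],
                   'carrot': [3, 1, 1]},
                  index=['a', 'b', 'c'])

# Column labels where row 'c' equals 0
print (df.columns[df.loc['c'] == 0])
#Index(['apple', 'banana'], dtype='object')

# Drop the columns where row 'c' equals 0, in one line
print (df.loc[:, df.loc['c'].ne(0)])
#   carrot
#a       3
#b       1
#c       1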

Yes you can, df.T[df.T['c']==0]
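As a quick sketch of the transposed variant on the same frame: .index on the filtered transpose gives the column labels, and transposing back after the inverted filter drops them:
df.T[df.T['c'] == 0].index
#Index(['apple', 'banana'], dtype='object')
df.T[df.T['c'] != 0].T
#   carrot
#a       3
#b       1
#c       1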

Related

Given a column value, check if another column value is present in preceding or next 'n' rows in a Pandas data frame

I have the following data
jsonDict = {'Fruit': ['apple', 'orange', 'apple', 'banana', 'orange', 'apple','banana'], 'price': [1, 2, 1, 3, 2, 1, 3]}
Fruit price
0 apple 1
1 orange 2
2 apple 1
3 banana 3
4 orange 2
5 apple 1
6 banana 3
What I want to do is check if Fruit == 'banana', and if so, scan the preceding as well as the next n rows from the index position of the 'banana' row for an instance where Fruit == 'apple'. An example of the expected output, taking n=2, is shown below.
Fruit price
2 apple 1
5 apple 1
I have tried doing
position = df[df['Fruit'] == 'banana'].index
resultdf= df.loc[((df.index).isin(position)) & (((df['Fruit'].index+2).isin(['apple']))|((df['Fruit'].index-2).isin(['apple'])))]
# Output is an empty dataframe
Empty DataFrame
Columns: [Fruit, price]
Index: []
Preference will be given to vectorized approaches.
IIUC, you can use 2 masks and boolean indexing:
# df = pd.DataFrame(jsonDict)
n = 2
m1 = df['Fruit'].eq('banana')
# is the row ±n of a banana?
m2 = m1.rolling(2*n+1, min_periods=1, center=True).max().eq(1)
# is the row an apple?
m3 = df['Fruit'].eq('apple')
out = df[m2&m3]
output:
Fruit price
2 apple 1
5 apple 1
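As a quick check, printing the intermediate mask m2 (continuing from the snippet above) shows which rows fall within ±n of a 'banana'; only row 0 is out of reach of both bananas:
print (m2.tolist())
#[False, True, True, True, True, True, True]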

Collapse values from multiple rows of a column into an array when all other columns values are same

I have a table with 7 columns where, for every few rows, 6 columns remain the same and only the 7th changes. I would like to merge all these rows into one row, combining the values of the 7th column into a list.
So if I have this dataframe:
A B C
0 a 1 2
1 b 3 4
2 c 5 6
3 c 7 6
I would like to convert it to this:
A B C
0 a 1 2
1 b 3 4
2 c [5, 7] 6
Since the values of columns A and C are the same in rows 2 and 3, those rows get collapsed into a single row and the values of B are combined into a list.
Melt, explode, and pivot don't seem to offer such functionality. How can I achieve this using pandas?
Use GroupBy.agg with a custom lambda function, and last add DataFrame.reindex to restore the original column order:
f = lambda x: x.tolist() if len(x) > 1 else x.iat[0]
df = df.groupby(['A','C'])['B'].agg(f).reset_index().reindex(df.columns, axis=1)
You can also derive the column names dynamically:
changes = ['B']
cols = df.columns.difference(changes).tolist()
f = lambda x: x.tolist() if len(x) > 1 else x.iat[0]
df = df.groupby(cols)[changes].agg(f).reset_index().reindex(df.columns, axis=1)
print (df)
A B C
0 a 1 2
1 b 3 4
2 c [5, 7] 6
If lists are wanted for all values in the column, the solution is simpler:
changes = ['B']
cols = df.columns.difference(changes).tolist()
df = df.groupby(cols)[changes].agg(list).reset_index().reindex(df.columns, axis=1)
print (df)
A B C
0 a [1] 2
1 b [3] 4
2 c [5, 7] 6
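A self-contained sketch of the generic variant, rebuilding the sample frame for reproduction (here selecting the single column 'B' directly):
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'c'],
                   'B': [1, 3, 5, 7],
                   'C': [2, 4, 6, 6]})

changes = ['B']
cols = df.columns.difference(changes).tolist()   # ['A', 'C']
f = lambda x: x.tolist() if len(x) > 1 else x.iat[0]
out = df.groupby(cols)['B'].agg(f).reset_index().reindex(df.columns, axis=1)
print (out)
#   A       B  C
#0  a       1  2
#1  b       3  4
#2  c  [5, 7]  6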
Here is another approach using pivot_table and applymap:
(df.pivot_table(index='A', aggfunc=list)
   .applymap(lambda x: x[0] if len(set(x)) == 1 else x)
   .reset_index())
A B C
0 a 1 2
1 b 3 4
2 c [5, 7] 6
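On recent pandas (2.1+), DataFrame.applymap is deprecated in favor of DataFrame.map, so the same pipeline can be written as:
(df.pivot_table(index='A', aggfunc=list)
   .map(lambda x: x[0] if len(set(x)) == 1 else x)
   .reset_index())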

Drop whole groups over multiple columns if a specific value does not exist in another column in Pandas

How can I drop the whole group, by city and district, if the date value 2018/11/1 does not exist in it, in the following dataframe:
city district date value
0 a c 2018/9/1 12
1 a c 2018/10/1 4
2 a c 2018/11/1 5
3 b d 2018/9/1 3
4 b d 2018/10/1 7
The expected result will like this:
city district date value
0 a c 2018/9/1 12
1 a c 2018/10/1 4
2 a c 2018/11/1 5
Thank you!
Create a helper column with DataFrame.assign, compare the dates, and test whether at least one value per group is True with GroupBy.transform('any'), so the result can be used as a mask for boolean indexing:
mask = (df.assign(new=df['date'].eq('2018/11/1'))
          .groupby(['city','district'])['new'].transform('any'))
df = df[mask]
print (df)
city district date value
0 a c 2018/9/1 12
1 a c 2018/10/1 4
2 a c 2018/11/1 5
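For reference, a quick check of the helper mask on the sample data: it is True for every row of the (a, c) group and False for the (b, d) group:
print (mask.tolist())
#[True, True, True, False, False]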
If this raises an error because of missing values, one possible idea is to replace the missing values in the columns used for the groups:
mask = (df.assign(new=df['date'].eq('2018/11/1'),
                  city=df['city'].fillna(-1),
                  district=df['district'].fillna(-1))
          .groupby(['city','district'])['new'].transform('any'))
df = df[mask]
print (df)
city district date value
0 a c 2018/9/1 12
1 a c 2018/10/1 4
2 a c 2018/11/1 5
Another idea is to add possibly missing index values with reindex and also replace missing values with False:
mask = (df.assign(new=df['date'].eq('2018/11/1'))
          .groupby(['city','district'])['new'].transform('any'))
df = df[mask.reindex(df.index, fill_value=False).fillna(False)]
print (df)
city district date value
0 a c 2018/9/1 12
1 a c 2018/10/1 4
2 a c 2018/11/1 5
There's a special GroupBy.filter() method for this. Assuming date is already datetime:
filter_date = pd.Timestamp('2018-11-01').date()
df = df.groupby(['city', 'district']).filter(lambda x: (x['date'].dt.date == filter_date).any())
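A minimal runnable sketch of the filter approach, rebuilding the frame and parsing the dates first (values taken from the question):
import pandas as pd

df = pd.DataFrame({'city': ['a', 'a', 'a', 'b', 'b'],
                   'district': ['c', 'c', 'c', 'd', 'd'],
                   'date': pd.to_datetime(['2018/9/1', '2018/10/1', '2018/11/1',
                                           '2018/9/1', '2018/10/1']),
                   'value': [12, 4, 5, 3, 7]})

filter_date = pd.Timestamp('2018-11-01').date()
print (df.groupby(['city', 'district'])
         .filter(lambda x: (x['date'].dt.date == filter_date).any()))
Note that filter calls the Python lambda once per group, so with many groups the transform-based approach above is usually faster.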

How to drop crossed pairs of records from a DataFrame in pandas?

I have a dataframe like this:
a b c
0 A B 1
4 B A 1
1 C D -1
3 D C 3
2 E F 3
The '0' row and the '4' row are a mirror pair, and the 'c' column decides how to deduplicate: if both rows of a mirror pair have the same value in c, I remove one of them; if the values differ, I remove both. The expected result:
a b c
0 A B 1
2 E F 3
I used a while loop, but my data set is huge. Any better ideas?
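For reproduction, the sample frame with its non-sequential index can be rebuilt like this (a sketch to test the answers below):
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'D', 'E'],
                   'b': ['B', 'A', 'D', 'C', 'F'],
                   'c': [1, 1, -1, 3, 3]},
                  index=[0, 4, 1, 3, 2])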
IIUC, using np.sort with duplicated:
df1=df.loc[~pd.DataFrame(np.sort(df[['a','b']].values,axis=1)).duplicated().values]
a b c
0 A B 1
1 C D -1
2 E F 3
You may use agg with frozenset, together with duplicated and slicing:
s = df[['a', 'b']].agg(frozenset, axis=1)
m = ~s.duplicated(keep=False) | (s.duplicated(keep=False) & df.c.duplicated())
df.loc[m]
Out[165]:
a b c
4 B A 1
2 E F 3
First select the non-duplicated rows using np.sort and Series.duplicated (see detail m1). Then use DataFrame.groupby to group according to columns a, b (see detail g), and perform boolean indexing with GroupBy.transform to eliminate pairs whose c values do not match:
df2=df.reset_index(drop=True)
m1=~pd.DataFrame(np.sort(df2[['a','b']])).duplicated()
g=m1.cumsum()
m2=~df2.groupby(g,sort=False)['c'].transform(lambda x: (x.nunique()==len(x))&(len(x)>1))
mask=m1&m2
print(mask)
0 True
1 False
2 False
3 False
4 True
dtype: bool
df_filtered=df2[mask]
print(df_filtered)
a b c
0 A B 1
4 E F 3
Details:
m1
0 True
1 False
2 True
3 False
4 True
dtype: bool
m2
0 True
1 True
2 False
3 False
4 True
dtype: bool
g
0 1
1 1
2 2
3 2
4 3
dtype: int64

Getting all rows where the entry in column 'C' is larger than the preceding entry in column 'C'

How can I select all rows of a data frame where a condition on a column is met, when the condition concerns the relationship between every 2 consecutive entries of that column? To give a specific example, let's say I have a DataFrame:
>>> df = pd.DataFrame({'A': [1, 2, 3, 4],
...                    'B': ['spam', 'ham', 'egg', 'foo'],
...                    'C': [4, 5, 3, 4]})
>>> df
A B C
0 1 spam 4
1 2 ham 5
2 3 egg 3
3 4 foo 4
>>> df2 = df[ return every row of df where C[i] > C[i-1] ]
>>> df2
A B C
1 2 ham 5
3 4 foo 4
There is plenty of great information about slicing and indexing in the pandas docs and here, but this is a bit more complicated, I think. I could also be going about it wrong. What I'm looking for are the rows where the value stored in C stops being monotonically decreasing.
Any help is appreciated!
Use boolean indexing, comparing against the shifted column values:
print (df[df['C'] > df['C'].shift()])
A B C
1 2 ham 5
3 4 foo 4
Detail (shift() makes the first value NaN, and a comparison with NaN evaluates to False, so the first row is never selected):
print (df['C'] > df['C'].shift())
0 False
1 True
2 False
3 True
Name: C, dtype: bool
The same rows can also be selected by comparing the column's consecutive differences with diff:
print (df[df['C'].diff() > 0])
A B C
1 2 ham 5
3 4 foo 4
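Both one-liners build the same boolean mask on this frame; a quick check, reusing df from the question:
print ((df['C'] > df['C'].shift()).equals(df['C'].diff() > 0))
#True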
