Check if any row has the same values as a numpy array - python-3.x

I am working with a pandas.DataFrame that looks as follows:
       A  B  C  D
index
1      0  0  0  1
2      1  0  0  1
3      ...
4      ...
...
I am creating numpy arrays that have the same shape as a row of this dataframe, and I want to check whether the array I create is present as a row in the dataframe.
For example, this array is in the dataframe:
a = [0, 0, 0, 1]
This one is not:
b = [1, 1, 1, 1]
Any help, even a link to the right answer, is much appreciated. I have looked through Stack Overflow and hopefully did not miss anything.

import pandas as pd
df = pd.DataFrame({'A': [0, 1, 0, 0],
                   'B': [0, 0, 1, 1],
                   'C': [0, 0, 0, 0],
                   'D': [1, 1, 0, 1]})
# A B C D
# 0 0 0 0 1
# 1 1 0 0 1
# 2 0 1 0 0
# 3 0 1 0 1
>>> a = [0, 0, 0, 1]
>>> (df == a).all(axis=1).any()
True
>>> b = [1, 1, 1, 1]
>>> (df == b).all(axis=1).any()
False
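Since the question is specifically about numpy arrays, here is a small self-contained sketch of the same row-wise comparison with an np.ndarray as input, plus a variant that returns which index labels match. The helper names row_present and matching_rows are hypothetical, and the array is assumed to have exactly one element per column (otherwise the comparison raises a shape error).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0, 0],
                   'B': [0, 0, 1, 1],
                   'C': [0, 0, 0, 0],
                   'D': [1, 1, 0, 1]})

def row_present(frame: pd.DataFrame, arr) -> bool:
    """Return True if some row of `frame` equals `arr` element-wise (hypothetical helper)."""
    arr = np.asarray(arr)
    # Broadcast arr against every row, then require all columns of a row to match
    return bool((frame.to_numpy() == arr).all(axis=1).any())

def matching_rows(frame: pd.DataFrame, arr) -> list:
    """Return the index labels of all rows equal to `arr` (hypothetical helper)."""
    mask = (frame == np.asarray(arr)).all(axis=1)
    return frame.index[mask].tolist()

print(row_present(df, np.array([0, 0, 0, 1])))    # True
print(row_present(df, np.array([1, 1, 1, 1])))    # False
print(matching_rows(df, np.array([0, 1, 0, 1])))  # [3]
```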

Related

Flag if there was an occurrence of a value in another column of pandas dataframe?

I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'a': ['x', 'x', 'y','w', 'x', 'z', 'z', 'y', 'w'],
'Flag': [1, 0, 0, 0, 1, 0, 0, 0, 1]})
I want to add a column b that flags whether the value in a has a Flag of 1 in any row:
a Flag b
x 1 1
x 0 1
y 0 0
w 0 1
x 1 1
z 0 0
z 0 0
y 0 0
w 1 1
What I did: group by a, take the cumulative sum of Flag, and give every entry > 0 a 1 and 0 otherwise.
Is there a simpler method or function to do this?
You could do it with isin and .astype(int):
df['b'] = df['a'].isin(df.loc[df['Flag'].eq(1), 'a']).astype(int)
>>> df
a Flag b
0 x 1 1
1 x 0 1
2 y 0 0
3 w 0 1
4 x 1 1
5 z 0 0
6 z 0 0
7 y 0 0
8 w 1 1
>>>
Or, for other situations, you might need np.where:
import numpy as np
df['b'] = np.where(df['a'].isin(df.loc[df['Flag'].eq(1), 'a']), 1, 0)
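If you prefer to stay with a groupby, as in the question, a sketch using transform('max') instead of a cumulative sum might look like this. It assumes Flag only takes the values 0 and 1. A cumulative sum only "sees" earlier rows, so the first w (Flag 0, with its 1 appearing later) would get 0, whereas the group maximum is broadcast to every row of the group.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'x', 'y', 'w', 'x', 'z', 'z', 'y', 'w'],
                   'Flag': [1, 0, 0, 0, 1, 0, 0, 0, 1]})

# For each value of 'a', take the largest Flag in that group and copy it to
# every row of the group: 1 if any row has Flag == 1, else 0 (assumes 0/1 flags)
df['b'] = df.groupby('a')['Flag'].transform('max')
print(df)
```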

Pandas Multiple condition on column

Column A Column B
1 10
1 10
0 10
0 10
3 20
I have to check whether Column A is 0 and Column B is 10, and if so change Column A to 1:
df.loc[(df['Column A']==0) & (df['Column B'] == 10) ,df['Column A']] = 1
But I am getting an error:
None of [Int64Index([1, 1, 0, 0, 0, 1, 1, 0, 1, 1,\n ...\n 1, 0, 0, 0, 0, 2, 1, 0, 1, 1],\n dtype='int64', length=2715)] are in the [columns]
I think my solution is not correct. All the indexes are correct.
The final df should look something like this:
Column A Column B
1 10
1 10
1 10
1 10
3 20
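A likely fix, sketched on the small frame from the question: the column part of .loc should be the label 'Column A', not the Series df['Column A']. When the Series is passed, pandas treats its values (1, 1, 0, 0, ...) as column labels, which is what the error message lists as "not in the [columns]".

```python
import pandas as pd

df = pd.DataFrame({'Column A': [1, 1, 0, 0, 3],
                   'Column B': [10, 10, 10, 10, 20]})

# Boolean mask: rows where Column A is 0 AND Column B is 10
mask = (df['Column A'] == 0) & (df['Column B'] == 10)

# Select the column by its label, not by df['Column A']
df.loc[mask, 'Column A'] = 1
print(df)
```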

Diagonal Dataframe to 1 row

I need to convert a diagonal Dataframe to 1 row Dataframe.
Input:
import pandas as pd
df = pd.DataFrame([[7, 0, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, 3, 0],
                   [0, 0, 0, 8]],
                  columns=list('ABCD'))
A B C D
0 7 0 0 0
1 0 2 0 0
2 0 0 3 0
3 0 0 0 8
Expected output:
A B C D
0 7 2 3 8
What I have tried so far:
df1 = df.sum().to_frame().transpose()
df1
A B C D
0 7 2 3 8
It does the job, but is there a more elegant way to do this with groupby or some other pandas built-in?
I'm not sure there is a more 'elegant' way; I can only propose alternatives:
Use numpy.diagonal
pd.DataFrame([df.to_numpy().diagonal()], columns=df.columns)
A B C D
0 7 2 3 8
Use groupby with boolean (not sure if this is better than your solution):
df.groupby([True] * len(df), as_index=False).sum()
A B C D
0 7 2 3 8
You can use np.diagonal(df):
import numpy as np
pd.DataFrame(np.diagonal(df), df.columns).T
A B C D
0 7 2 3 8
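One difference between the approaches may be worth checking: df.sum() only equals the diagonal because every off-diagonal entry is 0. The sketch below wraps the diagonal approach in a hypothetical helper, diagonal_row, and compares it with the column sum on a modified copy of the frame that has one non-zero off-diagonal value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[7, 0, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, 3, 0],
                   [0, 0, 0, 8]],
                  columns=list('ABCD'))

def diagonal_row(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: element [i, i] for each i, as one row with the original labels."""
    return pd.DataFrame([np.diagonal(frame.to_numpy())], columns=frame.columns)

print(diagonal_row(df))

# Copy with a non-zero off-diagonal entry: column sums change, the diagonal does not
noisy = df.copy()
noisy.loc[0, 'B'] = 5
print(noisy.sum().to_frame().T)  # B is 5 + 2 = 7
print(diagonal_row(noisy))       # B is still 2
```

So if the input is guaranteed to be diagonal, both give the same row; if it might not be, the diagonal-based version states the intent more directly.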

vectorize groupby pandas

I have a dataframe like this:
day time category count
1 1 a 13
1 2 a 47
1 3 a 1
1 5 a 2
1 6 a 4
2 7 a 14
2 2 a 10
2 1 a 9
2 4 a 2
2 6 a 1
I want to group by day and category and get a vector of the counts per time, where time ranges from 1 to 10. I have stored the minimum and maximum time in two variables called min and max.
This is how I want the resulting dataframe to look:
day category count
1 a [13,47,1,0,2,4,0,0,0,0]
2 a [9,10,0,2,0,1,14,0,0,0]
Does anyone know how to turn this aggregation into a vector?
Use reindex with MultiIndex.from_product to add the missing (day, time, category) combinations with a count of 0, and then groupby with list:
df = df.set_index(['day','time', 'category'])
a = df.index.levels[0]
b = range(1,11)
c = df.index.levels[2]
df = df.reindex(pd.MultiIndex.from_product([a,b,c], names=df.index.names), fill_value=0)
df = df.groupby(['day','category'])['count'].apply(list).reset_index()
print (df)
day category count
0 1 a [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 a [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
EDIT:
df = (df.set_index(['day','time', 'category'])['count']
.unstack(1, fill_value=0)
.reindex(columns=range(1,11), fill_value=0))
print (df)
time 1 2 3 4 5 6 7 8 9 10
day category
1 a 13 47 1 0 2 4 0 0 0 0
2 a 9 10 0 2 0 1 14 0 0 0
df = df.apply(list, axis=1).reset_index(name='count')
print (df)
day ... count
0 1 ... [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 ... [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
[2 rows x 3 columns]
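A related alternative, not taken from the answers above, is pivot_table followed by reindex over the time range. This sketch uses the hypothetical names t_min and t_max for the question's min and max variables, so they do not shadow Python's built-in min() and max(). Because aggfunc='sum' adds up duplicate (day, time, category) rows, it also tolerates duplicates that would make the MultiIndex reindex above fail.

```python
import pandas as pd

df = pd.DataFrame({'day':      [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'time':     [1, 2, 3, 5, 6, 7, 2, 1, 4, 6],
                   'category': ['a'] * 10,
                   'count':    [13, 47, 1, 2, 4, 14, 10, 9, 2, 1]})

t_min, t_max = 1, 10  # hypothetical names for the question's min/max

# One row per (day, category), one column per time; absent times become 0
wide = (df.pivot_table(index=['day', 'category'], columns='time',
                       values='count', aggfunc='sum', fill_value=0)
          .reindex(columns=range(t_min, t_max + 1), fill_value=0))

# Collapse each row into a Python list and move day/category back into columns
out = pd.Series(wide.to_numpy().tolist(), index=wide.index, name='count').reset_index()
print(out)
```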

Remove all data in a DF by group based on a condition (pandas,python3)

I have a pandas DF like this:
User Enrolled Time
1 0 12
1 0 1
1 1 2
1 1 3
2 1 3
2 0 4
2 1 1
3 0 2
3 0 3
3 1 4
4 0 1
I want to remove all of a user's rows after they have enrolled. Each user's chances to enroll are ordered in time. The expected output looks like this:
User Enrolled Time
1 0 12
1 0 1
1 1 2
2 1 3
3 0 2
3 0 3
3 1 4
Hoping someone could help me!
EDIT: an example based on a comment on the answer, with users who never enroll:
User Enrolled Time
4 0 1
4 0 2
4 0 3
5 0 1
I think what you're looking for is a groupby followed by an apply which does the correct logic for each user. For example:
import pandas as pd
df = pd.DataFrame([[ 1, 0, 12],
[ 1, 0, 1],
[ 1, 1, 2],
[ 1, 1, 3],
[ 2, 1, 3],
[ 2, 0, 4],
[ 2, 1, 1],
[ 3, 0, 2],
[ 3, 0, 3],
[ 3, 1, 4]],
columns=['User', 'Enrolled', 'Time'])
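def filter_enrollment(df):
    # Index label of the user's first enrolled row (NaN if the user never enrolled)
    enrolled = df[df.Enrolled == 1].index.min()
    # Keep the rows up to and including that first enrollment
    return df[df.index <= enrolled]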
result = df.groupby('User').apply(filter_enrollment).reset_index(drop=True)
The result is:
>>> print(result)
User Enrolled Time
0 1 0 12
1 1 0 1
2 1 1 2
3 2 1 3
4 3 0 2
5 3 0 3
6 3 1 4
Here I'm assuming your rows are in order of time. If you want to filter explicitly by the Time column instead, just change index to Time in the filter function.
Edit: to handle the edited question, you can change the filter function to something like this:
def filter_enrollment(df):
    enrolled = df[df.Enrolled == 1].index.min()
    if pd.isnull(enrolled):
        return df
    else:
        return df[df.index <= enrolled]
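A vectorized alternative that avoids groupby().apply() is sketched below. It makes the same assumptions as the answer (rows within each user are already in time order) plus one more: Enrolled only takes the values 0 and 1. For each row it counts the enrollments strictly before it within the same user and keeps the row only if that count is 0, so users who never enroll keep all their rows. The frame combines the original example with the users from the EDIT.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 12], [1, 0, 1], [1, 1, 2], [1, 1, 3],
                   [2, 1, 3], [2, 0, 4], [2, 1, 1],
                   [3, 0, 2], [3, 0, 3], [3, 1, 4],
                   [4, 0, 1], [4, 0, 2], [4, 0, 3],
                   [5, 0, 1]],
                  columns=['User', 'Enrolled', 'Time'])

# Enrollments strictly before the current row, per user:
# running total within the user minus the current row's own value
prior = df.groupby('User')['Enrolled'].cumsum() - df['Enrolled']

# Keep rows up to and including the first enrollment
result = df[prior == 0].reset_index(drop=True)
print(result)
```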
