Count frequency of certain number from a single row including all columns in dataframe - python-3.x

Dataframe looks like :
id   a   b   c   d   e   f
1    0  -1  -1   1   0  -1
2   -1   1   0  -1  -1  -1
If I pass 1 as the id, I should be able to count the occurrences of -1 in that row, i.e. the count should be 3.

Try this:
print((df.loc[1] == -1).sum())
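For context, a minimal runnable sketch (assuming the id column has been set as the index, e.g. via set_index('id')):

import pandas as pd

df = pd.DataFrame({'a': [0, -1], 'b': [-1, 1], 'c': [-1, 0],
                   'd': [1, -1], 'e': [0, -1], 'f': [-1, -1]},
                  index=pd.Index([1, 2], name='id'))

print((df.loc[1] == -1).sum())  # 3
print((df.loc[2] == -1).sum())  # 4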

pandas - show column name + sum in which the sum is higher than zero

I read my dataframe in with:
dataframe = pd.read_csv("testFile.txt", sep="\t", index_col=0)
I got a dataframe like this:
cell      17472131  17472132  17472133  17472134  17472135  17472136
cell_0           1         0         1         0         1         0
cell_1           0         0         0         0         1         0
cell_2           0         1         1         1         0         0
cell_3           1         0         0         0         1         0
With pandas I would like to get all the column names whose column sum is > 1, together with that sum.
So I would like:
17472131 2
17472133 2
17472135 3
I figured out how to get the sums of each column with
dataframe.sum(axis=0)
but this also returns the columns with a sum lower than 2. Is there a way to show only the columns with a value higher than 1?
One pretty neat way is to use a lambda function in loc. Since the file was read with index_col=0, 'cell' is already the index, so no set_index is needed:
df.sum().loc[lambda x: x > 1]
Output:
17472131    2
17472133    2
17472135    3
dtype: int64
Details: df.sum() returns a pd.Series, and lambda x: x > 1 produces a boolean Series that loc uses for boolean indexing, keeping only the True parts of the pd.Series.
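A self-contained sketch of the whole thing, with the frame built by hand instead of read from testFile.txt:

import pandas as pd

df = pd.DataFrame([[1, 0, 1, 0, 1, 0],
                   [0, 0, 0, 0, 1, 0],
                   [0, 1, 1, 1, 0, 0],
                   [1, 0, 0, 0, 1, 0]],
                  index=pd.Index(['cell_0', 'cell_1', 'cell_2', 'cell_3'], name='cell'),
                  columns=['17472131', '17472132', '17472133',
                           '17472134', '17472135', '17472136'])

# sum each column, then keep only the sums greater than 1
print(df.sum().loc[lambda x: x > 1])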

groupby and trim some rows based on condition

I have a data frame something like this:
df = pd.DataFrame({"ID":[1,1,2,2,2,3,3,3,3,3],
"IF_car":[1,0,0,1,0,0,0,1,0,1],
"IF_car_history":[0,0,0,1,0,0,0,1,0,1],
"observation":[0,0,0,1,0,0,0,2,0,3]})
I want output where the rows within each ID group are trimmed on the condition IF_car_history == 1. I tried:
tried_df = df.groupby(['ID']).apply(lambda x: x.loc[:(x['IF_car_history'] == '1').idxmax(), :]).reset_index(drop=True)
That is, I want to drop the rows in each group that come after IF_car_history == 1.
expected output:
Thanks
First build a mask m by comparing values with Series.eq. Because the rows to remove are the ones after the last 1 in each group, reverse the Series with [::-1] before GroupBy.cumsum: in the reversed order the cumulative sum stays 0 only for rows after the last 1, so compare it to 0 with ne, reverse back with [::-1], and filter by boolean indexing.
m = df['IF_car_history'].eq(1).iloc[::-1]
df1 = df[m.groupby(df['ID']).cumsum().ne(0).iloc[::-1]]
print(df1)
   ID  IF_car  IF_car_history  observation
2   2       0               0            0
3   2       1               1            1
5   3       0               0            0
6   3       0               0            0
7   3       1               1            2
8   3       0               0            0
9   3       1               1            3
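For comparison, the same trimming written with a plain groupby/apply; this sketch assumes, as in the answer above, that groups containing no 1 at all (ID == 1 here) are dropped entirely:

import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                   "IF_car": [1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
                   "IF_car_history": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
                   "observation": [0, 0, 0, 1, 0, 0, 0, 2, 0, 3]})

def trim_after_last_one(g):
    # index labels of the rows where IF_car_history == 1
    hits = g.index[g['IF_car_history'].eq(1)]
    # keep rows up to the last 1, or nothing if the group has no 1
    return g.loc[:hits[-1]] if len(hits) else g.iloc[0:0]

df1 = df.groupby('ID', group_keys=False).apply(trim_after_last_one)
print(df1)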

Dataset with maximal rows by userId indicated

I have a dataframe like this:
ID        date  var1  var2  var3
AB  22/03/2020     0     1     3
AB  29/03/2020     0     3     3
CD  22/03/2020     0     1     1
And I would like a new dataset that keeps the original value wherever it is the row maximum (ties can happen too) and puts -1 wherever it is not. So it would be:
ID        date  var1  var2  var3
AB  22/03/2020    -1    -1     3
AB  29/03/2020    -1     3     3
CD  22/03/2020    -1     1     1
But I cannot work out how to do this at all. What can I try next?
Select only numeric columns by DataFrame.select_dtypes:
df1 = df.select_dtypes(np.number)
Or select all columns without first two by positions by DataFrame.iloc:
df1 = df.iloc[:, 2:]
Or select the columns whose label contains var by DataFrame.filter:
df1 = df.filter(like='var')
And then set new values by DataFrame.where with max:
df[df1.columns] = df1.where(df1.eq(df1.max(1), axis=0), -1)
print(df)
   ID        date  var1  var2  var3
0  AB  22/03/2020    -1    -1     3
1  AB  29/03/2020    -1     3     3
2  CD  22/03/2020    -1     1     1
IIUC, use where and update back:
s = df.loc[:, 'var1':]
df.update(s.where(s.eq(s.max(1), axis=0), -1))
df
   ID        date  var1  var2  var3
0  AB  22/03/2020    -1    -1     3
1  AB  29/03/2020    -1     3     3
2  CD  22/03/2020    -1     1     1
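Putting the second approach together as a runnable sketch (the values are taken from the question's table):

import pandas as pd

df = pd.DataFrame({'ID': ['AB', 'AB', 'CD'],
                   'date': ['22/03/2020', '29/03/2020', '22/03/2020'],
                   'var1': [0, 0, 0],
                   'var2': [1, 3, 1],
                   'var3': [3, 3, 1]})

s = df.loc[:, 'var1':]                               # the var columns only
df.update(s.where(s.eq(s.max(axis=1), axis=0), -1))  # keep row maxima, -1 elsewhere
print(df)  # note: update may upcast the var columns to float in some pandas versions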

How to replace the values of 1's and 0's of various column into a single column of a data frame?

The 0s and 1s need to be mapped to their appropriate headers in python.
How can I achieve this and get the column final_list?
If there is always only one 1 per row, use DataFrame.dot:
df = pd.DataFrame({'a': [0, 1, 0],
                   'b': [1, 0, 0],
                   'c': [0, 0, 1]})
df['Final'] = df.dot(df.columns)
print(df)
   a  b  c Final
0  0  1  0     b
1  1  0  0     a
2  0  0  1     c
If multiple 1s per row are possible, add a separator and then remove the trailing one from the output Series by Series.str.rstrip:
df = pd.DataFrame({'a': [0, 1, 0],
                   'b': [1, 1, 0],
                   'c': [1, 1, 1]})
df['Final'] = df.dot(df.columns + ',').str.rstrip(',')
print(df)
   a  b  c  Final
0  0  1  1    b,c
1  1  1  1  a,b,c
2  0  0  1      c
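To see why dot works here: with 0/1 entries it repeats each column name (a string) by the cell value and concatenates the results, so a row of 0, 1, 1 yields 'b,c,' before the rstrip. A hand-rolled equivalent, for illustration only:

import pandas as pd

df = pd.DataFrame({'a': [0, 1, 0],
                   'b': [1, 1, 0],
                   'c': [1, 1, 1]})

# join the names of the columns that hold a 1 in each row
df['Final'] = df.apply(lambda r: ','.join(df.columns[r.eq(1)]), axis=1)
print(df)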

How to filter values from a multi index Pandas data frame

I have a multi index pandas data frame df as below:
                  Count
Letter Direction
A      -1             3
        1             0
B      -1             2
        1             4
C      -1             4
        1            10
D      -1             8
        1             1
E      -1             4
        1             5
F      -1             1
        1             1
I want to keep the Letters which have Count < 2 in either or both of the directions.
I tried df[df.Count < 2], but it gives the output below:
                  Count
Letter Direction
A       1             0
D       1             1
F      -1             1
        1             1
The desired output is as below:
                  Count
Letter Direction
A      -1             3
        1             0
D      -1             8
        1             1
F      -1             1
        1             1
What should I do to get the above?
Use GroupBy.transform with a boolean mask and any: any checks whether at least one value is True per first level of the MultiIndex, and transform returns a mask with the same size as the original DataFrame, which makes it possible to filter by boolean indexing:
df = df[(df.Count < 2).groupby(level=0).transform('any')]
print(df)
                  Count
Letter Direction
A      -1             3
        1             0
D      -1             8
        1             1
F      -1             1
        1             1
Another solution is to use MultiIndex.get_level_values to get the Letter values matching the condition and select them by DataFrame.loc; unique is needed so that a letter matching in both directions (like F) is not selected twice:
df = df.loc[df.index.get_level_values(0)[df.Count < 2].unique()]
print(df)
                  Count
Letter Direction
A      -1             3
        1             0
D      -1             8
        1             1
F      -1             1
        1             1
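A self-contained sketch reproducing the first approach (the Count values come from the question's table):

import pandas as pd

idx = pd.MultiIndex.from_product([list('ABCDEF'), [-1, 1]],
                                 names=['Letter', 'Direction'])
df = pd.DataFrame({'Count': [3, 0, 2, 4, 4, 10, 8, 1, 4, 5, 1, 1]}, index=idx)

# keep every Letter that has at least one Count < 2 in either direction
print(df[(df.Count < 2).groupby(level=0).transform('any')])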
