Count frequency of certain number from a single row including all columns in dataframe - python-3.x

Dataframe looks like :
id   a   b   c   d   e   f
1    0  -1  -1   1   0  -1
2   -1   1   0  -1  -1  -1
If I pass 1 as the id, I should be able to count the occurrences of -1 in that row, i.e. the count should be 3.

Try this:
print((df.loc[1] == -1).sum())
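For context, a minimal runnable sketch (assuming the id column has been set as the index, e.g. via set_index('id')):

import pandas as pd

df = pd.DataFrame({'a': [0, -1], 'b': [-1, 1], 'c': [-1, 0],
                   'd': [1, -1], 'e': [0, -1], 'f': [-1, -1]},
                  index=pd.Index([1, 2], name='id'))

print((df.loc[1] == -1).sum())  # 3
print((df.loc[2] == -1).sum())  # 4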

pandas - show column name + sum in which the sum is higher than zero

I read my dataframe in with:
dataframe = pd.read_csv("testFile.txt", sep="\t", index_col=0)
I got a dataframe like this:
cell      17472131  17472132  17472133  17472134  17472135  17472136
cell_0           1         0         1         0         1         0
cell_1           0         0         0         0         1         0
cell_2           0         1         1         1         0         0
cell_3           1         0         0         0         1         0
With pandas I would like to get all the column names whose column sum is > 1, together with that sum.
So I would like:
17472131 2
17472133 2
17472135 3
I figured out how to get the sums of each column with
dataframe.sum(axis=0)
but this also returns the columns with a sum lower than 2. Is there a way to show only the columns with a value higher than 1?
One pretty neat way is to use a lambda function in loc. Since the file was read with index_col=0, 'cell' is already the index, so no set_index is needed:
df.sum().loc[lambda x: x > 1]
Output:
17472131    2
17472133    2
17472135    3
dtype: int64
Details: df.sum() returns a pd.Series, and lambda x: x > 1 produces a boolean Series that loc uses for boolean indexing, keeping only the True parts of the pd.Series.
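A self-contained sketch of the whole thing, with the frame built by hand instead of read from testFile.txt:

import pandas as pd

df = pd.DataFrame([[1, 0, 1, 0, 1, 0],
                   [0, 0, 0, 0, 1, 0],
                   [0, 1, 1, 1, 0, 0],
                   [1, 0, 0, 0, 1, 0]],
                  index=pd.Index(['cell_0', 'cell_1', 'cell_2', 'cell_3'], name='cell'),
                  columns=['17472131', '17472132', '17472133',
                           '17472134', '17472135', '17472136'])

# sum each column, then keep only the sums greater than 1
print(df.sum().loc[lambda x: x > 1])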

groupby and trim some rows based on condition

I have a data frame something like this:
df = pd.DataFrame({"ID":[1,1,2,2,2,3,3,3,3,3],
"IF_car":[1,0,0,1,0,0,0,1,0,1],
"IF_car_history":[0,0,0,1,0,0,0,1,0,1],
"observation":[0,0,0,1,0,0,0,2,0,3]})
I want output where the rows within each ID group are trimmed on the condition IF_car_history == 1. I tried:
tried_df = df.groupby(['ID']).apply(lambda x: x.loc[:(x['IF_car_history'] == '1').idxmax(), :]).reset_index(drop=True)
That is, I want to drop the rows in each group that come after IF_car_history == 1.
expected output:
Thanks
First build a mask m by comparing values with Series.eq. Because the rows to remove are the ones after the last 1 in each group, reverse the Series with [::-1] before GroupBy.cumsum: in the reversed order the cumulative sum stays 0 only for rows after the last 1, so compare it to 0 with ne, reverse back with [::-1], and filter by boolean indexing.
m = df['IF_car_history'].eq(1).iloc[::-1]
df1 = df[m.groupby(df['ID']).cumsum().ne(0).iloc[::-1]]
print(df1)
   ID  IF_car  IF_car_history  observation
2   2       0               0            0
3   2       1               1            1
5   3       0               0            0
6   3       0               0            0
7   3       1               1            2
8   3       0               0            0
9   3       1               1            3
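For comparison, the same trimming written with a plain groupby/apply; this sketch assumes, as in the answer above, that groups containing no 1 at all (ID == 1 here) are dropped entirely:

import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                   "IF_car": [1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
                   "IF_car_history": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
                   "observation": [0, 0, 0, 1, 0, 0, 0, 2, 0, 3]})

def trim_after_last_one(g):
    # index labels of the rows where IF_car_history == 1
    hits = g.index[g['IF_car_history'].eq(1)]
    # keep rows up to the last 1, or nothing if the group has no 1
    return g.loc[:hits[-1]] if len(hits) else g.iloc[0:0]

df1 = df.groupby('ID', group_keys=False).apply(trim_after_last_one)
print(df1)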

Dataset with maximal rows by userId indicated

I have a dataframe like this:
ID        date  var1  var2  var3
AB  22/03/2020     0     1     3
AB  29/03/2020     0     3     3
CD  22/03/2020     0     1     1
And I would like a new dataset that keeps the original value wherever it is the row maximum (ties can happen too) and puts -1 wherever it is not. So it would be:
ID        date  var1  var2  var3
AB  22/03/2020    -1    -1     3
AB  29/03/2020    -1     3     3
CD  22/03/2020    -1     1     1
But I cannot work out how to do this at all. What can I try next?
Select only numeric columns by DataFrame.select_dtypes:
df1 = df.select_dtypes(np.number)
Or select all columns without first two by positions by DataFrame.iloc:
df1 = df.iloc[:, 2:]
Or select the columns whose label contains var by DataFrame.filter:
df1 = df.filter(like='var')
And then set new values by DataFrame.where with max:
df[df1.columns] = df1.where(df1.eq(df1.max(1), axis=0), -1)
print(df)
   ID        date  var1  var2  var3
0  AB  22/03/2020    -1    -1     3
1  AB  29/03/2020    -1     3     3
2  CD  22/03/2020    -1     1     1
IIUC, use where and update back:
s = df.loc[:, 'var1':]
df.update(s.where(s.eq(s.max(1), axis=0), -1))
df
   ID        date  var1  var2  var3
0  AB  22/03/2020    -1    -1     3
1  AB  29/03/2020    -1     3     3
2  CD  22/03/2020    -1     1     1
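Putting the second approach together as a runnable sketch (the values are taken from the question's table):

import pandas as pd

df = pd.DataFrame({'ID': ['AB', 'AB', 'CD'],
                   'date': ['22/03/2020', '29/03/2020', '22/03/2020'],
                   'var1': [0, 0, 0],
                   'var2': [1, 3, 1],
                   'var3': [3, 3, 1]})

s = df.loc[:, 'var1':]                               # the var columns only
df.update(s.where(s.eq(s.max(axis=1), axis=0), -1))  # keep row maxima, -1 elsewhere
print(df)  # note: update may upcast the var columns to float in some pandas versions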

How to replace the values of 1's and 0's of various column into a single column of a data frame?

The 0s and 1s need to be mapped to their appropriate headers in python.
How can I achieve this and get the column final_list?
If there is always only one 1 per row, use DataFrame.dot:
df = pd.DataFrame({'a': [0, 1, 0],
                   'b': [1, 0, 0],
                   'c': [0, 0, 1]})
df['Final'] = df.dot(df.columns)
print(df)
   a  b  c Final
0  0  1  0     b
1  1  0  0     a
2  0  0  1     c
If multiple 1s per row are possible, add a separator and then remove the trailing one from the output Series by Series.str.rstrip:
df = pd.DataFrame({'a': [0, 1, 0],
                   'b': [1, 1, 0],
                   'c': [1, 1, 1]})
df['Final'] = df.dot(df.columns + ',').str.rstrip(',')
print(df)
   a  b  c  Final
0  0  1  1    b,c
1  1  1  1  a,b,c
2  0  0  1      c
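To see why dot works here: with 0/1 entries it repeats each column name (a string) by the cell value and concatenates the results, so a row of 0, 1, 1 yields 'b,c,' before the rstrip. A hand-rolled equivalent, for illustration only:

import pandas as pd

df = pd.DataFrame({'a': [0, 1, 0],
                   'b': [1, 1, 0],
                   'c': [1, 1, 1]})

# join the names of the columns that hold a 1 in each row
df['Final'] = df.apply(lambda r: ','.join(df.columns[r.eq(1)]), axis=1)
print(df)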

How to filter values from a multi index Pandas data frame

I have a multi index pandas data frame df as below:
                  Count
Letter Direction
A      -1             3
        1             0
B      -1             2
        1             4
C      -1             4
        1            10
D      -1             8
        1             1
E      -1             4
        1             5
F      -1             1
        1             1
I want to keep the Letters which have Count < 2 in either or both of the directions.
I tried df[df.Count < 2], but it gives the output below:
                  Count
Letter Direction
A       1             0
D       1             1
F      -1             1
        1             1
The desired output is as below:
                  Count
Letter Direction
A      -1             3
        1             0
D      -1             8
        1             1
F      -1             1
        1             1
What should I do to get the above?
Use GroupBy.transform with a boolean mask and any: any checks whether at least one value is True per first level of the MultiIndex, and transform returns a mask with the same size as the original DataFrame, which makes it possible to filter by boolean indexing:
df = df[(df.Count < 2).groupby(level=0).transform('any')]
print(df)
                  Count
Letter Direction
A      -1             3
        1             0
D      -1             8
        1             1
F      -1             1
        1             1
Another solution is to use MultiIndex.get_level_values to get the Letter values matching the condition and select them by DataFrame.loc; unique is needed so that a letter matching in both directions (like F) is not selected twice:
df = df.loc[df.index.get_level_values(0)[df.Count < 2].unique()]
print(df)
                  Count
Letter Direction
A      -1             3
        1             0
D      -1             8
        1             1
F      -1             1
        1             1
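A self-contained sketch reproducing the first approach (the Count values come from the question's table):

import pandas as pd

idx = pd.MultiIndex.from_product([list('ABCDEF'), [-1, 1]],
                                 names=['Letter', 'Direction'])
df = pd.DataFrame({'Count': [3, 0, 2, 4, 4, 10, 8, 1, 4, 5, 1, 1]}, index=idx)

# keep every Letter that has at least one Count < 2 in either direction
print(df[(df.Count < 2).groupby(level=0).transform('any')])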
