How to get specific column names from a pandas DataFrame that satisfy a condition - python-3.x

I have data as follows:

col1 col2 col3 col4 col5
   0    1    0    1    0
   1    1    0    0    1
   1    1    1    0    1

I want it as below:

col1 col2 col3 col4 col5  col6
   0    1    0    1    0  col2,col4
   1    1    0    0    1  col1,col2,col5
   1    1    1    0    1  col1,col2,col3,col5

Wherever the value is 1, the column name should be appended in col6. I tried idxmax(), however it's not working, maybe because there is more than one column that satisfies the condition. Can anyone please help?

You can do a matrix multiplication here:

df['col6'] = (df @ (df.columns + ',')).str[:-1]

Output:

   col1  col2  col3  col4  col5                 col6
0     0     1     0     1     0            col2,col4
1     1     1     0     0     1       col1,col2,col5
2     1     1     1     0     1  col1,col2,col3,col5
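A minimal runnable sketch of the trick, rebuilding the question's dataframe (the `col6` name is just the column asked for):

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [0, 1, 1], "col2": [1, 1, 1], "col3": [0, 0, 1],
     "col4": [1, 0, 0], "col5": [0, 1, 1]}
)

# df @ (df.columns + ',') multiplies each 0/1 row by the column-name strings:
# name * 0 -> '' and name * 1 -> the name itself, so the row-wise sum
# concatenates the names of the columns holding 1, each followed by ','.
df["col6"] = (df @ (df.columns + ",")).str[:-1]  # strip the trailing comma
print(df)
```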

Related

excel find the count of 2 filtered columns

There are paired columns that I am comparing (col1 and col2, col3 and col4), each holding either a blank, '0', or '1'. I basically want to know how many intersect.
id col1 col2 col3 col4
id1 0 1
id2 1 1 0
id3 0 1 1
id4
id5 0
For this table I want a count of how many ids have a 0 or 1 (between col1 and col2). If I use COUNTA(B2:C4) I get 4, but I need to get 3, as only 3 ids are affected for each pair. Is there a formula that would actually give 3 for col1 and col2 and 3 for col3 and col4?
SUMPRODUCT(--(B$2:B$7+C$2:C$7=0))
fails here and provides 3 instead of 5.

How to evaluate multiple columns in pandas?

I have the following pandas dataframe:

col1 col2 col3 .... colN
   5    2    4 ....    9
   1    2    3 ....    9
   7    1    4 ....    0
   1    4    7 ....    8

What I need is a way to determine the ordering between several columns:

col1 col2 col3 .... colN
   5    2    4 ....    9  ----> colN >= ... >= col5 >= col2 >= col3
   1    2    3 ....    9  ----> colN >= ... >= col3 >= col2 >= col1
   7    1    4 ....    0  ----> col1 >= ... >= col3 >= col2 >= colN
   1    4    7 ....    8  ----> colN >= ... >= col3 >= col2 >= col1

And give them a numeric alias. For example:

colN >= ... >= col5 >= col2 >= col3 = X
colN >= ... >= col3 >= col2 >= col1 = Y
col1 >= ... >= col3 >= col2 >= colN = Z
:
:

col1 col2 col3 .... colN  order
   5    2    4 ....    9  X
   1    2    3 ....    9  Y
   7    1    4 ....    0  Z
   1    4    7 ....    8  Y
:
:

The number of columns may change, and the alias has to follow a pattern. Example with 3 columns:

col1 >= col2 >= col3 = 1
col1 >= col3 >= col2 = 2
col2 >= col1 >= col3 = 3
col2 >= col3 >= col1 = 4
col3 >= col1 >= col2 = 5
col3 >= col2 >= col1 = 6

Thanks and regards
You can use:
df['order'] = df.apply(lambda x: '>='.join(x.sort_values(ascending=False).index), axis=1)
df['alias'] = df.groupby('order').ngroup() + 1
Input:

   col1  col2  col3
0     5     2     4
1     1     2     3
2     7     1     4
3     1     4     7

Output:

   col1  col2  col3             order  alias
0     5     2     4  col1>=col3>=col2      1
1     1     2     3  col3>=col2>=col1      2
2     7     1     4  col1>=col3>=col2      1
3     1     4     7  col3>=col2>=col1      2
Or, for a specific pattern:

alias_pattern = {'col1>=col3>=col2': 2, 'col3>=col2>=col1': 5}
df['alias'] = df['order'].map(alias_pattern)

Output:

   col1  col2  col3             order  alias
0     5     2     4  col1>=col3>=col2      2
1     1     2     3  col3>=col2>=col1      5
2     7     1     4  col1>=col3>=col2      2
3     1     4     7  col3>=col2>=col1      5
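The `ngroup` aliases above depend on which orderings happen to appear in the data. If the alias must follow the fixed 1..n! pattern from the question, one option (a sketch, not part of the original answer) is to enumerate all permutations of the columns up front:

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({"col1": [5, 1, 7, 1], "col2": [2, 2, 1, 4], "col3": [4, 3, 4, 7]})
cols = list(df.columns)

# Row-wise ordering string, as in the answer above: sort each row's values
# descending and join the corresponding column names.
df["order"] = df.apply(
    lambda x: ">=".join(x.sort_values(ascending=False).index), axis=1
)

# Number every possible ordering 1..n! in lexicographic column order,
# so the alias is stable even for orderings absent from the data.
alias_pattern = {">=".join(p): i + 1 for i, p in enumerate(permutations(cols))}
df["alias"] = df["order"].map(alias_pattern)
print(df)
```

With three columns this reproduces the question's 3-column table exactly: `col1>=col2>=col3` maps to 1, `col3>=col2>=col1` maps to 6, and so on.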

Pandas: Create different dataframes from a unique MultiIndex dataframe

I would like to know how to pass from a MultiIndex dataframe like this:

      A          B
col1 col2  col1 col2
   1    2    12   21
   3    1     2    0

to two separate dfs. df_A:

col1 col2
   1    2
   3    1

df_B:

col1 col2
  12   21
   2    0

Thank you for the help
I think it is better to use DataFrame.xs here for selecting by the first level:

print (df.xs('A', axis=1, level=0))
   col1  col2
0     1     2
1     3     1

What you need is not recommended, but it is possible to create DataFrames by groups:

for i, g in df.groupby(level=0, axis=1):
    globals()['df_' + str(i)] = g.droplevel(level=0, axis=1)

print (df_A)
   col1  col2
0     1     2
1     3     1

Better is to create a dictionary of DataFrames:

d = {i: g.droplevel(level=0, axis=1) for i, g in df.groupby(level=0, axis=1)}

print (d['A'])
   col1  col2
0     1     2
1     3     1
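A self-contained sketch that rebuilds the example MultiIndex frame and splits it with `xs`:

```python
import pandas as pd

# Two-level columns: 'A'/'B' on the first level, 'col1'/'col2' on the second.
columns = pd.MultiIndex.from_product([["A", "B"], ["col1", "col2"]])
df = pd.DataFrame([[1, 2, 12, 21], [3, 1, 2, 0]], columns=columns)

# xs picks one first-level label and drops that level from the columns.
df_A = df.xs("A", axis=1, level=0)
df_B = df.xs("B", axis=1, level=0)
print(df_A)
print(df_B)
```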

How to merge tab separated data (always starting with letters) into one string?

I have the following data in a file:
col1 col2 col3 col4 col5 col6
ABC DEF GE-10 0 0 12 4 16 0
HIJ KLM 7 0 123 40 0 0
NOP QL 17 0 0 6 10 1
I want to merge all the text information into one string (with '_' between) so that it looks like this:
col1 col2 col3 col4 col5 col6
ABC_DEF_GE-10 0 0 12 4 16 0
HIJ_KLM 7 0 123 40 0 0
NOP_QL 17 0 0 6 10 1
The issue is that the text information to be merged is in columns 1-2 for some rows and in columns 1-3 for others.
How can this be accomplished in Bash?
test.sh

#!/bin/bash
file='read_file.txt'
{
    # print the header line unchanged
    read -r header
    echo "$header"
    # read each remaining line
    while read -r line; do
        wordString=""
        # read each word
        for word in $line; do
            if [[ $word =~ ^[0-9]+$ ]]; then
                # purely numeric value: keep it as a separate field
                wordString="${wordString} ${word}"
            else
                # not purely numeric: merge it into the text part with '_'
                wordString="${wordString}_${word}"
            fi
        done
        # remove the leading separator and print the line
        echo "${wordString#?}"
    done
} < "$file"

Put the input in the file below, in the same directory:
read_file.txt
col1 col2 col3 col4 col5 col6
ABC DEF GE-10 0 0 12 4 16 0
HIJ KLM 7 0 123 40 0 0
NOP QL 17 0 0 6 10 1
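For comparison, the same merge can be sketched in Python (an alternative, not part of the original answer; `merge_text_fields` is a made-up helper name): treat every field before the first purely numeric one as text and join those fields with underscores.

```python
import re

def merge_text_fields(line):
    """Join the leading non-numeric fields with '_'; keep the rest as-is."""
    fields = line.split()
    text, rest = [], []
    for f in fields:
        if not rest and not re.fullmatch(r"\d+", f):
            text.append(f)   # still inside the leading text block
        else:
            rest.append(f)   # first numeric field reached; numbers from here on
    return " ".join((["_".join(text)] if text else []) + rest)

for row in ["ABC DEF GE-10 0 0 12 4 16 0",
            "HIJ KLM 7 0 123 40 0 0",
            "NOP QL 17 0 0 6 10 1"]:
    print(merge_text_fields(row))
```

Note that "GE-10" contains a hyphen, so it fails the `\d+` test and is correctly treated as text, which is exactly the case that makes a fixed column count unusable.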

Pandas dataframe: drop rows which contain a certain number of zeros

Hello, I have a dataframe of [13171 rows x 511 columns]. What I want is to remove the rows which have a certain number of zeros in them.
For example:

col0 col1 col2 col3 col4 col5
ID1     0    2    0    2    0
ID2     1    1    2   10    1
ID3     0    1    3    4    0
ID4     0    0    1    0    3
ID5     0    0    0    0    1

The ID5 row contains 4 zeros, so I want to drop that row. Like this, my large dataframe has rows with more than 100-300 zeros.
I tried the code below:

df=df[(df == 0).sum(1) >= 4]

For a small dataset like the example above the code is working, but for [13171 rows x 511 columns] it is not working (df=df[(df == 0).sum(1) >= 15]). Can anyone suggest how I can get the proper result?

Output:
col0 col1 col2 col3 col4 col5
ID1     0    2    0    2    0
ID2     1    1    2   10    1
ID3     0    1    3    4    0
ID4     0    0    1    0    3
This will work:

drop_indexs = []
for i in range(len(df.iloc[:, 0])):
    if (df.iloc[i, :] == 0).sum() >= 4:  # 4 is the minimum number of zeros a row must have to be dropped
        drop_indexs.append(i)
# drop_indexs holds positions, so convert them to index labels before dropping
updated_df = df.drop(df.index[drop_indexs])
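A vectorized sketch of the same filter (a boolean mask instead of the Python loop), assuming rows with 4 or more zeros should be dropped:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [0, 1, 0, 0, 0], "col2": [2, 1, 1, 0, 0],
     "col3": [0, 2, 3, 1, 0], "col4": [2, 10, 4, 0, 0],
     "col5": [0, 1, 0, 3, 1]},
    index=["ID1", "ID2", "ID3", "ID4", "ID5"],
)

# Count zeros per row, then KEEP the rows with fewer than 4 zeros.
# (Note the direction: `>= 4` selects the rows to drop, which is why the
# question's one-liner kept the wrong rows.)
updated_df = df[(df == 0).sum(axis=1) < 4]
print(updated_df)
```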
