excel find the count of 2 filtered columns - excel-formula

There are paired columns that I am comparing(col1 and col2, col3 and col4) with either blank or '0' or '1'. I basically want to know how many are intersect
id col1 col2 col3 col4
id1 0 1
id2 1 1 0
id3 0 1 1
id4
id5 0
for this table I want to count of how many ids are 0 or 1(between col1 and col2). If I use countA(b2:c4) I get 4 but I need to get 3 as only 3 ids are affected for each pair
. Is therea formula that would actually give 3 for col1 and col2 and 3 for col3 and col4.
SUMPRODUCT(--(B$2:B$7+C$2:C$7=0))
fails here and provides 3 instead of 5

Related

Pandas: Create different dataframes from an unique multiIndex dataframe

I would like to know how to pass from a multiindex dataframe like this:
A B
col1 col2 col1 col2
1 2 12 21
3 1 2 0
To two separated dfs. df_A:
col1 col2
1 2
3 1
df_B:
col1 col2
12 21
2 0
Thank you for the help
I think here is better use DataFrame.xs for selecting by first level:
print (df.xs('A', axis=1, level=0))
col1 col2
0 1 2
1 3 1
What need is not recommended, but possible create DataFrames by groups:
for i, g in df.groupby(level=0, axis=1):
globals()['df_' + str(i)] = g.droplevel(level=0, axis=1)
print (df_A)
col1 col2
0 1 2
1 3 1
Better is create dictionary of DataFrames:
d = {i:g.droplevel(level=0, axis=1)for i, g in df.groupby(level=0, axis=1)}
print (d['A'])
col1 col2
0 1 2
1 3 1

Pandas dataframe drop rows which store certain number of zeros in it

Hello I have dataframe which is having [13171 rows x 511 columns] what I wanted is remove the rows which is having certain number of zeros
for example
col0 col1 col2 col3 col4 col5
ID1 0 2 0 2 0
ID2 1 1 2 10 1
ID3 0 1 3 4 0
ID4 0 0 1 0 3
ID5 0 0 0 0 1
in ID5 row contains 4 zeros in it so I wanted to drop that row. like this I have large dataframe which is having more than 100-300 zeros in rows
I tried below code
df=df[(df == 0).sum(1) >= 4]
for small dataset like above example code is working but for [13171 rows x 511 columns] not working(df=df[(df == 0).sum(1) >= 15]) any one suggest me how can I get proper result
output
col0 col1 col2 col3 col4 col5
ID1 0 2 0 2 0
ID2 1 1 2 10 1
ID3 0 1 3 4 0
ID4 0 0 1 0 3
This will work:
drop_indexs = []
for i in range(len(df.iloc[:,0])):
if (df.iloc[i,:]==0).sum()>=4: # 4 is how many zeros should row min have
drop_indexs.append(i)
updated_df = df.drop(drop_indexs)

groupby column in pandas

I am trying to groupby columns value in pandas but I'm not getting.
Example:
Col1 Col2 Col3
A 1 2
B 5 6
A 3 4
C 7 8
A 11 12
B 9 10
-----
result needed grouping by Col1
Col1 Col2 Col3
A 1,3,11 2,4,12
B 5,9 6,10
c 7 8
but I getting this ouput
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000025BEB4D6E50>
I am getting using excel power query with function group by and count all rows, but I canĀ“t get the same with python and pandas. Any help?
Try this
(
df
.groupby('Col1')
.agg(lambda x: ','.join(x.astype(str)))
.reset_index()
)
it outputs
Col1 Col2 Col3
0 A 1,3,11 2,4,12
1 B 5,9 6,10
2 C 7 8
Very good I created solution between 0 and 0:
df[df['A'] != 0].groupby((df['A'] == 0).cumsum()).sub()
It will group column between 0 and 0 and sum it

Identify the relationship between two columns and its respective value count in pandas

I have a Data frame as below :
Col1 Col2 Col3 Col4
1 111 a Test
2 111 b Test
3 111 c Test
4 222 d Prod
5 333 e Prod
6 333 f Prod
7 444 g Test
8 555 h Prod
9 555 i Prod
Expected output :
Column 1 Column 2 Relationship Count
Col2 Col3 One-to-One 2
Col2 Col3 One-to-Many 3
Explanation :
I need to identify the relationship between Col2 & Col3 and also the value counts.
For Eg. 111(col2) is repeated 3 times and has 3 different respective values a,b,c in Col3.
This means col2 and col3 has one-to-Many relationship - count_1 : 1
222(col2) is not repeated and has only one respective value d in col3.
This means col2 and col3 has one-to-one relationshipt - count_2 : 1
333(col2) is repeated twice and has 2 different respective values e,f in col3.
This means col2 and col3 has one-to-Many relationship - count_1 : 1+1 ( increment this count for every one-to-many relationship)
Similarly for other column values increment the respective counter and display the final results as the expected dataframe.
If you only need to check the relationship between col2 and col3, you can do:
(
df.groupby(by='Col2').Col3
.apply(lambda x: 'One-to-One' if len(x)==1 else 'One-to-Many')
.to_frame('Relationship')
.groupby('Relationship').Relationship
.count().to_frame('Count').reset_index()
.assign(**{'Column 1':'Col2', 'Column 2':'Col3'})
.reindex(columns=['Column 1', 'Column 2', 'Relationship', 'Count'])
)
Output:
Column 1 Column 2 Relationship Count
0 Col2 Col3 One-to-Many 3
1 Col2 Col3 One-to-One 2

Grouping corresponding Rows based on One column

I have an Excel Sheet Dataframe with no fixed number of rows and columns.
eg.
Col1 Col2 Col3
A 1 -
A - 2
B 3 -
B - 4
C 5 -
I would like to Group Col1 which has the same content. Like the following.
Col1 Col2 Col3
A 1 2
B 3 4
C 5 -
I am using pandas GroupBy, but not getting what I wanted.
Try using groupby:
print(df.replace('-', pd.np.nan).groupby('Col1', as_index=False).first().fillna('-'))
Output:
Col1 Col2 Col3
0 A 1 2
1 B 3 4
2 C 5 -

Resources