How to return all rows that have equal number of values of 0 and 1?

I have a dataframe with 50 columns, and each column contains either 0 or 1. How do I return all rows that have an equal number (a tie) of 0s and 1s, i.e. 25 zeros and 25 ones?
An example with 4 columns:
A B C D
1 1 0 0
1 1 1 0
1 0 1 0
0 0 0 0
Based on the above example, it should return the first and third rows:
A B C D
1 1 0 0
1 0 1 0

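Since every value is 0 or 1, the row mean is the fraction of 1s, so a row has as many 0s as 1s exactly when its mean is 0.5 (two 1s out of four columns here, 25 out of 50 in your data). So, please try
df[df.mean(axis=1).eq(0.5)]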
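A minimal, self-contained sketch of the mean-based filter on the 4-column example from the question; building the DataFrame from a list of lists with columns A-D is an assumption about how the data is loaded. It also shows a count-based check that states the tie condition literally (number of 1s equals number of 0s), which should select the same rows.

```python
import pandas as pd

# The 4-column example from the question (column names A-D).
df = pd.DataFrame(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 0]],
    columns=list("ABCD"),
)

# For 0/1 data the row mean is the fraction of 1s, so a tie means mean == 0.5.
by_mean = df[df.mean(axis=1).eq(0.5)]

# Same condition spelled out: count the 1s and the 0s in each row and compare.
by_count = df[df.eq(1).sum(axis=1) == df.eq(0).sum(axis=1)]

print(by_mean)
```

With 50 columns, 25 ones give a mean of exactly 25 / 50 = 0.5, so the floating-point comparison is safe for this case.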

Related

Returning column header corresponding to matched value

I need some help here. I am looking to retrieve the gender from Sheet 2 corresponding to each name in Sheet 1.
Step 1 - Match the name in Sheet 1 to Sheet 2 (not all names in Sheet 1 will be in Sheet 2; mark NA for non-matching names).
Step 2 - Look up the corresponding gender in Sheet 2.
Step 3 - Retrieve the column header, or the last number in the column header (1, 2, 3 ... 6).
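Sheet 1
Name | Gender
w | ???
e |
r |
t |
y |
u |
i |
q |
w |
e |
r |
Sheet 2
Name | Male 1 | Female 2 | other 3 | other 4 | other 5 | Do not know 6
w | 1 | 0 | 0 | 0 | 0 | 0
a | 0 | 0 | 0 | 0 | 0 | 1
q | 1 | 0 | 0 | 0 | 0 | 0
r | 0 | 1 | 0 | 0 | 0 | 0
e | 1 | 0 | 0 | 0 | 0 | 0
t | 0 | 0 | 0 | 0 | 1 | 0
y | 0 | 0 | 0 | 0 | 0 | 1
u | 0 | 1 | 0 | 0 | 0 | 0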

With Office 365 we can use FILTER:
=IFERROR(FILTER($F$1:$K$1,INDEX($F$2:$K$9,MATCH(A2,$E$2:$E$9,0),0)=1),"No Match")
With older versions we can use another INDEX/MATCH:
=IFERROR(INDEX($F$1:$K$1,MATCH(1,INDEX($F$2:$K$9,MATCH(A2,$E$2:$E$9,0),0),0)),"No Match")
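If the two sheets are loaded into pandas instead, the same lookup can be sketched as below. This is a swapped-in pandas technique, not part of the Excel answer; the frame names `sheet1`/`sheet2` and the `Code` column are hypothetical. `idxmax(axis=1)` returns the header of the first column holding each row's maximum, which is the single 1 here (a row of all zeros would wrongly return the first header).

```python
import pandas as pd

# Sheet 2: one row per name, with a single 1 marking the gender column.
gender_cols = ["Male 1", "Female 2", "other 3", "other 4", "other 5", "Do not know 6"]
sheet2 = pd.DataFrame(
    [[1, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 1, 0, 0, 0, 0]],
    index=list("waqretyu"),
    columns=gender_cols,
)

# Sheet 1: the names to look up.
sheet1 = pd.DataFrame({"Name": list("wertyuiqwer")})

# Header of the column that holds the 1 in each Sheet 2 row.
gender_by_name = sheet2.idxmax(axis=1)

# Map names to headers; names missing from Sheet 2 become "No Match".
sheet1["Gender"] = sheet1["Name"].map(gender_by_name).fillna("No Match")

# Step 3 variant: only the trailing number of the header (NaN when no match).
sheet1["Code"] = sheet1["Name"].map(gender_by_name.str.split().str[-1])
```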

Merge multiple binary encoded rows into one in pandas dataframe

I have a pandas.DataFrame that looks like this:
A B C D E F
0 0 1 0 0 0
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 0 1 0
4 0 0 1 0 0
Several rows can have their 1 in the same column, and each row contains only one 1. I want to merge the rows so that the resulting DataFrame consists of a single row that combines all the 1s of the DataFrame, like this:
A B C D E F
0 1 1 1 1 0
Is there a smart, easy way to do this with pandas?

Use DataFrame.sum, then compare for greater or equal with Series.ge, and finally convert the booleans to 0/1 with Series.astype:
s = df.sum().ge(1).astype('i1')
If the frame contains only 0/1 values, another idea is DataFrame.any, converting the resulting mask to 0/1:
s = df.any().astype('i1')
print(s)
A 1
B 1
C 1
D 1
E 1
F 0
dtype: int8

We can do
df.sum().ge(1).astype(int)
Out[316]:
A 1
B 1
C 1
D 1
E 1
F 0
dtype: int32
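Both answers return a Series; the question asks for a one-row DataFrame. A minimal sketch, assuming a hypothetical one-hot frame with columns A-F (not the exact data above), turns the column-wise result into a single row with `to_frame().T`. For 0/1 data, the column-wise `max` is an equivalent way to express the same OR.

```python
import pandas as pd

# Hypothetical one-hot frame: every row has exactly one 1 (columns A-F).
df = pd.DataFrame(
    [[0, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0, 0]],
    columns=list("ABCDEF"),
)

# Column-wise OR: a column becomes 1 if any row has a 1 in it.
merged = df.any().astype("int8").to_frame().T

# For 0/1 data the column-wise max gives the same values.
merged_max = df.max().to_frame().T

print(merged)
```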

How to identify where a particular sequence in a row occurs for the first time

I have a dataframe in pandas, an example of which is provided below:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6
A 1 0 0 1 0 0
B 1 1 0 0 1 0
C 1 0 1 1 0 0
D 0 0 1 0 0 1
E 1 1 1 1 1 1
As you can see, 1s and 0s occur randomly in different columns. It would be helpful if anyone could suggest Python code that finds the column number where the 1 0 0 pattern occurs for the first time. For example, for member A the first 1 0 0 pattern starts at appear_1, so the first occurrence is 1. Similarly, for member B the first 1 0 0 pattern starts at appear_2, so the first occurrence is 2. The resulting table should have a new column named 'first_occurrence'. If the 1 0 0 pattern does not occur at all (as in row E), the value in the first_occurrence column should be the number of 1s in that row. The resulting table should look something like this:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6 first_occurrence
A 1 0 0 1 0 0 1
B 1 1 0 0 1 0 2
C 1 0 1 1 0 0 4
D 0 0 1 0 0 1 3
E 1 1 1 1 1 1 6
Thank you in advance.

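I try not to reinvent the wheel, so I build on my answer to a previous question. On top of that answer, you need idxmax, np.where, and get_indexer:
import numpy as np
cols = ['appear_1', 'appear_2', 'appear_3', 'appear_4', 'appear_5', 'appear_6']
df1 = df[cols]
# True from each row's first 1 onwards
m = df1[df1.eq(1)].ffill(axis=1).notna()
# True at every 0 that comes after the row's first 1
df2 = df1[m].bfill(axis=1).eq(0)
# keep a 0 only if the next cell is also 0 (or the row ends)
m2 = df2 & df2.shift(-1, axis=1, fill_value=True)
# the 0-based position of that first 0 equals the 1-based column of the pattern's 1;
# rows without a match fall back to their number of 1s
df['first_occurrence'] = np.where(m2.any(axis=1), df1.columns.get_indexer(m2.idxmax(axis=1)),
                                  df1.sum(axis=1))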
Out[540]:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6 first_occurrence
0 A 1 0 0 1 0 0 1
1 B 1 1 0 0 1 0 2
2 C 1 0 1 1 0 0 4
3 D 0 0 1 0 0 1 3
4 E 1 1 1 1 1 1 6
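For comparison, a plain row-scanning sketch (a swapped-in technique, not the answer's vectorized method) that checks each window of three cells against (1, 0, 0) and falls back to the count of 1s. The helper name `first_occurrence` is hypothetical. It loops in Python per row, so it is slower on large frames, and it treats the pattern strictly: a trailing "1 0" at the end of a row is not a match, whereas `fill_value=True` in the answer above counts it.

```python
import pandas as pd

df = pd.DataFrame(
    {"Person": list("ABCDE"),
     "appear_1": [1, 1, 1, 0, 1],
     "appear_2": [0, 1, 0, 0, 1],
     "appear_3": [0, 0, 1, 1, 1],
     "appear_4": [1, 0, 1, 0, 1],
     "appear_5": [0, 1, 0, 0, 1],
     "appear_6": [0, 0, 0, 1, 1]}
)
cols = [c for c in df.columns if c.startswith("appear_")]


def first_occurrence(row, pattern=(1, 0, 0)):
    """Return the 1-based column where `pattern` first starts, else the number of 1s."""
    vals = row.tolist()
    n = len(pattern)
    for i in range(len(vals) - n + 1):
        if tuple(vals[i:i + n]) == pattern:
            return i + 1  # convert 0-based start index to 1-based column number
    return sum(vals)  # no match: fall back to the count of 1s


df["first_occurrence"] = df[cols].apply(first_occurrence, axis=1)
print(df)
```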

Comparing two different sized pandas Dataframes and to find the row index with equal values

I need some help comparing two pandas dataframes.
I have two dataframes
The first dataframe is
df1 =
a b c d
0 1 1 1 1
1 0 1 0 1
2 0 0 0 1
3 1 1 1 1
4 1 0 1 0
5 1 1 1 0
6 0 0 1 0
7 0 1 0 1
and the second dataframe is
df2 =
a b c d
0 1 1 1 1
1 1 0 1 0
2 0 0 1 0
I want to find the row indices of dataframe 1 (df1) whose entire row is identical to some row in dataframe 2 (df2). My expected result would be
0
3
4
6
The indices do not need to be in any particular order; all I want are the indices of dataframe 1 (df1).
Is there a way without using for loop?
Thanks
Tommy

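You can use merge with indicator=True. The left merge keeps df1's row order but returns a fresh 0..n-1 index, so this assumes df1 has a default RangeIndex and df2 has no duplicate rows: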
df1.merge(df2,indicator=True,how='left').loc[lambda x : x['_merge']=='both'].index
Out[459]: Int64Index([0, 3, 4, 6], dtype='int64')
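An alternative sketch (a swapped-in technique, not the answer's merge) treats each row as a tuple via `pd.MultiIndex.from_frame` and tests membership with `MultiIndex.isin`. It returns df1's own index labels, so it does not depend on df1 having a default index or on df2 being free of duplicates. It assumes both frames list their columns in the same order.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1],
     [1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0], [0, 1, 0, 1]],
    columns=list("abcd"),
)
df2 = pd.DataFrame(
    [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 1, 0]],
    columns=list("abcd"),
)

# Each row becomes one tuple-like MultiIndex entry; isin checks row membership.
mask = pd.MultiIndex.from_frame(df1).isin(pd.MultiIndex.from_frame(df2))
matching = df1.index[mask]
print(list(matching))
```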

Excel check for value in different cells per row

Having this kind of data:
A B C D E
1 1 0 1 0 0
2 0 1 1 0 1
3 1 0 1 1 0
4 0 1 0 1 0
I would like to show TRUE/FALSE in column F depending on whether columns A, C and E all have the value 1.
So I am not looking for a value in a range, but in specific, non-adjacent columns.

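You can use the AND function, something like:
=IF(AND(A1=1,C1=1,E1=1),TRUE,FALSE)
AND already returns TRUE or FALSE, so =AND(A1=1,C1=1,E1=1) on its own gives the same result; the IF wrapper is only needed if you want different outputs.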

=IF(A2;IF(C2;IF(E2;TRUE;FALSE);FALSE);FALSE)
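This displays TRUE if all three cells are nonzero (i.e. 1 for this 0/1 data), else FALSE. The semicolons are the argument separator in locales that use a comma as the decimal mark; otherwise use commas.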
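If the same grid lives in a pandas DataFrame instead of a worksheet, the AND check can be sketched row-wise as below (a swapped-in pandas equivalent, not part of the Excel answers). Rows 1-4 are the question's data; row 5 is a hypothetical addition so that at least one row comes out TRUE.

```python
import pandas as pd

# Rows 1-4 from the question; row 5 is hypothetical, added to show a True case.
df = pd.DataFrame(
    [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1],
     [1, 0, 1, 1, 0],
     [0, 1, 0, 1, 0],
     [1, 0, 1, 0, 1]],
    columns=list("ABCDE"),
    index=[1, 2, 3, 4, 5],
)

# Equivalent of =AND(A1=1,C1=1,E1=1), evaluated for every row at once.
df["F"] = df[["A", "C", "E"]].eq(1).all(axis=1)
print(df)
```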
