I have a DataFrame that looks like this:
ID A B C D
6234 1 0 1 0
3417 1 0 0 0
9954 0 1 0 0
4369 0 0 0 1
6281 1 0 1 0
And I want to group it so as to make it look like this:
A B C D ID
1 0 0 0 3
1 0 1 0 2
0 1 0 0 1
0 0 1 0 2
0 0 0 1 1
I have been using the following code, which has not gotten me very far.
import pandas as pd
data = [[6234,1,0,1,0],
[3417,1,0,0,0],
[9954,0,1,0,0],
[4369,0,0,0,1],
[6281,1,0,1,0]]
DF1 = pd.DataFrame(data, columns = ['ID','A','B','C','D'])
DF2 = DF1.groupby(['A','B','C','D']).count()
I would appreciate any insight that anyone might have to offer.
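The groupby above counts rows whose A to D values match a pattern exactly, so (1, 0, 0, 0) gets 1 rather than 3. Reading the desired table, each count seems to be the number of rows that have a 1 in every column where the pattern has a 1: three rows have A = 1, and two rows have both A and C. Assuming that is the intent, here is a minimal sketch. The patterns are copied from the desired output because the question does not say how they are chosen.

```python
import numpy as np
import pandas as pd

data = [[6234, 1, 0, 1, 0],
        [3417, 1, 0, 0, 0],
        [9954, 0, 1, 0, 0],
        [4369, 0, 0, 0, 1],
        [6281, 1, 0, 1, 0]]
df = pd.DataFrame(data, columns=['ID', 'A', 'B', 'C', 'D'])
cols = ['A', 'B', 'C', 'D']

# Exact-match counts, which is what the groupby in the question produces
exact = df.groupby(cols)['ID'].count()

# Patterns copied from the desired output (assumption: these are the rows wanted)
patterns = pd.DataFrame([[1, 0, 0, 0],
                         [1, 0, 1, 0],
                         [0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, 1]], columns=cols)

# hits[i, j] = how many of pattern j's 1s are also 1 in data row i
hits = df[cols].to_numpy() @ patterns.to_numpy().T
# A row "contains" a pattern when it has every 1 of that pattern
contains = hits == patterns.to_numpy().sum(axis=1)
patterns['ID'] = contains.sum(axis=0)

print(exact)
print(patterns.set_index(cols))
```

Because the flags are 0/1, a single matrix product counts the overlapping 1s for every row/pattern pair at once.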
I have a dataframe in pandas, an example of which is provided below:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6
A 1 0 0 1 0 0
B 1 1 0 0 1 0
C 1 0 1 1 0 0
D 0 0 1 0 0 1
E 1 1 1 1 1 1
As you can see, 1s and 0s occur in no particular order across the columns. I am looking for Python code that finds the column number where the pattern 1 0 0 occurs for the first time. For example, for member A the first 1 0 0 pattern starts at appear_1, so the first occurrence is 1. Similarly, for member B the first 1 0 0 pattern starts at appear_2, so the first occurrence is 2. The resulting table should have a new column named 'first_occurrence'. If a row contains no 1 0 0 pattern (like row E), the value in the first_occurrence column should be the number of 1s in that row. The resulting table should look something like this:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6 first_occurrence
A 1 0 0 1 0 0 1
B 1 1 0 0 1 0 2
C 1 0 1 1 0 0 4
D 0 0 1 0 0 1 3
E 1 1 1 1 1 1 6
Thank you in advance.
I try not to reinvent the wheel, so I built on my answer to a previous question. On top of that answer, you need idxmax, np.where, and get_indexer:
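import numpy as np
import pandas as pd
cols = ['appear_1', 'appear_2', 'appear_3', 'appear_4', 'appear_5', 'appear_6']
df1 = df[cols]
m = df1[df1.eq(1)].ffill(axis=1).notna()  # True from the first 1 of each row onwards
df2 = df1[m].bfill(axis=1).eq(0)  # zeros that come after the first 1
m2 = df2 & df2.shift(-1, axis=1, fill_value=False)  # a 0 followed by another 0
# the 0-based column of that first 0 is the 1-based column of the preceding 1;
# rows without a 1 0 0 fall back to their number of 1s, as the question asks
df['first_occurrence'] = np.where(m2.any(axis=1),
                                  df1.columns.get_indexer(m2.idxmax(axis=1)),
                                  df1.sum(axis=1))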
Out[540]:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6 first_occurrence
0 A 1 0 0 1 0 0 1
1 B 1 1 0 0 1 0 2
2 C 1 0 1 1 0 0 4
3 D 0 0 1 0 0 1 3
4 E 1 1 1 1 1 1 6
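If the appear_* columns only ever hold the single digits 0 and 1, a shorter sketch is to treat each row as a string and search for "100" with str.find. This is a different technique from the mask-based answer above (string search instead of shifted boolean masks); it uses the question's fallback of counting the 1s when the pattern is absent. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': list('ABCDE'),
    'appear_1': [1, 1, 1, 0, 1],
    'appear_2': [0, 1, 0, 0, 1],
    'appear_3': [0, 0, 1, 1, 1],
    'appear_4': [1, 0, 1, 0, 1],
    'appear_5': [0, 1, 0, 0, 1],
    'appear_6': [0, 0, 0, 1, 1],
})
cols = [c for c in df.columns if c.startswith('appear_')]

# Encode each row as a digit string, e.g. "100100" for person A
# (assumption: every cell is exactly 0 or 1)
as_text = df[cols].astype(str).agg(''.join, axis=1)
# 0-based position of the "1" in the first "100", or -1 if there is none
pos = as_text.str.find('100')
# Convert to a 1-based column number; fall back to the count of 1s
df['first_occurrence'] = (pos + 1).where(pos >= 0, df[cols].sum(axis=1))
print(df)
```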
I have a dataset 'df' that looks something like this:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6
A 1 0 0 1 0 1
B 1 1 0 0 1 0
C 1 1 1 0 0 1
D 0 0 1 0 0 1
As you can see, each row is a sequence of ones and zeros. Can anyone suggest Python code that counts how many consecutive 1s occur immediately before the first occurrence of 1, 0, 0 in that order? For example, for member A the first double-zero event occurs at seen_2 and seen_3 and is preceded by a single 1, so the event is 1. Similarly, for member B the first double-zero event occurs at seen_3 and seen_4 and is preceded by two 1s, so the event is 2. The resulting table should have a new column 'event', something like this:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 event
A 1 0 0 1 0 1 1
B 1 1 0 0 1 0 2
C 1 1 1 0 0 1 3
D 0 0 1 0 0 1 1
My approach:
import pandas as pd

df = df.set_index('MEMBER')
# count the 1s on each row since the last 0
s = (df.stack()
       .groupby(['MEMBER', df.eq(0).cumsum(axis=1).stack()])
       .cumsum()
       .unstack()
     )
# mask of the zeros:
u = s.eq(0)
# look for the first 1 0 0
idx = (~u &
       u.shift(-1, axis=1, fill_value=False) &
       u.shift(-2, axis=1, fill_value=False)).idxmax(axis=1)
# look up the run length at that position
# (DataFrame.lookup was removed in pandas 2.0, so use .at per row instead)
df['event'] = pd.Series({m: s.at[m, c] for m, c in idx.items()})
Test data:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6
0 A 1 0 1 0 0 1
1 B 1 1 0 0 1 0
2 C 1 1 1 0 0 1
3 D 0 0 1 0 0 1
4 E 1 0 1 1 0 0
Output:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 event
0 A 1 0 1 0 0 1 1
1 B 1 1 0 0 1 0 2
2 C 1 1 1 0 0 1 3
3 D 0 0 1 0 0 1 1
4 E 1 0 1 1 0 0 2
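As a cross-check that does not depend on DataFrame.lookup, here is a plain-Python sketch of the same rule applied row by row: turn each row into a digit string, find the first "100", and measure the run of 1s that ends at that "1". It assumes the seen_* columns hold only 0 and 1, and the helper name ones_before_first_100 is hypothetical. Returning 0 for a row with no 1 0 0 is also an assumption, since the question does not say what should happen then. It will be slower than the vectorized answer on large frames.

```python
import pandas as pd

# Test data from the answer above
df = pd.DataFrame({
    'MEMBER': list('ABCDE'),
    'seen_1': [1, 1, 1, 0, 1],
    'seen_2': [0, 1, 1, 0, 0],
    'seen_3': [1, 0, 1, 1, 1],
    'seen_4': [0, 0, 0, 0, 1],
    'seen_5': [0, 1, 0, 0, 0],
    'seen_6': [1, 0, 1, 1, 0],
})

def ones_before_first_100(row):
    text = ''.join(str(v) for v in row)  # e.g. "101001" for member A
    pos = text.find('100')               # index of the "1" in the first "100"
    if pos == -1:
        return 0                         # assumption: no 1 0 0 -> 0
    head = text[:pos + 1]
    # length of the run of 1s that ends at position pos
    return len(head) - len(head.rstrip('1'))

cols = [c for c in df.columns if c.startswith('seen_')]
df['event'] = df[cols].apply(ones_before_first_100, axis=1)
print(df)
```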
I want the simplest verb that gives a list of all boolean lists of a given length.
e.g.
f=. NB. Insert magic here
f 2
0 0
0 1
1 0
1 1
f 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
This functionality has been recently added to the stats/base addon.
load 'stats/base/combinatorial' NB. or just load 'stats'
permrep 2 NB. permutations of size 2 from 2 items with replacement
0 0
0 1
1 0
1 1
3 permrep 2 NB. permutations of size 3 from 2 items with replacement
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
permrep NB. display definition of permrep
$:~ :(# #: i.#^~)
Using the Qt IDE, you can view the script defining permrep and friends by entering open 'stats/base/combinatorial' in the Term window. Alternatively, you can view it on GitHub.
To define f as specified in your question, the following should suffice:
f=: permrep&2
f=: (# #: i.#^~)&2 NB. alternatively
f 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
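For readers coming from other languages, "permutations of size x from y items with replacement" is the Cartesian power of range(y), which Python's itertools.product enumerates in the same odometer order as the J output above. The sketch below is a hypothetical Python analogue, not part of the J addon; the names permrep and f simply mirror the J definitions.

```python
from itertools import product

def permrep(size, items=None):
    # Hypothetical analogue of J's `size permrep items`: all length-`size`
    # lists drawn from range(items) with replacement, in lexicographic order.
    if items is None:
        items = size  # monadic `permrep 2` behaves like `2 permrep 2`
    return [list(t) for t in product(range(items), repeat=size)]

def f(n):
    # analogue of f=: permrep&2, i.e. n permrep 2
    return permrep(n, 2)

for row in f(3):
    print(*row)
```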
The #: ("Antibase 2") vocabulary page has an example close to what I want. I don't really understand that primitive, but the following code gives the base-2 digits of the numbers 0 to 2^n-1:
f=. #:@i.@(2^])
(Thanks to Dan for getting me to look up #:.)
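The idea described above (list the integers 0 to 2^n-1 and write each as n binary digits) can be sketched in Python with bit shifts. The helper name boolean_lists is hypothetical; it shows the arithmetic and is not a translation of any particular J primitive.

```python
def boolean_lists(n):
    # Count from 0 to 2**n - 1 and write each integer as n binary digits,
    # most significant bit first (the same row order as the J output above).
    return [[(k >> shift) & 1 for shift in range(n - 1, -1, -1)]
            for k in range(2 ** n)]

for row in boolean_lists(3):
    print(*row)
```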
How do I remove the outermost logic?
For example, column D holds the result of
AND(OR(A,B),C)
and I want column E to hold the binary (0/1) result of
OR(A,B)
A B C result (D) after extract (E)
0 0 0 0 0
0 0 1 0 0
0 1 0 0 1
0 1 1 1 1
1 0 0 0 1
1 0 1 1 1
1 1 0 0 1
1 1 1 1 1
I tried this in Excel:
=IF(NOT(AND(D2,C2))=TRUE,1,0)
but it does not remove the outermost logic.
A B C result (D) after extract (E) attempt
0 0 0 =IF(AND(OR(A2,B2),C2)=TRUE,1,0) =IF(OR(A2,B2)=TRUE,1,0) =IF(NOT(AND(D2,C2))=TRUE,1,0)
0 0 1 =IF(AND(OR(A3,B3),C3)=TRUE,1,0) =IF(OR(A3,B3)=TRUE,1,0) =IF(NOT(AND(D3,C3))=TRUE,1,0)
0 1 0 =IF(AND(OR(A4,B4),C4)=TRUE,1,0) =IF(OR(A4,B4)=TRUE,1,0) =IF(NOT(AND(D4,C4))=TRUE,1,0)
0 1 1 =IF(AND(OR(A5,B5),C5)=TRUE,1,0) =IF(OR(A5,B5)=TRUE,1,0) =IF(NOT(AND(D5,C5))=TRUE,1,0)
1 0 0 =IF(AND(OR(A6,B6),C6)=TRUE,1,0) =IF(OR(A6,B6)=TRUE,1,0) =IF(NOT(AND(D6,C6))=TRUE,1,0)
1 0 1 =IF(AND(OR(A7,B7),C7)=TRUE,1,0) =IF(OR(A7,B7)=TRUE,1,0) =IF(NOT(AND(D7,C7))=TRUE,1,0)
1 1 0 =IF(AND(OR(A8,B8),C8)=TRUE,1,0) =IF(OR(A8,B8)=TRUE,1,0) =IF(NOT(AND(D8,C8))=TRUE,1,0)
1 1 1 =IF(AND(OR(A9,B9),C9)=TRUE,1,0) =IF(OR(A9,B9)=TRUE,1,0) =IF(NOT(AND(D9,C9))=TRUE,1,0)
By "remove the outermost logic", I assume you want to remove the IF function.
One thing to note is that in a formula like =IF(AND(OR(A2,B2),C2)=TRUE,1,0) you never need the =TRUE test. =IF(AND(OR(A2,B2),C2),1,0) will work exactly the same.
There are a couple of ways to convert a boolean (i.e. a TRUE/FALSE value) into an integer without an explicit IF. One is the double unary: --AND(OR(A2,B2),C2). Another is INT(AND(OR(A2,B2),C2)).
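To double-check the truth table in the question, here is a small Python sketch. It uses Python rather than Excel, so int() of a boolean plays the role of -- or INT. It rebuilds columns D and E from A, B and C, and also evaluates the attempted NOT(AND(D,C)) formula, which does not reproduce column E.

```python
from itertools import product

rows = []
for a, b, c in product([0, 1], repeat=3):  # same A B C order as the table
    d = int(bool((a or b) and c))  # column D: AND(OR(A,B),C) as 0/1
    e = int(bool(a or b))          # column E: OR(A,B) as 0/1
    attempt = int(not (d and c))   # the attempted NOT(AND(D,C))
    rows.append((a, b, c, d, e, attempt))

print('A B C D E attempt')
for r in rows:
    print(*r)
```

Comparing the last two columns shows the attempt differs from E in five of the eight rows.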