How to achieve following output in pandas dataframe [duplicate]

How to achieve following output in pandas dataframe [duplicate] - python-3.x

This question already has answers here:
Reconstruct a categorical variable from dummies in pandas
(7 answers)
Closed 4 years ago.
df:
category A B C D
x 0 1 0 0
y 1 0 0 0
z 1 0 0 0
l 0 0 0 1
m 0 1 0 0
n 0 0 1 0
how to get df like below
Category Sub-category
x B
y A
z A
l D
m B
n C
I tried:
df['sector'] = df.apply(lambda x: df.columns[x.argmax()], axis = 1)
but getting TypeError: ("reduction operation 'argmax' not allowed for this dtype", 'occurred at index 1')

Just do
df['sub_category'] = df[['A', 'B', 'C', 'D']].idxmax(axis=1)
category A B C D sub_category
0 x 0 1 0 0 B
1 y 1 0 0 0 A
2 z 1 0 0 0 A
3 l 0 0 0 1 D
4 m 0 1 0 0 B
5 n 0 0 1 0 C
Of course you may select only the columns you want
df[['category', 'sub_category']]
category sub_category
0 x B
1 y A
2 z A
3 l D
4 m B
5 n C

Related

Dataframe conversion

I am trying to convert a data frame into a 1,0 matrix format
data = pd.DataFrame({'Val1':['A','B','B'],
'Val2':['C','A','D'],
'Val3':['E','F','C'],
'Comb':['Comb1','Comb2','Comb3']})
data:
Val1 Val2 Val3 Comb
0 A C E Comb1
1 B A F Comb2
2 B D C Comb3
What I need is to convert to below data frame
Comb A C D E B F
0 Comb1 1 1 0 1 0 0
1 Comb2 1 0 0 0 1 1
2 Comb3 0 1 1 0 1 0
I was able to do it with a FOR loop but as my dataframe increases, the processing time increases. Is there a better way to do it?
header = set(data[['Val1','Val2','Val3']].values.ravel())
matrix = pd.DataFrame(columns=header)
for i in range(data.shape[0]):
temp_dict = {data["Val1"].iloc[i]:1, data["Val2"].iloc[i]:1, data["Val3"].iloc[i]:1}
matrix = matrix.append(temp_dict, ignore_index=True)
matrix = matrix.loc[:, matrix.columns.notnull()]
matrix = matrix.fillna(0)
matrix = pd.merge(data[["Comb"]],matrix, left_index=True, right_index=True, how= 'outer')
Thanks!

There may be a better solution, but this is what came to my mind: convert each raw to a dictionary of "present" letters, build a Series from the dictionary, and combine the Series into a dataframe.
data.loc[:, 'Val1':'Val3'].apply(lambda row:
pd.Series({letter: 1 for letter in row}), axis=1)\
.fillna(0).astype(int).join(data.Comb)
# A B C D E F Comb
#0 1 0 1 0 1 0 Comb1
#1 1 1 0 0 0 1 Comb2
#2 0 1 1 1 0 0 Comb3

There are propably multiple ways to solve this, I used pd.crosstab for it:
import pandas as pd
data = pd.DataFrame({'Val1':['A','B','B'],
'Val2':['C','A','D'],
'Val3':['E','F','C'],
'Comb':['Comb1','Comb2','Comb3']})
data["lst"] = data[['Val1', 'Val2', 'Val3']].values.tolist()
data = data.explode("lst")
print(pd.crosstab(data["Comb"], data["lst"]))
Out[20]:
lst A B C D E F
Comb
Comb1 1 0 1 0 1 0
Comb2 1 1 0 0 0 1
Comb3 0 1 1 1 0 0

I guess this will work. Please let me know if it works
pd.get_dummies(data, columns=['Val1','Val2','Val3'],prefix="",prefix_sep="").groupby(axis=1,level=0).sum()

Here's another way:
data.melt('Comb').set_index('Comb')['value'].str.get_dummies().sum(level=0).reset_index()
Output:
Comb A B C D E F
0 Comb1 1 0 1 0 1 0
1 Comb2 1 1 0 0 0 1
2 Comb3 0 1 1 1 0 0

Merge multiple binary encoded rows into one in pandas dataframe

I have a pandas.DataFrame that looks like this:
A B C D E F
0 0 1 0 0 0
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 0 1 0
4 0 0 1 0 0
There are several rows that share a 1 in their columns and in each row there is only one 1 present. I want to merge the rows with each other so the resulting dataFrame would onyl consist of one row, that combines all the 1s of the dataframe, like this:
A B C D E F
0 1 1 1 1 0
Is there a smart, easy way to do this with pandas?

Use DataFrame.sum, then compare for greater or equal by Series.ge and last convert to 0,1 by Series.view:
s = df.sum().ge(1).view('i1')
Another idea if 0,1 values only is use DataFrame.any with convert mask to 0,1:
s = df.any().view('i1')
print (s)
A 1
B 1
C 1
D 1
E 1
F 0
dtype: int8

We can do
df.sum().ge(1).astype(int)
Out[316]:
A 1
B 1
C 1
D 1
E 1
F 0
dtype: int32

PYEDA truthtable of functions

Hope there is anybody who feels good with PYEDA.
I want to add fictious variables to function
Let me have f=x1, but how can I get truthtable for this function , which will have x2 too
Like truthtable for f(x1)=x1 is:
x1 f
0 0
1 1
But for f(x1,x2)=x1 is:
x1 x2 f
0 0 0
0 1 0
1 0 1
1 1 1
But I will get first table, pyeda will simplify x1&(x2|~x2) to x1 automatically. How can I add this x2?
def calcFunction(function, i):
#here is is point with dimension-size 4
function=function.restrict({x4:i[3]})
function = function.restrict({x3:i[2]})
function = function.restrict({x2:i[1]})
function = function.restrict({x1:i[0]})
if function.satisfy_one() is not None:
return 1
return 0
Here is my algo to fix it, I am calculating func in each point manually, where function can containt 1-4 variables and I am calculating for all point and combinations of x1...x4.

I'm not sure I understand the question as asked, but you might want to try the expression simplify method.
For example:
In [1]: f = (X[1] & X[2]) | (X[3] | X[4] | ~X[3])
In [2]: expr2truthtable(f)
Out[2]:
x[4] x[3] x[2] x[1]
0 0 0 0 : 1
0 0 0 1 : 1
0 0 1 0 : 1
0 0 1 1 : 1
0 1 0 0 : 1
0 1 0 1 : 1
0 1 1 0 : 1
0 1 1 1 : 1
1 0 0 0 : 1
1 0 0 1 : 1
1 0 1 0 : 1
1 0 1 1 : 1
1 1 0 0 : 1
1 1 0 1 : 1
1 1 1 0 : 1
1 1 1 1 : 1
In [3]: f = f.simplify()
In [4]: f
Out[4]: 1
In [5]: expr2truthtable(f)
Out[5]: 1

Pattern identification and sequence detection

I have a dataset 'df' that looks something like this:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6
A 1 0 0 1 0 1
B 1 1 0 0 1 0
C 1 1 1 0 0 1
D 0 0 1 0 0 1
As you can see there are several rows of ones and zeros. Can anyone suggest me a code in python such that I am able to count the number of times '1' occurs continuously before the first occurrence of a 1, 0 and 0 in order. For example, for member A, the first double zero event occurs at seen_2 and seen_3, so the event will be 1. Similarly for the member B, the first double zero event occurs at seen_3 and seen_4 so there are two 1s that occur before this. The resultant table should have a new column 'event' something like this:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 event
A 1 0 0 1 0 1 1
B 1 1 0 0 1 0 2
C 1 1 1 0 0 1 3
D 0 0 1 0 0 1 1

My approach:
df = df.set_index('MEMBER')
# count 1 on each rows since the last 0
s = (df.stack()
.groupby(['MEMBER', df.eq(0).cumsum(1).stack()])
.cumsum().unstack()
)
# mask of the zeros:
u = s.eq(0)
# look for the first 1 0 0
idx = (~u &
u.shift(-1, axis=1, fill_value=False) &
u.shift(-2, axis=1, fill_value=False) ).idxmax(1)
# look up
df['event'] = s.lookup(idx.index, idx)
Test data:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6
0 A 1 0 1 0 0 1
1 B 1 1 0 0 1 0
2 C 1 1 1 0 0 1
3 D 0 0 1 0 0 1
4 E 1 0 1 1 0 0
Output:
MEMBER seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 event
0 A 1 0 1 0 0 1 1
1 B 1 1 0 0 1 0 2
2 C 1 1 1 0 0 1 3
3 D 0 0 1 0 0 1 1
4 E 1 0 1 1 0 0 2

List column name having value greater than zero

I have following dataframe
A | B | C | D
1 0 2 1
0 1 1 0
0 0 0 1
I want to add the new column have any value of row in the column greater than zero along with column name
A | B | C | D | New
1 0 2 1 A-1, C-2, D-1
0 1 1 0 B-1, C-1
0 0 0 1 D-1

We can use mask and stack
s=df.mask(df==0).stack().\
astype(int).astype(str).\
reset_index(level=1).apply('-'.join,1).add(',').sum(level=0).str[:-1]
df['New']=s
df
Out[170]:
A B C D New
0 1 0 2 1 A-1,C-2,D-1
1 0 1 1 0 B-1,C-1
2 0 0 0 1 D-1

Combine the column names with the df values that are not zero and then filter out the None values.
df = pd.read_clipboard()
arrays = np.where(df!=0, df.columns.values + '-' + df.values.astype('str'), None)
new = []
for array in arrays:
new.append(list(filter(None, array)))
df['New'] = new
df
Out[1]:
A B C D New
0 1 0 2 1 [A-1, C-2, D-1]
1 0 1 1 0 [B-1, C-1]
2 0 0 0 1 [D-1]

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to achieve following output in pandas dataframe [duplicate] - python-3.x

Related

Dataframe conversion

Merge multiple binary encoded rows into one in pandas dataframe

PYEDA truthtable of functions

Pattern identification and sequence detection

List column name having value greater than zero

Categories

Resources