Ungroup pandas dataframe column values separated by comma - python-3.x

Hello, I have a pandas DataFrame that has been grouped, and I want to ungroup it. The column values are separated by commas. The DataFrame looks like this:
col1 col2 name exams
0,0,0 0,0,0, A1 exm1,exm2, exm3
0,1,0,20 0,0,2,20 A2 exm1,exm2, exm4, exm5
0,0,0,30 0,0,20,20 A3 exm1,exm2, exm3, exm5
Desired output:
col1 col2 name exam
0 0 A1 exm1
0 0 A1 exm2
0 0 A1 exm3
0 0 A2 exm1
1 0 A2 exm2
0 2 A2 exm4
20 20 A2 exm5
..............
30 20 A3 exm5
I tried the approach from "Split (explode) pandas dataframe string entry to separate rows",
but could not get it to work. Can anyone suggest how I can get this output?

Try explode. Note that explode is a newer function, available from pandas 0.25.0 onward.
df[['col1','col2','exams']]=df[['col1','col2','exams']].apply(lambda x : x.str.split(','))
df = df.join(pd.concat([df.pop(x).explode() for x in ['col1','col2','exams']],axis=1))
Out[62]:
name col1 col2 exams
0 A1 0 0 exm1
0 A1 0 0 exm2
0 A1 0 0 exm3
1 A2 0 0 exm1
1 A2 1 0 exm2
1 A2 0 2 exm4
1 A2 20 20 exm5
2 A3 0 0 exm1
2 A3 0 0 exm2
2 A3 0 20 exm3
2 A3 30 20 exm5
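As a variation on the answer above, here is a minimal sketch that assumes pandas 1.3.0 or later, where DataFrame.explode accepts a list of columns. The helper split_clean is a hypothetical name introduced for illustration; it strips stray spaces and drops empty pieces left by trailing commas, which is an assumption about how the sample data should be cleaned. The sample frame contains only the first two rows of the question.

```python
import pandas as pd

# First two rows of the question's data, typed in by hand for illustration.
df = pd.DataFrame({
    "col1": ["0,0,0", "0,1,0,20"],
    "col2": ["0,0,0,", "0,0,2,20"],
    "name": ["A1", "A2"],
    "exams": ["exm1,exm2, exm3", "exm1,exm2, exm4, exm5"],
})

list_cols = ["col1", "col2", "exams"]


def split_clean(value):
    """Split on commas, strip whitespace, and drop empty pieces (hypothetical helper)."""
    return [part.strip() for part in value.split(",") if part.strip()]


for col in list_cols:
    df[col] = df[col].map(split_clean)

# Multi-column explode requires every listed column to hold the same
# number of elements in a given row; otherwise pandas raises ValueError.
out = df.explode(list_cols, ignore_index=True)
print(out)
```

The exploded values are still strings; converting col1 and col2 with astype(int) afterwards is one option if numbers are needed.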

Related

order the DF by changing columns values to new rows

I have the following dataframe:
Time Image Mean
0 A1 1
1 A1 2
0 B1 3
1 B1 4
I want to reshape it as follows (remove the Image column, use the Image values as column headers, and fill in the Mean values):
Time A1 B1
0 1 3
1 2 4
Try:
print(
    df.pivot(index="Time", columns="Image", values="Mean")
    .reset_index()
    .rename_axis("", axis=1)
)
Prints:
Time A1 B1
0 0 1 3
1 1 2 4
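One caveat with the pivot approach above: DataFrame.pivot raises a ValueError when the same (Time, Image) pair appears more than once. A minimal sketch of a fallback, assuming that averaging duplicates is acceptable, uses pivot_table instead. The extra duplicate row in the sample data is hypothetical and added only to show the aggregation.

```python
import pandas as pd

# Question's data plus one hypothetical duplicate (Time=1, Image="B1").
df = pd.DataFrame({
    "Time": [0, 1, 0, 1, 1],
    "Image": ["A1", "A1", "B1", "B1", "B1"],
    "Mean": [1, 2, 3, 4, 6],
})

wide = (
    df.pivot_table(index="Time", columns="Image", values="Mean", aggfunc="mean")
    .reset_index()
    .rename_axis(None, axis=1)  # drop the "Image" label from the column axis
)
print(wide)
```

With no duplicates, pivot and pivot_table give the same layout, although pivot_table returns floats here because it aggregates with the mean.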

Adding new Column in Dataframe and updating row values as other columns name based on condition

I have a dataframe with columns a, c1, c2, c3, c4.
df =
a. c1. c2. c3. c4.
P1 1 0 0 0
P2 0 0 0 1
P3 1 0 0 0
P4 0 1 0 0
On the above df, I want to do the following:
Add a new column main, whose value is the name of the column that contains 1 in that row.
For example, the first row will have 'c1' in its main column; similarly, the second row will have 'c4'.
The resulting df will look like below:
df =
a. c1. c2. c3. c4. main
P1 1 0 0 0 c1
P2 0 0 0 1 c4
P3 1 0 0 0 c1
P4 0 1 0 0 c2
I am new to python and dataframes. Please help.
Use DataFrame.dot for matrix multiplication.
If a is the first column, omit it by indexing:
df['main'] = df.iloc[:, 1:].dot(df.columns[1:])
# if a row may contain more than one 1
#df['main'] = df.iloc[:, 1:].dot(df.columns[1:] + ',').str.rstrip(',')
print (df)
a c1 c2 c3 c4 main
0 P1 1 0 0 0 c1
1 P2 0 0 0 1 c4
2 P3 1 0 0 0 c1
3 P4 0 1 0 0 c2
If a is the index:
df['main'] = df.dot(df.columns)
# if a row may contain more than one 1
#df['main'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
c1 c2 c3 c4 main
a
P1 1 0 0 0 c1
P2 0 0 0 1 c4
P3 1 0 0 0 c1
P4 0 1 0 0 c2
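An alternative to the dot trick, not part of the answer above, is idxmax along the columns. This is a minimal sketch assuming each row has exactly one 1; the list name flag_cols is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["P1", "P2", "P3", "P4"],
    "c1": [1, 0, 1, 0],
    "c2": [0, 0, 0, 1],
    "c3": [0, 0, 0, 0],
    "c4": [0, 1, 0, 0],
})

flag_cols = ["c1", "c2", "c3", "c4"]  # hypothetical name for the indicator columns

# idxmax(axis=1) returns, per row, the label of the first column holding the maximum.
df["main"] = df[flag_cols].idxmax(axis=1)
print(df)
```

Note that a row containing only zeros would still get 'c1' (the first column of the maximum, which is 0), and a row with several 1s would report only the first one, so the dot version with a separator is the better fit for those cases.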

DolphinDB pivot table

I have a table something looking like this:
id CompanyName ProductID productName
-- ----------- --------- -----------
1 c1 1 p1
2 c1 2 p2
3 c2 2 p2
4 c2 3 p3
5 c3 3 p3
6 c4 3 p3
7 c5 4 p4
8 c6 4 p4
9 c6 5 p5
Is it possible to run a DolphinDB query to get output like this:
companyName p1 p2 p3 p4 p5
------------------------------
c1 1 1 0 0 0
c2 0 1 1 0 0
c3 0 0 1 0 0
c4 0 0 1 0 0
c5 0 0 0 1 0
c6 0 0 0 1 1
Each value in the table above is the number of each product in each company. I currently get the counts with this query:
select count(*) from t group by companyName,productName
t1=select count(ProductID) from t pivot by CompanyName, productName
nullFill!(t1,0)
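For readers who want to cross-check the DolphinDB result in pandas, the same count table can be sketched with pd.crosstab. This is a pandas analogue only, not DolphinDB code, and the frame t below is typed in by hand from the question's table.

```python
import pandas as pd

# Hand-typed copy of the question's table (the id column is omitted).
t = pd.DataFrame({
    "CompanyName": ["c1", "c1", "c2", "c2", "c3", "c4", "c5", "c6", "c6"],
    "ProductID": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "productName": ["p1", "p2", "p2", "p3", "p3", "p3", "p4", "p4", "p5"],
})

# crosstab counts co-occurrences and fills missing combinations with 0.
counts = pd.crosstab(t["CompanyName"], t["productName"])
print(counts)
```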

How can I create Frequency Matrix using all columns

Let's say that I have a dataset that contains 5 binary columns and 2 rows.
It looks like this:
c1 c2 c3 c4 c5
r1 0 1 0 1 0
r2 1 1 1 1 0
I want to create a matrix that counts how often each column occurs together with each other column, somewhat like a confusion matrix.
My desired output is:
c1 c2 c3 c4 c5
c1 - 1 1 1 0
c2 1 - 1 2 0
c3 1 1 - 1 0
c4 1 2 1 - 0
I have used pandas crosstab, but it only gives the desired output when using 2 columns. I want to use all of the columns.
dot
df.T.dot(df)
# same as
# df.T @ df
c1 c2 c3 c4 c5
c1 1 1 1 1 0
c2 1 2 1 2 0
c3 1 1 1 1 0
c4 1 2 1 2 0
c5 0 0 0 0 0
You can use np.fill_diagonal to make the diagonal zero
d = df.T.dot(df)
np.fill_diagonal(d.to_numpy(), 0)
d
c1 c2 c3 c4 c5
c1 0 1 1 1 0
c2 1 0 1 2 0
c3 1 1 0 1 0
c4 1 2 1 0 0
c5 0 0 0 0 0
And as long as we're using Numpy, you could go all the way...
a = df.to_numpy()
b = a.T @ a
np.fill_diagonal(b, 0)
pd.DataFrame(b, df.columns, df.columns)
c1 c2 c3 c4 c5
c1 0 1 1 1 0
c2 1 0 1 2 0
c3 1 1 0 1 0
c4 1 2 1 0 0
c5 0 0 0 0 0
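The np.fill_diagonal(d.to_numpy(), 0) step shown earlier relies on to_numpy() handing back a writable view of the frame's data, which may depend on the pandas version and its copy-on-write settings. A minimal sketch that avoids writing in place, assuming the same two-row sample data, uses DataFrame.mask with an identity-shaped boolean array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 1, 1, 1, 0]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

co = df.T.dot(df)  # co-occurrence counts, diagonal holds per-column totals

# True on the diagonal, False elsewhere; mask() replaces the True cells with 0
# and returns a new DataFrame, leaving co untouched.
on_diagonal = np.eye(len(co), dtype=bool)
result = co.mask(on_diagonal, 0)
print(result)
```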
Another way, using melt and merge with groupby:
s = df.reset_index().melt('index').loc[lambda x: x.value == 1]
s.merge(s, on='index').query('variable_x != variable_y').groupby(['variable_x', 'variable_y'])['value_x'].sum().unstack(fill_value=0)
Out[32]:
variable_y c1 c2 c3 c4
variable_x
c1 0 1 1 1
c2 1 0 1 2
c3 1 1 0 1
c4 1 2 1 0

Reading crosstab numbers from excel to access

I have to load a bunch of numbers from Excel into Access. Previously I used Access's Excel import to load the data.
Earlier:
Field1 Field2 Field3 QTY
A1 B1 C1 1
A1 B2 C2 2
A1 B3 C3 3
A1 B4 C4 4
A1 B5 C5 5
My data is in a crosstab format now.
For example:
A1 B1 B2 B3 B4 B5
C1 1 0 0 0 0
C2 0 2 0 0 0
C3 0 0 3 0 0
C4 0 0 0 4 0
C5 0 0 0 0 5
Is there a direct way to import this into Access, or do I have to use a macro to convert it into the linear form used earlier?
Thanks,
Arka.
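If converting the sheet outside Access is an option, the crosstab can be unpivoted back into the linear form with pandas before importing. This is only a sketch under assumptions: the frame is typed in by hand here (in practice it might come from pd.read_excel), the corner cell A1 is treated as the Field1 value, and zero cells are assumed to mean "no row".

```python
import pandas as pd

# Hand-typed copy of the crosstab from the question; rows are C1..C5, columns B1..B5.
crosstab = pd.DataFrame(
    {
        "B1": [1, 0, 0, 0, 0],
        "B2": [0, 2, 0, 0, 0],
        "B3": [0, 0, 3, 0, 0],
        "B4": [0, 0, 0, 4, 0],
        "B5": [0, 0, 0, 0, 5],
    },
    index=pd.Index(["C1", "C2", "C3", "C4", "C5"], name="Field3"),
)

# melt turns each (row, column) cell into its own record.
linear = crosstab.reset_index().melt(
    id_vars="Field3", var_name="Field2", value_name="QTY"
)
linear = linear[linear["QTY"] != 0].copy()  # assumption: zeros are empty cells
linear.insert(0, "Field1", "A1")            # assumption: corner label is Field1
linear = linear[["Field1", "Field2", "Field3", "QTY"]].reset_index(drop=True)
print(linear)
```

The resulting table has the same four columns as the earlier linear layout and could then be saved and imported the usual way.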
