DolphinDB pivot table - pivot

I have a table that looks something like this:
id  CompanyName  ProductID  productName
--  -----------  ---------  -----------
1   c1           1          p1
2   c1           2          p2
3   c2           2          p2
4   c2           3          p3
5   c3           3          p3
6   c4           3          p3
7   c5           4          p4
8   c6           4          p4
9   c6           5          p5
Is it possible to run a DolphinDB query to get output like this:
companyName  p1  p2  p3  p4  p5
-------------------------------
c1           1   1   0   0   0
c2           0   1   1   0   0
c3           0   0   1   0   0
c4           0   0   1   0   0
c5           0   0   0   1   0
c6           0   0   0   1   1
Each value in the above table is the number of each product in each company. I get the counts with this query:
select count(*) from t group by CompanyName, productName

Use DolphinDB's pivot by clause, then replace the resulting NULLs with 0 in place with nullFill!:
t1 = select count(ProductID) from t pivot by CompanyName, productName
nullFill!(t1, 0)
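For comparison, the same pivot is a one-liner in pandas with crosstab (a minimal sketch; the DataFrame t below is hypothetical and mirrors the table above):

import pandas as pd

# hypothetical data mirroring the question's table
t = pd.DataFrame({
    'CompanyName': ['c1','c1','c2','c2','c3','c4','c5','c6','c6'],
    'productName': ['p1','p2','p2','p3','p3','p3','p4','p4','p5'],
})

# crosstab counts (CompanyName, productName) pairs and fills missing cells with 0
print(pd.crosstab(t['CompanyName'], t['productName']))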

Related

Ungroup pandas dataframe column values separated by comma

Hello, I have a grouped pandas DataFrame whose column values are comma-separated, and I want to ungroup it. The DataFrame looks like this:
col1      col2       name  exams
0,0,0     0,0,0,     A1    exm1,exm2, exm3
0,1,0,20  0,0,2,20   A2    exm1,exm2, exm4, exm5
0,0,0,30  0,0,20,20  A3    exm1,exm2, exm3, exm5
The output I want:
col1  col2  name  exam
0     0     A1    exm1
0     0     A1    exm2
0     0     A1    exm3
0     0     A2    exm1
1     0     A2    exm2
0     2     A2    exm4
20    20    A2    exm5
..............
30    20    A3    exm5
I tried Split (explode) pandas dataframe string entry to separate rows, but I was not able to work out a proper approach. Can anyone please suggest how I can get my output?
Try explode; note that explode is a new function added in pandas 0.25.0:
# split each comma-separated string into a list
df[['col1','col2','exams']] = df[['col1','col2','exams']].apply(lambda x: x.str.split(','))
# pop each list column, explode it to one row per element, then join back on the index
df = df.join(pd.concat([df.pop(x).explode() for x in ['col1','col2','exams']], axis=1))
Out[62]:
  name col1 col2 exams
0   A1    0    0  exm1
0   A1    0    0  exm2
0   A1    0    0  exm3
1   A2    0    0  exm1
1   A2    1    0  exm2
1   A2    0    2  exm4
1   A2   20   20  exm5
2   A3    0    0  exm1
2   A3    0    0  exm2
2   A3    0   20  exm3
2   A3   30   20  exm5
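For a self-contained run of the same idea, here is a sketch with hypothetical data mirroring the question (the trailing comma and stray spaces in the question's strings are assumed to be typos, and the exploded values are cast back to int at the end):

import pandas as pd

# hypothetical data mirroring the question
df = pd.DataFrame({
    'col1':  ['0,0,0', '0,1,0,20', '0,0,0,30'],
    'col2':  ['0,0,0', '0,0,2,20', '0,0,20,20'],
    'name':  ['A1', 'A2', 'A3'],
    'exams': ['exm1,exm2,exm3', 'exm1,exm2,exm4,exm5', 'exm1,exm2,exm3,exm5'],
})

cols = ['col1', 'col2', 'exams']
df[cols] = df[cols].apply(lambda x: x.str.split(','))
df = df.join(pd.concat([df.pop(x).explode() for x in cols], axis=1))
# the exploded values are strings; cast the numeric columns back to int
df[['col1', 'col2']] = df[['col1', 'col2']].astype(int)
print(df)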

Adding new Column in Dataframe and updating row values as other columns name based on condition

I have a DataFrame with columns a, c1, c2, c3, c4.
df =
a.  c1. c2. c3. c4.
P1  1   0   0   0
P2  0   0   0   1
P3  1   0   0   0
P4  0   1   0   0
On the above df, I want to do the following:
Add a new column main whose value is the name of the column that contains the value 1 for that row.
For example, the first row will have the value 'c1' in its main column; similarly, the second row will have c4.
The resulting df will look like this:
df =
a.  c1. c2. c3. c4. main
P1  1   0   0   0   c1
P2  0   0   0   1   c4
P3  1   0   0   0   c1
P4  0   1   0   0   c2
I am new to Python and DataFrames. Please help.
Use DataFrame.dot for matrix multiplication. If a is the first column, omit it by indexing:
df['main'] = df.iloc[:, 1:].dot(df.columns[1:])
# if multiple 1s per row are possible:
# df['main'] = df.iloc[:, 1:].dot(df.columns[1:] + ',').str.rstrip(',')
print (df)
    a  c1  c2  c3  c4 main
0  P1   1   0   0   0   c1
1  P2   0   0   0   1   c4
2  P3   1   0   0   0   c1
3  P4   0   1   0   0   c2
If a is the index:
df['main'] = df.dot(df.columns)
# if multiple 1s per row are possible:
# df['main'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
    c1  c2  c3  c4 main
a
P1   1   0   0   0   c1
P2   0   0   0   1   c4
P3   1   0   0   0   c1
P4   0   1   0   0   c2
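If exactly one 1 per row is guaranteed, an alternative sketch uses DataFrame.idxmax, which returns the label of the first maximum in each row (the sample df here is hypothetical):

import pandas as pd

# hypothetical data mirroring the question
df = pd.DataFrame({'a': ['P1', 'P2', 'P3', 'P4'],
                   'c1': [1, 0, 1, 0], 'c2': [0, 0, 0, 1],
                   'c3': [0, 0, 0, 0], 'c4': [0, 1, 0, 0]})

# idxmax over the indicator columns picks the column holding the 1
df['main'] = df.iloc[:, 1:].idxmax(axis=1)
print(df)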

How can I create Frequency Matrix using all columns

Let's say that I have a dataset that contains 5 binary columns and 2 rows.
It looks like this:
    c1  c2  c3  c4  c5
r1   0   1   0   1   0
r2   1   1   1   1   0
I want to create a matrix that gives the number of occurrences of a column, given that a 1 also occurred in another column, somewhat like a confusion matrix.
My desired output is:
    c1  c2  c3  c4  c5
c1   -   1   1   1   0
c2   1   -   1   2   0
c3   1   1   -   1   0
c4   1   2   1   -   0
I have used pandas crosstab, but it only gives the desired output when using 2 columns. I want to use all of the columns.
Use dot:
df.T.dot(df)
# same as
# df.T @ df
    c1  c2  c3  c4  c5
c1   1   1   1   1   0
c2   1   2   1   2   0
c3   1   1   1   1   0
c4   1   2   1   2   0
c5   0   0   0   0   0
You can use np.fill_diagonal to make the diagonal zero:
d = df.T.dot(df)
# fill_diagonal mutates the underlying array; this changes d in place because
# to_numpy() returns a view when the frame holds a single dtype
np.fill_diagonal(d.to_numpy(), 0)
d
    c1  c2  c3  c4  c5
c1   0   1   1   1   0
c2   1   0   1   2   0
c3   1   1   0   1   0
c4   1   2   1   0   0
c5   0   0   0   0   0
And as long as we're using NumPy, you could go all the way...
a = df.to_numpy()
b = a.T @ a              # co-occurrence counts
np.fill_diagonal(b, 0)   # zero the diagonal
pd.DataFrame(b, df.columns, df.columns)
    c1  c2  c3  c4  c5
c1   0   1   1   1   0
c2   1   0   1   2   0
c3   1   1   0   1   0
c4   1   2   1   0   0
c5   0   0   0   0   0
A way of using melt and merge with groupby:
# melt to long format (one row per cell), keep only the 1s
s = df.reset_index().melt('index').loc[lambda x: x.value == 1]
# self-merge pairs columns that co-occur in a row; drop the diagonal and count
s.merge(s, on='index').query('variable_x != variable_y').groupby(['variable_x','variable_y'])['value_x'].sum().unstack(fill_value=0)
Out[32]:
variable_y  c1  c2  c3  c4
variable_x
c1           0   1   1   1
c2           1   0   1   2
c3           1   1   0   1
c4           1   2   1   0
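For a self-contained run of the dot approach (the df below is hypothetical and mirrors the question's two rows):

import numpy as np
import pandas as pd

# hypothetical data mirroring the question
df = pd.DataFrame([[0, 1, 0, 1, 0],
                   [1, 1, 1, 1, 0]],
                  index=['r1', 'r2'],
                  columns=['c1', 'c2', 'c3', 'c4', 'c5'])

a = df.to_numpy()
b = a.T @ a              # co-occurrence counts
np.fill_diagonal(b, 0)   # zero the diagonal
print(pd.DataFrame(b, df.columns, df.columns))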

duplicate rows and swap columns based on condition

I have the following table:
    name  a0  a1 type  val
0  name1   1   0   AB  100
1  name1   2   0   AB  105
2  name2   1   2   BB  110
3  name3   1   0   AN  120
and I want to do the following:
For every row whose type does not consist of the same 2 letters, I want to duplicate the row, swap the a0 and a1 columns, and reverse the letters of the type column. So my result will be:
    name  a0  a1 type  val
0  name1   1   0   AB  100
1  name1   0   1   BA  100
2  name1   2   0   AB  105
3  name1   0   2   BA  105
4  name2   1   2   BB  110
5  name3   1   0   AN  120
6  name3   0   1   NA  120
Note that, for example, the same name can appear with the same type but different a0 and a1 (and hence val), as with name1 and type AB in the first two lines of the original table.
I tried:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name':['name1', 'name1', 'name2', 'name3'], 'a0':[1, 2, 1, 1], 'a1':[0, 0, 2, 0], 'type':['AB', 'AB', 'BB', 'AN'], 'val':[100, 105, 110, 120]})
for idx in df1.index:
    a1 = df1.loc[idx, 'a0']
    a0 = df1.loc[idx, 'a1']
    val = df1.loc[idx, 'val']
    name = df1.loc[idx, 'name']
    if df1.loc[idx, 'type'] == 'AB':
        new_type = 'BA'
    elif df1.loc[idx, 'type'] == 'AN':
        new_type = 'NA'
    row = pd.DataFrame({'name':name, 'a0':a0, 'a1':a1, 'type':new_type, 'val':val}, index=np.arange(idx))
    df1 = df1.append(row, ignore_index=False)
df1 = df1.sort_index().reset_index(drop=True)
but it gives me:
    name  a0  a1 type  val
0  name1   1   0   AB  100
1  name1   2   0   BA  105
2  name1   0   2   BA  105
3  name1   2   0   BA  105
4  name1   0   2   BA  105
5  name1   2   0   BA  105
6  name1   0   2   BA  105
7  name1   2   0   AB  105
8  name2   1   2   BB  110
9  name3   1   0   AN  120
First create a mask to identify values with 2 different letters, create a new DataFrame with DataFrame.assign, swap the values in the columns, join the pieces together, sort by index, and finally create default index values:
# rows whose type consists of 2 different letters
mask = df['type'].apply(set).str.len() == 2
# duplicate those rows with the letters of type reversed
df1 = df[mask].assign(type=lambda x: df['type'].str[1] + df['type'].str[0])
# swap a0 and a1 in the duplicates
df1[['a0','a1']] = df1[['a1','a0']].to_numpy()
# for pandas below 0.24 use .values instead:
# df1[['a0','a1']] = df1[['a1','a0']].values
df = pd.concat([df, df1]).sort_index().reset_index(drop=True)
print (df)
    name  a0  a1 type  val
0  name1   1   0   AB  100
1  name1   0   1   BA  100
2  name1   2   0   AB  105
3  name1   0   2   BA  105
4  name2   1   2   BB  110
5  name3   1   0   AN  120
6  name3   0   1   NA  120
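For a self-contained run of this answer (a sketch; df is simply the question's df1 under the name the answer uses):

import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name1', 'name2', 'name3'],
                   'a0': [1, 2, 1, 1], 'a1': [0, 0, 2, 0],
                   'type': ['AB', 'AB', 'BB', 'AN'],
                   'val': [100, 105, 110, 120]})

# rows whose type has two distinct letters get a reversed-type, swapped copy
mask = df['type'].apply(set).str.len() == 2
df1 = df[mask].assign(type=lambda x: x['type'].str[1] + x['type'].str[0])
df1[['a0', 'a1']] = df1[['a1', 'a0']].to_numpy()
df = pd.concat([df, df1]).sort_index().reset_index(drop=True)
print(df)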
You can use:
def myfunc(x):
    # reverse the letters of type and swap a0/a1 for one row
    x['type'] = x['type'][::-1]
    x[['a0','a1']] = x[['a1','a0']].values
    return x

m = df.type.apply(set).str.len().gt(1)
pd.concat([df, df.loc[m].apply(myfunc, axis=1)], ignore_index=True).sort_values(['name','val'])
    name  a0  a1 type  val
0  name1   1   0   AB  100
4  name1   0   1   BA  100
1  name1   2   0   AB  105
5  name1   0   2   BA  105
2  name2   1   2   BB  110
3  name3   1   0   AN  120
6  name3   0   1   NA  120

Reading crosstab numbers from excel to access

I have to load a bunch of numbers from Excel into Access. I used Import Excel data in Access to load the data earlier.
Earlier:
Field1  Field2  Field3  QTY
A1      B1      C1      1
A1      B2      C2      2
A1      B3      C3      3
A1      B4      C4      4
A1      B5      C5      5
My data is now in a crosstab format. For example:
A1  B1  B2  B3  B4  B5
C1   1   0   0   0   0
C2   0   2   0   0   0
C3   0   0   3   0   0
C4   0   0   0   4   0
C5   0   0   0   0   5
Is there a direct way to import this into Access, or do I have to use a macro to convert it into the linear form used earlier?
Thanks,
Arka.
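For the conversion step, one option is to unpivot the crosstab back into the linear form before importing, e.g. with pandas melt (a minimal sketch; the file name 'data.xlsx', the sheet layout, and the constant Field1 value 'A1' are assumptions taken from the example):

import pandas as pd

# hypothetical: read the crosstab sheet; its first column holds the C* labels
df = pd.read_excel('data.xlsx', sheet_name=0)
df = df.rename(columns={df.columns[0]: 'Field3'})

# unpivot to one row per (Field3, B*) pair and drop the zero cells
long = df.melt(id_vars='Field3', var_name='Field2', value_name='QTY')
long = long[long['QTY'] != 0]
long['Field1'] = 'A1'   # the corner label in the example
print(long[['Field1', 'Field2', 'Field3', 'QTY']])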
