I have to load a bunch of numbers from Excel into Access. I used the Import Excel Data feature in Access to load the data earlier.
Earlier:
Field1 Field2 Field3 QTY
A1 B1 C1 1
A1 B2 C2 2
A1 B3 C3 3
A1 B4 C4 4
A1 B5 C5 5
My data is in a crosstab format now.
For example:
A1 B1 B2 B3 B4 B5
C1 1 0 0 0 0
C2 0 2 0 0 0
C3 0 0 3 0 0
C4 0 0 0 4 0
C5 0 0 0 0 5
Is there a direct way to import this into Access, or do I have to use a macro to convert it into the linear form used earlier?
Thanks,
Arka.
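No answer was included with this post, but one option, if a tool outside Access is acceptable, is to unpivot the crosstab back into the earlier linear layout before importing. Below is a minimal pandas sketch; the file name, the assumption that the corner cell holds the Field1 value and the first column holds the Field3 labels, and the choice of pandas over an Excel/VBA macro are all illustrative assumptions, not part of the original question.
import pandas as pd

# hypothetical file; first column = Field3 labels (C1..C5),
# header row = Field1 value in the corner plus Field2 labels (B1..B5)
wide = pd.read_excel("crosstab.xlsx", index_col=0)

corner = wide.index.name  # the corner label, "A1" in the example above

# unpivot: one row per (Field3, Field2) pair with its quantity
long = (wide.rename_axis(index="Field3", columns="Field2")
            .stack()
            .rename("QTY")
            .reset_index())
long.insert(0, "Field1", corner)
long = long[["Field1", "Field2", "Field3", "QTY"]]

# optionally drop the zero cells so only real quantities are imported
long = long[long["QTY"] != 0]
long.to_csv("linear.csv", index=False)  # then import the CSV into Access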
input VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 0
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end
Say you have data such as that shown above. I have VAR1 and wish to create VAR2 from it, which takes the value 1 if VAR1 begins with A1, A3-A10, A15-A19, or A21, and 0 otherwise. I believe you can use strpos(VAR1, ...) for this, but is it possible to say, for example, strpos(VAR1, "A1, A3/A10, A15/A19, A21")?
The following works if you have a small number of strings of interest. You may need an alternative approach if you are searching for a larger number of strings, for which writing out the ranges (e.g. A3-A10) is infeasible.
clear
input str3 VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 1
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end
* flag observations whose VAR1 starts with any of the listed prefixes
gen wanted = 0
local mystrings "A1 A3 A4 A5 A6 A7 A8 A9 A10 A15 A16 A17 A18 A19 A21"
foreach string of local mystrings {
    replace wanted = 1 if strpos(VAR1, "`string'") == 1
}
assert wanted == VAR2
Note that in your example input, the second occurrence of A1 had a value of 0 but should have a value of 1 according to your post.
Here is a more generalisable solution for larger ranges of strings:
* split the leading letter from the numeric suffix, then test numeric ranges
gen A = 0
replace A = 1 if strpos(VAR1, "A") == 1
gen newvar = substr(VAR1, 2, .)
destring newvar, replace
* drop wanted first if you ran the block above, since gen will not overwrite it
gen wanted = 0
replace wanted = 1 if A == 1 & (inlist(newvar, 1, 21) | inrange(newvar, 3, 10) | inrange(newvar, 15, 19))
assert wanted == VAR2
Hello, I have a pandas DataFrame whose column values are grouped into comma-separated strings, and I want to ungroup (explode) them into separate rows. The DataFrame looks like this:
col1 col2 name exams
0,0,0 0,0,0 A1 exm1,exm2, exm3
0,1,0,20 0,0,2,20 A2 exm1,exm2, exm4, exm5
0,0,0,30 0,0,20,20 A3 exm1,exm2, exm3, exm5
The output I want:
col1 col2 name exam
0 0 A1 exm1
0 0 A1 exm2
0 0 A1 exm3
0 0 A2 exm1
1 0 A2 exm2
0 2 A2 exm4
20 20 A2 exm5
..............
30 20 A3 exm5
I tried the approach from Split (explode) pandas dataframe string entry to separate rows,
but was not able to find a proper approach. Can anyone suggest how I can get my desired output?
Try explode; note that explode is a new function added in pandas 0.25.0.
# split each comma-separated cell into a list
df[['col1','col2','exams']] = df[['col1','col2','exams']].apply(lambda x: x.str.split(','))
# pop each list column, explode it to one row per element, and join back on the original index
df = df.join(pd.concat([df.pop(x).explode() for x in ['col1','col2','exams']], axis=1))
Out[62]:
name col1 col2 exams
0 A1 0 0 exm1
0 A1 0 0 exm2
0 A1 0 0 exm3
1 A2 0 0 exm1
1 A2 1 0 exm2
1 A2 0 2 exm4
1 A2 20 20 exm5
2 A3 0 0 exm1
2 A3 0 0 exm2
2 A3 0 20 exm3
2 A3 30 20 exm5
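As a side note, not part of the answer above: newer pandas (1.3 and later) lets DataFrame.explode take a list of columns directly, and it may be worth stripping the stray spaces after the commas (e.g. " exm3"). A minimal sketch, under the assumption that every row has the same number of entries in col1, col2 and exams:
import pandas as pd

df = pd.DataFrame({
    'col1':  ['0,0,0', '0,1,0,20', '0,0,0,30'],
    'col2':  ['0,0,0', '0,0,2,20', '0,0,20,20'],
    'name':  ['A1', 'A2', 'A3'],
    'exams': ['exm1,exm2, exm3', 'exm1,exm2, exm4, exm5', 'exm1,exm2, exm3, exm5'],
})

# split each comma-separated cell into a list, trimming stray whitespace
for col in ['col1', 'col2', 'exams']:
    df[col] = df[col].str.split(',').apply(lambda parts: [p.strip() for p in parts])

# multi-column explode (pandas >= 1.3); list lengths must match within each row
out = df.explode(['col1', 'col2', 'exams'], ignore_index=True)
print(out)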
I have a dataframe with columns a, c1, c2, c3, c4.
df =
a. c1. c2. c3. c4.
P1 1 0 0 0
P2 0 0 0 1
P3 1 0 0 0
P4 0 1 0 0
On the above df, I want to do the following:
Add a new column main whose value is the name of the column that contains 1 for that row.
For example, the first row will have the value 'c1' in its main column; similarly, the second row will have 'c4'.
The resulting df will look like below:
df =
a. c1. c2. c3. c4. main
P1 1 0 0 0 c1
P2 0 0 0 1 c4
P3 1 0 0 0 c1
P4 0 1 0 0 c2
I am new to python and dataframes. Please help.
Use DataFrame.dot for matrix multiplication:
If a is the first column, omit it by indexing:
df['main'] = df.iloc[:, 1:].dot(df.columns[1:])
#if possible multiple 1 per row
#df['main'] = df.iloc[:, 1:].dot(df.columns[1:] + ',').str.rstrip(',')
print (df)
a c1 c2 c3 c4 main
0 P1 1 0 0 0 c1
1 P2 0 0 0 1 c4
2 P3 1 0 0 0 c1
3 P4 0 1 0 0 c2
If a is the index:
df['main'] = df.dot(df.columns)
#if possible multiple 1 per row
#df['main'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
c1 c2 c3 c4 main
a
P1 1 0 0 0 c1
P2 0 0 0 1 c4
P3 1 0 0 0 c1
P4 0 1 0 0 c2
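An aside that is not in the answer above: if each row is guaranteed to contain exactly one 1, idxmax gives the same result without the matrix multiplication. A small sketch under that assumption:
import pandas as pd

df = pd.DataFrame({
    'a':  ['P1', 'P2', 'P3', 'P4'],
    'c1': [1, 0, 1, 0],
    'c2': [0, 0, 0, 1],
    'c3': [0, 0, 0, 0],
    'c4': [0, 1, 0, 0],
})

# idxmax(axis=1) returns the label of the first column holding the row maximum,
# i.e. the single column containing the 1 when each row is one-hot encoded
df['main'] = df.iloc[:, 1:].idxmax(axis=1)
print(df)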
I have a table something looking like this:
id CompanyName ProductID productName
-- ----------- --------- -----------
1 c1 1 p1
2 c1 2 p2
3 c2 2 p2
4 c2 3 p3
5 c3 3 p3
6 c4 3 p3
7 c5 4 p4
8 c6 4 p4
9 c6 5 p5
Is it possible to run a DolphinDB query to get output like this:
companyName p1 p2 p3 p4 p5
------------------------------
c1 1 1 0 0 0
c2 0 1 1 0 0
c3 0 0 1 0 0
c4 0 0 1 0 0
c5 0 0 0 1 0
c6 0 0 0 1 1
The value in the above table is the number of each product in each company. I get it by the query:
select count(*) from t group by companyName,productName
t1=select count(ProductID) from t pivot by CompanyName, productName
nullFill!(t1,0)
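As an aside for readers who work in pandas rather than DolphinDB (not part of the answer above), the same company-by-product count table can be sketched with pd.crosstab:
import pandas as pd

t = pd.DataFrame({
    'CompanyName': ['c1', 'c1', 'c2', 'c2', 'c3', 'c4', 'c5', 'c6', 'c6'],
    'productName': ['p1', 'p2', 'p2', 'p3', 'p3', 'p3', 'p4', 'p4', 'p5'],
})

# crosstab counts each (company, product) pair and fills absent pairs with 0
wide = pd.crosstab(t['CompanyName'], t['productName'])
print(wide)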
Let's say that I have a dataset that contains 5 binary columns and 2 rows.
It looks like this:
c1 c2 c3 c4 c5
r1 0 1 0 1 0
r2 1 1 1 1 0
I want to create a matrix that gives the number of times each column occurs together with every other column, somewhat like a confusion matrix.
My desired output is:
c1 c2 c3 c4 c5
c1 - 1 1 1 0
c2 1 - 1 2 0
c3 1 1 - 1 0
c4 1 2 1 - 0
I have used pandas crosstab, but it only gives the desired output when using 2 columns; I want to use all of the columns.
dot
df.T.dot(df)
# same as
# df.T # df
c1 c2 c3 c4 c5
c1 1 1 1 1 0
c2 1 2 1 2 0
c3 1 1 1 1 0
c4 1 2 1 2 0
c5 0 0 0 0 0
You can use np.fill_diagonal to make the diagonal zero
d = df.T.dot(df)
np.fill_diagonal(d.to_numpy(), 0)
d
c1 c2 c3 c4 c5
c1 0 1 1 1 0
c2 1 0 1 2 0
c3 1 1 0 1 0
c4 1 2 1 0 0
c5 0 0 0 0 0
And as long as we're using NumPy, you could go all the way...
a = df.to_numpy()
b = a.T # a
np.fill_diagonal(b, 0)
pd.DataFrame(b, df.columns, df.columns)
c1 c2 c3 c4 c5
c1 0 1 1 1 0
c2 1 0 1 2 0
c3 1 1 0 1 0
c4 1 2 1 0 0
c5 0 0 0 0 0
A way of using melt and merge with groupby:
s=df.reset_index().melt('index').loc[lambda x : x.value==1]
s.merge(s,on='index').query('variable_x!=variable_y').groupby(['variable_x','variable_y'])['value_x'].sum().unstack(fill_value=0)
Out[32]:
variable_y c1 c2 c3 c4
variable_x
c1 0 1 1 1
c2 1 0 1 2
c3 1 1 0 1
c4 1 2 1 0
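One caveat with this melt-based result, added here as an observation rather than part of the original answer: any column that never contains a 1 (here c5) drops out of the output. If the full square matrix is needed, it can be restored with reindex, as in this sketch:
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 1, 1, 1, 0]],
    index=['r1', 'r2'],
    columns=['c1', 'c2', 'c3', 'c4', 'c5'],
)

s = df.reset_index().melt('index').loc[lambda x: x.value == 1]
out = (s.merge(s, on='index')
        .query('variable_x != variable_y')
        .groupby(['variable_x', 'variable_y'])['value_x']
        .sum()
        .unstack(fill_value=0))

# restore rows/columns that never contain a 1 (here c5) so the matrix stays square
out = out.reindex(index=df.columns, columns=df.columns, fill_value=0)
print(out)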