Count specific value for IDs in two dataframes - python-3.x

I have two dataframes
df1
+----+-------+
| | Key |
|----+-------|
| 0 | 30 |
| 1 | 31 |
| 2 | 32 |
| 3 | 33 |
| 4 | 34 |
| 5 | 35 |
+----+-------+
df2
+----+-------+--------+
| | Key | Test |
|----+-------+--------|
| 0 | 30 | Test4 |
| 1 | 30 | Test5 |
| 2 | 30 | Test6 |
| 3 | 31 | Test4 |
| 4 | 31 | Test5 |
| 5 | 31 | Test6 |
| 6 | 32 | Test3 |
| 7 | 33 | Test3 |
| 8 | 33 | Test3 |
| 9 | 34 | Test1 |
| 10 | 34 | Test1 |
| 11 | 34 | Test2 |
| 12 | 34 | Test3 |
| 13 | 34 | Test3 |
| 14 | 34 | Test3 |
| 15 | 35 | Test3 |
| 16 | 35 | Test3 |
| 17 | 35 | Test3 |
| 18 | 35 | Test3 |
| 19 | 35 | Test3 |
+----+-------+--------+
I want to count how many times each Test is listed for each Key.
+----+-------+-------+-------+-------+-------+-------+-------+
| | Key | Test1 | Test2 | Test3 | Test4 | Test5 | Test6 |
|----+-------+-------+-------+-------+-------+-------+-------|
| 0 | 30 | | | | 1 | 1 | 1 |
| 1 | 31 | | | | 1 | 1 | 1 |
| 2 | 32 | | | 1 | | | |
| 3 | 33 | | | 2 | | | |
| 4 | 34 | 2 | 1 | 3 | | | |
| 5 | 35 | | | 5 | | | |
+----+-------+-------+-------+-------+-------+-------+-------+
What I've tried
Using join and groupby, I first got the count for each Key, regardless of Test.
result_df = df1.join(df2.groupby('Key').size().rename('Count'), on='Key')
+----+-------+---------+
| | Key | Count |
|----+-------+---------|
| 0 | 30 | 3 |
| 1 | 31 | 3 |
| 2 | 32 | 1 |
| 3 | 33 | 2 |
| 4 | 34 | 6 |
| 5 | 35 | 5 |
+----+-------+---------+
I then tried grouping on both the Key and the Test:
result_df = df1.join(df2.groupby(['Key', 'Test']).size().rename('Count'), on='Key')
but this raises an error:
ValueError: len(left_on) must equal the number of levels in the index of "right"
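The error occurs because groupby(['Key', 'Test']).size() carries a two-level index (Key and Test), while join(on='Key') supplies only a single key. A minimal sketch of one way around it, using the df1/df2 above, is to unstack Test into columns first so that Key is the only index level left:

# size() gives a Series with a (Key, Test) MultiIndex; unstack moves Test
# into columns, leaving 'Key' as the sole index level for the join
counts = df2.groupby(['Key', 'Test']).size().unstack(fill_value=0)
result_df = df1.join(counts, on='Key')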

Check with crosstab:
pd.crosstab(df2.Key, df2.Test).reindex(df1.Key).replace({0: ''})
Reindexing on df1.Key keeps df1's row order, and replace({0: ''}) blanks the zeros to match the desired output.

Here is another solution, using groupby and pivot. With this approach you don't need df1 at all.
import numpy as np
import pandas as pd

# | create some dummy data (randint's upper bound is exclusive, so 36 gives keys 30-35)
tests = ['Test' + str(i) for i in range(1, 7)]
df = pd.DataFrame({'Test': np.random.choice(tests, size=100), 'Key': np.random.randint(30, 36, size=100)})
df['Count Variable'] = 1
# | group & count, then move the MultiIndex back into columns so pivot can see 'Key' and 'Test'
df = df.groupby(['Key', 'Test']).count().reset_index()
df = df.pivot(index="Key", columns="Test", values="Count Variable").reset_index()
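If you want the missing Key/Test combinations shown as blanks, as in the desired output above (purely presentational, not part of the original answer), you can finish with:

df = df.fillna('')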

Related

Unpivoting data in Excel that contains multiple (15) categories in columns using VBA

I have code that unpivots columns into rows. There are 19 categories of data, 15 of which have been unpivoted. However, my problem is that some of the tables that are unpivoted are not showing up in the new rows. I am asking for anyone's expertise, as this will be helpful for me in future endeavors. I have created a table; bear in mind this table is extremely wide and has, I believe, 131 columns and only 7 rows. Below is the table of the original data (it is make-believe data, of course, but will be used for real data in the future). The 2nd table is how I want it to look, and the 3rd table is how it actually looks. Under that is my code. I will gladly upvote anyone who helps. Thank you in advance.
Original data:
| usr | Company | Dept.# | Dept1 | Dept2 | Dept3 | Dept4 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr2 | Tr2 | F2 | A2 | HOH2 | M2 | R2 | SO2 | BIG2 | T2 | P2 | X2 | Y2 | Z2 | Tin2 | Hr2 | Tr2 | F2 | A2 | HOH2 | M2 | R2 | SO2 | BIG2 | T2 | P2 | X2 | Y2 | Z2 | Tin2 | Hr2 | Tr2 | F2 | A2 | HOH2 | M2 | R2 | SO2 | BIG2 | T2 | P2 | X2 | Y2 | Z2 | Tin2 | Hr3 | Tr3 | F3 | A3 | HOH3 | M3 | R3 | SO3 | BIG3 | T3 | P3 | X2 | Y2 | Z2 | Tin2 | Hr3 | Tr3 | F3 | A3 | HOH3 | M3 | R3 | SO3 | BIG3 | T3 | P3 | X3 | Y3 | Z3 | Tin3 | Hr4 | Tr4 | F4 | A4 | HOH4 | M4 | R4 | SO4 | BIG4 | T4 | P4 | X4 | Y4 | Z4 | Tin4 |
|------|---------|--------|-------|-------|-------|-------|-----|-----|-----|-----|------|----|----|-----|------|----|-----|-----|-----|----|------|-----|-----|----|-----|------|----|-----|-----|------|----|-----|-----|-----|----|------|-----|-----|----|-----|------|----|----|-----|------|-----|----|----|----|-----|------|-----|-----|----|-----|------|----|----|-----|------|----|----|-----|-----|----|------|-----|-----|-----|-----|------|----|----|-----|------|----|----|-----|----|-----|------|-----|-----|----|-----|------|----|----|-----|------|----|----|----|----|-----|------|-----|-----|-----|-----|------|----|----|-----|------|----|----|----|-----|----|------|-----|-----|-----|-----|------|----|-----|-----|------|----|----|-----|-----|----|------|-----|-----|-----|-----|------|----|----|-----|------|----|----|-----|-----|-----|------|-----|-----|-----|-----|------|----|----|-----|------|----|-----|-----|-----|-----|------|
| xxxx | OS | 1 | Train | | | | 20 | 89 | 355 | 123 | 435 | 90 | 5 | 55 | 676 | 34 | 43 | 984 | 345 | 74 | 846 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | OPC | 2 | Poxy1 | Poxy2 | | | | | | | | | | | | | | | | | | 45 | 546 | 68 | 345 | 903 | 70 | 345 | 23 | 54 | 32 | 234 | 23 | 567 | 69 | 64 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 38 | 67 | 235 | 789 | 7 | 40 | 99 | 98 | 87 | 89 | 34 | 312 | 42 | 756 | 23 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | Oxy R | 4 | H1 | H2 | H3 | H4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 22 | 36 | 13 | 678 | 64 | 40 | 34 | 239 | 76 | 87 | 34 | 999 | 965 | 34 | 93 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 89 | 54 | 761 | 765 | 9 | 20 | 22 | 65 | 78 | 98 | 78 | 75 | 354 | 23 | 23 | | | | | | | | | | | | | | | | 36 | 80 | 123 | 543 | 17 | 20 | 11 | 908 | 988 | 7 | 86 | 245 | 546 | 763 | 324 | 25 | 90 | 111 | 432 | 84 | 25 | 63 | 784 | 98 | 78 | 854 | 754 | 234 | 865 | 43 |
| xxxx | HPK | 3 | Test1 | Test2 | Test3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 99 | 456 | 39 | 567 | 223 | 50 | 5 | 32 | 549 | 435 | 34 | 87 | 64 | 348 | 942 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 52 | 21 | 47 | 876 | 1 | 30 | 46 | 92 | 78 | 12 | 34 | 12 | 12 | 421 | 23 | | | | | | | | | | | | | | | | 90 | 76 | 773 | 654 | 49 | 10 | 223 | 982 | 566 | 23 | 54 | 786 | 356 | 73 | 654 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | Mano | 1 | Porp | | | | 42 | 657 | 645 | 234 | 344 | 80 | 45 | 364 | 97 | 23 | 634 | 34 | 23 | 87 | 84 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | Macro | 2 | Otto1 | Otto2 | | | | | | | | | | | | | | | | | | 75 | 574 | 46 | 456 | 453 | 60 | 44 | 235 | 867 | 5 | 433 | 234 | 346 | 46 | 35 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 23 | 433 | 186 | 987 | 2 | 30 | 34 | 58 | 87 | 43 | 34 | 23 | 62 | 73 | 32 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
How I want it to look:
| usr | Company | Dept# | Dept | Hrs | Tr | F | A | HOH | M | R | SO | BIG | T | P | X | Y | Z | Tin |
|------|---------|-------|-------|-----|-----|-----|-----|-----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| xxxx | OS | 1 | Train | 20 | 89 | 355 | 123 | 435 | 90 | 5 | 55 | 676 | 34 | 43 | 984 | 345 | 74 | 846 |
| xxxx | OPC | 2 | Poxy1 | 45 | 546 | 68 | 345 | 903 | 70 | 345 | 23 | 54 | 32 | 234 | 23 | 567 | 69 | 64 |
| xxxx | OPC | 2 | Poxy2 | 38 | 67 | 235 | 789 | 7 | 40 | 99 | 98 | 87 | 89 | 34 | 312 | 42 | 756 | 23 |
| xxxx | Oxy R | 4 | H1 | 22 | 36 | 13 | 678 | 64 | 40 | 34 | 239 | 76 | 87 | 34 | 999 | 965 | 34 | 93 |
| xxxx | Oxy R | 4 | H2 | 89 | 54 | 761 | 765 | 9 | 20 | 22 | 65 | 78 | 98 | 78 | 75 | 354 | 23 | 23 |
| xxxx | Oxy R | 4 | H3 | 36 | 80 | 123 | 543 | 17 | 20 | 11 | 908 | 988 | 7 | 86 | 245 | 546 | 763 | 324 |
| xxxx | Oxy R | 4 | H4 | 25 | 90 | 111 | 432 | 84 | 25 | 63 | 784 | 98 | 78 | 854 | 754 | 234 | 865 | 43 |
| xxxx | HPK | 3 | Test1 | 99 | 456 | 39 | 567 | 223 | 50 | 5 | 32 | 549 | 435 | 34 | 87 | 64 | 348 | 942 |
| xxxx | HPK | 3 | Test2 | 52 | 21 | 47 | 876 | 1 | 30 | 46 | 92 | 78 | 12 | 34 | 12 | 12 | 421 | 23 |
| xxxx | HPK | 3 | Test3 | 90 | 76 | 773 | 654 | 49 | 10 | 223 | 982 | 566 | 23 | 54 | 786 | 356 | 73 | 654 |
| xxxx | Mano | 1 | Porp | 42 | 657 | 645 | 234 | 344 | 80 | 45 | 364 | 97 | 23 | 634 | 34 | 23 | 87 | 84 |
| xxxx | Macro | 2 | Otto1 | 73 | 574 | 46 | 456 | 453 | 60 | 44 | 235 | 867 | 5 | 433 | 234 | 346 | 46 | 35 |
| xxxx | Macro | 2 | Otto2 | 23 | 433 | 186 | 987 | 2 | 30 | 34 | 58 | 87 | 43 | 34 | 23 | 62 | 73 | 32 |
This is how it actually looks, which is wrong. As you can see, data is missing for some reason:
| usr | Company | Dept# | Dept | Hrs | Tr | F | A | HOH | M | R | SO | BIG | T | P | X | Y | Z | Tin |
|------|---------|-------|-------|-----|-----|-----|-----|-----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| xxxx | OS | 1 | Train | 20 | 89 | 355 | 123 | 435 | 90 | 5 | 55 | 676 | 34 | 43 | 984 | 345 | 74 | 846 |
| xxxx | OPC | 2 | Poxy1 | 45 | 546 | 68 | 345 | 903 | 70 | 345 | 23 | 54 | 32 | 234 | 23 | 567 | 69 | 64 |
| xxxx | OPC | 2 | Poxy2 | 38 | 67 | 235 | 789 | 7 | 40 | 99 | 98 | 87 | 89 | 34 | 312 | 42 | 756 | 23 |
| xxxx | Oxy R | 4 | H1 | 22 | 36 | 13 | 678 | 64 | 40 | 34 | 239 | 76 | 87 | 34 | 999 | 965 | 34 | 93 |
| xxxx | Oxy R | 4 | H2 | 89 | 54 | 761 | 765 | 9 | 20 | 22 | 65 | 78 | 98 | 78 | 75 | 354 | 23 | 23 |
| xxxx | Oxy R | 4 | H3 | | | | | | | | | | | | | | | |
| xxxx | Oxy R | 4 | H4 | | | | | | | | | | | | | | | |
| xxxx | HPK | 3 | Test1 | 99 | 456 | 39 | 567 | 223 | 50 | 5 | 32 | 549 | 435 | 34 | 87 | 64 | 348 | 942 |
| xxxx | HPK | 3 | Test2 | 52 | 21 | 47 | 876 | 1 | 30 | 46 | 92 | 78 | 12 | 34 | 12 | 12 | 421 | 23 |
| xxxx | HPK | 3 | Test3 | | | | | | | | | | | | | | | |
| xxxx | Mano | 1 | Porp | 42 | 657 | 645 | 234 | 344 | 80 | 45 | 364 | 97 | 23 | 634 | 34 | 23 | 87 | 84 |
| xxxx | Macro | 2 | Otto1 | 73 | 574 | 46 | 456 | 453 | 60 | 44 | 235 | 867 | 5 | 433 | 234 | 346 | 46 | 35 |
| xxxx | Macro | 2 | Otto2 | 23 | 433 | 186 | 987 | 2 | 30 | 34 | 58 | 87 | 43 | 34 | 23 | 62 | 73 | 32 |
Here is my code:
Sub buttonclick()
    Dim Ary As Variant, Nary As Variant, Cary As Variant
    Dim r As Long, c As Long, nr As Long, cc As Long

    ' Each Cary entry packs the first and last block-start columns for one Dept slot
    Cary = Array("0853", 6898, 113128, 143143)
    With Sheets("Sheet1")
        Ary = .Range("A2:DM" & .Range("A" & Rows.Count).End(xlUp).Row).Value2
    End With
    ReDim Nary(1 To UBound(Ary) * 4, 1 To 19)
    For r = 1 To UBound(Ary)
        For c = 4 To 7                          ' the four Dept name columns
            If Ary(r, c) = "" Then Exit For
            nr = nr + 1
            Nary(nr, 1) = Ary(r, 1): Nary(nr, 2) = Ary(r, 2): Nary(nr, 3) = Ary(r, 3)
            Nary(nr, 4) = Ary(r, c)
            ' Walk this Dept's 15-column metric blocks and concatenate the values
            For cc = Left(Cary(c - 4), 2) To Right(Cary(c - 4), 2) Step 15
                Nary(nr, 5) = Nary(nr, 5) & Ary(r, cc)
                Nary(nr, 6) = Nary(nr, 6) & Ary(r, cc + 1)
                Nary(nr, 7) = Nary(nr, 7) & Ary(r, cc + 2)
                Nary(nr, 8) = Nary(nr, 8) & Ary(r, cc + 3)
                Nary(nr, 9) = Nary(nr, 9) & Ary(r, cc + 4)
                Nary(nr, 10) = Nary(nr, 10) & Ary(r, cc + 5)
                Nary(nr, 11) = Nary(nr, 11) & Ary(r, cc + 6)
                Nary(nr, 12) = Nary(nr, 12) & Ary(r, cc + 7)
                Nary(nr, 13) = Nary(nr, 13) & Ary(r, cc + 8)
                Nary(nr, 14) = Nary(nr, 14) & Ary(r, cc + 9)
                Nary(nr, 15) = Nary(nr, 15) & Ary(r, cc + 10)
                Nary(nr, 16) = Nary(nr, 16) & Ary(r, cc + 11)
                Nary(nr, 17) = Nary(nr, 17) & Ary(r, cc + 12)
                Nary(nr, 18) = Nary(nr, 18) & Ary(r, cc + 13)
                Nary(nr, 19) = Nary(nr, 19) & Ary(r, cc + 14)
            Next cc
        Next c
    Next r
    With Sheets("Sheet2")
        .UsedRange.ClearContents
        .Range("A1").Resize(, 19).Value = Array("usr", "Company", "Dept.#", "Dept", "Hrs", _
                                                "Tr", "F", "A", "HOH", "M", "R", "SO", "BIG", _
                                                "T", "P", "X", "Y", "Z", "Tin")
        .Range("A2").Resize(nr, 19).Value = Nary
    End With
End Sub
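One detail worth flagging (an observation added here, not from the original thread): Left(Cary(i), 2) and Right(Cary(i), 2) only recover the block boundaries when both are two digits. For the three-digit entries the range is truncated, so the Dept3/Dept4 blocks are read from the wrong columns, which lines up with exactly the rows (H3, H4, Test3) that come out empty. A quick Python check of that parsing:

# Mimic VBA's Left(x, 2) / Right(x, 2) on the Cary codes
for code in ['0853', '6898', '113128', '143143']:
    print(code, '->', code[:2], 'to', code[-2:])
# 0853   -> 08 to 53
# 6898   -> 68 to 98
# 113128 -> 11 to 28   (intended 113 to 128)
# 143143 -> 14 to 43   (intended 143 to 143)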

How to create a calculated column in Access 2013 to detect duplicates

I'm recreating a tool I made in Excel, as it's getting bigger and performance is getting out of hand.
The issue is that I only have MS Access 2013 on my work laptop, and I'm fairly new to the Expression Builder in Access 2013, which, to be honest, has a very limited function base.
My data has duplicates in the [Location] column, meaning that I have multiple SKUs in that warehouse location. However, some of my calculations need to be done only once per [Location]. My solution in Excel was a formula (see below) that puts 1 on the first appearance of a location and 0 on subsequent appearances. That works like a charm, because summing over that [Duplicate] column while imposing multiple criteria returns the number of occurrences of those criteria, counting each location only once.
Now, the MS Access 2013 Expression Builder has no SUM or COUNT functions for building a calculated column that emulates my [Duplicate] column from Excel. Preferably, I would just input the raw data and let Access populate the calculated fields, rather than inputting the calculated fields as well, since that would defeat my original purpose of reducing the computational cost of building my dashboard.
The question is: how would you create a calculated column in the MS Access 2013 Expression Builder to recreate the Excel function below?
= IF($D$2:$D3=$D4,0,1)
For the sake of reducing the file size (over 100K rows), I even replaced the 0 with a blank string "".
Thanks in advance for your help
Y
First and foremost, understand that MS Access' Expression Builder is a convenience tool for building an SQL expression. Everything in Query Design is ultimately there to build an SQL query. For this reason, you have to use a set-based mentality and see data as whole sets of related tables, not cell by cell.
Specifically, to achieve:
putting 1 only on the first appearance of that location, putting 0 on next appearances
Consider a whole set-based approach: join on a separate aggregate query that identifies the first record of your needed grouping, then compute the needed IIF expression. The below assumes your table has an autonumber or primary key field (a standard in relational databases):
Aggregate Query (save as a separate query, adjust columns as needed)
SELECT ColumnD, MIN(AutoNumberID) As MinID
FROM myTable
GROUP BY ColumnD
Final Query (join to original table and build final IIF expression)
SELECT m.*, IIF(agg.MinID = AutoNumberID, 1, 0) As Dup_Indicator
FROM myTable m
INNER JOIN myAggregateQuery agg
ON m.[ColumnD] = agg.ColumnD
To demonstrate with random data:
Original
| ID | GROUP | INT | NUM | CHAR | BOOL | DATE |
|----|--------|-----|--------------|------|-------|------------|
| 1 | r | 9 | 1.424490258 | B6z | TRUE | 7/4/1994 |
| 2 | stata | 10 | 2.591235683 | h7J | FALSE | 10/5/1971 |
| 3 | spss | 6 | 0.560461966 | Hrn | TRUE | 11/27/1990 |
| 4 | stata | 10 | -1.499272175 | eXL | FALSE | 4/17/2010 |
| 5 | stata | 15 | 1.470269177 | Vas | TRUE | 6/13/2010 |
| 6 | r | 14 | -0.072238898 | puP | TRUE | 4/1/1994 |
| 7 | julia | 2 | -1.370405263 | S2l | FALSE | 12/11/1999 |
| 8 | spss | 6 | -0.153684675 | mAw | FALSE | 7/28/1977 |
| 9 | spss | 10 | -0.861482674 | cxC | FALSE | 7/17/1994 |
| 10 | spss | 2 | -0.817222582 | GRn | FALSE | 10/19/2012 |
| 11 | stata | 2 | 0.949287754 | xgc | TRUE | 1/18/2003 |
| 12 | stata | 5 | -1.580841322 | Y1D | TRUE | 6/3/2011 |
| 13 | r | 14 | -1.671303816 | JCP | FALSE | 5/15/1981 |
| 14 | r | 7 | 0.904181025 | Rct | TRUE | 7/24/1977 |
| 15 | stata | 10 | -1.198211174 | qJY | FALSE | 5/6/1982 |
| 16 | julia | 10 | -0.265808162 | 10s | FALSE | 3/18/1975 |
| 17 | r | 13 | -0.264955027 | 8Md | TRUE | 6/11/1974 |
| 18 | r | 4 | 0.518302149 | 4KW | FALSE | 9/12/1980 |
| 19 | r | 5 | -0.053620183 | 8An | FALSE | 4/17/2004 |
| 20 | r | 14 | -0.359197116 | F8Q | TRUE | 6/14/2005 |
| 21 | spss | 11 | -2.211875193 | AgS | TRUE | 4/11/1973 |
| 22 | stata | 4 | -1.718749471 | Zqr | FALSE | 2/20/1999 |
| 23 | python | 10 | 1.207878576 | tcC | FALSE | 4/18/2008 |
| 24 | stata | 11 | 0.548902226 | PFJ | TRUE | 9/20/1994 |
| 25 | stata | 6 | 1.479125922 | 7a7 | FALSE | 3/2/1989 |
| 26 | python | 10 | -0.437245299 | r32 | TRUE | 6/7/1997 |
| 27 | sas | 14 | 0.404746106 | 6NJ | TRUE | 9/23/2013 |
| 28 | stata | 8 | 2.206741458 | Ive | TRUE | 5/26/2008 |
| 29 | spss | 12 | -0.470694096 | dPS | TRUE | 5/4/1983 |
| 30 | sas | 15 | -0.57169507 | yle | TRUE | 6/20/1979 |
SQL (uses aggregate in subquery but can be a stored query)
SELECT r.*, IIF(sub.MinID = r.ID, 1, 0) AS Dup
FROM Random_Data r
LEFT JOIN
(
    SELECT r.[GROUP], MIN(r.ID) AS MinID
    FROM Random_Data r
    GROUP BY r.[GROUP]
) sub
ON r.[GROUP] = sub.[GROUP]
Output (notice the first GROUP value is tagged 1, all else 0)
| ID | GROUP | INT | NUM | CHAR | BOOL | DATE | Dup |
|----|--------|-----|--------------|------|-------|------------|-----|
| 1 | r | 9 | 1.424490258 | B6z | TRUE | 7/4/1994 | 1 |
| 2 | stata | 10 | 2.591235683 | h7J | FALSE | 10/5/1971 | 1 |
| 3 | spss | 6 | 0.560461966 | Hrn | TRUE | 11/27/1990 | 1 |
| 4 | stata | 10 | -1.499272175 | eXL | FALSE | 4/17/2010 | 0 |
| 5 | stata | 15 | 1.470269177 | Vas | TRUE | 6/13/2010 | 0 |
| 6 | r | 14 | -0.072238898 | puP | TRUE | 4/1/1994 | 0 |
| 7 | julia | 2 | -1.370405263 | S2l | FALSE | 12/11/1999 | 1 |
| 8 | spss | 6 | -0.153684675 | mAw | FALSE | 7/28/1977 | 0 |
| 9 | spss | 10 | -0.861482674 | cxC | FALSE | 7/17/1994 | 0 |
| 10 | spss | 2 | -0.817222582 | GRn | FALSE | 10/19/2012 | 0 |
| 11 | stata | 2 | 0.949287754 | xgc | TRUE | 1/18/2003 | 0 |
| 12 | stata | 5 | -1.580841322 | Y1D | TRUE | 6/3/2011 | 0 |
| 13 | r | 14 | -1.671303816 | JCP | FALSE | 5/15/1981 | 0 |
| 14 | r | 7 | 0.904181025 | Rct | TRUE | 7/24/1977 | 0 |
| 15 | stata | 10 | -1.198211174 | qJY | FALSE | 5/6/1982 | 0 |
| 16 | julia | 10 | -0.265808162 | 10s | FALSE | 3/18/1975 | 0 |
| 17 | r | 13 | -0.264955027 | 8Md | TRUE | 6/11/1974 | 0 |
| 18 | r | 4 | 0.518302149 | 4KW | FALSE | 9/12/1980 | 0 |
| 19 | r | 5 | -0.053620183 | 8An | FALSE | 4/17/2004 | 0 |
| 20 | r | 14 | -0.359197116 | F8Q | TRUE | 6/14/2005 | 0 |
| 21 | spss | 11 | -2.211875193 | AgS | TRUE | 4/11/1973 | 0 |
| 22 | stata | 4 | -1.718749471 | Zqr | FALSE | 2/20/1999 | 0 |
| 23 | python | 10 | 1.207878576 | tcC | FALSE | 4/18/2008 | 1 |
| 24 | stata | 11 | 0.548902226 | PFJ | TRUE | 9/20/1994 | 0 |
| 25 | stata | 6 | 1.479125922 | 7a7 | FALSE | 3/2/1989 | 0 |
| 26 | python | 10 | -0.437245299 | r32 | TRUE | 6/7/1997 | 0 |
| 27 | sas | 14 | 0.404746106 | 6NJ | TRUE | 9/23/2013 | 1 |
| 28 | stata | 8 | 2.206741458 | Ive | TRUE | 5/26/2008 | 0 |
| 29 | spss | 12 | -0.470694096 | dPS | TRUE | 5/4/1983 | 0 |
| 30 | sas | 15 | -0.57169507 | yle | TRUE | 6/20/1979 | 0 |
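As an aside (not part of the original answer), the same first-occurrence flag is a one-liner in pandas, shown here on a stub of the demo data:

import pandas as pd

df = pd.DataFrame({'GROUP': ['r', 'stata', 'spss', 'stata', 'r']})
# duplicated() is False on the first occurrence of each value, True afterwards
df['Dup'] = (~df['GROUP'].duplicated()).astype(int)
print(df)  # Dup is 1 for the first r/stata/spss rows, 0 for the repeats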

SIGN() formula returns unexpected results

In continuation of my previous question: Sumproduct with multiple criteria on one range
Jeeped provided me with a very helpful formula to achieve a SUMPRODUCT() that takes multiple criteria. My current case, however, is a bit broader.
Take these example tables:
The first column is the ID number, the second column a respondent group (A, B). Column headers are question types (X, Y, Z).
Table Q1
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 1 | A | 2 | 2 | 1 | | 1 |
| 2 | A | 1 | 1 | | | 2 |
| 3 | A | 1 | 1 | | | 1 |
| 4 | A | 2 | 1 | | | 1 |
| 5 | A | 1 | 2 | 1 | | 1 |
| 6 | A | 1 | 1 | | | 1 |
| 7 | A | | | | | |
| 8 | A | | | | | |
| 9 | A | 1 | 1 | | | 1 |
| 10 | A | 2 | 2 | 2 | | 2 |
| 11 | A | | | | | |
| 12 | A | 1 | 2 | 1 | | 2 |
| 13 | B | | | | | |
| 14 | B | 1 | 1 | | | 1 |
| 15 | B | 2 | 2 | 1 | | 1 |
Table Q2
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 1 | A | 1 | 2 | 1 | | 1 |
| 2 | A | 1 | 1 | | | 1 |
| 3 | A | 1 | 1 | | | 1 |
| 4 | A | 1 | 1 | | | 1 |
| 5 | A | 1 | 1 | | | 1 |
| 6 | A | 1 | 1 | | | 1 |
| 7 | A | | | | | |
| 8 | A | | | | | |
| 9 | A | 1 | 1 | | | 1 |
| 10 | A | 1 | 1 | | | 1 |
| 11 | A | | | | | |
| 12 | A | 1 | 2 | 1 | | 1 |
| 13 | B | | | | | |
| 14 | B | 1 | 1 | | | 1 |
| 15 | B | 1 | 2 | 1 | | 1 |
Now I want to know the number of times a respondent answered 1 (yes) on Q2 for each question type (X, Y, Z). The catch is that if someone answered 1 (yes) on Q1, it should "override" the answer on Q2: we assume that when someone answers yes on Q1 (implementation of a measure), their answer on Q2 (knowledge of said measure) has to be yes as well.
The second catch is that for the first two occurrences of Y, there can only be a yes in one of the two columns, so in fact there can be at most two yes answers for question type Y per respondent.
I used the following formula (on sheet 3): =SUMPRODUCT(SIGN(('Q1'!$C$2:$G$16=1)+('Q2'!$C$2:$G$16=1))*('Q2'!$B$2:$B$16=Blad3!$D5)*('Q2'!$C$1:$G$1=Blad3!E$4)) to obtain the following results.
| | X | Y | Z |
|---|---|----|---|
| A | 9 | 19 | 0 |
| B | 2 | 4 | 0 |
For X these results are correct, as there are 9 1's in table Q2.
For Y the results for B are correct; for A, however, they are not: there are only 9 respondents, and answering at most 2 questions each would give a maximum of 18, yet we have 19.
It turns out there is nothing wrong with the formula; it just isn't suited to the way this data is organised. Look at row 5:
Q1
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | 1 | 2 | 1 | | 1 |
Q2
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | 1 | 1 | | | 1 |
If we condense that to a 1 everywhere there is a 1 in any of the Y columns, we get this table:
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | | 1 | 1 | | 1 |
When I ask for the SUMPRODUCT() of this combined table, the result is 3.
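To see the same double-count outside Excel, here is a minimal pandas sketch of row 5 (the Y1/Y2/Y3 names are made up to tell the duplicated Y headers apart):

import numpy as np
import pandas as pd

# Row 5 from Q1 and Q2 above; blanks become NaN, and NaN == 1 is False
q1 = pd.Series({'X': 1, 'Y1': 2, 'Y2': 1, 'Z': np.nan, 'Y3': 1})
q2 = pd.Series({'X': 1, 'Y1': 1, 'Y2': np.nan, 'Z': np.nan, 'Y3': 1})
combined = ((q1 == 1) | (q2 == 1)).astype(int)  # same effect as SIGN((Q1=1)+(Q2=1))
print(combined[['Y1', 'Y2', 'Y3']].sum())  # 3 -- all three Y columns fire for this row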
To prevent this, I added a helper column to my tables (between the two Y columns and the Z column) with the formula =IF(OR(D1=1,E1=1),1,""), removed the headers from the duplicated Y columns, and re-running the query produced the correct results.
The new table Q1 then looks like this:
| | | X | | | Y | Z | Y |
|----|---|---|---|---|---|---|---|
| 1 | A | 2 | 2 | 1 | 1 | | 1 |
| 2 | A | 1 | 1 | | 1 | | 2 |
| 3 | A | 1 | 1 | | 1 | | 1 |
| 4 | A | 2 | 1 | | 1 | | 1 |
| 5 | A | 1 | 2 | 1 | 1 | | 1 |
| 6 | A | 1 | 1 | | 1 | | 1 |
| 7 | A | | | | | | |
| 8 | A | | | | | | |
| 9 | A | 1 | 1 | | 1 | | 1 |
| 10 | A | 2 | 2 | 2 | | | 2 |
| 11 | A | | | | | | |
| 12 | A | 1 | 2 | 1 | 1 | | 2 |
| 13 | B | | | | | | |
| 14 | B | 1 | 1 | | 1 | | 1 |
| 15 | B | 2 | 2 | 1 | 1 | | 1 |

How to add space between rows and sum up automatically in Excel

Let's say that I have a table like the one below:
| | Value 1 | Value 2 | Value 3 | |
|---|---------|---------|---------|---|
| A | 22 | 12 | 3 | |
| A | 5 | 6 | 12 | |
| A | 19 | 9 | 13 | |
| A | 22 | 43 | 31 | |
| B | 7 | 12 | 23 | |
| B | 5 | 5 | 8 | |
| B | 35 | 78 | 9 | |
| B | 45 | 1 | 8 | |
| C | 34 | 56 | 0 | |
| C | 22 | 1 | 14 | |
| C | 13 | 46 | 45 | |
and that I'd need to transform it into the below:
| | Value 1 | Value 2 | Value 3 | |
|---|---------|---------|---------|---|
| A | 22 | 12 | 3 | |
| A | 5 | 6 | 12 | |
| A | 19 | 9 | 13 | |
| A | 22 | 43 | 31 | |
| | 68 | 70 | 59 | |
| | | | | |
| B | 7 | 12 | 23 | |
| B | 5 | 5 | 8 | |
| B | 35 | 78 | 9 | |
| B | 45 | 1 | 8 | |
| | 92 | 96 | 48 | |
| | | | | |
| C | 34 | 56 | 0 | |
| C | 22 | 1 | 14 | |
| C | 13 | 46 | 45 | |
| | 69 | 103 | 59 | |
How could I obtain the desired effect automatically?
There would be n empty rows after each group and the sums of each column within the group.
You can use the Subtotal feature of Excel, found on the "Data" tab of the ribbon, to automatically add the totals between groupings. I don't think it adds the blank row, though. If you absolutely need the blank row, then I can generate some VBA that will work.
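As an aside (not part of the original answer), the same layout can be produced in pandas; the data below simply mirrors the example:

import pandas as pd

df = pd.DataFrame({'Group': list('AAAABBBBCCC'),
                   'Value 1': [22, 5, 19, 22, 7, 5, 35, 45, 34, 22, 13],
                   'Value 2': [12, 6, 9, 43, 12, 5, 78, 1, 56, 1, 46],
                   'Value 3': [3, 12, 13, 31, 23, 8, 9, 8, 0, 14, 45]})

blocks = []
for _, grp in df.groupby('Group', sort=False):
    total = grp.drop(columns='Group').sum().to_frame().T   # column sums for the group
    total.insert(0, 'Group', '')
    blank = pd.DataFrame([{col: '' for col in df.columns}])  # the empty spacer row
    blocks.append(pd.concat([grp, total, blank], ignore_index=True))
out = pd.concat(blocks, ignore_index=True).iloc[:-1]  # drop the trailing spacer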

Excel: I need a formula in the column named "FEBRUARY"

I have a set of data as below.
SHEET 1
+------+-------+
| JANUARY |
+------+-------+
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | ALFRED | 11 | 150 |
| 2 | ARIS | 22 | 120 |
| 3 | JOHN | 33 | 170 |
| 4 | CHRIS | 22 | 190 |
| 5 | JOE | 55 | 120 |
| 6 | ACE | 11 | 200 |
+----+----------+------+-------+
SHEET 2
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | CHRIS | 13 | 123 |
| 2 | ACE | 26 | 165 |
| 3 | JOE | 39 | 178 |
| 4 | ALFRED | 21 | 198 |
| 5 | JOHN | 58 | 112 |
| 6 | ARIS | 11 | 200 |
+----+----------+------+-------+
The RESULT should look like this in Sheet1:
+------+-------++------+-------+
| JANUARY | FEBRUARY |
+------+-------++------+-------+
+----+----------+------+-------++-------+-------+
| ID | NAME |COUNT | PRICE || COUNT | PRICE |
+----+----------+------+-------++-------+-------+
| 1 | ALFRED | 11 | 150 || 21 | 198 |
| 2 | ARIS | 22 | 120 || 11 | 200 |
| 3 | JOHN | 33 | 170 || 58 | 112 |
| 4 | CHRIS | 22 | 190 || 13 | 123 |
| 5 | JOE | 55 | 120 || 39 | 178 |
| 6 | ACE | 11 | 200 || 26 | 165 |
+----+----------+------+-------++-------+-------+
I need a formula for the "FEBRUARY" columns; the formula should find each name's match in Sheet2.
Assuming the first Count value should go in cell E3 of Sheet1, the following formula would be the usual way of doing it:
=INDEX(Sheet2!C:C,MATCH($B3,Sheet2!$B:$B,0))
Then the Price (in F3) would be given by
=INDEX(Sheet2!D:D,MATCH($B3,Sheet2!$B:$B,0))
I think this query will work fine for your requirement:
SELECT `Sheet1$`.ID,`Sheet1$`.NAME, `Sheet1$`.COUNT AS 'Jan-COUNT',`Sheet1$`.PRICE AS 'Jan-PRICE', `Sheet2$`.COUNT AS 'Feb-COUNT',`Sheet2$`.PRICE AS 'Feb-PRICE'
FROM `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet1$` `Sheet1$`, `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet2$` `Sheet2$`
WHERE (`Sheet1$`.NAME=`Sheet2$`.NAME)
Provide the actual path instead of
C:\Users\Nagendra\Desktop\aaaaa.xlsx
First you need to know how to make the connection; refer to http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html
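For completeness, a pandas sketch of the same lookup (the file and sheet names are taken from the query above and would need adjusting):

import pandas as pd

jan = pd.read_excel('aaaaa.xlsx', sheet_name='Sheet1')
feb = pd.read_excel('aaaaa.xlsx', sheet_name='Sheet2')
# left-merge February's COUNT/PRICE onto January by NAME
result = jan.merge(feb[['NAME', 'COUNT', 'PRICE']], on='NAME',
                   how='left', suffixes=('_JAN', '_FEB'))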
