Transpose data based on two columns and criteria: Excel

I have data that looks like this:
Group: Class: Value:
A 1 51
A 2 60
B 1 55
B 2 67
B 3 70
C 1 53
C 3 65
Need the data to look like this:
Group: 1: 2: 3:
A 51 60 0
B 55 67 70
C 53 0 65
The formula I am trying does two things wrong: 1. It skips rows. 2. It does not match the value to the class column, which causes a problem for Group C: in the final row (row 3 in this example), it puts the value 65 into class 2 instead of class 3.
=IFERROR(IF(AND($B2=1,COLUMN()<3+MATCH($B2,$B3:$B11000,0)),OFFSET(B2,COLUMN()-3,2-COLUMN()),""),"")

This has worked for me. Hope it helps!
=SUMPRODUCT(($A$2:$A$8=$E2)*($B$2:$B$8=COLUMN(A:A))*$C$2:$C$8)
If the C column format is text, let's try:
=IFERROR(OFFSET($C$2,MATCH(1,INDEX(($A$2:$A$8=$E2)*($B$2:$B$8=COLUMN(A$1)),),0)-1,),"")
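For anyone who wants to check the logic outside Excel, here is a rough Python sketch of what the SUMPRODUCT formula computes: for each group and class it sums the matching values, and a missing combination comes out as 0. The names rows, pivot and classes are made up for this illustration and are not part of the original answer.

```python
from collections import defaultdict

# Sample rows from the question: (group, class, value)
rows = [
    ("A", 1, 51), ("A", 2, 60),
    ("B", 1, 55), ("B", 2, 67), ("B", 3, 70),
    ("C", 1, 53), ("C", 3, 65),
]


def pivot(rows, classes=(1, 2, 3)):
    """Sum values per (group, class); missing combinations become 0."""
    totals = defaultdict(int)
    for group, cls, value in rows:
        totals[(group, cls)] += value
    groups = sorted({group for group, _, _ in rows})
    return {g: [totals[(g, c)] for c in classes] for g in groups}


table = pivot(rows)
for group, values in table.items():
    print(group, *values)
```

Like the formula, this matches on the class value itself rather than on row position, so Group C's 65 lands under class 3.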

I think it's more efficient to use a PivotTable for such tasks.

Related

Excel MERGE two tables

I have SET 1
CLASS STUDENT TEST SCORE
A 1 1 46
A 1 2 50
A 1 3 45
A 2 1 45
A 2 2 47
A 2 3 31
A 3 1 34
A 3 2 45
B 1 1 36
B 2 1 31
B 2 2 41
B 3 1 50
C 1 1 42
C 3 1 31
and SET 2
CLASS SIZE YEARS
A 39 7
B 20 12
C 31 6
and wish to COMBINE to make SET 3
CLASS STUDENT TEST SCORE SIZE YEARS
A 1 1 46 39 7
A 1 2 50 39 7
A 1 3 45 39 7
A 2 1 45 39 7
A 2 2 47 39 7
A 2 3 31 39 7
A 3 1 34 39 7
A 3 2 45 39 7
B 1 1 36 20 12
B 2 1 31 20 12
B 2 2 41 20 12
B 3 1 50 20 12
C 1 1 42 31 6
C 3 1 31 31 6
So basically, I want to add the SIZE and YEARS columns from SET 2 to SET 1, matching on CLASS. How can I do this in Excel?
Define both sets as tables and "left join" them in Power Query. There you can choose the columns of the resulting table.
https://learn.microsoft.com/en-us/power-query/merge-queries-left-outer
If you have Set 1 at the top left of a worksheet "Set1" and Set 2 at the top left of a worksheet "Set2", then you can use the formula
=VLOOKUP(A2;'Set2'!$A$2:$C$4;2;FALSE)
where $A$2:$C$4 is the range of Set2, and A2 is the class value from Set1, which is used to do the lookup in Set2. The next argument, 2, means to take the second column from Set2 (SIZE), and the FALSE at the end means that you only want exact matches on the CLASS. You can auto-fill this formula down, and do similar steps for the years (using 3 as the column argument). If you look up the help for VLOOKUP within Excel, that should help you to understand how it works.
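If it helps to see the same idea outside Excel, below is a minimal Python sketch of the lookup: every row of Set 1 is kept, and SIZE and YEARS are pulled from Set 2 by exact match on CLASS, which is roughly what the VLOOKUP (or a left join) does. The names set1, set2 and left_join are hypothetical, chosen just for this example.

```python
# Set 1 rows: (class, student, test, score)
set1 = [
    ("A", 1, 1, 46), ("A", 1, 2, 50), ("A", 1, 3, 45),
    ("A", 2, 1, 45), ("A", 2, 2, 47), ("A", 2, 3, 31),
    ("A", 3, 1, 34), ("A", 3, 2, 45),
    ("B", 1, 1, 36), ("B", 2, 1, 31), ("B", 2, 2, 41), ("B", 3, 1, 50),
    ("C", 1, 1, 42), ("C", 3, 1, 31),
]

# Set 2 as a lookup keyed on class: class -> (size, years)
set2 = {"A": (39, 7), "B": (20, 12), "C": (31, 6)}


def left_join(left_rows, lookup):
    """Append the looked-up columns to every left row (exact match on class)."""
    merged = []
    for row in left_rows:
        # A class missing from the lookup yields empty cells (None).
        extra = lookup.get(row[0], (None, None))
        merged.append(row + extra)
    return merged


set3 = left_join(set1, set2)
for row in set3:
    print(*row)
```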
Your first set of data is essentially your primary data set, to which you just want to add attribute columns. I built this example in Google Sheets, which should help explain. Using spill formulas, only a few cells need their own formulas; they are highlighted in yellow. When you use this in Excel, make sure you change the column references, but this would get you the answer.
Note that your version of Excel must support spill ranges (dynamic arrays) for this to work. To test, see if the formula =UNIQUE() is available.
This solution may work for you if both sets start in the same column. In my example, both of them start at column A. You can get all the data with a single VLOOKUP formula.
The formula in cell E2 is:
=VLOOKUP($A2;$A$22:$R$25;COLUMN(B$22);FALSE)
Notice the mixed references in the first and third arguments and the absolute references in the second one. The third argument is critical, because it is the relative position between both sets; that's why it's easier if both sets start at the same column. If they don't, you'll need to adjust this argument by subtracting or adding, depending on the case.
Anyway, with a single formula, you can get any number of columns. The only disadvantage of this formula is that you need to manually drag it to the right until you have all the columns (10, 30 or whatever). You'll notice you are done because the formula will return an error.
This error means you are trying to reference a column outside of your lookup range.

How to check if a value in a column is found in a list in a column, with Spark SQL?

I have a delta table A as shown below.
point cluster points_in_cluster
37 1 [37,32]
45 2 [45,67,84]
67 2 [45,67,84]
84 2 [45,67,84]
32 1 [37,32]
Also I have a table B as shown below.
id point
101 37
102 67
103 84
I want a query like the following. IN obviously doesn't work on an array column here, so what would be the right syntax?
select b.id, a.point
from A a, B b
where b.point in a.points_in_cluster
As a result I should have a table like the following
id point
101 37
101 32
102 45
102 67
102 84
103 45
103 67
103 84
Based on your data sample, I'd do an equi-join on the point column and then an explode on points_in_cluster:
from pyspark.sql import functions as F

# assuming A is df_A and B is df_B
df_A.join(
    df_B,
    on="point"
).select(
    "id",
    F.explode("points_in_cluster").alias("point")
)
Otherwise, you can use array_contains:
select b.id, a.point
from A a, B b
where array_contains(a.points_in_cluster, b.point)
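To make the join condition concrete, here is a small plain-Python sketch (no Spark involved) of what the array_contains query does with the sample tables: each row of B is paired with every row of A whose points_in_cluster list contains B's point. The names table_a, table_b and join_on_membership are made up for this illustration.

```python
# Table A rows: (point, cluster, points_in_cluster)
table_a = [
    (37, 1, [37, 32]),
    (45, 2, [45, 67, 84]),
    (67, 2, [45, 67, 84]),
    (84, 2, [45, 67, 84]),
    (32, 1, [37, 32]),
]

# Table B rows: (id, point)
table_b = [(101, 37), (102, 67), (103, 84)]


def join_on_membership(a_rows, b_rows):
    """Pair each B row with every A row whose list contains B's point."""
    return [
        (b_id, a_point)
        for b_id, b_point in b_rows
        for a_point, _cluster, members in a_rows
        if b_point in members
    ]


result = join_on_membership(table_a, table_b)
for b_id, a_point in result:
    print(b_id, a_point)
```

This is a nested loop, so it only illustrates the semantics; Spark decides on its own how to execute the real join.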

IF formula too long

I'm trying to create a formula that will display the mileage from one place to another.
Example: column one holds location combinations (there are 39 locations and multiple combinations),
e.g. Sams to Petes, Sams to Mc D, Mc D to Sams, etc.
The last column, with the formula, would automatically show the mileage from point A to point B, and so on.
The formula I created uses IF, but it is way too long:
=IF(B12="SVES TO KHS",11,
IF(B12="SVES TO FRHS",4.1,
IF(B12="SVES TO CHS",6.9,
IF(B12="SVES TO KMS",9.5,
IF(B12="SVES TO ISM",6.2,
IF(B12="SVES TO HM",5.3,
IF(B12="SVES TO FHM",2.4,
IF(B12="SVES TO TSM",7.6,...
Is there a way to shorten the formula?
The best thing to do is create a separate table on Sheet 2: in column A, have a list of the "SVES TO " combinations, and in column B, the miles. Then use a VLOOKUP to find the miles:
=VLOOKUP(B12,'Sheet 2'!$A$1:$B$50,2,0)
In this example there are 50 different SVES TO entries; change the range to however many you have.
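The same idea in Python terms, as a rough sketch: the nested IFs become a two-column lookup table, and finding the mileage is a single exact-match lookup. The distances are the ones from the question; the names mileage and lookup_miles are hypothetical.

```python
# Lookup table: route label -> miles (values taken from the question)
mileage = {
    "SVES TO KHS": 11,
    "SVES TO FRHS": 4.1,
    "SVES TO CHS": 6.9,
    "SVES TO KMS": 9.5,
    "SVES TO ISM": 6.2,
    "SVES TO HM": 5.3,
    "SVES TO FHM": 2.4,
    "SVES TO TSM": 7.6,
}


def lookup_miles(route):
    """Exact-match lookup; returns None when the route is not in the table."""
    return mileage.get(route)


print(lookup_miles("SVES TO KHS"))
print(lookup_miles("SVES TO FHM"))
```

Adding a new route then means adding one row to the table instead of another nested IF.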
An X-to-Y/Y-to-X distance matrix should be your best bet. While VLOOKUP is good for a one-column-lookup/one-column-retrieval, an INDEX/MATCH/MATCH would be more appropriate for a true matrix.
Assume the following data matrix with destinations along the first row and the first column.
a b c d e f g
a - 40 80 17 37 16 70
b 40 - 48 95 85 8 60
c 80 48 - 24 26 75 73
d 17 95 24 - 14 9 56
e 37 85 26 14 - 91 7
f 16 8 75 9 91 - 78
g 70 60 73 56 7 78 -
Note that distances like c-to-f are the same as f-to-c (yes, there is a simple formula for this, but that is another question). Obviously, any x-to-y distance should be zero when x = y.
With the two locations to look up in J2 and K2, your formula in L2 should be:
=INDEX($B$2:$H$8, MATCH(J2, A$2:A$8, 0), MATCH(K2, B$1:H$1, 0))
The cell and row highlighting were added with a couple of simple conditional formatting formulas.
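As a rough illustration of the INDEX/MATCH/MATCH pattern, the Python sketch below finds the row position of one location and the column position of the other, then reads the cell where they cross. The matrix is the sample one above, with 0 on the diagonal in place of the dashes; the names labels, matrix and distance are made up for this example.

```python
labels = ["a", "b", "c", "d", "e", "f", "g"]

# Symmetric distance matrix from the sample data (diagonal set to 0)
matrix = [
    [0, 40, 80, 17, 37, 16, 70],
    [40, 0, 48, 95, 85, 8, 60],
    [80, 48, 0, 24, 26, 75, 73],
    [17, 95, 24, 0, 14, 9, 56],
    [37, 85, 26, 14, 0, 91, 7],
    [16, 8, 75, 9, 91, 0, 78],
    [70, 60, 73, 56, 7, 78, 0],
]


def distance(origin, destination):
    """Two position lookups (like MATCH) followed by a cell read (like INDEX)."""
    row = labels.index(origin)
    col = labels.index(destination)
    return matrix[row][col]


print(distance("c", "f"))
print(distance("f", "c"))
```

With 39 locations this needs one 39 x 39 grid rather than a separate line for every combination.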

Excel Rank Multiple Columns

I'm facing an issue with ranking in Excel, particularly with regard to tie breaking. I tried several options, but I guess they don't fit my issue. It's quite simple really; I'll explain:
The Data:
1 2 3 4 5 6 7 8 9 10
87 83 74 95 69 90 73 0 74 85
121 121 96 121 121 121 121 83 121 121
As you can see, it's easy for me to rank the first line of scores (87, 83, ...). (I'm working in columns instead of rows for the data.) The RANK function gives the following result:
3 5 6 1 9 2 8 10 6 4
Which is correct.
The problem arises in the second line of scores. There are ties because most of them reach the maximum of 121:
1 1 9 1 1 1 1 10 1 1
What I would like to do is use the top row as a tie breaker. That row was originally text but is now a sequence from 1 to 10, so even if there is a tie it could serve as a secondary criterion to order the rank, giving the following ranking line:
1 2 9 3 4 5 6 10 7 8
Could one achieve this result?
Thank You very much in advance.
You need a helper row to break the tie. You can add a fraction of the first row to the second row to create a new row, and use the new row to rank:
A4 = A3+(A2/(MAX($A$2:$J$2)+1))
Using the MAX I ensure the fraction is less than 1 which is adequate to break ties in this case.
A6 = RANK(A4,$A$4:$J$4)
You can hide the helper row if you don't want to show it.
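Here is a minimal Python sketch of the helper-row idea, assuming (as the cell references A2 and A3 suggest) that the tie-break values 87, 83, ... sit in row 2 and the values to rank, 121, 121, ..., sit in row 3. The names tiebreak, primary and rank_with_tiebreak are hypothetical. Note that with this layout a tie is broken in favour of the higher row 2 value, which is not the same ordering as the 1-to-10 sequence in the question.

```python
# Row 2 (tie breaker) and row 3 (values to rank), as assumed above
tiebreak = [87, 83, 74, 95, 69, 90, 73, 0, 74, 85]
primary = [121, 121, 96, 121, 121, 121, 121, 83, 121, 121]


def rank_with_tiebreak(primary, tiebreak):
    """Helper row = primary + a fraction below 1, then a descending rank."""
    scale = max(tiebreak) + 1  # keeps every added fraction strictly below 1
    helper = [p + t / scale for p, t in zip(primary, tiebreak)]
    # Descending rank: 1 + the number of strictly larger helper values
    return [1 + sum(other > value for other in helper) for value in helper]


ranks = rank_with_tiebreak(primary, tiebreak)
print(*ranks)
```

Because the fraction never reaches 1, it can only reorder entries that are tied in the primary row; it never moves a 96 ahead of a 121.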

Return a Value from the first Column matching Max()

I have a table:
A B C D
2 10 70 45
2 20 80 55
3 30 90 65
3 40 15 76
4 50 25 85
4 60 35 95
I want to get the maximum from the array B1:D6, which is 95, and return the value from the first column, A, in that row, which is 4.
Insert a new column before column A and put the following formula in cell A2 and drag down:
=MAX(C2:E2)
Then put the following formula in cell H2:
=VLOOKUP(MAX(A2:A7), A2:B7, 2, FALSE)
Result: 4
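For comparison, a short Python sketch of the same two steps: take the maximum of columns B to D for each row (the helper column), then return the column A value of the row that holds the overall maximum. The names table and first_column_of_max are made up for this illustration.

```python
# Sample table from the question: columns A, B, C, D
table = [
    (2, 10, 70, 45),
    (2, 20, 80, 55),
    (3, 30, 90, 65),
    (3, 40, 15, 76),
    (4, 50, 25, 85),
    (4, 60, 35, 95),
]


def first_column_of_max(rows):
    """Return column A of the row whose B..D values contain the overall maximum."""
    best_row = max(rows, key=lambda row: max(row[1:]))
    return best_row[0]


answer = first_column_of_max(table)
print(answer)
```

If several rows share the maximum, this returns the first of them, the same as an exact-match VLOOKUP.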
