Get Unique Value list in excel - excel

I am quite new to Excel formulas, so could anyone help me?
I have data as follows:
+----+---------+------------+
| A | B | C |
| nr | car | model |
| 1 | Ford | Mustang |
| 2 | Ford | Focus |
| 3 | Ford | Focus |
| 4 | Ferrari | 458 |
| 5 | Ferrari | Testarossa |
+----+---------+------------+
How could I get the results as follows
+---+---------+-----------------+
| 1 | Ford | Mustang, Focus |
| 2 | Ferrari | 458, Testarossa |
+---+---------+-----------------+
Where every value is unique. I tried VLOOKUP, but it only returns one value per lookup. For example, VLOOKUP with Ford returns only the first result, Mustang, but not Focus.
If possible, please use only formulas :)
I've seen similar questions but no answers.

If you don't mind the models being in separate cells, put the car name in F1, enter this formula in G1, and copy it across:
=IFERROR(INDEX($C$2:$C$6,MATCH(1,IF(($B$2:$B$6=$F1)*(COUNTIF($F1:F1,$C$2:$C$6)=0),1,0),0)),"")
This is an array formula and as such must be confirmed with Ctrl-Shift-Enter instead of Enter or Tab to exit edit mode.

This can be done with an extra step: you will need an intermediate table.
In my example I take the data you have given (which seems to be sorted by the car column), but assume that it is not sorted (the unsorted data in the screenshot).
Step 1:
Sort the unsorted data so that it forms a table in rows 9-14. Add a column D, and in cell D10 put this formula, then copy it down:
=IF(B10=B9,CONCATENATE(VLOOKUP(B9,B9:D10,3,0),", ",C10),C10)
This formula looks up the previous row's concatenated value and appends the current model to it whenever the car name repeats.
I find the number of unique entries in the table in rows 10 to 14 with a formula in row 16. (This is just for convenience.)
B16 =SUMPRODUCT(1/COUNTIF($B$10:$B$14,$B$10:$B$14))
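Why summing 1/COUNTIF yields the number of distinct values may not be obvious. The following is a minimal Python sketch of the same arithmetic, not Excel; the list `cars` is simply the example's car column typed in by hand, and exact fractions are used here to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

# Car column from the example (B10:B14 in the answer's layout)
cars = ["Ford", "Ford", "Ford", "Ferrari", "Ferrari"]

# COUNTIF equivalent: how often each value occurs in the column
counts = Counter(cars)

# Each row contributes 1/count, so the rows of one distinct value sum to exactly 1
unique_count = sum(Fraction(1, counts[car]) for car in cars)

print(unique_count)  # 2
```

Ford contributes 1/3 three times and Ferrari 1/2 twice, so each distinct name adds up to 1.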
Step 2:
In rows 18-20 (the number of unique entries gives an idea of how big this table will be):
B19=INDEX($B$10:$B$14,MATCH(0,INDEX(COUNTIF($B$18:B18,$B$10:$B$14),0,0),0))
This formula lists the unique car names in column B. In column C, you LOOKUP the last concatenated value for each car name (because we kept on concatenating earlier, the last entry for a car name holds your result):
C19=LOOKUP(2,1/($B$9:$B$14=B19),$D$9:$D$14)
That should give you the resulting table! (I have pasted all of them as values from column G onwards so that you can see the output.)

Put this in D2
=IF(B2<>B1,C2,IF(C2<>C1,IF(D1="",C2,D1&", "&C2),""))
Drag down
nr car model ColD
1 Ford Mustang Mustang
2 Ford Focus Mustang, Focus
3 Ford Focus
4 Ferrari 458 458
5 Ferrari Testarossa 458, Testarossa
The row holding the first occurrence of each car's last model has the data you want.
This would be a lot easier if you could drop column A and use Excel's Remove Duplicates feature.
If you want to do it with code, this will work:
Sub ConcatByMasterColumn()
    Dim X As Long, MyString As String
    ' Header row for the output in columns F:G
    Range("F1:G1").Formula = Array("Car", "Model")
    ' Run one row past the data so the last car is written out too
    For X = 2 To Range("A" & Rows.Count).End(xlUp).Row + 1
        ' Car changed: write the previous car and its collected models
        If Range("B" & X) <> Range("B" & X - 1) And X > 2 Then
            Range("F" & Range("F" & Rows.Count).End(xlUp).Offset(1, 0).Row).Formula = Range("B" & X - 1).Text
            Range("G" & Range("G" & Rows.Count).End(xlUp).Offset(1, 0).Row).Formula = Right(MyString, Len(MyString) - 2)
            MyString = ""
        End If
        ' Append the model unless it repeats the row above
        If Range("C" & X) <> Range("C" & X - 1) Then MyString = MyString & ", " & Range("C" & X).Text
    Next
End Sub
Results:
Car Model
Ford Mustang, Focus
Ferrari 458, Testarossa
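As a cross-check of the expected output, the grouping can be sketched outside Excel. This is a minimal Python illustration, not a translation of the macro: it assumes the five example rows typed in by hand and drops any repeated model per car, not only consecutive repeats.

```python
# Example rows from the question: (nr, car, model)
rows = [
    (1, "Ford", "Mustang"),
    (2, "Ford", "Focus"),
    (3, "Ford", "Focus"),
    (4, "Ferrari", "458"),
    (5, "Ferrari", "Testarossa"),
]

# Collect each car's models in first-seen order, skipping repeats
grouped = {}
for _nr, car, model in rows:
    models = grouped.setdefault(car, [])
    if model not in models:
        models.append(model)

# One output row per car, models joined with ", "
result = [(car, ", ".join(models)) for car, models in grouped.items()]

for position, (car, models) in enumerate(result, start=1):
    print(position, car, models)
```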

Related

Spreadsheet - Im trying to find the sum of the top n numbers in two columns

-UPDATED- Answered, thanks for all who helped.
Consider the following Google spreadsheet:
A B C D E
1 John | Bob | Sue | Tony
2 h1 2 | 1 | 3 | 2
3 h2 3 | 3 | 4 | 2
4 h3 1 | 2 | 1 | 3
5 h4 2 | 2 | 3 | 1
6 h5 2 | 1 | 1 | 3
7 h6 1 | 2 | 2 | 1
8 h7 1 | 2 | 1 | 3
Team | Player1 | Player 2 | Score
1 | John | Sue | ?
2 | Bob | Tony | ?
Each team is made of two partners, e.g. John and Sue. Each row contains a match: the team's score is the best of each member's. The team total score of the game is the sum of the match scores.
In the example:
Team 1 : John & Sue. Match scores: (3,4,1,3,2,2,1). Total score = 16.
Team 2 : Bob & Tony. Match scores: (2,3,3,2,3,2,3). Total score = 18.
Another example would be two golfers working as a team, where the best score between them is counted per hole and at the end we add those up.
Can this be done using a spreadsheet?
To get the desired result, the formula comes out quite complicated:
=SUMPRODUCT(IF(MMULT((B12=$B$1:$E$1)*$B$2:$E$8,ROW(A1:A4)^0)>MMULT((C12=$B$1:$E$1)*$B$2:$E$8,ROW(A1:A4)^0),(B12=$B$1:$E$1)*$B$2:$E$8,(C12=$B$1:$E$1)*$B$2:$E$8))
but it works in both Excel and Google Sheets.
In Excel you can use the LARGE() function. It is the easiest option but a bit verbose.
If you want the sum of the top 3 values in a column/row:
=LARGE(A1:A10,1)+LARGE(A1:A10,2)+LARGE(A1:A10,3)
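The idea of summing the n largest values, which the LARGE() terms do one at a time, can be checked outside Excel. This is a small Python sketch; `sum_top_n` and the sample `values` are hypothetical names and data made up for illustration.

```python
import heapq


def sum_top_n(values, n):
    """Sum the n largest entries, like adding LARGE(range, 1) .. LARGE(range, n)."""
    return sum(heapq.nlargest(n, values))


# Hypothetical sample column, not taken from the question
values = [4, 9, 1, 7, 3]

print(sum_top_n(values, 3))  # 20 (9 + 7 + 4)
```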
In Excel, if you have the newer LET() function:
=LET(x,INDEX($B$2:$E$8,0,MATCH(I2,$B$1:$E$1,0)),y,INDEX($B$2:$E$8,0,MATCH(J2,$B$1:$E$1,0)),SUMPRODUCT(((x>y)*x)+((y>=x)*(y))))
Otherwise:
=SUMPRODUCT(((INDEX($B$2:$E$8,0,MATCH(I2,$B$1:$E$1,0))>INDEX($B$2:$E$8,0,MATCH(J2,$B$1:$E$1,0)))*INDEX($B$2:$E$8,0,MATCH(I2,$B$1:$E$1,0)))+((INDEX($B$2:$E$8,0,MATCH(J2,$B$1:$E$1,0))>=INDEX($B$2:$E$8,0,MATCH(I2,$B$1:$E$1,0)))*(INDEX($B$2:$E$8,0,MATCH(J2,$B$1:$E$1,0)))))
Either of the following formulas will produce the desired sums of 16 and 18 (tested on my machine):
=ArrayFormula(SUM(IF($B$2:$B$8>$D$2:$D$8,$B$2:$B$8,$D$2:$D$8)))
=SUMPRODUCT(IF($B$2:$B$8>$D$2:$D$8,$B$2:$B$8,$D$2:$D$8))
Adjust to B->C and D->E for Bob + Tony. These formulas work by operating on arrays. They evaluate the IF statement once per cell in the B2:B8 range and generate an array of values ({3,4,1,3,2,2,1}). Then SUM or SUMPRODUCT will sum those values. ArrayFormula is necessary to force SUM to deal with the IF as an array.
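The per-row evaluation described above can be reproduced in a few lines of Python as a sanity check. This is a sketch, not spreadsheet code; the `scores` dictionary is the question's table typed in by hand and `team_total` is a hypothetical helper name.

```python
# Scores per player for matches h1..h7, copied from the question's table
scores = {
    "John": [2, 3, 1, 2, 2, 1, 1],
    "Bob":  [1, 3, 2, 2, 1, 2, 2],
    "Sue":  [3, 4, 1, 3, 1, 2, 1],
    "Tony": [2, 2, 3, 1, 3, 1, 3],
}


def team_total(player1, player2):
    """Take the better score of the two partners in each match, then sum."""
    return sum(max(a, b) for a, b in zip(scores[player1], scores[player2]))


print(team_total("John", "Sue"))  # 16
print(team_total("Bob", "Tony"))  # 18
```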
Further customization can be built from here as desired. Play around with ArrayFormula and SUMPRODUCT, as they have much more powerful use cases than this and have parallels in other spreadsheet software, including Excel.

I have data stored in excel where I need to sort that data

In excel, I have data divided into
Year Code Class Count
2001 RAI01 LNS 9
2001 RAI01 APRP 4
2001 RAI01 3
2002 RAI01 BPR 3
2002 RAI01 BRK 3
2003 RAI01 URE 3
2003 CFCOLLTXFT APRP 2
2003 CFCOLLTXFT BPR 2
2004 CFCOLLTXFT GRL 2
2004 CFCOLLTXFT HDS 2
2005 RAI HDS 2
I need to find the top 3 products for each customer in each year.
The real trick here is to rank each row based on a group.
Your rank is determined by your Count column (Column D).
Your group is determined by your Year and Code (I think) columns (Column A and B respectively).
You can use this gnarly sumproduct() formula to get a rank (Starting at 1) based on the Count for each Group.
So to get a ranking for each Year and Code from 1 to whatever, in a new column next to this data:
=SUMPRODUCT(($A$2:$A$50=A2)*(B2=$B$2:$B$50)*(D2<$D$2:$D$50))+1
And copy that down. Now you can AutoFilter on this to show all rows that have a rank less than 4. You can sort this on Customer, then Year and you should have a nice list of top 3 within each year/code.
Explanation of sumproduct.
Sumproduct goes row by row and applies the math that is defined for each row. When it is done it sums the results.
As an example, take the following worksheet:
+---+---+---+
| | A | B |
+---+---+---+
| 1 | 1 | 1 |
| 2 | 1 | 4 |
| 3 | 2 | 2 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
+---+---+---+
`=SUMPRODUCT((A1:A5)*(B1:B5))`
This sumproduct will take A1*B1, A2*B2, A3*B3, A4*B4, A5*B5 and then add those five results up to give you a number. That is 1 + 4 + 4 + 4 + 2 = 15
It will also work on conditional/boolean statements returning, for each row/condition a 1 or a 0 (for True and False, which is a "Boolean" value).
As an example, take the following worksheet that holds the type of publication in a library and a count:
+---+----------+---+
| | A | B |
+---+----------+---+
| 1 | Book | 1 |
| 2 | Magazine | 4 |
| 3 | Book | 2 |
| 4 | Comic | 1 |
| 5 | Pamphlet | 2 |
+---+----------+---+
=SUMPRODUCT((A1:A5="Book")*(B1:B5))
This will test whether A1 is "Book" and return a 1 or 0, then multiply that result by whatever is in B1. It then continues for each row in the range up to row 5. The result will be 1+0+2+0+0 = 3. There are 3 books in the library (it's not a very big library).
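Both SUMPRODUCT examples above can be replayed outside Excel to confirm the arithmetic. This is a minimal Python sketch with the two small worksheets typed in as lists; the variable names are made up for illustration.

```python
# First worksheet: columns A and B
a = [1, 1, 2, 4, 1]
b = [1, 4, 2, 1, 2]

# Row-by-row product, then the sum of those products
plain_total = sum(x * y for x, y in zip(a, b))
print(plain_total)  # 15

# Second worksheet: publication type and count
kinds = ["Book", "Magazine", "Book", "Comic", "Pamphlet"]
counts = [1, 4, 2, 1, 2]

# The comparison yields True/False, which behaves as 1/0 when multiplied
books = sum((kind == "Book") * count for kind, count in zip(kinds, counts))
print(books)  # 3
```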
For this answer's sumproduct:
($A$2:$A$50=A2) compares every cell in A2 through A50 with A2 and returns a 1 where they match or a 0 where they don't.
(B2=$B$2:$B$50) will test each cell B2 through B50 to see if it is equal to B2 and return a 1 or 0 for each test.
The same is true for (D2<$D$2:$D$50), but it's testing whether the current row's count is less than each row's count, so it flags the rows with a higher count.
So... essentially this is saying "For all the rows 2 through 50, find all the other rows that have the same value in Column A and B AND have a count greater than this row's count. Count up all of the rows that meet those criteria, and add 1. This is the rank of this row within its group."
Copying this formula has it redetermine that rank for each row allowing you to rank and filter.
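The ranking rule can be tried on the question's data outside Excel. This is a Python sketch of the same counting logic, not the formula itself; `group_rank` is a hypothetical helper, and the row with no Class is represented by an empty string.

```python
# (Year, Code, Class, Count) rows from the question
rows = [
    (2001, "RAI01", "LNS", 9),
    (2001, "RAI01", "APRP", 4),
    (2001, "RAI01", "", 3),
    (2002, "RAI01", "BPR", 3),
    (2002, "RAI01", "BRK", 3),
    (2003, "RAI01", "URE", 3),
    (2003, "CFCOLLTXFT", "APRP", 2),
    (2003, "CFCOLLTXFT", "BPR", 2),
    (2004, "CFCOLLTXFT", "GRL", 2),
    (2004, "CFCOLLTXFT", "HDS", 2),
    (2005, "RAI", "HDS", 2),
]


def group_rank(row, all_rows):
    """1 + number of rows in the same Year/Code group with a higher Count."""
    year, code, _cls, count = row
    higher = sum(1 for y, c, _k, n in all_rows if y == year and c == code and n > count)
    return higher + 1


ranks = [group_rank(row, rows) for row in rows]

# Equivalent of filtering the helper column for rank < 4
top3 = [row for row, rank in zip(rows, ranks) if rank < 4]

print(ranks)
```

Ties share a rank, so a group with several equal counts can show more than three rows after filtering.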

Excel array formula to find row of largest values based on multiple criteria

Column: A | B | C | D
Row 1: Variable | Margin | Sales | Index
Row 2: banana | 2 | 20 | 1
Row 3: apple | 5 | 10 | 2
Row 4: apple | 10 | 20 | 3
Row 5: apple | 10 | 10 | 4
Row 6: banana | 10 | 15 | 5
Row 7: apple | 10 | 15 | 6
"Variable" sits in column A, row 1.
"Fruit" refers to A2:A6
"Margin" refers to B2:B6
"Sales" refers to C2:C6
"Index" refers to D2:D6
Question:
From the above table, I would like to find the rows of the two largest "Sales" values where Fruit = "apple" and Margin >= 10. The correct answer would be the rows with Index 3 and 6 (sheet rows 4 and 7). I have tried the following methods without success.
I have tried
=LARGE(IF(Fruit="apple",IF(Margin>=10,Sales)),{1,2}) + CSE
and this returns 20 and 15, but not the row.
I have tried
=MATCH(LARGE(IF(Fruit="apple",IF(Margin>=10,sales)),{1,2}),Sales,0)+1
but returns row 2 and 6 as the first matches to come up are the 20 and 15 from "banana" not "apple".
I have tried
=INDEX(D2:D7,LARGE(IF(Fruit="apple",IF(Margin>=10,ROW(Sales)-ROW(INDEX(Sales,1,1))+1)),{1,2}),1)
But this returns row 7 and 5 (i.e. "Index" 6 and 4) as these are just the first occurrences of "apple" starting from the bottom of the table. They are not the largest values.
Can this be done with an Excel formula or would I need a macro? If a macro, can I please get help with it? Thank you!
Use this formula:
=INDEX(D:D,AGGREGATE(15,6,ROW($A$2:$A$7)/(($B$2:$B$7>=10)*($A$2:$A$7="apple")*($C$2:$C$7 = AGGREGATE(14,6,$C$2:$C$7/(($B$2:$B$7>=10)*($A$2:$A$7="apple")),F2))),1))
I put 1 and 2 in F2 and F3 respectively to find the first and second.
Edit #1
To deal with duplicates we need to add (COUNTIF($G$1:G1,$D$2:$D$7) = 0). The $G$1:G1 needs to refer to the cell directly above the first placement of this formula, so the formula needs to start in at least row 2.
=INDEX(D:D,AGGREGATE(15,6,ROW($A$2:$A$7)/((COUNTIF($G$1:G1,$D$2:$D$7) = 0)*($B$2:$B$7>=10)*($A$2:$A$7="apple")*($C$2:$C$7 = AGGREGATE(14,6,$C$2:$C$7/(($B$2:$B$7>=10)*($A$2:$A$7="apple")),F2))),1))
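To confirm which rows the formula should return, the filter-then-rank logic can be sketched in Python. This is an illustration outside Excel, assuming the six data rows typed in by hand; it is not how AGGREGATE works internally.

```python
# (Variable, Margin, Sales, Index) rows from the question
table = [
    ("banana", 2, 20, 1),
    ("apple", 5, 10, 2),
    ("apple", 10, 20, 3),
    ("apple", 10, 10, 4),
    ("banana", 10, 15, 5),
    ("apple", 10, 15, 6),
]

# Keep only rows meeting both criteria
matches = [row for row in table if row[0] == "apple" and row[1] >= 10]

# Sort by Sales, largest first; Python's sort is stable, so ties keep table order
top_two = sorted(matches, key=lambda row: row[2], reverse=True)[:2]

indexes = [row[3] for row in top_two]
print(indexes)  # [3, 6]
```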

Split a single Excel row into multiple based on columns

I need to create multiple Excel rows based off of a single row. For example, I currently have a single row for each personnel and there are dozens of columns that are "grouped" so to speak. So say column K is its own group, then columns M, N, O are a group, P, Q, R, are a group, etc. I need that single row to become multiple rows - one row per group of columns. So the current situation is:
1 | Smith, John | Column K | Column M | Column N | Column O | Column P | Column Q| Column R
And I need that to become:
1 | Smith, John | Column K
2 | Smith, John | Blank | Column M | Column N | Column O
3 | Smith, John | Blank | Blank | Blank | Blank | Column P | Column Q | Column R
Here's a solution. You probably can do that math a bit smarter. I can't right now. ;)
Sub splitRows()
    Dim i As Integer
    With Sheets(1)
        For i = 2 To (.UsedRange.Columns.Count / 3)
            ' Copy the first three columns of row 1 into row i
            .Range(.Cells(i, 1), .Cells(i, 3)).Value = .Range(.Cells(1, 1), .Cells(1, 3)).Value
            ' Copy the i-th group of three columns into the same columns of row i
            .Range(.Cells(i, (i - 1) * 3 + 1), .Cells(i, (i - 1) * 3 + 3)).Value = .Range(.Cells(1, (i - 1) * 3 + 1), .Cells(1, (i - 1) * 3 + 3)).Value
        Next
    End With
End Sub
This will only work correctly for sheets where there are always groups of three columns. If not, you have to change that .UsedRange.Columns.Count / 3 part.
Cheers.
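For groups of unequal size, as in the question (K alone, then M-O, then P-R), the target layout can be sketched as follows. This is a Python illustration of the desired output only; `split_row` and the placeholder cell values are hypothetical, and it does not touch a workbook.

```python
def split_row(fixed, groups):
    """Return one output row per group, padding earlier groups' columns with blanks."""
    out = []
    offset = 0  # number of columns used by the groups already emitted
    for group in groups:
        out.append(list(fixed) + [""] * offset + list(group))
        offset += len(group)
    return out


# Placeholder values standing in for the cell contents
fixed = ["Smith, John"]
groups = [["K"], ["M", "N", "O"], ["P", "Q", "R"]]

for number, row in enumerate(split_row(fixed, groups), start=1):
    print(number, row)
```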

Top ten ordering in Excel based on complex team rules

I have an excel spreadsheet in a format similar to the following...
| NAME | CLUB | STATUS | SCORE |
| Fred | a | Gent | 145 |
| Bert | a | Gent | 150 |
| Harry | a | Gent | 195 |
| Jim | a | Gent | 150 |
| Clare | a | Lady | 99 |
| Simon | a | Junior | 130 |
| John | b | Junior | 130 |
:
:
| Henry | z | Gent | 200 |
I need to convert this table into a list of the "Top Ten" teams. The rules are
Each team score is taken from the sum of four members of that club.
These totals should be of the best four scores except...
Each team must consist of at least one Junior or Lady
For example, in the table above the team score for club A would be 625, not 640, as you would take the scores for Harry (195), Bert (150), Jim (150), and Simon (130). You could not take Fred's (145) score as that would give you only Gents.
My question is, can this be done easily as a series of Excel formula, or will I need to resort to using something more procedural?
Ideally the solution needs to be automatic in the team selections, I don't want to have to create separate hand crafted formula for each team. I also will not necessarily have a neatly ordered list of each clubs members. Although I could probably generate the list via an extra calculation sheet.
Public Function TopTen(Club As String, Scores As Range)
    Dim i As Long
    Dim vaScores As Variant
    Dim bLady As Boolean
    Dim lCnt As Long
    Dim lTotal As Long
    ' Keep only this club's rows, then sort them by score, highest first
    vaScores = FilterOnClub(Scores.Value, Club)
    vaScores = SortOnScore(vaScores)
    For i = LBound(vaScores, 2) To UBound(vaScores, 2)
        If lCnt = 3 And Not bLady Then
            ' Last slot and no Lady/Junior picked yet: skip Gents
            If vaScores(3, i) <> "Gent" Then
                lTotal = lTotal + vaScores(4, i)
                bLady = True
                lCnt = lCnt + 1
            End If
        Else
            lTotal = lTotal + vaScores(4, i)
            lCnt = lCnt + 1
            If vaScores(3, i) <> "Gent" Then bLady = True
        End If
        If lCnt = 4 Then Exit For
    Next i
    TopTen = lTotal
End Function
Private Function FilterOnClub(vaScores As Variant, sClub As String) As Variant
    Dim i As Long, j As Long
    Dim aTemp() As Variant
    ' Rows become columns here so that ReDim Preserve can grow the last dimension
    For i = LBound(vaScores, 1) To UBound(vaScores, 1)
        If vaScores(i, 2) = sClub Then
            j = j + 1
            ReDim Preserve aTemp(1 To 4, 1 To j)
            aTemp(1, j) = vaScores(i, 1)
            aTemp(2, j) = vaScores(i, 2)
            aTemp(3, j) = vaScores(i, 3)
            aTemp(4, j) = vaScores(i, 4)
        End If
    Next i
    FilterOnClub = aTemp
End Function
Private Function SortOnScore(vaScores As Variant) As Variant
    Dim i As Long, j As Long, k As Long
    Dim aTemp(1 To 4) As Variant
    ' Simple exchange sort on the score (row 4), descending
    For i = 1 To UBound(vaScores, 2) - 1
        For j = i To UBound(vaScores, 2)
            If vaScores(4, i) < vaScores(4, j) Then
                ' Swap all four fields of members i and j
                For k = 1 To 4
                    aTemp(k) = vaScores(k, j)
                    vaScores(k, j) = vaScores(k, i)
                    vaScores(k, i) = aTemp(k)
                Next k
            End If
        Next j
    Next i
    SortOnScore = vaScores
End Function
Use as =TopTen(H2,$B$2:$E$30) where H2 contains the club letter.
"can this be done easily as a series of Excel formula"
Short answer, YES. (Depending on your definition of "easily").
Long answer...
(I think this works)
Here's my (brief) test data:
A B C D
1 NAME CLUB STATUS SCORE
2 Kevin a Gent 145
3 Lyle a Gent 150
4 Martin a Gent 195
5 Norm a Gent 150
6 Oonagh a Lady 100
7 Arthur b Gent 200
8 Brian b Gent 210
9 Charlie b Gent 190
10 Donald b Gent 220
11 Eddie b Junior 150
12 Quentin c Gent 145
13 Ryan c Gent 150
14 Sheila c Lady 195
15 Trevor c Gent 150
16 Ursula c Junior 200
Now, if I've understood the rules correctly, we want the best four scores, except that if the highest score by either a lady or a junior is not in the best four, we use that instead of the fourth highest. I've restated it somewhat, for reasons that may become apparent...
OK. Array formulae to the rescue! (I hope)
The highest score from team a should be
{=LARGE(IF(B2:B16="a",D2:D16,0),1)}
where the {} indicates an array formula created by using Control-Shift-Enter to input the formula. The top four are similarly created. For the Lady/Junior bit, we need a bit more complexity. Taking the Lady, we need this:
{=LARGE(IF($B$2:$B$16=$J2,IF($C$2:$C$16="Lady",$D$2:$D$16,0),0),1)}
Junior may safely be left as an exercise for the student, I hope.
I'm now looking at a table with the following layout for club "a"
J K L M N O P
1 Club 1 2 3 4 Lady Junior
2 a 195 150 150 145 100 0
The club score should be the top three "anyone" scores plus the best lady or junior if they're not already in the top four.
So in Q2 I'm putting this:
=SUM(K2:M2)+MIN(MAX(O2,P2),N2)
MAX(O2,P2) tells me the best lady or junior score, which has to be included. If it's higher than the fourth-highest team score, then it's already in the list and we just take the top four. Otherwise, we replace the fourth-highest score with the best lady/junior one.
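The rule behind =SUM(K2:M2)+MIN(MAX(O2,P2),N2) can be checked against the test data outside Excel. This is a minimal Python sketch of the same rule; `club_score` is a hypothetical helper and the member list is the answer's test data typed in by hand.

```python
# (name, club, status, score) rows from the test data above
members = [
    ("Kevin", "a", "Gent", 145), ("Lyle", "a", "Gent", 150),
    ("Martin", "a", "Gent", 195), ("Norm", "a", "Gent", 150),
    ("Oonagh", "a", "Lady", 100),
    ("Arthur", "b", "Gent", 200), ("Brian", "b", "Gent", 210),
    ("Charlie", "b", "Gent", 190), ("Donald", "b", "Gent", 220),
    ("Eddie", "b", "Junior", 150),
    ("Quentin", "c", "Gent", 145), ("Ryan", "c", "Gent", 150),
    ("Sheila", "c", "Lady", 195), ("Trevor", "c", "Gent", 150),
    ("Ursula", "c", "Junior", 200),
]


def club_score(club, rows):
    """Top three scores plus the smaller of (4th best, best Lady/Junior)."""
    scores = sorted((s for _n, c, _st, s in rows if c == club), reverse=True)
    top4 = (scores + [0] * 4)[:4]  # pad with zeros, like the 0 branch of the IF
    best_lady_or_junior = max(
        (s for _n, c, st, s in rows if c == club and st in ("Lady", "Junior")),
        default=0,
    )
    return sum(top4[:3]) + min(best_lady_or_junior, top4[3])


for club in ("a", "b", "c"):
    print(club, club_score(club, members))
```

As in the formula, a club with no Lady or Junior gets 0 for the fourth slot.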
Now we could do it all in one formula, by substituting the parts into the final formula:
{=LARGE(IF($B$2:$B$16=$J2,$D$2:$D$16,0),1)+
LARGE(IF($B$2:$B$16=$J2,$D$2:$D$16,0),2)+
LARGE(IF($B$2:$B$16=$J2,$D$2:$D$16,0),3)+
MIN(LARGE(IF($B$2:$B$16=$J2,$D$2:$D$16,0),4),
MAX(LARGE(IF($B$2:$B$16=$J2,IF($C$2:$C$16="Lady",$D$2:$D$16,0),0),1),
LARGE(IF($B$2:$B$16=$J2,IF($C$2:$C$16="Junior",$D$2:$D$16,0),0),1)))}
But I don't recommend it...
So for the above data, I end up with this:
Anyone Lady Junior
Club 1 2 3 4 1 1 Total
a 195 150 150 145 100 0 595
b 220 210 200 190 0 150 780
c 200 195 150 150 195 200 695
Rats. In my excitement at (I think) getting the hard part to work I forgot to mention that
The list of scores can be in any order
You can get the club rankings with RANK()
You can then pull the top 10 into another table using MATCH() and INDEX()
A B C D E F G H
1 club Sc Rank UniqRk Pos Club Score
2 third-equal#1 80 3 79.999980 1 1 best 100
3 second 90 2 89.999970 2 2 second 90
4 third-equal#2 80 3 79.999960 3 3 third-equal#1 80
5 best 100 1 99.999950 4 3 third-equal#2 80
6 worst 70 5 69.999940 5 5 worst 70
Columns A and B are our calculated scores, column E is the order in which clubs will be output in the final table. The other formulae are as follows:
C: =RANK(B2,$B$2:$B$6) # what it says, with ties both getting the lower number
D: =B2-ROW()*0.00001 # score, modified slightly to ensure uniqueness
F: =SMALL($C$2:$C$6,E2) # first output column, ranks including ties
G: =INDEX($A$2:$A$6,MATCH(LARGE($D$2:$D$6,E2),$D$2:$D$6,0))
# club name for position, using the modified score in D
H: =INDEX($B$2:$B$6,MATCH(LARGE($D$2:$D$6,E2),$D$2:$D$6,0))
# as G, but indexes into scores
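The tie-breaking trick in column D, and the output columns built from it, can be replayed outside Excel. This is a Python sketch under the assumption that the five rows sit in sheet rows 2 to 6 as in the table above; the variable names are made up for illustration.

```python
# (club, score) as in columns A and B, sheet rows 2..6
clubs = [
    ("third-equal#1", 80),
    ("second", 90),
    ("third-equal#2", 80),
    ("best", 100),
    ("worst", 70),
]

# Column C: RANK, ties share the same number
ranks = [1 + sum(1 for _n, s in clubs if s > score) for _name, score in clubs]

# Column D: score nudged down by the sheet row number to make it unique
unique = [score - row * 0.00001 for row, (_n, score) in enumerate(clubs, start=2)]

# Positions 1..5: pick clubs by descending unique score (columns G and H),
# and pair each position with the k-th smallest rank (column F)
order = sorted(range(len(clubs)), key=lambda i: unique[i], reverse=True)
sorted_ranks = sorted(ranks)
output = [
    (pos, sorted_ranks[pos - 1], clubs[i][0], clubs[i][1])
    for pos, i in enumerate(order, start=1)
]

for line in output:
    print(line)
```

Because the nudge grows with the row number, the tied club that appears first in the sheet is listed first.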
What I do is lame, but it works.
Just make a new column, then insert this formula: =IF(A1=N,B1,0), where A1 is in the criteria column, N is the criterion, and B1 is in the column you are trying to get the LARGE from. Then I just do the LARGE formula on that new column.
Sometimes I get all fancy and, instead of hard-coding N, I make it refer to $C$1 and spell out the criterion in that cell.
The perfect answer would be to have Microsoft add in a largeifs (please read this Microsoft)
Writing a solution in VBA would be my first choice, especially if the rules have the possibility of becoming more complex.
Use a pivot table which will act as a database query on the data you have. Pivot so that the teams go down the columns and team members along with their status type go across the pivot table. I'm not sure for 2003, but Excel 2007 lets you then sort so the highest scores appear to the left. Then your first sum can simply take the first three scores for the each team. However to get the last persons sum, you have to determine if you can use the 4th score, or if you have to use the max of the junior or Lady types. That could be done using a complex and brute force formula somewhat like this:
if (type of position 1 is a junior or a lady or ... 2 or 3... ) then use position 4 else if position 5 is a junior or lady then use 5 else if p 6 is ... and so on.
I don't think that this can be done unless the table is sorted in some way. Most of Excel's lookup functions require ordered lists. This could certainly be done with a VBA function.
