Excel - two columns - remove duplicate values by comparing pairs in rows - excel

I'm using Excel 2013. Simplifying the actual data, suppose you have two columns of data, similar to below:
Column1 Column2
A D
A C
B C
C B
C A
D A
My goal is to either filter or remove the duplicate pairs regardless of what column they're in. My desired output would be as follows:
Column1 Column2
A D
A C
B C
So, as you can see, the C-B row was eliminated because the pair exists in the row containing B-C. The same goes for the rows containing C-A (A-C) and D-A (A-D). The data will be in no particular order, and it doesn't matter which of the two matching rows is removed.
The solution can create a new table, or use VBA to output a new table, whichever is easiest.
Thanks in advance!

I was able to get this to work for me, so I thought I'd post my answer in the hope that it helps someone else. As I suspected, and Quailia suggested, VBA was the way to go.
A few things changed along the way. I needed to add an additional column to the table, because it was determined that there would be some duplicates that I needed to keep. The values in the first column now group rows together, so duplicates that fall in different groups are kept. The values I need to check for duplicate pairs are in columns 2 and 3.
Table1:
Column1 Column2 Column3
01 A D
01 A C
01 B C
01 C B
01 C A
01 D A
Also, I realized I would be dealing with a fixed number of rows, so I didn't have a need to keep changing the size of my array.
Sub RemoveDuplicatePairs()
    'Checks a 3-column table for rows that duplicate an earlier row once columns 2 and 3
    'are swapped, and writes the unique rows to a new table
    Dim aArrayList() As Variant
    Dim bArrayList(1 To 144, 1 To 3) As Variant 'fixed size: the row count is known in advance
    Dim aRowNum As Long, bRowNum As Long, i As Long
    Dim isDuplicate As Boolean

    aArrayList = Range("Table1").Value
    bRowNum = 1
    'Loop through rows of 1st array
    For aRowNum = 1 To UBound(aArrayList, 1)
        'Check if current row in 1st array already exists in 2nd array by comparing values
        'after flipping 2nd and 3rd column values from 1st array.
        'For the first row the inner loop does not run, so that row is always kept.
        isDuplicate = False
        For i = 1 To bRowNum - 1
            If aArrayList(aRowNum, 1) = bArrayList(i, 1) And aArrayList(aRowNum, 2) = bArrayList(i, 3) And aArrayList(aRowNum, 3) = bArrayList(i, 2) Then
                'Duplicate pair already exists, move on to next row of 1st array
                isDuplicate = True
                Exit For
            End If
        Next i
        If Not isDuplicate Then
            'No duplicate found, add row from 1st array to 2nd array
            bArrayList(bRowNum, 1) = aArrayList(aRowNum, 1)
            bArrayList(bRowNum, 2) = aArrayList(aRowNum, 2)
            bArrayList(bRowNum, 3) = aArrayList(aRowNum, 3)
            bRowNum = bRowNum + 1
        End If
    Next aRowNum
    'Write array to new table
    Sheets("Sheet1").Range("Table2").Value = bArrayList
End Sub
Table2:
Column1 Column2 Column3
01 A D
01 A C
01 B C
I'm sure there is an easier, neater way to do the same thing, but as my username implies, I'm still pretty new to this. Any suggestions that improve or simplify my logic are certainly welcome.
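One possible simplification, offered as a sketch rather than a tested replacement: build a key from the group value plus the pair in sorted order, and keep a row only the first time its key appears. The function name UniquePairs, the "|" separator and the use of Scripting.Dictionary (available on Windows Excel) are assumptions, not part of the code above. Note one behavioural difference: this version also drops exact repeats such as a second A-D row, whereas the loop above only drops swapped pairs.

```vba
'Sketch: returns the rows of a 1-based, 3-column array (as produced by Range.Value)
'with order-insensitive duplicates of columns 2 and 3 removed within each group.
'Assumes the cell values never contain the "|" separator.
Function UniquePairs(src As Variant) As Variant
    Dim seen As Object
    Dim result() As Variant
    Dim r As Long, n As Long, c As Long
    Dim lo As String, hi As String, tmp As String, key As String

    Set seen = CreateObject("Scripting.Dictionary")
    ReDim result(1 To UBound(src, 1), 1 To 3)

    For r = 1 To UBound(src, 1)
        'Sort the pair so that B-C and C-B produce the same key
        lo = CStr(src(r, 2))
        hi = CStr(src(r, 3))
        If lo > hi Then
            tmp = lo: lo = hi: hi = tmp
        End If
        key = CStr(src(r, 1)) & "|" & lo & "|" & hi

        'Keep the row only the first time its key is seen
        If Not seen.Exists(key) Then
            seen.Add key, True
            n = n + 1
            For c = 1 To 3
                result(n, c) = src(r, c)
            Next c
        End If
    Next r

    UniquePairs = result 'rows after row n are left Empty
End Function
```

The result has as many rows as the input, with the unused trailing rows left empty, so it could be written to the output table the same way as bArrayList. Each row costs one dictionary lookup instead of a scan over every row kept so far.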

Related

Excel for loop to get value from another row

I have a spreadsheet that contains various data. It looks like this:
A A A B B C C C C
a 1 2 3 2 1 4 2 3 2
b 0 2 3 3 0 1 2 3 0
c 6 6 3 0 2 1 0 4 0
etc.
What I want is to add all the Aa's and come up with an Aa total, all the Bb's and come up with a Bb total, all the Ab's, etc.
To do that, for every column I want to check whether it is A, B or C. I need that check because the data may change: I might end up with four columns for A, two for B, etc. I do know, however, that a, b and c will stay where they are.
I also don't know the order of A, B and C. There could be two A's followed by two C's and then one B.
My final result will be a table containing all the totals:
Aa Ab Ac
Ba Bb Bc
Ca Cb Cc
In the previous example, this would mean that Aa = 1 + 2 + 3 = 6, Ab = 5, etc.
Something like that.
I think the way to go for cell 1-1 (the total of the Aa's) is to go through every column in the first row and check whether it is an A. If it is, get the value from the same column in the second row and add it to the total. Once all the columns have been checked, show the total in 1-1.
What I have so far (for A):
Sub getA()
Dim x As Integer
Dim total As Integer
'cols = Find number of columns with data in them
For x = 1 To cols
'cell = cell in Ax
If InStr(1, cellvalue, "a") = 1 Then
'val = value from row 5 in same column
total = total + Val
End If
Next
End Sub
But I don't really know how to proceed with the commented lines.
Finally, I would also like to know how these values can be shown in their respective cells without any extra event being triggered (a button click, for example). They should just appear in their cells from the moment someone opens the spreadsheet.
Any help is greatly appreciated.
Thanks.
Just an FYI, this can be done using the SUMPRODUCT formula:
=SUMPRODUCT(($B$1:$J$1=D$9)*($A$2:$A$4=$C10)*$B$2:$J$4)
EDIT
To compare the first letter then use this formula:
=SUMPRODUCT((LEFT($B$1:$J$1,1)=D$9)*($A$2:$A$4=$C10)*$B$2:$J$4)
Are you looking for something like:
Function countletter(strLetter As String) As Long
    'Counts the cells in the region around A1 whose value equals strLetter
    Dim x As Long, y As Long, xMax As Long, yMax As Long
    xMax = Range("A1").CurrentRegion.Columns.Count
    yMax = Range("A1").CurrentRegion.Rows.Count
    For x = 1 To xMax
        For y = 1 To yMax
            If Cells(y, x).Value = strLetter Then
                countletter = countletter + 1
            End If
        Next
    Next
End Function
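To finish the loop the question started, a user-defined function is one option. The sketch below is only an illustration: the name SumByLabels and its parameters are made up here, and it assumes the same layout as the SUMPRODUCT formula (headers in B1:J1, row labels in A2:A4, numbers in B2:J4).

```vba
'Sketch: sums the cells of data whose column header starts with colLabel
'and whose row label equals rowLabel.
'headers is a single row above data; labels is a single column to its left.
Function SumByLabels(headers As Range, labels As Range, data As Range, _
                     colLabel As String, rowLabel As String) As Double
    Dim r As Long, c As Long
    Dim total As Double

    For c = 1 To data.Columns.Count
        'Compare only the first character of the header, as in the LEFT() variant
        If Left$(CStr(headers.Cells(1, c).Value), 1) = colLabel Then
            For r = 1 To data.Rows.Count
                If CStr(labels.Cells(r, 1).Value) = rowLabel Then
                    'Skip text or blank cells instead of raising a type mismatch
                    If IsNumeric(data.Cells(r, c).Value) Then
                        total = total + data.Cells(r, c).Value
                    End If
                End If
            Next r
        End If
    Next c

    SumByLabels = total
End Function
```

Entered in a cell as =SumByLabels($B$1:$J$1,$A$2:$A$4,$B$2:$J$4,D$9,$C10), it recalculates like any other formula when the referenced ranges change, so no button or other event is needed.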

Excel - Entries in List of values be based on some other column value

We have a scenario in Excel (2010) where the list of values present in a dropdown changes dynamically based on some column of that row. For example, consider the "Supervisor" dropdown in sheet1 below:
Emp Grade Supervisor
A 14
B 12
C 13
D 12
E 12
F 13
G 14
Now let's say there is a dropdown for the supervisor. For every employee, the supervisor can only be a person of the same grade or a higher grade. So, for example, a grade 13 employee can have a supervisor with grade 13 or grade 14 only, not grade 12.
How can I write a custom condition like this inside the list of values? I have tried with things like named range, offset etc. but none allows specifying custom conditions. Any help?
I found the following document to be helpful in creating dependent Data Validation dropdowns: DV0064 - Dependent Lists Clear Cells, which can be downloaded here (for free):
http://www.contextures.com/excelfiles.html#DataVal
You can tailor the example to your needs.
=OFFSET('validation pivot'!$A$1,0,1,COUNTIFS('validation pivot'!$A:$A,">="&B2),1)
The supervisor needs to be at least his pay grade (>=B2). In order to have it work you need to have the pivot inserted in validation pivot A1. How to create the pivot (hasty notes):
add grade and emp 'emp as subset
tabular view 'to have separate columns
repeat labels ' to be able to count them
remove autosums(both within and total) 'to not deal with evading it
hide column labels and filters 'same
descending order(grade) 'to get a simple match method
data: store none 'to refresh the descending order every time
See uploaded sample file.
This code (column A = EMP, B = Grade, C = Supervisor)
Sub test()
    Dim actualgrade As Long
    Dim lastRowA As Long
    Dim i As Long, j As Long
    Dim numbers As String
    With Sheets("sheet1")
        lastRowA = .Cells(.Rows.Count, "A").End(xlUp).Row
        For i = 2 To lastRowA 'row 1 = headers
            actualgrade = .Cells(i, 2).Value
            numbers = ""
            For j = 2 To lastRowA
                'Collect every employee whose grade is at least this employee's grade
                If .Cells(j, 2).Value >= actualgrade Then
                    numbers = numbers & " " & .Cells(j, 1).Value
                End If
            Next j
            .Cells(i, 3).Value = numbers
        Next i
    End With
End Sub
produces this result:
Emp Grade Supr
A 14 A G
B 12 A B C D R F G
C 13 A C F G
D 12 A B C D R F G
R 12 A B C D R F G
F 13 A C F G
G 14 A G
Feel free to change it like you need it
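If the goal is an actual dropdown rather than a text list in the cell, the same loop could feed a data validation list instead. This is a minimal sketch under assumptions: the macro name is invented, the layout is the one above (A = Emp, B = Grade, C = Supervisor on "Sheet1"), and a literal list passed as Formula1 is limited in length and may depend on the regional list separator, so it only suits short lists.

```vba
'Sketch: gives each Supervisor cell a dropdown listing the employees
'whose grade is equal to or higher than that row's grade.
Sub AddSupervisorDropdowns()
    Dim ws As Worksheet
    Dim lastRow As Long, i As Long, j As Long
    Dim names As String

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For i = 2 To lastRow 'row 1 = headers
        names = ""
        For j = 2 To lastRow
            If ws.Cells(j, 2).Value >= ws.Cells(i, 2).Value Then
                If Len(names) > 0 Then names = names & ","
                names = names & ws.Cells(j, 1).Value
            End If
        Next j

        'Replace any existing validation on the cell with the new list
        With ws.Cells(i, 3).Validation
            .Delete
            .Add Type:=xlValidateList, AlertStyle:=xlValidAlertStop, Formula1:=names
        End With
    Next i
End Sub
```

The macro has to be rerun whenever grades change. As written, the list always includes the employee's own name, because that row also passes the grade comparison.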

Spreadsheet/Excel array functions that compare and validate values

I got this dataset
ID fruit price
1 apple 10
2 apple 50
3 apple 100
4 banana 10
5 banana 20
6 banana 50
and would like a (set of) formula(s) that go through the rows and output the row for each fruit that has the highest price.
In e.g. PHP I would do something like this
foreach $array as $row{
if in_array( $row[fruit] ){
/* check if current $row[price] for current $row[fruit] is larger than existing post. If yes replace */
}
}
How would I do that in Google Spreadsheets / Excel?
You can do it in Excel with array formulas (so you enter it with Ctrl+Shift+Enter)...
If your fruit is in B and price in C your array formula in D2 would be
=C2=MAX(IF($B$2:$B$7=B2,$C$2:$C$7,0))
This will give you TRUE or FALSE for whether that row has the highest price for that fruit.
It works by doing an IF on the array of fruits (rows 2 to 7 - you can make it longer) being the same as the current fruit - if it is the same, return the price, otherwise 0. We then get the MAX and compare it to the current row's price.
Good luck!
I've put together a quick VBA macro that you could use in Excel that will output the fruit with the highest price.
The macro converts the table of fruit to an array and then loops through the array to find the fruit with the highest value, before outputting it to the sheet. This macro relies on the table of fruit being positioned in columns A to C.
Sub getMaxPriceFruit()
    'put data table into an array
    Dim dataTableArray() As Variant
    dataTableArray = Range("A2:C" & Cells(Rows.Count, "A").End(xlUp).Row).Value
    'loop through the array looking for the largest value
    'copy the whole row into maxArray whenever a larger price is found
    Dim maxArray(1 To 1, 1 To 3) As Variant
    maxArray(1, 1) = 0
    maxArray(1, 2) = ""
    maxArray(1, 3) = 0
    Dim i As Long
    For i = 1 To UBound(dataTableArray)
        If dataTableArray(i, 3) > maxArray(1, 3) Then
            maxArray(1, 1) = dataTableArray(i, 1)
            maxArray(1, 2) = dataTableArray(i, 2)
            maxArray(1, 3) = dataTableArray(i, 3)
        End If
    Next i
    'output the fruit with the max value
    Range("F2").Value = maxArray(1, 1)
    Range("G2").Value = maxArray(1, 2)
    Range("H2").Value = maxArray(1, 3)
End Sub
The limitation of this script is that if there are two fruit with an equal max value, the first fruit in the list with that value will be selected as the winner. If you would like the additional code to output multiple fruits if they have the same max value I can provide, but put simply you could utilise the maxArray array to capture all of the top ranking fruits and then loop through this array to output them all in one go.
Hope that helps!
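The macro above reports a single overall maximum, while the question asks for the top row of each fruit. A rough sketch of a per-fruit variant follows. It is not the answer author's code: the name getMaxPricePerFruit is invented, it assumes the same A to C layout with output starting at F2, and it relies on Scripting.Dictionary (Windows Excel). On ties, the first row with the highest price wins.

```vba
'Sketch: writes, for each fruit, the row with the highest price to columns F:H.
Sub getMaxPricePerFruit()
    Dim data As Variant
    Dim best As Object 'fruit name -> index of its best row in data
    Dim fruit As Variant
    Dim i As Long, outRow As Long

    data = Range("A2:C" & Cells(Rows.Count, "A").End(xlUp).Row).Value
    Set best = CreateObject("Scripting.Dictionary")

    For i = 1 To UBound(data, 1)
        fruit = data(i, 2)
        If Not best.Exists(fruit) Then
            best.Add fruit, i
        ElseIf data(i, 3) > data(best(fruit), 3) Then
            best(fruit) = i
        End If
    Next i

    'One output row per fruit: ID, fruit, price
    outRow = 2
    For Each fruit In best.Keys
        i = best(fruit)
        Range("F" & outRow).Resize(1, 3).Value = Array(data(i, 1), data(i, 2), data(i, 3))
        outRow = outRow + 1
    Next fruit
End Sub
```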
A pivot table with fruit for Rows and price for Values (Summarise by: MAX) may serve.

Auto calculate average over varying number values row by row

I have an Excel file with several columns in it and many rows. One column, say A has ID numbers. Another column, say G has prices. Column A has repeating ID numbers, however not all numbers repeat the same amount of times. Sometimes just once, other times 2, 3 or several times. Each column G for that row has a unique price.
Basically, I need to average those prices for a given ID in column A. If each ID was repeated the same number of times, this would be quite simple, but because they are not I have to manually do my average calculation for each grouping. Since my spreadsheet has many many rows, this is taking forever.
Here is an example (column H is the average that I am currently calculating manually):
A ... G H
1 1234 3.00 3.50
2 1234 4.00
3 3456 2.25 3.98
4 3456 4.54
5 3456 5.15
11 8890 0.70 0.95
13 8890 1.20
...
So in the above example, the average price for ID# 1234 would be 3.50. Likewise, the average price for ID# 3456 would be 3.98 and for #8890 would be 0.95.
NOTICE how rows are missing between row 5 and 11, and row 12 is missing too? That is because they are filtered out for some other reason. I need to exclude those hidden rows from my calculations and only calculate the average for the rows visible.
I'm trying to write a VBA script that will automatically calculate this, then print that average value for each ID in column H.
Here is some code I have considered:
Sub calcAvg()
Dim rng As Range
Set rng = Range("sheet1!A1:A200003")
For Each Val In rng
Count = 0
V = Val.Value '''V is set equal to the value within the range
If Val.Value = V Then
Sum = Sum + G.Value
V = rng.Offset(1, 0) '''go to next row
Count = Count + 1
Else
'''V = Val.Value '''set value in this cell equal to the value in the next cell down.
avg = Sum / Count
H = avg '''Column G gets the avg value.
End If
Next Val
End Sub
I know there are some problems with the above code. I'm not too familiar with VBA. Also, this would print the average on the same line every time. I'm not sure how to iterate over the rows.
This seems overly complicated. It's a simple problem in theory, but the missing rows and the differing number of ID# repetitions make it more complex.
If this can be done in an Excel function, that would be even better.
Any thoughts or suggestions would be greatly appreciated. thanks.
If you can add another row to the top of your data (put column headers in it), it's quite simple with a formula.
Formula for C2 is
=IF(A2<>A1,AVERAGEIFS(B:B,A:A,A2),"")
copy this down for all data rows.
This applies to Excel 2007 or later. AVERAGEIFS and AVERAGEIF are both unavailable in Excel 2003 and earlier; there, divide SUMIF by COUNTIF instead, e.g. SUMIF(A:A,A2,B:B)/COUNTIF(A:A,A2), adjusting ranges accordingly.
If you can't add a header row, change the first formula (cell C1) to
=AVERAGEIFS(B:B,A:A,A1)
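My approach:
Sub calcAvg()
    Dim x As Long, y As Long, y2 As Long, i As Long, t As Long
    Dim Count As Double 'running sum of prices for the current ID
    Dim Mount As Long 'number of rows seen for the current ID
    Dim Seek0 As String
    x = 1 '--> means Col A
    y = 1 '--> means start - Row 1
    y2 = 7 '--> means end - Row 7
    For i = y To y2
        If i = y Then
            Seek0 = CStr(Cells(i, x).Value)
            t = i
            Count = Cells(i, x + 6).Value
            Mount = 1
        ElseIf CStr(Cells(i, x).Value) <> Seek0 Then
            'ID changed: write the average next to the first row of the previous group
            Cells(t, x + 7).Value = Count / Mount
            Count = Cells(i, x + 6).Value
            Mount = 1
            t = i
            Seek0 = CStr(Cells(i, x).Value)
        Else
            Count = Count + Cells(i, x + 6).Value
            Mount = Mount + 1
        End If
    Next
    'Write the average for the last group, which the loop itself never closes
    Cells(t, x + 7).Value = Count / Mount
End Sub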
Hope this helps ..
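Neither answer above excludes the filtered-out rows that the question mentions. A hedged sketch of one way to do that in VBA follows. The name calcAvgVisible is invented, the IDs are assumed to be in column A and numeric prices in column G of the active sheet with no header row, and Scripting.Dictionary is assumed to be available (Windows Excel).

```vba
'Sketch: averages column G per ID in column A over visible rows only,
'and writes each average to column H on the first visible row of that ID.
Sub calcAvgVisible()
    Dim ws As Worksheet
    Dim sums As Object, counts As Object, firstRow As Object
    Dim id As Variant
    Dim lastRow As Long, i As Long

    Set ws = ActiveSheet
    Set sums = CreateObject("Scripting.Dictionary")
    Set counts = CreateObject("Scripting.Dictionary")
    Set firstRow = CreateObject("Scripting.Dictionary")

    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For i = 1 To lastRow
        'Rows hidden by a filter (or hidden manually) report Hidden = True
        If Not ws.Rows(i).Hidden Then
            id = CStr(ws.Cells(i, "A").Value)
            If Not sums.Exists(id) Then
                sums.Add id, 0#
                counts.Add id, 0
                firstRow.Add id, i
            End If
            sums(id) = sums(id) + ws.Cells(i, "G").Value
            counts(id) = counts(id) + 1
        End If
    Next i

    For Each id In sums.Keys
        ws.Cells(firstRow(id), "H").Value = sums(id) / counts(id)
    Next id
End Sub
```

Because the totals are keyed by ID, the rows for one ID do not have to be adjacent. The macro needs to be rerun after the filter changes.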

Updating column values on specific Row using VBA

How to update value from xt to xtt in 6th column, first row.
1 2 3 4 5 6
x xx xy xz x1 xt
y yx tt cc z3 xcc
Based on above data, I am getting range from worksheet. After getting the Row object, how do I update the Cell value in particular column?
As asked, you can update a specific column using the method:
'Sheet.Cells(row, column) = value
' i.e.
ActiveSheet.Cells(1, 6) = "xtt"
If you only want to perform the update if it has a value of "xt", then obviously you'd need to check the contents before performing the update... For example:
If (ActiveSheet.Cells(1, 6) = "xt") Then
ActiveSheet.Cells(1, 6) = "xtt"
End If
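If you already hold the row as a Range object, as the question describes, Cells can be called on that range and is then indexed relative to it. A minimal sketch, where the variable name rowRange and the choice of row 1 on the active sheet are assumptions for illustration:

```vba
'Sketch: updates column 6 of a row that is held as a Range object.
Sub UpdateCellInRow()
    Dim rowRange As Range

    'Hypothetical: the "Row object" is taken to be a Range covering row 1
    Set rowRange = ActiveSheet.Rows(1)

    'Cells(1, 6) here means first row, sixth column within rowRange
    If rowRange.Cells(1, 6).Value = "xt" Then
        rowRange.Cells(1, 6).Value = "xtt"
    End If
End Sub
```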
