Excel remove duplicates based on 2 columns case-sensitive

Excel remove duplicates based on 2 columns case-sensitive - excel

I need to remove duplicates from an Excel worksheet based on the values in 2 columns while taking case into account.
In the example below, Rows 1 and 2 are duplicates (Row 2 should be removed). Row 3, 4, and 5 are unique.
Row
Column A
Column B
1
Abc
Def
2
Abc
Def
3
ABC
DEF
4
ABC
DeF
5
Abc
DeF
I've done this with other datasets using Data > Remove duplicates, but since it is case-insensitive, it won't work for this.
I also found this question, which is very similar, but only identifies duplicates based on 1 column.
(How to remove duplicates that are case SENSITIVE in Excel (for 100k records or more)?)

Try this code:
Sub SubRemoveDuplicates()
'Declarations.
Dim RngData As Range
Dim RngDataToBeCompared
Dim RngCell As Range
Dim RngRow As Range
'Settings.
Set RngData = Range("A1:C6")
Set RngDataToBeCompared = Range("B2:C6")
'Covering each row of the data to be compared.
For Each RngRow In RngDataToBeCompared.Rows
CP_Rerun_For:
'Covering each cell of the given row.
For Each RngCell In RngRow.Cells
'Checking if any cell is different from the one under it.
If RngCell.Value <> RngCell.Offset(1, 0).Value Then
'If said cell has been found, skip to the next row.
GoTo CP_Next_Row
End If
Next
'Checking if the range to be targeted is within RngData.
If Not Intersect(RngRow.Offset(1, 0).EntireRow, RngData) Is Nothing Then
'Deleting the row of duplicates.
Intersect(RngRow.Offset(1, 0).EntireRow, RngData).Delete (xlShiftUp)
'Rerunning this cycle for the given row in order to catch duplicates that comes in more than 2.
GoTo CP_Rerun_For
End If
CP_Next_Row:
Next
End Sub
Note: if you are going to cover an entire column with presumably many empty cells, the macro will cover (and eventually delete) all those empty cells too. The macro can be modified so it will stop when it encounters and empty row, or to dynamically determinate the appropriate range to be covered. Otherwise it might take more time than necessary.

I don't like using macros until it's last hope.
For your situation, I would suggest adding new columns, and with function
=lower(Column A), etc. get values of column A in lower case. Then I would add one more new column and do the same for Column B.
And after that, I would use Data/Remove Duplicates (converting range to Table format first). And then I would delete unnecessary columns which were added for converting everything to lowercase.

Use this frmula to manually delete. It combines two columns on one row and compares them with the column above.
=B2&C2=B1&C1
You can then edit or filter on Col D and delete.

Related

Create list of unique values from a range in Excel

I have data (names) spread out in a sheet in Excel. It is in columns A through M.
Some rows have data in every column (A-M), some rows have nothing.
I want to extract a list of every unique value (or even a list of every value, and then I can remove duplicates) in the range.
For the life of me, I cannot figure out how to do this and haven't found an example online that pulls unique values from a RANGE, and not a single column.
Any ideas? Let me know if you need a screenshot of the data for more context.

Give this macro a try:
Sub MAINevent()
Dim it As Range, r As Range, x0
With CreateObject("scripting.dictionary")
For Each it In Range("A:M").SpecialCells(2)
x0 = .Item(it.Value)
Next
Set r = Cells(1, "N").Resize(.Count, 1)
r.Value = Application.Transpose(.Keys)
End With
End Sub

You can use Go to Special (Ctrl+G -> Special) to select blanks and remove them (Ctrl+- with cells selected). Then use the formula described here to stack all your columns in one, on top of each other. Then replace formulas with values and finally, use Remove Duplicates.

VBA Code - Copying cells from one sheet to another sheet in excel depending on matching condition

I have two sheets Sheet1 and Sheet2 in excel .Sheet 1 has Columns C1,C2,C3,C4,C5.Sheet 2 has Columns C1,C2,C3.Now I have to perform 3 operations.
1.) Delete all rows in Sheet1 where values of Column 1 is not found in Column 1 of Sheet2.
2.) Replace values of C2,C3 of Sheet 2 into C2,C3 of Sheet 1 (C4,C5 remains the same) where value of C1 of Sheet 1 matches C1 of Sheet2.
3.) Append C1,C2,C3 data of Sheet 2 into Sheet 1 where values of C1 in Sheet 2 is not found in C1 of Sheet1(C4,C5 will be blank).
I am able to write the VB code for operation 1.Please help me with operation 2 and operation 3.
Sub delete_selected_rows()
Dim rng1 As Range, rng2 As Range, rngToDel As Range, c As Range
Dim lastRow As Long
With Worksheets("Sheet1")
lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Set rng1 = .Range("A2:A" & lastRow)
End With
Set rng2 = Worksheets("Sheet2").Range("A:A")
For Each c In rng1
If IsError(Application.Match(c.Value, rng2, 0)) Then
'delete incidents which are not in process
If rngToDel Is Nothing Then
Set rngToDel = c
Else
Set rngToDel = Union(rngToDel, c)
End If
End If
Next c
If Not rngToDel Is Nothing Then rngToDel.EntireRow.Delete
End Sub

First, let my make two comments regarding your solution for step 1.
You could delete the rows on the spot if you stepped through the rows from the bottom to the top. (The row indexes only change for rows below the deleted one.)
IMO, it would be better to use rng2.Find instead of Applications.Match. This method of the range object returns the cell where a match is found and Nothing in case there is no match.
Now to step 2:
Using the notation as in your solution to step 1, you can get the row of the match in sheet2 using rng2.Find(c.Value).EntireRow. Then you can use its Cells property to get the second and third column.
Step 3:
You already know from step 1 how to find out that a row does not have a match in the other sheet. You just have to copy the values from the first three columns into the row after the last row of sheet1 for each row without a match in sheet1. (Best save the last row and then increment with each copied row.)
Above, I gave a simple solution to each step. However, if you have very large tables, this might be a bit slow. Basically, with this solution you are cycling through the rows three times, at each row, querying the worksheet for a value and searching all rows in the other table for a match.
An approach with better performance would be to load the entire ranges into two dimensional arrays using the Value property of the ranges, or better Value2. Then you can do something like a merge join to solve your three problems, i.e. you could sort both arrays by the first column using your favorite n*log(n) sorting algorithm and then step through both list in ascending order. While stepping through, you can save which rows to delete, update the appropriate rows, and append the additional rows. To enable this, you should keep track of the original rows while sorting the arrays. Finally, you would go through the rows marked for deletion from bottom to top and delete them. (You cannot delete right away since this messes up the row indexes for the rows below the deleted one.)
Yet another approach to implement your three steps, which combines ease of use and performance, is to write SQL queries against your sheets via ADODB.
Your first step would look something like this.
DELETE [sheet1$] WHERE [sheet1$].'header of first column' NOT IN (SELECT [sheet2$].'header of first column')
Step 2 would be an update statement and step 3 an insert into statement.

Comparing multiple columns in Excel and remove dups

I have 3 columns in Excel 2010 with email addresses, I need to be able to narrow all 3 columns to only have unique values. I don't necessarily need to merge the remaining values into a single column, but I definitely need to eliminate duplicates. I found another post that had a VB with it, but it didn't seem to work. It removed only a few duplicates:
Sub removeDuplicates()
Dim lastCol As Integer
lastCol = 5 'col 5 is column E
Dim wks As Worksheet
Set wks = Worksheets("Sheet1")
Dim searchRange As Range
Set searchRange = wks.Range("A1:A" & wks.Cells(Rows.Count, "A").End(xlUp).Row)
Dim compareArray As Variant
Dim searchArray As Variant
'Get all values from Col A to search against
compareArray = searchRange.Value
For col = lastCol - 1 To 1 Step -1
'Set values to search for matches
searchArray = searchRange.Offset(0, col - 1).Value
'Set values to last column to compare against
compareArray = searchRange.Offset(0, col).Value
For i = 1 To UBound(compareArray)
If compareArray(i, 1) = searchArray(i, 1) Then
'Match found, delete and shift left
Cells(i, col).Delete Shift:=xlToLeft
End If
Next i
Next col
End Sub
Thanks!

Here is how I would propose doing this if it is a one-off task that you don't have to do very often.
Rather than typing out the entire process in detail, I have done a screencast of how I did this (and the entire process barely took me a minute to do).
The quick overview:
You will need to add a few temporary helper columns for unique values from each email list (one for each list), a 'merged list' column and then a final column. Filter for the unique emails using the 'Advanced' filter option one column at a time. Paste those values into the temporary column for that email list and then clear the filter. Repeat until you have gone through each column and each temporary column has the unique values in it from each list. Once you have the uniques from each list, paste these one at a time into the 'merged list' column (stacking the results in one long list) and then do a unique filter on that. Copy/paste the uniques from that list into your final column, clear the filter, and you're done.
Screencast is below:
http://screencast.com/t/zL8VmUut
Cheers!

Since the first column are the ones you already contacted, swap the first column with the second and on the 3rd write a YES or NO value if email was found on the second column (the ones you already contacted).
Formula.
=IF(ISERROR(VLOOKUP(A2,$B$2:$B$11,1,FALSE)),"Not Contacted","Yes")
As you can see, the one with Yes status is on the contacted list, you just filter the Not Contacted and you will have a new pending list in column A.
Simple.

Removing duplicates from large sheet

I want to remove rows based on duplicate cells in a column from large sheet, without leaving duplicate sample (like "Remove Duplicates" Excel command does). So if I have:
1
2
2
3
I want, as a result:
1
3
This can be accomplished with conditional formatting, then filtering or sorting duplicates and deleting filtered data, but the process is very slow for large sheet.
Conditional formatting takes second, but just clicking on filter takes around 5min to display filter context menu and additional 20-30min to do actual filtering based on color. I tried this process on a different PCs with 4 cores and plenty of RAM and 100.000 rows sheet
I then thought to write VBA, iterate column cells and if cell is colored, then delete entire row (this is possible in Excel 2010, with Cells().DisplayFormat) but processing takes even more time.
Can someone suggest a faster way to remove duplicates on large sheet?

EDIT: Note that I have used 2 functions. Of this, test is a function to test whether the function works (which you will have to modify per your scenario).
Also, I filled cell A1 to A100000 with test values. Please modify it per your needs.
Option Explicit
Function GetUniqueItems(ByVal src As Range) As Variant
Dim returnValue
Dim dictOfItemsWith1Value
Dim dictOfItemsWithMoreThan1Value
Dim countOfCells As Long
Dim counter As Long
Dim srcValues As Variant
Dim currentValue
Dim cell As Range
srcValues = src.Value
countOfCells = src.Cells.Count
Set dictOfItemsWith1Value = CreateObject("Scripting.Dictionary")
Set dictOfItemsWithMoreThan1Value = CreateObject("Scripting.Dictionary")
For counter = 1 To countOfCells
currentValue = srcValues(counter, 1)
If dictOfItemsWithMoreThan1Value.exists(currentValue) Then
dictOfItemsWithMoreThan1Value(currentValue) = dictOfItemsWithMoreThan1Value(currentValue) + 1
Else
If Not dictOfItemsWith1Value.exists(currentValue) Then
dictOfItemsWith1Value.Add currentValue, 1
Else
dictOfItemsWith1Value.Remove currentValue
dictOfItemsWithMoreThan1Value.Add currentValue, 1
End If
End If
Next
ReDim returnValue(1 To dictOfItemsWith1Value.Count, 1 To 1)
Dim key
counter = 1
For Each key In dictOfItemsWith1Value.keys
returnValue(counter, 1) = key
counter = counter + 1
Next
GetUniqueItems = returnValue
End Function
Sub test()
Debug.Print Now
Dim uniqueValues
uniqueValues = GetUniqueItems(Range("A1:A100000"))
Range("A1:A100000").ClearContents
Range("A1").Resize(UBound(uniqueValues, 1)) = uniqueValues
Debug.Print Now
End Sub

My way to deal with large excel files where I have to remove large chunks of data:
After the last column, use a countif() (much like KazJaw and DanM's countif)
=COUNTIF($A$1:$A$100000,A1)
$A$1:$A$100000 contains your ids. Change accordingly.
Drag the formula to the bottom (Fill Down, or select the range $B$1:$B$100000 if this is the column you put the helper column then Ctrl+D)
Copy column and paste values in place to remove the formula. This will prevent any recalculations during/after any filtering.
Sort by the column with the counts. This makes deleting the large portion of rows much faster later on.
Look for where you start to get counts of 2 and delete all rows till bottom.
Delete the helper column.
Now, if you want to restore the original order, put yet another column after the count, after step 3 above, and after step 5, sort this new column by ascending order before deleting it in step 6.

If you data is in located in column A, this formula should do what you need fairly efficiently:
=COUNTIF(A$1:A$100000,A1)
This formula counts how many times the value in A1 appears in the range A1:A100000. (The dollar signs keep that range from moving down as your drag your formula down.)
Place this in B1 and drag down* to B100000 (assuming you have 100,000 rows).
Then just do a filter on column B to show only 1. (More than 1 means you have duplicates and shouldn't show it.)
*A short cut for dragging down is to just select B1, then press Ctrl-End, then hold down shift and click B100000. Then do Ctrl-D (which is a shortcut for Fill Down).

Aggregate words ending with a set of patterns into a new sheet

I have Excel 2010 and I am trying to consolidate a list of words.
Sheet1 contains a column with a list of words such as 'ing', 'ed', 'en'....
Sheet2 contains a column with words such as 'Flying', 'Opening', 'Taken', 'Baked', 'Awaken' etc.
Sheet3, for every column value in Sheet1, I want a function that compares the word with all the words from Sheet2 and collects them together in sheet3.
So, Sheet3 should have 3 columns after running this
first column should have 'Flying', 'Opening' - all words ending with 'ing'
second column should have 'Baked' - all words ending with 'ed'
third column should have 'Taken', "Awaken' - all words ending with 'en'
I tried using the formula below for one of the words and it seems to work for one word at a time but I am not sure how to automate this - do I need to use a macro for this ?
=IF(RIGHT(A1:A7, 3)="ing", A1:A7, "")
Can anyone please suggest how to approach this ?
Regards,
Sonu

Ignore my request for clarification - I have re-read and now understand your question.
The following code will work for what you describe. It assumes the following:
The list of word endings in Sheet1 and the list of words in sheet2
are both contiguous lists starting from cell A1 and extending
downwards.
Cell A1 in Sheet2 contains a column heading, e.g. "My Words" which will not be copied to the columns of words in Sheet3. If you want the heading copied, remove .Offset(1,0) from line 13.
There are no more entries in the list on Sheet1 than there are columns available to you in your version of Excel (e.g. XL2003=256)
Sheet3 is empty prior to the macro starting
You have not changed the default VBA codenames of the 3 worksheet
objects
The sheets are not protected
.
Sub extract()
Dim rFilt As Range
Dim rSrc As Range
Dim iTgtCol As Integer
If Sheet2.FilterMode Then Sheet2.ShowAllData
If Not Sheet2.AutoFilterMode Then Sheet2.Cells(1).CurrentRegion.AutoFilter
Set rSrc = Sheet2.Cells(1).CurrentRegion.Columns(1)
iTgtCol = 0
For Each rFilt In Sheet1.Cells(1).CurrentRegion.Columns(1).Cells
iTgtCol = iTgtCol + 1
With rSrc
.AutoFilter field:=1, Criteria1:="=*" & rFilt
.Offset(1, 0).SpecialCells(xlCellTypeVisible).Copy Sheet3.Cells(iTgtCol)
End With
Next rFilt
Sheet2.ShowAllData
End Sub

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string