Delete entire row based on duplicates in Column Y and keep the last record - excel

I am working with a dataset that is refreshed whenever a SharePoint survey is completed; the responses to that survey are then exported to a table in Excel. I want to delete entire rows when the ZIP code (a string) of the facility reviewed (Column Y) is duplicated, keeping only the most recent survey response, i.e. the one in the higher-numbered row.
For example, row 38 contains a survey response with a ZIP code string of "33138". Row 52 (a survey completed more recently) was also completed for ZIP code "33138". I want to delete row 38 and retain row 52.
Looking for a VBA solution.
@BigBen I've tried this code, which I found on a few discussion boards. Also note that I plan to run this from a button on the "Dashboard" tab against records on the "data" tab.
Sub deduplicate()
Dim Rng As Range, Dn As Range, n As Long
Dim Lst As Long, nRng As Range
Lst = Range("Y" & Rows.Count).End(xlUp).Row
With CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
For n = Lst To 1 Step -1
If Not .Exists(Range("Y" & n).Value) Then
.Add Range("Y" & n).Value, Nothing
Else
If nRng Is Nothing Then
Set nRng = Range("Y" & n)
Else
Set nRng = Union(nRng, Range("Y" & n))
End If
End If
Next n
If Not nRng Is Nothing Then nRng.EntireRow.Delete
End With
End Sub
@BigBen, as part of a longer script, I also tried the following code. It partly worked, but it only removed the first instance of a duplicate rather than all duplicate rows.
Worksheets("Data").Activate
Dim lrow As Long
For lrow = Cells(Rows.Count, "Y").End(xlUp).Row To 2 Step -1
If Cells(lrow, "Y") = Cells(lrow, "Y").Offset(-1, 0) Then
Cells(lrow, "Y").Offset(-1, 0).EntireRow.Delete
End If
Next lrow

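Based on your comment that the data is in a table (ListObject), something like this could work. It loops from the first to the last row, deleting the current row if a CountIf on the column, using the current row's value, is greater than 1. Because earlier occurrences are deleted first, the last occurrence of each value is the one that survives.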
Sub DedupeZipCodes()
Dim tbl As ListObject: Set tbl = ThisWorkbook.Sheets("Data").ListObjects("Table1")
Dim zipCol As ListColumn: Set zipCol = tbl.ListColumns("Zip Code")
Dim currentRow As Long, lastRow As Long
With zipCol
currentRow = 1
lastRow = .DataBodyRange.Rows.Count
Do While currentRow < lastRow
If Application.CountIf(.DataBodyRange, .DataBodyRange(currentRow).Value) > 1 Then
.DataBodyRange(currentRow).EntireRow.Delete
lastRow = .DataBodyRange.Rows.Count
Else
currentRow = currentRow + 1
End If
Loop
End With
End Sub
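One likely reason the dictionary macro in the question misbehaves when launched from a button on the "Dashboard" tab is that its unqualified Range calls resolve against the active sheet (when the code sits in a standard module), not against the data sheet. The sketch below is one way to qualify every reference and keep the last occurrence. The sheet name "Data", the header in row 1, and the macro name are assumptions, not details confirmed by the question.

```vba
Sub DedupeKeepLastQualified()
    ' Assumptions (hypothetical): sheet is named "Data", ZIP codes are in
    ' column Y, row 1 holds headers, and no cell in column Y holds an error value.
    Dim ws As Worksheet
    Dim seen As Object
    Dim lastRow As Long, n As Long
    Dim zip As String

    Set ws = ThisWorkbook.Worksheets("Data")
    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare

    lastRow = ws.Cells(ws.Rows.Count, "Y").End(xlUp).Row

    ' Walk upward: the first time a ZIP is met is its lowest (most recent) row,
    ' so any later match found higher up the sheet is an older duplicate.
    For n = lastRow To 2 Step -1
        zip = CStr(ws.Cells(n, "Y").Value)
        If Len(zip) > 0 Then
            If seen.Exists(zip) Then
                ws.Rows(n).Delete      ' rows above n are unaffected by this delete
            Else
                seen.Add zip, True
            End If
        End If
    Next n
End Sub
```

Rows are deleted one at a time here rather than collected with Union, since deleting a non-contiguous set of entire rows can fail when the rows belong to a table.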

Related

VBA - Remove cell that contains word from same column

I've seen similar posts out there, but none quite the same, and I'm confused by the results I'm getting.
I essentially need to de-dupe a column on LIKE words, so it should be fairly straightforward, but apparently it's not as easy as I thought.
I have a dataset like so...
When I run my macro it removes rows (as I intended), but it either doesn't remove all of the rows or removes the wrong ones.
It actually removes the highlighted (yellow) rows.
I expected it to remove something like the bottom rows instead: it would keep "aerospace" but remove "aerospace 2019", since the 2019 is redundant and not applicable to me.
My macro is simple, but I thought it would do the trick. What am I doing wrong?
Sub container()
Dim ws As Worksheet, rw As Long, col As Long, i As Long
Set ws = ActiveSheet 'or whatever
i = 2
'For col = 2 To 5 'placeholder in case multiple columns are needed - remove Set col above
For rw = 2 To ws.Cells(Rows.Count, 1).End(xlUp).Row 'from row 1 til last non-empty row
v = ws.Cells(rw, 2).Value 'set range
If Cells(i, 2).Value Like v Then 'determine if the cell contains the value of the word
Cells(i, 2).EntireRow.Delete 'delete
i = i + 1
End If
Next rw
'Next col
End Sub
After Ron's post I was able to create the code below, but it appears I'm still stuck. I think I've just been looking at this too long.
Sub container()
Dim ws As Worksheet, rng As Range, i As Long, rw As Long
Set ws = ActiveSheet 'or whatever
Set rng = ws.Range("B2:B" & ws.Cells(ws.Rows.Count, "B").End(xlUp).Row) 'set array range
i = Range("B" & Rows.Count).End(xlUp).Row
For rw = ws.Cells(Rows.Count, 1).End(xlDown).Row To 2
v = ws.Cells(rw, 2).Value
If InStr(1, v, rng) > 0 Then
cell.EntireRow.Delete
i = i - 1
End If
Next rw
End Sub
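A minimal sketch of one way to get the behaviour described (keep "aerospace", drop "aerospace 2019") follows. It treats a row as redundant when its value in column B contains some other, shorter value from the same column. The header in row 1, the use of column B, and the macro name are assumptions; exact duplicates are deliberately left alone.

```vba
Sub RemoveRowsContainingOtherEntries()
    ' Assumptions (hypothetical): active sheet, header in row 1, words in
    ' column B, and no error values in that column.
    Dim ws As Worksheet
    Dim vals As Variant
    Dim lastRow As Long, i As Long, j As Long
    Dim a As String, b As String
    Dim redundant As Boolean

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    If lastRow < 3 Then Exit Sub   ' need at least two data rows to compare

    ' Snapshot the values so deletions do not disturb the comparisons
    vals = ws.Range("B2:B" & lastRow).Value

    ' Bottom-up, so array index i always maps to sheet row i + 1
    For i = UBound(vals, 1) To 1 Step -1
        a = CStr(vals(i, 1))
        redundant = False
        For j = 1 To UBound(vals, 1)
            b = CStr(vals(j, 1))
            If j <> i And Len(b) > 0 And Len(b) < Len(a) Then
                ' Case-insensitive "a contains b" test
                If InStr(1, a, b, vbTextCompare) > 0 Then
                    redundant = True
                    Exit For
                End If
            End If
        Next j
        If redundant Then ws.Rows(i + 1).Delete
    Next i
End Sub
```

This compares every pair of values, so it is quadratic in the number of rows; that is usually acceptable for a few thousand rows but would need rethinking for much larger lists.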

How to delete a row if every cell in a range contains the same text

Real project sample here: http://s000.tinyupload.com/?file_id=06911274635715855845
It's all in the title.
Let's say I have a document with ten columns and three hundred rows; A and B contain a number, and C to J can contain many words and sometimes the word "Banana".
I'd like to automate a task that goes line by line through the worksheet and deletes the whole row if every cell between C and J contains "Banana", ignoring A and B.
Usually when I have such a question I submit my ideas, but I'm quite stumped here from the get-go.
Would you be kind enough to help?
Please try the following code. It deletes all rows having the same string in every tested column ("Banana" included). The range D:EA and the count of 128 columns in the code are apparently sized for the real project sample; for the simplified example in the question they would be C:J and 8. It should be very fast, because the deletion is done once, at the end:
Edited:
Since deleting a non-contiguous range of entire rows is not allowed in a worksheet containing tables, I adapted the code to test whether such a table is involved, intersect the collected range to be deleted (its EntireRow) with the table, and delete the intersected table rows.
Please test the updated code:
Sub testDeleteRowsSameWord()
Dim sh As Worksheet, lastRow As Long, i As Long, rngDel As Range
Set sh = ActiveSheet ' use here your necessary sheet
lastRow = sh.Range("C" & Rows.Count).End(xlUp).Row
For i = 1 To lastRow
If WorksheetFunction.CountIf(sh.Range("D" & i & ":EA" & i), _
sh.Range("D" & i).Value) = 128 Then
If rngDel Is Nothing Then
Set rngDel = sh.Range("A" & i)
Else
Set rngDel = Union(rngDel, sh.Range("A" & i))
End If
End If
Next i
If Not rngDel Is Nothing Then
If sh.ListObjects.Count > 0 Then
If sh.ListObjects.Count > 1 Then MsgBox _
"This solution works only for a table...": Exit Sub
Dim Tbl As ListObject, rngInt As Range
Set Tbl = sh.ListObjects(1)
Set rngInt = Intersect(Tbl.Range, rngDel.EntireRow)
If Not rngInt Is Nothing Then 'Intersect returns Nothing when the ranges do not overlap
rngInt.Delete xlUp
Else
rngDel.EntireRow.Delete xlUp
End If
Else
rngDel.EntireRow.Delete xlUp
End If
End If
End Sub
There are countless ways to achieve what you want.
One example could be something like:
Dim i As Integer, j As Integer
Dim mBanana As Boolean
For i = 299 To 0 Step -1 'rows 1 to 300
mBanana = True
For j = 0 To 7 'columns C to J
If Sheets("nameofyoursheet").Range("C1").Offset(i, j).Value <> "Banana" Then
mBanana = False
End If
Next j
If mBanana = True Then
Sheets("nameofyoursheet").Range("C1").Offset(i, 0).EntireRow.Delete 'j is 8 after the loop, so use a fixed column offset
End If
Next i
Note that the numbers of rows and columns are hardcoded in the parameters of the For, you can easily adapt the code.
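As a sketch of how the hardcoded bounds could be avoided for the simplified example (columns C to J, the word "Banana"), the variant below finds the last used row and compares a CountIf against the number of cells tested. The macro name and the choice of column C for finding the last row are assumptions.

```vba
Sub DeleteAllBananaRows()
    ' Assumptions (hypothetical): active sheet, columns C to J are the ones to
    ' test, and column C is filled down to the last data row.
    Const TARGET As String = "Banana"
    Dim ws As Worksheet
    Dim rowCells As Range
    Dim lastRow As Long, r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "C").End(xlUp).Row

    ' Bottom-up so a deletion never shifts a row that is still to be checked
    For r = lastRow To 1 Step -1
        Set rowCells = ws.Range(ws.Cells(r, "C"), ws.Cells(r, "J"))
        ' The row qualifies only when every tested cell equals the target word
        If Application.WorksheetFunction.CountIf(rowCells, TARGET) = rowCells.Cells.Count Then
            ws.Rows(r).Delete
        End If
    Next r
End Sub
```

CountIf matches case-insensitively, so "banana" and "BANANA" would also count; if that matters, a cell-by-cell comparison like the loop above is the safer choice.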

Need help to optimize the Excel VBA code that aggregates duplicates

Below is my source table
Name Sales
---------------------------------
Thomas 100
Jay 200
Thomas 100
Mathew 50
Output I need is as below
Name Sales
---------------------------------
Thomas 200
Jay 200
Mathew 50
Basically, I have 2 columns that can have duplicates and I need to aggregate the second column based on first column.
The current code I have is below. It's working perfectly fine, but it takes around 45 seconds to run for 4500 records. I was wondering if there is a more efficient way to do this, as it seems to be a trivial requirement.
'Combine duplicate rows and sum values
Dim Rng As Range
Dim LngRow As Long, i As Long
LngLastRow = lRow 'The last row is calculated somewhere above...
'Initializing the first row
i = 1
'Looping until blank cell is encountered in first column
While Not Cells(i, 1).Value = ""
'Initializing range object
Set Rng = Cells(i, 1)
'Looping from last row to specified first row
For LngRow = LngLastRow To (i + 1) Step -1
'Checking whether value in the cell is equal to specified cell
If Cells(LngRow, 1).Value = Rng.Value Then
Rng.Offset(0, 1).Value = Rng.Offset(0, 1).Value + Cells(LngRow, 2).Value
Rows(LngRow).Delete
End If
Next LngRow
i = i + 1
Wend
Note that this is part of a larger excel app and hence I definitely need the solution to be in Excel VBA.
Here you go:
Option Explicit
Sub Consolidate()
Dim arrData As Variant
Dim i As Long
Dim Sales As New Scripting.Dictionary 'You will need the library Microsoft Scripting Runtime
Application.ScreenUpdating = False 'speed up the code since excel won't show you what is happening
'First of all, working on arrays always speeds up a lot the code because you are working on memory
'instead of working with the sheets
With ThisWorkbook.Sheets("YourSheet") 'change this
i = .Cells(.Rows.Count, 1).End(xlUp).Row 'last row on column A
arrData = .Range("A2", .Cells(i, 2)).Value 'here im assuming your row 1 has headers and we are storing the data into an array
End With
'Then we create a dictionary with the data
For i = 1 To UBound(arrData) 'loop over every data row stored in the array (sheet rows 2 to last)
If Not Sales.Exists(arrData(i, 1)) Then
Sales.Add arrData(i, 1), arrData(i, 2) 'We add the worker(Key) with his sales(Item)
Else
Sales(arrData(i, 1)) = Sales(arrData(i, 1)) + arrData(i, 2) 'if the worker already exists, sum his sales
End If
Next i
'Now you have all the workers just once
'If you want to delete column A and B and just leave the consolidate data:
With ThisWorkbook.Sheets("YourSheet") 'change this
i = .Cells(.Rows.Count, 1).End(xlUp).Row 'last row on column A
.Range("A2:B" & i).ClearContents
.Cells(2, 1).Resize(Sales.Count) = Application.Transpose(Sales.Keys) 'workers
.Cells(2, 2).Resize(Sales.Count) = Application.Transpose(Sales.Items) 'Their sales
End With
Application.ScreenUpdating = True 'return excel to normal
End Sub
To learn everything about dictionaries (and more) check this
With data in cols A and B like:
Running this short macro:
Sub KopyII()
Dim cell As Range, N As Long
Columns("A:A").Copy Range("C1")
ActiveSheet.Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
N = Cells(Rows.Count, "C").End(xlUp).Row
Range("B1").Copy Range("D1")
Range("D2:D" & N).Formula = "=SUMPRODUCT(--(A:A= C2),(B:B))"
End Sub
will produce this in cols C and D:
NOTE:
This relies on Excel's builtin RemoveDuplicates feature.
EDIT#1:
As chris neilsen points out, this function should be a bit quicker to evaluate:
Sub KopyIII()
Dim cell As Range, N As Long, A As Range, C As Range
Set A = Range("A:A")
Set C = Range("C:C")
A.Copy C
C.RemoveDuplicates Columns:=1, Header:=xlNo
N = Cells(Rows.Count, "C").End(xlUp).Row
Range("B1").Copy Range("D1") ' the header
Range("D2:D" & N).Formula = "=SUMIFS(B:B,A:A,C2)"
End Sub

Deleting Duplicates while ignoring blank cells in VBA

I have some code in VBA that is attempting to delete duplicate transaction IDs. However, I'd like to amend the code to only delete duplicates that have a transaction ID, so if there is no transaction ID, I'd like that row to be left alone. Here is my code below:
With MySheet
newLastRow = .Range("A" & .Rows.Count).End(xlUp).Row
newLastCol = .Cells(5, .Columns.Count).End(xlToLeft).Column
Set Newrange = .Range(.Cells(5, 1), .Cells(newLastRow, newLastCol))
Newrange.RemoveDuplicates Columns:=32, Header:= _
xlYes
End With
I was also wondering: in the RemoveDuplicates call, is there a way to refer to the column I want checked by name rather than by the number 32, in case I add or remove columns at a later date?
Here is an image of the data: I'd like the ExchTransID column that have those 3 blank spaces to be left alone.
Modify and try the below:
Option Explicit
Sub test()
Dim Lastrow As Long, Times As Long, i As Long
Dim rng As Range
Dim str As String
'Indicate the sheet you want to work with
With ThisWorkbook.Worksheets("Sheet1")
'Find the last row with IDs
Lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
'Set the range with all IDS
Set rng = .Range("A1:A" & Lastrow)
'Loop column from bottom to top
For i = Lastrow To 1 Step -1
str = .Range("A" & i).Value
If str <> "" Then
Times = Application.WorksheetFunction.CountIf(rng, str)
If Times > 1 Then
.Rows(i).EntireRow.Delete
End If
End If
Next i
End With
End Sub
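On the second part of the question (referring to the column by name instead of the number 32), one option is to look the header up with Match before deduplicating. The sketch below combines that with the skip-blanks loop. The header text "ExchTransID" comes from the question, and row 5 as the header row is inferred from the question's code; the sheet name "Sheet1" and the macro name are assumptions.

```vba
Sub DedupeByHeaderName()
    ' Assumptions (hypothetical): sheet "Sheet1", headers in row 5,
    ' ID column headed "ExchTransID", no error values in that column.
    Const HEADER_ROW As Long = 5
    Dim ws As Worksheet
    Dim idCol As Variant
    Dim lastRow As Long, i As Long
    Dim id As String
    Dim above As Range

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    ' Find the column by its header text, so inserting columns later is harmless
    idCol = Application.Match("ExchTransID", ws.Rows(HEADER_ROW), 0)
    If IsError(idCol) Then
        MsgBox "Header 'ExchTransID' not found in row " & HEADER_ROW
        Exit Sub
    End If

    lastRow = ws.Cells(ws.Rows.Count, idCol).End(xlUp).Row

    ' Bottom-up: delete a row only if the same non-blank ID also appears above it,
    ' which keeps the first occurrence and leaves blank IDs alone
    For i = lastRow To HEADER_ROW + 1 Step -1
        id = CStr(ws.Cells(i, idCol).Value)
        If Len(id) > 0 Then
            Set above = ws.Range(ws.Cells(HEADER_ROW + 1, idCol), ws.Cells(i, idCol))
            If Application.WorksheetFunction.CountIf(above, id) > 1 Then ws.Rows(i).Delete
        End If
    Next i
End Sub
```

CountIf interprets wildcard characters and comparison operators in its criteria and treats numeric-looking text as numbers, so IDs containing characters such as * or ?, or IDs with leading zeros, would need a dictionary-based comparison instead.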

Deleting duplicates and replacing entries in a row – Excel VBA

In this project, I am looking to delete duplicates based on the ID number by keeping the latest entries. Additionally, I want to keep every cell in Column D and onward from the previous entries. This ultimately means that the latest entries will be replaced in the previous entries’ row. Please see tables below for more clarity:
Based on the example given above, the result I am looking for is to:
Delete duplicates based on the ID from columns A to C and keep the latest entries
Keep Columns D to H from the previous entries
Replace previous entries by the latest ones in the previous entries’ row.
In other words: Update Columns A to C without modifying Columns D to H
So, the initial code that I had was as follows. It only kept the previous entries, along with their columns D to H:
Sub Delete_Duplicates()
Sheet5.Range("$A$1:$H$29999").RemoveDuplicates Columns:=Array(1) _
, Header:=xlYes
End Sub
The table below shows what I would obtain:
The next code I did was to keep the newest entries, but this deletes my entries in column D to H:
Sub Delete_Duplicates_2()
Dim Rng As Range, Dn As Range, n As Long
Dim Lst As Long, nRng As Range
Set Rng = Sheet5.Range("$A$2:$H$29999")
Lst = Range("A" & Rows.Count).End(xlUp).Row
With CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
For n = Lst To 1 Step -1
If Not .Exists(Range("A" & n).Value) Then
.Add Range("A" & n).Value, Nothing
Else
If nRng Is Nothing Then
Set nRng = Range("A" & n)
Else
Set nRng = Union(nRng, Range("A" & n))
End If
End If
Next n
If Not nRng Is Nothing Then nRng.EntireRow.Delete
End With
End Sub
The table below shows what I would obtain:
I am open to any suggestions and thank you for your help!
Try this solution - since you're essentially working with a string in your date column, we have to split out the number and test to see if it's greater or less than the other week's number:
Option Explicit
Sub Delete_Duplicates()
Dim i As Long, j As Long
Dim id As String, weeknum As Long
For i = Cells(Rows.Count, 1).End(xlUp).Row To 2 Step -1
id = Cells(i, 1).Value
weeknum = Split(Cells(i, 3).Value, " ")(1)
For j = i - 1 To 2 Step -1
If Cells(j, 1).Value = id Then
If Split(Cells(j, 3).Value, " ")(1) < weeknum Then
Rows(j).Delete
i = i - 1
Else
Rows(i).Delete
Exit For
End If
End If
Next j
Next i
End Sub
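An alternative sketch, for the case where rows further down the sheet are always the newer entries (an assumption, as are the header in row 1, the use of the active sheet, and the macro name): remember the first row seen for each ID, copy columns A to C of any later duplicate over that first row, then delete the later row. Columns D to H of the earlier row are never touched.

```vba
Sub UpdateAndDedupe()
    ' Assumptions (hypothetical): active sheet, headers in row 1, ID in column A,
    ' rows lower on the sheet are newer, no error values in column A.
    Dim ws As Worksheet
    Dim firstRowOf As Object
    Dim lastRow As Long, r As Long
    Dim id As String

    Set ws = ActiveSheet
    Set firstRowOf = CreateObject("Scripting.Dictionary")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    r = 2
    Do While r <= lastRow
        id = CStr(ws.Cells(r, "A").Value)
        If Len(id) > 0 And firstRowOf.Exists(id) Then
            ' Overwrite A:C of the earlier row; its D:H stay as they were
            ws.Range(ws.Cells(firstRowOf(id), "A"), ws.Cells(firstRowOf(id), "C")).Value = _
                ws.Range(ws.Cells(r, "A"), ws.Cells(r, "C")).Value
            ws.Rows(r).Delete
            lastRow = lastRow - 1   ' rows below moved up, so re-check the same r
        Else
            If Len(id) > 0 Then firstRowOf.Add id, r
            r = r + 1
        End If
    Loop
End Sub
```

Because deleted rows always sit below the stored first rows, the row numbers held in the dictionary stay valid throughout the loop.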
