VBA Remove duplicates taking 30 minutes to run with no idea why - excel

So the code below copies data from two columns in one sheet. Pastes these so that values are displayed in another sheet (because one column is a formula =Left(Column+1,4)) and then attempts to run a remove duplicates across the two columns that are pasted.
This takes roughly 30 minutes to run on what is essentially 100k cells (2 columns of 50k rows each).
This is what I've been using
Sub ProjTrending1()
Dim s1 As Worksheet, s2 As Worksheet
Dim St As Date, Et As Date
Dim Tt As Double
St = Time
Application.ScreenUpdating = False
'Defines S1 as a Worksheet
Set s1 = Sheets("All Data")
'Defines S2 as WorkSheet
Set s2 = Sheets("Workings")
'Defines LastR1
Dim LR1 As Long
Dim LR2 As Long
'Finds last row cell working sheet
LR2 = s1.Cells(Rows.Count, 10).End(xlUp).Row
'Takes Data from Order Column of defined data Sheet and copy & pastes it to Working Sheet Column B
s1.Range("J1:J" & LR2).Copy s2.Range("A1")
s1.Range("e1:e" & LR2).Copy
s2.Range("b1").PasteSpecial Paste:=xlPasteValues
LR1 = s2.Range("A1").CurrentRegion.Rows.Count
'Removes Duplicates from Column B Working sheet
s2.Range("A2:B" & LR1).RemoveDuplicates Columns:=Array(1, 2), Header:=xlNo
'Copies the formula from C2 and applies it to all cells in column C where column A has values (simple concatenate + countifs(B$2:B2,B2)
s2.Range("C2").Copy s2.Range("C2:C" & LR1)
Et = Time
Tt = (Et - St) * 24 * 60 * 60
MsgBox Timetaken
End Sub
I've also tried using a dictionary to do this but I'm new to dictionaries so whilst the code looks good compared to my usual attempts its because its taken from a couple of different sources. (Copied and Pasted the data to sheet2 incase this overwrote the source data)
Sub M_delete_duplicates()
sn = Sheets("Sheet2").Cells(1).CurrentRegion.Resize(, 5)
With CreateObject("scripting.dictionary")
For j = 1 To UBound(sn)
.Item(sn(j, 1)) = Application.Index(sn, j, 0)
Next
Sheets("Sheet2").Cells(1, 4).Resize(.Count, UBound(sn, 2)) = Application.Index(.Items, 0, 0)
End With
End Sub
This is as slow to run and it only does remove duplicates based on single column and I need it to operate on two columns. The potential way around this is to concatenate the two columns of data and run the remove duplicates once and then break the data using =right(Value,X)
If wanted to do it manually it takes 30 seconds max. It makes no sense to me as to why it takes so long to run.
Can anyone help with why this might be taking so long to run? and how I might modify the dictionary code to remove duplicates over two columns?
Thanks in advance

Updated from my comment. This uses a dictionary to track which rows have been added and then copies unique rows across to the destination sheet. You may want to modify it a bit for your use (e.g. update sheet names) Always test this first on a copy of your data set or make a back up before running code
Option Explicit
Public Sub ExampleRemoveDuplicates()
Dim dict As Object
Dim temp As String
Dim calc As String
Dim headers As Variant
Dim NoCol As Long, NoRow As Long, i As Long, j As Long
Dim c, key
With Application
.ScreenUpdating = False
calc = .Calculation
.Calculation = xlCalculationManual
End With
Set dict = CreateObject("Scripting.Dictionary")
' Change this to the sheet that is applicable
With Sheet1
NoCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
' Assumes first row of sheet is headers
headers = .Range(.Cells(1, 1), .Cells(1, NoCol)).Value2
' Change this to destination sheet
With Sheet2
.Cells.Clear
.Range(.Cells(1, 1), .Cells(1, NoCol)).Value2 = headers
End With
For Each c In .Range(.Cells(2, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1))
ReDim arr(1 To NoCol)
j = 1
Do
arr(j) = c.Offset(0, j - 1).Value2
j = j + 1
Loop Until j = NoCol + 1
temp = Join(arr, "//")
If Not dict.exists(temp) And Not temp = vbNullString Then
dict.Add key:=temp, Item:=arr
' Change this to destination sheet
With Sheet2
NoRow = .Cells(.Rows.Count, 1).End(xlUp).Row + 1
.Range(.Cells(NoRow, 1), .Cells(NoRow, NoCol)).Value2 = arr
End With
End If
Next c
End With
i = 1
ReDim Results(1 To dict.Count, 1 To NoCol)
For Each key In dict.keys
For j = 1 To NoCol
Results(i, j) = dict(key)(j)
Next j
i = i + 1
Next key
' Change this to destination sheet
With Sheet2.Cells(1, 1)
.Range(.Offset(1, 0), .Offset(dict.Count, NoCol - 1)) = Results
End With
With Application
.Calculation = calc
.ScreenUpdating = True
End With
End Sub

Related

Excel VBA - add rows in dependence of a value in a cell

I have a table with information in column A and an appropriate value in column B. I want to write a macro that inserts a new row for each "Person" in dependence of the value in column B and copies the original information into that row, which for example means that in the end there are 5 rows with "Person A", 2 rows for "Person B" etc.
original table:
result:
My first approach looks like that. It doesn't work.
Dim i, j, k As Integer
For i = Range("A" & Range("A:A").Rows.Count).End(xlUp).Row To 1 Step -1
For j = 1 To Range("B" & i)
Rows(i).Select
Selection.Insert Shift:=xlDown
k = k + j
Range(Cells(k, 1), Cells(k, 2)).Copy Destination:=Range("A" & i)
Next j
Next i
This would work for you, changing the number of inserts based on value in column B:
Option Explicit
Sub test()
With Sheets(1)
Dim lastRow As Long: lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
Dim i As Long
For i = lastRow To 1 Step -1
If IsNumeric(.Cells(i, 2).Value) = True Then
Dim numberOfInserts As Long
numberOfInserts = .Cells(i, 2).Value - 1
If numberOfInserts > 0 Then
Dim insertCount As Long
For insertCount = 1 To numberOfInserts
.Rows(i).Copy
.Rows(i).Insert
Next insertCount
End If
End If
Next i
End With
End Sub
First we check that you're dealing with numbers. Second you have a single line already, so number -1, then that this number is >0. Lastly, you insert via a loop which does the counting for you.
Test data:
Output after running:
Your index calculation is messed up. Use the debugger, step thru the code (F8) and notice what happens:
a) Your Select/Insert-construct creates a new row above the row you want to copy, not below.
b) Your calculation of index k fails: You are not initializing k, so it starts with value 0. Than you add j (1..3) to k, resulting in values 1, 3, 6, and copy data from that line.
I would suggest you take a different approach: Copy the original data into an array and then loop over that array. This avoids multiple Select, Copy and Insert statements (that are slow) and allow to copy the data from top to bottom.
Sub copy()
Dim rowCount As Long
Dim data As Variant
With ActiveSheet ' Replace with the sheet you want to work with
' Copy the current table into array
rowCount = .Cells(.Rows.Count, 1).End(xlUp).row
data = .Range(.Cells(1, 1), .Cells(rowCount, 2))
Dim oldRow As Long, newRow As Long
newRow = 1
' Loop over old data
For oldRow = 1 To rowCount
Dim repeatCount As Long
repeatCount = Val(data(oldRow, 2)) ' We want to have so many occurrences of the row
if repeatCount <= 0 Then repeatCount=1
Dim col As Long
' Create "repeatCount" rows of data (copy column by column)
For col = 1 To 2
.Cells(newRow, col).Resize(repeatCount, 1) = data(oldRow, col)
Next col
newRow = newRow + repeatCount
Next
End With
End Sub

Find the maximum consecutive repeated value on the bases of two columns

I need the expert help in VBA as I am new. Actually I am looking for Vba code for Consecutive Count on the bases of two column (Serial Number and Alert Code) on button click event. The Column row are not fixed (dynamically change). The Consecutive count is maximum repeat count for Alert Code per Serial number. This should display in output worksheet as per max repeat Alert count per Serial number
Input Worksheet:
Expected Output :
The repeat count work as below pattern from Input sheet (Just for reference only).
Mine source code as below but this does not reference the 1st Column Serial Number (This only work for One column like AlertCode) :
Sub ConsecutiveCount()
Dim lr As Long, c As Range, a As Long
Application.ScreenUpdating = False
lr = Worksheets("Count2").Cells(Rows.Count, 1).End(xlUp).Row
For Each c In Range("B2:B" & lr)
If c.Value <> c.Offset(1).Value Then
a = Cells(c.Row, 3).End(xlUp).Row
' Range(Cells(c.Row, 4), Cells(c.Row, 4).End(xlUp).Offset(1)).Value = c.Row - a
Cells(c.Row, 3).Value = c.Row - a
Else
End If
Next c
Application.ScreenUpdating = True
End Sub
Current Output (Serial number not included)
Screenshot(s) / here(♪) refers:
Named ranges/setup
First, define a couple of named ranges to assist with referencing / formulating in VBA:
Name: range_data: dynamic range that references the two columns of interest (here, col 1&2 in Sheet1):
Refers to: =Sheet1!$D$3:OFFSET(Sheet1!$E$3,COUNTA(Sheet1!$E$3:$E$99995)-1,0,1,1)
Name: range_summary_startcell: a static range that references the desired upper-left cell of the output table / summary.
Refers to: =Sheet1!$G$3
The summary table itself shall comprise a number of rows (depending upon range_data) and 3 columns (given the input/Q) - this will be produced by the macro (code below) and can be seen in screenshot above (G3:I5) - the macro functions shall determine the appropriate dimensions automatically
Code
With these two named ranges (i.e. 'range_data' & 'range_summary_startcell') defined, the following VB code produces the desired output per your Q:
Sub Macro_Summary()
'
'JB_007 07/01/2022
'
'
Application.ScreenUpdating = True
Range("range_summary_startcell").Select
ActiveCell.Formula2R1C1 = "=UNIQUE(range_data)"
ActiveSheet.Calculate
x = ActiveCell.End(xlDown).Row
Set range_count = ActiveCell.Offset(0, 2)
range_count.Select
range_count.Formula2R1C1 = _
"=COUNTIFS(INDEX(range_data,0,2),RC[-1],INDEX(range_data,0,1),RC[-2])"
Selection.AutoFill Destination:=Range(range_count, range_count.Offset(x - range_count.Row))
ActiveSheet.Calculate
End Sub
Caveats: assumes you have Office 365 compatible version of Excel
GIF - Running Macro
Notes (♪) saved as macro-free workbook for your own security if you wish to download underlying workbook - otherwise identical to screenshot(s) in this proposed soln.
Sub ConsecutiveCount()
Dim srcLastRow As Long, cntConsec As Long, i As Long
Dim rng As Range
Dim srcArr() As Variant
Dim srcSht As Worksheet
Dim destsht As Worksheet
Dim destArr() As Variant
Dim combID As String
Dim splitID As Variant
Application.ScreenUpdating = False
Set srcSht = Worksheets("Input")
Set destsht = Worksheets("Output")
With srcSht
srcLastRow = .Cells(.Rows.Count, "A").End(xlUp).Row + 1 ' include 1 blank line
srcArr = .Range(.Cells(2, "A"), .Cells(srcLastRow, "B"))
End With
Dim dict As Object
Dim dKey As Variant
Set dict = CreateObject("Scripting.dictionary")
cntConsec = 0
For i = LBound(srcArr) To UBound(srcArr)
cntConsec = cntConsec + 1
If i <> UBound(srcArr) Then
If srcArr(i, 1) <> srcArr(i + 1, 1) Or srcArr(i, 2) <> srcArr(i + 1, 2) Then
combID = srcArr(i, 1) & "|" & srcArr(i, 2)
If dict.Exists(combID) Then
' check if sum is more
If dict(combID) < cntConsec Then ' If new max for combination
dict(combID) = cntConsec
End If
Else
' add to dictionary
dict(combID) = cntConsec
End If
cntConsec = 0
End If
End If
Next i
ReDim destArr(1 To dict.Count, 1 To 3)
i = 0
For Each dKey In dict.keys
splitID = Split(dKey, "|")
i = i + 1
destArr(i, 1) = splitID(0)
destArr(i, 2) = splitID(1)
destArr(i, 3) = dict(dKey)
Next dKey
destsht.Range("A2").Resize(UBound(destArr), 3).Value = destArr
Application.ScreenUpdating = True
End Sub

Need help to optimize the Excel VBA code that aggregates duplicates

Below is my source table
Name Sales
---------------------------------
Thomas 100
Jay 200
Thomas 100
Mathew 50
Output I need is as below
Name Sales
---------------------------------
Thomas 200
Jay 200
Mathew 50
Basically, I have 2 columns that can have duplicates and I need to aggregate the second column based on first column.
Current code I have is as below. Its working perfectly fine. It takes around 45 seconds to run for 4500 records. I was wondering if there is a more efficient way to do this... as it seems to be a trivial requirement.
'Combine duplicate rows and sum values
Dim Rng As Range
Dim LngRow As Long, i As Long
LngLastRow = lRow 'The last row is calculated somewhere above...
'Initializing the first row
i = 1
'Looping until blank cell is encountered in first column
While Not Cells(i, 1).Value = ""
'Initializing range object
Set Rng = Cells(i, 1)
'Looping from last row to specified first row
For LngRow = LngLastRow To (i + 1) Step -1
'Checking whether value in the cell is equal to specified cell
If Cells(LngRow, 1).Value = Rng.Value Then
Rng.Offset(0, 1).Value = Rng.Offset(0, 1).Value + Cells(LngRow, 2).Value
Rows(LngRow).Delete
End If
Next LngRow
i = i + 1
Wend
Note that this is part of a larger excel app and hence I definitely need the solution to be in Excel VBA.
Here you go:
Option Explicit
Sub Consolidate()
Dim arrData As Variant
Dim i As Long
Dim Sales As New Scripting.Dictionary 'You will need the library Microsoft Scripting Runtime
Application.ScreenUpdating = False 'speed up the code since excel won't show you what is happening
'First of all, working on arrays always speeds up a lot the code because you are working on memory
'instead of working with the sheets
With ThisWorkbook.Sheets("YourSheet") 'change this
i = .Cells(.Rows.Count, 1).End(xlUp).Row 'last row on column A
arrData = .Range("A2", .Cells(i, 2)).Value 'here im assuming your row 1 has headers and we are storing the data into an array
End With
'Then we create a dictionary with the data
For i = 1 To UBound(arrData) 'from row 2 to the last on Q1 (the highest)
If Not Sales.Exists(arrData(i, 1)) Then
Sales.Add arrData(i, 1), arrData(i, 2) 'We add the worker(Key) with his sales(Item)
Else
Sales(arrData(i, 1)) = Sales(arrData(i, 1)) + arrData(i, 2) 'if the worker already exists, sum his sales
End If
Next i
'Now you have all the workers just once
'If you want to delete column A and B and just leave the consolidate data:
With ThisWorkbook.Sheets("YourSheet") 'change this
i = .Cells(.Rows.Count, 1).End(xlUp).Row 'last row on column A
.Range("A2:B" & i).ClearContents
.Cells(2, 1).Resize(Sales.Count) = Application.Transpose(Sales.Keys) 'workers
.Cells(2, 2).Resize(Sales.Count) = Application.Transpose(Sales.Items) 'Their sales
End With
Application.ScreenUpdating = True 'return excel to normal
End Sub
To learn everything about dictionaries (and more) check this
With data in cols A and B like:
Running this short macro:
Sub KopyII()
Dim cell As Range, N As Long
Columns("A:A").Copy Range("C1")
ActiveSheet.Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
N = Cells(Rows.Count, "C").End(xlUp).Row
Range("B1").Copy Range("D1")
Range("D2:D" & N).Formula = "=SUMPRODUCT(--(A:A= C2),(B:B))"
End Sub
will produce this in cols C and D:
NOTE:
This relies on Excel's builtin RemoveDuplicates feature.
EDIT#1:
As chris neilsen points out, this function should be a bit quicker to evaluate:
Sub KopyIII()
Dim cell As Range, N As Long, A As Range, C As Range
Set A = Range("A:A")
Set C = Range("C:C")
A.Copy C
C.RemoveDuplicates Columns:=1, Header:=xlNo
N = Cells(Rows.Count, "C").End(xlUp).Row
Range("B1").Copy Range("D1") ' the header
Range("D2:D" & N).Formula = "=SUMIFS(B:B,A:A,C2)"
End Sub

Delete rows sub very slow to process

I built a macro in Excel that stores input from multiple input tabs into a database (table format). As part of the macro I included a Sub to delete any previous entries for a given year (CYear) before writing new entries for that year.
This was working fine until the size of the workbook increased to about 10MB. The following part of the code now takes >1 hour to run. Is there any other method which might be faster?
Application.ScreenUpdating = False and Application.Calculation = xlCalculationManual are included as part of the larger Sub, r will approach some thousands of rows.
Dim r As Long
Sheets("Database").Activate
For r = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
If Cells(r, "G") = Range("C5") Then
ActiveSheet.Rows(r).EntireRow.Delete
End If
Next
Deleting something in a Worksheet is a rather slow operation, and depending on how many rows you want to delete (and it seems to be a lot), you should collect everything that should be deleted in a Range-Variable and delete it all at once.
One additional aspect is that UsedRange is not always reliable, and if you are unlucky, the macro checks everything from the very last possible row (=1048576), which could also be an issue. The construct .Cells(.Rows.Count, "G").End(xlUp).row will get the row number of the last used row in Col 'G'.
Try the following code
Sub del()
Dim r As Long
Dim deleteRange As Range
Set deleteRange = Nothing
With ThisWorkbook.Sheets(1)
For r = .Cells(.Rows.Count, "G").End(xlUp).row To 1 Step -1
If .Cells(r, "G") = .Range("C5") Then
If deleteRange Is Nothing Then
Set deleteRange = .Cells(r, "G")
Else
Set deleteRange = Union(deleteRange, .Cells(r, "G"))
End If
End If
Next
End With
If Not deleteRange Is Nothing Then
deleteRange.EntireRow.Delete
End If
End Sub
Hey bob I found that when you work with thousands of rows or hundreds of thousands you may want to try arrays. They are insanely fast to do the same as you would on the sheet
Try this:
Sub DeleteRows()
Dim arr, arr1, yeartocheck As Integer, yearchecked As Integer, ws As Worksheet, i As Long, j As Long, x As Long
Set ws = ThisWorkbook.Sheets("DataBase")
yeartocheck = ws.Range("C5")
arr = ws.UsedRange.Value 'the whole sheet allocated on memory
ReDim arr1(1 To UBound(arr), 1 To UBound(arr, 2)) 'lets define another array as big as the first one
For i = 1 To UBound(arr1, 2) 'headers for the final array
arr1(1, i) = arr(1, i)
Next i
x = 2 'here starts the data on the final array (1 is for the headers)
For i = 2 To UBound(arr) 'loop the first array looking to match your condition
yearchecked = arr(i, 7)
If yearchecked <> yeartocheck Then 'if they don't match, the macro will store that row on the final array
For j = 1 To UBound(arr, 2)
arr1(x, j) = arr(i, j)
Next j
x = x + 1 'if we store a new row, we need to up the x
End If
Next i
With ws
.UsedRange.ClearContents 'clear what you have
.Range("A1", .Cells(UBound(arr1), UBound(arr, 2))).Value = arr1 'fill the sheet with all the data without the CYear
End With
End Sub

Countif on sheet with 700k rows freezes program

I currently have two lists. A list of "Grantors" in column A and the same list with duplicates removed in column B. I am trying to get a count of how many times a given Grantor is in Column A using countif however my list in Column A is over 700k rows. I am using 64bit excel but every time I run code to do this excel freezes and crashes.
Is there a way to do this in excel or do I need to take another approach like using a pivot table or creating tables in access?
I have written a few sub routines but this is the latest, got from another post on this forum.
Sub Countif()
Dim lastrow As Long
Dim rRange As Range
Dim B As Long '< dummy variable to represent column B
B = 2
With Application
.ScreenUpdating = False 'speed up processing by turning off screen updating
.DisplayAlerts = False
End With
'set up a range to have formulas applied
With Sheets(2)
lastrow = Cells(Rows.Count, "A").End(xlUp).Row
Set rRange = .Range(.Cells(2, B), .Cells(lastrow, B))
End With
'apply the formula to the range
rRange.Formula = "=COUNTIF($A$2:$A$777363,C2)"
'write back just the value to the range
rRange.Value = rRange.Value
With Application
.ScreenUpdating = True
.DisplayAlerts = True
End With
End Sub
Something like this:
Sub Countif()
Dim allVals, uniqueVals, i As Long, dict, v, dOut(), r As Long
''creating dummy data
' With Sheet2.Range("A2:A700000")
' .Formula = "=""VAL_"" & round(RAND()*340000,0)"
' .Value = .Value
' End With
'
'get the raw data and unique values
With Sheet2
allVals = .Range("A2:A" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
uniqueVals = .Range("B2:B" & .Cells(.Rows.Count, "B").End(xlUp).Row).Value
End With
ReDim dOut(1 To UBound(uniqueVals, 1), 1 To 1) 'for counts...
Set dict = CreateObject("scripting.dictionary")
'map unique value to index
For i = 1 To UBound(uniqueVals, 1)
v = uniqueVals(i, 1)
If Len(v) > 0 Then dict(v) = i
Next i
'loop over the main list and count each unique value in colB
For i = 1 To UBound(allVals, 1)
v = allVals(i, 1)
If Len(v) > 0 Then
If dict.exists(v) Then
r = dict(v)
dOut(r, 1) = dOut(r, 1) + 1
End If
End If
Next i
'output the counts
Sheet2.Range("C2").Resize(UBound(dOut, 1), 1).Value = dOut
End Sub
Runs in ~30sec with 700k values in A and 300k uniques in B
... or maybe this.
Caution: this overwrites the de-duplicated values in column A of the target worksheet.
Option Explicit
Sub countUnique()
Dim arr As Variant, i As Long, dict As Object
Debug.Print Timer
Set dict = CreateObject("scripting.dictionary")
dict.comparemode = vbTextCompare
With Worksheets("sheet2")
arr = .Range(.Cells(2, "A"), .Cells(.Rows.Count, "A").End(xlUp)).Value2
End With
For i = LBound(arr, 1) To UBound(arr, 1)
dict.Item(arr(i, 1)) = dict.Item(arr(i, 1)) + 1
Next i
With Worksheets("sheet3")
.Cells(2, "A").Resize(dict.Count, 1) = bigTranspose(dict.keys)
.Cells(2, "B").Resize(dict.Count, 1) = bigTranspose(dict.items)
End With
Debug.Print Timer
End Sub
Function bigTranspose(arr1 As Variant)
Dim t As Long
ReDim arr2(LBound(arr1) To UBound(arr1), 1 To 1)
For t = LBound(arr1) To UBound(arr1)
arr2(t, 1) = arr1(t)
Next t
bigTranspose = arr2
End Function
42.64 seconds for 700K originals and 327K uniques on a Surface Pro tablet. This might be improved by turning off calculation and enableevents. Screenupdating really shouldn't be an issue.

Resources