Need help to optimize the Excel VBA code that aggregates duplicates - excel

Below is my source table
Name Sales
---------------------------------
Thomas 100
Jay 200
Thomas 100
Mathew 50
Output I need is as below
Name Sales
---------------------------------
Thomas 200
Jay 200
Mathew 50
Basically, I have 2 columns that can have duplicates and I need to aggregate the second column based on first column.
The current code I have is below. It works perfectly fine, but it takes around 45 seconds to run for 4500 records. I was wondering if there is a more efficient way to do this, as it seems to be a trivial requirement.
'Combine duplicate rows and sum values
Dim Rng As Range
Dim LngRow As Long, LngLastRow As Long, i As Long
LngLastRow = lRow 'The last row is calculated somewhere above...
'Initializing the first row
i = 1
'Looping until blank cell is encountered in first column
While Not Cells(i, 1).Value = ""
'Initializing range object
Set Rng = Cells(i, 1)
'Looping from last row to specified first row
For LngRow = LngLastRow To (i + 1) Step -1
'Checking whether value in the cell is equal to specified cell
If Cells(LngRow, 1).Value = Rng.Value Then
Rng.Offset(0, 1).Value = Rng.Offset(0, 1).Value + Cells(LngRow, 2).Value
Rows(LngRow).Delete
End If
Next LngRow
i = i + 1
Wend
Note that this is part of a larger Excel app, so I definitely need the solution to be in Excel VBA.

Here you go:
Option Explicit
Sub Consolidate()
Dim arrData As Variant
Dim i As Long
Dim Sales As New Scripting.Dictionary 'You will need the library Microsoft Scripting Runtime
Application.ScreenUpdating = False 'speed up the code since excel won't show you what is happening
'First of all, working on arrays speeds the code up a lot because you are working in memory
'instead of with the sheets
With ThisWorkbook.Sheets("YourSheet") 'change this
i = .Cells(.Rows.Count, 1).End(xlUp).Row 'last row on column A
arrData = .Range("A2", .Cells(i, 2)).Value 'here I'm assuming row 1 has headers; the data below it is stored in an array
End With
'Then we create a dictionary with the data
For i = 1 To UBound(arrData) 'loop over every row stored in the array (sheet row 2 to the last row)
If Not Sales.Exists(arrData(i, 1)) Then
Sales.Add arrData(i, 1), arrData(i, 2) 'We add the worker(Key) with his sales(Item)
Else
Sales(arrData(i, 1)) = Sales(arrData(i, 1)) + arrData(i, 2) 'if the worker already exists, sum his sales
End If
Next i
'Now you have all the workers just once
'If you want to delete column A and B and just leave the consolidate data:
With ThisWorkbook.Sheets("YourSheet") 'change this
i = .Cells(.Rows.Count, 1).End(xlUp).Row 'last row on column A
.Range("A2:B" & i).ClearContents
.Cells(2, 1).Resize(Sales.Count) = Application.Transpose(Sales.Keys) 'workers
.Cells(2, 2).Resize(Sales.Count) = Application.Transpose(Sales.Items) 'Their sales
End With
Application.ScreenUpdating = True 'return excel to normal
End Sub
To learn everything about dictionaries (and more) check this
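If setting a reference to Microsoft Scripting Runtime is a problem, a late-bound dictionary should work the same way. This is only a sketch: SumByKey and ConsolidateLateBound are made-up names, "YourSheet" is the same placeholder as above, and it assumes headers in row 1 as before.

```vba
Option Explicit

' Hypothetical helper: sums column 2 of a 2-D array by the key in column 1.
' Returns a 2-D array (1 To n, 1 To 2) holding each key once with its total.
Function SumByKey(ByVal data As Variant) As Variant
    Dim dict As Object, i As Long, k As Variant, result() As Variant
    Set dict = CreateObject("Scripting.Dictionary") ' late binding, no reference needed
    For i = LBound(data, 1) To UBound(data, 1)
        ' Reading a missing key creates it as Empty, which counts as 0 in the sum
        dict(data(i, 1)) = dict(data(i, 1)) + data(i, 2)
    Next i
    ReDim result(1 To dict.Count, 1 To 2)
    i = 0
    For Each k In dict.Keys
        i = i + 1
        result(i, 1) = k
        result(i, 2) = dict(k)
    Next k
    SumByKey = result
End Function

Sub ConsolidateLateBound()
    Dim arrData As Variant, out As Variant, lastRow As Long
    With ThisWorkbook.Sheets("YourSheet") ' placeholder name, change this
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        arrData = .Range("A2", .Cells(lastRow, 2)).Value
        out = SumByKey(arrData)
        .Range("A2", .Cells(lastRow, 2)).ClearContents
        .Range("A2").Resize(UBound(out, 1), 2).Value = out ' one write for both columns
    End With
End Sub
```

Because the helper returns both columns in a single 2-D array, the result can be written back in one assignment without Application.Transpose. Keys are compared case-sensitively by default, so "Thomas" and "thomas" would be kept as separate names.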

With data in cols A and B like:
Running this short macro:
Sub KopyII()
Dim cell As Range, N As Long
Columns("A:A").Copy Range("C1")
ActiveSheet.Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
N = Cells(Rows.Count, "C").End(xlUp).Row
Range("B1").Copy Range("D1")
Range("D2:D" & N).Formula = "=SUMPRODUCT(--(A:A= C2),(B:B))"
End Sub
will produce this in cols C and D:
NOTE:
This relies on Excel's builtin RemoveDuplicates feature.
EDIT#1:
As chris neilsen points out, this function should be a bit quicker to evaluate:
Sub KopyIII()
Dim cell As Range, N As Long, A As Range, C As Range
Set A = Range("A:A")
Set C = Range("C:C")
A.Copy C
C.RemoveDuplicates Columns:=1, Header:=xlNo
N = Cells(Rows.Count, "C").End(xlUp).Row
Range("B1").Copy Range("D1") ' the header
Range("D2:D" & N).Formula = "=SUMIFS(B:B,A:A,C2)"
End Sub

Related

Search match twice 2 keywords same column and copy result to another sheet

I am stuck. I don't know what code to use to search the same column twice for two different keywords and then copy data from the matching rows to another sheet, in sequence, from a start cell. In detail, here is what I am trying to do:
1. Limit the search to a range of the worksheet (e.g. Sheet1 B1:N200).
2. Search the 8th column (I) of that range on Sheet1 for the keyword "Goods".
3. Copy the data found in the 2nd (C) and 5th (F) columns of the same row where "Goods" is found.
4. Paste the value of Sheet1 column 2 into Sheet2 column 3 (values only, no format), and Sheet1 column 5 into Sheet2 column 4 (format and values), from a specific starting point (e.g. Sheet2 B3). The next match goes to Sheet2 B4, and so on.
5. Search the 8th column of Sheet1 AGAIN for the keyword "Services", starting from the top (B1:N1).
6. Copy the data found in the 2nd (C) and 5th (F) columns of the same row where "Services" is found.
7. Paste the value of Sheet1 column 2 into Sheet2 column 3 (values only, no format), and Sheet1 column 5 into Sheet2 column 4 (format and values), in the next row after the last paste from "Goods" (e.g. if the last match was pasted to C35 and D35, the next value should be pasted to C36 and D36).
8. The final output should list all "Goods" results first, then all "Services" results.
I hope I have conveyed what I need clearly.
I am trying to work with this code that I found here, but I don't get how to insert the second search loop for "Services", how to paste to a specific cell in Sheet2, or how to continue from the last row for the "Services" paste.
Sub CopyCells
Dim lngLastRowSht1 As Long
Dim lngLastRowSht2 As Long
Dim counterSht1 As Long
Dim counterSht2 As Long
With Worksheets(1)
lngLastRowSht1 = .Cells(.Rows.Count, 8).End(xlUp).Row
lngLastRowSht2 = Worksheets(2).Cells(Worksheets(2).Rows.Count, 5).End(xlUp).Row
For counterSht1 = 1 To lngLastRowSht1
For counterSht2 = 1 To lngLastRowSht2
If Sheets(1).Range("" & (counterSht1)).Value = "Goods" Then
Sheets(2).Range("B" & (counterSht2), "D" & (counterSht2)).Value = Sheets(1).Range("C" & counterSht1, "D" & counterSht1).Value
End If
Next counterSht2
Next counterSht1
End With
End Sub
Edit1
As requested by Chris, this is how it should look:
This query was best solved by @CDP1802's answer, which worked as needed.
I learned that I needed 2 counters for it to work :) and I also learned how to properly label the target destination.
Thank you to this community :)
Increment the target row after each copy.
Option Explicit
Sub CopyCells()
Const ROW_START = 3
Dim wb As Workbook, ws1 As Worksheet, ws2 As Worksheet
Dim n As Long, r As Long, lastrow1 As Long, lastrow2 As Long
Dim keywords, word, t0 As Single: t0 = Timer
keywords = Array("Goods", "Services")
Set wb = ThisWorkbook
Set ws1 = wb.Sheets(1)
Set ws2 = wb.Sheets(2)
lastrow2 = ROW_START
Application.ScreenUpdating = False
With ws1
lastrow1 = .Cells(.Rows.Count, "I").End(xlUp).Row
For Each word In keywords
For r = 1 To lastrow1
If Len(.Cells(r, "I")) = 0 Then
Exit For
ElseIf .Cells(r, "I") = word Then
'Sht1 col 2 to Sht2 Col 3 (no format values only)
'Sht1 col 5 to Sht2 Col 4 (with format and values)
ws2.Cells(lastrow2, "C") = .Cells(r, "B")
ws2.Cells(lastrow2, "D") = .Cells(r, "E")
.Cells(r, "E").Copy
ws2.Cells(lastrow2, "D").PasteSpecial xlPasteFormats
lastrow2 = lastrow2 + 1
n = n + 1
End If
Next
Next
End With
Application.ScreenUpdating = True
MsgBox r - 1 & " rows scanned " & vbLf & n & " rows copied", _
vbInformation, Format(Timer - t0, "0.0 secs")
End Sub
You could make two routines: one for services and one for goods. But neither that approach nor the code above is very efficient.
Since Services & Goods are in the same column, try using the autofilter:
Sheets(1).UsedRange.AutoFilter Field:=8, Criteria1:=Array("Goods", "Services"), VisibleDropDown:=False, Operator:=xlFilterValues
Sheets(1).UsedRange.SpecialCells(xlCellTypeVisible).Copy
Sheets(2).Range("A1").PasteSpecial
Application.CutCopyMode = False

VBA - Remove cell that contains word from same column

I've seen similar posts out there but not quite the same and seem to be confused on the results I'm getting...
I essentially need to de-dupe a column on similar (LIKE) words, so it's somewhat straightforward, but apparently not as easy as I thought.
I have a dataset like so...
When I run my macro it removes rows (as I intended), but it either doesn't remove all the rows it should or it removes the wrong ones...
It actually removes the highlighted/yellow rows
I was thinking it should actually remove something like the bottom rows.. where it would keep "aerospace" but remove "aerospace 2019", since the 2019 is kinda redundant and not applicable to me.
My macro is simple, but I thought it would do the trick... what am I doing wrong?
Sub container()
Dim ws As Worksheet, rw As Long, col As Long, i As Long
Set ws = ActiveSheet 'or whatever
i = 2
'For col = 2 To 5 'placeholder in case multiple columns are needed - remove Set col above
For rw = 2 To ws.Cells(Rows.Count, 1).End(xlUp).Row 'from row 1 til last non-empty row
v = ws.Cells(rw, 2).Value 'set range
If Cells(i, 2).Value Like v Then 'determine if the cell contains the value of the word
Cells(i, 2).EntireRow.Delete 'delete
i = i + 1
End If
Next rw
'Next col
End Sub
After Ron's post I was able to create the below, but appears I'm still stuck. I think I've just been looking at this too long.
Sub container()
Dim ws As Worksheet, rng As Range, i As Long, rw As Long
Set ws = ActiveSheet 'or whatever
Set rng = ws.Range("B2:B" & ws.Cells(ws.Rows.Count, "B").End(xlUp).Row) 'set array range
i = Range("B" & Rows.Count).End(xlUp).Row
For rw = ws.Cells(Rows.Count, 1).End(xlDown).Row To 2
v = ws.Cells(rw, 2).Value
If InStr(1, v, rng) > 0 Then
cell.EntireRow.Delete
i = i - 1
End If
Next rw
End Sub
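One way to get the behaviour described above (keep "aerospace", drop "aerospace 2019") might look like the sketch below. It is not a tested answer for the real dataset: ContainsShorterEntry and RemoveRedundantEntries are made-up names, and it assumes the words are text in column B starting at row 2. The idea is to read the column into an array once, treat an entry as redundant when it contains some other, shorter entry, and delete from the bottom up so the rows still to be checked keep their positions.

```vba
Option Explicit

' Hypothetical helper: True when entries(idx, 1) contains some other, shorter
' entry of the same one-column array (e.g. "aerospace 2019" contains "aerospace").
Function ContainsShorterEntry(ByVal entries As Variant, ByVal idx As Long) As Boolean
    Dim j As Long, candidate As String, shorter As String
    candidate = CStr(entries(idx, 1))
    For j = LBound(entries, 1) To UBound(entries, 1)
        shorter = CStr(entries(j, 1))
        ' Skip the entry itself, blanks, and anything that is not strictly shorter
        If j <> idx And Len(shorter) > 0 And Len(shorter) < Len(candidate) Then
            If InStr(1, candidate, shorter, vbTextCompare) > 0 Then
                ContainsShorterEntry = True
                Exit Function
            End If
        End If
    Next j
End Function

Sub RemoveRedundantEntries()
    Dim ws As Worksheet, entries As Variant, lastRow As Long, r As Long
    Set ws = ActiveSheet 'or whatever
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    If lastRow < 3 Then Exit Sub ' need at least B2:B3 to have something to compare
    entries = ws.Range("B2:B" & lastRow).Value
    ' Array index r corresponds to sheet row r + 1; going bottom-up keeps that true
    For r = UBound(entries, 1) To 1 Step -1
        If ContainsShorterEntry(entries, r) Then ws.Rows(r + 1).Delete
    Next r
End Sub
```

Exact duplicates are left alone here, since neither copy is shorter than the other. A plain substring test can also be too greedy (a short entry such as "air" would remove "repair"), so a stricter comparison may be needed depending on the data.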

Paste from list not found in current range to bottom of current range

Column A holds all existing categories, and new categories are listed in column C. I'm trying to work out how to take these new categories and add them to column A only if they aren't already there. In my example below, the new categories in column C are added to column A even if they are already in column A. I would also need Range("a1") in the If-Then line to be a dynamic range, since new categories will be added as the code runs. Some constructive criticism would be greatly appreciated as well, to help me in the future.
Sub newcategory()
Dim newcatcount As Integer
Dim i As Integer
newcat = Range("c100000").End(xlUp).Row
For i = 1 To newcat
If Cells(i, 3).Value <> Range("a1") Then
Cells(i, 3).Select
Selection.copy
Range("a100000").End(xlUp).Offset(1, 0).Select
ActiveSheet.Paste
End If
Next
End Sub
Please give this a try...
Sub AddNewCategories()
Dim lrA As Long, lrC As Long, i As Long, j As Long
Dim x, y, z(), dict
lrA = Cells(Rows.Count, 1).End(xlUp).Row
lrC = Cells(Rows.Count, 3).End(xlUp).Row
'Array to hold the categories in column A starting from Row1, assuming the categories start from A1. If not, change it accordingly.
x = Range("A1:A" & lrA).Value
'Array to hold the new categories in column C starting from Row1, assuming the categories start from C1. If not, change it accordingly.
y = Range("C1:C" & lrC).Value
Set dict = CreateObject("Scripting.Dictionary")
For i = 1 To UBound(x, 1)
dict.Item(x(i, 1)) = ""
Next i
For i = 1 To UBound(y, 1)
If Not dict.exists(y(i, 1)) Then
dict.Item(y(i, 1)) = ""
j = j + 1
ReDim Preserve z(1 To j)
z(j) = y(i, 1)
End If
Next i
If j > 0 Then
Range("A" & lrA + 1).Resize(j).Value = Application.Transpose(z)
End If
Set dict = Nothing
End Sub
You could use Excel's built-in RemoveDuplicates() method, as follows (mind the comments):
Option Explicit
Sub newcategory()
Dim newcat As Range
With Worksheets("Categories") ' change "Categories" to your actual sheet name
Set newcat = .Range("C1", .Cells(.Rows.Count, 3).End(xlUp)) ' get the range of all new categories in column C, from row 1 down to the last non-empty cell
.Cells(.Rows.Count, 1).End(xlUp).Offset(1).Resize(newcat.Rows.Count).Value = newcat.Value ' append the new category values below the existing categories in column A (Offset(1) avoids overwriting the last existing one)
.Range("A1", .Cells(.Rows.Count, 1).End(xlUp)).RemoveDuplicates Columns:=Array(1), Header:=xlNo ' remove duplicates
End With
End Sub

VBA Remove duplicates taking 30 minutes to run with no idea why

The code below copies data from two columns in one sheet, pastes them as values into another sheet (because one column is a formula, =Left(Column+1,4)), and then attempts to run a remove duplicates across the two pasted columns.
This takes roughly 30 minutes to run on what is essentially 100k cells (2 columns of 50k rows each).
This is what I've been using
Sub ProjTrending1()
Dim s1 As Worksheet, s2 As Worksheet
Dim St As Date, Et As Date
Dim Tt As Double
St = Time
Application.ScreenUpdating = False
'Defines S1 as a Worksheet
Set s1 = Sheets("All Data")
'Defines S2 as WorkSheet
Set s2 = Sheets("Workings")
'Defines LastR1
Dim LR1 As Long
Dim LR2 As Long
'Finds last row cell working sheet
LR2 = s1.Cells(Rows.Count, 10).End(xlUp).Row
'Takes Data from Order Column of defined data Sheet and copy & pastes it to Working Sheet Column B
s1.Range("J1:J" & LR2).Copy s2.Range("A1")
s1.Range("e1:e" & LR2).Copy
s2.Range("b1").PasteSpecial Paste:=xlPasteValues
LR1 = s2.Range("A1").CurrentRegion.Rows.Count
'Removes Duplicates from Column B Working sheet
s2.Range("A2:B" & LR1).RemoveDuplicates Columns:=Array(1, 2), Header:=xlNo
'Copies the formula from C2 and applies it to all cells in column C where column A has values (simple concatenate + countifs(B$2:B2,B2)
s2.Range("C2").Copy s2.Range("C2:C" & LR1)
Et = Time
Tt = (Et - St) * 24 * 60 * 60
MsgBox Tt 'elapsed time in seconds
End Sub
I've also tried using a dictionary to do this, but I'm new to dictionaries, so while the code looks good compared to my usual attempts, that's because it's taken from a couple of different sources. (I copied and pasted the data to Sheet2 in case this overwrote the source data.)
Sub M_delete_duplicates()
sn = Sheets("Sheet2").Cells(1).CurrentRegion.Resize(, 5)
With CreateObject("scripting.dictionary")
For j = 1 To UBound(sn)
.Item(sn(j, 1)) = Application.Index(sn, j, 0)
Next
Sheets("Sheet2").Cells(1, 4).Resize(.Count, UBound(sn, 2)) = Application.Index(.Items, 0, 0)
End With
End Sub
This is just as slow to run, and it only removes duplicates based on a single column, whereas I need it to operate on two columns. A potential way around this is to concatenate the two columns of data, run the remove duplicates once, and then split the data again using =right(Value,X).
If I did this manually it would take 30 seconds at most. It makes no sense to me why it takes so long to run.
Can anyone help with why this might be taking so long to run, and how I might modify the dictionary code to remove duplicates over two columns?
Thanks in advance
Updated from my comment. This uses a dictionary to track which rows have been added and then copies unique rows across to the destination sheet. You may want to modify it a bit for your use (e.g. update the sheet names). Always test this first on a copy of your data set, or make a backup before running the code.
Option Explicit
Public Sub ExampleRemoveDuplicates()
Dim dict As Object
Dim temp As String
Dim calc As String
Dim headers As Variant
Dim NoCol As Long, NoRow As Long, i As Long, j As Long
Dim c, key
With Application
.ScreenUpdating = False
calc = .Calculation
.Calculation = xlCalculationManual
End With
Set dict = CreateObject("Scripting.Dictionary")
' Change this to the sheet that is applicable
With Sheet1
NoCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
' Assumes first row of sheet is headers
headers = .Range(.Cells(1, 1), .Cells(1, NoCol)).Value2
' Change this to destination sheet
With Sheet2
.Cells.Clear
.Range(.Cells(1, 1), .Cells(1, NoCol)).Value2 = headers
End With
For Each c In .Range(.Cells(2, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1))
ReDim arr(1 To NoCol)
j = 1
Do
arr(j) = c.Offset(0, j - 1).Value2
j = j + 1
Loop Until j = NoCol + 1
temp = Join(arr, "//")
If Not dict.exists(temp) And Not temp = vbNullString Then
dict.Add key:=temp, Item:=arr
' Change this to destination sheet
With Sheet2
NoRow = .Cells(.Rows.Count, 1).End(xlUp).Row + 1
.Range(.Cells(NoRow, 1), .Cells(NoRow, NoCol)).Value2 = arr
End With
End If
Next c
End With
i = 1
ReDim Results(1 To dict.Count, 1 To NoCol)
For Each key In dict.keys
For j = 1 To NoCol
Results(i, j) = dict(key)(j)
Next j
i = i + 1
Next key
' Change this to destination sheet
With Sheet2.Cells(1, 1)
.Range(.Offset(1, 0), .Offset(dict.Count, NoCol - 1)) = Results
End With
With Application
.Calculation = calc
.ScreenUpdating = True
End With
End Sub
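For the two-column case specifically, the same dictionary idea can be run entirely on an array, which avoids touching cells inside the loop. The following is a rough sketch under assumptions, not a drop-in replacement: UniquePairs and RemoveDuplicatePairs are made-up names, the "Workings" sheet name is taken from the question, the data is assumed to sit in A2:B with no error values, and only values (not formats or formulas) are written back.

```vba
Option Explicit

' Hypothetical helper: returns the rows of a 2-D array whose combination of
' column 1 and column 2 has not been seen before, keeping first occurrences.
Function UniquePairs(ByVal data As Variant) As Variant
    Dim seen As Object, i As Long, n As Long, key As String
    Dim result() As Variant, trimmed() As Variant
    Set seen = CreateObject("Scripting.Dictionary")
    ReDim result(1 To UBound(data, 1), 1 To 2)
    For i = 1 To UBound(data, 1)
        ' Composite key built from both columns, same "//" separator as above
        key = CStr(data(i, 1)) & "//" & CStr(data(i, 2))
        If Not seen.Exists(key) Then
            seen.Add key, True
            n = n + 1
            result(n, 1) = data(i, 1)
            result(n, 2) = data(i, 2)
        End If
    Next i
    ' Copy the filled part into an array of exactly n rows
    ReDim trimmed(1 To n, 1 To 2)
    For i = 1 To n
        trimmed(i, 1) = result(i, 1)
        trimmed(i, 2) = result(i, 2)
    Next i
    UniquePairs = trimmed
End Function

Sub RemoveDuplicatePairs()
    Dim ws As Worksheet, data As Variant, out As Variant, lastRow As Long
    Set ws = ThisWorkbook.Sheets("Workings") ' sheet name from the question
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If lastRow < 2 Then Exit Sub ' nothing below the header row
    data = ws.Range("A2:B" & lastRow).Value
    out = UniquePairs(data)
    ws.Range("A2:B" & lastRow).ClearContents
    ws.Range("A2").Resize(UBound(out, 1), 2).Value = out ' single write back
End Sub
```

The separator only matters if the data itself can contain "//"; in that case two different pairs could produce the same key, and a character that never occurs in the data would be a safer choice.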

Copying a formula down through x number of rows

I'm at a loss on this and need some help. I've lurked around at answers and have Frankensteined together some code for a macro but it just isn't working.
Here is part of what I have so far:
With ActiveSheet
Firstrow = 1
Lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
For lrow = Lastrow To Firstrow Step -1
With .Cells(lrow, "G")
Range("G1").Select
ActiveCell.FormulaR1C1 = "=IF(ISNUMBER(RC[1]),RC[1],RC[-1])"
End With
Next lrow
End With
I have a very similar block of code before this that deletes crap from the text files I'm importing and it works perfectly through all the number of rows. When I run the same thing with this formula, it only puts the formula in G1 and doesn't cycle through the rest of the sheet. I've tried this and it works, but copies down through all million plus rows:
ActiveCell.FormulaR1C1 = "=IF(ISNUMBER(RC[1]),RC[1],RC[-1])"
Selection.AutoFill Destination:=Range("G:G")
I've tried this and then run the same code that gets rid of the text file crap but I get an error "End If without block If".
To fill the formula in one cell at a time you need to cycle through them; don't keep relying on the ActiveCell property.
With ActiveSheet
Firstrow = 1
Lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
For lrow = Lastrow To Firstrow Step -1
.Cells(lrow, "G").FormulaR1C1 = "=IF(ISNUMBER(RC[1]),RC[1],RC[-1])"
Next lrow
End With
But you can speed things up by putting the formula into all of the cells at once.
With ActiveSheet
Firstrow = 1
Lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
With .Range(.Cells(Firstrow, "G"), .Cells(Lastrow, "G"))
.FormulaR1C1 = "=IF(ISNUMBER(RC[1]),RC[1],RC[-1])"
End With
End With
See How to avoid using Select in Excel VBA macros for more methods on getting away from relying on select and activate to accomplish your goals.
Another version, to dynamically select the columns based on their titles. Comments included.
Dim row As Range
Dim cell As Range
Static value As Integer
'Set row numbers
'Find the starting row. Located using Title of column "Start" plus whatever number of rows.
Dim RowStart As Long
Set FindRow = Range("A:A").Find(What:="Start", LookIn:=xlValues)
RowStart = FindRow.row + 1
'End of the range. Located using a "finished" cell
Dim RowFinish As Long
Set FindRow = Range("A:A").Find(What:="Finished", LookIn:=xlValues)
RowFinish = FindRow.row - 1
'Set range - Goes Cells(Rownumber, Columnnumber)
'Simply amend RowStart and RowFinish to change which rows you want.
' In your case you need to change the second column number to paste in horizontally.
Set rng = Range(Cells(RowStart, 1), Cells(RowFinish, 1))
'Start the counter from the starting row.
value = RowStart
For Each row In rng.Rows
For Each cell In row.Cells
'Insert relevant formula into each cell in range.
'R1C1-style references need FormulaR1C1 rather than Formula
cell.FormulaR1C1 = _
"=IF(ISNUMBER(RC[1]),RC[1],RC[-1])"
'Increment row variable.
value = value + 1
Next cell
Next row

Resources