VBA: Condense worksheet (multiple cols) to 2 columns based on header name and column value - excel

I have a workbook that contains several sheets of data that I have combined. I removed some unnecessary sheets and cells (that are colour filled) and removed blanks (code sample below). I now have one work sheet with dates as headers and item numbers (col length vary).
I need to condense this again. I need two columns, columns A and B, B for every item number pulled back from the sheet and the Col A needs to be the header name of the column the item number was pulled from. The amount of columns will extend over time as more dates are added.
I just don't know where to go from here... The script is basic 'and then' I have quality checked it and it works up to this point.
Worksheets.Add Sheets(1)
ActiveSheet.Name = "Combined"
For i = 2 To Sheets.Count
Set xRg = Sheets(1).UsedRange
If i > 2 Then
Set xRg = Sheets(1).Cells(xRg.Rows.Count + 1, 1)
End If
Sheets(i).Activate
ActiveSheet.UsedRange.Copy xRg
Next i
Sheets("Data").Delete
For Each ws In Worksheets
If ws.Name <> "Combined" Then
ws.Visible = xlSheetHidden
End If
Next ws
I then have a box pop up to delete specific coloured cells and end with this:
Columns("A:MK").SpecialCells(xlCellTypeBlanks).Delete Shift:=xlUp
I can copy column values over, after the above, to a new sheet but then adding header values based on the last cell in that column reaches my limitations of VBA.
I can't see that this has been asked and answered previously, any ideas?

Try this code
Sub Test()
Dim a, ws As Worksheet, sh As Worksheet, i As Long, j As Long, k As Long
Set ws = ThisWorkbook.Worksheets("Combined")
Set sh = ThisWorkbook.Worksheets("Condensed")
a = ws.Range("A1").CurrentRegion.Value
ReDim b(1 To UBound(a, 1) * UBound(a, 2), 1 To 2)
For j = LBound(a, 2) To UBound(a, 2)
For i = 2 To UBound(a)
k = k + 1
b(k, 1) = a(1, j)
b(k, 2) = a(i, j)
Next i
Next j
With sh.Range("A1")
.Resize(1, 2).Value = Array("Header1", "Header2")
.Offset(1).Resize(k, UBound(b, 2)).Value = b
End With
End Sub

you could use Dictionary object
assuming you want to condense data in a worksheet named "Condensed" already in place
Sub Condense()
Dim cel As Range
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
With Worksheets("Combined")
For Each cel In .Range("A1", .Cells(1, .Columns.Count).End(xlToLeft))
dict.Add cel.Value, .Range(cel.Offset(1), cel.End(xlDown)).Value
Next
End With
Dim key As Variant
With Worksheets("Condensed")
For Each key In dict.keys
With .Cells(.Rows.Count, 1).End(xlUp).Offset(1).Resize(UBound(dict(key)))
.Value = key
.Offset(, 1) = dict(key)
End With
Next
End With
End Sub

Related

Delete rows sub very slow to process

I built a macro in Excel that stores input from multiple input tabs into a database (table format). As part of the macro I included a Sub to delete any previous entries for a given year (CYear) before writing new entries for that year.
This was working fine until the size of the workbook increased to about 10MB. The following part of the code now takes >1 hour to run. Is there any other method which might be faster?
Application.ScreenUpdating = False and Application.Calculation = xlCalculationManual are included as part of the larger Sub, r will approach some thousands of rows.
Dim r As Long
Sheets("Database").Activate
For r = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
If Cells(r, "G") = Range("C5") Then
ActiveSheet.Rows(r).EntireRow.Delete
End If
Next
Deleting something in a Worksheet is a rather slow operation, and depending on how many rows you want to delete (and it seems to be a lot), you should collect everything that should be deleted in a Range-Variable and delete it all at once.
One additional aspect is that UsedRange is not always reliable, and if you are unlucky, the macro checks everything from the very last possible row (=1048576), which could also be an issue. The construct .Cells(.Rows.Count, "G").End(xlUp).row will get the row number of the last used row in Col 'G'.
Try the following code
Sub del()
Dim r As Long
Dim deleteRange As Range
Set deleteRange = Nothing
With ThisWorkbook.Sheets(1)
For r = .Cells(.Rows.Count, "G").End(xlUp).row To 1 Step -1
If .Cells(r, "G") = .Range("C5") Then
If deleteRange Is Nothing Then
Set deleteRange = .Cells(r, "G")
Else
Set deleteRange = Union(deleteRange, .Cells(r, "G"))
End If
End If
Next
End With
If Not deleteRange Is Nothing Then
deleteRange.EntireRow.Delete
End If
End Sub
Hey bob I found that when you work with thousands of rows or hundreds of thousands you may want to try arrays. They are insanely fast to do the same as you would on the sheet
Try this:
Sub DeleteRows()
Dim arr, arr1, yeartocheck As Integer, yearchecked As Integer, ws As Worksheet, i As Long, j As Long, x As Long
Set ws = ThisWorkbook.Sheets("DataBase")
yeartocheck = ws.Range("C5")
arr = ws.UsedRange.Value 'the whole sheet allocated on memory
ReDim arr1(1 To UBound(arr), 1 To UBound(arr, 2)) 'lets define another array as big as the first one
For i = 1 To UBound(arr1, 2) 'headers for the final array
arr1(1, i) = arr(1, i)
Next i
x = 2 'here starts the data on the final array (1 is for the headers)
For i = 2 To UBound(arr) 'loop the first array looking to match your condition
yearchecked = arr(i, 7)
If yearchecked <> yeartocheck Then 'if they don't match, the macro will store that row on the final array
For j = 1 To UBound(arr, 2)
arr1(x, j) = arr(i, j)
Next j
x = x + 1 'if we store a new row, we need to up the x
End If
Next i
With ws
.UsedRange.ClearContents 'clear what you have
.Range("A1", .Cells(UBound(arr1), UBound(arr, 2))).Value = arr1 'fill the sheet with all the data without the CYear
End With
End Sub

Countif on sheet with 700k rows freezes program

I currently have two lists. A list of "Grantors" in column A and the same list with duplicates removed in column B. I am trying to get a count of how many times a given Grantor is in Column A using countif however my list in Column A is over 700k rows. I am using 64bit excel but every time I run code to do this excel freezes and crashes.
Is there a way to do this in excel or do I need to take another approach like using a pivot table or creating tables in access?
I have written a few sub routines but this is the latest, got from another post on this forum.
Sub Countif()
Dim lastrow As Long
Dim rRange As Range
Dim B As Long '< dummy variable to represent column B
B = 2
With Application
.ScreenUpdating = False 'speed up processing by turning off screen updating
.DisplayAlerts = False
End With
'set up a range to have formulas applied
With Sheets(2)
lastrow = Cells(Rows.Count, "A").End(xlUp).Row
Set rRange = .Range(.Cells(2, B), .Cells(lastrow, B))
End With
'apply the formula to the range
rRange.Formula = "=COUNTIF($A$2:$A$777363,C2)"
'write back just the value to the range
rRange.Value = rRange.Value
With Application
.ScreenUpdating = True
.DisplayAlerts = True
End With
End Sub
Something like this:
Sub Countif()
Dim allVals, uniqueVals, i As Long, dict, v, dOut(), r As Long
''creating dummy data
' With Sheet2.Range("A2:A700000")
' .Formula = "=""VAL_"" & round(RAND()*340000,0)"
' .Value = .Value
' End With
'
'get the raw data and unique values
With Sheet2
allVals = .Range("A2:A" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
uniqueVals = .Range("B2:B" & .Cells(.Rows.Count, "B").End(xlUp).Row).Value
End With
ReDim dOut(1 To UBound(uniqueVals, 1), 1 To 1) 'for counts...
Set dict = CreateObject("scripting.dictionary")
'map unique value to index
For i = 1 To UBound(uniqueVals, 1)
v = uniqueVals(i, 1)
If Len(v) > 0 Then dict(v) = i
Next i
'loop over the main list and count each unique value in colB
For i = 1 To UBound(allVals, 1)
v = allVals(i, 1)
If Len(v) > 0 Then
If dict.exists(v) Then
r = dict(v)
dOut(r, 1) = dOut(r, 1) + 1
End If
End If
Next i
'output the counts
Sheet2.Range("C2").Resize(UBound(dOut, 1), 1).Value = dOut
End Sub
Runs in ~30sec with 700k values in A and 300k uniques in B
... or maybe this.
Caution: this overwrites the de-duplicated values in column A of the target worksheet.
Option Explicit
Sub countUnique()
Dim arr As Variant, i As Long, dict As Object
Debug.Print Timer
Set dict = CreateObject("scripting.dictionary")
dict.comparemode = vbTextCompare
With Worksheets("sheet2")
arr = .Range(.Cells(2, "A"), .Cells(.Rows.Count, "A").End(xlUp)).Value2
End With
For i = LBound(arr, 1) To UBound(arr, 1)
dict.Item(arr(i, 1)) = dict.Item(arr(i, 1)) + 1
Next i
With Worksheets("sheet3")
.Cells(2, "A").Resize(dict.Count, 1) = bigTranspose(dict.keys)
.Cells(2, "B").Resize(dict.Count, 1) = bigTranspose(dict.items)
End With
Debug.Print Timer
End Sub
Function bigTranspose(arr1 As Variant)
Dim t As Long
ReDim arr2(LBound(arr1) To UBound(arr1), 1 To 1)
For t = LBound(arr1) To UBound(arr1)
arr2(t, 1) = arr1(t)
Next t
bigTranspose = arr2
End Function
42.64 seconds for 700K originals and 327K uniques on a Surface Pro tablet. This might be improved by turning off calculation and enableevents. Screenupdating really shouldn't be an issue.

Paste from list not found in current range to bottom of current range

I have column A that has all existing categories, new categories are listed in column C. I'm trying to determine how to take these new categories, and add them to column "a" if they aren't already in column A. In the example the new categories in column C are added to column A even if there are already in column A. I would also need range("a1") in the if-then line to be a dynamic range since new categories will be added as the code runs. Some constructive criticism would be greatly appreciated as well to help me in the future.
Sub newcategory()
Dim newcatcount As Integer
Dim i As Integer
newcat = Range("c100000").End(xlUp).Row
For i = 1 To newcat
If Cells(i, 3).Value <> Range("a1") Then
Cells(i, 3).Select
Selection.copy
Range("a100000").End(xlUp).Offset(1, 0).Select
ActiveSheet.Paste
End If
Next
End Sub
Please give this a try...
Sub AddNewCategories()
Dim lrA As Long, lrC As Long, i As Long, j As Long
Dim x, y, z(), dict
lrA = Cells(Rows.Count, 1).End(xlUp).Row
lrC = Cells(Rows.Count, 3).End(xlUp).Row
'Array to hold the categories in column A starting from Row1, assuming the categories start from A1. If not, change it accordingly.
x = Range("A1:A" & lrA).Value
'Array to hold the new categories in column C starting from Row1, assuming the categories start from C1. If not, change it accordingly.
y = Range("C1:C" & lrC).Value
Set dict = CreateObject("Scripting.Dictionary")
For i = 1 To UBound(x, 1)
dict.Item(x(i, 1)) = ""
Next i
For i = 1 To UBound(y, 1)
If Not dict.exists(y(i, 1)) Then
dict.Item(y(i, 1)) = ""
j = j + 1
ReDim Preserve z(1 To j)
z(j) = y(i, 1)
End If
Next i
If j > 0 Then
Range("A" & lrA + 1).Resize(j).Value = Application.Transpose(z)
End If
Set dict = Nothing
End Sub
you could use excel built in RemoveDuplicates() function, as follows (mind the comments):
Option Explicit
Sub newcategory()
Dim newcat As Range
With Worksheets("Categories") ' change "Categories" to your actual sheeet name
Set newcat = .Range("C1", .Cells(.Rows.Count, 3).End(xlUp)) ' get the range of all nwe categories in reference sheet column C from row 1 down to last not empty one
.Cells(.Rows.Count, 1).End(xlUp).Resize(newcat.Rows.Count).Value = newcat.Value ' append new categories values below existing categories in column A
.Range("A1", .Cells(.Rows.Count, 1).End(xlUp)).RemoveDuplicates Columns:=Array(1), Header:=xlNo ' remove duplicates
End With
End Sub

Excel VBA: How to use Dictionary between two workbooks for Vlookup?

I have a worksheet (in my desktop) which have over 13.000 rows with duplicate data which I add 3 new column with vlookup from another file (called ProductList) located on sharedrive. I tried to use Dictionary but I don't know how can I add a different workbook to my code? I add ProductList to my file as Sheet2 and successfully run the code.
Sub ckv()
Dim Rng As Range, Dn As Range, n As Long, Tic As Object, ray As Variant
'This is normally my ProductList workbook, which has 4 columns
With Sheets("Sheet2")
ray = .Range("A1").CurrentRegion.Resize(, 4)
End With
Set Tic = CreateObject("scripting.dictionary")
Tic.CompareMode = vbTextCompare
For n = 2 To UBound(ray, 1)
Tic(ray(n, 1)) = n
Next
'And this is my file in my desktop
With Sheets("Sheet1")
Set Rng = .Range(.Range("I2"), .Range("I" & Rows.Count).End(xlUp))
For Each Dn In Rng
If Tic.Exists(Dn.Value) Then
Dn.Offset(, -2) = ray(Tic(Dn.Value), 2)
Dn.Offset(, -1) = ray(Tic(Dn.Value), 3)
Dn.Offset(, 1) = ray(Tic(Dn.Value), 4)
End If
Next Dn
End With
End Sub
regards

VBA Remove duplicates taking 30 minutes to run with no idea why

So the code below copies data from two columns in one sheet. Pastes these so that values are displayed in another sheet (because one column is a formula =Left(Column+1,4)) and then attempts to run a remove duplicates across the two columns that are pasted.
This takes roughly 30 minutes to run on what is essentially 100k cells (2 columns of 50k rows each).
This is what I've been using
Sub ProjTrending1()
Dim s1 As Worksheet, s2 As Worksheet
Dim St As Date, Et As Date
Dim Tt As Double
St = Time
Application.ScreenUpdating = False
'Defines S1 as a Worksheet
Set s1 = Sheets("All Data")
'Defines S2 as WorkSheet
Set s2 = Sheets("Workings")
'Defines LastR1
Dim LR1 As Long
Dim LR2 As Long
'Finds last row cell working sheet
LR2 = s1.Cells(Rows.Count, 10).End(xlUp).Row
'Takes Data from Order Column of defined data Sheet and copy & pastes it to Working Sheet Column B
s1.Range("J1:J" & LR2).Copy s2.Range("A1")
s1.Range("e1:e" & LR2).Copy
s2.Range("b1").PasteSpecial Paste:=xlPasteValues
LR1 = s2.Range("A1").CurrentRegion.Rows.Count
'Removes Duplicates from Column B Working sheet
s2.Range("A2:B" & LR1).RemoveDuplicates Columns:=Array(1, 2), Header:=xlNo
'Copies the formula from C2 and applies it to all cells in column C where column A has values (simple concatenate + countifs(B$2:B2,B2)
s2.Range("C2").Copy s2.Range("C2:C" & LR1)
Et = Time
Tt = (Et - St) * 24 * 60 * 60
MsgBox Timetaken
End Sub
I've also tried using a dictionary to do this but I'm new to dictionaries so whilst the code looks good compared to my usual attempts its because its taken from a couple of different sources. (Copied and Pasted the data to sheet2 incase this overwrote the source data)
Sub M_delete_duplicates()
sn = Sheets("Sheet2").Cells(1).CurrentRegion.Resize(, 5)
With CreateObject("scripting.dictionary")
For j = 1 To UBound(sn)
.Item(sn(j, 1)) = Application.Index(sn, j, 0)
Next
Sheets("Sheet2").Cells(1, 4).Resize(.Count, UBound(sn, 2)) = Application.Index(.Items, 0, 0)
End With
End Sub
This is as slow to run and it only does remove duplicates based on single column and I need it to operate on two columns. The potential way around this is to concatenate the two columns of data and run the remove duplicates once and then break the data using =right(Value,X)
If wanted to do it manually it takes 30 seconds max. It makes no sense to me as to why it takes so long to run.
Can anyone help with why this might be taking so long to run? and how I might modify the dictionary code to remove duplicates over two columns?
Thanks in advance
Updated from my comment. This uses a dictionary to track which rows have been added and then copies unique rows across to the destination sheet. You may want to modify it a bit for your use (e.g. update sheet names) Always test this first on a copy of your data set or make a back up before running code
Option Explicit
Public Sub ExampleRemoveDuplicates()
Dim dict As Object
Dim temp As String
Dim calc As String
Dim headers As Variant
Dim NoCol As Long, NoRow As Long, i As Long, j As Long
Dim c, key
With Application
.ScreenUpdating = False
calc = .Calculation
.Calculation = xlCalculationManual
End With
Set dict = CreateObject("Scripting.Dictionary")
' Change this to the sheet that is applicable
With Sheet1
NoCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
' Assumes first row of sheet is headers
headers = .Range(.Cells(1, 1), .Cells(1, NoCol)).Value2
' Change this to destination sheet
With Sheet2
.Cells.Clear
.Range(.Cells(1, 1), .Cells(1, NoCol)).Value2 = headers
End With
For Each c In .Range(.Cells(2, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1))
ReDim arr(1 To NoCol)
j = 1
Do
arr(j) = c.Offset(0, j - 1).Value2
j = j + 1
Loop Until j = NoCol + 1
temp = Join(arr, "//")
If Not dict.exists(temp) And Not temp = vbNullString Then
dict.Add key:=temp, Item:=arr
' Change this to destination sheet
With Sheet2
NoRow = .Cells(.Rows.Count, 1).End(xlUp).Row + 1
.Range(.Cells(NoRow, 1), .Cells(NoRow, NoCol)).Value2 = arr
End With
End If
Next c
End With
i = 1
ReDim Results(1 To dict.Count, 1 To NoCol)
For Each key In dict.keys
For j = 1 To NoCol
Results(i, j) = dict(key)(j)
Next j
i = i + 1
Next key
' Change this to destination sheet
With Sheet2.Cells(1, 1)
.Range(.Offset(1, 0), .Offset(dict.Count, NoCol - 1)) = Results
End With
With Application
.Calculation = calc
.ScreenUpdating = True
End With
End Sub

Resources