How to speed up the comparision of 2 collections in VBA? - excel

I have written a macro that gets a collection of collection and than takes two of the collections and gives me the similarity.
Now if I compare the two collections with a simple for loop it will take hours to compare all 854 collection that are contained in pCol.
Here is my code:
Function CompareCollections(ByVal pCol As Collection) As Collection
Dim outer As Long
Dim inner As Long
'collections that will be compared to each other
Dim inCol As Collection
Dim outCol As Collection
'collection used for return values
Dim retCol As Collection
'result of single comparison
Dim res As CompResult
'comparison variables
Dim iIdx As Long
Dim oIdx As Long
Dim same As Long
Set retCol = New Collection
For outer = 1 To pCol.Count - 1
Set outCol = pCol(outer)
For inner = outer + 1 To pCol.Count
Set inCol = pCol(inner)
Set res = New CompResult
res.LeftTable = outCol(1) 'index 1 contains a header
res.RightTable = inCol(1)
'compare the two collections <== PART I WANT TO SPEED UP
same = 0
For oIdx = 2 To outCol.Count 'starting with 2 to ignore the header
For iIdx = 2 To inCol.Count
If inCol(iIdx) = outCol(oIdx) Then same = same + 1
Next iIdx
DoEvents
Next oIdx
res.Result1 = same / (outCol.Count - 1)
res.Result2 = same / (inCol.Count - 1)
retCol.Add res
Set res = Nothing
Set inCol = Nothing
DoEvents
Next inner
Set outCol = Nothing
DoEvents
Next outer
Set CompareCollections = retCol
End Function
I really hope you guys can help me.
EDIT:
The CompResult class is a simple structure, because I could not add a custom type to the collection:
Private mLeftTable As String
Private mRightTable As String
Private mResult1 As Double
Private mResult2 As Double
Public Property Get LeftTable() As String
LeftTable = mLeftTable
End Property
Public Property Let LeftTable(value As String)
mLeftTable = value
End Property
Public Property Get RightTable() As String
RightTable = mRightTable
End Property
Public Property Let RightTable(value As String)
mRightTable = value
End Property
Public Property Get Result1() As Double
Result = mResult1
End Property
Public Property Let Result1(value As Double)
mResult1 = value
End Property
Public Property Get Result2() As Double
Result = mResult2
End Property
Public Property Let Result2(value As Double)
mResult2 = value
End Property

A first tip: try to precalculate outCol.Count, inCol.Count and pCol.Count in order to avoid unnecessary calculations.
Second tip: if in your object CompResult the res.Result1 and res.Result2 are integers, use "\" instead of "/".
Third tip: try to use integers instead of long values wherever you can.
Fourth tip: try to replace for loops by a "for each" loops when looping for every column. It seems a little faster.
A last tip might be transform collections (ranges) in arrays and iterate through them, as it seems faster than iterate through ranges.

Related

Why can't I declare my Class Object as such?

I am currently creating a Class Object for a VBA file, its objective is to act as a range dictionary that can be passed single cells. If this cell is contained in one of the ranges, it returns the value associated to the corresponding range key. The class name is "rangeDic".
It is in the making so its functionalities are not implemented yet. Here's the code:
Private zone() As String
Private bounds() As String
Private link As Dictionary
Const ContextId = 33
'Init zone
Private Sub Class_Initialize()
Set link = New Dictionary
ReDim zone(0)
ReDim bounds(0)
End Sub
'properties
Property Get linkDico() As Dictionary
Set linkDico = link
End Property
Property Set linkDico(d As Dictionary)
Set link = d
End Property
Property Get pZone() As String()
pZone = zone
End Property
Property Let pZone(a() As String)
Let zone = a
End Property
'methods
Public Sub findBounds()
Dim elmt As String
Dim i As Integer
Dim temp() As String
i = 1
For Each elmt In zone
ReDim Preserve bounds(i)
temp = Split(elmt, ":")
bounds(i - 1) = temp(0)
bounds(i) = temp(1)
i = i + 2
Next elmt
End Sub
I was trying to instanciate it in a test sub in order to debug mid conception. Here's the code:
Sub test()
Dim rd As rangeDic
Dim ran() As String
Dim tabs() As Variant
Dim i As Integer
i = 1
With ThisWorkbook.Worksheets("DataRanges")
While .Cells(i, 1).Value <> none
ReDim Preserve ran(i - 1)
ReDim Preserve tabs(i - 1)
ran(i - 1) = .Cells(i, 1).Value
tabs(i - 1) = .Cells(i, 3).Value
i = i + 1
Wend
End With
Set rd = createRangeDic(ran, tabs)
End Sub
Public Function createRangeDic(zones() As String, vals() As Variant) As rangeDic
Dim obje As Object
Dim zonesL As Integer
Dim valsL As Integer
Dim i As Integer
zonesL = UBound(zones) - LBound(zones)
valsL = UBound(vals) - LBound(vals)
If zonesL <> valsL Then
Err.Raise vbObjectError + 5, "", "The key and value arrays are not the same length.", "", ContextId
End If
Set obje = New rangeDic
obje.pZone = zones()
For i = 0 To 5
obje.linkDico.add zones(i), vals(i)
Next i
Set createRangeDic = obje
End Function
Take a look at line 2 of Public Function createRangeDic. I have to declare my object as "Object", if I try declaring it as "rangeDic", Excel crashes at line obje.pZone = zones(). Upon looking in the Windows Event Log, I can see a "Error 1000" type of application unknown error resulting in the crash, with "VB7.DLL" being the faulty package.
Why so ? Am I doing something wrong ?
Thanks for your help
Edit: I work under Excel 2016
It looks like this is a bug. My Excel does not crash but I get an "Internal Error".
Let's clarify a few things first, since you're coming from a Java background.
Arrays can only be passed by reference
In VBA an array can only be passed by reference to another method (unless you wrap it in a Variant). So, this declaration:
Property Let pZone(a() As String) 'Implicit declaration
is the equivalent of this:
Property Let pZone(ByRef a() As String) 'Explicit declaration
and of course, this:
Public Function createRangeDic(zones() As String, vals() As Variant) As rangeDic
is the equivalent of this:
Public Function createRangeDic(ByRef zones() As String, ByRef vals() As Variant) As rangeDic
If you try to declare a method parameter like this: ByVal a() As String you will simply get a compile error.
Arrays are copied when assigned
Assuming two arrays called a and b, when doing something like a = b a copy of the b array is assigned to a. Let's test this. In a standard module drop this code:
Option Explicit
Sub ArrCopy()
Dim a() As String
Dim b() As String
ReDim b(0 To 0)
b(0) = 1
a = b
a(0) = 2
Debug.Print "a(0) = " & a(0)
Debug.Print "b(0) = " & b(0)
End Sub
After running ArrCopy my immediate window looks like this:
As shown, the contents of array b are not affected when changing array a.
A property Let always receives it's parameters ByVal regardless of whether you specify ByRef
Let's test this. Create a class called Class1 and add this code:
Option Explicit
Public Property Let SArray(ByRef arr() As String)
arr(0) = 1
End Property
Public Function SArray2(ByRef arr() As String)
arr(0) = 2
End Function
Now create a standard module and add this code:
Option Explicit
Sub Test()
Dim c As New Class1
Dim arr() As String: ReDim arr(0 To 0)
arr(0) = 0
Debug.Print arr(0) & " - value before passing to Let Property"
c.SArray = arr
Debug.Print arr(0) & " - value after passing to Let Property"
arr(0) = 1
Debug.Print arr(0) & " - value before passing to Function"
c.SArray2 arr
Debug.Print arr(0) & " - value after passing to Function"
End Sub
After running Test, my immediate window looks like this:
So, this simple test proves that the Property Let does a copy of the array even though arrays can only be passed ByRef.
The bug
Your original ran variable (Sub test) is passed ByRef to createRangeDic under a new name zones which is then passed ByRef again to pZone (the Let property). Under normal circumstances there should be no issue with passing an array ByRef as many times as you want but here it seems it is an issue because the Property Let is trying to make a copy.
Interestingly if we replace this (inside createRangeDic):
obje.pZone = zones()
with this:
Dim x() As String
x = zones
obje.pZone = x
the code runs with no issue even if obje is declared As rangeDic. This works because the x array is a copy of the zones array.
It looks that the Property Let cannot make a copy of an array that has been passed ByRef multiple times but it works perfectly fine if it was passed ByRef just once. Maybe because of the way stack frames are added in the call stack, there is a memory access issue but difficult to say. Regardless what the problem is, this seems to be a bug.
Unrelated to the question but I must add a few things:
Using ReDim Preserve in a loop is a bad idea because each time a new memory is allocated for a new (larger) array and each element is copied from the old array to the new array. This is very slow. Instead use a Collection as
#DanielDuĊĦek suggested in the comments or minimize the number of ReDim Preserve calls (for example if you know how many values you will have then just dimension the array once at the beginning).
Reading a Range cell by cell is super slow. Read the whole Range into an array by using the Range.Value or Range.Value2 property (I prefer the latter). Both methods returns an array as long as the range has more than 1 cell.
Never expose a private member object of a class if that object is responsible for the internal workings of the class. For example you should never expose the private collection inside a custom collection class because it breaks encapsulation. In your case the linkDico exposes the internal dictionary which can the be modified from outside the main class instance. Maybe it does not break anything in your particular example but just worth mentioning. On the other hand Property Get pZone() As String() is safe as this returns a copy of the internal array.
Add Option Explicit to the top of all your modules/classes to make sure you enforce proper variable declaration. Your code failed to compile for me because none does not exist in VBA unless you have it somewhere else in your project. There were a few other issues that I found once I turned the option on.

VBA: How to loop through multiple variant arrays of the same length and turn them into Objects

I currently have 3 separate variants that are all arrays consisting of 13 rows and 1 column. One variant represents names, another represents changes and the last represents occurrences. Please see my starter code below:
Sub watchList()
Dim names As Variant
names = Sheets("Watch Calculations").Range("B4:B16")
Dim changes As Variant
changes = Sheets("Watch Calculations").Range("G4:G16")
Dim occurances As Variant
occurrences = Sheets("Watch Calculations").Range("G22:G34")
End Sub
I also have a class called counterParty with the following fields:
Public Name As String
Public changeStatus As String
Public negativeOccurences As Integer
How can I loop through all 3 variants at the same time and input the values into an object of the counterParty class based on the row number of each variant. Please see psuedo code below:
Dim i As Integer
Dim MyArray(1 To 13) As Integer
For i = 1 To UBound(MyArray)
'psuedo code stars here
create new object of class counterParty
set object.Name = names(i,1)
set object.changeStatus = changes(i,1)
set object.negativeOccurences= occurrences.get(i,1)
add object to array of counterParty objects
Next i
Try this out
First the class module:
Private pName As String
Private pchangeStatus As String
Private pnegativeOccurrences As Long
Public Property Get Name() As String
Name = pName
End Property
Public Property Let Name(lName As String)
pName = lName
End Property
Public Property Get changeStatus() As String
changeStatus = pchangeStatus
End Property
Public Property Let changeStatus(lchangeStatus As String)
pchangeStatus = lchangeStatus
End Property
Public Property Get negativeOccurrences() As Long
negativeOccurrences = pnegativeOccurrences
End Property
Public Property Let negativeOccurrences(lnegativeOccurrences As Long)
pnegativeOccurrences = lnegativeOccurrences
End Property
Then the module:
Dim names As Variant
names = Sheets("Watch Calculations").Range("B4:B16")
Dim changes As Variant
changes = Sheets("Watch Calculations").Range("G4:G16")
Dim occurrences As Variant
occurrences = Sheets("Watch Calculations").Range("G22:G34")
Dim i As Long
Dim clsarr(1 To 13) As Object 'You can use lbound and ubound on names() to get dynamic values
Dim mycls As Class1
For i = 1 To UBound(names)
Set mycls = New Class1 'Overwrite current object
'assign values to the class properties
mycls.Name = names(i, 1)
mycls.changeStatus = changes(i, 1)
mycls.negativeOccurrences = occurrences(i, 1)
Set clsarr(i) = mycls
Next i

VBA - create unique ID of String / Hash

First of all I want so say sorry for not showing any code but right now I need some guidelines on how to take out a unique ID of a string.
So I have some problems of how to organize data. Lets say that the data is organized so that each dataID has their unique name. I collect the data into a array that holds it.
The problem I now have is that I want a easy way to search for these nameID. Imagine that the data is a lot bigger and contain more than a few hundred of different unique combinations of nameID's. Therefor I do not think searching for the id itself would be appropriate and I'm thinking of creating an hash that I could use an algorithm on to search the array. I want to do this because later on I will compare the names and add the values to the respective nameID. Keep in mind that the nameID will most of the time have the same structure but eventually a new name like total_air could be implemented and then I need to search in the array to get right value.
Updated:
Example of an code that collect the data from excel:
For Each targetSheet In wb.Worksheets
With targetSheet
'Populate the array
xData(0) = Application.Transpose(Range(Cells(1, 1), Cells(1, 1).End(xlDown)).Value2)
cnt = UBound(xData(0))
End With
Call dData.init(cnt)
'Populate the objectarray
dData.setNameArray = xData(0)
Next targetSheet
Type object:
Private index As Integer
Private id As String
Private nameID() As Variant
Private data() As Variant
Private cnt As Integer
Public Sub init(value As Integer)
index = 0
cnt = value
id = ""
ReDim nameID(0 To cnt)
ReDim data(0 To cnt)
End Sub
Property Let setID(value As String)
id = value
End Property
Property Let setNameArray(value As Variant)
nameID = value
End Property
dList that inherit the dataStruct:
Private xArray() As dataStruct
Private listInd As Integer
Public Sub init(cnt As Integer)
ReDim xArray(1 To cnt)
Dim num As Integer
For num = 1 To cnt
Set xArray(num) = New dataStruct
Next
listInd = 1
End Sub
Property Let addArray(value As dataStruct)
Set xArray(listInd) = value
listInd = listInd + 1
End Property
How the hole list will look like:
I would strongly advocate using a dictionary. Not only is it much faster to find an item (I would assume that it is implemented with some kind of hashing), it has big advantages when it comes to adding or removing items.
When you have an array and want to add an item, you either have always to use redim preserve which is really expensive, or you define the array larger than initially needed and always have to keep the information how many items are really used. And deleting an item from an array is rather complicated.
You cannot add a typed variable as item value into a dictionary, but you can add a object. So instead of your Type definition, create a simple class module, containing only these lines (of course you can create the class with properties, getter and setter but that's irrelevant for this example)
Public id As Long
Public name As String
Public value As Long
Then, dealing with the dictionary is rather simple (note that you have to add a reference to the Microsoft Scripting Runtime
Option Explicit
Dim myList As New Dictionary
Sub AddItemValues(id As Long, name As String, value As Long)
Dim item As New clsMyData
With item
.id = id
.name = name
.value = value
End With
Call AddItem(item)
End Sub
Sub AddItem(item As clsMyData)
If myList.Exists(item.id) Then
set myList(item.id) = item
Else
Call myList.Add(item.id, item)
End If
End Sub
Function SearchItem(id As Long) As clsMyData
If myList.Exists(id) Then
Set SearchItem = myList(id)
Else
Set SearchItem = Nothing
End If
End Function
Function SearchName(name As String) As clsMyData
Dim item As Variant
For Each item In myList.Items
If item.name = name Then
Set SearchName = item
Exit Function
End If
Next item
Set SearchName = Nothing
End Function
So as long as you deal with Id's, the dictionary will do all the work for you. Only if you search for the name, you have to loop over all items of the dictionary, which is as easy as looping over an array.
Some test (of course you should add some error handling)
Sub test()
Call AddItemValues(32, "input_air", 0)
Call AddItemValues(45, "air_Procent", 99)
Call AddItemValues(89, "output_air", 34)
Debug.Print SearchItem(45).name
Debug.Print SearchName("output_air").value
' Change value of output_air
Call AddItemValues(89, "output_air", 1234)
Debug.Print SearchName("output_air").value
End Sub

Collection of Objects Passing a new UDT to each Object while Looping thru an Array

My aMRecon array is 2500 rows x 65 columns. I need to evaluate up to 10+ columns within each single row and thus I believe I need to create an object representing each row. I have created a UDT and in a basic procedure below I am trying to create an object for each row with each object having an .EntityID property (which is the cell value in each row within Column B or Column 2).
Public Type BreakInfo
EntityID As String
IssueName As String
ValDate As Date
LedgerAMT As Long
NetAMTL As Long
PriceDiff As Long
End Type
Sub Fill_Objects()
Dim aMrow As Integer, aMcol As Integer
Dim BI As BreakInfo
For aMcol = LBound(aMRecon, 2) To UBound(aMRecon, 2)
For aMrow = LBound(aMRecon, 1) To UBound(aMRecon, 1)
If aMcol = 2 Then
Debug.Print aMRecon(aMrow, aMcol)
Set ObjLSL = New Collection
BI.EntityID = aMRecon(aMrow, aMcol)
End If
Next aMrow
Next aMcol
End If
End Sub
Do I need to somehow create a collection of objects? Could someone please show me an example to help. As of right now I think I only have one object and the .EntityID property keeps getting overwritten. Thank you
In Fact each row at will only have 1 property, basically each property is a Column Header. Am I going about this the most efficient way? Eventually I will need to evaluate each property within an object and categorize it.
Inserted a ClassModule entitle BreakInfo
'Public EntityID As String
Public EntityID As Variant
Public IssueName As String
Public ValDate As Date
Public LedgerAMT As Long
Public NetAMTL As Long
Public PriceDiff As Long
That's all that's in the class.
You need to first create (insert) a Class Module, name it BreakInfo, and give it Public members like this:
Option Explicit
Public EntityID As String
Public IssueName As String
Public ValDate As Date
Public LedgerAMT As Long
Public NetAMTL As Long
Public PriceDiff As Long
Then you can use something like this:
Sub Fill_Objects()
Dim aMrow As Integer, aMcol As Integer
Dim BI As BreakInfo
Dim ObjLSL As Collection
Dim key As Long
'Create the Collection instance.
Set ObjLSL = New Collection
For aMcol = LBound(aMRecon, 2) To UBound(aMRecon, 2)
For aMrow = LBound(aMRecon, 1) To UBound(aMRecon, 1)
If aMcol = 2 Then
'Instantiate a BreakInfo.
Set BI = New BreakInfo
BI.EntityID = aMRecon(aMrow, aMcol)
'...
key = key + 1
ObjLSL.Add BI, CStr(key)
End If
Next aMrow
Next aMcol
End Sub
Notice that the collection is instantiated once, before the loops. A collection can't ingest variables of user-defined types, but it will gladly gobble up object instances.
Edit
The question has changed. If you worry about efficiency, you could hardcode aMcol = 2 and do without the outer For and the If aMcol = 2. Other than that, I don't understand what you're trying to do with your values.

User Defined Type vs Class Speed

I have a workbook with two sheets of data that I need to perform operations on. I started off working with the data directly from the sheets but soon found that to be very slow, so changed things to read the sheets into two arrays (in two separate methods called from Workbook_Open).
I had a user defined type created for the data on each sheet, I then found that I was not able to add these to collections or scripting dictionaries, so I transferred them to classes.
So now I have a class called CDealerData with 4 private fields and public properties for each. The issue is that the execution of reading the data into the array is double that of when I was using a type. Is that just how it is or am I doing something wrong.
Class:
Option Explicit
Private pBAC As String
Private pAccountNumber As String
Private pYear As Integer
Private pUnits As Variant
Public Property Get BAC() As String
BAC = pBAC
End Property
Public Property Let BAC(Value As String)
pBAC = Value
End Property
Public Property Get AccountNumber() As String
AccountNumber = pAccountNumber
End Property
Public Property Let AccountNumber(Value As String)
pAccountNumber = Value
End Property
Public Property Get Year() As String
Year = pYear
End Property
Public Property Let Year(Value As String)
pYear = Value
End Property
Public Property Get Units() As String
Units = pUnits
End Property
Public Property Let Units(Value As String)
pUnits = Value
End Property
Option Explicit
Private pBAC As String
Private pAccountNumber As String
Private pYear As Integer
Private pUnits As Variant
Public Property Get BAC() As String
BAC = pBAC
End Property
Public Property Let BAC(Value As String)
pBAC = Value
End Property
Public Property Get AccountNumber() As String
AccountNumber = pAccountNumber
End Property
Public Property Let AccountNumber(Value As String)
pAccountNumber = Value
End Property
Public Property Get Year() As String
Year = pYear
End Property
Public Property Let Year(Value As String)
pYear = Value
End Property
Public Property Get Units() As String
Units = pUnits
End Property
Public Property Let Units(Value As String)
pUnits = Value
End Property
Module:
Option Explicit
Public NumberOfYears As Integer
Public DealersData() As CDealerData
Public Sub ReadDealerData()
'** Reads the contents of RawData into an Array
'** of custom type DealerData, defined above
Dim MyDealerData As CDealerData
Dim LastRow As Long
Dim i As Long
Dim j As Long
LastRow = SheetRawData.UsedRange.Rows.Count
ReDim DealersData(LastRow * NumberOfYears)
For i = 0 To LastRow
For j = 0 To NumberOfYears - 1 'Year columns
Set MyDealerData = New CDealerData
MyDealerData.BAC = SheetRawData.Cells(i + 2, 1).Value
MyDealerData.AccountNumber = SheetRawData.Cells(i + 2, 3).Value
MyDealerData.Year = j + 1
MyDealerData.Units = CDec(SheetRawData.Cells(i + 2, 4 + j).Value) 'Assign column based on j
Set DealersData(i) = MyDealerData
Next j
Next i
End Sub
The UDT will be much faster than using a class in this manner for a number of reasons.
The UDT is a structure in memory with the data that can be directly written
The Class will have Let and Get properties which are functions that execute and have some overhead
Creation and Destruction of the class would add to a tiny bit of overhead, but nothing noticeable in your case
To improve performance, you may consider using Public Variables instead of private properties, but then again that may defeat the purpose of you using a class.
If you are looking to simply use this as a data container, you are better off with a User-defined data type.
If you wish to further manipulate this data with Class specific functions, then the Class approach is better
Also, a general approach to speeding things up is to access the spreadsheet as few times as possible.
For e.g. code such as the following
For i = 1 to 10
Variable = Worksheets("Sheet1").Range("A1").Cell(i,1).Value
Next i
can be replaced by
Dim VariantArray as Variant
VariantArray = Workeheets("Sheet1").Range("A1:A10")
' Now VariantArray(0,0) has the first element, (1,0) has the second, etc.
A note on profiling: Do note #BlackHawk's suggestion in the comments below, to use the MicroTimer tool. It is incredibly useful for isolating portions of code and finding the performance impact to a very precise level.
Also, while this is true for any platform, VBA performance can be inconsistent at times depending on how much pressure is there on Excel's resources at the moment, and hence, even though the MicroTimer is precise, it might not be accurately representative and you might want to consider running loops at different times to correctly gauge the impact of different sections of your code.
Use this syntax to read entire arrays with one operation Dim x() as Variant : x = Range("A1").Resize(40,20).Value.
This will read the cells starting from A1 in 40 rows and 20 columns into an 2D array of Variant(,).
The you can loop through this array to put values into the user type and it will be much faster, like DealersData(i*NumberOfYears+j).BAC = x(2*i-1,j) or however you have things organized.
As the first I would optimze the CDealerData-Class as follows:
Private pUnits As Decimal 'instead of Variant, the internal mapping uses Time
Private pYear As Long 'instead of integer because outside of the Class you calc with Long
Furthermore I suggest you create a Method to set the Data by one line instead of writeable Properties:
Public Sub SetData(BAC As String, AccountNumber as String, Year as Long, Units as Decimal)
pBAC = BAC
pAccountNumber = AccountNumber
pYear = Year
pUnits = Units
End Sub
The usage in your Module would look like this:
For i = 0 To LastRow
For j = 0 To NumberOfYears - 1 'Year columns
Set MyDealerData = New CDealerData
MyDealerData.SetData(SheetRawData.Cells(i + 2, 1).Value, SheetRawData.Cells(i + 2, 3).Value, j + 1, CDec(SheetRawData.Cells(i + 2, 4 + j).Value))
'Assign column based on j
Set DealersData(i) = MyDealerData
Next j
Next i
Also with a Class you can use a Collection and you woudn't need ReDim for the Array.
Hope it helps.
Cheers
Andy

Resources