LIFO (Stack) Algorithm/Class for Excel VBA - excel

I'm looking to implement a "Stack" Class in VBA for Excel. I want to use a Last In First Out structure. Does anyone came across this problem before ? Do you know external libraries handling structure such as Stack, Hastable, Vector... (apart the original Excel Collection etc...)
Thanks

Here is a very simple stack class.
Option Explicit
Dim pStack As Collection
Public Function Pop() As Variant
With pStack
If .Count > 0 Then
Pop = .Item(.Count)
.Remove .Count
End If
End With
End Function
Public Function Push(newItem As Variant) As Variant
With pStack
.Add newItem
Push = .Item(.Count)
End With
End Function
Public Sub init()
Set pStack = New Collection
End Sub
Test it
Option Explicit
Sub test()
Dim cs As New cStack
Dim i As Long
Set cs = New cStack
With cs
.init
For i = 1 To 10
Debug.Print CStr(.Push(i))
Next i
For i = 1 To 10
Debug.Print CStr(.Pop)
Next i
End With
End Sub
Bruce

Bruce McKinney provided code for a Stack, List, and Vector in this book (it was VB5(!), but that probably doesn't matter much):
http://www.amazon.com/Hardcore-Visual-Basic-Bruce-McKinney/dp/1572314222
(It's out of print, but used copies are cheap.)
The source code appears to be available here:
http://vb.mvps.org/hardweb/mckinney2a.htm#2
(Caveat - I've never used any of his code, but I know he's a highly regarded, long-time VB expert, and his book was included on MSDN for a long time.)
I'm sure there are also many different implementations for these things floating around the internet, but I don't know if any of them are widely used by anybody but their authors.
Of course, none of this stuff is that hard to write your own code for, given that VBA supports resizeable arrays (most of the way to a vector) and provides a built-in Collection class (most of the way to a list). Charles William's answer for a stack is about all the info you need. Just provide your own wrapper around either an array or a Collection, but the code inside can be relatively trivial.
For a hashtable, the MS Scripting Runtime includes a Dictionary class that basically is one. See:
Hash Table/Associative Array in VBA

I do not know of any external VBA libraries for these structures.
For my procedure-call stack I just use a global array and array pointer with Push and Pop methods.

You can use the class Stack in System.Collections, as you can use Queue and others. Just search for vb.net stack for documentation. I have not tried all methods (e.g. Getenumerator - I don't know how to use an iterator, if at all possible in VBA). Using a stack or a queue gives you some nice benefits, normally not so easy in VBA. You can use
anArray = myStack.ToArray
EVEN if the stack is empty (Returns an array of size 0 to -1).
Using a custom Collections Object, it works very fast due to its simplicity and can easily be rewritten (e.g. to only handle strongly typed varibles). You might want to make a check for empty stack. If you try to use Pop on an empty stack, VBA will not handle it gracefully, as all null-objects. I found it more reasonable to use:
If myStack.Count > 0 Then
from the function using the stack, instead of baking it into clsStack.Pop. If you bake it into the class, a call to Pop can return a value of chosen type - of course you can use this to handle empty values, but you get much more grief that way.
An example of use:
Private Sub TestStack()
Dim i as long
Dim myStack as clsStack
Set myStack = New clsStack
For i = 1 to 2
myStack.Push i
Next
For i = 1 to 3
If myStack.Count > 0 Then
Debug.Print myStack.Pop
Else
Debug.Print "Stack is empty"
End If
Next
Set myStack = Nothing
End Sub
Using a LIFO-stack can be extremely helpful!
Class clsStack
Dim pStack as Object
Private Sub Class_Initialize()
set pStack = CreateObject("System.Collections.Stack")
End Sub
Public Function Push(Value as Variant)
pStack.Push Value
End Function
Public Function Pop() As Variant
Pop = pStack.Pop
End Function
Public Function Count() as long
Count = pstack.Count
End Function
Public Function ToArray() As Variant()
ToArray = pStack.ToArray()
End Function
Public Function GetHashCode() As Integer
GetHashCode = pStack.GetHashCode
End Function
Public Function Clear()
pStack.Clear
End Function
Private Sub Class_terminate()
If (Not pStack Is Nothing) Then
pStack.Clear
End If
Set pStack = Nothing
End Sub

Related

How to run a sub in a `Create` function and how to make a mock/stub/fake for a chart `Series`?

Preface
About 10 years ago I started refactoring and improving the ChartSeries class of John Walkenbach. Unfortunately it seems that the original it is not available any more online.
Following the Rubberduck Blog for quite some time now I try to improve my VBA skills. But in the past I only have written -- I guess the experts would call it -- "script-like god-procedures" (because of not knowing better). So I am pretty new to classes and especially interfaces and factories.
Actual Questions
I try to refactor the whole class by dividing it into multiple classes also using interfaces and than also adding unit tests. For just reading the parts of a formula it would be sufficient to get the Series.Formula and then do all the processing. So it would be nice to call the Run sub in the Create function. But everything I tried so far to do so failed. Thus, I currently running Run in all Get properties etc. (and test, if the formula changed and exit Run than. Is this possible and when yes, how?
Second, to add unit tests -- of course using rubberduck for them -- I currently rely on real Charts/ChartObjects. How do I create a stub/mock/fake for a Series? (Sorry, I don't know the correct term.)
And here a simplified version of the code.
Many thanks in advance for any help.
normal module
'#Folder("ChartSeries")
Option Explicit
Public Sub ExampleUsage()
Dim wks As Worksheet
Set wks = ThisWorkbook.Worksheets(1)
Dim crt As ChartObject
Set crt = wks.ChartObjects(1)
Dim srs As Series
Set srs = crt.Chart.SeriesCollection(3)
Dim MySeries As IChartSeries
Set MySeries = ChartSeries.Create(srs)
With MySeries
Debug.Print .XValues.FormulaPart
End With
End Sub
IChartSeries.cls
'#Folder("ChartSeries")
'#Interface
Option Explicit
Public Function IsSeriesAccessible() As Boolean
End Function
Public Property Get FullFormula() As String
End Property
Public Property Get XValues() As ISeriesPart
End Property
'more properties ...
ChartSeries.cls
'#PredeclaredId
'#Exposed
'#Folder("ChartSeries")
Option Explicit
Implements IChartSeries
Private Type TChartSeries
Series As Series
FullSeriesFormula As String
OldFullSeriesFormula As String
IsSeriesAccessible As Boolean
SeriesParts(eElement.[_First] To eElement.[_Last]) As ISeriesPart
End Type
Private This As TChartSeries
Public Function Create(ByVal Value As Series) As IChartSeries
'NOTE: I would like to run the 'Run' sub somewhere here (if possible)
With New ChartSeries
.Series = Value
Set Create = .Self
End With
End Function
Public Property Get Self() As IChartSeries
Set Self = Me
End Property
Friend Property Let Series(ByVal Value As Series)
Set This.Series = Value
End Property
Private Function IChartSeries_IsSeriesAccessible() As Boolean
Call Run
IChartSeries_IsSeriesAccessible = This.IsSeriesAccessible
End Function
Private Property Get IChartSeries_FullFormula() As String
Call Run
IChartSeries_FullFormula = This.FullSeriesFormula
End Property
Private Property Get IChartSeries_XValues() As ISeriesPart
Call Run
Set IChartSeries_XValues = This.SeriesParts(eElement.eXValues)
End Property
'more properties ...
Private Sub Class_Initialize()
With This
Dim Element As eElement
For Element = eElement.[_First] To eElement.[_Last]
Set .SeriesParts(Element) = New SeriesPart
Next
End With
End Sub
Private Sub Class_Terminate()
With This
Dim Element As LongPtr
For Element = eElement.[_First] To eElement.[_Last]
Set .SeriesParts(Element) = Nothing
Next
End With
End Sub
Private Sub Run()
If Not GetFullSeriesFormula Then Exit Sub
If Not HasFormulaChanged Then Exit Sub
Call GetSeriesFormulaParts
End Sub
'(simplified version)
Private Function GetFullSeriesFormula() As Boolean
GetFullSeriesFormula = False
With This
'---
'dummy to make it work
.FullSeriesFormula = _
"=SERIES(Tabelle1!$B$2,Tabelle1!$A$3:$A$5,Tabelle1!$B$3:$B$5,1)"
'---
.OldFullSeriesFormula = .FullSeriesFormula
.FullSeriesFormula = .Series.Formula
End With
GetFullSeriesFormula = True
End Function
Private Function HasFormulaChanged() As Boolean
With This
HasFormulaChanged = (.OldFullSeriesFormula <> .FullSeriesFormula)
End With
End Function
Private Sub GetSeriesFormulaParts()
Dim MySeries As ISeriesFormulaParts
'(simplified version without check for Bubble Chart)
Set MySeries = SeriesFormulaParts.Create( _
This.FullSeriesFormula, _
False _
)
With MySeries
Dim Element As eElement
For Element = eElement.[_First] To eElement.[_Last] - 1
This.SeriesParts(Element).FormulaPart = _
.PartSeriesFormula(Element)
Next
'---
'dummy which normally would be retrieved
'by 'MySeries.PartSeriesFormula(eElement.eXValues)'
This.SeriesParts(eElement.eXValues).FormulaPart = _
"Tabelle1!$A$3:$A$5"
'---
End With
Set MySeries = Nothing
End Sub
'more subs and functions ...
ISeriesPart.cls
'#Folder("ChartSeries")
'#Interface
Option Explicit
Public Enum eEntryType
eNotSet = -1
[_First] = 0
eInaccessible = eEntryType.[_First]
eEmpty
eInteger
eString
eArray
eRange
[_Last] = eEntryType.eRange
End Enum
Public Property Get FormulaPart() As String
End Property
Public Property Let FormulaPart(ByVal Value As String)
End Property
Public Property Get EntryType() As eEntryType
End Property
Public Property Get Range() As Range
End Property
'more properties ...
SeriesPart.cls
'#PredeclaredId
'#Folder("ChartSeries")
'#ModuleDescription("A class to handle each part of the 'Series' string.")
Option Explicit
Implements ISeriesPart
Private Type TSeriesPart
FormulaPart As String
EntryType As eEntryType
Range As Range
RangeString As String
RangeSheet As String
RangeBook As String
RangePath As String
End Type
Private This As TSeriesPart
Private Property Get ISeriesPart_FormulaPart() As String
ISeriesPart_FormulaPart = This.FormulaPart
End Property
Private Property Let ISeriesPart_FormulaPart(ByVal Value As String)
This.FormulaPart = Value
Call Run
End Property
Private Property Get ISeriesPart_EntryType() As eEntryType
ISeriesPart_EntryType = This.EntryType
End Property
Private Property Get ISeriesPart_Range() As Range
With This
If .EntryType = eEntryType.eRange Then
Set ISeriesPart_Range = .Range
Else
' Call RaiseError
End If
End With
End Property
Private Property Set ISeriesPart_Range(ByVal Value As Range)
Set This.Range = Value
End Property
'more properties ...
Private Sub Class_Initialize()
This.EntryType = eEntryType.eNotSet
End Sub
Private Sub Run()
'- set 'EntryType'
'- If it is a range then find the range parts ...
End Sub
'a lot more subs and functions ...
ISeriesParts.cls
'#Folder("ChartSeries")
'#Interface
Option Explicit
Public Enum eElement
[_First] = 1
eName = eElement.[_First]
eXValues
eYValues
ePlotOrder
eBubbleSizes
[_Last] = eElement.eBubbleSizes
End Enum
'#Description("fill me")
Public Property Get PartSeriesFormula(ByVal Element As eElement) As String
End Property
SeriesFormulaParts.cls
'#PredeclaredId
'#Exposed
'#Folder("ChartSeries")
Option Explicit
Implements ISeriesFormulaParts
Private Type TSeriesFormulaParts
FullSeriesFormula As String
IsSeriesInBubbleChart As Boolean
WasRunCalled As Boolean
SeriesFormula As String
RemainingFormulaPart(eElement.[_First] To eElement.[_Last]) As String
PartSeriesFormula(eElement.[_First] To eElement.[_Last]) As String
End Type
Private This As TSeriesFormulaParts
Public Function Create( _
ByVal FullSeriesFormula As String, _
ByVal IsSeriesInBubbleChart As Boolean _
) As ISeriesFormulaParts
'NOTE: I would like to run the 'Run' sub somewhere here (if possible)
With New SeriesFormulaParts
.FullSeriesFormula = FullSeriesFormula
.IsSeriesInBubbleChart = IsSeriesInBubbleChart
Set Create = .Self
End With
End Function
Public Property Get Self() As ISeriesFormulaParts
Set Self = Me
End Property
'#Description("Set the full series formula ('ChartSeries')")
Public Property Let FullSeriesFormula(ByVal Value As String)
This.FullSeriesFormula = Value
End Property
Public Property Let IsSeriesInBubbleChart(ByVal Value As Boolean)
This.IsSeriesInBubbleChart = Value
End Property
Private Property Get ISeriesFormulaParts_PartSeriesFormula(ByVal Element As eElement) As String
'NOTE: Instead of running 'Run' here, it would be better to run it in 'Create'
Call Run
ISeriesFormulaParts_PartSeriesFormula = This.PartSeriesFormula(Element)
End Property
'(replaced with a dummy)
Private Sub Run()
If This.WasRunCalled Then Exit Sub
'extract stuff from
This.WasRunCalled = True
End Sub
'a lot more subs and functions ...
You can already!
Public Function Create(ByVal Value As Series) As IChartSeries
With New ChartSeries <~ With block variable has access to members of the ChartSeries class
.Series = Value
Set Create = .Self
End With
End Function
...only, like the .Series and .Self properties, it has to be a Public member of the ChartSeries interface/class (the line is blurry in VBA, since every class has a default interface / is also an interface).
Idiomatic Object Assignment
A note about this property:
Friend Property Let Series(ByVal Value As Series)
Set This.Series = Value
End Property
Using a Property Let member to Set an object reference will work - but it isn't idiomatic VBA code anymore, as you can see in the .Create function:
.Series = Value
If we read this line without knowing about the nature of the property, this looks like any other value assignment. Only problem is, we're not assigning a value, but a reference - and reference assignments in VBA are normally made using a Set keyword. If we change the Let for a Set in the Series property definition, we would have to do this:
Set .Series = Value
And that would look much more readily like the reference assignment it is! Without it, there appears to be implicit let-coercion happening, and that makes it ambiguous code: VBA requires a Set keyword for reference assignments, because any given object can have a paraterless default property (e.g. how foo = Range("A1") implicitly assigns foo to the Value of the Range).
Caching & Responsibilities
Now, back to the Run method - if it's made Public on the ChartSeries class, but not exposed on the implemented IChartSeries interface, then it's a member that can only be invoked from 1) the ChartSeries default instance, or 2) any object variable that has a ChartSeries declared type. And since our "client code" is working off IChartSeries, we can guard against 1 and shrug off 2.
Note that the Call keyword is superfluous, and the Run method is really just pulling metadata from the encapsulated Series object, and caching it at instance level - I'd give it a name that sounds more like "refresh cached properties" than "run something".
Your hunch is a good one: Property Get should be a simple return function, without any side-effects. Invoking a method that scans an object and resets instance state in a Property Get accessor makes it side-effecting, which is a design smell - in theory.
If Run is invoked immediately after creation before the Create function returns the instance, then this Run method boils down to "parse the series and cache some metadata I'll reuse later", and there's nothing wrong with that: invoke it from Create, and remove it from the Property Get accessors.
The result is an object whose state is read-only and more robustly defined; the counterpart of that is that you now have an object whose state might be out of sync with the actual Excel Series object on the worksheet: if code (or the user) tweaks the Series object after the IChartSeries is initialized, the object and its state is stale.
One solution is to go out of your way to identify when a series is stale and make sure you keep the cache up-to-date.
Another solution would be to remove the problem altogether by no longer caching the state - that would mean one of two things:
Generating the object graph once on creation, effectively moving the caching responsibility to the caller: calling code gets a read-only "snapshot" to work with.
Generating a new object graph out of the series metadata, every time the calling code needs it: effectively, it moves the caching responsibility to the caller, which isn't a bad idea at all.
Making things read-only removes a lot of complexity! I'd go with the first option.
Overall, the code appears nice & clean (although it's unclear how much was scrubbed for this post) and you appear to have understood the factory method pattern leveraging the default instance and exposing a façade interface - kudos! The naming is overall pretty good (although "Run" sticks out IMO), and the objects look like they each have a clear, defined purpose. Good job!
Unit Testing
I currently rely on real Charts/ChartObjects. How do I create a stub/mock/fake for a Series? (Sorry, I don't know the correct term.)
Currently, you can't. When if/when this PR gets merged, you'll be able to mock Excel's interfaces (and much, much more) and write tests against your classes that inject a mock Excel.Series object that you can configure for your tests' purposes... but until then, this is where the wall is.
In the mean time, the best you can do is wrap it with your own interface, and stub it. In other words, wherever there's a seam between your code and Excel's object model, we slip an interface between the two: instead of taking in a Excel.Series object, you'd be taking in some ISeriesWrapper, and then the real code would be using an ExcelSeriesWrapper that works off an Excel.Series, and the test code might be using a StubSeriesWrapper whose properties return either hard-coded values, or values configured by tests: the code that works at the seam between the Excel library and your project, can't be tested - and we woulnd't want to anyway, because then we'd be testing Excel, not our own code.
You can see this in action in the example code for the next upcoming RD News article here; that article will discuss exactly this, using ADODB connections. The principle is the same: none of the 94 unit tests in that project ever open any actual connection, and yet with dependency injection and wrapper interfaces we're able to test every single bit of functionality, from opening a database connection to committing a transaction... without ever hitting an actual database.

Any posibility to modify the sub and function bodies created via VBA IDE

I am willing to add some code to the begining and end of each sub or function for enabling flow tracing / debugging.
Now I copy this (almost standard code manually into the beginig of each sub / function, and also before each exit sub/function and end sub / function statement.
Something like this
public sub a()
...
**logging_successful = pushCallIntoStack("sub a")**
...
On Error Goto errorOccured
...
**logging_successful = popCallFromStack("sub a")**
Exit Sub
...
errorOccured:
...
**logging_successful = popCallFromStack("sub a")**
...
End Sub
Being able to insert these standart codes via VBIDE as default - at least in the standard entry and exit points - will save me sometime.
What you want to do is technically doable but are you sure you want to add boilerplate code with hard-coded strings to all of your procedures? That is lot of maintenance which also makes it much harder to refactor your code. I've seen lot of error messages saying it came from "Foo" but they came from "Bar" because at one point the code was in Foo but then it got moved or renamed to Bar, but they forgot to update the string constant. There is no such guarantees that the string constants are in sync with the actual procedure names.
Before sinking potentially hours into this solution, I would encourage you to first consider third-party addins that can do a much better job of helping you getting the detailed error output you need. One such solution would be vbWatchDog which provides you not only the stack tracing but also much extended diagnostics... without any changes to your source code. Precisely because it can do without any embedding constants, it won't be liable to give you outdated information.
I should note that there are also other third-party addin such as MZ-Tools which provides a one click button for adding an error template that could conceivably be used to provide what you want. However, the operation is not reversible, which adds to your maintenance burden; changing a procedure would mean you'd have to strip away the old error template, then re-add, and if there's any customization, to re-add it.
If in spite of all, you insist on continuing doing it by your own hand, you can do something like the following:
Public Sub AddBoilerPlate(TargetComponent As VBIDE.VBComponent)
Dim m As VBIDE.CodeModule
Dim i As Long
Set m = TargetComponent.CodeModule
For i = m.CountOfDeclarationLines + 1 To m.CountOfLines
Dim ProcName As String
Dim ProcKind As VBIDE.vbext_ProcKind
ProcName = m.ProcOfLine(i, ProcKind)
Dim s As Long
s = m.ProcBodyLine(ProcName, ProcKind) + 1
m.InsertLines s, <your push code>
'Loop the lines within the procedure to find the End *** line then insert the pop code
Next
End Sub
This is an incomplete sample, does not perform checks for pre-existing template. A more complete sample would probably delete any previous template before inserting.
You may need to amend the codes for your own needs but below's the general idea (e.g. change "Module2" to the name of your module and include more checks to determine where to add in new codes)
Public Sub sub_test()
Dim i As Long
With ThisWorkbook.VBProject.VBComponents("Module2").CodeModule
For i = 1 To .Countoflines
If InStr(.Lines(i, 1), "End Sub") > 0 Then
.Insertlines i, "**logging_successful = popCallFromStack(""sub a"")**"
End If
Next i
End With
End Sub

Performance alternative over Scripting.Dictionary

I am coding a Manager in Excel-VBA with several buttons.
One of them is to generate a tab using another Excel file (let me call it T) as input.
Some properties of T:
~90MB size
~350K lines
Contains sales data of the last 14 months (unordered).
Relevant columns:
year/month
total-money
seller-name
family-product
client-name
There is not id columns (like: cod-client, cod-vendor, etc.)
Main relation:
Sellers sells many Products to many Clients
I am generating a new Excel tab with data from T of the last year/month grouped by Seller.
Important notes:
T is the only available input/source.
If two or more Sellers sells the same Product to the same Client, the total-money should be counted to all of those Sellers.
This is enough, now you know what I have already coded.
My code works, but, it takes about 4 minutes of runtime.
I have already coded some other buttons using smaller sources (not greater than 2MB) which runs in 5 seconds.
Considering T size, 4 minutes runtime could be acceptable.
But I'm not proud of it, at least not yet.
My code is mainly based on Scripting.Dictionary to map data from T, and then I use for each key in obj ... next key to set the grouped data to the new created tab.
I'm not sure, but here are my thoughts:
If N is the total keys in a Scripting.Dictionary, and I need to check for obj.Exists(str) before aggregating total-money. It will run N string compares to return false.
Similarly it will run maximun N string compares when I do Set seller = obj(seller_name).
I want to be wrong with my thoughts. But if I'm not wrong, my next step (and last hope) to reduce the runtime of this function is to code my own class object with Tries.
I will only start coding tomorrow, what I want is just some confirmation if I am in the right way, or some advices if I am in the wrong way of doing it.
Do you have any suggestions? Thanks in advance.
Memory Limit Exceeded
In short:
The main problem was because I used a dynamic programming approach of storing information (preprocessing) to make the execution time faster.
My code now runs in ~ 13 seconds.
There are things we learn the hard way. But I'm glad I found the answer.
Using the Task Manager I was able to see my code reaching 100% memory usage.
The DP approach I mentioned above using Scripting.Dictionary reached 100% really faster.
The DP approach I mentioned above using my own cls_trie implementation also reached 100%, but later than the first.
This explains the ~4-5 min compared to ~2-3 min total runtime of above attempts.
In the Task Manager I could also see that the CPU usage never hited 2%.
Solution was simple, I had to balance CPU and Memory usages.
I changed some DP approaches to simple for-loops with if-conditions.
The CPU usage now hits ~15%.
The Memory usage now hits ~65%.
I know this is relative to the CPU and Memory capacity of each machine. But in the client machine it is also running in no more than 15 seconds now.
I created one GitHub repository with my cls_trie implementation and added one excel file with an example usage.
I'm new to the excel-vba world (4 months working with it right now). There might probably have some ways to improve my cls_trie implementation, I'm openned to suggestions:
Option Explicit
Public Keys As Collection
Public Children As Variant
Public IsLeaf As Boolean
Public tObject As Variant
Public tValue As Variant
Public Sub Init()
Set Keys = New Collection
ReDim Children(0 To 255) As cls_trie
IsLeaf = False
Set tObject = Nothing
tValue = 0
End Sub
Public Function GetNodeAt(index As Integer) As cls_trie
Set GetNodeAt = Children(index)
End Function
Public Sub CreateNodeAt(index As Integer)
Set Children(index) = New cls_trie
Children(index).Init
End Sub
'''
'Following function will retrieve node for a given key,
'creating a entire new branch if necessary
'''
Public Function GetNode(ByRef key As Variant) As cls_trie
Dim node As cls_trie
Dim b() As Byte
Dim i As Integer
Dim pos As Integer
b = CStr(key)
Set node = Me
For i = 0 To UBound(b) Step 2
pos = b(i) Mod 256
If (node.GetNodeAt(pos) Is Nothing) Then
node.CreateNodeAt pos
End If
Set node = node.GetNodeAt(pos)
Next
If (node.IsLeaf) Then
'already existed
Else
node.IsLeaf = True
Keys.Add key
End If
Set GetNode = node
End Function
'''
'Following function will get the value for a given key
'Creating the key if necessary
'''
Public Function GetValue(ByRef key As Variant) As Variant
Dim node As cls_trie
Set node = GetNode(key)
GetValue = node.tValue
End Function
'''
'Following sub will set a value to a given key
'Creating the key if necessary
'''
Public Sub SetValue(ByRef key As Variant, value As Variant)
Dim node As cls_trie
Set node = GetNode(key)
node.tValue = value
End Sub
'''
'Following sub will sum up a value for a given key
'Creating the key if necessary
'''
Public Sub SumValue(ByRef key As Variant, value As Variant)
Dim node As cls_trie
Set node = GetNode(key)
node.tValue = node.tValue + value
End Sub
'''
'Following function will validate if given key exists
'''
Public Function Exists(ByRef key As Variant) As Boolean
Dim node As cls_trie
Dim b() As Byte
Dim i As Integer
b = CStr(key)
Set node = Me
For i = 0 To UBound(b) Step 2
Set node = node.GetNodeAt(b(i) Mod 256)
If (node Is Nothing) Then
Exists = False
Exit Function
End If
Next
Exists = node.IsLeaf
End Function
'''
'Following function will get another Trie from given key
'Creating both key and trie if necessary
'''
Public Function GetTrie(ByRef key As Variant) As cls_trie
Dim node As cls_trie
Set node = GetNode(key)
If (node.tObject Is Nothing) Then
Set node.tObject = New cls_trie
node.tObject.Init
End If
Set GetTrie = node.tObject
End Function
You can see in the above code:
I hadn't implemented any delete method because I didn't need it till now. But it would be easy to implement.
I limited myself to 256 children because in this project the text I'm working on is basically lowercase and uppercase [a-z] letters and numbers, and the probability that two text get mapped to the same branch node tends zero.
as a great coder said, everyone likes his own code even if other's code is too beautiful to be disliked [1]
My conclusion
I will probably never more use Scripting.Dictionary, even if it is proven that somehow it could be better than my cls_trie implementation.
Thank you all for the help.
I'm convinced that you've already found the right solution because there wasn't any update for last two years.
Anyhow, I want to mention (maybe it will help someone else) that your bottleneck isn't the Dictionary or Binary Tree. Even with millions of rows the processing in memory is blazingly fast if you have sufficient amount of RAM.
The botlleneck is usually the reading of data from worksheet and writing it back to the worksheet. Here the arrays come very userfull.
Just read the data from worksheet into the Variant Array.
You don't have to work with that array right away. If it is more comfortable for you to work with dictionary, just transfer all the data from array into dictionary and work with it. Since this process is entirely made in memory, don't worry about the performance penalisation.
When you are finished with data processing in dictionary, put all data from dictionary back to the array and write that array into a new worksheet at one shot.
Worksheets("New Sheet").Range("A1").Value = MyArray
I'm pretty sure it will take only few seconds

How do you GoTo a line label in a different object?

E.g. Given Sheet 1 contains:
Ref: Do things
How can I direct a code in Module 1 to GoTo Ref? If I were in the Sheet1 code moduke then I could simply use a
Goto Ref
But this doesn't work across different modules
Your question is not clear and you didn't provide any code, so this is a guess.
GoTo is used to jump to different locations within the same sub/function. You cannot use it to jump to parts of other sub routines or functions, which it sounds like you might be trying to do.
Also, "NapDone:" is not called a reference, it's formally called a line label. :)
To help expand on the other answers.. Like they said you shouldn't use GoTo for anything in VBA except error handling.
What you should be doing is calling a public sub/function from another module. For example in Module 1 you would have the following
Sub TestMod1()
Dim MyNumber As Integer
MyNumber = GetSquare(6)
'MyNumber returns from the function with a value of 36
End Sub
and on Module 2 you have
Public Function GetSquare(ByVal MyNumber As Integer)
GetSquare = MyNumber * MyNumber
End Function
So now you know how to avoid it. GoTo is not very good programming practice as you'll have things flying all over the place. Try to break down code you're repeating into multiple Subs and just call them when needed, or functions whatever be the case. Then you'll get into classes, which are just wrapped up to represent an object and it'll do all the work for that object.
This should get you on the right track.

VBA: What is causing this string argument passed to ParamArray to get changed to a number (that looks suspiciously like a pointer)?

FINAL EDIT: It does indeed appear to be a compiler bug - see the accepted answer.
Using VBA within Excel 2007, I have the following code in 'Class1':
Option Explicit
Public Function strange(dummy As String, ParamArray pa())
Debug.Print pa(LBound(pa))
End Function
Public Sub not_strange(dummy As String, ParamArray pa())
Debug.Print pa(LBound(pa))
End Sub
Public Function also_not_strange(ParamArray pa())
Debug.Print pa(LBound(pa))
End Function
and some mode code in a module:
Option Explicit
Public Function not_strange_either(dummy As String, ParamArray pa())
Debug.Print pa(LBound(pa))
End Function
Public Sub outer(v)
Dim c As Class1
Set c = New Class1
Call c.strange("", v(LBound(v)))
Call c.not_strange("", v(LBound(v)))
Call c.also_not_strange(v(LBound(v)))
Call not_strange_either("", v(LBound(v)))
End Sub
If call 'outer' from the Immediate window like this:
call outer(array("a"))
I get back output that seems strange:
102085832
a
a
a
It seems to matter whether the called routine is in a class module or not, whether it is a Sub or a Function, and whether or not there is an initial argument. Am I missing something about how VBA is supposed to work? Any ideas?
The strange number changes from run to run. I say "looks suspiciously like a pointer" because if I call this:
Public Sub outer2(v)
Dim c As Class1
Set c = New Class1
Dim ind As Long
For ind = LBound(v) To UBound(v)
Call c.strange("", v(ind))
Next ind
End Sub
like so:
call outer2(array("a","b","c"))
I get back output like:
101788312
101788328
101788344
It's the increment by 16 that makes me suspicious, but I really don't know. Also, passing a value, say by calling:
Call c.strange("", CStr(v(ind)))
works just fine.
EDIT: A little more info...If I assign the return value from 'c.strange' to something instead of throwing it away, I get the same behavior:
Public Sub outer3(v)
Dim c As Class1
Set c = New Class1
Dim x
x = c.strange("", v(LBound(v)))
Call c.not_strange("", v(LBound(v)))
Call c.also_not_strange(v(LBound(v)))
Call not_strange_either("", v(LBound(v)))
End Sub
Interestingly, if I call my test routines as above, with an argument that results from calling 'Array', the supposed-pointer value changes. However, if I call it like this:
call outer([{1,2,3}])
I get back the same number, even if I make the call repeatedly. (The number changes if I switch to another app in Windows, like my browser.) So, now I'm intrigued that the Excel evaluator (invoked with the brackets) seemingly caches its results...
Now this is awesome.
Reproduced on office 2003.
Looks like a compiler bug.
The problem is in this line:
Call c.strange("", v(LBound(v)))
Here the compiler creates a Variant that holds a 1D array of Variant's, the only element of which is a pointer instead of the value. This pointer then goes to the strange function which actually is not strange, it only prints the Variant\Long value passed to it.
This trick brings the compiler sanity back:
Call c.strange("", (v(LBound(v))))
EDIT
Yes, this magic number is a pointer to the VARIANT structure which is supposed to be passed to the strange method. The first field of which is 8, which is vbString, and the data field contains a pointer to the actual string "a".
Therefore, it is definitely a compiler bug... Yet another VB compiler bug in regard of arrays ;)

Resources