I'm trying to create a function to calculate the required rate of return of a retirement fund. The situation is this: The retiree has a target amount of funds for their retirement account at the commencement of their retirement.
They also have a target withdraw amount that they would like to draw down every year, for a target amount of years.
For example: A retirement fund amount of $1,000,000. A annual withdraw amount of $100,000, and a target of 15 years.
The code I have written is as follows:
Function RequiredReturn(retire_amnt, annual_spending, n_year)
Dim arr()
ReDim arr(0)
arr(0) = -retire_amnt
ReDim arr(1 To n_year)
For i = 1 To n_year
arr(i) = annual_spending
Next i
RequireReturn = Application.WorksheetFunction.IRR(arr)
End Function
The goal of this function is to:
Create an array with the first value (0) = the negative retirement amount
Fill the array from (1 to n) with the annual_spending amount
Call the IRR function on this to find the minimum required annual return to be able to achieve this cash flow.
If you were to do by entering this array into cells, and then use the IRR function, the required return would be 5.56%
Obviously this is not working.
How can I change this to populate the array correctly?
If populated correctly, is the IRR function being applied correctly?
Or more directly than your initial approach, use the RATE formula
This calls the initial amount as a negative, ie
Sub Test()
MsgBox Format(RequiredReturn(-1000000, 100000, 15), "#.0000%")
End Sub
options
RequiredReturn = Application.Evaluate("=RATE(" & n_year & "," & annual_spending & "," & retire_amnt & ")")
RequiredReturn = Application.Rate(n_year, annual_spending, retire_amnt)
Your assign the array twice and the second assignment overrides the first - that is the error in your array logic.
You can fix the array by defining it as:
ReDim arr(0 To n_year)
Which will give it n_year + 1 slots, where 0th slot can take the lump sum (retire_amnt) and the slots from 1-n_year take the annuity payments (annual_spending). Then you have the correct data structure for an IRR calculation.
The code below works and gives you the expected result of 5.56%
Option Explicit
Sub Test()
MsgBox Format(RequiredReturn(1000000, 100000, 15), "#.0000%")
End Sub
Function RequiredReturn(retire_amnt, annual_spending, n_year)
Dim arr() As Variant
Dim i As Long
ReDim arr(0 To n_year)
arr(0) = -retire_amnt
For i = 1 To n_year
arr(i) = annual_spending
Next i
RequiredReturn = Application.WorksheetFunction.IRR(arr)
End Function
I've mainly used IRR in the case of a capital out-flow followed by a series of equally timed cash in-flows. But in the case of annuity rate of return my understanding is that IRR can be used as well because it doesn't matter if the starting amount is a credit or debit as long as the subsequent cash in/out-flows are of the opposite signage. You probably don't want to risk your pension on that comment though :)
Related
I have implemented this method to multiply every array element by a number held in a variable. It is terribly slow.
Is there an accepted "fastest" way to multiply every element in a range by a constant? Or at least one which is not as slow? I have to do this 10 times and it takes a couple of minutes.
MultFactor = 10
For Each cell In Sheet1.Range("B3:B902")
cell.Value = cell.Value * MultFactor
Next cell
The solution cited in Multiply Entire Range By Value? multiplies by a constant (not a variable). If I use this code (changing the range from "A1:B10" to "B3:B902"), I get a nonsense answer.
Dim rngData As Range
Set rngData = Sheet12.Range("B3:B902")
rngData = Evaluate(rngData.Address & "*2")
My original values in B3:B902 are zero for the first 100 elements or so and then increase a bit and finally decrease and have another run of zeros, but what ends up in my range is a series of numbers that clobbers everything in my range. It begins at -224.5 and decreases by 0.5 all the way to the last cell.
-224.5
-224.0
-223.5
etc.
Even if that worked, how would I modify it to use the variable MultFactor?
This will be hundreds to thousands of times faster. The difference is that all of the calcs are done to a VBA array instead of directly to worksheet cells, one by one. Once the array is updated it is written back to the worksheet in one go. This reduces worksheet interaction to just two instances, reading the array and writing it. Reducing the number of instances that your VBA code touches anything on the worksheet side is critical to execution speed.
Sub Mozdzen()
Const FACTOR = 10
Const SOURCE = "B3:B902"
Dim i&, v
v = Sheet1.Range(SOURCE)
For i = 1 To UBound(v)
v(i, 1) = v(i, 1) * FACTOR
Next
Sheet1.Range(SOURCE) = v
End Sub
Building on the above idea, a better way to manage the code is to encapsulate the array multiplication with a dedicated function:
Sub Mozdzen()
Const FACTOR = 10
Const SOURCE = "B3:B902"
With Sheet2.Range(SOURCE)
.Value2 = ArrayMultiply(.Value2, FACTOR)
End With
End Sub
Function ArrayMultiply(a, multfactor#)
Dim i&
For i = 1 To UBound(a)
a(i, 1) = a(i, 1) * multfactor
Next
ArrayMultiply = a
End Function
You need:
rngData = Sheet12.Evaluate(rngData.Address & "*2")
since the address property doesn't include the sheet name by default (so your formula is evaluated in the context of the active sheet's range B3:B902)
Then it would need:
rngData = Sheet12.Evaluate(rngData.Address & "*" & MultFactor)
to add in your variable.
Base problem
I'm trying to make my own VBA functions in Excel and I want it to be able to accept essentially any kind of input and treat the input as a vector, but I haven't figured out how to do so for both continuous (e.g. (A1:A10) or (A1:R1)) and discontinuous (e.g. (A1;B5;G12)) ranges. I can make the functions work for either, but not for both types at the same time.
The reason I wish to do this is that I wish to make my own versions of AVERAGE and STDEV.S where it can handle #N/A values in the cells. I know that I can use AVERAGEIF(range;"<>#N/A") in order to do so for averages, but AVERAGEIF does not allow me to use discontinuous ranges and, as far as I know, there are no such alternative for STDEV.S.
Background on my data
My data is obtained from several samples which I have measured by various chemical means. I have prepared one sample per day and then spent the rest of the day measuring stuff on it. Each sample is considered "one experiment" and each experiment is stored as individual worksheets in which I store data from all different analysis methods and do any data treatment to make the data comparable (e.g. calculates molarity from molality, do adjustments for temperature differences, etc); I also store a lot of semi irrelevant information (such as notes that are not required for the final results, but which are still required to keep). Long story short, it's far too much data to have all runs stored in one worksheet as it would make it too messy to look at and too messy to treat individual experiments, especially whenever I add a new experiment to the pile of data; my current method allows me to simply copy an existing worksheet and pasting new data into the old equations. The treated data is then linked to an "overview" worksheet where I list the most interesting data structured in such way that I can easily compare the values from different measurements. The linking is done with an INDIRECT so that I can easily add new information from new experiments. Since the data comes from experiments, then there are bound to be data missing and I use #N/A to cover such holes as linking from one worksheet to another produces a "0" if the data is missing. I know I could replace the #N/A with a simple dash (-) or something similar, which will make the built in AVERAGE and STDEV.S work, but I want to use the same arrays of data for plotting and it appears as if only #N/A will remove the data point from the plot as the graphing in excel treats a dash as a zero value.
The data on my "overview" worksheet is arranged as
Date pH Na+ conc K+ conc ...lots of other variables
Date 1 7.4 140 3 ...
Date 2 7.1 #N/A 4 ...
.... ... ... ... ...
Date N 7.3 143 3.5 ...
Code which works for continuous ranges
What I have managed to do so far, which supports continuous ranges, is the following code example which calculates the standard deviation of cells which contain #N/A values. This code works perfectly when I select a whole column (or continuous part of a column), but not if i select discontinuous range of cells.
Function StdevNaN_S(xRange)
'Sample Standard deviation which excludes NaN values
xR = xRange 'I can, for some strange reason, not use UBound unless I re-store the data in xR...
NoE1 = UBound(xR, 1) 'Number of Elements along dimension 1 (columns)
NoE2 = UBound(xR, 2) 'Number of Elements along dimension 2 (rows)
NoE = NoE1 * NoE2 'Total Number of Elements (this way makes it work regardless of row or column range)
'Need to first calculate the NaN excluded average value - could use the AVERAGEIF to simplify, but that will break if the range is discontinuous
xSum = 0
xAmount = 0
For N = 1 To NoE
If IsNumeric(xRange(N)) Then
xSum = xSum + xRange(N)
xAmount = xAmount + 1 'counting how many cells that are used in the sum, used as the divisor in the average and the variance expression. Couldn't use the "CountIf" expression as it counted cells which contained text
Else
End If
Next N
xAvg = xSum / xAmount
'Uses the average in the variance calculation
xSum = 0
For N = 1 To NoE
If IsNumeric(xRange(N)) Then
xSum = xSum + (xRange(N) - xAvg) ^ 2 'Summing up (x - x_avg) ^ 2, which is the dividend of the variance expression
Else
End If
Next N
StdevNaN_S = (xSum / (xAmount - 1)) ^ 0.5 'the sample standard deviation is the square root of the corrected variance
End Function
My problem is that I wish to make averages and standard deviation calculations for parts of the data. Such as the sample produced on e.g., Date 1, 5, 19 and 34 was produced with a particular stock of chemicals, while Date 2:4, 6:11 and 25:33 from a second stock and the rest from a third one, so I wish to know if there are any influences of the specific stocks.
Code which works for discontinuous ranges
I found an example on cpaerson.com which showed how to allow a function to take discontinuous ranges and treat it as a vector. Their example is
Function SumOf(ParamArray Nums() As Variant) As Variant
''''''''''''''''''''''''''''''''''
' Add up the numbers in Nums
''''''''''''''''''''''''''''''''''
Dim N As Long
Dim D As Double
For N = LBound(Nums) To UBound(Nums)
If IsNumeric(Nums(N)) = True Then
D = D + Nums(N)
Else
SumOf = CVErr(xlErrNum)
Exit Function
End If
Next N
SumOf = D
End Function
However, this function works only for discontinuous selections - it works just as it should if I select e.g., (A1;A5;A19;A34) or (A1;A2;A3;...;A34) but it gives me an error if I select (A1:A34).
Question
How should I code my function so that I can select whichever cells I want and then use their content for calculations?
I finally managed to figure out how to sort the data so that the function can handle both continuous and discontinuous ranges, a lot thanks to the comments on the question from SJR and Ralph, as well as from the answer on this question.
The way to allow discontinuous ranges is with an ParamArray and then go through all parameters entered and check what they contain (this is where I initially failed as I didn't know how to make Excel check the content of each parameter I fed to the function). The tricky part is that if the parameter it currently checks contains only one cell, then the way it needs to be treated is different compared to how it should be treated if it contains a continuous range.
E.g., checking all parameters from the ParamArray will fail if one uses UBound on a parameter containing only one cell. Additionally, in order to properly address each cell in a continuous range in a parameter, then one needs to loop through InputParameters(i).Cells(j), whereas if the parameter is just a single cell, then it is enough to address it as InputParameters(i).
The code I produced now works as I wanted it; I can select any range of cells and calculate both the standard deviation and the average while excluding NaN values. I compared it to the built in STDEV.S, STDEV.P and AVERAGE and it produces the exact same result*. I have not clue why the built in functions does not exclude NaN values as default, but I include the code for the functions below for anyone who want to use it.
Code for STDEV.S which excludes NaN values
Function NaNStdev_S(ParamArray xRange() As Variant) As Double
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'A function to calculate the sample standard deviation of any ranges of cells
'while excluding text, logicals, empty cells and cells containing #N/A.
'Can handle both continuous and discontinuous ranges.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim CellsUsed As Integer
Dim NumArg As Integer
Dim NumCell As Integer
Dim xAvg As Double
Dim xSum As Double
Dim xTemp As Variant
Dim xVect() As String
NumArg = UBound(xRange) 'Counts the number of input arguments (i.e., number of discontinuous regions)
For i = 0 To NumArg 'Goes through each discontinuous region
xTemp = xRange(i) 'Stores the current region in a temporary variable as several of the later operations cannot be performed on the full input array
If IsArray(xTemp) Then 'Checks if the current region is an array; if yes, then that array will be continuous
NumCell = UBound(xTemp, 1) * UBound(xTemp, 2) 'Checks how many cells are in the array
For j = 1 To NumCell 'Goes through all cells in the current region
If IsEmpty(xRange(i).Cells(j)) Then 'do nothing
ElseIf Application.IsLogical(xRange(i).Cells(j)) Then 'do nothing
ElseIf IsNumeric(xRange(i).Cells(j)) Then 'If the content of the cell is numeric, then use it
xSum = xSum + xRange(i).Cells(j) 'Add the current cell value to the sum of all cell values
CellsUsed = CellsUsed + 1 'Counts how many of the cell values that are actually used
ReDim Preserve xVect(CellsUsed) 'Adjusts the size of xVect
xVect(CellsUsed) = xRange(i).Cells(j) 'Reformats all usable values into one single vector for later use
Else
End If
Next j
Else 'If the current region is not an array, then it's just a single value
If IsEmpty(xRange(i)) Then 'do nothing
ElseIf IsNumeric(xRange(i)) Then 'If the content of the current region is numeric, then use it
xSum = xSum + xRange(i) 'Add the current cell (region) value to the sum of all cell values
CellsUsed = CellsUsed + 1 'Increase the counter of used values
ReDim Preserve xVect(CellsUsed) 'Adjusts the size of xVect
xVect(CellsUsed) = xRange(i) 'Adds the current value into the reformatted vector for later use
Else
End If
End If
Next i
xAvg = xSum / CellsUsed 'Average of all cells which contains numbers
xSum = 0 'resets the sum as it's no longer needed
For i = 1 To CellsUsed 'Goes through the reformatted vector and calculates the sum of (x - x_avg) ^ 2
xSum = xSum + (xVect(i) - xAvg) ^ 2 'This is the dividend of the variance equation
Next i
NaNStdev_S = (xSum / (CellsUsed - 1)) ^ 0.5 'the sample standard deviation is the square root of the corrected variance
End Function
Code for STDEV.P which excludes NaN values
Function NaNStdev_P(ParamArray xRange() As Variant) As Double
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'A function to calculate the population standard deviation of any ranges of cells
'while excluding text, logicals, empty cells and cells containing #N/A.
'Can handle both continuous and discontinuous ranges.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim CellsUsed As Integer
Dim NumArg As Integer
Dim NumCell As Integer
Dim xAvg As Double
Dim xSum As Double
Dim xTemp As Variant
Dim xVect() As String
NumArg = UBound(xRange) 'Counts the number of input arguments (i.e., number of discontinuous regions)
For i = 0 To NumArg 'Goes through each discontinuous region
xTemp = xRange(i) 'Stores the current region in a temporary variable as several of the later operations cannot be performed on the full input array
If IsArray(xTemp) Then 'Checks if the current region is an array; if yes, then that array will be continuous
NumCell = UBound(xTemp, 1) * UBound(xTemp, 2) 'Checks how many cells are in the array
For j = 1 To NumCell 'Goes through all cells in the current region
If IsEmpty(xRange(i).Cells(j)) Then 'do nothing
ElseIf Application.IsLogical(xRange(i).Cells(j)) Then 'do nothing
ElseIf IsNumeric(xRange(i).Cells(j)) Then 'If the content of the cell is numeric, then use it
xSum = xSum + xRange(i).Cells(j) 'Add the current cell value to the sum of all cell values
CellsUsed = CellsUsed + 1 'Counts how many of the cell values that are actually used
ReDim Preserve xVect(CellsUsed) 'Adjusts the size of xVect
xVect(CellsUsed) = xRange(i).Cells(j) 'Reformats all usable values into one single vector for later use
Else
End If
Next j
Else 'If the current region is not an array, then it's just a single value
If IsEmpty(xRange(i)) Then 'do nothing
ElseIf IsNumeric(xRange(i)) Then 'If the content of the current region is numeric, then use it
xSum = xSum + xRange(i) 'Add the current cell (region) value to the sum of all cell values
CellsUsed = CellsUsed + 1 'Increase the counter of used values
ReDim Preserve xVect(CellsUsed) 'Adjusts the size of xVect
xVect(CellsUsed) = xRange(i) 'Adds the current value into the reformatted vector for later use
Else
End If
End If
Next i
xAvg = xSum / CellsUsed 'Average of all cells which contains numbers
xSum = 0 'resets the sum as it's no longer needed
For i = 1 To CellsUsed 'Goes through the reformatted vector and calculates the sum of (x - x_avg) ^ 2
xSum = xSum + (xVect(i) - xAvg) ^ 2 'This is the dividend of the variance equation
Next i
NaNStdev_P = (xSum / CellsUsed) ^ 0.5 'the population standard deviation is the square root of the variance
End Function
Code for AVERAGE which excludes NaN values
Function NaNAverage(ParamArray xRange() As Variant) As Double
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'A function to calculate the average of any ranges of cells
'while excluding text, logicals, empty cells and cells containing #N/A.
'Can handle both continuous and discontinuous ranges.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim CellsUsed As Integer
Dim NumArg As Integer
Dim NumCell As Integer
Dim xSum As Double
Dim xTemp As Variant
NumArg = UBound(xRange) 'Counts the number of input arguments (i.e., number of discontinuous regions)
For i = 0 To NumArg 'Goes through each discontinuous region
xTemp = xRange(i) 'Stores the current region in a temporary variable as several of the later operations cannot be performed on the full input array
If IsArray(xTemp) Then 'Checks if the current region is an array; if yes, then that array will be continuous
NumCell = UBound(xTemp, 1) * UBound(xTemp, 2) 'Checks how many cells are in the array
For j = 1 To NumCell 'Goes through all cells in the current region
If IsEmpty(xRange(i).Cells(j)) Then 'do nothing
ElseIf Application.IsLogical(xRange(i).Cells(j)) Then 'do nothing
ElseIf IsNumeric(xRange(i).Cells(j)) Then 'If the content of the cell is numeric, then use it
xSum = xSum + xRange(i).Cells(j) 'Add the current cell value to the sum of all cell values
CellsUsed = CellsUsed + 1 'Counts how many of the cell values that are actually used
Else
End If
Next j
Else 'If the current region is not an array, then it's just a single value
If IsEmpty(xRange(i)) Then 'do nothing
ElseIf IsNumeric(xRange(i)) Then 'If the content of the current region is numeric, then use it
xSum = xSum + xRange(i) 'Add the current cell (region) value to the sum of all cell values
CellsUsed = CellsUsed + 1 'Increase the counter of used values
Else
End If
End If
Next i
NaNAverage = xSum / CellsUsed 'Average of all cells which contains numbers
End Function
*Disclaimer
I mentioned that the code produces exactly the same value as the built in function - however, I did notice one occasion when it did not.
I placed the following randomly chosen values as randomly sized and positioned ranges in my Excel sheet:
(00:01:00, -10, -33, 10, 33, 20, 66, 30, 40, 300, TRUE, {empty cell} , #N/A)
If they are randomly distributed (i.e., I placed them in the following cells (P22:Q23;R22:R23;S22:T22;S21:V21;Q28)), then they differ from the value which STDEV.S produces (I have manually excluded the cell with #N/A from the STDEV.S function), but they only differ on the 13th decimal (my function gives 93.5950714912684, while STDEV.S gives 93.5950714912683), which should be a small enough error to be irrelevant. Funny thing is, if I place all of the values as one row (i.e., I place all of the values on e.g. (M34:Y34)) then both my function and the built in function gives exactly the same result (i.e., 93.5950714912683). The error seems to stem from the cell containing 1 minute; if I change 00:01:00 to any other time value (such as 00:01:01 or 01:01:00), then both functions yields exactly the same result regardless if the values are placed on a row or as randomly distributed regions on the worksheet.
I cannot explain this odd behaviour, but so far it seems to be producing only an insignificant error, so I will assume that my code is working as intended.
I am executing the VBA code below in the following file:
http://www.filedropper.com/error13
I get an Error 13 Type Mismatch.
Here is the macro
Aggregate, Collate and Transpose rows into columns
Works fine when I select some of the rows (for example id 1001 or 1003 and 1004 together but when I try to process more rows I get Error 13.
I'm trying to process each id at a time but I have about 100..
For the sake of curiosity and intellectual exercise, I took a stab at the array processing method originally submitted by ZygD in the original question with a mind to overcome the Runtime error '13': Type mismatch error on larger data sets.
The ReDim Preserve statement can only redimension the last rank while preserving existing values already stored subject to the fact that you are raising the size of the array dimension and not shrinking it. This is what msdn.microsoft.com has to say on the subject:
Resizing with Preserve. If you use Preserve, you can resize only the last dimension of the array. For every other dimension, you must specify the bound of the existing array.
For example, if your array has only one dimension, you can resize that dimension and still preserve all the contents of the array, because you are changing the last and only dimension. However, if your array has two or more dimensions, you can change the size of only the last dimension if you use Preserve.
Due to the orientation of the data peeled out of the worksheet, the first rank was the dimension to grow with ReDim so the orientation was flipped with Application.Transpose, ReDim'ed with Preserve then flipped back. As the array grew with additional records, the Application.Transpose quickly reached its maximum capacity to reorient the array. I found some older documentation on this in XL: Limitations of Passing Arrays to Excel Using Automation but it is horribly out of date.
My solution was to transpose the values from ar1 into ar2 on the fly so that ar2 could be redimensioned without reorientation. Once processing was complete, the results were in the wrong orientation. To get the values back into the worksheet in the correct orientation, I wrote a helper function that transposed ar2 back into a truncated ar1. This pseudo-transpose was only needed once; just before stuffing the new aggregated values back into the reporting area.
Modified sub code:
Sub jpd_Transposing()
Const sDestination As String = "D2"
Dim ar1 As Variant
Dim ar2 As Variant
Dim i As Long 'counter
With ActiveSheet
ar1 = .Range("A2:B" & .Cells(Rows.Count, 1).End(xlUp).Row).Value
ReDim ar2(1 To 2, 1 To 1)
ar2(1, 1) = ar1(1, 1): ar2(2, 1) = ar1(1, 2)
For i = 2 To UBound(ar1, 1)
If ar1(i, 1) = ar2(1, UBound(ar2, 2)) Then
ar2(2, UBound(ar2, 2)) = ar2(2, UBound(ar2, 2)) & ar1(i, 2)
ElseIf ar1(i, 1) = vbNullString Then
ar2(2, UBound(ar2, 2)) = ar2(2, UBound(ar2, 2)) & " "
Else
ReDim Preserve ar2(1 To 2, 1 To UBound(ar2, 2) + 1)
ar2(1, UBound(ar2, 2)) = ar1(i, 1)
ar2(2, UBound(ar2, 2)) = ar1(i, 2)
End If
Next
ar1 = my_2D_Transpose(ar1, ar2)
.Range(sDestination).Resize(UBound(ar1, 1), UBound(ar1, 2)) = ar1
End With
End Sub
Function my_2D_Transpose(a1 As Variant, a2 As Variant)
Dim a As Long, b As Long
ReDim a1(1 To UBound(a2, 2), 1 To UBound(a2, 1))
For a = LBound(a2, 1) To UBound(a2, 1)
For b = LBound(a2, 2) To UBound(a2, 2)
a1(b, a) = Trim(a2(a, b))
Next b
Next a
my_2D_Transpose = a1
End Function
So now you might be wondering just how much improvement over the original worksheet based routine there was with the arrayed memory processing. As that was the logical next step, I ran both with a timer noting start and stop.
Sub timed()
Application.ScreenUpdating = False
Application.EnableEvents = False
Debug.Print Timer
Call concatenate_and_transpose_to_delim_string
Debug.Print Timer
Call jpd_Transposing
Debug.Print Timer
Application.EnableEvents = True
Application.ScreenUpdating = True
End Sub
Both of the report data results were identical. Note that I turned off screen updating and event handling for the duration of the test. This likely improved the worksheet method more than it improved the array method but I thought it was fair given that these are pretty standard techniques for improving the efficiency of a macro.
Timed Results:
Test environment: 45,894 rows × 2 columns of raw data converted to 123 rows × 2 columns of aggregated report data using a business class i5/8Gb based laptop (Win7, Excel 2010 version 14.0.7145.5000 (32-bit)
concatenate_and_transpose_to_delim_string (worksheet) .... 00:01.01 seconds¹
jpd_Transposing (memory array) ............................................... 00:00.07 seconds¹
¹Test was run several times. Times are typical.
Conclusions:
Okay, so we picked up almost a full second using a variant memory array over the worksheet read/write but that is still a whopping 93% improvement in efficiency of processing identical data to identical results. I've converted other long-running routines from worksheet driven to arrayed memory; the results were at least as appreciable and those were devoted to more repetitious lookup-type operations in large data matrices.
Was it worth it? That's pretty much up to the individual user and situation. Certainly there are benefits to be had but I write worksheet based code a lot faster than array based code so unless this ran several times a day every day, I probably wouldn't bother. The size of the project would also be a factor as benefits would increase with the amount of work to be done. Still, it's good to keep the methods used with memory array methods fresh in the mind and a mashup of methods may produce the best result in some cases.
FWIW, the VBA Trim function used in the transpose helper function produced no measurable detrimental effect (e.g. additional time) whether it was used or not and seemed the best place to ensure that the end result did not have a trailing space character left over from string concatenation.
I've pieced together a macro to allow me to calculate the cost of a story task by calculating the specific rate based on the developer assigned. I have the rate table on a second sheet. I am able to get a result for the cell that the macro is set to (Row 2), but want it to run on all rows. I know I have to set a generic range, but am not sure. How should I change the range declare to run on all rows?
Here is the code:
Sub GetCost()
Range("D2").Select
' Set Do loop to stop when an empty cell is reached.
Do Until IsEmpty(ActiveCell)
Dim Estimate As Integer, Assignee As String, RodRate As Integer, GarthRate As Integer, DerekRate As Integer, TotalCost As Integer
Estimate = ThisWorkbook.Worksheets("Sheet1").Range("D2").Value
Assignee = ThisWorkbook.Worksheets("Sheet1").Range("E2").Value
RodRate = ThisWorkbook.Worksheets("Sheet2").Range("B2").Value
GarthRate = ThisWorkbook.Worksheets("Sheet2").Range("B3").Value
DerekRate = ThisWorkbook.Worksheets("Sheet2").Range("B4").Value
If Assignee = "Rod" Then
TotalCost = Estimate * RodRate
ElseIf Assignee = "Garth" Then
TotalCost = Estimate * GarthRate
ElseIf Assignee = "Derek" Then
TotalCost = Estimate * DerekRate
Else
TotalCost = "0"
End If
ThisWorkbook.Worksheets("Sheet1").Range("F2").Formula = TotalCost
ActiveCell.Offset(1, 0).Select
Loop
End Sub
I have rewritten your code with explanations which I hope are enough for you to understand why. There is much more that I could say. I hope this is a good balance between too little and too much.
However, I have to point out that there are some excellent project management tools available. I do not believe this is a good use of your time.
Random points
On 32-bit computers, Long is better than Integer.
Do not declare your variables inside a loop. The scope of a variable declared inside a sub-routine is the
the sub-routine so declare them at the top of the sub-routine.
You can declare all your variables in a single Dim statement but I find it confusing unless there is a real association between two or more variable. I might have:
Dim RodRate As Long, GarthRate As Long, DerekRate As Long
because these variables are associated. However the trouble with this approach is that you will have to add MaryRate and JohnRate and AngelaRate when these people join your project.
You need an array:
Dim PersonRate(1 To 3) As Long
where PersonRate(1) = Rate for Rod, PersonRate(2) = Rate for Garth and PersonRate(3) = Rate for Derek.
But this is hardly any better. You want a table that can grow. So today:
Name Rate
Rod 20
Garth 25
Derek 15
Next week:
Name Rate
Rod 20
Garth 25
Derek 15
Mary 30
With this, you pick up the Assignee's name, run down the table until you find their name then look across for their rate.
I assume you have a table like this in Sheet2. You could keep going back to Sheet2 but better to load the table into an array.
We could have:
Dim PersonName() As String
Dim PersonRate() As Long
so PersonRate(2) gives the rate for PersonName(2).
Note in my first array declaration I wrote: PersonRate(1 To 3). This time, the brackets are empty. With PersonRate(1 To 3), I am saying I want exactly three entries in the array and this cannot be changed. With PersonRate(), I am saying I want an array but I will not know how many entries until run time.
I said we could have two arrays, PersonName() and PersonRate() and this is what I have done. This is an easy-to-understand approach but I do not think it is the best approach. I prefer structures. When you have got this macro working and before you start your next look up User Types which is the VBA name for a structure.
Consider:
With Sheets("Sheet2")
RowMax = .Cells(Rows.Count, "A").End(xlUp).Row
End With
There is a lot to explain here.
Cells means I want to address a cell within the active workbook. .Cells means I want to address a cell within the sheet identified in the With statement. This means I do not have to select Sheet1 or Sheet2 to look at their contents. Selecting worksheets is slow and the code tends to be more difficult to understand.
.Cells(Row, Column) identifies a cell. Row must be a number but column can be a number or a column code: A=1, B=2, Z=26, AA=27, etc.
Rows.Count returns the number of rows in a sheet for the version of Excel you are using. So .Cells(Rows.Count, "A") identifies the bottom of column "A".
End(xlUp) is the VBA equivalent of clicking Ctrl+UpArrow. If you are not familar with Ctrl+Arrow I suggest you play with these four controls. Note, these controls give easy to understand results with a rectangular table. However, if there are empty cells, the results can be strange.
Putting this together: .Cells(Rows.Count, "A").End(xlUp).Row means start at the bottom of column A, go up until you hit a cell with a value and return its row number. So this sets RowMax to the last row of the Rate table. When you add row 5 with Mary's name and rate, this code will automatically adjust.
Revised code
This should be enough to get you started. Welcome to the joys of programming.
' * Require all variables to be declared which means a misspelt name
' is not taken as an implicit declaration
Option Explicit
Sub GetCost()
Dim Estimate As Integer
Dim Assignee As String
Dim TotalCost As Integer
Dim PersonName() As String
Dim PersonRate() As String
Dim InxPerson As Long
Dim RowCrnt As Long
Dim RowMax As Long
' You can declare constants and use them in place of literals.
' You will see why later. I could have made these strings and
' used "A", "B", "D", "E" and "F" as the values. Change if that
' is easier for you.
Const ColS2Name As Long = 1
Const ColS2Rate As Long = 2
Const ColS1Estimate As Long = 4
Const ColS1Assignee As Long = 5
Const ColS1Total As Long = 6
' Before doing anything else we must load PersonName and PersonRate from
' Sheet2. I assume the structure of Sheet2 is:
' A B
' 1 Name Rate
' 2 Rod 20
' 3 Garth 25
' 4 Derek 15
With Sheets("Sheet2")
RowMax = .Cells(Rows.Count, ColS2Name).End(xlUp).Row
' I now know how big I want the the name and rate arrays to be
ReDim PersonName(1 To RowMax - 1)
ReDim PersonRate(1 To RowMax - 1)
' Load these arrays
For RowCrnt = 2 To RowMax
' I could have used 1 and 2 or "A" and "B" for the column
' but this is easier to understand particularly if you come
' back to this macro in six month's time.
PersonName(RowCrnt - 1) = .Cells(RowCrnt, ColS2Name).Value
PersonRate(RowCrnt - 1) = .Cells(RowCrnt, ColS2Rate).Value
Next
End With
With Sheets("Sheet1")
' I am using the same variable for rows in sheets Sheet1 and Sheet2.
' This is OK because I never look at Sheet1 and Sheet2 at the same time.
RowCrnt = 2
Do Until IsEmpty(.Cells(RowCrnt, ColS1Estimate))
Estimate = .Cells(RowCrnt, ColS1Estimate).Value
Assignee = .Cells(RowCrnt, ColS1Assignee).Value
.Cells(RowCrnt, ColS1Total).Value = 0
' Locate the Assignee in the PersonName array and
' extract the matching rate
For InxPerson = 1 To UBound(PersonName)
If PersonName(InxPerson) = Assignee Then
.Cells(RowCrnt, ColS1Total).Value = Estimate * PersonRate(InxPerson)
Exit For
End If
Next
RowCrnt = RowCrnt + 1
Loop
End With
End Sub
Tony's answer is a great solution and introduction to programming and very well written so I've +1 it. However unless I'm missing something code should always be the last resort in excel as it is very slow compared to formulas, I would have thought that a simple lookup would suffice, something like:
=D2*(vlookup(E2,'sheet2'!A:B,2,FALSE))
Copied down the column
First off, I apologize for starting a new thread but the original got confusing because I couldn't articulate my ask well (Link to original thread: Dynamic Nested Loops for Autofilter in Excel VBA). But now I have actually written the program to the way I like except using a switch statement instead of the more dynamic use of nested looping.
edit:
RSum is used to store a range and a boolean. The user selects the header cell for a column and chooses whether they want get a summation of that column or a unique count when summarizing. This allows for a collection of these objects to allow summarizing of multiple columns. This input wasn't so bad to make dynamic. The next input which starts as rtemp and ends as array1, is again the user selects the header cell for a column but this it takes the values in that column and saves a unique list to array1. With this list a for loop loops through the array using its value as criteria for an autofilter. For each step in the loop after the autofilter, the summary is calculated using the SumThisA taking the RSum object collection as an input. The data is laid out in columns where each row is a unique record.
So the question is, for the below code, I want the user to be able to select the number of categories to summarize by, have a popup to fill in those ranges (can figure this out), and then run a filter as such:
for i = 0 to UBound(array1)
Autofilter criteria1:=array1(i)
for j = 0 to UBound(array2)
Autofilter criteria1:=array2(j)
......
for x = 0 to UBound(arrayx)
Autofilter criteria1:=arrayx(x)
aSum(i,j,....x) = somefunction
Now I understand I would need to use a recursive function, but having never used one before and the somewhat complexity of this program, it is out of my understanding. Would anyone be able to help explain how to use it in this context? Plus because of the generalization of this program, it could be a useful tool for many people.
'---------Initialize Arrays---------------'
t = sMax - 1
Dim aSum()
ReDim aSum(UBound(arr1), t)
'---------------------Perform Summary----------------'
For i = LBound(arr1) To UBound(arr1)
If i = 0 Then
Data.AutoFilter field:=afield, Criteria1:=arr1, Operator:=xlFilterValues
Else
Data.AutoFilter field:=afield, Criteria1:=arr1(i)
End If
temp = SumThisA(SumValues, sMax)
For j = LBound(temp) To UBound(temp)
aSum(i, j) = temp(j)
Next j
Next i
Sum of Dollars For:
1. arrayA(1)-------100
- arrayB(1)------30
- arrayB(2)------70
2. arrayA(2)-------200
- arrayB(1)-----120
- arrayB(2)------80
3. Total-----------300
Here's a very kludgy example of recursion for what it seems you want to do. I faked up some criteria, so don't get hung up on how I'm testing for that, what's important is how the function Filter functions recursively. If I could pinpoint more exactly what you wanted I could craft it more precisely, and with less hardcoding.
Test Harness:
Public Sub Test()
Dim FilteredArray As Variant, cArray As Variant, working Array As Variant
Dim criteria As Integer
criteria = 1
ReDim criteriaArray(1 To 2)
cArray(1) = Range("C1").Value
cArray(2) = Range("C2").Value
Set workingArray = Range("A1:A7")
FilteredArray = Filter(workingArray, 7, cArray, criteria)
Range("D1") = FilteredArray
End Sub
Recursive Filter Function:
Public Function Filter(workingArray As Variant, index As Integer, _
criteriaArray As Variant, criteria) As Variant
Dim tempArray As Variant, i As Integer
ReDim tempArray(1 To 1)
For i = 1 To index
If Mid(workingArray(i), criteria, 1) = criteriaArray(criteria) Then
ReDim Preserve tempArray(1 To UBound(tempArray) + 1)
tempArray(UBound(tempArray) - 1) = workingArray(i)
End If
Next i
ReDim Preserve tempArray(1 To UBound(tempArray) - 1)
If criteria < 2 Then
Filter = Filter(tempArray, UBound(tempArray), criteriaArray, criteria + 1)
Else
Filter = tempArray
End If
End Function
Have you considered using a pivot table ? Your requirements seem very close to that functionality...