Application.Calculate from VBA - Ignore Volatile Functions (Excel)

Question
Is it possible to efficiently simulate the result of Application.Calculate from VBA, but ignoring volatile functions?
Detail
Definition: Non-Volatile Equivalent: For any volatile workbook, define the non-volatile equivalent as the workbook in which every cell containing a volatile reference has been replaced by its raw value.
Definition: Non-Volatile Calculation Diffs: Take any workbook and do a full recalc on its non-volatile equivalent. The cells whose values change in the non-volatile equivalent, together with their new values, are the non-volatile calculation diffs.
The question is this: for any workbook, with potential volatile references in it, is there a way to efficiently apply just the non-volatile diffs? Efficiency is key here; it is the whole reason we want to ignore volatile functions (see the use case below). Any solution that does not outperform a full recalc is useless.
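To compare any candidate approach against the definition above, one option is to snapshot every sheet's values before and after a calculation and list the cells that differ. This is a minimal sketch, assuming the same sheets exist in both snapshots and their used ranges do not move; SnapshotValues, PrintDiffs and ValueKey are hypothetical helper names, not part of Excel's object model.

```vba
'Hypothetical helpers: snapshot every sheet's values, then list the cells that differ
Function SnapshotValues(ByVal wb As Workbook) As Collection
    Dim snap As Collection, ws As Worksheet
    Set snap = New Collection
    For Each ws In wb.Worksheets
        'Store the used-range address with its values (a 2D array, or a scalar for a single cell)
        snap.Add Array(ws.UsedRange.Address, ws.UsedRange.Value2), ws.Name
    Next ws
    Set SnapshotValues = snap
End Function

'Assumes both snapshots contain the same sheet names
Sub PrintDiffs(ByVal wb As Workbook, ByVal snapBefore As Collection, ByVal snapAfter As Collection)
    Dim ws As Worksheet, a As Variant, b As Variant, valsBefore As Variant, valsAfter As Variant
    Dim r As Long, c As Long
    For Each ws In wb.Worksheets
        a = snapBefore(ws.Name): b = snapAfter(ws.Name)
        valsBefore = a(1): valsAfter = b(1)
        If a(0) <> b(0) Then                'different shapes cannot be compared cell by cell
            Debug.Print ws.Name; ": used range changed from "; a(0); " to "; b(0)
        ElseIf IsArray(valsBefore) Then
            For r = 1 To UBound(valsBefore, 1)
                For c = 1 To UBound(valsBefore, 2)
                    If ValueKey(valsBefore(r, c)) <> ValueKey(valsAfter(r, c)) Then
                        Debug.Print ws.Range(a(0)).Cells(r, c).Address(External:=True); ": "; _
                            ValueKey(valsBefore(r, c)); " -> "; ValueKey(valsAfter(r, c))
                    End If
                Next c
            Next r
        ElseIf ValueKey(valsBefore) <> ValueKey(valsAfter) Then
            Debug.Print ws.Range(a(0)).Address(External:=True); ": "; _
                ValueKey(valsBefore); " -> "; ValueKey(valsAfter)
        End If
    Next ws
End Sub

'Type-tagged text form of a cell value, so numbers, text and error values compare safely
Private Function ValueKey(ByVal v As Variant) As String
    ValueKey = TypeName(v) & "|" & CStr(v)
End Function
```

Typical use on a copy of the workbook: store `SnapshotValues(ActiveWorkbook)`, run the candidate calculation, then call `PrintDiffs` with the stored snapshot and a fresh one. Comparing type-tagged strings avoids the type-mismatch errors that a direct `=` between an error value and a number would raise.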
Use Case
We have a workbook that is rife with INDIRECT usage. INDIRECT is an Excel-native volatile function. There are around 300,000 of these in the workbook, which contains about 1.5 million used cells in total. We do not have capacity to change these. As a result of these INDIRECT usages, our workbook takes a long time to recalculate: circa 10 seconds. So we have automatic calculation turned off, and users periodically trigger a manual recalc to refresh data as they go.
Our business users are fine with that, but some of the new VBA functions we are adding could use something more efficient than a 10 second wait.
What We've Tried so Far
Worksheet.Calculate: this is handy in some cases, and we do use it. But it treats the data in all other sheets as values, not formulas. So if there are any zig-zag references, it does not provide a fully consistent result. E.g. imagine a reference chain from sheet A -> sheet B -> sheet A: one would need to calculate sheet A, then B, then A again. The number of zig-zags is arbitrary, and that is just one case, with two sheets. To solve this in general, one would essentially need to rewrite Excel's entire calculation engine.
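Since any solution must beat a full recalc, it helps to measure the baseline first. This rough timing sketch uses VBA's Timer (seconds since midnight) and assumes the workbook of interest is the active one; CompareCalcTimes is a hypothetical name.

```vba
'Rough timing of Application.Calculate against per-sheet Worksheet.Calculate
'(results go to the Immediate window)
Sub CompareCalcTimes()
    Dim t As Double, ws As Worksheet
    t = Timer
    Application.Calculate        'all open workbooks; volatile cells are always recalculated
    Debug.Print "Application.Calculate", Format(Timer - t, "0.00"); " s"
    For Each ws In ActiveWorkbook.Worksheets
        t = Timer
        ws.Calculate             'this sheet only; other sheets are read as their current values
        Debug.Print ws.Name, Format(Timer - t, "0.00"); " s"
    Next ws
End Sub
```

Running it a few times and ignoring the first run gives a fairer picture, since the first calculation after opening can be slower.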

One idea comes to mind, though it is not ideal. You could write your own non-volatile function, stored in the workbook, that does exactly what INDIRECT does (or whichever volatile functions you use). Call it INDIRECT2 or similar, and replace those functions throughout your spreadsheet.
Unfortunately, INDIRECT is not available as Application.WorksheetFunction.INDIRECT, but there are ways around that. This is just my general idea.

Use a non-volatile User Defined Function that does the same job as INDIRECT?
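Public Function INDIRECT_NV(ByVal ref_text As String, Optional ByVal a1 As Boolean = True) As Variant
    'Non-volatile INDIRECT function
    'Caveat: Excel normally recalculates it only when ref_text changes (or on a full recalc), not when
    'the target cells change, because it cannot see a dependency that is built from a string
    'Does not accept R1C1 notation - the optional argument is purely for quick replacement
    INDIRECT_NV = CVErr(xlErrRef) 'Defaults to a #REF! error
    On Error Resume Next 'If the next line fails, just output the error
    'Resolve the address on the calling cell's own sheet rather than on the ActiveSheet
    Set INDIRECT_NV = Application.Caller.Parent.Evaluate(ref_text)
End Function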
Alternatively, depending on the use case, it may be possible to replace the INDIRECT queries with INDEX and MATCH, which are non-volatile.
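If you go the UDF route, the swap itself can be scripted. This sketch assumes INDIRECT_NV lives in a standard module of the same workbook and that you run it on a copy; ReplaceIndirectWithNV is a hypothetical name. Note that Range.Replace also rewrites plain text cells that happen to contain "INDIRECT(", and that calls passing FALSE as the second argument will return #REF! because INDIRECT_NV ignores R1C1.

```vba
'Swap INDIRECT( for INDIRECT_NV( in every cell of the active workbook (run on a copy!)
Sub ReplaceIndirectWithNV()
    Dim ws As Worksheet
    For Each ws In ActiveWorkbook.Worksheets
        'Range.Replace edits formula text; "INDIRECT_NV(" does not contain "INDIRECT(",
        'so running this twice does no harm
        ws.Cells.Replace What:="INDIRECT(", Replacement:="INDIRECT_NV(", _
            LookAt:=xlPart, MatchCase:=False
    Next ws
End Sub
```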

Related

Can you make a reference to an Excel cell with a variable address?

Would it be possible to make a reference like A(2+2)? I know that formula doesn't work but how would you formulate something like that?
A couple of ways:
one can use a volatile OFFSET:
=OFFSET(A2,2,0)
or another volatile function INDIRECT:
=INDIRECT("A"&ROW(A2)+2)
or, better yet, the non-volatile INDEX:
=INDEX(A:A,ROW(A2)+2)
Volatile functions should not be used en masse. They recalculate every time any change is made in Excel, and that can greatly impact the user experience. The INDEX version only recalculates if the data in column A of the referenced sheet changes, cutting the number of superfluous calculations.

Avoiding duplicate long expressions in Excel

I often find myself writing Excel formulas that have something like this in the formula:
= IF(<long expression>=<some condition>,<long expression>,0)
Is there any way to accomplish this without needing to type out <long expression> twice (and also without using helper cells)?
Ideally, something that works similar to IFERROR, i.e.
= IFERROR(<some expression>,0)
This checks if <some expression> would return any type of error, and if it doesn't, it automatically returns <some expression> (without needing to again explicitly type it out a second time).
Is there an Excel function (or combination of Excel functions) similar to IFERROR but instead of checking an error condition, it checks a general (user-defined) condition based on the formula?
When it comes to formula efficiency and calculation speed, using helper cells can be of great value, even if they may initially muck up the spreadsheet design.
Put calculations into a helper cell and refer to the helper cell in the IF statement. That way the calculation will only happen once.
This method is preferred by spreadsheet auditors over the alternative of packing everything into one formula, because it is also much easier to follow and pick apart.
With careful spreadsheet planning you can house helper cells in a different (hidden) sheet or in columns that you hide to tidy up the design.
This answer applies to Excel 365 as of March 2020. A new function is now available in Insider builds of Excel. It is called LET() and is used to define, and assign a value to, a variable that can then be used multiple times inside the parentheses of the LET() function. (LET has since become generally available in Microsoft 365 and Excel 2021.)
Example:
=LET(MyResult,XLOOKUP(C1,A1:A3,B1:B3),IF(MyResult=0,"",MyResult))
The first parameter is the name of a new variable, MyResult. The variable is assigned a value with the second parameter. This can be a constant, like a number or a text, or like in this case, a formula.
The third parameter is a calculation that can use the variable value.
In the following screenshot, the XLOOKUP returns a 0 because the found cell is empty.
In the next formula down, the XLOOKUP is wrapped in an IF statement, evaluated, and then repeated. This is the approach where the calculation is duplicated, as described in the question.
The third formula shows the LET() function and its result.

Need to stop UDFs recalculating when unrelated cells deleted

I've noticed that my UDFs recalculate whenever I delete cells. This causes massive delays when deleting entire columns, because the UDF gets called for each and every cell it is used in. So if a UDF is used in 1,000 cells, deleting a column or cell will call it 1,000 times.
By way of example, put the following UDF in a module, then call it from the worksheet a bunch of times with =HelloWorld()
Function HelloWorld()
HelloWorld = "HelloWorld"
Debug.Print Now()
End Function
Then delete a row. If your experience is like mine, you'll see it gets called once for every instance of use.
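To see exactly which cells trigger a call, a variant of the test UDF can count calls and log the calling cell. This is a hypothetical diagnostic (HelloWorldLogged and callCount are made-up names); paste it into a fresh standard module, since the Private variable must sit above the procedures.

```vba
'Hypothetical diagnostic version of HelloWorld: counts calls and logs the calling cell
Private callCount As Long   'module-level; survives between calls until the VBA project is reset

Function HelloWorldLogged() As String
    callCount = callCount + 1
    If TypeName(Application.Caller) = "Range" Then   'called from a worksheet cell
        Debug.Print callCount, Application.Caller.Address(External:=True), Now()
    End If
    HelloWorldLogged = "HelloWorld"
End Function
```

Entering =HelloWorldLogged() in several cells and then deleting a row should print one numbered line per recalculated cell in the Immediate window.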
Anyone have any ideas whether this behavior can be stopped? I'd also be interested why it should get called. Seems like a flaw in Excel's dependency tree to me, but there may well be a good reason.
Edit: After experimentation, I've found more actions that trigger UDF recalculation:
Any change to the number of columns that a ListObject (i.e. Excel Table) spans through resizing (but not rows). Even if the UDFs themselves aren't in the ListObject concerned, or in fact in any ListObject at all.
Adding new cells or columns anywhere in the sheet (but not rows).
Note that Manual Calc Mode isn't an option on several fronts.
Firstly, given that it is an application-level setting, it simply presents too great a risk that someone will use the output of any one of the spreadsheets they happen to have open without realizing they are in manual calculation mode.
Secondly, I'm not actually designing a particular spreadsheet but rather am writing a book about how non-developers can utilize well-written off-the-shelf code such as UDFs to do things that would otherwise be beyond them. Examples include dynamically concatenating or splitting text, or the exact match binary search UDF that Charles Williams outlines at https://fastexcel.wordpress.com/2011/07/22/developing-faster-lookups-part-2-how-to-build-a-faster-vba-lookup/ (And yes, I warn them repeatedly that a native formula-based solution will usually outperform a UDF. But as you'll see from the post linked above, carefully written functions can perform well.)
I don't know how users will employ these.
In the absence of a programming solution, it looks like I'll just have to point out in the book that users may experience significant delays when adding or deleting cells or resizing ListObjects if they use resource-intensive UDFs, even if those UDFs are efficiently written.
Inserting or deleting a row or column or cell will always trigger a recalc in automatic mode. (You can check this by adding =NOW() to an empty workbook and inserting or deleting things)
The question should be what (unexpected) circumstances flag a cell as dirty so that it gets recalced.
There is a (probably incomplete) list of such things at
http://www.decisionmodels.com/calcsecretsi.htm
Looks like I need to add some words about VBA UDFs (have not tested XLL UDFs - they may behave differently since they are registered in a different way to VBA UDFs)
Unfortunately, I don't believe it's possible to prevent a UDF from being recalculated when "unrelated" cells are deleted. The reason for this is that the argument passed to the UDF is in fact a Range object (not just the value within the cell(s)). Deleting "unrelated" cells can actually modify the Range.
For example, users can write this kind of UDF:
Function func1(rng)
func1 = rng.Address & " (" & Format(Now, "hh:mm:ss") & ")"
End Function
Admittedly, this is not the common (and recommended) approach to write a UDF. It should normally depend on the content (value) and not the container (range).
Here I'm just returning the address of the argument. I also append a timestamp to signal when the UDF is recalculated. If you delete any column on the worksheet, all cells with this UDF are recalculated. But not if you insert a column, which leaves the cells to the right of the new column unchanged and showing the wrong value (cell address). The results are the same when inserting/deleting rows. Strangely, inserting a single cell does force recalculation of all UDFs.
I tried to remove the "dependency" on the Range, but the behavior is the same even if the UDF's argument is typed as Double (instead of being left as a Variant, as in my example).
As you explained, deleting a column forces UDFs to be recalculated. This makes sense because a UDF can depend on the Range argument. Whether this is a smart design for a UDF is a different matter.

How to optimize dirty cell optimization when using volatile formula in Excel (like INDIRECT() )?

This is maybe a little advanced topic.
When you press F9 or otherwise recalculate, Excel first tries to work out which cells could possibly have changed, so it can skip evaluating the rest. The algorithm searches all ancestors, and their ancestors, to find out whether the value of any of them has changed. If so, it proceeds to the actual evaluation.
That holds, at least, as long as there is no INDIRECT() formula in the path. If there is, the algorithm assumes that the value of that formula is volatile (i.e. it always changes), and so it evaluates all of its descendants.
There are other volatile functions too: RAND, NOW, TODAY, OFFSET, INDIRECT, and (depending on their arguments) CELL and INFO. Some of them obviously should be volatile (like RAND()). AREAS, COLUMNS, INDEX and ROWS are sometimes listed as volatile, but they are not volatile in current versions of Excel.
The question: is there any way of telling Excel, that a given cell that is treated volatile by Excel should in fact be kept as frozen, unless its ancestors change?
One way of resolving the problem is to write my own versions of Excel volatile functions in VBA. VBA functions are assumed not volatile.
The problem is that there is a relatively high cost to invoking VBA. Another is the need to reinvent the wheel. I hope there is a cheaper solution.
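The point that VBA UDFs are non-volatile by default can be checked with a pair of hypothetical test functions (StampNonVolatile and StampVolatile are made-up names). Only the one that calls Application.Volatile should pick up a new timestamp on every F9; the other should update only when its argument changes.

```vba
'Non-volatile by default: recalculated only when the argument (e.g. A1) changes
Function StampNonVolatile(ByVal x As Variant) As String
    StampNonVolatile = x & " @ " & Format(Now, "hh:mm:ss")
End Function

'Application.Volatile marks the calling cell for recalculation on every calculation
Function StampVolatile(ByVal x As Variant) As String
    Application.Volatile True
    StampVolatile = x & " @ " & Format(Now, "hh:mm:ss")
End Function
```

Try =StampNonVolatile(A1) and =StampVolatile(A1) side by side, press F9 a few times, then edit A1.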
Not the answer you want, but...
Don't use Formulas.
A user-defined function won't help either, because of the limitations on what you can do inside one.
As a previous comment suggested, I recommend moving the calculation into a pattern similar to this:
1. Pull all sheet data into a 2D array.
2. Perform the required transformations on the array.
3. Push the 2D array back into an equally sized range on the sheet.
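The three steps above can be sketched as follows. The sheet name "Data" and the doubling step are placeholders for your own sheet and transformations. Be aware that writing the array back replaces any formulas in that range with plain values, which is the point of the pattern.

```vba
'Sketch of the array pattern: one read, in-memory work, one write
Sub TransformInMemory()
    Dim rng As Range, data As Variant, r As Long, c As Long
    Set rng = ThisWorkbook.Worksheets("Data").UsedRange   'hypothetical sheet name
    data = rng.Value2                                     '1. pull everything into a 1-based 2D array
    If Not IsArray(data) Then Exit Sub                    'a single-cell range returns a scalar instead
    For r = 1 To UBound(data, 1)                          '2. transform entirely in memory
        For c = 1 To UBound(data, 2)
            If VarType(data(r, c)) = vbDouble Then data(r, c) = data(r, c) * 2   'placeholder step
        Next c
    Next r
    rng.Value2 = data                                     '3. push back to the same-sized range
End Sub
```

Reading and writing the whole block once avoids the per-cell round trips between VBA and the worksheet, which is where most of the cost of cell-by-cell loops goes.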

Excel: list ranges targeted by INDIRECT formulas

We have a few very large Excel workbooks (dozens of tabs, over a MB each, very complex calculations) with many dozens, perhaps hundreds, of formulas that use the dreaded INDIRECT function. These formulas are spread throughout the workbook and target several tables of data to look up values.
Now I need to move the ranges of data that are targeted by these formulas to a different location in the same workbook.
(The reason is not particularly relevant, but interesting on its own. We need to run these things in Excel Calculation Services and the latency hit of loading each of the rather large tables one at a time proved to be unacceptably high. We are moving the tables in a contiguous range so we can load them all in one shot.)
Is there any way to locate all the INDIRECT formulas that currently refer to the tables we want to move?
I don't need to do this on-line. I'll happily take something that takes 4 hours to run as long as it is reliable.
Be aware that the .Precedents, .Dependents, etc. properties only track references written directly in formulas; they cannot see through INDIRECT.
(Also, rewriting the spreadsheets in whatever is not an option for us).
Thanks!
You could iterate over the entire workbook using VBA (I've included the code from @PabloG and @euro-micelli):
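Sub iterateOverWorkbook()
    Dim ws As Worksheet, rRng As Range, cell As Range
    For Each ws In ThisWorkbook.Worksheets
        Set rRng = ws.UsedRange
        For Each cell In rRng
            'HasFormula is False for empty cells, so no separate IsEmpty check is needed
            If cell.HasFormula Then
                If InStr(cell.Formula, "INDIRECT") > 0 Then
                    'Write the edited text back to .Formula
                    cell.Formula = Replace(cell.Formula, "INDIRECT(D4)", "INDIRECT(C4)")
                End If
            End If
        Next cell
    Next ws
End Sub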
This example substitutes every occurrence of "INDIRECT(D4)" with "INDIRECT(C4)". You can easily swap the Replace function for something more sophisticated if you have more complicated INDIRECT functions. Performance is not that bad, even for bigger workbooks.
Q: "Is there any way to locate all the INDIRECT formulas that currently refer to the tables we want to move?"
As I read it, you want to look inside the arguments of INDIRECT for references to areas of interest.
Off the top of my head, I'd write VBA that uses a regular expression parser, or even a simple InStr, to find INDIRECT(, read forward to the matching ), and then EVALUATE() the string inside to convert it to the actual address. Repeat as required for multiple INDIRECT(...) calls, and dump each formula and its translation into two columns in a sheet.
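The approach just described can be sketched as follows. IndirectArgs and ListIndirectTargets are hypothetical names, and the report goes to a newly added sheet. The scanner tracks double quotes so that parentheses inside string literals are ignored, and it stops at a top-level comma so that INDIRECT's optional a1 argument is excluded. It does not check whether "INDIRECT(" itself appears inside a string literal, and Worksheet.Evaluate is limited to 255-character strings.

```vba
'Hypothetical helper: the ref_text argument of every INDIRECT( call in a formula, nested calls included
Function IndirectArgs(ByVal f As String) As Collection
    Const TOKEN As String = "INDIRECT("
    Dim result As Collection, startPos As Long, i As Long, depth As Long
    Dim inQuote As Boolean, ch As String
    Set result = New Collection
    startPos = InStr(1, f, TOKEN, vbTextCompare)
    Do While startPos > 0
        i = startPos + Len(TOKEN)               'first character of the argument list
        depth = 1: inQuote = False
        Do While i <= Len(f)
            ch = Mid$(f, i, 1)
            If ch = """" Then
                inQuote = Not inQuote           'an escaped "" toggles twice, so it cancels out
            ElseIf Not inQuote Then
                If ch = "(" Then depth = depth + 1
                If ch = ")" Then depth = depth - 1
                'Stop at the matching ")" or at a top-level "," (where the optional a1 argument starts)
                If depth = 0 Or (depth = 1 And ch = ",") Then Exit Do
            End If
            i = i + 1
        Loop
        result.Add Mid$(f, startPos + Len(TOKEN), i - startPos - Len(TOKEN))
        startPos = InStr(startPos + 1, f, TOKEN, vbTextCompare)   'also finds INDIRECTs nested in this one
    Loop
    Set IndirectArgs = result
End Function

'Hypothetical report: one row per INDIRECT call, with what its argument evaluates to right now
Sub ListIndirectTargets()
    Dim sh As Worksheet, cell As Range, outSh As Worksheet
    Dim arg As Variant, target As String, r As Long
    Set outSh = ThisWorkbook.Worksheets.Add
    outSh.Range("A1:D1").Value = Array("Cell", "Formula", "INDIRECT argument", "Evaluates to")
    r = 2
    For Each sh In ThisWorkbook.Worksheets
        If Not sh Is outSh Then
            For Each cell In sh.UsedRange
                If cell.HasFormula Then
                    For Each arg In IndirectArgs(cell.Formula)
                        target = "(could not evaluate)"   'kept if the evaluation below fails
                        On Error Resume Next
                        target = CStr(sh.Evaluate(arg))  'evaluated in the context of the formula's own sheet
                        On Error GoTo 0
                        outSh.Cells(r, 1).Value = cell.Address(External:=True)
                        outSh.Cells(r, 2).Value = "'" & cell.Formula   'leading apostrophe stores it as text
                        outSh.Cells(r, 3).Value = "'" & arg
                        outSh.Cells(r, 4).Value = "'" & target
                        r = r + 1
                    Next arg
                End If
            Next cell
        End If
    Next sh
End Sub
```

For the example formula quoted later in this thread, the "Evaluates to" column would show the sheet-qualified address that INDIRECT currently resolves to, which is what you need in order to find the formulas pointing at the tables being moved.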
You can use something like this in VBA:
Sub ListIndirectRef()
Dim rRng As Range
Dim oSh As Worksheet
Dim oCell As Range
For Each oSh In ThisWorkbook.Worksheets
Set rRng = oSh.UsedRange
For Each oCell In rRng
If InStr(oCell.Formula, "INDIRECT") Then
Debug.Print oCell.Address, oCell.Formula
End If
Next
Next
End Sub
Instead of Debug.Print, you can add whatever code suits your needs.
Unfortunately, the arguments of INDIRECT are usually more complex than that. Here's an actual formula from one of the sheets, not the most complex formula we have:
=IF(INDIRECT("'"&$B$5&"'!"&$O5&"1")="","",INDIRECT("'"&$B$5&"'!"&$O5&"1"))
Hm, you could write a simple parser that ignores most characters and just looks for the relevant parts (in this example: "A..Z", "0..9", "!:" etc.), but you will run into trouble if the arguments to INDIRECT contain functions.
Maybe the safer approach would be to print every occurrence of INDIRECT to a third sheet. You could then add the desired output and write a small search-and-replace program to write your changes back.
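The write-back step could look like this sketch. It assumes a hypothetical review sheet named "IndirectReview" with the sheet name in column A, the cell address in column B and the replacement formula, entered as text (e.g. '=INDEX(...)), in column C, starting in row 2; ApplyFormulaEdits is also a made-up name. Run it on a copy.

```vba
'Write edited formulas back from a review sheet (hypothetical layout: A = sheet name,
'B = cell address, C = new formula stored as text; data starts in row 2)
Sub ApplyFormulaEdits()
    Dim review As Worksheet, r As Long, lastRow As Long
    Set review = ThisWorkbook.Worksheets("IndirectReview")   'hypothetical sheet name
    lastRow = review.Cells(review.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        If Len(review.Cells(r, 3).Value) > 0 Then     'skip rows with no replacement formula
            ThisWorkbook.Worksheets(CStr(review.Cells(r, 1).Value)) _
                .Range(CStr(review.Cells(r, 2).Value)).Formula = CStr(review.Cells(r, 3).Value)
        End If
    Next r
End Sub
```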
If you "get" every cell in a huge spreadsheet you might end up needing monstrous amounts of memory. I am still willing to try and take that risk.
PabloG's method of selecting the used range is the way to go (added it into my original code). The speed is pretty good, especially if you check whether the current cell contains a formula. Obviously, this all depends on the size of your workbook.
I'm not sure what the etiquette of SO is concerning mention of products with which the writer is connected, but OAK, the Operis Analysis Kit, an Excel add-in, can replace the INDIRECT functions by the cell references they resolve to. You can then use Excel's audit tools to determine what dependents each range has.
You would, of course, do this to a temporary copy of the workbook.
More at
http://www.operisanalysiskit.com/oakpruning.htm
http://www.operisanalysiskit.com/help/2007/index.html?oakconceptpruning.htm
Given the age of this question you may well have found an alternative solution or workaround.

Resources