I've noticed that my UDFs recalculate whenever I delete cells. This causes massive delays when deleting entire columns, because the UDF gets called for each and every cell it is used in. So if a UDF is used in 1,000 cells, deleting a column or a cell will call it 1,000 times.
By way of example, put the following UDF in a module, then call it from the worksheet a bunch of times with =HelloWorld()
Function HelloWorld()
    HelloWorld = "HelloWorld"
    Debug.Print Now()
End Function
Then delete a row. If your experience is like mine, you'll see it gets called once for every instance of use.
Does anyone have any ideas on whether this behavior can be stopped? I'd also be interested in why it gets called at all. It seems like a flaw in Excel's dependency tree to me, but there may well be a good reason.
Edit: After experimentation, I've found more actions that trigger UDF recalculation:
Resizing a ListObject (i.e. an Excel Table) so that the number of columns it spans changes (but not the number of rows), even if the UDFs aren't in the ListObject concerned, or indeed in any ListObject at all.
Adding new cells or columns anywhere in the sheet (but not rows).
Note that manual calculation mode isn't an option, for several reasons.
Firstly, given that it is an application-level setting, it simply presents too great a risk that someone will use the output of one of the spreadsheets they happen to have open without realizing it is in manual calculation mode.
Secondly, I'm not actually designing a particular spreadsheet, but rather am writing a book about how non-developers can use well-written, off-the-shelf code such as UDFs to do things that would otherwise be beyond them. Examples include dynamically concatenating or splitting text, or the exact-match binary search UDF that Charles Williams outlines at https://fastexcel.wordpress.com/2011/07/22/developing-faster-lookups-part-2-how-to-build-a-faster-vba-lookup/ (And yes, I give them plenty of warning that a native formula-based solution will usually outperform a UDF. But as you'll see from the thread referenced above, carefully written functions can perform well.)
I don't know how users will employ these.
In the absence of a programming solution, it looks like I'll just have to point out in the book that users may experience significant delays when adding or deleting cells or resizing ListObjects if they have resource-intensive UDFs in use, even if those UDFs are efficiently written.
Inserting or deleting a row or column or cell will always trigger a recalc in automatic mode. (You can check this by adding =NOW() to an empty workbook and inserting or deleting things)
The question should be what (unexpected) circumstances flag a cell as dirty so that it gets recalced.
There is a (probably incomplete) list of such things at
http://www.decisionmodels.com/calcsecretsi.htm
Looks like I need to add some words about VBA UDFs (have not tested XLL UDFs - they may behave differently since they are registered in a different way to VBA UDFs)
Unfortunately, I don't believe it's possible to prevent a UDF from being recalculated when "unrelated" cells are deleted. The reason for this is that the argument passed to the UDF is in fact a Range object (not just the value within the cell(s)). Deleting "unrelated" cells can actually modify the Range.
For example, users can write this kind of UDF:
Function func1(rng)
    func1 = rng.Address & " (" & Format(Now, "hh:mm:ss") & ")"
End Function
Admittedly, this is not the common (or recommended) way to write a UDF: a UDF should normally depend on the content (the value) and not the container (the Range).
Here I'm just returning the address of the argument, and I append a timestamp to show when the UDF is recalculated. If you delete any column on the worksheet, all cells with this UDF are recalculated. But not if you insert a column: the cells to the right of the new column are left unchanged, now holding the wrong value (cell address). The results are the same when inserting/deleting rows. Strangely, inserting a single cell does force recalculation of all UDFs.
I tried to remove the "dependency" on the Range, but the behavior is the same even if the UDF's argument is typed as Double (instead of being left as a Variant, as in my example).
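For illustration, a Double-typed version might look like this (func2 is just a placeholder name, and the timestamp is only there to show when it recalculates):
Function func2(ByVal x As Double) As String
    'Value-typed argument instead of a Range/Variant -
    'deleting a column still recalculates every cell that uses this UDF
    func2 = x & " (" & Format(Now, "hh:mm:ss") & ")"
End Function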
As you explained, deleting a column will force UDFs to be recalculated. This makes sense, because a UDF can depend on its Range argument. Whether that is a smart design for a UDF is a different matter.
Related
I hear so many good things about the FILTER function in Excel, but I can't seem to get on with it. More often than not I end up with a #VALUE error. I'll usually get to the bottom of it, but this one has stumped me.
This is the formula:
=FILTER(Sheet1[#All],IFERROR((Sheet1[[#All],[Account]]=$D$1)*(Table2[[#All],[Duplicate 21711000]]=0),0))
I've put the IFERROR in because I thought perhaps a bunch of #N/A in the results could be causing an issue.
I've evaluated the Formula, and on the array on the Include side, everything seems to be working perfectly. The Boolean logic has worked, the 1s and 0s are in the right place to theoretically filter my list down to the few records I want to see. But when I click evaluate the final time, and it applies those 1s and 0s to the rows in the array, it results in the #VALUE error, and that's obviously all I can see in the cell.
Any ideas where I'm going wrong?
By the way, the point of this is to help a client who regularly pulls a report from her accounting system, and then manually filters, copies and pastes the lines for new transactions onto separate tabs of her spreadsheet, across about 30 accounts. My plan is to use Power Query to bring the report into a tab in her spreadsheet, then use the FILTER function to bring across the appropriate rows from the Power Query table into each tab. That's the first criterion here.
I've created another table next to the Power Query table, which uses a COUNTIF to work out whether that line has already been imported. So the second part of the criteria filters out any line where the COUNTIF has registered a duplicate.
I can't just use Power Query everywhere, because she then needs to be able to add more columns and manipulate the data once it's been imported. So the final part of the job is to create a macro to copy and paste values, first copying the FILTER function into the row below, ready for next month.
So Power Query imports all the data, the FILTER function extracts the relevant rows into each tab, and a macro locks in that data, ready to be edited if need be.
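For what it's worth, the "lock it in" macro I have in mind is only a rough sketch so far - the sheet and range names below are placeholders:
Sub LockInImportedRows()
    'Placeholder sheet/range - adjust to the real tab layout
    With Worksheets("Account tab").Range("C2").CurrentRegion
        .Value = .Value   'replace the spilled FILTER results with static values
    End With
    'Re-entering the FILTER formula for next month would be a separate step
End Sub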
If anyone can think of a better way of achieving the same result, I'm all ears.
So thank you @JvdV, the simple answer was indeed that the two tables were of different sizes, and FILTER requires them to be the same size.
This caused a follow-up issue, though: in order to count whether I had any duplicates in the sheet, I couldn't go with plan A and use a separate table, because it wouldn't dynamically change its height when the Power Query table was refreshed.
I looked into maybe incorporating that table into the Power Query itself with a custom column. But that would mean the Power Query table would end up with 39 additional columns, which would all be brought across by each FILTER function.
So instead I did this. I know this is very niche, so probably won't help anyone, but I'm very proud of myself and want to post it somewhere:
=FILTER(Sheet1[#All],IFERROR((Sheet1[[#All],[Account]]=$D$1)*IF((COUNTIF('21711000 (125006) Int rec'!$C:$C,Sheet1[[#All],[Document Number]]))=0,1,0),0))
(I probably don't need the IFERROR anymore, but it works, so I'm not going to touch it, and it's safer to keep just in case there are ever any errors.)
So this uses the fact that Excel now spills rather than needing specific array functions. Before, I made the COUNTIF a separate column; now I'm putting it inside the formula, where it naturally spills in the background, essentially creating that column within the formula. Then the IF converts any 0 to TRUE (1) and any other number (my duplicates) to FALSE (0), and everything works.
It actually creates a circular reference, because the Filter is in column C, but I think that's okay for my purposes, because I don't want it to include the results from the filter anyway. I suspected that might happen, but it seems to be working so far.
I hope that can help someone.
Question
Is it possible to efficiently simulate the result of Application.Calculate from VBA, but ignoring volatile functions?
Detail
Definition: Non-Volatile Equivalent: For any volatile workbook, define the non-volatile equivalent as the workbook whereby all cells with volatile references have been replaced by their raw values.
Definition: Non-Volatile Calculation Diffs: Suppose we have any workbook. Now suppose we took its non-volatile equivalent and did a full recalc on that. Take the cells whose values changed in the non-volatile equivalent, and their values. These are the non-volatile calculation diffs.
The question is asking this: for any workbook, with potential volatile references in it, is there a way to efficiently apply just the non-volatile diffs? Efficiency is key here - it's the whole reason we're looking to ignore volatile functions - see the use case below. So any solution that does not outperform a full recalc is useless.
Use Case
We have a workbook that is rife with INDIRECT usage. INDIRECT is a volatile Excel-native function. There are around 300,000 of these calls in the workbook, which contains about 1.5 million used cells in total. We do not have the capacity to change them. As a result of these INDIRECT usages, our workbook takes a long time to recalculate - circa 10 seconds. So we have automatic calculation turned off, and users periodically trigger a manual recalc to refresh data as they go.
Our business users are fine with that, but some of the new VBA functions we are adding could use something more efficient than a 10 second wait.
What We've Tried so Far
Sheet.Calculate - this is handy in some cases, and we do use it. But it treats the data in all other sheets as values, not formulas, so if there are any zig-zag references it does not produce a fully consistent result. E.g. imagine a reference chain from sheet A -> sheet B -> sheet A: one would then need to calculate sheet A, then B, then A again. The number of zig-zags is arbitrary, and that's just one case, with two sheets. To solve this properly, one would essentially need to rewrite Excel's entire calculation engine.
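To illustrate (a contrived sketch with placeholder sheet names): brute-forcing even the simple A -> B -> A case means something like the following, and the right number of passes is not knowable in general:
Sub CalcZigZag()
    'Each pass resolves one "hop" of the A -> B -> A chain
    Dim pass As Long
    For pass = 1 To 3   'the correct number of passes is arbitrary - that is the problem
        Worksheets("A").Calculate
        Worksheets("B").Calculate
    Next pass
End Sub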
There's one thing that comes to mind, and it's not ideal. Perhaps you can write your own function that's not volatile and is stored in the workbook, doing exactly what INDIRECT does (or whatever formulas you use). You could call it INDIRECT2 or something, and just replace those functions in your spreadsheet.
Unfortunately, this function would not be available through Application.WorksheetFunction.functionName, but there are ways around that. This is just my general idea.
Use a non-volatile User Defined Function that does the same job as INDIRECT?
Public Function INDIRECT_NV(ByVal ref_text As String, Optional ByVal a1 As Boolean) As Variant
    'Non-volatile INDIRECT function
    'Does not accept R1C1 notation - the optional is purely for quick replacement
    INDIRECT_NV = CVErr(xlErrRef)     'Defaults to an error message
    On Error Resume Next              'If the next line fails, just output the error
    Set INDIRECT_NV = Range(ref_text) 'Does the work of INDIRECT, but not for R1C1
End Function
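Usage on the sheet is then close to a drop-in replacement, e.g. (with an illustrative reference string):
=INDIRECT_NV("Sheet2!B"&A1)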
Alternatively, depending on the use case, it may be possible to replace the INDIRECT queries with INDEX and MATCH (both non-volatile functions) instead.
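For example (an illustrative pattern, not the actual formulas in the workbook in question), a lookup built as
=INDIRECT("B"&MATCH($E$1,A:A,0))
can usually be rewritten without the volatile call as
=INDEX(B:B,MATCH($E$1,A:A,0))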
I often find myself writing Excel formulas that have something like this in the formula:
= IF(<long expression>=<some condition>,<long expression>,0)
Is there any way to accomplish this without needing to type out <long expression> twice (and also without using helper cells)?
Ideally, something that works similar to IFERROR, i.e.
= IFERROR(<some expression>,0)
This checks if <some expression> would return any type of error, and if it doesn't, it automatically returns <some expression> (without needing to again explicitly type it out a second time).
Is there an Excel function (or combination of Excel functions) similar to IFERROR but instead of checking an error condition, it checks a general (user-defined) condition based on the formula?
When it comes to formula efficiency and calculation speed, using helper cells can be of great value, even if they may initially muck up the spreadsheet design.
Put calculations into a helper cell and refer to the helper cell in the IF statement. That way the calculation will only happen once.
This method is preferred by spreadsheet auditors over the alternative of packing everything into one formula, because it is also much easier to follow and pick apart.
With careful spreadsheet planning you can house helper cells in a different (hidden) sheet or in columns that you hide to tidy up the design.
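For example (cell references are placeholders), park the long expression once in a helper cell, say Z1, and reference only that cell in the visible formula:
Z1: =<long expression>
A1: =IF(Z1=<some condition>,Z1,0)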
This answer applies to Excel 365 as of March 2020. A new function is now available in Insider builds of Excel. It is called LET() and is used to define, and assign a value to, a variable that can then be used multiple times inside the parentheses of the LET() function.
Example:
=LET(MyResult,XLOOKUP(C1,A1:A3,B1:B3),IF(MyResult=0,"",MyResult))
The first parameter is the name of a new variable, MyResult. The variable is assigned a value with the second parameter. This can be a constant, like a number or a text string, or, as in this case, a formula.
The third parameter is a calculation that can use the variable value.
In the following screenshot, the XLOOKUP returns a 0 because the found cell is empty.
In the next formula down, the XLOOKUP is wrapped in an IF statement, evaluated, and then repeated. This is the approach where the calculation is duplicated, as described in the question.
The third formula shows the LET() function and its result.
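Applied to the pattern from the question, that would look something like:
=LET(x,<long expression>,IF(x=<some condition>,x,0))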
I am making a payroll program in Excel, and one of my concerns is that the employees' salaries are looked up using INDEX and MATCH or VLOOKUP. The problem is that if the salaries are updated in the future (e.g. a raise or a change in rates), all the previous entries that used the old salaries will be updated to the new salaries. This would be a disaster and would make my entire program useless and inefficient. Therefore I need to automatically lock previously calculated cells after a certain time.
Edit: Note that we do not want to do this manually (e.g. by copying and pasting values only), because almost all cells are connected to each other; one mistake by the encoder, or forgetting to do this before updating a value, and everything will be messed up.
No! Not copying and pasting - there's a simpler way. You want to convert the Formula property of a given cell (what's shown in the formula bar in Excel) into the Value property of the cell (what's shown in the cell on the spreadsheet). For a given range A1:B6 this would be done by the statement
Range("A1:B6").Formula = Range("A1:B6").Value
But there's a quirk in Excel whereby reading the Value2 property is faster, so
Range("A1:B6").Formula = Range("A1:B6").Value2
The rest of the code is left as an exercise for the reader :-)
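For illustration only, a bare-bones wrapper might look like the following - the sheet, the range and the "after a certain time" trigger are all placeholders the reader still has to supply:
Sub LockPreviousEntries()
    'Placeholder range - point this at the block of finished payroll rows
    With Worksheets("Payroll").Range("A1:B6")
        .Formula = .Value2   'overwrite the formulas with their current (static) values
    End With
End Sub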
This is maybe a slightly advanced topic.
When you press F9 or recalculate the sheets, Excel first tries to find out which cells could possibly have changed, so it can skip evaluating the rest. The algorithm searches all ancestors, and their ancestors, to find out whether the value of any of them has changed. If so, it proceeds to the actual evaluation.
That holds, at least, as long as there is no INDIRECT() in the chain. If there is, the algorithm assumes that the value of this formula is volatile (i.e. that it always changes) and so evaluates all of its descendants.
There are more volatile functions: RAND, RANDBETWEEN, NOW, TODAY, OFFSET, INDIRECT, INFO and CELL (the last two depending on their arguments). Some of them obviously should be volatile (like RAND()).
The question: is there any way of telling Excel that a given cell it treats as volatile should in fact be kept frozen unless its ancestors change?
One way of resolving the problem is to write my own versions of Excel's volatile functions in VBA; VBA functions are non-volatile unless they explicitly call Application.Volatile.
The problem is that VBA invocation carries a relatively high cost. Another is the need to 'reinvent the wheel'. I hope there is a cheaper solution.
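By "my own versions" I mean something along these lines - a sketch only, and it already shows one of the drawbacks:
Function OFFSET_NV(ByVal base As Range, ByVal rowOff As Long, ByVal colOff As Long) As Variant
    'Non-volatile because it never calls Application.Volatile:
    'it recalculates only when its arguments change, not on every calculation pass.
    'Caveat: Excel only sees the arguments as precedents, not the cell being pointed at.
    OFFSET_NV = base.Offset(rowOff, colOff).Value
End Function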
Not the answer you want, but...
Don't use Formulas.
A user-defined function won't help either, because of the limitations on what you can do inside one.
As the previous comment suggested, I recommend moving the calculation into a pattern similar to the following (sketched in code after the list):
1. Pull all sheet data into a 2D array.
2. Perform X transformations on the data.
3. Push the data back to an equally sized sheet range.
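A rough illustration of that pattern (sheet names, ranges and the transformation itself are placeholders):
Sub TransformViaArray()
    Dim data As Variant
    data = Worksheets("Input").Range("A1:D1000").Value2       '1. pull sheet data into a 2D array
    Dim r As Long
    For r = LBound(data, 1) To UBound(data, 1)                 '2. transform the data in memory
        data(r, 4) = data(r, 2) * data(r, 3)                   '   (hypothetical calculation)
    Next r
    Worksheets("Output").Range("A1:D1000").Value2 = data       '3. push it back to an equally sized range
End Sub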