Is it possible to show the sequence in which formulas are processed in an Excel sheet with many complex formulas?
In my case this is important because the formulas depend on each other, and I need to find out the sequence in which they could be processed one after another.
Thanks in advance.
Excel dynamically adjusts the calculation sequence in such a way as to resolve the dependencies, and normally only recalculates the subset of formulas that are dependent on volatile or changed cells. In Excel 2007 and later this sequence is calculated in a multi-threaded way.
So in practical terms trying to determine the calculation sequence in a large complex workbook is not a good idea.
For more information see http://www.decisionmodels.com/calcsecretsc.htm and associated pages
Excel calculation order within a formula
Excel evaluates each formula in a fixed order. According to the Microsoft documentation, it processes operators of equal precedence from left to right, but evaluates brackets (a.k.a. parentheses) first. The order is roughly BODMAS, with one notable exception: negation binds more tightly than exponentiation, so =-2^2 returns 4, not -4.
Excel calculation order between formulas
Excel will determine which formulas are independent of others and calculate those first. This includes formulas referring to cells that only contain values, and formulas that reference constants or volatile functions.
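To make the idea of "independent formulas first" concrete, here is a minimal Python sketch that orders cells so each one comes after the cells it reads. It is only an illustration of dependency ordering (a topological sort); it is not Excel's actual algorithm, and the cell names and formulas in the comments are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical sheet: each cell maps to the set of cells its formula reads.
precedents = {
    "A1": set(),          # constant
    "A2": set(),          # constant
    "B1": {"A1", "A2"},   # =A1+A2
    "C1": {"B1"},         # =B1*2
    "D1": {"B1", "C1"},   # =B1+C1
}

def calculation_order(precedents):
    """Return one order in which every cell comes after its precedents."""
    return list(TopologicalSorter(precedents).static_order())

order = calculation_order(precedents)
print(order)
```

Several valid orders usually exist (A1 and A2 can be swapped here), which is one reason a single "the" calculation sequence is hard to pin down.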
Finding the processing sequence
For starters, this is also known as tracing dependents (and precedents). On the Formulas tab of the ribbon, the Formula Auditing tools let you trace dependents or precedents. These will point you at the formulas that are calculated before or after a given cell, so you can step through the order and check that it is correct. For more detail on this I recommend this Microsoft article
Related
Without VBA, I am trying to refer to a range that starts at A2 and never ends. For example, if I want row 2 to row 5, I'd use
$A$2:$A$5
But what if I want the end to be open?
$A$2:??
Is this possible?
Depending on what's in A1 and what formula you're putting the reference into, you could simply use A:A. For example, if you wanted to sum all of the values in column A, but A1 contained a column title rather than a number, you could still write =SUM(A:A) and the title in A1 would just be ignored.
A2:A works in many formulas in Google Sheets, but Excel does not accept an open-ended reference written this way.
Hope that helps.
If you want to refer to a range starting at A2 and running to the last row (1048576, or 65536 for Excel versions before 2007), you can use this volatile formula: =OFFSET(A2,0,0,(COUNTBLANK(A:A)+COUNTA(A:A)-1),1). Use it as a defined range name, or inside another formula that takes a range as an argument (for example, SUM).
Another option (in case your formula is in A1, so accessing A:A would create a circular reference) is:
OFFSET(A2, 0, 0, ROWS(A:A)-1)
This uses ROWS to count the total number of rows (without actually accessing the rows!), subtracts 1 (because we're starting with the second row), and uses this result as the height of a range created with OFFSET.
This is another option based on a formula, using the example locations in the OP's question:
=A2:INDEX(A:A,MAX(FILTER(ROW(A:A),IF(ISBLANK(A:A),0,1)=1)))
The components are the following:
=MAX(FILTER(ROW(A:A),IF(ISBLANK(A:A),0,1)=1))
which finds the number of the deepest row that is not blank, and
A2:INDEX(A:A,<expression 1 above>)
which embeds the expression above in a larger formula. The result is a range that starts at the given cell (here A2) and ends, in the given column, at the row number returned by expression 1.
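The logic of the two components can be mimicked outside Excel. The following Python sketch models column A as a list, with None standing for a blank cell, and is only meant to show what the formula computes; the function names and sample data are made up for illustration.

```python
def last_non_blank_row(column):
    """1-based row number of the last non-blank cell, or 0 if all are blank.

    Mirrors MAX(FILTER(ROW(A:A), NOT(ISBLANK(A:A)))) on a list.
    """
    rows = [i for i, value in enumerate(column, start=1)
            if value not in (None, "")]
    return max(rows, default=0)

def open_ended_range(column, start_row=2):
    """Values from start_row down to the last non-blank row (like A2:INDEX(...))."""
    return column[start_row - 1:last_non_blank_row(column)]

# Hypothetical column A: a header, some values, a gap, then trailing blanks.
column_a = ["Header", 10, None, 30, "text", None, None]

print(last_non_blank_row(column_a))   # 5
print(open_ended_range(column_a))     # [10, None, 30, 'text']
```

Note that blanks inside the data are kept and only the trailing blanks are cut off, and that text values count as non-blank.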
This is an alternative to the others listed, and may be of interest as it differs from them in potentially substantial ways.
I can note the following characteristics:
It is not necessarily fast.
It seems to NOT be a volatile formula. This is important, as it means it won't necessarily be recalculated every time a calculation is made. However, I am not sure about the frequency of calculation, and don't fully understand its volatility status.
The uncertainty is related to the use of the INDEX function (and, apparently, specifically after the : in a range). There are some resources that describe it.
INDIRECT and OFFSET functions are definitely volatile. There are a number of resources that describe performance implications of volatile functions, some of them mentioned in other SO answers. For example:
https://learn.microsoft.com/en-us/office/client-developer/excel/excel-recalculation
https://www.sumproduct.com/thought/volatile-functions-talk-dirty-to-me
http://www.decisionmodels.com/calcsecretsi.htm
https://chandoo.org/wp/handle-volatile-functions-like-they-are-dynamite/
It allows the user to not have to think about the data in certain cells (for example, A1, which may be meant to have a header, and not numbers).
It returns a range between the cell specified before the : and the last cell in the column that is non-blank. I think it should include non-numeric values in its consideration as well.
It shares some commonality in terms of the range it aims to identify with the answer by Kresimir L.: =OFFSET(A2,0,0,(COUNTBLANK(A:A)+COUNTA(A:A)-1),1).
To note: This answer applies to the version of Excel available at the time of writing as part of Office 365 (and continually updated). However, it is based only on my own verification that it appears to work on my installation. I am not sure that all installations of Office 365 have exactly the same software, and I have the sense that some features may differ between installations, so this answer may not apply to everyone. Please test. I would appreciate feedback on your success with this approach.
This is well covered in VBA, as in the code below:
Range("A2", Range("A2").End(xlDown))
If you want to achieve that in a formula, it depends on the version of MS Excel.
According to this reference, a sheet in Excel 2007 and later has 1048576 rows, so you can use the following:
$A$2:$A$1048576
Because this range depends on the Excel version, it may be different in future versions.
Finally, I suggest you use VBA.
When we write a formula for a specific function, we need to point Excel to the location of the data. This location is called a cell reference. We know this well, but how is it useful, and in which scenarios can it be used?
Cell references are way better than replicating a formula many times. It would be simple if you have a small spreadsheet, but what about thousands of complex computations? It is also better as it creates a more automated spreadsheet, such that when one value is changed other values automatically change.
I've been trying to find a list of the built-in MS Excel functions that calculate "whole column" formulas efficiently, but haven't been successful. Any ideas where I can go for this information? What I mean by this is exemplified below:
This documentation suggests that SUM and SUMIF formula automatically pick up on the last row of data, thus meaning that there is no efficiency reason why using a more restricted or dynamic range is preferable.
https://msdn.microsoft.com/en-us/library/office/ff726673(v=office.14).aspx#xlAllowExtraData
Answers for Excel 2003/7/10 are all welcome.
I think it would be fair to assume that ALL Excel functions behave as the article describes (i.e. the same as SUM and SUMIF). (I wasn't aware of this article, but it makes sense when you think about it...)
Behind the scenes the data in cells is stored in OO data structures such that only cells with non-default values and formatting will have been created.
It's highly probable that the value data and formatting data are held in separate containers.
So when Excel is using a formula on a range it is working on the data structures and consequently only works with the cells that have values.
I hope that a whole column with cells having different formatting (but very few values) does not cause the SUM and SUMIF formulas to scan through every cell.
If in doubt you could do an experiment with the formulas you want to use.
The link you gave talked about formulas that behave differently and explicitly named VBA user created functions and array formulas - which makes sense.
Also, note that the article says using "Structured Table References" is the best approach (i.e. not only storing your data in ranges, but storing your data in Excel tables, created from a range using Ribbon: INSERT > Table).
These tables will allow any function to be used more efficiently as the range used is limited to the number of rows the table has.
I hope this helps.
Harvey
This may be a somewhat advanced topic.
When you press F9 or recalculate the sheets, Excel first tries to find out which cells could possibly have changed, so that it can skip evaluating all the others. The algorithm searches a cell's ancestors, and the ancestors' ancestors, to find out whether the value of any of them has changed. If so, it proceeds to actual evaluation.
That would be, at least, as long as there's no "INDIRECT()" formula in the path. If there is, the algorithm assumes that the value of this formula is volatile (i.e. it always changes) and so evaluates all the descendants.
There are more volatile formulas: RAND, AREAS, CELL, COLUMNS, INDEX, INDIRECT, NOW, OFFSET, ROWS, TODAY. Some of them obviously should be volatile (like RAND()).
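The behaviour described above can be sketched in a few lines of Python. This is only a model of the idea (changed cells dirty their descendants, and volatile cells are always dirty), not Excel's real implementation; the cell names and the dependents map are invented for the example.

```python
from collections import deque

def cells_to_recalculate(dependents, changed, volatile):
    """Return the formula cells that must be re-evaluated.

    dependents: maps a cell to the cells whose formulas read it.
    changed:    cells whose values were edited (inputs, not re-evaluated).
    volatile:   cells containing a volatile function (always re-evaluated).
    """
    dirty = set(volatile)
    visited = set(changed) | set(volatile)
    queue = deque(visited)
    while queue:
        cell = queue.popleft()
        for dep in dependents.get(cell, ()):
            dirty.add(dep)
            if dep not in visited:
                visited.add(dep)
                queue.append(dep)
    return dirty

# Hypothetical sheet: A1 -> B1 -> C1, D1 (=INDIRECT(...)) -> E1, F1 -> G1.
dependents = {"A1": ["B1"], "B1": ["C1"], "D1": ["E1"], "F1": ["G1"]}

dirty = cells_to_recalculate(dependents, changed={"A1"}, volatile={"D1"})
print(sorted(dirty))  # ['B1', 'C1', 'D1', 'E1']
```

G1 is skipped because nothing upstream of it changed, while E1 is recalculated only because it sits below the volatile cell D1.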
The question: is there any way of telling Excel, that a given cell that is treated volatile by Excel should in fact be kept as frozen, unless its ancestors change?
One way of resolving the problem is to write my own versions of Excel volatile functions in VBA. VBA functions are assumed not volatile.
The problem is the relatively high cost of VBA invocation. Another is the need to 'reinvent the wheel'. I hope there is a cheaper solution.
Not the answer you want, but...
Don't use Formulas.
A user-defined function won't help either, because of the limitations on what you can do inside one.
As a previous comment suggested, I recommend moving the calculation into a pattern similar to...
1. pull all sheet data into a 2D array.
2. perform X transformations on data.
3. push the data back into a sheet range of the same size as the 2D array.
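The three steps above can be sketched as follows. This is a language-neutral illustration in Python, with the sheet modelled as a plain list of rows; in VBA the equivalent would read and write a Range's values in one go. The helper names and the doubling transformation are placeholders, not part of any Excel API.

```python
def read_range(sheet, top, left, height, width):
    """Step 1: copy a rectangular block of the sheet into a 2D array."""
    return [row[left:left + width] for row in sheet[top:top + height]]

def transform(data):
    """Step 2: any in-memory calculation; here each value is doubled."""
    return [[value * 2 for value in row] for row in data]

def write_range(sheet, top, left, data):
    """Step 3: write the array back into a block of the same size."""
    for r, row in enumerate(data):
        sheet[top + r][left:left + len(row)] = row

# Hypothetical 3x3 sheet; the block is rows 1-2, columns 0-1 (0-based).
sheet = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

block = read_range(sheet, top=1, left=0, height=2, width=2)
write_range(sheet, top=1, left=0, data=transform(block))
print(sheet)  # [[1, 2, 3], [8, 10, 6], [14, 16, 9]]
```

The point of the pattern is that the sheet is touched only twice, once to read and once to write, and all calculation happens on the in-memory array.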
We are looking for the original(?) source code for SUMIFS to use in our Excel sheet (for both 2003 and 2007). Here is why:
2003 doesn't support the SUMIFS function
When we do have SUMIFS, we cannot wrap functions "around" the columns (like YEAR())
For example, we want to sum the ANSWERS where the year of the date in range L:L matches the year of the date in cell A1. This doesn't work because we can't use YEAR(L:L), so we need to make another column M:M with the YEAR values from L:L
Thus we need the source to be able to upgrade the code further
=SUMIFS(ANSWERS;L:L;"="&YEAR(A1)) <= This works
=SUMIFS(ANSWERS;YEAR(L:L);"="&YEAR(A1)) <= This doesn't
Many thanks
With reference to these questions:
Replacing SUMIFS in Excel 2003
VBA code for SUMIFS?
I doubt that you'll find the original code, which is, I imagine, C++ and part of the Excel internals. If that's correct, then it wouldn't help much, even if Microsoft gave it to you!
In general, I recommend against using =SUMIF(), =SUMIFS() and other functions that take a string to define the condition for testing: apart from anything else, I'm concerned that they're going to be slow, since my best guess is that internally they construct a string for evaluation for each value. In XL2007 (all I have available right now) at least, this turns out not to be necessarily true (see comments below).
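I'm generally much happier with array functions. This, for example, should work in both Excel 2003 & 2007 (in Excel 2003, array calculations cannot use whole-column references, so there you would restrict L:L to a bounded range such as L2:L1000):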
=SUMPRODUCT(--(YEAR(L:L)=YEAR(A1)),ANSWERS)
This gets the same answer:
{=SUM(IF(YEAR(L:L)=YEAR(A1),ANSWERS,0))}
In the latter case, you'd enter the formula without the curly braces ({ & }) and confirm it using Control+Shift+Enter to tell Excel it's an array formula.
In the first example, we build a list of boolean results with YEAR(L:L)=YEAR(A1) and convert it into an array of 1s and 0s using the double-negative. Then SUMPRODUCT takes care of the rest. This version requires that ANSWERS has the same dimension as L:L, i.e. it should be the entire column (or the range in L should be constrained in size).
In the second, Excel will run through each entry in L:L. If its year matches that in A1 then the corresponding ANSWERS value will be used, otherwise zero. This formula seems to be more tolerant of dimension differences but I'd still be careful.
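For readers who find the double-negative trick opaque, here is a small Python sketch of what the SUMPRODUCT version computes: a list of 1s and 0s multiplied element-wise with the values, then summed. The sample dates and values are made up, and the function name is purely illustrative.

```python
from datetime import date

def sum_where_year_matches(dates, answers, target_year):
    """Mimic =SUMPRODUCT(--(YEAR(dates)=target_year), answers)."""
    if len(dates) != len(answers):
        # SUMPRODUCT also requires both arrays to have the same dimensions.
        raise ValueError("dates and answers must have the same length")
    flags = [1 if d.year == target_year else 0 for d in dates]  # the "--" step
    return sum(flag * value for flag, value in zip(flags, answers))

# Hypothetical data standing in for L:L and ANSWERS.
dates = [date(2010, 1, 5), date(2011, 3, 9), date(2010, 12, 31)]
answers = [10, 20, 30]

total = sum_where_year_matches(dates, answers, 2010)
print(total)  # 40
```

The length check corresponds to the dimension requirement mentioned for the first formula.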