Excel: list ranges targeted by INDIRECT formulas - excel

We have a few very large Excel workbooks (dozens of tabs, over a MB each, very complex calculations) with many dozens, perhaps hundreds, of formulas that use the dreaded INDIRECT function. These formulas are spread throughout the workbook and look up values in several tables of data.
Now I need to move the ranges of data that are targeted by these formulas to a different location in the same workbook.
(The reason is not particularly relevant, but interesting on its own. We need to run these things in Excel Calculation Services and the latency hit of loading each of the rather large tables one at a time proved to be unacceptably high. We are moving the tables in a contiguous range so we can load them all in one shot.)
Is there any way to locate all the INDIRECT formulas that currently refer to the tables we want to move?
I don't need to do this on-line. I'll happily take something that takes 4 hours to run as long as it is reliable.
Be aware that the .Precedents, .Dependents, etc. properties only track direct references, so they do not see what an INDIRECT call resolves to.
(Also, rewriting the spreadsheets in whatever is not an option for us).
Thanks!

You could iterate over the entire workbook using VBA (I've included the code from @PabloG and @euro-micelli):
Sub iterateOverWorkbook()
    Dim i As Worksheet
    Dim rRng As Range
    Dim j As Range
    For Each i In ThisWorkbook.Worksheets
        Set rRng = i.UsedRange
        For Each j In rRng
            If (Not IsEmpty(j)) Then
                ' Only formula cells can contain INDIRECT
                If (j.HasFormula) Then
                    If InStr(j.Formula, "INDIRECT") > 0 Then
                        ' Write the edited text back as a formula
                        j.Formula = Replace(j.Formula, "INDIRECT(D4)", "INDIRECT(C4)")
                    End If
                End If
            End If
        Next j
    Next i
End Sub
This example substitutes every occurrence of "INDIRECT(D4)" with "INDIRECT(C4)". You can easily swap the Replace call for something more sophisticated if you have more complicated INDIRECT calls. Performance is not that bad, even for bigger workbooks.

Q: "Is there any way to locate all the INDIRECT formulas that currently refer to the tables we want to move?"
As I read it, you want to look inside the arguments of INDIRECT for references to areas of interest.
Off the top of my head, I'd write VBA that uses a regular expression parser, or even a simple InStr, to find INDIRECT( and read forward to the matching ). Then EVALUATE() the string inside to convert it to the actual address, repeat as required for multiple INDIRECT(...) calls, and dump the formula and its translation to two columns in a sheet.
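The "read forward to the matching )" step could be sketched in VBA roughly as follows. ExtractIndirectArgs is a hypothetical helper name, not something from the answers here, and it only does the string scanning. It assumes the text comes from Range.Formula, returns everything between INDIRECT( and its matching ) (so an optional second argument is included in the returned text), and would also match any name that merely ends in INDIRECT(.

```vba
Option Explicit

' Hypothetical helper: returns the argument text of every INDIRECT(...) call
' found in formulaText. Parentheses inside double-quoted strings are ignored.
Public Function ExtractIndirectArgs(ByVal formulaText As String) As Collection
    Const TOKEN As String = "INDIRECT("
    Dim results As New Collection
    Dim startPos As Long, i As Long, depth As Long
    Dim inString As Boolean
    Dim ch As String

    startPos = InStr(1, formulaText, TOKEN, vbTextCompare)
    Do While startPos > 0
        i = startPos + Len(TOKEN)          ' first character after "INDIRECT("
        depth = 1
        inString = False
        Do While i <= Len(formulaText)
            ch = Mid$(formulaText, i, 1)
            If ch = """" Then
                ' A doubled quote inside a string toggles twice, so it nets out
                inString = Not inString
            ElseIf Not inString Then
                If ch = "(" Then depth = depth + 1
                If ch = ")" Then depth = depth - 1
                If depth = 0 Then Exit Do
            End If
            i = i + 1
        Loop
        ' Record the argument only if the closing parenthesis was found
        If depth = 0 Then
            results.Add Mid$(formulaText, startPos + Len(TOKEN), i - startPos - Len(TOKEN))
        End If
        ' Resume just after this "INDIRECT(" so nested calls are reported too
        startPos = InStr(startPos + Len(TOKEN), formulaText, TOKEN, vbTextCompare)
    Loop

    Set ExtractIndirectArgs = results
End Function
```

Each returned string could then be handed to the Evaluate method of the worksheet that holds the formula, as suggested above, to obtain the address text the INDIRECT call would receive. That part is left out here because it depends on the workbook's contents.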

You can use something like this in VBA:
Sub ListIndirectRef()
    Dim rRng As Range
    Dim oSh As Worksheet
    Dim oCell As Range
    For Each oSh In ThisWorkbook.Worksheets
        Set rRng = oSh.UsedRange
        For Each oCell In rRng
            If InStr(oCell.Formula, "INDIRECT") > 0 Then
                ' Include the sheet name, since addresses repeat across tabs
                Debug.Print oSh.Name, oCell.Address, oCell.Formula
            End If
        Next
    Next
End Sub
Instead of Debug.Print you can add code to suit your taste.

Unfortunately, the arguments of INDIRECT are usually more complex than that. Here's an actual formula from one of the sheets, not the most complex formula we have:
=IF(INDIRECT("'"&$B$5&"'!"&$O5&"1")="","",INDIRECT("'"&$B$5&"'!"&$O5&"1"))
Hm, you could write a simple parser that ignores most of the characters and just looks for the relevant parts (in this example: "A..Z", "0..9", "!:" etc.), but you will run into trouble if the arguments to INDIRECT are themselves functions.
Maybe the safer approach would be to print every occurrence of INDIRECT in a third sheet. You could then add the desired output next to each one and write a small search-and-replace program to write your changes back.
If you "get" every cell in a huge spreadsheet you might end up needing monstrous amounts of memory. I am still willing to try and take that risk.
PabloG's method of selecting the used range is the way to go (added it into my original code). The speed is pretty good, especially if you check whether the current cell contains a formula. Obviously, this all depends on the size of your workbook.

I'm not sure what the etiquette of SO is concerning mention of products with which the writer is connected, but OAK, the Operis Analysis Kit, an Excel add-in, can replace the INDIRECT functions by the cell references they resolve to. You can then use Excel's audit tools to determine what dependents each range has.
You would, of course, do this to a temporary copy of the workbook.
More at
http://www.operisanalysiskit.com/oakpruning.htm
http://www.operisanalysiskit.com/help/2007/index.html?oakconceptpruning.htm
Given the age of this question you may well have found an alternative solution or workaround.

Related

Faster method than VLOOKUP to compare two data sets in excel using VBA, 1 has 180,000 items the other 250,000

I am automating a report build in Excel using VBA. As part of that process I use VLOOKUP to compare two lists. Tab 1 contains roughly 180,000 line items, each with a unique ID; the VLOOKUP takes that ID and compares it against "owners" in tab 2, which has roughly 250,000 line items. Run time for this operation is roughly 25-30 minutes, and I'm wondering if there is a faster way. Maybe I should perform this comparison using a script outside of Excel to reduce calculation time?
It's working fine so I haven't tried to troubleshoot. I have a few ideas around doing the work outside of excel, in the background but looking for ideas from the broader group.
Here is the line I'm using to perform the lookup now, it's repeated 5x in code.
Range("Table").Offset(1).Select
ActiveCell.FormulaR1C1 = "=IFNA(VLOOKUP([#ID],table,2,0),""Unassigned"")"
With each iteration of the above line, the workbook recalculates, which is what takes the 30 minutes. I have tried setting calculation to xlManual and then back to xlAutomatic, with no luck. I was thinking I could just run a single worksheet calculation after the formulas were written.
Curious if anyone knows of a faster way to accomplish this. As I said the run time is 30mins for this section, and the total run time is 35-40mins.
If you can SORT your data, you can build a double VLOOKUP with the range_lookup parameter set to TRUE. This causes VLOOKUP to do a binary search which, on a large DB, may run 100x faster:
=IF(VLOOKUP(ID,Table,1,TRUE)=ID,VLOOKUP(ID,Table,2,TRUE),NA())
And if you are using the VLOOKUP method, you should be sure to turn off ScreenUpdating and also set Calculation to manual while you are populating the worksheet with the formulas.
Alternatively, it might be faster to just read the data into a VBA array or dictionary, and do all your lookup and matching within VBA. Again, if you can sort your list, you can use a binary search which will be much faster.
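A minimal sketch of the binary search mentioned above, in plain VBA. BinarySearchIndex is a hypothetical name. It assumes the keys have already been read into a one-dimensional array with a non-negative lower bound, sorted ascending under the same rules VBA's < operator applies (straightforward for numeric IDs, less so for mixed-case text).

```vba
Option Explicit

' Hypothetical helper: returns the index of target in sortedKeys, or -1 if absent.
' Assumes sortedKeys is a one-dimensional array sorted in ascending order.
Public Function BinarySearchIndex(ByRef sortedKeys As Variant, ByVal target As Variant) As Long
    Dim lo As Long, hi As Long, middle As Long

    lo = LBound(sortedKeys)
    hi = UBound(sortedKeys)
    BinarySearchIndex = -1                 ' "not found" until proven otherwise

    Do While lo <= hi
        middle = lo + (hi - lo) \ 2        ' integer division, avoids overflow
        If sortedKeys(middle) = target Then
            BinarySearchIndex = middle
            Exit Function
        ElseIf sortedKeys(middle) < target Then
            lo = middle + 1                ' target can only be in the upper half
        Else
            hi = middle - 1                ' target can only be in the lower half
        End If
    Loop
End Function
```

Reading a worksheet column with Range.Value yields a two-dimensional array, so the keys would need to be copied into a one-dimensional array first, or the function adapted to index the first column.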
Maybe try converting the result of your VLOOKUP formula to a value after each iteration, something like this:
Sub foo()
    Dim rngCell As Range
    For Each rngCell In Range("Table").Offset(1)
        rngCell.FormulaR1C1 = "=IFNA(VLOOKUP([#ID],table,2,0),""Unassigned"")"
        rngCell.Value = rngCell.Value
    Next rngCell
End Sub
This should prevent Excel from recalculating VLOOKUP results that have already been written.
Alternatively, use INDEX+MATCH combination, or - if your dataset is sorted - use VLOOKUP with match mode TRUE (approximate) instead of FALSE (exact).

Application Calculate from VBA - Ignore Volatile Functions

Question
Is it possible to efficiently simulate the result of application.calculate from VBA, but ignoring volatile functions?
Detail
Definition: Non-Volatile Equivalent: For any volatile workbook, define the non-volatile equivalent as the workbook whereby all cells with volatile references have been replaced by their raw values.
Definition: Non-Volatile Calculation Diffs: Suppose we have any workbook. Now suppose we took its non-volatile equivalent and did a full recalc on that. Take the cells whose values changed in the non-volatile equivalent, and their values. These are the non-volatile calculation diffs.
The question is this: for any workbook with potential volatile references in it, is there a way to efficiently apply just the non-volatile diffs? Efficiency is key here; it's the whole reason we're looking to ignore volatile functions (see the use case below). So any solution that does not outperform a full recalc is useless.
Use Case
We have a workbook that is rife with INDIRECT usage. INDIRECT is an Excel-native volatile function. There are around 300,000 of these in the workbook, which contains about 1.5 million used cells in total. We do not have capacity to change these. As a result of these INDIRECT usages, our workbook takes a long time to recalculate, circa 10 seconds. So we have automatic calculation turned off, and users periodically trigger a manual recalc to refresh data as they go.
Our business users are fine with that, but some of the new VBA functions we are adding could use something more efficient than a 10 second wait.
What We've Tried so Far
Sheet.Calculate - this is handy in some cases. And we do use it. But it treats the data in all other sheets as values, not formulas. So if there are any zig-zag references this does not provide a fully consistent result. E.g. imagine a reference from sheet A -> sheet B -> sheet A: then one would need to calculate sheet A, then B, then A. The number of zig-zags is arbitrary. And that's just one case, with two sheets. To solve this, one would need to essentially rewrite the entire calculation of Excel.
There's one thing that comes to mind, and it's not ideal. Perhaps you can write your own function that is not volatile, stored in the workbook, that does exactly what INDIRECT does (or whatever functions you use). You could call it INDIRECT2 or something, and replace those functions in your spreadsheet.
Unfortunately, INDIRECT is not available through Application.WorksheetFunction, but there are ways around that. This is just my general idea.
Use a non-volatile User Defined Function that does the same job as INDIRECT?
Public Function INDIRECT_NV(ByVal ref_text As String, Optional ByVal a1 As Boolean) As Variant
    'Non-volatile INDIRECT function
    'Does not accept R1C1 notation - the optional is purely for quick replacement
    INDIRECT_NV = CVErr(xlErrRef) 'Defaults to an error message
    On Error Resume Next 'If the next line fails, just output the error
    Set INDIRECT_NV = Range(ref_text) 'Does the work of INDIRECT, but not for R1C1
End Function
Alternatively, depending on the use case, it may be possible to use INDEX and MATCH (non-volatile functions) to replace the INDIRECT queries instead.

Get Current Region with Office-JS

How do I get the current region surrounding the ActiveCell using the Excel JS API?
In VBA this is
Set rng=ActiveCell.CurrentRegion
The current-region functionality has since been implemented in the JavaScript API. It is exposed as a method on the range: range.getSurroundingRegion().
There is no direct equivalent, but we do have a range.getUsedRange() that will take an existing range and give you a smaller range that represents the non-empty portions. Note that this method will throw a not-found error if there is nothing in the entire range (since effectively it's an empty range, which Excel can't express).
If you really need the CurrentRegion scenario (and I'd be curious to learn more), you could first get the used range (to ensure you're not loading too much data), then load the values property, and then do range.getExpandedRange(indexOfLastRow, indexOfLastColumn).
BTW, unlike VBA's usedRange, the JS "getUsedRange()" always creates an accurate snapshot of the current used range (the VBA one could get stale), and we're exposing it not just on the worksheet but also on a given range.
Update
What I mean is that there are a couple of scenarios, one simpler, the other harder.
The simpler one: you know roughly what range you need, but you just need to trim it. For example, you know you have a table-like entity in columns A:C, but you don't know the row count. That's where
worksheet.getRange("A:C").getUsedRange()
would get you what you need.
The harder one: you use getUsedRange() to trim down what you can, but you then load range.values and manually do a search for rows and columns where each cell is empty (""). Once you have that (suppose you found that the relative row index you care about is 5, and column index 2), you could do
originalRange.getCell(0, 0).getExpandedRange(rowIndex, columnIndex)
Concrete example for the above: you have data in A2:C7, though the getUsedRange() of the worksheet is much larger (hence my suggestion to try trimming it down further with range.getUsedRange()). But for this case, let's imagine that getUsedRange on a worksheet returned a range corresponding to A1:Z100. worksheet.getCell(0, 0) would get you the first cell, which you can then expand by 5 rows and 2 columns (which you find through simple albeit tedious array iteration) to get the range you care about. Does that make sense?

Combining multiple non contiguous ranges into one named range to clean up formula

I have a formula that I'm using which works pretty well, but I would like to clean it up.
=SUMIF(IB02R, A56, IB02P)+SUMIF(IB03R, A56, IB03P)+SUMIF(IB04R, A56, IB04P)
IB02R,IB03R,IB04R & IB02P,IB03P,IB04P are ranges that I defined in the name manager. They look at entire rows.
IB02R looks at B4-ND4 of another sheet
IB02P looks at B5-ND5 of another sheet
and so on.
Here is the original formula:
=SUMIF('2014'!$A$4:$ND$4,A75,'2014'!$A$5:$ND$5)+SUMIF('2014'!$A$7:$ND$7,A75,'2014'!$A$8:$ND$8)+SUMIF('2014'!$A$10:$ND$10,A75,'2014'!$A$11:$ND$11)
I would like to simplify this to combine all R's and P's so instead of having 3 sumif statements I could just have =SUMIF(IB0234R, A56, IB0234P). With IB0234R and IB0234P being the ranges contained within IB02R and so forth.
The formula looks for a match between a particular cell and every cell in 3 different rows: rows 4, 7, and 10.
If there is a match anywhere within those rows it sums up the corresponding values in Rows 5,8, and 11 respectively.
Both of my formulas work, but I would like to simplify for sake of readability and clarity.
Is this possible? I've tried a few different ways with no success.
Here is your formula:
=SUMPRODUCT((CHOOSE({1;2;3},'2014'!$A$4:$ND$4,'2014'!$A$7:$ND$7,'2014'!$A$10:$ND$10)=A56)*CHOOSE({1;2;3},'2014'!$A$5:$ND$5,'2014'!$A$8:$ND$8,'2014'!$A$11:$ND$11))
This will take a little bit of fancy named-range-fu to make it readable, but it's definitely doable.
First, combine the "first row" of each section together into another named range, like so:
=CHOOSE({1;2;3},IB02R,IB03R,IB04R) - We'll call this IB00R
Now do the same with the "second row":
=CHOOSE({1;2;3},IB02P,IB03P,IB04P) - We'll call this IB00P
Now the formula becomes:
=SUMPRODUCT((IB00R=A56)*IB00P)
To understand exactly how the formula is working, I suggest clicking Evaluate Formula on the Formulas tab, and stepping through it, and stepping in and out of your named ranges.
EDIT: Ok now I'm doubting myself - not sure if this is working correctly. I know it will work because I've done it before, but the formula below may not be quite right. I'll figure it out in a bit.
EDIT 2: As written, this doesn't work. However, there is a way around the problem, but I can't remember what it is. Still fiddling with it. If I can't figure it out I'll delete this answer.
EDIT 3: Working now. I forgot that in order to combine non-contiguous rows you have to use CHOOSE() instead of INDEX(). Sorry for the false start.
FYI, I ABSOLUTELY like Rick's answer a whole lot more, but I'll admit that what he did was new to me (which is why I love this site!!), so I only knew how to do this using VBA.
With VBA, the function you could use to do this would be as follows:
Function Disjoined_SumIf(CriteriaRange As Range, Criteria As Range, SumRange As Range) As Double
    ' Long rather than Integer, so large areas cannot overflow the counters
    Dim ar As Long
    Dim cl As Long
    For ar = 1 To CriteriaRange.Areas.Count
        For cl = 1 To CriteriaRange.Areas(ar).Cells.Count
            If CriteriaRange.Areas(ar).Cells(cl).Value = Criteria.Value Then Disjoined_SumIf = Disjoined_SumIf + SumRange.Areas(ar).Cells(cl).Value
        Next cl
    Next ar
End Function
And you would use it in your spreadsheet as =Disjoined_SumIf(IB0234R, A56, IB0234P)
This function will work based upon the following assumptions:
The criteria range is split up the same way the sum range is (which in your case it is)
It doesn't take into account things such as non-numeric data in your sum-range which would blow up the function
It's a quick and dirty solution that could be built upon to make it more robust, but it would work!!
So, again, I'd definitely go with Rick's solution, but I'm adding this for completeness' sake.

Is there any way to speed up Excel comparison in VBScript?

I first made a VBA script to compare two excel files. Then optimized it using Variant as said in this question. But then, I changed it to VBScript later. Here the method said above doesn't seem to work.
Are there any other better ways to speed up the process? Especially for large files.
My core code is as follows:-
For Each cell In objxlWorksheet1.UsedRange
    If cell.Value <> objxlWorksheet2.Range(cell.Address).Value Then
        'Fill the cell with color if there is a mismatch and increment the counter
        objxlWorksheet2.Range(cell.Address).Interior.ColorIndex = 3
        counter = counter + 1
    End If
Next
It depends on what you are comparing. If you have two sheets with similar tables of data, it would be easier to use formulas instead of VBA code. Just create a new worksheet and enter a formula like this: =Sheet1!A1=Sheet2!A1. Then you can use Find (Ctrl+F) to search for FALSE.
Or if you can copy the data on one sheet side-by-side, you can use conditional formatting to highlight values that are different.

Resources