Working with huge Excel content - presetting the dependency chain

I am trying to build a large Excel solution. The file (a 200 MB .xlsb) has a sheet with 300,000 rows x 100 columns, containing formulae with long chains of dependencies.
I have incorporated most of the standard Excel performance tips: not a single volatile function, no array formulae, and no forward references (cell formulae refer only to previous rows, cells to the left in the same row, or sheets earlier in alphabetical order; there are no external links).
I am using 64-bit Excel 2013 on a 2.2 GHz machine with 4 GB of RAM, have disabled AutoSave (huge file), and have disabled multi-threading (single dependency chain). For those wondering, I used optimized VBA code to create the 30 million formulae.
The file takes several minutes to open. Smart recalc (F9) works great: under 1 s on small changes. A Full Calculation (Ctrl+Alt+F9) takes 30 s. A Full Calculation with Dependency Rebuild (Ctrl+Alt+Shift+F9) takes 3 minutes to rebuild the dependency and calculation chain (noticeable as a busy cursor) plus 30 s for the calculation itself (the status bar shows "Calculating"). Closing the file, with or without saving, takes longer than opening it. Deleting all these formulae takes forever (I cancelled after more than an hour). AutoFilter is extremely slow (the file becomes unresponsive), and many other operations have slowed down as well.
Can we tell Excel not to bother identifying the dependency tree and instead simply calculate left-to-right, top-to-bottom, through the worksheets in alphabetical order?
Given that calculation speed is already optimized, are there additional ways to improve responsiveness for operations like AutoFilter, inserting/deleting rows and columns, etc.?
To replicate: Range("B2:CW300001").Formula = "=A2+LEN(F1)"
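For a fuller repro, here is a minimal sketch (the sub name and the use of the first worksheet are mine) that suspends recalculation and screen painting while the formulae are written in a single bulk assignment:

    Sub BuildRepro()
        Application.Calculation = xlCalculationManual
        Application.ScreenUpdating = False

        ' One bulk assignment writes all 30 million relative formulae at once.
        Worksheets(1).Range("B2:CW300001").Formula = "=A2+LEN(F1)"

        Application.ScreenUpdating = True
        Application.Calculation = xlCalculationAutomatic   ' triggers the full recalc
    End Sub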
Any guidance would be greatly appreciated.

I finally found that, yes, the dependency chain can be pre-programmed, by creating calcChain.xml inside the file's XML package yourself. However, Excel will automatically modify it if the workbook introduces a backward dependency in the chain, or if Ctrl+Alt+Shift+F9 is pressed.
http://phincampbell.com/Improving%20Excel's%20Calculation%20Performance%20using%20Calculation%20Chain%20and%20Dependency%20Tree%20Data%20from%20calcTree.xml.html
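For reference, calcChain.xml (stored as xl/calcChain.xml inside an .xlsx/.xlsm package; an .xlsb stores the equivalent part in binary form) is just an ordered list of cells. A minimal hand-built excerpt might look like this, where r is the cell reference and i is the sheet index (an omitted i repeats the previous one):

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <calcChain xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
      <c r="B2" i="1"/>  <!-- first cell to calculate, on sheet 1 -->
      <c r="C2"/>        <!-- same sheet, calculated next -->
      <c r="B3"/>
    </calcChain>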

Related

How to run large numbers of STOCKHISTORY calls without the file crashing? Perhaps font, number of workbooks, or number of file references?

So I'm using the new STOCKHISTORY function in Excel, but I need to run around 1700-2100 separate calculations. Every time I build the file, about halfway through it reaches a point where it crashes after almost anything I do. The total number of cells used is in the 950,000-1,050,000 range (not all are STOCKHISTORY; some are regular formulas). On my first try the file got to 20 MB and then I couldn't even open it. On my second try I took some advice to shrink the file size, but now that I'm halfway through creating it, the file again crashes after almost anything I do, even though it's currently only 5 MB. This could be because I'm on the Insider Fast program, which could mean it's unoptimized, or it could be partly because I'm querying 1700-2100 separate STOCKHISTORY functions. Either way, I'm looking for ideas to shrink the file size and/or make the process faster.
I've done the following:
Pasted the formulas into a new Excel file to get rid of excess blank rows and columns
Got rid of borders when pasting
Limited cell references where the value could be static
Chopped the functions up into 7 workbooks
Made all the fonts and font sizes the same.
I've got a whole bunch of scenarios that I'm unsure would help or not. For example, is there a specific font or font size that takes less file space and/or loads faster? Is having all the formulas and info in 1 workbook better for speed than 5 or 10 workbooks? What if, instead of having all the functions and formulas in one file, I chopped the information up into 2-5 different files and made a master file; would that help speed and/or usability? I could theoretically cut the total number of STOCKHISTORY functions down to 1000-1200, but that wouldn't be ideal. What about specific file types like .xlsb, .xlsx or .xlsm: which of those (or others) would run the calculations faster or give a smaller file size? And does splitting formulas across multiple cells, rather than using one big formula, have an effect?
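One tactic consistent with the "static where possible" point above: once a batch of STOCKHISTORY calls has resolved, freeze it to plain values so Excel no longer stores or re-queries those formulas. A minimal sketch; the sheet name is hypothetical, and the operation is irreversible:

    Sub FreezeResolvedPrices()
        ' "Prices" is a placeholder; point this at the sheet holding STOCKHISTORY output.
        With Worksheets("Prices").UsedRange
            ' Overwrite formulas (including spilled results) with their current values.
            .Value = .Value
        End With
    End Sub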

Count total mentions of multiple values from closed workbook

I am using this formula to pull data from another workbook. This is repeated hundreds of times to count how many times a particular domain (there are hundreds of domains) has been used in the other sheet.
=SUMPRODUCT(COUNTIF(INDIRECT({"'E:\[OtherSheet.xlsx]Sheet1'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet2'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet3'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet4'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet5'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet6'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet7'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet8'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet9'!A1:ZZ500"}),D2))
This counts the number of times "D2" appears in the other workbook (across nine different sheets). This formula is copied for "D3", "D4", and so on, hundreds of times.
There are a number of problems.
1) The other workbook has to be open, otherwise the data will not update (even with the exact path, INDIRECT cannot read from a closed file).
2) If both workbooks are open, the formula recalculates every time you do anything, even copying and pasting cells or inserting new cells. Manual calculation mode is not an option, as there are other dynamic variables besides this one (which don't slow Excel down) that need frequent recalculation.
3) From what I understand, INDIRECT and related functions are single-threaded, which makes recalculation insanely slow even with an i7-8700K, 32 GB of RAM, and 64-bit Excel.
4) I tried lowering the scanned area dramatically, but it doesn't speed things up: ZZ10000 vs ZZ500 makes no difference; both are equally slow.
A workaround is to keep one workbook open, update it, and then open both when I need an overall view. With one closed, Excel works fine, as it is not constantly recalculating. Preferably, though, I'd like to keep both open without such a dramatic slowdown.
From dozens of hours of research, I've more or less concluded that there is no fast way to do this without VBA. But I can't find any VBA approach that replicates the formula above.
I'm open to non-VBA based solutions as well.
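For what it's worth, a minimal VBA sketch of the same count (path and scanned range taken from the formula above; the loop over all sheets, the first-worksheet reference, and the E2 output cell are my assumptions). It opens the source workbook read-only once, instead of recalculating on every edit:

    Sub CountDomainMentions()
        Dim src As Workbook, ws As Worksheet
        Dim target As String, total As Double

        target = ThisWorkbook.Worksheets(1).Range("D2").Value

        ' Open the source workbook once, read-only.
        Set src = Workbooks.Open("E:\OtherSheet.xlsx", ReadOnly:=True)

        ' Sum COUNTIF over each sheet's scanned area.
        For Each ws In src.Worksheets
            total = total + Application.WorksheetFunction.CountIf( _
                ws.Range("A1:ZZ500"), target)
        Next ws

        src.Close SaveChanges:=False
        ThisWorkbook.Worksheets(1).Range("E2").Value = total   ' output cell is illustrative
    End Sub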
You could use Python's pandas module to solve this. A pandas DataFrame is much easier to work with, and reading a workbook is no problem even when it is closed in Excel. Just my two cents' worth.

Providing a list of a directory's files in Excel is slowing down Excel

I need to provide a current list of the files in a directory in an Excel workbook, and everything works as required, just too slowly. I really only need the list to be checked for currency once, when the workbook opens. The check takes around 11 seconds, which is acceptable, but the problem is that Excel re-runs it every time I make even minor edits to the workbook (I guess because the list is brought in as an Excel table). I tracked down the lag using the RangeTimer() function, and this is the only thing in my workbook taking a long time to calculate. I should also state that the table of files ultimately feeds a data-validation drop-down list in a cell on another worksheet, but I don't believe that is the issue.
I did some Googling on reducing Excel calculation times and discovered that certain functions are notorious culprits for increasing calculation times (described as volatile); three functions used in this part of the workbook (NOW, INDEX and ROW) came up in that reading, though only NOW is actually volatile.
I have tried two solutions so far:
1. Force Full Calculation set to True in VBA properties window
2. Switched calculations to manual. I set this back to automatic once I identified that this part of the workbook was the issue as I don't want manual calculation generally.
The formula I have in the 'Refers to' box of the named range (TutorFileList) is:
=FILES("\\O008DC01\Shared\Tutor Qualifications\*")&T(NOW())
The formula I have in each cell of the Excel table is:
=IFERROR(INDEX(TutorFileList,ROW()-1),"")
What I would like is for the ~11 s of calculation spent finding these files to be reduced to a single check of the networked directory, rather than 11 s of automatic recalculation every time the workbook is modified.
If there is a more efficient way to achieve this I am prepared to redesign things, but I do need the functionality of a drop-down list of the files in that specific directory in a cell.
Many thanks for assistance from anyone on this.
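One way to get the single at-open check: drop the volatile defined name entirely and fill the list from a Workbook_Open handler in the ThisWorkbook module. A sketch, assuming a sheet named FileList holds the table and using the UNC path above (adjust both to suit):

    Private Sub Workbook_Open()
        Dim f As String, r As Long
        Dim ws As Worksheet
        Set ws = Worksheets("FileList")   ' hypothetical sheet holding the file list

        ws.Columns(1).ClearContents
        r = 1
        ' Dir() walks the directory once; nothing volatile recalculates afterwards.
        f = Dir("\\O008DC01\Shared\Tutor Qualifications\*")
        Do While Len(f) > 0
            ws.Cells(r, 1).Value = f
            r = r + 1
            f = Dir()
        Loop
    End Sub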
I have resolved my issue by reducing the number of rows from 500 back to around 200. This brings the calculation lag down to about a second, which I can live with.

Excel - File optimisations

I'm working with a rather large Excel document (~9 MB) that has over 60 sheets, each containing many CUBEVALUE formulas.
The document takes over 2 minutes to open (not counting the refreshing of values), and I have already read many recommendations, e.g.:
splitting of worksheets (not possible due to the nature of this file)
shorter formulas (not possible)
tested on both 32-bit and 64-bit (performance is not notably different)
I was wondering if you have come across any ways of optimising Excel's opening time without significantly altering the file's contents, or any further suggestions.
Save it as an Excel Binary Workbook (.xlsb). You retain macros, the file size will be 25-35% of the original, and many operations (not just opening/closing) will be faster; a SaveAs sketch follows below.
Get rid of any volatile functions that recalculate the worksheet unnecessarily. INDIRECT, OFFSET, RAND, TODAY and NOW are among the volatile functions; most can be replaced with non-volatile alternatives.
Improve the remaining calculation of the workbook by making worksheet formulas and functions more efficient. Help with this is available at Code Review, a Stack Exchange partner site. With no examples supplied, no specific help can be offered.
Improve the run times of any sub procedures at the same site. Large blocks should be processed in-memory with arrays, not looped through cell-by-cell, etc. Again, with no examples supplied, no specific help can be offered.
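The .xlsb conversion mentioned first can be done from VBA in one call; a sketch (the target path is illustrative):

    Sub SaveAsBinary()
        ' xlExcel12 is the Excel Binary Workbook (.xlsb) format.
        ActiveWorkbook.SaveAs _
            Filename:=Environ$("TEMP") & "\Report.xlsb", _
            FileFormat:=xlExcel12
    End Sub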
If the file lives on a corporate network, try first downloading it to your local computer and then opening it.
Opening time may also depend on links to other files; reduce their number to a minimum if there are any.
Nonetheless, the volume of data in your file is the main driver of opening time.

VBA in Excel only uses one processor; how can I use more?

VBA is known to run on only one processor at a time, so when I run a macro, Excel only uses 50% of the CPU instead of all of it (dual-core).
Is there a workaround to make VBA use both processors?
Thanks.
The answer is no, unless you split the workload across multiple Excel instances and recombine the results at the end. But the vast majority of slow VBA can be sped up by orders of magnitude (a wrapper sketch follows this list):
switch off screen updating
set calculation to manual
read and write Excel data using large ranges and variant arrays rather than cell-by-cell
look closely at the coding in VBA UDFs and bypass the VBE refresh bug
minimise the number of calls to the Excel object model
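A typical wrapper for the first two items (a sketch; production code should restore the settings in an error handler too):

    Sub RunFast()
        Application.ScreenUpdating = False
        Application.Calculation = xlCalculationManual
        Application.EnableEvents = False

        ' ... the real work goes here, ideally on variant arrays ...

        Application.EnableEvents = True
        Application.Calculation = xlCalculationAutomatic
        Application.ScreenUpdating = True
    End Sub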
You could split the problem across multiple instances of Excel.Application; that way you could do half the work in one copy of Excel and half in the other.
I've seen someone do simulation work in Excel VBA using 8 cores this way. It was a bit of a pain to bring the results back together at the end, but it worked.
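A minimal sketch of the idea (the workbook and macro names are illustrative; recombining the results, e.g. via a shared output file, is left to taste):

    Sub SplitAcrossInstances()
        Dim xl2 As Object
        Set xl2 = CreateObject("Excel.Application")   ' second, independent instance
        xl2.Workbooks.Open "C:\Work\SecondHalf.xlsm"

        ' OnTime returns immediately, so both instances then work in parallel;
        ' a plain xl2.Run call would block until the remote macro finished.
        xl2.OnTime Now, "'SecondHalf.xlsm'!ProcessSecondHalf"

        ' ... process this instance's share of the workload here ...
        ' Leave xl2 running until its macro completes, then Quit it.
    End Sub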
In Excel 2010:
Go to File --> Options --> Advanced.
Scroll down to the Formulas section, where you can set the number of processors used when calculating formulas.
This may not affect VBA itself, but it helps when running a formula down an entire column; it's a big time saver when working with large workbooks. Additionally, you can open Task Manager, right-click Excel, and set its priority to High so your machine focuses its resources there.
(This option does not appear to be available on Mac.)
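The same setting is also exposed to VBA, so it can be toggled around a heavy recalculation; a sketch:

    Sub UseAllProcessors()
        With Application.MultiThreadedCalculation
            .Enabled = True
            .ThreadMode = xlThreadModeAutomatic   ' let Excel use all available cores
        End With
    End Sub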
