Count total mentions of multiple values from a closed workbook - Excel

I am using this formula to pull data from another workbook. This is repeated hundreds of times to count how many times a particular domain (there are hundreds of domains) has been used in the other sheet.
=SUMPRODUCT(COUNTIF(INDIRECT({"'E:\[OtherSheet.xlsx]Sheet1'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet2'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet3'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet4'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet5'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet6'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet7'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet8'!A1:ZZ500";"'E:\[OtherSheet.xlsx]Sheet9'!A1:ZZ500"}),D2))
This counts the number of times the value in D2 appears in the other workbook (across nine different sheets). The formula is copied for D3, D4, and so on, hundreds of times.
There are a number of problems.
1) The other workbook has to be open, otherwise the data will not update (INDIRECT cannot read from a closed workbook, even when given the exact path).
2) If both workbooks are open, it recalculates every time you do anything, even copying and pasting cells or inserting new cells. Manual calculation mode is not an option, as there are other dynamic values besides this one (which don't slow down Excel) that need to be recalculated often.
3) From what I understand, INDIRECT and related functions are single-threaded, which makes recalculation insanely slow even with an i7-8700K processor, 32 GB of RAM, and 64-bit Excel.
4) I tried shrinking the scanned area dramatically, but it doesn't speed things up: ZZ10000 vs. ZZ500 makes no difference; both are equally slow.
A workaround is to keep only one workbook open, update it, then open both when I need an 'overall' view. With one workbook closed, Excel works fine, as it is not constantly recalculating. Preferably, though, I'd like to keep both open without such a dramatic slowdown.
After dozens of hours of research, I've more or less concluded that there is no fast way to do this without VBA. But I can't find any VBA code that replicates the formula above.
I'm open to non-VBA based solutions as well.

You could use Python's pandas library to solve this. A pandas DataFrame is much easier to work with, and pandas has no trouble reading a workbook even when it is closed in Excel. Just my two cents' worth.
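A minimal sketch of that approach. The `count_mentions` helper and the demo data are made up for illustration; the commented-out `read_excel` call shows how you would point it at the real file from the question (`E:\OtherSheet.xlsx`, nine sheets):

```python
import pandas as pd

def count_mentions(sheets, values):
    """Count how often each value appears across all sheets.

    sheets: dict mapping sheet name -> DataFrame, exactly what
            pd.read_excel(path, sheet_name=None, header=None) returns.
    values: list of strings to count (the D2, D3, ... domains).
    Returns a Series mapping each value to its total count.
    """
    counts = {v: 0 for v in values}
    for df in sheets.values():
        cells = df.to_numpy()                 # every cell in the sheet
        for v in values:
            counts[v] += int((cells == v).sum())
    return pd.Series(counts)

# With the real (closed) workbook you would load all sheets at once:
# sheets = pd.read_excel(r"E:\OtherSheet.xlsx", sheet_name=None, header=None)

# Small in-memory demo standing in for the nine sheets:
sheets = {
    "Sheet1": pd.DataFrame([["example.com", "foo.org"],
                            ["example.com", ""]]),
    "Sheet2": pd.DataFrame([["foo.org", "example.com"]]),
}
print(count_mentions(sheets, ["example.com", "foo.org"]))
```

One caveat: this is an exact, case-sensitive match, whereas COUNTIF matches case-insensitively, so you may want to normalise case first if that matters for your domains.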

Related

Providing a list of files of a directory in Excel is slowing down Excel

I need to provide a current list of the files in a directory in an Excel workbook. Everything is working as required, just too slowly. I really only need the list to be checked for currency once, upon opening the workbook. It takes around 11 seconds to do this, which is acceptable, but the problem is that it rechecks every time I make even minor edits to the workbook (I guess because the list is brought in as an Excel table). I identified the lag in my workbook using the RangeTimer() function, and this is the only thing taking a long time to calculate. I should also state that the table containing the list of files ultimately feeds a data validation drop-down list in a cell on another worksheet, but I don't believe that is really the issue.
I did some Googling on reducing Excel calculation times and discovered that some Excel functions (described as volatile) are definite culprits for increasing calculation times, and three of them (NOW, INDEX and ROW) are used in providing the functionality I would like in this part of the workbook.
I have tried two solutions so far:
1. Force Full Calculation set to True in VBA properties window
2. Switched calculations to manual. I set this back to automatic once I identified that this part of the workbook was the issue as I don't want manual calculation generally.
The formula I have in the 'Refers to' box of the named range (TutorFileList) is:
=FILES("\\O008DC01\Shared\Tutor Qualifications\*")&T(NOW())
The formula I have in each cell of the Excel table is:
=IFERROR(INDEX(TutorFileList,ROW()-1),"")
What I would like is for the ~11 seconds of calculation time to find these files to be reduced to a single check of the networked directory, rather than 11 seconds of automatic recalculation every time the workbook is modified.
If there is a more efficient way to achieve what I am doing I am prepared to redesign things but I do need the functionality of a drop-down list of files in the specific directory in a cell.
Many thanks for assistance from anyone on this.
I have resolved my issue by reducing the number of rows back to around 200 instead of 500. This brings the calculation lag down to about a second, which I can live with.

Excel - File optimisations

I'm working with a rather large Excel document (~9 MB) which has over 60 sheets, each containing many CUBEVALUE formulas.
This document takes over 2 minutes to open (not counting refreshing of values), and while I have read many recommendations, e.g.:
splitting of worksheets (not possible due to the nature of this file)
shorter formulas, (not possible)
tested on both 32 and 64 bit (performance is not notably different)
I was wondering if you guys have come across any ways of optimising Excel's opening time without significantly altering the contents within it, or have any further suggestions.
Save it as an Excel Binary Workbook (.xlsb). You can retain macros, the filesize will be 25-35% of the original and many operations (not just opening/closing) will be faster.
Get rid of any volatile functions that are recalculating the worksheet unnecessarily. INDIRECT, OFFSET, ADDRESS, TODAY and NOW are among the list of volatile functions. Most can be replaced with non-volatile alternatives.
Improve the remaining calculation of the workbook by making worksheet formulas and functions more efficient. Help on this is available at Code Review - Excel, a StackExchange partner. No examples supplied so no specific help offered.
Improve any sub procedure code run times at the same site. Large blocks should be processed 'in-memory' with arrays, not looped through cell-by-cell, etc. Again, no examples supplied so no specific help offered.
If you use a corporate network, try downloading the file to your local computer first and then opening it.
It may also depend on the existence of links to other files; try to reduce their number to a minimum if there are any.
Nonetheless, the volume of data in your file is the main driver of opening time.

How to improve Excel performance in workbook with external links and conditional formatting

I have 5 workbooks with 540 column x 50 row blocks of data.
I also have a 'roll-up' workbook that lists all this data on a single worksheet with links. There are 6 conditional formatting rules.
The roll-up workbook takes ~30 seconds to update links on open, and takes 1-2 seconds each time I modify data and move around the worksheet.
I feel like Excel should be able to work with this data in a performant manner.
Am I doing anything wrong? Should I have set things up differently?
I've tried using arrays of links and individually linked cells and not noticed a difference.
-- EDIT --
When I remove the conditional formatting, the 30 second refresh takes 1 second. Additionally, some Data Validation dropdowns speed up from 6 seconds to immediate. I'll look at ways of removing the Conditional Formatting (like the VBA idea below).
I don't think the problem is the conditional formatting; rather, it is the fact that you need to open data from 5 different workbooks. Consolidating those workbooks first would improve the speed. Refreshing the data connections is what takes the 30 seconds.
Alternatively, instead of a linked cell, try to use designated data connections (as you would in Power Query, for example). This way a refresh wouldn't happen automatically but the responsiveness of the main spreadsheet would improve.
From experience, conditional formatting can really slow a spreadsheet down. I had one much smaller template that became unusable due to a lot of conditional formatting, so I had to redesign it with VBA code instead. In that case this was workable because the cells didn't change much after the initial set-up, so the script only needed to be run once. It would not be so workable if the data is likely to change often.
You might consider putting the formatting in a VBA script, depending on how often the data (and hence the formatting) is likely to change.

Excel Advanced Filter Very slow to run, but only after autofilter has been run

I have a very difficult issue I have been trying to solve for a few days, and I would very much appreciate some help, as I have already tried to research this issue thoroughly.
On one sheet I have a database (18 columns and 72,000 rows) in 32-bit Excel 2010, so it's a large database. On this sheet I also have some entries to auto-filter some columns, as well as an Advanced Filter. When I run the Advanced Filter, the data filters in exactly 1 second. If I run an auto-filter (via VBA macros) and then run the Advanced Filter afterwards, the Advanced Filter takes 60 seconds to run, even after turning AutoFilterMode to False. Here is what I have tried, with no luck:
Removing all shapes on the sheet
There are no comments on the sheet, so none to remove
Removing all regular and conditional formatting
Turning off auto-filter mode
Setting all cell text on the sheet to WrapText = False
Un-protecting the sheet
Un-hiding any rows and columns
Removing any sorting (.sort.sortfields.clear)
What else could cause this code to run 60 times slower, but only after auto-filter has previously run on the sheet, and how can I return it to its original state? Any and all help would be greatly appreciated.
In my case I was programmatically creating named ranges for later use, and these named ranges used the .End(xlDown) functionality to find the end of the data set. For example:
Set DLPRange = .Range(.Cells(2, indexHeaders(2)), .Cells(i, indexHeaders(2)).End(xlDown))
DLPRange.Name = "DLPRangeName"
... which, in my case, creates a column range from the starting cell to the end of the document. Originally this 'bad' way of finding the range went unnoticed because the workbook was in .xls format, which maxes out at ~65k rows. When I needed more, I forced it to create a workbook in .xlsm format, which has ~1M rows. The filter then ran on the whole column, even though the huge majority of it was empty, resulting in the huge time overhead.
tl;dr: you've tricked Excel into thinking it has a huge amount of data to filter. Untrick it by checking and making sure it's only filtering the range you think it should be filtering.
After trial and mostly lots of error, I was able to find a solution. I determined that almost any action, even without auto-filter, would cause this slowdown, and I felt this was simply a memory issue for Excel with all of that data (even though it ran fine sometimes, my guess is the 'cache' would fill up and then run slow).
So what I did was use a new, temporary workbook into which the Advanced Filter would put the filtered data. I then copied a portion of that data back into my workbook and closed the temporary workbook without saving it. This also brought the code run time from 1 second down to 0.3 seconds, and I never again got the slow Advanced Filter run time, regardless of what code I ran or what I did in the original workbook.
I posted this so if anyone else had a similar issue they might use this as a solution for large amounts of data.
A little late, but I recently had the same issue with a not-so-large database (4,000+ rows, 70 columns) and solved it, so I'm just sharing.
In my case, the problem was wrapped text in the data range. Setting WrapText to False, which you said helps, is not enough; you also need to replace Chr(10) (the line-feed character) in the range you are filtering. Huge difference.

VBA on Excel only use one processor, how can I use more?

VBA is known to run on only one processor core at a time, so when I run a macro, Excel only uses 50% of the CPU on my dual-core machine instead of all of it.
Is there a workaround to make VBA use both processors?
Thanks.
The answer is no, unless you split the workload across multiple Excel instances and recombine the results at the end. But the vast majority of cases of slow VBA execution can be sped up by orders of magnitude:
switch off screen updating
set calculation to manual
read and write excel data using large ranges and variant arrays rather than cell-by-cell
look closely at the coding in VBA UDFs and bypass the VBE refresh bug
minimise the number of calls to the Excel object model
You could split the problem up across multiple instances of Excel.Application; that way you could do half the work in one copy of Excel and half in the other.
I've seen someone doing simulation work in Excel VBA using 8 cores. A bit of a pain to bring the results back together at the end, but it worked.
In Excel 2010
Go to File --> Options --> Advanced.
Scroll down until you see the section for Formulas and you can set the number of processors to be used when calculating formulas.
This may not have an effect on VBA, but it will work for running a formula down an entire column. It's a big time saver when working with large workbooks. Additionally, you can go to Task Manager, right-click on Excel and set its priority to High so that your machine focuses its resources there.
** This option does not appear to be available on Mac **
