I have an Excel data table linked to a query to an Oracle database. This data table has both:
(1) additional calculated columns (some involving array formulas and MATCHes) based on the queried values, and
(2) many dependent formulas spread throughout other tabs
While the query itself is fast to refresh if added to a brand new Excel file exactly as is, the dependent formulas seem to slow down the refresh EVEN WITH CALCULATIONS SET TO MANUAL.
I've tried several things:
1) Set calculations to manual
2) Disabled screen updating
3) Disabled events
4) Removed the calculated columns from the data table (leaving just normal formulas)
Nothing seems to help... any ideas? Thanks.
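For reference, here is roughly what my refresh macro does (the sheet and table names below are just placeholders for my actual ones):

Sub RefreshOracleTable()
    ' Turn everything off before refreshing, restore afterwards
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    Application.Calculation = xlCalculationManual

    ' Refresh only the query-backed table and wait for it to finish
    Worksheets("Data").ListObjects("tblOracle").QueryTable.Refresh BackgroundQuery:=False

    Application.EnableEvents = True
    Application.ScreenUpdating = True
    ' Calculation is deliberately left on manual; I recalc later with F9
End Sub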
These links may be of interest to you.
Excel Recalculation
http://msdn.microsoft.com/en-us/library/office/bb687891(v=office.12).aspx
Thread Safe Functions
http://msdn.microsoft.com/en-us/library/office/bb687899(v=office.12).aspx#xl2007xllsdk_threadsafe
There are many events that can trigger a recalculation; also read up on VOLATILE functions. A volatile function is one whose value cannot be assumed to be the same from one moment to the next, even if none of its arguments (if it takes any) have changed.
The following Excel functions are volatile.
NOW
TODAY
RAND
OFFSET
INDIRECT
INFO (depending on its arguments)
CELL (depending on its arguments)
Recalculation of data tables is handled slightly differently. Recalculation is handled asynchronously to regular workbook recalculation, so that large tables might take longer to recalculate than the rest of the workbook.
Try setting calculation to Automatic Except for Data Tables. It may be that even with calculation set to manual, whenever an event triggers a calculation the affected cells are recalculated; your refresh could be raising a series of such events, each one triggering further events and causing the same cells to be calculated multiple times.
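If you want to try that setting from code rather than the ribbon, a minimal sketch (xlCalculationSemiautomatic is the built-in constant for Automatic Except for Data Tables):

Sub UseSemiAutomaticCalc()
    ' Equivalent to Formulas > Calculation Options > Automatic Except for Data Tables
    Application.Calculation = xlCalculationSemiautomatic
End Sub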
The easiest thing you could do is look to eliminate as many VOLATILE and ARRAY formulas as possible. For example, OFFSET-based ranges can often be replaced with INDEX, which is not volatile. Personal experience tells me that RAND and ARRAY functions are the worst.
If I make a PivotTable in Excel from a recordset with about 50,000 lines, it takes about 30 seconds to produce a running total on a date field. Yet when I try to achieve the same result in an Access table, DSUM takes over 30 minutes. Same data... why is there such a performance difference? What does Excel do in the background?
You might find this article helpful:
http://azlihassan.com/apps/articles/microsoft-access/queries/running-sum-total-count-average-in-a-query-using-a-correlated-subquery
Here's what it says about Dsum, DLookup, etc.
They involve VBA calls, Expression Service calls, and they waste resources (opening additional connections to the data file). Particularly if JET must perform the operation on each row of a query, this really bogs things down.
Alternatives include looping through the recordset in VBA or creating a subquery. If you need to use DSUM, make sure your field is indexed and avoid text fields.
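As a rough illustration of the recordset-loop alternative (the table and field names here are made up), a single pass over a sorted recordset avoids re-querying the table for every row the way DSUM does:

Sub RunningTotalViaRecordset()
    ' Hypothetical table tblSales with fields TxnDate, Amount and RunningTotal
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim total As Currency

    Set db = CurrentDb
    Set rs = db.OpenRecordset( _
        "SELECT TxnDate, Amount, RunningTotal FROM tblSales ORDER BY TxnDate", _
        dbOpenDynaset)

    Do While Not rs.EOF
        total = total + Nz(rs!Amount, 0)
        rs.Edit
        rs!RunningTotal = total   ' write the cumulative sum back in one pass
        rs.Update
        rs.MoveNext
    Loop

    rs.Close
    Set rs = Nothing
End Sub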
I have a large Excel file consisting of multiple data sheets with plain data and a couple of dashboard sheets with various graphs and KPIs based on these data.
I am looking to make the file smaller and faster to work with. Should I convert the unformatted data to Tables or not? I can't really find anything to support either choice.
Anyone got any ideas?
Tables have a lot of benefits, but they are generally slower than plain data (although the latest version of Excel 2016 has significant Table speed improvements).
You're actually asking the wrong question. The questions you should be asking are "Why is Excel taking so long to recalculate, why is my file size so large, and what can I do about it?"
And you don't give us much info about your symptoms. How large is your file in MB? How many rows of data in it? Lots of lookups on big ranges? Lots of volatile functions like OFFSET, INDIRECT at the head of long dependency chains? How slow is it? What version/SKU of Excel do you have?
If Excel runs slowly, it's generally because people have inadvertently programmed it to run slowly, due to poor formula choice and suboptimal layout. Converting to Tables or not isn't going to make a hell of a lot of difference, by the sound of things.
Common culprits that result in slow files include the following (note the last one re Tables):
Volatile functions with long calculation chains running off them. See my post at https://chandoo.org/wp/2014/03/03/handle-volatile-functions-like-they-are-dynamite/ for more on this.
Inefficient lookups on multiple columns (such as using multiple VLOOKUPs to bring through data for the same record, rather than using one MATCH and multiple INDEX functions).
Lookups on very, very long arrays. See my post at http://dailydoseofexcel.com/archives/2015/04/23/how-much-faster-is-the-double-vlookup-trick/ to learn how sorting your lookup tables and using the binary match parameter can speed up lookups thousands-fold.
Overuse of resource-intensive formulas such as SUMPRODUCT when simpler alternatives exist (including SUMIF and its variants, or even better, PivotTables).
Using IF and other functions to change the formatting of thousands of cells, instead of using custom number formatting.
Using Data Tables. (These can really hog resources, and sometimes better alternatives exist.)
Using many thousands of extra formulas to reference data input cells that might not be used, rather than using Excel Tables (aka ListObjects) that expand dynamically and automatically. I always use Tables to host my data and settings. They radically simplify referencing (including from VBA) and file maintainability.
You need to address the root cause, not the symptoms. A good place to start is this article by recalculation guru Charles Williams (who I see has dropped by):
https://msdn.microsoft.com/en-us/vba/excel-vba/articles/excel-tips-for-optimizing-performance-obstructions
In terms of file size, as Charles Williams puts it: To save memory and reduce file size, Excel tries to store information about only the area of a worksheet that was used. This is called the used range. Sometimes various editing and formatting operations extend the used range significantly beyond the range that you would currently consider used. This can cause performance obstructions and file-size obstructions.
You can check what Excel thinks is the used range by pushing Ctrl + End. If you find yourself miles below or to the right of where your data ends, then delete all the rows/columns between that point and the edge of your data:
To quickly do the rows, select the entire row that lies beneath the bottom of your data, then push Ctrl + Shift + Down Arrow (which selects all the rows right to the bottom of the spreadsheet) and then use the Right-Click DELETE option.
For columns, select the entire column to the immediate right of your data, use Ctrl + Shift + Right Arrow to select the unused bits, and then use the Right-Click DELETE option.
(Note that you’ve got to use the Right-Click DELETE option, and not just push the Delete key on the keyboard.)
When you’ve done this, push Ctrl + End again and see where you end up – hopefully close to the bottom right corner of your data. Sometimes it doesn’t work, in which case you need to push Alt + F11 (which opens the VBA editor), type Application.ActiveSheet.UsedRange in the Immediate Window, and then push ENTER (and if you can’t see a window with the caption “Immediate”, push Ctrl + G).
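If the workbook has a lot of sheets, the same trick can be run over all of them with a short macro (a sketch; simply referencing the UsedRange property is what prompts Excel to re-evaluate it):

Sub ResetUsedRanges()
    Dim ws As Worksheet
    Dim rng As Range
    For Each ws In ActiveWorkbook.Worksheets
        ' Same effect as typing Application.ActiveSheet.UsedRange in the
        ' Immediate window, once per sheet
        Set rng = ws.UsedRange
    Next ws
End Sub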
Lastly, depending on what version and SKU of Excel you have, you may be able to use PowerPivot and PowerQuery to radically simplify things and drastically cut down on the amount of formulas in your workbook.
I'm working with the Office JS API to create an Add-In for Excel.
I'm applying a block of data to a table (8784 x 5) with several referencing formulas. Applying the data via range.values = values is pretty slow (~15 seconds) while automatic calculations are turned on. Applying data with auto calcs turned off is significantly faster (<3 seconds). Is there a way to manipulate the calculation mode via the Office JS API? I know that I can see the read-only value of the mode, but I don't see any way to adjust it.
I'd like to turn on manual calcs, apply data, turn back on auto calcs.
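For reference, the behaviour I'm after is what desktop VBA exposes through Application.Calculation. A sketch of that equivalent (the sheet name and range are placeholders, and this is exactly what I can't do through the JS API today):

Sub ApplyDataWithManualCalc()
    Dim values As Variant
    ' ... values holds the 8784 x 5 block to write ...
    Application.Calculation = xlCalculationManual
    Worksheets("Data").Range("A2").Resize(8784, 5).Value = values
    Application.Calculation = xlCalculationAutomatic   ' one recalculation after the write
End Sub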
Setting calculation mode is in our backlog and is an important feature that we need to add (no timeline yet); however, as you noted, for now it is read-only. In addition, we are currently working on range- and worksheet-level calculate functions, which may help with workbooks that are already in manual mode.
For larger tables with formulas, there are a few pitfalls you can avoid, as described here.
I have an Excel 2010 data table which is driven by a query from MSSQL. The underlying query changes depending upon what options the user selects in the Excel workbook. I'm okay with changing the query and pulling the data.
After the data has been selected multiple users will be able to edit and append data to the Excel table and these changes will post back to the SQL database table. Due to the database table structure some of these cells within a given row are mandatory before any data can be inserted into MSSQL and/or potentially updated.
So what I'm trying to achieve is checks on whether certain columns in a row are blank after a cell is edited (I can do this via Worksheet Change), and also before the user moves off that row, so I can bring up a message if all mandatory columns haven't been entered. I can't see any events that fire before Selection Change. My only thought on a workaround is to have a global variable row marker that is updated on Selection Change, i.e. it will store the previous row number.
I can't use Excel's standard data validation functionality looking at blank cells because, although this is fine for a currently correctly populated row that is being edited, inserting new rows or appending directly to the bottom of the table will constantly error, as all those mandatory columns will, of course, be blank. Currently I am using conditional formatting to at least highlight columns/cells that require input, although this doesn't force users to actually do it. Data cannot be stored within MSSQL until these columns contain data, so if users don't fill them in and refresh the table for whatever reason, whatever they have entered will be lost. Obviously this is bad, m'kay.
I am also concerned about both the Worksheet Change and Selection Change events constantly firing and how that will affect workbook performance.
Any suggestions would be appreciated. Maybe I'm going about this all wrong so any ideas to make this more efficient would also be well received. The user base do not want to see UserForms or MS Access even though it would make this activity very easy. They are too used to the look and feel of Excel sheets.
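To illustrate the previous-row marker idea described above (the mandatory columns and sheet layout here are only placeholders), the sketch I have in mind lives in the worksheet's code module:

' Remembers the previously selected row so it can be validated
' once the user moves away from it
Private mLastRow As Long

Private Sub Worksheet_SelectionChange(ByVal Target As Range)
    Dim mandatory As Range

    If mLastRow > 0 And Target.Row <> mLastRow Then
        ' Ignore rows that are still completely empty (new, unstarted rows)
        If Application.WorksheetFunction.CountA(Me.Rows(mLastRow)) > 0 Then
            ' Placeholder: mandatory columns assumed to be A:C
            Set mandatory = Me.Range(Me.Cells(mLastRow, 1), Me.Cells(mLastRow, 3))
            If Application.WorksheetFunction.CountBlank(mandatory) > 0 Then
                MsgBox "Row " & mLastRow & " is missing mandatory values.", vbExclamation
            End If
        End If
    End If

    mLastRow = Target.Row
End Sub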
Your best way is to copy the table into a 2D array or some other in-memory data structure, such as a Dictionary or Collection, and then manage each change in memory. This is very efficient but requires a lot of code. With Excel the only problem you have is the key; the rest is lookups and true/false questions. A lookup will find the original value, and then you have current data + previous data + the logic... is the new data OK?
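A minimal sketch of loading the table into memory this way (the sheet name, table name and key column are assumptions):

Sub LoadTableToDictionary()
    Dim data As Variant
    Dim keyed As Object
    Dim r As Long

    ' Read the whole table body into a 2D array in one go
    data = Worksheets("Data").ListObjects("tblStaging").DataBodyRange.Value

    Set keyed = CreateObject("Scripting.Dictionary")
    For r = 1 To UBound(data, 1)
        ' Store each row against its key (assumed to be column 1);
        ' later edits can be compared against this snapshot
        keyed(CStr(data(r, 1))) = Application.Index(data, r, 0)
    Next r
End Sub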
I am working on converting a large number of spreadsheets to use a new 3rd party data access library (converting from third-party library #1 to third-party library #2). FYI: a call to a UDF (user-defined function) is placed in a cell, and when that is refreshed, it pulls the data into a pivot table below the formula. Both libraries behave the same and produce the same output, except that small irregularities can arise, such as an additional field being shown in the output pivot table using library #2, which can affect formulas on the sheet if data is being read from the pivot table without using GetPivotData.
So I have ~100 of these very complicated (20+ worksheets per workbook) spreadsheets that I have to convert, and run in parallel for a period of time, to see if the output using the new data access library matches the old library.
Is there some clever approach to do this, so I don't have to spend a large amount of time analyzing each sheet to determine the specific elements to compare?
Two rough ideas that come to mind:
1. just create a Validator workbook that has the same # of worksheets, and simply do a Workbook1!Worksheet1!A1 - Workbook2!Worksheet1!A1 for every possible cell on each sheet
2. roughly the equivalent of #1, but just traverse the cells in the 2 books using VBA, and log any cells that do not match.
I don't particularly like either idea, can anyone think of something better than this, maybe some 3rd party utility I could buy?
Sounds like it's time for a serious, fundamental redesign rather than swapping data access libraries.
But, to address your question:
- I don't think a 3rd party utility to do this exists.
- A VBA approach to do this, using variant arrays to get the used range from each sheet, would be reasonably easy and efficient as long as you don't try to traverse the sheets cell by cell.
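A sketch of that variant-array approach (workbook names are placeholders; it assumes both files are open, sheet names match, and the used ranges line up; a production version would also check dimensions and handle error values):

Sub CompareWorkbooks()
    Dim wb1 As Workbook, wb2 As Workbook
    Dim ws1 As Worksheet, ws2 As Worksheet
    Dim a1 As Variant, a2 As Variant
    Dim r As Long, c As Long

    Set wb1 = Workbooks("OldLibrary.xlsx")
    Set wb2 = Workbooks("NewLibrary.xlsx")

    For Each ws1 In wb1.Worksheets
        Set ws2 = wb2.Worksheets(ws1.Name)
        ' Pull each used range into memory once, then compare the arrays
        ' rather than touching the sheets cell by cell
        a1 = ws1.UsedRange.Value
        a2 = ws2.UsedRange.Value
        For r = 1 To UBound(a1, 1)
            For c = 1 To UBound(a1, 2)
                If a1(r, c) <> a2(r, c) Then
                    Debug.Print ws1.Name & "!" & ws1.UsedRange.Cells(r, c).Address & _
                        ": " & a1(r, c) & " <> " & a2(r, c)
                End If
            Next c
        Next r
    Next ws1
End Sub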