I have a large Excel file consisting of multiple data sheets with plain data and a couple of dashboard sheets with various graphs and kpi's based on these data.
I am looking to make the file smaller and faster to work with. Should I convert the unformatted data to tables or not? I can't really find anything to support this.
Anyone got any ideas?
Tables have a lot of benefits but they are generally slower than plain data (although the latest version of Excel 2016 has significant Table speed improvements).
You're actually asking the wrong question. The question you should be asking is "Why is Excel taking so long to recalculate, why is my filesize so large, and what can I do about it?"
And you don't give us much info about your symptoms. How large is your file in MB? How many rows of data in it? Lots of lookups on big ranges? Lots of volatile functions like OFFSET, INDIRECT at the head of long dependency chains? How slow is it? What version/SKU of Excel do you have?
If Excel runs slowly, it's generally because people have inadvertently programmed it to run slowly, due to poor formula choice and suboptimal layout. Converting to Tables or not isn't going to make a hell of a lot of difference, by the sound of things.
Common culprits that result in slow files include the following (note the last one re Tables):
- Volatile functions with long calculation chains running off them. See my post at https://chandoo.org/wp/2014/03/03/handle-volatile-functions-like-they-are-dynamite/ for more on this.
- Inefficient lookups on multiple columns (such as using multiple VLOOKUPs to bring through data for the same record rather than using one MATCH and multiple INDEX functions).
- Lookups on very, very long arrays. See my post at http://dailydoseofexcel.com/archives/2015/04/23/how-much-faster-is-the-double-vlookup-trick/ to learn how sorting your lookup tables and using the binary match parameter can speed up lookups thousands-fold.
- Overuse of resource-intensive formulas such as SUMPRODUCT, when simpler alternatives exist (including SUMIF and its variants, or even better, PivotTables).
- Using IF and other functions to change the formatting of thousands of cells, instead of using custom number formatting.
- Using Data Tables. (These can really hog resources, and sometimes better alternatives exist.)
- Using many thousands of extra formulas to reference data input cells that might not be used, rather than using Excel Tables (aka ListObjects) that expand dynamically, automatically.
I always use Tables to host my data and settings. They radically simplify referencing (including from VBA) and file maintainability.
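To give a feel for why a sorted lookup table helps so much, here is a minimal Python sketch (not Excel code, and not how Excel implements its lookups internally). The function names are made up for illustration. The first function scans from the top the way an exact-match lookup on unsorted data has to; the second does a binary search on sorted keys and then checks that the key it landed on really is the one asked for, which is roughly the idea behind the double-lookup trick in the linked post.

```python
from bisect import bisect_left

def linear_lookup(keys, values, target):
    """Exact-match scan from the top; work grows with the table length."""
    for i, key in enumerate(keys):
        if key == target:
            return values[i]
    return None

def binary_lookup(keys, values, target):
    """Binary search on SORTED keys, then an equality check so that a
    missing key returns None instead of a neighbouring row."""
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return values[i]
    return None

# Illustrative data: 100,000 sorted keys (even numbers only).
keys = list(range(0, 200_000, 2))
values = [k * 10 for k in keys]

print(linear_lookup(keys, values, 199_998))  # walks the whole list
print(binary_lookup(keys, values, 199_998))  # about 17 comparisons
print(binary_lookup(keys, values, 7))        # key absent
```

The binary version only works if the keys are sorted, which is the same precondition as for the approximate-match lookup in Excel.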
You need to address the root cause, not the symptoms. A good place to start is this article by recalculation guru Charles Williams (who I see has dropped by):
https://msdn.microsoft.com/en-us/vba/excel-vba/articles/excel-tips-for-optimizing-performance-obstructions
In terms of file size, as Charles Williams puts it: To save memory and reduce file size, Excel tries to store information about the area only on a worksheet that was used. This is called the used range. Sometimes various editing and formatting operations extend the used range significantly beyond the range that you would currently consider used. This can cause performance obstructions and file-size obstructions.
You can check what Excel thinks is the used range by pushing Ctrl + End. If you find yourself miles below or to the right of where your data ends, then delete all the rows/columns between that point and the edge of your data:
- To quickly do the rows, select the entire row that lies beneath the bottom of your data, then push Ctrl + Shift + Down Arrow (which selects all the rows right to the bottom of the spreadsheet) and then use the Right-Click DELETE option.
- For columns, select the entire column to the immediate right of your data, use Ctrl + Shift + Right Arrow to select the unused bits, and then use the Right-Click DELETE option.
(Note that you’ve got to use the Right-Click DELETE option, and not just push the Delete key on the keyboard.)
When you’ve done this, push Ctrl + End again and see where you end up – hopefully close to the bottom right corner of your data. Sometimes it doesn’t work, in which case you need to push Alt + F11 (which opens the VBA editor), type Application.ActiveSheet.UsedRange in the Immediate Window and then push ENTER (if you can’t see a window with the caption “Immediate”, push Ctrl + G).
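If the used-range idea is hard to picture, the toy Python model below may help. It is only an illustration under my own assumptions (a sheet represented as a dictionary of cells, with hypothetical function names), not Excel's actual bookkeeping. It shows how one stray formatted cell far from the data stretches the "used" bounding box well beyond where the values end.

```python
def used_range(cells):
    """Last (row, col) of the bounding box of every cell that has a
    value OR formatting. `cells` maps (row, col) -> dict."""
    used = [pos for pos, cell in cells.items()
            if cell.get("value") is not None or cell.get("format")]
    if not used:
        return (0, 0)
    return (max(r for r, _ in used), max(c for _, c in used))

def data_extent(cells):
    """Last (row, col) of the cells that actually hold a value."""
    used = [pos for pos, cell in cells.items()
            if cell.get("value") is not None]
    if not used:
        return (0, 0)
    return (max(r for r, _ in used), max(c for _, c in used))

# Three rows of real data in columns 1-2 ...
sheet = {(r, c): {"value": r * c} for r in (1, 2, 3) for c in (1, 2)}
# ... plus one empty cell that someone once formatted, far away.
sheet[(50_000, 30)] = {"format": "bold"}

print(used_range(sheet))   # stretched by the stray formatting
print(data_extent(sheet))  # where the data really ends
```

The gap between the two results corresponds to the rows and columns you would delete with the steps above.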
Lastly, depending on what version and SKU of Excel you have, you may be able to use PowerPivot and PowerQuery to radically simplify things and drastically cut down on the amount of formulas in your workbook.
It sounds very easy, but I looked for a similar question and didn't find a suitable one. Most are slightly different issues than mine.
Every month I receive one big Excel file with several sheets, but one sheet holds three different data ranges (not formatted tables). I stress ranges, not tables, because some "smart" colleagues decided to just overwrite the file with new data and expand the range, so it stayed a range (laid out horizontally) and not a table. As far as I know, Power Query needs a table format.
So my issue is how to consolidate those three ranges on that one sheet into one query without disturbing the original Excel file, and of course to keep it dynamic when I get new files.
I am comfortable with Power Query, but I haven't dealt with a case like this, where several ranges have to be cleaned, edited and appended into one query. The positive thing is that the column names are the same; only the contents differ.
As you can see, the data ranges sit in so-called "blocks" of data that run horizontally.
This is basically what I would like to have:
If the question already exists, please link!
Here is my test file to check:
https://docs.google.com/spreadsheets/d/1RDAoZqxKPk1NdhtcYec8nG_31PFwQ7Lj/edit?usp=sharing&ouid=101738555398870704584&rtpof=true&sd=true
I solved it by building three queries and then appending them into one bigger table.
Also, importing From Folder works better than importing directly from the Excel workbook: it gives me more room to add a filter, for instance on "Date Created", so the newest file is always on top.
Thanks anyway for your input, guys.
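For anyone who wants to see the logic outside Power Query, here is a rough Python sketch of the same "split into blocks, then append" idea. It is not the Power Query solution itself. It assumes the sheet has already been read into a list of rows, that the blocks are separated by at least one empty column, and that every block repeats the same headers; the function names and sample data are made up.

```python
def split_blocks(header_row):
    """Return (start, end) column index pairs, one per run of
    non-empty header cells."""
    blocks, start = [], None
    # The trailing None is a sentinel that closes the last block.
    for i, name in enumerate(list(header_row) + [None]):
        if name not in (None, ""):
            if start is None:
                start = i
        elif start is not None:
            blocks.append((start, i))
            start = None
    return blocks

def stack_blocks(grid):
    """Append the side-by-side blocks into one long table."""
    header_row, *data_rows = grid
    blocks = split_blocks(header_row)
    first_start, first_end = blocks[0]
    headers = header_row[first_start:first_end]
    combined = []
    for start, end in blocks:
        if header_row[start:end] != headers:
            raise ValueError("blocks do not share the same headers")
        for row in data_rows:
            chunk = row[start:end]
            # Skip rows where this block is completely empty.
            if any(v not in (None, "") for v in chunk):
                combined.append(chunk)
    return headers, combined

grid = [
    ["Item", "Qty", None, "Item", "Qty", None, "Item", "Qty"],
    ["A", 1, None, "C", 3, None, "E", 5],
    ["B", 2, None, None, None, None, "F", 6],
]
headers, rows = stack_blocks(grid)
print(headers)
print(rows)
```

Because the block boundaries are detected from the header row, the sketch keeps working when a block gets longer, which is the "dynamic" part of the requirement.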
Microsoft's official site has an explanation on how to use scenarios in Excel.
If you name the input cells, the scenario manager will show the name, so it's easier to remember that $C$5 is, say, the price.
My question is: is it possible to set up the scenarios in a table somewhere in Excel, and get the scenario manager to read from there? Setting multiple scenarios in the scenario manager is very fiddly, time-consuming and error-prone, especially when the inputs are linked - e.g. setting 10 scenarios where each scenario is an x% change from the previous.
Any suggestions?
PS I know all these things can be done very easily in a scripting language like Python or R, but in this very specific case the calculations are not too complex and the file needs to be shared with other people, so I must use Excel.
VBA would be a last resort because some of these people have VBA disabled by default.
Edit
To clarify, what I'd need is a way to create a table like this below, where those in blue are the inputs, and those in grey are the outputs. I have put together a banal example below, along the lines of the example in the VBA macro answer given below, but the general idea is:
define a number of scenarios as the combination of multiple inputs (more than 2);
create a table showing, for each scenario, the inputs and some key outputs;
note the table doesn't have all the possible combinations of all the inputs, like the macro given in one of the examples - that would be too much and wouldn't be very readable.
I could put together a quick VBA script that changes the inputs in the model, reads the result and creates the table, but I was wondering if there is a better way - VBA is typically not very robust, in the sense that just changing the location of one cell can often mess things up. I usually avoid Excel for the more complex models (this would be banal in any scripting language), but this I have to do in Excel.
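Just to pin down what I mean by "banal in any scripting language", this is roughly the loop in Python. Everything in it is a placeholder: `model` stands in for the workbook's long calculation chain, and the input names and numbers are invented. It is only a sketch of the idea, not something I can hand to the Excel users.

```python
def model(price, volume, unit_cost):
    """Stand-in for the many calculations behind the outputs."""
    revenue = price * volume
    profit = revenue - unit_cost * volume
    return {"revenue": revenue, "profit": profit}

def chained_scenarios(base, key, pct_step, count):
    """Build `count` scenarios in which `key` changes by `pct_step`
    relative to the previous scenario."""
    scenarios, current = [], dict(base)
    for _ in range(count):
        scenarios.append(dict(current))
        current[key] = current[key] * (1 + pct_step)
    return scenarios

def scenario_table(scenarios):
    """One row per scenario: the inputs followed by the key outputs."""
    return [{**inputs, **model(**inputs)} for inputs in scenarios]

base = {"price": 100.0, "volume": 10, "unit_cost": 60.0}
table = scenario_table(chained_scenarios(base, "price", 0.5, 3))
for row in table:
    print(row)
```

Note that the table holds only the scenarios listed, not every combination of every input.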
EDIT #2:
Trying to further clarify what I have in mind, I have put together the screenshot below. Each output is the result of many different calculations, and CANNOT be calculated as a small, simple formula - if it could, I would not have any issue, of course!
My issue is that:
- if I change an input, then all the many many calculations occurring behind the scenes change
- the outputs are read from all those calculations
- I cannot use two-way what-if tables
If even this is not clear, the only other thing I can try is to upload an Excel file, which is generally discouraged on SO.
Scenario Manager is a built-in function with its own GUI.
For this reason, the function is limited in what it can call (only data entered in the GUI).
VBA will allow you to manipulate this data, telling it where to pull the changing values from and what to change them by.
So the answer for your specific query:
Can I use Excel without VBA to perform Scenario Manager tasks not set by the GUI?
No.
But that doesn't mean fiddling with the Manager itself has to be horrendous. There are ways to teach and learn it, and if you save a macro-enabled document, users should be able to turn the macro on with the click of a button, so VBA can be an option too.
I hope this helps?
Whenever I have two tables in the same column, I get this error.
Create a table in columns (ie B1:C3)
Create another table below that table (ie B5:C7)
Right-click on column B
Is the "Delete" option grayed out (unavailable)?
Convert the second table (B5:C7) back to a normal area
Right-click on column B
Is the "Delete" option active (black) now?
It is for me.
I don't understand why it happens but I'd really appreciate if someone could confirm that I'm not alone on this one. This actually seems like a bug.
Unfortunately this is 'behavior by design'. A ListObject (aka structured table) has many internal mechanisms. The Delete (column) command is not designed to enumerate through all of the ListObjects on the worksheet to see if any intersect with the column being deleted and then spawn subprocesses that deal with deleting table columns specifically while simultaneously keeping in mind how that will affect other ListObject tables. Instead, it simply does not allow the Delete command when more than a single ListObject table is involved.
This may not be allowed because deleting a column will shift cells. Why don't you try deleting by selecting one column of a table, like this:
See the screenshot: you can do it if you select one column of a table at a time.
Thanks
Try organizing your data in a different way, so these problems don't occur.
There is no compelling reason to have several tables on ONE sheet. If table placement presents a problem with row/column management, consider moving tables to separate sheets.
Tables can be referenced in formulas by the table name. Ditto for table columns, so there really is no reason to keep several tables on one sheet if you need flexibility with row and column management.
Edit after comment The fact that users are working with several tables and cannot be expected to change sheets to maintain data on different sheets can be addressed in different ways:
Educate your user. I'm a big fan of teaching people how to use software. If they understand what they are doing, they feel positive. If you keep them dumb and tell them to "just click there and shut up" they may feel negative.
You may want to re-consider your data architecture. Provide your users with an interface to add/edit/delete records that is independent of where the data is stored. This is 2016. Data input and data storage are not married to the same page.
You are posting your question in a site for enthusiast programmers. A little bit of VBA will separate your data entry/data storage issues, if you are interested to work it out.
Short Version
Is there a way to have visually represented user re-sizable data ranges in Excel? (If so, via VSTO?)
Long version
I'm writing an add-in to Excel that helps with exporting data within arbitrary workbooks to existing database tables. The data is more or less tabular but it's almost always laid out differently. I'm looking to make the process as error free but quick as possible. For example, columns for ranges of tabular data have their header names ranked for similarity to a pre-determined field names list. The rankings are then fed into a solver for the assignment problem. This allows columns to be mapped to fields automatically with surprisingly high accuracy.
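To make the mapping step concrete, here is a small Python sketch of the idea. It is not the add-in's code: it uses `difflib.SequenceMatcher` as a stand-in similarity measure and a brute-force search over permutations in place of a real assignment-problem solver (such as the Hungarian algorithm), which is only workable for a handful of columns. The header and field names are invented.

```python
from difflib import SequenceMatcher
from itertools import permutations

def similarity(a, b):
    """Case-insensitive similarity score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_mapping(headers, fields):
    """Try every assignment of headers to distinct fields and keep the
    one with the highest total similarity.
    Assumes len(fields) >= len(headers); returns None otherwise."""
    best_score, best = -1.0, None
    for perm in permutations(fields, len(headers)):
        score = sum(similarity(h, f) for h, f in zip(headers, perm))
        if score > best_score:
            best_score, best = score, dict(zip(headers, perm))
    return best

headers = ["Cust Name", "E-mail", "Phone No."]
fields = ["phone", "email", "customer_name", "address"]
print(best_mapping(headers, fields))
```

Optimising the total score, rather than picking the best field for each column independently, is what stops two columns from claiming the same field.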
However, detecting the ranges of tabular data isn't feasible -- often not all of the data is wanted for export. Therefore, I'm looking to make a familiar yet quick to operate user interface for users to specify the tabular data ranges within a workbook.
One such user interface would be to have the user draw and re-size the ranges they'd like to export. Thus, I'm seeking to do exactly that. However, I'm open to other user interface ideas if they're more conducive to implementation yet still easy to use.
The solution I've ended up with is creating a scratch copy of the worksheet and managing its formatting in order to highlight various portions of it. Instead of user-resizable areas, the regions are "painted" by selecting cells. (In a way, the interface takes the form of a familiar paint program, with various tools that let you manipulate the cell information in certain ways.) A toggle button switches between the data export region mode and normal mode.
Under the hood it's not a pretty, elegant solution, but it's pretty slick for the end user.
First, I don't have any experience with programming. If I ever start, this would probably be my first project. I kept looking for an answer until I found this site.
I am looking outside the box because in Excel, working with data of 1 million+ rows and 20+ columns means waiting a very long time for the calculation to finish, and copying and pasting formulas takes even longer. Imagine I have to leave the computer running for 8+ hours with the help of a macro and F4 (repeat). Once I am done with the formulas, I have to paste them all as values only. And even when I break the file into pieces, the file sizes are 20MB to 110MB without active formulas. Opening a file takes forever.
I wonder how to write a program with 1) a dialog box, 2) the Excel commands and formulas (sort, delimiter, concatenate), 3) the ability to create graphs, 4) tabs to view different sets of data or graphs, 5) a way to add in a set of data, 6) limiting the number (1-100000), etc. The look would be something like uTorrent.
What compiler is suitable for this program? It's easier if you tell me which 'book' to read than for me to work out which 'book' is suitable, because even if one is, I might flip through it and go on to the next one. 'Book' may refer to a book, way, steps, etc.
I'm not sure what you actually want. With 1M+ rows and 20+ columns, an Excel sheet doesn't seem to be the right tool for the job. So do you...
want to keep using Excel, but automate the job? Use Excel VBA like renick suggested. It's the language that Excel uses internally for macros, but you can write any kind of automated processing you'd like. Beware, however, that VBA is not exactly the best language to start a programming experience with. (That's my personal opinion, and what matters is of course whether you get the job done).
want to switch to something else? A database management system seems better suited for the amount of data you have. Microsoft Access is part of Office and might already be on your system. Getting your data into and out of the database could be a problem, but the advantage you have is that a database is built to handle colossal amounts of data and will happily munch your figures for several days without failing. You can access the data using the Structured Query Language (SQL), which is not really a programming language, but very powerful (and it most certainly has CONCATENATE, ADD etc.). Graphing is more difficult, but can also be done.
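To show what the database route looks like in practice, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. That is a swap on my part (the suggestion above was Access), chosen only because it needs no installation. The table name, column names and rows are made up, and a real job would load the rows from a file into an on-disk database instead of memory.

```python
import sqlite3

# In-memory database for the example; pass a file path for real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (region TEXT, product TEXT, amount REAL)"
)

rows = [
    ("North", "A", 10.0),
    ("North", "B", 5.5),
    ("South", "A", 7.0),
    ("South", "A", 3.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# '||' concatenates strings in SQLite; SUM/GROUP BY does the adding up.
query = """
    SELECT region || '-' || product AS label, SUM(amount) AS total
    FROM readings
    GROUP BY region, product
    ORDER BY region, product
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

The sort, concatenate and sum steps that take hours as worksheet formulas become a single query here, and nothing has to be pasted as values afterwards.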
If you know excel then Excel VBA is a VERY capable language to do all this. I would suggest you go to the VBA Dev Center here to get started.
I can't believe I'm about to say this (for most things I do it would be the wrong choice) but:
If the computations aren't that complex (just lots of them) Python might be a good bet.
If you can get the input as a CSV file then, in about 10 lines of code, you can write a loop that runs once for each line of input and hands you the values to play with.
with open('filename') as f:  # default mode is read; 'w' would erase the file
    for line in f:
        values = line.rstrip('\n').split(',')
        # values holds the fields from this line as strings.
        # These can be converted to numbers:
        x = float(values[0])
        n = int(values[1])
        # ... and then processed
That might not be the cleanest/best approach but it's simple and straightforward.
p.s. For 1M+ rows, don't expect it to be blazing fast (10 sec to a min or so, depending on what you do to the data)