Why is an empty Excel file 5.5 MB? - excel

I have an empty Excel file that is 5.5 MB.
When I open it, the process is very laggy, even on a fast PC (Intel i7 processor).
It takes ~30 seconds to open.
Once open, it shows that the document has 1,048,576 rows.
I tried to delete them, but without success.
If I remove column G, the file size drops by half (to 2.5 MB).
If I remove the entire sheet and add a new empty one, the file size drops to 8 KB.
The question is not about how to work around the problem, but about what causes it, why it is happening, and how I can remove the unused rows. I have tried deleting them in different ways and saving and reopening the document, with no success.
Here is the document, if you need it: https://files.fm/u/erfr4weq

Save the Excel file in the Open XML format, unzip it, and open it in an editor to see what is going on inside it.
Please note that this approach is only valid for xlsx files (Office 2007 and onwards).
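For instance, here is a minimal sketch of that inspection in Java, using only the standard library. An .xlsx file is just a ZIP package, so listing its entries and their uncompressed sizes usually shows which worksheet XML is carrying the bloat. The file name Book1.xlsx is an assumption; point it at your own workbook.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class InspectXlsx {
    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile("Book1.xlsx")) {
            // List every part of the package with its uncompressed size,
            // which quickly shows which sheet XML is bloating the file.
            for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements(); ) {
                ZipEntry entry = e.nextElement();
                System.out.printf("%10d  %s%n", entry.getSize(), entry.getName());
            }

            // Dump the start of the first worksheet's XML to see the stray rows/cells.
            ZipEntry sheet = zip.getEntry("xl/worksheets/sheet1.xml");
            if (sheet != null) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip.getInputStream(sheet), StandardCharsets.UTF_8))) {
                    reader.lines().limit(20).forEach(System.out::println);
                }
            }
        }
    }
}
```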

It's most likely not actually empty; there will be some hidden data somewhere. Try the Clear function in Excel instead of just the Delete key.

I had the same thing happen.
I had 1,200 data records that took up 5 MB, which is odd.
I looked at the vertical scroll bar and noticed it was small; it turned out Excel had added about 10,000 extra rows.
I got rid of the extra rows, and the file is now 152 KB.
Check whether the number of rows your scroll bar spans matches the number of rows you are actually using.
Then get rid of the extras.
Even though it looks like you're not deleting anything, you are.
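If you would rather do that cleanup outside the Excel UI, here is a minimal sketch using Apache POI. The library choice and the file name Book1.xlsx are assumptions, not something the posters above used; it walks the sheet from the bottom and removes rows that contain no cells.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class TrimEmptyRows {
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("Book1.xlsx");
             Workbook wb = new XSSFWorkbook(in)) {
            Sheet sheet = wb.getSheetAt(0);
            // Walk from the last stored row upwards and drop rows that contain
            // no cells; these are the "extra" rows that keep the scroll bar
            // (and the file) large. Rows that hold formatting-only cells would
            // need an extra cell-level check.
            for (int i = sheet.getLastRowNum(); i >= 0; i--) {
                Row row = sheet.getRow(i);
                if (row != null && row.getLastCellNum() < 0) {
                    sheet.removeRow(row);
                }
            }
            try (FileOutputStream out = new FileOutputStream("Book1-trimmed.xlsx")) {
                wb.write(out);
            }
        }
    }
}
```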

Related

How to resolve Excel not removing duplicates?

Probably a dumb question, but I'm having an issue where Microsoft Excel is successfully finding duplicates and SAYING it's deleting them in my CSV file, but not actually deleting them or otherwise making any change to the file.
I usually have no problem at all, but I suspect the issue may have to do with the unusually large file I'm working with today.
To walk through my steps for better troubleshooting: I have columns A-AC, and I am selecting column E, which is a list of usernames. I go to the top bar and select Data > Data Tools > Remove Duplicates. A popup asks me to confirm the selection; I choose "Continue with current selection", then select "OK" and "Remove Duplicates". It says it has removed the duplicates, displaying the expected/correct number (see screenshot), but then no changes are made. If I follow the same steps again, I get the exact same result.
As an experiment, I tested it in a separate CSV file with only the top 20 rows, and it worked perfectly fine. But with the other file that has 15k rows, it doesn't work.
My goal is to remove any rows that contain a duplicate username in column E.
Any idea what's going haywire?
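As a sanity check outside Excel, here is a minimal sketch in plain Java that keeps only the first row for each username in column E of a CSV file. The file names are hypothetical and the parser assumes simple, unquoted comma-separated values; it is a way to confirm what the deduplicated file should look like, not the steps the poster used.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.HashSet;
import java.util.Set;

public class DedupeByUsername {
    public static void main(String[] args) throws Exception {
        Set<String> seen = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new FileReader("users.csv"));
             BufferedWriter out = new BufferedWriter(new FileWriter("users-deduped.csv"))) {
            String line;
            boolean header = true;
            while ((line = in.readLine()) != null) {
                String[] cells = line.split(",", -1);
                // Column E is the fifth column (index 4).
                String username = cells.length > 4 ? cells[4] : "";
                // Always keep the header row; otherwise keep a row only the
                // first time its username is seen.
                if (header || seen.add(username)) {
                    out.write(line);
                    out.newLine();
                }
                header = false;
            }
        }
    }
}
```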

Excel document lost all data

I was working on an Excel document and used the key combination Alt+H+W to wrap a text paragraph, but to my surprise that made all my sheets and cells disappear; below is an actual view:
[Screenshot: Excel doc actual state]
I have never seen an error like this in Excel before. Right after the keyboard combination Alt+H+W, the document is not corrupted and still has its size of 379 KB, meaning it still has its data somewhere.
I opened all my other Excel documents and Excel works fine. I also opened a backup of this file from two months ago, and its size is 350 KB, so size-wise the file is fine, with only a 20 KB increase in two months; the sizes are similar, so this is not about losing data.
I would appreciate any help.
From the View menu, select Unhide, which will bring all the information back to the foreground.
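The answer above refers to unhiding the workbook window in Excel's View tab. If individual worksheets had been hidden instead, they could be revealed programmatically; here is a minimal sketch with Apache POI, where the library choice and the file name Report.xlsx are assumptions.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class UnhideSheets {
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("Report.xlsx");
             Workbook wb = new XSSFWorkbook(in)) {
            // Make every hidden or "very hidden" sheet visible again.
            for (int i = 0; i < wb.getNumberOfSheets(); i++) {
                if (wb.isSheetHidden(i) || wb.isSheetVeryHidden(i)) {
                    wb.setSheetHidden(i, false);
                }
            }
            try (FileOutputStream out = new FileOutputStream("Report-visible.xlsx")) {
                wb.write(out);
            }
        }
    }
}
```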

Excel Pain: Saved With Filters, Opened and File Was Half the Size

So something has happened twice that has cost me hours of redone work. Extremely, extremely annoying, and the bane of my existence.
I have an .xls file with about 171K rows. I saved it with filters on, reducing the number of shown rows to about 13K. When I reopened the file the next day, the filters were not showing, but rows were 'hidden', because in order to show all the rows I had to 'unhide'. The problem is that when I unhide, the total is ~65K rows, i.e. the last numbered row that was showing when I had the filters on.
Has this happened to anyone before, or does anyone know how to recover the full 171K rows? I know for a fact I didn't 'clear' or 'delete' anything before or after saving.
One piece of advice I have is to start reading the screen prompts. When you save a file with more than 65,536 rows in the .xls format, a warning to that effect is displayed. Only you can prevent your own errors.
You can avoid this by making a copy of the original and filtering on that; then you always have the source if you make an error.
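On the hidden-rows part, here is a minimal sketch using Apache POI (the library choice and the file name data.xls are assumptions) that clears the hidden flag on every row a saved filter left behind. It cannot bring back rows beyond 65,536 that the .xls format itself truncated.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class UnhideRows {
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("data.xls");
             Workbook wb = new HSSFWorkbook(in)) {
            Sheet sheet = wb.getSheetAt(0);
            // Rows hidden by a filter are stored with zero height; reset them.
            for (Row row : sheet) {
                if (row.getZeroHeight()) {
                    row.setZeroHeight(false);
                }
            }
            try (FileOutputStream out = new FileOutputStream("data-unhidden.xls")) {
                wb.write(out);
            }
        }
    }
}
```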

Finding the cause of Excel file corruption

I have a feature that downloads things to an xls file using Apache POI. Mostly it works. But on one particular database, the resulting files are corrupted and won't open in Excel. I get the message "We found a problem with some content in 'DownloadFoo.xls'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes." Clicking Yes results in all the formatting, data validation, etc. being stripped out. On the other hand, if I open the file in OpenOffice Calc and save it, it's fine and can be opened in Excel from then on. (The people who want to use these files aren't allowed to download OpenOffice Calc, so this is not considered an acceptable workaround.)
I have tried narrowing it down to see which data is causing the problem, but it seems to occur whenever 10 or more items are downloaded, regardless of which items they are. (On other databases, it's fine to download 100+). Excluding some of the columns helps, but they are perfectly innocuous looking columns (and virtually identical to other columns which are fine) so this still hasn't got me to the bottom of it.
Are there any techniques I could use to find out what Excel has a problem with in the corrupted spreadsheets?
I can't make major changes like getting it to download to xlsx instead as this feature is going to be scrapped and replaced with something completely different in the near future, so I'd like to just focus on the problem at hand.
It turned out that the solution was to reset the data validation lists more often. Quite a lot of the cells in my spreadsheet have data validation. When the data validation lists are longer, they are stored on a hidden sheet. If several cells need the same validation, I try to have them reference the same list so as not to write out too much on the hidden sheet. However, Excel apparently dislikes it when too many cells reference the same list; it's not against the rules as far as I can tell, but it doesn't like it anyway. When I changed the code to rewrite the validation lists every 5 items, it started working.
The reason this database was different was that its items had an unusually high number of subitems, so they occupied a lot of rows even though it didn't seem like many things were being downloaded. Some of the problem columns just had true/false validation rather than using the lists on the hidden sheet, so I don't know what that was about, but resetting the validation lists helped anyway.
This doesn't really answer my question, as I never managed to get any information from Excel about what the problem was, or use a particular technique; it was just a series of coincidental findings. I'm putting it here anyway in case anyone else has a similar problem. Also, the thing that set me on the right track was finding an old comment while double-checking that the code doesn't do anything different for over 10 items (it doesn't), in response to Andrew Morton's comment, so thanks Andrew!
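To make the shape of that workaround concrete, here is a minimal sketch with Apache POI's HSSF (.xls) API. It is a simplified illustration, not the poster's code: it uses explicit list constraints rather than named lists on a hidden sheet, and the sheet name, allowed values, and 5-row batch size are all placeholders. The point is to give each small batch of rows its own validation object instead of sharing one across the whole column.

```java
import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.DVConstraint;
import org.apache.poi.hssf.usermodel.HSSFDataValidation;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.util.CellRangeAddressList;

public class BatchedValidationLists {
    public static void main(String[] args) throws Exception {
        try (HSSFWorkbook wb = new HSSFWorkbook()) {
            HSSFSheet sheet = wb.createSheet("Download");
            String[] allowed = {"Yes", "No", "Maybe"};
            int totalRows = 100;
            int batchSize = 5;  // reset the validation list every 5 rows

            // Instead of pointing every row at one shared validation,
            // create a fresh constraint for each small batch of rows.
            for (int start = 0; start < totalRows; start += batchSize) {
                int end = Math.min(start + batchSize, totalRows) - 1;
                DVConstraint constraint = DVConstraint.createExplicitListConstraint(allowed);
                CellRangeAddressList range = new CellRangeAddressList(start, end, 0, 0);
                HSSFDataValidation validation = new HSSFDataValidation(range, constraint);
                sheet.addValidationData(validation);
            }

            try (FileOutputStream out = new FileOutputStream("BatchedValidation.xls")) {
                wb.write(out);
            }
        }
    }
}
```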

Excel File Size Close to 300MB

I have an Excel file close to 300 MB. There are a few tabs, and they are all text based.
All fonts are the same. Some cells are text and some have date formats. One tab is close to 1M rows now, but the number of columns in every tab is less than 30. No macros are included, and there are no links to other files.
I read that the Excel limit is ~1M rows x 16,384 columns. Does that mean that as long as my columns stay under 30, this file can still potentially grow? Or would it just stop once it reaches 1M rows?
Thanks in advance.
Once you have 2^20 (1,048,576) rows, you will not be able to add a new row. To see this, select cell A1 and keep pressing Ctrl+Down arrow until you see the last row. Once you reach it, you cannot go further.
Concerning the 1,000,000+ entries and the need for more rows: it seems that you are using Excel as a database, and it is really not a good idea to use it as one. If you need that many entries, you probably need a proper database, which would be fast and easy. MS Access (since you are already using Excel, this is the database with the most similar interface) can solve your problem easily. Or MS SQL Server.
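If you want to check those hard limits programmatically rather than by scrolling, here is a tiny sketch that prints them using Apache POI's SpreadsheetVersion constants; the library is an assumption, but the limits themselves are fixed by the file formats.

```java
import org.apache.poi.ss.SpreadsheetVersion;

public class SheetLimits {
    public static void main(String[] args) {
        SpreadsheetVersion xls = SpreadsheetVersion.EXCEL97;
        SpreadsheetVersion xlsx = SpreadsheetVersion.EXCEL2007;
        // .xls  prints   65536 rows x   256 columns
        // .xlsx prints 1048576 rows x 16384 columns - once a sheet reaches
        // row 1,048,576 you cannot add another row.
        System.out.printf(".xls  : %d rows x %d columns%n", xls.getMaxRows(), xls.getMaxColumns());
        System.out.printf(".xlsx : %d rows x %d columns%n", xlsx.getMaxRows(), xlsx.getMaxColumns());
    }
}
```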
