How to optimize speed in Excel 2007 (±20,000 rows) - excel

I'm in the process of working with an Excel file that contains two columns (old URL and new URL). But it contains about 20,000 rows.
And I have another file containing about 400 old/new URL pairs that need to be imported into the big ±20,000 rows file.
I have to do all kinds of processing, like:
- Find all duplicate rows (same two columns more than once...). That functionality would be in a column, and it would be good to run that function each time I add 1 row to check if that URL combination already exists in the file
Note that I already turned the sheet into a table.
2 questions now:
1) Should I do some kind of vlookup between the ±20,000 rows sheet and the ±400 rows sheet, or use VBA? I don't know what would be the best way to do this (i.e.: if a row from the ±400 rows sheet is not in the ±20,000 rows sheet, add it...). Should I use vlookups or populate arrays in VBA (speed-wise)? If I use vlookup, is it true that it is possible to put the vlookup function in a sheet and refer to it in every row instead of putting a vlookup function directly in every row?
2) How can I optimize the 20,000 rows sheet? Right now, each time I want to sort or filter, it takes an eternity to redraw and it freezes my PC for that time!
Thanks for your help.

Firstly, to omit the dupes from the 400ish row sheet that needs to be added in, use a COUNTIFS formula against the big sheet, then sort by this value and only copy in rows where the value is less than 1 (or an error).
Secondly, I would probably do the same thing in the big sheet but referencing itself; anything with a value above 1 is a dupe.
Lastly, are there formulas in the 20,000 row sheet? I could set up a 20,000 row sheet with just a "1" in range A1:A20000 and doing anything on it would be super quick. It all comes down to what data you have in there and what you can do to reduce its load on the system (i.e. convert formulas to values if they no longer need to be calculated).
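The COUNTIFS check above boils down to a membership test on the (old URL, new URL) pair. As a rough sketch of that logic outside Excel, here it is in Python, assuming both sheets have already been loaded as lists of 2-tuples. The function name and the sample URLs are made up for illustration, not taken from the question.

```python
from collections import Counter


def merge_new_rows(big_rows, new_rows):
    """Return (merged, duplicates) for lists of (old_url, new_url) pairs.

    merged     -- big_rows plus every pair from new_rows not already present
    duplicates -- pairs that occur more than once inside big_rows
    """
    counts = Counter(big_rows)      # plays the role of COUNTIFS on the big sheet
    duplicates = [pair for pair, n in counts.items() if n > 1]

    seen = set(big_rows)            # hash lookups instead of rescanning 20,000 rows
    merged = list(big_rows)
    for pair in new_rows:
        if pair not in seen:        # the COUNTIFS result would be 0 here
            merged.append(pair)
            seen.add(pair)          # also catches repeats inside new_rows
    return merged, duplicates


# Hypothetical sample data
big = [("/a", "/x"), ("/b", "/y"), ("/a", "/x")]
new = [("/b", "/y"), ("/c", "/z"), ("/c", "/z")]
merged, duplicates = merge_new_rows(big, new)
```

The same idea carries over to a VBA array plus a dictionary object: build the set of existing pairs once, then test each of the ~400 new rows against it.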

Excel 2007 has a built-in feature for this, available both from the ribbon and from VBA: Data tab -> Data Tools group -> Remove Duplicates, or Range.RemoveDuplicates
For example data:
Click the Remove Duplicates button:
And you are done!
The VBA equivalent is:
ActiveSheet.Range("$A$1:$B$10").RemoveDuplicates Columns:=Array(1, 2), Header:=xlYes
Note that the 1 & 2 do not mean columns A & B. They are column positions within the selected Range.
If your worksheet only contains 2 columns, you could use UsedRange instead.

Related

In excel, how can I automate a sheet to be filled based on conditions of another sheet?

Is there a way in Excel to copy the contents of one column to another sheet based on particular values in another column?
I have data which looks like this:
Sheet1:
Sheet2:
I would like to copy the column A of sheet 1 directly onto column A of sheet 2, but only if the value in column B is Y.
I have tried using the match function but am not sure what the best way to do this would be. Is there a way for me to achieve this?
My desired output is:
As discussed in the comments, #Waldorf99 was looking to have a second worksheet that would automatically show a filtered list from the first sheet. I can think of a few ways to do this (array formulas or pivot tables come to mind). The problem with mixing dynamic columns with static values is that the static values would become desynced from the dynamic ones.
In the original example, rows may have a blank value in the condition column in sheet one, and then may be assigned a Y or N at a later date. If a Y is assigned to a row in the middle of other rows, the filtered sheet would shift the existing rows down to make room. The static values would stay where they were, and would become desynced. To demonstrate:
If the above image is the original state of sheet 1 and 2...
...adding a y next to x.1.c would result in sheet two shifting columns A and B of row 2 down, but leaves columns C and D behind (as they are static, and not tied into the first two columns in any way).
One thing that may work the way you want is filters. You would only have one sheet, with all of the data manually entered. Then you can add filters, and change them to hide rows temporarily when needed.
To add and use filters:
Start with your data all on one sheet...
Highlight your data...
On the Home tab, select "Format as Table" and choose any style...
This turns your data into a table. You can filter by clicking the drop down in a given column's header row, then deselecting the values you want to hide.
The results are a table that only shows the rows with a 'Y'.
The other rows aren't removed, just hidden. You can always reset or change your filters, to configure which rows are visible.
Note: when working with tables, they will auto expand to account for new rows, so long as you work in the row directly under the table (e.g. don't leave blank rows). You can also manually resize the table at any time by clicking and dragging from the bottom right corner of the table.
There are tons of resources on Excel tables online, and they're a pretty useful tool in Excel.
Hope that achieves what you were looking for.

Excel 2010 add and delete rows is very slow but calculations is not

Help!
I have an 8MB 2010 .xlsx workbook (no macros) that runs a full calc in about 2 seconds. It has only 2 worksheets, each with less than 1,500 rows. However, they have 100 and 200 columns. It takes 20+ seconds to insert or delete a row (and much, much longer when I delete a group of rows).
It does have a fair amount of calculations in the workbook, largely made up of index/match formulas. I went through a process to simplify them by only calculating the matches (for the most part) at the top and left of the worksheet. For example, all of F7:DV7 will point to only 2 rows on worksheet 2, so the match() is only done once in columns C and D.
I realize index/match is more complicated than a simple a+b and that Excel likes rows more than columns, but this file isn't that big at all and it seems like it should be able to handle it. And the fact that the calculation is fine, and it's only when I add/delete rows that it's so slow, has me bewildered.
I came across a similar issue recently, and I found this question while searching for an answer online. Unfortunately, it didn't include an answer, so I moved on. However, I found the reason why the worksheet I was working on was taking so long to delete rows and wanted to return to this question and add my 2 cents.
In my case, it turned out one of the vlookup formulas had its table array written as something like SheetName!$A$1:D5000. When the formula was copied down, the range expanded by one row in every cell down. So the next cell down had its table array defined as SheetName!$A$1:D5001. And this went on for a few thousand rows. Turning off automatic calculation had no effect on reducing the wait whenever deleting rows.
Anyway, changing the table array in the vlookup to SheetName!A:D and copying that vlookup down the column did the trick. You didn't mention you used a vlookup, but it could be happening in the index/match formulas.
This is an areas problem. When you filter your data and select an entire column, you are selecting multiple non-contiguous ranges, i.e. multiple areas. A workaround could be:
1) Sort your data from A to Z to group the rows you want to delete in only one area
2) Filter the values you want to delete
3) Delete rows
4) Enjoy!
If the actual order of your data is important to you, just add a column, fill it with numbers from 1 to n. Perform steps 1) 2) and 3), then restore the original order. Perform step 3).

Get unique sheet from 2 separate sheets

I have a single Excel document with 2 sheets. The first sheet contains "active" clients and the 2nd "inactive clients" but we want to merge both into a 3rd sheet "all clients". We want to ensure that there aren't any duplicate rows. Column A in both sheets is the "identifier", which is a 16 digit numeric value. Both sheets have the same columns, so effectively I want to match column A in both sheets and return the entire row if it's not found yet. There are around 1.2 million rows combined in both sheets, which is why I cannot just copy and paste them into a single document.
How would I go about doing this?
Good advice in the comments but because even after removal of duplicates there may not be enough rows to accommodate the de-duped list I would suggest starting by determining how many duplicates and deciding which list to delete any from (or you might end up with an incomplete combined list but no practical way to extend it). In both sheets add a column with:
=MATCH(A1,Sheet2/1!A:A,0)
(Sheet2 name for one, Sheet1 name for the other) and copy down to suit.
Then check that the de-duped combined row count is less than 1,048,576 in total. If it is more, the rows won't fit on a single sheet without an additional set of columns, and even if it is less, a database is to be preferred, though not obligatory, with the Excel version possibly convenient for upload.
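To make the counting step concrete, here is a minimal sketch in Python of the same check done outside Excel. It assumes each sheet has been exported to a list of rows whose first item is the identifier, kept as text; `combine_by_id` and the sample rows are hypothetical names used only for illustration.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows available on one worksheet in Excel 2007+


def combine_by_id(active_rows, inactive_rows):
    """Merge two lists of rows, keeping the first row seen for each identifier.

    Returns (combined_rows, dropped), where dropped is the number of rows
    skipped because their identifier had already been seen.
    """
    combined = {}
    dropped = 0
    for row in active_rows + inactive_rows:
        key = row[0]                 # identifier from column A, as a string
        if key in combined:
            dropped += 1
        else:
            combined[key] = row
    return list(combined.values()), dropped


# Hypothetical sample data
active = [["1000000000000001", "Ann"], ["1000000000000002", "Bob"]]
inactive = [["1000000000000002", "Bob"], ["1000000000000003", "Cy"]]
all_clients, dropped = combine_by_id(active, inactive)

# Strictly less than the limit, which leaves room for a header row
fits_on_one_sheet = len(all_clients) < EXCEL_MAX_ROWS
```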

Excel VBA - Get rid of duplicate values in a particular column

I have a worksheet with many rows of IDs.
I would like to know the best way to write a VBA procedure that will look at the range of values in one column, and replace the entire range with only the unique values which appear in that range. So a column of 1000 IDs might reduce down to a column of 150 unique IDs. I would not like this procedure to affect the data in other columns in the worksheet.
So, say the initial column A was:
*IDs*
ID12
ID12
ID34
ID56
ID78
ID78
ID78
I would like it to replace the column with a new column A:
*IDs*
ID12
ID34
ID56
ID78
Thank you kindly.
Note: I know how to do this manually a few different ways, but I would like to cycle through and do this procedure for every non empty column on a sheet, and the columns are of varying length.
To achieve what you want please do the following:
Select the entire column you want to remove dupes from.
Go to Ribbon Data > Remove Duplicates.
Set My data has headers (according to your input).
You're done. If you want these steps in VBA - turn on macro-recorder before the start (bottom left corner of Excel window).
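For the "every non-empty column, of varying length" part of the question, the logic is just an order-preserving de-duplication applied column by column. A rough Python sketch of that logic, assuming the columns have been read into a dictionary of header to values (the function name and the dictionary layout are assumptions for illustration, not part of Excel or VBA):

```python
def unique_per_column(columns):
    """Return a new dict where each column keeps only its first occurrences.

    columns maps a header to its list of values; blank cells ('' or None)
    are skipped, so columns of different lengths are handled independently.
    """
    result = {}
    for header, values in columns.items():
        non_blank = [v for v in values if v not in ("", None)]
        # dict.fromkeys keeps the first occurrence of each value, in order
        result[header] = list(dict.fromkeys(non_blank))
    return result


# Column A from the question, plus a shorter hypothetical second column
sheet = {
    "IDs": ["ID12", "ID12", "ID34", "ID56", "ID78", "ID78", "ID78"],
    "Other": ["x", "y", "x", "", None],
}
deduped = unique_per_column(sheet)
```

Because each column is processed on its own, shrinking one column does not touch the data in the others.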

Combining Excel sheets/groups of columns by a condition in Excel 2007

Is there a way to combine 2 Excel sheets (or groups of columns inside one Excel sheet) so that the rows in one sheet/group are appended to the rows of the other sheet/group where certain column values match?
To clarify:
Let's say I have 2 sheets - Sheet1 and Sheet2. Sheet1 has the columns A,B,C,D. Sheet2 has columns A,E,F,G. Column A in both sheets contains the same data but sorted differently (it is not sorted in a conventional way, i.e. alphabetically or numerically). I need to combine these 2 sheets into one, but they need to be combined so that the values in column A match (if possible, the result should be ordered in the same way as Sheet2).
Ideally, the functionality I'm looking for would need to be like SQL's INNER JOIN command.
I'm using Excel 2007.
Thanks
I think you basically described the VLOOKUP function.
You have your two sheets, now you want to create a list, which extends A,B,C,D to A,B,C,D,E,F,G.
For that, you could just use
Sheet1!E1=VLOOKUP(Sheet1!A1,Sheet2!A:G,5,FALSE)
Sheet1!F1=VLOOKUP(Sheet1!A1,Sheet2!A:G,6,FALSE)
Sheet1!G1=VLOOKUP(Sheet1!A1,Sheet2!A:G,7,FALSE)
If you need to create an extra sheet3 as a result, use this:
Sheet3!A1=Sheet1!A1
Sheet3!B1=VLOOKUP(Sheet3!A1,Sheet1!A:D,2,FALSE)
Sheet3!C1=VLOOKUP(Sheet3!A1,Sheet1!A:D,3,FALSE)
Sheet3!D1=VLOOKUP(Sheet3!A1,Sheet1!A:D,4,FALSE)
Sheet3!E1=VLOOKUP(Sheet3!A1,Sheet2!A:G,5,FALSE)
Sheet3!F1=VLOOKUP(Sheet3!A1,Sheet2!A:G,6,FALSE)
Sheet3!G1=VLOOKUP(Sheet3!A1,Sheet2!A:G,7,FALSE)
Hope this interpretation was correct.
Edit:
By the way, because Excel is not mainly intended to function as a database, this operation is a bit messy, because it does not dynamically scale, at least with the second approach, which uses a third sheet. You will have to copy down A1 at least far enough to match the last used row from Sheet1. And if you copy it down further, so you won't have to worry about it for a while, you might need to error-proof against the empty cells.
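For comparison with the INNER JOIN the question mentions, this is roughly what the VLOOKUP approach computes, sketched in Python. It assumes the two sheets are lists of rows with the key in the first position and that the keys in Sheet1 are unique; `inner_join` and the sample rows are hypothetical.

```python
def inner_join(sheet1, sheet2):
    """Join rows on their first item, returning them in Sheet2 order.

    sheet1 rows look like [A, B, C, D]; sheet2 rows look like [A, E, F, G].
    Keys missing from either side are dropped, as with SQL INNER JOIN.
    """
    # Build the lookup once instead of searching Sheet1 for every row
    lookup = {row[0]: row[1:] for row in sheet1}
    joined = []
    for row in sheet2:
        key = row[0]
        if key in lookup:
            joined.append([key] + lookup[key] + row[1:])
    return joined


# Hypothetical sample data, deliberately in different orders
sheet1 = [["k2", "b2", "c2", "d2"], ["k1", "b1", "c1", "d1"], ["k3", "b3", "c3", "d3"]]
sheet2 = [["k1", "e1", "f1", "g1"], ["k4", "e4", "f4", "g4"], ["k2", "e2", "f2", "g2"]]
result = inner_join(sheet1, sheet2)
```

Unlike the worksheet formulas, unmatched keys simply disappear here; in Excel they would show up as #N/A and need the error-proofing mentioned above.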

Resources