Improve Vlookup on large file - excel

I´ve a very large file that I reduced as much as possible to 3 columns and 80k rows.
I need to perform a vlookup in order to bring values from column 1 or 2 match some other spreadsheets values.
The thing is Excel doesn´t seem to support such large searches, and it stops responding - the computer has 4GB and a Quad core, and not much more running at the same time.
As far as I understand, as I´m not looking for exact matches, I should not use match-index.
The only thing I thouhgt could help but not sure about that, is dividing the file in 2-4, and asking Excel many parallel searches instead of a big one. Could this work?
What else should I try?
Thanks!!!

Sort your data and use True as the 4th VLOOKUP argument. This makes VLOOKUP use binary search rather than linear search and is lightning fast.
If you need to handle missing data you will need to use the double VLOOKUP trick, see
http://fastexcel.wordpress.com/2012/03/29/vlookup-tricks-why-2-vlookups-are-better-than-1-vlookup/

Related

Can I use MINIFS or INDEX/MATCH on two non-contiguous ranges...?

Problem is straightforward, but solution is escaping. Hopefully some master here can provide insight.
I have a big data grid with prices. Those prices are ordered by location (rows) and business name (cols). I need to match the location/row by looking at two criteria (location name and a second column). Once the matching row is found (there will always be a match), I need to get the minimum/lowest price from two ranges within the grid.
The last point is the real challenge. Unlike a normal INDEX or MINIFS scenario, the columns I need to MIN aren't contiguous... for example, I need to know what the MIN value is between I4:J1331 and Q4:U1331. It's not an intersection, it's a contiguous set of values across two different arrays.
You're probably saying "hey, why don't you just reorder your table to make them contiguous"... not an option. I have a lot of data, and this spreadsheet is used for a bunch of other stuff. So, I have to work with the format I have, and that means figuring out how to do a lookup/min across multiple non-contiguous ranges. My latest attempt:
=MINIFS(AND($I$4:$J$1331,$K$4:$P$1331),$B$4:$B$1331,$A2,$E$4:$E$1331,$B2)
Didn't work, but it should make it more clear what I'm trying to do. There has GOT to be an easy way to just tell excel "use these two ranges instead of one".
Thanks,
Rick
Figured it out. For anyone else who's interested, there doesn't seem to be any easy way to just "AND" arrays together for a search (Hello MS, backlog please). So, what I did instead was to just create multiple INDEX/MATCH arrays inside of a MIN function and take the result. Like this:
MIN((INDEX/MATCH ARRAY 1),(INDEX/MATCH ARRAY 2))
They both have identical criteria, the only difference is the set of arrays being indexed in each function. That basically gives me this:
MIN((match array),(match array))
And Min can then pull the lowest value from either.
Not as elegant as I'd like... lots of redundant code, but at least it works.
-rt

Excel nested if function keeps returning false

Hello I am trying to categorize quantitative data into 4 categories: Low, Medium, High, Very High.
=IF(0.35<G2<0.554,"Low",IF(0.555<G2<0.699,"Medium",IF(0.7<G2<0.799,"High",IF(0.35<G2<0.8,"Very High"))))
Does anyone know why this keeps returning "False"?
Thank you
It's because =0.35<G1<0.554 does not make sense from Excel's point of view, Excel's logic would require to make it AND(0.35<G2,G2<0.554). Nested IFs are difficult to read and maintain, I would suggest using a look up table as per the example below:
This would be the correct syntax you're looking for. If you want multiple criteria you have to use the AND function and make single comparisons multiple times.
=IF(AND(0.35<G2,G2<0.554),"Low",IF(G2<0.699,"Medium",IF(G2<0.799,"High",IF(G2<0.8,"Very High"))))

Excel - Return name from list based on multiple criteria

This is my 1st post here (and not allowed to paste images). I have been trying to solve this issue for a couple of days with no luck. I'm working on an Excel spreadsheet for a game and cannot return a name based on multiple criteria. See below:
Table
I am trying to return, for example, the name of the Guardian with the highest amount of games played.
I've tried Index/match/sumproduct combinations but I can't figure this one out. Can you help me?
=index(Data!$A:$H,match((1,Data!B:B=Overview!B12)*(Data!C:C=Overview!B23)),0),1)
=MAX(IF(Data!B:B=Overview!B12,Data!C:C))
I'm thinking if I could join these two formulas together I might be able to make it work.
Try this array formula:
=INDEX(Data!$A1:$A99,MATCH(MAX(Data!$C$1:$C$99*(Data!$B$1:$B$99=B12)),
Data!$C$1:$C$99*(Data!$B$1:$B$99=B12),0))
CtrlShiftEnter
Notice that we should avoid using "full columns" in array formulas because they would introduce the computation of huge arrays and hence would slow down the formulas. I limit it here to 99 rows, use a limit that is big enough to span your data.

How to optimize COUNTIFS with very large data

I would like to create a report that look like this picture below.
My data has around 500,000 cells (it will continue to grow larger)
Right now, I'm using countifs function from excel but it takes a very long time to calculate. (cannot turnoff automatic calculate)
The main value is collected as date and the range of date is about 3 years, so I have to put a lot of formula to cover all range of value.
result
The picture below is the datasource the top one cannot be changed. , while the bottom is the one I created by myself (can change). I use weeknum to change date to week number.
data
Are there any better formula or any ways to make this file faster? Every kinds of suggestions are welcome!
I was thinking about using Pivot Table, but I don't know how to make pivot table from this kind of datasource.
PS. VBA is the last option.
You can download example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
I will post this answer with the disclaimer that it is entirely dependent on the size of the data set. That turning on and off the auto calculate is the best way, but your question doesn't let me do that, so keep reading.
Your question made me curious, so I gave it a try and timed it. I essentially set up two columns of over 100,000 rand numbers choosing from 1-1000 and then tried to do a countif on the two columns if they were equal. I made a macro that I can run that turns off the autocalculate, inserts the start time, calculates, and then inserts the finish time. I highlighted in yellow the time difference.
First I tried your way, two criteria, countifs:
Then I tried to combine (concatenate) the two columns to see if I could make it easier by only having one countif criteria and data set. It doesn't. see result below:
Finally, realizing what was going on. I decided to make the criteria only match the FIRST value in the number to look for. I was essentially reducing the number of characters to check per cell. This had a positive result. See below:
Therefore my suggestion is to limit the length of the words you are comparing in anyway possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best way possible without going to manual calculation.
I have worked with Excel sheets of a similar size. Especially if you are using the data on a regular basis, I would heartily recommend switching to a proper database SQL based, Access, or whatever fits your purpose. I does wonders for the speed and also you won't run into the size limits of Excel. :-)
You can import the data you have now fairly easy.
I am happy as a clam with my postgresql db.

Speed up Excel formulas

Is there anyway to rewrite this formula to speed up Excel processing?
My spreadsheet has become terribly slow!
=SUMPRODUCT((Sheet1!J:J=Sheet2!A2)*(Sheet1!G:G="Windows XP")*(Sheet1!B:B="Desktop")*(Sheet1!M:M<>"Refresh >=Q2 2014")*(Sheet1!M:M<>"Release 2013")*(Sheet1!M:M<>"Release 2014")*(Sheet1!M:M<>"N/A NVM")*(Sheet1!M:M="No")*(Sheet1!M:M="N/A"))
As written your formula will always return zero because the last two conditions are mutually exclusive - did you mean those last two to be <> rather than = (or did you refer to the wrong columns)?
In any case I can see from the use of whole columns that you must be using Excel 2007 or later (your current formula would give an error otherwise) in which case COUNTIFS will be much faster, i.e. assuming the last two conditions should be adjusted as I suggested try this version:
=COUNTIFS(Sheet1!J:J,Sheet2!A2,Sheet1!G:G,"Windows XP",Sheet1!B:B,"Desktop",Sheet1!M:M,"<>Refresh >=Q2 2014",Sheet1!M:M,"<>Release 2013",Sheet1!M:M,"<>Release 2014",Sheet1!M:M,"<>N/A NVM",Sheet1!M:M,"<>No",Sheet1!M:M,"<>N/A")
If you do need to use SUMPRODUCT then restrict the ranges rather than using whole columns
I don't think that there is really a chance to speed up Excel Formula. But you could save your File in binary code (.xlsb). Losing some compatibility but improving performance.
You also could stop auto (re-)calculations of ther Formula, then you have to manualy refresh. This will let you edit the file much smoother.

Resources