How to optimize COUNTIFS with very large data - excel

I would like to create a report that looks like the picture below.
My data has around 500,000 cells, and it will keep growing.
Right now I'm using Excel's COUNTIFS function, but it takes a very long time to calculate (I cannot turn off automatic calculation).
The main value is stored as a date, and the dates span about 3 years, so I need a lot of formulas to cover the whole range of values.
[Image: result]
The picture below shows the data source: the top part cannot be changed, while the bottom part is one I created myself (it can be changed). I use WEEKNUM to convert each date to a week number.
[Image: data]
Is there a better formula, or any other way to make this file faster? All kinds of suggestions are welcome!
I was thinking about using a PivotTable, but I don't know how to build one from this kind of data source.
P.S. VBA is a last resort.
You can download an example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
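Conceptually, a PivotTable makes a single pass over the data and tallies each (category, week) bucket, whereas every COUNTIFS cell rescans the whole range. The Python sketch below illustrates that idea only (it is not Excel code); the names `weeknum_sunday` and `tally_by_week` and the sample rows are hypothetical. The key includes the year because WEEKNUM alone repeats every year, which matters with three years of data.

```python
from collections import Counter
from datetime import date


def weeknum_sunday(d: date) -> int:
    """Week number in the style of Excel's WEEKNUM(d) with the default
    return_type 1: weeks start on Sunday and the week containing
    1 January is week 1."""
    # date.weekday(): Monday=0 ... Sunday=6; shift so that Sunday=0.
    jan1_offset = (date(d.year, 1, 1).weekday() + 1) % 7
    day_of_year = d.timetuple().tm_yday  # 1-based day of the year
    return (day_of_year - 1 + jan1_offset) // 7 + 1


def tally_by_week(rows):
    """Read each (category, date) row once and count it in its
    (category, year, week) bucket, like a PivotTable's Count."""
    counts = Counter()
    for category, d in rows:
        counts[(category, d.year, weeknum_sunday(d))] += 1
    return counts


# Hypothetical sample rows: (category, date)
rows = [
    ("A", date(2024, 1, 1)),
    ("A", date(2024, 1, 6)),
    ("A", date(2024, 1, 7)),
    ("B", date(2024, 1, 2)),
]
report = tally_by_week(rows)
print(report[("A", 2024, 1)], report[("A", 2024, 2)])  # 2 1
```

Each report cell then becomes a lookup into the finished tally instead of another scan over all rows.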

I'll post this answer with the disclaimer that the results depend entirely on the size of the data set. Turning automatic calculation off and back on is the best approach, but your question rules that out, so keep reading.
Your question made me curious, so I tried it and timed it. I set up two columns of over 100,000 random numbers between 1 and 1000 and then counted the rows matching a value in each column. I wrote a macro that turns off automatic calculation, inserts the start time, recalculates, and then inserts the finish time. I highlighted the time difference in yellow.
First I tried your way, COUNTIFS with two criteria:
Then I tried concatenating the two columns to see whether a single COUNTIF criterion over a single range would be faster. It wasn't; see the result below:
Finally, realizing what was going on, I made the criterion match only the FIRST character of the number being looked up. This essentially reduced the number of characters to check per cell, and it had a positive result. See below:
So my suggestion is to shorten the values you compare in any way possible. You are mostly looking at dates, so you may have to get creative, but this seems to be the best option short of switching to manual calculation.
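If you try a concatenated helper column like the one in the second experiment, note that joining the parts without a separator can create false matches: 1&23 and 12&3 both give "123". This small Python sketch (function names and data are made up for illustration) shows the difference a delimiter makes compared with true two-criteria counting:

```python
def count_two_criteria(col_a, col_b, x, y):
    """Like COUNTIFS(A, x, B, y): both criteria must match on the same row."""
    return sum(1 for a, b in zip(col_a, col_b) if a == x and b == y)


def count_concat(col_a, col_b, x, y, sep=""):
    """Like COUNTIF on a helper column A&sep&B with the criterion x&sep&y."""
    target = f"{x}{sep}{y}"
    return sum(1 for a, b in zip(col_a, col_b) if f"{a}{sep}{b}" == target)


# Hypothetical data: rows (1, 23) and (12, 3) both concatenate to "123"
col_a = [1, 12, 5]
col_b = [23, 3, 7]
print(count_two_criteria(col_a, col_b, 1, 23))     # 1
print(count_concat(col_a, col_b, 1, 23))           # 2 (false match)
print(count_concat(col_a, col_b, 1, 23, sep="|"))  # 1
```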

I have worked with Excel sheets of a similar size. Especially if you use the data on a regular basis, I heartily recommend switching to a proper database (SQL-based, Access, or whatever fits your purpose). It does wonders for speed, and you won't run into Excel's size limits either. :-)
You can import the data you have now fairly easily.
I am happy as a clam with my PostgreSQL database.
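As a minimal sketch of the database route, here is one GROUP BY query run against SQLite through Python's built-in `sqlite3` module, chosen only because it needs no server; the table and column names are hypothetical. The same GROUP BY pattern applies in PostgreSQL or Access with their own date functions. Note that SQLite's `%W` counts Monday-based weeks, so it will not always agree with Excel's default WEEKNUM.

```python
import sqlite3

# In-memory database with a hypothetical events table (category, ISO date text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events (category, event_date) VALUES (?, ?)",
    [
        ("A", "2024-01-01"),
        ("A", "2024-01-06"),
        ("A", "2024-01-07"),
        ("B", "2024-01-02"),
    ],
)
# Optional: an index can speed up filtering by category and date on large tables.
conn.execute("CREATE INDEX idx_events_cat_date ON events (category, event_date)")

# One GROUP BY query replaces a grid of COUNTIFS cells.
# %W is SQLite's Monday-based week number, not Excel's WEEKNUM.
rows = conn.execute(
    """
    SELECT category, strftime('%Y-%W', event_date) AS week, COUNT(*) AS n
    FROM events
    GROUP BY category, week
    ORDER BY category, week
    """
).fetchall()
print(rows)  # [('A', '2024-01', 3), ('B', '2024-01', 1)]
conn.close()
```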

Related

sum up days using only existing Excel formulas

I would like to sum up date periods, totaling the days per item.
The input data will grow over time and new item categories can appear, so the items (number of rows) shown in the expected report cannot be hardcoded.
The input parameters are the from and to dates that determine the period to consider. You can imagine this as a moving date window over the input data grid.
I am a Java programmer, and I am sure I could write a proper SQL query that groups and sums the data and generates the result. I could also write a Java program that does the job, but I really want to do this calculation in Excel.
Is there any way to generate the report by using only a combination of existing MS Excel formulas without writing any Visual Basic code (macro)?
If yes, then could you please put me in the right direction and tell me which formulas I can use? Then I can figure out how to use the formulas.
I hope this helps explain what I would like to get:
Try:
Formula in F3:
=SUM(COUNTIFS(C:C,E3,A:A,"<="&SEQUENCE(H$2-F$2,,F$2),B:B,">="&SEQUENCE(H$2-F$2,,F$2)))
Note that whole-column references such as A:A can take a long time to process on large data. The formula above works even with overlapping dates.
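To make the logic of that formula explicit: SEQUENCE(H$2-F$2,,F$2) produces H$2-F$2 consecutive dates starting at F$2, COUNTIFS counts, for each of those dates, the item's periods that cover it, and SUM adds up those day counts. Because the sequence stops at H$2-1, the to-date itself is not counted; SEQUENCE(H$2-F$2+1,,F$2) would include it. A Python mirror of the same computation (the function name and sample rows are hypothetical):

```python
from datetime import date, timedelta


def days_in_window(rows, item, window_start, window_end):
    """Mirror of =SUM(COUNTIFS(C:C,item,A:A,"<="&d,B:B,">="&d)) where
    d = SEQUENCE(window_end-window_start,,window_start), i.e. the days
    window_start .. window_end-1 (the end date itself is not included)."""
    total = 0
    for i in range((window_end - window_start).days):
        d = window_start + timedelta(days=i)
        # COUNTIFS part: periods of this item whose [start, end] covers day d
        total += sum(1 for start, end, it in rows if it == item and start <= d <= end)
    return total


# Hypothetical rows: (start date in A, end date in B, item in C)
rows = [
    (date(2024, 1, 1), date(2024, 1, 10), "x"),
    (date(2024, 1, 5), date(2024, 1, 6), "x"),  # overlaps the first period
    (date(2024, 1, 3), date(2024, 1, 4), "y"),
]
print(days_in_window(rows, "x", date(2024, 1, 4), date(2024, 1, 8)))  # 6
```

Overlapping periods are each counted, which is why the formula still gives a total with overlapping dates.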

Excel - Return name from list based on multiple criteria

This is my first post here (and I'm not allowed to paste images). I have been trying to solve this for a couple of days with no luck. I'm working on an Excel spreadsheet for a game and cannot return a name based on multiple criteria. See below:
[Image: table]
I am trying to return, for example, the name of the Guardian with the highest amount of games played.
I've tried INDEX/MATCH/SUMPRODUCT combinations, but I can't figure this one out. Can you help me?
=INDEX(Data!$A:$H,MATCH(1,(Data!B:B=Overview!B12)*(Data!C:C=Overview!B23),0),1)
=MAX(IF(Data!B:B=Overview!B12,Data!C:C))
I think that if I could combine these two formulas, I might be able to make it work.
Try this array formula:
=INDEX(Data!$A$1:$A$99,MATCH(MAX(Data!$C$1:$C$99*(Data!$B$1:$B$99=B12)),Data!$C$1:$C$99*(Data!$B$1:$B$99=B12),0))
Confirm with Ctrl+Shift+Enter.
Avoid full-column references in array formulas: they make Excel compute huge arrays, which slows the formulas down. I limited the ranges here to 99 rows; use a limit big enough to span your data.
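The trick in that formula is that multiplying the games column by the TRUE/FALSE test zeroes out every row of the wrong class, so MAX only sees matching rows, and MATCH(...,0) then finds the first row holding that maximum. A Python sketch of the same steps, using a hypothetical name/class/games layout since the real sheet isn't shown:

```python
def name_with_most_games(names, classes, games, wanted_class):
    """Mirror of the array formula
    INDEX(names, MATCH(MAX(games*(classes=wanted)), games*(classes=wanted), 0))."""
    # games*(classes=wanted): rows of other classes become 0
    masked = [g * (c == wanted_class) for c, g in zip(classes, games)]
    best = max(masked)
    # MATCH(..., 0) returns the first position holding the maximum
    return names[masked.index(best)]


# Hypothetical layout: A = name, B = class, C = games played
names = ["Ana", "Bo", "Cy", "Di"]
classes = ["Titan", "Hunter", "Hunter", "Warlock"]
games = [50, 12, 30, 99]
print(name_with_most_games(names, classes, games, "Hunter"))  # Cy
```

As in the Excel version, if no row matches the class, every masked value is 0 and the first row's name comes back, so guard against that case if it can happen.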

MS Excel IF statement confusion

I'm trying to make a spreadsheet for some calendars that I'm selling. My pricing scheme depends on how many calendars a customer buys. I want to keep track of sales without having to enter the price per unit for each sale. I set up an IF statement that seems to work except for the last condition. Here's a screenshot:
As you can see, it works all the way up to the last one. Once the number reaches 11, the price per unit should drop to 6, but it doesn't! I know it must be a simple fix, but I don't know much about IF statements, so I'm stuck. Please help!
If you set up your price table similar to my setup below, you can use a VLOOKUP formula to make it more flexible.
The formula in H10 is:
=VLOOKUP(G10,$B$1:$C$5,2,TRUE)
Enter and drag down.
The benefit of this approach is that you can change a lower-bound count in the left column and the price adjusts without any change to the formula. Try changing the 2 in the Count column to 3 and you'll see the adjustment right away.
Another benefit is that you can add more rows to the table for further pricing brackets (extend the $B$1:$C$5 range in the formula to cover them).
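With TRUE as the last argument, VLOOKUP does an approximate match on a sorted first column: it returns the row with the largest value that is less than or equal to the lookup value. The following Python sketch does the same with `bisect`; the bracket table is an assumption reconstructed from the prices in the question (1 → 9, 2-5 → 8, 6-10 → 7, 11+ → 6), not the answerer's actual sheet.

```python
from bisect import bisect_right

# Assumed price table, sorted ascending like VLOOKUP's first column:
# lower bound of each bracket -> unit price
LOWER_BOUNDS = [1, 2, 6, 11]
PRICES = [9, 8, 7, 6]


def unit_price(count):
    """Approximate match in the style of VLOOKUP(count, table, 2, TRUE):
    use the row with the largest lower bound that is <= count."""
    i = bisect_right(LOWER_BOUNDS, count) - 1
    if i < 0:
        # Below the first bound; VLOOKUP would return #N/A here
        raise ValueError(f"no price bracket for count {count}")
    return PRICES[i]


print([unit_price(n) for n in (1, 2, 5, 6, 10, 11, 40)])  # [9, 8, 8, 7, 7, 6, 6]
```

Adding a bracket means adding one entry to each list, just as adding a row to the sheet table does.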
If you must use an IF statement, try:
=IF(G10=1,9,IF(G10<=5,8,IF(AND(G10>5, G10<11),7,IF(G10>=11,6,IF(G10="","")))))
The problem with yours is that it never reaches the IF(G10>=11 part, because any value of 11 or more has already satisfied IF(G10>5.
If it should have no output when G10 is blank, use the following variant:
=IF(ISBLANK(G10),"",IF(G10=1,9,IF(G10<=5,8,IF(AND(G10>5, G10<11),7,IF(G10>=11,6,IF(G10="",""))))))
I agree with @Nanashi that putting the price thresholds in a separate table and using VLOOKUP is a better solution. That would let you change the price points without having to edit multiple formulae.
But to just get your formula working, try this: =IF(G10="","",IF(G10=1,9,IF(G10<=5,8,IF(G10<=10,7,6))))
Notice that:
- You don't need (and probably don't want) quotes around your numeric values.
- The order of tests is important. In your original, the test for G10>=11 is never reached because it is in the else part of the G10>5 test, and G10>5 is already true for any value of 11 or more.
A properly constructed table of quantities and pricing coupled with a VLOOKUP formula would be the best way to go and eases future pricing changes. But for the sake of diversity, your straightforward pricing structure could also be handled by the following:
=(6+(G10<11)+(G10<6)+(G10<2))*(G10>0)
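That formula works because each TRUE comparison counts as 1 in arithmetic: the base price of 6 gets +1 for fewer than 11, +1 for fewer than 6 and +1 for fewer than 2, and the final (G10>0) factor turns a blank or zero count into 0. A quick Python check (function names are illustrative) compares it with a correctly ordered IF chain for counts 1 to 100:

```python
def price_by_if_chain(n):
    """Correctly ordered nested IF: 1 -> 9, 2-5 -> 8, 6-10 -> 7, 11+ -> 6."""
    if n == 1:
        return 9
    if n <= 5:
        return 8
    if n <= 10:
        return 7
    return 6


def price_by_boolean_sum(n):
    """Mirror of =(6+(G10<11)+(G10<6)+(G10<2))*(G10>0): each True adds 1."""
    return (6 + (n < 11) + (n < 6) + (n < 2)) * (n > 0)


print(all(price_by_if_chain(n) == price_by_boolean_sum(n) for n in range(1, 101)))  # True
print(price_by_boolean_sum(0))  # 0, as for a blank G10
```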
In your formula, "G10>5" should be "G10<=10". Otherwise, when the count gets to 11, it still meets the G10>5 criterion.

Improve Vlookup on large file

I have a very large file that I've reduced as much as possible, to 3 columns and 80k rows.
I need to perform a VLOOKUP to bring back values from column 1 or 2 that match values in some other spreadsheets.
The thing is, Excel doesn't seem to support searches this large, and it stops responding, even though the computer has 4 GB of RAM and a quad-core CPU, with not much else running at the same time.
As far as I understand, since I'm not looking for exact matches, I should not use INDEX/MATCH.
The only thing I thought might help, though I'm not sure, is splitting the file into 2-4 parts and asking Excel to run several parallel searches instead of one big one. Could this work?
What else should I try?
Thanks!!!
Sort your data by the lookup column and use TRUE as the 4th VLOOKUP argument. This makes VLOOKUP use a binary search rather than a linear search, which is lightning fast.
If some lookup values may be missing from the table, you will need the double VLOOKUP trick; see
http://fastexcel.wordpress.com/2012/03/29/vlookup-tricks-why-2-vlookups-are-better-than-1-vlookup/
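Roughly, the double VLOOKUP pattern is =IF(VLOOKUP(key,table,1,TRUE)=key,VLOOKUP(key,table,2,TRUE),"not found"): the first fast approximate lookup checks whether the key really exists, and only then does the second one return the value. Without that check, an approximate lookup for a missing key silently returns the row just below it. Here is a Python sketch of the same idea on sorted keys using `bisect`; the function name and data are hypothetical.

```python
from bisect import bisect_right


def double_vlookup(sorted_keys, values, key, if_missing=None):
    """Sketch of
    =IF(VLOOKUP(key,tbl,1,TRUE)=key, VLOOKUP(key,tbl,2,TRUE), if_missing)
    on keys sorted ascending. Excel runs two binary-search lookups; here a
    single bisect result is reused for both steps."""
    i = bisect_right(sorted_keys, key) - 1  # approximate match: last key <= key
    if i >= 0 and sorted_keys[i] == key:    # step 1: is it an exact hit?
        return values[i]                    # step 2: return the matching value
    # Key absent. (For a key below the first key, Excel's version would give
    # #N/A rather than if_missing unless wrapped in IFERROR.)
    return if_missing


# Hypothetical sorted table
keys = [10, 20, 30, 40]
vals = ["a", "b", "c", "d"]
print(double_vlookup(keys, vals, 30), double_vlookup(keys, vals, 25))  # c None
```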

How can this lookup (find the last relevant item) be improved?

One of the reports that wastes a bunch of my time at work is the Roster. It's a multi-site, multi-contract listing of every employee currently assigned to a specific client. Currently, it has a little over 6,000 lines by 20-something columns, indexed against 3 different datasets. Not the largest mess in the world, but still a pain. And it's almost all in Excel, because I somehow don't have a business case for Access.
But one part of this monster stands apart. One tab per site, Site Totals, lists every time any agent has gone through training. A second tab (again, one per site), Site Data, displays only the most recent training class and the credentials the agent had during that class.
That second tab is driven by variations of this array formula - Last_Row is a named range on another tab, and column A is a pivot of the UID column on Site Totals. I've broken it apart for readability:
=IF(INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1))="Trainer",
"",
INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1)))
I know what this formula does, but I don't know how to improve it. It needs to change because it currently runs on the order of 500 million calculations (I'm not allowed to delete historical data), and the workbook takes 3 hours to calculate ... if it doesn't crash Excel first.
I'm open to VBA and/or custom functions, but would prefer native Excel functions. I can't install anything, so any solution must be native Excel and must be compatible with Excel 2007.
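The expensive part of that formula is that every row on Site Data rebuilds and scans full-length arrays from Site Totals, so the work grows with (number of UIDs) × (number of rows). The underlying task, finding the last non-blank training per UID, can be done in a single pass over the rows. The Python sketch below only illustrates that algorithmic idea and is not an Excel 2007 solution; the `(uid, training)` row layout and the sample data are hypothetical.

```python
def latest_training(rows):
    """rows: (uid, training) pairs in sheet order, oldest first.
    Returns {uid: most recent non-blank training}, with "Trainer"
    shown as "" like the IF(...="Trainer","",...) wrapper."""
    last = {}
    for uid, training in rows:
        if training != "":         # mirrors the (B<>"") test
            last[uid] = training   # a later row overwrites an earlier one
    return {uid: ("" if t == "Trainer" else t) for uid, t in last.items()}


# Hypothetical Site Totals rows: (UID in column A, training in column B)
site_totals = [
    ("u1", "Class 101"),
    ("u2", "Class 101"),
    ("u1", ""),
    ("u1", "Class 205"),
    ("u2", "Trainer"),
]
result = latest_training(site_totals)
print(result)  # {'u1': 'Class 205', 'u2': ''}
```

Each row is visited once, so the cost grows with the number of rows rather than with rows × UIDs.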
If your source is a pivot table, try the GETPIVOTDATA function. You might be able to accomplish what you want without INDIRECT and INDEX.
As I understand it, each person may or may not have attended a training. You want to retrieve the name of that training, and if they haven't attended one, you want the cell to be blank. If that description is correct, try this formula, entered with Ctrl+Shift+Enter:
=IFERROR(INDEX('Site Totals'!B$1:B$12,MATCH(A2&"Trainer",'Site Totals'!A$1:A$12&'Site Totals'!B$1:B$12)),"")
Here A2 contains the person's name. I can make this formula more precise if you provide some sample data, but I recommend not using the entire A and B columns of the Site Totals worksheet, as that will definitely slow down calculation. Use B1:B8000 or a smaller range instead to speed things up. Hope that helps.
