SumIf with lots of data in Excel

What I'm trying to do is a simple SUMIF over about 200k rows of data, which causes problems for Excel.
Basically, my list looks like this:
List of Companies   Dummy1   Dummy2
Company A           0        1
Company A           0        1
Company A           1        1
Company B           1        1
Company B           0        1
Company B           0        1
....
If there is a 1 in any row of column B for a specific company, I need to put a 1 in every row of column C for that company.
So Dummy2 is basically the sum of Dummy1 over all entries for a specific company.
The data is already sorted by column A.
Anyway, Excel goes crazy.
Is what I'm doing just plain stupid because I'm generating too many comparison operations?
What would be an easy way to accomplish this?

According to your sample data, filling C2:C200000 with,
=SUMIF(A:A, A2, B:B)
... will be performing 3× as many SUMIF calculations as necessary, since each company appears three times. An IF function only evaluates the branch (TRUE or FALSE) that its condition selects, so changing the formula to something like the following,
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)
... should drastically reduce the processing in a calculation cycle. The degree of improvement depends on how many duplicate company values are in column A and whether column A has been sorted to keep the company names together. The fewer unique companies there are, the more improvement you will see. In short, unless the company changes from one row to the next, the SUMIF is not calculated.
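To see why this helps, here is a minimal Python sketch (a model of the two formulas, not Excel itself; the function names are hypothetical) that counts how many full-column scans each approach performs on the sample data. It assumes column A is sorted, so each company's rows form one contiguous run.

```python
# Sketch only: models column C computed two ways over columns A (company) and B (Dummy1).


def naive_sumif_column(companies, dummy1):
    """Every row runs its own SUMIF(A:A, A[i], B:B): one full scan per row."""
    scans = 0
    result = []
    for company in companies:
        # A full pass over columns A:B for this row's company
        total = sum(d for c, d in zip(companies, dummy1) if c == company)
        scans += 1
        result.append(total)
    return result, scans


def run_aware_column(companies, dummy1):
    """IF(A[i]<>A[i-1], SUMIF(...), C[i-1]): one scan per run of equal names.

    Assumes the data is sorted by company, as in the question.
    """
    scans = 0
    result = []
    for i, company in enumerate(companies):
        if i == 0 or company != companies[i - 1]:
            # Company changed (or first data row, whose neighbour above is the header)
            total = sum(d for c, d in zip(companies, dummy1) if c == company)
            scans += 1
        else:
            # Same company as the row above: reuse the value already in column C
            total = result[-1]
        result.append(total)
    return result, scans


companies = ["Company A"] * 3 + ["Company B"] * 3
dummy1 = [0, 0, 1, 1, 0, 0]
print(naive_sumif_column(companies, dummy1))  # ([1, 1, 1, 1, 1, 1], 6)
print(run_aware_column(companies, dummy1))    # ([1, 1, 1, 1, 1, 1], 2)
```

Both give the same column C, but with three rows per company the IF version scans the data only twice instead of six times.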
Sample Calculation Timing Environment:
Excel 2010 64-bit (14.0.7015.1000) running under Windows 7 Pro on a business-class i5 laptop with 8 GB of DRAM.
XLSB; Calculation Manual; Recalculate workbook before saving OFF; Save AutoRecovery information OFF
Test 1: 26 companies (Company A to Company Z), each with ~7683 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.
Formula                                  Calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                     00:21:44
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)     00:00:09
Test 2: 5000 companies (Company 0001 to Company 5000), each with ~40 entries in column A, sorted. Column B random 0's and 1's reverted to values. C2:C200000 cleared, worksheet calculated then formula filled in C2:C200000 and new calculation cycle timed to completion.
Formula                                  Calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                     00:22:10
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)     00:00:37
      
You cannot magically break the physical laws of time and space, but sometimes you can fool them. This solution may not be perfect, but perhaps it is something you can live with.
On a related note, larger worksheets benefit from having their formulas reverted to their result values once the calculations have been made, if those results are not likely to change regularly. While Copy, Paste Special, Values is a reasonably quick way to do this, selecting a large number of cells containing formulas and running the following Sub procedure is lightning quick.
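Sub sel_2_Value()
    ' Stop event handlers (e.g. Worksheet_Change) from firing during the write
    Application.EnableEvents = False
    ' Overwrite each selected formula with its current result
    ' (intended for a single contiguous selection)
    Selection.Value = Selection.Value
    Application.EnableEvents = True
End Sub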
If currency and date handling is not important, then Selection.Value = Selection.Value2 is even better; unlike Value, Value2 does not use the Currency and Date data types.
The only thing that will slow the above operation down is formulas in the reverted range that have dependents, since those dependents will be recalculated.

I think the better way to solve this is with a pivot table: you can sum Dummy1 by company and get the data as a summary.
Here is an example:
http://www.excel-easy.com/data-analysis/pivot-tables.html
I hope this helps.
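For comparison, a pivot table boils down to a single pass that accumulates one total per company. A rough Python analogue follows (the helper names are hypothetical; the data is the question's sample). The optional second function writes each total back onto every row, which is what the Dummy2 column needs and which a pivot table does not do by itself.

```python
from collections import defaultdict


def pivot_sum(companies, values):
    """One pass over the data, like 'Sum of Dummy1' with companies as row labels."""
    totals = defaultdict(int)
    for company, value in zip(companies, values):
        totals[company] += value
    return dict(totals)


def fill_back(companies, totals):
    """Optional: copy each company's total onto every one of its rows (Dummy2)."""
    return [totals[company] for company in companies]


companies = ["Company A"] * 3 + ["Company B"] * 3
dummy1 = [0, 0, 1, 1, 0, 0]
totals = pivot_sum(companies, dummy1)
print(totals)                        # {'Company A': 1, 'Company B': 1}
print(fill_back(companies, totals))  # [1, 1, 1, 1, 1, 1]
```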

Related

How to use VLOOKUP function in MS Excel

I am very, very new to Excel.
I have two sheets.
Sheet 1:
Country   PMU             Cluster
A         Asia            Mercury
B         Australia       Venus
C         North America   Jupiter
All the countries and continents are unique here.
In Sheet 2 I have:
CountryCode   Country   PMU   Cluster
123           A
234           A
453           B
235           C
One country can have multiple codes.
I have to take the PMU and Cluster and merge them into Sheet 2; Sheet 2 will have the additional CountryCode column.
Any help is very much appreciated.
Replacing my answer per your edits.
I'm just doing this on a single sheet but you can easily adapt by pointing to your other sheet for your lookup array.
Here is the formula for cell G2:
=VLOOKUP($F2,$A:$C,2,FALSE)
Here is the formula for cell H2:
=VLOOKUP($F2,$A:$C,3,FALSE)
Drag your formulas down and you're done. VLOOKUP formulas are very useful, and I recommend looking up how they work, as others could explain them better than I can. Basically, you look up a value (column F) in the first column of an array (columns A, B, C) and, for a match, return the value from the column at the given index within that array (A = 1, B = 2, C = 3, etc.). Lastly, you choose whether you are looking for an approximate (TRUE) or exact (FALSE) match. Almost always use FALSE.
Also, look up cell references and how to lock them (i.e., how the $ sign works). That way you can easily drag formulas across and down while keeping your lookup value and array the same.
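To make the exact-match behaviour concrete, here is a rough Python model of VLOOKUP(value, table, col_index, FALSE) applied to the two sheets from the question. The function and variable names are illustrative only, and unlike Excel the text comparison here is case-sensitive.

```python
def vlookup_exact(lookup_value, table, col_index):
    """Model of VLOOKUP(lookup_value, table, col_index, FALSE).

    Scans the first column of `table` top to bottom and returns the item in
    column `col_index` (1-based, so 1 is the lookup column itself) of the
    first matching row. Raises LookupError where Excel would show #N/A.
    """
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    raise LookupError("#N/A")


# Sheet 1: Country, PMU, Cluster (each country appears once)
sheet1 = [
    ("A", "Asia", "Mercury"),
    ("B", "Australia", "Venus"),
    ("C", "North America", "Jupiter"),
]
# Sheet 2: CountryCode, Country (a country can have several codes)
sheet2 = [(123, "A"), (234, "A"), (453, "B"), (235, "C")]

merged = [
    (code, country,
     vlookup_exact(country, sheet1, 2),   # like =VLOOKUP($F2,$A:$C,2,FALSE)
     vlookup_exact(country, sheet1, 3))   # like =VLOOKUP($F2,$A:$C,3,FALSE)
    for code, country in sheet2
]
for row in merged:
    print(row)
```

Because the lookup runs once per Sheet 2 row, countries with several codes simply get the same PMU and Cluster repeated.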

Excel Complex Conditional

OK, so I did this in LibreOffice, but now I have to duplicate it in Excel for my payroll department, since they use Excel. So I have to figure out how to convert the formulas to Excel. This is one of the two totaling formulas that did not convert when I saved the file in Excel format.
I have the following sheet called DailyReport
I am currently calculating Column M with =SUMPRODUCT(A2:A200=A2, G2:G200)
Then I have a second sheet called WeeklyReport.
Now, what I want to do is this: if WeeklyReport cell A2 equals the value in DailyReport column A, take the date in DailyReport column B and test whether it falls within the date range in WeeklyReport columns B and C with =IF(AND(DailyReport.B2>=B2,DailyReport.B2<=C2),1, 0). If that is true, add the total daily hours from DailyReport column M to the total in WeeklyReport column D.
I hope this is clear enough; if not, please let me know what else I can do to make my question clearer.
Thanks in advance!
So, to me it sounds like:
You want the sum of all hours for a specific employee (defined by the column A value in the weekly report) between the dates specified (also defined by the weekly report, columns B and C), and you want the result in WeeklyReport column D, on the same row?
SUMPRODUCT will do the trick. I am renaming your sheets to DR and WR for my sanity's sake.
=SUMPRODUCT((DR!G$2:G$200)*(DR!A$2:A$200=A2)*(DR!B$2:B$200>=B2)*(DR!B$2:B$200<=C2))
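The formula works because each comparison yields TRUE or FALSE, which multiply as 1 and 0, so only rows matching the employee and falling inside the week keep their hours. A small Python sketch of the same idea follows; the employee names and dates are made up, and it uses an inclusive end date to match the <= test in the question.

```python
from datetime import date


def weekly_hours(daily_rows, employee, week_start, week_end):
    """Sum of hours * (name matches) * (date >= start) * (date <= end).

    daily_rows holds (employee, work_date, hours) tuples, standing in for
    DailyReport columns A, B and G. Python booleans multiply as 1 and 0,
    just like TRUE and FALSE inside SUMPRODUCT.
    """
    return sum(
        hours * (name == employee) * (work_date >= week_start) * (work_date <= week_end)
        for name, work_date, hours in daily_rows
    )


# Hypothetical DailyReport rows
daily = [
    ("Ann", date(2016, 3, 7), 8),
    ("Ann", date(2016, 3, 8), 7.5),
    ("Bob", date(2016, 3, 7), 6),
    ("Ann", date(2016, 3, 14), 8),   # next week, so excluded below
]
print(weekly_hours(daily, "Ann", date(2016, 3, 7), date(2016, 3, 13)))  # 15.5
```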
Now, if you want a new daily report sheet every day, it gets a bit trickier to do with formulas alone. You would then need a macro that stores the "current" value and adds the "new" value; or, for simplicity's sake, more columns (one for each working day) with the formula duplicated in every daily column; or as many named daily reports as there are working days in a week, with the formula extended to check multiple sheets. I would add columns: it is the least amount of work, and the dumbest solution often proves the most resilient.
Did that help in any way?

Excel: Obtain a column by sorting another column's values

I need to automatically obtain a sorted column of values from another given column values, like in the sample:
I have column A. I need A unchanged, and also B obtained from A:
I have   I need
A        A   B
-----------------
1        1   0
0        0   0
3        3   1
8        8   3
0        0   8
That is, if the values in A change, B should change accordingly.
Is that possible in MS Excel?
Here is a sandbox and sample:
http://1drv.ms/1SkqMhS
If you put the formula =SMALL(A:A,ROW()) in B1 and copy it down, the cells in B will be linked to the cells in A in such a way that the numbers in B are the numbers in A in sorted order. This won't be efficient for larger ranges but will work fine for small to medium-sized ranges.
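The idea is simply that the k-th smallest value, taken for k = 1, 2, ..., n, lists the column in ascending order. Here is a Python sketch of that (the helper names are hypothetical; like SMALL, it skips non-numeric cells such as a text header). It also shows why the approach gets slow: every one of the n cells does its own pass over the data.

```python
import heapq


def numeric_cells(values):
    """Keep only numbers, the way SMALL skips text and logical values."""
    return [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


def small(values, k):
    """Model of SMALL(values, k): the k-th smallest number, k starting at 1."""
    numbers = numeric_cells(values)
    if not 1 <= k <= len(numbers):
        raise ValueError("#NUM!")  # Excel's error for an out-of-range k
    # Each call re-reads the whole column, like every SMALL cell does
    return heapq.nsmallest(k, numbers)[-1]


def sorted_column(values):
    """Fill B1, B2, ... with SMALL(A:A, ROW()), i.e. k = 1, 2, ..., n."""
    count = len(numeric_cells(values))
    return [small(values, k) for k in range(1, count + 1)]


column_a = [1, 0, 3, 8, 0]
print(sorted_column(column_a))  # [0, 0, 1, 3, 8]
```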
If you want the numbers to start in a lower row, say B2 because you have a header in B1, adjust ROW() to something like ROW()-1.
A word of warning: using ROW() can make a spreadsheet somewhat fragile, because formulas that involve it can change their meaning if rows are inserted or deleted or if the block containing the formula is moved elsewhere. Rather than using ROW(), there is something to be said for adding a helper column that numbers the data (the data from A would then be in, e.g., B) and referring to these numbers instead of ROW(). For example, with the helper numbers in column A and the data in B2:B5, if I put the formula
=SMALL($B$2:$B$5,A2)
in C2 and copy it down, it works as intended. In response to a question you raised in the comments, I added yet another column that gives the index at which the corresponding value occurs. To do this, I wrote the following formula in D2 (then copied it down):
=MATCH(C2,$B$2:$B$5,0)
Of course. Highlight your range and, on the Data tab, click "Sort"; then you can choose how you want to sort your data.
If column B has information that belongs with column A (e.g., next to A1 is "Car") and you want to sort the whole table based on column A, then just select columns A and B and sort by column A.
Found the answer, thanks to John Coleman!
Just some minor details, like fixing cell references with $ (as in A$2) and the ROW()-1 adjustment for the one header row!

Index/Match Multiple Criteria, Perform Calculation for Each Match

I have two columns in a table that are known as S and D, where S is a date, and D is a duration.
E.g.:
S          D
January    60
February   30
March      45
April      30
On a separate sheet, imagine that Row 1 contains a sequence of dates (variable, depends on user menu selection).
Row 2 requires the following calculation:
[(x - s1)/d1 + (x-s2)/d2 + ... + (x-sn)/dn] / n
...where "x" is any date along Row 1.
The calculation would only be done when multiple criteria are matched.
My initial attempt involved creating a separate table, but I think this can be done in a one-cell formula in Row 2. I don't think a sum(index(match)) type would work here considering d1, d2, ..., dn are denominators with different values.
Here is an example attempt:
=SUM((SelectedDate-INDEX(Table[StartDates],MATCH(Criteria1&Criteria2,Range1&Range2,0))/INDEX(Table[Durations],MATCH(Criteria1&Criteria2,Range1&Range2,0))))
It may be important to note that I am able to do this in a two-step fashion. First, I create a table that does the calculation on each row. Then, I reference the table. It would be nice if I can eliminate the need of a "calculation table" in favour of an array-type formula.
I took a guess before most of the comments were posted and suggest the following mainly in case it provides any ideas or helps in specifying the requirement more closely. This is "two-step" because the average formula is separate from the formula for each column, so I appreciate it may well not be what is required.
Assuming S and D are labelled data in Sheet2 columns A:B.
Assuming the Row 1 data rows are labelled in column A.
Assuming the dates are not strings (e.g. January is actually Jan 1), the differences are in days and always positive, etc.
=(B$1-CHOOSE(COLUMN(),Sheet2!$A1,Sheet2!$A2,Sheet2!$A3,Sheet2!$A4))/CHOOSE(COLUMN(),Sheet2!$B1,Sheet2!$B2,Sheet2!$B3,Sheet2!$B4)
in Row 2 to suit and, if there are four data columns, in F2 (copied down to suit):
=AVERAGE(B2:E2)
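For reference, the calculation asked for in the question, [(x - s1)/d1 + ... + (x - sn)/dn] / n over the rows that meet the criteria, can be sketched in Python as below. The "team" field and the lambda are hypothetical stand-ins for Criteria1/Criteria2, the year 2015 is chosen purely for illustration, and the differences are in days, as assumed above.

```python
from datetime import date


def average_progress(x, rows, criteria):
    """[(x - s1)/d1 + ... + (x - sn)/dn] / n over the rows where criteria(row) is true.

    Each row is a dict with 'start' (a date, column S) and 'duration'
    (days, column D); `criteria` is any predicate on a row.
    """
    terms = [(x - row["start"]).days / row["duration"]
             for row in rows if criteria(row)]
    if not terms:
        raise ValueError("no matching rows")  # Excel would give #DIV/0!
    return sum(terms) / len(terms)


rows = [
    {"start": date(2015, 1, 1), "duration": 60, "team": "X"},
    {"start": date(2015, 2, 1), "duration": 30, "team": "Y"},
    {"start": date(2015, 3, 1), "duration": 45, "team": "Y"},
    {"start": date(2015, 4, 1), "duration": 30, "team": "X"},
]
x = date(2015, 5, 1)
# Team X: (120/60 + 30/30) / 2 = 1.5
print(average_progress(x, rows, lambda row: row["team"] == "X"))
```

Each duration stays paired with its own start date inside one term, which is the pairing the separate per-column formulas above preserve.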

If column A matches column C input column D into column B

I have a sales tracking sheet where column A contains the profit margin of a particular job (e.g. 33%), column C is the profit margin range (e.g. 31-40%), and column D is the commission corresponding to the range in column C (e.g. 31-40% = 3% commission).
What I want is a formula that will automatically pull the Commission from Column D into Column B when I enter the profit margin of that particular job in Column A.
Any ideas/does that make sense?
Assuming that the values in column A are formatted as percentage, you could use something like this:
=INDEX(D$1:D$10,MATCH(A1*100,1*LEFT(C$1:C$10,FIND("-",C$1:C$10)-1),1))
And press Ctrl+Shift+Enter after entering the formula instead of Enter alone.
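This returns the value from D$1:D$10 on the row whose margin range has the largest lower bound that is less than or equal to the value in A1 (multiplied by 100 to turn the percentage into a whole number). For this approximate match to work, the ranges in C$1:C$10 must be in ascending order.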
Change the ranges accordingly.
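A Python sketch of what that array formula does: take the number before the "-" in each range label as a lower bound, then pick the last bound that does not exceed the margin (MATCH with match type 1 finds the largest value less than or equal to the lookup value). The range labels and commission rates other than 31-40% = 3% are made up for illustration.

```python
from bisect import bisect_right


def commission_for(margin, ranges, commissions):
    """Model of INDEX(D, MATCH(A1*100, lower_bounds, 1)).

    margin: the job's profit margin as a fraction (0.33 for 33%).
    ranges: labels like "31-40%", sorted ascending, as in column C.
    commissions: the matching column D values.
    """
    # LEFT(C, FIND("-", C) - 1) then *1: the number before the dash
    lower_bounds = [float(label.split("-")[0]) for label in ranges]
    # Position of the largest lower bound <= margin*100 (MATCH type 1)
    pos = bisect_right(lower_bounds, margin * 100)
    if pos == 0:
        raise LookupError("#N/A")  # margin is below every lower bound
    return commissions[pos - 1]


ranges = ["0-10%", "11-20%", "21-30%", "31-40%"]
commissions = [0.0, 0.01, 0.02, 0.03]   # hypothetical except 31-40% = 3%
print(commission_for(0.33, ranges, commissions))  # 0.03
```

Because only the lower bounds are compared, a margin such as 10.5% lands in the "0-10%" bracket, which is also how the Excel formula behaves.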
In B1, put:
=IF(A1=C1,D1,0)
You can obviously change row numbers to work as needed.
The IF statement has 3 parts, the condition:
=IF(A1=C1,
Here I'm testing to see if the expression is TRUE or FALSE. I can do anything I want here, as long as it evaluates to either a True or False condition.
Next, we specify the "true" result and the "false" result, i.e., what happens when the condition is TRUE and when it is FALSE, respectively. For the TRUE condition, we just want to use the value in column D:
D1,
For the FALSE condition, I don't know what you want, so I just put in a 0.
0)
Note that all 3 parts of the IF statement are separated by commas - play around, you can do a LOT of different things!
EDIT: I just noticed that column C is a range, while A is a single value. You're going to need to do something like what Jerry did, parsing out the range string.
I am assuming that columns A and B will be indefinitely long just based on how much data is collected, whereas columns C and D are just a reference table with 10 rows each for the 10 ranges (0 - 10%, 11 - 20%, 21 - 30%, etc.). Is this correct?
As an alternative to storing the profit margin range and corresponding commission in columns C and D as you now do, you could incorporate them directly into an IF statement that you use in column B. For example if 91-100% corresponds to 8% commission, 81-90% is 7% commission and so on, then you could insert this formula:
=IF(A2>90,0.08,IF(A2>80,0.07,IF(A2>70,0.06,IF(A2>60,0.05, ...
The advantage of this over an INDEX/MATCH combination that references numbers extracted from the text (e.g. "11-20%") is that you don't run the risk of losing data when the text ranges are altered in some way (i.e., it is user-proof).
