How to copy down a formula that uses arrays? - excel

I have 25,000 rows of data containing 50 unique variables, so each variable appears 500 times (once per "scenario").
My formula:
=IF(K2>$K$3,3,IF(K2=0,0,IF(L1=3,2,IF(COUNT($L1:L$2)>(64-$L$2-COUNTIF($K$3:$K$51,">0")),1,2))))
My formula assigns a 3, 2, or 1 to each of the 50 variables depending on the scenario; however, it only works for one scenario.
I obviously cannot just fill this formula down through all 500 scenarios, but I need a quick way to apply it to every scenario.
Is there a way to quickly do this or do I need to come up with a better formula?

Here is a formula that changes the cell reference from K3 to K53, and so on, every 50 rows. The reference to K3:K51 likewise adjusts to K53:K101, etc.
Because it uses INDEX instead of INDIRECT, this formula is not volatile and should not cause slowdowns.
=IF(K2>INDEX(K:K,(CEILING(ROW(A1)/50,1)*50)-50+3),3,IF(K2=0,0,IF(L1=3,2,IF(COUNT($L1:L$2)>(64-$L$2-COUNTIF(INDEX(K:K,(CEILING(ROW(A1)/50,1)*50)-50+3):INDEX(K:K,(CEILING(ROW(A1)/50,1)*50)-50+51),">0")),1,2))))
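Below is a minimal Python sketch of the row arithmetic inside the INDEX calls. It is for illustration only, and the function name `block_bounds` is hypothetical. Here `r` stands in for ROW(A1), which is 1 in the first formula row. The defaults assume the first scenario block spans K3:K51, as in the question.

```python
import math


def block_bounds(r: int, block_size: int = 50, first: int = 3, last: int = 51) -> tuple:
    """Return the (start, end) rows of column K used by the formula in copied row r.

    Mirrors CEILING(ROW(A1)/50,1)*50-50+3 for the start and ...-50+51 for the end.
    """
    offset = math.ceil(r / block_size) * block_size - block_size
    return offset + first, offset + last


# Rows 1-50 of the copied formula point at K3:K51, rows 51-100 at K53:K101, and so on.
for r in (1, 50, 51, 100, 101):
    start, end = block_bounds(r)
    print(f"row {r}: K{start}:K{end}")
```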

Related

Compare multiple columns as pair-wise for Excel/Google Sheets

I am new to Excel/Google Sheets. I am having difficulty writing a formula that compares columns pairwise, since the formula gets bigger as the days go by.
For example, there are 2 main columns, Foo and Bar. I want to find the total number of days on which Foo and Bar are equal, so the current formula is =IF(A3 = G3, 1, 0)+IF(B3 = H3, 1, 0)+IF(C3 = I3, 1, 0)+...
But this is kind of tedious because there are ~40 days to compare. Are there any alternatives for writing this formula efficiently? Either Google Apps Script or an Excel formula is appreciated.
Cheers!
Give the Google Sheets formula below a try. Adjust the ranges as needed.
=ArrayFormula(SUM(IF(A3:E3=G3:K3,1,0)))
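Here is the same idea as a small Python sketch. It assumes each range arrives as a list of values, and the function name and sample values are hypothetical. The sketch compares the two ranges element by element, turns each TRUE/FALSE into 1/0, and adds them up.

```python
def count_equal_days(foo: list, bar: list) -> int:
    """Count positions where Foo equals Bar on the same day, like SUM(IF(A3:E3=G3:K3,1,0))."""
    if len(foo) != len(bar):
        raise ValueError("Foo and Bar ranges must be the same width")
    return sum(1 if f == b else 0 for f, b in zip(foo, bar))


foo_row = [5, 3, 7, 7, 1]  # hypothetical values for A3:E3
bar_row = [5, 2, 7, 0, 1]  # hypothetical values for G3:K3
print(count_equal_days(foo_row, bar_row))  # 3
```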
Assuming that you need such a total for each row and not merely a single row, try this:
=ArrayFormula(IF(A3:A="",,MMULT(IF(A3:F=G3:L,1,0),SEQUENCE(COLUMNS(A:F),1,1,0))))
Of course you will need to adjust the three ranges to match your own FOO and BAR ranges.
This one formula will produce all results for all rows.
The MMULT function is tricky to explain to those not yet familiar with it, but it's a powerful tool. I'll add a picture I created that may best explain what it does:
By making the second matrix a simple SEQUENCE of 1s as long as the other matrix is wide, we wind up multiplying everything by 1 before adding together. And since anything multiplied by 1 is itself, this combination serves only to do a row-by-row add.
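To make the MMULT trick concrete, here is a Python sketch. It is not part of the answer, and the `mmult` helper and sample data are hypothetical. It builds the 0/1 match matrix, multiplies it by a column of 1s (the role of SEQUENCE(COLUMNS(A:F),1,1,0)), and gets one total per row.

```python
def mmult(a: list, b: list) -> list:
    """Plain matrix multiplication, the operation MMULT performs."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


# Hypothetical Foo (A:F) and Bar (G:L) blocks, one list per sheet row.
foo = [[1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1]]
bar = [[1, 0, 3, 0, 5, 0], [1, 1, 0, 0, 1, 1]]

# IF(A3:F=G3:L,1,0): a 0/1 matrix of matches, every cell a number.
matches = [[1 if f == b else 0 for f, b in zip(fr, br)] for fr, br in zip(foo, bar)]

# A column of 1s as tall as the match matrix is wide.
ones = [[1] for _ in range(len(matches[0]))]

per_row = mmult(matches, ones)
print(per_row)  # [[3], [4]]
```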
Things to keep in mind with MMULT:
1.) Every cell in every matrix must be a number or it will produce an error.
2.) As in the above formula, there are ways to use either/or conditions to turn every cell in a matrix into a number.

Sumproduct with 5 criteria from another workbook takes too long to run

I'm trying to look up 5 different criteria from another file. The formula I'm using is below:
=IF(SUMPRODUCT(('[WorkBook]Sheet'!$A:$A=$A9),
('[WorkBook]Sheet'!$H:$H=$P9),
('[WorkBook]Sheet'!$D:$D=S$5),
(('[WorkBook]Sheet'!$E:$E="String1")+('[WorkBook]Sheet'!$E:$E="String2")) )>=1,TRUE,FALSE)
I could get the result in the first few cells. However, when I copy-paste (or drag) the formula down to the bottom of the table, it takes forever to calculate, even using 4 processors. Eventually, Excel crashed.
Is it possible that too many criteria are used, that they cross-reference between 2 files, and that, on top of that, nesting it in an IF function makes the formula too heavy to run in so many cells (about 150k)? If so, can anyone suggest a better formula?
That SUMPRODUCT contains nothing but boolean conditions, which makes it a COUNTIFS. The OR condition is handled with SUM(COUNTIFS(...)) and a hard-coded string array.
=AND(SUM(COUNTIFS('[WorkBook]Sheet'!$A:$A, $A9,
'[WorkBook]Sheet'!$H:$H, $P9,
'[WorkBook]Sheet'!$D:$D, S$5,
'[WorkBook]Sheet'!$E:$E, {"String1", "String2"})))
COUNTIFS can use full-column references without a calculation-lag penalty, while SUMPRODUCT is penalized greatly.
The wrapping AND does nothing more than convert a number to TRUE/FALSE.
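Below is a rough Python model of why SUM(COUNTIFS(..., {"String1", "String2"})) acts as an OR. It rests on a few assumptions. Each row is a dict keyed by column letter. The helpers `countifs` and `any_match` are hypothetical. Matching is exact and case-sensitive, whereas Excel's COUNTIFS text matching is case-insensitive. COUNTIFS returns one count per string in the array constant. Because a single cell in column E cannot equal both strings, summing those counts never double-counts a row.

```python
def countifs(rows, **criteria):
    """Count rows whose named columns all equal the given values (exact-match COUNTIFS)."""
    return sum(1 for row in rows if all(row[col] == val for col, val in criteria.items()))


def any_match(rows, a_val, h_val, d_val, e_options=("String1", "String2")):
    """Model of AND(SUM(COUNTIFS(A,a,H,h,D,d,E,{"String1","String2"})))."""
    # One COUNTIFS per string in the array constant; SUM adds them (the OR).
    total = sum(countifs(rows, A=a_val, H=h_val, D=d_val, E=e) for e in e_options)
    # AND(number) is TRUE for any non-zero number and FALSE for 0.
    return total != 0


rows = [
    {"A": "x", "H": 1, "D": "d1", "E": "String1"},
    {"A": "x", "H": 1, "D": "d1", "E": "Other"},
    {"A": "y", "H": 2, "D": "d1", "E": "String2"},
]
print(any_match(rows, "x", 1, "d1"), any_match(rows, "x", 1, "d2"))
```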
Here is your original SUMPRODUCT with all ranges cut down to the row containing the last date in column H. The double unary (--) converts the TRUE/FALSE arrays to 1/0, because SUMPRODUCT treats non-numeric entries, including booleans, as zeros.
=IF(SUMPRODUCT(--('[WorkBook]Sheet'!$a$2:index('[WorkBook]Sheet'!$a:$a, match(1e99, '[WorkBook]Sheet'!$h:$h))=$A9),
--('[WorkBook]Sheet'!$h$2:index('[WorkBook]Sheet'!$h:$h, match(1e99, '[WorkBook]Sheet'!$h:$h))=$P9),
--('[WorkBook]Sheet'!$d$2:index('[WorkBook]Sheet'!$d:$d, match(1e99, '[WorkBook]Sheet'!$h:$h))=S$5),
(('[WorkBook]Sheet'!$e$2:index('[WorkBook]Sheet'!$e:$e, match(1e99, '[WorkBook]Sheet'!$h:$h))="String1")+
('[WorkBook]Sheet'!$e$2:index('[WorkBook]Sheet'!$e:$e, match(1e99, '[WorkBook]Sheet'!$h:$h))="String2")))>=1, true, false)
Yes, that may look complicated but in fact it does much less work than the full column reference model.
Referencing across files is something I avoid like the plague. Is there a concrete reason why you can't simply use say a PivotTable in the same workbook as the data exists, and just filter the PivotTable to show what you need to show?
Much much simpler. Much much safer. Much much faster. To see this and other alternatives explained in detail, check out my answer at Optimizing Excel formulas - SUMPRODUCT vs SUMIFS/COUNTIFS

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109, etc.). To keep it simple, I've created a matrix of 451 rows X 451 columns representing every pair (twice, actually, because A/B and B/A are equivalent).
I somehow need to use the row name (AllergenID) to look up the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also an AllergenID). I have no problem using VLOOKUP or INDEX/MATCH, but I'm struggling with the correct combination of a lookup and a SUMPRODUCT or COUNTIF formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check, for every ExceptionID, whether it has 2 specific, different AllergenIDs. Better to use a helper table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper matrix you can easily build your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (would be your AllergenID), there is no need to lookup them because E:E changes automatically with the auto-fill.
The most important parts of the formulas are the $ signs, which must not be messed up, or you cannot auto-fill the formulas.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
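Here is a Python sketch of the helper-table idea. It assumes the data arrives as (ExceptionID, AllergenID) pairs, and the function names and sample data are hypothetical. Storing each person's allergens as a set plays the role of the 1/0 helper cells. Counting the people whose set contains every requested AllergenID mirrors COUNTIFS(colX,1,colY,1). Passing three IDs gives the triplet counts mentioned in the update.

```python
from collections import defaultdict


def helper_table(pairs):
    """Map each ExceptionID to the set of AllergenIDs listed for it (the helper table)."""
    table = defaultdict(set)
    for exception_id, allergen_id in pairs:
        table[exception_id].add(allergen_id)
    return table


def co_occurrence(table, *allergen_ids):
    """Count people whose allergen set contains every given AllergenID."""
    return sum(1 for allergens in table.values() if all(a in allergens for a in allergen_ids))


data = [(1, 107), (1, 108), (2, 107), (2, 108), (2, 109), (3, 108), (3, 109)]
table = helper_table(data)
print(co_occurrence(table, 107, 108))       # pair
print(co_occurrence(table, 107, 108, 109))  # triplet
```

Using sets is a deliberate choice. If the same person/allergen row appeared twice in the data, the helper COUNTIFS would show 2 and fail the "=1" test, while a set simply records membership.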
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, the N function is not needed to turn the booleans into 1s and 0s, since multiplying booleans does the trick. The formula below works:
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It could perhaps be improved with SUMPRODUCT or similar, but it's just a rough example of the technique.
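The logic of that array formula can be sketched in Python, assuming People and Allergens are parallel lists. The function name and data are hypothetical. Each person is visited only at their first occurrence, which is what MATCH(...)=ROW(...)-1 checks. The two NOT(ISERROR(MATCH(...))) tests become set-membership checks that are multiplied together.

```python
def pair_count(people, allergens, e, f):
    """Count distinct people who have both allergen e and allergen f."""
    # Stand-in for the concatenated People&Allergens lookup array, using tuples as keys.
    keys = set(zip(people, allergens))
    seen = set()
    total = 0
    for person in people:
        if person in seen:
            continue  # not the first occurrence, so the MATCH(...)=ROW(...)-1 test fails
        seen.add(person)
        # Multiplying the two membership tests gives 1 only when both are True.
        total += ((person, e) in keys) * ((person, f) in keys)
    return total


people = [1, 1, 2, 2, 2, 3, 3]
allergens = [107, 108, 107, 108, 109, 108, 109]
print(pair_count(people, allergens, 107, 108))
```

Tuples are used here instead of string concatenation. With concatenation, person 1 with allergen 23 and person 12 with allergen 3 would both produce "123", a collision the Excel version could in principle run into.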

Can you make Excel formulas iterate and count loops

I'm almost certain the answer is 'no', but I thought I'd ask the people here just in case there is some magic formula I haven't yet found...
So let's say I have a table of values.
I've also got another cell somewhere (let's just say that this cell is A1 for my example), which has another number in it. This number is actually a percentage (let's call it 20% in this example).
So I have a table that looks like this
A B C D E
1 | 20
2 |
3 | Number
4 | 23
5 | 68
6 | 145
7 | 8
The simplest way to explain it is to say I'm going to reduce each number by the percentage given in cell A1 (20% in my example). I'm then going to repeat (so reduce the new values by another 20%) until the answer is zero (rounded to 2dp in this example, though I may need more or less rounding later).
I'm trying to work out how many times the formula must run to get the values to 0.
Like I said, I'm almost certain this cannot be done outside of scripting (VBA etc.), since to my knowledge Excel formulas can't loop and count.
I'm also sure that this is actually quite an easy script (though I don't know VBA at all, so it wouldn't be so easy for me to do). I'm certain I could do this in other languages, but that won't really help, as I'm trying to do this entirely in Excel.
I've done it very inelegantly using a formula on another sheet, copied down (A1 takes the percentage off the original value, A2 takes the percentage off the value in A1, etc.), and I can then just see at which row the value hits 0, but I was curious whether there is a more efficient solution:
a) without using VBA
b) using VBA to do it properly (not important, but I'd like to see how it's done if anyone can)
Thanks
My solution would be to calculate the answer directly.
You are looking for the point where the value rounds to 0 at 2 decimals, i.e. drops to about 0.0049 or less, so your equation becomes
0.0049 = A4 *(1-A1)^n
ln(0.0049) = ln(A4)+ n*ln(1-A1)
n = (ln(0.0049)-ln(a4))/ln(1-A1)
Since you want a whole number of loops, round up to the next integer, so the final formula is
=ceiling((ln(0.0049)-ln(a4))/ln(1-A1),1)
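Here is a quick Python check of this derivation. It is for illustration only, and the function names are hypothetical. The closed form is the CEILING/LN formula above. The loop is the brute-force "reduce by 20% and count" approach from the question, the kind of iteration a VBA function would do. The check assumes 0 < rate < 1 and a starting value above the 0.0049 threshold. Both functions count the passes until the value is at or below the threshold.

```python
import math


def loops_closed_form(value: float, rate: float, threshold: float = 0.0049) -> int:
    """=CEILING((LN(threshold)-LN(value))/LN(1-rate),1)"""
    return math.ceil((math.log(threshold) - math.log(value)) / math.log(1 - rate))


def loops_iterative(value: float, rate: float, threshold: float = 0.0049) -> int:
    """Repeatedly remove `rate` of the value and count the passes."""
    count = 0
    while value > threshold:
        value *= 1 - rate
        count += 1
    return count


for start in (23, 68, 145, 8):
    print(start, loops_closed_form(start, 0.2), loops_iterative(start, 0.2))
```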
(1-A1) is the fraction left after one iteration.
You said A1 is a percentage. If you put in 20% in excel it is treated as 0.2.
The original formula should be
0.0049 = A4*(0.8)^n
Excel uses some kind of "linear calculation model" in which a formula in a cell determines the value of the cell based on the value of other cells, using functions and references. Excel maintains a dependency graph and whenever a cell changes value, it recalculates all dependents, recursively (so dependents of dependents get recalculated too).
The only iteration Excel provides in this model is within functions of the formula.
I believe it is possible in Excel to define a user-defined function (in VBA) that you can use in formulas in your cells. Any iterations you need can be performed inside this function.
(P.S.: the other iteration Excel provides is the circular reference, where a cell's chain of dependents, after one or more steps, leads back to the cell itself. Excel will flag this, but you can allow it by enabling iterative calculation (File > Options > Formulas > Enable iterative calculation) and setting how many iterations Excel performs with Maximum Iterations.)

Find Minimum Value Based on 2 Criteria (Excel 2013)

Looking to find the minimum value in a column based on two sets of criteria.
So the logic would be: Find the minimum value in column M, where the value in column A matches column N, and the value in Column Y is less than 318.
I've tried using an array formula like this, but it doesn't seem to work, or is too memory-heavy to run:
=MIN(IF(AND(N:N=A2,Y:Y<=318),M:M))
Is there a simpler way? Or perhaps a UDF that could work?
Thank you for your help!
You can't use AND in this type of formula because it returns only a single TRUE/FALSE value rather than the required array.
Here are three possible working versions:
1.) Use * to simulate AND
=MIN(IF((N:N=A2)*(Y:Y<=318),M:M))
confirmed with CTRL+SHIFT+ENTER
2.) Use multiple nested IFs
=MIN(IF(N:N=A2,IF(Y:Y<=318,M:M)))
confirmed with CTRL+SHIFT+ENTER
3.) Use AGGREGATE function
=AGGREGATE(15,6,M:M/(N:N=A2)/(Y:Y<=318),1)
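The advantages of this approach are that you don't need "array entry" and it can ignore any errors in the data. Function 15 is SMALL, and option 6 ignores error values, so the #DIV/0! errors produced by dividing by FALSE are skipped.
Either way, it's best to reduce the range sizes if you can, because whole-column references may be slow.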
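Here is a Python sketch of what these formulas compute. It assumes columns M, N and Y are given as parallel lists, and the function names and sample data are hypothetical. `min_if` follows MIN(IF((N=key)*(Y<=318),M)). `aggregate_small` follows AGGREGATE(15,6,...): dividing by a FALSE condition raises an error, which option 6 skips.

```python
def min_if(m_values, n_values, y_values, key, limit=318):
    """Like MIN(IF((N=key)*(Y<=limit),M)) entered as an array formula.

    Returns None when no row matches (Excel's MIN would return 0 instead).
    """
    candidates = [m for m, n, y in zip(m_values, n_values, y_values) if n == key and y <= limit]
    return min(candidates) if candidates else None


def aggregate_small(m_values, n_values, y_values, key, limit=318, k=1):
    """Like AGGREGATE(15,6,M/(N=key)/(Y<=limit),k).

    A FALSE condition is a divisor of 0, raising ZeroDivisionError here (#DIV/0! in Excel);
    option 6 means those errors are skipped. Returns None when fewer than k values survive.
    """
    kept = []
    for m, n, y in zip(m_values, n_values, y_values):
        try:
            kept.append(m / int(n == key) / int(y <= limit))
        except ZeroDivisionError:
            continue
    kept.sort()
    return kept[k - 1] if len(kept) >= k else None


M = [50, 20, 35, 10]
N = ["a", "a", "b", "a"]
Y = [300, 320, 100, 318]
print(min_if(M, N, Y, "a"), aggregate_small(M, N, Y, "a"))
```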
Give this a try and adjust ranges to suit. Try not to use whole column references:
=SMALL(INDEX(($N$2:$N$101=A2)*($Y$2:$Y$101<=318)*$M$2:$M$101,),1+ROWS($M$2:$M$101)-COUNTIFS($N$2:$N$101,A2,$Y$2:$Y$101,"<=318"))
If you are using the whole column to pick up new data as it is added, consider using Dynamic Named Ranges instead
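The SMALL/COUNTIFS formula relies on a counting trick, sketched here in Python (the function name and data are hypothetical). Non-matching rows become 0 in the product array. The first ROWS - COUNTIFS smallest entries are therefore those zeros, and the next one is the smallest matching M. This assumes every matching M value is non-negative. A negative value would sort ahead of the zeros, and the trick would pick the wrong entry.

```python
def small_k_trick(m_values, n_values, y_values, key, limit=318):
    """Like SMALL(INDEX((N=key)*(Y<=318)*M,),1+ROWS(M)-COUNTIFS(N,key,Y,"<=318")).

    Assumes matching M values are >= 0. Returns None when nothing matches
    (Excel's SMALL would return #NUM! because k exceeds the array size).
    """
    # Non-matching rows collapse to 0; matching rows keep their M value.
    products = [(n == key) * (y <= limit) * m for m, n, y in zip(m_values, n_values, y_values)]
    # The COUNTIFS part: how many rows satisfy both conditions.
    matches = sum(1 for n, y in zip(n_values, y_values) if n == key and y <= limit)
    if matches == 0:
        return None
    # Skip past the (ROWS - matches) zeros; the next value is the smallest match.
    k = 1 + len(m_values) - matches
    return sorted(products)[k - 1]


M = [50, 20, 35, 10]
N = ["a", "a", "b", "a"]
Y = [300, 320, 100, 318]
print(small_k_trick(M, N, Y, "a"))
```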
When things get this complex, I'll usually break them down and set up smaller, simpler formulas in separate columns.
In other words, you have data in columns A through Y?
So let's create a formula in column AA:
1) Identify when the value in column A matches column N and the value in column Y is < 318:
=and(A1=N1,Y1<318)
2) Copy AA1 down to all the rows of your data.
3) Now we have a condition to work from. Since there is a SUMIF and a COUNTIF but no MINIF (Excel 2013 has no MINIFS), we'll have to build that ourselves. First, the IF:
In cell AB1:
=if(AA1,M1,"")
copy that down to all your data.
finally, do your min:
=MIN(AB:AB)
Should give you your answer.
You could probably splice the first two together, but again, when building a complex formula like this, build it simply first. ;)
