Context: I'm mapping some excel sheets into web backend code.
Circular references work well and are very fast within Excel: we currently run 1000 iterations in Excel for each circular reference, and each recalc is practically instant. But when converted to backend code, it's not as fast, so I'm trying to collapse the circularity into formulae.
I was able to collapse some of the circular references, but here's a tricky one. It essentially boils down to this:
subtotal1 = parameter1 * total
subtotal2 = parameter2 * total
subtotal3 = parameter3 * total
total = subtotal1 + subtotal2 + subtotal3
Each subtotal depends on the total and vice versa.
If you do algebraic transformations you'll realize you can never extract the formula for any one argument, because there are more than two layers of interconnectedness that cannot be unfurled.
Ideally I'd like to get it down to a formula like this:
sub1 = <a formula that calculates sub1 directly and does not include sub2, sub3 or total>
How can we collapse this kind of circular reference into formulae and avoid doing 1000 iterations in the code?
If your values are in A1:C1 you could use:
=SUM(REDUCE(A1:C1,SEQUENCE(8),LAMBDA(x,y,SUM(x)*x)))
The number mentioned in the sequence is the number of iterations, but this will quickly result in a number too large for Excel.
I used 1, 2, 3 for this, and the SEQUENCE(8) above results in #NUM! (at least using the mobile app version of Excel).
What this formula does is start with the values in A1:C1, sum these values and multiply the sum by each individual value, creating an array of 3 numbers (subtotal1-3). The last calculated value (x) is then the new starting point for the same calculation, sum of x * x. This repeats until the sequence ends.
To make visible what it does you can use:
=REDUCE(A1:C1,SEQUENCE(8),LAMBDA(x,y,VSTACK(x,SUM(TAKE(x,-1))*TAKE(x,-1))))
This will spill the arrays (the start values first, then each iteration, without showing the summed array value).
The first formula does the same without stacking, and it's wrapped in SUM to get the total.
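For the backend side, the same iteration can be sketched in Python. This is only a minimal illustration of the x -> SUM(x)*x step used in the REDUCE formula above; the function name iterate_subtotals is made up for this example and is not part of any library.

```python
def iterate_subtotals(parameters, iterations):
    """Repeat x -> sum(x) * x, mirroring the LAMBDA step in the REDUCE formula."""
    x = list(parameters)
    history = [x]  # keep every intermediate array, like the VSTACK variant
    for _ in range(iterations):
        total = sum(x)
        x = [total * value for value in x]
        history.append(x)
    return history


# Start values 1, 2, 3 as in the answer; two iterations only.
for row in iterate_subtotals([1, 2, 3], 2):
    print(row, "total:", sum(row))
```

Python integers do not overflow, so the rapid growth that triggers #NUM! in Excel shows up here only as very long numbers.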
I would like the average of Column B based on two criteria: that it happened last year, and a text criterion from another column. In the example I have a Year column for test purposes but I don't want to add it to all the data sheets.
=AVERAGEIFS(Table1[Unit], Table1[Date], "="&YEAR(TODAY())-1, Table1[Text], "Up")
throws a DIV/0 error.
I believe I need to define the Date range by year, like YEAR(Table1[Date]), but it doesn't work.
=AVERAGEIFS(Table1[Unit], (YEAR(Table1[Date]), "="&YEAR(TODAY())-1, Table1[Text], "Up")
I can get an IF statement to work on a single cell, but is there a way to get this to work in a column?
Scott
You can't use a formula when defining the criteria range, so you either have to use a helper column or something like this:
=AVERAGEIFS(Table1[Unit],Table1[Date],">="&(DATE(YEAR(TODAY())-1,1,1)),Table1[Date],"<="&(DATE(YEAR(TODAY())-1,12,31)),Table1[Text],"Up")
It checks that the date is on or before 2022/12/31 (DATE(YEAR(TODAY())-1,12,31)) and on or after 2022/01/01 (DATE(YEAR(TODAY())-1,1,1)).
Result ((3+7)/2=5):
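The same date-window logic can be sketched in Python for comparison. The rows below are invented sample data (chosen so the matching units are 3 and 7), and average_last_year is a hypothetical helper name; today is passed in explicitly so the result does not depend on when the code runs.

```python
from datetime import date


def average_last_year(rows, text, today):
    """Average the units dated within the previous calendar year whose label equals text."""
    start = date(today.year - 1, 1, 1)
    end = date(today.year - 1, 12, 31)
    matches = [unit for d, unit, label in rows
               if start <= d <= end and label == text]
    if not matches:
        return None  # the case where AVERAGEIFS would give #DIV/0!
    return sum(matches) / len(matches)


# Hypothetical (date, unit, text) rows.
rows = [
    (date(2022, 3, 1), 3, "Up"),
    (date(2022, 9, 15), 7, "Up"),
    (date(2022, 5, 5), 4, "Down"),
    (date(2023, 1, 10), 9, "Up"),
]
print(average_last_year(rows, "Up", date(2023, 6, 1)))  # 5.0
```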
The SUMPRODUCT() function provides some really useful approaches to problems like this.
In your sample data, the formula =YEAR(Table1[Date])=2023 should return an array {FALSE,FALSE,TRUE,TRUE,FALSE,TRUE}.
In your sample data, the formula =Table1[Text]="Up" should return an array {TRUE,FALSE,FALSE,TRUE,FALSE,FALSE}.
SUMPRODUCT() allows us to do some interesting things with those:
If I apply a math operation to those arrays Excel automatically converts them to binary; {0,0,1,1,0,1} and {1,0,0,1,0,0} respectively. That math function can be doing something like multiplying. Or if I want to use them as-is in a function I can just use "--" to force a math operation, that makes them negative and back to positive. In our example we'll be multiplying the arrays so Excel will take care of it for us.
I can do a binary AND operation on the two arrays by using multiplication. Thus: = (YEAR(Table1[Date])=2023) * (Table1[Text]="Up") is actually {0,0,1,1,0,1} * {1,0,0,1,0,0} which in turn equals {0,0,0,1,0,0}. And the 1's in this result array represent the rows that meet both criteria.
=SUMPRODUCT((YEAR(Table1[Date])=2023) * (Table1[Text]="Up")) will equal the count of rows that met both criteria. Which is only 1 in your example.
=SUMPRODUCT((YEAR(Table1[Date])=2023) * (Table1[Text]="Up") * Table1[Unit]) is going to sum the result of array multiplication of {0,0,0,1,0,0} * {4,4,5,5,4,4}. In your sample data that results in 5.
So the formula =SUMPRODUCT((YEAR(Table1[Date])=2023) * (Table1[Text]="Up") * Table1[Unit]) / SUMPRODUCT((YEAR(Table1[Date])=2023) * (Table1[Text]="Up")) actually is "sum of rows that matched" divided by the "count of rows that matched".
Notice that a conditional array like Table1[Text]="Up" MUST be wrapped in its own parentheses before it can be added (OR function) or multiplied (AND function) with another array.
You may want to wrap that entire formula in an IFERROR() function so you can display a friendlier message when the count is zero. For instance:
=IFERROR(SUMPRODUCT((YEAR(Table1[Date])=2023) * (Table1[Text]="Up") * Table1[Unit]) / SUMPRODUCT((YEAR(Table1[Date])=2023) * (Table1[Text]="Up")),"None")
You will want to fully debug the formula before nesting it in IFERROR() because the IFERROR() function will conceal other errors than just an occasional divide by zero.
This will all seem very cumbersome the first few times you use this approach, but if you encounter these kinds of criteria problems often in Excel, I promise that taking the time to understand SUMPRODUCT() on logical arrays will pay long-term dividends. Once understood, it gives you a robust way to do SUM, COUNT, and AVERAGE with multiple criteria, in any mix of AND and OR.
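To make the array arithmetic concrete, here is a small Python sketch using the same arrays quoted above. The variable names are made up for illustration, and plain lists stand in for the Excel arrays.

```python
# The two logical arrays and the Unit column from the example above.
year_is_2023 = [False, False, True, True, False, True]
text_is_up = [True, False, False, True, False, False]
units = [4, 4, 5, 5, 4, 4]

# Multiplying the 0/1 values acts as AND, row by row.
both = [int(a) * int(b) for a, b in zip(year_is_2023, text_is_up)]

match_count = sum(both)                                   # count of matching rows
match_sum = sum(flag * unit for flag, unit in zip(both, units))  # sum of matching units
average = match_sum / match_count if match_count else None      # guard the zero-count case

print(both, match_count, match_sum, average)
```

The final guard plays the role that IFERROR() plays in the spreadsheet version when no row matches.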
I want to return a list of values from column B in Sheet 2 (in this case NBA players) for the rows where column A in Sheet 2 contains the value "PG" from cell A3 in Sheet 1. Not only do I want it to match "PG", but I also want the salary (column C) to be between $7100 (cell B2 in Sheet 1) and $8000 (cell C2 in Sheet 1). Any help would be appreciated.
You are either going to need to use an array formula or a function that performs array-like calculations. I will suggest using the AGGREGATE function. Avoid using full column/row references within an array formula or a function performing array-like calculations, or you may wind up bogging down your system with excessive calculations.
The AGGREGATE function is made up of several individual functions, and depending on which one you choose, it will perform array operations. I am going to suggest function 14 (LARGE). What the following example will do is generate a list of results sorted from largest to smallest that ignores error values, then return the first value from the list. (Use function 15, SMALL, if you want the list in ascending order instead.) The thing we will list is the row number of each row that matches ALL your criteria. So the basics of AGGREGATE look like this:
AGGREGATE(Formula #, Error/hidden handling #, Formula, parameter)
The hardest part of this is coming up with the right formula. In the numerator you put the thing you are looking for. In the denominator you place your TRUE/FALSE condition checks. Separate each condition check with *, which will act as an AND function. The thing that makes this work is that TRUE/FALSE convert to 1/0 when they are sent through a math operation. So anything you do not want is FALSE, and anything divided by FALSE becomes a divide by 0, which in turn generates an error. Since AGGREGATE is set to ignore errors, only things that meet your conditions will exist in the list, and since they are being divided by TRUE, which is 1, your thing remains unchanged. So the AGGREGATE function is going to start to look like:
AGGREGATE(14,6,ROW(some range)/((Condition 1)*(Condition 2)*...*(Condition N)),1)
So as alluded to before, 14 sets AGGREGATE to sort the list in descending order, 6 tells AGGREGATE to ignore errors, and the 1 tells AGGREGATE to return the first item in its sorted list. If it was 2 instead of 1 it would return the 2nd position. If you ask for a position that is greater than the number of items in the list, AGGREGATE will produce an error, which does not get ignored.
So now that there is some understanding of what AGGREGATE does, let's see how we can apply this to your data. For starters, let's assume your data is in rows 2:100 and row 1 is a header row. You will have to adjust the references to suit your data.
CONDITION 1
LEFT($A$2:$A$100,2)="PG"
Checks to see if the first two characters are PG. Based on the data in your screenshot, PG was either to the left of the / or was the only entry, and there was only ever one / in the cells of column A. If you also need to check whether PG comes after the /, and assuming it can only be on one side and not both at the same time, you could use this alternative for your condition check:
(LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")
In this case the + is performing the task of an OR function. The caveat mentioned earlier is important because if both sides are TRUE then you wind up with TRUE+TRUE which becomes 1+1 which is 2, and we only want to divide by 1 or 0. Though to counter that you could test whether the sum is greater than 0, which gives a single TRUE or FALSE for each row:
((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")>0)
CONDITION 2
Check that the salary in C is less than or equal to 80000.
($C$2:$C$100<=80000)
CONDITION 3
Check that the salary in C is greater than or equal to 71000.
($C$2:$C$100>=71000)
Now let's put this all together to get a list of row numbers that meet your conditions. Note that the whole denominator is wrapped in its own set of brackets so that the row number is divided by the product of all the conditions:
AGGREGATE(14,6,ROW($A$2:$A$100)/(((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))
Now provided I did not screw up the bracketing in that formula, you can place that formula in a cell and copy it down until it produces errors. As you copy it down, the only thing that will change is the A1 in ROW(A1). It acts like a counter: 1, 2, 3, etc. So you will get a list of row numbers that meet your criteria. Now we need to convert those row numbers to names.
To find the names, the INDEX function is your friend here. Because it is not part of an array formula or inside a function performing array-like calculations, a full column reference can be used. So we take our formula that is generating row numbers and place it inside the INDEX function to give:
INDEX(B:B,Row Number)
INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/(((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1)))
Now if you hate seeing error codes when you have copied down further than there are results, you can place the whole thing inside an IFERROR function to give:
IFERROR(formula,What to display in case of an error)
So for blank entries:
IFERROR(INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/(((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))),"")
and for a custom message:
IFERROR(INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/(((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))),"NOT FOUND")
So now you just need to adjust the references to suit your data. If your data is located on another sheet remember to include the sheet name. A reference to B3:C4 would become:
Sheet1!B3:C4
and if the sheet name has a space in it:
'Space Name'!B3:C4
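For readers who think more easily in code, the filtering logic can be sketched in Python. This is only an illustration of the idea (position matches on either side of the /, AND salary within bounds), not a translation of AGGREGATE itself. The player names, salaries, and the function name matching_names are all invented for the example, and the bounds are taken from the question.

```python
def matching_names(rows, position, salary_min, salary_max):
    """Return names whose position cell lists `position` and whose salary is in range."""
    names = []
    for pos, name, salary in rows:
        parts = pos.split("/")  # "PG/SG" -> ["PG", "SG"]
        if position in parts and salary_min <= salary <= salary_max:
            names.append(name)
    return names


# Hypothetical (position, name, salary) rows standing in for Sheet 2.
rows = [
    ("PG", "Player A", 7500),
    ("SG", "Player B", 7600),
    ("PG/SG", "Player C", 8200),
    ("SG/PG", "Player D", 7100),
    ("C", "Player E", 7900),
]
print(matching_names(rows, "PG", 7100, 8000))  # ['Player A', 'Player D']
```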
A colleague has an array of values in "X4:X38". Since these are in a table which may be filtered, she wants to use the subtotal function to sum them - but wants all of the values to be rounded up first.
={SUM(ROUNDUP(X4:X38,0))}
works perfectly well. However,
{SUBTOTAL(9,ROUNDUP(X4:X38,0))}
Generates a generic "The formula you typed contains an error" message. I have tried various obvious things, like putting additional brackets around the "roundup" section, etc.
Any help would be appreciated.
You can do this without a helper column by using this formula:
=SUMPRODUCT(SUBTOTAL(2,OFFSET(X4:X38,ROW(X4:X38)-MIN(ROW(X4:X38)),0,1)),ROUNDUP(X4:X38,0))
OFFSET effectively breaks the range down into individual cells which are passed to the SUBTOTAL function, and that returns an array of 1 or 0 values based on whether each cell is visible after filtering or not. This array is multiplied by the rounded values to give the overall sum of the rounded visible values.
Another way is to use the AGGREGATE function like this:
=SUMPRODUCT(ROUNDUP(AGGREGATE(15,7,X4:X38,ROW(INDIRECT("1:"&SUBTOTAL(2,X4:X38)))),0))
Given the complexity, a helper column might be the preferable approach.
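The idea behind the SUMPRODUCT version (a 1/0 visibility array multiplied by the rounded values) can be sketched in Python. The values and the visibility flags below are invented, and round_up is a hypothetical helper that rounds away from zero to a whole number, which is how ROUNDUP(value,0) behaves.

```python
import math


def round_up(value):
    """Round away from zero to a whole number, like ROUNDUP(value, 0)."""
    return int(math.copysign(math.ceil(abs(value)), value))


def visible_rounded_sum(values, visible):
    """Sum the rounded-up values of the rows that are still visible."""
    return sum(round_up(v) for v, shown in zip(values, visible) if shown)


values = [1.2, 2.5, 3.0, 4.1]
visible = [True, False, True, True]  # second row hidden by the filter
print(visible_rounded_sum(values, visible))  # 2 + 3 + 5 = 10
```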
After investigation, it looks like this is not possible with SUBTOTAL alone without a helper column.
Add a helper column which rounds the individual values in column X, e.g. type the following formula into cell Y4 and drag down to Y38:
= ROUNDUP(X4,0)
And then instead of
= SUBTOTAL(9,ROUNDUP(X4:X38,0))
use:
= SUBTOTAL(9,Y4:Y38)
Then if necessary you can just hide the helper column. Of course the helper column doesn't have to be column Y, it could be any column, e.g. a column far to the right of where the data ends.
Summary: A complex (to me) multi-sheet array formula stops working in a certain column, and I can't figure out why.
Setting: I'm compiling a spreadsheet to establish values for fantasy baseball players. Sheet1 contains the pasted raw statistics of every hitter, and Sheet2 contains intermediate computations that allow me to determine final values.
In the example formulas, Sheet1 column C holds text strings designating position, and Sheet1 column E holds the number of at bats for each player.
The third referenced column is associated with the statistic being processed.
The first two formulas are working as intended, but I'm adding them to help contextualize the overall process.
All three of these formulas are implemented on Sheet2.
Formula A: Intended to calculate a "replacement value" for a given statistic by averaging the 157th-171st values in the "qualified pool." Qualified values are values for which the player has at least 200 at bats. There are 12 teams times 13 hitters equals 156 league hitters.
{=(SUMPRODUCT(LARGE(IF(Sheet1!$E$2:$E$1500 > 199, Sheet1!V$2:V$1500),ROW(INDIRECT("$157:$171"))))/15)}
Formula B: Intended to calculate a "replacement value" for a given statistic for catchers only, due to scarcity at the position. Works by averaging the 13th through 16th values in the "qualified pool." Qualified values are values for which the player has at least 200 at bats and the cell describing their position contains a "C". There are 12 teams times 1 required catcher equals 12 league catchers.
{=(SUMPRODUCT(LARGE(IF((ISNUMBER(SEARCH("C",Sheet1!$C$2:$C$1500))) * (Sheet1!$E$2:$E$1500 > 199), Sheet1!V$2:V$1500),ROW(INDIRECT("$13:$16"))))/4)}
Formula C: Intended to calculate a "replacement value" for a given statistic for non-catchers only. Works by averaging the 145th through 158th values in the "qualified pool." The test is the inverse of formula B; the intent was to exclude all values whose position cell contains a "C" OR whose player doesn't have at least 200 at bats.
{=(SUMPRODUCT(LARGE(IF(((ISNUMBER(SEARCH("C",Sheet1!$C$2:$C$1500))) + (Sheet1!$E$2:$E$1500 < 200) > 0),, Sheet1!V$2:V$1500),ROW(INDIRECT("$145:$158"))))/14)}
Problem Behavior: The formulas work perfectly with statistics pasted from an external source. However, four columns were added to Sheet1 whose values are derived from the pasted values. For example, Sheet1!V2 would hold the following formula:
=$Q2-(Sheet2!$O$5 * $E2)
Sheet2!O5 contains a formula based on values in Sheet1, but not the V column, only pasted values. The value of Sheet2!O5 is 0.4825.
When applied to the four statistics that were added to Sheet1 and derived from pasted values, Formula C returns 0 for each one.
Formulas A and B work with the four new statistics as expected, and Formula C works with all the pasted value statistics.
Attempted Solutions:
Replacing the new calculated statistics with their values (instead of formulas)
Misc:
The reason for using IF(ISNUMBER(SEARCH())) is because some players may have multiple positions. I want to isolate everyone with a "C", even if there are more characters in the cell.
One difference between the expected values for the four new statistics and the expected values for the pasted statistics is that the results that I'm expecting (that are returning zeroes) are expected to be negative. No other statistic to which Formula C is applied expects or returns negative values. However, Formula A and B return the negative values expected from the four new statistics without a problem.
Question: Why would this formula return a 0 from the newly added statistics, and what can I do or test to fix this problem?
Thank you.
I still don't know the reason, but the problem was fixed by adding the quotation marks to fill in the "value if true" part of the formula instead of putting the two commas next to each other.
The corrected formula:
{=(SUMPRODUCT(LARGE(IF(((ISNUMBER(SEARCH("C",Sheet1!$C$2:$C$1500))) + (Sheet1!$E$2:$E$1500 < 200) > 0),"", Sheet1!V$2:V$1500),ROW(INDIRECT("$145:$158"))))/14)}
Adding the quotation marks had no effect on the values of the columns where the data was positive, but switched the columns where the data was negative from 0 to the expected negative value.
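A possible explanation, sketched in Python: when the "value if true" argument of IF is left empty between the two commas, Excel returns 0 for those rows, and LARGE treats 0 as a number, whereas it skips text such as "". With all-negative statistics, those zeros rank above every real value. The data and the helper name kth_largest below are invented for illustration.

```python
def kth_largest(values, k):
    """Return the k-th largest number, skipping non-numeric entries as LARGE skips text."""
    numbers = [v for v in values if isinstance(v, (int, float))]
    return sorted(numbers, reverse=True)[k - 1]


stats = [-1.5, -2.0, -0.5, -3.0]       # hypothetical negative statistics
excluded = [False, True, False, True]  # rows the IF condition filters out

# IF(cond,,value): excluded rows become 0 and stay in the ranking.
with_zero = [0 if skip else s for s, skip in zip(stats, excluded)]
# IF(cond,"",value): excluded rows become text and drop out of the ranking.
with_blank = ["" if skip else s for s, skip in zip(stats, excluded)]

print(kth_largest(with_zero, 1))   # 0
print(kth_largest(with_blank, 1))  # -0.5
```

With positive statistics the zeros sort below the real values, which would be consistent with the positive columns being unaffected.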
So I've looked up tutorials on how to do this, and I'm still struggling, so I could use some expert help. I know it involves a very complex nested formula with things like SMALL, ROW, INDEX, etc...
So here are two screenshots that provide a sample of what I'm looking for. In reality there are over 1000 rows, but this makes it easier for you guys.
So here is my first example, let's call this Sheet1:
Code, ID_1 and ID_2. So as you can see (and just focus on the input in A2) there will be two separate IDs in the linked workbook. That sheet, or at least a tiny sample of it, looks like this:
In the first column we see the code we're looking for (which is what we have in A2 of the first one), each of them with different IDs. So as I'm sure you can tell by now, I'm looking for a formula that will allow me to return those values in ID_1 and ID_2 in the first sheet.
I have been going at this for an hour and I'm stumped, so I would greatly appreciate any help provided!
This is a more generic formula for when the ids are NOT listed consecutively. I have done this as an example of the more general case, where the ids occur anywhere throughout the second dataset AND where there are potentially several.
IFERROR(INDEX($V$2:$V$15, SMALL(IF($U$2:$U$15=$M2, ROW($U$2:$U$15), FALSE), COLUMNS($N2:N2))-ROW($V$1), 1), "")
This formula must be entered with Ctrl-Shift-Enter before copying across and down! Note all absolute and relative referencing/locking ($ signs)
The logical steps in constructing such a formula:
1) We use IF function to test if the values in the column U match the value in column M.
2) In the 'value-if-true' parameter, we will get the corresponding row number of values in column U. These numbers will be fed later in the SMALL function.
3) In the value-if-false part, we just return false, as that will later be used as a non-number in the SMALL function
Above 3 steps in the part: IF($U$2:$U$15=$M2, ROW($U$2:$U$15), FALSE)
4) We now have an array of mixed row numbers and FALSE values, which we want to feed to the INDEX function to simply get the corresponding value in column V (our second dataset). BUT as we wish to retrieve the different row matches for each code, we have to fish them out of the mixed array with the SMALL function.
5) using our columns as an incrementer, we apply the SMALL function to the array with a varying k parameter. We USE the COLUMNS function (note carefully the different $ sign usage), so that as we drag the formula across, the column count increments: COLUMNS($N2:N2) - giving K values of 1, 2, 3, 4 as we drag the formula across from column N to column Q. Note that it is useful that the SMALL function disregards FALSE values when looking through the array for the values by size.
6) There is an adjustment to account for the fact that the rows are relative to the 'Ids' range which we will feed into the INDEX function to retrieve the different ids. SMALL(IF($U$2:$U$15=$M2, ROW($U$2:$U$15), FALSE), COLUMNS($N2:N2))-ROW($V$1).
This can be avoided if we use the entire column V as the look-up array parameter in the INDEX function, but that's another way...
7) This resulting value can now be passed to the INDEX function to obtain the various ids. The column_num parameter of 1 which I put in the function isn't necessary in a single-column look-up array, but is there for completeness.
8) The entire construction is then wrapped in an IFERROR function to give an empty string if there is no match, but some people may wish to have error outputs there...
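The steps above can be sketched in Python as "collect the positions of every match, then pick the k-th one". The codes, ids, and the function name kth_match are made up for this illustration; k plays the role of COLUMNS($N2:N2).

```python
def kth_match(codes, ids, code, k):
    """Return the id of the k-th row whose code matches, or "" when there is none."""
    # Steps 1-3: positions of the matching rows (non-matches are simply left out).
    positions = [i for i, c in enumerate(codes) if c == code]
    # Step 8: an empty string instead of an error when k is past the last match.
    if k > len(positions):
        return ""
    # Steps 4-7: the k-th smallest position, then the id stored at that position.
    return ids[positions[k - 1]]


# Hypothetical lookup table standing in for columns U and V.
codes = ["A1", "B2", "A1", "C3", "A1"]
ids = [101, 202, 103, 304, 105]
print([kth_match(codes, ids, "A1", k) for k in range(1, 5)])  # [101, 103, 105, '']
```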
Well, if the two IDs will be consecutive in the second list, try this:
=INDEX([workbookname]SheetName!columnrangeofserialnumbers,MATCH(A2,[workbookname]SheetName!columnrangeofIDs,0))
Assuming your other workbook is called Serials, all the info is on Sheet1, the codes you are matching are in column A and the values to return are in column B, you would enter the following in B2 (add the file extension inside the square brackets if the workbook has been saved):
=INDEX([Serials]Sheet1!$B$2:$B$1000,MATCH(A2,[Serials]Sheet1!$A$2:$A$1000,0))
In C2 enter the following (assuming the IDs will show up consecutively):
=INDEX([Serials]Sheet1!$B$2:$B$1000,MATCH(A2,[Serials]Sheet1!$A$2:$A$1000,0)+1)
As far as I know, this only works if the other workbook is open, and only with the understanding that the two IDs will be listed consecutively in the list.