* Updated *
I have a rather large excel data set that I'm trying to summarise using up to 3 dimensions: region, sector, industry.
Any combination of these dimensions can be set or left blank and I need to create a formula that accommodates this WITHOUT using VBA.
Within the data I've set up named ranges to refer to these dimensions.
I'm using an array formula but I'd like to dynamically create a string which is then used as the boolean argument in the array formula.
For instance if:
A1 = "Hong Kong" (region)
B1 = <blank> (sector)
C1 = <blank> (industry)
I create a dynamic string in D1 such that
D1 = (region="Hong Kong")
I then want to use the string in D1 to create an array formula
E1 = {counta(if(D1,employees))}
However, if the user includes a sector such that:
A2 = "Hong Kong" (region)
B2 = "finance" (sector)
C2 = <blank> (industry)
Then I want the string in D2 to update to:
D2 = (region="Hong Kong")*(sector="finance")
Which then automatically updates the value in E2 which still has the same formula.
E2 = {counta(if(D2,employees))}
Is this possible? Alternatively is there any other way of achieving the same outcome, keeping in mind that I need to be able to copy D1 and E1 down into different rows so that different combinations of the dimensions can be viewed simultaneously.
Thanks.
* Updated *
To be clear, the reason I need the values in column D to be dynamic is so that I can create different scenarios in Row 1, Row 2, Row 3 etc., and I need the value in column E of each row to match the criteria set in columns A:C of that row.
There had to be a fairly simple way!
Columns B:D contain the criteria, A is a criterion number and E is the result of applying the DSUM function to the criterion in that row. I've used DSUM as it seems more natural (to me at least) to sum employee numbers. However, DCOUNT can equally well be used. For brevity I've not shown the data I'm using but it is a very trivial data set with just a few rows of test data.
The first set of criteria in row 2 is: Sector takes value of "Man" (manufacturing) whilst Region and Industry are unspecified. The 3rd set of criteria (in row 4) is: the Region is "Fr" (for France) AND the Industry is "Cars". The results in the DSUM column are obtained by applying the set of criteria in the corresponding row. All, some or even none of the cells in a row may contain entries.
The approach used is based on columns G:J, where, with the exception of cells G1 and G2 (which contain the numbers 0 and 1, respectively), everything in these columns has been generated by a formula.
There are twice as many rows in columns G:J as there are sets of criteria listed in B:D and the rows should be taken in pairs. The first pair (rows 1 and 2) provide a criterion table for use in DSUM corresponding to the first set of criteria (the table is cells H1:J2), the second pair in rows 3 and 4 provides a criterion table for the second set of criteria (cells H3:J4), etc. (Ignore the 11th row - I copied too many rows downwards in the screenshot!)
Column G has a fairly obvious pattern and can be generated by applying a simple =IF() function in cell G3 which references the starting pair in G1 and G2 with the formula in G3 then copied downwards.
The cells in columns H:J reference the appropriate cells of the set of all criteria (B1:D6 in the screenshot) using the INDEX function (making use of the value in column G along the way). It is not too difficult to create a single formula that can be copied from H1 to the range H1:J11 by judicious use of mixed relative and absolute addressing and an IF or two. Note that a reference to an empty cell in B2:D6 will generate a value of 0 in the corresponding cell in H:J, so the construct IF(x=0,"",x) must be used. This makes the formula used in the cells in columns H:J a bit clunky but not excessively so.
Having generated the 5 criteria tables corresponding to the 5 sets of criteria in B:D, use is made of the OFFSET function to deliver the correct criterion table as the third argument of the DSUM functions in column E.
I chose to base my OFFSETs on cell $H$1, so the top-left cell of the criterion table for the first set of criteria is offset from my base cell by 0 rows and 0 columns. The second criterion table is offset by 2 rows and 0 columns, the third by 4 rows and 0 columns. It should be clear how the number of offset rows and columns to use can be calculated from the corresponding criterion number in column A. It should also be obvious that the final two arguments of the OFFSET function will always be 2 and 3. So my DSUM() functions in column E look something like
=DSUM(myData,"Employees",OFFSET($H$1,row_offset,0,2,3))
where myData is the named range containing the test dataset and row_offset is a very simple formula involving the corresponding value in column A.
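As a rough illustration of the logic described above (not of Excel itself), the following Python sketch models a criterion row in which blank cells are ignored, plus the row offset implied by the 0, 2, 4 pattern. The function names and the test records are hypothetical, and the matching is plain equality, which is simpler than DSUM's own text matching.

```python
def criteria_row_offset(criterion_number):
    """Row offset from $H$1 for a 1-based criterion number (0, 2, 4, ...)."""
    return 2 * (criterion_number - 1)


def dsum_like(records, field, criteria):
    """Sum `field` over the records that satisfy every non-blank criterion."""
    # Blank criteria are dropped, so they place no restriction on the rows.
    active = {k: v for k, v in criteria.items() if v not in ("", None)}
    return sum(
        r[field]
        for r in records
        if all(r[k] == v for k, v in active.items())
    )


# Hypothetical test data standing in for the named range myData.
my_data = [
    {"Region": "Fr", "Sector": "Man", "Industry": "Cars", "Employees": 10},
    {"Region": "Fr", "Sector": "Ser", "Industry": "Banks", "Employees": 4},
    {"Region": "UK", "Sector": "Man", "Industry": "Cars", "Employees": 7},
]

# Sector only, then Region AND Industry, mirroring the two examples above.
sector_only = dsum_like(my_data, "Employees",
                        {"Region": "", "Sector": "Man", "Industry": ""})
region_and_industry = dsum_like(my_data, "Employees",
                                {"Region": "Fr", "Sector": "", "Industry": "Cars"})
```

A row with no entries at all leaves `active` empty, so every record is summed, which corresponds to the "none of the cells contain entries" case.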
It would have been nice to have been able to deliver the third argument of the function without having to adopt the approach of effectively reproducing the sets of criteria in B1:D6 in cells H1:J10. Whilst there are ways to generate the required criterion table arrays formulaically without putting them onto the worksheet, I found that DSUM generated an error when applying such an array as its third argument.
edited to add an example table
I use Excel's FILTER and UNIQUE functions to retrieve arrays from a source table. The first array is typically a set of dates, followed by data. Next to the retrieved arrays, I have columns with formulas.
When the source table grows, the FILTER result is always up to date and adds the new rows at the end, but the columns with formulas do not extend: you need to copy the formulas down. Also, you cannot turn a range into a table if its columns contain spill functions like FILTER or UNIQUE.
What would be the recommended way to handle this? Is there a better way than making a macro that copies down the formulas?
As an example, the source table has a growing number of dates and some categories with values:
date category value
1.1.2022 A 1.2
1.1.2022 A 0.5
1.1.2022 B 0.2
1.1.2022 B 2.2
2.1.2022 A 0.1
2.1.2022 A 0.3
2.1.2022 B 1.2
...
Now in the summary table, I use unique function to retrieve the dates in the first column. This spills down automatically - so far so good. In the second column (category A), I use sum(filter(..)) function to sum all values in the source table where category = A and date = the date on the same row in the first column:
unique date cat A cat B
1.1.2022 1.7 2.4
2.1.2022 0.4 1.2
This is problematic since filter formula looks like this (assuming the above table starts from cell A1):
=sum(filter(source[value],(source[category]=B$1)*isnumber(match(source[date],$A2))))
The spill-range operator (#) did not seem to work in the last parameter ($A2), e.g. replacing $A2 by offset($A2#,0,0,1) worked only on the first row.
I often use OFFSET in combination with the spill-range syntax that @Rory cited:
For example, if your FILTER/UNIQUE formula in cell A2 spills to columns A and B, use OFFSET(A2#,0,1,,1) in a function that acts on the values in column B only. The result will be a spill range for each row of your original spill range.
Of course, you can also offset rows that way too (e.g., to calculate incremental changes between rows: =OFFSET(A2#,0,1,,1)-OFFSET(A2#,-1,1,,1) for the prior row, so long as the value in B1 does not cause an error).
Additionally, Office365 versions have BYROW and BYCOL that you can use to act on each row or column of a spill range. For example, to find the max value of each row, =BYROW(A2#,LAMBDA(r, MAX(r))).
Thanks @pdtcaskey
I had a similar challenge and your mention of BYROW and LAMBDA did the trick for me!
For above scenario, let's assume the source table is an Excel Table with the elusive name Source and columns Date, Category, Value, situated at A1.
For the summary, assuming it's in a separate sheet and also starts at A1, you can do this:
Put the headers in row 1
In A2 put: =SORT(UNIQUE(Source[Date]))
In B2 put: =BYROW(A2#;LAMBDA(r;SUM(FILTER(Source[Value]; (Source[Category]="A")*(Source[Date]=r)))))
In C2 put: =BYROW(A2#;LAMBDA(r;SUM(FILTER(Source[Value];(Source[Category]="B")*(Source[Date]=r)))))
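To make the intent of the BYROW formulas concrete, here is a small Python sketch of the same aggregation: one output row per unique date, with one sum per category. It uses the example source table from the question. The function name is hypothetical and the dates are assumed to be day.month.year. One difference to keep in mind: where no rows match, this sketch returns 0.0 rather than the error an empty FILTER would produce.

```python
from collections import defaultdict
from datetime import datetime

# Rows of the example source table: (date, category, value).
SOURCE = [
    ("1.1.2022", "A", 1.2), ("1.1.2022", "A", 0.5),
    ("1.1.2022", "B", 0.2), ("1.1.2022", "B", 2.2),
    ("2.1.2022", "A", 0.1), ("2.1.2022", "A", 0.3),
    ("2.1.2022", "B", 1.2),
]


def summarise_by_date(rows, categories=("A", "B")):
    """Return one (date, sum per category...) tuple per unique date, sorted."""
    totals = defaultdict(float)
    for date, category, value in rows:
        totals[(date, category)] += value
    # Sort unique dates chronologically (assumes day.month.year strings).
    dates = sorted({date for date, _, _ in rows},
                   key=lambda d: datetime.strptime(d, "%d.%m.%Y"))
    return [(d, *[totals[(d, c)] for c in categories]) for d in dates]


summary = summarise_by_date(SOURCE)
for row in summary:
    print(row[0], *(round(v, 2) for v in row[1:]))
```

Because the summary is rebuilt from the source each time, new source rows appear without copying anything down, which is the property the spilled BYROW columns provide.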
So I have a table which basically looks as follows:
Criterion Value
1 -5
1 1
2 5
2 5
3 2
3 -1
I want to sum the values in column B based on the criteria in column A, but only if the sum for an individual criterion is not negative. So for example if I ask for the sum of all values where criterion is between 1 and 3, the result should be 11 (the values for criterion 1 are not included in the sum because they add up to a negative number).
My first idea was to add a third column with a sumif([criterion];[@criterion];[value]) and then use a sumifs function which checks whether that third column is negative. However, my table has 100k+ lines and with that many sumif functions it becomes intolerably slow.
I know I could create a pivot table to the same effect, but that has two drawbacks: I would have to create a separate sheet, which would add complexity, and my table is frequently updated which means I would have to manually update that pivot table every time to allow for downstream calculations. NBD and I could do that as a last resort, but I wonder whether there isn't a more elegant way to solve this problem.
I would want to avoid VBA to avoid complexity (the sheet will be used by other persons).
Thank you
This can be easily done using UNIQUE() and the two versions of SUMIF() in this way:
1. First collect all the criteria with =UNIQUE(A2:A7) -- Assuming your data are in columns A and B starting from row 2, this goes in cell C2, with "Criteria" in C1
2. Compute the subtotals for all criteria using =SUMIF($A$2:$A$7, C2, $B$2:$B$7) -- This goes in cell D2 and is copied down as far as the criteria extend, with "Partials" in cell D1
3. Sum the subtotals from step 2 that are positive with =SUMIF(D2:D7, ">0") in cell E2
If you have a lot of data, I suggest using whole-column references, which avoids fixed ranges and the need to adjust the formulas as the number of data rows changes:
The first formula becomes =UNIQUE(A:A) -- Don't care about the heading being taken (strings and empty cells are not summed)
For the second formula use =SUMIF(A:A, C2, B:B)
Use =SUMIF(D:D, ">0") for the last step
This should be reasonably fast, using just as many extra cells as the number of distinct criteria (multiplied by 2).
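The three steps can be sketched in Python to check the expected result of 11 against the example table. This is only a model of the logic, with a hypothetical function name; the optional `low` and `high` bounds stand in for the "criterion between 1 and 3" part of the question.

```python
def sum_of_positive_subtotals(rows, low=None, high=None):
    """Sum per-criterion subtotals, keeping only subtotals greater than zero."""
    subtotals = {}
    for criterion, value in rows:
        # Optional bounds on the criterion, inclusive on both ends.
        if low is not None and criterion < low:
            continue
        if high is not None and criterion > high:
            continue
        # Steps 1 and 2: one running subtotal per distinct criterion.
        subtotals[criterion] = subtotals.get(criterion, 0) + value
    # Step 3: add up only the positive subtotals.
    return sum(total for total in subtotals.values() if total > 0)


# The example table from the question: (criterion, value).
table = [(1, -5), (1, 1), (2, 5), (2, 5), (3, 2), (3, -1)]
result = sum_of_positive_subtotals(table, low=1, high=3)
print(result)  # 11
```

Criterion 1 subtotals to -4 and is dropped, while criteria 2 and 3 contribute 10 and 1.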
I'm trying to create a spreadsheet in excel which creates a sequential number in a column (B) depending on the contents of another column. Currently, there are two possibilities of what could be in Column A ("BI" or "GF"). So I want the data to look like this
COL A COL B
BI 1
BI 2
GF 1
BI 3
GF 2
GF 3
BI 4
BI 5
I've tried several attempts to do this but can't seem to find a solution. Any help would be greatly appreciated.
In B2, try this formula:
=CountIf(A$2:A2,A2)
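The expanding range A$2:A2 means each row counts how many times its own label has appeared so far. A minimal Python sketch of that idea, using the sample column from the question (the function name is hypothetical):

```python
def running_counts(labels):
    """For each label, return how many times it has occurred up to that row."""
    seen = {}
    counts = []
    for label in labels:
        seen[label] = seen.get(label, 0) + 1
        counts.append(seen[label])
    return counts


col_a = ["BI", "BI", "GF", "BI", "GF", "GF", "BI", "BI"]
col_b = running_counts(col_a)
print(col_b)  # [1, 2, 1, 3, 2, 3, 4, 5]
```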
Try using the OFFSET function.
The first cell in COL B will look similar to this:
=COUNTIFS(OFFSET($A$1:A1,0,0),A1)
The second should look like this:
=COUNTIFS(OFFSET($A$1:A2,0,0),A2)
Drag this down in Col B as far as desired. If you are using a table this should autofill.
Explanation:
Essentially you are using the OFFSET function to create a dynamic range.
The $A$1 serves as the start of your range because it is an absolute reference, and the A1 serves as the end of your range.
Because the ending cell is a relative reference, the range that the COUNTIFS function searches never extends beyond the row of the cell containing the formula.
In your example, the formula in the first row in Col B would result in 1. The reason is that OFFSET returns the range $A$1:A1, and COUNTIFS searches through that range and returns a count of all cells equal to A1, which is "BI".
The second row retains the original starting cell of $A$1, but the end of the range is now A2. So the COUNTIFS function sees the new range to search through as $A$1:A2. COUNTIFS then counts each cell equal to A2, which like A1 is "BI". There are two cells equal to "BI" in the new range and thus the result is 2.
The third row of Col B shows why this gives the result you want.
The OFFSET function simply expands the range to $A$1:A3. COUNTIFS works as it normally does: it takes the range, called the criteria_range in Excel, and counts all items that equal A3. A3 in this case is equal to "GF", and in the range $A$1:A3 there are two cells equal to "BI" and one equal to "GF", so the result is 1.
Hope this helps!
I'm building a yearly scorecard (sample shown above). The requirements for the scorecard are listed below.
Year to Date values must cumulatively add each of the previous period values (circled in orange).
P1 = P1
P2 = P1 + P2
P3 = P1 + P2 + P3 (etc)
Year to Date formulas must all be the exact same, dynamically referencing the required columns and required rows so that they can be easily copied from period to period (on going).
With this formula I was trying to look at row 2, which holds each of the column indicators, and test it with ISTEXT() in order to add up the values in row ROW()-1. Using CONCATENATE to build a string that references a row range might not be the best way to do it.
Example: If I have values in row 55
=SUM(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)))
=SUM(INDEX(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)),MATCH(ISTEXT(2:2),2:2,0)))
I was trying something like a horizontal sumifs() formula with little luck, attempting to use the modulus value of the column() function as a logical test.
formula doesn't work
=SUMIFS(INDIRECT(ROW()-1&":"&ROW()-1), MOD(COLUMN()-2, 6), 0)
Or Using some other method of testing which columns to add.
=SUMIFS(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)), IF(ISTEXT(2:2), 1, 0), TRUE)
If I change my lettering in Row 2 (N, H, T) to just "X" then test for X that works, but this formula doesn't factor in the requirement for only adding values from current and prior periods.
=SUMIFS(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)),2:2, "X")
I don't know of a way to add up a dynamic number of indirect cell references based on the column you're in. So let's say it's row 55 in period 3: I would need a formula that looks in row 2, sees each of the column values (H, N, T) and adds up H55, N55 and T55. That same formula would need to construct a different list if it's in period 2: (H, N), so H55 and N55.
Maybe I need to rethink my approach entirely? Write VBA instead?
Edit
To better expand on what the data model is, to address some comments, I've thrown some dummy values and dirty formulas in.
Have a look at service level vs. service level year to date (YTD). Service level is just a flat data entry of weekly performance, then the Summary column is a simple average of the weekly performance in order to report period performance. The YTD number is an average of the period numbers, so these values progressively roll up.
The formulas I'm trying to write are for the summary columns, both period value and YTD values.
It's not entirely clear what your data layout is.
So, assuming:
Labels that identify columns to sum are in row 2
Values to sum are in row 55
Formula is to sum values in row 55, which have a non-blank entry in row 2, and sum values in columns up to and including the column the formula is in
Formula
=SUMPRODUCT($55:$55,--(COLUMN($55:$55)<=COLUMN()),--($2:$2<>""))
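The SUMPRODUCT multiplies each value by two 0/1 masks: "this column is at or before the formula's column" and "this column has a label in row 2". A rough Python sketch of that masking, under the assumptions listed above; the function name and the sample row are hypothetical, and columns are indexed from 0 here.

```python
def masked_ytd_sum(values, labels, current_col):
    """Sum values in labelled columns up to and including current_col."""
    total = 0
    for col, (value, label) in enumerate(zip(values, labels)):
        in_range = col <= current_col   # models COLUMN($55:$55)<=COLUMN()
        labelled = label != ""          # models $2:$2<>""
        if in_range and labelled:
            total += value
    return total


# Hypothetical layout: only every second column carries a label in row 2.
row_2 = ["", "P1", "", "P2", "", "P3"]
row_55 = [9, 10, 9, 20, 9, 30]

print(masked_ytd_sum(row_55, row_2, 1))  # 10
print(masked_ytd_sum(row_55, row_2, 3))  # 30
print(masked_ytd_sum(row_55, row_2, 5))  # 60
```

The same function call works in every period, which mirrors the requirement that the year-to-date formula be identical wherever it is pasted.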
For column T use:
=SUM(IF(MOD(COLUMN($H:T),6)=2,$H$1:T$1,0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
Change the $H$1:T$1 to the row you need to sum (it will only sum every sixth column starting with H).
With UPEH in row 9 and the formula in row 10, it becomes =SUM(IF(MOD(COLUMN($H:T),6)=2,$H$9:T$9,0))
Once it is set up correctly, you can copy and paste it wherever you need it (as long as the rule stays "sum every 6th column starting at H").
To make it more dynamic, you may be better off using:
=SUM(IF($A$4:T$4="Summary",$A$9:T$9,0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
it checks for Row 4 to contain "Summary" to get the values to sum :)
EDIT
However, if you want to have exactly the same formula in each part you would need to use something like that:
=SUM(IF(($4:$4="Summary")*(COLUMN($4:$4)<=COLUMN()),OFFSET($1:$1,ROW()-2,),0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
It sums the cells in the row directly above the formula, from the first column up to and including the formula's own column, for all columns containing "Summary" in row 4.
However, this may get pretty slow pretty fast (it calculates a LOT) ^^
BIG HINT: Just looking at what you have/need
Let's assume the cells to add are in row 1 and the output is in row 2...
We also skip the columns that are not part of the calculation (to keep it simple)...
A2 would be just A1
B2 would be A1 + B1
C2 would be A1 + B1 + C1... but wait!
A1 + B1 = B2 so better -> C2 = B2 + C1
leads to:
R2Cx = R2C(x-1) + R1Cx
If you use that behavior in column N (the value above it plus the calculated value to the left, in column H) and write the formula with relative references, you can copy it and paste it into column T, where it becomes =T(above) + N(calculated). Check it :)
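The recurrence above (each year-to-date value is the previous year-to-date value plus the current period value) can be sketched in a few lines of Python. The function name and the sample numbers are hypothetical.

```python
def year_to_date(period_values):
    """Running total: ytd[x] = ytd[x-1] + period[x], with ytd[0] = period[0]."""
    ytd = []
    for value in period_values:
        previous = ytd[-1] if ytd else 0
        ytd.append(previous + value)
    return ytd


periods = [10, 20, 30]           # P1, P2, P3
print(year_to_date(periods))     # [10, 30, 60]
```

Each step looks only one cell to the left and one cell up, so the work per cell stays constant no matter how many periods precede it, unlike the whole-row array formulas.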
I'm creating a grid of correlation values, like a distance grid. I have a series of cells that each contain a formula whose ranges are easy to describe if you know the offset from the first cell, and I'm having trouble figuring out how to specify it.
In the upper left hand cell (R10), the formula is CORREL(C2:C21,C2:C21) -- it's 1, of course.
In the next column over (S10), the formula is CORREL(D2:D21,C2:C21).
In the next row down (R11), the formula is CORREL(C2:C21,D2:D21).
Of course, S11 would contain CORREL(D2:D21,D2:D21), which is also 1. And so on, for a roughly 15x15 grid.
Here's a graphical representation of the ranges involved:
C2:C21,C2:C21 C2:C21,D2:D21 C2:C21,E2:E21
D2:D21,C2:C21 D2:D21,D2:D21 D2:D21,E2:E21
E2:E21,C2:C21 E2:E21,D2:D21 E2:E21,E2:E21
Whenever I add a new data row, I have to manually update several formulas. So, I'd like the last non-blank row number (21, in this case) to be dynamically determined, such as with COUNTA(C:C). Ideally, I'd like the formula to calculate the column offsets, too, so that I can drag one formula across my entire range.
What's the best way to accomplish this? I think OFFSET might be a component in the solution, but I haven't had success getting it all to work together.
Using this simple setup per element of the corr matrix also helps:
=CORREL(INDIRECT("'Risk factors'!"&"T"&G6&":T"&H6);INDIRECT("'Risk factors'!"&"U"&G6&":U"&H6))
With this function I refer to data in another sheet, Risk factors, to correlate columns T and U with each other. I want the ranges of the data to be dynamic, so G6 and H6 in my current sheet hold the first and last row numbers of the range, which I of course specify in those cells.
Hope this helps!
I found this formula, while wordy, achieved the desired results. In this example, the data lives in C2:O19. The table I wanted to construct computed the correlation values of all permutations of pairs of columns. Since there are 11 columns, the correlation pairs table is 11x11 and starts at R10. Each cell has the following formula:
=CORREL(INDIRECT(ADDRESS(2,2+(ROWS($R$10:R10)),4)&":"&ADDRESS(COUNTA($C:$C),
2+(ROWS($R$10:R10)),4)),INDIRECT(ADDRESS(2,2+(COLUMNS($R$10:R10)),4)&":"&
ADDRESS(COUNTA($C:$C),2+(COLUMNS($R$10:R10)),4)))
As I found out, INDIRECT() takes a text string describing a reference and returns the range it refers to.
Let's take a cell, say U12, and look at the range formula in detail. The first INDIRECT is the column given by applying the row offset from R10.
Since Row 12 is 2 rows down from Row 10, ADDRESS(2,2+(ROWS($R$10:U12)),4)&":"&ADDRESS(COUNTA($C:$C),2+(ROWS($R$10:U12)),4) should yield the column that's 2 columns right of Column C, which is E. The formula evaluates to E2:E19.
The second INDIRECT is the column given by applying the column offset from R10. Similarly, since Column U is 3 columns right of Column R, ADDRESS(2,2+(COLUMNS($R$10:U12)),4)&":"&ADDRESS(COUNTA($C:$C),2+(COLUMNS($R$10:U12)),4) should yield the column that's 3 columns right of Column C, which is F. The second formula evaluates to F2:F19.
Substituting these range reference values in, the cell formula reduces to =CORREL(INDIRECT("E2:E19"),INDIRECT("F2:F19")) and further to =CORREL(E2:E19,F2:F19), which is what I'd been using up till now.
Just like a distance table, this table is symmetrical along the diagonal, because =CORREL(E2:E19,F2:F19) equals =CORREL(F2:F19,E2:E19). Each value on the diagonal is 1, because CORREL of the same range is 100% correlation by definition.
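For a quick check of the symmetry and the diagonal of ones outside Excel, here is a small Python sketch that builds the same kind of grid. It computes the Pearson correlation by hand, the measure CORREL returns; the function names and the three sample columns are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_grid(columns):
    """Cell (i, j) holds the correlation of column i with column j."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Hypothetical data columns standing in for C, D and E.
data = [
    [1, 2, 3, 4],
    [2, 4, 6, 9],
    [4, 3, 2, 1],
]
grid = correlation_grid(data)
for row in grid:
    print([round(v, 3) for v in row])
```

Because the grid is symmetric, only the cells on or above the diagonal strictly need computing; the sketch fills every cell to match the layout of the worksheet table.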