I'm building a yearly scorecard (sample shown above). The requirements for the scorecard are listed below.
Year to Date values must cumulatively add each of the previous period values (circled in orange).
P1 = P1
P2 = P1 + P2
P3 = P1 + P2 + P3 (etc)
Year to Date formulas must all be the exact same, dynamically referencing the required columns and required rows so that they can be easily copied from period to period (on going).
With this formula I was trying to look at row 2, which holds the column indicators, and test each cell with ISTEXT() to decide which values in row ROW()-1 to add up. Using CONCATENATE to build a string that references a row range might not be the best way to do it.
Example: If I have values in row 55
=SUM(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)))
=SUM(INDEX(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)),MATCH(ISTEXT(2:2),2:2,0)))
I also tried something like a horizontal SUMIFS() formula, with little luck, attempting to use the modulus of the COLUMN() function as a logical test.
This formula doesn't work:
=SUMIFS(INDIRECT(ROW()-1&":"&ROW()-1), MOD(COLUMN()-2, 6), 0)
Or using some other method of testing which columns to add:
=SUMIFS(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)), IF(ISTEXT(2:2), 1, 0), TRUE)
If I change my lettering in Row 2 (N, H, T) to just "X" then test for X that works, but this formula doesn't factor in the requirement for only adding values from current and prior periods.
=SUMIFS(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)),2:2, "X")
I don't know of a way to add up a dynamic number of indirect cell references based on the column you're in. So let's say it's row 55 in period 3: I would need a formula that looks in row 2, sees each of the column indicators (H, N, T) and adds up H55, N55 and T55. That same formula would need to construct a different list if it's in period 2: (H, N), so H55 and N55.
Maybe I need to rethink my approach entirely? Write VBA instead?
Edit
To better expand on what the data model is, to address some comments, I've thrown some dummy values and dirty formulas in.
Have a look at service level vs. service level year to date (YTD). Service level is just a flat data entry of weekly performance, then the Summary column is a simple average of the weekly performance in order to report period performance. The YTD number is an average of the period numbers, so these values progressively roll up.
The formulas I'm trying to write are for the summary columns, both period value and YTD values.
It's not entirely clear what your data layout is.
So, assuming:
Labels that identify columns to sum are in row 2
Values to sum are in row 55
Formula is to sum values in row 55, which have a non-blank entry in row 2, and sum values in columns up to and including the column the formula is in
Formula
=SUMPRODUCT($55:$55,--(COLUMN($55:$55)<=COLUMN()),--($2:$2<>""))
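As a rough illustration of what that SUMPRODUCT computes, here is a minimal Python sketch of the same logic (not an Excel formula). The function name, the 0-based column index and the sample labels and values are all hypothetical.

```python
def running_labelled_sum(labels, values, current_col):
    """Sum values in columns 0..current_col whose label cell is non-blank."""
    return sum(
        value
        for col, (label, value) in enumerate(zip(labels, values))
        if col <= current_col and label != ""
    )

# Hypothetical layout: a label in "row 2" marks each period's summary column.
labels = ["", "H", "", "N", "", "T"]
values = [1, 10, 2, 20, 3, 30]

print(running_labelled_sum(labels, values, 1))  # 10
print(running_labelled_sum(labels, values, 3))  # 30
print(running_labelled_sum(labels, values, 5))  # 60
```

The two conditions in the generator correspond to the two `--( ... )` masks in the formula: one limits the sum to columns up to the current one, the other to columns that carry a label.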
For column T use:
=SUM(IF(MOD(COLUMN($H:T),6)=2,$H$1:T$1,0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
Change $H$1:T$1 to the row you need to sum (it only sums every sixth column, starting with H).
For example, with UPEH in row 9 and this formula in row 10, use =SUM(IF(MOD(COLUMN($H:T),6)=2,$H$9:T$9,0))
Once it is set up correctly, you can copy and paste it wherever you need it (as long as you only want to sum every sixth column starting at H).
To make it more dynamic, you may be better off using:
=SUM(IF($A$4:T$4="Summary",$A$9:T$9,0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
It checks row 4 for "Summary" to decide which values to sum :)
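The two selection rules above (every sixth column starting at H, or columns whose header reads "Summary") can be sketched in Python as follows. This is only an illustration of the logic under assumed data: the function names, the 1-based column numbers and the sample values are hypothetical.

```python
def sum_every_sixth(values_by_col, last_col, first_col=8):
    """Sum columns first_col..last_col where column % 6 == 2 (H=8, N=14, T=20)."""
    return sum(
        value
        for col, value in values_by_col.items()
        if first_col <= col <= last_col and col % 6 == 2
    )

def sum_by_header(headers, values, wanted="Summary"):
    """Sum the values whose header cell equals the wanted text."""
    return sum(value for header, value in zip(headers, values) if header == wanted)

# Hypothetical row: summary values in H (8), N (14), T (20); I (9) is weekly data.
values_by_col = {8: 5, 9: 100, 14: 7, 20: 9}
print(sum_every_sixth(values_by_col, last_col=14))  # 12
print(sum_every_sixth(values_by_col, last_col=20))  # 21

headers = ["W1", "W2", "Summary", "W1", "Summary"]
print(sum_by_header(headers, [1, 2, 3, 4, 5]))  # 8
```

The header-based version keeps working if a period gains or loses a week column, which is why it is the more dynamic of the two.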
EDIT
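However, if you want to have exactly the same formula in each period, you would need to use something like this (note that AND() collapses an array to a single value inside an array formula, so the conditions are multiplied instead):
=SUM(IF(($4:$4="Summary")*(COLUMN($4:$4)<=COLUMN()),OFFSET($1:$1,ROW()-2,),0))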
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
It sums the cells one row above itself, from the first column up to and including its own column, for all columns containing "Summary" in row 4.
However, this may get pretty slow pretty fast (it calculates a LOT) ^^
BIG HINT: Just looking at what you have/need
Let's assume the cells to add are in row 1 and the output is in row 2...
We also skip the columns that are not part of the calculation (to keep it simple)...
A2 would be just A1
B2 would be A1 + B1
C2 would be A1 + B1 + C1... but wait!
A1 + B1 = B2 so better -> C2 = B2 + C1
leads to:
R2Cx = R2C(x-1) + R1Cx
If you apply that pattern in column N (the value above it plus the already-calculated value to its left, in column H) and write the formula with relative references, you can copy it and paste it into column T, where it becomes =T(above) + N(calculated). Check it :)
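The recurrence above (each year-to-date cell is the previous year-to-date cell plus the current period value) can be checked with a short Python sketch. The period values are hypothetical.

```python
from itertools import accumulate

# Hypothetical period values for P1, P2, P3.
period_values = [4, 6, 5]

# R2Cx = R2C(x-1) + R1Cx: reuse the previous running total instead of re-adding everything.
ytd = []
for value in period_values:
    previous = ytd[-1] if ytd else 0
    ytd.append(previous + value)

print(ytd)  # [4, 10, 15]
print(ytd == list(accumulate(period_values)))  # True
```

Each step does one addition, which is why this approach stays cheap compared with an array formula that rescans the whole row in every cell.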
Related
So I have a table which basically looks as follows:
Criterion Value
1 -5
1 1
2 5
2 5
3 2
3 -1
I want to sum the values in column B based on the criteria in column A, but only if the sum for an individual criterion is not negative. So for example, if I ask for the sum of all values where the criterion is between 1 and 3, the result should be 11 (the values for criterion 1 are not included in the sum because they add up to a negative number).
My first idea was to add a third column with a sumif([criterion];[#criterion];[value]) and then use a sumifs function which checks whether that third column is negative. However, my table has +100k lines and with that many sumif functions it becomes intolerably slow.
I know I could create a pivot table to the same effect, but that has two drawbacks: I would have to create a separate sheet, which would add complexity, and my table is frequently updated which means I would have to manually update that pivot table every time to allow for downstream calculations. NBD and I could do that as a last resort, but I wonder whether there isn't a more elegant way to solve this problem.
I would want to avoid VBA to avoid complexity (the sheet will be used by other persons).
Thank you
This can be easily done using UNIQUE() and the two versions of SUMIF() in this way:
First collect all the criteria with =UNIQUE(A2:A7) -- Assuming your data are in columns A and B starting from row 2, this goes in cell C2, with "Criteria" in C1
Compute the subtotals for all criteria using =SUMIF($A$2:$A$7, C2, $B$2:$B$7) -- This goes in cell D2 and extends as the criteria do, "Partials" in cell D1
Sum the subtotals from step 2 that are positive with =SUMIF(D2:D7, ">0") in cell E2
If you have a lot of data, I suggest using whole-column references, which avoids absolute references and the need to adjust the formulas as the number of rows changes:
The first formula becomes =UNIQUE(A:A) -- Don't care about the heading being taken (strings and empty cells are not summed)
For the second formula use =SUMIF(A:A, C2, B:B)
Use =SUMIF(D:D, ">0") for the last step
This should be reasonably fast, using just as many extra cells as the number of distinct criteria (multiplied by 2).
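The three steps (distinct criteria, subtotal per criterion, sum of the positive subtotals) can be sketched in Python using the sample table from the question. The function name is hypothetical, and this only illustrates the logic rather than replacing the worksheet formulas.

```python
from collections import defaultdict

# (criterion, value) pairs from the question's sample table.
rows = [(1, -5), (1, 1), (2, 5), (2, 5), (3, 2), (3, -1)]

def sum_positive_groups(rows):
    """Subtotal the values per criterion, then add only the positive subtotals."""
    subtotals = defaultdict(int)
    for criterion, value in rows:
        subtotals[criterion] += value  # same role as SUMIF(A:A, C2, B:B)
    return sum(total for total in subtotals.values() if total > 0)  # SUMIF(D:D, ">0")

print(sum_positive_groups(rows))  # 11
```

Criterion 1 subtotals to -4 and is dropped, while criteria 2 and 3 contribute 10 and 1.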
Screenshot of the Excel worksheet
I'm working with historic stock prices, and using eight columns I have:
Column A: High
Column B: Low
Column C: Close
Column D: C(x) - C(x-4)
Column E: Counts the number of consecutive positive numbers in column D
Column F: Counts the number of consecutive negative numbers in column D
Column G: Calculate the difference between the maximum of column A and minimum of column B within a given sequence.
As an example G1 should equal:
=max(A1:A5)-min(B1:B5)
G6 should equal:
=max(A6:A8)-min(B6:B8)
G9 should equal:
=max(A9:A11)-min(B9:B11)
And so on.
I'd like to know if it is possible to automate this calculation, possibly with the use of one or more additional columns.
Welcome to SO!
This may not be the most efficient solution as you need to add two helper columns, but if I understand your requirements correctly, then this idea should work well enough.
First, let's assume that there are 100 rows in your data set. Given that, enter the formula "=A100" in cell G100 and the formula "=B100" in cell H100. This sets up the boundary condition for the formulas in columns G and H. Now, in cell G99, enter this formula:
"=IF(E99="",G100,IF(E100="",A99,MAX(A99,G100)))"
What this formula does is set up a "running maximum" with the following logic:
If the cell in E99 is blank, copy the running maximum from G100, else:
If the cell in E99 is not blank but the cell in E100 is, set up a new running maximum from the cell in A99, else:
Take the maximum of A99 and G100 as the new running maximum.
Similarly, copy the following formula into cell H99:
"=IF(F99="",H100,IF(F100="",B99,MIN(B99,H100)))"
This follows the same logic as the previous formula, but takes the minimum of column B.
Copy or autofill these formulas to the top of the data set. This should now give you running maximum for column A and a running minimum for column B.
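To make the bottom-up logic of the column G formula easier to follow, here is a minimal Python sketch that mirrors it row by row. The function name and sample data are hypothetical, and an empty string stands in for a blank cell in column E.

```python
def running_max_from_bottom(highs, flags):
    """Mirror of the column G formula, filled from the last row upwards."""
    n = len(highs)
    result = [None] * n
    result[-1] = highs[-1]  # boundary condition, like =A100 in G100
    for r in range(n - 2, -1, -1):
        if flags[r] == "":
            result[r] = result[r + 1]  # blank flag: copy the value from below
        elif flags[r + 1] == "":
            result[r] = highs[r]  # last row of a run: start a new running maximum
        else:
            result[r] = max(highs[r], result[r + 1])  # inside a run: extend it
    return result

# Hypothetical data: a run of three flagged rows followed by two blank rows.
highs = [5, 7, 6, 9, 4]
flags = [1, 2, 3, "", ""]
print(running_max_from_bottom(highs, flags))  # [7, 7, 6, 4, 4]
```

The running minimum for column H follows the same shape with `min` in place of `max` and column F as the flag.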
The next step is to calculate the difference. I notice from your question, that you only seem to be interested in calculating this difference at the top of each range (G1, G6, G9, etc.), rather than doing it in every row. Given that, we need a slightly more complicated formula.
The boundary condition for this formula is simply "=G1-H1" entered in cell I1. In cell I2, enter this:
"=IF(OR(AND(E2<>"",E1=""),AND(F2<>"",F1="")),G2-H2,"")"
This works by checking two conditions that indicate a range boundary:
E1 is blank and E2 is not
or
F1 is blank and F2 is not
If either of these conditions holds, the IF statement is true and "G2-H2" is displayed; otherwise a blank cell is displayed. Now copy or autofill this formula to the bottom of the data set.
As a final step, you can now hide columns G and H if you don't need them displayed. This should now give you the results I think you're looking for. Please let me know if this doesn't work out for you.
* Updated *
I have a rather large excel data set that I'm trying to summarise using up to 3 dimensions: region, sector, industry.
Any combination of these dimensions can be set or left blank and I need to create a formula that accommodates this WITHOUT using VBA.
Within the data I've set up named ranges to refer to these dimensions.
I'm using an array formula but I'd like to dynamically create a string which is then used as the boolean argument in the array formula.
For instance if:
A1 = "Hong Kong" (region)
B1 = <blank> (sector)
C1 = <blank> (industry)
I create a dynamic string in D1 such that
D1 = (region="Hong Kong")
I then want to use the string in D1 to create an array formula
E1 = {counta(if(D1,employees))}
However, if the user includes a sector such that:
A2 = "Hong Kong" (region)
B2 = "finance" (sector)
C2 = <blank> (industry)
Then I want the string in D2 to update to:
D2 = (region="Hong Kong")*(sector="finance")
Which then automatically updates the value in E2 which still has the same formula.
E2 = {counta(if(D2,employees))}
Is this possible? Alternatively is there any other way of achieving the same outcome, keeping in mind that I need to be able to copy D1 and E1 down into different rows so that different combinations of the dimensions can be viewed simultaneously.
Thanks.
* Updated *
To be clear, the reason I need the values in column D to be dynamic is so that I can create different scenarios in Row 1, Row 2, Row 3 etc. and I need the values in column E of each row to match the criteria set in columns A:C of that row.
There had to be a fairly simple way!
Columns B:D contain the criteria, A is a criterion number and E is the result of applying the DSUM function to the criterion in that row. I've used DSUM as it seems more natural (to me at least) to sum employee numbers. However, DCOUNT can equally well be used. For brevity I've not shown the data I'm using but it is a very trivial data set with just a few rows of test data.
The first set of criteria in row 2 is: Sector takes value of "Man" (manufacturing) whilst Region and Industry are unspecified. The 3rd set of criteria (in row 4) is: the Region is "Fr" (for France) AND the Industry is "Cars". The results in the DSUM column are obtained by applying the set of criteria in the corresponding row. All, some or even none of the cells in a row may contain entries.
The approach used is based on columns G:J, where, with the exception of cells G1 and G2 (which contain the numbers 0 and 1, respectively), everything in these columns has been generated by a formula.
There are twice as many rows in columns G:J as there are sets of criteria listed in B:D and the rows should be taken in pairs. The first pair (rows 1 and 2) provide a criterion table for use in DSUM corresponding to the first set of criteria (the table is cells H1:J2), the second pair in rows 3 and 4 provides a criterion table for the second set of criteria (cells H3:J4), etc. (Ignore the 11th row - I copied too many rows downwards in the screenshot!)
Column G has a fairly obvious pattern and can be generated by applying a simple =IF() function in cell G3 which references the starting pair in G1 and G2 with the formula in G3 then copied downwards.
The cells in columns H:J reference the appropriate cells of the set of all criteria (B1:D6 in the screenshot) using the INDEX function (and making use of the value in column G along the way). It is not too difficult to create a single formula that can be copied from H1 to the range H1:J11 by judicious use of mixed relative and absolute addressing (and an IF or two). Note that references to an empty cell in B2:D6 will generate a value of 0 in the corresponding cell in H:J, so the construct IF(x=0,"",x) must be used. This makes the formula used in the cells in columns H:J a bit clunky, but not excessively so.
Having generated the 5 criteria tables corresponding to the 5 sets of criteria in B:D, use is made of the OFFSET function to deliver the correct criterion table as the third argument of the DSUM functions in column E.
I chose to base my OFFSETs on cell $H$1, so the top-left cell of the criterion table for the first set of criteria is offset from my base cell by 0 rows and 0 columns. The second criterion table is offset by 2 rows and 0 columns, the third by 4 rows and 0 columns. It should be clear how the number of offset rows and columns to use can be calculated from the corresponding criterion number in column A. It should also be obvious that the final two arguments of the OFFSET function will always be 2 and 3. So my DSUM() functions in column E look something like
=DSUM(myData,"Employees",OFFSET($H$1,row_offset,0,2,3))
where myData is the named range containing the test dataset and row_offset is a very simple formula involving the corresponding value in column A.
It would have been nice to have been able to deliver the third argument of the function without having to adopt the approach of effectively reproducing the sets of criteria in B1:D6 in cells H1:J10. Whilst there are ways to generate the required criterion table arrays formulaically without putting them onto the worksheet, I found that DSUM generated an error when applying such an array as its third argument.
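The effect of one row of criteria (any mix of Region, Sector and Industry, with blanks meaning "no restriction") can be sketched in Python. The records, field names and function name are hypothetical, and exact case-sensitive equality is assumed here for simplicity; it is only meant to show what each DSUM row computes.

```python
def sum_employees(records, region="", sector="", industry=""):
    """Sum employees over records matching every non-blank criterion."""
    criteria = {"region": region, "sector": sector, "industry": industry}
    active = {field: value for field, value in criteria.items() if value != ""}
    return sum(
        record["employees"]
        for record in records
        if all(record[field] == value for field, value in active.items())
    )

# Hypothetical test data set.
records = [
    {"region": "Fr", "sector": "Man", "industry": "Cars", "employees": 100},
    {"region": "Fr", "sector": "Fin", "industry": "Banks", "employees": 40},
    {"region": "UK", "sector": "Man", "industry": "Steel", "employees": 60},
]

print(sum_employees(records, sector="Man"))                   # 160
print(sum_employees(records, region="Fr", industry="Cars"))   # 100
print(sum_employees(records))                                 # 200
```

Dropping blank criteria before filtering plays the same role as the IF(x=0,"",x) construct in the criterion tables.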
I am trying to use an excel formula to determine the proportion (pDistance) of the total Distance for each Position by Site. For example if the (total) Distance was 50 and the Position was 10 the proportion of the total distance (pDistance) would be 0.2, the last pDistance of any Site should always be 1. The formula I used (=IF(B3<B2, 1, (B2/C2))) mostly works, except that I have some values of Position that are -10.
In these cases every Position in that Site should have a value of 10 added to it before calculating the pDistance. Can this be done with one formula, that is, calculate the pDistance and add 10 when necessary? Or would it be best to make a dummy column first? If creating a dummy column is the way to go, how does one use a formula to do something until a condition is met (e.g. add 10 to the Position until a 0, or a new Site, is reached)?
From the sample data, I understand that for any Site which has a value of -10 somewhere in column B, then the calculation of (B / C) needs to be ((B + 10) / C). (Although cell D11 doesn't follow this, I'm assuming it's just a typo :) )
The formula below would work in this case:
D2 = (SUMPRODUCT(($A$2:$A$22=$A2)*($B$2:$B$22=-10))*10 + B2) / C2
This formula works as follows:
SUMPRODUCT(($A$2:$A$22=$A2)*($B$2:$B$22=-10))
--> This will return the total number of rows which contain the value of A2 in column A and a -10 in column B :
($A$2:$A$22=$A2) looks at all values in the cells between A2 & A22, and finds which ones match A2. That is, it finds which rows that have the same Site number as the row where the formula is entered
($B$2:$B$22=-10) looks at all values in the cells between B2 & B22, and finds which ones equal -10. That is, it finds all rows that have a Position value of -10.
The SUMPRODUCT finds the overlap of these. So, it counts how many rows have both the right Site number and a Position value of -10.
This value is then multiplied by 10. If there were no -10 values paired with the site, then it is 0 * 10 = 0. If there was one pair, it will be 1 * 10 = 10.
That result is then added to B2, and then divided by C2.
Note: this formula assumes that there will only be a maximum of one -10 value for a particular site. If there is more than one, then the SUMPRODUCT will return the total number of matches, and the calculation will be wrong (but the formula can be updated to correct this).
Also note that the formula refers to the ranges $A$2:$A$22 and $B$2:$B$22. These ranges must match the first and last numerical entry in your data (and not include any text - this will break the SUMPRODUCT formula). The simplest way to handle this (if you don't want to have to update the formula when adding values at the end of data set) is to set up a defined range name and refer to that in the formula.
To apply the formula, just paste it into D2, and copy/drag the formula down into the cells below.
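For readers who want to check the arithmetic outside Excel, here is a minimal Python sketch of the same calculation. The function name and the sample rows are hypothetical, and, like the formula, it assumes at most one -10 per Site.

```python
def p_distance(rows):
    """rows: (site, position, distance) tuples; returns one pDistance per row."""
    result = []
    for site, position, distance in rows:
        # Count rows for the same site with a position of -10 (the SUMPRODUCT part).
        shifts = sum(1 for s, p, _ in rows if s == site and p == -10)
        result.append((shifts * 10 + position) / distance)
    return result

# Hypothetical data: site "A" starts at -10, site "B" does not.
rows = [
    ("A", -10, 40), ("A", 0, 40), ("A", 30, 40),
    ("B", 10, 50), ("B", 50, 50),
]
print(p_distance(rows))  # [0.0, 0.25, 1.0, 0.2, 1.0]
```

Every row of site "A" is shifted by 10 before dividing, while site "B" is left unchanged, and the last row of each site comes out as 1.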
As a conclusion, I can recommend that you create a "dummy" / "helper" column to store the (SUMPRODUCT * 10) + B2 results, and use this as the input for the division calculation. This will help to visualize and check the data that is being used for the calculation. It can be hidden until you want to check the values or change the formula in case the requirements change in the future.
I have a table set up as follows:
Column 1 - Customer Name
Row 1 - Item Name
Row 2 - Item Cost
Row 3+ - Item Quantity
How do I set up the last column to calculate the total cost for each customer? I.e, For each customer row, I want to multiply the number in each cell (= quantity) by the corresponding cell in Row 2 (= cost), and add them all up for the final bill.
To clarify what I'm saying I'm attaching the following picture so that we can discuss specifics.
Have you tried SUMPRODUCT - it does exactly what you need, gives the sum of 2 or more multiplied ranges?
=SUMPRODUCT(A71:C71,$A$2:$C$2)
You can extend the ranges as far as you need. If you want to add columns make sure you don't add at the end, e.g. if you retain one blank column (D currently) and include that in the formula, then if you add a column at D the formula will automatically extend to E
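What SUMPRODUCT does for each customer row can be sketched in Python: multiply each quantity by the matching cost and add up the products. The customer names, costs and quantities below are hypothetical.

```python
# Hypothetical cost row (one cost per item) and quantity rows (one per customer).
costs = [2.0, 5.0, 1.5]
quantities = {
    "Alice": [1, 0, 4],
    "Bob": [3, 2, 0],
}

def bill(quantity_row, cost_row):
    """Pairwise multiply quantities by costs and sum the products."""
    return sum(q * c for q, c in zip(quantity_row, cost_row))

for customer, row in quantities.items():
    print(customer, bill(row, costs))
# Alice 8.0
# Bob 16.0
```

The fixed `costs` list plays the role of the absolute reference to the cost row, while each customer's list is the relative reference that changes as the formula is copied down.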
You can use sumproduct but specify the ranges, e.g. =sumproduct(B2:B6,C2:C6), the next row would then be =sumproduct(B2:B6,D2:D6) etc. I'm sure there's a way to "fix" your cost row but it's quite quick doing it this way
If, for example, your first data set is in column A (i.e. per unit cost) and the second data set is in column B (i.e. quantity), and you want the total cost for each item for the specified quantity, place the following formula in C1
=A1*B1
Select C1 and drag the fill handle (the small black square at the bottom-right corner of the selected cell) as far down the column as you need. The program will automatically replicate the formula with the correct cell numbers for each row.
One way is to use this formula:
=SUM(B4:B5)*B2+SUM(C4:C5)*C2
It is not very elegant, and you still need to extend the formula as columns are added; the same is true of SUMPRODUCT, because its ranges have to be the same size, as far as I know.
The other way I came up will use a matrix function called MMULT and here is the example:
With this array formula (meaning you have to press Ctrl + Shift + Enter together) entered into cell D6: =SUM(MMULT(B2:C2,TRANSPOSE(B3:C5))), you will get your expected result without needing all the subtotals. Please note that this multiplies a 1 x 2 matrix by a 2 x 3 matrix.
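The matrix product behind that formula can be sketched in plain Python: a 1 x 2 cost row times the transposed 3 x 2 quantity block gives one total per customer, and summing those gives the grand total. The numbers are hypothetical.

```python
# Hypothetical data: 2 item costs (like B2:C2) and 3 customers x 2 items (like B3:C5).
costs = [2, 5]
quantities = [
    [1, 0],
    [3, 2],
    [4, 1],
]

# TRANSPOSE: turn the 3 x 2 block into 2 x 3 (one row per item).
transposed = list(zip(*quantities))

# MMULT: (1 x 2) times (2 x 3) gives a 1 x 3 row of per-customer totals.
per_customer = [
    sum(costs[k] * transposed[k][j] for k in range(len(costs)))
    for j in range(len(quantities))
]
print(per_customer)       # [2, 16, 13]
print(sum(per_customer))  # 31
```

Note that the outer SUM collapses the per-customer totals into a single grand total, so it answers a slightly different question from a per-row SUMPRODUCT.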