Formula with IF statement and condition - excel

I am trying to use an excel formula to determine the proportion (pDistance) of the total Distance for each Position by Site. For example if the (total) Distance was 50 and the Position was 10 the proportion of the total distance (pDistance) would be 0.2, the last pDistance of any Site should always be 1. The formula I used (=IF(B3<B2, 1, (B2/C2))) mostly works, except that I have some values of Position that are -10.
In these cases every Position in that Site should have a value of 10 added to it before calculating the pDistance. Can this be done with one formula, that is, calculate the pDistance and add 10 when necessary? Or would it be best to make a dummy column first? If creating a dummy column is the way to go, how does one use a formula to do something until a condition is met (e.g. add 10 to the Position until a 0, or a new Site, is reached)?

From the sample data, I understand that for any Site which has a value of -10 somewhere in column B, then the calculation of (B / C) needs to be ((B + 10) / C). (Although cell D11 doesn't follow this, I'm assuming it's just a typo :) )
The formula below would work in this case:
D2 = (SUMPRODUCT(($A$2:$A$22=$A2)*($B$2:$B$22=-10))*10 + B2) / C2
This formula works as follows:
SUMPRODUCT(($A$2:$A$22=$A2)*($B$2:$B$22=-10))
--> This will return the total number of rows which contain the value of A2 in column A and a -10 in column B :
($A$2:$A$22=$A2) looks at all values in the cells between A2 & A22, and finds which ones match A2. That is, it finds which rows have the same Site number as the row where the formula is entered.
($B$2:$B$22=-10) looks at all values in the cells between B2 & B22, and finds which ones equal -10. That is, it finds all rows that have a Position value of -10.
The SUMPRODUCT finds the overlap of these. So, it counts how many rows have both the right Site number and a Position value of -10.
This value is then multiplied by 10. If there were no -10 values paired with the site, then it is 0 * 10 = 0. If there was one pair, it will be 1 * 10 = 10.
That result is then added to B2, and then divided by C2.
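The same counting logic can be sketched outside Excel. The Python below is only an illustration: the function name p_distance and the sample rows are made up, and it assumes at most one -10 per Site, just as the formula does.

```python
def p_distance(rows, i):
    """Mirror (SUMPRODUCT((A=A_i)*(B=-10))*10 + B_i) / C_i for row i.

    rows is a list of (site, position, distance) tuples (hypothetical data).
    """
    site, position, distance = rows[i]
    # SUMPRODUCT part: count rows with the same Site and a Position of -10.
    matches = sum(1 for s, p, _ in rows if s == site and p == -10)
    return (matches * 10 + position) / distance


rows = [
    (1, 0, 20), (1, 10, 20), (1, 20, 20),     # Site 1: no -10, no shift
    (2, -10, 40), (2, 10, 40), (2, 30, 40),   # Site 2: has a -10, shift by 10
]
print([p_distance(rows, i) for i in range(len(rows))])
```

Every row of Site 2 gets the +10 shift because the count is taken over the whole Site, not just the current row.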
Note: this formula assumes that there will only be a maximum of one -10 value for a particular site. If there is more than one, then the SUMPRODUCT will return the total number of matches, and the calculation will be wrong (but the formula can be updated to correct this).
Also note that the formula refers to the ranges $A$2:$A$22 and $B$2:$B$22. These ranges must match the first and last numerical entry in your data (and not include any text - this will break the SUMPRODUCT formula). The simplest way to handle this (if you don't want to have to update the formula when adding values at the end of data set) is to set up a defined range name and refer to that in the formula.
To apply the formula, just paste it into D2, and copy/drag the formula down into the cells below.
As a conclusion, I can recommend that you create a "dummy" / "helper" column to store the (SUMPRODUCT * 10) + B2 results, and use this as the input for the division calculation. This will help to visualize and check the data that is being used for the calculation. It can be hidden until you want to check the values or change the formula in case the requirements change in the future.

Related

How to find the maximum value of a given range, dependent on the value in a separate column

Screenshot of the Excel worksheet
I'm working with historic stock prices, and using eight columns I have:
Column A: High
Column B: Low
Column C: Close
Column D: Cx-Cx-4
Column E: Counts the number of consecutive positive numbers in column D
Column F: Counts the number of consecutive negative numbers in column D
Column G: Calculate the difference between the maximum of column A and minimum of column B within a given sequence.
As an example G1 should equal:
=max(A1:A5)-min(B1:B5)
G6 should equal:
=max(A6:A8)-min(B6:B8)
G9 should equal:
=max(A9:A11)-min(B9:B11)
And so on.
I'd like to know if it is possible to automate this calculation, possibly with the use of one or more additional columns.
Welcome to SO!
This may not be the most efficient solution as you need to add two helper columns, but if I understand your requirements correctly, then this idea should work well enough.
First, let's assume that there are 100 rows in your data set. Given that, enter the formula "=A100" in cell G100 and the formula "=B100" in cell H100. This sets up the boundary condition for the formulas in columns G and H. Now, in cell G99, enter this formula:
"=IF(E99="",G100,IF(E100="",A99,MAX(A99,G100)))"
What this formula does is set up a "running maximum" with the following logic:
If the cell in E99 is blank, copy the running maximum from G100, else:
If the cell in E99 is not blank but the cell in E100 is, set up a new running maximum from the cell in A99, else:
Take the maximum of A99 and G100 as the new running maximum.
Similarly, copy the following formula into cell H99:
"=IF(F99="",H100,IF(F100="",B99,MIN(B99,H100)))"
This follows the same logic as the previous formula, but takes the minimum of column B.
Copy or autofill these formulas to the top of the data set. This should now give you running maximum for column A and a running minimum for column B.
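For anyone who finds the bottom-up fill easier to follow as a loop, here is a rough Python sketch of the column G logic. The function name running_max and the sample values are assumptions for illustration only; a blank cell is represented by an empty string.

```python
def running_max(high, flag):
    """Fill column G from the bottom row up, following the IF formula.

    high: column A values; flag: column E values, with "" for a blank cell.
    """
    n = len(high)
    g = [None] * n
    g[-1] = high[-1]                     # boundary condition: last G = last A
    for i in range(n - 2, -1, -1):
        if flag[i] == "":
            g[i] = g[i + 1]              # E blank: copy the running maximum
        elif flag[i + 1] == "":
            g[i] = high[i]               # bottom row of a run: start a new maximum
        else:
            g[i] = max(high[i], g[i + 1])
    return g


# Hypothetical data: a run of three rows, a blank row, then a run of one row.
print(running_max([5, 7, 6, 9, 4], [1, 2, 3, "", 1]))
```

The running minimum for column H follows the same shape with min() and column F.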
The next step is to calculate the difference. I notice from your question, that you only seem to be interested in calculating this difference at the top of each range (G1, G6, G9, etc.), rather than doing it in every row. Given that, we need a slightly more complicated formula.
The boundary condition for this formula is simply "=G1-H1" entered in cell I1. In cell I2, enter this:
"=IF(OR(AND(E2<>"",E1=""),AND(F2<>"",F1="")),G2-H2,"")"
How this works is that it checks two conditions that indicate a range boundary:
E1 is blank and E2 is not
or
F1 is blank and F2 is not
If either of these conditions holds, the IF statement is true and "G2-H2" is displayed, otherwise a blank cell is displayed. Now copy or autofill this formula to the bottom of the data set.
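The boundary test can be sketched in Python as well. This is only an illustration of the IF/OR/AND logic; the function name range_differences and the sample columns are invented, and blanks are empty strings.

```python
def range_differences(e, f, g, h):
    """Show G-H only on rows where a new run starts in column E or F."""
    out = [g[0] - h[0]]                  # boundary condition: I1 = G1-H1
    for i in range(1, len(g)):
        starts_e = e[i] != "" and e[i - 1] == ""
        starts_f = f[i] != "" and f[i - 1] == ""
        out.append(g[i] - h[i] if starts_e or starts_f else "")
    return out


# Hypothetical columns E, F, G and H for five rows.
e = [1, 2, "", "", 1]
f = ["", "", 1, 2, ""]
g = [7, 7, 6, 6, 4]
h = [3, 3, 2, 2, 1]
print(range_differences(e, f, g, h))
```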
As a final step, you can now hide columns G and H if you don't need them displayed. This should now give you the results I think you're looking for. Please let me know if this doesn't work out for you.

Offset formula logic clarity

I am trying to get year to desired month total of personal expenditure sub categories. After researching stackoverflow, I found a formula seemingly appropriate for my requirements. I found it shifting the desired area by one row down during formula evaluation. I modified the formula by hit and trial on adhoc basis which is giving the correct results. To me the initially chosen formula appeared quite appropriate. I have shown below the sample data sheet and the evaluation steps of the original and modified formula. Could someone explain particularly the offset portion as to why it was going wrong for the initially chosen formula and how the modification helped in solving the problem. Somehow I am not able to get conceptual clarity on this issue.
Sample Data files
Personal_Accounts evaluated with formula A
Personal_Accounts evaluated with modified formula
Offset works by specifying:
A cell from which you will offset (A1 in this example), then specifying how many rows and columns to move from that position, and then how tall and wide to make the range.
The number of rows to move down: In this case the number of rows down is determined by Match(). Match() here will return the position within the range A1:A9 at which the value SS is first found. The answer is 5. Offset now is looking at Range A1 + 5 rows: A6
The number of columns to move across: Here we move 1 column. No funny business. New range is B6
The number of rows to include in the range from that start point: Here COUNTIFS() is used to return the number of times SS is found in the range A2:A9. The answer is 3. So the range will start at B6 and include three rows down in the range. Essentially B6:B8.
Finally, the number of columns to include in the range: Here it's 7 since that's what you have in cell A13, so your range is now B6:H8
Offset() returns that range and Sum() sums it up
You subtracted one from the results of MATCH() and correctly moved that formula to produce B5:H7. You could have also changed the search range in MATCH() to A2:A9, which would probably make more sense from a readability standpoint.
Lastly, your COUNTIFS() could just be COUNTIF() since you are not evaluating multiple conditions.
So if I had to write this from scratch, I would use:
=Sum(Offset(A1, Match(A12, A2:A9, 0), 1, Countif(A2:A9, A12), A13))
Which will get you the same correct answer, without any math on Match() results.
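To see the off-by-one in isolation, here is a small Python sketch. The helpers match and offset_rows are stand-ins written for this example (they are not Excel or library functions), and the column contents are assumed from the description above: a header in A1 and SS in rows 5 to 7.

```python
def match(value, cells):
    """Like MATCH(value, cells, 0): 1-based position of the first exact match."""
    return cells.index(value) + 1


def offset_rows(start_row, rows_down, height):
    """Row numbers covered by OFFSET(ref, rows_down, ..., height, ...)."""
    first = start_row + rows_down
    return list(range(first, first + height))


# Hypothetical column A1:A9.
col_a = ["Category", "AA", "AA", "BB", "SS", "SS", "SS", "TT", "TT"]
count = col_a[1:].count("SS")                           # COUNTIF(A2:A9, "SS")

wrong = offset_rows(1, match("SS", col_a), count)       # MATCH over A1:A9
fixed = offset_rows(1, match("SS", col_a) - 1, count)   # MATCH(...) - 1
alt = offset_rows(1, match("SS", col_a[1:]), count)     # MATCH over A2:A9
print(wrong, fixed, alt)
```

MATCH counts positions starting at 1, while OFFSET counts moves starting at 0, which is why the first variant lands one row too low.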
Offset has two main functions: either to move to a cell (target) using a specified number of rows and columns from the starting point, or to select a range of a specified number of rows and columns starting in the target cell. Your original formula has an issue in this part:
MATCH(A12;A1:A9;0)
The matched cell is the fifth, therefore the offset moves 5 rows down and ends in A6, because it starts in A1 and adds 5 rows. Then it moves 1 column to B6 and creates a range of 3 rows and 7 columns in total = B6:H8. So you need to deduct 1 from the result of the match function to end up in the right row.
For better understanding, imagine the SS value was in the first row of the range A1:A9 (in A1). Then the offset would move from A1 one row down to A2, although you wouldn't want it to move at all.
Look at your basic offset formula definition.
OFFSET (REFERENCE CELL, HOW MANY ROWS TO MOVE FROM REFERENCE, HOW MANY COLUMNS TO MOVE FROM REFERENCE, HOW MANY ROWS TO RETURN, HOW MANY COLUMNS TO RETURN)
so if you set your reference cell to A1 and you want to return the result in A2, you need to move down 1 row from your reference cell.
OFFSET ($A$1,1,0,1,1)
Now if we look at the match portion of your equation, MATCH returns what position the information is in. So if we want to find the match position of the information in A2 in a range going from A1:A100, MATCH is going to tell you that the information in A2 is in the 2nd position of the column. Or more precisely, it returns a value of 2.
So now we need to tell offset how far down to reach the 2nd position. We don't actually want it to move down 2 rows to get to the second position, since our reference point is A1, which is the first row. As a result we really want to go down 1 row to get to the second row. So you want 1 less than your match result, which you correctly did by doing Match(...)-1

Dynamically build array of values by indirect column reference

I'm building a yearly scorecard (sample shown above). The requirements for the scorecard are listed below.
Year to Date values must cumulatively add each of the previous period values (circled in orange).
P1 = P1
P2 = P1 + P2
P3 = P1 + P2 + P3 (etc)
Year to Date formulas must all be the exact same, dynamically referencing the required columns and required rows so that they can be easily copied from period to period (on going).
With this formula I was trying to look at row 2, which has each of the column indicators in it, and test for ISTEXT() to add up the values in ROW()-1. Using concatenate to build a string that references a row range might not be the best way to do it.
Example: If I have values in row 55
=SUM(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)))
=SUM(INDEX(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)),MATCH(ISTEXT(2:2),2:2,0)))
I was trying something like a horizontal sumifs() formula with little luck, attempting to use the modulus value of the column() function as a logical test.
formula doesn't work
=SUMIFS(INDIRECT(ROW()-1&":"&ROW()-1), MOD(COLUMN()-2, 6), 0)
Or Using some other method of testing which columns to add.
=SUMIFS(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)), IF(ISTEXT(2:2), 1, 0), TRUE)
If I change my lettering in Row 2 (N, H, T) to just "X" then test for X that works, but this formula doesn't factor in the requirement for only adding values from current and prior periods.
=SUMIFS(INDIRECT(CONCATENATE(ROW()-1, ":", ROW()-1)),2:2, "X")
I don't know of a way to accomplish adding up a dynamic number of indirect cell references based on the column you're in. So let's say it's row 55 in period 3: I would need a formula that looks in row 2, sees each of the column values (H, N, T) and adds up (H55, N55, T55). That same formula would need to construct a different list if it's in period 2: (H, N), (H55, N55).
Maybe I need to rethink my approach entirely? Write VBA instead?
Edit
To better expand on what the data model is, to address some comments, I've thrown some dummy values and dirty formulas in.
Have a look at service level vs. service level year to date (YTD). Service level is just a flat data entry of weekly performance, then the Summary column is a simple average of the weekly performance in order to report period performance. The YTD number is an average of the period numbers, so these values progressively roll up.
The formulas I'm trying to write are for the summary columns, both period value and YTD values.
It's not entirely clear what your data layout is.
So, assuming:
Labels that identify columns to sum are in row 2
Values to sum are in row 55
Formula is to sum values in row 55, which have a non-blank entry in row 2, and sum values in columns up to and including the column the formula is in
Formula
=SUMPRODUCT($55:$55,--(COLUMN($55:$55)<=COLUMN()),--($2:$2<>""))
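As a rough cross-check of what that SUMPRODUCT does, here is the same masking logic in Python. The function name ytd_sum and the sample row are assumptions for illustration; blank labels are empty strings and blank value cells are treated as 0.

```python
def ytd_sum(values, labels, current_col):
    """Sum values whose label is non-blank, up to and including current_col.

    values: the row being summed (row 55 above); labels: row 2;
    current_col: 0-based index of the column holding the formula.
    """
    return sum(
        v * (col <= current_col) * (label != "")   # True/False act as 1/0
        for col, (v, label) in enumerate(zip(values, labels))
    )


# Hypothetical layout: only every second column carries a label.
labels = ["", "H", "", "N", "", "T"]
values = [1, 10, 2, 20, 3, 30]
print(ytd_sum(values, labels, 1), ytd_sum(values, labels, 3), ytd_sum(values, labels, 5))
```

Because the column test uses the position of the formula itself, the same expression gives a different total in each period.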
For column T use:
=SUM(IF(MOD(COLUMN($H:T),6)=2,$H$1:T$1,0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
Change the $H$1:T$1 to the row number you need to sum (it will only sum every sixth column starting with H).
With UPEH in row 9 and this formula in row 10, it becomes =SUM(IF(MOD(COLUMN($H:T),6)=2,$H$9:T$9,0))
Once it is set up correctly you can copy and paste it wherever you need it (as long as it only has to sum every 6th column starting at H).
To make it more dynamic, you could use:
=SUM(IF($A$4:T$4="Summary",$A$9:T$9,0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
it checks for Row 4 to contain "Summary" to get the values to sum :)
EDIT
However, if you want to have exactly the same formula in each part you would need to use something like that:
=SUM(IF(($4:$4="Summary")*(COLUMN($4:$4)<=COLUMN()),OFFSET($1:$1,ROW()-2,),0))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
It sums all the cells one row above itself, from the first column up to and including its own column, for all columns containing "Summary" in row 4.
However, this may get pretty slow pretty fast (it calculates a LOT) ^^
BIG HINT: Just looking at what you have/need
Let's assume the cells to add are in row 1 and the output in row 2...
we also skip the columns not to calculate (to make it easy)...
A2 would be just A1
B2 would be A1 + B1
C2 would be A1 + B1 + C1... but wait!
A1 + B1 = B2 so better -> C2 = B2 + C1
leads to:
R2Cx = R2C(x-1) + R1Cx
If you apply that pattern in column N (the value above it plus the calculated value to the left, in column H) and write it that way, you can just copy it and paste it into column T and you will get =T(above) + N(calculated). Check it :)
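The recurrence is the usual running total. A quick Python sketch, with made-up period values, shows that it gives the same numbers as re-adding every earlier period:

```python
from itertools import accumulate

period_values = [4, 6, 5]        # hypothetical P1, P2, P3 values (row 1)

# Direct definition: each cell re-adds every earlier period.
direct = [sum(period_values[: i + 1]) for i in range(len(period_values))]

# Recurrence: each cell is the previous running total plus its own period.
running = list(accumulate(period_values))

print(direct, running)
```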

How does the SUMPRODUCT command work in this example?

The following code allows me to determine distinct values in a pivot table in Excel:
=SUMPRODUCT(($A$A:$A2=A2)*($B$2:$B2=B2))
See also: Simple Pivot Table to Count Unique Values
The code runs perfectly fine. However, can somebody help me understand how this code actually works?
You write: the following code allows me to determine distinct values in a pivot table in Excel
No. That formula alone does not do that. Read on for the explanation of what does.
There's a typo in the formula. It should be
=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
See the difference?
The formula starts in row 2 and is copied down. In each row, the $A$2 reference and the $B$2 reference will stay the same. The $ signs make them absolute references. The row numbers in $A2, A2, $B2 and B2 are relative and will change when copied down, so in row 3 the A2 will change to A3 and B2 will change to B3. In the next row it will be A4 and B4, and so on.
You may want to create a sample scenario with data similar to that in the thread you link to. Then use the "Evaluate Formula" tool on the Formulas ribbon to see step by step what is calculated. The formula evaluates from the inside out. Let's assume the formula has been copied down to row 5 and we are now looking at
=SUMPRODUCT(($A$2:$A5=A5)*($B$2:$B5=B5))
($A$2:$A5=A5) this bit compares all the cells from A2 to A5 with the value in A5. The result is an array of four values, either true or false. The next bit ($B$2:$B5=B5) also returns an array of true or false values.
These two arrays are multiplied and the result is an array of 1 or 0 values. Each array has the same number of values.
The first value of the first array will be multiplied with the first value of the second array. (see the red arrows)
The second value of the first array will be multiplied with the second value of the second array. (see the blue arrows)
and so on.
True * True will return 1, everything else will return 0. The result of the multiplication is:
The nature of the SumProduct function is to sum the result of the multiplications (the product), so that is what it does.
This function alone does not do anything at all to establish distinct values in Excel. In the thread you link to, the Sumproduct is wrapped in an IF statement and THAT is where the distinct values are identified.
=IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)
In plain words: If the combination of the value in column A of the current row and column B of the current row has already appeared above, return a zero, otherwise, return a 1.
This marks distinct values of the combined columns A and B.
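If it helps, the expanding-range count and the IF wrapper can be mimicked in Python. This is only a sketch: the function name distinct_flags and the sample columns are invented for illustration.

```python
def distinct_flags(col_a, col_b):
    """Return 1 on the first occurrence of each (A, B) pair, else 0."""
    flags = []
    for i in range(len(col_a)):
        # SUMPRODUCT over the expanding range: first data row down to row i.
        count = sum(
            (a == col_a[i]) * (b == col_b[i])      # True * True = 1, else 0
            for a, b in zip(col_a[: i + 1], col_b[: i + 1])
        )
        flags.append(0 if count > 1 else 1)        # IF(count > 1, 0, 1)
    return flags


# Hypothetical columns A and B.
flags = distinct_flags(["x", "x", "y", "x"], [1, 1, 1, 2])
print(flags, sum(flags))
```

Summing the flags gives the number of distinct (A, B) combinations, which is what the pivot table then aggregates.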
First, I think you made a typo here, as the formula should be:
=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
Let's decompose it in 2 parts:
First, we check the cells between A2 and A2, so only one cell, and we count the cells which are equal to A2. In this case, the output should be 1, as you're comparing A2 with A2. However, you're not limited to comparing A2 with A2. If the range contained 2 cells equal to A2, the result would be 2. You can compare as many cells as you want with A2 (change the row number after the $ to adjust the range).
We do the same for the second bracket, except the pivot value is B2.
After that, you need to understand what the function SUMPRODUCT does. It sums the products of the corresponding values in the arrays. For example, say you have the value 1 in A1, 1 in A2, 2 in B1 and 3 in B2. If you enter SUMPRODUCT((A1:A2)*(B1:B2)), you will obtain (1*2) + (1*3) = 5. So, in the example you gave us, it will give the sum of (A2=A2)*(B2=B2) = 1.
So, it will output the number of pairs (Ax,Bx) which are equal to (A2,B2). With the link, you can see that if you select the first line only, the function will output 1 (and so the IF will output 1), but if you select the first 2 lines, the function will output 2 (and so the IF will output 0).
I hope this made sense to you, and I hope I didn't make any mistakes in the explanation.

How can I calculate deciles with a range of 12,000 cells in excel?

I have a column of 12,000+ numbers, both positive and negative, sorted from highest to lowest in an Excel spreadsheet.
Is there an easy way to go about dividing this range into deciles?
This may not be the most efficient solution, but you might try the following:
Assuming your numbers are in cells A1 through A12000, enter the following formula in cell B1 =PERCENTRANK($A$1:$A$12000,A1,1). This calculates the percent rank, with the set of values in cells $A$1:$A$12000, of the value in cell A1, rounded down to 1 decimal place (which is all you need to identify the decile).
Copy the formula in cell B1 to cells B2 through B12000.
Use the values in column B to identify the decile for the corresponding value in column A. 0 identifies values greater than or equal to the 0th percentile and less than the 10th percentile, 0.1 identifies values greater than or equal to the 10th percentile and less than the 20th percentile, and so on. Depending on the size of your set and whether or not there are duplicates, there may or may not be a value that gets assigned a PERCENTRANK of exactly 1.
If you are using Excel 2010, you might, depending on your needs, consider using the new functions PERCENTRANK.INC and PERCENTRANK.EXC that are supposed to supersede PERCENTRANK.
Hope this helps.
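For a feel of how the truncated percent rank maps values to buckets, here is a rough Python approximation. It is a sketch under an assumption, not Excel's exact algorithm: it assumes all values are distinct, so the rank of a value in the set is (count of values below it) / (n - 1). The function name percent_rank_bucket is made up.

```python
def percent_rank_bucket(values, x):
    """Approximate PERCENTRANK(values, x, 1) for x present in values.

    Assumes distinct values, so the rank is (count below x) / (n - 1),
    truncated to one decimal place. Returns the tenths digit, 0..10
    (0 stands for 0.0, 1 for 0.1, ..., 10 for 1.0).
    """
    below = sum(1 for v in values if v < x)
    return (below * 10) // (len(values) - 1)   # integer math avoids float error


values = list(range(1, 22))                    # hypothetical data: 1..21
print([percent_rank_bucket(values, x) for x in (1, 2, 3, 21)])
```

Only the largest value reaches bucket 10 here, which matches the remark above that a rank of exactly 1 may or may not be assigned.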
Assuming your data is in column A, in a neighboring column in row 1 put this formula and then fill down:
=IF(A1<PERCENTILE(A:A,0.1),1
,IF(A1<PERCENTILE(A:A,0.2),2
,IF(A1<PERCENTILE(A:A,0.3),3
,IF(A1<PERCENTILE(A:A,0.4),4
,IF(A1<PERCENTILE(A:A,0.5),5
,IF(A1<PERCENTILE(A:A,0.6),6
,IF(A1<PERCENTILE(A:A,0.7),7
,IF(A1<PERCENTILE(A:A,0.8),8
,IF(A1<PERCENTILE(A:A,0.9),9,10
)))))))))
This will display a 1 for the first decile, 2 for the second, 3 for the third, etc.
I had same query, found answer on this forum:
https://www.mrexcel.com/forum/excel-questions/581682-create-decile-segments.html
Try:
=INT((ROWS($A$1:A1) - 1) * 10 / ROWS($A$1:$A$3890))+1
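That formula assigns deciles purely by row position, so it relies on the data already being sorted. A minimal Python equivalent, with a hypothetical function name and a small row count instead of 3890:

```python
def decile_by_row(row, total_rows):
    """Mirror INT((row - 1) * 10 / total_rows) + 1 for a 1-based row number."""
    return (row - 1) * 10 // total_rows + 1


total = 20                                     # hypothetical number of rows
deciles = [decile_by_row(r, total) for r in range(1, total + 1)]
print(deciles)
```

With the column sorted from highest to lowest, decile 1 holds the largest values and decile 10 the smallest.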

Resources