Build proper sum formula in Excel

I am trying to figure out how Excel can fill down a column when I write a function like SUM(A1:A2), SUM(A1:A3), and so on. I have had no luck so far: filling down gives SUM(A1:A2), SUM(A2:A3) in the respective cells. I am sure there is a very simple fix, but I am not typically an Excel user.

While it may be tempting to enter the simple formula =SUM(A$2:A2) into B2 and copy it down, this is incredibly inefficient on large ranges compared to the formula =SUM(B1,A2).
Why? Let's say you copy =SUM(A$2:A2) down 10 rows.
Your result at row 2 only had to sum 1 number: the number in A2.
Your result at row 3 has to sum 2 numbers: the numbers in A2:A3.
Your result at row 4 has to sum 3 numbers: the numbers in A2:A4.
...
Your result at row 11 has to sum 10 numbers: the numbers in A2:A11.
So how many numbers did Excel have to add in total to produce the answers you calculated in B2:B11?
1+2+3+4+5+6+7+8+9+10 = 55
But if we use the other approach, i.e. =SUM(B1,A2), then all we do for each row is add the number to the left to the previously calculated sum above. So on each row, we sum only two numbers. To produce the same 10 totals in B2:B11, the count of numbers Excel has to add is:
2+2+2+2+2+2+2+2+2+2 = 20
Now let's extrapolate that to some sizeable ranges.
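As a rough sketch of that extrapolation: filling =SUM(A$2:A2) down n rows adds 1 + 2 + ... + n = n(n+1)/2 numbers in total, while =SUM(B1,A2) adds 2n. The function names and row counts below are made up for illustration.

```python
def additions_expanding_range(rows: int) -> int:
    """Numbers summed in total by =SUM(A$2:A2) filled down `rows` rows."""
    # Row k of the fill sums k numbers, so the total is 1 + 2 + ... + rows.
    return rows * (rows + 1) // 2


def additions_running_total(rows: int) -> int:
    """Numbers summed in total by =SUM(B1,A2) filled down `rows` rows."""
    # Every row adds exactly two numbers: the total above and the value to the left.
    return 2 * rows


# Illustrative row counts, chosen only to show how fast the gap grows.
for rows in (10, 1_000, 100_000):
    print(rows, additions_expanding_range(rows), additions_running_total(rows))
```

For 100,000 rows that is 5,000,050,000 numbers added versus 200,000.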
Yikes! So how much does this matter in the real world, given we've all got pretty fast computers good at math?
If you fill rows A2:A100000 with some numbers and then put =SUM(B1,A2) in B2 and fill down, it takes well under a second to calculate on my PC. But if you put =SUM(A$2:A2) instead, it takes almost a minute.
My advice: Get out of the habit of using =SUM(A$2:A2). One day you'll thank me for it.

Related

Excel: Make a dynamic formula that counts a specified max sum of X consecutive days

I am trying to make a formula that could count the max sum of any number of consecutive days that I indicate in some cell. Here is the dataset and the formula:
Dataset
The formula that calculates the maximum sum of three consecutive days:
=MAX(IFERROR(INDEX(
INDEX(E2:AI2,0)+
INDEX(F2:AI2,0)+
INDEX(G2:AI2,0),
0),""))
As you can see the number of days here is determined by the number of rows in the formula that start with "Index". The only difference between these rows is the letters (E, F, G). Is there any way I could reference a cell in which I could put a number for those days, instead of adding more rows to this formula?
Another approach avoiding the use of OFFSET is to use SCAN to generate an array of running totals, then subtract totals which are N elements apart (where N is the number of consecutive cells to be added):
=LET(range,E2:AI2,
length,A1,
runningTotal,SCAN(0,range,LAMBDA(a,b,a+b)),
sequence1,SEQUENCE(1,COLUMNS(range)-length+1,A1),
sequence2,SEQUENCE(1,COLUMNS(range)-length+1,0),
difference,INDEX(runningTotal,sequence1)-IF(sequence2,INDEX(runningTotal,sequence2),0),
MAX(difference))
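The same running-total idea can be sketched outside the spreadsheet. This is only an illustration in Python, with a hypothetical function name and a plain list standing in for the row E2:AI2:

```python
from itertools import accumulate


def max_window_sum(values, length):
    """Largest sum of `length` consecutive items, via running totals."""
    if not 1 <= length <= len(values):
        raise ValueError("length must be between 1 and len(values)")
    # Like SCAN(0, range, LAMBDA(a,b,a+b)), with a leading 0 so that
    # totals[i] is the sum of the first i values.
    totals = [0, *accumulate(values)]
    # A window ending at position i sums to totals[i] - totals[i - length].
    return max(totals[i] - totals[i - length] for i in range(length, len(totals)))
```

Each window costs one subtraction regardless of its length, which is the point of building the running totals first.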
The answer here was posted by another user on another website, so I will repost it here:
One way to achieve this without relying on a VBA solution would be to use the BYCOL() function (available for Excel for Microsoft 365):
=BYCOL(array, [function])
The array specifies the range to which you want to apply your function, and the function itself is specified in a lambda statement. In the end, you want to get the minimum value of the sum of x consecutive days. Assuming that your data is stored in the range E2:AI2 and the number of consecutive days is stored in cell A1, the function looks like this:
=MIN(BYCOL(E2:AI2,LAMBDA(col,SUM(OFFSET(col,,,,A1)))))
The MIN() part ensures that you get only the smallest sum of the array (all sums of the x consecutive values) returned. The array is simply the range in which your data is stored; it is named in the lambda argument col and consequently used by its name. In your case, you want to apply the sum function for, e.g., x = 4 consecutive days (where 4 is stored in cell A1).
However, with this simple specification, you run into the problem of offsetting beyond cells with values toward the right end of the data. This means that the last sum you get would be 81.8 (value on 31 Jan) + 3 times 0 because the cells are empty. To avoid this, you can combine your function with an IF() statement that replaces the result with an empty cell if the number of empty cells is greater than 0. The adjusted formula looks like this:
=MIN(BYCOL(E2:AI2,
LAMBDA(col,IF(COUNTIF(OFFSET(col,,,,A1),"")>0,"",SUM(OFFSET(col,,,,A1))))))
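To make the guard concrete, here is a minimal sketch in Python of what the adjusted formula is meant to compute. It assumes empty cells are represented as None; the function name is hypothetical.

```python
def min_complete_window_sum(cells, length):
    """Smallest sum over windows of `length` cells that contain no blanks."""
    sums = []
    for start in range(len(cells)):
        window = cells[start:start + length]
        # Skip windows that run off the end or include an empty cell (None),
        # mirroring the COUNTIF(..., "") > 0 guard in the formula.
        if len(window) < length or any(cell is None for cell in window):
            continue
        sums.append(sum(window))
    if not sums:
        raise ValueError("no complete window of that length")
    return min(sums)
```

Without the guard, the short windows at the right end would produce artificially small sums and win the MIN.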
If you do not have the Microsoft 365 version, two other approaches also work. They are a bit more tedious, especially for many days, because the number of days cannot really be set automatically (except potentially by constructing the ranges with a combination of ADDRESS() and INDIRECT()), but I would still argue they are a bit neater than your current specification:
=MIN(INDEX(E2:AF2+F2:AG2+G2:AH2+H2:AI2,0))
=SUMPRODUCT(MIN(E2:AF2+F2:AG2+G2:AH2+H2:AI2))
The idea regarding the ranges is the same in both scenarios, with a shift in the start and end of the range by 1 for each additional day.
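The shifted-range idea can be illustrated in Python as adding several offset copies of the same list position by position. This is a sketch with a hypothetical function name, not a translation of either formula:

```python
def window_sums_by_shifting(values, length):
    """Window sums built by adding `length` shifted copies of the data."""
    if not 1 <= length <= len(values):
        raise ValueError("length must be between 1 and len(values)")
    count = len(values) - length + 1  # number of complete windows
    # Copy k starts k positions later, like E2:AF2, F2:AG2, G2:AH2, H2:AI2.
    shifted = [values[k:k + count] for k in range(length)]
    # zip lines the copies up position by position; sum adds them up.
    return [sum(column) for column in zip(*shifted)]
```

Taking min() or max() of the returned list gives the smallest or largest sum of consecutive values.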
Another approach getting to the same result:
=LET(range,E2:AI2,
cons,4,
repeat,COLUMNS(range)-cons+1,
MAX(
BYROW(SEQUENCE(repeat,cons,,1)-INT(SEQUENCE(repeat,cons,0,1/cons))*(cons-1),
LAMBDA(x,SUM(INDEX(range,1,x))))))
This avoids OFFSET (which is volatile and slows your file down), and the repeat value, the number of consecutive days and the range are all easy to change.
Hope it helps. I answered for the max sum, as stated in the title; change MAX to MIN to get the min sum result.
Edit:
I changed the repeat part in the formula to be dynamic (max number of consecutive columns in range), but you can replace it by a number or a cell reference.
The cons part can also be linked to a cell reference.
I also found a bug in my formula, which is now fixed.

Compare multiple columns as pair-wise for Excel/Google Sheets

I am new to Excel/Google Sheets. I am having difficulty writing a formula to compare columns pairwise, since the formula
gets bigger with every day added.
For example, there are 2 main columns, Foo and Bar. I want to find the total number of days on which Foo
and Bar are equal, so the current formula is =IF(A3 = G3, 1, 0)+IF(B3 = H3, 1, 0)+IF(C3 = I3, 1, 0)+...
But this is tedious because there are ~40 days to compare. Is there an alternative
way to write the formula more efficiently? Either a Google Apps Script or an Excel formula is appreciated.
Cheers!
Try the Google Sheets formula below. Adjust the ranges as you need.
=ArrayFormula(SUM(IF(A3:E3=G3:K3,1,0)))
Assuming that you need such a total for each row and not merely for a single row, try this:
=ArrayFormula(IF(A3:A="",,MMULT(IF(A3:F=G3:L,1,0),SEQUENCE(COLUMNS(A:F),1,1,0))))
Of course you will need to adjust the three ranges to match your own FOO and BAR ranges.
This one formula will produce all results for all rows.
The MMULT function is tricky to explain to those as yet unfamiliar with it. But it's a powerful tool. I'll add a picture I created that may best explain what it does:
By making the second matrix a simple SEQUENCE of 1s as long as the other matrix is wide, we wind up multiplying everything by 1 before adding together. And since anything multiplied by 1 is itself, this combination serves only to do a row-by-row add.
Things to keep in mind with MMULT:
1.) Every cell in every matrix must be a number or it will produce an error.
2.) As in the above formula, there are ways to use either/or conditions to turn every cell in a matrix into a number.
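For readers who find MMULT easier to follow in code, here is a small Python sketch of the same trick: turn the comparisons into 1s and 0s, then multiply by a column of 1s to get one sum per row. The function names are made up, and the sketch assumes non-empty rows of equal width.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows, like MMULT(a, b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matches_per_row(foo, bar):
    """Count, for each row, the positions where foo and bar agree."""
    # IF(foo=bar,1,0): every comparison becomes a number, since MMULT needs numbers.
    equal = [[1 if f == b else 0 for f, b in zip(frow, brow)]
             for frow, brow in zip(foo, bar)]
    # A column of 1s as tall as the matrix is wide, like SEQUENCE(n,1,1,0).
    ones = [[1] for _ in range(len(equal[0]))]
    # Each result row is 1*e1 + 1*e2 + ..., i.e. the row sum.
    return [row[0] for row in matmul(equal, ones)]
```

For example, matches_per_row([[1, 2, 3], [4, 5, 6]], [[1, 0, 3], [0, 0, 0]]) gives [2, 0].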

Formula to read text as number

Okay, so as you can see, each column has a value of 1-3. Currently there's just a basic formula at the end that works out the percentage by adding what's there, dividing by the total possible if every column was 3, and then multiplying by 100.
My problem is that I would sometimes like to put 'n/a' down as a result where a number value/score wasn't relevant. But when it comes to calculating the percentage, the row would be marked lower, since it would be three less than the total possible.
So I assume there are a few different ways I could tackle this problem: either a formula that calculates the possible total based only on the cells that have a numerical value, or a formula which reads n/a as the value 3. But I can't seem to find something that works.
Please help, thanks.
You can do this:
The COUNTIF() counts the n/a cells; that count is multiplied by 3 and subtracted from the 24.
Okay, I think I figured this out, and the solution was far simpler than I thought it would be.
Originally the Score as % formula was =SUM(T14/24*100), 24 being the total if all cells were 3.
So what I've done is add a new column called possible total with the formula =COUNT(K14:R14)*3. The COUNT function only counts cells that have a numerical value, so it ignores any cells with n/a, and since 3 is the maximum value that can be entered into a cell, I've multiplied the count by 3.
Then it was a simple case of changing the score formula to =SUM(T14/S14*100), where the S column holds the new possible total.
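The same logic, sketched in Python for anyone who wants to check the arithmetic. The function name and the example row are hypothetical; text entries such as "n/a" are simply skipped, as COUNT does.

```python
def score_percent(cells, max_score=3):
    """Score as a percentage of the maximum achievable on numeric cells only."""
    # COUNT() only counts numeric cells, so text such as "n/a" is ignored.
    numeric = [c for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool)]
    possible = len(numeric) * max_score  # the "possible total" column
    if possible == 0:
        return None  # nothing was scored, so there is no percentage to report
    return sum(numeric) / possible * 100
```

With the row [3, 2, "n/a", 1, 3, 3, 2, 3], the possible total is 7 * 3 = 21 rather than 24, so the n/a entry no longer lowers the percentage.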

averaging every nth rows and excluding values

I want to average every 5 rows but also to exclude in the average values that are less than 50. This is the command to average every 5 rows.
=AVERAGE(OFFSET($L$3,(ROW()-ROW($P$2))*5,,5))
This is the command to exclude values less than 50
=AVERAGEIF(L3:L8,">50")
How do I combine those two in one command?
Thanks to a colleague of mine, the following works like a gem.
=IFERROR(AVERAGEIF(OFFSET($L$3,(ROW()-ROW($P$2))*5,,5),">50"),0)
As long as you have Excel 2016, a SUMPRODUCT formula will work.
=SUMPRODUCT((MOD(ROW($A$1:$A$100),5)=0)*($A$1:$A$100<50)*$A$1:$A$100)/SUMPRODUCT((MOD(ROW($A$1:$A$100),5)=0)*($A$1:$A$100<50)*1)
I assumed your data was in A1:A100, so update that as needed. In the MOD formula, I used 5 for every 5th row. If you need to change that, change the 5 in the MOD formulas. Lastly, the formula does not include the value 50 since you stated you wanted less than 50.
'Tiny' Differences
Correction
=IFERROR(AVERAGEIF(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5),">50"),0)
Visualize
Let's first visualize what you're actually doing.
For every five rows in column L you are displaying the average of the values that are greater than 50, (in this example) in column G:
For L3:L7 in G2,
for L8:L12 in G3,
for L13:L17 in G4, etc.
Issues
The 1st issue is that the formula is written exclusively for the
2nd row. If you want the first result to be displayed in the first row, the formula will result in a #REF! error.
If you want to display the first result in the 1st row you have to
change L$2 to L$1:
=IFERROR(AVERAGEIF(OFFSET(L$3,(ROW()-ROW(L$1))*5,,5),">50"),0)
or for the 3rd row you have to change L$2 to L$3:
=IFERROR(AVERAGEIF(OFFSET(L$3,(ROW()-ROW(L$3))*5,,5),">50"),0)
The 2nd issue is that you are doing something in column L and for
no obvious reason you are using column P in your formula. You could
have used any column, Z, AN or CG, but you're doing stuff in
column L, so use L.
The 3rd issue is that you have locked the columns with $L, which means that
wherever you put the formula in a single row, the result will be the same. If
you don't lock them, you can copy the formula e.g. to the right and
it will display the results for columns M, N, O etc.:
Other Formulas
=SUM(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5))
=COUNT(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5))
=AVERAGE(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5))
=SUMIF(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5),">50")
=COUNTIF(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5),">50")
AVERAGEIF is available in Excel from version 2007, but for older versions the following formula can be used instead:
=IF(COUNTIF(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5),">"&50)=0,0,SUMIF(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5),">"&50)/COUNTIF(OFFSET(L$3,(ROW()-ROW(L$2))*5,,5),">"&50))
It first checks if COUNTIF results in 0. If it does, it displays 0; otherwise it divides SUMIF by COUNTIF.
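As a cross-check of what these formulas compute, here is a minimal Python sketch: split the column into blocks of five, keep only values above 50, and fall back to 0 when a block has none. The function name and defaults are assumptions for illustration.

```python
def block_averages(values, block=5, threshold=50):
    """Average each run of `block` values, keeping only those above `threshold`."""
    results = []
    for start in range(0, len(values), block):
        kept = [v for v in values[start:start + block] if v > threshold]
        # Like IFERROR(..., 0) or the COUNTIF check: 0 when nothing qualifies.
        results.append(sum(kept) / len(kept) if kept else 0)
    return results
```

The comparison is strict, matching ">50" in the formulas, so a value of exactly 50 is excluded.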

how to convert series of numbers into 0 to 10 range

I have a large series of numbers that I want to convert to a 0 to 10 scale.
I used the following formula to convert the maximum value to 10 and minimum value to 0,
=IF(A1="-","0",MIN(MAX((A1-MIN(A:A))/((MAX(A:A)-MIN(A:A))/11),0),10))
However, I face some problems converting the series where the maximum value should be 0 and the minimum value should be 10. For example, if column A has the values,
1
4
6
7
8
then 8 should have a value of 0 and 1 should have a value of 10.
Thanks!
Just use the formula =10-B1, where B1 is the cell containing your mentioned formula.
Please note though that your formula has the following flaws:
It is wrong. If you test it with the three numbers 1,2,3 you get 5.5 for the value corresponding to 2. Obviously the correct answer should be 5. This error is caused by the number 11 that you use to divide the (MAX(A:A)-MIN(A:A)). Change it to 10 and everything will work!
It returns #DIV/0! if you have only one number in column A.
It is inefficient because it calls time-expensive functions MAX(A:A) and MIN(A:A) in each and every cell containing this formula. Since these two functions are not dependent on the formula-containing cell, consider using them only once in some other cells and subsequently modify your formula so it contains links to these external cells rather than the functions themselves.
It is hardly maintainable and/or readable. It took me a while to understand how your formula works. Consider separating it into meaningful pieces, place the pieces into separate cells and finally simply link the pieces together in some final - and much smaller - formula.
It is unnecessarily convoluted. There is a much easier formula to achieve the same thing, based on the following:
= 10*B1/C1,
where B1 contains the "distance from minimum", i.e. A1-MIN(A:A), and C1 contains the total length of your range of numbers, i.e. MAX(A:A)-MIN(A:A)
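Put together, the simpler formula and the 10-x reversal can be sketched in Python as follows. The function name and the invert flag are hypothetical; the minimum and maximum are computed once, in line with the efficiency point above.

```python
def scale_0_to_10(values, invert=False):
    """Rescale values linearly so min maps to 0 and max to 10 (or reversed)."""
    low, high = min(values), max(values)  # computed once, not per element
    span = high - low
    if span == 0:
        # Same situation as the #DIV/0! case: the scale is undefined.
        raise ValueError("all values are equal, so the scale is undefined")
    scaled = [10 * (v - low) / span for v in values]
    # For the reversed scale, use 10 - x as suggested in the answer.
    return [10 - s for s in scaled] if invert else scaled
```

With the values 1, 2, 3 the middle value maps to 5, and with invert=True on 1, 4, 6, 7, 8 the value 1 maps to 10 and 8 maps to 0.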
