Excel: Sum up N largest numbers in non-continuous array of cells - excel

I am not that fluent in Excel, so my question is potentially very simple but nevertheless gives me a headache.
I have a spreadsheet with two types of values: an absolute value, and one that is calculated as a percentage of this absolute value.
A
1 10
2 0.1
3 20
4 0.2
5 30
6 0.3
7 40
8 0.4
9 50
10 0.5
In this example, the second value is 1% of the first value (e.g. 0.1 from 10). In my actual table these values differ and the numbers are random. The percentage used for the second value depends on some key etc., so this is a simplified representation for the sake of a minimal example.
I want to determine the sum of the largest 4 (out of 5) numbers, but only from those 1% values (e.g. 0.1, not 10). The numbers are all below each other. Basically, I want to ignore the absolute numbers (e.g. 10) and only use the relative numbers (e.g. 0.1).
The LARGE function determines the largest n numbers and has the following format:
=SUM(LARGE(array, k))
The array represents a continuous range in the table. However, I need to throw in a selected set of fields. Is there a way to do this with set of cells?
In other words, if I use a continuous range as the array, I have
=SUM(LARGE(A1:A10, {1,2,3,4}))
and the formula will always pick up 20, 30, 40 and 50.
Ideally, I want something like this:
=SUM(LARGE(array(A2,A4,A6,A8,A10), {1,2,3,4}))
Help?

Using your provided sample data, something like this regular formula (does not require array entry) should work for you because the percentages will always be less than or equal to 1:
=SUMPRODUCT(LARGE((A1:A10<=1)*A1:A10,{1,2,3,4}))
If you want a more flexible way of grabbing the top N numbers, you can substitute the {1,2,3,4} with the ROW() function, like so:
=SUMPRODUCT(LARGE((A1:A10<=1)*A1:A10,ROW(1:4)))
EDIT: If the only way to get relative values is if they are every other row, starting in row 2, you can use this formula instead:
=SUMPRODUCT(LARGE(INDEX((MOD(ROW(A1:A10),2)=0)*A1:A10,),ROW(1:4)))
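As a cross-check outside Excel, the masking idea behind these formulas can be sketched in Python. This is only an illustration under the question's sample data; the function name sum_top_n_even_rows is made up for this sketch. Multiplying by the TRUE/FALSE test in the formula turns the unwanted rows into 0, which the list comprehension imitates.

```python
# Column A from the question; list index 0 corresponds to worksheet row 1.
column_a = [10, 0.1, 20, 0.2, 30, 0.3, 40, 0.4, 50, 0.5]


def sum_top_n_even_rows(values, n):
    """Sum the n largest entries found on even-numbered (1-based) rows."""
    # Like (MOD(ROW(A1:A10),2)=0)*A1:A10: entries on odd rows become 0.
    masked = [v if row % 2 == 0 else 0 for row, v in enumerate(values, start=1)]
    # Like LARGE(..., ROW(1:4)): keep the n largest values, then add them up.
    return sum(sorted(masked, reverse=True)[:n])


# For the sample data this is 0.5 + 0.4 + 0.3 + 0.2, i.e. about 1.4.
print(round(sum_top_n_even_rows(column_a, 4), 10))
```

Because the masked-out rows become 0, this only gives the intended answer when the wanted values are positive, which holds for the sample data.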

For your simplified example, suppose that B1:B4 contains the values 1,2,3,4. Then in C1:C4 enter the array formula:
{=LARGE(IF(MOD(ROW(A1:A10),2)=0,A1:A10,-1),B1:B4)}
Similarly, the formula
{=SUM(LARGE(IF(MOD(ROW(A1:A10),2)=0,A1:A10,-1),B1:B4))}
(using Ctrl + Shift + Enter to accept as an array formula)
will give you the sum of the top 4.
This assumes that the numbers are all positive. If that assumption isn't valid, replace the -1 in the formula with the minimum of all the values, minus 1.
Another approach, if the criterion for being a relative cell is too ad hoc to be summarized by a simple formula but you have a listing of the cells, is to use the INDIRECT function:
In the screenshot (not reproduced here), C1:C5 holds a listing of the addresses of the cells containing the relative values. In D1 I put the formula =INDIRECT(C1) and copied it down. Then, the formula
=SUM(LARGE(D1:D5,{1,2,3,4}))
returns the desired sum.
There might be a way to dispense with the helper column, though the function INDIRECT seems to not play very nicely with array formulas.
On Edit: Here is a VBA solution:
'The following function returns the sum of the largest k
'elements of range R found at the given list of indices.
'If indices is left blank, then the sum of
'the largest k in R is returned.
Function SumLargest(R As Range, k As Long, ParamArray indices() As Variant) As Double
    Dim A As Variant
    Dim i As Long, n As Long
    Dim sum As Double
    n = UBound(indices)
    If n = -1 Then
        For i = 1 To k
            sum = sum + Application.WorksheetFunction.Large(R, i)
        Next i
        SumLargest = sum
        Exit Function
    Else
        ReDim A(0 To n)
        For i = 0 To n
            A(i) = R.Cells(indices(i)).Value
        Next i
        For i = 1 To k
            sum = sum + Application.WorksheetFunction.Large(A, i)
        Next i
        SumLargest = sum
    End If
End Function
If you put this function in a standard code module then it can be used from the worksheet like:
=SumLargest(A1:A10,4,2,4,6,8,10)
This returns the sum of the largest 4 values drawn from the entries at positions 2, 4, 6, 8 and 10 of the range.
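For readers who want to check the expected result without opening Excel, here is a rough Python analogue of the same idea. It is a sketch, not a port of the VBA: the name sum_largest and the use of heapq.nlargest are choices made for this illustration, and the positions are 1-based to match the worksheet call.

```python
import heapq


def sum_largest(values, k, *indices):
    """Sum the k largest values.

    If 1-based indices are given, only the entries at those positions
    are considered; otherwise the whole list is used.
    """
    pool = [values[i - 1] for i in indices] if indices else list(values)
    return sum(heapq.nlargest(k, pool))


column_a = [10, 0.1, 20, 0.2, 30, 0.3, 40, 0.4, 50, 0.5]

# Analogue of =SumLargest(A1:A10,4,2,4,6,8,10)
print(sum_largest(column_a, 4, 2, 4, 6, 8, 10))
# Analogue of =SumLargest(A1:A10,4), which picks 50, 40, 30 and 20
print(sum_largest(column_a, 4))
```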

Related

Provide a single Excel Formula for calculating Binomial Coefficients (N,K) in Excel with positive or negative N

Is there a single excel formula that can take integer inputs N and K and generate the binomial coefficient (N,K), for positive or negative (or zero) values of N?
The range of N and K should be fairly small e.g. -11 < N < +11 and -1 < K < +11. Otherwise large numbers will be generated that exceed excel's capabilities.
CONTEXT
Excel does not provide a Binomial function. So how to get around this? The binomial function for positive N is straightforward:- Binomial(N,K) = Factorial(N)/(Factorial(N-K)*Factorial(K)). But this doesn't work for negative N.
For information on Binomial Coefficients there is useful stuff in Ken Ward's pages on Pascal's Triangle and Extended Pascal's Triangle.
I wanted to make a similar tabular resource in Excel...but with one single table covering positive, zero and negative values of N.
One efficient way of doing this is to define a single formula which can be used in every cell of the table. The formula should discriminate between values of N which are negative, zero, or positive and use appropriate logic to obtain the correct output in each case.
Of course, rather than build a whole table, the same formula can be used to calculate the binomial coefficient for a single (N,K) input pair of values.
For anyone else who ends up here through google, Excel actually does have COMBIN for N >= K >= 0. If you know the inputs would otherwise be valid, one option for handling K > N would be IFERROR(COMBIN(N, K), 0), with the advantage that you only specify N and K once, and the disadvantage of hiding when your assumptions inevitably turn out to be wrong.
To bring it around to an actual answer (though honestly I'd have preferred just leaving a comment if the site would let me), the other answer's formula can then be simplified to
IF(A1>-1,IF(B1>A1,0,COMBIN(A1,B1)),(-1)^(B1)*COMBIN(-A1-1+B1,B1))
As a bonus, it seems to handle a larger range of inputs: however COMBIN is implemented, it avoids the issue of the FACTs temporarily exceeding 1.8e308, even though much of that would be cancelled out in the division.
You can simulate a binomial function by using a conditional formula in a single Excel cell which takes as input the contents of two other cells.
e.g. if worksheet cells A1 and B1 contain the numeric values corresponding to N,K in the binomial expression (N,K) then the following conditional formula can be put in another worksheet cell (e.g. A3)...
=IF(A1>-1,IF(B1>A1,0,(FACT(A1)/(FACT(B1)*FACT(A1-B1)))),(-1)^(B1)*(FACT(-A1-1+B1)/(FACT(B1)*FACT(-A1-1+B1-B1))))
This will handle positive and negative (and zero) values of N. In this solution both N and K must be integers (including zero). There is a limit to the size of N and K that Excel will be able to cope with (but I haven't tested the limits beyond the range -11
The Excel formula uses the conditional construct: IF(test, operation if true, operation if false).
In pseudo-code the logic is as follows:-
IF(N>-1) THEN
    IF(K>N) THEN
        Result = 0
    ELSE
        Result = (N!)/(K!*(N-K)!)
    ENDIF
ELSE
    Result = (-1)^(K) * (-N-1+K)! / ( (K)! * (-N-1+K-K)! )
ENDIF
Note the formula uses the Upper Negation Identity to determine coefficients when N is negative:-
(N,K) = (-1)^K * (K-N-1,K), for N < 0.
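The same branching can be checked outside Excel with a short Python sketch. The function name binomial is chosen for this illustration; math.comb is the standard-library counterpart of COMBIN for non-negative arguments and already returns 0 when K > N. The loop at the end verifies Pascal's rule, (N,K) = (N-1,K) + (N-1,K-1), across the small range mentioned in the question, which is one way to gain confidence that the negative rows are consistent with the positive ones.

```python
from math import comb


def binomial(n, k):
    """Binomial coefficient (n, k) for any integer n and integer k >= 0."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if n >= 0:
        # comb returns 0 when k > n, matching the IF(K>N, 0, ...) branch.
        return comb(n, k)
    # Upper negation: (n, k) = (-1)^k * (k - n - 1, k) for negative n.
    return (-1) ** k * comb(k - n - 1, k)


# Pascal's rule should hold on every row, negative ones included.
for n in range(-10, 11):
    for k in range(1, 11):
        assert binomial(n, k) == binomial(n - 1, k) + binomial(n - 1, k - 1)

print([binomial(-2, k) for k in range(5)])  # row N = -2 of the extended triangle
```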
Pascal's Triangle Table
To create a "Pascal's Triangle"-type table for negative and positive values of N, proceed as follows.
(1) Create a new blank Excel worksheet.
(2) In column B put the integer N values (starting at cell B4 and proceeding downwards):-
e.g. Nmin,Nmin+1,...,-2,-1,0,1,2,3,...,Nmax-1,Nmax.
(3) In row 3 put the integer K values (starting at cell C3 and proceeding rightwards):-
0,1,2,3,...Kmax.
(4) Then in cell (C4) enter the conditional formula:-
=IF($B4>-1,IF(C$3>$B4,0,(FACT($B4)/(FACT(C$3)*FACT($B4-C$3)))),(-1)^(C$3)*(FACT(-$B4-1+C$3)/(FACT(C$3)*FACT(-$B4-1+C$3-C$3))))
(5) Copy cell C4 and paste it to all cells in the grid bounded (at left and at top) by your N and K values.
The grid cells will then contain the binomial coefficient corresponding to (N,K).

Average the sum of rows without a creating new column in Excel

Here's a sample of my matrix:
A B C D E
1 0 0 1 1
0 0 0 0 0
0 0 1 1 0
0 2 1
You can think of each row as a respondent and each column as an item on a questionnaire.
My goal is to take an average of the sum of each row (i.e. total score for each respondent) without creating a new column AND accounting for the fact that some or all of the entries in a given row are empty (e.g., some respondents missed some items [see row 5] or didn't complete the questionnaire entirely [see row 3]).
The desired solution for this matrix = 1.67, whereby
([1+0+0+1+1 = 3] + [0+0+0+0+0 = 0] + [0+0+1+1+0 = 2]) / 3 = 5/3 = 1.67
As you can see, we have averaged over three values despite there being five rows because two of them have missing data.
I am already able to take an average of the sum of rows which are only summed for non-missing entries, e.g.,:
=AVERAGE(IF(AND(A1<>"",B1<>"",C1<>"",D1<>"",E1<>""),SUM(A1:E1)),IF(AND(A2<>"",B2<>"",C2<>"",D2<>"",E2<>""),SUM(A2:E2)),IF(AND(A3<>"",B3<>"",C3<>"",D3<>"",E3<>""),SUM(A3:E3)),IF(AND(A4<>"",B4<>"",C4<>"",D4<>"",E4<>""),SUM(A4:E4)),IF(AND(A5<>"",B5<>"",C5<>"",D5<>"",E5<>""),SUM(A5:E5)))
However, this results in a value of 1 because it treats any row with some or all values missing as 0.
It does the following:
([1+0+0+1+1 = 3] + [0+0+0+0+0 = 0] + [0+0+0+0+0 = 0] + [0+0+1+1+0 = 2] + [0+0+0+0+0 = 0]) / 5 = 5/5 = 1
Does anyone have any ideas about how to adapt the current code to average over non-missing values or an alternative way of achieving the desired result?
You can do this more concisely with an array formula, but the short fix for your existing formula is this: AVERAGE ignores blank cells, so if you have a blank cell somewhere in your sheet (say it's F1), change your formula to
=AVERAGE(IF(AND(A1<>"",B1<>"",C1<>"",D1<>"",E1<>""),SUM(A1:E1),F1),IF(AND(A2<>"",B2<>"",C2<>"",D2<>"",E2<>""),SUM(A2:E2),F1),IF(AND(A3<>"",B3<>"",C3<>"",D3<>"",E3<>""),SUM(A3:E3),F1),IF(AND(A4<>"",B4<>"",C4<>"",D4<>"",E4<>""),SUM(A4:E4),F1),IF(AND(A5<>"",B5<>"",C5<>"",D5<>"",E5<>""),SUM(A5:E5),F1))
This would be one array formula version of your formula - it uses OFFSET to pull out each row of the matrix then SUBTOTAL to see if every cell in that row has a number in it. Then it uses SUBTOTAL again to work out the sum of each row and AVERAGE to get the average of rows.
=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))),""))
It has to be entered as an array formula using Ctrl+Shift+Enter.
Note 1 - some people don't like using OFFSET because it is volatile - you can use matrix multiplication instead but it's arguably less easy to understand.
Note 2 - I used "" instead of referring to an empty cell. Interesting that the non-array formula needed an actual blank cell but the array formula needed an empty string.
You can omit the empty string
=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))))
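The rule these array formulas implement (sum a row only if every cell in it holds a number, then average those sums) can be sketched in Python as a sanity check. The data below is an assumption: the question does not show which cells of row 5 are blank, so the None positions in that row are invented for illustration, and the function name is hypothetical.

```python
def mean_of_complete_row_sums(rows):
    """Average the row totals, skipping any row that has a missing entry."""
    totals = [sum(row) for row in rows if all(cell is not None for cell in row)]
    if not totals:
        raise ValueError("no complete rows to average")
    return sum(totals) / len(totals)


# None marks an empty cell. The blank positions in the last row are assumed.
matrix = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [None, None, None, None, None],
    [0, 0, 1, 1, 0],
    [0, None, 2, None, 1],
]

# Three complete rows with totals 3, 0 and 2, so the mean is 5/3.
print(round(mean_of_complete_row_sums(matrix), 2))
```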
Basically, what you're describing here for your desired result is the =AVERAGEA() function
The Microsoft Excel AVERAGEA function returns the average (arithmetic
mean) of the numbers provided. The AVERAGEA function is different from
the AVERAGE function in that it treats TRUE as a value of 1 and FALSE
as a value of 0.
With that in mind, the formula should look like this.
=SUM(AVERAGEA(A1:A4),AVERAGEA(B1:B4),AVERAGE(C1:C4),AVERAGEA(D1:D4),AVERAGEA(E1:E4))
Produces the expected result:
Note, if you want to ROUND() the result to two digits, add the following formula to it:
=ROUND(SUM(AVERAGEA(A1:A4),AVERAGEA(B1:B4),AVERAGE(C1:C4),AVERAGEA(D1:D4),AVERAGEA(E1:E4)), 2)

Excel - Generate Random Number based on 2 other columns

I am not sure how to accomplish the following task:
On columnA I have numbers, on columnB I have names associated to those numbers.
I would like to be able to generate a random number from column A, but only from the numbers that do not have anything associated with them in column B. For example: if I have a number in A1, but I do not have a name in B1, the number from A1 should be one of those that could be randomly generated.
If A2 has a number, and B2 is not empty (it has a name), I do not want to have A2 in the random range.
Also, it would be great to be able to re-generate the random number (i.e., if I assign a number, let's say 321, I want to be able to generate the random number again, but it should no longer consider 321 as it was already assigned).
I appreciate any ideas.
You can try the following:
based on your sample data/spreadsheet, do the following:
1) in Cell C1: put the value "0"
2) in Cell C2: put the formula:
=IF(ISBLANK(B2), 1, 0)+OFFSET(C2,-1,0,1,1)
3) copy the formula in C2 down alongside all your data.
4) Put this formula anywhere you want to display the randomized # from Column A:
=OFFSET(A1,MATCH(RANDBETWEEN(1,COUNTA(A:A)-COUNTA(B:B)),C:C,0)-1,0,1,1)
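The helper-column trick in steps 1 to 4 can be illustrated with a small Python sketch. It assumes, as the formulas appear to, that the helper column holds a running count of rows whose name is blank; the function name pick_unassigned and the sample numbers and names are made up for this example.

```python
import random
from itertools import accumulate


def pick_unassigned(numbers, names, rng=random):
    """Pick a random number whose matching name is blank (empty or None)."""
    # Column C in the answer: running count of rows with a blank name.
    flags = [0 if name else 1 for name in names]
    running = list(accumulate(flags))
    if not running or running[-1] == 0:
        raise ValueError("every number already has a name")
    # RANDBETWEEN(1, number of unassigned rows)
    r = rng.randint(1, running[-1])
    # MATCH(r, C:C, 0): the first row where the running count reaches r
    # is exactly the r-th unassigned row.
    return numbers[running.index(r)]


numbers = [101, 102, 103, 104]
names = ["Ann", "", "Bob", ""]
print(pick_unassigned(numbers, names))  # either 102 or 104
```

Once a name is filled in next to a number, that row stops incrementing the running count, so a later draw can no longer land on it.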
I think you'll have a hard time doing it with worksheet functions. Here's one quick-and-dirty approach that works at least some of the time---
In two (or more) cells, add the formula =RANDBETWEEN(COLUMN(A1),COLUMN(D1)), replacing A1 and D1 with the first and last cells in your row of numbers. This will pick random column numbers.
Give each of those cells a unique name, e.g., therand, therand2, ... .
In another cell, which will be your answer, add =IF(ISBLANK(INDIRECT(ADDRESS(2,therand))),INDIRECT(ADDRESS(1,therand)),INDIRECT(ADDRESS(1,therand2))). This will check whether the first random column number corresponds to a blank label (ISBLANK(...)) and, if so, use the corresponding row-1 value. If not, it will use the row-1 value from the second random column number.
Hit F9 to recalculate.
This will not always give you a blank-labeled cell. Add more random column numbers and corresponding IF(...) clauses to reduce the probability of a mis-selection.
Edit: you said "row A" in the original question, but I see you meant "column A" (rather than "row 1"). You'll need to transpose the rows and columns in the formulas above to make them work.
Here is a VBA answer. It can be used directly as a worksheet function:
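Function RandUnassigned(R As Range) As Variant
    Application.Volatile
    Dim A As Variant, i As Long, n As Long
    Dim num As Long
    n = R.Rows.Count
    ReDim A(1 To n)
    For i = 1 To n
        'Keep the number only when the name cell beside it is empty
        If Len(R.Cells(i, 2)) = 0 Then
            num = num + 1
            A(num) = R.Cells(i, 1)
        End If
    Next i
    If num = 0 Then
        'No unassigned numbers left: return #N/A instead of a runtime error
        RandUnassigned = CVErr(xlErrNA)
    Else
        RandUnassigned = A(Application.WorksheetFunction.RandBetween(1, num))
    End If
End Function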
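It takes a two-column range (numbers in the first column, names in the second), for example =RandUnassigned(A2:B20) if your data happened to be in A2:B20.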
This recalculates every time the spreadsheet does. If you want it to draw a random number just once, remove the line Application.Volatile from the function definition.

Excel Solver: Solving based on an average

I have a parameter in A1 that influences "TOTAL" randomly, with a very high standard deviation. Let's say A1 is 2; then TOTAL values could be 1, 5, 17, 3, 2, 2, etc. If A1 is 1, then TOTAL values could be 1, 3, 5, 15, 9, 10, etc.
I would like solver to figure out which value in A1 would equate to the best AVERAGE of TOTAL after X runs. Where I can define X.
In my example you can tell that A1=1 is better on average after 6 runs. However, if you run solver normally it would say A1=2 is the best, because it produced a value of 17.
This doesn't seem to be the kind of problem you solve with solver. Why not write a macro that loops through the values of A1, X times, keeping a running sum of the TOTAL values for each A1? When it's all over, the largest sum is also the largest average.
The inner loop will be something like this:
ReDim tSum(1 To maxA1)
For i = 1 To maxA1
    tSum(i) = 0
    For j = 1 To X
        [A1] = i
        Application.Calculate
        tSum(i) = tSum(i) + TOTAL
    Next j
Next i
'Now step through tSum. The index of the largest value
'is the value of A1 desired. Put it in a handy cell.
It has to be a macro, not a function because it changes A1.
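To make the idea concrete outside Excel, here is a small Python sketch of the same loop: try each candidate value, run the noisy calculation X times, and keep the candidate with the best average. Everything specific here is an assumption for illustration: best_parameter and noisy_total are hypothetical names, and noisy_total is only a stand-in for recalculating the sheet and reading TOTAL.

```python
import random


def best_parameter(candidates, runs, simulate_total):
    """Return (best candidate, averages) after `runs` trials per candidate."""
    averages = {}
    for a in candidates:
        # Same as keeping a running sum of TOTAL for each value of A1.
        total = sum(simulate_total(a) for _ in range(runs))
        averages[a] = total / runs
    best = max(averages, key=averages.get)
    return best, averages


rng = random.Random(0)


def noisy_total(a):
    # Hypothetical model: a long-tailed random TOTAL whose mean depends on a.
    return rng.expovariate(1.0 / (10 - 2 * a))


best, averages = best_parameter([1, 2], runs=1000, simulate_total=noisy_total)
print(best, averages)
```

With few runs a single lucky draw can still dominate, which is the effect described in the question; raising runs makes the averages more stable.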

How do I repeat function over several row

I'll admit that I'm not an Excel guru so maybe someone here can help me.
On my worksheet I have several blocks of data.
I calculate the sum of all items within column D of that block.
Within each block I am checking the value of the cell in column C and if it contains the letter "y" and the value in column D of that row is equal to zero I must exclude the total sum of column D.
Currently I am doing this by multiplying the sum value by either 1 or 0 which is produced by running a test over the cell contents.
Below is an example of what I am using to test rows 23 to 25 inclusive for data in column D. I am also performing the same on columns E and G, but the "y" character is always in column C, hence the absolute column reference.
=IF(AND($C23="y",D23=0),0,1)*IF(AND($C24="y",D24=0),0,1)*IF(AND($C25="y",D25=0),0,1)
There must be a more efficient way to do this.
Ideally I would like to write a function that I can paste into a cell and then select the rows or cells over which I run the test.
Can anyone point me in the right direction?
I'm not sure if I understand the question 100%, but here's an answer anyway:
{=NOT(SUM((C23:C25="y")*(D23:D25=0)))*SUM(D23:D25)}
Don't enter the curly braces, rather enter the formula using Control+Shift+Enter to make it an array formula. Here's what it does:
First, it counts all the rows where C is 'y' and D is zero. That will return (in this example) 0, 1, 2, or 3. The NOT will change the zero to a 1 and change anything else to a zero. Then that one or zero will be multiplied by the sum of column D.
So if zero rows have both 'y' and 0, multiply the SUM by 1. If more than zero rows have both 'y' and 0, multiply the SUM by 0.
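The logic of that array formula can be mirrored in a few lines of Python as a quick check. This is a sketch with a made-up function name (block_total); the lowercase comparison imitates the fact that Excel's = comparison on text ignores case.

```python
def block_total(c_values, d_values):
    """Sum column D for a block, or return 0 if any row has 'y' in C and 0 in D."""
    # (C23:C25="y")*(D23:D25=0): count rows where both conditions hold.
    disqualifying = sum(
        1
        for c, d in zip(c_values, d_values)
        if str(c).lower() == "y" and d == 0
    )
    # NOT(count) is 1 only when the count is zero, so the sum survives.
    return 0 if disqualifying else sum(d_values)


print(block_total(["y", "n", ""], [5, 0, 3]))   # no 'y' row with a zero
print(block_total(["y", "n", "y"], [5, 2, 0]))  # third row disqualifies the block
```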
This Function should work:
Option Explicit
Public Function getMySum(src As Range, sumColumn As Integer)
    Dim iRow As Range
    Dim yColumn As Integer
    yColumn = 1
    getMySum = 0
    For Each iRow In src.Rows
        If (Strings.StrConv(iRow.Cells(1, yColumn), vbLowerCase) <> "y") Or (iRow.Cells(1, sumColumn) <> 0) Then
            getMySum = getMySum + iRow.Cells(1, sumColumn)
        Else
            getMySum = 0
            Exit For
        End If
    Next iRow
End Function
You need to add this code to a VBA module.
The function call for your example would be: =getMySum(C23:D25, 2)
The only other option I see would be to combine the values in C and D in a helper column, e.g. =C23&";"&D23, and then sum only if a VLOOKUP that searches for "y;0" returns an error.

Resources