Excel Solver: Solving based on an average - excel

I have a parameter in A1 that influences "TOTAL" randomly, with a very high standard deviation. Let's say A1 is 2: then TOTAL values could be 1, 5, 17, 3, 2, 2, etc. If A1 is 1, then TOTAL values could be 1, 3, 5, 15, 9, 10, etc.
I would like Solver to figure out which value in A1 gives the best AVERAGE of TOTAL after X runs, where I can define X.
In my example you can tell that A1=1 is better on average after 6 runs. However, if you run Solver normally it would say A1=2 is the best, because it produced a value of 17.

This doesn't seem to be the kind of problem you solve with solver. Why not write a macro that loops through the values of A1, X times, keeping a running sum of the TOTAL values for each A1? When it's all over, the largest sum is also the largest average.
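The loop will be something like this (assuming TOTAL is a named cell and the candidate values of A1 are 1 to maxA1):
Sub BestAverage()
    Const maxA1 As Long = 10   'largest A1 value to try (example value)
    Const X As Long = 100      'runs per A1 value (example value)
    Dim tSum() As Double, i As Long, j As Long, best As Long
    ReDim tSum(1 To maxA1)     'ReDim initialises every sum to 0
    best = 1
    For i = 1 To maxA1
        For j = 1 To X
            Range("A1").Value = i
            Application.Calculate
            tSum(i) = tSum(i) + Range("TOTAL").Value
        Next j
        If tSum(i) > tSum(best) Then best = i
    Next i
    'The index of the largest sum is the value of A1 desired.
    'Put it in a handy cell (B1 here is only an example).
    Range("B1").Value = best
End Sub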
It has to be a macro, not a function because it changes A1.
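The same idea can be checked outside Excel. The following is a minimal Python sketch, not the macro itself: `best_by_average` and `sample_total` are hypothetical names, and instead of a real random model it simply replays the six TOTAL values quoted in the question for each value of A1.

```python
from statistics import mean

def best_by_average(candidates, sample_total, runs):
    """Return (best_candidate, averages); averages maps candidate -> mean of `runs` samples."""
    averages = {a1: mean(sample_total(a1) for _ in range(runs)) for a1 in candidates}
    best = max(averages, key=averages.get)
    return best, averages

# Replay the TOTAL values from the question (assumption: they stand in for the random sheet).
observed = {1: [1, 3, 5, 15, 9, 10], 2: [1, 5, 17, 3, 2, 2]}
streams = {a1: iter(values) for a1, values in observed.items()}

best, averages = best_by_average([1, 2], lambda a1: next(streams[a1]), runs=6)
print(best, averages)
```

Comparing averages (or sums over the same number of runs) picks A1=1 here, even though the single largest value, 17, came from A1=2.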

Related

Average the sum of rows without a creating new column in Excel

Here's a sample of my matrix:
   A B C D E
1  1 0 0 1 1
2  0 0 0 0 0
3  (all cells empty)
4  0 0 1 1 0
5  0 2 1 (two cells empty)
You can think of each row as a respondent and each column as an item on a questionnaire.
My goal is to take an average of the sum of each row (i.e. total score for each respondent) without creating a new column AND accounting for the fact that some or all of the entries in a given row are empty (e.g., some respondents missed some items [see row 5] or didn't complete the questionnaire at all [see row 3]).
The desired solution for this matrix = 1.67, whereby
([1+0+0+1+1 = 3] + [0+0+0+0+0 = 0] + [0+0+1+1+0 = 2])/3 = 5/3 = 1.67
As you can see, we have averaged over three values despite there being five rows, because two rows have missing data.
I am already able to take an average of the sum of rows which are only summed for non-missing entries, e.g.,:
=AVERAGE(IF(AND(A1<>"",B1<>"",C1<>"",D1<>"",E1<>""),SUM(A1:E1)),IF(AND(A2<>"",B2<>"",C2<>"",D2<>"",E2<>""),SUM(A2:E2)),IF(AND(A3<>"",B3<>"",C3<>"",D3<>"",E3<>""),SUM(A3:E3)),IF(AND(A4<>"",B4<>"",C4<>"",D4<>"",E4<>""),SUM(A4:E4)),IF(AND(A5<>"",B5<>"",C5<>"",D5<>"",E5<>""),SUM(A5:E5)))
However, this results in a value of 1 because it treats any row with some or all values missing as 0.
It does the following:
([1+0+0+1+1 = 3] + [0+0+0+0+0 = 0] + [0+0+0+0+0 = 0] + [0+0+1+1+0 = 2] + [0+0+0+0+0 = 0])/5 = 5/5 = 1
Does anyone have any ideas about how to adapt the current code to average over non-missing values or an alternative way of achieving the desired result?
You can do this more concisely with an array formula, but the short fix for your existing formula relies on the fact that AVERAGE ignores blank cells. If you have a blank cell somewhere in your sheet (say F1), return it from each IF when the row is incomplete:
=AVERAGE(IF(AND(A1<>"",B1<>"",C1<>"",D1<>"",E1<>""),SUM(A1:E1),F1),IF(AND(A2<>"",B2<>"",C2<>"",D2<>"",E2<>""),SUM(A2:E2),F1),IF(AND(A3<>"",B3<>"",C3<>"",D3<>"",E3<>""),SUM(A3:E3),F1),IF(AND(A4<>"",B4<>"",C4<>"",D4<>"",E4<>""),SUM(A4:E4),F1),IF(AND(A5<>"",B5<>"",C5<>"",D5<>"",E5<>""),SUM(A5:E5),F1))
This would be one array formula version of your formula - it uses OFFSET to pull out each row of the matrix then SUBTOTAL to see if every cell in that row has a number in it. Then it uses SUBTOTAL again to work out the sum of each row and AVERAGE to get the average of rows.
=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))),""))
It has to be entered as an array formula using Ctrl+Shift+Enter.
Note 1 - some people don't like using OFFSET because it is volatile - you can use matrix multiplication instead but it's arguably less easy to understand.
Note 2 - I used "" instead of referring to an empty cell. Interesting that the non-array formula needed an actual blank cell but the array formula needed an empty string.
You can omit the empty string:
=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))))
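As a cross-check outside Excel, the rule these formulas implement (sum only the rows in which every cell is filled, then average those sums) can be sketched in a few lines of Python. This is an illustration under assumptions: `mean_of_complete_row_sums` is a hypothetical name, `None` stands for an empty cell, and the positions of the gaps in row 5 are guessed because the original layout is not recoverable.

```python
def mean_of_complete_row_sums(rows):
    """Average the row sums, skipping any row that has a missing (None) entry."""
    sums = [sum(row) for row in rows if all(v is not None for v in row)]
    if not sums:
        raise ValueError("no complete rows to average")
    return sum(sums) / len(sums)

matrix = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [None, None, None, None, None],  # respondent did not complete the questionnaire
    [0, 0, 1, 1, 0],
    [0, None, 2, None, 1],           # gap positions are assumed
]
result = mean_of_complete_row_sums(matrix)
print(round(result, 2))  # 1.67
```

Only rows 1, 2 and 4 contribute, which gives 5/3 as in the question.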
Basically, what you're describing here for your desired result is the =AVERAGEA() function
The Microsoft Excel AVERAGEA function returns the average (arithmetic
mean) of the numbers provided. The AVERAGEA function is different from
the AVERAGE function in that it treats TRUE as a value of 1 and FALSE
as a value of 0.
With that in mind, the formula should look like this.
=SUM(AVERAGEA(A1:A4),AVERAGEA(B1:B4),AVERAGE(C1:C4),AVERAGEA(D1:D4),AVERAGEA(E1:E4))
This produces the expected result of 1.67.
Note, if you want to round the result to two digits, wrap the formula in ROUND():
=ROUND(SUM(AVERAGEA(A1:A4),AVERAGEA(B1:B4),AVERAGE(C1:C4),AVERAGEA(D1:D4),AVERAGEA(E1:E4)), 2)

Sigma of product involving exponential

I have a function f(n) to be computed into Column E in my worksheet. The table is a dynamic range which grows with new input.
The formula describes the kinetics of a dose-response. On day 1, there are no terms to sum. On day 2, there is one summation term. On day 3, there are two summation terms, and so on. Through the exponential term, more weight is given to w(i) values closer to the day of calculation and less to earlier ones.
The function f(n) is shown in the following image:
[image: Function f(n)]
n = number of days, i.e. however many rows are present.
k1 & tau1 are constants provided already.
w(i) is provided on each row which is an input to the function f(n).
Could you please advise a computationally cheap way to compute this sigma summation over several hundred rows in Excel, thanks!
UPDATE:
I tried this formula:
$B$4*SUMPRODUCT(D2*EXP(-1*(ROW(INDIRECT($A$2&":"&C2-1)))/$B$3))
B4 = k1, D2 =w(n), A2 = Start, C2 = n, B3 = tau.
See worksheet and formula assignment: https://i.stack.imgur.com/ULjI0.jpg
The implementation is not correct because if I go to any row and set w(n) = 0, then f(n) tends to go to 0 which doesn't seem right. Appreciate any help to correct the error, thanks.
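One cheap way to evaluate a sum like this is a running total. The sketch below is written under a loud assumption: since the image is missing, it takes f(n) = k1 * SUM over i = 1..n-1 of w(i) * exp(-(n-i)/tau1), which matches the description (no terms on day 1, one term on day 2, recent days weighted more). The function names and sample inputs are made up for illustration.

```python
import math

def f_direct(w, k1, tau):
    """f(n) for n = 1..len(w), evaluating the full sum for each day (O(n^2))."""
    out = []
    for n in range(1, len(w) + 1):
        total = sum(w[i - 1] * math.exp(-(n - i) / tau) for i in range(1, n))
        out.append(k1 * total)
    return out

def f_running(w, k1, tau):
    """Same values in O(n) using g(n) = (g(n-1) + w(n-1)) * exp(-1/tau)."""
    decay = math.exp(-1.0 / tau)
    out, g = [], 0.0
    for n in range(1, len(w) + 1):
        if n > 1:
            g = (g + w[n - 2]) * decay  # age yesterday's total by one day, then add w(n-1)
        out.append(k1 * g)
    return out

w = [10.0, 0.0, 25.0, 5.0, 40.0]  # made-up daily inputs
direct = f_direct(w, k1=1.0, tau=45.0)
running = f_running(w, k1=1.0, tau=45.0)
print(running)
```

If the assumed form is right, the same recurrence translates to one cell per row: with B4 = k1, B3 = tau and w in column D as in the question, something like E2 = 0 and E3 = (E2+$B$4*D2)*EXP(-1/$B$3), filled down. Setting a single w to 0 then only removes that day's term instead of zeroing f(n).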

Excel: Sum up N largest numbers in non-contiguous array of cells

I am not that fluent in Excel, so my question is potentially very simple but nevertheless gives me a headache.
I have a spreadsheet with two types of values: an absolute value, and a value that is calculated as a percentage of this absolute value.
A
1 10
2 0.1
3 20
4 0.2
5 30
6 0.3
7 40
8 0.4
9 50
10 0.5
In this example, the second value is 1% of the first value (e.g. 0.1 from 10). In my actual table these percentages differ and the numbers are random. The fraction used for the second value depends on some key etc., so this is a simplified representation for the sake of a minimal example.
I want to determine the sum of the largest 4 (out of 5) numbers, but only from those 1% values (e.g. 0.1, not 10). The numbers are all below each other. Basically, I want to ignore the absolute numbers (e.g. 10) and only use the relative numbers (e.g. 0.1).
The LARGE function determines the largest n numbers and has the following format:
=SUM(LARGE(array, k))
The array represents a contiguous range in the table. However, I need to pass in a selected set of cells. Is there a way to do this with a set of cells?
In other words, if I use the array I have
=SUM(LARGE(A1:A10, {1,2,3,4}))
the formula will always pick up 20, 30, 40 and 50.
Ideally, I want something like this:
=SUM(LARGE(array(A2,A4,A6,A8,A10), {1,2,3,4}))
Help?
Using your provided sample data, something like this regular formula (does not require array entry) should work for you because the percentages will always be less than or equal to 1:
=SUMPRODUCT(LARGE((A1:A10<=1)*A1:A10,{1,2,3,4}))
If you want a more flexible way of grabbing the top N numbers, you can substitute the {1,2,3,4} with the ROW() function, like so:
=SUMPRODUCT(LARGE((A1:A10<=1)*A1:A10,ROW(1:4)))
EDIT: If the only way to get relative values is if they are every other row, starting in row 2, you can use this formula instead:
=SUMPRODUCT(LARGE(INDEX((MOD(ROW(A1:A10),2)=0)*A1:A10,),ROW(1:4)))
For your simplified example, suppose that B1:B4 contains the values 1,2,3,4. Then in C1:C4 enter the array formula:
{=LARGE(IF(MOD(ROW(A1:A10),2)=0,A1:A10,-1),B1:B4)}
Similarly, the formula
{=SUM(LARGE(IF(MOD(ROW(A1:A10),2)=0,A1:A10,-1),B1:B4))}
(using Ctrl + Shift + Enter to accept as an array formula)
will give you the sum of the top 4.
This assumes that the numbers are all positive. You can replace the -1 in the formula by the min of all the values -1 if that assumption isn't valid.
Another approach, if the criterion for being a relative cell is too ad hoc to be summarized by a simple formula but you have a listing of the cells, is to use the INDIRECT function:
Here column C (C1:C5) holds a listing of the addresses of the cells containing the relative values. In D1 I put the formula =INDIRECT(C1) and copied it down. Then, the formula
=SUM(LARGE(D1:D5,{1,2,3,4}))
returns the desired sum.
There might be a way to dispense with the helper column, though the function INDIRECT seems to not play very nicely with array formulas.
On Edit: Here is a VBA solution:
'The following function returns the sum of the largest k
'elements in range R that are at the list of indices
'if indices is left blank, then the sum of
'the largest k in R is returned
Function SumLargest(R As Range, k As Long, ParamArray indices() As Variant) As Double
    Dim A As Variant
    Dim i As Long, n As Long
    Dim sum As Double
    n = UBound(indices)
    If n = -1 Then
        For i = 1 To k
            sum = sum + Application.WorksheetFunction.Large(R, i)
        Next i
        SumLargest = sum
        Exit Function
    Else
        ReDim A(0 To n)
        For i = 0 To n
            A(i) = R.Cells(indices(i)).Value
        Next i
        For i = 1 To k
            sum = sum + Application.WorksheetFunction.Large(A, i)
        Next i
        SumLargest = sum
    End If
End Function
If you put this function in a standard code module then it can be used from the worksheet like:
=SumLargest(A1:A10,4,2,4,6,8,10)
This returns the sum of the largest 4 values, drawn from the entries at positions 2, 4, 6, 8 and 10 of the range.
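For comparison, the selection logic of that function (restrict to the listed positions, then sum the k largest) can be sketched in Python. This is only an illustration: `sum_largest` is a hypothetical name, positions are 1-based as in the worksheet call, and the data is the sample column from the question.

```python
import heapq

def sum_largest(values, k, positions=None):
    """Sum the k largest values; if 1-based positions are given, only those entries compete."""
    pool = values if positions is None else [values[p - 1] for p in positions]
    return sum(heapq.nlargest(k, pool))

column_a = [10, 0.1, 20, 0.2, 30, 0.3, 40, 0.4, 50, 0.5]
relative_total = sum_largest(column_a, 4, positions=[2, 4, 6, 8, 10])
print(relative_total)  # approximately 1.4 (0.5 + 0.4 + 0.3 + 0.2)
```

Without the positions argument the four largest values of the whole column (20, 30, 40, 50) are summed instead, which is the behaviour the question wanted to avoid.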

Let Excel Solver Operate with Natural Numbers

My workbook is a bit complicated, but the basic problem can be illustrated with a short example.
Let's assume that I have 5 cells in Excel: A, B, C, D, and E.
A = 0
B = 5 * A
C = 0
D = 10 * C
E = B + D
In the Excel Solver Function I select cell E as the objective that is to be maximised, and cells A and C as the variable cells. Furthermore, I add the constraint that cell E must not exceed 10, and the second constraint that cells A and C must be integers.
The ideal solution would be that the value in cell A should be 0, and the value in cell C should be 10. In the more complicated version of this problem, however, Excel cannot find the optimal solution.
Given how the current formulas look, I expect that Excel could find the right solution if it only used natural numbers in its search. For example, Excel currently would look at the outcome for A = 0.01 and C = 9.99. Instead, Excel should strictly compare outcomes for variable choices such as A = 1 and C = 9.
How can I make the Excel solver function operate with natural numbers only?
I suggest you keep your current cells, and create a mirror set beside them. Each mirror cell will equal the rounded version of the original cell. ie: RoundedA will have the formula:
=ROUND(A,0)
Then when you do your data analysis, solve for the rounded version of those cells, with the changing variable being one of the "original" cells.
EDIT
As discussed below, you may need to 'trick' data validation into creating the values you want.
Assume A1 is going to be your "testing" data validation cell. B1 is a permanently blank cell. Set another cell, say, C1, equal to:
=randbetween(1,10)
This will create a random integer between 1 and 10. Set your "variable" cell to be equal to C1. So, your variable cell will always be a random number between 1 & 10 (or you could set "1" equal to the smallest number of your other variables, and "10" equal to the largest number of your other variables. This will create the scope needed to answer your question).
Then when you do data validation, make it try to get A1 = TRUE. Do this by figuring out what the 'test' condition is of your cells. Something like "when X = 0, I know that all my variables are correct". ie set A1 to:
=if(X=0,1,0)
Then do data validation by changing B1 (your blank, unreferenced cell), waiting until A1 = 1. Does that make sense? It will turn C1 into the random cycling variable between LOWVALUE and HIGHVALUE, always using integers. Data validation will stop when A1 = 1 (which happens when some value X = 0). B1 will just spin uselessly until data validation finds something (or it will stop after a few hundred tries if it finds nothing).
To get Solver to try larger increments, divide the target cells by a large number so that each step Solver takes is effectively larger. For example, if it changes the target cells by ±0.00001, divide the target cells by 100000 or more.
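When the variables really are small natural numbers, an exhaustive search is a simple way to see what an integer-only solver should return. The sketch below enumerates the toy example from the question (B = 5*A, D = 10*C, E = B + D, E must not exceed 10). The function name and the search bound of 10 are assumptions for illustration, and this is not how Solver itself works internally.

```python
from itertools import product

def integer_optima(max_value=10, limit=10):
    """Enumerate natural-number pairs (A, C) and keep those maximising E = 5*A + 10*C <= limit."""
    feasible = [
        (5 * a + 10 * c, a, c)
        for a, c in product(range(max_value + 1), repeat=2)
        if 5 * a + 10 * c <= limit
    ]
    best_e = max(e for e, _, _ in feasible)
    return best_e, [(a, c) for e, a, c in feasible if e == best_e]

best_e, pairs = integer_optima()
print(best_e, pairs)  # 10 [(0, 1), (2, 0)]
```

With the formulas exactly as written, the integer optima are (A = 0, C = 1) and (A = 2, C = 0), both giving E = 10.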

Can I use a built-in Excel solver to solve this equation somehow? If not, how would you go about it?

First of all, let me show you guys the equation in question.
In this equation S, V, and t are known constants. CFL is also known. We have an initial value for D, and we have no idea what k is.
What I need to do is find ideal values for both D and k that would minimize the residuals squared of a calculated CFL and a measured CFL. Using residuals squared is just a way for me to check if they're the best possible values, but it's fine if there's another way to go about this that uses some other method.
The residual squared is just the absolute value of the difference between the calculated and measured CFLs, which is then squared. The lower the residual squared, the better the fit we have. So I need the smallest possible residual squared resulting from putting both k and D into the equation. That'll result in a calculated CFL, which I can then compare to a measured CFL, allowing me to calculate the residual squared.
My first idea for how to do this, since I'm not sure how to use Excel equations, was to fix the value of D (since we have an initial starting value to work from) and then vary k through different values, putting them into the equation to find a calculated CFL, and comparing that to the measured one to find the residuals squared, until I find the k that gives the smallest residuals squared. Then I fix k at that ideal value, and vary D until I find the smallest residual there as well. Then I fix D again, and go back to varying k. My idea was that I could keep bouncing back and forth like that until both D and k were within a certain percentage of their previous values. I assumed it would reach some sort of equilibrium with this method.
However, the numbers just go crazy, and end up either going to zero or going to infinity. So I need to rework my process. Which is where you guys come in!
How would you go about finding the most ideal values for both D and k, which would result in a calculated CFL closest to the measured one, assuming you are given values for every variable above apart from k? Remember to factor in that the value of D given initially is simply a starting place to work from, and is not the most ideal value.
I've been working on this program for a long time (at least a month), and I'm just stuck as hell and desperate. I was hoping you guys could help me out.
Here are some initial values to work with:
S = 19.634954
V = 12.271846
D (initial) = 0.01016482
CFL (measured) = 0.401
t = 4
k = ?
Thank you for any ideas you might have.
As Dean said, your system has two unknowns, and in the general case an infinite number of solutions (different pairs of (D,k)). By fixing D, CFL is a continuous function of k, and as such, you should be able to find a k that gives the CFL you measured (within some accuracy). For this problem (i.e., finding k given CFL) you can use the Goal Seek tool. Here is how:
1) Problem setup:
Name each input cell after its variable (go to Formulas --> Defined Names --> Define Name and give each cell the name of the variable it holds). Then input the values of your parameters in these cells (give k an arbitrary value, e.g. 1), and input the formula in cell CFL like:
=(S/V)*SQRT(D/k)*(ERF(SQRT(k*t))+SQRT(k*t/PI())*EXP(-k*t))
Again, note that S,V,D,k and t are defined as named ranges.
2) Problem Solution:
Go To Data --> Data Tools --> What-If Analysis --> Goal Seek and enter the following parameters:
Set Cell: CFL
To value: 0.401
By changing cell: k
This gave me k=0.151759378, which results in CFL = 0.401261265054823.
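The Goal Seek step can be reproduced outside Excel as a plain root search. The following Python sketch uses the same expression as the worksheet formula above and the values given in the question; the bisection routine, its bracket [1e-6, 1] and its tolerance are assumptions of this sketch, not what Goal Seek does internally.

```python
import math

S, V, D, T = 19.634954, 12.271846, 0.01016482, 4.0
CFL_MEASURED = 0.401

def cfl(k, d=D):
    """Calculated CFL for a given k (and parameter d), same expression as the worksheet formula."""
    kt = k * T
    bracket = math.erf(math.sqrt(kt)) + math.sqrt(kt / math.pi) * math.exp(-kt)
    return (S / V) * math.sqrt(d / k) * bracket

def solve_k(target, lo=1e-6, hi=1.0, tol=1e-12):
    """Bisection on k; assumes cfl(lo) > target > cfl(hi)."""
    if not (cfl(lo) > target > cfl(hi)):
        raise ValueError("target is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cfl(mid) > target:
            lo = mid  # calculated CFL still too large, so k must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

k_fit = solve_k(CFL_MEASURED)
print(k_fit, cfl(k_fit))
```

With D fixed, the calculated CFL decreases steadily as k grows, so the bracket contains a single root. It lands close to the Goal Seek value, slightly above it because Goal Seek stops once it is within its own tolerance of the target.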
I hope this helps?
Edit: Finding some solution pairs using VBA:
1) Place the measured CFL value in a cell (I chose H2).
2) Replace named ranges k, D and CFL. I used rngK, rngD and rngCFL, each one starting from row 2 till row 20.
3) Fill down rngD with a step (I took 0.01) using the formula =INDEX(rngD,ROW()-ROW($C$2))+0.01. The first entry of rngD is in cell C2 and has the value 0.01016482. The formula is copied down to all other cells in the range.
4) Fill down rngK with some initial values (I took =1).
5) Fill down the rngCFL range with the formula =(S/V)*SQRT(INDEX(rngD,ROW()-ROW($G$1))/INDEX(rngK,ROW()-ROW($G$1)))*(ERF(SQRT(INDEX(rngK,ROW()-ROW($G$1))*t))+SQRT(INDEX(rngK,ROW()-ROW($G$1))*t/PI())*EXP(-INDEX(rngK,ROW()-ROW($G$1))*t)). I use the ROW() and INDEX() functions to refer to the Range element I need.
6) Finally, use this code in a sub:
Dim iCnt As Long
For iCnt = 1 To Range("rngK").Count
    Range("rngCFL")(iCnt).GoalSeek goal:=Range("H2"), changingCell:=Range("rngK")(iCnt)
Next iCnt
The above generates 19 pairs (D,k) that give the measured CFL value.
You can't solve for two unknown variables in a one-equation system. However, if I take D as given, then you have a one-unknown, one-equation system.
I simply used one column as a guess of k (for me column B). I used another column to represent the calculated CFL with the guessed k (for me column C). I have another column that has either a 1 or -1 (for me column D). Lastly, I have a column that represents the absolute value by which I want to increment my guess (for me column E).
I named cells with the given values of the variables to make it easier to use them.
I started with a guess of k=1.
Here are my formulas in my first row, which was row 7:
B7: 0.1
C7: =(s/v)*(d/B7)^0.5*(ERF(((B7*t)^0.5))+((B7*t)/PI())^0.5*EXP(-1*B7*t))
Nothing in D7 or E7.
In row 8:
B8: =B7+E8*D8
C8: =(s/v)*(d/B8)^0.5*(ERF(((B8*t)^0.5))+((B8*t)/PI())^0.5*EXP(-1*B8*t))
D8: 1
E8: 0.01
In row 9 the B and C columns are just copied down, but D and E are as follows:
D9: =IF(C9>cfl,1,-1)
E9: =IF(D9=D8,E8,E8/10)
Once you get those in you can just copy down however many rows you want.
What this does is every time the residual of the CFL switches signs the increment's sign will also flip. Additionally, the absolute value of the increment will also shrink by a factor of 10 to give more precision as it goes.
This is by no means the best way to solve your problem but it is a way.
