I want to use Solver to find a subset of a larger set of numbers that sums to a target value (a 10-number subset of 16 numbers). I'm not very familiar with Solver, and I'm unsure of the best way to set my constraints.
I have 5 categories of numbers within the set, A, B, C, D, E. The set of 16 may include different combinations of these categories, but the target subset total has a defined makeup:
A, B, B, C, C, C, (B or C), (B or C),D, E
So my question is, how would I introduce these rules into Solver?
Secondarily, I know Solver comes up with 1 solution, is there a way to get a range of solutions that are closest to the target? So, if I have a set that doesn't quite produce the target, I could see 3 solutions that are nearest the target (plus or minus)?
Thanks.
TL;DR - Here's a way to set up the problem you have. Getting a series of answers that are "close" is out of scope for Solver. You could do it with VBA if you define additional criteria to search around. (Edit - added setup for defined makeup)
Comments
It seems there are no criteria such as "a number must be selected from each category" or "a number must be selected from category X". Rather, you have a bunch of numbers, each belonging to a category, and you want to find the combination of numbers that provides a certain total, then investigate the categories afterward.
Initial Problem Setup
From the information you provided in your comment, I put together the following ...
Column A (Category) and Column B (Value) are the information you provided. Column C (Selected) is just 0 or 1. Column D (Result) is the product =B2*C2, filled down.
F2 is the sum of Column D. G2 is the target value for this sum (as you provided). H2 is the squared error of the calculation =(F2-G2)^2.
This is the solver setup ...
Set Objective: is set to $H$2 - the squared error of the calculation
To: is set to Min. You could choose Value Of: 0, but if there is no exact answer, it may fail.
By Changing Variable Cells: is set to $C$2:$C$17 or Column C.
Subject to the Constraints: includes $C$2:$C$17 = binary. This forces the values to be either 0 or 1.
Select a Solving Method: is set to Evolutionary. GRG Nonlinear can sometimes solve this type of problem, but it takes much longer than Evolutionary.
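To see what this setup is asking Solver to do, here is a minimal Python sketch of the same objective, solved by brute force instead of the Evolutionary engine. The 16 values and the target are made-up stand-ins (the real ones are in the sheet), and the names VALUES, TARGET and best_selection are only for illustration.

```python
from itertools import product

# Hypothetical stand-ins for the 16 values in Column B and the target in G2.
VALUES = [12.5, 7.25, 9.0, 14.75, 3.5, 8.25, 11.0, 6.5,
          10.25, 5.75, 13.0, 4.5, 9.5, 15.25, 2.75, 7.0]
TARGET = 35.5  # = 12.5 + 14.75 + 8.25, so an exact match exists in this toy data


def best_selection(values, target):
    """Try every 0/1 pattern (Column C) and keep the one with the
    smallest squared error (cell H2)."""
    best = None
    for selected in product((0, 1), repeat=len(values)):
        total = sum(v * s for v, s in zip(values, selected))  # cell F2
        sq_err = (total - target) ** 2                         # cell H2
        if best is None or sq_err < best[0]:
            best = (sq_err, selected, total)
    return best


sq_err, selected, total = best_selection(VALUES, TARGET)
print(selected, total, sq_err)
```

With 16 binary cells there are only 65,536 patterns, so an exhaustive check like this is feasible; Solver's Evolutionary method searches the same space heuristically rather than exhaustively.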
Below is the result - different from what you reported. When I force your result, I get 106.61 for CalcTotal (perhaps a transcription error somewhere) ...
Finding Near Results
Solver is an optimizer, so it will find "the answer" that best fits your objective.
In order for it to provide you with different answers, you need to provide it with different objectives.
For example, you may want to force a solution that uses C1, but is still closest to your objective. The setup might look like this...
... with this solution ...
In that example, I excluded $C$10 from the "By Changing Variable Cells:" and from the "Subject To Constraints:" fields. Another way to accomplish the same thing would have been to keep the original field values unchanged, but to add an additional constraint of $C$10 = 1.
Or, as another example, perhaps you want a solution that chooses exactly two values from Category B. You could modify the setup to include sums of values selected from each category, and add a constraint. Here is the setup ...
... and the corresponding result ...
Making an Algorithm
If you determine criteria for the range of solutions you are looking for, you could set up a VBA sub to loop through setting up Solver, getting the solution, and storing the results of the "Selected" column for review.
If you want to pursue this track, I suggest you look at other solutions provided on this site, try to set something up, and ask a new question if you have problems.
Edit - missed one of your criteria
I missed addressing your statement - "the target subset total has a defined makeup: A, B, B, C, C, C, (B or C), (B or C),D, E"
In this instance, using the setup containing sums for each category, you can specify these constraints ...
$G$4 = 1
$G$5 <= 4
$G$5 >= 2
$G$6 <= 5
$G$6 >= 3
$G$7 = 1
$G$8 = 1
On first pass, it provided this result: A1, B2, B3, C1, C3, C4, C5, C6, D1, E1 with a total of 106.38.
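Outside of Solver, the category constraints and the "three nearest solutions" part of the question can be sketched together in Python. This is only an illustration under assumptions: the item names and values are invented, the subset size of 10 comes from the question, and makeup_ok and closest_subsets are hypothetical helper names.

```python
from collections import Counter
from heapq import nsmallest
from itertools import combinations

# Hypothetical (name, category, value) rows standing in for the 16-row sheet.
ITEMS = [
    ("A1", "A", 9.50), ("A2", "A", 11.25),
    ("B1", "B", 8.75), ("B2", "B", 10.00), ("B3", "B", 12.50),
    ("B4", "B", 7.25), ("B5", "B", 13.00),
    ("C1", "C", 9.00), ("C2", "C", 14.25), ("C3", "C", 10.75),
    ("C4", "C", 11.50), ("C5", "C", 8.25), ("C6", "C", 12.00),
    ("D1", "D", 10.50), ("D2", "D", 9.75),
    ("E1", "E", 11.00),
]
TARGET = 106.0  # made-up target


def makeup_ok(rows):
    """Same limits as the Solver constraints on the per-category counts."""
    n = Counter(category for _, category, _ in rows)
    return (n["A"] == 1 and 2 <= n["B"] <= 4 and 3 <= n["C"] <= 5
            and n["D"] == 1 and n["E"] == 1)


def total_of(rows):
    return sum(value for _, _, value in rows)


def closest_subsets(items, target, size=10, how_many=3):
    """Return the `how_many` valid subsets whose totals are nearest the target."""
    feasible = (rows for rows in combinations(items, size) if makeup_ok(rows))
    return nsmallest(how_many, feasible, key=lambda rows: abs(total_of(rows) - target))


for rows in closest_subsets(ITEMS, TARGET):
    print([name for name, _, _ in rows], round(total_of(rows), 2))
```

Choosing 10 of 16 gives 8,008 combinations, so listing the nearest few above or below the target is cheap here. A VBA loop around Solver would need extra constraints to exclude each solution already found.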
Related
I'm not very good in VBA, so used helper columns for the beginning of my explanation. If you can do all the operations in VBA, feel free to show me how (if not, skip to the next paragraph).
So in column B I have case numbers (e.g. 12345, 12346, etc.) that sometimes repeat (e.g. 12345, 12345). When a case number repeats, I need all rows with that case number to show the same case status, namely the least advanced one (the status data is in column D). For example, the first 12345 could be Planning and the next 12345 could be Action. In column E, I would need both of them to show Planning. For this, I made a table of associations (e.g. Table1), where "Planning" would be 1, "Action" would be 2 and so on.
Therefore, my first helper column has the following formula:
=IFERROR(IF(COUNTIF(B:B, B3)>1, VLOOKUP(D3, Table1, 2, False), D3), "")
If the case number has no repetition, it can keep its original value. The results give me something like this:
While I need it to look like this at the very least:
So I can change it back to this:
So how can I determine the minimum by case number and apply this minimum to all the entries with duplicate case numbers? Any and all help appreciated.
No need for VBA.
If I understand you correctly, you enter a number in Column C that corresponds to the status.
The following formula will return the least advanced status for any duplicates: (the table I named tblStatus)
=VLOOKUP(AGGREGATE(15,6,1/($B$1:$B$8=B1)*$C$1:$C$8,1),tblStatus,2,FALSE)
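If it helps to see the logic outside of a formula, here is a small Python sketch of what the AGGREGATE/VLOOKUP combination computes: the lowest status rank per case number, mapped back to its text. The STATUS_RANK table is an assumption modelled on the question ("Planning" = 1, "Action" = 2); "Closed" is invented to make the table longer.

```python
# Hypothetical status table: text -> rank (lower = less advanced).
STATUS_RANK = {"Planning": 1, "Action": 2, "Closed": 3}


def least_advanced(cases, statuses):
    """For each row, return the least advanced status among all rows
    that share the same case number."""
    lowest = {}
    for case, status in zip(cases, statuses):
        rank = STATUS_RANK[status]
        lowest[case] = min(rank, lowest.get(case, rank))
    rank_to_status = {rank: status for status, rank in STATUS_RANK.items()}
    return [rank_to_status[lowest[case]] for case in cases]


print(least_advanced([12345, 12346, 12345], ["Planning", "Action", "Action"]))
```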
Edit:
If, as you show in your example, you might have either the status number OR the status text in column C, then you need to convert the text to a number in order to use the MIN function equivalent.
In that case, try:
=VLOOKUP(AGGREGATE(15,6,1/($B$1:$B$8=B1)*IF(ISNUMBER($C$1:$C$8),$C$1:$C$8,MATCH($C$1:$C$8,tblStatus[Text],0)),1),tblStatus,2,FALSE)
Working from this spreadsheet: https://docs.google.com/spreadsheets/d/1aiZzzOFzPDrw_siMhL8XiNIp3q7f8nz58HvwBKn14uA/edit?usp=sharing
Trying to calculate sum of a column where value in another col is true: this works well, like this:
=SUMIF($B$5:$B$17,"NY",G5:G17)
I'd like to be able to calculate this but, also multiply the range in question by another variable, in column C (i.e. the "rate"), so that I get the sum of each number of hours in column G done by the resources in Col A, multiplied by the rate charged for each resource.
this obviously doesn't work, but this is the type of thing I'm looking for: =SUMIF($B$5:$B$17,"NY",G5:G17*(Corresponding value in Col C))
Any ideas?
SUMIFS will only accept a range so I don't think you can do it that way. The alternative is to use SUMPRODUCT.
In Google sheets:
=sumproduct((B5:B17="NY"),C5:C17,G5:G17)
Excel is slightly more picky: you have to coerce the inner bracket to a number, either
=sumproduct(--(B5:B17="NY"),C5:C17,G5:G17)
or
=sumproduct((B5:B17="NY")*C5:C17*G5:G17)
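As a cross-check on what SUMPRODUCT is doing here, this is a minimal Python sketch of the same calculation: multiply rate by hours row by row and add up only the rows that match the region. The sample numbers and the function name weighted_hours are made up for illustration.

```python
def weighted_hours(regions, rates, hours, region="NY"):
    """Sum of rate * hours over the rows whose region matches."""
    return sum(rate * hrs
               for reg, rate, hrs in zip(regions, rates, hours)
               if reg == region)


# Hypothetical rows standing in for columns B, C and G.
regions = ["NY", "LA", "NY"]
rates = [100, 80, 50]
hours = [2, 3, 4]
print(weighted_hours(regions, rates, hours))  # 100*2 + 50*4
```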
I can't wrap my head around this multi-conditional check between two columns. I have two columns, A and B, and would like to use formulas to evaluate each "grouping" in column A. For example, if every column B value for "group 2" in column A is Pass, the group is a pass.
Edit: I've updated it with some more rules, since this is just a bit more complicated than I can wrap my head around.
There are only 5 criteria:
PASS, PROG, UNAVAIL, IGNORE, "BLANK"
Rules:
FAIL if subgroup has 1 or more fail
IGNORE if subgroup has 1 or more ignore
PASS if ALL PASS or combination of PASS and UNAVAIL
PROG if NOT fail and a combination of PASS, UNAVAIL, PROG
"BLANK"s are treated as UNAVAIL
Appreciate any help, thank you!
(Answer changed to reflect new criteria)
This sheet:
Was created with the following two formulas (using named ranges in A-C where the name is in the first row):
In C2 I entered (then copied down)
=CONCATENATE(TRIM(A2),"-", IF(LEN(TRIM(B2)) > 0, TRIM(B2), "UNAVAIL"))
In F2 I entered (then copied)
=IF(COUNTIF(Tag, E2 &"-FAIL") >0, "FAIL",IF(COUNTIF(Tag, E2 &"-IGNORE") >0,"IGNORE",IF(COUNTIF(Group,E2) = COUNTIF(Tag, E2 &"-PASS") + COUNTIF(Tag, E2 &"-UNAVAIL"),"PASS","PROG")))
The 4th case is like an else at the bottom of a switch -- no need to explicitly check the condition.
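The same precedence of rules can be written as a short Python sketch, which may make the "else at the bottom" point easier to see. It assumes, as the formulas above do, that blanks count as UNAVAIL and that FAIL is checked before IGNORE; group_result is a hypothetical name.

```python
def group_result(results):
    """Overall status for one group, checked in the same order as the
    nested IF: FAIL, then IGNORE, then PASS, otherwise PROG."""
    # Blank or missing entries are treated as UNAVAIL.
    cleaned = [(r or "").strip() or "UNAVAIL" for r in results]
    if "FAIL" in cleaned:
        return "FAIL"
    if "IGNORE" in cleaned:
        return "IGNORE"
    if all(r in ("PASS", "UNAVAIL") for r in cleaned):
        return "PASS"
    return "PROG"  # the implicit "else" case


print(group_result(["PASS", "", "PROG"]))
```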
Along the same lines: add a column C that scores each result. Use 0 for a pass, a large negative number such as -100000 for a fail, and a small prime negative such as -3 for an uncertain result.
Then build a pivot table and use the sum of these values for each group.
From the sums you can use formulas to deduce the different conditions.
The fail value is made very large so that a group consisting entirely of uncertain results can never add up to something in the fail range (below about -100000).
Anyway, you get the idea.
This formula may work (enter it as an array formula with CTRL+SHIFT+ENTER):
=IF(SUM(IF(IF($A$2:$A$23=$E2,$B$2:$B$23,"OUT_OF_GROUP")="FAIL",1,0))>0,"FAIL",IF(SUM(IF(IF($A$2:$A$23=$E2,$B$2:$B$23,"OUT_OF_GROUP")="IGNORE",1,0))>0,"IGNORE",IF(SUM(IF(IF($A$2:$A$23=$E2,$B$2:$B$23,"OUT_OF_GROUP")="PASS",1,0))+SUM(IF(IF($A$2:$A$23=$E2,$B$2:$B$23,"OUT_OF_GROUP")="UNAVAIL",1,0))+SUM(IF(IF($A$2:$A$23=$E2,$B$2:$B$23,"OUT_OF_GROUP")=0,1,0))=SUM(IF($A$2:$A$23=$E2,1,0)),"PASS","PROG")))
Here Group and Result are in $A$2:$A$23 and $B$2:$B$23, respectively. E2:E11 holds "s1" through "s10". I assumed Rule 3 meant that a combination of UNAVAIL and blanks is a PASS --- the logic could be modified to make that a PROG.
My workbook is a bit complicated, but the basic problem can be illustrated with a short example.
Let's assume that I have 5 cells in Excel: A, B, C, D, and E.
A = 0
B = 5 * A
C = 0
D = 10 * C
E = B + D
In the Excel Solver Function I select cell E as the objective that is to be maximised, and cells A and C as the variable cells. Furthermore, I add the constraint that cell E must not exceed 10, and the second constraint that cells A and C must be integers.
The ideal solution would be that the value in cell A should be 0, and the value in cell C should be 10. In the more complicated version of this problem, however, Excel cannot find the optimal solution.
Given how the current formulas look, I expect that Excel could find the right solution if it only used natural numbers in its search. For example, Excel currently would look at the outcome for A = 0.01 and C = 9.99. Instead, Excel should strictly compare outcomes for variable choices such as A = 1 and C = 9.
How can I make the Excel solver function operate with natural numbers only?
I suggest you keep your current cells, and create a mirror set beside them. Each mirror cell will equal the rounded version of the original cell. ie: RoundedA will have the formula:
=ROUND(A,0)
Then when you do your data analysis, solve for the rounded version of those cells, with the changing variable being one of the "original" cells.
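Here is a rough Python sketch of the mirror-cell idea, using the toy model from the question (B = 5 * A, D = 10 * C, E = B + D, with E not allowed to exceed 10). The search varies the raw values, but the model only ever sees their rounded mirrors. The grid of raw values and the helper names are assumptions for illustration; excel_round imitates Excel's ROUND, which rounds halves away from zero (Python's built-in round does not).

```python
import math


def excel_round(x):
    """Round to the nearest integer, halves away from zero, like =ROUND(x,0)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def model(a_raw, c_raw):
    """Evaluate E using the rounded mirror cells, not the raw cells."""
    a, c = excel_round(a_raw), excel_round(c_raw)
    return a, c, 5 * a + 10 * c


best = None
for i in range(0, 21):          # raw A from 0.0 to 2.0 in steps of 0.1
    for j in range(0, 21):      # raw C from 0.0 to 2.0 in steps of 0.1
        a, c, e = model(i / 10, j / 10)
        if e <= 10 and (best is None or e > best[2]):
            best = (a, c, e)

print(best)
```

Whatever fractional values the search tries, the objective is only ever computed from whole numbers, which is the point of the mirror cells.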
EDIT
As discussed below, you may need to 'trick' data validation into creating the values you want.
Assume A1 is going to be your "testing" data validation cell. B1 is a permanently blank cell. Set another cell, say, C1, equal to:
=randbetween(1,10)
This will create a random integer between 1 and 10. Set your "variable" cell to be equal to C1. So, your variable cell will always be a random number between 1 & 10 (or you could set "1" equal to the smallest number of your other variables, and "10" equal to the largest number of your other variables. This will create the scope needed to answer your question).
Then when you do data validation, make it try to get A1 = TRUE. Do this by figuring out what the 'test' condition is of your cells. Something like "when X = 0, I know that all my variables are correct". ie set A1 to:
=if(X=0,1,0)
Then do data validation by changing B1 (your blank, unrefered-to cell), waiting until A1 = 1. Does that make sense? It will turn C1 into the random cycling variable between LOWVALUE and HIGHVALUE, always using integers. Data validation will stop when A1 = 1 (which happens when some value X = 0). B1 will just spin uselessly until Data validation finds something (or it will stop after a few hundred tries if it finds nothing).
To get Solver to try larger increments, divide the target cells by a large number, so that each increment Solver tries is effectively larger. (For example, if it changes the target cells by +-0.00001, divide the target cells by 100000 or more.)
First of all, let me show you guys the equation in question.
In this equation S, V, and t are known constants. CFL is also known. We have an initial value for D, and we have no idea what k is.
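For reference, the spreadsheet formula used in the answers to this question corresponds to the expression below. It is reconstructed from that formula, so treat it as an assumption about the original equation rather than a copy of it.

```latex
\mathrm{CFL} = \frac{S}{V}\,\sqrt{\frac{D}{k}}\,
\left[\operatorname{erf}\!\left(\sqrt{k t}\right)
+ \sqrt{\frac{k t}{\pi}}\; e^{-k t}\right]
```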
What I need to do is find ideal values for both D and k that would minimize the residuals squared of a calculated CFL and a measured CFL. Using residuals squared is just a way for me to check if they're the best possible values, but it's fine if there's another way to go about this that uses some other method.
The residual squared is just the absolute value of the difference between the calculated and measured CFLs, which is then squared. The lower the residual squared, the better the fit we have. So I need the smallest possible residual squared resulting from putting both k and D into the equation. That'll result in a calculated CFL, which I can then compare to a measured CFL, allowing me to calculate the residual squared.
My first idea for how to do this, since I'm not sure how to use Excel equations, was to fix the value of D (since we have an initial starting value to work from) and then vary through different values of k, putting them into the equation to find a calculated CFL, and comparing that to the measured to find the residuals squared, until I find one that results with the smallest residuals squared. Then I fix k at that ideal value, and vary D until I find the smallest residual there as well. Then I fix D again, and go back to varying k. My idea was that I could keep bouncing back and forth like that until both D and k were within a certain percentage of their previous values. I assumed it would reach some sort of equilibrium with this method
However, the numbers just go crazy, and end up either going to zero or going to infinity. So I need to rework my process. Which is where you guys come in!
How would you go about finding the most ideal values for both D and k, which would result in a calculated CFL closest to the measured one, assuming you are given values for every variable above apart from k? Remember to factor in that the value of D given initially is simply a starting place to work from, and is not the most ideal value.
I've been working on this program for a long time (at least a month), and I'm just stuck as hell and desperate. I was hoping you guys could help me out.
Here are some initial values to work with:
S = 19.634954
V = 12.271846
D (initial) = 0.01016482
CFL (measured) = 0.401
t = 4
k = ?
Thank you for any ideas you might have.
As Dean said, your system has two unknowns, and in the general case an infinite number of solutions (different pairs of (D,k)). By fixing D, CFL is a continuous function of k, and as such, you should be able to find a k that gives the CFL you measured (within some accuracy). For this problem (i.e., finding k given CFL) you can use the Goal Seek tool. Here is how:
1) Problem setup:
Use the names of the variables to name the cells in which you input their values (go to Formulas --> Defined Names --> Define Name and give each cell the name of its variable). Then input the values of your parameters in these cells (give k an arbitrary value, e.g. 1), and input the formula in cell CFL like:
=(S/V)*SQRT(D/k)*(ERF(SQRT(k*t))+SQRT(k*t/PI())*EXP(-k*t))
Again, note that S,V,D,k and t are defined as named ranges.
2) Problem Solution:
Go To Data --> Data Tools --> What-If Analysis --> Goal Seek and enter the following parameters:
Set Cell: CFL
To value: 0.401
By changing cell: k
This gave me k=0.151759378, which results in CFL = 0.401261265054823.
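If you want to check the Goal Seek result outside Excel, here is a minimal Python sketch that solves the same one-unknown problem with plain bisection. Bisection is swapped in for whatever method Goal Seek uses internally, and the bracket [1e-6, 100] for k is an assumption that happens to contain the root for these inputs.

```python
import math

# Values from the question; D is held fixed at its initial value.
S, V, D, T = 19.634954, 12.271846, 0.01016482, 4.0
CFL_MEASURED = 0.401


def cfl(k, d=D):
    """Same expression as the worksheet formula in the CFL cell."""
    return (S / V) * math.sqrt(d / k) * (
        math.erf(math.sqrt(k * T))
        + math.sqrt(k * T / math.pi) * math.exp(-k * T)
    )


def solve_k(target, d=D, lo=1e-6, hi=100.0, iterations=200):
    """Bisection on k, assuming cfl(lo) > target > cfl(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cfl(mid, d) > target:
            lo = mid   # calculated CFL too high, so k must be larger
        else:
            hi = mid
    return (lo + hi) / 2


k = solve_k(CFL_MEASURED)
print(k, cfl(k))
```

The result should land close to the Goal Seek value above; Goal Seek stops at a looser tolerance, which is why its CFL is 0.40126 rather than exactly 0.401.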
I hope this helps?
Edit: Finding some solution pairs using VBA:
1) Place the measured CFL value in a cell (I chose H2).
2) Replace named ranges k, D and CFL. I used rngK, rngD and rngCFL, each one starting from row 2 till row 20.
3) Fill down rngD with a step (I took 0.01) using the formula =INDEX(rngD,ROW()-ROW($C$2))+0.01. The first entry of rngD is in cell C2 and has the value 0.01016482. The formula is copied down to all other cells in the range.
4) Fill down rngK with some initial values (I took =1).
5) Fill down the rngCFL range with the formula =(S/V)*SQRT(INDEX(rngD,ROW()-ROW($G$1))/INDEX(rngK,ROW()-ROW($G$1)))*(ERF(SQRT(INDEX(rngK,ROW()-ROW($G$1))*t))+SQRT(INDEX(rngK,ROW()-ROW($G$1))*t/PI())*EXP(-INDEX(rngK,ROW()-ROW($G$1))*t)). I use the ROW() and INDEX() functions to refer to the Range element I need.
6) Finally, use this code in a sub:
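Sub FindPairs()   ' the Sub name is arbitrary
    ' For each row, vary k until that row's CFL formula matches the measured value in H2
    Dim iCnt As Long
    For iCnt = 1 To Range("rngK").Count
        Range("rngCFL")(iCnt).GoalSeek Goal:=Range("H2").Value, ChangingCell:=Range("rngK")(iCnt)
    Next iCnt
End Sub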
The above generates 19 pairs (D,k) that give the measured CFL value.
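The same sweep can be sketched in Python: step D upward from its initial value in increments of 0.01 and solve for the matching k each time. Bisection stands in for Goal Seek, and the k bracket [1e-6, 100] is an assumption that covers this range of D values.

```python
import math

S, V, T = 19.634954, 12.271846, 4.0
D_START, D_STEP, N_PAIRS = 0.01016482, 0.01, 19
CFL_MEASURED = 0.401


def cfl(k, d):
    """Same expression as the worksheet CFL formula."""
    return (S / V) * math.sqrt(d / k) * (
        math.erf(math.sqrt(k * T))
        + math.sqrt(k * T / math.pi) * math.exp(-k * T)
    )


def solve_k(target, d, lo=1e-6, hi=100.0, iterations=200):
    """Bisection on k for a fixed D, assuming cfl(lo, d) > target > cfl(hi, d)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cfl(mid, d) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


pairs = []
for i in range(N_PAIRS):
    d = D_START + i * D_STEP
    pairs.append((d, solve_k(CFL_MEASURED, d)))

for d, k in pairs:
    print(f"D={d:.8f}  k={k:.8f}")
```

Every pair reproduces the measured CFL, which illustrates the earlier point: with a single measurement there is no unique (D, k), so some additional criterion is needed to pick one.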
You can't solve for two unknown variables in a 1 formula system. However if I take D as given then you have a 1 unknown/1 formula system.
I simply used one column for the guess of k (for me, column B). I used another column for the CFL calculated with the guessed k (for me, column C). I have another column that holds either 1 or -1 (for me, column D). Lastly, I have a column that holds the absolute value by which I want to increment my guess (for me, column E).
I named cells with the given values of the variables to make it easier to use them.
I started with a guess of k=0.1.
Here are my formulas in my first row which was 7.
B7=.1
C7 =(s/v)*(d/B7)^0.5*(ERF(((B7*t)^0.5))+((B7*t)/PI())^0.5*EXP(-1*B7*t))
nothing in D7 or E7
in row 8:
B8=B7+D8*E8
C8 =(s/v)*(d/B8)^0.5*(ERF(((B8*t)^0.5))+((B8*t)/PI())^0.5*EXP(-1*B8*t))
D8=1
E8=.01
in row 9 the B and C columns are just copied down, but D and E are as follows
D9 =IF(C8>cfl,1,-1)
E9 =IF(D9=D8,E8,E8/10)
Once you get those in you can just copy down however many rows you want.
What this does is every time the residual of the CFL switches signs the increment's sign will also flip. Additionally, the absolute value of the increment will also shrink by a factor of 10 to give more precision as it goes.
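For anyone who wants to test the idea before building the columns, here is a small Python sketch of the same step-and-shrink search. It assumes the sign for each row is decided from the previous row's calculated CFL and that the step is the sign multiplied by the increment. The starting guess of 0.1, the first increment of 0.01 and the 200 "rows" are illustrative choices.

```python
import math

S, V, D, T = 19.634954, 12.271846, 0.01016482, 4.0
CFL_MEASURED = 0.401


def cfl(k):
    """Same expression as the column C formula."""
    return (S / V) * math.sqrt(D / k) * (
        math.erf(math.sqrt(k * T))
        + math.sqrt(k * T / math.pi) * math.exp(-k * T)
    )


def sign_flip_search(target, k=0.1, step=0.01, sign=1, rows=200):
    """Each pass plays the role of one worksheet row."""
    for _ in range(rows):
        # Column D: step up if the previous CFL is too high, otherwise step down.
        new_sign = 1 if cfl(k) > target else -1
        # Column E: shrink the increment tenfold whenever the direction flips.
        if new_sign != sign:
            step /= 10
        sign = new_sign
        # Column B: next guess of k.
        k += sign * step
    return k


k = sign_flip_search(CFL_MEASURED)
print(k, cfl(k))
```

Stepping up when the calculated CFL is too high works because, for these inputs, CFL falls as k rises.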
This is by no means the best way to solve your problem but it is a way.