Generating multiple optimal solutions using Excel Solver

Is there a way to get all optimal solutions when you are solving some problem with Excel Solver (Simplex LP method)?
If not, what is the best way or add-in for Excel to solve this, and how would I convert my existing VBA code to use it?

Actually, I have found a way to do this with Excel Solver. It is not optimal in terms of run time, but that is not an issue for me.
If you can assign a unique id to each possible solution in some way, which is true in my case, then for each solution you find you can check whether there is another solution with the same objective value but a different id, as follows:
1. Find the first optimal solution and save its id and objective value; I will call these origID and origRes.
2. Check whether there is some solution with id < origID and res = origRes.
3. If yes, treat that new id as the initial id and repeat step 2 until no solution satisfies the criteria.
4. After that, do the same thing with the condition id > origID and res = origRes.
5. Once you are sure you have found all solutions with the optimal value origRes, look for solutions that are not optimal. I did this by adding a condition that the new solution must be <= origRes - 0.01, because I know all solutions have 2 decimal places.
6. Go to step 2 again.
I know this is not the best way, but I usually need no more than 100 solutions, and currently I can get them in about 2 minutes, which is acceptable for me. A generic sketch of the same re-solve-with-cuts loop is shown below.
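Not the poster's Excel/VBA setup, but a minimal Python sketch of the same loop using the PuLP library, in case it helps: solve once, pin the objective at the optimum, then repeatedly add a "no-good" cut excluding each solution already found, until the model becomes infeasible. The toy model and all names here are illustrative assumptions.

import pulp

# Hypothetical toy model: pick at most two of four equally valuable items,
# so several optimal selections exist.
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(4)]
prob = pulp.LpProblem("alt_optima", pulp.LpMaximize)
prob += pulp.lpSum(x)            # objective: number of items picked
prob += pulp.lpSum(x) <= 2       # constraint: at most two items

prob.solve(pulp.PULP_CBC_CMD(msg=False))
best = pulp.value(prob.objective)
prob += pulp.lpSum(x) >= best    # pin the objective at the optimal value

solutions = []
while prob.status == pulp.LpStatusOptimal:
    sol = [int(v.value()) for v in x]
    solutions.append(sol)
    # "No-good" cut: the next solution must differ in at least one variable.
    prob += (pulp.lpSum(v for v, s in zip(x, sol) if s == 0)
             + pulp.lpSum(1 - v for v, s in zip(x, sol) if s == 1)) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))

print(solutions)                 # every 0-1 selection achieving the optimum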

Although this looks easy, it is actually not such an easy question. Even the definition of "all possible optimal solutions" is not clear: there may be infinitely many of them. Asking for "all basic feasible solutions" (i.e. corner points) sounds better. To my knowledge there are no solvers providing this, and I also do not know of a really simple technique to enumerate all optimal bases.
One interesting approach is to use a MIP formulation to enumerate all optimal bases:
Sangbum Lee, Chan Phalakornkule, Michael M. Domach, Ignacio E. Grossmann, "Recursive MILP model for finding all the alternate optima in LP models for metabolic networks," Computers and Chemical Engineering 24 (2000) 711-716.

Related

Levenshtein Distance in Power Query Using M

I've been using this VBA solution by smirkingman from another similar question for calculating the Levenshtein distance between strings. I need to translate it to an M function in Excel Power Query, but I don't have the know-how to do so.
Hoping someone can help me out. The 3 basic transformations between strings used in Levenshtein distance are listed below; each counts as 1 step, and more steps mean a greater distance between strings.
Insertion
Deletion
Substitution
I thought I could "cheat" and avoid a For-loop structure like the one in the VBA example, but the test results below show that I need a more robust solution.
let
    result = (s1 as text, s2 as text) as number =>
        List.Max({Text.Length(s1), Text.Length(s2)}) - List.Count(List.Intersect({Text.ToList(s1), Text.ToList(s2)}))
in
    result
Test Results
s1      s2      result          explanation
pale    pole    1               substitution
dole    sale    2               substitution (x2)
pool    spool   1               insertion
two     one     2 (incorrect)   substitution and/or insert/delete (3 steps min); EXPECTED: 3
If you're keen to use Power Query to achieve this, then you should check out the Fuzzy Matching functionality. See details here and here.
However, to me this is a 'black box' algorithm, and I am not convinced of how accurately or efficiently it works. Moreover, Microsoft has not published the code behind this function, so the open-source community cannot investigate it.
The Power BI community seems to think it implements the Jaccard similarity algorithm, which is a little different from the Levenshtein algorithm you are familiar with. See more info here.
If you're keen to take a deep dive into the internal workings of the string-distance matrix, I implemented this in R some years ago; you can read about it in my Blog and my Repo on this topic.
I would strongly recommend not using Power Query or VBA for this; there are far better libraries in both R and Python implementing this methodology.
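For reference, here is a plain-Python sketch of the full dynamic-programming computation (the classic Wagner-Fischer algorithm; the function and variable names are mine). Unlike the M attempt above, it returns 3 for the "two"/"one" case:

def levenshtein(s1: str, s2: str) -> int:
    # prev[j] holds the distance between the s1 prefix seen so far and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]                                      # s1[:i] vs ""
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (c1 != c2)))  # substitution
        prev = cur
    return prev[-1]

# The test cases from the table above:
for a, b in [("pale", "pole"), ("dole", "sale"), ("pool", "spool"), ("two", "one")]:
    print(a, b, levenshtein(a, b))                     # 1, 2, 1, 3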

How to handle various units in a single attribute / feature using Pandas?

I have a dataset I am working on as part of data cleaning, where one of the attributes (features) has values in various units. For example, some of the values are as follows:
1 kg; 6 LB; 900 gms; 32 oz; etc.
If I use the standard scaler it will not be fair, as the values are in different units, so I cannot treat them as-is.
Please suggest how to handle such data.
I would recommend converting all the values to the same unit first. For example, you can convert every value to kg, or whatever suits you best, and then apply the standard scaler.
Thanks all. I did some research and found that I need to convert the various units into standard units following international norms, referred to as SI units (https://www.nist.gov/pml/weights-and-measures/metric-si/si-units); the same suggestion was given by #sharmajee499.
Moving ahead with this approach, though it means a lot of manual code; there seems to be no direct, short and easy way.
Please do post if you have a better solution.
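A minimal sketch of that conversion step with pandas, assuming the column holds strings like the examples above (the column name "weight" and the conversion table are illustrative assumptions):

import pandas as pd

# Conversion factors to kg for the unit spellings seen in the data.
TO_KG = {"kg": 1.0, "lb": 0.45359237, "gms": 0.001, "g": 0.001, "oz": 0.0283495}

df = pd.DataFrame({"weight": ["1 kg", "6 LB", "900 gms", "32 oz"]})

# Split each value into its number and its unit, then normalise to kg.
parts = df["weight"].str.extract(r"(?P<value>[\d.]+)\s*(?P<unit>[A-Za-z]+)")
df["weight_kg"] = parts["value"].astype(float) * parts["unit"].str.lower().map(TO_KG)

print(df)  # after this, a standard scaler sees comparable numbers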

Hypothesis search tree

I have an object with many fields, and each field has a different range of values. I want to use Hypothesis to generate different instances of this object.
Is there a limit to the number of combinations of field values Hypothesis can handle? And what does the search tree Hypothesis creates look like? I don't need all the combinations, but I want a fair number of them, with many different values tested for each field. In particular, I want to make sure Hypothesis is not doing a DFS until it hits the maximum number of examples to generate.
TLDR: don't worry, this is a common use-case and even a naive strategy works very well.
The actual search process used by Hypothesis is complicated (as in, "lead author's PhD topic"), but it's definitely not a depth-first search! Briefly, it's a uniform distribution layered on a pseudo-random number generator, with a coverage-guided fuzzer biasing that towards less-explored code paths, and strategy-specific heuristics on top of that.
In general, I trust this process to pick good examples far more than I trust my own judgement, or that of anyone without years of experience in QA or testing research!
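For illustration, a minimal sketch of this use-case with Hypothesis's st.builds; the Config class and its field ranges are made-up assumptions:

from dataclasses import dataclass
from hypothesis import given, settings, strategies as st

@dataclass
class Config:               # hypothetical object with independently-ranged fields
    retries: int
    timeout: float
    name: str

config_strategy = st.builds(
    Config,
    retries=st.integers(min_value=0, max_value=10),
    timeout=st.floats(min_value=0.1, max_value=60.0),
    name=st.text(min_size=1, max_size=8),
)

@given(config_strategy)
@settings(max_examples=200)  # Hypothesis varies all fields across examples
def test_config(cfg):
    assert 0 <= cfg.retries <= 10

test_config()  # runs the property; Hypothesis supplies the examples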

Close-Enough TSP implementation

I'm looking for a solution to a Close-Enough Traveling Salesman Problem (CETSP), where I have a set of nodes and the tour only needs to pass within a certain distance of each of them, as close to optimally as possible. I've found a couple of sources on approaches to this TSP variant, but was unable to find a solver or an algorithm that I could easily use.
Do you have any suggestions for how I can get a solution to my CETSP problem, whether by implementing an algorithm myself or by using an existing solver?
You can try UFFLP. They have an example that finds the exact coordinates the salesman should pass through given a predetermined visiting sequence, so you can generate thousands of sequences and choose the best one (just a simple heuristic; see the sketch below).
Have a look at http://www.gapso.com.br/en/ufflp-en/ for useful information.
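Not UFFLP itself, but a rough Python sketch of that generate-and-score idea under simplifying assumptions: for each random visiting sequence, the touch point for a node is chosen greedily on the straight line from the current position to the node's disc (the true per-sequence optimum would need a convex solve). All coordinates and the radius are made up.

import math
import random

nodes = [(0, 0), (10, 2), (5, 9), (12, 12)]  # hypothetical node centres
R = 1.5                                      # "close enough" radius

def tour_length(order, start=(0.0, 0.0)):
    pos, total = start, 0.0
    for i in order:
        cx, cy = nodes[i]
        dx, dy = cx - pos[0], cy - pos[1]
        d = math.hypot(dx, dy)
        if d > R:  # stop on the disc boundary, on the line towards the centre
            pos = (cx - dx / d * R, cy - dy / d * R)
            total += d - R
        # else: already within range of this node, no move needed
    return total

# Score a few thousand random sequences and keep the best.
best = min((random.sample(range(len(nodes)), len(nodes)) for _ in range(2000)),
           key=tour_length)
print(best, tour_length(best))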

String Matching Algorithms

I have a python app with a database of businesses and I want to be able to search for businesses by name (for autocomplete purposes).
For example, consider the names "best buy", "mcdonalds", "sony" and "apple".
I would like "app" to return "apple", and the queries "appel" and "ple" should return it as well.
"Mc'donalds" should return "mcdonalds".
"bst b" and "best-buy" should both return "best buy".
Which algorithm am I looking for, and does it have a python implementation?
Thanks!
The Levenshtein distance should do.
Look around - there are implementations in many languages.
Levenshtein distance will do this.
Note: this is a distance, so you have to calculate it against every string in your database, which can be a big problem if you have a lot of entries.
If you run into that problem, record all the typos users make (typo = no direct match) and build an offline correction database containing all the typo->fix mappings. Some companies do this even more cleverly; e.g., Google watches how users correct their own typos and learns the mappings from that.
Soundex or Metaphone might work.
I think what you are looking for is the huge field of Data Quality and Data Cleansing. I fear you may not find a ready-made Python implementation for this, as it has to be something that cleanses a considerable amount of data in a database, which could be of business value.
Levenshtein distance goes in the right direction, but only halfway. There are several tricks to make it use partial matches as well.
One would be subsequence dynamic time warping (DTW is actually a generalization of Levenshtein distance). For this you relax the start and end cases when calculating the cost matrix. If you relax only one of the conditions, you get autocompletion with spell checking. I am not sure whether a Python implementation is available, but if you want to implement it yourself it should not be more than 10-20 LOC; see the sketch after this answer.
The other idea would be to use a trie for speed-up, which can run DTW/Levenshtein on multiple results simultaneously (a huge speedup if your database is large). There is a paper on Levenshtein on tries at IEEE, so you can find the algorithm there. Again, you would need to relax the final boundary condition to get partial matches. However, since you step down the trie, you just need to check when you have fully consumed the input and then return all leaves.
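A minimal sketch of that relaxed final boundary in plain Python (the function name is mine): the standard Levenshtein recurrence, but taking the minimum over the final DP row, so the query may match just a prefix of each candidate:

def prefix_levenshtein(query: str, candidate: str) -> int:
    # prev[j] holds the distance between the query prefix seen so far
    # and candidate[:j]
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, 1):
        cur = [i]
        for j, cc in enumerate(candidate, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (qc != cc)))
        prev = cur
    return min(prev)  # relaxed end: best match against any prefix

names = ["best buy", "mcdonalds", "sony", "apple"]
for q in ["app", "appel", "bst b"]:
    print(q, min(names, key=lambda n: prefix_levenshtein(q, n)))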
Check out difflib (http://docs.python.org/library/difflib.html); it should help you.
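For example, a quick sketch with the names from the question (difflib's default cutoff of 0.6 happens to be enough for these):

from difflib import get_close_matches

names = ["best buy", "mcdonalds", "sony", "apple"]

print(get_close_matches("appel", names))      # ['apple']
print(get_close_matches("best-buy", names))   # ['best buy']
print(get_close_matches("Mc'donalds", names)) # ['mcdonalds']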
