Close-Enough TSP implementation - traveling-salesman

I'm looking for a solution to a Close-Enough Traveling Salesman Problem (CETSP), where I have a set of nodes that I need to visit, each only to within a certain distance, with a tour that is as close to optimal as possible. I've found a couple of sources describing approaches to this TSP variant, but I was unable to find a solver or an algorithm that I could easily use.
Do you have any suggestions for how I can go about getting a solution to my CETSP problem, whether by implementing something myself or by using an existing solver?

You can try using UFFLP. They have an example showing how to find the exact coordinates the salesman should pass through, given a predetermined visiting sequence. So you can generate thousands of sequences and choose the best one (just a simple heuristic).
Have a look at http://www.gapso.com.br/en/ufflp-en/
You will find useful information.
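If you want to prototype that heuristic without UFFLP, here is a rough sketch in Python using scipy (my choice, not something the answer prescribes): for one fixed visiting sequence, each visit point is placed inside its node's radius so that the closed tour is as short as possible, and then many random sequences are tried and the best kept. The coordinates and radius are invented for illustration.

    import numpy as np
    from scipy.optimize import minimize

    # node centers and the "close enough" radius (made-up data for illustration)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 8.0], [0.0, 8.0]])
    radius = 2.0

    def tour_length(flat_points):
        pts = flat_points.reshape(-1, 2)
        closed = np.vstack([pts, pts[:1]])                 # return to start
        return float(np.sum(np.linalg.norm(np.diff(closed, axis=0), axis=1)))

    def best_tour_for_sequence(seq):
        ordered = centers[list(seq)]
        cons = [{"type": "ineq",                           # stay inside each disk
                 "fun": lambda p, i=i, c=ordered[i]:
                     radius - np.linalg.norm(p.reshape(-1, 2)[i] - c)}
                for i in range(len(seq))]
        res = minimize(tour_length, ordered.flatten(), constraints=cons)
        return res.fun

    # crude heuristic: try random visiting sequences, keep the shortest tour
    rng = np.random.default_rng(0)
    best = min(best_tour_for_sequence(rng.permutation(len(centers)))
               for _ in range(50))
    print(f"shortest close-enough tour found: {best:.2f}")

For real instances you would replace the random sequences with tours from a TSP heuristic on the node centers, but the placement subproblem stays the same.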

Related

Can we apply multi-criteria decision-making algorithms to incomplete data?

I am currently working on a project where a multi-criteria decision-making algorithm is needed in order to evaluate several alternatives for a given goal. After long research, I decided to use the AHP method for my case study. The problem is that the alternatives taken into account for the given goal contain incomplete data.
For example, I am interested in buying a house and I have three alternatives to consider. One criterion for comparing them is the size of the house. Let’s assume that I know the sizes of some of the rooms of these houses, but I do not have information about the actual sizes of the entire houses.
My questions are:
Can we apply AHP (or any MCDM method) when we are dealing with incomplete data?
What are the consequences?
And, how can we minimize the presence of missing data in MCDM?
I would really appreciate some advice or help! Thanks!
If you are still looking for answers, let me try to address your questions.
Before going into detail: I can't answer with a technical, programming-language approach.
First, we can use uncertain data in MCDM/AHP with the help of statistical methods.
To reduce the impact of missing data, you can use information-theoretic concepts such as entropy.
The reliability of the result will then depend on the accuracy of the probabilistic approach.
In your example, you could estimate the total size of a house from other houses that have similar values for the criteria you do know. Accuracy will depend on the number of criteria and the reliability of the inference.
To get a rigorous answer to your problem, you might need optimization, linear algebra, calculus, and statistics above an intermediate level.
I'm a student in a management major, and I'll help as much as I can. I hope you get what you want.
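For what it's worth, here is a minimal sketch of the standard AHP weighting step on complete data (it does not itself solve the missing-data issue): criterion weights are derived as the principal eigenvector of a pairwise comparison matrix. The criteria and the 3x3 matrix below are invented purely for illustration.

    import numpy as np

    # pairwise comparisons of three criteria (size, price, location) on the
    # Saaty 1-9 scale; reciprocal entries, numbers made up for illustration
    A = np.array([[1.0, 3.0, 0.5],
                  [1/3, 1.0, 0.25],
                  [2.0, 4.0, 1.0]])

    eigvals, eigvecs = np.linalg.eig(A)
    principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    weights = principal / principal.sum()      # normalised priority vector
    print(dict(zip(["size", "price", "location"], weights.round(3))))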

Hypothesis search tree

I have an object with many fields. Each field has a different range of values. I want to use Hypothesis to generate different instances of this object.
Is there a limit to the number of combinations of field values Hypothesis can handle? And what does the search tree Hypothesis creates look like? I don't need all the combinations, but I want to make sure that I get a fair number of them and that many different values are tested for each field. In particular, I want to make sure Hypothesis is not doing a DFS until it hits the max number of examples to generate.
TLDR: don't worry, this is a common use-case and even a naive strategy works very well.
The actual search process used by Hypothesis is complicated (as in, "lead author's PhD topic"), but it's definitely not a depth-first search! Briefly, it's a uniform distribution layered on a pseudo-random number generator, with a coverage-guided fuzzer biasing that towards less-explored code paths, and strategy-specific heuristics on top of that.
In general, I trust this process to pick good examples far more than I trust my own judgement, or that of anyone without years of experience in QA or testing research!
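For reference, a minimal sketch of this use-case with a made-up Order dataclass: st.builds() draws each field from its own strategy, and the engine described above takes care of exploring varied combinations up to max_examples.

    from dataclasses import dataclass
    from hypothesis import given, settings, strategies as st

    @dataclass
    class Order:                      # hypothetical object with several fields
        quantity: int
        price: float
        currency: str

    orders = st.builds(
        Order,
        quantity=st.integers(min_value=1, max_value=10_000),
        price=st.floats(min_value=0.01, max_value=1e6, allow_nan=False),
        currency=st.sampled_from(["USD", "EUR", "GBP"]),
    )

    @given(orders)
    @settings(max_examples=500)       # upper bound on combinations tried per run
    def test_order_is_well_formed(order):
        assert 1 <= order.quantity <= 10_000
        assert order.currency in {"USD", "EUR", "GBP"}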

Generating multiple optimal solutions using Excel solver

Is there a way to get all optimal solutions when you are solving some problem with Excel Solver (Simplex LP method)?
If not, what is the best way (or Excel add-in) to do this, and how can I convert my existing VBA code to use it?
Actually, I have found a way to do this with Excel Solver. It is not optimal in terms of time consumption, but that is not an issue for me.
If you can assign a unique id to each possible solution in some way, which is true in my case, then for each solution you find you can check whether there is another solution with the same value but a different id, in the following way:
Find the first optimal solution and save its solution id and result. I will call these origID and origRes.
Check whether there is a solution with id < origID and result = origRes.
If yes, treat the new id as the initial id and continue with step 2 until you can't find any more solutions that satisfy the criteria.
After that, do the same thing with the condition id > origID and result = origRes.
Once you are sure you have found all solutions with the optimal value origRes, you can go on to find solutions that are not optimal. I did this by adding the condition that a new solution needs to be <= (origRes - 0.01), because I know that all solutions have two decimal places.
Go to step 2 again.
I know this is not the best way, but I usually do not need more than 100 solutions, and currently I can get them in about 2 minutes, which is acceptable for me.
Although this looks easy, it actually is not such an easy question. Even the definition of "all possible optimal solutions" is not clear; there may be infinitely many of them. Asking for "all basic feasible solutions" (i.e. corner points) sounds better. To my knowledge there are no solvers providing this, and I also do not know of a really simple technique to enumerate all optimal bases.
One interesting approach is to use a MIP formulation to enumerate all optimal bases:
Sangbum Lee, Chan Phalakornkule, Michael M. Domach, Ignacio E. Grossmann, "Recursive MILP model for finding all the alternate optima in LP models for metabolic networks," Computers and Chemical Engineering 24 (2000) 711-716. (link)
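Outside Excel, the idea can be prototyped in Python with a modelling library such as PuLP (my choice here, not something the paper or the question uses). The sketch below is not the recursive MILP from the paper; it simply pins the objective of a toy LP at its optimal value and pushes each variable up and down in turn, which exposes some, though not necessarily all, alternate optimal corner points.

    import pulp

    # toy LP with alternate optima: max x + y  s.t.  x + y <= 4, x <= 3, y <= 3
    prob = pulp.LpProblem("alt_optima", pulp.LpMaximize)
    x = pulp.LpVariable("x", lowBound=0)
    y = pulp.LpVariable("y", lowBound=0)
    prob += x + y                     # objective
    prob += x + y <= 4
    prob += x <= 3
    prob += y <= 3
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    z_star = pulp.value(prob.objective)

    prob += x + y >= z_star           # every later solution must stay optimal
    corners = set()
    for var in (x, y):
        for sense in (pulp.LpMaximize, pulp.LpMinimize):
            prob.sense = sense
            prob.setObjective(1 * var)            # probe this variable
            prob.solve(pulp.PULP_CBC_CMD(msg=False))
            corners.add((round(x.varValue, 4), round(y.varValue, 4)))

    print(z_star, corners)            # 4.0 {(3.0, 1.0), (1.0, 3.0)}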

String Matching Algorithms

I have a python app with a database of businesses and I want to be able to search for businesses by name (for autocomplete purposes).
For example, consider the names "best buy", "mcdonalds", "sony" and "apple".
I would like "app" to return "apple", as well as "appel" and "ple".
"Mc'donalds" should return "mcdonalds".
"bst b" and "best-buy" should both return "best buy".
Which algorithm am I looking for, and does it have a python implementation?
Thanks!
The Levenshtein distance should do.
Look around - there are implementations in many languages.
Levenshtein distance will do this.
Note: this is a distance, so you have to calculate it against every string in your database, which can be a big problem if you have a lot of entries.
If that becomes a problem, record all the typos users make (a typo being a query with no direct match) and build, offline, a correction database that contains all the typo -> fix mappings. Some companies do this even more cleverly; e.g. Google watches how users correct their own typos and learns the mappings from that.
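As a rough sketch of the Levenshtein approach, assuming the business names fit in memory (the name list is the one from the question, and the distance function is a standard two-row DP):

    def levenshtein(a, b):
        """Minimum number of single-character edits turning a into b."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    names = ["best buy", "mcdonalds", "sony", "apple"]
    print(min(names, key=lambda n: levenshtein("appel", n)))  # -> apple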
Soundex or Metaphone might work.
I think what you are looking for falls into the broad field of Data Quality and Data Cleansing. I doubt you will find a ready-made Python implementation for this, as it has to be something that cleanses a considerable amount of data in a database, which tends to have real business value.
Levenshtein distance goes in the right direction, but only half the way. There are several tricks to get it to use partial matches as well.
One would be to use subsequence dynamic time warping (DTW is actually a generalization of Levenshtein distance). For this you relax the start and end cases when calculating the cost matrix. If you only relax one of the conditions, you get autocompletion with spell checking. I am not sure if there is a Python implementation available, but if you want to implement it yourself it should not be more than 10-20 LOC; see the sketch below.
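Here is one possible sketch of that relaxation, using plain edit distance rather than DTW, with the name list from the question: the first DP row is all zeros so a match may start anywhere in the candidate, and the minimum over the last row lets it end anywhere, which scores partial queries like "bst b".

    def fuzzy_substring_distance(query, candidate):
        """Edit distance of query to its best-matching substring of candidate."""
        prev = [0] * (len(candidate) + 1)          # free start position
        for i, cq in enumerate(query, start=1):
            curr = [i]
            for j, cc in enumerate(candidate, start=1):
                curr.append(min(prev[j] + 1,
                                curr[j - 1] + 1,
                                prev[j - 1] + (cq != cc)))
            prev = curr
        return min(prev)                           # free end position

    names = ["best buy", "mcdonalds", "sony", "apple"]
    print(min(names, key=lambda n: fuzzy_substring_distance("bst b", n)))  # -> best buy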
The other idea would be to use a trie for speed-up, which lets you run DTW/Levenshtein on multiple candidates simultaneously (a huge speedup if your database is large). There is a paper on Levenshtein on tries at IEEE, so you can find the algorithm there. Again, for this you would need to relax the final boundary condition so that you get partial matches. However, since you step down the trie, you just need to check when you have fully consumed the input and then return all leaves below that point.
Check out difflib (http://docs.python.org/library/difflib.html); it should help you.
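For example, get_close_matches ranks candidates by SequenceMatcher ratio (again using the names from the question):

    import difflib

    names = ["best buy", "mcdonalds", "sony", "apple"]
    print(difflib.get_close_matches("appel", names, n=3, cutoff=0.6))       # ['apple']
    print(difflib.get_close_matches("mc'donalds", names, n=3, cutoff=0.6))  # ['mcdonalds']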

Cross Referencing Databases on Fuzzy Data

I am currently working on a project where I have to match up a large quantity of user-generated names with a separate list of the same names in a canonical format. The problem is that the user-generated names contain numerous misspellings, abbreviations, and simply invalid data, making it hard to cross-reference them against the canonical data. Any suggestions on methods to do this?
This does not have to be done in real-time and in this case accuracy is more important than speed.
Current ideas for this are:
Do a fuzzy search for the user entered name in the canonical database using an existing search implementation like Lucene or Sphinx, which I presume use something like the Levenshtein distance for this.
Cross-reference on the SOUNDEX hash (which is supposedly computed on the sound of the name rather than spelling) instead of using the actual name.
Some combination of the above
Anyone have any feedback on any of these or ideas of their own?
One of my concerns is that none of the above methods will handle abbreviations very well. Can anyone point me in a direction for some machine learning methods to actually search on expanded abbreviations (or tell me I'm crazy)? Thanks in advance.
First, I'd add to your list the techniques discussed at Peter Norvig's post on spelling correction.
Second, I'd ask what kind of "user-generated names" you're talking about. Having dealt with both, I believe that the heuristics you'd use for street names are somewhat different from the heuristics for person names. (As a simple example, does "Dr" expand to "Drive" or "Doctor"?)
Third, I'd look at a combination using testing to establish the set of coefficients for combining the results of the various techniques.
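As a sketch of that combination idea in Python: score each canonical candidate with a couple of cheap similarity measures and combine them with weights tuned against a small hand-labelled sample. The measures, example names, and weights below are placeholders, not recommendations.

    import difflib

    def similarity(user_name, canonical, w_chars=0.7, w_tokens=0.3):
        a, b = user_name.lower(), canonical.lower()
        char_ratio = difflib.SequenceMatcher(None, a, b).ratio()   # character level
        ta, tb = set(a.split()), set(b.split())
        token_overlap = len(ta & tb) / max(len(ta | tb), 1)        # word level
        return w_chars * char_ratio + w_tokens * token_overlap

    canonical_names = ["Jonathan Smith", "Acme Corporation", "Best Buy"]
    print(max(canonical_names, key=lambda c: similarity("jon smith", c)))
    # -> Jonathan Smith (with these placeholder weights)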
