Traveling salesman (TSP): brute-force algorithm improvement

According to Wikipedia it takes (N-1)! steps to calculate a tour of N cities. I found a better way to do it, but I can't do the math to calculate just how much I improved it. I can tell you that on my home PC I have been able to solve a 20-city map in less than 1 hour. 20! = 2.43290200e+18. Here is what I did:
When searching for a route through N cities (let's name them City(1), City(2), City(3) ... City(N)) with the brute-force algorithm, you will first perform this test: City(1), City(2), City(3), City(4) ... City(N), and some time later, this one: City(1), City(3), City(2), City(4) ... City(N). I am claiming that the second calculation is unnecessary. If I calculate the shortest route for City(4) ... City(N) just once, I can reuse it in the second calculation and determine which route is better.
Using this trick I can reduce the number of calculations I do for the K-th city by (N - K), which is the number of options for choosing which city comes first, multiplied by (N - K - 1)!, which is the number of options for ordering the rest of the cities, minus the one first time when I do need to perform the full calculation. So it comes to (N - K)!, and you need to sum it over all K, from K = 3 to K = N - 2.
This is as far as I got (which is not too far)... I hope you will be able to help me calculate this.

Storing and reusing results you have already calculated is the basic idea behind dynamic programming. For the TSP there are dynamic programming algorithms that run in O(N^2 * 2^N) time, which will yield results more quickly than your algorithm (you'll be able to solve problems with 25 vertices within minutes...).
See: Dynamic Programming for the TSP
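In case it's useful, here is a minimal sketch of that DP (the Held-Karp algorithm) in Python; it's my own illustration rather than code from the linked article, and it assumes dist is an N x N matrix of pairwise distances:

from itertools import combinations

def held_karp(dist):
    n = len(dist)
    # best[(mask, j)] = length of the shortest path that starts at city 0,
    # visits exactly the cities in bitmask `mask`, and ends at city j
    best = {}
    for j in range(1, n):
        best[(1 << j, j)] = dist[0][j]
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = 0
            for c in subset:
                mask |= 1 << c
            for j in subset:
                best[(mask, j)] = min(
                    best[(mask ^ (1 << j), k)] + dist[k][j]
                    for k in subset if k != j)
    full = (1 << n) - 2  # every city except the starting city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

For N = 20 this touches roughly N^2 * 2^N ≈ 4e+8 elementary operations, which is heavy but vastly smaller than 19! tours; in a compiled language it runs comfortably, while in plain Python you would want arrays rather than a dict to make it quick.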

Related

Low and High Pass Filters with Specific Cutoff

If I were to implement a low-pass filter on an array of digital samples with the following code, where original is the original array of data, new is the array of filtered data, and c is a certain constant:
new[0] = original[0];
for (int i = 1; i < original.length; i++) {
    new[i] = new[i-1] + c * (original[i] - new[i-1]);
}
Or a high pass filter with the third line replaced with:
new[i] = c * (new[i-1] + original[i] - original[i-1]);
What would be the relationship between c and the cutoff frequency of each?
Both of the filters are single-pole Infinite Impulse Response (IIR) filters.
IIR filters have analogues in the continuous-time domain (e.g. simple LC and RC circuits). Analysis usually starts with the desired transfer function H(ω), using the z-transform to convert to discrete time. A little rearrangement then yields an equation which you can solve for your filter coefficients. [H(ω) is -3 dB at the cutoff frequency.]
This material is typically taught in the first and second undergraduate years of electronic engineering degrees, so there is plenty of free online courseware to be had. You'll need the accompanying pure mathematics courses as well.
Lots of practical filter designs turn out to be analytically insoluble (or at least difficult); a common way to proceed is to solve numerically.
MATLAB is the tool of choice for many. NI LabVIEW also has a filter designer. Neither is cheap.
Single-pole filters are easy to solve. This may help. There are also various online filter solvers if you want to design more complex, or higher-order, filters.
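If it helps, for the low-pass form above there is a commonly used closed-form relationship obtained by matching the discrete filter's time constant to an analog RC stage; note this is an approximation rather than the exact -3 dB solution. A minimal Python sketch, assuming a sample rate fs and cutoff fc in Hz:

import math

def lowpass_c(fc, fs):
    # c = 1 - exp(-2*pi*fc/fs); for fc << fs this is roughly 2*pi*fc/fs
    return 1.0 - math.exp(-2.0 * math.pi * fc / fs)

For example, lowpass_c(100.0, 44100.0) gives c ≈ 0.014, i.e. a 100 Hz cutoff at a 44.1 kHz sample rate.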

Implementing a dynamic programming algorithm via threads to speed it up

Say I have this very common DP (dynamic programming) problem:
Given a cost matrix cost[][] and a position (m, n) in cost[][], write a function that returns cost of minimum cost path to reach (m, n) from (0, 0). Each cell of the matrix represents a cost to traverse through that cell. Total cost of a path to reach (m, n) is sum of all the costs on that path (including both source and destination). You can only traverse down, right and diagonally lower cells from a given cell, i.e., from a given cell (i, j), cells (i+1, j), (i, j+1) and (i+1, j+1) can be traversed. You may assume that all costs are positive integers.
PS: the answer to this is 8.
Now, after solving this question, the following question ran through my mind.
Say I have a 1000x1000 matrix. The O(n^2) solution will take some time (<1 sec on an Intel i5, for sure).
But can I reduce that further, say by starting 6-8 threads running this algorithm and then synchronizing them at the end to get the answer? Would that be faster, is it even logically possible to get the answer that way, or should I throw this thought away?
Generally speaking, on such small problems (as you say, <1 sec), parallel computing is less efficient than sequential computing due to coordination overhead (thread starting and synchronizing). Another problem might be that you increase the cache-miss rate, because you're choosing the data you want to operate on "randomly" (not linearly) from the input. However, when it comes to larger problems, say matrices with 10 times as many entries, it sure is worth a thought (or two).
This is a possible solution. Given an 8x8 matrix, we cut it into 4 equal squares, and one thread is responsible for each of those squares. The number in each little square indicates after how many time units the result in that square can be calculated.
So the total time is 33 units (whatever a unit is). Compared to the sequential solution with 64 units, it is about half. You can convince yourself that the runtime for any 2^k x 2^k matrix is 2^(2k-1) + 1.
However, this is only the first idea that came to my mind. I hope that there is a (much) faster parallel solution out there.
What's more, for the reasons I mentioned at the beginning of my answer, for all practical purposes you would not achieve a speedup of 2 with my solution.
I'd start with algorithmic improvements. There's no need to test N^2 solutions.
One key is the direction from which you entered a square. If you entered it by moving downward, there's no need to check the square to the right. Likewise, if you entered it by moving right, there's no need to check the path downward from there. The destination of a right-angle turn can always be reached via a diagonal move, leaving out one square and its positive weight/cost.
As far as threading goes, I can see (at least) a couple of ways of splitting things up. One would be to simply queue up requests as you enter a square. I.e., instead of (for example) testing another square directly, you queue up requests to test its two or three exits. N threads process those requests, which generate more requests, continuing until all of them reach the end point.
This has the obvious disadvantage that you're likely to continue traversing some routes after serial code could have abandoned them, because they're already longer than the shortest route you've found so far.
Another possibility would be to start two threads, one traversing forward, the other backward. In each, you find the shortest route to any given point along the diagonal, then you're left with a purely linear scan through those candidates to find the shortest sum.
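For reference, here is a minimal sketch of the sequential O(m*n) DP that the question describes (my own Python illustration), assuming cost is a rectangular grid of positive integers:

def min_cost_path(cost):
    m, n = len(cost), len(cost[0])
    # dp[i][j] = minimum cost to reach cell (i, j) from (0, 0)
    dp = [[0] * n for _ in range(m)]
    dp[0][0] = cost[0][0]
    for j in range(1, n):                # first row: only rightward moves
        dp[0][j] = dp[0][j-1] + cost[0][j]
    for i in range(1, m):                # first column: only downward moves
        dp[i][0] = dp[i-1][0] + cost[i][0]
    for i in range(1, m):
        for j in range(1, n):
            dp[i][j] = cost[i][j] + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp[m-1][n-1]

Any parallel variant (such as the blocked scheme above) has to respect the same dependency: a cell needs its upper, left, and upper-left neighbors before it can be computed.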

Looking for an estimation method (data analysis)

Since I have no idea what I am doing right now, my wording may sound funny. But seriously, I need to learn.
The problem I'm facing is to come up with a method (model) to estimate how a software program behaves: namely its running time and maximal memory usage. What I already have is a large amount of data. This data set gives an overview of how the program behaves under different conditions, e.g.
RUN    Criterion_A  Criterion_B  Criterion_C  Criterion_D  Criterion_E
-----------------------------------------------------------------------
R0001       12            2          3556          27            9
R0002        2            5          2154          22            8
R0003       19           12          5556          37            9
R0004       10            3          1556           7            9
R0005        5            1           556          17            8
I have thousands of rows of such data. Now I need to know how I can estimate (forecast) the running time and maximal memory usage if I know all criteria in advance. What I need is an approximation that gives hints (upper limits, or ranges).
I have a feeling that this is a typical ??? problem, but I don't know what it's called. Could you show me some hints or give me some ideas (theories, explanations, webpages), or anything that may help? Thanks!
You want a new program that takes as input one or more criteria, then outputs an estimate of the running time or memory usage. This is a machine learning problem.
Your inputs can be listed as a vector of numbers, like this:
input = [ A, B, C, D, E ]
One of the simplest algorithms for this would be a K-nearest neighbor algorithm. The idea behind this is that you'll take your input vector of numbers, and find in your database the vector of numbers that is most similar to your input vector. For example, given this vector of inputs:
input = [ 11, 1.8, 3557, 29, 10 ]
You can assume that the running time and memory should be very similar to the values from this run (originally listed in your table above):
R0001 12 2 3556 27 9
There are several algorithms for calculating the similarity between these two vectors, one simple and intuitive such algorithm is the Euclidean distance. As an example, the Euclidean distance between the input vector and the vector from the table is this:
dist = sqrt( (11-12)^2 + (1.8-2)^2 + (3557-3556)^2 + (27-29)^2 + (9-10)^2 )
dist = 2.6533
It should be intuitively clear that points with lower distance should be better estimates for running time and memory usage, as the distance should describe the similarity between two sets of criteria. Assuming your criteria are informative and well-selected, points with similar criteria should have similar running time and memory usage.
Here's some example code of how to do this in R:
r1 = c(11, 1.8, 3557, 29, 10)  # the new input vector
r2 = c(12, 2.0, 3556, 27,  9)  # row R0001 from the table
print(r1)
print(r2)
# Euclidean distance, written out term by term:
dist_r1_r2 = sqrt( (11-12)^2 + (1.8-2)^2 + (3557-3556)^2 + (27-29)^2 + (9-10)^2 )
print(dist_r1_r2)
# the same distance, computed with vectorized operations:
smarter_dist_r1_r2 = sqrt( sum( (r1 - r2)^2 ) )
print(smarter_dist_r1_r2)
Taking the running time and memory usage of your nearest row is the KNN algorithm for K=1. This approach can be extended to include data from multiple rows by taking a weighted combination of multiple rows from the database, with rows with lower distances to your input vector contributing more to the estimates. Read the Wikipedia page on KNN for more information, especially with regard to data normalization, including contributions from multiple points, and computing distances.
When calculating the distance between these input vectors, you should consider normalizing your data. The rationale for doing this is that a difference of 1 unit between 3557 and 3556 for criterion C may not be equivalent to a difference of 1 between 11 and 12 for criterion A. If your data are normally distributed, you can convert them all to standard scores (or Z-scores) using this formula:
N_trans = (N - mean(N)) / sdev(N)
There is no general solution on the "right" way to normalize data as it depends on the type and range of data that you have, but Z-scores are easy to compute and a good method to try first.
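Putting the normalization and nearest-neighbor steps together, here is a rough Python sketch (the table format and helper names are my own assumptions, not a library API; it assumes at least two rows):

import math

def column_stats(rows):
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # `or 1.0` guards against a constant column with zero deviation
    sdevs = [math.sqrt(sum((x - m)**2 for x in c) / (len(c) - 1)) or 1.0
             for c, m in zip(cols, means)]
    return means, sdevs

def nearest_run(table, query):
    # table: list of (criteria_list, running_time) pairs; query: criteria_list
    means, sdevs = column_stats([criteria for criteria, _ in table])
    def scaled(v):
        return [(x - m) / s for x, m, s in zip(v, means, sdevs)]
    q = scaled(query)
    best = min(table, key=lambda row: sum((a - b)**2
                                          for a, b in zip(scaled(row[0]), q)))
    return best[1]  # running time of the most similar known run

This is KNN with K = 1; extending it to average the K closest rows (weighted by inverse distance) is a natural next step.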
There are many more sophisticated techniques for constructing estimates such as this, including linear regression, support vector regression, and non-linear modeling. The idea behind some of the more sophisticated methods is that you try and develop an equation that describes the relationship of your variables to running time or memory. For example, a simple application might just have one criterion and you can try and distinguish between models such as:
running_time = s1 * A + s0
running_time = s2 * A^2 + s1 * A + s0
running_time = s3 * log(A) + s2 * A^2 + s1 * A + s0
The idea is that A is your fixed criterion, and sN are a list of free parameters that you can tweak until you get a model that works well.
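As an illustration of fitting those free parameters, here is a hedged sketch using ordinary least squares via numpy.polyfit; the running-time numbers are made up, not taken from your table:

import numpy as np

A = np.array([12.0, 2.0, 19.0, 10.0, 5.0])         # Criterion_A values
runtime = np.array([30.0, 8.0, 45.0, 26.0, 14.0])  # made-up running times

s1, s0 = np.polyfit(A, runtime, deg=1)  # fit running_time = s1*A + s0
estimate = s1 * 11.0 + s0               # prediction for a new run with A = 11

np.polyfit(..., deg=2) would fit the quadratic model the same way; comparing models of different complexity fairly is the harder part, as noted below.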
One problem with this approach is that there are many different possible models that have different numbers of parameters. Distinguishing between models that have different numbers of parameters is a difficult problem in statistics, and I don't recommend tackling it during your first foray into machine learning.
Some questions that you should ask yourself are:
Do all of my criteria affect both running time and memory usage? Do some affect only one or the other, and are some useless from a predictive point of view? Answering this question is called feature selection, and is an outstanding problem in machine learning.
Do you have any a priori estimates of how your variables should influence running time or memory usage? For example, you might know that your application uses a sorting algorithm that is N * log(N) in time, which means that you explicitly know the relationship between one criterion and your running time.
Do your rows of measured input criteria paired with running time and memory usage cover all of the plausible use cases for your application? If so, then your estimates will be much better, as machine learning can have a difficult time with data that it's unfamiliar with.
Do the running time and memory of your program depend on criteria that you don't input into your estimation strategy? For example, if you're depending on an external resource such as a web spider, problems with your network may influence running time and memory usage in ways that are difficult to predict. If this is the case, your estimates will have a lot more variance.
If the criterion you would be forecasting for lies within the range of currently known criteria then you should do some more research on the Interpolation process:
In the mathematical subfield of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points
If it lies outside your currently known data range, research Extrapolation, which is less accurate:
In mathematics, extrapolation is the process of constructing new data points outside a discrete set of known data points.
Methods
Interpolation methods for your browsing.
A PowerPoint presentation detailing some methods used for extrapolation.
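As a toy illustration of interpolation (Python with numpy; all numbers made up):

import numpy as np

xp = [2.0, 5.0, 10.0, 19.0]        # known criterion values, sorted
fp = [8.0, 14.0, 26.0, 45.0]       # corresponding measured running times
estimate = np.interp(7.0, xp, fp)  # linear interpolation at criterion = 7

np.interp only interpolates linearly between known points; outside the range of xp it clamps to the end values, so extrapolation needs one of the methods above.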

How do I efficiently estimate a probability based on a small amount of evidence?

I've been trying to find an answer to this for months (to be used in a machine learning application). It doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths.
Here is the scenario:
I have a (possibly) unevenly weighted coin and I want to figure out the probability of it coming up heads. I know that coins from the same box that this one came from have an average probability of p, and I also know the standard deviation of these probabilities (call it s).
(If other summary properties of the probabilities of other coins aside from their mean and stddev would be useful, I can probably get them too.)
I toss the coin n times, and it comes up heads h times.
The naive approach is that the probability is just h/n - but if n is small this is unlikely to be accurate.
Is there a computationally efficient way (i.e. one that doesn't involve very, very large or very, very small numbers) to take p and s into consideration and come up with a more accurate probability estimate, even when n is small?
I'd appreciate it if any answers could use pseudocode rather than mathematical notation since I find most mathematical notation to be impenetrable ;-)
Other answers:
There are some other answers on SO that are similar, but the answers provided are unsatisfactory. For example, this one is not computationally efficient because it quickly involves numbers way smaller than can be represented even in double-precision floats. And this one turned out to be incorrect.
Unfortunately you can't do machine learning without knowing some basic math; it's like asking somebody for help with programming but not wanting to know about "variables", "subroutines" and all that if-then stuff.
The better way to do this is called Bayesian integration, but there is a simpler approximation called "maximum a posteriori" (MAP). It's pretty much like the usual thinking, except you can put in the prior distribution.
Fancy words, but you may ask: well, where did the h/(h+t) formula come from? Of course it's obvious, but it turns out that it is the answer you get when you have "no prior". And the method below is the next level of sophistication up, where you add a prior. Going to full Bayesian integration would be the next one after that, but that's harder and perhaps unnecessary.
As I understand it, the problem is twofold: first you draw a coin from the bag of coins. This coin has a "headsiness" called theta, so that it gives heads in a theta fraction of the flips. But the theta for this coin comes from the master distribution, which I will assume is Gaussian with mean P and standard deviation S.
What you do next is write down the total unnormalized probability (called the likelihood) of seeing the whole shebang, all the data (h heads, t tails):
L = theta^h * (1-theta)^t * Gaussian(theta; P, S)
Gaussian(theta; P, S) = exp( -(theta-P)^2 / (2*S^2) ) / sqrt(2*pi*S^2)
This is the meaning of "first draw 1 value of theta from the Gaussian" and then draw h heads and t tails from a coin using that theta.
The MAP principle says: if you don't know theta, find the value which maximizes L given the data that you do know. You do that with calculus. The trick that makes it easy is to take logarithms first. Define LL = log(L). Wherever L is maximized, LL will be too.
so
LL = h*log(theta) + t*log(1-theta) - (theta-P)^2/(2*S^2) - (1/2)*log(2*pi*S^2)
By calculus, to look for extrema you find the value of theta such that dLL/dtheta = 0.
Since the last log term has no theta in it, you can ignore it.
dLL/dtheta = h/theta - t/(1-theta) + (P-theta)/S^2 = 0
If you can solve this equation for theta you will get an answer, the MAP estimate for theta given the number of heads h and the number of tails t.
If you want a fast approximation, try doing one step of Newton's method, where you start with your proposed theta at the obvious (maximum likelihood) estimate theta = h/(h+t).
And where does that 'obvious' estimate come from? If you do the stuff above but don't put in the Gaussian prior, you get h/theta - t/(1-theta) = 0, and you'll come up with theta = h/(h+t).
If your prior probabilities are really small, as is often the case, instead of near 0.5, then a Gaussian prior on theta is probably inappropriate, as it puts some weight on negative probabilities, which is clearly wrong. More appropriate is a Gaussian prior on log(theta) (a 'lognormal distribution'). Plug it in the same way and work through the calculus.
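Here is a small Python sketch of the Gaussian-prior MAP estimate above, solved with a few Newton steps as suggested; it's an illustration only and assumes at least one flip has been made:

def map_estimate(h, t, P, S, steps=10):
    # start from the maximum-likelihood estimate, nudged inside (0, 1)
    theta = min(max(h / (h + t), 1e-6), 1.0 - 1e-6)
    for _ in range(steps):
        f = h / theta - t / (1.0 - theta) + (P - theta) / S**2  # dLL/dtheta
        fprime = -h / theta**2 - t / (1.0 - theta)**2 - 1.0 / S**2
        theta -= f / fprime
        theta = min(max(theta, 1e-6), 1.0 - 1e-6)  # keep theta in (0, 1)
    return theta

With even a handful of flips the estimate lands between h/(h+t) and P, pulled toward the prior mean when data are scarce.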
You can use p as a prior on your estimated probability. This is basically the same as doing pseudocount smoothing. I.e., use
(h + c * p) / (n + c)
as your estimate. When h and n are large, this just becomes h/n. When h and n are small, it is just c*p/c = p. The choice of c is up to you. You can base it on s, but in the end you have to decide how small is too small.
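In code that's a one-liner; a sketch (c is whatever pseudocount weight you choose):

def smoothed_estimate(h, n, p, c):
    # shrinks the raw h/n toward the prior mean p; larger c trusts the prior more
    return (h + c * p) / (n + c)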
You don't have nearly enough info in this question.
How many coins are in the box? If it's two, then in some scenarios (for example, one coin always lands heads, the other always tails) knowing p and s would be useful. If it's more than a few, and especially if some of the coins are only slightly weighted, it is not useful.
What is a small n? 2? 5? 10? 100? What is the probability of a weighted coin coming up heads/tails? 100/0, 60/40, 50.00001/49.99999? How is the weighting distributed? Is every coin one of two possible weightings? Do they follow a bell curve? etc.
It boils down to this: the differences between weighted and unweighted coins, the distribution of weighted coins, and the number of coins in your box will all decide what n has to be for you to solve this with high confidence.
The name for what you're trying to do is a Bernoulli trial. Knowing the name should be helpful in finding better resources.
Response to comment:
If you have differences in p that small, you are going to have to do a lot of trials and there's no getting around it.
Assuming a uniform distribution of bias, p will still be 0.5, and all the standard deviation will tell you is that at least some of the coins have a minor bias.
How many tosses you need will, again, be determined under these circumstances by the weighting of the coins. Even with 500 tosses, you won't get strong confidence (about 2/3) in detecting a .51/.49 split.
In general, what you are looking for is Maximum Likelihood Estimation. The Wolfram Demonstrations Project has an illustration of estimating the probability of a coin landing heads, given a sample of tosses.
Well, I'm no math man, but I think the simple Bayesian approach is intuitive and broadly applicable enough to be worth putting a little thought into. Others above have already suggested this, but perhaps if you're like me you would prefer more verbosity.
In this lingo, you have a set of mutually exclusive hypotheses H and some data D, and you want to find the (posterior) probability that each hypothesis Hi is correct given the data. Presumably you would choose the hypothesis that has the largest posterior probability (the MAP, as noted above), if you had to choose one. As Matt notes above, what distinguishes the Bayesian approach from maximum likelihood alone (finding the H that maximizes Pr(D|H)) is that you also have some PRIOR info regarding which hypotheses are most likely, and you want to incorporate those priors.
So from basic probability you have Pr(H|D) = Pr(D|H)*Pr(H)/Pr(D). You can estimate these Pr(H|D) numerically by creating a series of discrete probabilities Hi for each hypothesis you wish to test, e.g. [0.0, 0.05, 0.1, ..., 0.95, 1.0], and then determining your prior Pr(Hi) for each Hi. Above it is assumed you have a normal distribution of priors; if that is acceptable, you could use the mean and stdev to get each Pr(Hi), or use another distribution if you prefer. With coin tosses, Pr(D|H) is of course determined by the binomial distribution, using the observed number of successes in n trials and the particular Hi being tested. The denominator Pr(D) may seem daunting, but we assume that we have covered all the bases with our hypotheses, so Pr(D) is the summation of Pr(D|Hi)*Pr(Hi) over all Hi.
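A rough Python sketch of that discrete grid (the grid size and function names are my own choices):

import math

def posterior(h, n, prior_mean, prior_sd, grid_size=21):
    hypotheses = [i / (grid_size - 1) for i in range(grid_size)]
    def prior(theta):       # unnormalized Gaussian prior Pr(H)
        return math.exp(-(theta - prior_mean)**2 / (2 * prior_sd**2))
    def likelihood(theta):  # binomial Pr(D|H); the nCr factor cancels below
        return theta**h * (1.0 - theta)**(n - h)
    weights = [likelihood(th) * prior(th) for th in hypotheses]
    total = sum(weights)    # plays the role of Pr(D), summed over all hypotheses
    return [(th, w / total) for th, w in zip(hypotheses, weights)]

The returned list pairs each candidate heads-probability with its posterior; taking the largest gives the MAP estimate, and a weighted average gives the posterior mean.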
Very simple if you think about it a bit, and maybe not so if you think about it a bit more.

Non-Uniform Random Number Generator Implementation?

I need a random number generator that picks numbers over a specified range with a programmable mean.
For example, I need to pick numbers between 2 and 14 and I need the average of the random numbers to be 5.
I use random number generators a lot. Usually I just need a uniform distribution.
I don't even know what to call this type of distribution.
Thank you for any assistance or insight you can provide.
You might be able to use a binomial distribution, if you're happy with the shape of that distribution. Set n=12 and p=0.25. This will give you a value between 0 and 12 with a mean of 3. Just add 2 to each result to get the range and mean you are looking for.
Edit: As for implementation, you can probably find a library for your chosen language that supports non-uniform distributions (I've written one myself for Java).
A binomial distribution can be approximated fairly easily using a uniform RNG. Simply perform n trials and record the number of successes. So if you have n=10 and p=0.5, it's just like flipping a coin 10 times in a row and counting the number of heads. For p=0.25 just generate uniformly-distributed values between 0 and 3 and only count zeros as successes.
If you want a more efficient implementation, there is a clever algorithm hidden away in the exercises of volume 2 of Knuth's The Art of Computer Programming.
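For instance, a minimal Python sketch of the counting approach just described:

import random

def binomial_sample(n=12, p=0.25, shift=2):
    # n Bernoulli trials with success probability p, then shift the result
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes + shift

With n = 12, p = 0.25 and shift = 2 this yields values in 2..14 with mean n*p + shift = 5, matching the example in the question.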
You haven't said what distribution you are after. Regarding your specific example, a function which produced a uniform distribution between 2 and 8 would satisfy your requirements, strictly as you have written them :)
If you want a non-uniform distribution of the random number, then you might have to implement some sort of mapping, e.g.:
// returns a number between 0..5 with a custom distribution
int MyCustomDistribution()
{
    int r = rand(100); // random number between 0..100
    if (r < 10) return 1;
    if (r < 30) return 2;
    if (r < 42) return 3;
    ...
}
Based on the Wikipedia sub-article about non-uniform generators, it would seem you want to apply the output of a uniform pseudorandom number generator to an area distribution that meets the desired mean.
You can create a non-uniform PRNG from a uniform one. This makes sense, as you can imagine taking a uniform PRNG that returns 0,1,2 and create a new, non-uniform PRNG by returning 0 for values 0,1 and 1 for the value 2.
There is more to it if you want specific characteristics on the distribution of your new, non-uniform PRNG. This is covered on the Wikipedia page on PRNGs, and the Ziggurat algorithm is specifically mentioned.
With those clues you should be able to search up some code.
My first idea would be:
generate numbers in the range 0..1
scale to the range -9..9 (x - 0.5, then x * 18)
shift the range by 5 -> -4..14 (add 5)
truncate the range to 2..14 (discard numbers < 2)
That should give you numbers in the range you want (see the sketch below).
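A quick Python sketch of those steps:

import random

def truncated_sample():
    while True:
        x = (random.random() - 0.5) * 18 + 5  # uniform on -4..14
        if x >= 2:
            return x                          # keep only values in 2..14

Note this only pins down the range; the kept values are uniform on 2..14, so their mean works out to 8 rather than 5, and you'd need to reshape the distribution further to hit a target mean.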
You need a distributed / weighted random number generator. Here's a reference to get you started.
In runnable form (Python; the delta, margin, and iteration cap are values you'd want to experiment with):

import random

def tune_probabilities(numbers, intended_avg, delta=0.05, margin=0.001):
    probs = {x: 1.0 / len(numbers) for x in numbers}  # assign all numbers equal probabilities
    current_avg = lambda: sum(x * p for x, p in probs.items())
    for _ in range(100000):  # cap iterations, since the delta decay can stall progress
        if abs(current_avg() - intended_avg) <= margin:
            break
        picked = random.choice(numbers)               # at random, uniform probability
        if picked == intended_avg:
            continue                                  # if you pick intendedAverage, pick again
        if (picked > intended_avg) == (current_avg() < intended_avg):
            probs[picked] += delta                    # increase at the expense of all others
        else:
            probs[picked] = max(probs[picked] - delta, 0.0)  # decrease to the benefit of all others
        total = sum(probs.values())
        probs = {x: p / total for x, p in probs.items()}     # conserve sum = 100%
        delta *= 0.98  # the rate of decrease of delta should probably be experimented with
    return probs
