Crossover and mutation in Differential Evolution - traveling-salesman

I'm trying to solve the Traveling Salesman Problem using Differential Evolution. For example, if I have vectors:
[1, 4, 0, 3, 2, 5], [1, 5, 2, 0, 3, 5], [4, 2, 0, 5, 1, 3]
how can I perform crossover and mutation? I saw something like a + F*(b - c), but I have no idea how to use it.

I ran into this question while looking for papers on solving the TSP with evolutionary computing. I have been working on a similar project and can provide a modified version of my code.
For mutation, we can swap two positions in a vector. I assume each vector represents the order of the nodes you will visit.
import random

def swap(lst):
    n = len(lst)
    # randint is inclusive on both ends, so use n - 1 to stay in range
    x = random.randint(0, n - 1)
    y = random.randint(0, n - 1)
    # swap the two positions
    lst[x], lst[y] = lst[y], lst[x]
    return lst
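For instance (the two indices are drawn at random, so results vary; the comment shows one possible outcome):

lst = [1, 4, 0, 3, 2, 5]
print(swap(lst))  # e.g. [1, 4, 5, 3, 2, 0] if positions 2 and 5 are drawn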
For the case of crossover with respect to the TSP, we would like to keep the general ordering of values in our permutations (we want a crossover with a positional bias). By doing so, we preserve good sub-paths from good permutations. For this reason, I believe single-point crossover is the best option.
def singlePoint(parent1, parent2):
    point = random.randint(1, len(parent1) - 2)

    def helper(v1, v2):
        # this is a helper function to avoid code duplication
        # take everything left of the crossover point from v1
        points = [v1[i] for i in range(point)]
        # add values from the right of the crossover point in v2
        # that are not already in points
        for i in range(point, len(v2)):
            pt = v2[i]
            if pt not in points:
                points.append(pt)
        # add values from the head of v2 which are not in points;
        # this ensures all values end up in the offspring
        for i in range(point):
            pt = v2[i]
            if pt not in points:
                points.append(pt)
        return points

    # notice how parent1 and parent2 are swapped on the second call
    offspring_1 = helper(parent1, parent2)
    offspring_2 = helper(parent2, parent1)
    return offspring_1, offspring_2
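For example, with the first and third vectors from the question (the crossover point is random, so the comment shows one possible outcome):

parent1 = [1, 4, 0, 3, 2, 5]
parent2 = [4, 2, 0, 5, 1, 3]
child1, child2 = singlePoint(parent1, parent2)
# e.g. with point == 3: child1 == [1, 4, 0, 5, 3, 2]
# both children are valid permutations of 0..5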
I hope this helps! Even if your project is done, this could come in handy; GAs are a great way to solve optimization problems.

If F = 0.6, a = [1, 4, 0, 3, 2, 5], b = [1, 5, 2, 0, 3, 5], c = [4, 2, 0, 5, 1, 3],
then a + F*(b - c) = [-0.8, 5.8, 1.2, 0, 3.2, 6.2].
Then change the smallest number in the array to 0, the second smallest to 1, and so on,
so it returns [0, 4, 2, 1, 3, 5].
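A minimal sketch of that rank transform, assuming numpy is available (applying argsort twice yields each entry's rank):

import numpy as np

def de_mutant_permutation(a, b, c, F=0.6):
    # classic DE mutant vector a + F*(b - c) ...
    v = np.array(a) + F * (np.array(b) - np.array(c))
    # ... mapped back to a permutation: smallest entry -> 0, next -> 1, ...
    return list(np.argsort(np.argsort(v)))

de_mutant_permutation([1, 4, 0, 3, 2, 5], [1, 5, 2, 0, 3, 5], [4, 2, 0, 5, 1, 3])
# -> [0, 4, 2, 1, 3, 5]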
This method is inefficient when used to solve TSP problems.

Related

Creating a for loop to find mean for a specific section of list

I would like to loop over a list to find the mean for a specific window.
What I mean by this is for example:
num_list=[1,2,3,4,5]
window=3
Thus, I would find the mean for [1,2,3] , [2,3,4] and [3,4,5].
Here is how I approached it:
average_list = []
first_list = num_list[0:window]

def mean(data):
    n = len(data)
    return sum(data) / n

first_value = mean(first_list)
average_list.append(first_value)
I am not quite sure how to incorporate the other two lists without typing them individually. Any help would be greatly appreciated!!
You can use a list comprehension to iterate over num_list, taking slices of length window.
Try this:
mean_lst = [sum(num_list[i:i+window])/window for i in range(len(num_list)-window + 1)]
Result is
[2.0, 3.0, 4.0]
Here's the most obvious solution to your problem:
list_len = len(num_list)
for i in range(list_len - window + 1):
    average_list.append(mean(num_list[i:i+window]))
It does work properly, but it isn't optimal. Consider num_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and window = 5. Using the obvious method, the code will first calculate the sum of [1, 2, 3, 4, 5] and divide by 5, then calculate the sum of [2, 3, 4, 5, 6] and divide by 5, and so on.
This code is clearly doing a lot more work than it needs to. An optimal way is to calculate the mean of the first window, and then for each consecutive window subtract first_element_of_previous_window/window_size (1/5 for the 2nd window) and add last_element_of_current_window/window_size (6/5 for the 2nd window) to the previous window's mean. This approach avoids a lot of unnecessary calculations.
Code Implementation:
prev_mean = mean(num_list[:window])
average_list = [prev_mean]
for i in range(1, list_len - window + 1):
    prev_mean -= num_list[i-1] / window
    prev_mean += num_list[i+window-1] / window
    average_list.append(prev_mean)
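Packaged as a self-contained function (a plain-Python sketch, run on the question's example data):

def rolling_mean(nums, window):
    # seed with the mean of the first window, then apply the running update
    prev_mean = sum(nums[:window]) / window
    means = [prev_mean]
    for i in range(1, len(nums) - window + 1):
        prev_mean += (nums[i + window - 1] - nums[i - 1]) / window
        means.append(prev_mean)
    return means

print(rolling_mean([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]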

check if a number has its equivalent negative number in a list

I need to traverse a list and check whether each member has its negative counterpart in the same list.
Note: the list will not have duplicates, and I am considering performance as well.
from collections import defaultdict

l = [2, -3, 1, 3, 5, 0, -5, -2]
d = defaultdict(list)
for num in l:
    if num > 0:
        d['pos'].append(num)
    else:
        d['neg'].append(num)
print(d)
I'm not sure how to proceed further. Could you help, please?
l = [2, -3, 1, 3, 5, 0, -5, -2]
Expected output: [[2,-2],[-3,3],[5,-5]]
As there cannot be duplicates and you thus do not need to consider exclusive use of each number, you may use a set, like so,
numbers = [2, -3, 1, 3, 5, 0, -5, -2]
unique_numbers = set(numbers)
paired = [[number, -number] for number in numbers
          if number > 0 and -number in unique_numbers]
print(paired)
For a result of [[2, -2], [3, -3], [5, -5]].
Sets support O(1) membership checks. Constructing one from an existing list costs O(n), where n is the number of elements in the list. Iterating over the list to compute the pairs is again O(n), such that it runs in O(n) total, at the additional cost of some more memory for the set (again around n).
If you don't care much about performance, a plain membership test on the list also works (O(n²) overall):
[[x, -x] for x in l if x > 0 and -x in l]

Count number of repeated elements in list considering the ones larger than them

I am trying to do some clustering analysis on a dataset. I am using a number of different approaches to estimate the number of clusters, then I put what every approach gives (number of clusters) in a list, like so:
total_pred = [0, 0, 1, 1, 0, 1, 1]
Now I want to estimate the real number of clusters, so I let the methods above vote; for example, above, more models found 1 cluster than 0, so I take 1 as the real number of clusters.
I do this by:
counts = np.bincount(np.array(total_pred))
real_nr_of_clusters = np.argmax(counts)
There is a problem with this method, however. If the above list contains something like:
[2, 0, 1, 0, 1, 0, 1, 0, 1]
I will get 0 clusters as the result, since 0 is repeated most often. However, if one model found 2 clusters, it's safe to assume it considers that at least 1 cluster is there, hence the real number would be 1.
How can I do this by modifying the above snippet?
To make the problem clear, here are a few more examples:
[1, 1, 1, 0, 0, 0, 3]
should return 1,
[0, 0, 0, 1, 1, 3, 4]
should also return 1 (since most of them agree there is AT LEAST 1 cluster).
There is a problem with your logic
Here is an implementation of the algorithm you described.
l = [2, 0, 1, 0, 1, 0, 1, 0, 1]
l = sorted(l, reverse=True)
# with l sorted in descending order, the last index stored for each
# value is the number of voters whose vote is at least that value
votes = {x: i for i, x in enumerate(l, start=1)}
Output
{2: 1, 1: 5, 0: 9}
Notice that since you define a vote as agreeing with anything smaller than itself, min(l) will always win, because everyone agrees that there are at least min(l) clusters. In this case min(l) == 0.
How to fix it
Mean and median
Beforehand, notice that taking the mean or the median is a valid, lightweight option, and both satisfy the desired output on your examples.
Bias
However, taking the mean might not be what you want if, say, you encounter votes with high variance, such as [0, 0, 7, 8, 10], where it is unlikely that the answer is 5.
A more general way to fix that is to include a voter's bias toward votes close to their own. Surely a voter who said 2 will agree more with a 1 than a 0.
You do that by implementing a metric (note: this is not a metric in the mathematical sense) that determines how much an instance that voted for x is willing to agree with a vote for y, on a scale of 0 to 1.
Note that this approach will allow voters to agree on a number that is not on the list.
We need to update our code to account for applying that pseudometric.
def d(x, y):
    # a voter x agrees with a count y only if y is at most their own vote
    return x >= y

l = [2, 0, 1, 0, 1, 0, 1, 0, 1]
votes = {y: sum(d(x, y) for x in l) for y in range(min(l), max(l) + 1)}
Output
{0: 9, 1: 5, 2: 1}
The above metric is a sanity check. It is the one you provided in your question, and it indeed ends up determining that 0 wins.
Metric choices
You will have to toy a bit with your metric, but here are a few that may make sense.
Inverse of the linear distance
def d(x, y):
    return 1 / (1 + abs(x - y))
l = [2, 0, 1, 0, 1, 0, 1, 0, 1]
votes = {y: sum(d(x, y) for x in l) for y in range(min(l), max(l) + 1)}
# {0: 6.33, 1: 6.5, 2: 4.33}
Inverse of the nth power of the distance
This one is a generalization of the previous one. As n grows, voters agree less and less with votes far from their own.
def d(x, y, n=1):
    return 1 / (1 + abs(x - y)) ** n
l = [2, 0, 1, 0, 1, 0, 1, 0, 1]
votes = {y: sum(d(x, y, n=2) for x in l) for y in range(min(l), max(l) + 1)}
# {0: 5.11, 1: 5.25, 2: 2.44}
Upper-bound distance
Similar to the previous metric, this one is close to what you described at first, in the sense that a voter will never agree to a vote higher than their own.
def d(x, y, n=1):
    return 1 / (1 + abs(x - y)) ** n if x >= y else 0
l = [2, 0, 1, 0, 1, 0, 1, 0, 1]
votes = {y: sum(d(x, y, n=2) for x in l) for y in range(min(l), max(l) + 1)}
# {0: 5.11, 1: 4.25, 2: 1.0}
Normal distribution
Another option that would make sense is a normal distribution, or a skewed normal distribution.
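A minimal sketch of such a Gaussian metric (sigma is a free width parameter; 1.0 here is an arbitrary choice for illustration):

import math

def d(x, y, sigma=1.0):
    # agreement decays smoothly with the distance between the votes
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

l = [2, 0, 1, 0, 1, 0, 1, 0, 1]
votes = {y: sum(d(x, y) for x in l) for y in range(min(l), max(l) + 1)}
# {0: 6.56, 1: 7.03, 2: 3.97} -- 1 wins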
While the other answer provides a comprehensive review of possible metrics and methods, it seems what you are seeking is simply the number of clusters closest to the mean.
So something as simple as:
import numpy as np

cluster_num = int(np.round(np.mean(total_pred)))
which returns 1 for all your cases, as you expect.

Select numbers from a list with given probability p

Let's say I have a list of numbers [1, 2, 3, ..., 100]. Now I want to select numbers from the list, where each number is either accepted or rejected with a given probability 0 < p < 1. The accepted numbers are then stored in a separate list. How can I do that?
The main problem is choosing a number with probability p. Is there a built-in function for that?
The value of p is given by the user.
You can use random.random() and a list comprehension:
import random
l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
k = [x for x in l if random.random() > 0.23]  # 0.23 stands in for the user-supplied value
print(l)
print(k)
Output (random, so it varies between runs):
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 3, 4, 5, 6, 7]
This checks each element of the list independently; with a threshold of 0.23, each element stays with probability 1 - 0.23 = 0.77. If p is meant to be the acceptance probability itself, test random.random() < p instead.
Sidenote:
random.choices() can accept weights:
random.choices(population, weights=None, *, cum_weights=None, k=1)
but those only change the probability of drawing each element from the given list (absolute or relative weights are possible); that is not the same as an independent acceptance probability per element.
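If numpy is already in your stack, the same independent coin flip per element can be vectorized; a sketch, assuming the list [1, ..., 100] from the question:

import numpy as np

l = np.arange(1, 101)   # the numbers 1..100
p = 0.3                 # user-supplied acceptance probability
accepted = l[np.random.random(l.size) < p].tolist()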

python numpy stack matrices and add specific corner/column entries

Say we have two matrices A and B, each of size 2 by 2. Is there a command that stacks them horizontally and adds A[:,1] to B[:,0], so that the resulting matrix C is 2 by 3, with C[:,0] = A[:,0], C[:,1] = A[:,1] + B[:,0], C[:,2] = B[:,1]? One step further, stacking them on the diagonal so that C[0:2,0:2] = A, C[1:3,1:3] = B, and the overlapping entry is summed: C[1,1] = A[1,1] + B[0,0]. C is 3 by 3 in this case. Hard-coding this routine is not hard, but I'm just curious, since MATLAB has a similar function if my memory serves me well.
A straightforward approach is to copy or add the two arrays into a target:
In [882]: A=np.arange(4).reshape(2,2)
In [883]: C=np.zeros((2,3),int)
In [884]: C[:,:-1]=A
In [885]: C[:,1:]+=A # or B
In [886]: C
Out[886]:
array([[0, 1, 1],
       [2, 5, 3]])
Another approach is to pad A at the end, pad B at the start, and sum; while there is a convenient pad function, it won't be any faster.
And for the diagonal case:
In [887]: C=np.zeros((3,3),int)
In [888]: C[:-1,:-1]=A
In [889]: C[1:,1:]+=A
In [890]: C
Out[890]:
array([[0, 1, 0],
       [2, 3, 1],
       [0, 2, 3]])
Again, the two arrays could be padded and added.
I'm not aware of any specialized function for this; even if one existed, it would probably do the same thing. This isn't a common enough operation to justify a compiled version.
I have built up finite-element sparse matrices by adding overlapping element matrices. The sparse formats in both MATLAB and scipy facilitate this (duplicate coordinates are summed).
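For illustration, here is a sketch of that overlap-add with scipy's COO format, reproducing the 2-by-3 case above (using A for both inputs, as in the earlier session; scipy is an added dependency here):

import numpy as np
from scipy import sparse

A = np.arange(4).reshape(2, 2)
r, c = np.nonzero(np.ones((2, 2), int))        # coordinates (0,0) (0,1) (1,0) (1,1)
rows = np.concatenate([r, r])                  # both blocks use the same rows
cols = np.concatenate([c, c + 1])              # second block shifted one column right
vals = np.concatenate([A.ravel(), A.ravel()])
C = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 3)).toarray()
# the duplicate (row, 1) coordinates are summed:
# array([[0, 1, 1],
#        [2, 5, 3]])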
============
In [896]: np.pad(A,[[0,0],[0,1]],mode='constant') + np.pad(A,[[0,0],[1,0]],mode='constant')
Out[896]:
array([[0, 1, 1],
       [2, 5, 3]])
In [897]: np.pad(A,[[0,1],[0,1]],mode='constant') + np.pad(A,[[1,0],[1,0]],mode='constant')
Out[897]:
array([[0, 1, 0],
       [2, 3, 1],
       [0, 2, 3]])
What's the special MATLAB code for doing this?
In Octave I found:
prepad(A,3,0,2) + postpad(A,3,0,2)
(the fourth argument is the dimension along which to pad).
