Python calculation of the Lennard-Jones 2D pair correlation function in the Grand Canonical Ensemble - python-3.x

Edit: I believe there is a problem with the normalization of the histogram, since one must divide by the radius associated with each histogram bin.
I am trying to calculate the fluctuations of the particle number and the radial distribution function of a 2D Lennard-Jones (LJ) system using python3. Although I believe the particle-number fluctuations come out right, the pair correlation g(r) comes out right for small distances but then blows up (the calculation uses numpy's histogram method).
The thing is, I can't find out why such behaviour emerges, perhaps because of some misunderstanding of a method. As it is, I am posting the relevant code right below, and if needed, I could also upload other parts of the code or the entire script.
Note first that, since we are working in the Grand Canonical Ensemble, the number of particles changes and so does the array that stores the particles, and perhaps that is another point where a mistake in the implementation could exist.
Particle removal or insertion
def mcex(L, npart, particles, beta, rho0, V, en):
    factorin = (rho0*V)/(npart+1)
    factorout = npart/(V*rho0)
    print("factorin=", factorin)
    print("factorout", factorout)
    # Produce random number and check:
    rand = random.uniform(0, 1)
    if rand <= 0.5:
        # Insert a particle at a random location
        x_new_coord = random.uniform(0, L)
        y_new_coord = random.uniform(0, L)
        new_particle = [x_new_coord, y_new_coord]
        new_E = particleEnergy(new_particle, particles, npart+1)
        deltaE = new_E
        print("dEin=", deltaE)
        # Acceptance rule for inserting
        if deltaE > 10:
            P_in = 0
        else:
            P_in = factorin * math.exp(-beta*deltaE)
        print("pinacc=", P_in)
        rand = random.uniform(0, 1)
        if rand <= P_in:
            particles.append(new_particle)
            en += deltaE
            npart += 1
            print("accepted insertion")
    else:
        if npart != 0:
            p = random.randint(0, npart-1)
            this_particle = particles[p]
            prev_E = particleEnergy(this_particle, particles, p)
            deltaE = prev_E
            print("dEout=", deltaE)
            # Acceptance rule for removing
            if deltaE > 10:
                P_re = 1
            else:
                P_re = factorout * math.exp(beta*deltaE)
            print("poutacc=", P_re)
            rand = random.uniform(0, 1)
            if rand <= P_re:
                particles.remove(this_particle)
                en += deltaE
                npart = npart - 1
                print("accepted removal")
    print()
    return particles, en, npart
The relevant Monte Carlo part: on roughly 1 in 10 steps, check the possibility of inserting or removing a particle.
# MC
for step in range(0, runTimes):
    print(step)
    print()
    rand = random.uniform(0, 1)
    if rand <= 0.9:
        # ----------- change energies -------------------------
        # ........
        # ........
        pass  # (displacement moves elided in the question)
    else:
        particles, en, N = mcex(L, N, particles, beta, rho0, V, en)
    # stepList.append(step)
    if (step+1) % 1000 == 0:
        for i, particle1 in enumerate(particles):
            for j, particle2 in enumerate(particles):
                if j != i:
                    # print(particle1)
                    # print(particle2)
                    # print(i)
                    # print(j)
                    dist.append(distancesq(particle1, particle2))
        NList.append(N)
This is where we call the function mcex defined above, and perhaps the particles array is not updated correctly there.
Finally, we create the g(r) histogram, where perhaps the normalization or the use of the histogram method is not as it should be:
RDF(N,particles,L)
with the function:
def RDF(N, particles, L):
    minb = 0
    maxb = 8
    nbin = 500
    skata = np.asarray(dist).flatten()
    rDf = np.histogram(skata, np.linspace(minb, maxb, nbin))
    prefactor = (1/2/np.pi) * (L**2/N**2) / len(dist) * (nbin/(maxb-minb))
    # prefactor = (1/(2*np.pi))*(L**2/N**2)/(len(dist)*num_increments/(rMax + 1.1*dr))
    rDf = [prefactor*rDf[0], 0.5*(rDf[1][1:]+rDf[1][:-1])]
    print('skata', len(rDf[0]))
    print('incr', len(rDf[1]))
    plt.figure()
    plt.plot(rDf[1], rDf[0])
    plt.xlabel("r")
    plt.ylabel("g(r)")
    plt.show()
The results are: the particle-number fluctuations look as expected, but the computed g(r) blows up at larger r, whereas we want the usual Lennard-Jones g(r) that decays to 1 at large distances.

Although I have accepted an answer, I am posting here some more details.
To normalize the pair correlation correctly, one must divide each count of "particles found at a certain distance" (mathematically, the sum of delta functions of the distances) by the distance itself.
Noting first that numpy.histogram returns two arrays, the first holding the counted events and the second holding the bin edges, one must divide each element of the counts array (the first element of the histogram result) by the corresponding bin-centre radius obtained from the second array.
That is, one must do the following:
def RDF(N, particles, L):
    minb = 0
    maxb = 25
    nbin = 200
    width = (maxb-minb)/nbin
    rings = np.linspace(minb, maxb, nbin)
    skata = np.asarray(dist).flatten()
    rDf = np.histogram(skata, rings, density=True)
    prefactor = 1/(np.pi*(L**2/N**2))
    rDf = [prefactor*rDf[0], 0.5*(rDf[1][1:]+rDf[1][:-1])]
    rDf[0] = np.multiply(rDf[0], 1/(rDf[1]*width))
where, before the last multiply line, we center the bins so that their number equals the number of elements of the counts array (you have five fingers, but only four gaps between them).
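As a tiny illustration of that centring step (a standalone snippet, not part of the original script):

import numpy as np

counts, edges = np.histogram([0.3, 1.2, 1.4], bins=4, range=(0.0, 2.0))
centres = 0.5 * (edges[1:] + edges[:-1])      # one centre per bin: 4 centres for 5 edges
print(len(edges), len(counts), len(centres))  # prints: 5 4 4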

Your g(r) is not correctly normalised. You need to divide the number of pairs found in each bin by the average density of the system times the area of the annulus associated with that bin, where the latter is just 2 pi r dr, with r being the bin's midpoint and dr the bin size. As far as I can tell, your prefactor does not contain the "r" bit. There is also something else that is missing, but it's hard to tell without knowing what all the other constants contain.
EDIT: here is a link that will guide you through the implementation of a routine to compute the radial distribution function in 2D and 3D.
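For concreteness, here is a minimal sketch of the annulus normalisation described in this answer (standalone; the names rdf_2d, pair_dists, n_avg, box_length and n_frames are illustrative, not taken from the question's code):

import numpy as np

def rdf_2d(pair_dists, n_avg, box_length, n_frames, nbins=200):
    # pair_dists: 1D array of distances for ordered pairs i != j (as in the
    #             question's double loop), collected over n_frames configurations
    # n_avg:      average particle number; box_length: side of the square box
    r_max = box_length / 2.0                   # with periodic boundaries, r > L/2 is not meaningful
    counts, edges = np.histogram(pair_dists, bins=nbins, range=(0.0, r_max))
    r_mid = 0.5 * (edges[1:] + edges[:-1])     # bin midpoints
    dr = edges[1] - edges[0]                   # bin width
    shell_area = 2.0 * np.pi * r_mid * dr      # area of the annulus at each r
    rho = n_avg / box_length**2                # average 2D number density
    ideal_counts = n_frames * n_avg * rho * shell_area  # ideal-gas expectation
    return r_mid, counts / ideal_counts

With this normalisation, g(r) tends to 1 at large r for a homogeneous fluid, which is the behaviour missing from the original prefactor.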

Related

Speeding up this PySCIPOpt routine for a Mixed Integer Program

I'm writing code to find median orders of tournaments. Given a tournament T with n vertices, a median order is an ordering of the vertices of T that maximizes the number of edges pointing in the 'increasing' direction with respect to the ordering.
In particular, if the vertex set of T is {0,...,n-1}, the following integer problem (maximizing over the set of permutations) yields an optimal answer to the problem, where Q is also a boolean n-by-n matrix.
I've implemented a linearization of this problem, noting that Q is a permutation. The following Python code works, but my computer can't handle graphs that are as small as 10 vertices, where I would expect to have fast answers. Is there any relatively easy way to speed up this computation?
import numpy as np
from pyscipopt import Model, quicksum
from networkx.algorithms.tournament import random_tournament as rt
import math
import time  # needed for the timing used in the final print

# Some utilities to define the adjacency matrix of an oriented graph
def adjacency_matrix(t, order):
    n = len(order)
    adj_t = np.zeros((n, n))
    for e in t.edges:
        adj_t[order.index(e[0]), order.index(e[1])] = 1
    return adj_t

# Random tournaments to instantiate the problem
def random_tournament(n):
    r_t = rt(n)
    adj_t = adjacency_matrix(r_t, list(range(n)))
    return r_t, adj_t

###############################################################
############# PySCIPOpt optimization Routine ##################
###############################################################
n = 5  # some arbitrary size parameter
t, adj_t = random_tournament(n)
model = Model()
p, w, r = {}, {}, {}
# Defining model variables and weights
for k in range(n):
    for l in range(n):
        p[k, l] = model.addVar(vtype='B')
        for i in range(n):
            for j in range(i, n):
                r[i, k, l, j] = model.addVar(vtype='C')
                w[i, k, l, j] = adj_t[k][l]
for i in range(n):
    # Forcing p to be a permutation
    model.addCons(quicksum(p[s, i] for s in range(n)) == 1)
    model.addCons(quicksum(p[i, s] for s in range(n)) == 1)
    for k in range(n):
        for j in range(i, n):
            for l in range(n):
                # Setting r[i,k,l,j] = min(p[i,k],p[l,j])
                model.addCons(r[i, k, l, j] <= p[k, i])
                model.addCons(r[i, k, l, j] <= p[l, j])
# Set the objective function
model.setObjective(quicksum(r[i, k, l, j]*w[i, k, l, j] for i in range(n) for j in range(i, n) for k in range(n) for l in range(n)), "maximize")
model.data = p, r
end_setup = time.time()         # timing marker referenced in the print below
model.optimize()
end_optimization = time.time()  # timing marker referenced in the print below
sol = model.getBestSol()
# Print the solution in a readable format
Q = np.array([math.floor(model.getVal(model.data[0][key])) for key in model.data[0].keys()]).reshape([n, n])
print(f'\nOptimization ended with status {model.getStatus()} in {"{:.2f}".format(end_optimization-end_setup)}s, with {model.getObjVal()} increasing edges and optimal solution:')
print('\n', Q)
order = [int(x) for x in list(np.matmul(Q.T, np.array(range(n))))]
new_adj_t = adjacency_matrix(t, order)
print(f'\nwhich induces the ordering:\n\n {order}')
print(f'\nand induces the following adjacency matrix: \n\n {new_adj_t}')
Right now I've run it for n=5, taking between 5 and 20 seconds, and I have run it successfully for the small sizes 6 and 7 with not much change in the time needed.
For n=10, on the other hand, it has been running for around an hour with no solution yet. I suppose the linearization having O(n**4) variables hurts, but I don't understand why it blows up so fast. Is this normal? What would a better implementation look like, in case there is one?
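As a side note, for very small n the MIP's answer can be checked against a brute-force search that follows the definition above directly (a sketch; median_order_bruteforce is an illustrative helper, not part of the question's code):

from itertools import permutations
import numpy as np

def median_order_bruteforce(adj_t):
    # Exhaustively try all orderings and count edges pointing 'forward';
    # only feasible for very small n, intended purely as a correctness check.
    n = adj_t.shape[0]
    best_order, best_count = None, -1
    for order in permutations(range(n)):
        pos = {v: i for i, v in enumerate(order)}
        count = sum(1 for u in range(n) for v in range(n)
                    if adj_t[u, v] == 1 and pos[u] < pos[v])
        if count > best_count:
            best_order, best_count = list(order), count
    return best_order, best_count

Comparing best_count with the MIP objective for n = 5 or 6 gives a quick sanity check of the formulation.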

Speed Up a for Loop - Python

I have a code that works perfectly well but I wish to speed up the time it takes to converge. A snippet of the code is shown below:
import time
import numpy as np
from numpy.linalg import norm

# data (an (M, N) array) and target (an (M,) array) are defined elsewhere
def myfunction(x, i):
    y = x + (min(0, target[i] - data[i, :] @ x)) * (data[i] / (norm(data[i])**2))
    return y

rows, columns = data.shape

start = time.time()
iterate = 0
iterate_count = []
norm_count = []
res = 5
x_not = np.ones(columns)
norm_count.append(norm(x_not))
iterate_count.append(0)
while res > 1e-8:
    for row in range(rows):
        y = myfunction(x_not, row)
        x_not = y
    iterate += 1
    iterate_count.append(iterate)
    norm_count.append(norm(x_not))
    res = abs(norm_count[-1] - norm_count[-2])
print('Converge at {} iterations'.format(iterate))
print('Duration: {:.4f} seconds'.format(time.time() - start))
I am relatively new in Python. I will appreciate any hint/assistance.
Ax=b is the problem we wish to solve. Here, 'A' is the 'data' and 'b' is the 'target'
Ugh! After spending a while on this I don't think it can be done the way you've set up your problem. In each iteration over the row, you modify x_not and then pass the updated result to get the solution for the next row. This kind of setup can't be vectorized easily. You can learn the thought process of vectorization from the failed attempt, so I'm including it in the answer. I'm also including a different iterative method to solve linear systems of equations. I've included a vectorized version -- where the solution is updated using matrix multiplication and vector addition, and a loopy version -- where the solution is updated using a for loop to demonstrate what you can expect to gain.
1. The failed attempt
Let's take a look at what you're doing here.
def myfunction(x, i):
    y = x + (min(0, target[i] - data[i, :] @ x)) * (data[i] / (norm(data[i])**2))
    return y
You subtract the dot product of (the ith row of data and x_not) from the ith element of target, and limit the result at zero.
You multiply this result with the ith row of data divided by the norm of that row squared. Let's call this part2.
Then you add this to x_not.
Now let's look at the shapes of the matrices.
data is (M, N).
target is (M, ).
x_not is (N, )
Instead of doing these operations rowwise, you can operate on the entire matrix!
1.1. Simplifying the dot product.
Instead of doing data[i, :] @ x, you can do data @ x_not, and this gives an array whose ith element is the dot product of the ith row with x_not. So now we have data @ x_not with shape (M, ).
Then, you can subtract this from the entire target array, so target - (data @ x_not) has shape (M, ).
So far, we have
part1 = target - (data @ x_not)
Next, if anything is greater than zero, set it to zero.
part1[part1 > 0] = 0
1.2. Finding rowwise norms.
Finally, you want to multiply this by the row of data, and divide by the square of the L2-norm of that row. To get the norm of each row of a matrix, you do
rownorms = np.linalg.norm(data, axis=1)
This is a (M, ) array, so we need to convert it to a (M, 1) array so we can divide each row. rownorms[:, None] does this. Then divide data by this.
part2 = data / (rownorms[:, None]**2)
1.3. Add to x_not
Finally, we add each row of part1[:, None] * part2 to the original x_not and return the result (part1 needs the extra axis so that it broadcasts against the (M, N)-shaped part2):
result = x_not + (part1[:, None] * part2).sum(axis=0)
Here's where we get stuck. In your approach, each call to myfunction() gives a value of part1 that depends on x_not, which was changed in the previous call to myfunction().
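Putting those pieces together for reference, here is a sketch of the one-shot update (note it is not equivalent to the original loop: every row's correction is computed from the same x_not and applied at once, rather than sequentially):

import numpy as np

def simultaneous_update(x_not, data, target):
    # One 'Jacobi-style' sweep: all row corrections use the same x_not,
    # then they are applied together.
    part1 = target - (data @ x_not)           # per-row residual, shape (M,)
    part1[part1 > 0] = 0                      # keep only the min(0, ...) part
    rownorms = np.linalg.norm(data, axis=1)   # L2 norm of each row, shape (M,)
    part2 = data / (rownorms[:, None] ** 2)   # rows scaled by 1/||row||^2
    return x_not + (part1[:, None] * part2).sum(axis=0)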
2. Why vectorize?
Using numpy's inbuilt methods instead of looping allows it to offload the calculation to its C backend, so it runs faster. If your numpy is linked to a BLAS backend, you can extract even more speed by using your processor's SIMD registers
The conjugate gradient method is a simple iterative method to solve certain systems of equations. There are other more complex algorithms that can solve general systems well, but this should do for the purposes of our demo. Again, the purpose is not to have an iterative algorithm that will perfectly solve any linear system of equations, but to show what kind of speedup you can expect if you vectorize your code.
Given your system
data @ x_not = target
Let's define some variables:
A = data.T @ data
b = data.T @ target
And we'll solve the system A @ x = b
x = np.zeros((columns,))  # Initial guess. Can be anything
tolerance = 1e-8          # convergence threshold (same value as the functions below)
resid = b - A @ x
p = resid
while (np.abs(resid) > tolerance).any():
    Ap = A @ p
    alpha = (resid.T @ resid) / (p.T @ Ap)
    x = x + alpha * p
    resid_new = resid - alpha * Ap
    beta = (resid_new.T @ resid_new) / (resid.T @ resid)
    p = resid_new + beta * p
    resid = resid_new + 0
To contrast the fully vectorized approach with one that uses iterations to update the rows of x and resid_new, let's define another implementation of the CG solver that does this.
def solve_loopy(data, target, itermax=100, tolerance=1e-8):
    A = data.T @ data
    b = data.T @ target
    rows, columns = data.shape
    x = np.zeros((columns,))  # Initial guess. Can be anything
    resid = b - A @ x
    resid_new = b - A @ x
    p = resid
    niter = 0
    while (np.abs(resid) > tolerance).any() and niter < itermax:
        Ap = A @ p
        alpha = (resid.T @ resid) / (p.T @ Ap)
        for i in range(len(x)):
            x[i] = x[i] + alpha * p[i]
            resid_new[i] = resid[i] - alpha * Ap[i]
        # resid_new = resid - alpha * A @ p
        beta = (resid_new.T @ resid_new) / (resid.T @ resid)
        p = resid_new + beta * p
        resid = resid_new + 0
        niter += 1
    return x
And our original vector method:
def solve_vect(data, target, itermax=100, tolerance=1e-8):
    A = data.T @ data
    b = data.T @ target
    rows, columns = data.shape
    x = np.zeros((columns,))  # Initial guess. Can be anything
    resid = b - A @ x
    resid_new = b - A @ x
    p = resid
    niter = 0
    while (np.abs(resid) > tolerance).any() and niter < itermax:
        Ap = A @ p
        alpha = (resid.T @ resid) / (p.T @ Ap)
        x = x + alpha * p
        resid_new = resid - alpha * Ap
        beta = (resid_new.T @ resid_new) / (resid.T @ resid)
        p = resid_new + beta * p
        resid = resid_new + 0
        niter += 1
    return x
Let's solve a simple system to see if this works first:
2x1 + x2 = -5
−x1 + x2 = -2
should give a solution of [-1, -3]
data = np.array([[ 2, 1],
                 [-1, 1]])
target = np.array([-5, -2])
print(solve_loopy(data, target))
print(solve_vect(data, target))
Both give the correct solution [-1, -3], yay! Now on to bigger things:
data = np.random.random((100, 100))
target = np.random.random((100, ))
Let's ensure the solution is still correct:
sol1 = solve_loopy(data, target)
np.allclose(data @ sol1, target)
# Output: False
sol2 = solve_vect(data, target)
np.allclose(data @ sol2, target)
# Output: False
Hmm, looks like the CG method doesn't work for the badly conditioned random matrices we created. Well, at least both give the same result.
np.allclose(sol1, sol2)
# Output: True
But let's not get discouraged! We don't really care if it works perfectly, the point of this is to demonstrate how amazing vectorization is. So let's time this:
import timeit
timeit.timeit('solve_loopy(data, target)', number=10, setup='from __main__ import solve_loopy, data, target')
# Output: 0.25586539999994784
timeit.timeit('solve_vect(data, target)', number=10, setup='from __main__ import solve_vect, data, target')
# Output: 0.12008900000000722
Nice! A ~2x speedup simply by avoiding a loop while updating our solution!
For larger systems, this will be even better.
for N in [10, 50, 100, 500, 1000]:
    data = np.random.random((N, N))
    target = np.random.random((N, ))
    t_loopy = timeit.timeit('solve_loopy(data, target)', number=10, setup='from __main__ import solve_loopy, data, target')
    t_vect = timeit.timeit('solve_vect(data, target)', number=10, setup='from __main__ import solve_vect, data, target')
    print(N, t_loopy, t_vect, t_loopy/t_vect)
This gives us:
N t_loopy t_vect speedup
00010 0.002823 0.002099 1.345390
00050 0.051209 0.014486 3.535048
00100 0.260348 0.114601 2.271773
00500 0.980453 0.240151 4.082644
01000 1.769959 0.508197 3.482822

How to vectorize a function of two matrices in numpy?

Say, I have a binary (adjacency) matrix A of dimensions n x n and another matrix U of dimensions n x l. I use the following piece of code to compute a new matrix that I need.
import numpy as np
from numpy import linalg as LA

new_U = np.zeros_like(U)
for idx, a in np.ndenumerate(A):
    diff = U[idx[0], :] - U[idx[1], :]
    if a == 1.0:
        new_U[idx[0], :] += 2 * diff
    elif a == 0.0:
        norm_diff = LA.norm(U[idx[0], :] - U[idx[1], :])
        new_U[idx[0], :] += -2 * diff * np.exp(-norm_diff**2)
return new_U  # (the snippet is excerpted from a larger function)
This takes quite a lot of time to run even when n and l are small. Is there a better way to rewrite (vectorize) this code to reduce the runtime?
Edit 1: Sample input and output.
A = np.array([[0,1,0], [1,0,1], [0,1,0]], dtype='float64')
U = np.array([[2,3], [4,5], [6,7]], dtype='float64')
new_U = np.array([[-4.,-4.], [0,0],[4,4]], dtype='float64')
Edit 2: In mathematical notation, I am trying to compute the following:
new_u_{ik} = 2 \sum_{j : (i,j) \in E} (u_{ik} - u_{jk}) - 2 \sum_{j : (i,j) \notin E} (u_{ik} - u_{jk}) e^{-\| u_i - u_j \|^2}
where u_ik = U[i, k], u_jk = U[j, k], and u_i = U[i, :]. Also, (i,j) \in E corresponds to a == 1.0 in the code.
Leveraging broadcasting and np.einsum for the sum-reductions -
# Get pair-wise differences between rows for all rows in a vectorized manner
Ud = U[:,None,:]-U
# Compute L2 norms of those differences
L = LA.norm(Ud, axis=2)
# Compute 2 * diff values for all rows, mask with the A==1.0 condition
# and sum along axis=1 to simulate the accumulating behaviour
p1 = np.einsum('ijk,ij->ik', 2*Ud, A==1.0)
# Similarly, compute for the A==0.0 condition and finally sum those two parts
p2 = np.einsum('ijk,ij,ij->ik', -2*Ud, np.exp(-L**2), A==0.0)
out = p1+p2
Alternatively, use einsum to compute the squared-norm values and use those to get p2 -
Lsq = np.einsum('ijk,ijk->ij',Ud,Ud)
p2 = np.einsum('ijk,ij,ij->ik',-2*Ud,np.exp(-Lsq),A==0.0)
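As a quick check, the vectorized expressions can be compared against the expected output from Edit 1 (a standalone snippet using the sample arrays above):

import numpy as np
from numpy import linalg as LA

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype='float64')
U = np.array([[2, 3], [4, 5], [6, 7]], dtype='float64')

Ud = U[:, None, :] - U
L = LA.norm(Ud, axis=2)
p1 = np.einsum('ijk,ij->ik', 2 * Ud, A == 1.0)
p2 = np.einsum('ijk,ij,ij->ik', -2 * Ud, np.exp(-L**2), A == 0.0)
out = p1 + p2
# Should agree with the loop's result up to the negligible exp(-32) terms
print(np.allclose(out, np.array([[-4., -4.], [0., 0.], [4., 4.]])))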

How to draw from two different gaussians conditionally on some Bernoulli distribution?

I have two Gaussian distributions (I'm using multivariate_normal) and I'd like to draw from them with probability p for the first Gaussian and 1-p for the other one. I'd like to make n draws.
Is it possible to do that without a for loop? (for efficiency purposes)
Thanks
Yes, it is possible to perform this operation without a loop. Try:
import numpy as np
from scipy import stats
sample_size = 100
p = 0.25
# Flip a coin with P(HEADS) = p to determine which distribution to draw from
indicators = stats.bernoulli.rvs(p, size=sample_size)
# Draw from N(0, 1) w/ probability p and N(-1, 1) w/ probability (1-p)
draws = (indicators == 1) * np.random.normal(0, 1, size=sample_size) + \
        (indicators == 0) * np.random.normal(-1, 1, size=sample_size)
You can accomplish the same thing using np.vectorize (caveat emptor):
def draw(x):
    if x == 0:
        return np.random.normal(-1, 1)
    elif x == 1:
        return np.random.normal(0, 1)

draw_vec = np.vectorize(draw)
draws = draw_vec(indicators)
If you need to extend the solution to a mixture of more than 2 distributions, you can use np.random.multinomial to assign samples to distributions and add additional cases to the if/else in draw.
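Since the question mentions multivariate_normal, the same masking idea can be sketched for multivariate draws as follows (the means, covariances and dim here are placeholders, not from the question):

import numpy as np
from scipy import stats

sample_size = 100
p = 0.25
dim = 3  # dimensionality of the multivariate normals (placeholder)

# Bernoulli indicators choosing a component for each draw
indicators = stats.bernoulli.rvs(p, size=sample_size)

# Draw sample_size points from each component, then select row-wise
draws_a = np.random.multivariate_normal(np.zeros(dim), np.eye(dim), size=sample_size)
draws_b = np.random.multivariate_normal(-np.ones(dim), np.eye(dim), size=sample_size)
draws = np.where(indicators[:, None] == 1, draws_a, draws_b)

This draws from both components and discards half the work, which is usually fine for moderate sample sizes; for very large ones, indexing by np.flatnonzero(indicators) avoids the wasted draws.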

Grouping algorithm based on distance

I have a set of 200 points with x and y coordinates. I need to make batches of 20 such that each point in the batch is at least "n" centimeters away from the other 19, i.e. no two points in one batch are within "n" cm of each other. A point should only belong to one batch. How do I solve this?
I've used trees to draw out branches such that a new node is added only if it is "n" cm away from all other nodes in the branch. This works but is extremely slow.
Input.csv columns: Point Name, X coordinate, Y coordinate
Output: Lists of batches
From what I can guess from your question, this should do it:
import math
from random import randrange

test_data = {(randrange(0, 1000), randrange(0, 1000)) for _ in range(200)}
inf = float("inf")

def distance(p1, p2):
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def batch(data: set, min_distance, max_distance=inf, count=20, remove=True):
    result = [next(iter(data))]
    candidates = {t for t in data if max_distance > distance(result[0], t) > min_distance}
    while len(result) < count and len(candidates) > 0:
        result.append(candidates.pop())
        candidates = {t for t in data if all(max_distance > distance(p, t) > min_distance for p in result)}
    if len(result) < count:
        raise ValueError("Not enough values in data that have great enough distances between each other")
    if remove:
        data.difference_update(result)
    return result

print(len(test_data))
i = 1
while test_data:
    print(i, batch(test_data, 10))
    i += 1
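As a side note, if the repeated distance checks become the bottleneck, one possible variation (a sketch, not the approach above) is to precompute the "too close" pairs once with scipy's cKDTree and then fill batches greedily; greedy_batches, points and min_dist are illustrative names:

import numpy as np
from scipy.spatial import cKDTree

def greedy_batches(points, min_dist, batch_size=20):
    # points: iterable of (x, y) tuples; min_dist: the separation "n" in cm
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)
    # Precompute every pair closer than min_dist: these may NOT share a batch
    conflicts = {i: set() for i in range(len(pts))}
    for i, j in tree.query_pairs(r=min_dist):
        conflicts[i].add(j)
        conflicts[j].add(i)
    unassigned = set(range(len(pts)))
    batches = []
    while unassigned:
        batch = []
        for i in sorted(unassigned):
            if len(batch) == batch_size:
                break
            if not conflicts[i].intersection(batch):
                batch.append(i)
        unassigned.difference_update(batch)
        batches.append(batch)  # may hold fewer than batch_size indices
    return batches

Like the set-based answer above, this is greedy, so some batches may end up with fewer than 20 points.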
