Creating an array of numbers whose frequency resembles a bell curve - python-3.x

I want to create an array A = [1, 1, 2, 2, 2, 5, 5, 5, ...] with numbers from [a, b] such that
A histogram where the Y-axis is the frequency of each number in the array and the X-axis is [a, b] resembles a bell curve.
The sum of frequency(i) * i for all i in [a, b] is approximately a large number K.
Many functions are available in Python, like numpy.random.normal or scipy.stats.truncnorm, but I am not able to fully understand their use and how they can help me create such an array.

The first point is easy. For the second point, I'm assuming you want the "integral" of freq(x) * x to be close to K (making each individual x * freq(x) ~ K is mathematically impossible). You can achieve that by adjusting the sample size.
First step: for bell-curve-shaped integers between a and b, use scipy.stats.truncnorm. From the docs:
Notes
The standard form of this distribution is a standard normal truncated to the range [a, b] --- notice that a and b are defined over the domain of the standard normal. To convert clip values for a specific mean and standard deviation, use:
a, b = (myclip_a - my_mean) / my_std, (myclip_b - my_mean) / my_std
Take a normal on the [-3, 3] range, so the curve looks nice, and adjust the mean and standard deviation so that [-3, 3] maps to [a, b]:
import numpy as np
from scipy.stats import truncnorm
a, b = 10, 200
loc = (a + b) / 2
scale = (b - a) / 6
f = truncnorm(-3, 3, loc=loc, scale=scale)
Now, since frequency is related to the probability density function: sum(freq(i) * i ) ~ n * sum(pdf(i) * i). Therefore, n = K / sum(pdf(i) * i). This can be obtained as:
K = 200000
i = np.arange(a, b +1)
n = int(K / i.dot(f.pdf(i)))
Now generate integer random samples, and check function:
samples = f.rvs(size=n).astype(int)
import matplotlib.pyplot as plt
plt.hist(samples, bins = 20)
print(np.histogram(samples, bins=b-a+1)[0].dot(np.arange(a,b+1)))
>> 200315
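Putting it all together, here is a minimal end-to-end sketch using the same a, b and K as above (the exact total varies from run to run since the samples are random):
import numpy as np
from scipy.stats import truncnorm
a, b, K = 10, 200, 200000
# Truncated normal whose [-3, 3] support is mapped onto [a, b]
f = truncnorm(-3, 3, loc=(a + b) / 2, scale=(b - a) / 6)
# Pick the sample size so that sum(freq(i) * i) lands near K
i = np.arange(a, b + 1)
n = int(K / i.dot(f.pdf(i)))
# Draw integer samples and check the weighted total
samples = f.rvs(size=n).astype(int)
freq = np.histogram(samples, bins=np.arange(a, b + 2))[0]
print(freq.dot(np.arange(a, b + 1)))  # should come out close to K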

Related

Solving vector second order differential equation while indexing into an array

I'm attempting to solve the differential equation:
m(t) = M(x)x'' + C(x, x') + B x'
where x and x' are vectors with 2 entries representing the angles and angular velocities in a dynamical system. M(x) is a 2x2 matrix that is a function of the components of theta, C is a 2x1 vector that is a function of theta and theta', and B is a 2x2 matrix of constants. m(t) is a 2x1001 array containing the torques applied to each of the two joints at the 1001 time steps, and I would like to calculate the evolution of the angles as a function of those 1001 time steps.
I've transformed it to standard form such that :
x'' = M(x)^-1 (m(t) - C(x, x') - B x')
Then substituting y_1 = x and y_2 = x' gives the first order linear system of equations:
y_2 = y_1'
y_2' = M(y_1)^-1 (m(t) - C(y_1, y_2) - B y_2)
(I've used theta and phi in my code for x and y)
def joint_angles(theta_array, t, torques, B):
    phi_1 = np.array([theta_array[0], theta_array[1]])
    phi_2 = np.array([theta_array[2], theta_array[3]])
    def M_func(phi):
        M = np.array([[a_1+2.*a_2*np.cos(phi[1]), a_3+a_2*np.cos(phi[1])],[a_3+a_2*np.cos(phi[1]), a_3]])
        return np.linalg.inv(M)
    def C_func(phi, phi_dot):
        return a_2 * np.sin(phi[1]) * np.array([-phi_dot[1] * (2. * phi_dot[0] + phi_dot[1]), phi_dot[0]**2])
    dphi_2dt = M_func(phi_1) @ (torques[:, t] - C_func(phi_1, phi_2) - B @ phi_2)
    return dphi_2dt, phi_2
t = np.linspace(0,1,1001)
initial = theta_init[0], theta_init[1], dtheta_init[0], dtheta_init[1]
x = odeint(joint_angles, initial, t, args = (torque_array, B))
I get the error that I cannot index into torques using the t array, which makes perfect sense; however, I am not sure how to have it use the current value of the torques at each time step.
I also tried putting the odeint command in a for loop and only evaluating it at one time step at a time, using the solution of the function as the initial conditions for the next loop, but the function simply returned the initial conditions, meaning every loop was identical. This leads me to suspect I've made a mistake in my implementation of the standard form, but I can't work out what it is. It would be preferable, however, not to have to call the odeint solver in a for loop every time, and rather do it all in one go.
If helpful, my initial conditions and constant values are:
theta_init = np.array([10*np.pi/180, 143.54*np.pi/180])
dtheta_init = np.array([0, 0])
L_1 = 0.3
L_2 = 0.33
I_1 = 0.025
I_2 = 0.045
M_1 = 1.4
M_2 = 1.0
D_2 = 0.16
a_1 = I_1+I_2+M_2*(L_1**2)
a_2 = M_2*L_1*D_2
a_3 = I_2
Thanks for helping!
The solver uses an internal stepping that is problem-adapted. The given time list is a list of points where the internal solution gets interpolated for output samples. The internal and external time lists are in no way related; the internal list only depends on the given tolerances.
There is no actual natural relation between array indices and sample times.
The translation of a given time into an index and construction of a sample value from the surrounding table entries is called interpolation (by a piecewise polynomial function).
Torque as a physical phenomenon is at least continuous, a piecewise linear interpolation is the easiest way to transform the given function value table into an actual continuous function. Of course one also needs the time array.
So use numpy.interp or the more advanced routines of scipy.interpolate (such as interp1d) to define the torque function, so that it can be evaluated at the arbitrary times demanded by the solver and its integration method.
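A minimal sketch of that approach, assuming torque_array has shape (2, 1001) and was sampled on the same np.linspace(0, 1, 1001) grid as above (M_func, C_func, B, theta_init and dtheta_init as in the question; the names t_samples and torque_func are placeholders of ours):
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import odeint
t_samples = np.linspace(0., 1., 1001)
# Piecewise-linear, continuous torque as a function of time (one row per joint)
torque_func = interp1d(t_samples, torque_array, axis=1, kind='linear', fill_value='extrapolate')
def joint_angles(state, t, torque_func, B):
    phi_1, phi_2 = state[:2], state[2:]       # angles, angular velocities
    m_t = torque_func(t)                      # torque at the solver's current time
    dphi_2dt = M_func(phi_1) @ (m_t - C_func(phi_1, phi_2) - B @ phi_2)
    return np.concatenate([phi_2, dphi_2dt])  # flat derivative vector for odeint
initial = np.concatenate([theta_init, dtheta_init])
x = odeint(joint_angles, initial, t_samples, args=(torque_func, B))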

Complex number computational error grows as the size of matrix increase

If I have two small complex matrices, the complex number multiplication is fine even when I do it manually (breaking the complex numbers into real and imaginary parts and doing the multiplication on each part separately).
import numpy as np
a_shape = (3,10)
b_shape = (10,3)
# Generating the first complex matrix a
np.random.seed(0)
a_real = np.random.randn(a_shape[0], a_shape[1])
np.random.seed(1)
a_imag = np.random.randn(a_shape[0], a_shape[1])
a = a_real + a_imag*1j
# Generating the second complex matrix b
np.random.seed(2)
b_real = np.random.randn(b_shape[0], b_shape[1])
np.random.seed(3)
b_imag = np.random.randn(b_shape[0], b_shape[1])
b = b_real + b_imag*1j
# 1st approach to do complex multiplication
output1 = np.dot(a,b)
# Manual complex multiplication
output_real = np.dot(a.real,b.real) - np.dot(a.imag,b.imag)
np.array_equal(output1.real, output_real) # the results are the same
>>> True
However, if my matrices are bigger, the results obtained by np.dot(a,b) and by multiplying manually are different.
a_shape = (3,500)
b_shape = (500,3)
# Generating the first complex matrix a
np.random.seed(0)
a_real = np.random.randn(a_shape[0], a_shape[1])
np.random.seed(1)
a_imag = np.random.randn(a_shape[0], a_shape[1])
a = a_real + a_imag*1j
# Generating the second complex matrix b
np.random.seed(2)
b_real = np.random.randn(b_shape[0], b_shape[1])
np.random.seed(3)
b_imag = np.random.randn(b_shape[0], b_shape[1])
b = b_real + b_imag*1j
# 1st approach to do complex multiplication
output1 = np.dot(a,b)
# 2nd approach to do complex multiplication
output_real = np.dot(a.real,b.real) - np.dot(a.imag,b.imag)
np.array_equal(output1.real, output_real)
>>> False
I am asking this because I need to do some complex number multiplication in PyTorch. PyTorch doesn't support complex numbers natively, so I need to do it manually for the real and imaginary components.
The result is then slightly off compared to using np.dot(a,b).
Any resolution to this problem?
Differences between the two calculations
output1.real - output_real
>>>array([[-3.55271368e-15, -2.48689958e-14, 1.06581410e-14],
[-1.06581410e-14, -5.32907052e-15, -7.10542736e-15],
[ 0.00000000e+00, -2.84217094e-14, -7.10542736e-15]])
You don't say how small the differences are, but I suspect what you are seeing has nothing to do with complex numbers but with the nature of floating-point arithmetic.
In particular, floating-point addition is not associative; that is, we do not necessarily have
(a + b) + c = a + (b + c)
This would explain what you are seeing, as what you are doing is comparing
Sum{ Ra[i]*Rb[i] - Ia[i]*Ib[i]}
and
Sum{ Ra[i]*Rb[i]} - Sum{ Ia[i]*Ib[i]}
(where Ra[i] is the real part of a[i] etc)
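A quick way to see the non-associativity in Python:
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
>>> False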
One thing to try, to see that this is the problem, is to restrict the real and imaginary parts of the numbers to be, say, a whole number of sixteenths. With such numbers -- as long as you don't add an outrageous number (many, many billions) of them -- double precision floating point arithmetic will be exact and so you should get identical results. For example, in C you could generate such numbers by generating a bunch of random integers between say -16 and 16 and then dividing each by the (double precision) number 16.0, to get a double precision number between -1 and 1 that is a whole number of sixteenths.
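Since the question uses NumPy, here is a rough Python sketch of that experiment (the helper random_sixteenths is ours; with such values every product and partial sum is exactly representable in double precision, so the two computations should agree exactly):
import numpy as np
rng = np.random.default_rng(0)
def random_sixteenths(shape):
    # integers in [-16, 16] divided by 16.0: exactly representable doubles
    return rng.integers(-16, 17, size=shape) / 16.0
a = random_sixteenths((3, 500)) + 1j * random_sixteenths((3, 500))
b = random_sixteenths((500, 3)) + 1j * random_sixteenths((500, 3))
direct = np.dot(a, b).real
manual = np.dot(a.real, b.real) - np.dot(a.imag, b.imag)
print(np.array_equal(direct, manual))   # expected True: no rounding occurs, so summation order does not matter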

Generate a random point on an elliptical curve

I'm writing a program which randomly chooses two integers within a certain interval. I also wrote a class (which I didn't add below) which uses two numbers 'a' and 'b' and creates an elliptical curve of the form:
y^2 = x^3 + ax + b
I've written the following to create the two random numbers.
import random
def numbers():
    n = 1
    while n > 0:
        a = random.randint(-100,100)
        b = random.randint(-100,100)
        if -16 * (4 * a ** 3 + 27 * b ** 2) != 0:
            result = [a,b]
            return result
        n = n+1
Now I would like to generate a random point on this elliptical curve. How do I do that?
The curve has an infinite length, as for every y ∈ ℝ there is at least one x ∈ ℝ so that (x, y) is on the curve. So if we speak of a random point on the curve, we cannot hope to have a homogeneous distribution of the random point over the whole curve.
But if that is not important, you could take a random value for y within some range, and then calculate the roots of the following function:
f(x) = x^3 + ax + b - y^2
This will result in three roots, of which possibly two are complex (not real numbers). You can take a random real root from that. This will be the x coordinate for the random point.
With the help of numpy, getting the roots is easy, so this is the function for getting a random point on the curve, given values for a and b:
import random
import numpy
def randomPoint(a, b):
    y = random.randint(-100,100)
    # Get roots of: f(x) = x^3 + ax + b - y^2
    roots = numpy.roots([1, 0, a, b - y**2])
    # 3 roots are returned, but ignore potential complex roots
    # At least one will be real
    roots = [val.real for val in roots if val.imag == 0]
    # Choose a random root among those real root(s)
    x = random.choice(roots)
    return [x, y]
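A small usage sketch combining the two functions above (the tolerance check is ours, just to confirm the returned point lies on the curve up to floating-point error):
a, b = numbers()
x, y = randomPoint(a, b)
print(a, b, x, y)
print(abs(y**2 - (x**3 + a*x + b)) < 1e-6)   # expected True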

How to start this "Number Density of Particles" homework in Python?

Part 2 - Determination of Number Density of Particles
If we say that q is the production rate of particles of a specific size, then in an interval dt the total number of particles produced is just q dt. To make things concrete in what follows, please adopt the case:
a = 0.9 a_max
q = 100000
Consider this number of particles at some distance r from the nucleus. The number density of particles will be number divided by volume, so to find number density we must compute the volume of a shell of radius r with a thickness that corresponds to how far the particles will travel in our time interval dt. Obviously that’s just the velocity of the particle at radius r times the time interval v(r) dt, so the volume of our shell is:
Volume = Shell Surface Area × Shell Thickness = 4πr²v(r)dt
Therefore, the number density, n, at radius r is:
n(r) = q dt / (4πr²v(r)dt) = q / (4πr²v(r)) (Equation 5)
You will note that our expression above will have a singularity for the number density of particles right at the surface of the nucleus, since at that position the outward velocity, v(R), is 0. Obviously this is an indication that we expect the particle density n to drop very rapidly as the dust is accelerated away from the surface. For now, let's not worry about this point (we don't need it later) and just graph how the number density varies with distance from the nucleus, starting with the first point after the surface value.
• Evaluate Equation 5 for all calculated points using the parameters for q and a given above.
• Make a log-log graph of the number density versus radius. You should find that, after terminal velocity is achieved, the number density decreases as r^-2, corresponding to a slope of -2 on a log-log plot.
Current code:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as pl
R = 2000 #Nucleus Radius (m)
GM_n = 667 #Nucleus Mass (m^3 s^-2)
Q = 7*10**27 #Gas Production Rate (molecules s^-1)
V_g = 1000 #Gas Velocity (m s^-1)
C_D = 4 #Drag Coefficient Dimensionless
p_d = 500 #Grain Density (kg m^-3)
M_h2o = .01801528/(6.022*10**23) #Mass of a water molecule (kg)
pi = np.pi
p_g_R = M_h2o*Q/(4*np.pi*R**2*V_g)
print ('Gas Density at the comets nucleus: ', p_g_R)
a_max = (3/8)*C_D*(V_g**2)*p_g_R*(1/p_d)*((R**2)/GM_n)
print ('Radius of Maximum Size Particle: ', a_max)
def drag_force(C_D,V_g,p_g_R,pi,a,v):
    drag = .5*C_D*((V_g - v)**2)*p_g_R*pi*a**2
    return drag
def grav_force(GM_n,M_d,r):
    grav = -(GM_n*M_d)/(r**2)
    return grav
def p_g_r(p_g_R,R,r):
    p_g_r = p_g_R*(R**2/r**2)
    return p_g_r
dt = 1
tfinal = 100000
v0 = 0
t = np.arange(0.,tfinal+dt,dt)
npoints = len(t)
r = np.zeros(npoints)
v = np.zeros(npoints)
r[0]= R
v[0]= v0
a = np.array([0.9,0.5,0.1,0.01,0.001])*a_max
for j in range(len(a)):
    M_d = 4/3*pi*a[j]**3*p_d
    for i in range(len(t)-1):
        rmid = r[i] + v[i]*dt/2.
        vmid = v[i] + (grav_force(GM_n,M_d,r[i])+drag_force(C_D,V_g,p_g_r(p_g_R,R,r[i]),pi,a[j],v[i]))*dt/2.
        r[i+1] = r[i] + vmid*dt
        v[i+1] = v[i] + (grav_force(GM_n,M_d,rmid)+drag_force(C_D,V_g,p_g_r(p_g_R,R,rmid),pi,a[j],vmid))*dt
    pl.plot(r,v)
pl.show()
a_2= 0.9*a_max
q = 100000
I have never programmed anything like this before; my class is very difficult for me and I don't understand it. I have developed the above code with the help of the professor, and I am nearly out of time to finish this project. I just want help understanding the problem.
How do I find v(r) when I only have v(t), r(t)?
What do I do to calculate the r values and what r values do I even use?
You have v as a known function of time and also r as another known function of time. You can invert these to get t vs. v and t vs. r. To get v as a function of r, eliminate t.
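Concretely, since r[i] and v[i] are computed on the same time grid, each pair (r[i], v[i]) already gives v at the radius r[i], so Equation 5 can be evaluated directly on the arrays from the integration loop. A rough sketch, assuming r and v hold the run for a = 0.9*a_max and skipping the surface point where v = 0:
q = 100000
# Skip index 0: at the surface v = 0 and Equation 5 is singular there
n_r = q / (4.0 * np.pi * r[1:]**2 * v[1:])
pl.loglog(r[1:], n_r)
pl.xlabel('radius r (m)')
pl.ylabel('number density n(r) (particles m^-3)')
pl.show()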

Lattice Points in a 2D plane

Given 2 points in a 2D plane, how many lattice points lie between these two points?
For example, for A (3, 3) and B (-1, -1) the output is 5. The points are: (-1, -1), (0, 0), (1, 1), (2, 2) and (3, 3).
Apparently by "lattice points lie within two points" you mean (letting LP stand for lattice point) the LPs on the line segment between the two points (A and B).
The equation of line AB is y = m*x + b for some slope and intercept numbers m and b. For cases of interest, we can assume m, b are rational, because if either is irrational there is at most 1 LP on AB. (Proof: if 2 or more LPs are on the line, it has rational slope, say e/d, with d, e integers; then y = b + x*e/d, so at an LP (X,Y) on the line, d*b = d*Y - X*e, which is an integer, hence b is rational.)
In following, we suppose A = (u,v) and B = (w,z), with u,w and v,z having rational differences, and typically write y = mx+b with m=e/d and b=g/f.
Case 1. A, B both are LP's: Let q = gcd(u-w,v-z); take d = (u-w)/q and e = (v-z)/q and it's easily seen that there are q+1 lattice points on AB.
Case 2a. A is an LP, B isn't: If u-w = h/i and v-z = j/k, then m = j*i/(h*k). Let q = gcd(j*i, h*k), d = h*k/q, e = j*i/q, w' = u + d*floor((w-u)/d) and similarly for z', then solve (u,v), (w',z') as in Case 1. For Case 2b, swap A and B.
Case 3. Neither A nor B is an LP: After finding an LP C on the extended line through A,B, use arithmetic like in Case 2 to find LP A' inside line segment AB and apply case 2. To find A', if m = e/d, b = g/f, note that f*d*y = d*g + e*f*x is of the form p*x + q*y = r, a simple Diophantine equation that is solvable for C=(x,y) iff gcd(p,q) divides r.
Complexity: gcd(m,n) is O(ln(min(m,n))), so the algorithm complexity is typically O(ln(Dx)) or O(ln(Dy)) if A, B are separated by x, y distances Dx, Dy.
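For Case 1 (both endpoints are lattice points), a minimal Python sketch of the count:
from math import gcd
def lattice_points_between(A, B):
    # Both endpoints are lattice points: the count of lattice points
    # on the closed segment AB is gcd(|dx|, |dy|) + 1
    dx, dy = B[0] - A[0], B[1] - A[1]
    return gcd(abs(dx), abs(dy)) + 1
print(lattice_points_between((3, 3), (-1, -1)))   # 5, matching the example above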
