curve fitting with integer inputs in Python 3.3

I am using scipy's curve_fit function to fit a function and wanted to know if there is a way to tell it that the only possible entries are integers, not real numbers. Any ideas as to another way of doing this?

In its general form, an integer programming problem is NP-hard (see here). There are efficient heuristic or approximate algorithms for this class of problem, but none guarantees an exact optimal solution.
In scipy you may implement a grid search over the integer coefficient and use, say, curve_fit over the real parameters for each candidate integer. As for the grid search, scipy has the brute function.
For example, if y = a * x + b * x^2 + noise, where a has to be an integer, this may work:
Generate some test data with a = 5 and b = -1.5:
import numpy as np

coef, n = [5, -1.5], 50
xs = np.linspace(0, 10, n)[:,np.newaxis]
xs = np.hstack([xs, xs**2])
noise = 2 * np.random.randn(n)
ys = np.dot(xs, coef) + noise
A function which, given the integer coefficient, fits the real coefficient using curve_fit:
def optfloat(intcoef, xs, ys):
    from scipy.optimize import curve_fit

    def poly(xs, floatcoef):
        # stack the fixed integer coefficient with the free float coefficient
        # (hstack also copes with intcoef arriving as a 1-element array from brute)
        return np.dot(xs, np.hstack([intcoef, floatcoef]))

    popt, pcov = curve_fit(poly, xs, ys)
    errsqr = np.linalg.norm(poly(xs, popt) - ys)
    return dict(errsqr=errsqr, floatcoef=popt)
A function which, given the integer coefficient, uses the function above to optimize the float coefficient and returns the fit error:
def errfun(intcoef, *args):
    xs, ys = args
    return optfloat(intcoef, xs, ys)['errsqr']
Minimize errfun with scipy.optimize.brute to find the optimal integer coefficient, then call optfloat with that value to find the optimal real coefficient:
from scipy.optimize import brute

grid = [slice(1, 10, 1)]  # grid search over 1, 2, ..., 9
# it is important to specify finish=None below, so brute does not
# hand the integer result to a float polisher afterwards
intcoef = brute(errfun, grid, args=(xs, ys), finish=None)
floatcoef = optfloat(intcoef, xs, ys)['floatcoef'][0]
Using this method I obtain [5.0, -1.50577] for the optimal coefficients, which is exact for the integer coefficient, and close enough for the real coefficient.

In general, the answer is no: scipy.optimize.curve_fit(), the leastsq() routine it is based on, and (AFAIK) all the other solvers in scipy.optimize work strictly on floating point numbers.
You could try increasing the value of epsfcn (which has a default value of numpy.finfo('double').eps, roughly 2e-16), which is used as the initial step size for all variables in the problem. The basic issue is that the fitting algorithm will adjust a floating point number, and if you do
int_var = int(float_var)
and the algorithm changes float_var from 1.0 to 1.00000001, it will see no difference in the result and decide that that value does not actually alter the fit metric.
Another approach would be to have a floating point parameter 'tmp_float_var' that is freely adjusted by the fitting algorithm but then in your objective function use
int_var = int(tmp_float_var / numpy.finfo('double').eps)
as the value for your integer variable. That might need a little tweaking, and might be a little unstable, but ought to work.
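For what it's worth, here is a rough sketch of that last idea, using the same y = a*x + b*x^2 toy model as in the other answer; rounding the parameter inside the model and the epsfcn value of 0.5 are only illustrative choices that would need tuning:
import numpy as np
from scipy.optimize import curve_fit

def model(x, a_float, b):
    a_int = int(round(a_float))   # the integer-constrained coefficient
    return a_int * x + b * x**2

xdata = np.linspace(0, 10, 50)
ydata = 5 * xdata - 1.5 * xdata**2 + 2 * np.random.randn(50)

# epsfcn is passed through to leastsq; a large value makes the finite-difference
# steps big enough to cross integer boundaries
popt, pcov = curve_fit(model, xdata, ydata, p0=[4.0, -1.0], epsfcn=0.5)
a_est, b_est = int(round(popt[0])), popt[1]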

Related

taking the norm of 3 vectors in python

This is probably a stupid question, but for some reason I can't get the norm of three matrices of vectors.
Each vector in the x matrix represents the x coordinate of a sensor (8 sensors total) for three different experiments. Same for y and z.
ex:
x = [array([ 2.239, 3.981, -8.415, 33.895, 48.237, 52.13 , 60.531, 56.74 ]), array([ 2.372, 6.06 , -3.672, 3.704, -5.926, -2.341, 35.667, 62.097])]
y = [array([ 18.308, -17.83 , -22.278, -99.67 , -121.575, -116.794,-123.132, -127.802]), array([ -3.808, 0.974, -3.14 , 6.645, 2.531, 7.312, -129.236, -112. ])]
z = [array([-1054.728, -1054.928, -1054.928, -1058.128, -1058.928, -1058.928, -1058.928, -1058.928]), array([-1054.559, -1054.559, -1054.559, -1054.559, -1054.559, -1054.559, -1057.959, -1058.059])]
I tried doing:
norm = np.sqrt(np.square(x) + np.square(y) + np.square(z))
x = x/norm
y = y/norm
z = z/norm
However, I'm pretty sure it's wrong. When I then sum the components, e.g. np.sum(x[0]), I don't get anywhere close to 1.
Normalization does not make the sum of the components equal to one. Normalization makes the norm of the vector equal to one. You can check if your code worked by taking the norm (square root of the sum of the squared elements) of the normalized vector. That should equal 1.
From what I can tell, your code is doing what it was written to do, but not what your application needs. You could define a function to normalize any vector that you pass to it, much as you did in your program, as follows:
def normalize(vector):
    norm = np.sqrt(np.sum(np.square(vector)))
    return vector / norm
However, because x, y, and z each have 8 elements, you can't normalize x with the components from x, y, and z.
What I think you want to do is normalize the vector (x,y,z) for each of your 8 sensors. So, you should pass 8 vectors, (one for each sensor) into the normalize function I defined above. This might look something like this:
normalized_vectors = []
for i in range(8):
    # build the 3-vector (x, y, z) for the i-th sensor (within one experiment)
    vector = np.asarray([x[i], y[i], z[i]])
    normalized_vectors.append(normalize(vector))
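As a quick sanity check of the point above, the norm of each normalized vector should come back as 1 (this assumes the loop filled normalized_vectors the way you intended):
for v in normalized_vectors:
    print(np.linalg.norm(v))   # each value should be 1.0, up to floating point rounding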

How to calculate Covariance and Correlation in Python without using cov and corr?

How can we calculate the correlation and covariance between two variables without using cov and corr in Python3?
At the end, I want to write a function that returns three values:
a boolean that is true if two variables are independent
covariance of two variables
correlation of two variables.
You can find the definition of correlation and covariance here:
https://medium.com/analytics-vidhya/covariance-and-correlation-math-and-python-code-7cbef556baed
I wrote this part for covariance:
def cov_and_corr(x, y):
    # wrapped in a function so that the return statements below are valid
    mean_x, mean_y = x.mean(), y.mean()
    n = len(x)
    cov = sum((x - mean_x) * (y - mean_y)) / n   # covariance (computed but not returned below)
    sum_x = float(sum(x))
    sum_y = float(sum(y))
    sum_x_sq = sum(xi * xi for xi in x)
    sum_y_sq = sum(yi * yi for yi in y)
    psum = sum(xi * yi for xi, yi in zip(x, y))
    num = psum - (sum_x * sum_y / n)
    den = pow((sum_x_sq - pow(sum_x, 2) / n) * (sum_y_sq - pow(sum_y, 2) / n), 0.5)
    if den == 0:
        return 0
    return num / den   # this is the Pearson correlation
For the covariance, just subtract the respective means and multiply the vectors together (using the dot product). (Of course, make sure whether you're using the sample covariance or population covariance estimate -- if you have "enough" data the difference will be tiny, but you should still account for it if necessary.)
For the correlation, divide the covariance by the standard deviations of both.
As for whether or not two columns are independent, that's not quite as easy. For two independent random variables, we have $\mathbb{E}\left[(X - \mu_X)(Y - \mu_Y)\right] = 0$, where $\mu_X, \mu_Y$ are the means of the two variables (independence implies zero covariance, though the converse need not hold). But when you have a data set, you are not dealing with the actual probability distributions; you are dealing with a sample. That means that the correlation will very likely not be exactly $0$, but rather a value close to $0$. Whether or not this is "close enough" will depend on your sample size and what other assumptions you're willing to make.
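A minimal sketch of the two computations just described, using plain numpy (the function name, the population divisor n, and the 1e-8 tolerance behind the 'independent' flag are my own illustrative choices, in line with the caveat above):
import numpy as np

def cov_corr(x, y, tol=1e-8):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    cov = np.dot(x - x.mean(), y - y.mean()) / n     # subtract means, multiply, average
    corr = cov / (x.std() * y.std())                 # divide by both standard deviations
    independent = abs(corr) < tol                    # heuristic only, as discussed above
    return independent, cov, corr

print(cov_corr([1, 2, 3, 4], [2, 4, 6, 8]))          # covariance 2.5, correlation 1.0 for this toy data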

Calculating a custom probability distribution in python (numerically)

I have a custom (discrete) probability distribution defined somewhat in the form: f(x)/(sum(f(x')) for x' in a given discrete set X). Also, 0<=x<=1.
So I have been trying to implement it in Python 3.8.2, and the problem is that the numerator and denominator both come out to be really small and Python's floating point representation just takes them as 0.0.
After calculating these probabilities, I need to sample a random element from an array, whose each index may be selected with the corresponding probability in the distribution. So if my distribution is [p1,p2,p3,p4], and my array is [a1,a2,a3,a4], then probability of selecting a2 is p2 and so on.
So how can I implement this in an elegant and efficient way?
Is there any way I could use the np.random.beta() in this case? Since the difference between the beta distribution and my actual distribution is only that the normalization constant differs and the domain is restricted to a few points.
Note: the probability mass function defined above is actually in the form given by Bayes' theorem, and f(x) = x^s * (1-x)^t, where s and t are fixed numbers for a given iteration. So the exact problem is that, when s or t become really large, this thing goes to 0.
You could well compute things by working with logs. The point is that while both the numerator and denominator might underflow to 0, their logs won't unless your numbers are really astonishingly small.
You say
f(x) = x^s*(1-x)^t
so
logf (x) = s*log(x) + t*log(1-x)
and you want to compute, say
p = f(x) / Sum{ y in X | f(y)}
so
p = exp( logf(x) - log( sum{ y in X | f(y) } ) )
  = exp( logf(x) - log( sum{ y in X | exp(logf(y)) } ) )
The only difficulty is in computing the second term, but this is a common problem (the log-sum-exp trick), and scipy provides scipy.special.logsumexp for exactly this.
On the other hand, computing logsumexp is easy enough to do by hand.
We want
S = log( sum{ i | exp(l[i])})
if L is the maximum of the l[i] then
S = log( exp(L)*sum{ i | exp(l[i]-L)})
= L + log( sum{ i | exp( l[i]-L)})
The last sum can be computed as written, because each term is now between 0 and 1 so there is no danger of overflow, and one of the terms (the one for which l[i]==L) is 1, and so if other terms underflow, that is harmless.
This may however lose a little accuracy. A refinement would be to recognize the set A of indices where
l[i] >= L - eps    (eps a user-set parameter, e.g. 1)
And then compute
N = Sum{ i in A | exp(l[i]-L)}
B = log1p( Sum{ i not in A | exp(l[i]-L)}/N)
S = L + log( N) + B
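Putting this together for the sampling problem in the question, a sketch could look like the following; it assumes f(x) = x**s * (1-x)**t evaluated on a grid X that stays strictly inside (0, 1) (the endpoints would give log(0)), and it leans on scipy.special.logsumexp, which implements the shifted sum derived above. The names and example values are only illustrative:
import numpy as np
from scipy.special import logsumexp

def sample(items, X, s, t, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    logf = s * np.log(X) + t * np.log1p(-X)   # log f(x); finite even when f(x) underflows
    logp = logf - logsumexp(logf)             # log of the normalized probabilities
    return rng.choice(items, p=np.exp(logp))

X = [0.1, 0.3, 0.6, 0.9]                      # candidate values of x
items = np.array(['a1', 'a2', 'a3', 'a4'])    # array to draw from, index-aligned with X
print(sample(items, X, s=500, t=800))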

Solving vector second order differential equation while indexing into an array

I'm attempting to solve the differential equation:
m(t) = M(x)x'' + C(x, x') + B x'
where x and x' are vectors with 2 entries representing the angles and angular velocity in a dynamical system. M(x) is a 2x2 matrix that is a function of the components of theta, C is a 2x1 vector that is a function of theta and theta' and B is a 2x2 matrix of constants. m(t) is a 2*1001 array containing the torques applied to each of the two joints at the 1001 time steps and I would like to calculate the evolution of the angles as a function of those 1001 time steps.
I've transformed it to standard form such that :
x'' = M(x)^-1 (m(t) - C(x, x') - B x')
Then substituting y_1 = x and y_2 = x' gives the first order linear system of equations:
y_2 = y_1'
y_2' = M(y_1)^-1 (m(t) - C(y_1, y_2) - B y_2)
(I've used theta and phi in my code for x and y)
def joint_angles(theta_array, t, torques, B):
    phi_1 = np.array([theta_array[0], theta_array[1]])
    phi_2 = np.array([theta_array[2], theta_array[3]])

    def M_func(phi):
        M = np.array([[a_1 + 2.*a_2*np.cos(phi[1]), a_3 + a_2*np.cos(phi[1])],
                      [a_3 + a_2*np.cos(phi[1]), a_3]])
        return np.linalg.inv(M)

    def C_func(phi, phi_dot):
        return a_2 * np.sin(phi[1]) * np.array([-phi_dot[1] * (2. * phi_dot[0] + phi_dot[1]), phi_dot[0]**2])

    dphi_2dt = M_func(phi_1) @ (torques[:, t] - C_func(phi_1, phi_2) - B @ phi_2)
    return dphi_2dt, phi_2
t = np.linspace(0,1,1001)
initial = theta_init[0], theta_init[1], dtheta_init[0], dtheta_init[1]
x = odeint(joint_angles, initial, t, args = (torque_array, B))
I get the error that I cannot index into torques using the t array, which makes perfect sense, however I am not sure how to have it use the current value of the torques at each time step.
I also tried putting the odeint command in a for loop and only evaluating it one time step at a time, using the solution of the function as the initial conditions for the next loop, but the function simply returned the initial conditions, meaning every loop was identical. This leads me to suspect I've made a mistake in my implementation of the standard form, but I can't work out what it is. It would be preferable, however, not to have to call the odeint solver in a for loop every time, but rather to do it all in one go.
If helpful, my initial conditions and constant values are:
theta_init = np.array([10*np.pi/180, 143.54*np.pi/180])
dtheta_init = np.array([0, 0])
L_1 = 0.3
L_2 = 0.33
I_1 = 0.025
I_2 = 0.045
M_1 = 1.4
M_2 = 1.0
D_2 = 0.16
a_1 = I_1+I_2+M_2*(L_1**2)
a_2 = M_2*L_1*D_2
a_3 = I_2
Thanks for helping!
The solver uses an internal stepping that is problem adapted. The given time list is a list of points where the internal solution gets interpolated for output samples. The internal and external time lists are in no way related, the internal list only depends on the given tolerances.
There is no actual natural relation between array indices and sample times.
The translation of a given time into an index and construction of a sample value from the surrounding table entries is called interpolation (by a piecewise polynomial function).
Torque as a physical phenomenon is at least continuous, a piecewise linear interpolation is the easiest way to transform the given function value table into an actual continuous function. Of course one also needs the time array.
So use numpy.interp or the more advanced routines of scipy.interpolate (such as interp1d) to define the torque as a function that can be evaluated at arbitrary times, as demanded by the solver and its integration method.
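A sketch of what that could look like here, assuming torque_array has shape (2, 1001) on the same time grid as t, and reusing a_1, a_2, a_3, B, theta_init and dtheta_init from the question (the state ordering [angles, velocities] matches the initial conditions given above):
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d

t = np.linspace(0, 1, 1001)
torque_of = interp1d(t, torque_array, axis=1, fill_value="extrapolate")

def M_inv(phi):
    M = np.array([[a_1 + 2.*a_2*np.cos(phi[1]), a_3 + a_2*np.cos(phi[1])],
                  [a_3 + a_2*np.cos(phi[1]),    a_3]])
    return np.linalg.inv(M)

def C_vec(phi, phi_dot):
    return a_2 * np.sin(phi[1]) * np.array([-phi_dot[1]*(2.*phi_dot[0] + phi_dot[1]),
                                            phi_dot[0]**2])

def joint_angles(state, t_now, torque_of, B):
    phi, phi_dot = state[:2], state[2:]
    m_now = torque_of(t_now)                      # torque evaluated at the solver's own time
    phi_ddot = M_inv(phi) @ (m_now - C_vec(phi, phi_dot) - B @ phi_dot)
    return np.concatenate([phi_dot, phi_ddot])    # derivative of [angles, velocities]

initial = np.concatenate([theta_init, dtheta_init])
sol = odeint(joint_angles, initial, t, args=(torque_of, B))
angles = sol[:, :2]                               # the evolution of the two joint angles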

normalize vector with respect to the infinity norm python 3

This is the code I'm trying to write. I'm new to coding, so I'm sure I'm way off; any help would be great. Thank you in advance.
Write a function normalize(vector) which takes in a vector and returns the normalized vector with respect to the infinity norm. i.e. (1/infNorm(vector)) * vector.
def normalize(vector):
    infNorm(vector) = abs(vector[0])
    for i in vector:
        if abs(i) > norm:
            infNorm(vector) = abs(i)
    finalvector = (1/infNorm(vector)) * vector
    return finalvector

vector = [2, 5, 7]
print(normalize(vector))
You are confusing function call parameters using () with sequence indices []. By sequence, I mean a Python sequence, which includes things like tuples and lists. Here, you're using a list as a vector. (You could also use tuples, but only if you don't plan to modify them. So we'll stick with lists, for generality and simplicity.)
Also, you need two loops: one to find the norm, and one to apply it.
def infnorm(vector):
    norm = 0
    for i in range(len(vector)):
        if abs(vector[i]) > norm:
            norm = abs(vector[i])  # track the largest absolute value
    return norm

def normalize(vector):
    norm = infnorm(vector)
    return [v/norm for v in vector]

vector = [2, 5, 7]
print(normalize(vector))
Results:
[0.2857142857142857, 0.7142857142857143, 1.0]
Note that I didn't take the absolute value of each element before normalizing it. I'm no vector wizard, so that might be wrong, but I'm guessing that the normalized vector can have negative values.
The last tricky bit, the return value for normalize(vector), is called a "list comprehension". It's a nifty python trick to build a list using a formula. They look odd at first, but with a little practice it gets easy and they're quite precise and clear. Check it out.
If you are going to use a for loop to find the maximum value of an array in Python, I'd suggest splitting the normalize function into two functions, one to get the infinity norm and another one to calculate the normalized vector, as such:
def infNorm(vector):
    norm = vector[0]
    for element in vector:
        if norm < abs(element):
            norm = abs(element)
    return norm

def normalize(vector):
    norm = infNorm(vector)
    new_vector = []
    for element in vector:
        new_vector.append((1.0/norm)*element)
    return new_vector
Otherwise, you could use the max() built-in function from Python; with that function, the code would look like this:
def normalize(vector):
    norm = abs(max(vector, key=abs))
    new_vector = []
    for element in vector:
        new_vector.append((1.0/norm)*element)
    return new_vector
By the way, when you have a name followed by parentheses, you are trying to invoke a function. So when you do infNorm(vector) = abs(vector[0]), you are trying to assign a value to a function call, which results in a syntax error. The correct way would be just infNorm = abs(vector[0]).
The infinity norm is the maximum of the absolute values of the elements, not their sum. (Sagemath, for instance, can report the infinity norm, the 2-norm and the 1-norm of a vector side by side for comparison.)
In general to normalise a vector according to a norm you divide each of its elements by its length in that norm.
Then this can be expressed in Python in this way:
>>> vec = [-2, 5, 3]
>>> inf_norm = max(abs(v) for v in vec)
>>> inf_norm
5
>>> normalised_vec = [v/inf_norm for v in vec]
>>> normalised_vec
[-0.4, 1.0, 0.6]
