I am unable to take the definite integral of an expression; my output is never a number, it is always an expression - python-3.x

This is my expression:
from sympy import symbols, log, integrate

x = symbols('x')
expr = (8.21067284717243e+22*((1/(16.3934426229508*x - 0.19672131147541))**1.2)**0.5
        *(1/(16.3934426229508*x - 0.19672131147541))**0.6
        *log(1531.16571479152*(1/(16.3934426229508*x - 0.19672131147541))**1.2)**0.5)
integrate(expr, (x, 0, 5))
I am trying to integrate my mass loss expression (g/Gyr) to find the total mass loss over 5 Gyr.
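If integrate returns an unevaluated Integral rather than a number, one option is to ask SymPy for a numerical value explicitly. The snippet below is only a sketch of that idea, not part of the original post; note that 16.3934426229508*x - 0.19672131147541 changes sign at x ≈ 0.012, so the integrand blows up there and is complex-valued on [0, 0.012), and the integration limits may need a second look:

# Continuing from the snippet above (x and expr already defined)
from sympy import Integral

# evalf() falls back to numerical quadrature when integrate() cannot find a
# closed form, so this returns a number instead of an unevaluated expression.
total_mass_loss = Integral(expr, (x, 0, 5)).evalf()
print(total_mass_loss)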

Related

How to evaluate the trust-constr Lagrangian?

I'm using the trust-constr algorithm from scipy.optimize.minimize with an interval constraint (lowerbound < g(x) < upperbound).
I would like to plot the Lagrangian in a region around the found solution to analyze the convergence behavior.
According to my knowledge, the Lagrangian of the barrier subproblem is defined as:

$$\mathcal{L}(x, s; \lambda, \mu) = f(x) - \mu \sum_i \log(s_i) - \sum_i \lambda_i \big(c_i(x) - s_i\big)$$

with: $f$ the objective function, $c_i(x) \ge 0$ the canonical inequality constraints, $s_i > 0$ the slack variables, $\lambda_i$ the Lagrange multipliers and $\mu$ the barrier parameter.
In the returned OptimizeResult object, I can find the barrier parameter, but the slack variables are missing. The Lagrange multipliers are present, but there is only one per interval constraint, while I would expect two, since each interval constraint is converted to two canonical inequality constraints:

$$g(x) - lb \ge 0, \qquad ub - g(x) \ge 0.$$
Clearly, I'm missing something, so any help would be appreciated.
Minimal reproducible example:
import scipy.optimize as so
import numpy as np
# Problem definition:
# Five 2D points are given, with equally spaced x coordinates.
# The y coordinate of the first point is zero, while the last point has value 10.
# The goal is to find the smallest y coordinate of the other points, given the
# difference between the y coordinates of two consecutive points has to lie within the
# interval [-3, 3].
xs = np.linspace(0, 4, 5)
y0s = np.zeros(xs.shape)
y0s[-1] = 10
objective_fun = lambda y: np.mean(y**2)
def constraint_fun(ys):
    '''
    Calculates the signed squared consecutive differences of the input vector,
    augmented with the first and last element of y0s.
    '''
    full_ys = y0s.copy()
    full_ys[1:-1] = ys
    consecutive_differences = full_ys[1:] - full_ys[:-1]
    return np.sign(consecutive_differences) * consecutive_differences**2
constraint = so.NonlinearConstraint(fun=constraint_fun, lb=-3**2, ub=3**2)
result = so.minimize(method='trust-constr', fun=objective_fun, constraints=[constraint], x0=y0s[1:-1])
# The number of interval constraints is equal to the size of the output vector of the constraint function.
print(f'Nr. of interval constraints: {len(constraint_fun(y0s[1:-1]))}')
# Expected nr of Lagrange multipliers: 2x number of interval constraints.
print(f'Nr. of Lagrange multipliers: {len(result.v[0])}')
Output:
Nr. of interval constraints: 4
Nr. of Lagrange multipliers: 4
Expected output:
Nr. of interval constraints: 4
Nr. of Lagrange multipliers: 8
You're right, there should indeed be 8 Lagrange multipliers. As a workaround, you can use the old dictionary constraints instead of the NonlinearConstraint objects.
lb, ub = -3**2, 3**2
# Note that lb <= g(x) <= ub is equivalent to g(x) - lb >= 0, ub - g(x) >= 0
cons = [{'type': 'ineq', 'fun': lambda ys: constraint_fun(ys) - lb},
        {'type': 'ineq', 'fun': lambda ys: ub - constraint_fun(ys)}]
res = so.minimize(objective_fun, method="trust-constr", x0=y0s[1:-1], constraints=cons)
Here, 'fun' is expected to be a function such that fun(x) >= 0. This gives me 8 Lagrange multipliers, as expected. Nonetheless, it should also work with NonlinearConstraint objects, so it might be worth opening an issue at the SciPy repo on GitHub.
Regarding the missing slack variables: res.constr contains a list of the constraint values at the solution, i.e. the values of g(x) - lb and ub - g(x). Since we have g(x) - lb - s = 0 and ub - g(x) - s = 0, it follows immediately that the entries of res.constr are just the values of the slack variables you are looking for (when dictionary constraints are used).
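As a quick check (my own addition, not part of the original answer), the multipliers and slack values can be read off the result of the dictionary-constraint run above:

# res.v is a list with one array of Lagrange multipliers per constraint, and
# res.constr is the list of constraint values g(x) - lb and ub - g(x) at the
# solution, i.e. exactly the slack values discussed above.
print(sum(len(v) for v in res.v))   # 8 multipliers in total
print(res.constr)                   # slack values of the two canonical constraints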

How to calculate Covariance and Correlation in Python without using cov and corr?

How can we calculate the correlation and covariance between two variables without using cov and corr in Python3?
At the end, I want to write a function that returns three values:
a boolean that is true if two variables are independent
covariance of two variables
correlation of two variables.
You can find the definition of correlation and covariance here:
https://medium.com/analytics-vidhya/covariance-and-correlation-math-and-python-code-7cbef556baed
I wrote this part for covariance:
def cov_and_corr(x, y):
    mean_x, mean_y = x.mean(), y.mean()
    n = len(x)
    cov = sum((x - mean_x) * (y - mean_y)) / n
    sum_x = float(sum(x))
    sum_y = float(sum(y))
    sum_x_sq = sum(xi*xi for xi in x)
    sum_y_sq = sum(yi*yi for yi in y)
    psum = sum(xi*yi for xi, yi in zip(x, y))
    num = psum - (sum_x * sum_y / n)
    den = pow((sum_x_sq - pow(sum_x, 2) / n) * (sum_y_sq - pow(sum_y, 2) / n), 0.5)
    if den == 0:
        return 0
    return num / den
For the covariance, just subtract the respective means, multiply the centred vectors together (using the dot product), and divide by the number of observations. (Of course, be clear about whether you want the sample or the population covariance estimate; if you have "enough" data the difference will be tiny, but you should still account for it if necessary.)
For the correlation, divide the covariance by the product of the two standard deviations.
As for whether or not two columns are independent, that's not quite as easy. For two independent random variables, the covariance $\mathbb{E}\left[(X - \mu_X)(Y - \mu_Y)\right]$ is exactly $0$, where $\mu_X, \mu_Y$ are the means of the two variables (although the converse does not hold in general: zero correlation does not imply independence). But when you have a data set, you are not dealing with the actual probability distributions; you are dealing with a sample. That means the sample correlation will very likely not be exactly $0$, but rather a value close to $0$. Whether or not this is "close enough" will depend on your sample size and what other assumptions you're willing to make.
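A minimal sketch pulling this together (my own illustration, not code from the thread; the tolerance tol used for the "independence" flag is an arbitrary placeholder, for the reasons given above):

import numpy as np

def cov_corr_independent(x, y, tol=0.05):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    # population covariance: mean of the product of the centred vectors
    cov = np.dot(x - x.mean(), y - y.mean()) / n
    # correlation: covariance divided by the product of the (population) standard deviations
    corr = cov / (x.std() * y.std())
    # crude surrogate for independence: |correlation| below the tolerance
    return abs(corr) < tol, cov, corr

print(cov_corr_independent([1, 2, 3, 4], [2, 4, 6, 8]))   # (False, 2.5, 1.0)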

What is this normalization curve? Constant ^ (Constant ^ Observation Indexed to 100)

My apologies, but I'm not quite sure how to even ask this question. I have some normalization curves I've been using at work, and I'd like to know more about them so I speak about them intelligently. They have an s shape like a sigmoid function, but their general formula is the following:
Constant ^ (Constant ^ Observation Indexed to 100)
First, index a variable from 0 to 100 with the highest observation equal to 100, then insert into the equations below for curves with different slopes.
s1 = 0.0000000001 ^ (0.97 ^ Index)
s2 = 0.0000000002 ^ (0.962 ^ Index)
s3 = 0.0000000003 ^ (0.953 ^ Index)
And so on, up to s10. The resulting values are compressed between 0 and 1. s10 has the steepest slope with values that skew toward 1, and s1 has the shallowest slope with values that skew toward 0.
I think they're very clever, and they work well for our purposes, but I don't know what to even call them. Can anyone point me in the right direction? Again, apologies for the vagueness and if this is inappropriately tagged.
The functions you describe are special cases of the Gompertz functions; Gompertz functions have a sigmoidal shape and have many applications across different domains. For example in biology, Gompertz functions are used to model bacterial and tumour cell growth.
To see how your equations relate to the more general Gompertz functions, let's rewrite the equations for $s$ with constants $A$ and $B$ (e.g. $A = 10^{-10}$, $B = 0.97$ for s1):

$$s = A^{\,B^{\,\text{Index}}} = \exp\!\left(\ln(A)\, e^{\ln(B)\,\text{Index}}\right)$$

On a side note, we can see that taking the double log of $s$ (i.e. $\log(-\log s)$, the inner logarithm being negative since $0 < s < 1$) linearises the equation as a function of the index:

$$\log(-\log s) = \log(-\log A) + \log(B)\,\text{Index}$$

We can now compare this with the more general Gompertz function

$$G(t) = a\, e^{-b\, e^{-c t}}$$

Taking the natural logarithm gives

$$\log G(t) = \log(a) - b\, e^{-c t}$$

We then set $a = 1$ and take the natural logarithm of the negated expression again

$$\log\!\left(-\log G(t)\right) = \log(b) - c\, t$$

So the equations you give are algebraically identical to the Gompertz functions with parameters $a = 1$, $b = -\log(A)$ and $c = -\log(B)$.
Let's plot the function for the three sets of parameters that you give in your post (I use R here but it's easy to do something similar in e.g. Python)
# Define a function f which takes the index and two parameters a and b
# We use a helper function scale01 to scale the values of f in the interval [0,1]
# using min-max scaling
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
f <- function(idx, a, b) scale01(a ^ (b ^ idx))
# Calculate s for the three different sets of parameters and
# using integer index values from 0 to 100
idx <- 0:100
lst <- lapply(list(
    s1 = list(a = 0.0000000001, b = 0.97),
    s2 = list(a = 0.0000000002, b = 0.962),
    s3 = list(a = 0.0000000003, b = 0.953)),
  function(pars) f(idx, a = pars$a, b = pars$b))
# Plot
library(ggplot2)
df <- cbind(idx = idx, stack(lst))
ggplot(df, aes(idx, values, colour = ind)) + geom_line()
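Since the answer notes that the same thing is easy in Python, here is a rough NumPy/Matplotlib translation of the R code above (my own sketch, not part of the original answer):

import numpy as np
import matplotlib.pyplot as plt

def scale01(x):
    # min-max scaling into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def f(idx, a, b):
    return scale01(a ** (b ** idx))

idx = np.arange(101)
params = {'s1': (1e-10, 0.97), 's2': (2e-10, 0.962), 's3': (3e-10, 0.953)}

for name, (a, b) in params.items():
    plt.plot(idx, f(idx, a, b), label=name)
plt.xlabel('Index')
plt.legend()
plt.show()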

Solving vector second order differential equation while indexing into an array

I'm attempting to solve the differential equation:
m(t) = M(x)x'' + C(x, x') + B x'
where x and x' are vectors with 2 entries representing the angles and angular velocities in a dynamical system. M(x) is a 2x2 matrix that is a function of the components of x, C is a 2x1 vector that is a function of x and x', and B is a 2x2 matrix of constants. m(t) is a 2x1001 array containing the torques applied to each of the two joints at the 1001 time steps, and I would like to calculate the evolution of the angles over those 1001 time steps.
I've transformed it to standard form such that :
x'' = M(x)^-1 (m(t) - C(x, x') - B x')
Then substituting y_1 = x and y_2 = x' gives the first-order system of equations:
y_2 = y_1'
y_2' = M(y_1)^-1 (m(t) - C(y_1, y_2) - B y_2)
(I've used theta and phi in my code for x and y)
def joint_angles(theta_array, t, torques, B):
    phi_1 = np.array([theta_array[0], theta_array[1]])
    phi_2 = np.array([theta_array[2], theta_array[3]])
    def M_func(phi):
        M = np.array([[a_1 + 2.*a_2*np.cos(phi[1]), a_3 + a_2*np.cos(phi[1])],
                      [a_3 + a_2*np.cos(phi[1]), a_3]])
        return np.linalg.inv(M)
    def C_func(phi, phi_dot):
        return a_2 * np.sin(phi[1]) * np.array([-phi_dot[1] * (2. * phi_dot[0] + phi_dot[1]), phi_dot[0]**2])
    dphi_2dt = M_func(phi_1) @ (torques[:, t] - C_func(phi_1, phi_2) - B @ phi_2)
    return dphi_2dt, phi_2
t = np.linspace(0,1,1001)
initial = theta_init[0], theta_init[1], dtheta_init[0], dtheta_init[1]
x = odeint(joint_angles, initial, t, args = (torque_array, B))
I get the error that I cannot index into torques using the t array, which makes perfect sense, however I am not sure how to have it use the current value of the torques at each time step.
I also tried putting the odeint call in a for loop, evaluating it one time step at a time and using the solution as the initial conditions for the next loop, but the function simply returned the initial conditions, so every loop was identical. This leads me to suspect I've made a mistake in my implementation of the standard form, but I can't work out what it is. It would be preferable, however, not to have to call the odeint solver in a for loop for every time step, and rather do it all in one call.
If helpful, my initial conditions and constant values are:
theta_init = np.array([10*np.pi/180, 143.54*np.pi/180])
dtheta_init = np.array([0, 0])
L_1 = 0.3
L_2 = 0.33
I_1 = 0.025
I_2 = 0.045
M_1 = 1.4
M_2 = 1.0
D_2 = 0.16
a_1 = I_1+I_2+M_2*(L_1**2)
a_2 = M_2*L_1*D_2
a_3 = I_2
Thanks for helping!
The solver uses an internal stepping that is adapted to the problem. The given time list is only a list of points where the internal solution gets interpolated to produce output samples. The internal and external time lists are in no way related; the internal list depends only on the given tolerances.
There is no actual natural relation between array indices and sample times.
The translation of a given time into an index and construction of a sample value from the surrounding table entries is called interpolation (by a piecewise polynomial function).
Torque, as a physical phenomenon, is at least continuous, so a piecewise linear interpolation is the easiest way to turn the given table of function values into an actual continuous function of time. Of course, one also needs the corresponding time array.
So use numpy.interp (or the more advanced routines of scipy.interpolate, such as interp1d) to define a torque function that can be evaluated at the arbitrary times demanded by the solver and its integration method.
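A minimal sketch of that approach (my own illustration, reusing the question's dynamics and constants; the torque table and the damping matrix B below are placeholders):

import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d

a_1, a_2, a_3 = 0.16, 0.048, 0.045        # computed from the constants in the question
B = 0.05 * np.eye(2)                      # placeholder 2x2 damping matrix
t = np.linspace(0, 1, 1001)
torque_array = np.zeros((2, t.size))      # placeholder 2x1001 torque table

# Turn the torque table into a function of time that the solver can call at any t.
torque_of = interp1d(t, torque_array, axis=1, fill_value="extrapolate")

def M_inv(phi):
    M = np.array([[a_1 + 2*a_2*np.cos(phi[1]), a_3 + a_2*np.cos(phi[1])],
                  [a_3 + a_2*np.cos(phi[1]),   a_3]])
    return np.linalg.inv(M)

def C(phi, dphi):
    return a_2*np.sin(phi[1]) * np.array([-dphi[1]*(2*dphi[0] + dphi[1]), dphi[0]**2])

def rhs(y, ti):
    phi, dphi = y[:2], y[2:]
    ddphi = M_inv(phi) @ (torque_of(ti) - C(phi, dphi) - B @ dphi)
    return np.concatenate([dphi, ddphi])  # state is [theta_1, theta_2, dtheta_1, dtheta_2]

y0 = np.array([10*np.pi/180, 143.54*np.pi/180, 0.0, 0.0])
sol = odeint(rhs, y0, t)                  # sol[:, :2] holds the angles at the sample times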

Chi-square test - not able to match types

I am trying to do a chi-square test using the chi2test function from the statistics package. I have the following contingency table:

        A   B
True   12   8
False  16   9

I used the following code:
import Data.Vector
import Statistics.Test.ChiSquared
sample = fromList [(12, 8), (16, 9)]
main = print(chi2test(sample))
However, it gives the following error:
[1 of 1] Compiling Main ( rnchisq.hs, rnchisq.o )
rnchisq.hs:9:23: error:
• Couldn't match expected type ‘Int’
with actual type ‘Vector (Integer, Integer)’
• In the first argument of ‘chi2test’, namely ‘(sample)’
In the first argument of ‘print’, namely ‘(chi2test (sample))’
In the expression: print (chi2test (sample))
Where is the problem and how can it be solved? Thanks for your help.
Edit: As suggested in the answer by @JosephSible, I also tried:
main = print(chi2test(1, sample))
(1 being degree of freedom)
But here I get the error:
rnchisq.hs:7:22: error:
• Couldn't match expected type ‘Int’
with actual type ‘(Integer, Vector (Integer, Integer))’
• In the first argument of ‘chi2test’, namely ‘(1, sample)’
In the first argument of ‘print’, namely ‘(chi2test (1, sample))’
In the expression: print (chi2test (1, sample))
The following compiled and ran:
main = print $ chi2test 1 sample
However, the output is
Nothing
I expected some value. It remains Nothing even if I drastically change numbers in sample. Why am I getting Nothing?
The chi2test function performs a general chi-square goodness-of-fit test, not a chi-square test on a 2x2 contingency table. It expects a set of pairs representing the "observed" actual counts and the "expected" theoretical mean counts under the null hypothesis, rather than just the counts from the table.
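For a contingency table, the expected counts under the null hypothesis of independence are the usual products of the margins (my own note; this is exactly what the expected helper in the code below computes):

$$E_{ij} = \frac{(\text{row total})_i \times (\text{column total})_j}{n}$$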
In other words, you need to work through a fair bit of statistical theory to use this function to analyse a 2x2 table, but here's a function that appears to work:
import Data.Vector as V
import Statistics.Test.ChiSquared

sample = ((12, 8), (16, 9))

main = print $ chi2table sample

chi2table ((a, b), (c, d))
  = chi2test 2 $ V.fromList $ Prelude.zip [a, b, c, d] [ea, eb, ec, ed]
  where n = a + b + c + d
        ea = expected (a+b) (a+c)
        eb = expected (a+b) (b+d)
        ec = expected (c+d) (a+c)
        ed = expected (c+d) (b+d)
        expected rowtot coltot = (rowtot * coltot) `fdiv` n
        fdiv x y = fromIntegral x / fromIntegral y
This gives output:
> main
Just (Test {testSignificance = mkPValue 0.7833089019485086,
testStatistics = 7.56302521008404e-2, testDistribution = chiSquared 2})
Update: with respect to the degrees of freedom, the test itself is calculated with a chi-square distribution with 1 degree of freedom (basically (R-1)*(C-1) for a table with R rows and C columns). The reason we have to pass 2 here is that the first argument of chi2test is the number of degrees of freedom "lost" or "constrained" in addition to the total count: we start with 4 degrees of freedom in total (one per cell), one is always lost for the total count across all cells, and we have to tell chi2test to remove two more to get down to the 1 degree of freedom appropriate for the test.
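In symbols, the bookkeeping described above is:

$$\text{df}_{\text{test}} = \underbrace{4}_{\text{cells}} - \underbrace{1}_{\text{total count}} - \underbrace{2}_{\text{first argument of chi2test}} = 1 = (R-1)(C-1) \quad \text{for } R = C = 2.$$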
Anyway, this will match the output of statistical software only if you turn off continuity correction. For example, in R:
> chisq.test(rbind(c(12,8),c(16,9)), correct=FALSE)
Pearson's Chi-squared test
data: rbind(c(12, 8), c(16, 9))
X-squared = 0.07563, df = 1, p-value = 0.7833
>
chi2test takes two arguments, and you're only passing it one. Instead of calling chi2test sample, call chi2test df sample, where df is the number of additional degrees of freedom.
