Why won't SymPy integrate a standard Log-Normal PDF to 1?
I'm running the following code in Python 3.x and SymPy 1.0.1:
from sympy.stats import density, LogNormal
from sympy import Symbol, integrate, oo
mu, sigma = 0, 1
z = Symbol('z')
X = LogNormal('x', mu, sigma)
f = density(X)(z)
integrate(f, (z, 0, oo))
which should(?) return 1 but outputs:
sqrt(2)*Integral(exp(-log(z)**2/2)/z, (z, 0, oo))/(2*sqrt(pi))
Does anyone know what's going on here?
Apparently, SymPy fails to find a closed-form solution for this integral.
You can, however, help SymPy perform the integration. One approach is to transform the integration variable, in the hope that the resulting integrand is simpler and one that SymPy can handle. SymPy's Integral objects offer a convenient transform() method for this purpose.
import sympy as sp
import sympy.stats
mu, sigma = 0, 1
z = sp.Symbol('z', nonnegative=True)
X = sympy.stats.LogNormal('x', mu, sigma)
f = sympy.stats.density(X)(z)
I = sp.Integral(f, (z, 0, sp.oo))
print(I)
This is the original integral, which SymPy fails to evaluate. (Note the use of sp.Integral, which returns an unevaluated integral.) One (obvious?) transformation of the integration variable is z -> exp(z), which results in a new integral as follows:
I2 = I.transform(z, sp.exp(z))
print(I2)
Now, we may call the doit() method to evaluate the transformed integral:
I2.doit()
1
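As a sanity check (my own addition, run in the same session as the code above): the substitution z -> exp(z) undoes the logarithm in the log-normal density, so the transformed integrand is exactly the density of a standard normal, which SymPy integrates without difficulty:
# continuing the session above: compare the transformed integrand
# with the density of a standard Normal
n = sympy.stats.density(sympy.stats.Normal('n', 0, 1))(z)
print(sp.simplify(I2.function - n))  # -> 0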
Related
For my class we are to use the root function from scipy.optimize to solve a system of non-linear equations. When I type it and run it, I receive the answer as the input array. When looking into the solution it states:
'The iteration is not making good progress, as measured by the \n improvement from the last ten iterations.'
import numpy as np
from scipy.optimize import root

z0 = np.array([3, 3])  # initial guesses

def quad(x):
    x = z0[0]
    y = z0[1]
    f = np.array([(x-4)**2 + (y-4)**2 - 5, x**2 + y**2 - 16])
    return f

result = root(quad, z0)
result
I have tried different initial guesses, but I keep getting the same message.
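A likely culprit, judging from the code itself (this is my reading, not a confirmed answer): quad ignores the argument that root passes in and rebuilds x and y from the fixed array z0 on every call, so the residuals never change and the solver reports no progress. A minimal sketch of the fix:

import numpy as np
from scipy.optimize import root

def quad(v):
    x, y = v  # use the point the solver proposes, not the fixed z0
    return np.array([(x - 4)**2 + (y - 4)**2 - 5, x**2 + y**2 - 16])

z0 = np.array([3.0, 3.0])
result = root(quad, z0)
print(result.x)  # an intersection of the two circles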
I tried using NumPy, SciPy and scikit-learn, but couldn't find what I need in any of them. Basically, I need to fit a curve to a dataset while restricting some of the coefficients to known values. I found how to do it in MATLAB, using fittype, but couldn't do it in Python.
In my case I have a dataset of X and Y values and I need to find the best-fitting curve. I know it's a second-degree polynomial (ax^2 + bx + c) and I know the values of b and c, so I just need it to find the value of a.
The solution I found in MATLAB was https://www.mathworks.com/matlabcentral/answers/216688-constraining-polyfit-with-known-coefficients which is the same problem as mine, except that their polynomial was of fifth degree. How could I do something similar in Python?
To add some info: I need to fit a curve to a dataset, so things like scipy.optimize.curve_fit that expect a function won't work (at least as far as I tried).
The tools you have available usually expect functions that take only their parameters (a being the only unknown in your case), or their parameters plus some data (a, x, and y in your case).
SciPy's curve_fit handles that use case just fine, so long as we hand it a function it understands. It expects x first and all your parameters as the remaining arguments:
from scipy.optimize import curve_fit
import numpy as np

b = 0
c = 0

def f(x, a):
    return c + x*(b + x*a)

x = np.linspace(-5, 5)
y = x**2

params, _ = curve_fit(f, x, y)
# params == [1.]
Alternatively you can reach for your favorite minimization routine. The difference here is that you manually construct the error function so that it only takes the parameters you care about, and then you don't need to hand the data to SciPy.
from scipy.optimize import minimize
import numpy as np

b = 0
c = 0
x = np.linspace(-5, 5)
y = x**2

def error(a):
    prediction = c + x*(b + x*a)
    return np.linalg.norm(prediction - y) / len(prediction)**.5

result = minimize(error, np.array([42.]))
assert result.success

params = result.x
# params == [1.]
I don't think scipy has a partially applied polynomial fit function built-in, but you could use either of the above ideas to easily build one yourself if you do that kind of thing a lot.
from scipy.optimize import curve_fit
import numpy as np

def polyfit(coefs, x, y):
    # build a mapping from null coefficient locations to locations in the function
    # coefficients we're passing to curve_fit
    #
    # idx[j]==i means that unknown_coefs[i] belongs in coefs[j]
    _tmp = [i for i, c in enumerate(coefs) if c is None]
    idx = {j: i for i, j in enumerate(_tmp)}

    def f(x, *unknown_coefs):
        # create the entire polynomial's coefficients by filling in the unknown
        # values in the right places, using the aforementioned mapping
        p = [(unknown_coefs[idx[i]] if c is None else c) for i, c in enumerate(coefs)]
        return np.polyval(p, x)

    # we're passing an initial value just so that scipy knows how many parameters
    # to use
    params, _ = curve_fit(f, x, y, np.zeros((sum(c is None for c in coefs),)))

    # return all the polynomial's coefficients, not just the few we just discovered
    return np.array([(params[idx[i]] if c is None else c) for i, c in enumerate(coefs)])

x = np.linspace(-5, 5)
y = x**2

# (unknown)x^2 + 0x + 0
params = polyfit([None, 0, 0], x, y)
# params == [1, 0, 0.]
Similar features exist in nearly every mainstream scientific library; you just might need to reshape your problem a bit to frame it in terms of the available primitives.
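As one illustration of that reshaping (my own addition, not from the answer above): with b and c known, the residual y - bx - c is linear in the single unknown a, so ordinary linear least squares via np.linalg.lstsq solves the problem directly:

import numpy as np

b, c = 0, 0
x = np.linspace(-5, 5)
y = x**2

# y - b*x - c == a*x^2 is linear in a, so solve it as a
# one-column least-squares problem
A = (x**2).reshape(-1, 1)
rhs = y - b*x - c
(a,), *_ = np.linalg.lstsq(A, rhs, rcond=None)
# a == 1.0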
Recently I was working on some data for which I was able to obtain a curve using curve_fit. After saving the plot and the values obtained, I returned to the same code later only to find it does not work.
#! python 3.5.2
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
from scipy.optimize import curve_fit

data = np.array([
    [24, 0.176644513],
    [27, 0.146382841],
    [30, 0.129891534],
    [33, 0.105370908],
    [38, 0.077820511],
    [50, 0.047407538]])

x, y = np.array([]), np.array([])
for val in data:
    x = np.append(x, val[0])
    y = np.append(y, (val[1]/(1-val[1])))

def f(x, a, b):
    return (np.exp(-a*x)**b)

# The original a and b values obtained
a = -0.2   # after rounding
b = -0.32  # after rounding

plt.scatter(x, y)
Xcurve = np.linspace(x[0], x[-1], 500)
plt.plot(Xcurve, f(Xcurve, a, b), ls='--', color='k', lw=1)
plt.show()

# the original code to get the values
a = b = 1
popt, pcov = curve_fit(f, x, y, (a, b))
Whereas previously curve_fit returned the values a, b = -0.2, -0.32, it now produces:
Warning (from warnings module):
File "C:/Users ... line 22
return (np.exp(-a*x)**b)
RuntimeWarning: overflow encountered in exp
The code as far as I am aware did not change. Thanks
Without knowing what changed in the code, it is hard to say what changed between your state of "working" and "not working". It may be that changes in the version of scipy you used give different results: there have been changes to the underlying implementation of curve_fit() over the past few years.
But also: curve_fit() (and the underlying Python and Fortran code it uses) requires reasonably good initial guesses for the parameters for many problems to work at all. With bad guesses for the parameters, many problems will fail.
Exponential decay problems seem to be especially challenging for the Levenberg-Marquardt algorithm (including the implementation used by curve_fit()), and do require reasonable starting points. It's also easy to get into a part of parameter space where the function evaluates to zero, and changes in the parameter values have no effect.
If possible, if your problem involves exponential decay, it is helpful to work in log space. That is, model log(f), not f itself. For your problem in particular, your model function is exp(-a*x)**b. Is that really what you mean? Since exp(-a*x)**b is just exp(-a*b*x), only the product a*b is determined by the data, so a and b will be exactly correlated.
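To make the log-space idea concrete (a minimal sketch of my own, assuming the single effective parameter c = a*b noted above): taking logs turns the model y = exp(-c*x) into a line through the origin, which plain linear least squares can fit:

import numpy as np

# same transformation of the data as in the question: y = p/(1-p)
data = np.array([[24, 0.176644513], [27, 0.146382841], [30, 0.129891534],
                 [33, 0.105370908], [38, 0.077820511], [50, 0.047407538]])
x = data[:, 0]
y = data[:, 1] / (1 - data[:, 1])

# model: y = exp(-c*x)  =>  log(y) = -c*x, linear in c
A = (-x).reshape(-1, 1)
(c,), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
# c plays the role of the product a*b from the original model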
In addition, you may find lmfit helpful. It has a Model class for curve-fitting, using similar underlying code, but allows fixing or setting bounds on any of the parameters. An example for your problem would be (approximately):
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
from scipy.optimize import curve_fit
import lmfit

data = np.array([
    [24, 0.176644513],
    [27, 0.146382841],
    [30, 0.129891534],
    [33, 0.105370908],
    [38, 0.077820511],
    [50, 0.047407538]])

x, y = np.array([]), np.array([])
for val in data:
    x = np.append(x, val[0])
    y = np.append(y, (val[1]/(1-val[1])))

def f(x, a, b):
    print("In f: a, b = ", a, b)
    return (np.exp(-a*x)**b)

fmod = lmfit.Model(f)
params = fmod.make_params(a=-0.2, b=-0.4)

# set bounds on parameters
params['a'].min = -2
params['a'].max = 0

# fix b: it will not be varied in the fit
params['b'].vary = False

out = fmod.fit(y, params, x=x)
print(out.fit_report())

plt.plot(x, y)
plt.plot(x, out.best_fit, '--')
plt.show()
Below are two programs, one written in Python 3 and the other in Wolfram Mathematica. The programs are meant to be equivalent, and therefore the results (plots) should be the same. But they give different plots.
The Python code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import k0, k1, i0, i1

k = 100.0
x = 0.0103406
B = 80.0

def fdens(f):
    return (1/2*(1 - f**2)**2 + f**4/2
            + 1/2*B*k*x**2*f**2*(1 - f**2)*np.log(1 + 2/(B*k*x**2))
            + (B*f**2*(1 + B*k*x**2))/((k*(2 + B*k*x**2))**2)
            - f**4/(2 + B*k*x**2)
            + (B*f)/(k*x)
            * (k0(f*x)*i1(f*np.sqrt(2/(k*B) + x**2))
               + i0(f*x)*k1(f*np.sqrt(2/(k*B) + x**2)))
            / (k1(f*x)*i1(f*np.sqrt(2/(k*B) + x**2))
               - i1(f*x)*k1(f*np.sqrt(2/(k*B) + x**2)))
            )

plt.figure(figsize=(10, 8), dpi=70)
X = np.linspace(0, 1, 100, endpoint=True)
C = fdens(X)
plt.plot(X, C, color="blue", linewidth=2.0, linestyle="-")
plt.show()
[plot: the Python result]
The Mathematica code:
k=100.;B=80.;
x=0.0103406;
func[f_]:=1/2*(1-f^2)^2+1/2*B*k*x^2*f^2*(1-f^2)*Log[1+2/(B*k*x^2)]+f^4/2-f^4/(2+B*k*x^2)+B*f^2*(1+B*k*x^2)/(k*(2+B*k*x^2)^2)+(B*f)/(k*x)*(BesselI[1, (f*Sqrt[2/(B*k) + x^2])]*BesselK[0, f*x] + BesselI[0, f*x]*BesselK[1, (f*Sqrt[2/(B*k) + x^2])])/(BesselI[1, (f*Sqrt[2/(B*k) + x^2])]*BesselK[1,f*x] - BesselI[1,f*x]*BesselK[1, (f*Sqrt[2/(B*k) + x^2])]);
Plot[func[f],{f,0,1}]
[plot: the Mathematica result (the correct one)]
The results are different. Does someone know why?
From my tests it looks like the first-order Bessel functions give different results. Both evaluate to Bessel(f * 0.0188925) initially, but the SciPy version gives me a range from 0 to 9.4e-3 where WolframAlpha (which uses a Mathematica backend) gives 0 to 1.4. I would dig a little deeper into this.
Additionally, Python uses standard C floating-point numbers while Mathematica uses symbolic operations. SymPy tries to mimic such symbolic operations in Python.
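One way to dig deeper (my suggestion, not part of the original answer): cross-check scipy.special against an independent arbitrary-precision implementation such as mpmath, which evaluates the same Bessel functions on its own:

import numpy as np
from scipy.special import i1
import mpmath

# compare I_1 at the arguments that appear in fdens, for f in (0, 1]
for f in np.linspace(0.1, 1.0, 4):
    arg = f * 0.0188925
    print(f, i1(arg), float(mpmath.besseli(1, arg)))
# if the two columns agree, the discrepancy lies elsewhere in the formula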
I am really new to Theano, and I am just trying to figure out some basic functionality. I have a tensor variable x, and I would like the function to return a tensor variable y of the same shape, but filled with the value 0.2. I am not sure how to define y.
For example, if x = [1, 2, 3, 4, 5], then I would like y = [0.2, 0.2, 0.2, 0.2, 0.2].
from theano import tensor, function
y = tensor.dmatrix('y')
masked_array = function([x],y)
There are probably a dozen different ways to do this, and which is best will depend on the context: how this piece of code/functionality fits into the wider program.
Here's one approach:
import theano
import theano.tensor as tt

x = tt.vector()
y = tt.ones_like(x) * 0.2
f = theano.function([x], outputs=y)

print(f([1, 2, 3, 4, 5]))
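For instance, one of those alternatives (my own addition): theano.tensor.fill builds the same constant-valued tensor directly from x:

import theano
import theano.tensor as tt

x = tt.vector()
# fill an array shaped like x with the scalar 0.2
y = tt.fill(x, 0.2)
f = theano.function([x], outputs=y)

print(f([1, 2, 3, 4, 5]))  # [0.2 0.2 0.2 0.2 0.2]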