Calculate the improper integral Monte Carlo method - transform

For exaple, I'm trying to transform the integral:
I need to transform it to an integral that goes from 0 to 1 (or from a to b) in order to apply the algorithm of Monte Carlo I implemented. I already have Monte Carlo function, which calculate definite integrals, but I need to transform this improper integral to definite integral and I really don't know how to do it.
Ideally, I want to transform this integral (from -inf to inf):
Also I tried to do transformation, which I found on Inthernet (integration by substitution: x=-ln(y), dx=-1/y), but it doesn't work:
So how can I get transformation of this integral?

I need to transform it to an integral that goes from 0 to 1 (or from a to b) in order to apply the algorithm of Monte Carlo I implemented.
No, you don't.
if you have integral
∫0∞ w(x) g(x) dx
you could sample from w(x) and compute mean value of g(x) at sampled points.
You integral
∫0∞ e-x cos(x) dx
is pretty perfect for such approach - you sample from e-x and compute E[cos(x)]
Along the lines (Python 3.9, Win10 x64)
import numpy as np
rng = np.random.default_rng()
N = 1000000
U = rng.random(N)
W = -np.log(1.0 - U) # sampling exp(-x)
G = np.cos(W)
ans = np.mean(G)
print(ans)
will print something like
0.5002769491719996
And concerning your second integral, see What is the issue in my array division step?

Related

Why my fit for a logarithm function looks so wrong

I'm plotting this dataset and making a logarithmic fit, but, for some reason, the fit seems to be strongly wrong, at some point I got a good enough fit, but then I re ploted and there were that bad fit. At the very beginning there were a 0.0 0.0076 but I changed that to 0.001 0.0076 to avoid the asymptote.
I'm using (not exactly this one for the image above but now I'm testing with this one and there is that bad fit as well) this for the fit
f(x) = a*log(k*x + b)
fit = fit f(x) 'R_B/R_B.txt' via a, k, b
And the output is this
Also, sometimes it says 7 iterations were as is the case shown in the screenshot above, others only 1, and when it did the "correct" fit, it did like 35 iterations or something and got a = 32 if I remember correctly
Edit: here is again the good one, the plot I got is this one. And again, I re ploted and get that weird fit. It's curious that if there is the 0.0 0.0076 when the good fit it's about to be shown, gnuplot says "Undefined value during function evaluation", but that message is not shown when I'm getting the bad one.
Do you know why do I keep getting this inconsistence? Thanks for your help
As I already mentioned in comments the method of fitting antiderivatives is much better than fitting derivatives because the numerical calculus of derivatives is strongly scattered when the data is slightly scatered.
The principle of the method of fitting an integral equation (obtained from the original equation to be fitted) is explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . The application to the case of y=a.ln(c.x+b) is shown below.
Numerical calculus :
In order to get even better result (according to some specified criteria of fitting) one can use the above values of the parameters as initial values for iterarive method of nonlinear regression implemented in some convenient software.
NOTE : The integral equation used in the present case is :
NOTE : On the above figure one can compare the result with the method of fitting an integral equation to the result with the method of fitting with derivatives.
Acknowledgements : Alex Sveshnikov did a very good work in applying the method of regression with derivatives. This allows an interesting and enlightening comparison. If the goal is only to compute approximative values of parameters to be used in nonlinear regression software both methods are quite equivalent. Nevertheless the method with integral equation appears preferable in case of scattered data.
UPDATE (After Alex Sveshnikov updated his answer)
The figure below was drawn in using the Alex Sveshnikov's result with further iterative method of fitting.
The two curves are almost indistinguishable. This shows that (in the present case) the method of fitting the integral equation is almost sufficient without further treatment.
Of course this not always so satisfying. This is due to the low scatter of the data.
In ADDITION , answer to a question raised in comments by CosmeticMichu :
The problem here is that the fit algorithm starts with "wrong" approximations for parameters a, k, and b, so during the minimalization it finds a local minimum, not the global one. You can improve the result if you provide the algorithm with starting values, which are close to the optimal ones. For example, let's start with the following parameters:
gnuplot> a=47.5087
gnuplot> k=0.226
gnuplot> b=1.0016
gnuplot> f(x)=a*log(k*x+b)
gnuplot> fit f(x) 'R_B.txt' via a,k,b
....
....
....
After 40 iterations the fit converged.
final sum of squares of residuals : 16.2185
rel. change during last iteration : -7.6943e-06
degrees of freedom (FIT_NDF) : 18
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.949225
variance of residuals (reduced chisquare) = WSSR/ndf : 0.901027
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 35.0415 +/- 2.302 (6.57%)
k = 0.372381 +/- 0.0461 (12.38%)
b = 1.07012 +/- 0.02016 (1.884%)
correlation matrix of the fit parameters:
a k b
a 1.000
k -0.994 1.000
b 0.467 -0.531 1.000
The resulting plot is
Now the question is how you can find "good" initial approximations for your parameters? Well, you start with
If you differentiate this equation you get
or
The left-hand side of this equation is some constant 'C', so the expression in the right-hand side should be equal to this constant as well:
In other words, the reciprocal of the derivative of your data should be approximated by a linear function. So, from your data x[i], y[i] you can construct the reciprocal derivatives x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) and the linear fit of these data:
The fit gives the following values:
C*k = 0.0236179
C*b = 0.106268
Now, we need to find the values for a, and C. Let's say, that we want the resulting graph to pass close to the starting and the ending point of our dataset. That means, that we want
a*log(k*x1 + b) = y1
a*log(k*xn + b) = yn
Thus,
a*log((C*k*x1 + C*b)/C) = a*log(C*k*x1 + C*b) - a*log(C) = y1
a*log((C*k*xn + C*b)/C) = a*log(C*k*xn + C*b) - a*log(C) = yn
By subtracting the equations we get the value for a:
a = (yn-y1)/log((C*k*xn + C*b)/(C*k*x1 + C*b)) = 47.51
Then,
log(k*x1+b) = y1/a
k*x1+b = exp(y1/a)
C*k*x1+C*b = C*exp(y1/a)
From this we can calculate C:
C = (C*k*x1+C*b)/exp(y1/a)
and finally find the k and b:
k=0.226
b=1.0016
These are the values used above for finding the better fit.
UPDATE
You can automate the process described above with the following script:
# Name of the file with the data
data='R_B.txt'
# The coordinates of the last data point
xn=NaN
yn=NaN
# The temporary coordinates of a data point used to calculate a derivative
x0=NaN
y0=NaN
linearFit(x)=Ck*x+Cb
fit linearFit(x) data using (xn=$1,dx=$1-x0,x0=$1,$1):(yn=$2,dy=$2-y0,y0=$2,dx/dy) via Ck, Cb
# The coordinates of the first data point
x1=NaN
y1=NaN
plot data using (x1=$1):(y1=$2) every ::0::0
a=(yn-y1)/log((Ck*xn+Cb)/(Ck*x1+Cb))
C=(Ck*x1+Cb)/exp(y1/a)
k=Ck/C
b=Cb/C
f(x)=a*log(k*x+b)
fit f(x) data via a,k,b
plot data, f(x)
pause -1

Bachelier Normal Implied Vol Python Calculation (Help) Jekel

Writing a python script to calc Implied Normal Vol ; in line with Jekel article (Industry Standard).
https://jaeckel.000webhostapp.com/ImpliedNormalVolatility.pdf
They say they are using a Generalized Incomplete Gamma Function Inverse.
For a call:
F(x)=v/(K - F) -> find x that makes this true
Where F is Inverse Incomplete Gamma Function
And x = (K - F)/(T*sqrt(T) ; v is the value of a call
for that x, IV is =(K-F)/x*sqrt(T)
Example I am working with:
F=40
X=38
T=100/365
v=5.25
Vol= 20%
Using the equations I should be able to backout Vol of 20%
Scipy has upper and lower Incomplete Gamma Function Inverse in their special functions.
Lower: scipy.special.gammaincinv(a, y) : {a must be positive param}
Upper: scipy.special.gammainccinv(a, y) : {a must be positive param}
Implementation:
SIG= sympy.symbols('SIG')
F=40
T=100/365
K=38
def Objective(sig):
SIG=sig
return(special.gammaincinv(.5,((F-K)**2)/(2*T*SIG**2))+special.gammainccinv(.5,((F-K)**2)/(2*T*SIG**2))+5.25/(K-F))
x=optimize.brentq(Objective, -20.00,20.00, args=(), xtol=1.48e-8, rtol=1.48e-8, maxiter=1000, full_output=True)
IV=(K-F)/x*T**.5
Print(IV)
I know I am wrong, but Where am I going wrong / how do I fix it and use what I read in the article ?
Did you also post this on the Quantitative Finance Stack Exchange? You may get a better response there.
This is not my field, but it looks like your main problem is that brentq requires the passed Objective function to return values with opposite signs when passed the -20 and 20 arguments. However, this will not end up happening because according to the scipy docs, gammaincinv and gammainccinv always return a value between 0 and infinity.
I'm not sure how to fix this, unfortunately. Did you try implementing the analytic solution (rather than iterative root finding) in the second part of the paper?

Rendering Fractals: The Mobius Transformation and The Newtonian Basin

I understand how to render (two dimensional) "Escape Time Group" fractals (Julia and Mandelbrot), but I can't seem to get a Mobius Transformation or a Newton Basin rendered.
I'm trying to render them using the same method (by recursively using the polynomial equation on each pixel 'n' times), but I have a feeling these fractals are rendered using totally different methods. Mobius 'Transformation' implies that an image must already exist, and then be transformed to produce the geometry, and the Newton Basin seems to plot each point, not just points that fall into a set.
How are these fractals graphed? Are they graphed using the same iterative methods as the Julia and Mandelbrot?
Equations I'm Using:
Julia: Zn+1 = Zn^2 + C
Where Z is a complex number representing a pixel, and C is a complex constant (Correct).
Mandelbrot: Cn+1 = Cn^2 + Z
Where Z is a complex number representing a pixel, and C is the complex number (0, 0), and is compounded each step (The reverse of the Julia, correct).
Newton Basin: Zn+1 = Zn - (Zn^x - a) / (Zn^y - a)
Where Z is a complex number representing a pixel, x and y are exponents of various degrees, and a is a complex constant (Incorrect - creating a centered, eight legged 'line star').
Mobius Transformation: Zn+1 = (aZn + b) / (cZn + d)
Where Z is a complex number representing a pixel, and a, b, c, and d are complex constants (Incorrect, everything falls into the set).
So how are the Newton Basin and Mobius Transformation plotted on the complex plane?
Update: Mobius Transformations are just that; transformations.
"Every Möbius transformation is
a composition of translations,
rotations, zooms (dilations) and
inversions."
To perform a Mobius Transformation, a shape, picture, smear, etc. must be present already in order to transform it.
Now how about those Newton Basins?
Update 2: My math was wrong for the Newton Basin. The denominator at the end of the equation is (supposed to be) the derivative of the original function. The function can be understood by studying 'NewtonRoot.m' from the MIT MatLab source-code. A search engine can find it quite easily. I'm still at a loss as to how to graph it on the complex plane, though...
Newton Basin:
f(x) = x - f(x) / f'(x)
In Mandelbrot and Julia sets you terminate the inner loop if it exceeds a certain threshold as a measurement how fast the orbit "reaches" infinity
if(|z| > 4) { stop }
For newton fractals it is the other way round: Since the newton method is usually converging towards a certain value we are interested how fast it reaches its limit, which can be done by checking when the difference of two consecutive values drops below a certain value (usually 10^-9 is a good value)
if(|z[n] - z[n-1]| < epsilon) { stop }

curve fitting with integer inputs Python 3.3

I am using scipy's curvefit module to fit a function and wanted to know if there is a way to tell it the the only possible entries are integers not real numbers? Any ideas as to another way of doing this?
In its general form, an integer programming problem is NP-hard ( see here ). There are some efficient heuristic or approximate algorithm to solve this problem, but none guarantee an exact optimal solution.
In scipy you may implement a grid search over the integer coefficients and use, say, curve_fit over the real parameters for the given integer coefficients. As for grid search. scipy has brute function.
For example if y = a * x + b * x^2 + some-noise where a has to be integer this may work:
Generate some test data with a = 5 and b = -1.5:
coef, n = [5, - 1.5], 50
xs = np.linspace(0, 10, n)[:,np.newaxis]
xs = np.hstack([xs, xs**2])
noise = 2 * np.random.randn(n)
ys = np.dot(xs, coef) + noise
A function which given the integer coefficients fits the real coefficient using curve_fit method:
def optfloat(intcoef, xs, ys):
from scipy.optimize import curve_fit
def poly(xs, floatcoef):
return np.dot(xs, [intcoef, floatcoef])
popt, pcov = curve_fit(poly, xs, ys)
errsqr = np.linalg.norm(poly(xs, popt) - ys)
return dict(errsqr=errsqr, floatcoef=popt)
A function which given the integer coefficients, uses the above function to optimize the float coefficient and returns the error:
def errfun(intcoef, *args):
xs, ys = args
return optfloat(intcoef, xs, ys)['errsqr']
Minimize errfun using scipy.optimize.brute to find optimal integer coefficient and call optfloat with the optimized integer coefficient to find the optimal real coefficient:
from scipy.optimize import brute
grid = [slice(1, 10, 1)] # grid search over 1, 2, ..., 9
# it is important to specify finish=None in below
intcoef = brute(errfun, grid, args=(xs, ys,), finish=None)
floatcoef = optfloat(intcoef, xs, ys)['floatcoef'][0]
Using this method I obtain [5.0, -1.50577] for the optimal coefficients, which is exact for the integer coefficient, and close enough for the real coefficient.
In general, the answer is No: scipy.optimize.curve_fit() and leastsq() that it is based on, and (AFAIK) all the other solvers in scipy.optimize work strictly on floating point numbers.
You could try increasing the value of epsfcn (which has a default value of numpy.finfo('double').eps ~ 2.e-16), which would be used as the initial step to all variables in the problem. The basic issue is that the fitting algorithm will adjust a floating point number, and if you do
int_var = int(float_var)
and the algorithm changes float_var from 1.0 to 1.00000001, it will see no difference in the result and decide that that value does not actually alter the fit metric.
Another approach would be to have a floating point parameter 'tmp_float_var' that is freely adjusted by the fitting algorithm but then in your objective function use
int_var = int(tmp_float_var / numpy.finfo('double').eps)
as the value for your integer variable. That might need a little tweaking, and might be a little unstable, but ought to work.

Can good type systems distinguish between matrices in different bases?

My program (Hartree-Fock/iterative SCF) has two matrices F and F' which are really the same matrix expressed in two different bases. I just lost three hours of debugging time because I accidentally used F' instead of F. In C++, the type-checker doesn't catch this kind of error because both variables are Eigen::Matrix<double, 2, 2> objects.
I was wondering, for the Haskell/ML/etc. people, whether if you were writing this program you would have constructed a type system where F and F' had different types? What would that look like? I'm basically trying to get an idea how I can outsource some logic errors onto the type checker.
Edit: The basis of a matrix is like the unit. You can say 1L or however many gallons, they both mean the same thing. Or, to give a vector example, you can say (0,1) in Cartesian coordinates or (1,pi/2) in polar. But even though the meaning is the same, the numerical values are different.
Edit: Maybe units was the wrong analogy. I'm not looking for some kind of record type where I can specify that the first field will be litres and the second gallons, but rather a way to say that this matrix as a whole, is defined in terms of some other matrix (the basis), where the basis could be any matrix of the same dimensions. E.g., the constructor would look something like mkMatrix [[1, 2], [3, 4]] [[5, 6], [7, 8]] and then adding that object to another matrix would type-check only if both objects had the same matrix as their second parameters. Does that make sense?
Edit: definition on Wikipedia, worked examples
This is entirely possible in Haskell.
Statically checked dimensions
Haskell has arrays with statically checked dimensions, where the dimensions can be manipulated and checked statically, preventing indexing into the wrong dimension. Some examples:
This will only work on 2-D arrays:
multiplyMM :: Array DIM2 Double -> Array DIM2 Double -> Array DIM2 Double
An example from repa should give you a sense. Here, taking a diagonal requires a 2D array, returns a 1D array of the same type.
diagonal :: Array DIM2 e -> Array DIM1 e
or, from Matt sottile's repa tutorial, statically checked dimensions on a 3D matrix transform:
f :: Array DIM3 Double -> Array DIM2 Double
f u =
let slabX = (Z:.All:.All:.(0::Int))
slabY = (Z:.All:.All:.(1::Int))
u' = (slice u slabX) * (slice u slabX) +
(slice u slabY) * (slice u slabY)
in
R.map sqrt u'
Statically checked units
Another example from outside of matrix programming: statically checked units of dimension, making it a type error to confuse e.g. feet and meters, without doing the conversion.
Prelude> 3 *~ foot + 1 *~ metre
1.9144 m
or for a whole suite of SI units and quanities.
E.g. can't add things of different dimension, such as volumes and lengths:
> 1 *~ centi litre + 2 *~ inch
Error:
Expected type: Unit DVolume a1
Actual type: Unit DLength a0
So, following the repa-style array dimension types, I'd suggest adding a Base phantom type parameter to your array type, and using that to distinguish between bases. In Haskell, the index Dim
type argument gives the rank of the array (i.e. its shape), and you could do similarly.
Or, if by base you mean some dimension on the units, using dimensional types.
So, yep, this is almost a commodity technique in Haskell now, and there's some examples of designing with types like this to help you get started.
This is a very good question. I don't think you can encode the notion of a basis in most type systems, because essentially anything that the type checker does needs to be able to terminate, and making judgments about whether two real-valued vectors are equal is too difficult. You could have (2 v_1) + (2 v_2) or 2 (v_1 + v_2), for example. There are some languages which use dependent types [ wikipedia ], but these are relatively academic.
I think most of your debugging pain would be alleviated if you simply encoded the bases in which you matrix works along with the matrix. For example,
newtype Matrix = Matrix { transform :: [[Double]],
srcbasis :: [Double], dstbasis :: [Double] }
and then, when you M from basis a to b with N, check that N is from b to c, and return a matrix with basis a to c.
NOTE -- it seems most people here have programming instead of math background, so I'll provide short explanation here. Matrices are encodings of linear transformations between vector spaces. For example, if you're encoding a rotation by 45 degrees in R^2 (2-dimensional reals), then the standard way of encoding this in a matrix is saying that the standard basis vector e_1, written "[1, 0]", is sent to a combination of e_1 and e_2, namely [1/sqrt(2), 1/sqrt(2)]. The point is that you can encode the same rotation by saying where different vectors go, for example, you could say where you're sending [1,1] and [1,-1] instead of e_1=[1,0] and e_2=[0,1], and this would have a different matrix representation.
Edit 1
If you have a finite set of bases you are working with, you can do it...
{-# LANGUAGE EmptyDataDecls #-}
data BasisA
data BasisB
data BasisC
newtype Matrix a b = Matrix { coefficients :: [[Double]] }
multiply :: Matrix a b -> Matrix b c -> Matrix a c
multiply (Matrix a_coeff) (Matrix b_coeff) = (Matrix multiplied) :: Matrix a c
where multiplied = undefined -- your algorithm here
Then, in ghci (the interactive Haskell interpreter),
*Matrix> let m = Matrix [[1, 2], [3, 4]] :: Matrix BasisA BasisB
*Matrix> m `multiply` m
<interactive>:1:13:
Couldn't match expected type `BasisB'
against inferred type `BasisA'
*Matrix> let m2 = Matrix [[1, 2], [3, 4]] :: Matrix BasisB BasisC
*Matrix> m `multiply` m2
-- works after you finish defining show and the multiplication algorithm
While I realize this does not strictly address the (clarified) question – my apologies – it seems relevant at least in relation to Don Stewart's popular answer...
I am the author of the Haskell dimensional library that Don referenced and provided examples from. I have also been writing – somewhat under the radar – an experimental rudimentary linear algebra library based on dimensional. This linear algebra library statically tracks the sizes of vectors and matrices as well as the physical dimensions ("units") of their elements on a per element basis.
This last point – tracking physical dimensions on a per element basis – is rather challenging and perhaps overkill for most uses, and one could even argue that it makes little mathematical sense to have quantities of different physical dimensions as elements in any given vector/matrix. However, some linear algebra applications of interest to me such as kalman filtering and weighted least squares estimation typically use heterogeneous state vectors and covariance matrices.
Using a Kalman filter as an example, consider a state vector x = [d, v] which has physical dimensions [L, LT^-1]. The next (future) state vector is predicted by multiplication by the state transition matrix F, i.e.: x' = F x_. Clearly for this equation to make sense F cannot be arbitrary but must have size and physical dimensions [[1, T], [T^-1, 1]]. The predict_x' function below statically ensures that this relationship holds:
predict_x' :: (Num a, MatrixVector f x x) => Mat f a -> Vec x a -> Vec x a
predict_x' f x_ = f |*< x_
(The unsightly operator |*< denotes multiplication of a matrix on the left with a vector on the right.)
More generally, for an a priori state vector x_ of arbitrary size and with elements of arbitrary physical dimensions, passing a state transition matrix f with "incompatible" size and/or physical dimensions to predict_x' will cause a compile time error.
In F# (which originally evolved from OCaml), you can use units of measure. Andrew Kenned, who designed the feature (and also created a very interesting theory behind it) has a great series of articles that demonstrate it.
This can quite likely be used in your scenario - although I don't fully understand the question. For example, you can declare two unit types like this:
[<Measure>] type litre
[<Measure>] type gallon
Adding litres and gallons gives you a compile time error:
1.0<litre> + 1.0<gallon> // Error!
F# doesn't automatically insert conversion between different units, but you can write a conversion function:
let toLitres gal = gal * 3.78541178<litre/gallon>
1.0<litre> + (toLitres 1.0<gallon>)
The beautiful thing about units of measure in F# is that they are automatically inferred and functions are generic. If you multiply 1.0<gallon> * 1.0<gallon>, the result is 1.0<gallon^2>.
People have used this feature for various things - ranging from conversion of virtual meters to screen pixels (in solar system simulations) to converting currencies (dollars in financial systems). Although I'm not expert, it is quite likely that you could use it in some way for your problem domain too.
If it's expressed in a different base, you can just add a template parameter to act as the base. That will differentiate those types. A float is a float is a float- if you don't want two float values to be the same if they actually have the same value, then you need to tell the type system about it.

Resources