Python nomarlization - python-3.x

I have some float numbers like:
586.3212341231,-847.3829941845
I want to use sigmoid function to make the floats in a range of [1,-1] for example :
0.842931342,-0.481238571
Any thoughts about it?
I tried scipy but it gives me wrong outcomes.

There are many such functions: see this Wikipedia link for examples. The graphic there in particular gives examples resulting in your desired range, though the endpoints 1 and -1 are not obtainable for finite values of the parameter x.
The simplest function there is
x / (1 + abs(x))
If you really want your first two sample float numbers to be mapped to your second two float numbers, you can tune your function to be
(a + x) / (b + abs(x))
for particular values of a and b. For two desired values of x and f(x) you can find a and b by solving two simultaneous linear equations. I used sympy to get the following for your sample values:
a = 246.362120860444
b = 401.521207478205
So your final resulting function is
(246.362120860444 + x) / (401.521207478205 + abs(x))
I tested this and it gives just the values you want. Here are two plots showing that this gives your desired range (-1, 1). The first one shows your two points best.

Related

Why my fit for a logarithm function looks so wrong

I'm plotting this dataset and making a logarithmic fit, but, for some reason, the fit seems to be strongly wrong, at some point I got a good enough fit, but then I re ploted and there were that bad fit. At the very beginning there were a 0.0 0.0076 but I changed that to 0.001 0.0076 to avoid the asymptote.
I'm using (not exactly this one for the image above but now I'm testing with this one and there is that bad fit as well) this for the fit
f(x) = a*log(k*x + b)
fit = fit f(x) 'R_B/R_B.txt' via a, k, b
And the output is this
Also, sometimes it says 7 iterations were as is the case shown in the screenshot above, others only 1, and when it did the "correct" fit, it did like 35 iterations or something and got a = 32 if I remember correctly
Edit: here is again the good one, the plot I got is this one. And again, I re ploted and get that weird fit. It's curious that if there is the 0.0 0.0076 when the good fit it's about to be shown, gnuplot says "Undefined value during function evaluation", but that message is not shown when I'm getting the bad one.
Do you know why do I keep getting this inconsistence? Thanks for your help
As I already mentioned in comments the method of fitting antiderivatives is much better than fitting derivatives because the numerical calculus of derivatives is strongly scattered when the data is slightly scatered.
The principle of the method of fitting an integral equation (obtained from the original equation to be fitted) is explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . The application to the case of y=a.ln(c.x+b) is shown below.
Numerical calculus :
In order to get even better result (according to some specified criteria of fitting) one can use the above values of the parameters as initial values for iterarive method of nonlinear regression implemented in some convenient software.
NOTE : The integral equation used in the present case is :
NOTE : On the above figure one can compare the result with the method of fitting an integral equation to the result with the method of fitting with derivatives.
Acknowledgements : Alex Sveshnikov did a very good work in applying the method of regression with derivatives. This allows an interesting and enlightening comparison. If the goal is only to compute approximative values of parameters to be used in nonlinear regression software both methods are quite equivalent. Nevertheless the method with integral equation appears preferable in case of scattered data.
UPDATE (After Alex Sveshnikov updated his answer)
The figure below was drawn in using the Alex Sveshnikov's result with further iterative method of fitting.
The two curves are almost indistinguishable. This shows that (in the present case) the method of fitting the integral equation is almost sufficient without further treatment.
Of course this not always so satisfying. This is due to the low scatter of the data.
In ADDITION , answer to a question raised in comments by CosmeticMichu :
The problem here is that the fit algorithm starts with "wrong" approximations for parameters a, k, and b, so during the minimalization it finds a local minimum, not the global one. You can improve the result if you provide the algorithm with starting values, which are close to the optimal ones. For example, let's start with the following parameters:
gnuplot> a=47.5087
gnuplot> k=0.226
gnuplot> b=1.0016
gnuplot> f(x)=a*log(k*x+b)
gnuplot> fit f(x) 'R_B.txt' via a,k,b
....
....
....
After 40 iterations the fit converged.
final sum of squares of residuals : 16.2185
rel. change during last iteration : -7.6943e-06
degrees of freedom (FIT_NDF) : 18
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.949225
variance of residuals (reduced chisquare) = WSSR/ndf : 0.901027
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 35.0415 +/- 2.302 (6.57%)
k = 0.372381 +/- 0.0461 (12.38%)
b = 1.07012 +/- 0.02016 (1.884%)
correlation matrix of the fit parameters:
a k b
a 1.000
k -0.994 1.000
b 0.467 -0.531 1.000
The resulting plot is
Now the question is how you can find "good" initial approximations for your parameters? Well, you start with
If you differentiate this equation you get
or
The left-hand side of this equation is some constant 'C', so the expression in the right-hand side should be equal to this constant as well:
In other words, the reciprocal of the derivative of your data should be approximated by a linear function. So, from your data x[i], y[i] you can construct the reciprocal derivatives x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) and the linear fit of these data:
The fit gives the following values:
C*k = 0.0236179
C*b = 0.106268
Now, we need to find the values for a, and C. Let's say, that we want the resulting graph to pass close to the starting and the ending point of our dataset. That means, that we want
a*log(k*x1 + b) = y1
a*log(k*xn + b) = yn
Thus,
a*log((C*k*x1 + C*b)/C) = a*log(C*k*x1 + C*b) - a*log(C) = y1
a*log((C*k*xn + C*b)/C) = a*log(C*k*xn + C*b) - a*log(C) = yn
By subtracting the equations we get the value for a:
a = (yn-y1)/log((C*k*xn + C*b)/(C*k*x1 + C*b)) = 47.51
Then,
log(k*x1+b) = y1/a
k*x1+b = exp(y1/a)
C*k*x1+C*b = C*exp(y1/a)
From this we can calculate C:
C = (C*k*x1+C*b)/exp(y1/a)
and finally find the k and b:
k=0.226
b=1.0016
These are the values used above for finding the better fit.
UPDATE
You can automate the process described above with the following script:
# Name of the file with the data
data='R_B.txt'
# The coordinates of the last data point
xn=NaN
yn=NaN
# The temporary coordinates of a data point used to calculate a derivative
x0=NaN
y0=NaN
linearFit(x)=Ck*x+Cb
fit linearFit(x) data using (xn=$1,dx=$1-x0,x0=$1,$1):(yn=$2,dy=$2-y0,y0=$2,dx/dy) via Ck, Cb
# The coordinates of the first data point
x1=NaN
y1=NaN
plot data using (x1=$1):(y1=$2) every ::0::0
a=(yn-y1)/log((Ck*xn+Cb)/(Ck*x1+Cb))
C=(Ck*x1+Cb)/exp(y1/a)
k=Ck/C
b=Cb/C
f(x)=a*log(k*x+b)
fit f(x) data via a,k,b
plot data, f(x)
pause -1

How do i Square Root a Function in VBA

I am working on a MonteCarlo simulation model and part of it is to calculate the following formula:
X = Sqr(1-p)Y + Sqr(p)Z,
Where:
Y and Z are randomly obtained values based (idiosyncratic and systematic factors, respectviely) on a standard normal (inv.) distribution, calculated as:
Application.WorksheetFunction.NormInv (Rnd(), mean, sd)
p represents a correlation factor.
My aim is to square root a recalled formula, however when I try the following (inserting the first Sqr), it does not work and gives an error:
Matrix (n, sims) = (R * Sqr(Application.WorksheetFunction.NormInv(Rnd(), mean, sd))) + (Sqr(1 - R) * RandomS(s, x))
where:
R: Correlation factor
RandomS(s,x): generated matrix with Z values.
I don't want to go into too much details about the background and other variables, as the only problem I am getting is with Square Rooting the equation.
Error message I recieve reads:
Run-time error '5':
Invalid procedure call or argument
When I click debug it takes me to the formula, therefore there must be something wrong with the syntax.
Can you help with directly squaring the formula?
Thank you!
Andrew
Square root is simply Sqr.
It works fine in Excel VBA, so for example:
MsgBox Sqr(144)
...returns 12.
Just don't confuse it with the syntax for a worksheet function with is SQRT.
If you're still having an issue with your formula, tit must be with something other than the Square Root function, and I'd suggest you check the values of your variable, and make sure they are properly declared (preferably with Option Explicit at the top of the module).
Also make sure that you're passing Sqr a positive number.
Documentation: Sqr Function
I'm not a math major, but with your formula:
X = Sqr(1-p)Y + Sqr(p)Z,
...you specified how Y and Z are calculated, so calculate them separately to keep it simple:
Dim X as Double, Y as Double, Z as Double
Y = Application.WorksheetFunction.NormInv (Rnd(), mean, sd)
Z = Application.WorksheetFunction.NormInv (Rnd(), mean, sd)
Assuming the comma is not supposed to be in the formula, and having no idea what p is, your final code to calculate X is:
X = Sqr(1-p) * Y + Sqr(p) * Z

Scipy.integrate gives odd results; are there best practices?

I am still struggling with scipy.integrate.quad.
Sparing all the details, I have an integral to evaluate. The function is of the form of the integral of a product of functions in x, like so:
Z(k) = f(x) g(k/x) / abs(x)
I know for certain the range of integration is between tow positive numbers. Oddly, when I pick a wide range that I know must contain all values of x that are positive - like integrating from 1 to 10,000,000 - it intgrates fast and gives an answer which looks right. But when I fingure out the exact limits - which I know sice f(x) is zero over a lot of the real line - and use those, I get another answer that is different. They aren't very different, though I know the second is more accurate.
After much fiddling I got it to work OK, but then needed to add in an exopnentiation - I was at least getting a 'smooth' answer for the computed function of z. I had this working in an OK way before I added in the exponentiation (which is needed), but now the function that gets generated (z) becomes more and more oscillatory and peculiar.
Any idea what is happening here? I know this code comes from an old Fortran library, so there must be some known issues, but I can't find references.
Here is the core code:
def normal(x, mu, sigma) :
return (1.0/((2.0*3.14159*sigma**2)**0.5)*exp(-(x-
mu)**2/(2*sigma**2)))
def integrand(x, z, mu, sigma, f) :
return np.exp(normal(z/x, mu, sigma)) * getP(x, f._x, f._y) / abs(x)
for _z in range (int(z_min), int(z_max) + 1, 1000):
z.append(_z)
pResult = quad(integrand, lb, ub,
args=(float(_z), MU-SIGMA**2/2, SIGMA, X),
points = [100000.0],
epsabs = 1, epsrel = .01) # drop error estimate of tuple
p.append(pResult[0]) # drop error estimate of tuple
By the way, getP() returns a linearly interpolated, piecewise continuous,but non-smooth function to give the integrator values that smoothly fit between the discrete 'buckets' of the histogram.
As with many numerical methods, it can be very sensitive to asymptotes, zeros, etc. The only choice is to keep giving it 'hints' if it will accept them.

Python. Finding the Quadratics Equation given Its Roots

Kindly help me write a simple python program that takes the two roots of a quadratics equation as functions and returns the complete quadratics equation..Sort of reversing the quadratics equation
So, you're given two roots, let's call them a and b.
The math is as follows:
y = (x - a)(x - b)
y = x^2 - (a + b)x + ab
(Note that the right side of the equation can be multiplied by an arbitrary number. There is no way to determine what that number is with the given information.)
Try implementing that and come back here with any questions related to your own code.

Reverse Interpolation

I have a class implementing an audio stream that can be read at varying speed (including reverse and fast varying / "scratching")... I use linear interpolation for the read part and everything works quite decently..
But now I want to implement writing to the stream at varying speed as well and that requires me to implement a kind of "reverse interpolation" i.e. Deduce the input sample vector Z that, interpolated with vector Y will produce the output X (which I'm trying to write)..
I've managed to do it for constant speeds, but generalising for varying speeds (e.g accelerating or decelerating) is proving more complicated..
I imagine this problem has been solved repeatedly, but I can't seem to find many clues online, so my specific question is if anyone has heard of this problem and can point me in the right direction (or, even better, show me a solution :)
Thanks!
I would not call it "reverse interpolation" as that does not exists (my first thought was you were talking about extrapolation!). What you are doing is still simply interpolation, just at an uneven rate.
Interpolation: finding a value between known values
Extrapolation: finding a value beyond known values
Interpolating to/from constant rates is indeed much much simpler than the generic quest of "finding a value between known values". I propose 2 solutions.
1) Interpolate to a significantly higher rate, and then just sub-sample to the nearest one (try adding dithering)
2) Solve the generic problem: for each point you need to use the neighboring N points and fit a order N-1 polynomial to them.
N=2 would be linear and would add overtones (C0 continuity)
N=3 could leave you with step changes at the halfway point between your source samples (perhaps worse overtones than N=2!)
N=4 will get you C1 continuity (slope will match as you change to the next sample), surely enough for your application.
Let me explain that last one.
For each output sample use the 2 previous and 2 following input samples. Call them S0 to S3 on a unit time scale (multiply by your sample period later), and you are interpolating from time 0 to 1. Y is your output and Y' is the slope.
Y will be calculated from this polynomial and its differential (slope)
Y(t) = At^3 + Bt^2 + Ct + D
Y'(t) = 3At^2 + 2Bt + C
The constraints (the values and slope at the endpoints on either side)
Y(0) = S1
Y'(0) = (S2-S0)/2
Y(1) = S2
Y'(1) = (S3-S1)/2
Expanding the polynomial
Y(0) = D
Y'(0) = C
Y(1) = A+B+C+D
Y'(1) = 3A+2B+C
Plugging in the Samples
D = S1
C = (S2-S0)/2
A + B = S2 - C - D
3A+2B = (S3-S1)/2 - C
The last 2 are a system of equations that are easily solvable. Subtract 2x the first from the second.
3A+2B - 2(A+B)= (S3-S1)/2 - C - 2(S2 - C - D)
A = (S3-S1)/2 + C - 2(S2 - D)
Then B is
B = S2 - A - C - D
Once you have A, B, C and D you can put in an time 't' in the polynomial to find a sample value between your known samples.
Repeat for every output sample, reuse A,B,C&D if the next output sample is still between the same 2 input samples. Calculating t each time is similar to Bresenham's line algorithm, you're just advancing by a different amount each time.

Resources