Fit of increasing sinusoidal function - trigonometry

I want to make a sinusoidal fit to my data. I got it working for a constant amplitude, see the Figure below and my code:
Besides that, my data is also slightly skewed to the right. Is it possible to encorporate this as well in to fit? See below the code on how far I got.
def Fit_Sine(tt, yy):
tt = np.array(tt)
yy = np.array(yy)
ff = np.fft.fftfreq(len(tt), (tt[1]-tt[0])) # assume uniform spacing
Fyy = abs(np.fft.fft(yy))
guess_freq = abs(ff[np.argmax(Fyy[1:])+1]) # excluding the zero frequency "peak", which is related to offset
guess_amp = np.std(yy) * 2.**0.5
guess_offset = np.mean(yy)
guess_phase = 0.
guess_damping = 1.5
guess = np.array([guess_amp, 2.*np.pi*guess_freq, guess_phase])
def sinfunc(t, A, w, p): return A * np.sin(w*t + p)
popt, pcov = scipy.optimize.curve_fit(sinfunc, tt, yy, p0=guess)
A, w, p = popt
f = w/(2.*np.pi)
fitfunc = lambda t: A * np.sin(w*t + p)
res = {"amp": A, "omega": w, "phase": p, "freq": f,
"period": 1./f, "fitfunc": fitfunc, "maxcov": np.max(pcov), "rawres": (guess,popt,pcov)}
x_fit = np.arange(0, len(tt))
fit = res["fitfunc"](x_fit)
return fit
The reason for this is that I want to smooth out the bump in the top:
Other suggestions for achieving this are also welcome.
Regards,
Dante

Related

should I use basinhopping instead of minimize to find colors based on a small set of transforms?

question up front
def shade_func(color, offset):
return tuple([int(c * (1 - offset)) for c in color])
def tint_func(color, offset):
return tuple([int(c + (255 - c) * offset) for c in color])
def tone_func(color, offset):
return tuple([int(c * (1 - offset) + 128 * offset) for c in color])
given an objective over a collection of colors that returns the least distance to a target color, how do I ensure that basinhopping isn't better than minimization in scikit learn?
I was thinking that, for any one color, there will be up to 4 moments in a v-shaped curve, and so only one minimum. if the value with offset zero is itself a minimum, maybe it could be 5. Am I wrong? In any case each is a single optimum, so if we are only searching one color at a time, no reason to use basinhopping.
If we instead use basinhopping to scan all colors at once (we can scale the two different dimensions, in fact this is where the idea of a preprocessor function first came from), it scans them, but does not do such a compelling just of scanning all colors. Some colors it only tries once. I think it might completely skip some colors with large enough sets.
details
I was inspired by way artyclick shows colors and allows searching for them. If you look at an individual color, for example mauve you'll notice that it prominently displays the shades, tints, and tones of the color, rather like an artist might like. If you ask it for the name of a color, it will use a hidden unordered list of about a thousand color names, and some javascript to find the nearest colorname to the color you chose. In fact it will also show alternatives.
I noticed that quite often a shade, tint or tone of an alternative (or even the best match) was often a better match than the color it provided. For those who don't know about shade, tint and tone, there's a nice write up at Dunn Edward's Paints. It looks like shade and tint are the same but with signs reversed, if doing this on tuples representing colors. For tone it is different, a negative value would I think saturate the result.
I felt like there must be authoritative (or at least well sourced) colorname sources it could be using.
In terms of the results, since I want any color or its shade/tint/tone, I want a result like this:
{'color': '#aabbcc',
'offset': {'type': 'tint', 'value': 0.31060384614807254}}
So I can return the actual color name from the color, plus the type of color transform to get there and the amount of you have to go.
For distance of colors, there is a great algorithm that is meant to model human perception that I am using, called CIEDE 2000. Frankly, I'm just using a snippet I found that implements this, it could be wrong.
So now I want to take in two colors, compare their shade, tint, and tone to a target color, and return the one with the least distance. After I am done, I can reconstruct if it was a shade, tint or tone transform from the result just by running all three once and choosing the best fit. With that structure, I can iterate over every color, and that should do it. I use optimization because I don't want to hard code what offsets it should consider (though I am reconsidering this choice now!).
because I want to consider negatives for tone but not for shade/tint, my objective will have to transform that. I have to include two values to optimize, since the objection function will need to know what color to transform (or else the result will give me know way of knowing which color to use the offset with).
so my call should look something like the following:
result = min(minimize(objective, (i,0), bounds=[(i, i), (-1, 1)]) for i in range(len(colors)))
offset_type = resolve_offset_type(result)
with that in mind, I implemented this solution, over the past couple of days.
current solution
from scipy.optimize import minimize
import numpy as np
import math
def clamp(low, x, high):
return max(low, min(x, high))
def hex_to_rgb(hex_color):
hex_color = hex_color.lstrip('#')
return tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
def rgb_to_hex(rgb):
return '#{:02x}{:02x}{:02x}'.format(*rgb)
def rgb_to_lab(color):
# Convert RGB to XYZ color space
R = color[0] / 255.0
G = color[1] / 255.0
B = color[2] / 255.0
R = ((R + 0.055) / 1.055) ** 2.4 if R > 0.04045 else R / 12.92
G = ((G + 0.055) / 1.055) ** 2.4 if G > 0.04045 else G / 12.92
B = ((B + 0.055) / 1.055) ** 2.4 if B > 0.04045 else B / 12.92
X = R * 0.4124 + G * 0.3576 + B * 0.1805
Y = R * 0.2126 + G * 0.7152 + B * 0.0722
Z = R * 0.0193 + G * 0.1192 + B * 0.9505
return (X,Y,Z)
def shade_func(color, offset):
return tuple([int(c * (1 - offset)) for c in color])
def tint_func(color, offset):
return tuple([int(c + (255 - c) * offset) for c in color])
def tone_func(color, offset):
return tuple([int(c * (1 - offset) + 128 * offset) for c in color])
class ColorNameFinder:
def __init__(self, colors, distance=None):
if distance is None:
distance = ColorNameFinder.ciede2000
self.distance = distance
self.colors = [hex_to_rgb(color) for color in colors]
#classmethod
def euclidean(self, left, right):
return (left[0] - right[0]) ** 2 + (left[1] - right[1]) ** 2 + (left[2] - right[2]) ** 2
#classmethod
def ciede2000(self, color1, color2):
# Convert color to LAB color space
lab1 = rgb_to_lab(color1)
lab2 = rgb_to_lab(color2)
# Compute CIE 2000 color difference
C1 = math.sqrt(lab1[1] ** 2 + lab1[2] ** 2)
C2 = math.sqrt(lab2[1] ** 2 + lab2[2] ** 2)
a1 = math.atan2(lab1[2], lab1[1])
a2 = math.atan2(lab2[2], lab2[1])
dL = lab2[0] - lab1[0]
dC = C2 - C1
dA = a2 - a1
dH = 2 * math.sqrt(C1 * C2) * math.sin(dA / 2)
L = 1
C = 1
H = 1
LK = 1
LC = math.sqrt(math.pow(C1, 7) / (math.pow(C1, 7) + math.pow(25, 7)))
LH = math.sqrt(lab1[0] ** 2 + lab1[1] ** 2)
CB = math.sqrt(lab2[1] ** 2 + lab2[2] ** 2)
CH = math.sqrt(C2 ** 2 + dH ** 2)
SH = 1 + 0.015 * CH * LC
SL = 1 + 0.015 * LH * LC
SC = 1 + 0.015 * CB * LC
T = 0.0
if (a1 >= a2 and a1 - a2 <= math.pi) or (a2 >= a1 and a2 - a1 > math.pi):
T = 1
else:
T = 0
dE = math.sqrt((dL / L) ** 2 + (dC / C) ** 2 + (dH / H) ** 2 + T * (dC / SC) ** 2)
return dE
def __factory_objective(self, target, preprocessor=lambda x: x):
def fn(x):
print(x, preprocessor(x))
x = preprocessor(x)
color = self.colors[x[0]]
offset = x[1]
bound_offset = abs(offset)
offsets = [
shade_func(color, bound_offset),
tint_func(color, bound_offset),
tone_func(color, offset)]
least_error = min([(right, self.distance(target, right)) \
for right in offsets], key = lambda x: x[1])[1]
return least_error
return fn
def __resolve_offset_type(self, sample, target, offset):
bound_offset = abs(offset)
shade = shade_func(sample, bound_offset)
tint = tint_func(sample, bound_offset)
tone = tone_func(sample, offset)
lookup = {}
lookup[shade] = "shade"
lookup[tint] = "tint"
lookup[tone] = "tone"
offsets = [shade, tint, tone]
least_error = min([(right, self.distance(target, right)) for right in offsets], key = lambda x: x[1])[0]
return lookup[least_error]
def nearest_color(self, target):
target = hex_to_rgb(target)
preprocessor=lambda x: (int(x[0]), x[1])
result = min(\
[minimize( self.__factory_objective(target, preprocessor=preprocessor),
(i, 0),
bounds=[(i, i), (-1, 1)],
method='Powell') \
for i, color in enumerate(self.colors)], key=lambda x: x.fun)
color_index = int(result.x[0])
nearest_color = self.colors[color_index]
offset = preprocessor(result.x)[1]
offset_type = self.__resolve_offset_type(nearest_color, target, offset)
return {
"color": rgb_to_hex(nearest_color),
"offset": {
"type": offset_type,
"value": offset if offset_type == 'tone' else abs(offset)
}
}
let's demonstrate this with mauve. We'll define a target that is similar to a shade of mauve, include mauve in a list of colors, and ideally we'll get mauve back from our test.
colors = ['#E0B0FF', '#FF0000', '#000000', '#0000FF']
target = '#DFAEFE'
agent = ColorNameFinder(colors)
agent.nearest_color(target)
we do get mauve back:
{'color': '#e0b0ff',
'offset': {'type': 'shade', 'value': 0.0031060384614807254}}
the distance is 0.004991238317138219
agent.distance(hex_to_rgb(target), shade_func(hex_to_rgb(colors[0]), 0.0031060384614807254))
why use Powell's method?
in this arrangement, it is simply the best. No other method that uses bounds did a good job of scanning positives and negatives, and I had mixed results using the preprocessor to scale the values back to negative with bounds of (0,2).
I do notice that in the sample test, a range between about 0.003 and 0.0008 seems to produce the same distance, and that the values my approach considers includes a large number of these. is there a more efficient solution?
If I'm wrong, please let me know.
correctness of the color transformations
what is adding a negative amount of white? (in the case of a tint) I was thinking it is like adding a positive amount of black -- ie a shade, with signs reversed.
my implementation is not correct:
agent.distance(hex_to_rgb(target), shade_func(hex_to_rgb(colors[0]), 0.1)) - agent.distance(hex_to_rgb(target), tint_func(hex_to_rgb(colors[0]), -0.1))
produces 0.3239904390784106 instead of 0.
I'll probably be fixing that soon

Optimizing asymmetrically reweighted penalized least squares smoothing (from matlab to python)

I'm trying to apply the method for baselinining vibrational spectra, which is announced as an improvement over asymmetric and iterative re-weighted least-squares algorithms in the 2015 paper (doi:10.1039/c4an01061b), where the following matlab code was provided:
function z = baseline(y, lambda, ratio)
% Estimate baseline with arPLS in Matlab
N = length(y);
D = diff(speye(N), 2);
H = lambda*D'*D;
w = ones(N, 1);
while true
W = spdiags(w, 0, N, N);
% Cholesky decomposition
C = chol(W + H);
z = C \ (C' \ (w.*y) );
d = y - z;
% make d-, and get w^t with m and s
dn = d(d<0);
m = mean(d);
s = std(d);
wt = 1./ (1 + exp( 2* (d-(2*s-m))/s ) );
% check exit condition and backup
if norm(w-wt)/norm(w) < ratio, break; end
end
that I rewrote into python:
def baseline_arPLS(y, lam, ratio):
# Estimate baseline with arPLS
N = len(y)
k = [numpy.ones(N), -2*numpy.ones(N-1), numpy.ones(N-2)]
offset = [0, 1, 2]
D = diags(k, offset).toarray()
H = lam * numpy.matmul(D.T, D)
w_ = numpy.ones(N)
while True:
W = spdiags(w_, 0, N, N, format='csr')
# Cholesky decomposition
C = cholesky(W + H)
z_ = spsolve(C.T, w_ * y)
z = spsolve(C, z_)
d = y - z
# make d- and get w^t with m and s
dn = d[d<0]
m = numpy.mean(dn)
s = numpy.std(dn)
wt = 1. / (1 + numpy.exp(2 * (d - (2*s-m)) / s))
# check exit condition and backup
norm_wt, norm_w = norm(w_-wt), norm(w_)
if (norm_wt / norm_w) < ratio:
break
w_ = wt
return(z)
Except for the input vector y the method requires parameters lam and ratio and it runs ok for values lam<1.e+07 and ratio>1.e-01, but outputs poor results. When values are changed outside this range, for example lam=1e+07, ratio=1e-02 the CPU starts heating up and job never finishes (I interrupted it after 1min). Also in both cases the following warning shows up:
/usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/dsolve/linsolve.py: 144: SparseEfficencyWarning: spsolve requires A to be CSC or CSR matrix format warn('spsolve requires A to be CSC or CSR format',
although I added the recommended format='csr' option to the spdiags call.
And here's some synthetic data (similar to one in the paper) for testing purposes. The noise was added along with a 3rd degree polynomial baseline The method works well for parameters bl_1 and fails to converge for bl_2:
import numpy
from matplotlib import pyplot
from scipy.sparse import spdiags, diags, identity
from scipy.sparse.linalg import spsolve
from numpy.linalg import cholesky, norm
import sys
x = numpy.arange(0, 1000)
noise = numpy.random.uniform(low=0, high = 10, size=len(x))
poly_3rd_degree = numpy.poly1d([1.2e-06, -1.23e-03, .36, -4.e-04])
poly_baseline = poly_3rd_degree(x)
y = 100 * numpy.exp(-((x-300)/15)**2)+\
200 * numpy.exp(-((x-750)/30)**2)+ \
100 * numpy.exp(-((x-800)/15)**2) + noise + poly_baseline
bl_1 = baseline_arPLS(y, 1e+07, 1e-01)
bl_2 = baseline_arPLS(y, 1e+07, 1e-02)
pyplot.figure(1)
pyplot.plot(x, y, 'C0')
pyplot.plot(x, poly_baseline, 'C1')
pyplot.plot(x, bl_1, 'k')
pyplot.show()
sys.exit(0)
All this is telling me that I'm doing something very non-optimal in my python implementation. Since I'm not knowledgeable enough about the intricacies of scipy computations I'm kindly asking for suggestions on how to achieve convergence in this calculations.
(I encountered an issue in running the "straight" matlab version of the code because the line D = diff(speye(N), 2); truncates the last two rows of the matrix, creating dimension mismatch later in the function. Following the description of matrix D's appearance I substituted this line by directly creating a tridiagonal matrix using the diags function.)
Guided by the comment #hpaulj made, and suspecting that the loop exit wasn't coded properly, I re-visited the paper and found out that the authors actually implemented an exit condition that was not featured in their matlab script. Changing the while loop condition provides an exit for any set of parameters; my understanding is that algorithm is not guaranteed to converge in all cases, which is why this condition is necessary but was omitted by error. Here's the edited version of my python code:
def baseline_arPLS(y, lam, ratio):
# Estimate baseline with arPLS
N = len(y)
k = [numpy.ones(N), -2*numpy.ones(N-1), numpy.ones(N-2)]
offset = [0, 1, 2]
D = diags(k, offset).toarray()
H = lam * numpy.matmul(D.T, D)
w_ = numpy.ones(N)
i = 0
N_iterations = 100
while i < N_iterations:
W = spdiags(w_, 0, N, N, format='csr')
# Cholesky decomposition
C = cholesky(W + H)
z_ = spsolve(C.T, w_ * y)
z = spsolve(C, z_)
d = y - z
# make d- and get w^t with m and s
dn = d[d<0]
m = numpy.mean(dn)
s = numpy.std(dn)
wt = 1. / (1 + numpy.exp(2 * (d - (2*s-m)) / s))
# check exit condition and backup
norm_wt, norm_w = norm(w_-wt), norm(w_)
if (norm_wt / norm_w) < ratio:
break
w_ = wt
i += 1
return(z)

Numpy.linalg.eig is giving different results than numpy.linalg.eigh for Hermitian matrices

I have one hermitian matrix (specifically, a Hamiltonian). Though phase of a singe eigenvector can be arbitrary, the quantities I am calculating is physical (I reduced the code a bit keeping just the reproducible part). eig and eigh are giving very different results.
import numpy as np
import numpy.linalg as nlg
import matplotlib.pyplot as plt
def Ham(Ny, Nx, t, phi):
h = np.zeros((Ny,Ny), dtype=complex)
for ii in range(Ny-1):
h[ii+1,ii] = t
h[Ny-1,0] = t
h=h+np.transpose(np.conj(h))
u = np.zeros((Ny,Ny), dtype=complex)
for ii in range(Ny):
u[ii,ii] = -t*np.exp(-2*np.pi*1j*phi*ii)
u = u + 1e-10*np.eye(Ny)
H = np.kron(np.eye(Nx,dtype=int),h) + np.kron(np.diag(np.ones(Nx-1), 1),u) + np.kron(np.diag(np.ones(Nx-1), -1),np.transpose(np.conj(u)))
H[0:Ny,Ny*(Nx-1):Ny*Nx] = np.transpose(np.conj(u))
H[Ny*(Nx-1):Ny*Nx,0:Ny] = u
x=[]; y=[];
for jj in range (1,Nx+1):
for ii in range (1,Ny+1):
x.append(jj); y.append(ii)
x = np.asarray(x)
y = np.asarray(y)
return H, x, y
def C_num(Nx, Ny, E, t, phi):
H, x, y = Ham(Ny, Nx, t, phi)
ifhermitian = np.allclose(H, np.transpose(np.conj(H)), rtol=1e-5, atol=1e-8)
assert ifhermitian == True
Hp = H
V,wf = nlg.eigh(Hp) ##Check. eig gives different result
idx = np.argsort(np.real(V))
wf = wf[:, idx]
normmat = wf*np.conj(wf)
norm = np.sqrt(np.sum(normmat, axis=0))
wf = wf/(norm*np.sqrt(len(H)))
wf = wf[:, V<=E] ##Chose a subset of eigenvectors
V01 = wf*np.exp(1j*x)[:,None]; V12 = wf*np.exp(1j*y)[:,None]
V23 = wf*np.exp(1j*x)[:,None]; V30 = wf*np.exp(1j*y)[:,None]
wff = np.transpose(np.conj(wf))
C01 = np.dot(wff,V01); C12 = np.dot(wff,V12); C23 = np.dot(wff,V23); C30 = np.dot(wff,V30)
F = nlg.multi_dot([C01,C12,C23,C30])
ifhermitian = np.allclose(F, np.transpose(np.conj(F)), rtol=1e-5, atol=1e-8)
assert ifhermitian == True
evals, efuns = nlg.eig(F) ##Check eig gives different result
C = (1/(2*np.pi))*np.sum(np.angle(evals));
return C
C = C_num(16, 16, 0, 1, 1/8)
print(C)
Changing both nlg.eigh to nlg.eig, or even changing only the last one, giving very different results.
As I mentioned elsewhere, the eigenvalue and eigenvector are not unique.
The only thing that is true is that for each eigenvalue $A v = lambda v$, the two matrices returned by eig and eigh describe those solutions, it is natural that eig inexact but approximate results.
You can see that both the solutions will triangularize your matrix in different ways
H, x, y = Ham(16, 16, 1, 1./8)
D, V = nlg.eig(H)
Dh, Vh = nlg.eigh(H)
Then
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 7))
plt.subplot(121);
plt.imshow(abs(np.conj(Vh.T) # H # Vh))
plt.title('diagonalized with eigh')
plt.subplot(122);
plt.imshow(abs(np.conj(V.T) # H # V))
plt.title('diagonalized with eig')
Plots this
That both diagonalizations were successfull, but the eigenvalues are indifferent order.
If you sort the eigenvalues you see they match
plt.plot(np.diag(np.real(np.conj(Vh.T) # H # Vh)))
plt.plot(np.diag(np.imag(np.conj(Vh.T) # H # Vh)))
plt.plot(np.sort(np.diag(np.real(np.conj(V.T) # H # V))))
plt.title('eigenvalues')
plt.legend(['real eigh', 'imag eigh', 'sorted real eig'], loc='upper left')
Since many eigenvalues are repeated, the eigenvector associated with a given eigenvalue is not unique as well, the only thing we can guarantee is that the eigenvectors for a given eigenvalue must span the same subspace.
The diagonalization test is the best in my opinion.
Is eigh always better than eig?
If you search for the eigenvalues in the lapack routines you will have many options. So it is I cannot discuss each possible implementation here. The common sense says that we can expect that the symmetric/hermitian routines to perform better, otherwise ther would be no reason to add one more routine that is more limited. But I never tested carefully the behavior of eig vs eigh.
To have an intuition compare the equation for tridiagonalization for symmetric matrices, and the equation for reduction of a general matrix to its Heisenberg form found here.

Best-Fit without point interpolation

I have two sets of data. One is nominal form. The other is actual form. The problem is that when I wish to calculate the form error alone. It's a big problem when the two sets of data isn't "on top of each other". That gives errors that also include positional error.
Both curves are read from a series of data. The nominal shape (black) is made up from many different size radius that are tangent to each other. Its the leading edge of an airfoil profile.
I have tried various methods of "Best-Fit" I've found both here and on where ever google took me. But the problem is that they all smooth my "actual" data. So it get modified and is not keeping it's actual form.
Is there any function in scipy or any other python lib that "simply" can fit my two curves together without altering the actual shape?
I wish for the green curve with red dots to lie as much as possible on top of the black.
Might it be possible to calculate the center of gravity of both curves and then move the actual curve in x and y depending on the value difference from the center point? It might not be the ultimate solution, but it would get closer?
Here is a solution assuming that the nominal form can be described as a conic, i.a as solution of the equation ax^2 + by^2 + cxy + dx + ey = 1. Then, a least square fit can be applied to find the coefficients (a, b, c, d, e).
import numpy as np
import matplotlib.pylab as plt
# Generate example data
t = np.linspace(-2, 2.5, 25)
e, theta = 0.5, 0.3 # ratio minor axis/major & orientation angle major axis
c, s = np.cos(theta), np.sin(theta)
x = c*np.cos(t) - s*e*np.sin(t)
y = s*np.cos(t) + c*e*np.sin(t)
# add noise:
xy = 4*np.vstack((x, y))
xy += .08 *np.random.randn(*xy.shape) + np.random.randn(2, 1)
# Least square fit by a generic conic equation
# a*x^2 + b*y^2 + c*x*y + d*x + e*y = 1
x, y = xy
x = x - x.mean()
y = y - y.mean()
M = np.vstack([x**2, y**2, x*y, x, y]).T
b = np.ones_like(x)
# solve M*w = b
w, res, rank, s = np.linalg.lstsq(M, b, rcond=None)
a, b, c, d, e = w
# Get x, y coordinates for the fitted ellipse:
# using polar coordinates
# x = r*cos(theta), y = r*sin(theta)
# for a given theta, the radius is obtained with the 2nd order eq.:
# (a*ct^2 + b*st^2 + c*cs*st)*r^2 + (d*ct + e*st)*r - 1 = 0
# with ct = cos(theta) and st = sin(theta)
theta = np.linspace(-np.pi, np.pi, 97)
ct, st = np.cos(theta), np.sin(theta)
A = a*ct**2 + b*st**2 + c*ct*st
B = d*ct + e*st
D = B**2 + 4*A
radius = (-B + np.sqrt(D))/2/A
# Graph
plt.plot(radius*ct, radius*st, '-k', label='fitted ellipse');
plt.plot(x, y, 'or', label='measured points');
plt.axis('equal'); plt.legend();
plt.xlabel('x'); plt.ylabel('y');

Compute Errors on Fit Parameters In Scipy Curve fit

Using "scipy.optimize.curve_fit" we can determine the fit parameters for a curve fit on x and y using
popt, pcov = curve_fit(func, xdata, ydata)
In the documentation for this function, they state that: To compute one standard deviation errors on the parameters use
perr = np.sqrt(np.diag(pcov))
Here's a link to the documentation I was reading. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
What if I want to compute something more general than simply 1 standard deviation on the errors of the parameters? In particular, what If I'm looking for, say, 2 standard deviations (a 95% confidence interval on the parameters).
To be clear, I'm not looking for a 10 line+ solution. I already know how to compute these errors in a "hackish" way for a linear function:
def get_slope_params(data1, data2):
x_mean = mean(data1)
y_mean = mean(data2)
N = len(data1)
sum_xy = 0
for (x, y) in zip(data1, data2):
sum_xy = sum_xy + x*y
sum_xsq = 0
for x in data1:
sum_xsq = sum_xsq + x*x
b = (sum_xy-N*x_mean*y_mean)/(sum_xsq-N*x_mean**2)
a = y_mean - b*x_mean
return (a,b)
# 95%
def get_slope_params_uncertainties(data1, data2):
N = len(data1)
a, b = get_slope_params(data1, data2)
y_approx = a+b*data1
s_eps = 0
for (y, y_app) in zip(data2, y_approx):
s_eps = s_eps + (y-y_app)**2
s_eps = np.sqrt(s_eps/(N-2))
s_x = np.sqrt(cov(data1, data1))
delta_b = (1/np.sqrt(N-1))*(s_eps/s_x)*sp.stats.t.ppf(1-0.05/2, N-2)
delta_a = mean(data1)*delta_b
return delta_a, delta_b
What I'd like is a function already implemented entirely by scipy.

Resources