Fixing two parameters in Gaussian fit using curve_fit - gaussian

I am using a Gaussian model and would like to fix two parameters, mu1, and sigma1.
def gauss_cnst(x, A1, mu1, sigma1, const):
return ((A1*np.exp(-(x-mu1)**2/(2.*sigma1**2)))+const)
This is the fitting code I'm using:
p0_zr = [A, t0, sig, cnst]
dd = np.linspace(min(mjd), max(mjd_alloid_zr_brt_cad), 2000)
coeff_zr, var_matrix_zr = curve_fit(gauss_cnst, mjd,
mag, p0=p0_zr,
sigma=err, method='lm', maxfev = 900000)
coeff_zr_err = np.sqrt(np.diag(var_matrix_zr))
gauss_fit_zr=gauss_cnst(dd, *coeff_zr)
where mjd, mag, and err are my data x, y, and error. In p0_zr array, A and cnst are variables that will be given an initial guess to be estimated by the fit but I would like the Gaussian model to fix t0 and sig parameters as e.g., *t0=59800, sig=2.5.
How can I do that?

Related

Curve fitting with known coefficients in Python

I tried using Numpy, Scipy and Scikitlearn, but couldn't find what I need in any of them, basically I need to fit a curve to a dataset, but restricting some of the coefficients to known values, I found how to do it in MATLAB, using fittype, but couldn't do it in python.
In my case I have a dataset of X and Y and I need to find the best fitting curve, I know it's a polynomial of second degree (ax^2 + bx + c) and I know it's values of b and c, so I just needed it to find the value of a.
The solution I found in MATLAB was https://www.mathworks.com/matlabcentral/answers/216688-constraining-polyfit-with-known-coefficients which is the same problem as mine, but with the difference that their polynomial was of degree 5th, how could I do something similar in python?
To add some info: I need to fit a curve to a dataset, so things like scipy.optimize.curve_fit that expects a function won't work (at least as far as I tried).
The tools you have available usually expect functions only inputting their parameters (a being the only unknown in your case), or inputting their parameters and some data (a, x, and y in your case).
Scipy's curve-fit handles that use-case just fine, so long as we hand it a function that it understands. It expects x first and all your parameters as the remaining arguments:
from scipy.optimize import curve_fit
import numpy as np
b = 0
c = 0
def f(x, a):
return c+x*(b+x*a)
x = np.linspace(-5, 5)
y = x**2
# params == [1.]
params, _ = curve_fit(f, x, y)
Alternatively you can reach for your favorite minimization routine. The difference here is that you manually construct the error function so that it only inputs the parameters you care about, and then you don't need to provide that data to scipy.
from scipy.optimize import minimize
import numpy as np
b = 0
c = 0
x = np.linspace(-5, 5)
y = x**2
def error(a):
prediction = c+x*(b+x*a)
return np.linalg.norm(prediction-y)/len(prediction)**.5
result = minimize(error, np.array([42.]))
assert result.success
# params == [1.]
params = result.x
I don't think scipy has a partially applied polynomial fit function built-in, but you could use either of the above ideas to easily build one yourself if you do that kind of thing a lot.
from scipy.optimize import curve_fit
import numpy as np
def polyfit(coefs, x, y):
# build a mapping from null coefficient locations to locations in the function
# coefficients we're passing to curve_fit
#
# idx[j]==i means that unknown_coefs[i] belongs in coefs[j]
_tmp = [i for i,c in enumerate(coefs) if c is None]
idx = {j:i for i,j in enumerate(_tmp)}
def f(x, *unknown_coefs):
# create the entire polynomial's coefficients by filling in the unknown
# values in the right places, using the aforementioned mapping
p = [(unknown_coefs[idx[i]] if c is None else c) for i,c in enumerate(coefs)]
return np.polyval(p, x)
# we're passing an initial value just so that scipy knows how many parameters
# to use
params, _ = curve_fit(f, x, y, np.zeros((sum(c is None for c in coefs),)))
# return all the polynomial's coefficients, not just the few we just discovered
return np.array([(params[idx[i]] if c is None else c) for i,c in enumerate(coefs)])
x = np.linspace(-5, 5)
y = x**2
# (unknown)x^2 + 1x + 0
# params == [1, 0, 0.]
params = fit([None, 0, 0], x, y)
Similar features exist in nearly every mainstream scientific library; you just might need to reshape your problem a bit to frame it in terms of the available primitives.

Masking and Instance Normalization in PyTorch

Assume I have a PyTorch tensor, arranged as shape [N, C, L] where N is the batch size, C is the number of channels or features, and L is the length. In this case, if one wishes to perform instance normalization, one does something like:
N = 20
C = 100
L = 40
m = nn.InstanceNorm1d(C, affine=True)
input = torch.randn(N, C, L)
output = m(input)
This will perform a normalization in the L-wise dimension for each N*C = 2000 slices of data, subtracting 2000 means, scaling by 2000 standard deviations, and re-scaling by 100 learnable weight and bias parameters (one per channel). The unspoken assumption here is that all of these values exist and are meaningful.
But I have a situation where, for the slice N=1, I would like to exclude all data after (say) L=35. For the slice N=2 (say) all the data are valid. For the slice N=3, exclude all data after L=30, etc. This mimics data which are one dimensional time sequences, having multiple features, but which are not the same length.
How can I perform an instance norm on such data, get correct statistics, and maintain differentiability/AutoGrad information in PyTorch?
Update: While maintaining GPU performance, or at least not killing it dead.
I cannot...
...Mask with zero values, as this destroys the computer means and variances giving erroneous results
...Mask with np.nan or np.inf, as PyTorch tensors do not ignore such values, but treat them as errors. They are sticky, and lead to garbage results. PyTorch currently lacks the equivalent of np.nanmean and np.nanvar.
...Permute or transpose to an amenable arrangement of data; no such approach gives me what I need
...Use a pack_padded_sequence; instance normalization does not operate on that data structure, and one cannot import data into that structure as far as I know. Also, data re-arrangement would still be necessary, see 3 above.
Am I missing an approach which would give me what I need? Or perhaps am I missing a method of data re-arrangement which would allow 3 or 4 above to work?
This is an issue faced by recurrent neural networks all the time, hence the pack_padded_sequence functionality, but it isn't quite applicable here.
I don't think this is directly possible to implement using the existing InstanceNorm1d, the easiest way would probably be implementing it yourself from scratch. I did a quick implementation that should work. To make it a little bit more general this module requires a boolean mask (a boolean tensor of the same size as the input) that specifies which elements should be considered when passing through the instance norm.
import torch
class MaskedInstanceNorm1d(torch.nn.Module):
def __init__(self, num_features, eps=1e-6, momentum=0.1, affine=True, track_running_stats=False):
super().__init__()
self.num_features = num_features
self.eps = eps
self.momentum = momentum
self.affine = affine
self.track_running_stats = track_running_stats
self.gamma = None
self.beta = None
if self.affine:
self.gamma = torch.nn.Parameter(torch.ones((1, self.num_features, 1), requires_grad=True))
self.beta = torch.nn.Parameter(torch.zeros((1, self.num_features, 1), requires_grad=True))
self.running_mean = None
self.running_variance = None
if self.affine:
self.running_mean = torch.zeros((1, self.num_features, 1), requires_grad=True)
self.running_variance = torch.zeros((1, self.num_features, 1), requires_grad=True)
def forward(self, x, mask):
mean = torch.zeros((1, self.num_features, 1), requires_grad=False)
variance = torch.ones((1, self.num_features, 1), requires_grad=False)
# compute masked mean and variance of batch
for c in range(self.num_features):
if mask[:, c, :].any():
mean[0, c, 0] = x[:, c, :][mask[:, c, :]].mean()
variance[0, c, 0] = (x[:, c, :][mask[:, c, :]] - mean[0, c, 0]).pow(2).mean()
# update running mean and variance
if self.training and self.track_running_stats:
for c in range(self.num_features):
if mask[:, c, :].any():
self.running_mean[0, c, 0] = (1-self.momentum) * self.running_mean[0, c, 0] \
+ self.momentum * mean[0, c, 0]
self.running_variance[0, c, 0] = (1-self.momentum) * self.running_variance[0, c, 0] \
+ self.momentum * variance[0, c, 0]
# compute output
x = (x - mean)/(self.eps + variance).sqrt()
if self.affine:
x = x * self.gamma + self.beta
return x

Can I use scipy.optimize with numpy.where?

I am trying to fit a box-like function with scipy.optimize, and that function is defined with a numpy.where instruction. But the result is a covariance matrix filled with "inf" entries
The function I am trying to fit is box-like: it is equal to F0 outside an "event" and F0-A inside an "event":
np.where(np.absolute(miu-x)<=omega/2.0,F0-A,F0)
I have tried defining it as a branch function and through numpy.where. In both cases, the end result is not what I intend, as it always gives the following warning:
OptimizeWarning: Covariance of the parameters could not be estimated
def func(x, F0, A, miu, omega):
return np.where(np.absolute(miu-x)<=omega/2.0,F0-A,F0)
popt, pcov = curve_fit(func, xdata, ydata, p0 = pri_values, sigma = sigmas)
# xdata, ydata and p0 are obtained from other files
I expect as an output an array of 4 elements with the best fit for each parameter (popt) as well as a 4x4 covariance matrix (pcov). But my result does not converge to the expected values, and the covariance matrix is filled with "inf".

curve_fit with polynomials of variable length

I'm new to python (and programming in general) and want to make a polynomial fit using curve_fit, where the order of the polynomials (or the number of fit parameters) is variable.
I made this code which is working for a fixed number of 3 parameters a,b,c
# fit function
def fit_func(x, a,b,c):
p = np.polyval([a,b,c], x)
return p
# do the fitting
popt, pcov = curve_fit(fit_func, x_data, y_data)
But now I'd like to have my fit function to only depend on a number N of parameters instead of a,b,c,....
I'm guessing that's not a very hard thing to do, but because of my limited knowledge I can't get it work.
I've already looked at this question, but I wasn't able to apply it to my problem.
You can define the function to be fit to your data like this:
def fit_func(x, *coeffs):
y = np.polyval(coeffs, x)
return y
Then, when you call curve_fit, set the argument p0 to the initial guess of the polynomial coefficients. For example, this plot is generated by the script that follows.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# Generate a sample input dataset for the demonstration.
x = np.arange(12)
y = np.cos(0.4*x)
def fit_func(x, *coeffs):
y = np.polyval(coeffs, x)
return y
fit_results = []
for n in range(2, 6):
# The initial guess of the parameters to be found by curve_fit.
# Warning: in general, an array of ones might not be a good enough
# guess for `curve_fit`, but in this example, it works.
p0 = np.ones(n)
popt, pcov = curve_fit(fit_func, x, y, p0=p0)
# XXX Should check pcov here, but in this example, curve_fit converges.
fit_results.append(popt)
plt.plot(x, y, 'k.', label='data')
xx = np.linspace(x.min(), x.max(), 100)
for p in fit_results:
yy = fit_func(xx, *p)
plt.plot(xx, yy, alpha=0.6, label='n = %d' % len(p))
plt.legend(framealpha=1, shadow=True)
plt.grid(True)
plt.xlabel('x')
plt.show()
The parameters of polyval specify p is an array of coefficients from the highest to lowest. With x being a number or array of numbers to evaluate the polynomial at. It says, the following.
If p is of length N, this function returns the value:
p[0]*x**(N-1) + p[1]*x**(N-2) + ... + p[N-2]*x + p[N-1]
def fit_func(p,x):
z = np.polyval(p,x)
return z
e.g.
t= np.array([3,4,5,3])
y = fit_func(t,5)
503
which is if you do the math here is right.

How do I SelectKBest using mutual information from a mixture of discrete and continuous features?

I am using scikit learn to train a classification model. I have both discrete and continuous features in my training data. I want to do feature selection using maximum mutual information. If I have vectors x and labels y and the first three feature values are discrete I can get the MMI values like so:
mutual_info_classif(x, y, discrete_features=[0, 1, 2])
Now I'd like to use the same mutual information selection in a pipeline. I'd like to do something like this
SelectKBest(score_func=mutual_info_classif).fit(x, y)
but there's no way to pass the discrete features mask to SelectKBest. Is there some syntax to do this that I'm overlooking, or do I have to write my own score function wrapper?
Unfortunately I could not find this functionality for the SelectKBest.
But what we can do easily is extend the SelectKBest as our custom class to override the fit() method which will be called.
This is the current fit() method of SelectKBest (taken from source at github)
# No provision for extra parameters here
def fit(self, X, y):
X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
....
....
# Here only the X, y are passed to scoring function
score_func_ret = self.score_func(X, y)
....
....
self.scores_ = np.asarray(self.scores_)
return self
Now we will define our new class SelectKBestCustom with the changed fit(). I have copied everything from the above source, changing only two lines (commented about it):
from sklearn.utils import check_X_y
class SelectKBestCustom(SelectKBest):
# Changed here
def fit(self, X, y, discrete_features='auto'):
X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
if not callable(self.score_func):
raise TypeError("The score function should be a callable, %s (%s) "
"was passed."
% (self.score_func, type(self.score_func)))
self._check_params(X, y)
# Changed here also
score_func_ret = self.score_func(X, y, discrete_features)
if isinstance(score_func_ret, (list, tuple)):
self.scores_, self.pvalues_ = score_func_ret
self.pvalues_ = np.asarray(self.pvalues_)
else:
self.scores_ = score_func_ret
self.pvalues_ = None
self.scores_ = np.asarray(self.scores_)
return self
This can be called simply like:
clf = SelectKBestCustom(mutual_info_classif,k=2)
clf.fit(X, y, discrete_features=[0, 1, 2])
Edit:
The above solution can be useful in pipelines also, and the discrete_features parameter can be assigned different values when calling fit().
Another Solution (less preferable):
Still, if you just need to work SelectKBest with mutual_info_classif, temporarily (just analysing the results), we can also make a custom function which can call mutual_info_classif internally with hard coded discrete_features. Something along the lines of:
def mutual_info_classif_custom(X, y):
# To change discrete_features,
# you need to redefine the function each time
# Because once the func def is supplied to selectKBest, it cant be changed
discrete_features = [0, 1, 2]
return mutual_info_classif(X, y, discrete_features)
Usage of the above function:
selector = SelectKBest(mutual_info_classif_custom).fit(X, y)
You could also use partials as follows:
from functools import partial
discrete_mutual_info_classif = partial(mutual_info_classif, iscrete_features=[0, 1, 2])
SelectKBest(score_func=discrete_mutual_info_classif).fit(x, y)

Resources