Calculate the manhattan distances in the presence of missing values - scikit-learn

I want to run a KNNImputer with the Manhattan distance as the metric, instead of the default nan_euclidean distance. The function should conform to the definition of _pairwise_callable(X, Y, metric, **kwds) and accept a missing_values keyword in kwds, so sklearn.metrics.pairwise.manhattan_distances won't work. Is there a function that handles that?
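There does not seem to be a built-in NaN-aware Manhattan distance, but a custom callable can be passed instead. Below is a minimal sketch, assuming missing values are NaN and mirroring the rescaling used by nan_euclidean_distances; the name nan_manhattan is hypothetical. KNNImputer will call the metric on pairs of rows and forward missing_values through kwds:

import numpy as np
from sklearn.impute import KNNImputer

def nan_manhattan(x, y, missing_values=np.nan, **kwds):
    # Manhattan distance over the coordinates where both samples are present,
    # rescaled to the full dimensionality (same idea as nan_euclidean_distances).
    # Assumes missing_values is NaN.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    present = ~(np.isnan(x) | np.isnan(y))
    if not present.any():
        return np.nan
    return np.abs(x[present] - y[present]).sum() * x.size / present.sum()

imputer = KNNImputer(n_neighbors=3, metric=nan_manhattan)

Note that a pairwise callable like this is evaluated one pair of rows at a time, so it will be much slower than the vectorized nan_euclidean path.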

Related

Minimize the cosine similarity of two tensors and output one scalar (PyTorch)

I use the PyTorch cosine similarity function as follows. I have two feature vectors and my goal is to make them dissimilar to each other. So, I thought I could minimize their cosine similarity. I have some doubts about the way I have coded this, and I would appreciate your suggestions on the following questions.
I don't know why there are some negative values in val1.
I have done three steps to convert val1 to a scalar. Am I doing it the right way? Is there any other way?
To minimize the similarity, I have used 1/val1. Is that a standard way to do this? Would it be correct to use 1-val1 instead?
import torch

def loss_func(feat1, feat2):
    cosine_loss = torch.nn.CosineSimilarity(dim=1, eps=1e-6)
    val1 = cosine_loss(feat1, feat2).tolist()
    # 1. calculate the absolute values of each element,
    # 2. sum all values together,
    # 3. divide it by the number of values
    val1 = 1/(sum(list(map(abs, val1)))/int(len(val1)))
    val1 = torch.tensor(val1, device='cuda', requires_grad=True)
    return val1
Do not convert your loss function to a list. This breaks autograd so you won't be able to optimize your model parameters using pytorch.
A loss function is already something to be minimized. If you want to minimize the similarity then you probably just want to return the average cosine similarity. If instead you want minimize the magnitude of the similarity (i.e. encourage the features to be orthogonal) then you can return the average absolute value of cosine similarity.
It seems like what you've implemented will attempt to maximize the similarity. But that doesn't appear to be in line with what you've stated. Also, to turn a minimization problem into an equivalent maximization problem you would usually just negate the measure. There's nothing wrong with a negative loss value. Taking the reciprocal of a strictly positive measure does convert it from minimization to a maximization problem, but also changes the behavior of the measure and probably isn't what you want.
Depending on what you actually want, one of these is likely to meet your needs:
import torch.nn.functional as F

def loss_func(feat1, feat2):
    # minimize average magnitude of cosine similarity
    return F.cosine_similarity(feat1, feat2).abs().mean()

def loss_func(feat1, feat2):
    # minimize average cosine similarity
    return F.cosine_similarity(feat1, feat2).mean()

def loss_func(feat1, feat2):
    # maximize average magnitude of cosine similarity
    return -F.cosine_similarity(feat1, feat2).abs().mean()

def loss_func(feat1, feat2):
    # maximize average cosine similarity
    return -F.cosine_similarity(feat1, feat2).mean()
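As a quick sanity check of the autograd point above (hypothetical feature shapes), whichever variant of loss_func you keep stays connected to the graph, so gradients flow back into the features:

import torch

feat1 = torch.randn(8, 128, requires_grad=True)
feat2 = torch.randn(8, 128, requires_grad=True)
loss = loss_func(feat1, feat2)   # uses whichever variant you picked above
loss.backward()
print(feat1.grad.shape)          # torch.Size([8, 128])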

Which is the error of a value corresponding to the maximum of a function?

This is my problem:
The first input is the observed data of MUSE, an astronomical instrument that provides cubes, i.e. an image for each wavelength within a certain range. This means that, taking all the wavelengths corresponding to pixel i,j, I can extract the spectrum for this pixel. Since these images are observed, for each pixel I have an error.
The second input is a spectrum template, i.e. a model of a spectrum. This template is assumed to be without error. I map this spectrum to various redshifts (this means multiplying the wavelengths by a factor 1+z, where z belongs to a certain range).
The core of my code is the cross-correlation between the cube, i.e. the spectra extracted from each pixel, and the template mapped to the different redshifts. The result is a cross-correlation function for each pixel for each z; let's call this computed function f(z). Taking, for each pixel, the argmax of f(z), I get the best redshift.
This is a common and widely used process, and indeed it works well.
My question:
Since my input, i.e. the MUSE cube, has an error, I have propagated this error through the cross-correlation, obtaining an error on f(z), i.e. each f_i has an error sigma_i. So, how can I compute the error on z_max, which is the value of z corresponding to the maximum of f?
Maybe a solution could be a bootstrap method: I can draw, within the errors of f, a certain number of functions, compute the argmax for each of them, and so get an idea of the scatter of z_max.
By the way, I'm using Python (3.x), and TensorFlow has been used to compute the cross-correlation function.
Thanks!
EDIT
Following #TF_Support's suggestion, I'm trying to add some code and some figures to better understand the problem. But, before this, maybe a little math is helpful.
With this expression (written here in index notation, with i the pixel, j the wavelength and k the trial redshift) I computed the cross-correlation:
C_ik = sum_j S_ij * T_jk / N_ik,   with N_ik = sqrt( (sum_j S_ij^2) * (sum_j T_jk^2) )
where S is the spectra, T is the template and N is the normalization coefficient. Since S has an error, I propagated these errors through the previous relation, finding:
sigma_C_ik^2 = sum_j sigma_ij^2 T_jk^2 / N_ik^2 + C_ik^2 * SST_k^2 * sum_j (S_ij sigma_ij)^2 / N_ik^4 - 2 C_ik * SST_k * sum_j S_ij sigma_ij^2 T_jk / N_ik^3
where SST_k is the sum of the template squared and sigma_ij is the error on S_ij (actually, I should have written sigma_S_ij).
The following function (implemented with TensorFlow 2.1) computes the cross-correlation between one template and the spectra of a batch of pixels, together with the error on the cross-correlation function:
import tensorflow as tf

@tf.function
def make_xcorr_err1(T, S, sigma_S):
    sum_spectra_sq = tf.reduce_sum(tf.square(S), 1)   # shape (batch,)
    sum_template_sq = tf.reduce_sum(tf.square(T), 0)  # shape (Nz,)
    norm = tf.sqrt(tf.reshape(sum_spectra_sq, (-1, 1)) * tf.reshape(sum_template_sq, (1, -1)))  # shape (batch, Nz)
    xcorr = tf.matmul(S, T, transpose_a=False, transpose_b=False) / norm
    foo1 = tf.matmul(sigma_S**2, T**2, transpose_a=False, transpose_b=False) / norm**2
    foo2 = xcorr**2 * tf.reshape(sum_template_sq**2, (1, -1)) * tf.reshape(tf.reduce_sum((S*sigma_S)**2, 1), (-1, 1)) / norm**4
    foo3 = -2 * xcorr * tf.reshape(sum_template_sq, (1, -1)) * tf.matmul(S*(sigma_S)**2, T, transpose_a=False, transpose_b=False) / norm**3
    sigma_xcorr = tf.sqrt(tf.maximum(foo1 + foo2 + foo3, 0.))
    return xcorr, sigma_xcorr  # assumed return of the two computed tensors
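For example, a quick shape check with random data (hypothetical sizes: a batch of 4 pixels, 200 wavelength samples, 50 trial redshifts), assuming the function returns xcorr and sigma_xcorr as above:

import numpy as np

S = tf.constant(np.random.rand(4, 200), tf.float32)
sigma_S = tf.constant(0.05 * np.random.rand(4, 200), tf.float32)
T = tf.constant(np.random.rand(200, 50), tf.float32)
xcorr, sigma_xcorr = make_xcorr_err1(T, S, sigma_S)
print(xcorr.shape, sigma_xcorr.shape)   # (4, 50) (4, 50)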
Maybe, in order to understand my problem, an image of the output is more important than the code. This is the cross-correlation function for a single pixel, with the maximum value marked in red; let's call it z_best, i.e. the best cross-correlated value. The figure also shows the 3 sigma errors (the grey limits are +3 sigma and -3 sigma).
If I zoom in near the peak, I get this:
As you can see, the maximum (like any other value) oscillates within a certain range. I would like to find a way to map these fluctuations of the maximum (or the fluctuations around the maximum, or the fluctuations of the whole function) to an error on the value corresponding to the maximum, i.e. an error on z_best.
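A minimal sketch of the bootstrap/Monte Carlo idea mentioned above, assuming xcorr and sigma_xcorr have been converted to NumPy arrays (e.g. via .numpy()) and treating the errors on f(z) as independent Gaussian scatter; the names z_grid, n_draws and z_best_scatter are hypothetical:

import numpy as np

def z_best_scatter(z_grid, f, sigma_f, n_draws=1000, seed=0):
    # Draw perturbed realisations of f(z) within its errors and record the
    # argmax of each draw; the spread of those argmaxes estimates the error
    # on z_best for this pixel.
    rng = np.random.default_rng(seed)
    draws = f + rng.normal(0.0, 1.0, size=(n_draws, f.size)) * sigma_f
    z_best_draws = z_grid[np.argmax(draws, axis=1)]
    return z_best_draws.mean(), z_best_draws.std()

# Usage for pixel j (hypothetical arrays):
# z_mean, z_err = z_best_scatter(z_grid, xcorr[j], sigma_xcorr[j])

If the propagated errors are strongly correlated across neighbouring z values, the independent-noise assumption will overestimate the scatter, so treat this as an upper bound.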

Covariance matrix is NoneType in lmfit Python3-6

I want to fit an f(x,y) function using lmfit. The dataset is small and there are many fitting parameters (6 points on the x-axis, 11 points on the y-axis and 16 unconstrained fitting parameters). Using all the defaults of Model.fit I cannot obtain the covariance matrix, and during the fitting process the values of the free parameters are not changed at all.
I tried to change the initial values for the parameters. However, when I set up the same kind of problem in the OriginPro Surface Fitting functionality, the Levenberg-Marquardt algorithm manages to fit the data and estimate the errors (although quite large for certain parameters). This means there has to be some problem with my code, but I can't find where it lies. I'm not a Python master.
The MWE is as below.
import numpy as np
from lmfit import Model, Parameters
import numdifftools  # not calling this doesn't change anything

x, y = np.array([226.5, 361.05, 404.41, 589, 632.8, 1013.98]), np.linspace(0, 100, 11)
X, Y = np.meshgrid(x, y)
Z = np.array([[1.3945, 1.34896, 1.34415, 1.33432, 1.33306, 1.32612],
              [1.39422, 1.3487, 1.34389, 1.33408, 1.33282, 1.32591],
              [1.39336, 1.34795, 1.34315, 1.33336, 1.33211, 1.32524],
              [1.39208, 1.34682, 1.34205, 1.3323, 1.33105, 1.32424],
              [1.39046, 1.3454, 1.34065, 1.33095, 1.32972, 1.32296],
              [1.38854, 1.34373, 1.33901, 1.32937, 1.32814, 1.32145],
              [1.38636, 1.34184, 1.33714, 1.32757, 1.32636, 1.31974],
              [1.38395, 1.33974, 1.33508, 1.32559, 1.32438, 1.31784],
              [1.38132, 1.33746, 1.33284, 1.32342, 1.32223, 1.31576],
              [1.37849, 1.33501, 1.33042, 1.32109, 1.31991, 1.31353],
              [1.37547, 1.33239, 1.32784, 1.31861, 1.31744, 1.31114]])

# This has to be defined beforehand (otherwise parameters names are not defined error)
a1, a2, a3, a4 = 1.3208, -1.2325E-5, -1.8674E-6, 5.0233E-9
b1, b2, b3, b4 = 5208.2413, -0.5179, -2.284E-2, 6.9608E-5
c1, c2, c3, c4 = -2.5551E8, -18341.336, -920, 2.7729
d1, d2, d3, d4 = 9.3495, 2E-3, 3.6733E-5, -1.2932E-7

# Function to fit
def model(x, y, *args):
    return a1 + a2*y + a3*np.power(y, 2) + a4*np.power(y, 3) + \
           (b1 + b2*y + b3*np.power(y, 2) + b4*np.power(y, 3))/np.power(x, 2) + \
           (c1 + c2*y + c3*np.power(y, 2) + c4*np.power(y, 3))/np.power(x, 4) + \
           (d1 + d2*y + d3*np.power(y, 2) + d4*np.power(y, 3))/np.power(x, 6)

# This is the callable that is passed to Model.fit. M is a (2, N) array
# where N is the total number of data points in Z, which will be ravelled
# to one dimension.
def _model(M, **args):
    x, y = M
    arr = model(x, y, params)
    return arr

# We need to ravel the meshgrids of X, Y points to a pair of 1-D arrays.
xdata = np.vstack((X.ravel(), Y.ravel()))

# Fitting parameters.
fmodel = Model(_model)
params = Parameters()
params.add_many(('a1', 1.3208, True, 1, np.inf, None, None),
                ('a2', -1.2325E-5, True, -np.inf, np.inf, None, None),
                ('a3', -1.8674E-6, True, -np.inf, np.inf, None, None),
                ('a4', 5.0233E-9, True, -np.inf, np.inf, None, None),
                ('b1', 5208.2413, True, -np.inf, np.inf, None, None),
                ('b2', -0.5179, True, -np.inf, np.inf, None, None),
                ('b3', -2.284E-2, True, -np.inf, np.inf, None, None),
                ('b4', 6.9608E-5, True, -np.inf, np.inf, None, None),
                ('c1', -2.5551E8, True, -np.inf, np.inf, None, None),
                ('c2', -18341.336, True, -np.inf, np.inf, None, None),
                ('c3', -920, True, -np.inf, np.inf, None, None),
                ('c4', 2.7729, True, -np.inf, np.inf, None, None),
                ('d1', 9.3495, True, -np.inf, np.inf, None, None),
                ('d2', 2E-3, True, -np.inf, np.inf, None, None),
                ('d3', 3.6733E-5, True, -np.inf, np.inf, None, None),
                ('d4', -1.2932E-7, True, -np.inf, np.inf, None, None))

result = fmodel.fit(Z.ravel(), params, M=xdata)
fit = model(X, Y, result.params)
print(result.covar)
This code results in the covariance being NoneType. I expected it would be calculated after all, because Origin somehow manages. If needed, I can provide all the parameters from Origin's Surface Fitting.
When plotting the difference between Z and the fit, there is a quite large discrepancy for low x-values (which does not happen in Origin).
You are not defining your model function in a way that can be used sensibly by lmfit. You have:
def _model(M, **args):
    x, y = M
    arr = model(x, y, params)
    return arr

def model(x, y, *args):
    return a1 + a2*y + a3*np.power(y, 2) + a4*np.power(y, 3) + \
           (b1 + b2*y + b3*np.power(y, 2) + b4*np.power(y, 3))/np.power(x, 2) + \
           (c1 + c2*y + c3*np.power(y, 2) + c4*np.power(y, 3))/np.power(x, 4) + \
           (d1 + d2*y + d3*np.power(y, 2) + d4*np.power(y, 3))/np.power(x, 6)

model = Model(_model)
Which has a few problems:
args is not used in _model, and params is not defined in that function, so it will be taken from the module level.
Similarly in model, args is not used, and a1, a2, etc. will be taken from the module-level variables and (importantly!) these will not be updated in the fit.
In short, your model function never sees varying values for the parameters.
lmfit.Model takes the named function arguments and turns those into parameter names. It does not turn **kws or *position_args into parameter names. So I think that what you want to do is write a model function like this:
def model(x, y, a1, a2, a3, a4, b1, b2, b3, b4,
          c1, c2, c3, c4, d1, d2, d3, d4):
    return a1 + a2*y + a3*np.power(y, 2) + a4*np.power(y, 3) + \
           (b1 + b2*y + b3*np.power(y, 2) + b4*np.power(y, 3))/np.power(x, 2) + \
           (c1 + c2*y + c3*np.power(y, 2) + c4*np.power(y, 3))/np.power(x, 4) + \
           (d1 + d2*y + d3*np.power(y, 2) + d4*np.power(y, 3))/np.power(x, 6)
Then create a model from that with:
# Note: don't give a function and Model instance the same name!!
my_model = Model(model, independent_vars=('x', 'y'))
With that model defined you can run the fit, and without having to unravel your data (the independent data in lmfit can be of almost any data type, and data arrays can be multi-dimensional):
result = my_model.fit(Z, params, x=X, y=Y)
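If it helps, the Parameters object can also be built directly from the model's named arguments with make_params; a sketch using the same initial values as in the question (only a1 had a lower bound there):

params = my_model.make_params(
    a1=1.3208, a2=-1.2325e-5, a3=-1.8674e-6, a4=5.0233e-9,
    b1=5208.2413, b2=-0.5179, b3=-2.284e-2, b4=6.9608e-5,
    c1=-2.5551e8, c2=-18341.336, c3=-920, c4=2.7729,
    d1=9.3495, d2=2e-3, d3=3.6733e-5, d4=-1.2932e-7)
params['a1'].set(min=1)   # reproduce the lower bound from the add_many call
result = my_model.fit(Z, params, x=X, y=Y)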
For what it is worth, making such changes works for me in the sense that the fit runs to completion. The fit still gets stuck with some of the parameters not updating from their initial values, but that is sort of a separate question from the mechanics of setting up and running the fit, and is probably due to polynomials being pretty unstable or poor initial estimates.
As an aside: np.power(y, n) can be spelled y**n, and readability counts. Also, numerical stability is sometimes improved by replacing
a + b*x + c*x**2 + d*x**3
with
a + x*(b + x*(c + x*d))
Though I do not know if that would help in your case.
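For illustration only, here is a sketch of the same model written in Horner form: with u = 1/x**2, the sum A + B/x**2 + C/x**4 + D/x**6 becomes A + u*(B + u*(C + u*D)), and each cubic in y is nested the same way.

def model_horner(x, y, a1, a2, a3, a4, b1, b2, b3, b4,
                 c1, c2, c3, c4, d1, d2, d3, d4):
    # Each cubic in y evaluated with Horner's scheme
    A = a1 + y*(a2 + y*(a3 + y*a4))
    B = b1 + y*(b2 + y*(b3 + y*b4))
    C = c1 + y*(c2 + y*(c3 + y*c4))
    D = d1 + y*(d2 + y*(d3 + y*d4))
    u = 1.0 / x**2
    return A + u*(B + u*(C + u*D))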

Slicing a Keras Variable in a custom objective function

I've been trying to implement a custom objective function in Keras (the negative log likelihood of the normal distribution).
Keras expects one argument for the ground-truth tensor and one for the predictions tensor; for y_pred, I'm passing a tensor that should represent an nx2 matrix where the first column is the mean of the distribution and the second the precision.
My problem is that I haven't been able to get a clear idea of how to properly slice y_pred before passing it into the likelihood function without getting the error
'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?'
While I understand that I'm feeding l_func arguments of the Variable type when it expects an array, I don't seem to be able to grok how to properly split the input y_pred variable into its mean and precision components to plug into the likelihood function. Here are some attempts; if someone could enlighten me about how to proceed, I would greatly appreciate it.
import numpy as np
import theano.tensor as T
from theano import function  # imports assumed: the snippets use T.vector/T.log and theano's function()

def log_likelihood(y_true, y_pred):
    mu = T.vector('mu')
    beta = T.vector('beta')
    x = T.vector('x')
    likelihood = .5*(beta*(x - mu)**2) - T.log(beta/(2*np.pi))
    l_func = function([mu, beta, x], likelihood)
    return l_func(y_pred[:, 0], y_pred[:, 1], y_true)

def log_likelihood(y_true, y_pred):
    likelihood = .5*(y_pred[:, 1]*(y_true - y_pred[:, 0])**2) - T.log(y_pred[:, 1]/(2*np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)

def log_likelihood(y_true, y_pred):
    mu = y_pred[:, 0]
    beta = y_pred[:, 1]
    x = y_true
    mu_function = function([y_pred], mu)
    beta_function = function([y_pred], beta)
    id_function = function([y_true], x)
    likelihood = .5*(beta_function(y_pred)*(id_function(y_true) - mu_function(y_pred))**2) - T.log(beta_function(y_pred)/(2*np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)
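For reference, a minimal sketch of the slicing done symbolically: slice y_pred directly and return the expression, using the Keras backend (imported here as K) rather than compiling a theano function inside the loss. The likelihood expression is kept as in the question, and K.mean reduces the per-sample values to the scalar Keras expects.

import numpy as np
from keras import backend as K

def log_likelihood(y_true, y_pred):
    mu = y_pred[:, 0]            # first column: mean
    beta = y_pred[:, 1]          # second column: precision
    x = K.flatten(y_true)        # flatten in case y_true arrives as shape (n, 1)
    likelihood = .5*(beta*(x - mu)**2) - K.log(beta/(2*np.pi))
    return K.mean(likelihood)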

Clustering by assigning weights to the attributes

I have a data set in an Excel sheet that I need to cluster by assigning weights to the attributes. How can I do it?
You can define a function that computes the distance between two points by taking attribute weights into account. An example of this would be a weighted Euclidean distance.
Specifically, if there are k attributes for each point in your dataset and the corresponding weights for the attributes are d1, d2, .., dk, then the distance between two points X and Y is
d(X,Y) = sum(di * (Xi - Yi)^2), i = 1, 2, .., k, where Xi is the value of the ith attribute for the point X.
If the weights are the inverse of the variance of each attribute, this reduces to the Mahalanobis distance (with a diagonal covariance matrix):
http://en.wikipedia.org/wiki/Mahalanobis_distance
Once you define the distance function you can use K-means to cluster your data.
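A minimal sketch of that idea: since the weighted Euclidean distance above equals the ordinary Euclidean distance after scaling each attribute by the square root of its weight, you can rescale the columns and run standard k-means on the scaled data (X, weights and n_clusters below are hypothetical placeholders).

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 4)            # 100 points, k = 4 attributes
weights = np.array([1.0, 2.0, 0.5, 4.0])

X_weighted = X * np.sqrt(weights)     # scale each column by sqrt(di)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_weighted)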
