KL Divergence of two torch.distribution.Distribution objects - pytorch

I'm trying to determine how to compute KL Divergence of two torch.distribution.Distribution objects. I couldn't find a function to do that so far. Here is what I've tried:
import torch as t
from torch import distributions as tdist
import torch.nn.functional as F
def kl_divergence(x: t.distributions.Distribution, y: t.distributions.Distribution):
    """Compute the KL divergence between two distributions."""
    return F.kl_div(x, y)
a = tdist.Normal(0, 1)
b = tdist.Normal(1, 1)
print(kl_divergence(a, b)) # TypeError: kl_div(): argument 'input' (position 1) must be Tensor, not Normal

torch.nn.functional.kl_div computes the KL-divergence loss between two tensors of (log-)probabilities, not between distribution objects. The KL divergence between two torch.distributions.Distribution objects can be computed with torch.distributions.kl.kl_divergence.
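For the distributions in the question, that looks like the snippet below; kl_divergence(p, q) returns KL(p || q) in closed form whenever the pair of distribution types is registered (Normal/Normal is):
import torch.distributions as tdist
from torch.distributions.kl import kl_divergence

a = tdist.Normal(0., 1.)
b = tdist.Normal(1., 1.)
print(kl_divergence(a, b))  # tensor(0.5000), the closed-form KL(N(0,1) || N(1,1))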

tdist.Normal(...) will return a normal distribution object; you have to get a sample out of the distribution first...
x = a.sample()
y = b.sample()
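If you do want to work from samples rather than the analytic formula, a minimal Monte Carlo sketch (my own illustration, not part of the original answer) averages log-density differences under samples drawn from the first distribution:
import torch
from torch import distributions as tdist

a = tdist.Normal(0., 1.)
b = tdist.Normal(1., 1.)

samples = a.sample((100000,))   # draw from the first distribution
kl_estimate = (a.log_prob(samples) - b.log_prob(samples)).mean()
print(kl_estimate)              # roughly 0.5, matching the analytic KL(a || b)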

Related

Is there a way to compute the matrix logarithm of a Pytorch tensor?

I am trying to compute matrix logarithms in Pytorch but I need to keep tensors because I then apply gradients which means I can't use numpy arrays.
Basically I'm trying to do the equivalent of https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.logm.html but with Pytorch tensors.
Thank you.
Unfortunately the matrix logarithm (unlike the matrix exponential) is not implemented yet, but matrix powers are. This means that in the meantime you can approximate the matrix logarithm with its power-series expansion and truncate it once you reach sufficient accuracy.
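A minimal sketch of that truncated power series (my own illustration; note it only converges when the spectral radius of A - I is below 1, so it is not a general replacement for logm):
import torch

def logm_series(A, num_terms=20):
    # log(A) = sum_{k>=1} (-1)^(k+1) (A - I)^k / k, valid when ||A - I|| < 1
    n = A.size(0)
    B = A - torch.eye(n, dtype=A.dtype, device=A.device)
    result = torch.zeros_like(A)
    term = torch.eye(n, dtype=A.dtype, device=A.device)
    for k in range(1, num_terms + 1):
        term = term @ B                              # builds (A - I)^k via repeated matrix products
        result = result + ((-1) ** (k + 1)) / k * term
    return result

A = torch.eye(3) + 0.1 * torch.randn(3, 3)           # close to the identity, so the series converges
print(logm_series(A))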
Alternatively, Lezcano proposes a (slow) solution: a differentiable matrix logarithm via the adjoint method, here. I'll cite their suggested solution:
import scipy.linalg
import torch

def adjoint(A, E, f):
    A_H = A.T.conj().to(E.dtype)
    n = A.size(0)
    M = torch.zeros(2*n, 2*n, dtype=E.dtype, device=E.device)
    M[:n, :n] = A_H
    M[n:, n:] = A_H
    M[:n, n:] = E
    return f(M)[:n, n:].to(A.dtype)

def logm_scipy(A):
    return torch.from_numpy(scipy.linalg.logm(A.cpu(), disp=False)[0]).to(A.device)

class Logm(torch.autograd.Function):
    @staticmethod
    def forward(ctx, A):
        assert A.ndim == 2 and A.size(0) == A.size(1)  # Square matrix
        assert A.dtype in (torch.float32, torch.float64, torch.complex64, torch.complex128)
        ctx.save_for_backward(A)
        return logm_scipy(A)

    @staticmethod
    def backward(ctx, G):
        A, = ctx.saved_tensors
        return adjoint(A, G, logm_scipy)

logm = Logm.apply
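A hedged usage sketch (my own example, assuming a symmetric positive definite input in double precision so the SciPy result stays real):
M = torch.randn(4, 4, dtype=torch.float64)
A = (M @ M.T + 4 * torch.eye(4, dtype=torch.float64)).requires_grad_(True)  # SPD, so logm(A) is real
L = logm(A)             # forward pass goes through scipy.linalg.logm
L.sum().backward()      # backward pass uses the adjoint trick above
print(A.grad.shape)     # torch.Size([4, 4])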
Easy to implement in PyTorch as follows (this works here because cov is symmetric positive definite, so its SVD coincides with its eigendecomposition):
import torch
import scipy.linalg

a = torch.randn(5, 10)
cov = torch.cov(a)
u, s, vh = torch.linalg.svd(cov)
log_cov = torch.matmul(torch.matmul(u, torch.diag_embed(torch.log(s))), vh)
You can easily verify that log_cov and log_cov_np are the same:
log_cov_np = scipy.linalg.logm(cov.detach().numpy())
If cov is singular, you can use regularization methods to give it a good condition number before calculating the matrix log.
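One common regularization sketch is diagonal loading (eps is an assumed hyperparameter on my part, not from the original answer):
eps = 1e-6
cov_reg = cov + eps * torch.eye(cov.shape[0])   # pushes eigenvalues away from zero
u, s, vh = torch.linalg.svd(cov_reg)
log_cov = torch.matmul(torch.matmul(u, torch.diag_embed(torch.log(s))), vh)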

Curve fitting with known coefficients in Python

I tried using NumPy, SciPy and scikit-learn, but couldn't find what I need in any of them. Basically, I need to fit a curve to a dataset while restricting some of the coefficients to known values. I found how to do it in MATLAB using fittype, but couldn't do it in Python.
In my case I have a dataset of X and Y and I need to find the best fitting curve. I know it's a polynomial of second degree (ax^2 + bx + c) and I know the values of b and c, so I just need it to find the value of a.
The solution I found in MATLAB was https://www.mathworks.com/matlabcentral/answers/216688-constraining-polyfit-with-known-coefficients, which is the same problem as mine except that their polynomial was of degree 5. How could I do something similar in Python?
To add some info: I need to fit a curve to a dataset, so things like scipy.optimize.curve_fit that expect a function won't work (at least as far as I tried).
The tools you have available usually expect either a function that takes only its parameters (a being the only unknown in your case), or one that takes its parameters plus some data (a, x, and y in your case).
SciPy's curve_fit handles that use-case just fine, so long as we hand it a function it understands. It expects x first and all your parameters as the remaining arguments:
from scipy.optimize import curve_fit
import numpy as np

b = 0
c = 0

def f(x, a):
    return c + x*(b + x*a)

x = np.linspace(-5, 5)
y = x**2

# params == [1.]
params, _ = curve_fit(f, x, y)
Alternatively you can reach for your favorite minimization routine. The difference here is that you manually construct the error function so that it only takes the parameters you care about, and then you don't need to provide that data to scipy.
from scipy.optimize import minimize
import numpy as np

b = 0
c = 0
x = np.linspace(-5, 5)
y = x**2

def error(a):
    prediction = c + x*(b + x*a)
    return np.linalg.norm(prediction - y) / len(prediction)**.5

result = minimize(error, np.array([42.]))
assert result.success
# params == [1.]
params = result.x
I don't think scipy has a partially applied polynomial fit function built-in, but you could use either of the above ideas to easily build one yourself if you do that kind of thing a lot.
from scipy.optimize import curve_fit
import numpy as np

def polyfit(coefs, x, y):
    # build a mapping from null coefficient locations to locations in the function
    # coefficients we're passing to curve_fit
    #
    # idx[j]==i means that unknown_coefs[i] belongs in coefs[j]
    _tmp = [i for i, c in enumerate(coefs) if c is None]
    idx = {j: i for i, j in enumerate(_tmp)}

    def f(x, *unknown_coefs):
        # create the entire polynomial's coefficients by filling in the unknown
        # values in the right places, using the aforementioned mapping
        p = [(unknown_coefs[idx[i]] if c is None else c) for i, c in enumerate(coefs)]
        return np.polyval(p, x)

    # we're passing an initial value just so that scipy knows how many parameters
    # to use
    params, _ = curve_fit(f, x, y, np.zeros((sum(c is None for c in coefs),)))
    # return all the polynomial's coefficients, not just the few we just discovered
    return np.array([(params[idx[i]] if c is None else c) for i, c in enumerate(coefs)])

x = np.linspace(-5, 5)
y = x**2
# (unknown)x^2 + 0x + 0
# params == [1., 0., 0.]
params = polyfit([None, 0, 0], x, y)
Similar features exist in nearly every mainstream scientific library; you just might need to reshape your problem a bit to frame it in terms of the available primitives.

How to calculate geometric mean in a differentiable way?

How to calculate the geometric mean along a dimension using PyTorch? Some numbers can be negative. The function must be differentiable.
A known (reasonably) numerically-stable version of the geometric mean is:
import torch

def gmean(input_x, dim):
    log_x = torch.log(input_x)
    return torch.exp(torch.mean(log_x, dim=dim))

x = torch.Tensor([2.0] * 1000).requires_grad_(True)
print(gmean(x, dim=0))
# tensor(2.0000, grad_fn=<ExpBackward>)
This kind of implementation can be found, for example, in SciPy (see here), which is a fairly stable library.
The implementation above does not handle zeros and negative numbers. Some will argue that the geometric mean with negative numbers is not well-defined, at least when not all of them are negative.
torch.prod() helps:
import torch
x = torch.FloatTensor(3).uniform_().requires_grad_(True)
print(x)
y = x.prod() ** (1.0/x.shape[0])
print(y)
y.backward()
print(x.grad)
# tensor([0.5692, 0.7495, 0.1702], requires_grad=True)
# tensor(0.4172, grad_fn=<PowBackward0>)
# tensor([0.2443, 0.1856, 0.8169])
EDIT: what about
y = (x.abs() ** (1.0/x.shape[0]) * x.sign() ).prod()
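Wrapped into a small function for clarity (a sketch of that EDIT idea, with the usual caveat that a geometric mean over mixed-sign values is not a standard definition):
import torch

def signed_gmean(x, dim=0):
    # take the n-th root of each element's magnitude, keep its sign, then multiply
    n = x.shape[dim]
    return (x.abs() ** (1.0 / n) * x.sign()).prod(dim=dim)

x = torch.tensor([2.0, -8.0], requires_grad=True)
y = signed_gmean(x)
y.backward()
print(y, x.grad)   # tensor(-4.) and the gradient with respect to x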

Bad input argument to theano function

I am new to theano. I am trying to implement simple linear regression but my program throws following error:
TypeError: ('Bad input argument to theano function with name "/home/akhan/Theano-Project/uog/theano_application/linear_regression.py:36" at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
Here is my code:
import theano
from theano import tensor as T
import numpy as np
import matplotlib.pyplot as plt

x_points = np.zeros((9,3), float)
x_points[:,0] = 1
x_points[:,1] = np.arange(1,10,1)
x_points[:,2] = np.arange(1,10,1)
y_points = np.arange(3,30,3) + 1

X = T.vector('X')
Y = T.scalar('Y')
W = theano.shared(
    value=np.zeros(
        (3,1),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)

out = T.dot(X, W)
predict = theano.function(inputs=[X], outputs=out)
y = predict(X) # y = T.dot(X, W) work fine
cost = T.mean(T.sqr(y-Y))
gradient = T.grad(cost=cost, wrt=W)
updates = [[W, W-gradient*0.01]]
train = theano.function(inputs=[X,Y], outputs=cost, updates=updates, allow_input_downcast=True)

for i in np.arange(x_points.shape[0]):
    print "iteration" + str(i)
    train(x_points[i,:], y_points[i])

sample = np.arange(x_points.shape[0]) + 1
y_p = np.dot(x_points, W.get_value())
plt.plot(sample, y_p, 'r-', sample, y_points, 'ro')
plt.show()
What is the explanation behind this error? (I didn't get it from the error message.) Thanks in advance.
There's an important distinction in Theano between defining a computation graph and a function which uses such a graph to compute a result.
When you define
out = T.dot(X, W)
predict = theano.function(inputs=[X], outputs=out)
you first set up a computation graph for out in terms of X and W. Note that X is a purely symbolic variable; it doesn't have any value. The definition of out just tells Theano, "given a value for X, this is how to compute out".
On the other hand, predict is a theano.function which takes the computation graph for out and actual numeric values for X to produce a numeric output. What you pass into a theano.function when you call it always has to have an actual numeric value. So it simply makes no sense to do
y = predict(X)
because X is a symbolic variable and doesn't have an actual value.
The reason you want to do this is so that you can use y to further build your computation graph. But there is no need to use predict for this: the computation graph for predict is already available in the variable out defined earlier. So you can simply remove the line defining y altogether and then define your cost as
cost = T.mean(T.sqr(out - Y))
The rest of the code will then work unmodified.
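Put together, the relevant lines of the corrected script would look roughly like this (same variable names as in the question):
out = T.dot(X, W)
predict = theano.function(inputs=[X], outputs=out)   # call this later with numeric arrays only
cost = T.mean(T.sqr(out - Y))                        # build the cost from the symbolic graph, not from predict(X)
gradient = T.grad(cost=cost, wrt=W)
updates = [[W, W - gradient*0.01]]
train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)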

scikit learn: how to check coefficients significance

I tried to do a LR with scikit-learn for a rather large dataset with ~600 dummy and only a few interval variables (and 300K lines in my dataset), and the resulting confusion matrix looks suspicious. I wanted to check the significance of the returned coefficients and run an ANOVA, but I cannot find how to access them. Is it possible at all? And what is the best strategy for data that contains lots of dummy variables? Thanks a lot!
Scikit-learn deliberately does not support statistical inference. If you want out-of-the-box coefficient significance tests (and much more), you can use the Logit estimator from statsmodels. This package mimics the interface of glm models in R, so you may find it familiar.
If you still want to stick to scikit-learn's LogisticRegression, you can use an asymptotic approximation to the distribution of the maximum likelihood estimates. Precisely, for a vector of maximum likelihood estimates theta, its variance-covariance matrix can be estimated as inverse(H), where H is the Hessian matrix of the log-likelihood at theta. This is exactly what the function below does:
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

def logit_pvalue(model, x):
    """ Calculate z-scores for scikit-learn LogisticRegression.
    parameters:
        model: fitted sklearn.linear_model.LogisticRegression with intercept and large C
        x: matrix on which the model was fit
    This function uses asymptotics for maximum likelihood estimates.
    """
    p = model.predict_proba(x)
    n = len(p)
    m = len(model.coef_[0]) + 1
    coefs = np.concatenate([model.intercept_, model.coef_[0]])
    x_full = np.matrix(np.insert(np.array(x), 0, 1, axis=1))
    ans = np.zeros((m, m))
    for i in range(n):
        ans = ans + np.dot(np.transpose(x_full[i, :]), x_full[i, :]) * p[i, 1] * p[i, 0]
    vcov = np.linalg.inv(np.matrix(ans))
    se = np.sqrt(np.diag(vcov))
    t = coefs / se
    p = (1 - norm.cdf(abs(t))) * 2
    return p
# test p-values
x = np.arange(10)[:, np.newaxis]
y = np.array([0,0,0,1,0,0,1,1,1,1])
model = LogisticRegression(C=1e30).fit(x, y)
print(logit_pvalue(model, x))
# compare with statsmodels
import statsmodels.api as sm
sm_model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(sm_model.pvalues)
sm_model.summary()
The outputs of print() are identical, and they happen to be coefficient p-values.
[ 0.11413093 0.08779978]
[ 0.11413093 0.08779979]
sm_model.summary() also prints a nicely formatted HTML summary.

Resources