How to calculate goemetric mean along a dimension using Pytorch? Some numbers can be negative. The function must be differentiable.
A known (reasonably) numerically-stable version of the geometric mean is:
import torch
def gmean(input_x, dim):
log_x = torch.log(input_x)
return torch.exp(torch.mean(log_x, dim=dim))
x = torch.Tensor([2.0] * 1000).requires_grad_(True)
print(gmean(x, dim=0))
# tensor(2.0000, grad_fn=<ExpBackward>)
This kind of implementation can be found, for example, in SciPy (see here), which is a quite stable lib.
The implementation above does not handle zeros and negative numbers. Some will argue that the geometric mean with negative numbers is not well-defined, at least when not all of them are negative.
torch.prod() helps:
import torch
x = torch.FloatTensor(3).uniform_().requires_grad_(True)
print(x)
y = x.prod() ** (1.0/x.shape[0])
print(y)
y.backward()
print(x.grad)
# tensor([0.5692, 0.7495, 0.1702], requires_grad=True)
# tensor(0.4172, grad_fn=<PowBackward0>)
# tensor([0.2443, 0.1856, 0.8169])
EDIT: ?what about
y = (x.abs() ** (1.0/x.shape[0]) * x.sign() ).prod()
Related
trying to implement custom MLE for binomial distribution (for learning purpose) stuck with implantation of binomial coefficient in google JAX . there is no analog for scipy.special.binom() implemented.
what shall i use instead ?
The binomial coefficient for general real-valued inputs can be computed in terms of the gamma function, which is available in JAX via jax.scipy.special.gammaln. Here's one way you could define it:
def binom(x, y):
return jnp.exp(gammaln(x + 1) - gammaln(y + 1) - gammaln(x - y + 1))
Here is a (sequential) integer implementation using JAX.
def binom_int_seq(x : int, y : int):
def scan_body(carry, values):
n, d = values
carry = (carry*n)//d
return carry, None
y = max(y, x-y)
nd = jnp.concatenate(
(jnp.arange(y+2, x+1, dtype = 'u8')[:,None],
jnp.arange(2, x-y+1, dtype = 'u8')[:,None],),
axis = 1
)
bc, *_ = jax.lax.scan(scan_body, jnp.array(y+1, dtype = 'u8'), nd)
return bc
binom_int_seq_jit = jax.jit(binom_int_seq, static_argnums = (0, 1))
which gives
x, y = 60, 31
bc_ref = sp.special.comb(x, y, exact=True)
# 114449595062769120
binom_int_seq(x, y)-bc_ref
# DeviceArray(0, dtype=uint64)
# Using above logarithmic gamma function based implementation
binom(x, y)-bc_ref
# DeviceArray(496., dtype=float64, weak_type=True)
Keep in mind the binom_int_seq implementation is only correct if
(x-max(x-y, y))*sp.special.comb(x, y, exact=True) < jnp.iinfo(jnp.uint64).max
Unlike the real-valued version, the error will be sudden and catastrophic if this condition is not satisfied.
There may be other ways to increase this constraint, such as running cancellations based upon prime factorisation, without resorting to larger unsigned integers (/arbitrary precision).
A monoidal version could be implemented which computes the binomial coefficient numerator and denominator reductions then integer divides, but this places stricter constraints on the maximum arguments.
I am trying to compute matrix logarithms in Pytorch but I need to keep tensors because I then apply gradients which means I can't use numpy arrays.
Basically I'm trying to do the equivalent of https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.logm.html but with Pytorch tensors.
Thank you.
Unfortunately the matrix logarithm (unlike the matrix exponential) is not implemented yet, but matrix powers are, this means
in the mean time you can approximate the matrix logarithm by using a the power series expansion, and just truncate it after you get a sufficient accuracy.
Alternatively Lezcano proposes a (slow) solution of a differentiable matrix logarithm via adjoint here. I'll cite their suggested solution:
import scipy.linalg
import torch
def adjoint(A, E, f):
A_H = A.T.conj().to(E.dtype)
n = A.size(0)
M = torch.zeros(2*n, 2*n, dtype=E.dtype, device=E.device)
M[:n, :n] = A_H
M[n:, n:] = A_H
M[:n, n:] = E
return f(M)[:n, n:].to(A.dtype)
def logm_scipy(A):
return torch.from_numpy(scipy.linalg.logm(A.cpu(), disp=False)[0]).to(A.device)
class Logm(torch.autograd.Function):
#staticmethod
def forward(ctx, A):
assert A.ndim == 2 and A.size(0) == A.size(1) # Square matrix
assert A.dtype in (torch.float32, torch.float64, torch.complex64, torch.complex128)
ctx.save_for_backward(A)
return logm_scipy(A)
#staticmethod
def backward(ctx, G):
A, = ctx.saved_tensors
return adjoint(A, G, logm_scipy)
logm = Logm.apply
Easy to implement in Pytorch as follows:
import torch
a=torch.randn(5,10)
cov=torch.cov(a)
u, s, v = torch.linalg.svd(cov)
log_cov=torch.matmul(torch.matmul(u, torch.diag_embed(torch.log(s))), v)
You can easily verify that log_cov and log_cov_np are the same.
log_cov_np=scipy.linalg.logm(cov.detach().numpy())
if cov is singular, you can use the regularization methods to make it have good condtion number for calculating the matrix-log.
I tried using Numpy, Scipy and Scikitlearn, but couldn't find what I need in any of them, basically I need to fit a curve to a dataset, but restricting some of the coefficients to known values, I found how to do it in MATLAB, using fittype, but couldn't do it in python.
In my case I have a dataset of X and Y and I need to find the best fitting curve, I know it's a polynomial of second degree (ax^2 + bx + c) and I know it's values of b and c, so I just needed it to find the value of a.
The solution I found in MATLAB was https://www.mathworks.com/matlabcentral/answers/216688-constraining-polyfit-with-known-coefficients which is the same problem as mine, but with the difference that their polynomial was of degree 5th, how could I do something similar in python?
To add some info: I need to fit a curve to a dataset, so things like scipy.optimize.curve_fit that expects a function won't work (at least as far as I tried).
The tools you have available usually expect functions only inputting their parameters (a being the only unknown in your case), or inputting their parameters and some data (a, x, and y in your case).
Scipy's curve-fit handles that use-case just fine, so long as we hand it a function that it understands. It expects x first and all your parameters as the remaining arguments:
from scipy.optimize import curve_fit
import numpy as np
b = 0
c = 0
def f(x, a):
return c+x*(b+x*a)
x = np.linspace(-5, 5)
y = x**2
# params == [1.]
params, _ = curve_fit(f, x, y)
Alternatively you can reach for your favorite minimization routine. The difference here is that you manually construct the error function so that it only inputs the parameters you care about, and then you don't need to provide that data to scipy.
from scipy.optimize import minimize
import numpy as np
b = 0
c = 0
x = np.linspace(-5, 5)
y = x**2
def error(a):
prediction = c+x*(b+x*a)
return np.linalg.norm(prediction-y)/len(prediction)**.5
result = minimize(error, np.array([42.]))
assert result.success
# params == [1.]
params = result.x
I don't think scipy has a partially applied polynomial fit function built-in, but you could use either of the above ideas to easily build one yourself if you do that kind of thing a lot.
from scipy.optimize import curve_fit
import numpy as np
def polyfit(coefs, x, y):
# build a mapping from null coefficient locations to locations in the function
# coefficients we're passing to curve_fit
#
# idx[j]==i means that unknown_coefs[i] belongs in coefs[j]
_tmp = [i for i,c in enumerate(coefs) if c is None]
idx = {j:i for i,j in enumerate(_tmp)}
def f(x, *unknown_coefs):
# create the entire polynomial's coefficients by filling in the unknown
# values in the right places, using the aforementioned mapping
p = [(unknown_coefs[idx[i]] if c is None else c) for i,c in enumerate(coefs)]
return np.polyval(p, x)
# we're passing an initial value just so that scipy knows how many parameters
# to use
params, _ = curve_fit(f, x, y, np.zeros((sum(c is None for c in coefs),)))
# return all the polynomial's coefficients, not just the few we just discovered
return np.array([(params[idx[i]] if c is None else c) for i,c in enumerate(coefs)])
x = np.linspace(-5, 5)
y = x**2
# (unknown)x^2 + 1x + 0
# params == [1, 0, 0.]
params = fit([None, 0, 0], x, y)
Similar features exist in nearly every mainstream scientific library; you just might need to reshape your problem a bit to frame it in terms of the available primitives.
I am using pytorch to calculate loss for a logistic regression (I know pytorch can do this automatically but I have to make it myself). My function is defined below but the cast to torch.tensor breaks autograd and gives me w.grad = None. Im new to pytorch so Im sorry.
logistic_loss = lambda X,y,w: torch.tensor([torch.log(1 + torch.exp(-y[i] * torch.matmul(w, X[i,:]))) for i in range(X.shape[0])], requires_grad=True)
Your post isn't very clear on details and this is a monster of a one-liner. I first reworked it to make a minimal, complete, verifiable example. Please correct me if I misunderstood your intentions and please do it yourself next time.
import torch
# unroll the one-liner to have an easier time understanding what's going on
def logistic_loss(X, y, w):
elementwise = []
for i in range(X.shape[0]):
mm = torch.matmul(w, X[i, :])
exp = torch.exp(-y[i] * mm)
elementwise.append(torch.log(1 + exp))
return torch.tensor(elementwise, requires_grad=True)
# I assume that's the excepted dimensions of your input
X = torch.randn(5, 30, requires_grad=True)
y = torch.randn(5)
w = torch.randn(30)
# I assume you backpropagate from a reduced version
# of your sum, because you can't call .backward on multi-dimensional
# tensors
loss = logistic_loss(X, y, w).mean()
loss.mean().backward()
print(X.grad)
The simplest solution to your problem is to replace torch.tensor(elementwise, requires_grad=True) with torch.stack(elementwise). You can think of torch.tensor as a constructor for entirely new tensors, if your tensor is more of a result of some mathematical expression, you should use operations like torch.stack or torch.cat.
That being said, this code is still wildly inefficient because you do manual looping over i. Instead, you could write simply
def logistic_loss_vectorized(X, y, w):
mm = torch.matmul(X, w)
exp = torch.exp(-y * mm)
return torch.log(1 + exp)
which is mathematically equivalent, but will be much faster in practice, because it allows for better parallelization due to lack of explicit looping.
Note that there is still a numerical issue with this code - you're taking a logarithm of an exponential, but the intermediate result, called exp, is likely to attain very high values, causing loss of precision. There are workarounds for that, which is why the loss functions provided by PyTorch are preferable.
i tried to do a LR with SKLearn for a rather large dataset with ~600 dummy and only few interval variables (and 300 K lines in my dataset) and the resulting confusion matrix looks suspicious. I wanted to check the significance of the returned coefficients and ANOVA but I cannot find how to access it. Is it possible at all? And what is the best strategy for data that contains lots of dummy variables? Thanks a lot!
Scikit-learn deliberately does not support statistical inference. If you want out-of-the-box coefficients significance tests (and much more), you can use Logit estimator from Statsmodels. This package mimics interface glm models in R, so you could find it familiar.
If you still want to stick to scikit-learn LogisticRegression, you can use asymtotic approximation to distribution of maximum likelihiood estimates. Precisely, for a vector of maximum likelihood estimates theta, its variance-covariance matrix can be estimated as inverse(H), where H is the Hessian matrix of log-likelihood at theta. This is exactly what the function below does:
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression
def logit_pvalue(model, x):
""" Calculate z-scores for scikit-learn LogisticRegression.
parameters:
model: fitted sklearn.linear_model.LogisticRegression with intercept and large C
x: matrix on which the model was fit
This function uses asymtptics for maximum likelihood estimates.
"""
p = model.predict_proba(x)
n = len(p)
m = len(model.coef_[0]) + 1
coefs = np.concatenate([model.intercept_, model.coef_[0]])
x_full = np.matrix(np.insert(np.array(x), 0, 1, axis = 1))
ans = np.zeros((m, m))
for i in range(n):
ans = ans + np.dot(np.transpose(x_full[i, :]), x_full[i, :]) * p[i,1] * p[i, 0]
vcov = np.linalg.inv(np.matrix(ans))
se = np.sqrt(np.diag(vcov))
t = coefs/se
p = (1 - norm.cdf(abs(t))) * 2
return p
# test p-values
x = np.arange(10)[:, np.newaxis]
y = np.array([0,0,0,1,0,0,1,1,1,1])
model = LogisticRegression(C=1e30).fit(x, y)
print(logit_pvalue(model, x))
# compare with statsmodels
import statsmodels.api as sm
sm_model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(sm_model.pvalues)
sm_model.summary()
The outputs of print() are identical, and they happen to be coefficient p-values.
[ 0.11413093 0.08779978]
[ 0.11413093 0.08779979]
sm_model.summary() also prints a nicely formatted HTML summary.