element-wise operation in pytorch - pytorch

I have two Tensors A and B, A.shape is (b,c,100,100), B.shape is (b,c,80,80),
how can I get tensor C with shape (b,c,21,21) subject to
C[:, :, i, j] = torch.mean(A[:, :, i:i+80, j:j+80] - B)?
I wonder whether there's an efficient way to solve this?
Thanks very much.

You should use an average pool to compute the sliding window mean operation.
It is easy to see that:
mean(A[..., i:i+80, j:j+80] - B) = mean(A[..., i:i+80, j:j+80]) - mean(B)
Using avg_pool2d:
import torch
import torch.nn.functional as nnf
C = nnf.avg_pool2d(A, kernel_size=80, stride=1, padding=0) - torch.mean(B, dim=(2,3), keepdim=True)
If you are looking for a more general way of performing sliding window operations in PyTorch, you should look at fold and unfold.
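To illustrate the more general route, here is a minimal sketch that rebuilds the same result with torch.nn.functional.unfold and compares it against the avg_pool2d version. The batch and channel sizes (b=1, c=2) and the random inputs are assumptions chosen only to keep the example small.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, c = 1, 2  # assumed small sizes, for illustration only
A = torch.randn(b, c, 100, 100)
B = torch.randn(b, c, 80, 80)

# fast path: sliding-window mean of A minus the per-(batch, channel) mean of B
C_pool = F.avg_pool2d(A, kernel_size=80, stride=1) - B.mean(dim=(2, 3), keepdim=True)

# general path: extract every 80x80 window explicitly
patches = F.unfold(A, kernel_size=80)                # (b, c*80*80, 21*21)
patches = patches.reshape(b, c, 80 * 80, 21 * 21)    # split channels from window pixels
diff = patches - B.reshape(b, c, 80 * 80, 1)         # subtract B from each window
C_unfold = diff.mean(dim=2).reshape(b, c, 21, 21)

print((C_pool - C_unfold).abs().max())
```

The unfold version materialises all 441 windows, so it uses far more memory than avg_pool2d; it is mainly useful when the per-window operation is not a plain mean.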

Related

Pytorch bincount with gradient

I am trying to get the gradient of sums taken over selected indices of an array, computed with bincount. However, PyTorch does not implement the gradient for bincount. This can be implemented with a loop and torch.sum, but that is too slow. Is it possible to do this efficiently in PyTorch (maybe with einsum or index_add)? Of course, we can loop over the indices and add them one by one, but that would increase the size of the computational graph significantly and performs very poorly.
import torch
from torch import autograd
import numpy as np
tt = lambda x, grad=True: torch.tensor(x, requires_grad=grad)
inds = tt([1, 5, 7, 1], False).long()
y = tt(np.arange(4) + 0.1).float()
sum_y_section = torch.bincount(inds, y * y, minlength=8)
#sum_y_section = torch.sum(y * y)
grad = autograd.grad(sum_y_section, y, create_graph=True, allow_unused=False)
print("sum_y_section", sum_y_section)
print("grad", grad)
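We can use a feature introduced in PyTorch 1.11 called scatter_reduce. The call below follows the 1.11 signature; later releases changed the arguments (adding a src tensor), so check the documentation for your version.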
bincount = lambda inds, arr: torch.scatter_reduce(arr, 0, inds, reduce="sum")
I’d try to use a hook to manipulate the gradient in a custom way.
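Since the question mentions index_add, here is a minimal sketch of a differentiable weighted bincount built on Tensor.index_add. The helper name bincount_sum is hypothetical, and the values mirror the ones in the question.

```python
import torch

def bincount_sum(inds, weights, minlength):
    # hypothetical helper: accumulate weights[i] into bin inds[i], like a weighted bincount
    out = torch.zeros(minlength, dtype=weights.dtype, device=weights.device)
    return out.index_add(0, inds, weights)  # out-of-place, tracked by autograd

inds = torch.tensor([1, 5, 7, 1])
y = torch.tensor([0.1, 1.1, 2.1, 3.1], requires_grad=True)

sums = bincount_sum(inds, y * y, minlength=8)
# autograd.grad needs a scalar output (or explicit grad_outputs), so reduce first
(grad,) = torch.autograd.grad(sums.sum(), y)
print("sums", sums)
print("grad", grad)
```

Each y[i] contributes y[i]**2 to exactly one bin, so the gradient of the total is 2 * y, which makes the result easy to check by hand.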

calculate adjusted R2 using GridSearchCV

I am trying to use GridSearchCV with multiple scoring metrics, one of which is the adjusted R2. The latter, as far as I know, is not implemented in scikit-learn. I would like to confirm whether my approach to implementing the adjusted R2 is correct.
Using the scores implemented in scikit-learn (in the example below MAE and R2), I can do something like shown below (in this dummy example I am ignoring good practices, like feature scaling and a suitable number of iterations for SVR):
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import r2_score, mean_absolute_error
#generate input
X = np.random.normal(75, 10, (1000, 2))
y = np.random.normal(200, 20, 1000)
#perform grid search
params = {"degree": [2, 3], "max_iter": [10]}
grid = GridSearchCV(SVR(), param_grid=params,
                    scoring={"MAE": "neg_mean_absolute_error", "R2": "r2"}, refit="R2")
grid.fit(X, y)
The example above will report the MAE and R2 for each cross-validated partition and will refit the best parameters based on the best R2. Following this example, I have attempted to do the same using a custom scorer:
def adj_r2(true, pred, p=2):
    '''p is the number of independent variables and n is the sample size'''
    n = true.size
    return 1 - ((1 - r2_score(true, pred)) * (n - 1))/(n-p-1)
scorer=make_scorer(adj_r2)
grid = GridSearchCV(SVR(), param_grid=params,
                    scoring={"MAE": "neg_mean_absolute_error", "adj R2": scorer}, refit="adj R2")
grid.fit(X, y)
#print(grid.cv_results_)
The code above appears to generate values for the "adj R2" scorer. I have two questions:
Is the approach used above technically correct coding-wise?
If the approach is correct, how can I define p (number of independent variables) in a dynamic way? As you can see, I had to force a default when defining the function, but I would like to be able to define p in GridSearchCV.
First, an adjusted R2 score is not available in sklearn so far because the metric functions only take y_true and y_pred, so measuring the dimensions of X is out of the question.
There is a workaround for the SearchCV classes.
A scorer needs to have the signature (estimator, X, y). This is what make_scorer produces.
Below is a simplified version of that idea, wrapping r2_score.
def adj_r2(estimator, X, y_true):
    n, p = X.shape
    pred = estimator.predict(X)
    return 1 - ((1 - r2_score(y_true, pred)) * (n - 1))/(n-p-1)
grid = GridSearchCV(SVR(), param_grid=params,
                    scoring={"MAE": "neg_mean_absolute_error",
                             "adj R2": adj_r2}, refit="adj R2")
grid.fit(X, y)
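One detail worth checking with a scorer of this form is which n and p it sees: during cross-validation the scorer is called on each validation fold, so n is the fold size rather than the full sample size. A minimal sketch, assuming a LinearRegression model, synthetic data and cross_val_score in place of the grid search (all three are illustrative choices, not taken from the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score

def adj_r2(estimator, X, y_true):
    n, p = X.shape  # shape of the data passed to the scorer, i.e. one validation fold
    pred = estimator.predict(X)
    return 1 - (1 - r2_score(y_true, pred)) * (n - 1) / (n - p - 1)

# assumed synthetic regression problem: 200 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

plain = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
adjusted = cross_val_score(LinearRegression(), X, y, cv=5, scoring=adj_r2)

# with cv=5 each validation fold holds n = 40 samples and p = 3 features
expected = 1 - (1 - plain) * (40 - 1) / (40 - 3 - 1)
print(plain)
print(adjusted)
```

If the adjustment should use the training-set size instead, n has to be supplied some other way, since the scorer only receives the fold it is asked to score.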

Masking and Instance Normalization in PyTorch

Assume I have a PyTorch tensor, arranged as shape [N, C, L] where N is the batch size, C is the number of channels or features, and L is the length. In this case, if one wishes to perform instance normalization, one does something like:
import torch
from torch import nn
N = 20
C = 100
L = 40
m = nn.InstanceNorm1d(C, affine=True)
input = torch.randn(N, C, L)
output = m(input)
This will perform a normalization in the L-wise dimension for each N*C = 2000 slices of data, subtracting 2000 means, scaling by 2000 standard deviations, and re-scaling by 100 learnable weight and bias parameters (one per channel). The unspoken assumption here is that all of these values exist and are meaningful.
But I have a situation where, for the slice N=1, I would like to exclude all data after (say) L=35. For the slice N=2 (say) all the data are valid. For the slice N=3, exclude all data after L=30, etc. This mimics data which are one dimensional time sequences, having multiple features, but which are not the same length.
How can I perform an instance norm on such data, get correct statistics, and maintain differentiability/AutoGrad information in PyTorch?
Update: While maintaining GPU performance, or at least not killing it dead.
I cannot:
1. Mask with zero values, as this corrupts the computed means and variances, giving erroneous results.
2. Mask with np.nan or np.inf, as PyTorch tensors do not ignore such values, but treat them as errors. They are sticky, and lead to garbage results. PyTorch currently lacks the equivalent of np.nanmean and np.nanvar.
3. Permute or transpose to an amenable arrangement of data; no such approach gives me what I need.
4. Use a pack_padded_sequence; instance normalization does not operate on that data structure, and one cannot import data into that structure as far as I know. Also, data re-arrangement would still be necessary, see 3 above.
Am I missing an approach which would give me what I need? Or perhaps am I missing a method of data re-arrangement which would allow 3 or 4 above to work?
This is an issue faced by recurrent neural networks all the time, hence the pack_padded_sequence functionality, but it isn't quite applicable here.
I don't think this is directly possible with the existing InstanceNorm1d; the easiest way is probably to implement it yourself from scratch. I did a quick implementation that should work. To make it a little more general, this module requires a boolean mask (a boolean tensor of the same size as the input) that specifies which elements should be considered when passing through the instance norm.
import torch
class MaskedInstanceNorm1d(torch.nn.Module):
    def __init__(self, num_features, eps=1e-6, momentum=0.1, affine=True, track_running_stats=False):
        super().__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        self.gamma = None
        self.beta = None
        if self.affine:
            self.gamma = torch.nn.Parameter(torch.ones((1, self.num_features, 1)))
            self.beta = torch.nn.Parameter(torch.zeros((1, self.num_features, 1)))
        if self.track_running_stats:
            # running statistics are buffers: saved and moved with the module, never trained
            self.register_buffer("running_mean", torch.zeros((1, self.num_features, 1)))
            self.register_buffer("running_variance", torch.zeros((1, self.num_features, 1)))
        else:
            self.running_mean = None
            self.running_variance = None
    def forward(self, x, mask):
        mean = torch.zeros((1, self.num_features, 1), dtype=x.dtype, device=x.device)
        variance = torch.ones((1, self.num_features, 1), dtype=x.dtype, device=x.device)
        # compute masked mean and variance of batch
        # (note: each channel's statistics are pooled over the whole batch, not per sample)
        for c in range(self.num_features):
            if mask[:, c, :].any():
                valid = x[:, c, :][mask[:, c, :]]
                mean[0, c, 0] = valid.mean()
                variance[0, c, 0] = (valid - mean[0, c, 0]).pow(2).mean()
        # update running mean and variance (outside of autograd)
        if self.training and self.track_running_stats:
            with torch.no_grad():
                for c in range(self.num_features):
                    if mask[:, c, :].any():
                        self.running_mean[0, c, 0] = (1-self.momentum) * self.running_mean[0, c, 0] \
                            + self.momentum * mean[0, c, 0]
                        self.running_variance[0, c, 0] = (1-self.momentum) * self.running_variance[0, c, 0] \
                            + self.momentum * variance[0, c, 0]
        # compute output
        x = (x - mean)/(self.eps + variance).sqrt()
        if self.affine:
            x = x * self.gamma + self.beta
        return x
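For the GPU-performance concern raised in the question, the per-channel Python loop can be avoided. Below is a minimal sketch of a loop-free variant that computes the statistics per sample and per channel over the valid positions only, which is what instance normalization does on unpadded data. The function name masked_instance_norm, the tiny shapes and the valid lengths are assumptions for illustration, and the learnable affine parameters are left out.

```python
import torch

def masked_instance_norm(x, mask, eps=1e-5):
    """Normalize each (sample, channel) row of x over its valid positions.

    x: float tensor of shape (N, C, L); mask: bool tensor of the same shape.
    """
    m = mask.to(x.dtype)
    count = m.sum(dim=-1, keepdim=True).clamp(min=1)   # guard against empty rows
    mean = (x * m).sum(dim=-1, keepdim=True) / count
    var = ((x - mean).pow(2) * m).sum(dim=-1, keepdim=True) / count  # biased variance
    return (x - mean) / torch.sqrt(var + eps) * m       # padded positions become 0

# assumed example: three sequences with valid lengths 5, 3 and 4
torch.manual_seed(0)
x = torch.randn(3, 2, 5, requires_grad=True)
lengths = torch.tensor([5, 3, 4])
mask = (torch.arange(5)[None, None, :] < lengths[:, None, None]).expand_as(x)

out = masked_instance_norm(x, mask)
out.pow(3).sum().backward()  # gradients flow back to x
print(out[1])
```

Multiplying by the mask instead of indexing keeps every tensor at a fixed shape, so the whole computation stays vectorized and differentiable.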

Element wise calculation breaks autograd

I am using PyTorch to calculate the loss for a logistic regression (I know PyTorch can do this automatically, but I have to write it myself). My function is defined below, but the cast to torch.tensor breaks autograd and gives me w.grad = None. I'm new to PyTorch, so I'm sorry.
logistic_loss = lambda X,y,w: torch.tensor([torch.log(1 + torch.exp(-y[i] * torch.matmul(w, X[i,:]))) for i in range(X.shape[0])], requires_grad=True)
Your post isn't very clear on details and this is a monster of a one-liner. I first reworked it to make a minimal, complete, verifiable example. Please correct me if I misunderstood your intentions and please do it yourself next time.
import torch
# unroll the one-liner to have an easier time understanding what's going on
def logistic_loss(X, y, w):
    elementwise = []
    for i in range(X.shape[0]):
        mm = torch.matmul(w, X[i, :])
        exp = torch.exp(-y[i] * mm)
        elementwise.append(torch.log(1 + exp))
    return torch.tensor(elementwise, requires_grad=True)
# I assume these are the expected dimensions of your input
X = torch.randn(5, 30, requires_grad=True)
y = torch.randn(5)
w = torch.randn(30)
# I assume you backpropagate from a reduced version
# of your sum, because you can't call .backward on multi-dimensional
# tensors
loss = logistic_loss(X, y, w).mean()
loss.backward()
print(X.grad)  # None: torch.tensor(...) cut the graph
The simplest solution to your problem is to replace torch.tensor(elementwise, requires_grad=True) with torch.stack(elementwise). You can think of torch.tensor as a constructor for entirely new tensors: it copies the values and creates a fresh leaf that is not connected to the computation graph. If your tensor is the result of some mathematical expression, you should use operations like torch.stack or torch.cat, which keep the graph intact.
That being said, this code is still wildly inefficient because you do manual looping over i. Instead, you could write simply
def logistic_loss_vectorized(X, y, w):
    mm = torch.matmul(X, w)
    exp = torch.exp(-y * mm)
    return torch.log(1 + exp)
which is mathematically equivalent, but will be much faster in practice, because it allows for better parallelization due to lack of explicit looping.
Note that there is still a numerical issue with this code: you're taking a logarithm of an exponential, and the intermediate result, called exp, can attain very large values and overflow, causing loss of precision. There are workarounds for that, which is why the loss functions provided by PyTorch are preferable.
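One such workaround, sketched here under the assumption that replacing the log/exp pair is acceptable for your use case, relies on the identity log(1 + exp(z)) = softplus(z) and on torch.nn.functional.softplus, which evaluates it without overflowing. The function names and the input values are made up to trigger the overflow.

```python
import torch
import torch.nn.functional as F

def logistic_loss_naive(X, y, w):
    return torch.log(1 + torch.exp(-y * torch.matmul(X, w)))

def logistic_loss_stable(X, y, w):
    # log(1 + exp(z)) == softplus(z), computed without forming exp(z) for large z
    return F.softplus(-y * torch.matmul(X, w))

# assumed inputs: the second row gives z = 137.5, and exp(137.5) overflows float32
X = torch.tensor([[1.0, 2.0], [300.0, -50.0]])
y = torch.tensor([1.0, -1.0])
w = torch.tensor([0.5, 0.25], requires_grad=True)

naive = logistic_loss_naive(X, y, w)
stable = logistic_loss_stable(X, y, w)
stable.mean().backward()
print(naive)
print(stable)
print(w.grad)
```

For moderate inputs both versions agree; they differ only where the naive version overflows.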

pytorch: how to directly find gradient w.r.t. loss

In theano, it was very easy to get the gradient of some variable w.r.t. a given loss:
loss = f(x, w)
dl_dw = tt.grad(loss, wrt=w)
I get that pytorch goes by a different paradigm, where you'd do something like:
loss = f(x, w)
loss.backward()
dl_dw = w.grad
The thing is I might not want to do a full backwards propagation through the graph - just along the path needed to get to w.
I know you can define Variables with requires_grad=False if you don't want to backpropagate through them. But then you have to decide that at the time of variable-creation (and the requires_grad=False property is attached to the variable, rather than the call which gets the gradient, which seems odd).
My question is: is there some way to backpropagate on demand (i.e. only backpropagate along the path needed to compute dl_dw, as you would in theano)?
It turns out that this is really easy. Just use torch.autograd.grad.
Example:
import torch
import numpy as np
from torch.autograd import grad
# Variable has been deprecated since PyTorch 0.4; plain tensors work directly
x = torch.from_numpy(np.random.randn(5, 4))
w = torch.from_numpy(np.random.randn(4, 3)).requires_grad_()
y = torch.from_numpy(np.random.randn(5, 3))
loss = ((x.mm(w) - y)**2).sum()
(d_loss_d_w, ) = grad(loss, w)
assert np.allclose(d_loss_d_w.numpy(), (x.transpose(0, 1).mm(x.mm(w)-y)*2).detach().numpy())
Thanks to JerryLin for answering the question here.
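To make the "on demand" aspect concrete, here is a minimal sketch in which both x and w require gradients but only the gradient with respect to w is requested. The shapes and random data are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 4, requires_grad=True)
w = torch.randn(4, 3, requires_grad=True)
y = torch.randn(5, 3)

loss = ((x @ w - y) ** 2).sum()

# request the gradient for w only; the result is returned, not accumulated
(dl_dw,) = torch.autograd.grad(loss, w)

print(dl_dw.shape)
print(x.grad, w.grad)  # .grad attributes are left untouched
```

Unlike loss.backward(), torch.autograd.grad returns the requested gradients and does not write to the .grad attribute of any tensor, so the choice of what to differentiate is made at call time rather than when the tensors are created.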

Resources