How can I visualize what happens during loss.backward()? [closed] - pytorch

I am confident in my understanding of my model's forward pass; how can I inspect and control its backward pass?
This is not a theoretical question about what back-propagation is. It is a practical one, about whether there are tools suited to visualize/track/control what happens during back-propagation.
Ideally, such a tool would allow me to visualize the structure of the model's computational graph (a graph of the model's operations), its inputs, and its trainable parameters.
Now, I do:
loss.backward()
and I would like to visualize what happens in that step.

pytorchviz, which lets you visualize the graph, has already been mentioned.
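If all you need is a picture of the graph, a minimal sketch of how it is typically used might look like this (assuming the torchviz package and Graphviz are installed; the model and file name are illustrative):
import torch
from torchviz import make_dot

model = torch.nn.Linear(5, 1)
x = torch.rand(2, 5)
loss = model(x).sum()

# make_dot builds a graphviz Digraph by walking loss.grad_fn (the same graph
# that loss.backward() traverses) and labels leaf nodes with the parameter names.
dot = make_dot(loss, params=dict(model.named_parameters()))
dot.render('computational_graph', format='png')  # writes computational_graph.png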
Here is a small example that might help you understand how pytorchviz traces the graph using grad_fn:
import torch

d = 5
x = torch.rand(d, requires_grad=True)
print('Tensor x:', x)
y = torch.ones(d, requires_grad=True)
print('Tensor y:', y)
loss = torch.sum(x * y) * 3
del x
print()
print('Tracing back tensors:')

def getBack(var_grad_fn):
    # Walk the graph of grad_fn nodes recursively; leaf tensors are reached
    # through AccumulateGrad nodes, which expose them via their .variable attribute.
    print(var_grad_fn)
    for n in var_grad_fn.next_functions:
        if n[0]:
            try:
                tensor = getattr(n[0], 'variable')
                print(n[0])
                print('Tensor with grad found:', tensor)
                print(' - gradient:', tensor.grad)
                print()
            except AttributeError:
                getBack(n[0])

loss.backward()
getBack(loss.grad_fn)
Output:
Tensor x: tensor([0.0042, 0.5376, 0.7436, 0.2737, 0.4848], requires_grad=True)
Tensor y: tensor([1., 1., 1., 1., 1.], requires_grad=True)
Tracing back tensors:
<MulBackward object at 0x1201bada0>
<SumBackward0 object at 0x1201bacf8>
<ThMulBackward object at 0x1201bae48>
<AccumulateGrad object at 0x1201badd8>
Tensor with grad found: tensor([0.0042, 0.5376, 0.7436, 0.2737, 0.4848], requires_grad=True)
- gradient: tensor([3., 3., 3., 3., 3.])
<AccumulateGrad object at 0x1201bad68>
Tensor with grad found: tensor([1., 1., 1., 1., 1.], requires_grad=True)
- gradient: tensor([0.0125, 1.6129, 2.2307, 0.8211, 1.4543])
Furthermore, you should definitely take a look at how autograd functions (which are used by backward()) actually work!
Here is a tutorial from the PyTorch site with a short and easy example:
PyTorch: Defining New autograd Functions
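To give a feel for the pattern, here is a minimal sketch of a custom autograd Function (a hypothetical example in the spirit of that tutorial, not its exact code):
import torch

class MulByThree(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # ctx.save_for_backward stores tensors for use in backward
        # (not strictly needed for this constant derivative, shown for illustration).
        ctx.save_for_backward(x)
        return x * 3

    @staticmethod
    def backward(ctx, grad_output):
        # d(3 * x)/dx = 3, scaled by the incoming gradient.
        return grad_output * 3

x = torch.rand(5, requires_grad=True)
loss = MulByThree.apply(x).sum()
loss.backward()
print(x.grad)  # tensor([3., 3., 3., 3., 3.])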
Hope this helps a bit!

Related

How to get intermediate output grad in Pytorch model

We can get the loss of the last layer with loss = loss_fn(y_pred, y_true), which results in a loss tensor.
Then we call loss.backward() to do back-propagation.
After optimizer.step() we can see the updated model.parameters().
Taking the example below:
y = Model1(x)  # with optimizer1
z = Model2(y)  # with optimizer2
loss = loss_fn(z, z_true)
loss.backward()
optimizer2.step()  # update Model2 parameters
# in order to update Model1 parameters I think we should do
y.backward(gradient=the_output_gradient_from_Model2)
optimizer1.step()
How can I get the intermediate back-propagation result, e.g. the gradient of the intermediate output, which would then be passed to y_pred.backward(gradient=grad)?
Update: The solution is setting requires_grad=True and reading the tensor's .grad attribute. Thanks for the answers.
PS: The scenario is federated learning: the model is split into two parts. The first part takes the input and forwards it to the second part, which calculates the loss and back-propagates it to the first part, so that the first part can do its own back-propagation.
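One common pattern for this kind of split (a sketch, assuming the Model1/Model2, loss_fn and optimizers from the question) is to detach the intermediate tensor, let the second part fill its .grad, and then feed that gradient back into the first part:
y = Model1(x)
y_detached = y.detach().requires_grad_(True)  # cut the graph between the two parts
z = Model2(y_detached)
loss = loss_fn(z, z_true)

loss.backward()        # fills the gradients of Model2's parameters and y_detached.grad
optimizer2.step()

y.backward(gradient=y_detached.grad)  # continue back-propagation into Model1
optimizer1.step()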
I will assume you're referring to intermediate gradients when you say "loss of a specific layer".
You can access the gradient of the loss with respect to a given layer by reading the grad attribute on the parameters of your model that require gradient computation.
Here is a simplistic setup:
>>> f = nn.Sequential(
...     nn.Linear(10, 5),
...     nn.Linear(5, 2),
...     nn.Linear(2, 2, bias=False),
...     nn.Sigmoid())
>>> x = torch.rand(3, 10).requires_grad_(True)
>>> f(x).mean().backward()
Navigate through all the parameters per layer:
>>> for n, c in f.named_children():
...     for p in c.parameters():
...         print(f'<{n}>:{p.grad}')
<0>:tensor([[-0.0054, -0.0034, -0.0028, -0.0058, -0.0073, -0.0066, -0.0037, -0.0044,
-0.0035, -0.0051],
[ 0.0037, 0.0023, 0.0019, 0.0040, 0.0050, 0.0045, 0.0025, 0.0030,
0.0024, 0.0035],
[-0.0016, -0.0010, -0.0008, -0.0017, -0.0022, -0.0020, -0.0011, -0.0013,
-0.0010, -0.0015],
[ 0.0095, 0.0060, 0.0049, 0.0102, 0.0129, 0.0116, 0.0066, 0.0077,
0.0063, 0.0091],
[ 0.0005, 0.0003, 0.0002, 0.0005, 0.0006, 0.0006, 0.0003, 0.0004,
0.0003, 0.0004]])
<0>:tensor([-0.0090, 0.0062, -0.0027, 0.0160, 0.0008])
<1>:tensor([[-0.0035, 0.0035, -0.0026, -0.0106, -0.0002],
[-0.0020, 0.0020, -0.0015, -0.0061, -0.0001]])
<1>:tensor([-0.0289, -0.0166])
<2>:tensor([[0.0355, 0.0420],
[0.0354, 0.0418]])
To supplement the gradient-related answer(s), it should be said that you can't get the loss of a single layer: loss is a model-level concept, and generally you can't say which layer is responsible for the error. Indeed, if the model is deep enough you can freeze any one layer and it can still train to high accuracy.

Keras Metric strange behavior

I am trying to define a Keras metric that returns a rounded value. (Normally K.round() can't be used in the loss since it's not differentiable, but I think it can be used in a metric.)
However, despite using K.round(), the metric consistently has decimal places, giving me values like 2.0812 and similar. It is worth mentioning that my y_true are 3-value float lists like [1., 3., 5.], [2., 5., 1.] and so on.
To try and understand what is happening I defined a very simple metric.
def simplemetric(y_true, y_pred):
    return y_true
I was expecting this to give me an error, but it returns values around 2.089 that vary slightly on each epoch but not with batch size (which is 128).
I then tried a different metric.
def simplemetric(y_true, y_pred):
    return K.round(K.sum(y_true))
This gives me values around 800.142 that vary slightly on each epoch but not with batch number.
As a final test I tried:
def simplemetric(y_true, y_pred):
    return y_true * 0 + 10.0
Which gives me the expected value of 10.0 on every epoch.
So what is happening in the previous cases? Why can't I get a whole number, and what is the meaning of the ~2 and ~800 that Keras is somehow calculating out of lists like [1., 2., 5.]?
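As a hint at where those numbers may come from (hedged, since it depends on how Keras reduces metric outputs): Keras takes the mean of whatever tensor a metric function returns for each batch and displays a running average over batches, so returning y_true reports roughly mean(y_true), and a rounded per-batch sum gets averaged afterwards, which is how decimals reappear. A quick arithmetic sketch with toy numbers (not the question's data):
import numpy as np

# Toy stand-in for one batch of 128 samples with 3-value float labels.
y_true = np.random.choice([1., 2., 3., 5.], size=(128, 3))

# A metric returning y_true -> Keras reports its mean, never the labels themselves.
print(y_true.mean())        # something like ~2.7 for this toy distribution

# A metric returning round(sum(y_true)) -> a whole number per batch,
# but the epoch display averages these per-batch values across batches.
print(round(y_true.sum()))  # roughly 128 * 3 * mean(y_true)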

Keras Reshape layer adding an extra dimension?

The Reshape layer is not working how I would expect. In the example below, I think the last line should return a tensor object of shape [5, 1]. However, an error is thrown, stating that a shape-[5] tensor cannot be reshaped into a shape-[5,5,1] tensor.
>>> from keras.layers import Reshape
>>> from keras import backend as K
>>> import numpy as np
>>> x = K.constant(np.array([1,2,3,4,5]))
>>> K.eval(x)
array([1., 2., 3., 4., 5.], dtype=float32)
>>> Reshape(target_shape=(5,1))(x)
...
ValueError: Cannot reshape a tensor with 5 elements to
shape [5,5,1] (25 elements) for 'reshape_3/Reshape' (op:
'Reshape') with input shapes: [5], [3] and with input
tensors computed as partial shapes: input[1] = [5,5,1].
Can someone kindly explain how the Reshape layer works (i.e. why it's adding the extra dim) and how to do the process of reshaping a vector into a matrix?
Thanks
Use Reshape(target_shape=(1,))(x).
The batch size is implicit throughout the entire model and is ignored from beginning to end; target_shape only describes the per-sample dimensions.
If you do want to control the batch dimension explicitly, use K.reshape(x, (5, 1)) instead.
Keras is not supposed to be used without creating a model made entirely of layers.
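A small sketch contrasting the two approaches (assuming the same Keras backend setup as in the question; x is treated as a batch of 5 scalar samples):
from keras.layers import Reshape
from keras import backend as K
import numpy as np

x = K.constant(np.array([1, 2, 3, 4, 5]))  # shape (5,), so Keras sees batch_size=5

# Layer API: target_shape excludes the batch dimension,
# so (1,) turns each scalar sample into a length-1 vector.
y_layer = Reshape(target_shape=(1,))(x)
print(K.int_shape(y_layer))   # (5, 1)

# Backend API: you give the full shape, batch dimension included.
y_backend = K.reshape(x, (5, 1))
print(K.eval(y_backend))      # [[1.], [2.], [3.], [4.], [5.]]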

Equivalent of Keras's binary_crossentropy in PyTorch?

I want to port some code from Keras to PyTorch, but I can't find an equivalent of Keras's binary_crossentropy in PyTorch. PyTorch's binary_cross_entropy seems to behave differently from Keras's.
import torch
import torch.nn.functional as F
input = torch.tensor([[0.6845, 0.2454],
                      [0.7186, 0.3710],
                      [0.3480, 0.3374]])
target = torch.tensor([[0., 1.],
                       [1., 1.],
                       [1., 1.]])
F.binary_cross_entropy(input, target, reduce=False)
#tensor([[ 1.1536, 1.4049],
# [ 0.3305, 0.9916],
# [ 1.0556, 1.0865]])
import keras.backend as K
K.eval(K.binary_crossentropy(K.variable(input.detach().numpy()), K.variable(target.detach().numpy())))
#[[11.032836 12.030124]
#[ 4.486187 10.02776 ]
#[10.394435 10.563424]]
Does anyone know why these two results are different? Thanks!
Keras's binary_crossentropy takes (y_true, y_pred), while PyTorch's binary_cross_entropy takes them in the opposite order (input, target); therefore you need to change the Keras line to:
K.eval(K.binary_crossentropy(K.variable(target.detach().numpy()), K.variable(input.detach().numpy())))
In this way you get the correct output:
array([[ 1.15359652, 1.40486574],
[ 0.33045045, 0.99155325],
[ 1.05555284, 1.0864861 ]], dtype=float32)
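To see why the order matters, here is a quick manual check (a sketch reusing the input and target tensors defined in the question):
import torch

# Binary cross-entropy is -(t*log(p) + (1-t)*log(1-p)), where p is the prediction
# and t the target; swapping the arguments applies log() to the targets instead.
bce_manual = -(target * torch.log(input) + (1 - target) * torch.log(1 - input))
print(bce_manual)
# matches F.binary_cross_entropy(input, target, reduce=False) and the corrected Keras call above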

Scikit-Learn GridSearch custom scoring function

I need to perform kernel PCA on a dataset of dimension (5000, 26421) to get a lower-dimensional representation. To choose the number-of-components parameter (say k), I reduce the data, reconstruct it back to the original space, and compute the mean squared error between the reconstructed and original data for different values of k.
I came across sklearn's grid-search functionality and want to use it for the above parameter estimation. Since there is no score function for KernelPCA, I have implemented a custom scoring function and am passing it to GridSearchCV.
from sklearn.decomposition.kernel_pca import KernelPCA
from sklearn.model_selection import GridSearchCV
import numpy as np
import math
def scorer(clf, X):
    Y1 = clf.inverse_transform(X)
    error = math.sqrt(np.mean((X - Y1)**2))
    return error

param_grid = [
    {'degree': [1, 10], 'kernel': ['poly'], 'n_components': [100, 400, 100]},
    {'gamma': [0.001, 0.0001], 'kernel': ['rbf'], 'n_components': [100, 400, 100]},
]
kpca = KernelPCA(fit_inverse_transform=True, n_jobs=30)
clf = GridSearchCV(estimator=kpca, param_grid=param_grid, scoring=scorer)
clf.fit(X)
However, it results in the below error:
/usr/lib64/python2.7/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X=array([[ 2., 2., 1., ..., 0., 0., 0.],
...., 0., 1., ..., 0., 0., 0.]], dtype=float32), Y=array([[-0.05904257, -0.02796719, 0.00919842, .... 0.00148251, -0.00311711]], dtype=float32), precomp
uted=False, dtype=<type 'numpy.float32'>)
117 "for %d indexed." %
118 (X.shape[0], X.shape[1], Y.shape[0]))
119 elif X.shape[1] != Y.shape[1]:
120 raise ValueError("Incompatible dimension for X and Y matrices: "
121 "X.shape[1] == %d while Y.shape[1] == %d" % (
--> 122 X.shape[1], Y.shape[1]))
X.shape = (1667, 26421)
Y.shape = (112, 100)
123
124 return X, Y
125
126
ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 26421 while Y.shape[1] == 100
Can someone point out what exactly I am doing wrong?
The signature of your scoring function is incorrect. You only need to pass the predicted and ground-truth values. This is how you declare your custom scoring function:
def my_scorer(y_true, y_predicted):
    error = math.sqrt(np.mean((y_true - y_predicted)**2))
    return error
Then you can use the make_scorer function in scikit-learn to pass it to GridSearchCV. Be sure to set the greater_is_better attribute accordingly:
Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.
I am assuming you are calculating an error, so this attribute should be set to False, since the smaller the error, the better:
from sklearn.metrics import make_scorer
my_func = make_scorer(my_scorer, greater_is_better=False)
Then you pass it to GridSearchCV:
GridSearchCV(estimator=my_clf, param_grid=param_grid, scoring=my_func)
Where my_clf is your classifier.
One more thing: I don't think GridSearchCV is exactly what you are looking for. It basically accepts data in the form of train and test splits, but here you only want to transform your input data. You should use a Pipeline in scikit-learn; look at the example of combining PCA and GridSearchCV mentioned here.
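For reference, a sketch of what that Pipeline pattern typically looks like (the downstream estimator and parameter values are illustrative, not taken from the question):
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ('pca', PCA()),
    ('clf', LogisticRegression()),
])

# Parameters of a pipeline step are addressed as '<step name>__<parameter>'.
param_grid = {
    'pca__n_components': [100, 200, 400],
    'clf__C': [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y)  # note: needs labels y, unlike the unsupervised setup in the question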

Resources