The official docstring says "This has any effect only on modules such as Dropout or BatchNorm." But I don't understand how it is implemented.
Dropout and BatchNorm (and maybe some custom modules) behave differently during training and evaluation. You must let the model know when to switch to eval mode by calling .eval() on the model.
This sets self.training to False for every module in the model. If you are implementing your own module that must behave differently during training and evaluation, you can check the value of self.training while doing so.
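For instance, a hypothetical custom module could branch on self.training like this (a minimal sketch; the NoisyLinear layer and the noise scale are made up for illustration):

import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    # Hypothetical layer that injects noise only while training
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        out = self.linear(x)
        if self.training:  # True after model.train(), False after model.eval()
            out = out + 0.1 * torch.randn_like(out)
        return out

model = NoisyLinear(4, 2)
model.eval()  # recursively sets self.training = False on every submodule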
Related
I have read many tutorials on how to use PyTorch to make a regression over a data set, using, for instance, a model composed of several linear layers and the MSE loss.
Well, imagine that I know the function F depends on a variable x and on some unknown parameters (p_j: j=0,..., P-1), with P relatively small, but that the function is a composition of special functions. So, my problem is the classical minimization problem, given the data {x_i, y_i}, i <= N:
Min_{p_j} Sum_i (F(x_i; {p_j}) - y_i)^2
I would like to know whether I can use the PyTorch optimizers and, if so, how to do it.
Thanks.
In fact, PyTorch experts answered that the function to be minimized must be expressed in terms of torch tensors so that the optimizers can compute the gradients. So, it is not possible in my case.
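For reference, when F can be written entirely with torch operations, the fit would look roughly like this (a minimal sketch with synthetic data and a made-up model function, not the asker's actual F):

import torch

# Synthetic data {x_i, y_i} (made up for illustration)
x = torch.linspace(0, 1, 100)
y = 2.0 * torch.exp(-3.0 * x) + 0.01 * torch.randn(100)

# Unknown parameters p_0, p_1, declared as leaf tensors with gradients
p = torch.tensor([1.0, 1.0], requires_grad=True)

def F(x, p):
    # F must be built from torch operations so autograd can differentiate it
    return p[0] * torch.exp(-p[1] * x)

opt = torch.optim.Adam([p], lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.sum((F(x, p) - y) ** 2)  # Sum_i (F(x_i; p) - y_i)^2
    loss.backward()
    opt.step()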
In the Keras documentation file activations.md, it says "Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers." Then what is the meaning of "forward layers"? I think some layers don't have an activation parameter (e.g. the Dropout layer).
And "Activations that are more complex than a simple TensorFlow/Theano/CNTK function (eg. learnable activations, which maintain a state) are available as Advanced Activation layers, and can be found in the module keras.layers.advanced_activations. These include PReLU and LeakyReLU.". Then what is the meaning of state in this case?
I am not sure there is a strict definition of "forward layers" in this context, but basically it means that the "classic", Keras-built-in types of layers, those comprising one or more sets of weights used to transform an input matrix into an output one, have an activation argument. Typically, Dense layers have one, as do the various kinds of RNN and CNN layers.
It would not make sense for Dropout layers to have an activation function: they simply add a mechanism, active only during training, that (hopefully) improves the convergence rate and reduces overfitting.
As for the idea of "maintaining a state", it refers to activation functions that do not act as a fixed, element-wise function on every fed-in sample, but instead retain some learnable information (the so-called state). Typically, for a PReLU activation (the learnable cousin of LeakyReLU), the leak parameter is adjusted through training, and it would, in the documentation's terminology, be referred to as a state of this activation function.
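To illustrate (a sketch using the standalone Keras API that the quoted documentation refers to; the layer sizes are arbitrary):

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers.advanced_activations import PReLU

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))  # activation argument
model.add(Dense(64))
model.add(Activation('tanh'))  # separate, stateless Activation layer
model.add(Dense(64))
model.add(PReLU())  # advanced activation: its slope is a trainable weight (the "state")
model.add(Dense(1))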
This is a part of the code for a Deconvolutional-Convolutional Generative Adversarial Network (DC-GAN):
discriminator.trainable = False
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(input=ganInput, output=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
I do not understand what the line ganInput = Input(shape=(100,)) does.
Clearly ganInput is a variable, but what is Input? Is it a function?
If Input is a function then what will ganInput contain ?
Then ganInput is fed into the generator; since it is (I assume) an empty variable, it will not matter. Next, ganOutput catches the output of the discriminator.
Then comes the problem: I read about the Model API but I do not fully understand what it does.
To summarise, these are my questions: what is the role of ganInput, and what is Input in the second line? And what is Model, and what is it doing?
Using Keras with TensorFlow backend
COMPLETE SOURCE CODE : https://github.com/yashk2810/DCGAN-Keras/blob/master/DCGAN.ipynb
Please ask for any further clarification/details required. If you know the answer to even one of my queries, please answer it; it would be a huge help. Thanks.
What is Input: Notice the wildcard import of keras.layers. In context, Input is keras.layers.Input. Generally, if you see a function or class that wasn't defined or explicitly imported in Python, it got there via a wildcard import, i.e.
from keras.layers import *
That means import everything from keras.layers directly into the workspace.
What is Model: The Model object is essentially the interface for building neural networks with Keras.
You can read more about Model and keras.layers.Input in the Model docs or the model guide; I'm not very familiar with Keras myself.
What's going on in the example is that they define generator and discriminator as Sequentials. But the GAN model is a little more complex than a plain Sequential. The authors deal with that by marking the data that needs to be fed in at every iteration (in this case, just the random noise for the generator: ganInput) as a keras.layers.Input. Then, like you said, ganOutput catches the output of the discriminator. Since we have two distinct Sequentials that need to be wrapped together, the authors use the Model API.
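A stripped-down sketch of that wiring (the layer sizes here are invented, only the pattern matters, and I'm using the newer inputs=/outputs= argument names):

from keras.models import Sequential, Model
from keras.layers import Dense, Input

# Two stand-alone Sequentials, analogous to the generator and discriminator
generator = Sequential([Dense(32, activation='relu', input_dim=100),
                        Dense(784, activation='tanh')])
discriminator = Sequential([Dense(32, activation='relu', input_dim=784),
                            Dense(1, activation='sigmoid')])

ganInput = Input(shape=(100,))   # symbolic placeholder describing one sample's shape
x = generator(ganInput)          # calling a model on a tensor reuses it like a layer
ganOutput = discriminator(x)
gan = Model(inputs=ganInput, outputs=ganOutput)  # wraps the whole input-to-output graph
gan.compile(loss='binary_crossentropy', optimizer='adam')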
Newbie in keras:
I am trying to understand the syntax used in keras.
The syntax that I am having difficulty understanding arises while building a network. I have seen it in a number of places, as in the following code.
Statements like: current_layer = SOME_CODE(current_layer)
What is the meaning of such a statement? Does it mean that first the computation described in SOME_CODE is performed, followed by the computation described in the current layer?
What is the use of such syntax, and when should one use it? Are there any advantages or alternatives?
input_layer = keras.layers.Input(
    (IMAGE_BORDER_LENGTH, IMAGE_BORDER_LENGTH, NB_CHANNELS))
current_layer = random_image_mirror_left_right(input_layer)
current_layer = keras.layers.convolutional.Conv2D(
    filters=16,  # ... other arguments ("some values") omitted in the question
)(current_layer)
def random_image_mirror_left_right(input_layer):
    return keras.layers.core.Lambda(function=lambda batch_imgs: tf.map_fn(
        lambda img: tf.image.random_flip_left_right(img), batch_imgs
    ))(input_layer)
If you are indeed a newbie in Keras, as you say, I would strongly suggest not bothering with such advanced stuff at this stage.
The repo you are referring to is a rather advanced and highly non-trivial case of using a specialized library (HyperOpt) for automatically meta-optimizing a Keras model. It involves 'automatic' model building according to configuration parameters already stored in a Python dictionary...
Additionally, the function you quote goes beyond Keras to involve TensorFlow methods and lambda functions...
The current_layer = SOME_CODE(current_layer) pattern is a typical example of the Keras functional API; in my experience, it is less widely used than the more straightforward Sequential API, but it can come in handy in more advanced cases, e.g. (quoting the Keras docs):
The Keras functional API is the way to go for defining complex models,
such as multi-output models, directed acyclic graphs, or models with
shared layers. [...] With the functional API, it is easy to re-use
trained models: you can treat any model as if it were a layer, by
calling it on a tensor. Note that by calling a model you aren't just
re-using the architecture of the model, you are also re-using its
weights.
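A minimal, self-contained illustration of that pattern (the layer sizes are arbitrary):

from keras.layers import Input, Dense
from keras.models import Model

input_layer = Input(shape=(32,))
# Each line builds a layer object and immediately calls it on the tensor in
# parentheses, returning a new tensor that is fed to the next layer:
current_layer = Dense(64, activation='relu')(input_layer)
current_layer = Dense(64, activation='relu')(current_layer)
output_layer = Dense(1, activation='sigmoid')(current_layer)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='binary_crossentropy', optimizer='adam')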
I have three simple questions.
What will happen if my custom loss function is not differentiable? Will PyTorch throw an error or do something else?
If I declare a loss variable in my custom function which represents the final loss of the model, should I set requires_grad = True for that variable, or does it not matter? If it doesn't matter, then why?
I have seen people sometimes write a separate layer and compute the loss in the forward function. Which approach is preferable, writing a function or a layer? Why?
I need a clear explanation of these questions to resolve my confusion. Please help.
Let me have a go.
This depends on what you mean by "non-differentiable". The first definition that makes sense here is that PyTorch doesn't know how to compute gradients. If you try to compute gradients nevertheless, this will raise an error. The two possible scenarios are:
a) You're using a built-in PyTorch operation for which gradients have not been implemented, e.g. torch.svd(). In that case you will get a TypeError:
import torch
from torch.autograd import Function
from torch.autograd import Variable
A = Variable(torch.randn(10,10), requires_grad=True)
u, s, v = torch.svd(A) # raises TypeError
b) You have implemented your own operation, but did not define backward(). In this case, you will get a NotImplementedError:
class my_function(Function):  # forgot to define backward()
    def forward(self, x):
        return 2 * x

A = Variable(torch.randn(10, 10))
B = my_function()(A)
C = torch.sum(B)
C.backward()  # will raise NotImplementedError
The second definition that makes sense is "mathematically non-differentiable". Clearly, an operation which is mathematically not differentiable should either not have a backward() method implemented, or should return a sensible sub-gradient. Consider for example torch.abs(), whose backward() method returns the subgradient 0 at 0:
A = Variable(torch.Tensor([-1, 0, 1]), requires_grad=True)
B = torch.abs(A)
B.backward(torch.Tensor([1, 1, 1]))
A.grad.data  # -1, 0, 1: the subgradient 0 is used at x = 0
For such cases, you should refer to the PyTorch documentation and dig out the backward() method of the respective operation.
It doesn't matter. The purpose of requires_grad is to avoid unnecessary computation of gradients for subgraphs. If a single input to an operation requires gradient, its output will also require gradient. Conversely, only if all inputs don't require gradient will the output not require it either. Backward computation is never performed in subgraphs where no Variable requires gradients.
Since there are most likely some Variables that require gradients (for example, the parameters of an nn.Module subclass), your loss Variable will also require gradients automatically. However, note that, precisely because of how requires_grad propagates (see above), you can only change requires_grad for leaf variables of your graph anyway.
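A tiny sketch of that propagation rule (written with the current tensor API rather than the older Variable wrapper):

import torch

w = torch.randn(3, requires_grad=True)  # leaf that requires gradients (e.g. a parameter)
x = torch.randn(3)                      # leaf that does not
loss = (w * x).sum()                    # one input requires grad, so the output does too
print(loss.requires_grad)               # True
loss.backward()
print(w.grad)                           # populated; x.grad stays None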
All the built-in PyTorch loss functions are subclasses of _Loss, which is itself a subclass of nn.Module (see here). If you'd like to stick to this convention, you should subclass _Loss when defining your custom loss function. Apart from consistency, one advantage is that your subclass will raise an AssertionError if you haven't marked your target variables as volatile or requires_grad = False. Another advantage is that you can nest your loss function in nn.Sequential(), because it is an nn.Module. I would recommend this approach for these reasons.
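As a sketch of that convention (subclassing nn.Module directly, which is the public part of what _Loss provides; WeightedMSELoss and its weighting are made up for illustration):

import torch
import torch.nn as nn

class WeightedMSELoss(nn.Module):
    # Hypothetical custom loss: mean squared error with a fixed per-element weight
    def __init__(self, weight):
        super().__init__()
        self.register_buffer('weight', weight)

    def forward(self, prediction, target):
        return torch.mean(self.weight * (prediction - target) ** 2)

criterion = WeightedMSELoss(torch.tensor([1.0, 2.0, 3.0]))
loss = criterion(torch.zeros(3), torch.ones(3))  # mean of [1, 2, 3] = 2.0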