Does anyone know what could be the reason for 0 learnable parameters in LSTM cells?
The most likely reason in your code is that the parameters of your LSTM have been frozen, i.e. their requires_grad attribute is False.
Alternatively, the LSTM parameters may not be receiving gradients because they were detached from the computation graph.
In either case the LSTM will report 0 learnable parameters.
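To check whether this is what is happening, you can count the parameters whose requires_grad flag is True. The following is a minimal sketch, assuming PyTorch; the helper name count_learnable and the layer sizes are made up for illustration and are not taken from the question.

```python
import torch.nn as nn


def count_learnable(module: nn.Module) -> int:
    """Return the number of scalar parameters that have requires_grad=True."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Arbitrary sizes, chosen only for this illustration.
lstm = nn.LSTM(input_size=8, hidden_size=16)

total = sum(p.numel() for p in lstm.parameters())
before = count_learnable(lstm)

# Freezing: every parameter stops requiring gradients.
for p in lstm.parameters():
    p.requires_grad_(False)
frozen = count_learnable(lstm)

# Unfreezing restores the learnable count.
for p in lstm.parameters():
    p.requires_grad_(True)
unfrozen = count_learnable(lstm)

print(total, before, frozen, unfrozen)
```

If the frozen count is what you see in your own model summary, look for a loop or call in your code that sets requires_grad to False on the LSTM.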
Sarthak Jain
Related
What would be the PyTorch equivalent of the following TensorFlow code, where loss is the loss calculated in the current iteration and net is the neural network?
with tf.GradientTape() as tape:
    ...  # the forward pass that computes `loss` from `net` must run inside the tape
grads = tape.gradient(loss, net.trainable_variables)
optimizer.apply_gradients(zip(grads, net.trainable_variables))
So, we compute the gradients of the loss with respect to all the trainable variables in our network. In the next line we apply the gradients via the optimizer. In my use case, this is the way to do it, and it works fine.
Now, how would I do the same in PyTorch? I am aware of the "standard" way:
optimizer.zero_grad()
loss.backward()
optimizer.step()
That is, however, not applicable for me. So how can I apply the gradients "manually"? Google doesn't help, unfortunately, although I think it is probably a rather simple question.
Hope one of you can enlighten me!
Thanks!
Let's break down the standard PyTorch way of doing updates; hopefully, that will clarify what you want.
In PyTorch, each NN parameter has a .data and a .grad attribute. .data is the actual weight tensor, and .grad is the attribute that holds the gradient. It is None if the gradient has not been computed yet. With this knowledge, let's go through the update steps.
First, we call optimizer.zero_grad(). This resets the .grad attribute of every parameter (to zeros or to None, depending on the PyTorch version and the set_to_none argument). .grad may already be None if you never computed the gradients.
Next, we call loss.backward(). This is the backprop step: it computes the gradients and accumulates them into each parameter's .grad attribute, which is why the previous step resets them.
Once we have gradients, we want to update the weights with some rule (SGD, Adam, etc.), so we call optimizer.step(). This iterates over all the parameters and updates the weights using the computed .grad attributes.
So, to apply gradients manually, you can replace optimizer.step() with a for loop like the one below:
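for param in model.parameters():
    # custom_rule is your own update function; skip parameters that received no gradient
    if param.grad is not None:
        param.data = custom_rule(param.data, param.grad, learning_rate, **any_other_arguments)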
and that should do the trick.
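For a closer match to the tape.gradient / apply_gradients pair from the question, one option is torch.autograd.grad, which returns the gradients directly instead of storing them in .grad. This is only a sketch under assumptions: the tiny linear model, the random data, and the sgd_rule function are hypothetical stand-ins for your own network and update rule.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real network and data.
net = nn.Linear(4, 1)
x = torch.randn(16, 4)
y = torch.randn(16, 1)
learning_rate = 0.1


def sgd_rule(weight, grad, lr):
    """Hypothetical custom rule: one plain gradient-descent step."""
    return weight - lr * grad


loss_before = nn.functional.mse_loss(net(x), y)

# Rough counterpart of tape.gradient(loss, net.trainable_variables).
params = [p for p in net.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss_before, params)

# Rough counterpart of optimizer.apply_gradients(zip(grads, variables)).
with torch.no_grad():  # the update itself must not be tracked by autograd
    for param, grad in zip(params, grads):
        param.copy_(sgd_rule(param, grad, learning_rate))

loss_after = nn.functional.mse_loss(net(x), y)
print(loss_before.item(), loss_after.item())
```

Because torch.autograd.grad does not write to .grad, there is nothing to zero between iterations in this variant.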
I am learning how trainable parameters are counted for CNNs in Keras. I wonder why the filter values are counted as trainable parameters, since the convolution itself is a fixed calculation (i.e. a matrix multiplication) and it seems there is nothing that needs to be updated (trained). I know there is a formula, but why do we consider these values trainable parameters? For example: in the first Conv2D layer, with an image size of, say, 10x10x1, a 3x3 filter, and 1 filter, Keras reports 10 parameters (3x3+1).
Alex
In your convolution layer there is a 3x3(x1) kernel (the x1 because your image has a single channel). The values in the kernel are not fixed: they are learned parameters that training updates, and the convolution is just the fixed operation that applies them. In addition to the kernel, the layer can have a learnable bias (it is usually optional), one per filter; that is the +1 in your formula. With a single filter this gives 3x3+1 = 10 parameters. It's a bit hard to tell from your question, but if you are instead asking the layer to learn 10 different kernels (each with a bias), you get 10(3x3+1) = 100 learned parameters.
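The counting formula can be written out as a small function. This is an illustrative sketch in plain Python, not Keras code; the function name conv2d_param_count and its arguments are made up for the example.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Count learnable values in a 2D convolution layer.

    Each filter holds kernel_h * kernel_w * in_channels weights,
    plus one bias per filter when use_bias is True.
    """
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# One 3x3 filter on a single-channel image: 3*3*1 weights + 1 bias.
single_filter = conv2d_param_count(3, 3, 1, 1)

# Ten 3x3 filters on a single-channel image: 10 * (3*3*1 + 1).
ten_filters = conv2d_param_count(3, 3, 1, 10)

# Same single filter with the bias switched off.
no_bias = conv2d_param_count(3, 3, 1, 1, use_bias=False)

print(single_filter, ten_filters, no_bias)
```

Note that the image size (10x10 in the question) does not appear anywhere in the count, because the same kernel is reused at every position.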
I need help, as I am new to Keras. I was reading about dropout and how it can affect the loss calculation during the training and validation phases. Dropout is only active at training time and not at validation time, so comparing the two losses can be misleading.
My questions are:
What is the use of learning_phase_scope(1)?
How does it impact validation?
What steps are needed to correct the testing loss when dropout is used?
It's not only Dropout: BatchNormalization also behaves differently in training and inference, so both need to be in the right mode or validation performance will be affected.
If you use Keras and just want the validation loss (and/or accuracy or other metrics), you are better off calling model.evaluate() or passing validation_data to model.fit(), and not touching learning_phase_scope at all.
learning_phase_scope(1) means the training phase; 0 means the predict/validate phase.
Personally, I use learning_phase_scope only when I want to train something that doesn't end with a plain model.fit (e.g. visualizing CNN filters), and I have needed it only once in the past 3 years.
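To see why the phase matters for the loss, here is a small sketch of "inverted" dropout in plain Python. It is not the Keras implementation; the dropout function, its arguments, and the sample activations are hypothetical and only illustrate the training/inference difference.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout sketch.

    In training mode each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate). In inference mode the input is
    returned unchanged.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(train_out)
print(eval_out)
```

The training output is a noisy version of the activations while the inference output is exact, which is one reason a training loss computed with dropout active is not directly comparable to a validation loss.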
I'm new to PyTorch and I need a clarification on multiclass classification.
I'm fine-tuning the DenseNet neural network, so it can recognize 3 different classes.
Because it's a multiclass problem, I have to replace the classification layer in this way:
kernelCount = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(nn.Linear(kernelCount, 3), nn.Softmax(dim=1))
And use CrossEntropyLoss as the loss function:
loss = torch.nn.CrossEntropyLoss(reduction='mean')
Reading the PyTorch forum, I found that CrossEntropyLoss applies the softmax function to the output of the neural network. Is this true? Should I remove the Softmax activation function from the structure of the network?
And what about the test phase? If softmax is included in the loss, do I have to call the softmax function on the output of the model myself?
Thanks in advance for your help.
Yes, CrossEntropyLoss applies softmax implicitly (it combines a log-softmax with the negative log-likelihood loss and expects raw logits). You should remove the softmax layer at the end of the network: softmax is not idempotent, so applying it twice changes the result and would be a semantic error.
As for evaluation/testing, remember that softmax is a monotonically increasing operation (meaning the relative order of the outputs doesn't change when you apply it). Therefore argmax gives the same result before and after softmax.
The only time you may want to apply softmax explicitly during evaluation is if you need the actual confidence values for some reason. In that case you can call torch.softmax on the network output during evaluation.
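Both claims (the order of the outputs is preserved, but applying softmax twice changes the values) can be checked numerically. The sketch below uses plain Python rather than PyTorch so that it runs without any dependencies; the logits are made-up numbers for a 3-class problem.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, -1.0, 0.5]  # hypothetical raw network outputs
once = softmax(logits)     # what the loss effectively sees
twice = softmax(once)      # what happens with an extra Softmax layer

print(argmax(logits), argmax(once), argmax(twice))
print(once)
print(twice)
```

The predicted class is the same in all three cases, but the doubly-softmaxed distribution is much flatter, which is why a Softmax layer in front of CrossEntropyLoss distorts the loss even though predictions still look plausible.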
I am building a simple convolutional network using the Lasagne package and wanted to add a ReLU layer with a simple trainable threshold [max(0, x - threshold)], but I could only find rectifiers without a trainable parameter (lasagne.layers.NonlinearityLayer) or with a parameter that is multiplied (lasagne.layers.ParametricRectifierLayer). Does this layer exist, or am I missing something obvious?
Thank you for any help! Terry
I don't think that exists. The reason is that you usually have a trainable layer before the ReLU (e.g. convolutional or fully connected), which already includes a bias. Shifting the data by a bias b before the ReLU is equivalent to a threshold of -b at the ReLU, since max(0, x + b) = max(0, x - threshold) when b = -threshold.
If you don't have a trainable layer before the ReLU, you can also explicitly add a lasagne.layers.BiasLayer (http://lasagne.readthedocs.org/en/latest/modules/layers/special.html)
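The equivalence between a threshold and a bias can be checked with a few lines of plain Python. This is only an illustration of the arithmetic, not Lasagne code; the function names are made up for the example.

```python
def relu(x):
    """Standard rectifier."""
    return max(0.0, x)


def thresholded_relu(x, threshold):
    """The layer asked about in the question: max(0, x - threshold)."""
    return max(0.0, x - threshold)


def biased_relu(x, bias):
    """A bias added before a standard rectifier."""
    return relu(x + bias)


threshold = 0.75
inputs = [-2.0, -0.5, 0.0, 0.5, 0.75, 1.0, 3.0]

with_threshold = [thresholded_relu(x, threshold) for x in inputs]
with_bias = [biased_relu(x, -threshold) for x in inputs]

print(with_threshold)
print(with_bias)
```

Since the two lists agree for every input, a learned bias of -threshold in the preceding layer (or in an explicit bias layer) plays the role of the trainable threshold.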
Hope this helps
Michael