Confusion about model.train() [duplicate] - pytorch

This question already has answers here:
What does model.train() do in PyTorch?
(6 answers)
Closed 2 years ago.
I am a beginner in PyTorch. I saw on GitHub that some deep learning models call model.train() and some don't, yet they run normally either way. I want to know: is model.train() necessary, and what is its effect?

train and its counterpart eval switch the model between training and evaluation mode.
In training mode, layers that behave differently at train and test time (e.g. Dropout, BatchNorm) use their training behaviour; in evaluation mode they use their inference behaviour. Note that this switch does not by itself enable or disable the gradient tracking needed for the gradient descent used in training; that is controlled separately (e.g. with torch.no_grad()).

train mode and eval mode only matter when you have modules that behave asymmetrically at training and test time (e.g. BatchNorm, Dropout). I would like to emphasize that they do not affect gradient accumulation at all. Even with asymmetrical modules, one can perfectly well train a model in eval mode; some people do this to save memory when training with a pretrained ImageNet model.
If you don't have any asymmetrical modules, it does not matter at all.
By default, all modules start with training=True.
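A minimal sketch of the difference (the toy model and tensors below are made up for illustration): Dropout behaves differently in the two modes, while gradients are still computed in eval mode unless you disable them explicitly.

import torch
import torch.nn as nn

# Toy model with an asymmetric module (Dropout); purely illustrative.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))
x = torch.randn(4, 10)

model.train()                 # default mode; Dropout zeroes ~half the activations
out_train = model(x)

model.eval()                  # Dropout becomes a no-op; BatchNorm would use running stats
out_eval = model(x)

# eval() does NOT stop gradient tracking:
out_eval.sum().backward()
print(model[0].weight.grad is not None)   # True

# To actually skip gradient tracking (e.g. during validation), use torch.no_grad():
with torch.no_grad():
    _ = model(x)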

Related

Fine Tuning Pretrained Model MobileNet_V3_Large PyTorch

I am trying to add a layer to fine-tune the MobileNet_V3_Large pre-trained model. I looked around the PyTorch docs, but they don't have a tutorial for this specific pre-trained model. I did find that I can fine-tune MobileNet_V2 with:
model_ft = models.mobilenet_v2(pretrained=True, progress=True)
model_ft.classifier[1] = nn.Linear(model_ft.last_channel, out_features=len(class_names))
but I am not sure what the linear layer for MobileNet V3 should look like.
For V3 Large, you should do
model_ft = models.mobilenet_v3_large(pretrained=True, progress=True)
model_ft.classifier[-1] = nn.Linear(1280, your_number_of_classes)
(This would also work for V2, but the code you posted would not work correctly for V3.)
To see the structure of your network, you can just do
print(model_ft.classifier)
or
print(model_ft)
For fine-tuning people often (but not always) freeze all layers except the last one. Again, the layer to not freeze is model_ft.classifier[-1] rather than model_ft.classifier[1].
Whether or not you should freeze layers depends on how much data you have, and is best determined empirically.
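A hedged sketch of that freezing step, building on the code above (num_classes is a placeholder for your own number of classes):

import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_classes = 10   # placeholder; set to your own number of classes

model_ft = models.mobilenet_v3_large(pretrained=True, progress=True)

# Freeze every pretrained parameter...
for param in model_ft.parameters():
    param.requires_grad = False

# ...then replace the last classifier layer; its freshly created parameters
# have requires_grad=True, so only this layer will be trained.
model_ft.classifier[-1] = nn.Linear(1280, num_classes)

# Pass only the trainable parameters to the optimizer.
optimizer = optim.SGD(
    (p for p in model_ft.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
)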

How do I properly tune parameters for a neural network? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
How do I tune the parameters of a neural network, such as the number of layers, the types of layers, the width, and so on? Right now I simply guess at good parameters. This becomes very expensive and time-consuming, because I tune a network and then find out that it didn't do any better than the previous model. Is there a better way to tune the model to get good test and validation scores?
It is largely a hit-and-trial (trial-and-error) process; you have to play around with it, and there is no single method for it. Use a GPU instead of a CPU for faster computation, e.g. on Google Colab. My suggestion is to note down all the parameters that can be tuned, for example:
Optimizer: try different optimizers such as Adam, SGD and others.
Learning rate: this is a very crucial parameter; try values from 0.0001 to 0.001 in steps of 0.0001.
Number of hidden layers: try increasing the number of hidden layers.
Try Batch Normalization, Dropout, or both if required.
Use the correct loss function.
Change the batch size and the number of epochs.
Hidden layers, epochs, batch size: try different numbers.
Optimizers: Adam (often gives better results), RMSprop.
Dropout: 0.2 works well in most cases.
As a plus, you should also try different activation functions (e.g. ReLU in the hidden layers, and for the output layer sigmoid for binary classification or softmax for multi-class classification).
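As a concrete illustration of tuning these knobs more systematically, here is a rough random-search sketch in PyTorch; the search space, the build_model helper and the train_and_eval callback you would have to supply are all hypothetical, not something either answer prescribes.

import random
import torch.nn as nn
import torch.optim as optim

# Hypothetical search space covering the knobs listed above.
SEARCH_SPACE = {
    "lr": [1e-4, 3e-4, 1e-3],
    "hidden_layers": [1, 2, 3],
    "hidden_width": [64, 128, 256],
    "dropout": [0.0, 0.2, 0.5],
    "optimizer": ["adam", "sgd"],
}

def build_model(cfg, in_dim=20, out_dim=2):
    # Simple MLP built from one sampled configuration (illustrative only).
    layers, width = [], in_dim
    for _ in range(cfg["hidden_layers"]):
        layers += [nn.Linear(width, cfg["hidden_width"]), nn.ReLU(), nn.Dropout(cfg["dropout"])]
        width = cfg["hidden_width"]
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)

def make_optimizer(model, cfg):
    if cfg["optimizer"] == "adam":
        return optim.Adam(model.parameters(), lr=cfg["lr"])
    return optim.SGD(model.parameters(), lr=cfg["lr"], momentum=0.9)

def random_search(train_and_eval, n_trials=10):
    # train_and_eval(model, optimizer) -> validation score; you supply it.
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        model = build_model(cfg)
        score = train_and_eval(model, make_optimizer(model, cfg))
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score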

Why is accuracy higher with Caffe than with tf.keras?

I converted a model from tf.keras to Caffe. When I evaluate the model with Caffe on the test set, I find that the accuracy is higher with Caffe than with tf.keras. I can't think of a way to get a handle on the source of the problem (if there is a problem in the first place...).
Is this difference due to the lower-level libraries used for accelerating the computations (I am thinking of cudnn and the caffe engine)? Is there a well-known accuracy problem with the keras module of tensorflow?
By the way, there are other people that have a similar issue:
https://github.com/keras-team/keras/issues/4444
This can happen.
Once you convert your Keras .h5 model to .caffemodel, the weights are copied over numerically, but internally you are now loading your model into Caffe rather than Keras.
Since Caffe and Keras are two different libraries, their internal implementations of the same operations can vary slightly. Changing your pre-processing scheme can also change the result. Usually pruning (to optimize the model size) lowers performance, but in odd cases it can act as an extreme form of regularization and boost test performance.
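One way to narrow this down (a common debugging step, not something the answer above prescribes) is to feed the exact same pre-processed batch through both frameworks and compare the outputs; the file names, the 'data' blob name and the data layouts below are assumptions about your converted model.

import numpy as np
import caffe
from tensorflow import keras

# Hypothetical file names for the two versions of the same model.
keras_model = keras.models.load_model("model.h5")
caffe_net = caffe.Net("model.prototxt", "model.caffemodel", caffe.TEST)

# One identically pre-processed batch (assuming NCHW for Caffe, NHWC for Keras).
batch_nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)  # replace with real data
batch_nhwc = batch_nchw.transpose(0, 2, 3, 1)

# Forward pass through Caffe (the 'data' and output blob names depend on your prototxt).
caffe_net.blobs["data"].reshape(*batch_nchw.shape)
caffe_net.blobs["data"].data[...] = batch_nchw
caffe_out = caffe_net.forward()[caffe_net.outputs[0]]

# Forward pass through Keras on the same data.
keras_out = keras_model.predict(batch_nhwc)

# A large difference on identical input points at the conversion or the runtime,
# not at your test-set pre-processing.
print(np.max(np.abs(caffe_out - keras_out.reshape(caffe_out.shape))))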

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No).
The dataset has about 50 input variables, and I have 65,000 entries.
I am currently running an RNN with a single hidden layer of 35 nodes. I am using PyTorch's NLLLoss as my loss function and Adaboost for the optimization function. I've tried many different learning rates, and 0.01 seems to work fairly well.
After running for 150 epochs, I notice that I start to converge at around 0.80 accuracy on my test data, but I would like this to be even higher. It seems like the model is stuck oscillating around some sort of saddle point or local minimum. (A graph of this is below.)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
I'm not sure exactly why you are using only one hidden layer, or what the shape of your history data is, but here are things you can try:
Use more than one hidden layer.
Experiment with LSTM and GRU layers, and with combinations of these layers together with the RNN.
Reconsider the shape of your data, i.e. how much history you look at to predict the weather.
Make sure your features are scaled properly, since you have about 50 input variables.
Your question is a little ambiguous, as you mention an RNN with a single hidden layer. Also, without knowing the entire neural-network architecture, it is tough to say how you can bring in improvements. So I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function, but PyTorch doesn't have any such optimizer. Did you try using the SGD or Adam optimizers, which are very commonly used?
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm so that you can compare whether 80% accuracy is good or not?
150 epochs just for a binary classification task looks like too much. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.
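As an illustration of the optimizer point above, here is a minimal sketch; the network is a hypothetical stand-in for the asker's model, not their actual RNN.

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in: 50 input features, 35 hidden units, binary output.
model = nn.Sequential(
    nn.Linear(50, 35),
    nn.ReLU(),
    nn.Dropout(0.2),          # simple regularization, as suggested above
    nn.Linear(35, 2),
    nn.LogSoftmax(dim=1),     # NLLLoss expects log-probabilities
)

criterion = nn.NLLLoss()
# PyTorch has no "Adaboost" optimizer; Adam or SGD are the usual choices.
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# One illustrative training step on fake data.
x = torch.randn(64, 50)
y = torch.randint(0, 2, (64,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()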

Pytorch: Intermediate testing during training

How can I test my pytorch model on validation data during training?
I know that there is the function myNet.eval() which apparently switches off any dropout layers, but does it also prevent gradients from being accumulated?
Also how would I undo the myNet.eval() command in order to continue with the training?
If anyone has some code snippet / toy example I would be grateful!
How can I test my pytorch model on validation data during training?
There are plenty of examples with train and test steps for every epoch during training; an easy one is the official MNIST example. Since PyTorch does not offer any high-level training, validation or scoring framework, you have to write it yourself. Commonly this consists of:
a data loader (commonly based on torch.utils.data.DataLoader)
a main loop over the total number of epochs
a train() function that uses training data to optimize the model
a test() or valid() function to measure the effectiveness of the model given validation data and a metric
This is also what you will find in the linked example.
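A rough skeleton of those pieces, under the usual assumptions (the model, optimizer, criterion and datasets are yours to supply; this is not the MNIST example itself):

import torch
from torch.utils.data import DataLoader

def train(model, loader, optimizer, criterion):
    model.train()                      # training mode
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

def validate(model, loader, criterion):
    model.eval()                       # eval mode for Dropout/BatchNorm
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():              # skip gradient bookkeeping during validation
        for x, y in loader:
            out = model(x)
            total_loss += criterion(out, y).item() * y.size(0)
            correct += (out.argmax(dim=1) == y).sum().item()
            n += y.size(0)
    return total_loss / n, correct / n

# Hypothetical usage with your own objects:
# train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
# val_loader   = DataLoader(val_dataset, batch_size=64)
# for epoch in range(num_epochs):
#     train(model, train_loader, optimizer, criterion)
#     val_loss, val_acc = validate(model, val_loader, criterion)
#     print(f"epoch {epoch}: val_loss={val_loss:.4f} val_acc={val_acc:.3f}")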
Alternatively you can use a framework that provides basic looping and validation facilities so you don't have to implement everything by yourself all the time.
tnt is torchnet for pytorch, supplying you with different metrics (such as accuracy) and abstraction of the train loop. See this MNIST example.
inferno and torchsample attempt to model things very similarly to Keras and provide some tools for validation
skorch is a scikit-learn wrapper for pytorch that lets you use all the tools and metrics from sklearn
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train() or, alternatively, supply a boolean to switch between eval and training: myNet.train(True) for train mode.
I know that there is the function myNet.eval() which apparently switches off any dropout layers, but does it also prevent gradients from being accumulated?
It doesn't prevent gradients from accumulating.
But during testing you usually do want to skip gradient tracking. In current PyTorch you do that by wrapping the forward pass in torch.no_grad() (in very old versions this was done by marking the input Variable as volatile=True); it saves some time and memory in the forward calculation.
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train()
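Putting both answers together, a minimal sketch (the dummy network and data stand in for your own):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins so the snippet runs; use your own network and validation data.
myNet = nn.Sequential(nn.Linear(10, 2), nn.Dropout(0.5))
val_loader = DataLoader(TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,))), batch_size=8)

myNet.eval()                    # dropout off; BatchNorm would use running statistics
correct = 0
with torch.no_grad():           # modern replacement for volatile=True Variables
    for x, y in val_loader:
        correct += (myNet(x).argmax(dim=1) == y).sum().item()
print("validation accuracy:", correct / 32)

myNet.train()                   # undo eval() and continue training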
