I was trying to find out how many epochs the pretrained AlexNet model (available from torchvision) was trained for on ImageNet, and what learning rate was used. I tried checking the checkpoint keys to see if any epoch info was stored.
Any suggestions on how to find this out?
According to this comment on GitHub by a PyTorch team member, most of the training was done with a variant of https://github.com/pytorch/examples/tree/master/imagenet. All the models were trained on ImageNet. According to that file:
The default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs, though they recommend 0.01 as the initial learning rate for AlexNet.
The default value for epochs is 90.
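As a rough sketch of that step-decay schedule (this reproduces the rule described above, not the exact code from the examples repository):

def step_decay_lr(epoch, initial_lr=0.1, drop_every=30, factor=0.1):
    # Learning rate that starts at initial_lr and is divided by 10 every 30 epochs
    return initial_lr * (factor ** (epoch // drop_every))

# With the 0.01 starting rate recommended for AlexNet:
print(step_decay_lr(0, initial_lr=0.01))    # 0.01   (epochs 0-29)
print(step_decay_lr(30, initial_lr=0.01))   # 0.001  (epochs 30-59)
print(step_decay_lr(60, initial_lr=0.01))   # 0.0001 (epochs 60-89)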
I am trying to do sentiment analysis on images.
I have 4 classes: hilarious, funny, very funny, not funny.
I tried pre-trained models like VGG16/19 and DenseNet201, but my model is overfitting: training accuracy is above 95% while test accuracy is around 30%.
Can someone give suggestions on what else I can try?
Training images: 6K
You can try the following to reduce overfitting:
Implement Early Stopping: compute the validation loss at each epoch and use a patience threshold to stop training once it stops improving.
Implement Cross-Validation: refer to the cross-validation section at https://cs231n.github.io/classification/#val
Use Batch Normalisation: it normalises layer activations to zero mean and unit variance, which improves model generalisation.
Use Dropout (either alone or together with batch norm): it randomly zeroes some activations, which encourages the network to use all of its neurons.
Also, if your dataset isn't too challenging, make sure you don't use an overly complex model that is overkill for the task.
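As a minimal sketch of two of these suggestions (early stopping with a patience threshold, plus dropout on top of a frozen pretrained backbone), assuming a tf.keras setup; train_ds and val_ds are placeholder datasets, not names from your code:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen pretrained backbone: with only 6K images, fine-tuning the full
# network is a common source of overfitting
base = VGG16(weights="imagenet", include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Dropout(0.5),                    # randomly zero activations
    layers.Dense(4, activation="softmax"),  # 4 sentiment classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Stop when validation loss has not improved for 5 epochs
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# train_ds / val_ds are placeholders for your (image, label) datasets:
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])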
I have adapted the base Transformer model for my corpus of aligned Arabic-English sentences. The model has trained for 40 epochs, and accuracy (SparseCategoricalAccuracy) is improving by about 0.0004 per epoch.
To achieve good results, my estimate is that the final accuracy needs to reach somewhere around 0.5, while the accuracy after 40 epochs is only 0.0592.
I am running the model on the tesla 2 p80 GPU. Each epoch takes ~2690 sec.
This implies I need at least 600 epochs, and the training time would be 15-18 days.
Should I continue with the training, or is there something wrong in the procedure, given that the base Transformer in the research paper was trained on an English-French corpus?
Key highlights:
Byte-pair encoding of the sentences
Maxlen_len = 100
batch_size = 64
No pre-trained embeddings were used.
Do you mean a Tesla K80 on an AWS p2.xlarge instance?
If that is the case, these GPUs are very slow. You should use p3 instances on AWS, which have V100 GPUs; you will get around a 6-7x speedup.
Check out this for more details.
Also, if you are not using the standard model and have made changes to the model or dataset, then try to tune the hyperparameters. The simplest thing is to decrease the learning rate and see if you get better results.
Also, first run the standard model on the standard dataset to benchmark the time taken in that case, then make your changes and proceed. See when the model starts converging in the standard case; I feel it should already give some results after 40 epochs.
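One hyperparameter worth checking is the learning-rate schedule: the original Transformer paper ("Attention Is All You Need") does not use a fixed learning rate but a warmup-then-decay schedule tied to the model dimension. A minimal sketch of that formula (d_model=512 and warmup_steps=4000 are the paper's defaults, not values from your setup):

def transformer_lr(step, d_model=512, warmup_steps=4000):
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

print(transformer_lr(100))     # still warming up
print(transformer_lr(4000))    # peak learning rate
print(transformer_lr(40000))   # decayed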
My Keras setup generates a checkpoint every time my training reaches a new best value.
However, my internet connection dropped, and when I loaded the last checkpoint and restarted training from the last epoch (using initial_epoch), the accuracy dropped from 89.1 (the loaded model's value) to 83.6 in the first epoch of the new run. Is this normal behavior when resuming (restarting) a training run? When my connection went down, the run was already at the 30th epoch; there was no drop in accuracy, but there was also no significant improvement, so no new checkpoint had been generated, forcing me to go back a few epochs.
Thanks in advance for the help.
The problem with saving and retraining is that when you restart training from a model trained up to epoch N, at epoch N+1 the history of the earlier epochs is no longer retained.
Scenario:
You are training a model for 30 epochs. At epoch 15, you have an accuracy of 88% (say you save your model according to the best validation accuracy). Unfortunately, something happens and your training crashes. However, since you trained with checkpoints, you have the resulting model obtained at epoch 15, before your program crashed.
If you restart training from epoch 15, the previous validation accuracies will not be 'remembered' anywhere, since you are now training 'from scratch' again. If you get a validation accuracy of 84% at epoch 16, your 'best_model' (the one with 88% accuracy) will be overwritten by the epoch 16 model, because there is no saved/internal history of the prior training/validation accuracies. Under the hood, in the new run, 84% is compared against -inf, so the epoch 16 model is saved.
The solution is either to retrain from scratch, or to initialise the second run's best validation accuracy with the value from the previous training (manually, or via a callback). That way, the maximum accuracy that Keras compares against under the hood at the end of each epoch would be 88% (in the scenario above), not -inf.
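A minimal sketch of the second option, assuming the tf.keras ModelCheckpoint callback (it relies on the callback's internal best attribute, which may differ between Keras versions; the file name and the 0.88 value come from the scenario above, not from your code):

from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

# Reload the weights saved before the crash (hypothetical path)
model = load_model("best_model.h5")

checkpoint = ModelCheckpoint("best_model.h5",
                             monitor="val_accuracy",
                             save_best_only=True,
                             mode="max")
# Seed the callback with the best value reached before the crash,
# so epoch 16 is compared against 0.88 instead of -inf
checkpoint.best = 0.88

# Resume where the previous run stopped (x/y placeholders for your data):
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           initial_epoch=15, epochs=30, callbacks=[checkpoint])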
I used BERT base uncased as an embedding model and do simple cosine similarity for intent classification on my dataset (around 400 classes and 2200 utterances, train:test = 80:20). The base BERT model achieves 60% accuracy on the test set, but different numbers of fine-tuning epochs gave me quite unpredictable results.
This is my setting:
max_seq_length=150
train_batch_size=16
learning_rate=2e-5
These are my experiments:
base model accuracy=0.61
epochs=2.0 accuracy=0.30
epochs=5.0 accuracy=0.26
epochs=10.0 accuracy=0.15
epochs=50.0 accuracy=0.20
epochs=75.0 accuracy=0.92
epochs=100.0 accuracy=0.93
I don't understand why it behaved like this. I expected that fine-tuning for any number of epochs shouldn't do worse than the base model, because I fine-tuned and ran inference on the same dataset. Is there anything I misunderstand or should watch out for?
Well, generally you won't be able to feed your whole training set in at once (I am assuming you have a large dataset, so you have to use mini-batches). Hence, you split it into mini-batches. As a result, the accuracy that is displayed is strongly influenced by the last mini-batch, i.e. the last training step of the epoch.
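To see how much a single-batch number can differ from the accuracy over the whole set, here is a small self-contained illustration with toy labels (the ~70% noise level is arbitrary, purely for demonstration):

import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                          # toy labels
y_pred = np.where(rng.random(1000) < 0.7, y_true, 1 - y_true)   # ~70% correct overall

batch_size = 16
# Accuracy of just the final mini-batch vs. accuracy over the full set
last_batch_acc = (y_pred[-batch_size:] == y_true[-batch_size:]).mean()
full_acc = (y_pred == y_true).mean()
print(f"last batch: {last_batch_acc:.2f}, full set: {full_acc:.2f}")

If you want a stable number, evaluate explicitly on the full held-out test set after each fine-tuning run rather than relying on the per-epoch running metric.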
I am training a model using Keras.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(units=300, input_shape=(timestep, 103), use_bias=True, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(units=536))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# train one epoch at a time inside a loop
while True:
    history = model.fit_generator(
        generator=data_generator(x_[train_indices], y_[train_indices],
                                 batch=batch, timestep=timestep),
        steps_per_epoch=int(train_indices.shape[0] / batch),
        epochs=1,
        verbose=1,
        validation_steps=int(validation_indices.shape[0] / batch),
        validation_data=data_generator(x_[validation_indices], y_[validation_indices],
                                       batch=batch, timestep=timestep))
It is a multioutput classification task according to the scikit-learn.org definition:
Multioutput regression assigns each sample a set of target values. This can be thought of as predicting several properties for each data point, such as wind direction and magnitude at a certain location.
Since it is a recurrent neural network, I tried out different timestep sizes, but the result/problem is mostly the same.
After one epoch, my training loss is around 0.0X and my validation loss is around 0.6X, and these values stay stable for the next 10 epochs.
The dataset is around 680,000 rows. Training data is 9/10 and validation data is 1/10.
I'm asking for the intuition behind this:
Is my model already overfitted after just one epoch?
Is 0.6xx even a good value for a validation loss?
High level question:
Since it is a multioutput classification task (not multi-class), I see using sigmoid with binary_crossentropy as the only option. Do you suggest another approach?
I've experienced this issue and found that the learning rate and batch size have a huge impact on the learning process. In my case, I did two things.
Reduce the learning rate (try 0.00005)
Reduce the batch size (8, 16, 32)
Moreover, you can try the basic steps for preventing overfitting.
Reduce the complexity of your model
Increase the training data, and balance the number of samples per class.
Add more regularization (Dropout, BatchNorm)
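As a minimal sketch of the lower learning rate combined with extra regularisation, applied to the model from the question (assuming tf.keras; the learning rate, dropout rate, and timestep value are starting points to try, not tuned recommendations):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam

timestep = 10  # placeholder; use your own window size

model = Sequential([
    LSTM(units=300, input_shape=(timestep, 103), dropout=0.2, recurrent_dropout=0.2),
    BatchNormalization(),                    # extra regularisation
    Dropout(0.3),
    Dense(units=536, activation="sigmoid"),  # multioutput targets, as in the question
])
# Much smaller learning rate than Adam's default 1e-3
model.compile(loss="binary_crossentropy",
              optimizer=Adam(learning_rate=5e-5),
              metrics=["accuracy"])
# Then fit with a smaller batch size, e.g. batch=16 in the data generator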