My model is a CNN with multiple batch normalization (BN) and dropout (DO) layers.
Originally, I accidentally put model.train() outside of the training loop, like this:
model.train()
for e in range(num_epochs):
    # train model
    model.eval()
    # eval model
For the record, the code above trained well and performed decently on the validation set:
[CV:02][E:001][I:320/320] avg. Loss: 0.460897, avg. Acc: 0.742746, test. acc: 0.708046(max: 0.708046)
[CV:02][E:002][I:320/320] avg. Loss: 0.389883, avg. Acc: 0.798791, test. acc: 0.823563(max: 0.823563)
[CV:02][E:003][I:320/320] avg. Loss: 0.319034, avg. Acc: 0.825559, test. acc: 0.834914(max: 0.834914)
[CV:02][E:004][I:320/320] avg. Loss: 0.301322, avg. Acc: 0.834254, test. acc: 0.834052(max: 0.834914)
[CV:02][E:005][I:320/320] avg. Loss: 0.292184, avg. Acc: 0.839575, test. acc: 0.835201(max: 0.835201)
[CV:02][E:006][I:320/320] avg. Loss: 0.285467, avg. Acc: 0.842266, test. acc: 0.837931(max: 0.837931)
[CV:02][E:007][I:320/320] avg. Loss: 0.279607, avg. Acc: 0.844917, test. acc: 0.829885(max: 0.837931)
[CV:02][E:008][I:320/320] avg. Loss: 0.275252, avg. Acc: 0.846443, test. acc: 0.827874(max: 0.837931)
[CV:02][E:009][I:320/320] avg. Loss: 0.270719, avg. Acc: 0.848150, test. acc: 0.822989(max: 0.837931)
While reviewing the code, however, I realized I had made a mistake: with the code above, the BN and DO layers stay in eval mode for all the training that happens after the first epoch.
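One quick way to confirm this (a minimal sketch; it assumes the BN layers are nn.BatchNorm2d modules and the DO layers are nn.Dropout) is to check each submodule's training flag:

import torch.nn as nn

# After model.eval() has been called, these all report training = False
# and stay that way until model.train() is called again.
for name, module in model.named_modules():
    if isinstance(module, (nn.BatchNorm2d, nn.Dropout)):
        print(name, "training =", module.training)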
So I moved model.train() inside the loop:
for e in range(num_epochs):
    model.train()
    # train model
    model.eval()
    # eval model
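For reference, the full conventional pattern would look like the sketch below; criterion, optimizer, train_loader, and val_loader are placeholders for the real objects, and the torch.no_grad() block is just the usual way of skipping gradient tracking during evaluation.

import torch

for e in range(num_epochs):
    model.train()                      # BN uses batch statistics, DO is active
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()                       # BN uses running statistics, DO is disabled
    with torch.no_grad():
        for x, y in val_loader:
            preds = model(x)
            # accumulate validation loss / accuracy here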
With model.train() moved inside the loop, the model learned relatively poorly (it seemed to overfit, as you can see in the output below). It had higher training accuracy but significantly lower accuracy on the validation set (which was starting to get weird, considering the usual effects of BN and DO):
[CV:02][E:001][I:320/320] avg. Loss: 0.416946, avg. Acc: 0.750477, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.329121, avg. Acc: 0.798992, test. acc: 0.690948(max: 0.690948)
[CV:02][E:003][I:320/320] avg. Loss: 0.305688, avg. Acc: 0.829053, test. acc: 0.719540(max: 0.719540)
[CV:02][E:004][I:320/320] avg. Loss: 0.290048, avg. Acc: 0.840539, test. acc: 0.741954(max: 0.741954)
[CV:02][E:005][I:320/320] avg. Loss: 0.279873, avg. Acc: 0.848872, test. acc: 0.745833(max: 0.745833)
[CV:02][E:006][I:320/320] avg. Loss: 0.270934, avg. Acc: 0.854274, test. acc: 0.742960(max: 0.745833)
[CV:02][E:007][I:320/320] avg. Loss: 0.263515, avg. Acc: 0.856945, test. acc: 0.741667(max: 0.745833)
[CV:02][E:008][I:320/320] avg. Loss: 0.256854, avg. Acc: 0.858672, test. acc: 0.734483(max: 0.745833)
[CV:02][E:009][I:320/320] avg. Loss: 0.252013, avg. Acc: 0.861363, test. acc: 0.723707(max: 0.745833)
[CV:02][E:010][I:320/320] avg. Loss: 0.245525, avg. Acc: 0.865519, test. acc: 0.711494(max: 0.745833)
So I thought to myself, "I guess the BN and DO layers are having a negative effect on my model," and removed them. However, the model didn't perform well with the BN and DO layers removed either (in fact, it didn't seem to be learning anything):
[CV:02][E:001][I:320/320] avg. Loss: 0.552687, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503373, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502966, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502870, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502832, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:007][I:320/320] avg. Loss: 0.502800, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:008][I:320/320] avg. Loss: 0.502765, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
I was very confused at this point, so I carried out another experiment: I placed the BN and DO layers back into the model and tested the following:
for e in range(num_epochs):
    model.eval()
    # train model
    # eval model
It worked poorly:
[CV:02][E:001][I:320/320] avg. Loss: 0.562196, avg. Acc: 0.744774, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506071, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502916, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502859, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502838, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
I performed the above experiments multiple times, and the results were not far from the outputs posted above. (The data I'm working with is pretty simple.)
To summarize, the model works best in a very particular setting:
1. Batch normalization and dropout are added to the model. (This is good.)
2. The model is trained with model.train() on only for the first epoch. (Weird, especially combined with 3.)
3. The model is trained with model.eval() on for the rest of the epochs. (Weird as well.)
To be honest, I would never have set up the training procedure this way (and I don't think anyone would), but it works well for some reason. Has anybody experienced anything similar? If you could explain why the model behaves this way, it would be much appreciated!
Thanks in advance!!
Related
Could anyone tell me the meaning of '10' and '49' in the following TensorFlow log?
Many thanks.
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 5.899410247802734 secs
10/10 [==============================] - 23s 2s/step - loss: 2.6726 - acc: 0.1459
49/49 [==============================] - 108s 2s/step - loss: 2.3035 - acc: 0.2845 - val_loss: 2.6726 - val_acc: 0.1459
Epoch 2/100
10/10 [==============================] - 1s 133ms/step - loss: 2.8799 - acc: 0.1693
49/49 [==============================] - 17s 337ms/step - loss: 1.9664 - acc: 0.4042 - val_loss: 2.8799 - val_acc: 0.1693
10 and 49 correspond to the number of batches your datasets have been divided into in each epoch: 49 steps for the training set and 10 for the validation set (note that the loss/acc on the 10/10 line match the val_loss/val_acc on the 49/49 line).
For example, if your training dataset has 10,000 images and your batch size is 64, there will be math.ceil(10000/64) = 157 batches in each epoch.
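As a quick sanity check, that calculation in code (a minimal sketch using the hypothetical numbers from the example above, not your actual dataset):

import math

num_train_images = 10000   # hypothetical example figure
batch_size = 64
steps_per_epoch = math.ceil(num_train_images / batch_size)
print(steps_per_epoch)     # 157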
I am training a neural net in Keras. During the first epoch the loss is reported normally, then suddenly becomes loss: nan before the epoch ends, and the accuracy drops significantly. From the second epoch onward, loss: nan continues and the accuracy is 0. This goes on for the rest of the epochs.
The frustrating bit is that there seems to be no consistency in the output each time I train; that is, loss: nan shows up at different points in the first epoch.
There have been a couple of questions on this website that give "guides" to problems similar to this; I just haven't seen one addressed so explicitly in Keras.
I am trying to get my neural network to classify a 1 or a 0.
Here are some things I have tried; my output and code follow.
Standardization // Normalization
I posted a question about my data here. I was able to figure it out and apply sklearn's StandardScaler() and MinMaxScaler() to my dataset. Neither standardization nor normalization helped the issue.
Learning Rate
The optimizers I have tried are adam and SGD. In both cases I tried lowering the standard learning rate to see if that would help; the same issue arose.
Activations
I thought that it was pretty standard to use relu but I saw on the internet somewhere someone talking about using tanh, tried it, no dice.
Batch Size
Tried 32, 50, 128, and 200. 50 got me the farthest into the first epoch; everything else didn't help.
Combating Overfitting
Put a dropout layer in and tried a whole bunch of numbers.
Other Observations
The epochs train really really fast for the dimensions of the data (I could be wrong).
loss: nan could have something to do with my loss function being binary_crossentropy and maybe some values are giving that loss function a hard time.
kernel_initializer='uniform' has been untouched and unconsidered in my quest to figure this out.
The internet also told me that there could be a nan value in my data but I think that was for an error that broke their script.
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers

sc = MinMaxScaler()
X_train_total_scale = sc.fit_transform(X_train)
X_test_total_scale = sc.transform(X_test)

print(X_train_total_scale.shape)  # (4140, 2756)
print(y_train.shape)              # (4140,)

## NN
# adam = keras.optimizers.Adam(lr=0.0001)
sgd = optimizers.SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)

classifier = Sequential()
classifier.add(Dense(1379, kernel_initializer='uniform', activation='relu', input_dim=2756))
classifier.add(Dropout(0.6))
classifier.add(Dense(1379, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

classifier.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train_total_scale, y_train, validation_data=(X_test_total_scale, y_test),
               batch_size=50, epochs=100)
(batch size 200 shown to avoid too-big-a text block)
200/4140 [>.............................] - ETA: 7s - loss: 0.6866 - acc: 0.5400
400/4140 [=>............................] - ETA: 4s - loss: 0.6912 - acc: 0.5300
600/4140 [===>..........................] - ETA: 2s - loss: nan - acc: 0.5300
800/4140 [====>.........................] - ETA: 2s - loss: nan - acc: 0.3975
1000/4140 [======>.......................] - ETA: 1s - loss: nan - acc: 0.3180
1200/4140 [=======>......................] - ETA: 1s - loss: nan - acc: 0.2650
1400/4140 [=========>....................] - ETA: 1s - loss: nan - acc: 0.2271
1600/4140 [==========>...................] - ETA: 1s - loss: nan - acc: 0.1987
1800/4140 [============>.................] - ETA: 1s - loss: nan - acc: 0.1767
2000/4140 [=============>................] - ETA: 0s - loss: nan - acc: 0.1590
2200/4140 [==============>...............] - ETA: 0s - loss: nan - acc: 0.1445
2400/4140 [================>.............] - ETA: 0s - loss: nan - acc: 0.1325
2600/4140 [=================>............] - ETA: 0s - loss: nan - acc: 0.1223
2800/4140 [===================>..........] - ETA: 0s - loss: nan - acc: 0.1136
3000/4140 [====================>.........] - ETA: 0s - loss: nan - acc: 0.1060
3200/4140 [======================>.......] - ETA: 0s - loss: nan - acc: 0.0994
3400/4140 [=======================>......] - ETA: 0s - loss: nan - acc: 0.0935
3600/4140 [=========================>....] - ETA: 0s - loss: nan - acc: 0.0883
3800/4140 [==========================>...] - ETA: 0s - loss: nan - acc: 0.0837
4000/4140 [===========================>..] - ETA: 0s - loss: nan - acc: 0.0795
4140/4140 [==============================] - 2s 368us/step - loss: nan - acc: 0.0768 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/100
200/4140 [>.............................] - ETA: 1s - loss: nan - acc: 0.0000e+00
400/4140 [=>............................] - ETA: 0s - loss: nan - acc: 0.0000e+00
600/4140 [===>..........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
800/4140 [====>.........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1000/4140 [======>.......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1200/4140 [=======>......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1400/4140 [=========>....................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1600/4140 [==========>...................] - ETA: 0s - loss: nan - acc: 0.0000e+00
... and so on...
I hope to be able to get a full training done (duh) but I would also like to learn about some of the intuition people have to figure out these problems on their own!
Firstly, check for NaNs or inf in your dataset.
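For example, a quick check on the arrays from your snippet (a minimal sketch; it assumes they are plain NumPy arrays after scaling):

import numpy as np

# True anywhere here means the data itself is feeding NaN/inf into the loss
print(np.isnan(X_train_total_scale).any(), np.isinf(X_train_total_scale).any())
print(np.isnan(X_test_total_scale).any(), np.isinf(X_test_total_scale).any())
print(np.isnan(np.asarray(y_train, dtype=float)).any())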
You could try different optimizers, e.g. rmsprop.
Learning rate could be smaller, though I haven't used anything lower than 0.0001 (which is what you're using) myself.
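For example (a minimal sketch; the 1e-5 learning rate is just a value to try, not something I've validated on your data):

from keras import optimizers

# rmsprop, or SGD with a learning rate below the 1e-4 you're already using
opt = optimizers.RMSprop(lr=1e-5)
# opt = optimizers.SGD(lr=1e-5, decay=1e-6, momentum=0.9, nesterov=True)
classifier.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])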
"I thought that it was pretty standard to use relu but I saw on the internet somewhere someone talking about using tanh, tried it, no dice"
Try leaky ReLU or ELU if you're concerned about this.
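For example, in Keras these are available as layers (a minimal sketch of one hidden layer; the alpha value is a common default, not something tuned for your problem):

from keras.layers import Dense, LeakyReLU

classifier.add(Dense(1379, kernel_initializer='uniform', input_dim=2756))
classifier.add(LeakyReLU(alpha=0.1))   # or keras.layers.ELU(alpha=1.0)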
I am using Keras; part of my network and its parameters are as follows:
parser.add_argument("--batch_size", default=396, type=int,
help="batch size")
parser.add_argument("--n_epochs", default=10, type=int,
help="number of epoch")
parser.add_argument("--epoch_steps", default=10, type=int,
help="number of epoch step")
parser.add_argument("--val_steps", default=4, type=int,
help="number of valdation step")
parser.add_argument("--n_labels", default=2, type=int,
help="Number of label")
parser.add_argument("--input_shape", default=(224, 224, 3),
help="Input images shape")
parser.add_argument("--kernel", default=3, type=int,
help="Kernel size")
parser.add_argument("--pool_size", default=(2, 2),
help="pooling and unpooling size")
parser.add_argument("--output_mode", default="softmax", type=str,
help="output activation")
parser.add_argument("--loss", default="categorical_crossentropy", type=str,
help="loss function")
parser.add_argument("--optimizer", default="adadelta", type=str,
help="oprimizer")
args = parser.parse_args()
return args
def main(args):
# set the necessary list
train_list = pd.read_csv(args.train_list, header=None)
val_list = pd.read_csv(args.val_list, header=None)
train_gen = data_gen_small(trainimg_dir, trainmsk_dir,
train_list, args.batch_size,
[args.input_shape[0], args.input_shape[1]], args.n_labels)
#print(train_gen, "train_gen is:")
val_gen = data_gen_small(valimg_dir, valmsk_dir,
val_list, args.batch_size,
[args.input_shape[0], args.input_shape[1]], args.n_labels)
model = segnet(args.input_shape, args.n_labels,
args.kernel, args.pool_size, args.output_mode)
print(model.summary())
model.compile(loss=args.loss,
optimizer=args.optimizer, metrics=["accuracy"])
model.fit_generator(train_gen, steps_per_epoch=args.epoch_steps,
epochs=args.n_epochs, validation_data=val_gen,
validation_steps=args.val_steps, verbose=1)
I get 10 results (the number of epochs) as follows, but I do not understand why I have 10 progress bars within each epoch. Do the accuracy and loss reported in each bar reflect only that single batch, or do they take the previous batches into account as well?
Epoch 10/10
1/10 [==>...........................] - ETA: 3s - loss: 0.4046 - acc: 0.8266
2/10 [=====>........................] - ETA: 3s - loss: 0.3336 - acc: 0.8715
3/10 [========>.....................] - ETA: 2s - loss: 0.3083 - acc: 0.8855
4/10 [===========>..................] - ETA: 2s - loss: 0.2820 - acc: 0.9010
5/10 [==============>...............] - ETA: 1s - loss: 0.2680 - acc: 0.9119
6/10 [=================>............] - ETA: 1s - loss: 0.4112 - acc: 0.8442
7/10 [====================>.........] - ETA: 1s - loss: 0.4040 - acc: 0.8446
8/10 [=======================>......] - ETA: 0s - loss: 0.3811 - acc: 0.8597
9/10 [==========================>...] - ETA: 0s - loss: 0.3623 - acc: 0.8708
10/10 [==============================] - 4s 398ms/step - loss: 0.3495 - acc: 0.8766 - val_loss: 0.5148 - val_acc: 0.7703
PS: I have 659 training samples and 329 validation samples.
Yes, it shows cumulative results: each bar reports the running average over the batches processed so far in that epoch, and the 10 bars per epoch come from steps_per_epoch=args.epoch_steps, which is 10 in your setup. You can change the verbose parameter to 2, which stops the multiple lines and prints one line per epoch, or to 0, which shows no output.
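For example (a minimal sketch of the same fit_generator call from your code with verbose changed):

model.fit_generator(train_gen, steps_per_epoch=args.epoch_steps,
                    epochs=args.n_epochs, validation_data=val_gen,
                    validation_steps=args.val_steps,
                    verbose=2)  # one summary line per epoch instead of a per-step progress bar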
I want to build a model using Keras to predict the price of natural gas.
The dataset contains the daily and monthly gas price since 1997 and is available here.
The following graph shows the prices over a sequence of days; X is days and Y is the price.
I have tried an LSTM with 4, 50, and 100 cells in the hidden layer, but the accuracy was still bad and the model failed to predict future prices.
I added another two (fully connected) hidden layers with 100 and 128 cells, but that did not work either.
This is the model and the result of the training process:
import time

from keras.models import Sequential
from keras.layers import Dense, LSTM

num_units = 100
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 5
num_epochs = 10

# SEQ_LEN is presumably defined earlier in the script
log_file_name = f"{SEQ_LEN}-SEQ-{1}-PRED-{int(time.time())}"

# Initialize the model (of a Sequential type)
model = Sequential()

# Adding the input layer and the LSTM layer
model.add(LSTM(units=num_units, activation=activation_function, input_shape=(None, 1)))

# Adding the output layer
model.add(Dense(units=1))

# Compiling the RNN
model.compile(optimizer=optimizer, loss=loss_function, metrics=['accuracy'])

# Using the training set to train the model
history = model.fit(train_x, train_y, batch_size=batch_size, epochs=num_epochs,
                    validation_data=(test_x, test_y))
And the output is:
Train on 4362 samples, validate on 1082 samples
Epoch 1/10
4362/4362 [==============================] - 11s 3ms/step - loss: 0.0057 - acc: 2.2925e-04 - val_loss: 0.0016 - val_acc: 0.0018
Epoch 2/10
4362/4362 [==============================] - 9s 2ms/step - loss: 6.2463e-04 - acc: 4.5851e-04 - val_loss: 0.0013 - val_acc: 0.0018
Epoch 3/10
4362/4362 [==============================] - 9s 2ms/step - loss: 6.1073e-04 - acc: 2.2925e-04 - val_loss: 0.0014 - val_acc: 0.0018
Epoch 4/10
4362/4362 [==============================] - 8s 2ms/step - loss: 5.4403e-04 - acc: 4.5851e-04 - val_loss: 0.0014 - val_acc: 0.0018
Epoch 5/10
4362/4362 [==============================] - 7s 2ms/step - loss: 5.4765e-04 - acc: 4.5851e-04 - val_loss: 0.0012 - val_acc: 0.0018
Epoch 6/10
4362/4362 [==============================] - 8s 2ms/step - loss: 5.1991e-04 - acc: 4.5851e-04 - val_loss: 0.0013 - val_acc: 0.0018
Epoch 7/10
4362/4362 [==============================] - 7s 2ms/step - loss: 5.7324e-04 - acc: 2.2925e-04 - val_loss: 0.0011 - val_acc: 0.0018
Epoch 8/10
4362/4362 [==============================] - 7s 2ms/step - loss: 4.4248e-04 - acc: 4.5851e-04 - val_loss: 0.0011 - val_acc: 0.0018
Epoch 9/10
4362/4362 [==============================] - 7s 2ms/step - loss: 4.3868e-04 - acc: 4.5851e-04 - val_loss: 0.0011 - val_acc: 0.0018
Epoch 10/10
4362/4362 [==============================] - 7s 2ms/step - loss: 4.6654e-04 - acc: 4.5851e-04 - val_loss: 0.0011 - val_acc: 0.0018
How do I choose the number of layers and cells for a problem like this? Can anyone suggest a network structure that can solve it?
How do I increase the speed of this? I mean, the loss is moving down by hairs. HAIRS.
Epoch 1/30
4998/4998 [==============================] - 307s 62ms/step - loss: 0.6861 - acc: 0.6347
Epoch 2/30
4998/4998 [==============================] - 316s 63ms/step - loss: 0.6751 - acc: 0.6387
Epoch 3/30
4998/4998 [==============================] - 357s 71ms/step - loss: 0.6676 - acc: 0.6387
Epoch 4/30
4998/4998 [==============================] - 376s 75ms/step - loss: 0.6625 - acc: 0.6387
Epoch 5/30
4998/4998 [==============================] - 354s 71ms/step - loss: 0.6592 - acc: 0.6387
Epoch 6/30
4998/4998 [==============================] - 345s 69ms/step - loss: 0.6571 - acc: 0.6387
Epoch 7/30
4998/4998 [==============================] - 349s 70ms/step - loss: 0.6559 - acc: 0.6387
Model Architecture:
resnet50 (CNN with skip connections)
Except instead of 1 FC layer I have two, and I changed the softmax output to a sigmoid for binary classification.
num positive training data: 1806
num neg training data: 3192
My output is represented by a 1 or 0 for each example ( [0, 0, 1, 1, ...])
batches = 40, num epochs = 30, but that doesn't matter because the loss has stopped moving.
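For reference, a head like that could be sketched as follows (assuming Keras, going by the progress-bar format of the log; the pooling layer, the 512-unit first FC layer, and the ImageNet weights are illustrative assumptions, not the exact architecture described above):

from keras.applications import ResNet50
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

base = ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
x = Dense(512, activation='relu')(x)       # first FC layer (size is a guess)
out = Dense(1, activation='sigmoid')(x)    # sigmoid output for binary classification

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])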