I'm trying to train an LSTM model to predict temperature, but the model only seems to learn during the first epoch.
I collected the CPU usage and CPU temperature of a server over about twenty hours as my dataset. I want to predict the CPU temperature 10 minutes ahead from the previous 10 minutes of data, so I reshaped the dataset to (1301, 10, 2): 1301 samples, 10 one-minute timesteps and 2 features. I then split it into 1201 training samples and 100 validation samples.
I checked the dataset manually, so it should be right.
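For reference, here is a minimal NumPy sketch of one way to build windows like these. It assumes one row per minute with columns [usage, temperature], and takes each target as the temperature 10 rows after the last row of its window; `make_windows` and the random `series` are hypothetical stand-ins, not the preprocessing used in the question. Under these assumptions, 1320 rows (22 hours of one-minute data) yield exactly 1301 windows.

```python
import numpy as np

def make_windows(series, window=10, horizon=10, target_col=1):
    """Split a (time, features) array into overlapping input windows and future targets.

    Sample i uses rows i .. i+window-1 as input; its target is column `target_col`
    (temperature, by assumption) at row i+window-1+horizon.
    """
    n = len(series) - window - horizon + 1
    X = np.stack([series[i:i + window] for i in range(n)])
    start = window - 1 + horizon
    y = series[start:start + n, target_col]
    return X, y

# Hypothetical data: 1320 one-minute readings of [cpu_usage, cpu_temperature]
rng = np.random.default_rng(0)
series = rng.random((1320, 2))

X, y = make_windows(series)
print(X.shape, y.shape)  # (1301, 10, 2) (1301,)

# Chronological split (assumed): earlier windows for training, the last 100 for validation
train_x, train_y = X[:1201], y[:1201]
test_x, test_y = X[1201:], y[1201:]
```

Splitting chronologically rather than randomly keeps validation windows strictly later than the training ones, which is usually what you want for a forecasting check.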
I created the LSTM model as follows:
from keras.models import Sequential
from keras.layers import LSTM, Flatten, Dense
model = Sequential()
model.add(LSTM(10, activation="relu", input_shape=(train_x.shape[1], train_x.shape[2]), return_sequences=True))
model.add(Flatten())
model.add(Dense(1, activation="softmax"))
model.compile(loss='mean_absolute_error', optimizer='RMSprop')
and fit it:
model.fit(train_x, train_y, epochs=50, batch_size=32, validation_data=(test_x, test_y), verbose=2)
The log looks like this:
Epoch 1/50
- 1s - loss: 0.8016 - val_loss: 0.8147
Epoch 2/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 3/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 4/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 5/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 6/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 7/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 8/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Epoch 9/50
- 0s - loss: 0.8016 - val_loss: 0.8147
Every epoch after the first takes 0s, and the loss never decreases. I tried changing the number of LSTM units, the loss function and the optimizer, but it still doesn't work.
Changing the activation function of the last layer from softmax to sigmoid makes the model work. Thanks to @giser_yugang and @Ashwin Geet D'Sa.
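Why the softmax output froze training: a softmax over a single output unit divides that unit's exponential by itself, so it returns exactly 1.0 for every sample no matter what the LSTM computes. The prediction never changes, the MAE loss stays at the same value every epoch, and no gradient reaches the earlier layers. A small NumPy check (the random `logits` are hypothetical stand-ins for the Dense(1) pre-activations):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 1))  # one pre-activation per sample, shape (batch, 1)

print(softmax(logits).ravel())  # all 1.0: a single unit is normalised against itself
print(sigmoid(logits).ravel())  # varies with the logit, so gradients can flow

# Nudging every logit leaves the softmax output unchanged, i.e. its gradient is zero
eps = 1e-3
delta = softmax(logits + eps) - softmax(logits)
print(np.abs(delta).max())  # 0.0
```

Note that a sigmoid still bounds predictions to (0, 1), so it fits only when the temperature targets are scaled into that range.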
Hoping for a quick second pair of eyes before I officially give up hope on applying deep learning to stock prediction.
The goal is to use an LSTM to predict one of two classes. The positive class corresponds to a sequence that led to a price increase of 5% or greater over the next six periods; the negative class corresponds to a sequence that did not. As expected, this has led to some class imbalance, with a ratio of about 6:1 negative to positive. The problem right now, though, is that the model shows the same accuracy across all epochs and only ever predicts the negative class. This makes me think that I may have a problem with the structure of my model. The input is a dataframe that includes price data and a few moving averages:
price_open price_high price_low price_close ma_8 ma_13 ma_21 ma_55 6prd_pctchange entry_flag
time_period_start
11-02-2016 23:00 10.83280 10.98310 10.72591 10.96000 10.932415 10.855693 10.960608 11.087525 0.008535 0.0
11-03-2016 03:00 10.96016 11.02560 10.96000 11.00003 10.937569 10.873219 10.948081 11.075059 0.004544 0.0
11-03-2016 07:00 11.00007 11.14997 10.91000 11.00006 10.954170 10.919378 10.929689 11.062878 -0.007442 0.0
11-03-2016 11:00 11.05829 11.14820 10.90001 10.99208 10.959396 10.923376 10.912183 11.057317 0.008392 0.0
11-03-2016 15:00 10.90170 11.03112 10.70000 10.91529 10.938490 10.933783 10.890906 11.048504 0.006289 0.0
11-03-2016 19:00 10.89420 10.95000 10.82460 10.94980 10.944640 10.947429 10.882745 11.041227 0.005234 0.0
11-03-2016 23:00 10.94128 11.08475 10.88404 11.08475 10.974350 10.957118 10.888859 11.032288 0.011382 0.0
11-04-2016 03:00 11.02761 11.22778 10.94360 10.99813 10.987517 10.967185 10.893531 11.023518 -0.000173 0.0
11-04-2016 07:00 10.95076 11.01814 10.92000 10.92100 10.982642 10.964934 10.904055 11.011691 -0.007187 0.0
11-04-2016 11:00 10.94511 11.06298 10.89000 10.99557 10.982085 10.958244 10.914692 11.000365 0.000318 0.0
The dataframe has been converted into NumPy arrays that are 6 periods long and normalized with scikit-learn's MinMaxScaler. As an example, the first sequence looks like this:
array([[0.        , 0.16552483, 0.09965385, 0.52742716, 0.        , 0.        , 1.        , 1.        ],
       [0.5648144 , 0.37805671, 1.        , 0.9996461 , 0.19101228, 0.19104958, 0.83911884, 0.73073358],
       [0.74180673, 1.        , 0.80769231, 1.        , 0.80630067, 0.69421501, 0.60290376, 0.46764059],
       [1.        , 0.99114867, 0.76926923, 0.90586292, 1.        , 0.73780155, 0.37807623, 0.34751414],
       [0.30555679, 0.40566085, 0.        , 0.        , 0.22515636, 0.85124563, 0.104818  , 0.15716305],
       [0.27229589, 0.        , 0.47923077, 0.40710157, 0.45309243, 1.        , 0.        , 0.        ]])
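Every column of the example sequence contains both a 0 and a 1, which suggests the scaler was fit on each 6-row window separately rather than once on the whole dataset. The sketch below reproduces the first column (price_open) of that sequence from the first six rows of the dataframe by scaling the window against its own minimum and maximum. `minmax_per_window` is a hypothetical helper, not the code used in the question. Scaled this way, each sequence describes only the shape of the movement inside the window, not the price level.

```python
import numpy as np

def minmax_per_window(window):
    """Scale each column of a (timesteps, features) window to [0, 1] using only that window.

    Mirrors fitting a fresh MinMaxScaler on every window; constant columns map to 0
    instead of dividing by zero.
    """
    lo = window.min(axis=0)
    span = window.max(axis=0) - lo
    span[span == 0] = 1.0
    return (window - lo) / span

# price_open for the first six rows of the dataframe shown in the question
price_open = np.array([10.83280, 10.96016, 11.00007, 11.05829, 10.90170, 10.89420])
scaled = minmax_per_window(price_open.reshape(-1, 1)).ravel()
print(np.round(scaled, 8))  # compare with the first column of the example array
```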
When I build, compile, and fit a model on these sequences my results quickly plateau and the model ends up only predicting the negative class.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Constants:
loss = 'binary_crossentropy'
optimizer = 'adam'
epochs = 12
batch_size = 300
# Compile model:
model = Sequential()
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
results = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1, validation_data=(X_test, y_test), shuffle=False)
model.summary()
It outputs:
Epoch 1/12
22/22 [==============================] - 0s 16ms/step - loss: 0.5696 - accuracy: 0.8410 - val_loss: 0.3953 - val_accuracy: 0.8885
Epoch 2/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4355 - accuracy: 0.8473 - val_loss: 0.3569 - val_accuracy: 0.8885
Epoch 3/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4379 - accuracy: 0.8473 - val_loss: 0.3612 - val_accuracy: 0.8885
Epoch 4/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4320 - accuracy: 0.8473 - val_loss: 0.3554 - val_accuracy: 0.8885
Epoch 5/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4338 - accuracy: 0.8473 - val_loss: 0.3577 - val_accuracy: 0.8885
Epoch 6/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4297 - accuracy: 0.8473 - val_loss: 0.3554 - val_accuracy: 0.8885
Epoch 7/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4303 - accuracy: 0.8473 - val_loss: 0.3570 - val_accuracy: 0.8885
Epoch 8/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4273 - accuracy: 0.8473 - val_loss: 0.3558 - val_accuracy: 0.8885
Epoch 9/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4285 - accuracy: 0.8473 - val_loss: 0.3577 - val_accuracy: 0.8885
Epoch 10/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4254 - accuracy: 0.8473 - val_loss: 0.3565 - val_accuracy: 0.8885
Epoch 11/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4270 - accuracy: 0.8473 - val_loss: 0.3581 - val_accuracy: 0.8885
Epoch 12/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4243 - accuracy: 0.8473 - val_loss: 0.3569 - val_accuracy: 0.8885
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_6 (LSTM)                (None, 100)               42400
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 101
=================================================================
And a quick check shows that it is only predicting the negative class:
predictions = model.predict(X_test)
predictions_round = [1 if x > 0.5 else 0 for x in predictions]
pd.Series(predictions_round).value_counts()
0 1641
dtype: int64
I'll be the first to say that this may be because predicting a stock price entry point is a task full of noise. BUT I also expected the model to at least make a handful of wrong guesses instead of simply guessing all the same class. To me, that seems like an issue with the way I built the model or structured the inputs.
X_train.shape and y_train.shape give me (6561, 6, 8) and (6561, ) respectively.
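The frozen accuracy values are exactly what a model that always predicts the negative class would score. A quick NumPy check of the majority-class baseline; the label counts here are inferred from the logged accuracy (0.8473 on 6561 training samples), not stated in the question, and `majority_baseline` is a hypothetical helper:

```python
import numpy as np

def majority_baseline(y):
    """Accuracy of a model that always predicts the most common label in y."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()

# Hypothetical labels: 5559 negatives and 1002 positives (inferred, see above).
# The same reasoning on the test set: 1458 negatives out of 1641 gives 0.8885.
y_train = np.array([0] * 5559 + [1] * 1002)

print(round(majority_baseline(y_train), 4))                   # 0.8473
print(round(np.sum(y_train == 0) / np.sum(y_train == 1), 2))  # 5.55, i.e. "about 6:1"
```

When training and validation accuracy sit exactly on this baseline, the network has most likely collapsed to predicting a single class rather than learning anything from the sequences.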
Thanks in advance for any help!
I am training a model with the following code
model=Sequential()
model.add(Dense(100, activation='relu',input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
early_stopping_monitor = EarlyStopping(patience=3)
model.fit(X_train_np,target,validation_split=0.3, epochs=100, callbacks=[early_stopping_monitor])
This is designed to stop training if val_loss does not improve for 3 epochs. The result is shown below. My question is: will the model end up with the weights from epoch 8 or from epoch 7? The performance got worse in epoch 8, so training stopped, but by then the model had already moved one epoch past the better-performing weights of epoch 7. Do I need to retrain the model for 7 epochs?
Train on 623 samples, validate on 268 samples
Epoch 1/100
623/623 [==============================] - 1s 1ms/step - loss: 4.0365 - accuracy: 0.5923 - val_loss: 1.2208 - val_accuracy: 0.6231
Epoch 2/100
623/623 [==============================] - 0s 114us/step - loss: 1.4412 - accuracy: 0.6356 - val_loss: 0.7193 - val_accuracy: 0.7015
Epoch 3/100
623/623 [==============================] - 0s 103us/step - loss: 1.4335 - accuracy: 0.6260 - val_loss: 1.3778 - val_accuracy: 0.7201
Epoch 4/100
623/623 [==============================] - 0s 106us/step - loss: 3.5732 - accuracy: 0.6324 - val_loss: 2.7310 - val_accuracy: 0.6194
Epoch 5/100
623/623 [==============================] - 0s 111us/step - loss: 1.3116 - accuracy: 0.6372 - val_loss: 0.5952 - val_accuracy: 0.7351
Epoch 6/100
623/623 [==============================] - 0s 98us/step - loss: 0.9357 - accuracy: 0.6645 - val_loss: 0.8047 - val_accuracy: 0.6828
Epoch 7/100
623/623 [==============================] - 0s 105us/step - loss: 0.7671 - accuracy: 0.6934 - val_loss: 0.9918 - val_accuracy: 0.6679
Epoch 8/100
623/623 [==============================] - 0s 126us/step - loss: 2.2968 - accuracy: 0.6629 - val_loss: 1.7789 - val_accuracy: 0.7425
Use restore_best_weights=True, with monitor set to the quantity you care about. The best weights are then restored automatically when early stopping ends training.
from tensorflow.keras.callbacks import EarlyStopping
early_stopping_monitor = EarlyStopping(patience=3,
                                       monitor='val_loss',  # assuming it's val_loss
                                       restore_best_weights=True)
From the docs:
restore_best_weights: whether to restore model weights from the epoch with the best value of the monitored quantity ('val_loss' here). If False, the model weights obtained at the last step of training are used (default False).
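To see which epoch would actually be restored here, the stopping rule can be replayed in plain Python on the val_loss values from the question's log. This is a simplified model of the callback (mode 'min', min_delta=0, no baseline), not the Keras source; `replay_early_stopping` is a hypothetical helper.

```python
def replay_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for a monitor that should decrease."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember this epoch and reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# val_loss per epoch, copied from the log in the question
val_losses = [1.2208, 0.7193, 1.3778, 2.7310, 0.5952, 0.8047, 0.9918, 1.7789]
stop, best = replay_early_stopping(val_losses, patience=3)
print(stop, best)  # 8 5
```

So training stops after epoch 8 because epochs 6, 7 and 8 all failed to beat epoch 5. With restore_best_weights=True the weights from epoch 5 (val_loss 0.5952) are restored, not those from epoch 7, and no retraining is needed.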
All the code here is for TensorFlow 2.0.
Alternatively, ModelCheckpoint can save the best model to disk during training. Its main parameters are:
filepath: a string that can contain formatting options such as the epoch number. A common example is weights.{epoch:02d}-{val_loss:.2f}.hdf5
monitor: the quantity to watch, typically 'val_loss' or 'val_accuracy'
mode: whether the monitored value should be minimized or maximized, typically 'min' or 'max'
save_best_only: if True, the model is saved for the current epoch only when its monitored value is better than in all previous epochs. If False, the model is saved after every epoch, whether or not it improved.
Code
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model=Sequential()
model.add(Dense(100, activation='relu',input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
fname = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(fname, monitor="val_loss", mode="min", save_best_only=True, verbose=1)
model.fit(X_train_np,target,validation_split=0.3, epochs=100, callbacks=[checkpoint])
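To see which files save_best_only=True would keep, here is a plain-Python sketch of the naming and saving rule, replayed on the eight val_loss values from the question's log. A real run produces its own values, and without EarlyStopping it would continue past epoch 8. It assumes {epoch} is filled with the 1-based epoch number, matching the "Epoch 1/100" display. `checkpoint_files` is a hypothetical helper, not part of Keras.

```python
def checkpoint_files(val_losses, pattern="weights.{epoch:02d}-{val_loss:.2f}.hdf5"):
    """List the file names a min-mode, save_best_only checkpoint would write."""
    best = float("inf")
    saved = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:  # only improvements are written to disk
            best = val_loss
            saved.append(pattern.format(epoch=epoch, val_loss=val_loss))
    return saved

val_losses = [1.2208, 0.7193, 1.3778, 2.7310, 0.5952, 0.8047, 0.9918, 1.7789]
files = checkpoint_files(val_losses)
print(files)
```

The last file in the list holds the best model seen so far and can be reloaded with tf.keras.models.load_model.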
The accuracy of my simple CNN model is not affected by the hyperparameters, or even by the presence of layers such as Dropout and MaxPooling. I implemented the model using Keras. What could be the reason behind this odd behavior? The relevant part of the code is below:
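from keras.models import Sequential
from keras.layers import Conv1D, Dropout, MaxPooling1D, Flatten, Dense
input_dim = X_train.shape[1]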
nb_classes = Y_train.shape[1]
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(input_dim, 1)))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(40, activation='relu'))
model.add(Dense(nb_classes, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
P.S. The input data (X_train and X_test) contains vectors produced by Word2Vec. The output is binary.
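Because the first layer declares input_shape=(input_dim, 1), each Word2Vec vector has to reach Conv1D as a sequence of input_dim steps with one channel. If X_train is stored as a 2-D (samples, vector_size) array, a reshape like the one below is needed. The sample count matches the log; the vector size of 300 and the random data are hypothetical.

```python
import numpy as np

# Hypothetical stand-in: 3114 Word2Vec vectors of length 300 (the real size is not given)
X_train = np.random.rand(3114, 300).astype("float32")

input_dim = X_train.shape[1]
# Conv1D expects (samples, steps, channels); treat each vector as a 1-channel sequence
X_train_3d = X_train.reshape(-1, input_dim, 1)
print(X_train_3d.shape)  # (3114, 300, 1)
```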
Edit: here is a sample training log:
Train on 3114 samples, validate on 347 samples
Epoch 1/10
- 1s - loss: 0.6917 - accuracy: 0.5363 - val_loss: 0.6901 - val_accuracy: 0.5476
Epoch 2/10
- 1s - loss: 0.6906 - accuracy: 0.5369 - val_loss: 0.6896 - val_accuracy: 0.5476
Epoch 3/10
- 1s - loss: 0.6908 - accuracy: 0.5369 - val_loss: 0.6895 - val_accuracy: 0.5476
Epoch 4/10
- 1s - loss: 0.6908 - accuracy: 0.5369 - val_loss: 0.6903 - val_accuracy: 0.5476
Epoch 5/10
- 1s - loss: 0.6908 - accuracy: 0.5369 - val_loss: 0.6899 - val_accuracy: 0.5476
Epoch 6/10
- 1s - loss: 0.6909 - accuracy: 0.5369 - val_loss: 0.6901 - val_accuracy: 0.5476
Epoch 7/10
- 1s - loss: 0.6905 - accuracy: 0.5369 - val_loss: 0.6896 - val_accuracy: 0.5476
Epoch 8/10
- 1s - loss: 0.6909 - accuracy: 0.5369 - val_loss: 0.6897 - val_accuracy: 0.5476
Epoch 9/10
- 1s - loss: 0.6905 - accuracy: 0.5369 - val_loss: 0.6892 - val_accuracy: 0.5476
Epoch 10/10
- 1s - loss: 0.6909 - accuracy: 0.5369 - val_loss: 0.6900 - val_accuracy: 0.5476
First you need to change the last layer to
model.add(Dense(1, activation='sigmoid'))
You also need to change the loss function to
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
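One detail that goes with this change: the question reads nb_classes from Y_train.shape[1], so the labels appear to be one-hot encoded with two columns. A single sigmoid unit trained with binary_crossentropy expects one 0/1 value per sample instead, which is a single argmax in NumPy. The tiny label array below is hypothetical.

```python
import numpy as np

# Hypothetical one-hot labels for four samples, shape (n, 2)
Y_train = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype="float32")

# Column index of the 1 in each row gives 0/1 labels with shape (n,)
y_train = Y_train.argmax(axis=1)
print(y_train)  # [0 1 1 0]
```

Alternatively, keeping the two-unit softmax with one-hot labels works too, as long as the loss is categorical_crossentropy rather than mean_squared_error.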
I assume that you have multi-class classification, right?
Then your loss is not appropriate: you should use 'categorical_crossentropy', not 'mean_squared_error'.
Also, try adding several Conv+Drop+MaxPool (3 sets) in order to clearly verify the robustness of your network.
I'm just trying to play around with Keras, but I'm running into some trouble trying to teach it a basic function (multiply by two). My setup is as follows. Since I'm new to this, I added in comments what I believe to be happening at each step.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
x_train = np.linspace(1,1000,1000)
y_train=x_train*2
model = Sequential()
model.add(Dense(32, input_dim=1, activation='sigmoid')) #add a 32-node layer
model.add(Dense(32, activation='sigmoid')) #add a second 32-node layer
model.add(Dense(1, activation='sigmoid')) #add a final output layer
model.compile(loss='mse',
optimizer='rmsprop') #compile it with loss being mean squared error
model.fit(x_train,y_train, epochs = 10, batch_size=100) #train
score = model.evaluate(x_train,y_train,batch_size=100)
print(score)
I get the following output:
1000/1000 [==============================] - 0s 355us/step - loss: 1334274.0375
Epoch 2/10
1000/1000 [==============================] - 0s 21us/step - loss: 1333999.8250
Epoch 3/10
1000/1000 [==============================] - 0s 29us/step - loss: 1333813.4062
Epoch 4/10
1000/1000 [==============================] - 0s 28us/step - loss: 1333679.2625
Epoch 5/10
1000/1000 [==============================] - 0s 27us/step - loss: 1333591.6750
Epoch 6/10
1000/1000 [==============================] - 0s 51us/step - loss: 1333522.0000
Epoch 7/10
1000/1000 [==============================] - 0s 23us/step - loss: 1333473.7000
Epoch 8/10
1000/1000 [==============================] - 0s 24us/step - loss: 1333440.6000
Epoch 9/10
1000/1000 [==============================] - 0s 29us/step - loss: 1333412.0250
Epoch 10/10
1000/1000 [==============================] - 0s 21us/step - loss: 1333390.5000
1000/1000 [==============================] - 0s 66us/step
['loss']
1333383.1143554687
It seems like the loss is extremely high for this basic function, and I'm confused why it's not able to learn it. Am I confused, or have I done something wrong?
Using a sigmoid activation constrains your output to the range [0, 1]. But your target output is in the range [0, 2000], so your network cannot learn. Try a relu activation instead.
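The loss values in the log are consistent with this. Every target 2x is at least 2 and a sigmoid output can never exceed 1, so the best any sigmoid-output network can do is predict 1 for every sample. The mean squared error of that best case is a hard floor, and the logged loss is creeping down toward it:

```python
import numpy as np

x_train = np.linspace(1, 1000, 1000)
y_train = x_train * 2

# Best case for a sigmoid output: predict its upper bound, 1.0, for every sample
floor = np.mean((y_train - 1.0) ** 2)
print(floor)  # 1333333.0
```

The final logged training loss, 1333390.5, is only about 57 above this floor, so the network is already close to the best a sigmoid output allows; more training alone cannot fix it.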
Try using adam rather than rmsprop when debugging; it almost always works better.
Train longer.
Putting it all together, I get the following output:
Epoch 860/1000
1000/1000 [==============================] - 0s 29us/step - loss: 5.1868e-08
I am trying to find working code that uses an autoencoder to improve classification.
I followed this example: keras autoencoder vs PCA
But instead of MNIST, I tried to use it with CIFAR-10, so I made some changes, but something seems not to fit.
Could anyone please help me with this?
If you have another example that runs on a different dataset, that would help too.
The validation in reduced.fit, which uses (X_test, Y_test), never improves, so .evaluate() gives a wrong accuracy. It always gives
val_loss: 2.3026 - val_acc: 0.1000
This is the code and its output:
from keras.datasets import cifar10
from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import np_utils
import numpy as np
num_train = 50000
num_test = 10000
height, width, depth = 32, 32, 3 # CIFAR-10 images are 32x32 RGB
num_classes = 10 # there are 10 classes (1 per object category)
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.reshape(num_train,height * width * depth)
X_test = X_test.reshape(num_test,height * width*depth)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range
Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels
input_img = Input(shape=(height * width * depth,))
s=height * width * depth
x = Dense(s, activation='relu')(input_img)
encoded = Dense(s//2, activation='relu')(x)
encoded = Dense(s//8, activation='relu')(encoded)
y = Dense(s//256, activation='relu')(x)
decoded = Dense(s//8, activation='relu')(y)
decoded = Dense(s//2, activation='relu')(decoded)
z = Dense(s, activation='sigmoid')(decoded)
model = Model(input_img, z)
model.compile(optimizer='adadelta', loss='mse') # reconstruction loss only; no accuracy is reported
model.fit(X_train, X_train,
epochs=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, X_test))
mid = Model(input_img, y)
reduced_representation =mid.predict(X_test)
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
reduced.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
reduced.fit(X_train, Y_train,
epochs=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, Y_test))
scores = reduced.evaluate(X_test, Y_test, verbose=0)
print("Accuracy: ", scores[1])
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 5s - loss: 0.0639 - val_loss: 0.0633
Epoch 2/10
50000/50000 [==============================] - 5s - loss: 0.0610 - val_loss: 0.0568
Epoch 3/10
50000/50000 [==============================] - 5s - loss: 0.0565 - val_loss: 0.0558
Epoch 4/10
50000/50000 [==============================] - 5s - loss: 0.0557 - val_loss: 0.0545
Epoch 5/10
50000/50000 [==============================] - 5s - loss: 0.0536 - val_loss: 0.0518
Epoch 6/10
50000/50000 [==============================] - 5s - loss: 0.0502 - val_loss: 0.0461
Epoch 7/10
50000/50000 [==============================] - 5s - loss: 0.0443 - val_loss: 0.0412
Epoch 8/10
50000/50000 [==============================] - 5s - loss: 0.0411 - val_loss: 0.0397
Epoch 9/10
50000/50000 [==============================] - 5s - loss: 0.0391 - val_loss: 0.0371
Epoch 10/10
50000/50000 [==============================] - 5s - loss: 0.0377 - val_loss: 0.0403
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 3s - loss: 2.3605 - acc: 0.0977 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 2/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0952 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 3/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0978 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 4/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 5/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0974 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 6/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.1000 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 7/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0992 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 8/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0982 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 9/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0965 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 10/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0978 - val_loss: 2.3026 - val_acc: 0.1000
9856/10000 [============================>.] - ETA: 0s('Accuracy: ', 0.10000000000000001)
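The stuck numbers in this log have a simple reading. A categorical cross-entropy of 2.3026 is -ln(1/10), the loss of a classifier that gives every one of the 10 classes the same probability, and CIFAR-10's test set has 1000 images per class, so always predicting one class scores exactly 0.1. In other words, the classifier is producing essentially the same output for every image:

```python
import numpy as np

num_classes = 10
# A classifier that outputs the same uniform distribution for every image
probs = np.full(num_classes, 1.0 / num_classes)

# Categorical cross-entropy is -log(probability of the true class);
# with uniform outputs it is the same whichever class is the true one.
loss = -np.log(probs[0])
print(round(float(loss), 4))  # 2.3026

# 1000 test images per class: always predicting one class gets 1000 of 10000 right
print(1000 / 10000)  # 0.1
```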
There are multiple issues with your code.
Your autoencoder is not fully trained: if you plot the training history, you will see that the model hasn't converged yet. With
history = model.fit(X_train, X_train,
epochs=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, X_test))
you will obtain the loss values during training. If you plot them, e.g. in matplotlib,
import matplotlib.pyplot as plt
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss 1')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
you will see that it needs more epochs to converge.
The autoencoder architecture is built incorrectly: there is a typo in the line y = Dense(s//256, activation='relu')(x). You probably wanted y = Dense(s//256, activation='linear')(encoded), so that it takes the previous layer as input and not x, the first hidden layer. You also don't want a relu activation in the latent space, because it prevents you from subtracting latent variables from each other and thus makes the autoencoder much less efficient.
With those fixes, the model trains without problems.
I increased the number of epochs to 30 for training both networks so they train better.
At the end of training, the classification model reports loss: 1.2881 - acc: 0.5397 - val_loss: 1.3841 - val_acc: 0.5126, a lower loss and a higher accuracy than you got.