using tfa.layers.crf on top of biLSTM

using tfa.layers.crf on top of biLSTM - python-3.x

I am trying to implement NER model based on CRF with tensorflow-addons library. The model gets sequence of words in word to index and char level format and the concatenates them and feeds them to the BiLSTM layer. Here is the code of implementation:
import tensorflow as tf
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Conv1D
from tensorflow.keras.layers import Bidirectional, concatenate, SpatialDropout1D, GlobalMaxPooling1D
from tensorflow_addons.layers import CRF
word_input = Input(shape=(max_sent_len,))
word_emb = Embedding(input_dim=n_words + 2, output_dim=dim_word_emb,
input_length=max_sent_len, mask_zero=True)(word_input)
char_input = Input(shape=(max_sent_len, max_word_len,))
char_emb = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=dim_char_emb,
input_length=max_word_len, mask_zero=True))(char_input)
char_emb = TimeDistributed(LSTM(units=20, return_sequences=False,
recurrent_dropout=0.5))(char_emb)
# main LSTM
main_input = concatenate([word_emb, char_emb])
main_input = SpatialDropout1D(0.3)(main_input)
main_lstm = Bidirectional(LSTM(units=50, return_sequences=True,
recurrent_dropout=0.6))(main_input)
kernel = TimeDistributed(Dense(50, activation="relu"))(main_lstm)
crf = CRF(n_tags+1) # CRF layer
decoded_sequence, potentials, sequence_length, chain_kernel = crf(kernel) # output
model = Model([word_input, char_input], potentials)
model.add_loss(tf.abs(tf.reduce_mean(kernel)))
model.compile(optimizer="rmsprop", loss='categorical_crossentropy')
When I start to fit the model, I get this Warnings:
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
And training process goes like this:
438/438 [==============================] - 80s 163ms/step - loss: nan - val_loss: nan
Epoch 2/10
438/438 [==============================] - 71s 163ms/step - loss: nan - val_loss: nan
Epoch 3/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 4/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 5/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 6/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 7/10
438/438 [==============================] - 70s 161ms/step - loss: nan - val_loss: nan
Epoch 8/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 9/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 10/10
438/438 [==============================] - 70s 159ms/step - loss: nan - val_loss: nan
I am almost sure that the problem is with the way I am setting the loss functions, but I do not know how exactly I should set them. I also searched for my problem but I did not get any answer.
Also When I test my model, it can not predict labels correct and gives them same labels. Can anybody describe for me that how should I solve this problem?

Ghange your loss function to tensorflow_addons.losses.SigmoidFocalCrossEntropy().I guess categorical crossentropy is not a good choice.

Related

Keras LSTM Sequence Classification - Loss and validation loss decreases by trival amount and accuracy and validation accuracy remains static

Hoping for a quick second pair of eyes before I officially give up hope on applying deep learning to stock prediction.
The goal is to use an LSTM to predict one of two classes. The positive class corresponds to a sequence that led to a price increase of 5% or greater over the next six periods - the negative class corresponds to a sequence that did not. As expected this has led to a bit of class imbalance with the ratio being about 6:1 negative to positive. The problem right now though is that the model is showing the same accuracy across all epochs and is only predicting the negative class. This makes me think that I may have a problem with the structure of my model. The input is adataframe which includes price data and few moving averages:
price_open price_high price_low price_close ma_8 ma_13 ma_21 ma_55 6prd_pctchange entry_flag
time_period_start
11-02-2016 23:00 10.83280 10.98310 10.72591 10.96000 10.932415 10.855693 10.960608 11.087525 0.008535 0.0
11-03-2016 03:00 10.96016 11.02560 10.96000 11.00003 10.937569 10.873219 10.948081 11.075059 0.004544 0.0
11-03-2016 07:00 11.00007 11.14997 10.91000 11.00006 10.954170 10.919378 10.929689 11.062878 -0.007442 0.0
11-03-2016 11:00 11.05829 11.14820 10.90001 10.99208 10.959396 10.923376 10.912183 11.057317 0.008392 0.0
11-03-2016 15:00 10.90170 11.03112 10.70000 10.91529 10.938490 10.933783 10.890906 11.048504 0.006289 0.0
11-03-2016 19:00 10.89420 10.95000 10.82460 10.94980 10.944640 10.947429 10.882745 11.041227 0.005234 0.0
11-03-2016 23:00 10.94128 11.08475 10.88404 11.08475 10.974350 10.957118 10.888859 11.032288 0.011382 0.0
11-04-2016 03:00 11.02761 11.22778 10.94360 10.99813 10.987517 10.967185 10.893531 11.023518 -0.000173 0.0
11-04-2016 07:00 10.95076 11.01814 10.92000 10.92100 10.982642 10.964934 10.904055 11.011691 -0.007187 0.0
11-04-2016 11:00 10.94511 11.06298 10.89000 10.99557 10.982085 10.958244 10.914692 11.000365 0.000318 0.0
and has been converted into numpy arrays that are 6 periods in length and normalized using the scikit-learn method MinMaxScaler. As an example, the first sequence looks like below:
array([[0. , 0.16552483, 0.09965385, 0.52742716, 0. ,
0. , 1. , 1. ],
[0.5648144 , 0.37805671, 1. , 0.9996461 , 0.19101228,
0.19104958, 0.83911884, 0.73073358],
[0.74180673, 1. , 0.80769231, 1. , 0.80630067,
0.69421501, 0.60290376, 0.46764059],
[1. , 0.99114867, 0.76926923, 0.90586292, 1. ,
0.73780155, 0.37807623, 0.34751414],
[0.30555679, 0.40566085, 0. , 0. , 0.22515636,
0.85124563, 0.104818 , 0.15716305],
[0.27229589, 0. , 0.47923077, 0.40710157, 0.45309243,
1. , 0. , 0. ]])
When I build, compile, and fit a model on these sequences my results quickly plateau and the model ends up only predicting the negative class.
# Constants:
loss = 'binary_crossentropy'
optimizer = 'adam'
epochs = 12
batch_size = 300
# Complie model:
model = Sequential()
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
results = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1, validation_data=(X_test, y_test), shuffle=False)
model.summary()
It outputs:
Epoch 1/12
22/22 [==============================] - 0s 16ms/step - loss: 0.5696 - accuracy: 0.8410 - val_loss: 0.3953 - val_accuracy: 0.8885
Epoch 2/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4355 - accuracy: 0.8473 - val_loss: 0.3569 - val_accuracy: 0.8885
Epoch 3/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4379 - accuracy: 0.8473 - val_loss: 0.3612 - val_accuracy: 0.8885
Epoch 4/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4320 - accuracy: 0.8473 - val_loss: 0.3554 - val_accuracy: 0.8885
Epoch 5/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4338 - accuracy: 0.8473 - val_loss: 0.3577 - val_accuracy: 0.8885
Epoch 6/12
22/22 [==============================] - 0s 10ms/step - loss: 0.4297 - accuracy: 0.8473 - val_loss: 0.3554 - val_accuracy: 0.8885
Epoch 7/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4303 - accuracy: 0.8473 - val_loss: 0.3570 - val_accuracy: 0.8885
Epoch 8/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4273 - accuracy: 0.8473 - val_loss: 0.3558 - val_accuracy: 0.8885
Epoch 9/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4285 - accuracy: 0.8473 - val_loss: 0.3577 - val_accuracy: 0.8885
Epoch 10/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4254 - accuracy: 0.8473 - val_loss: 0.3565 - val_accuracy: 0.8885
Epoch 11/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4270 - accuracy: 0.8473 - val_loss: 0.3581 - val_accuracy: 0.8885
Epoch 12/12
22/22 [==============================] - 0s 9ms/step - loss: 0.4243 - accuracy: 0.8473 - val_loss: 0.3569 - val_accuracy: 0.8885
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 100) 42400
_________________________________________________________________
dense_6 (Dense) (None, 1) 101
=================================================================
And a quick check shows that it is only predicting the negative class:
predictions = model.predict(X_test)
predictions_round = [1 if x > 0.5 else 0 for x in predictions]
pd.Series(predictions_round).value_counts()
0 1641
dtype: int64
I'll be the first to say that this may be because predicting a stock price entry point is a task full of noise. BUT I also expected the model to at least make a handful of wrong guesses instead of simply guessing all the same class. To me, that seems like an issue with the way I built the model or structured the inputs.
X_train.shape and y_train.shape give me (6561, 6, 8) and (6561, ) respectively.
Thanks in advance for any help!

Best model weights in neural network in case of early stopping

I am training a model with the following code
model=Sequential()
model.add(Dense(100, activation='relu',input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
early_stopping_monitor = EarlyStopping(patience=3)
model.fit(X_train_np,target,validation_split=0.3, epochs=100, callbacks=[early_stopping_monitor])
This is designed to stop the training if the val_loss: parameter does not improve after 3 epochs. The result is shown below. My question is will the model stop with weights of epoch 8 or 7. Because the performance got bad in epoch 8 so it stopped. But the model went ahead by 1 epoch with a bad performing parameter as earlier one (epoch 7) was better. Do I need to retrain the model now with 7 epochs?
Train on 623 samples, validate on 268 samples
Epoch 1/100
623/623 [==============================] - 1s 1ms/step - loss: 4.0365 - accuracy: 0.5923 - val_loss: 1.2208 - val_accuracy: 0.6231
Epoch 2/100
623/623 [==============================] - 0s 114us/step - loss: 1.4412 - accuracy: 0.6356 - val_loss: 0.7193 - val_accuracy: 0.7015
Epoch 3/100
623/623 [==============================] - 0s 103us/step - loss: 1.4335 - accuracy: 0.6260 - val_loss: 1.3778 - val_accuracy: 0.7201
Epoch 4/100
623/623 [==============================] - 0s 106us/step - loss: 3.5732 - accuracy: 0.6324 - val_loss: 2.7310 - val_accuracy: 0.6194
Epoch 5/100
623/623 [==============================] - 0s 111us/step - loss: 1.3116 - accuracy: 0.6372 - val_loss: 0.5952 - val_accuracy: 0.7351
Epoch 6/100
623/623 [==============================] - 0s 98us/step - loss: 0.9357 - accuracy: 0.6645 - val_loss: 0.8047 - val_accuracy: 0.6828
Epoch 7/100
623/623 [==============================] - 0s 105us/step - loss: 0.7671 - accuracy: 0.6934 - val_loss: 0.9918 - val_accuracy: 0.6679
Epoch 8/100
623/623 [==============================] - 0s 126us/step - loss: 2.2968 - accuracy: 0.6629 - val_loss: 1.7789 - val_accuracy: 0.7425

Use restore_best_weights with monitor value set to target quantity. So, the best weights will be restored after training automatically.
early_stopping_monitor = EarlyStopping(patience=3,
monitor='val_loss', # assuming it's val_loss
restore_best_weights=True )
From docs:
restore_best_weights: whether to restore model weights from the epoch with the best value of the monitored quantity ('val_loss' here). If False, the model weights obtained at the last step of training are used (default False).
Docmentation link

All the code that I have placed is in TensorFlow 2.0
file path: Is a string that can have formatting options such as the epoch number. For example the following is a common filepath (weights.{epoch:02d}-{val_loss:.2f}.hdf5)
monitor: (typically it is‘val_loss’or ‘val_accuracy’)
mode: Should it be minimizing or maximizing the monitor value
(typically either ‘min’ or ‘max’)
save_best_only: If this is set to true then it will only save the
model for the current epoch, if it’s metric values, is better than
what has gone before. However, if you set save_best_only to
false it will save every model after each epoch (regardless of
whether that model was better than previous models or not).
Code
model=Sequential()
model.add(Dense(100, activation='relu',input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
fname = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(fname, monitor="val_loss",mode="min", save_best_only=True, verbose=1)
model.fit(X_train_np,target,validation_split=0.3, epochs=100, callbacks=[checkpoint])

Video classification using vgg16 and LSTM

my project is about violence classification using video dataset so i converted all my videos to images every video converted to 7 images .
first i use my vgg16 to extract the features from my images and then train my LSTM on this features , but when i train my LSTM i get bad accuracy and val_accuracy and Strangely, the two values remain constant form many epochs like my accuracy remain 0.5000 for about 50 epoch and the same problem for my validation accuracy .
here is my code if u can figure where is the problem or why the accuracy remain constant and to low .
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.layers import Flatten, BatchNormalization ,LSTM
import os, shutil
from keras.preprocessing.image import ImageDataGenerator
import keras
conv = tf.keras.applications.vgg16.VGG16()
model = Sequential()
here i copy my vgg16 to sequential model
for layer in conv.layers:
model.add(layer)
here i get rid of all dense and flatten layers making my last layer maxpool with (7,7,512)shape
model.pop()
model.pop()
model.pop()
model.pop()
import os, shutil
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator()
batch_size = 1
img_width, img_height = 224, 224 # Default input size for VGG16
this is the function use to extract the features for every pic my arameters is the file path and the number of images in this file .
def extract_features(directory, sample_count):
features = np.zeros(shape=(sample_count, 7, 7, 512)) # Must be equal to the output of the
convolutional base
labels = np.zeros(shape=(sample_count,2))
# Preprocess data
generator = datagen.flow_from_directory(directory,
target_size=(img_width,img_height),
batch_size = batch_size,
class_mode='binary')
# Pass data through convolutional base
i = 0
for inputs_batch, labels_batch in generator:
features_batch = model.predict(inputs_batch)
features[i * batch_size: (i + 1) * batch_size] = features_batch
labels[i * batch_size: (i + 1) * batch_size] = labels_batch
i += 1
if i * batch_size >= sample_count:
break
return features, labels
train_violence="/content/drive/My Drive/images/one/data/train"
train_non="/content/drive/My Drive/images/two/data/train"
valid_violence="/content/drive/My Drive/images/three/data/validation"
valid_non="/content/drive/My Drive/images/four/data/validation"
train_violence_features, train_violence_labels = extract_features(train_violence,119)
train_non_features , train_non_labels = extract_features(train_non,119)
valid_violence_features , valid_violence_labels = extract_features(valid_violence,77)
valid_non_features , valid_non_labels = extract_features(valid_non,77)
now i have my features for violence and non violence features for training and validation so i need to concatenate my 2 arrays for training to make it one array have all violence features then come the non violence features because i found that flow from directory function take the images randomly from the 2 classes but i need it as a sequence so i need every 7 photos come as a sequence so it cant be arranges randomly from the 2 classes so i used 4 arrays 2 Validation arrays one for violence and one of non violence and the same for the training and then i concatenate them maintaining the correct sequence for every video .
x= np.concatenate((train_violence_features, train_non_features))
y = np.concatenate((valid_violence_features, valid_non_features))
now the shape of x is (238, 7, 7, 512) as 228 photo and y for validation is (154, 7, 7, 512)
here i rshape the input for the LSTM to be in shape of(samples , time steps , features) which will be 34 video for training every video was converted to 7 images so 7 is my time steps and 7*7*512 is the number of features equal to 25088
lstm_train_sample = np.reshape(x,(34,7,25088))
lstm_validation_sample = np.reshape(y,(22,7,25088))
here i make my labels ->label for every video
t_labels = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
v_labels = [1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0]
t_labels= keras.utils.to_categorical(t_labels, num_classes=2, dtype='float32')
v_labels= keras.utils.to_categorical(v_labels, num_classes=2, dtype='float32')
finally my LSTM :
lstm = Sequential()
lstm.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(7, 25088)))
lstm.add(LSTM(25, activation='relu'))
lstm.add(Dense(20, activation='relu'))
lstm.add(Dense(10, activation='relu'))
lstm.add(Dense(2))
lstm.compile(optimizer='adam', loss='mse')
lstm.compile(optimizer='adam', loss='mse',metrics=['accuracy'])
lstm.fit(lstm_train_sample , t_labels , epochs=100 , batch_size=2 , validation_data=
(lstm_validation_sample,v_labels) , validation_batch_size= 2 )
this is an example of the result and i want to know why its like that :
Epoch 55/100
17/17 [==============================] - 8s 460ms/step - loss: 0.4490 - accuracy: 0.5000 - val_loss: 1.4238 - val_accuracy: 0.5000
Epoch 56/100
17/17 [==============================] - 8s 464ms/step - loss: 0.4476 - accuracy: 0.5000 - val_loss: 1.4218 - val_accuracy: 0.5000
Epoch 57/100
17/17 [==============================] - 8s 462ms/step - loss: 0.4461 - accuracy: 0.5000 - val_loss: 1.4198 - val_accuracy: 0.5000
Epoch 58/100
17/17 [==============================] - 8s 461ms/step - loss: 0.4447 - accuracy: 0.5000 - val_loss: 1.4176 - val_accuracy: 0.5000
Epoch 59/100
17/17 [==============================] - 8s 457ms/step - loss: 0.4432 - accuracy: 0.5000 - val_loss: 1.4156 - val_accuracy: 0.5000
Epoch 60/100
17/17 [==============================] - 8s 461ms/step - loss: 0.4418 - accuracy: 0.5000 - val_loss: 1.4135 - val_accuracy: 0.5000
Epoch 61/100
17/17 [==============================] - 8s 459ms/step - loss: 0.4403 - accuracy: 0.5000 - val_loss: 1.4114 - val_accuracy: 0.5000
Epoch 62/100
17/17 [==============================] - 8s 458ms/step - loss: 0.4388 - accuracy: 0.5000 - val_loss: 1.4094 - val_accuracy: 0.5000
Epoch 63/100
17/17 [==============================] - 8s 456ms/step - loss: 0.4373 - accuracy: 0.5000 - val_loss: 1.4072 - val_accuracy: 0.5000
Epoch 64/100
17/17 [==============================] - 8s 461ms/step - loss: 0.4358 - accuracy: 0.5000 - val_loss: 1.4051 - val_accuracy: 0.5000
Epoch 65/100
17/17 [==============================] - 8s 467ms/step - loss: 0.4343 - accuracy: 0.5000 - val_loss: 1.4029 - val_accuracy: 0.5000
Epoch 66/100
17/17 [==============================] - 8s 458ms/step - loss: 0.4328 - accuracy: 0.5000 - val_loss: 1.4008 - val_accuracy: 0.5000
Epoch 67/100
17/17 [==============================] - 8s 460ms/step - loss: 0.4313 - accuracy: 0.5000 - val_loss: 1.3987 - val_accuracy: 0.5000
Epoch 68/100
17/17 [==============================] - 8s 461ms/step - loss: 0.4298 - accuracy: 0.5000 - val_loss: 1.3964 - val_accuracy: 0.5000

'loss: nan' during training of Neural Network in Keras

I am training a neural net in Keras. During training of the first epoch the loss value returns and then suddenly goes loss: nan before the first epoch ends, significantly dropping the accuracy. Then starting the second epoch the loss: nan continues but the accuracy is 0. This goes on for the rest of the epochs.
The frustrating bit is that there seems to be no consistency in the output for each time I train. As to say, the loss: nan shows up at different points in the first epoch.
There have been a couple of questions on this website that give "guides" to problems similar to this I just haven't seen one done so explicitly in keras.
I am trying to get my neural network to classify a 1 or a 0.
Here are some things I have done, post-ceding this will be my output and code.
Standardization // Normalization
I posted a question about my data here. I was able to figure it out and perform sklearn's StandardScaler() and MinMaxScaler() on my dataset. Both standardization and normalization methods did not help my issue.
Learning Rate
The optimizers I have tried are adam and SGD. In both cases I tried lowering the standard learning rate to see if that would help and in both cases. Same issue arose.
Activations
I thought that it was pretty standard to use relu but I saw on the internet somewhere someone talking about using tanh, tried it, no dice.
Batch Size
Tried 32, 50, 128, 200. 50 got me the farthest into the 1st epoch, everything else didn't help.
Combating Overfitting
Put a dropout layer in and tried a whole bunch of numbers.
Other Observations
The epochs train really really fast for the dimensions of the data (I could be wrong).
loss: nan could have something to do with my loss function being binary_crossentropy and maybe some values are giving that loss function a hard time.
kernel_initializer='uniform' has been untouched and unconsidered in my quest to figure this out.
The internet also told me that there could be a nan value in my data but I think that was for an error that broke their script.
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
X_train_total_scale = sc.fit_transform((X_train))
X_test_total_scale = sc.transform((X_test))
print(X_train_total_scale.shape) #(4140, 2756)
print(y_train.shape) #(4140,)
##NN
#adam = keras.optimizers.Adam(lr= 0.0001)
sgd = optimizers.SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
classifier = Sequential()
classifier.add(Dense(output_dim = 1379, kernel_initializer='uniform', activation='relu', input_dim=2756))
classifier.add(Dropout(0.6))
classifier.add(Dense(output_dim = 1379, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(output_dim = 1, kernel_initializer='uniform', activation='sigmoid'))
classifier.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train_total_scale, y_train, validation_data=(X_test_total_scale, y_test), batch_size=50, epochs=100)
(batch size 200 shown to avoid too-big-a text block)
200/4140 [>.............................] - ETA: 7s - loss: 0.6866 - acc: 0.5400
400/4140 [=>............................] - ETA: 4s - loss: 0.6912 - acc: 0.5300
600/4140 [===>..........................] - ETA: 2s - loss: nan - acc: 0.5300
800/4140 [====>.........................] - ETA: 2s - loss: nan - acc: 0.3975
1000/4140 [======>.......................] - ETA: 1s - loss: nan - acc: 0.3180
1200/4140 [=======>......................] - ETA: 1s - loss: nan - acc: 0.2650
1400/4140 [=========>....................] - ETA: 1s - loss: nan - acc: 0.2271
1600/4140 [==========>...................] - ETA: 1s - loss: nan - acc: 0.1987
1800/4140 [============>.................] - ETA: 1s - loss: nan - acc: 0.1767
2000/4140 [=============>................] - ETA: 0s - loss: nan - acc: 0.1590
2200/4140 [==============>...............] - ETA: 0s - loss: nan - acc: 0.1445
2400/4140 [================>.............] - ETA: 0s - loss: nan - acc: 0.1325
2600/4140 [=================>............] - ETA: 0s - loss: nan - acc: 0.1223
2800/4140 [===================>..........] - ETA: 0s - loss: nan - acc: 0.1136
3000/4140 [====================>.........] - ETA: 0s - loss: nan - acc: 0.1060
3200/4140 [======================>.......] - ETA: 0s - loss: nan - acc: 0.0994
3400/4140 [=======================>......] - ETA: 0s - loss: nan - acc: 0.0935
3600/4140 [=========================>....] - ETA: 0s - loss: nan - acc: 0.0883
3800/4140 [==========================>...] - ETA: 0s - loss: nan - acc: 0.0837
4000/4140 [===========================>..] - ETA: 0s - loss: nan - acc: 0.0795
4140/4140 [==============================] - 2s 368us/step - loss: nan - acc: 0.0768 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/100
200/4140 [>.............................] - ETA: 1s - loss: nan - acc: 0.0000e+00
400/4140 [=>............................] - ETA: 0s - loss: nan - acc: 0.0000e+00
600/4140 [===>..........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
800/4140 [====>.........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1000/4140 [======>.......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1200/4140 [=======>......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1400/4140 [=========>....................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1600/4140 [==========>...................] - ETA: 0s - loss: nan - acc: 0.0000e+00
... and so on...
I hope to be able to get a full training done (duh) but I would also like to learn about some of the intuition people have to figure out these problems on their own!

Firstly, check for NaNs or inf in your dataset.
You could try different optimizers, e.g. rmsprop.
Learning rate could be smaller, though I haven't used anything lower than 0.0001 (which is what you're using) myself.
I thought that it was pretty standard to use relu but I saw on the internet somewhere someone talking about using tanh, tried it, no dice
Try leaky relu, elu if you're concerned about this.

Keras autoencoder classification

I am trying to find a useful code for improve classification using autoencoder.
I followed this example keras autoencoder vs PCA
But not for MNIST data, I tried to use it with cifar-10
so I made some changes but it seems like something is not fitting.
Could any one please help me in this?
if you have another example that can run in different dataset, that would help.
the validation in reduced.fit, which is (X_test,Y_test) is not learned, so it gives wronf accuracy in .evalute()
always give
val_loss: 2.3026 - val_acc: 0.1000
This is the code, and the error:
rom keras.datasets import cifar10
from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import np_utils
import numpy as np
num_train = 50000
num_test = 10000
height, width, depth = 32, 32, 3 # MNIST images are 28x28
num_classes = 10 # there are 10 classes (1 per digit)
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.reshape(num_train,height * width * depth)
X_test = X_test.reshape(num_test,height * width*depth)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range
Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels
input_img = Input(shape=(height * width * depth,))
s=height * width * depth
x = Dense(s, activation='relu')(input_img)
encoded = Dense(s//2, activation='relu')(x)
encoded = Dense(s//8, activation='relu')(encoded)
y = Dense(s//256, activation='relu')(x)
decoded = Dense(s//8, activation='relu')(y)
decoded = Dense(s//2, activation='relu')(decoded)
z = Dense(s, activation='sigmoid')(decoded)
model = Model(input_img, z)
model.compile(optimizer='adadelta', loss='mse') # reporting the accuracy
model.fit(X_train, X_train,
nb_epoch=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, X_test))
mid = Model(input_img, y)
reduced_representation =mid.predict(X_test)
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
reduced.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
reduced.fit(X_train, Y_train,
nb_epoch=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, Y_test))
scores = reduced.evaluate(X_test, Y_test, verbose=0)
print("Accuracy: ", scores[1])
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 5s - loss: 0.0639 - val_loss: 0.0633
Epoch 2/10
50000/50000 [==============================] - 5s - loss: 0.0610 - val_loss: 0.0568
Epoch 3/10
50000/50000 [==============================] - 5s - loss: 0.0565 - val_loss: 0.0558
Epoch 4/10
50000/50000 [==============================] - 5s - loss: 0.0557 - val_loss: 0.0545
Epoch 5/10
50000/50000 [==============================] - 5s - loss: 0.0536 - val_loss: 0.0518
Epoch 6/10
50000/50000 [==============================] - 5s - loss: 0.0502 - val_loss: 0.0461
Epoch 7/10
50000/50000 [==============================] - 5s - loss: 0.0443 - val_loss: 0.0412
Epoch 8/10
50000/50000 [==============================] - 5s - loss: 0.0411 - val_loss: 0.0397
Epoch 9/10
50000/50000 [==============================] - 5s - loss: 0.0391 - val_loss: 0.0371
Epoch 10/10
50000/50000 [==============================] - 5s - loss: 0.0377 - val_loss: 0.0403
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
50000/50000 [==============================] - 3s - loss: 2.3605 - acc: 0.0977 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 2/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0952 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 3/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0978 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 4/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 5/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0974 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 6/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.1000 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 7/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0992 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 8/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0982 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 9/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0965 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 10/10
50000/50000 [==============================] - 3s - loss: 2.3027 - acc: 0.0978 - val_loss: 2.3026 - val_acc: 0.1000
9856/10000 [============================>.] - ETA: 0s('Accuracy: ', 0.10000000000000001)

there are multiple issues with your code.
Your autoencoder is not fully trained, if you plot the training data, you will see the model haven't converged yet. By
history = model.fit(X_train, X_train,
nb_epoch=10,
batch_size=128,
shuffle=True,
validation_data=(X_test, X_test))
you will obtain the loss values during training. If you plot them, e.g. in matplotlib,
import matplotlib.pyplot as plt
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss 1')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
you will see that it needs more epochs to converge.
The autoencoder architecture is wrongly built, there is typo in line y = Dense(s//256, activation='relu')(x), you probably wanted to usey = Dense(s//256, activation='linear')(encoded) so it uses previous layer and not the input. And also you don't want to use the relu activation in latent space, because then it disallows you subtracting latent variables from each other and thus makes the autoencoder much less efficient.
With those fixes, the model trains withour problems.
I increased number of epochs to 30 for training both networks so it will train better.
At the end of the trainings, the classification model reports loss: 1.2881 - acc: 0.5397 - val_loss: 1.3841 - val_acc: 0.5126 which is lower than you experienced.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

using tfa.layers.crf on top of biLSTM - python-3.x

Ghange your loss function to tensorflow_addons.losses.SigmoidFocalCrossEntropy().I guess categorical crossentropy is not a good choice.

Related

Keras LSTM Sequence Classification - Loss and validation loss decreases by trival amount and accuracy and validation accuracy remains static

Best model weights in neural network in case of early stopping

Video classification using vgg16 and LSTM

'loss: nan' during training of Neural Network in Keras

Keras autoencoder classification

Categories

Resources