Inner workings of Keras LSTM - nlp

I am working on a multi-class classification task: the goal is to identify the language of origin of a given surname. For this, I am using a Keras LSTM.
So far, I have only worked with PyTorch, and I am very surprised by the "black box" character of Keras. For this classification task, my understanding is that I need to retrieve the LSTM's output at the last time step of a given input sequence and then apply the softmax to it to get the probability distribution over all classes.
Interestingly, without my explicitly telling it to do so, the LSTM seems to do the right thing automatically: it applies the softmax to the last time step's output and not, e.g., to the hidden state (training and validation results are good so far). How is that possible? Does the choice of the loss function categorical_crossentropy tell the model that it should use the last time step's output for the classification?
Code:
model = Sequential()
model.add(Dense(100, input_shape=(max_len, len(alphabet)), kernel_regularizer=regularizers.l2(0.00001)))
model.add(Dropout(0.85))
model.add(LSTM(100, input_shape=(100,)))
model.add(Dropout(0.85))
model.add(Dense(num_output_classes, activation='softmax'))
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=1e-6)
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
history = model.fit(train_data, train_labels,
                    epochs=5000,
                    batch_size=num_train_examples,
                    validation_data=(valid_data, valid_labels))

No. Returning only the last time step's output is simply what every Keras RNN layer does by default; the loss function plays no part in it. See the documentation for return_sequences, which makes the layer return every time step's output instead (this is necessary for stacking RNN layers). Note that for an LSTM the output at a time step is the hidden state at that step, so the two are not alternatives. There's no automatic intuition based on what kinds of layers you're hooking together; you just got what you wanted by default, presumably because the designers figured that to be the most common case.
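To make the default concrete, here is a minimal NumPy sketch of a toy tanh recurrent layer. It is an illustration only, not an LSTM and not Keras's implementation; the function simple_rnn and its weights are hypothetical names made up for this example. It mirrors the return_sequences flag to show that the default output is just the last slice of the full sequence output.

```python
import numpy as np

def simple_rnn(x, w_x, w_h, return_sequences=False):
    """Toy tanh RNN over x of shape (batch, time, features)."""
    batch, time_steps, _ = x.shape
    h = np.zeros((batch, w_h.shape[0]))
    outputs = []
    for t in range(time_steps):
        # New state depends on the current input and the previous state.
        h = np.tanh(x[:, t, :] @ w_x + h @ w_h)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, time, units)
    return outputs[-1]                    # (batch, units): last step only

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))   # 2 sequences, 5 time steps, 3 features
w_x = rng.normal(size=(3, 4))    # input-to-hidden weights, 4 units
w_h = rng.normal(size=(4, 4))    # hidden-to-hidden weights

last_only = simple_rnn(x, w_x, w_h)
all_steps = simple_rnn(x, w_x, w_h, return_sequences=True)
print(last_only.shape)  # (2, 4)
print(all_steps.shape)  # (2, 5, 4)
```

The 2D result is what a following Dense softmax layer consumes directly, which is why the model in the question works without any extra slicing.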

Related

Data augmentation in Keras model

I am trying to add data augmentation as a layer to a model but I am getting the following error.
TypeError: The added layer must be an instance of class Layer. Found: <tensorflow.python.keras.preprocessing.image.ImageDataGenerator object at 0x7f8c2dea0710>
data_augmentation = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=30, horizontal_flip=True)
model = Sequential()
model.add(data_augmentation)
model.add(Dense(1028,input_shape=(final_features.shape[1],)))
model.add(Dropout(0.7,input_shape=(final_features.shape[1],)))
model.add(Dense(n_classes, activation= 'softmax', kernel_regularizer='l2'))
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(final_features, y,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.2,
                    callbacks=[lrr,EarlyStop])
I have also tried this way:
data_augmentation = Sequential(
    [
        preprocessing.RandomFlip("horizontal"),
        preprocessing.RandomRotation(0.1),
        preprocessing.RandomZoom(0.1),
    ]
)
model = Sequential()
model.add(data_augmentation)
model.add(Dense(1028,input_shape=(final_features.shape[1],)))
model.add(Dropout(0.7,input_shape=(final_features.shape[1],)))
model.add(Dense(n_classes, activation= 'softmax', kernel_regularizer='l2'))
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(final_features, y,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.2,
                    callbacks=[lrr,EarlyStop])
It gives an error:
ValueError: Input 0 of layer sequential_7 is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [128, 14272]
Could you please advise how I can use augmentation in Keras?
In your first case, you are using ImageDataGenerator as a layer, which it is not: as the name says, it is just a generator that applies random transformations to images (image augmentation) before they are fed to the network. So the images are augmented on the CPU and then fed to the neural network, which can run on the GPU if you have one.
Generators are also commonly used to avoid loading huge datasets into memory, since they let you load only the batches that are about to be used.
In the second case, you are using image augmentation layers as part of your model, which is a proper use. The difference here is that the augmentation runs as part of your model, so if you have a GPU available, for instance, those operations will run on the GPU.
The problem with your second case is in the model itself (in fact the model is also wrong in the first approach; you only get a different error there because the bad usage of ImageDataGenerator fails before execution reaches the model).
Note that you are using images as inputs, so the input should have shape (height, width, channels), but you are starting your model with a Dense layer, which expects a single array of shape (n_features,).
If your model needs to start with a Dense layer (strange, but it may be fine in some cases), then you first need a Flatten layer to convert images of shape (h,w,c) into vectors of shape (h*w*c,). This change will fix your second approach, provided the inputs really are images.
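The shape requirements above can be sketched in plain NumPy. This is an illustration of the shapes only, not the Keras layers themselves; the batch of 8 images of size 32x32x3 is a made-up example. A horizontal flip needs a width axis to reverse, and flattening turns each image into a single feature vector.

```python
import numpy as np

# Hypothetical batch of 8 RGB images: (batch, height, width, channels).
images = np.arange(8 * 32 * 32 * 3, dtype=np.float32).reshape(8, 32, 32, 3)

# A horizontal flip reverses the width axis, so it needs 4D image input.
flipped = images[:, :, ::-1, :]

# What flattening does per sample: (h, w, c) -> (h*w*c,).
flat = flipped.reshape(len(flipped), -1)
print(flat.shape)  # (8, 3072)
```

A 2D array such as the [128, 14272] one in the error message has no height or width axis left to flip or rotate, which is why the augmentation layers reject it.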
That said, you don't need to specify the input shape on every single layer: doing it in your first layer should be enough.
Last, but not least: are you sure this model is being fed with images? According to your fit call, it looks like you are using previously extracted features that may be vectors (this makes sense with your current model architecture but makes no sense with the usage of image augmentation).
Please, provide more details with respect to your data to clarify this point.

LSTM Autoencoder for Anomaly detection in time series, correct way to fit model

I'm trying to find correct examples of using an LSTM autoencoder to detect anomalies in time series data. Online I see many examples where the LSTM autoencoder is fitted with labels that are the future time steps of the feature sequences (as in ordinary time series forecasting with an LSTM), but I would expect this kind of model to be trained with labels that are the same sequence as the features (the previous time steps).
For example, the first Google result for this search is https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9
1. This function defines how the labels (y) are obtained:
def create_sequences(X, y, time_steps=TIME_STEPS):
    Xs, ys = [], []
    for i in range(len(X)-time_steps):
        Xs.append(X.iloc[i:(i+time_steps)].values)
        ys.append(y.iloc[i+time_steps])
    return np.array(Xs), np.array(ys)
X_train, y_train = create_sequences(train[['Close']], train['Close'])
X_test, y_test = create_sequences(test[['Close']], test['Close'])
2. The model is fitted as follows:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1,
                    callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode='min')], shuffle=False)
Could you kindly comment on how the autoencoder is implemented in the towardsdatascience.com article?
Is this the correct method, or should the model be fitted the following way?
model.fit(X_train,X_train)
Thanks in advance!
This is a time series autoencoder. If you want to predict future values, this is how it is trained. How an autoencoder (or any machine learning model) is fitted depends on the problem; you cannot use one model or workflow for all problems. With time series, you may want to model data already collected over a period, or use collected data to predict the future, and the two are constructed differently. For example, time series data for the subsurface of the earth is modeled differently from data for weather forecasting. One model cannot work for both.
By definition, an autoencoder is any model that attempts to reproduce its input, independent of the type of architecture (LSTM, CNN, ...).
Framed this way, it is an unsupervised task, so the training would be: model.fit(X_train,X_train)
Now, what she does in the article you linked is use a common LSTM autoencoder architecture, but applied to time series forecasting:
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(X_train.shape[2])))
She pre-processes the data to get X_train = [x(t-seq), ..., x(t)] and y_train = x(t+1):
for i in range(len(X)-time_steps):
    Xs.append(X.iloc[i:(i+time_steps)].values)
    ys.append(y.iloc[i+time_steps])
So the model does not reproduce the input it is fed per se, but that does not make it an invalid implementation, since it produces useful predictions.
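The difference between the two training setups comes down to which array is passed as the target. Here is a minimal NumPy sketch on a toy series; make_windows is a hypothetical helper that mirrors the article's create_sequences without pandas, and the 10-value series is made up.

```python
import numpy as np

def make_windows(series, time_steps):
    """Slice a 1-D series into overlapping windows plus the value after each."""
    windows, next_values = [], []
    for i in range(len(series) - time_steps):
        windows.append(series[i:i + time_steps])
        next_values.append(series[i + time_steps])
    # windows: (samples, time_steps, 1); next_values: (samples,)
    return np.array(windows)[..., np.newaxis], np.array(next_values)

series = np.arange(10, dtype=float)  # toy data: 0.0, 1.0, ..., 9.0
X, y_next = make_windows(series, time_steps=3)

forecast_targets = y_next      # article's setup: model.fit(X, y_next)
reconstruction_targets = X     # autoencoder setup: model.fit(X, X)

print(X.shape, forecast_targets.shape, reconstruction_targets.shape)
# (7, 3, 1) (7,) (7, 3, 1)
```

In the reconstruction setup the target has the same shape as the input window; in the forecasting setup it is a single value per window.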

Many to many RNN in keras - predict output for every nth input

I'm trying to figure out how to build a model using LSTM/GRU that predicts many to many but for every nth (7 in my case) input. For example, my input data has timesteps per day for a whole year but I'm only trying to predict the output at the end of each week and not each day.
The only information I was able to find is this answer:
Many to one and many to many LSTM examples in Keras
It says:
"Many-to-many when number of steps differ from input/output length: this is freaky hard in Keras. There are no easy code snippets to code that."
In PyTorch it seems you can set ignore_index in the loss function, which I think should do the trick.
Is there a solution for keras?
I think I found the answer. Since I'm trying to predict every nth value, we can keep only the LSTM outputs at the time steps we want to predict and discard the rest. I created a Lambda layer to do that: it reads every 7th value from the LSTM output.
This is the code:
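import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Lambda, TimeDistributed, Dense

# 100 samples, 365 daily time steps, 5 features; one binary label per week
X = np.random.normal(0,1,size=(100,365,5))
y = np.random.randint(2,size=(100,52,1))
model = Sequential()
model.add(LSTM(1, input_shape=(365, 5), return_sequences=True))
# keep only the output at the 7th day of each week: 365 steps -> 52 steps
model.add(Lambda(lambda x: x[:, 6::7, :]))
model.add(TimeDistributed(Dense(1,activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X,y,epochs=3,verbose=1)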
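The slice used in the Lambda layer can be checked on its own with NumPy. This small sketch shows which day indices it keeps and the resulting shape; the zero-filled array merely stands in for the LSTM output.

```python
import numpy as np

days = np.arange(365)   # day indices 0..364
weekly = days[6::7]     # index of the last day of each full week
print(len(weekly), weekly[0], weekly[-1])  # 52 6 363

lstm_out = np.zeros((100, 365, 1))   # stand-in for the LSTM output
print(lstm_out[:, 6::7, :].shape)    # (100, 52, 1)
```

The final day (index 364) does not complete a week, so it is dropped, which matches the 52 labels per sample in y.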

Using Keras for predicting next word

I have a sequence prediction problem that I approach as a language model.
My data contains 4 choices (1-4) and a reward (1-100).
I started using Keras but I'm not sure it has the flexibility I need.
This is how the model's architecture looks :
I'm not sure about the test phase. One option is sampling:
And I'm not sure how to evaluate the output of this option vs my test set.
Another option is to give the trained model a sequence and let it output the value at the last time step (like giving it a sentence and predicting the last word), but still having x = t_hat.
Is this possible in Keras? I can't find examples like this.
Besides passing the previous choice (or previous word) as an input, I need to pass a second feature, which is a reward value. The choices are one-hot encoded; how can I add a single number to an encoded vector?
EDIT :
This is the training phase (haven't done the sampling yet) :
model = Sequential()
model.add(LSTM(64, input_shape=(seq_length, X_train.shape[2]) , return_sequences=True))
model.add(Dense(y_cat_train.shape[2], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_cat_train, epochs=100, batch_size=10, verbose=2)
Yes. Keras is designed to support a wide range of needs, and it should fit yours.
In your case you are using LSTM cells with some arbitrary number of units (usually 64 or 128), with a<1>, a<2>, a<3>, ..., a<Ty> as hidden activations. Note: your last index should not be 3; it should be Ty.
I would suggest checking the to_categorical function (https://keras.io/utils/#to_categorical) to convert your data to "one-hot" encoded format.
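As a rough sketch of what one-hot encoding produces, and of one common way to attach a numeric feature to it, the NumPy example below encodes the choices and appends a scaled reward as an extra column. The sample values, the division by 100, and the concatenation itself are assumptions for illustration; the answer above does not prescribe them.

```python
import numpy as np

# Hypothetical data: one sequence of 4 time steps.
choices = np.array([[1, 3, 4, 2]])            # choices in 1-4
rewards = np.array([[10., 55., 100., 1.]])    # rewards in 1-100

# One-hot encode the choices (shift to 0-based class indices first).
one_hot = np.eye(4)[choices - 1]              # (1, 4, 4)

# Scale the reward to roughly [0, 1] and give it its own feature axis.
scaled = rewards[..., np.newaxis] / 100.0     # (1, 4, 1)

# Each time step now carries 4 one-hot values plus 1 reward value.
features = np.concatenate([one_hot, scaled], axis=-1)
print(features.shape)  # (1, 4, 5)
```

With input built this way, the LSTM's input_shape would have 5 features per time step instead of 4.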

multi layer LSTM net with stateful=True

My question is: does this code make sense? And if it does, what would its purpose be?
model.add(LSTM(18, return_sequences=True,batch_input_shape=(batch_size,look_back,dim_x), stateful=True))
model.add(Dropout(0.3))
model.add(LSTM(50,return_sequences=False,stateful=False))
model.add(Dropout(0.3))
model.add(Dense(1, activation='linear'))
Because if my first LSTM layer carries its state from one batch to the next, why shouldn't my second LSTM layer do the same?
I'm having a hard time understanding the LSTM mechanics in Keras, so I'm very thankful for any kind of help :)
And if you downvote this post, could you tell me why in the comments? Thanks.
Your program is a regression problem; the model consists of two LSTM layers with 18 and 50 units respectively, followed by a Dense layer that outputs the regression value.
An LSTM layer requires 3D input. Since the output of your first LSTM layer is the input to the second LSTM layer, that output must also be 3D. So return_sequences is set to True on the first layer: it then returns a 3D output that can be used as input for the second LSTM.
Your second LSTM does not return a sequence because the Dense layer that follows it does not need a 3D input.
[update]
In Keras, by default, LSTM states are reset after each batch of training data. If you don't want the states to be reset after each batch, set stateful=True: the final state for each sample in a batch is then used as the initial state for the sample at the same index in the next batch.
You can later reset the states by calling reset_states().
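The effect of carrying state across batches can be illustrated with a toy tanh recurrence in NumPy. This is a sketch of the idea only, not Keras's LSTM; run_batch, the weights, and the input values are made up for the example. Splitting a long sequence into two batches and carrying the state gives the same final state as processing it in one go, while resetting the state in between does not.

```python
import numpy as np

def run_batch(x, w_x, w_h, initial_state):
    """Run a toy tanh RNN over one batch and return its final state."""
    h = initial_state
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ w_x + h @ w_h)
    return h

# Fixed, made-up weights: 2 input features, 3 units.
w_x = np.array([[0.5, -0.3, 0.8],
                [0.1, 0.7, -0.4]])
w_h = 0.5 * np.eye(3)

# One long sequence of 8 steps, split into two "batches" of 4 steps each.
long_seq = np.linspace(-1.0, 1.0, 16).reshape(1, 8, 2)
first, second = long_seq[:, :4, :], long_seq[:, 4:, :]
zeros = np.zeros((1, 3))

full = run_batch(long_seq, w_x, w_h, zeros)

# Like stateful=True: the second batch starts from the first batch's final state.
carried = run_batch(second, w_x, w_h, run_batch(first, w_x, w_h, zeros))

# Like the default: the second batch starts from a zero state again.
reset = run_batch(second, w_x, w_h, zeros)

print(np.allclose(carried, full))  # True
print(np.allclose(reset, full))    # False
```

This is also why stateful training usually goes with shuffle=False and a fixed batch size: consecutive batches must hold consecutive chunks of the same sequences for the carried state to be meaningful.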
