Sentiment Analysis using LSTM (model does not generate good output) - nlp

I built a sentiment analysis model using an LSTM, but the model gives very bad predictions.
Here is the complete code
Dataset for amazon review
My LSTM model looks like this:
def ltsm_model(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the ltsm_model model's graph.
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
    Returns:
    model -- a model instance in Keras
    """
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer (ReLU activation here) to get back a batch of 2-dimensional vectors.
    X = Dense(2, activation='relu')(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=[sentence_indices], outputs=X)
    ### END CODE HERE ###
    return model
Here is what my training dataset looks like:
This is my testing data:
x_test = np.array(['amazing!: this soundtrack is my favorite music..'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+ str(np.argmax(model.predict(X_test_indices))))
I got the following output for this:
amazing!: this soundtrack is my favorite music.. 0
But this is a positive review, so the prediction should be 1.
Also, this is the output from fitting my model:
How can I improve my model's performance? I suppose this is a pretty bad model.

Related

Keras, how to feed an Embedding layer with a random sampling of a Softmax layer

In the model I am constructing, I have the following layer:
y = layers.Dense(10, activation="softmax")(x)
And I want the next layer of this model to be an Embedding layer that "represent" the choice made by the Dense layer.
I.e, I want
to sample a choice from y (based on the probability "represented" by the values of the softmax)
to turn this choice into an Embedding Layer with vocabulary size 10.
Any idea how to do this ?
Regards
Initial answer
Add a layer that takes the argmax of the output of the dense layer before feeding it into the embedding layer to propagate the most likely category label:
import tensorflow as tf
from keras import backend as K
# generate some data
BATCH_SIZE,INPUT_DIM = (4,2)
x = tf.random.uniform([BATCH_SIZE,INPUT_DIM])
# model
NUM_CLASSES = 10
EMBEDDING_DIM = 10
dense = tf.keras.layers.Dense(NUM_CLASSES,activation='softmax')(x)
argmax = tf.keras.layers.Lambda(lambda x: K.argmax(x,axis=-1))(dense)
emb = tf.keras.layers.Embedding(NUM_CLASSES,EMBEDDING_DIM)(argmax)
Updated answer
If you want to propagate a randomly sampled category label instead of the most likely category label, you can do so by using tf.random.categorical. Note that tf.random.categorical takes logits as inputs, so you don't need the softmax activation at the end of the dense layer.
NUM_CLASSES = 10
EMBEDDING_DIM = 10
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)
sample = tf.keras.layers.Lambda(lambda logits: tf.squeeze(tf.random.categorical(logits, 1), axis=-1))(logits)  # axis=-1 keeps the batch dimension even when the batch size is 1
emb = tf.keras.layers.Embedding(NUM_CLASSES,EMBEDDING_DIM)(sample)
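To make the difference between the two answers concrete, here is a minimal pure-Python sketch of what each one computes for a single example. It is an illustration only, not Keras code: the helper names (softmax, greedy_choice, sampled_choice), the toy logits, and the 4-dimensional embedding table are assumptions made up for this example.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_choice(probs):
    """Initial answer: always pick the most likely class (argmax)."""
    return max(range(len(probs)), key=probs.__getitem__)

def sampled_choice(probs, rng):
    """Updated answer: draw a class at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

NUM_CLASSES = 10
EMBEDDING_DIM = 4  # hypothetical size, chosen only to keep the example small
rng = random.Random(0)

# Stand-in for the Embedding layer: one vector per class index.
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)]
                   for _ in range(NUM_CLASSES)]

# Toy logits in which class 3 is the most likely, but not certain.
logits = [0.0] * NUM_CLASSES
logits[3] = 2.0
probs = softmax(logits)

greedy = greedy_choice(probs)                                # always class 3
samples = [sampled_choice(probs, rng) for _ in range(1000)]  # varies per draw
embedded = embedding_table[samples[0]]                       # embedding lookup
```

The argmax version returns the same index every time for the same input, while the sampled version returns other classes in proportion to their probability, which is the behavior the question asks for.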

Training many-to-many stateful LSTM with and without final dense layer

I am trying to train a recurrent model in Keras containing an LSTM for regression purposes.
I would like to use the model online and, as far as I understood, I need to train a stateful LSTM.
Since the model has to output a sequence of values, I hope it computes the loss on each of the expected output vectors.
However, I fear my code is not working this way and I would be grateful if anyone would help me to understand if I am doing right or if there is some better approach.
The input to the model is a sequence of 128-dimensional vectors. Each sequence in the training set has a different length.
At each time, the model should output a vector of 3 elements.
I am trying to train and compare two models:
A) a simple LSTM with 128 inputs and 3 outputs;
B) a simple LSTM with 128 inputs and 100 outputs + a dense layer with 3 outputs;
For model A) I wrote the following code:
# Model
model = Sequential()
model.add(LSTM(3, batch_input_shape=(1, None, 128), return_sequences=True, activation="linear", stateful=True))
model.compile(loss='mean_squared_error', optimizer=Adam())
# Training
for i in range(n_epoch):
    for j in np.random.permutation(n_sequences):
        X = data[j]  # j-th sequence
        X = X[np.newaxis, ...]  # X has size 1 x NTimes x 128
        Y = dataY[j]  # Y has size NTimes x 3
        history = model.fit(X, Y, epochs=1, batch_size=1, verbose=0, shuffle=False)
        model.reset_states()  # After the end of the sequence
With this code, model A) seems to train fine because the output sequence approaches the ground-truth sequence on the training set.
However, I wonder if the loss is really computed by considering all NTimes output vectors.
For model B), I could not find any way to get the entire output sequence due to the dense layer. Hence, I wrote:
# Model
model = Sequential()
model.add(LSTM(100, batch_input_shape=(1, None, 128), stateful=True))
model.add(Dense(3, activation="linear"))
model.compile(loss='mean_squared_error', optimizer=Adam())
# Training
for i in range(n_epoch):
    for j in np.random.permutation(n_sequences):
        X = data[j]  # j-th sequence
        X = X[np.newaxis, ...]  # X has size 1 x NTimes x 128
        Y = dataY[j]  # Y has size NTimes x 3
        loss = 0.0  # accumulated loss over the j-th sequence
        for h in range(X.shape[1]):
            x = X[0, h, :]
            x = x[np.newaxis, np.newaxis, ...]  # h-th vector in j-th sequence
            y = Y[h, :]
            y = y[np.newaxis, ...]
            loss += model.train_on_batch(x, y)
        model.reset_states()  # After the end of the sequence
With this code, model B) does not train well. The training does not seem to converge, and the loss values increase and decrease cyclically.
I have also tried using only the last vector as Y and then calling the fit function on the whole training sequence X, but saw no improvement.
Any idea? Thank you!
If you still want three outputs per step of your sequence, set return_sequences=True on the LSTM so that it emits an output at every timestep, and wrap your Dense layer in TimeDistributed like so:
model.add(TimeDistributed(Dense(3, activation="linear")))
This applies the same dense layer (with the same weights) to each timestep independently.
See https://keras.io/layers/wrappers/#timedistributed
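As a rough illustration of what "applied to each timestep independently" means, and of how a mean squared error over a whole output sequence averages over every timestep, here is a small pure-Python sketch. It does not use Keras; the functions dense, time_distributed and mse, as well as the toy 4-dimensional hidden states and weights, are assumptions invented for this example.

```python
def dense(vector, weights, bias):
    """One dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(x * w for x, w in zip(vector, row)) + b
            for row, b in zip(weights, bias)]

def time_distributed(layer, sequence):
    """Apply the same layer (same weights) to every timestep of a sequence."""
    return [layer(step) for step in sequence]

def mse(predicted, target):
    """Mean squared error averaged over all timesteps and all output units."""
    errors = [(p - t) ** 2
              for p_step, t_step in zip(predicted, target)
              for p, t in zip(p_step, t_step)]
    return sum(errors) / len(errors)

# Toy stand-in for the LSTM output with return_sequences=True:
# NTimes = 2 timesteps, each a 4-dimensional hidden state.
hidden_sequence = [[1.0, 0.0, 2.0, 1.0],
                   [0.0, 1.0, 1.0, 0.0]]

# Toy dense weights mapping 4 inputs to 3 outputs (they copy the first 3 inputs).
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
bias = [0.0, 0.0, 0.0]

outputs = time_distributed(lambda h: dense(h, weights, bias), hidden_sequence)
targets = [[1.0, 0.0, 2.0],
           [0.0, 1.0, 4.0]]
loss = mse(outputs, targets)  # one scalar computed from all NTimes x 3 values
```

The output has one 3-element vector per timestep, and the single loss value depends on every timestep, not only on the last one.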

Attribute error: None type has no attribute summary in keras

To deepen my understanding of word embeddings and NLP in Keras, I implemented a model with the functional API, copying part of the code. When I call model.summary() I receive an AttributeError: 'NoneType' object has no attribute 'summary'.
I tried many times, reducing the number of layers and the dimension of the word embedding matrix, but unfortunately nothing changed. I don't know what to do.
def pretrained_embedding_layer(word_to_vec, word_to_index):
    vocab_len = len(word_to_index) + 1
    emb_dim = word_to_vec["sole"].shape[0]
    emb_matrix = np.zeros((vocab_len, emb_dim))
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec[word]
    print(emb_matrix.shape)
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    embedding_layer.build((None,))
    embedding_layer.set_weights([emb_matrix])
    return embedding_layer

def Chatbot_V1(input_shape, word_to_vec, word_to_index):
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=True)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of vocab_dim dimensional vectors.
    X = Dense(vocab_dim)(X)
    # Add a softmax activation
    preds = Activation('softmax')(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(sentence_indices, preds)

model = Chatbot_V1((maxLen,), word_to_vec, word_to_index)
model.summary()
Launching model.summary:
AttributeError: 'NoneType' object has no attribute 'summary'
Why? What is wrong in the layer definitions?
The function Chatbot_V1 does not return anything. In Python, a function without a return statement implicitly returns None, so assigning its result to a variable gives you None, and None has no summary attribute. Just add return model at the end of Chatbot_V1.
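The behavior described in the answer can be reproduced without Keras. In this minimal sketch the two builder functions and the dictionary standing in for the Keras Model are hypothetical, made up purely to show the effect of the missing return statement.

```python
def build_without_return(name):
    # Builds an object but never returns it, like Chatbot_V1 in the question.
    model = {"name": name}  # stand-in for Model(sentence_indices, preds)

def build_with_return(name):
    model = {"name": name}  # same stand-in object
    return model            # the fix: hand the object back to the caller

broken = build_without_return("Chatbot_V1")  # implicitly None
fixed = build_with_return("Chatbot_V1")      # the actual object

try:
    broken.summary()  # mirrors model.summary() on a None value
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)
```

Here error_message ends up holding the same kind of AttributeError text as in the question, while fixed is a usable object.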

Keras: feed output as input at next timestep

The goal is to predict a timeseries Y of 87601 timesteps (10 years) and 9 targets. The input features X (exogenous input) are 11 timeseries of 87600 timesteps. The output has one more timestep, as this is the initial value.
The output Yt at timestep t depends on the input Xt and on the previous output Yt-1.
Hence, the model should look like this: Model layout
I could only find this thread on this: LSTM: How to feed the output back to the input? #4068.
I tried to implement this with Keras as follows:
def build_model():
    # Input layers
    input_x = layers.Input(shape=(features,), name='input_x')
    input_y = layers.Input(shape=(targets,), name='input_y-1')
    # Merge two inputs
    merge = layers.concatenate([input_x, input_y], name='merge')
    # Normalise input
    norm = layers.Lambda(normalise, name='scale')(merge)
    # Hidden layers
    x = layers.Dense(128, input_shape=(features,))(norm)
    # Output layer
    output = layers.Dense(targets, activation='relu', name='output')(x)
    model = Model(inputs=[input_x, input_y], outputs=output)
    model.compile(loss='mean_squared_error', optimizer=Adam())
    return model

def make_prediction(model, X, y):
    # Start from the known initial value, then feed each prediction back in
    y_pred = [y[0, None, :]]
    for i in range(len(X)):
        y_pred.append(model.predict([X[i, None, :], y_pred[i]]))
    y_pred = np.asarray(y_pred)
    y_pred = y_pred.reshape(y_pred.shape[0], y_pred.shape[2])
    return y_pred

# Fit
model = build_model()
model.fit([X_train, y_train[:-1]], [y_train[1:]], epochs=200,
          batch_size=24, shuffle=False)
# Predict
y_hat = make_prediction(model, X_train, y_train)
This works, but it is not what I want to achieve, as there is no connection between output and input during training: the model is always trained on the true previous output. Hence, the model doesn't learn how to correct for an error in the fed-back output, which results in poor accuracy when predicting, as the error on the output accumulates at every timestep.
Is there a way in Keras to implement the output-input feed-back during training stage?
Also, as the initial value of Y is always known, I want to feed this to the network as well.

Is it possible to train using same model with two inputs?

Hello, I have a question about Keras.
I currently want to implement a network
that uses the same CNN model on two input images
and passes the two CNN outputs to a Dense model.
For example:
def cnn_model():
    input = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(input)
    x = GlobalAvgPool2D()(x)
    model = Model(input, x)
    return model

def fc_model(cnn1, cnn2):
    input_1 = cnn1.output
    input_2 = cnn2.output
    input = concatenate([input_1, input_2])
    x = Dense(1, input_shape=(None, 16))(input)
    x = Activation('sigmoid')(x)
    model = Model([cnn1.input, cnn2.input], x)
    return model

def main():
    cnn1 = cnn_model()
    cnn2 = cnn_model()
    model = fc_model(cnn1, cnn2)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, epochs=1)
I want to implement a model like this and train it,
but I got the error message below:
'All layer names should be unique'
Actually, I want to use only one CNN model as a feature extractor, and then use the two feature vectors to predict one float value between 0.0 and 1.0.
So the whole system is:
take two images, extract features from each with the same CNN model, and pass the features to a Dense model to get one floating-point value.
Please help me implement this system and explain how to train it.
Thank you.
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)

Resources