How to write an LSTM in Keras without an Embedding layer? - theano

How do you write a simple sequence copy task in keras using the LSTM architecture without an Embedding layer? I already have the word vectors.

If you say that you have word vectors, I guess you have a dictionary to map a word to its vector representation (calculated from word2vec, GloVe...).
Using this dictionary, you replace all words in your sequence by their corresponding vectors. You will also need to make all your sequences the same length, since LSTMs need all the input sequences to be of constant length. So you will need to determine a max_length value and trim all sequences that are longer and pad all sequences that are shorter with zeros (see Keras pad_sequences function).
Then you can pass the vectorized sequences directly to the LSTM layer of your neural network. Since the LSTM layer is the first layer of the network, you will need to define the input shape, which in your case is (max_length, embedding_dim, ).
This ways you skip the Embedding layer and use your own precomputed word vectors instead.

I had same problem after searching in Keras at "Stacked LSTM for sequence classification" part , I found following code might be useful:
model = Sequential()
model.add(LSTM(3NumberOfLSTM, return_sequences=True,
input_shape=(YourSequenceLenght, YourWord2VecLenght)))

Related

How to use masking with Convolution1D layer in keras?

I am trying to perform a sentiment classification task for which I am using Attention based architecture which has both Convolution layer and BiLSTM layers. The first layer of my model is a Embedding layer followed by a Convolution1D layer. I have used mask_zero=True for the Embedding layer since I have padded the sequence with zeros. This however creates an error for the Convolution1D layer since this layer does not support masking. However, I do need to mask the zero inputs since I have LSTM layers after the convolutional layers. Does anyone have any solution for this. I have attached a sample code of my model till the Convolution1D layer for reference.
wordsInputs = Input(shape=(maxSeq,), name='words_input')
embed_reg = l2(l=0.001)
emb = Embedding(vocabSize, 300, mask_zero=True, init='glorot_uniform', W_regularizer=embed_reg)(wordsInputs)
convOutput = Convolution1D(nb_filter=100, filter_length=3, activation='relu', border_mode='same')(emb)
It looks like you have defined a maxSeq length and you say you are padding the sequence with zeros. The mask_zero means something else, specifically that zero is a reserved input word index that you are not supposed to use and is reserved for the internals of the program to mark the end of a variable length sequence.
I think the solution is simply to remove the parameter mask_zero=True, as it is unneeded (because it is for variable length sequences), and to use zero as your padding word index.

What are some common ways to get sentence vector from corresponding word vectors?

I have successfully implemented word2vec model to generate word embedding or word vectors now I require to generate sentence vectors from generated word vectors so that I can feed a neural network to summarize a text corpus. What are the common approaches to generate sentence vectors from word vectors?
You can try adding an LSTM/RNN encoder before your actual Neural Network and feed your neural net using hidden states of your encoder( which would act as document representations).
Benefit of doing this is your document embeddings will be trained for your specific task of text summarization.
I don't know what framework you are using otherwise would have helped you with some code to get you started.
EDIT 1: Add code snippet
word_in = Input(shape=("<MAX SEQ LEN>",))
emb_word = Embedding(input_dim="<vocab size>", output_dim="<embd_dim>",input_length="<MAX SEQ LEN>", mask_zero=True)(word_in)
lstm = LSTM(units="<size>", return_sequences=False,
recurrent_dropout=0.5, name="lstm_1")(emb_word)
Add any type of dense layer which takes vectors as inputs.
LSTM takes input of shape batch_size * sequence_length * word_vector_dimension and produces output of shape batch_size * rnn_size; which you can use as document embeddings.
Sentence representations can simply be the column-wise mean of all the word vectors in your sentence. There are also implementation of this like doc2vec https://radimrehurek.com/gensim/models/doc2vec.html where a document is just a collection of words like a sentence or paragraph.

Keras: Embedding in LSTM

In a keras example on LSTM for modeling IMDB sequence data (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py), there is an embedding layer before input into a LSTM layer:
model.add(Embedding(max_features,128)) #max_features=20000
model.add(LSTM(128))
What does the embedding layer really do? In this case, does that mean the length of the input sequence into the LSTM layer is 128? If so, can I write the LSTM layer as:
model.add(LSTM(128,input_shape=(128,1))
But it is also noted the input X_train has subjected to pad_sequences processing:
print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen) #maxlen=80
X_test = sequence.pad_sequences(X_test, maxlen=maxlen) #maxlen=80
It seems the input sequence length is 80?
To quote the documentation:
Turns positive integers (indexes) into dense vectors of fixed size.
eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
Basically this transforms indexes (that represent which words your IMDB review contained) to a vector with the given size (in your case 128).
If you don't know what embeddings are in general, here is the wikipedia definition:
Word embedding is the collective name for a set of language modeling
and feature learning techniques in natural language processing (NLP)
where words or phrases from the vocabulary are mapped to vectors of
real numbers in a low-dimensional space relative to the vocabulary
size ("continuous space").
Coming back to the other question you've asked:
In this case, does that means the length of the input sequence into
the LSTM layer is 128?
not quite. For recurrent nets you'll have a time dimension and a feature dimension. 128 is your feature dimension, as in how many dimensions each embedding vector should have. The time dimension in your example is what is stored in maxlen, which is used to generate the training sequences.
Whatever you supply as 128 to the LSTM layer is the actual number of output units of the LSTM.

LSTM with variable sequences & return full sequences

How can I set up a keras model such that the final LSTM layer outputs a prediction for each time step while having variable sequence lengths as input?
I'd then like to provide labels for each of the timesteps after a dense layer with linear activation.
When I try to add a reshape or a dense layer to the LSTM model that is returning the full sequence and has a masking layer to take care of variable sequence lengths, it says:
The reshape and the dense layers do not support masking.
Would this be possible to do?
You can use the TimeDistributed layer wrapper for this. This applies the layer you want to each timestep. In your case, you could also just use TimeDistributedDense.

Train some embeddings, keep others fixed

I do sequence classification with Keras, using an RNN and embeddings. My sequences are a bit weird. I have words mixed with special symbols. Words are associated with fixed, pre-trained embeddings, but the special symbol embeddings have to be modified during training.
In an Embedding layer during learning, how can I keep some embeddings fixed while updating others? Is there a way to mask those indices which shouldn't be modified? Or is this a case for a custom Embedding layer?
I do not believe that this is achievable with the existing Embedding layer. To get around it I would just create a custom layer that builds two embedding layers internally, and only puts the embedding matrix of one of them into the trainable_parameters.

Resources