How to use my own sentence embeddings in Keras?

I am new to Keras and I created my own tf_idf sentence embeddings with shape (no_sentences, embedding_dim). I am trying to add this matrix as input to an LSTM layer. My network looks something like this:
q1_tfidf = Input(name='q1_tfidf', shape=(max_sent, 300))
q2_tfidf = Input(name='q2_tfidf', shape=(max_sent, 300))
q1_tfidf = LSTM(100)(q1_tfidf)
q2_tfidf = LSTM(100)(q2_tfidf)
distance2 = Lambda(preprocessing.exponent_neg_manhattan_distance,
                   output_shape=preprocessing.get_shape)([q1_tfidf, q2_tfidf])
I'm struggling with how the matrix should be shaped. I am getting this error:
ValueError: Error when checking input: expected q1_tfidf to have 3 dimensions, but got array with shape (384348, 300)
I already checked this post: Sentence Embedding Keras but still can't figure it out. It seems like I'm missing something obvious.
Any idea how to do this?

OK, as far as I understand, you want to predict the difference between two sentences.
What about reusing the LSTM layer (the language model should be the same): learn a single sentence embedding and apply it to both inputs:
q1_tfidf = Input(name='q1_tfidf', shape=(max_sent, 300))
q2_tfidf = Input(name='q2_tfidf', shape=(max_sent, 300))

# one shared LSTM, applied to both inputs with the same weights
lstm = LSTM(100)
lstm_out_q1 = lstm(q1_tfidf)
lstm_out_q2 = lstm(q2_tfidf)

predict = concatenate([lstm_out_q1, lstm_out_q2])
model = Model(inputs=[q1_tfidf, q2_tfidf], outputs=predict)
You could also introduce your custom distance in an additional Lambda layer; in that case you would replace the concatenation with a merge that matches your distance function's expected shapes.
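For example, here is a minimal sketch of wiring a custom distance in as a Lambda layer instead of the concatenation. The body of exponent_neg_manhattan_distance is a guess based on its name (the exp(-L1) similarity often used in Siamese LSTMs); substitute your own preprocessing functions if they differ:
from keras import backend as K

def exponent_neg_manhattan_distance(tensors):
    # exp(-||a - b||_1): close to 1.0 for identical encodings, approaches 0 as they diverge
    a, b = tensors
    return K.exp(-K.sum(K.abs(a - b), axis=1, keepdims=True))

distance = Lambda(exponent_neg_manhattan_distance,
                  output_shape=lambda shapes: (shapes[0][0], 1))([lstm_out_q1, lstm_out_q2])
model = Model(inputs=[q1_tfidf, q2_tfidf], outputs=distance)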

Related

Can't get Keras Code Example #1 to work with multi-label dataset

Apologies in advance.
I am attempting to recreate this CNN (from the Keras Code Examples), with another dataset.
https://keras.io/examples/vision/image_classification_from_scratch/
The dataset I am using is one for retinal scans, and classifies images on a scale from 0-4. So, it's a multi-label image classification.
The Keras example used is binary classification (cats v dogs), though I would have hoped it wouldn't make much difference (maybe this is a big assumption on my part).
I skipped the 'image augmentation' part of the walkthrough. So, I have not created the
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ]
)
part. So, instead of:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = data_augmentation(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    .......
at the beginning of the model, I have:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = keras.Sequential(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    .......
However, I keep getting different errors no matter how much I change things around, such as "TypeError: Keras symbolic inputs/outputs do not implement __len__." or "ValueError: Exception encountered when calling layer "rescaling_3" (type Rescaling).".
What am I missing here?
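Not a verified fix, but for reference, a minimal sketch of what skipping the augmentation block would look like: pass the input tensor straight into the entry block instead of wrapping it in keras.Sequential (which expects a list of layers, not a tensor). The layers after Rescaling are trimmed-down placeholders rather than the full architecture from the Keras example, and the softmax head for the 0-4 grades is an assumption:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)

    # No augmentation block: feed the input tensor directly into the entry block
    x = layers.Rescaling(1.0 / 255)(inputs)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # One unit per grade (0-4), softmax for single-label multi-class output
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)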

Using a pretrained word2vec model

I am trying to use a pretrained word2vec model to create word embeddings, but I am getting the following error when I try to create a weight matrix from the gensim word2vec model:
Code:
import gensim
import numpy as np

w2v_model = gensim.models.KeyedVectors.load_word2vec_format(
    "/content/drive/My Drive/GoogleNews-vectors-negative300.bin.gz", binary=True)

vocab_size = len(tokenizer.word_index) + 1
print(vocab_size)

EMBEDDING_DIM = 300

# Function to create weight matrix from word2vec gensim model
def get_weight_matrix(model, vocab):
    # total vocabulary size plus 0 for unknown words
    vocab_size = len(vocab) + 1
    # define weight matrix dimensions with all 0
    weight_matrix = np.zeros((vocab_size, EMBEDDING_DIM))
    # step vocab, store vectors using the Tokenizer's integer mapping
    for word, i in vocab.items():
        weight_matrix[i] = model[word]
    return weight_matrix

embedding_vectors = get_weight_matrix(w2v_model, tokenizer.word_index)
I'm getting the following error (posted as a screenshot):
KeyError: "word 'didnt' not in vocabulary"
As a note: it's better to paste the full error as formatted text than as an image of text. (See Why not upload images of code/errors when asking a question? for a full list of the reasons why.)
But regarding your question:
If you get a KeyError: word 'didnt' not in vocabulary error, you can trust that the word you've requested is not in the set-of-word-vectors you've requested it from. (In this case, the GoogleNews vectors that Google trained & released back around 2012.)
You could check before looking it up – 'didnt' in w2v_model would return False – and then do something else. Or you could use a Python try: ... except: ... formulation to let the lookup fail, and then do something else when it does.
But it's up to you what your code should do if the model you've provided doesn't have the word-vectors you were hoping for.
(Note: the GoogleNews vectors do include a vector for "didn't", the contraction with its internal apostrophe. So in this one case, the issue may be that your tokenization is stripping such internal-punctuation-marks from contractions, but Google chose not to when making those vectors. But your code should be ready for handling missing words in any case, unless you're sure through other steps that can never happen.)
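For instance, here is a minimal sketch of a get_weight_matrix that skips out-of-vocabulary words and leaves their rows as zeros (reusing EMBEDDING_DIM and the tokenizer vocab from the question; whether zeros, random vectors, or something else is a better fallback is up to you):
def get_weight_matrix(model, vocab):
    # index 0 is reserved, so the matrix has one extra row
    vocab_size = len(vocab) + 1
    weight_matrix = np.zeros((vocab_size, EMBEDDING_DIM))
    missing = []
    for word, i in vocab.items():
        if word in model:              # only copy vectors that actually exist
            weight_matrix[i] = model[word]
        else:
            missing.append(word)       # row i stays all-zero
    print(f"{len(missing)} words not found in the pretrained vectors")
    return weight_matrix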

Why pytorch transformer src_mask doesn't block positions from attending?

I am trying to train word embeddings with a transformer encoder by masking each word's own position with a diagonal src_mask:
def _generate_square_subsequent_mask(self, sz):
    mask = torch.diag(torch.full((sz,), float('-inf')))
    return mask

def forward(self, src):
    if self.src_mask is None or self.src_mask.size(0) != len(src):
        device = src.device
        mask = self._generate_square_subsequent_mask(len(src)).to(device)
        self.src_mask = mask

    src = self.embedding(src) * math.sqrt(self.ninp)
    src = self.dropout(src)
    src = self.pos_encoder(src)
    src = self.transformer_encoder(src, self.src_mask)
    output = self.decoder(src)  # Linear layer
    return output
After training, the model predicts exactly the same sentence as the input. If I change any word in the input, it predicts the new word. So the model doesn't seem to block anything according to the mask.
Why is that?
I understand that there must be a mistake in my logic, because BERT would probably be much simpler if this worked. But where am I wrong?
Edit:
I am using a sequence of word indices as input. The output is the same sequence as the input.
As far as I understand, the mask doesn't prevent each word from indirectly "seeing itself" through the other positions in a multi-layer context. I tried using a single layer, and then the model does seem to work, but training is too slow.
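A small sketch (not from the original post) of what that mask looks like, and why the blocking breaks down as soon as there is a second encoder layer:
import torch

sz = 4
mask = torch.diag(torch.full((sz,), float('-inf')))
print(mask)
# tensor([[-inf, 0., 0., 0.],
#         [0., -inf, 0., 0.],
#         [0., 0., -inf, 0.],
#         [0., 0., 0., -inf]])
#
# In layer 1, position i cannot attend to itself, only to the other positions.
# But the layer-1 output at any position j != i already mixes in information
# about token i, so in layer 2 position i can read its own token back through
# position j. With a single layer that indirect path doesn't exist, which is
# consistent with the one-layer model behaving as expected.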

Keras. How to concatenate intermediate layers of two different models into a third model

I have two sequential models that both do a pretty good job of classifying audio. One uses mfccs and the other wave forms. I am now trying to combine them into a third functional API model using one of the later Dense layers from each of the mfcc and wave form models. The example about how to get the intermediate layers in the Keras FAQ is not working for me (https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer).
Here is my code:
mfcc_model = load_model(S01_model_local_loc)
waveform_model = load_model(T01_model_local_loc)
mfcc_input = Input(shape=(79,30,1))
mfcc_model_as_layer = Model(inputs=mfcc_model.input,
                            outputs=mfcc_model.get_layer(name='dense_9').output)
waveform_input = Input(shape=(40000,1))
waveform_model_as_layer = Model(inputs=waveform_model.input,
                                outputs=waveform_model.get_layer(name='dense_2').output)
concatenated_1024 = concatenate([mfcc_model_as_layer, waveform_model_as_layer])
model_pred = layers.Dense(2, activation='sigmoid')(concatenated_1024)
uber_model = Model(inputs=[mfcc_input,waveform_input], outputs=model_pred)
This throws the error:
AttributeError: Layer sequential_5 has multiple inbound nodes, hence the notion of "layer input" is ill-defined. Use get_input_at(node_index) instead.
Changing the inputs to the first two Model statements to inputs=mfcc_model.get_input_at(1) and inputs=waveform_model.get_input_at(1) solves that error message, but I then get this error message:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("dropout_21_input:0", shape=(?, 79, 30, 1), dtype=float32) at layer "dropout_21_input". The following previous layers were accessed without issue: []
If I remove the .get_layer statements and just take the final output of the model the graph connects nicely.
What do I need to do to just get the output of the Dense layers that I want?
Update: I found a really hacky way of getting what I want. I popped layers off the mfcc and waveform models until the output layers were the ones I wanted. Then the code below seems to work. I'd love to know the right way to do this!
mfcc_input = Input(shape=(79,30,1))
waveform_input = Input(shape=(40000,1))
mfcc_model_as_layer = mfcc_model(mfcc_input)
waveform_model_as_layer = waveform_model(waveform_input)
concatenated_1024 = concatenate([mfcc_model_as_layer, waveform_model_as_layer])
model_pred = layers.Dense(2, activation='sigmoid')(concatenated_1024)
test_model = Model(inputs=[mfcc_input,waveform_input], outputs=model_pred)
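For reference, a sketch of how this is commonly done without popping layers (reusing the layer names 'dense_9' and 'dense_2' from the question): wrap each loaded model up to the desired Dense layer in its own Model, then call those wrappers on fresh Input tensors so the new graph stays connected. Whether this works cleanly depends on the Keras version; older Sequential models with multiple inbound nodes may still need get_input_at:
# Feature extractors that stop at the intermediate Dense layers
mfcc_features = Model(inputs=mfcc_model.inputs,
                      outputs=mfcc_model.get_layer('dense_9').output)
waveform_features = Model(inputs=waveform_model.inputs,
                          outputs=waveform_model.get_layer('dense_2').output)

mfcc_input = Input(shape=(79, 30, 1))
waveform_input = Input(shape=(40000, 1))

# Calling the wrapper models on the new inputs connects them into one graph
merged = concatenate([mfcc_features(mfcc_input),
                      waveform_features(waveform_input)])
model_pred = layers.Dense(2, activation='sigmoid')(merged)
uber_model = Model(inputs=[mfcc_input, waveform_input], outputs=model_pred)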

How to get the output of intermediate layers which are not connected via Sequential() function?

I am new to Keras, but I have worked with pure TensorFlow before. I am trying to debug part of the following network (I will just copy a fragment; the loss function, optimizer, etc. are unimportant for this code).
#Block 1 (Conv,relu,batch) starts with 800 x 400
main_input = LNN.Input(shape=((800,400,5)),name='main_input')
enc_conv1 = LNN.Convolution2D(8,3,padding='same',activation='relu')(main_input)
enc_bn1 = LNN.BatchNormalization(axis=1)(enc_conv1)
#Block 2 (Conv,relu,batch) starts with 400 x 200
maxp1_4 = LNN.MaxPooling2D(strides=2)(enc_bn1)
enc_conv2 = LNN.Convolution2D(16,3,padding='same',activation='relu')(maxp1_4)
enc_bn2 = LNN.BatchNormalization(axis=1)(enc_conv2)
enc_conv3 = LNN.Convolution2D(16,3,padding='same',activation='relu')(enc_bn2)
enc_bn3 = LNN.BatchNormalization(axis=1)(enc_conv3)
concat1_5 = LNN.concatenate(axis=3,inputs=[enc_bn3,maxp1_4])
I have seen some examples of how to do it by adding each operation to a Sequential() model (for example, the one explained here), using the add() function. Is there a way to check the output of each layer without adding them to a model (the way it can be done in TensorFlow by running a session)?
The best option is to make a model that outputs all of those layers:
modelToOutputAll = Model(main_input, [enc_conv1, enc_bn1, maxp1_4, enc_conv2, enc_bn2, enc_conv3, enc_bn3, concat1_5])
For training, keep a model with only the final output:
modelForTraining = Model(main_input,concat1_5)
Both models share the exact same weights, so training one also updates the other. Use whichever one you need at the moment:
Train with modelForTraining.fit(xTrain,yTrain, ...)
See intermediate layers with modelToOutputAll.predict(xInput)
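If it helps, predict on the multi-output model returns one NumPy array per listed output, in the same order, so you can inspect them like this (names as above, xInput as in the answer):
outputs = modelToOutputAll.predict(xInput)
for name, out in zip(['enc_conv1', 'enc_bn1', 'maxp1_4', 'enc_conv2',
                      'enc_bn2', 'enc_conv3', 'enc_bn3', 'concat1_5'], outputs):
    print(name, out.shape)   # e.g. enc_conv1 -> (batch, 800, 400, 8)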
