How to enforce LSTM to learn monotonic sequences? - keras

I'm using an LSTM with Keras to forecast a set of sequences. Here is my basic model:
inputs = Input(shape=(1,seq_dim)) #seq_dim = 2
# shape = (timesteps, featdim) = (1,2) since my input sequences are pair of values
# I want to predict the sequence of the first values in the pairs
se = LSTM(lstm_size)(inputs)
out = Dense(1)(se) # I want to forecast one value
model = Model(inputs=inputs, outputs=out)
I know for sure that the sequences start from 0 and are monotonic (not-decreasing).
I tried with the Maximum() layer
max_out = Maximum()([output_seq,input_seq])
Here is the model:
inputs = Input(shape=(1,seq_dim))
# shape = (timesteps, featdim) = (1,2) since my input sequences are pair of values
# I want to predict the sequence of the first values in the pairs
se = LSTM(lstm_size)(inputs)
out = Dense(1)(se) # I want to forecast one value
# max between the output and the previous value of the sequence (current input)
max_out = Maximum()([out,inputs[:,:,0]])
model = Model(inputs=inputs, outputs=max_out)
however at compiling the model an error is raised:
"AttributeError: 'Tensor' object has no attribute '_keras_history'"
I've also tried with a Lambda layer but it raises the same error.
max_out = Lambda(lambda x: K_BACKEND.max(x))([out,inputs[:,:,0]])
How can I add this constraint to my model? Is it possible to do it in the architecture definition (as I'm trying to do), or by editing the loss function?
Thanks in advance.

Try this:
max_out = Lambda(lambda oi: K_BACKEND.maximum(oi[0], oi[1][:, :, 0]),
                 output_shape=lambda shapes: shapes[0])([out, inputs])
The original attempts fail because inputs[:,:,0] slices the tensor outside of any layer, so the result carries no _keras_history; doing the slice inside the Lambda keeps the whole operation in a layer. Note also that K_BACKEND.maximum is element-wise and takes no axis argument.
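For reference, a minimal end-to-end sketch of the constrained model (lstm_size is a placeholder value; seq_dim = 2 as in the question):
import keras.backend as K_BACKEND
from keras.layers import Input, LSTM, Dense, Lambda
from keras.models import Model

seq_dim = 2     # input sequences are pairs of values
lstm_size = 32  # placeholder size

inputs = Input(shape=(1, seq_dim))
se = LSTM(lstm_size)(inputs)
out = Dense(1)(se)
# clamp the forecast to be >= the current value of the sequence,
# which enforces the non-decreasing constraint
max_out = Lambda(lambda oi: K_BACKEND.maximum(oi[0], oi[1][:, :, 0]),
                 output_shape=lambda shapes: shapes[0])([out, inputs])
model = Model(inputs=inputs, outputs=max_out)
model.compile(optimizer='adam', loss='mse')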

Related

How to use bart-large-mnli model for NLI task?

I want to use the facebook/bart-large-mnli model for an NLI task.
I have a dataset with premise and hypothesis columns and labels [0, 1, 2].
How can I use this model for that NLI task?
I wrote the following code:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')
nli_model.to(device)
i = 0 # first example check
premise = tokenized_datasets['TRAIN'][i]['premise']
hypothesis = tokenized_datasets['TRAIN'][i]['hypothesis']
x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation_strategy='only_first')
logits = nli_model(x.to(device))[0]
entail_contradiction_logits = logits[:,[0,2]]
probs = entail_contradiction_logits.softmax(dim=1)
probs
and I got only 2 values: tensor([[8.8793e-05, 9.9991e-01]], device='cuda:0', grad_fn=<SoftmaxBackward0>) instead of 3 values (contradiction, neutral, entailment).
How can I use this model for NLI (predict the right value from the 3 labels)?
This code line:
entail_contradiction_logits = logits[:,[0,2]]
selects the first and third elements of the logits tensor (i.e. you are removing the logits for neutral). Just use the variable logits as it is to get probabilities for all 3 values.
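For example, a minimal sketch (for facebook/bart-large-mnli the label order per its config is 0 = contradiction, 1 = neutral, 2 = entailment; check that your dataset uses the same order):
probs = logits.softmax(dim=1)           # probabilities over all 3 classes
pred = probs.argmax(dim=1).item()       # predicted class id
print(nli_model.config.id2label[pred])  # map the id to its label name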

Forward outputs on multiple sequences is wrong

I am using T5 to summarize multiple sequences as a batch. I want to reproduce the output of model.generate(input_ids) by calling the forward function (model(**inputs)). I know that forward() and generate() work completely differently (see this). To make them behave the same way, I take some sequences, call model.generate() on them to produce the corresponding outputs, and get pairs of (text, summary). Calling the forward function on these pairs one at a time reproduces the same outputs. However, when calling the forward function on a batch of sequences, the outputs are not the same. What did I miss?
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model.resize_token_embeddings(len(tokenizer))
model.to("cuda")
model.eval()
# sequences
seq1 = "summarize: Calling the model (which means the forward method) uses the labels for teacher forcing. This means inputs to the decoder are the labels shifted by one"
output1 = "calling the model uses the labels for teacher forcing. inputs to the decoder"
seq2 = "summarize: When you call the generate method, the model is used in the autoregressive fashion"
output2 = "the model is used in the auto-aggressive fashion."
seq3 = "summarize: However, selecting the token is a hard decision, and the gradient cannot be propagated through this decision"
output3 = "the token is a hard decision, and the gradient cannot be propagated through this decision"
input_sequences = [seq1, seq2, seq3]
output_seq = [output1, output2, output3]
# encoding input and attention mask
encoding = tokenizer(
input_sequences,
padding="longest",
max_length=128,
truncation=True,
return_tensors="pt",
)
input_ids, attention_mask = encoding.input_ids.to("cuda"), encoding.attention_mask.to("cuda")
# labels
target_encoding = tokenizer(
output_seq, padding="longest", max_length=128, truncation=True
)
labels = target_encoding.input_ids
labels = torch.tensor(labels).to("cuda")
labels[labels == tokenizer.pad_token_id] = -100
# Call the models
logits = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).logits
# Apply softmax() and batch_decode()
X = logits
X = F.softmax(X, dim=-1)
ids = X.argmax(dim=-1)
y = tokenizer.batch_decode(sequences=ids, skip_special_tokens=True)
# results: batch_size=3
['call the model uses the labels for teacher forcing inputs to the decoder are',
'the model is used in the auto-aggressive fashion the the the',
'the token is a hard decision, and the gradient cannot be propagated through this decision ']
# results: batch_size =1 i.e. consider 1 seq each time
['call the model uses the labels for teacher forcing inputs to the decoder are']
['the model is used in the auto-aggressive fashion ']
['the token is a hard decision, and the gradient cannot be propagated through this decision ']
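One plausible cause, assuming the setup above: in the batched call the logits also cover the padded label positions, and taking argmax at those positions decodes to stray tokens (the trailing 'the the the' in the second result). A minimal sketch that masks the padded positions before decoding:
# keep only positions where a real target token exists (labels != -100)
mask = labels != -100
ids = logits.argmax(dim=-1)
ids = torch.where(mask, ids, torch.full_like(ids, tokenizer.pad_token_id))
y = tokenizer.batch_decode(sequences=ids, skip_special_tokens=True)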

Debug output of keras layers during training

When fitting a model using keras, I encounter nans, and I want to debug the output of each layer.
The code has an input in1 which goes through multiple layers, and in the final layer I multiply elementwise with another input in2 and then make the prediction. The input in2 is sparse and is used for masking (a row looks something like [0 0 0 1 0 0 1 0 1 0 ... 0]). The label matrix contains one-hot-encoded rows, and input in1 is a vector of real values.
from time import time
from keras.layers import Input, Dense, Lambda, Multiply
from keras.models import Model
from keras.callbacks import TensorBoard
import keras.backend as K

in1 = Input(shape=(27,), name='in1')
in2 = Input(shape=(1000,), name='in2')
# Hidden layers
hidden_1 = Dense(1024, activation='relu')(in1)
hidden_2 = Dense(512, activation='relu')(hidden_1)
hidden_3 = Dense(256, activation='relu')(hidden_2)
hidden_4 = Dense(10, activation='linear')(hidden_3)
final = Dense(1000, activation='linear')(hidden_4)
# Ensure we do not overflow when we exponentiate
final2 = Lambda(lambda x: x - K.max(x))(final)
#Masked soft-max using Lambda and merge-multiplication
exponentiate = Lambda(lambda x: K.exp(x))(final2)
masked = Multiply()([exponentiate, in2])
predicted = Lambda(lambda x: x / K.sum(x))(masked)
# Compile with categorical crossentropy and adam
mdl = Model(inputs=[in1, in2],outputs=predicted)
mdl.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
tensorboard = TensorBoard(log_dir="/Users/somepath/tmp/{}".format(time()), write_graph=True,
write_grads=True)
mdl.fit({'in1': in1_matrix, 'in2': in2_matrix},
label_matrix, epochs=1, batch_size=32, verbose=2, callbacks=[tensorboard])
I want to print the output of each layer and the gradients during training, and I need to know how to feed the auxiliary input (in2) while debugging.
I have tried to print the output of each layer like below, which works until layer7:
get_layer_output = K.function([mdl.layers[0].input],[mdl.layers[7].output])
layer_output = get_layer_output([in1_matrix])
But when I get to layer 8, I don't know how to also feed in2_matrix, and the following code raises an error:
get_layer_output2 = K.function([mdl.layers[0].input],[mdl.layers[8].output])
layer_output2 = get_layer_output2([in1_matrix])
Error:
InvalidArgumentError: You must feed value for placeholder tensor 'in2' with dtype float and shape [?,1000]
I don't know how to provide in2 in K.function, and also in2_matrix to get_layer_output2.
(I have checked the in1_matrix, in2_matrix, and the label_matrix. They all look fine, with no nans or inf. Label array has no rows or columns with all zeros.)
I'm new to Keras, any idea on how to debug nans, with callbacks even to print gradients would be appreciated. Please also let me know if there is anything wrong with the way the layers are composed.
If you print out mdl.layers[8], you will find that it is the Input layer; I guess you want the output of mdl.layers[9], which is the Multiply layer. You can get it like this:
get_layer_output2 = K.function([mdl.layers[0].input, mdl.layers[8].input],[mdl.layers[9].output])
layer_output2 = get_layer_output2([in1_matrix, in2_matrix])
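The question also asks about debugging NaNs during training. One option (a sketch, not part of the answer above) is keras.callbacks.TerminateOnNaN combined with a LambdaCallback that probes the layer-output function on a fixed batch:
import numpy as np
from keras.callbacks import TerminateOnNaN, LambdaCallback

# stop training as soon as the loss becomes NaN
nan_stop = TerminateOnNaN()

# after each epoch, check a fixed probe batch for NaNs in layer 9's output
# (get_layer_output2 is the K.function defined just above)
probe = [in1_matrix[:32], in2_matrix[:32]]
check_nans = LambdaCallback(on_epoch_end=lambda epoch, logs: print(
    'epoch', epoch, 'layer 9 has nans:',
    np.isnan(get_layer_output2(probe)[0]).any()))

mdl.fit({'in1': in1_matrix, 'in2': in2_matrix}, label_matrix,
        epochs=1, batch_size=32, callbacks=[tensorboard, nan_stop, check_nans])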

Attribute error: None type has no attribute summary in keras

I have tried to deepen my understanding of word embeddings and NLP in Keras by implementing and adapting part of some code that creates a Keras model using the functional API. When I call model.summary() I receive an AttributeError: 'NoneType' object has no attribute 'summary'.
After many attempts at decreasing the number of layers and the dimension of the word embedding matrix, unfortunately nothing changed. I don't know what to do.
import numpy as np
from keras.layers import Input, Embedding, LSTM, Dropout, Dense, Activation
from keras.models import Model

def pretrained_embedding_layer(word_to_vec, word_to_index):
vocab_len = len(word_to_index) + 1
emb_dim = word_to_vec["sole"].shape[0]
emb_matrix = np.zeros((vocab_len,emb_dim))
for word, index in word_to_index.items():
emb_matrix[index, :] = word_to_vec[word]
print(emb_matrix.shape)
embedding_layer = Embedding(vocab_len,emb_dim,trainable =False)
embedding_layer.build((None,))
embedding_layer.set_weights([emb_matrix])
return embedding_layer
def Chatbot_V1(input_shape, word_to_vec, word_to_index):
# Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
sentence_indices = Input(input_shape, dtype='int32')
# Create the embedding layer pretrained with GloVe Vectors (≈1 line)
embedding_layer = pretrained_embedding_layer(word_to_vec, word_to_index)
embeddings = embedding_layer(sentence_indices)
# Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
X = LSTM(128, return_sequences=True)(embeddings)
# Add dropout with a probability of 0.5
X = Dropout(0.5)(X)
# Propagate X trough another LSTM layer with 128-dimensional hidden state
# Be careful, the returned output should be a single hidden state, not a batch of sequences.
X = LSTM(128, return_sequences=True)(X)
# Add dropout with a probability of 0.5
X = Dropout(0.5)(X)
# Propagate X through a Dense layer with softmax activation to get back a batch of vocab_dim dimensional vectors.
X = Dense(vocab_dim)(X)
# Add a softmax activation
preds = Activation('softmax')(X)
# Create Model instance which converts sentence_indices into X.
model = Model(sentence_indices, preds)
model = Chatbot_V1((maxLen,), word_to_vec, word_to_index)
model.summary()
Launching model.summary() gives:
AttributeError: 'NoneType' object has no attribute 'summary'
Why? What is wrong in layers definition?
The function Chatbot_V1 does not return anything, and in Python a function without an explicit return statement returns None, which is what gets assigned to your variable. So just use the return keyword to return the model at the end of Chatbot_V1.
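That is, the end of Chatbot_V1 should read:
    # Create Model instance which converts sentence_indices into preds.
    model = Model(sentence_indices, preds)
    return model  # without this line the function returns None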

Convert code to new keras version (functional API) or how to concatenate 2 models

Merge doesn't work anymore. I tried the new functional API (concatenate, add, multiply), but it doesn't work for models. How do I implement it?
lower_model = [self.build_network(self.model_config['critic_lower'], input_shape=(self.history_length, self.n_stock, 1))
for _ in range(1 + self.n_smooth + self.n_down)]
merged = Merge(lower_model, mode='concat')
# upper layer
upper_model = self.build_network(self.model_config['critic_upper'], model=merged)
# action layer
action = self.build_network(self.model_config['critic_action'], input_shape=(self.n_stock,), is_conv=False)
# output layer
merged = Merge([upper_model, action], mode='mul')
model = Sequential()
model.add(merged)
model.add(Dense(1))
return model
I cannot really give you an exact answer, because your question is not detailed enough, but I can provide an example where layers are concatenated. A common problem is importing Concatenate and then using it the way Merge worked in previous versions.
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, concatenate
from keras.models import Model
from keras import regularizers

nlp_input = Input(shape=(seq_length,), name='nlp_input')
meta_input = Input(shape=(10,), name='meta_input')
emb = Embedding(output_dim=embedding_size, input_dim=100, input_length=seq_length)(nlp_input)
nlp_out = Bidirectional(LSTM(128, dropout=0.3, recurrent_dropout=0.3, kernel_regularizer=regularizers.l2(0.01)))(emb)
x = concatenate([nlp_out, meta_input])
x = Dense(classifier_neurons, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_input , meta_input], outputs=[x])
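For completeness, a quick usage sketch with dummy data (batch size and values are placeholders; assumes seq_length was set when building the model):
import numpy as np
X_nlp = np.random.randint(0, 100, size=(8, seq_length))  # token indices
X_meta = np.random.random((8, 10))                       # metadata features
y = np.random.randint(0, 2, size=(8, 1))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit([X_nlp, X_meta], y, epochs=1)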
This is a dirty workaround that shows how to get the input and output tensors from models and use concatenate layers with them, and also how to apply Dense and other layers directly to tensors to create functional API models.
Ideally, you should rewrite everything inside build_network for clean and optimized code. (Perhaps this doesn't even work, depending on the content of that function, but this is the idea.)
lower_model = [self.build_network(
self.model_config['critic_lower'],
input_shape=(self.history_length, self.n_stock, 1))
for _ in range(1 + self.n_smooth + self.n_down)]
#for building models you need input and output tensors
lower_inputs = [model.input for model in lower_model]
lower_outputs = [model.output for model in lower_model]
#these lines assume each model in the list has only one input and output
#using a concatenate layer on a list of tensors
merged_tensor = Concatenate()(lower_outputs) #or Concatenate(axis=...)(lower_outputs)
#this is a workaround for compatibility.
#ideally you should work just with tensors, not create unnecessary intermediate models
merged_model = Model(lower_inputs, merged_tensor) #make model from input tensors to outputs
# upper layer
upper_model = self.build_network(self.model_config['critic_upper'], model=merged_model)
# action layer
action = self.build_network(self.model_config['critic_action'], input_shape=(self.n_stock,), is_conv=False)
# output layer - get the output tensors from the models
upper_out = upper_model.output
action_out = action.output
#apply the Multiply layer on the list of tensors
merged_tensor = Multiply()([upper_out, action_out])
#apply the Dense layer on the merged tensor
out = Dense(1)(merged_tensor)
#get input tensors to create a model
upper_inputs = upper_model.inputs #should be a list
action_inputs = action.inputs #if not a list, append to the previous list
inputs = upper_inputs + action_inputs
model = Model(inputs, out)
return model
