How to add an attention layer to a seq2seq model in Keras - nlp

Based on this article, I wrote this model:
enc_in = Input(shape=(None, in_alphabet_len))
lstm = LSTM(lstm_dim, return_sequences=True, return_state=True, use_bias=False)
enc_out, h, c = lstm(enc_in)
dec_in = Input(shape=(None, in_alphabet_len))
decoder, _, _ = LSTM(decoder_dim, return_sequences=True, return_state=True)(dec_in, initial_state=[h, c])
decoder = Dense(units=in_alphabet_len, activation='softmax')(decoder)
model = Model([enc_in, dec_in], decoder)
How can I add an attention layer to this model, before the decoder?

You can use this repo. You will need to pip install keras-self-attention, then import the layer:
from keras_self_attention import SeqSelfAttention
If you want to use tf.keras rather than keras, set the following flag before the import:
os.environ['TF_KERAS'] = '1'
Make sure to omit this flag if you are using plain keras, as setting it will cause inconsistencies.
Since you are using the Keras functional API, you can apply the attention layer to the encoder output:
enc_out, h, c = lstm(enc_in)
att = SeqSelfAttention()(enc_out)
dec_in = Input(shape=(None, in_alphabet_len))
Note that Input creates a placeholder and cannot be called on a tensor, so the decoder input stays a separate Input layer; the attended sequence att then has to be combined with the decoder path, one way of which is sketched below.
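For completeness, here is a minimal connected sketch of one way to wire attention between the encoder and the decoder. It swaps in the built-in keras.layers.Attention (Luong-style dot-product cross-attention) instead of SeqSelfAttention, since the encoder and decoder sequences have different lengths and cross-attention handles that alignment; all sizes are hypothetical, and lstm_dim must equal decoder_dim for the dot product to be defined:
# Hedged sketch, not the only possible design; sizes are illustrative.
from tensorflow.keras.layers import Input, LSTM, Dense, Attention, Concatenate
from tensorflow.keras.models import Model

lstm_dim = decoder_dim = 64   # must match for dot-product attention
in_alphabet_len = 30          # hypothetical alphabet size

enc_in = Input(shape=(None, in_alphabet_len))
enc_out, h, c = LSTM(lstm_dim, return_sequences=True, return_state=True)(enc_in)

dec_in = Input(shape=(None, in_alphabet_len))
dec_out, _, _ = LSTM(decoder_dim, return_sequences=True,
                     return_state=True)(dec_in, initial_state=[h, c])

# Attend from each decoder step (query) over the encoder outputs (value).
context = Attention()([dec_out, enc_out])
combined = Concatenate()([dec_out, context])

out = Dense(in_alphabet_len, activation='softmax')(combined)
model = Model([enc_in, dec_in], out)
model.summary()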
I hope this answers your question and helps future readers.

Related

Using tf.keras.layers with keras.Model

Yes, I've read everywhere that keras and tf.keras aren't compatible. But you can pass tf.keras.layers into a keras model, and it does work. Yet when I try to do the same with my own models, it does not work!
If you examine the ResNet source code in resnet50.py, they build models like:
input = layers.Input(shape=input_shape)
x = layers.Dense(1000)(input)
model = Model(input, x)
and it works fine whether you pass in layers=tf.keras.layers or layers=keras.layers.
Demonstration code:
import tensorflow as tf
import keras

# THIS WORKS!
input_shape = (224, 224, 3)
base_model = keras.applications.ResNet50(layers=tf.keras.layers,
                                         weights='imagenet',
                                         include_top=False,
                                         pooling=None,
                                         input_shape=input_shape,
                                         classes=1000)
# this fails!!
input = tf.keras.layers.Input(shape=input_shape)
x = tf.keras.layers.Dense(1000,activation='relu')(input)
model = keras.Model(input, x)
My code produces this error:
TypeError: object of type 'Dense' has no len()
How can I make this work? Apparently there is a way, because the keras.applications prebuilt models do seem to support it and work fine.
I want to use tf.keras.layers because its BatchNormalization layer works differently. This is potentially the easiest way to drop it into our massive existing code base.
I did find this related Stack Overflow post with the same error: Object of Type 'Dense' has no len().
It correctly mentions that the error is due to tf.keras and keras not being compatible. But again, I've confirmed that passing tf.keras.layers into keras.applications.ResNet50 does return a keras model with the correct layers. Somehow.
You have drawn the wrong conclusion. keras.applications is a module that supports both the keras and tf.keras packages: because keras.applications only uses the models.Model it is handed, it detects whether you are using tf.keras or keras and fetches the corresponding modules, so the code is agnostic to the actual Keras implementation.
keras.applications is not mixing keras and tf.keras; it just supports both.
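To see why this works, here is a hedged sketch of the dependency-injection pattern keras_applications uses internally (build_small_net is a hypothetical stand-in, not the library's actual code):
# Hypothetical builder: the caller injects both layers and models,
# so everything comes from one and the same Keras implementation.
def build_small_net(input_shape, layers=None, models=None):
    import keras  # fall back to plain keras if nothing is injected
    layers = layers or keras.layers
    models = models or keras.models
    inputs = layers.Input(shape=input_shape)
    x = layers.Dense(10, activation='relu')(inputs)
    return models.Model(inputs, x)

# Consistent either way, because layers and Model always match:
#   build_small_net((4,), layers=keras.layers, models=keras.models)
#   build_small_net((4,), layers=tf.keras.layers, models=tf.keras.models)
The failing snippet in the question mixes the two instead: tensors produced by tf.keras layers are handed to keras.Model, whose graph-building code does not understand them.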

Keras LSTM layers in Keras-rl

I am trying to implement a DQN agent using Keras-rl. The problem is that when I define my model I need to use an LSTM layer in the architecture:
model = Sequential()
model.add(Flatten(input_shape=(1, 8000)))
model.add(Reshape(target_shape=(200, 40)))
model.add(LSTM(20))
model.add(Dense(3, activation='softmax'))
return model
When executing the RL agent, I obtain the following error:
RuntimeError: Attempting to capture an EagerTensor without building a function.
which is related to the use of the LSTM together with the following line of code:
tf.compat.v1.disable_eager_execution()
Using a Dense layer instead of an LSTM:
model = Sequential()
model.add(Flatten(input_shape=(1, 8000)))
model.add(Dense(20))
model.add(Dense(3, activation='softmax'))
return model
and keeping eager execution disabled, I do not get the error above. If I remove the line that disables eager execution while keeping the LSTM layer, I get other errors.
Can anyone help me understand the reason for this error?
The keras-rl library does not have explicit support for TensorFlow 2.0, so it will not work with that version of TensorFlow. The library is sparsely updated and the last release is around two years old (from 2018), so if you want to use it you should target TensorFlow 1.x.
Install keras-rl2 from GitHub; it supports TensorFlow 2.x.
Although it is possible to migrate the keras-rl code to use eager execution, and therefore an LSTM, note that LSTMs need to be updated with a whole episode of experience to be accurate, something keras-rl does not support. See more here: https://github.com/keras-rl/keras-rl/issues/41
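As a concrete starting point, here is a minimal hedged sketch of plugging the question's model into a DQN agent under keras-rl2 (pip install keras-rl2; the API mirrors keras-rl, and every hyperparameter below is illustrative):
# Hedged sketch; assumes `model` is the Sequential model from the
# question and the environment exposes 3 discrete actions.
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy
from tensorflow.keras.optimizers import Adam

memory = SequentialMemory(limit=50000, window_length=1)
agent = DQNAgent(model=model, nb_actions=3, memory=memory,
                 nb_steps_warmup=100, policy=EpsGreedyQPolicy())
agent.compile(Adam(learning_rate=1e-3), metrics=['mae'])
# agent.fit(env, nb_steps=10000)  # env is a Gym-style environment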

Compatibility between keras and tf.keras models

I am interested in training a model in tf.keras and then loading it with keras. I know this is not highly advised, but I am interested in using tf.keras to train the model because
it is easier to build input pipelines in tf.keras, and
I want to take advantage of the tf.data API;
and I am interested in loading it with keras because
I want to use coreml to deploy the model to iOS, and
coremltools, which performs that conversion, only works with keras, not tf.keras.
I have run into a few roadblocks, because not all of the tf.keras layers can be loaded as keras layers. For instance, I've had no trouble with a simple DNN, since all of the Dense layer parameters are the same between tf.keras and keras. However, I have had trouble with RNN layers, because tf.keras has an argument time_major that keras does not have. My RNN layers have time_major=False, which is the same behavior as keras, but keras recurrent layers do not accept this argument.
My solution right now is to save the tf.keras model as a JSON file (for the model structure), deleting the parts of the layer configs that keras does not support, and also to save an h5 file (for the weights), like so:
import json

model = # model trained with tf.keras

# save json
model_json = model.to_json()
with open('path_to_model_json.json', 'w') as json_file:
    json_ = json.loads(model_json)
    layers = json_['config']['layers']
    for layer in layers:
        if layer['class_name'] == 'SimpleRNN':
            del layer['config']['time_major']
    json.dump(json_, json_file)

# save weights
model.save_weights('path_to_my_weights.h5')
Then, I use the coremltools converter to convert from keras to Core ML, like so:
with CustomObjectScope({'GlorotUniform': glorot_uniform()}):
    coreml_model = coremltools.converters.keras.convert(
        model=('path_to_model_json', 'path_to_my_weights.h5'),
        input_names=#inputs,
        output_names=#outputs,
        class_labels=#labels,
        custom_conversion_functions={"GlorotUniform": tf.keras.initializers.glorot_uniform},
    )
coreml_model.save('my_core_ml_model.mlmodel')
My solution appears to be working, but I am wondering if there is a better approach? Or, is there imminent danger in this approach? For instance, is there a better way to convert tf.keras models to coreml? Or is there a better way to convert tf.keras models to keras? Or is there a better approach that I haven't thought of?
Any advice on the matter would be greatly appreciated :)
Your approach seems good to me!
In the past, when I had to convert a tf.keras model to a keras model, I did the following:
Train the model in tf.keras.
Save only the weights: tf_model.save_weights("tf_model.hdf5")
Build the same architecture in keras, using keras layers throughout (matching the tf.keras one).
Load the weights by layer name in keras: keras_model.load_weights("tf_model.hdf5", by_name=True)
This worked for me. Since I was using an out-of-the-box architecture (DenseNet169), very little work was needed to replicate the tf.keras network in keras.
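Put together, a minimal hedged sketch of that recipe, assuming the stock DenseNet169 from keras.applications and matching layer names between the two implementations (paths and class counts are illustrative):
import tensorflow as tf
import keras

# 1. Train in tf.keras, then save only the weights.
tf_model = tf.keras.applications.DenseNet169(weights=None, classes=10)
# ... train tf_model here ...
tf_model.save_weights('tf_model.hdf5')

# 2. Rebuild the identical architecture with plain keras.
keras_model = keras.applications.DenseNet169(weights=None, classes=10)

# 3. Load weights by layer name so matching layers line up.
keras_model.load_weights('tf_model.hdf5', by_name=True)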

Loading a PyTorch model into C++ using dev PyTorch 1.0

PyTorch 1.0 has a feature for converting a model into a Torch Script program (a serialized form) so it can be executed in C++ with no Python dependencies.
The details are in this tutorial.
https://pytorch.org/tutorials/advanced/cpp_export.html
This is how it is done:
import torch
import torchvision
# An instance of your model.
model = ...  # a U-Net model from fastai, which registers hooks as U-Net requires
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
In my use case, I am using a U-Net model for semantic segmentation. However, when I trace the model with this method, I get the following error:
Forward or backward hooks can't be compiled
The U-Net model uses hooks to save intermediate features that are used by later layers in the network. Is there a way around this, or is it still a limitation of the new method that it cannot work with models using such hooks?
If you can, use the U-Net model from PyTorch Hub; it works with TorchScript.
import torch
# downloading the model from torchhub
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=1, init_features=32, pretrained=True)
# downloading the sample
import urllib
url, filename = ("https://github.com/mateuszbuda/brain-segmentation-pytorch/raw/master/assets/TCGA_CS_4944.png", "TCGA_CS_4944.png")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)
# reading the sample and some prerequisites for transformation
import numpy as np
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
m, s = np.mean(input_image, axis=(0, 1)), np.std(input_image, axis=(0, 1))
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=m, std=s),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)
# creating the trace
traced_module = torch.jit.trace(model,input_batch)
# running the trace
traced_module(input_batch)
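Per the C++ export tutorial linked in the question, the step that makes the trace usable from C++ is serializing it to disk (the filename here is illustrative):
# Serialize the traced module; the resulting file can then be
# loaded from C++ with torch::jit::load.
traced_module.save('unet_traced.pt')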
P.S. Neither torch.jit.trace nor torch.jit.script supports all torch functionality, so using them with external libraries is always tricky.
Alternatively, you could rewrite the model in C++, since the C++ API has almost the same interface as the Python version.

Show model layout / design (with all connections) in Keras

I see major differences when testing a Keras LSTM model right after training compared to when I load that trained model from a .h5 file (the accuracy of the former is always > 0.85, but that of the latter is always < 0.2, i.e. a random guess).
However, I checked the weights and they are identical, and the sparse layout Keras gives me via plot_model is also the same. Since that only provides a rough overview:
Is there a way to show the full layout of a Keras model (especially the node connections)?
If you're using the tensorflow backend, then apart from plot_model you can also use the keras.callbacks.TensorBoard callback to visualize the whole graph in TensorBoard. Example:
callback = keras.callbacks.TensorBoard(log_dir='./graph',
                                       histogram_freq=0,
                                       write_graph=True,
                                       write_images=True)
model.fit(..., callbacks=[callback])
Then run tensorboard --logdir ./graph from the same directory.
This is a quick shortcut, but you can go even further with that.
For example, add TensorFlow code to define (load) the model within a custom tf.Graph instance, like this:
from keras.layers import LSTM
import tensorflow as tf

my_graph = tf.Graph()
with my_graph.as_default():
    # All ops / variables in the LSTM layer are created as part of our graph
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)
... after which you can list all graph nodes with their dependencies, evaluate any variable, display the graph topology, and so on, to compare the models.
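For instance, listing every node together with its inputs might look like this (a hedged sketch against the TF 1.x graph API used above):
# Enumerate all ops in my_graph with the tensors they consume.
for op in my_graph.get_operations():
    print(op.name, [t.name for t in op.inputs])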
Personally, I think the simplest way is to set up your own session. It works in all cases with minimal patching:
import tensorflow as tf
from keras import backend as K
sess = tf.Session()
K.set_session(sess)
...
# Now can evaluate / access any node in this session, e.g. `sess.graph`
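For example, once the session is registered you can evaluate any intermediate tensor against it; a hedged TF 1.x-style sketch, where model is your Keras model and the input shape (None, 20, 64) from the snippet above is assumed:
import numpy as np

# Evaluate the first layer's output on a random batch via our session.
feed = {model.input: np.random.rand(1, 20, 64)}
first_layer_out = sess.run(model.layers[0].output, feed_dict=feed)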
