How to add a CRF layer in a TensorFlow sequential model?

I am trying to implement a CRF layer in a TensorFlow sequential model for a NER problem, and I am not sure how to do it. Previously, when I implemented CRF, I used the CRF layer from Keras with TensorFlow as the backend, i.e. I built the entire model in Keras instead of TensorFlow and then passed the model output through the CRF layer. It worked.
But now I want to develop the model in TensorFlow, since TensorFlow 2.0.0 beta already has Keras built in. I am trying to build a sequential model and add a CRF layer after a bidirectional LSTM layer, but I am not sure how to do that. I have gone through the CRF documentation in tensorflow-addons; it contains various functions such as forward CRF, but I am not sure how to use them as a layer. Is it possible at all to implement a CRF layer inside a sequential TensorFlow model, or do I need to build the model graph from scratch and then use the CRF functions? Can anyone please help me with this? Thanks in advance.

In the training process:
You can refer to this API:
tfa.text.crf_log_likelihood(
    inputs,
    tag_indices,
    sequence_lengths,
    transition_params=None
)
The inputs are the unary potentials (as in logistic regression). In your case they are the logits (usually the raw scores, not the distributions after the softmax activation function) or the states of the BiLSTM for each character in the encoder (P1, P2, P3, P4 in the original diagram).
The tag_indices are the target tag indices, and sequence_lengths gives the length of each sequence in the batch.
The transition_params are the binary potentials (how a tag transitions from one time step to the next). You can create the matrix yourself or let the API create it for you.
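To make the two kinds of potentials concrete, here is a framework-free sketch of the quantity a linear-chain CRF log-likelihood computes for a single sequence: the score of the gold tag path minus the log of the summed scores of all paths. This is not the tensorflow-addons implementation; the function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(unary, transitions, tags):
    """Unary potentials along the path plus transition potentials between neighbours."""
    score = sum(unary[t][tag] for t, tag in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score

def log_partition(unary, transitions):
    """Forward algorithm: log of the summed exp-scores over every possible tag path."""
    num_tags = len(unary[0])
    alpha = list(unary[0])
    for step in unary[1:]:
        alpha = [
            step[j] + logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return logsumexp(alpha)

def crf_log_likelihood_sketch(unary, transitions, tags):
    """Log-probability of one tag path under the CRF."""
    return sequence_score(unary, transitions, tags) - log_partition(unary, transitions)

# Toy data (assumed): 3 time steps, 2 tags.
unary = [[2.0, 0.5], [0.3, 1.5], [1.0, 1.0]]      # e.g. BiLSTM logits per step
transitions = [[0.5, -0.2], [-1.0, 0.8]]          # transitions[i][j]: tag i -> tag j
gold = [0, 1, 1]

ll = crf_log_likelihood_sketch(unary, transitions, gold)

# Sanity check: the probabilities of all 2**3 paths should sum to 1.
total = sum(
    math.exp(crf_log_likelihood_sketch(unary, transitions, list(path)))
    for path in product(range(2), repeat=3)
)
```

During training, the negative of this log-likelihood (averaged over the batch) is what would typically be minimized.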
In the inference process:
You just utilize this API:
tfa.text.viterbi_decode(
    score,
    transition_params
)
The score is the same input as in training (the P1, P2, P3, P4 states), and the transition_params are the ones learned during training.
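For intuition about what the decoding step does, the following is a minimal pure-Python sketch of Viterbi decoding for one sequence, where score is a [seq_len, num_tags] matrix and transition_params is a [num_tags, num_tags] matrix. The function name viterbi_decode_sketch and the toy numbers are assumptions made for this example; it is not the library's code.

```python
def viterbi_decode_sketch(score, transition_params):
    """Return (best_tag_path, best_path_score) for a single sequence."""
    num_tags = len(score[0])
    best = list(score[0])      # best score of any path ending in each tag so far
    backpointers = []          # for each later step: best previous tag per current tag

    for step in score[1:]:
        new_best, pointers = [], []
        for j in range(num_tags):
            candidates = [best[i] + transition_params[i][j] for i in range(num_tags)]
            i_best = max(range(num_tags), key=candidates.__getitem__)
            pointers.append(i_best)
            new_best.append(candidates[i_best] + step[j])
        best = new_best
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag to recover the path.
    last = max(range(num_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

# Toy data (assumed): 3 time steps, 2 tags.
score = [[2.0, 0.5], [0.3, 1.5], [1.0, 1.0]]
transition_params = [[0.5, -0.2], [-1.0, 0.8]]

tags, best_score = viterbi_decode_sketch(score, transition_params)
```

Here the transitions change the result: a per-step argmax alone would tie at the last step, while the transition scores make staying in tag 1 the better choice.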

Related

CNN with CTC loss

I want to extract features using a pretrained CNN model (ResNet50, VGG, etc.) and use the features with a CTC loss function.
I want to build it as a text recognition model.
Any ideas on how I can achieve this?
I'm not sure whether you want to fine-tune the pretrained model or use it for feature extraction. To do the latter, freeze the pretrained model's weights (there are several ways to do this in PyTorch; a common one is setting requires_grad to False on the model's parameters. Note that calling .eval() on the model only switches layers such as dropout and batch normalization to inference mode and does not by itself stop the weights from being updated), and feed the outputs of the model's last layer to your new output head. See the PyTorch tutorial for a more in-depth guide.

Keras Embedding layer activation function?

In the fully connected hidden layer of a Keras embedding, which activation function is used? I'm either misunderstanding the concept of this class or unable to find the documentation. I understand that it encodes a word as a real-valued vector of dimension d, from answers like the one below on Stack Overflow:
Embedding layers in Keras are trained just like any other layer in your network architecture: they are tuned to minimize the loss function by using the selected optimization method. The major difference with other layers, is that their output is not a mathematical function of the input. Instead the input to the layer is used to index a table with the embedding vectors [1]. However, the underlying automatic differentiation engine has no problem to optimize these vectors to minimize the loss function...
In my network, I have a word embedding portion that is linked to a larger network predicting a binary outcome (e.g., click yes/no). I understand that this Keras embedding does not operate like word2vec, because here my embedding is trained and updated against my final cross-entropy loss. But there is no mention of how the embedding's fully connected layer is activated. Thanks!
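The table lookup described in the quoted passage can be illustrated without Keras. The sketch below (plain Python; the table values and function names are made up for illustration) shows that indexing a table gives the same result as multiplying a one-hot vector by that table, i.e. a linear map with no nonlinearity applied afterwards. It illustrates the idea only and is not Keras's implementation.

```python
def embedding_lookup(table, token_ids):
    """Pick one row of the table per token id (what an embedding layer does)."""
    return [table[token_id] for token_id in token_ids]

def one_hot_matmul(table, token_ids):
    """Same result written as a one-hot vector times the weight matrix, no activation."""
    vocab_size, dim = len(table), len(table[0])
    outputs = []
    for token_id in token_ids:
        one_hot = [1.0 if k == token_id else 0.0 for k in range(vocab_size)]
        outputs.append(
            [sum(one_hot[k] * table[k][d] for k in range(vocab_size)) for d in range(dim)]
        )
    return outputs

# Hypothetical embedding table: vocabulary of 4 words, dimension d = 3.
table = [
    [0.1, -0.3, 0.7],
    [0.5, 0.2, -0.1],
    [-0.4, 0.9, 0.0],
    [0.3, 0.3, 0.3],
]
token_ids = [2, 0, 3]

looked_up = embedding_lookup(table, token_ids)
multiplied = one_hot_matmul(table, token_ids)
```

The rows of the table are the trainable parameters; gradients from the downstream loss flow only into the rows that were actually looked up.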

How to use a pre-trained object detection in tensorflow?

How can I use the weights of a pre-trained network in my TensorFlow project?
I know some of the theory behind this, but nothing about how to code it in TensorFlow.
As pointed out by @Matias Valdenegro in the comments, your first question does not make sense. For your second question, however, there are multiple ways to do so. The term you're searching for is transfer learning (TL). TL means transferring the "knowledge" (basically just the weights) from a pre-trained model into your model. There are several types of TL.
1) You transfer all of the weights from a pre-trained model into your model and use them as the starting point for training your network.
This applies when you have extra data to train your model on but don't want to start training over from scratch. You just load the weights from your previous model and resume training.
2) You transfer only some of the weights from a pre-trained model into your new model.
This applies when you have a model trained to classify, say, 5 classes of objects and you now want to add or remove a class. You don't have to retrain the whole network from scratch if the new class has features somewhat similar to those of the existing class(es). Instead, you build another model with exactly the same architecture as the previous one, except for the fully connected layers, which now have a different output size. In this case, you load the weights of the convolutional layers from the previous model and freeze them, retraining only the fully connected layers.
To perform these in TensorFlow:
1) The first type of TL can be performed by creating a model with exactly the same architecture as the previous model, loading the weights with tf.train.Saver().restore() (the TensorFlow 1.x API), and continuing the training.
2) The second type of TL can be performed by creating a model with exactly the same architecture for the parts whose weights you want to retain, and then specifying the names of the variables you want to load from the pre-trained weights. You can use the parameter "trainable=False" to prevent TensorFlow from updating them.
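The bookkeeping behind the second type of TL can be sketched without any framework. In the example below, weights are plain dictionaries keyed by variable name; the names (conv1/kernel, fc/kernel), the helper functions, and the numbers are all hypothetical. It only illustrates "load by name, then skip frozen variables during updates" and is not how TensorFlow implements it.

```python
def transfer_weights(pretrained, target, frozen_prefixes):
    """Copy weights whose names match a prefix and report which names are frozen."""
    frozen = set()
    for name, values in pretrained.items():
        if name.startswith(tuple(frozen_prefixes)) and name in target:
            target[name] = list(values)   # load the pre-trained values
            frozen.add(name)              # plays the role of trainable=False
    return frozen

def sgd_step(weights, grads, frozen, lr=0.1):
    """One gradient step that leaves frozen variables untouched."""
    for name, grad in grads.items():
        if name in frozen:
            continue
        weights[name] = [w - lr * g for w, g in zip(weights[name], grad)]

# Old model: 5 output classes. New model: 6 output classes, same conv layer.
pretrained = {"conv1/kernel": [0.2, -0.4], "fc/kernel": [1.0] * 5}
new_model = {"conv1/kernel": [0.0, 0.0], "fc/kernel": [0.0] * 6}

frozen = transfer_weights(pretrained, new_model, frozen_prefixes=["conv"])
sgd_step(new_model, {"conv1/kernel": [1.0, 1.0], "fc/kernel": [1.0] * 6}, frozen)
```

After the step, the convolutional weights still hold the pre-trained values, while the new fully connected layer has moved away from its initial values.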
I hope this helps.

How to use MC Dropout on a variational dropout LSTM layer on keras?

I'm currently trying to set up an LSTM recurrent neural network with Keras (TensorFlow backend).
I would like to use variational dropout with MC Dropout on it.
I believe that variational dropout is already implemented through the "recurrent_dropout" option of the LSTM layer, but I can't find any way to set a "training" flag to true, as you can with a classical Dropout layer.
This is quite easy in Keras. First, define a function that takes both the model input and the learning_phase:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])
For a Functional API model with multiple inputs/outputs you can use:
# model.inputs and model.outputs are already lists, so they are not wrapped again
f = K.function(model.inputs + [K.learning_phase()],
               model.outputs)
Then you can call this function like f([input, 1]) and this will tell Keras to enable the learning phase during this call, executing Dropout. Then you can call this function multiple times and combine the predictions to estimate uncertainty.
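The "call it multiple times and combine the predictions" step can be sketched in plain Python. The toy model below is a single linear unit with dropout kept active at prediction time; the weights, dropout rate, and number of passes are assumptions made for illustration, and the code does not use Keras at all.

```python
import random
import statistics

def predict_with_dropout(x, weights, rate, rng):
    """One stochastic forward pass of a linear model with dropout left switched on."""
    keep = 1.0 - rate
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            # Inverted dropout: rescale kept units so the expected output is unchanged.
            total += xi * wi / keep
    return total

def mc_dropout(x, weights, rate=0.5, passes=200, seed=0):
    """Run many stochastic passes and summarise them as (mean, standard deviation)."""
    rng = random.Random(seed)
    samples = [predict_with_dropout(x, weights, rate, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples)

x = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 1.0]     # deterministic prediction would be 3.0

mean, std = mc_dropout(x, weights)
```

The mean plays the role of the prediction and the spread across passes serves as the uncertainty estimate; with a Keras model, each pass would be one call of the function f with the learning phase set to 1.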
The source code for "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015) is located at https://github.com/yaringal/DropoutUncertaintyExps/blob/master/net/net.py. They also use Keras, and the code is quite easy to understand. The Dropout layers are used without the Sequential API in order to pass the training parameter. This is a different approach from the one Matias suggested:
inter = Dropout(dropout_rate)(inter, training=True)

How to adopt multiple different loss functions in each steps of LSTM in Keras

I have a set of sentences and their scores, and I would like to train a marking system that predicts the score for a given sentence. One such example looks like this:
(X = Tomorrow is a good day, Y = 0.9)
I would like to use an LSTM to build this marking system and also take into account the sequential relationship between the words in the sentence, so the training example shown above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The loss function used in this LSTM is therefore composed of two different loss functions. It seems that Keras does not provide a direct way to address this problem. In addition, I am not sure whether my method of building the marking system is correct.
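Keras supports multiple loss functions as well:
from keras.models import Model
# inputs, lang_model and sent_model are the input and output tensors of your network
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])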
Based on your explanation, I think you need a model that first predicts a token based on the previous tokens (in the NLP domain this is usually called a language model) and then computes a score, which I assume is a sentiment score (the approach is applicable to other domains too).
To do so, you can train your language model with an LSTM and use the last output of the LSTM for your ranking task. To this end, you need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
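As a rough illustration of how the two losses combine, the plain-Python sketch below computes a weighted sum of a cross-entropy term (language-model output) and an MSE term (score output), mirroring the role of loss_weights in model.compile. The function names, the averaging over time steps, and the toy numbers are assumptions for this example, not Keras internals.

```python
import math

def categorical_crossentropy(probs, target_index):
    """Negative log-probability assigned to the correct token."""
    return -math.log(probs[target_index])

def mse(pred, target):
    """Squared error for a single scalar prediction."""
    return (pred - target) ** 2

def combined_loss(token_probs, token_targets, score_pred, score_target,
                  loss_weights=(1.0, 1.0)):
    """Weighted sum of the language-model loss and the score-regression loss."""
    lm_loss = sum(
        categorical_crossentropy(p, t) for p, t in zip(token_probs, token_targets)
    ) / len(token_targets)
    score_loss = mse(score_pred, score_target)
    return loss_weights[0] * lm_loss + loss_weights[1] * score_loss

# Toy example: two next-token predictions over a 2-word vocabulary,
# plus one sentence score predicted as 0.7 against a target of 0.9.
token_probs = [[0.5, 0.5], [0.5, 0.5]]
token_targets = [0, 1]

loss = combined_loss(token_probs, token_targets, score_pred=0.7, score_target=0.9)
```

Changing loss_weights shifts how strongly each task drives the shared LSTM, which is a tuning choice rather than something fixed by the method.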
