I've got a basic Keras model with a GRU layer where stateful=True. I want to convert my model to a TFLite model and make predictions on data one element at a time, i.e. a sequence will be fed to the model in batches of size 1. Looking at the TensorFlow docs, there isn't a way to convert a stateful GRU to a TFLite model. However, the docs do say (https://www.tensorflow.org/lite/convert/rnn):
It is still possible to model a stateful Keras LSTM layer using the underlying stateless Keras LSTM layer and managing the state explicitly in the user program. Such a TensorFlow program can still be converted to TensorFlow Lite using the feature being described here.
I don't understand what is meant by this. If I set the GRU to not be stateful, how can I prevent its state from being reset after each prediction (batch)? Is there a way to keep the state from resetting?
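For reference, here is a minimal sketch of what "managing the state explicitly" could look like: the GRU layer itself stays stateless, its state is exposed as an extra model input and output, and the calling code carries the state between single-step predictions. The layer sizes and converter settings below are assumptions, and the converted tensor ordering is matched by shape because it is not guaranteed to follow the Keras input order:

import numpy as np
import tensorflow as tf

units, feature_dim = 64, 8  # placeholder sizes

# Stateless GRU whose state is an explicit input and output of the model.
x_in = tf.keras.Input(shape=(1, feature_dim), batch_size=1, name="x")
state_in = tf.keras.Input(shape=(units,), batch_size=1, name="state_in")
gru_out, state_out = tf.keras.layers.GRU(units, return_state=True)(x_in, initial_state=state_in)
y = tf.keras.layers.Dense(1, name="y")(gru_out)
model = tf.keras.Model([x_in, state_in], [y, state_out])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# The user program keeps the state between single-step predictions.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
# Match inputs/outputs by shape rather than relying on their order.
x_det     = next(d for d in interpreter.get_input_details()  if len(d["shape"]) == 3)
state_det = next(d for d in interpreter.get_input_details()  if len(d["shape"]) == 2)
pred_out  = next(d for d in interpreter.get_output_details() if d["shape"][-1] == 1)
state_out = next(d for d in interpreter.get_output_details() if d["shape"][-1] == units)

state = np.zeros((1, units), dtype=np.float32)
for step in np.random.rand(10, 1, 1, feature_dim).astype(np.float32):  # ten single-element batches
    interpreter.set_tensor(x_det["index"], step)
    interpreter.set_tensor(state_det["index"], state)
    interpreter.invoke()
    prediction = interpreter.get_tensor(pred_out["index"])
    state = interpreter.get_tensor(state_out["index"])  # carry the GRU state into the next call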
I am working on a binary classification task and would like to try adding an LSTM layer on top of the last hidden layer of the huggingface BERT model; however, I couldn't reach that last hidden layer. Is it possible to combine BERT with an LSTM?
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained(model_path)
train_inputs, train_labels, train_masks = data_prepare_BERT(
    train_file, lab2ind, tokenizer, content_col, label_col,
    max_seq_length)
validation_inputs, validation_labels, validation_masks = data_prepare_BERT(
    dev_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)
# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    model_path, num_labels=len(lab2ind))
Indeed it is possible, but you need to implement it yourself. The BertForSequenceClassification class is a wrapper around BertModel. It runs the model, takes the hidden state corresponding to the [CLS] token, and applies a classifier on top of that.
In your case, you can use that class as a starting point and add an LSTM layer between the BertModel and the classifier. BertModel returns a tuple containing both the per-token hidden states and a pooled state for classification; just take the tuple member other than the one used in the original class.
Although it is technically possible, I would not expect any performance gain compared to using BertForSequenceClassification: fine-tuning the Transformer layers can learn anything that an additional LSTM layer is capable of.
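To make that concrete, here is a rough sketch of such a custom wrapper (the class name, layer sizes, and pooling choice are made up for illustration; the exact return type of BertModel depends on your transformers version, but its first element is the sequence of per-token hidden states):

import torch
from torch import nn
from transformers import BertModel

class BertLSTMClassifier(nn.Module):
    """Variant of BertForSequenceClassification with an LSTM between BERT and the classifier."""

    def __init__(self, model_path, num_labels, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_path)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask=None):
        # Take the per-token hidden states (first tuple member), not the pooled
        # [CLS] vector that the original wrapper feeds to its classifier.
        sequence_output = self.bert(input_ids=input_ids, attention_mask=attention_mask)[0]
        lstm_out, _ = self.lstm(sequence_output)        # (batch, seq_len, 2 * lstm_hidden)
        pooled = self.dropout(lstm_out[:, -1, :])       # last time step, kept simple for brevity
        return self.classifier(pooled)

model = BertLSTMClassifier(model_path, num_labels=len(lab2ind))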
I am trying to implement a CRF layer in a TensorFlow sequential model for an NER problem, and I am not sure how to do it. Previously when I implemented a CRF, I used the CRF from Keras with TensorFlow as the backend, i.e. I created the entire model in Keras instead of TensorFlow and then passed the entire model through the CRF. It worked.
But now I want to develop the model in TensorFlow, since tensorflow 2.0.0 beta already has Keras built in, and I am trying to build a sequential model and add a CRF layer after a bidirectional LSTM layer, although I am not sure how to do that. I have gone through the CRF documentation in tensorflow-addons and it contains different functions, such as the CRF forward pass, but I'm not sure how to implement them as a layer. Is it possible at all to implement a CRF layer inside a sequential TensorFlow model, or do I need to build the model graph from scratch and then use the CRF functions? Can anyone please help me with it? Thanks in advance.
In the training process:
You can refer to this API:
tfa.text.crf_log_likelihood(
    inputs,
    tag_indices,
    sequence_lengths,
    transition_params=None
)
The inputs are the unary potentials (just like those in logistic regression; you can refer to this answer), and in your case they are the logits (usually not the distributions after a softmax activation) or states of the BiLSTM for each character in the encoder (the P1, P2, P3, P4 in the diagram from the original answer).
The tag_indices are the target tag indices, and the sequence_lengths represent the sequence lengths in a batch.
The transition_params are the binary potentials (i.e. how the tag transitions from one time step to the next); you can create the matrix yourself, or you can just let the API do it for you.
In the inference process:
You just utilize this API:
tfa.text.viterbi_decode(
    score,
    transition_params
)
The score stands for the same input as in training (the P1, P2, P3, P4 states), and the transition_params are the ones learned during training.
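Putting the two APIs together, a minimal end-to-end sketch could look like the following (logits, labels, and all sizes are placeholders standing in for your BiLSTM outputs and gold tags; this assumes eager execution as in TF 2.x):

import tensorflow as tf
import tensorflow_addons as tfa

num_tags, max_len, batch = 5, 10, 4

# Stand-ins for the BiLSTM outputs (unary potentials) and the gold tag indices.
logits = tf.random.normal([batch, max_len, num_tags])
labels = tf.random.uniform([batch, max_len], maxval=num_tags, dtype=tf.int32)
sequence_lengths = tf.fill([batch], max_len)

# The binary potentials; passing None instead lets the API create them for you.
transition_params = tf.Variable(tf.random.uniform([num_tags, num_tags]), name="transitions")

# Training: the loss is the negative mean log-likelihood.
log_likelihood, transition_params = tfa.text.crf_log_likelihood(
    logits, labels, sequence_lengths, transition_params=transition_params)
loss = -tf.reduce_mean(log_likelihood)

# Inference: Viterbi decoding runs per sequence on plain arrays.
viterbi_path, viterbi_score = tfa.text.viterbi_decode(
    logits[0].numpy(), transition_params.numpy())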
How can I use the weights of a pre-trained network in my TensorFlow project?
I know some of the theory behind this, but nothing about how to do it in code in TensorFlow.
As pointed out by @Matias Valdenegro in the comments, your first question does not make sense. For your second question, however, there are multiple ways to do this. The term you're searching for is Transfer Learning (TL). TL means transferring the "knowledge" (basically, just the weights) from a pre-trained model into your model. There are several types of TL.
1) You transfer all of the weights from a pre-trained model into your model and use that as a starting point to train your network.
This is done in a situation where you now have extra data to train your model but you don't want to start over the training again. Therefore you just load the weights from your previous model and resume the training.
2) You transfer only some of the weights from a pre-trained model into your new model.
This is done in a situation where you have a model trained to classify between, say, 5 classes of objects, and now you want to add or remove a class. You don't have to re-train the whole network from scratch if the new class you're adding has somewhat similar features to (an) existing class(es). Therefore, you build another model with the exact same architecture as your previous model except for the fully-connected layers, where you now have a different output size. In this case, you'll want to load the weights of the convolutional layers from the previous model and freeze them, while re-training only the fully-connected layers.
To perform these in Tensorflow,
1) The first type of TL can be performed by creating a model with the exact same architecture as the previous model, loading the weights using tf.train.Saver().restore(), and continuing the training.
2) The second type of TL can be performed by creating a model with the exact same architecture for the parts where you want to retain the weights, and then specifying the names of the weights you want to load from the previously trained checkpoint. You can set trainable=False on those variables to prevent TensorFlow from updating them, as in the sketch below.
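A minimal sketch of the second type, written in TF1-style graph code to match the tf.train.Saver API above (variable names, shapes, and the checkpoint path are hypothetical; under TF2 the same calls live under tf.compat.v1):

import tensorflow as tf

# Rebuild the parts whose weights you want to keep, frozen via trainable=False.
with tf.variable_scope("conv1"):
    conv_w = tf.get_variable("weights", shape=[3, 3, 3, 64], trainable=False)
# The new, differently-sized output layer stays trainable and is NOT restored.
with tf.variable_scope("fc_new"):
    fc_w = tf.get_variable("weights", shape=[1024, 6])

# Type 1 would simply be tf.train.Saver().restore(sess, ckpt) over an identical graph.
# Type 2: restore only the variables whose names match the old model.
restore_vars = [v for v in tf.global_variables() if v.name.startswith("conv")]
saver = tf.train.Saver(var_list=restore_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # initialize everything first
    saver.restore(sess, "old_model/model.ckpt")   # then overwrite the conv weights
    # ...training from here only updates the trainable (unfrozen) variables...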
I hope this helps.
I'm not sure if this is actually possible with the information I have, but please let me know if that's the case.
From a previously trained Tensorflow model I have the following files:
graph.pbtxt, checkpoint, model.ckpt-10000.data-00000-of-00001, model.ckpt-10000.index, and model.ckpt-10000.meta
I was told that the input size of this model was a Dense layer of size 5000 and the output was a Dense sigmoid binary classification, but I don't know how many/what size layers were in between. (I'm also not 100% positive that the input size is correct).
From this information and associated files, is there a way to replicate the TF model with trained weights into a Keras functional model?
(The idea was that this small dense network was added onto the last FC layer of VGG-16, so that'll be my end goal.)
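One way to recover the missing architecture details, assuming nothing beyond the checkpoint files listed above: list the variable names and shapes stored in the checkpoint, rebuild a Keras model with matching layer sizes, and copy the tensors over with set_weights(). A sketch of the inspection step (the variable name in the last line is only an example; use whatever names the listing actually prints):

import tensorflow as tf

ckpt_path = "model.ckpt-10000"   # the common prefix of the .data/.index files

# Print every variable name and shape stored in the checkpoint; the shapes
# reveal how many layers there are and how large each one is.
for name, shape in tf.train.list_variables(ckpt_path):
    print(name, shape)

# Individual tensors can then be read and copied into the rebuilt Keras layers.
reader = tf.train.load_checkpoint(ckpt_path)
# kernel = reader.get_tensor("dense/kernel")   # example name; take it from the listing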
I am using Keras newsgroup example code for text classification. I have saved the trained model using the h5py library. Will the embedding layer also get saved or should I write some extra code when loading the model to use the embedding layer?
The Embedding layer is part of the model, so it will be saved with the model. Check out this on saving the model.
One important addition: the Keras Embedding layer is initialized with random values at the very start of the training process, and its parameters are then learned during training.
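A quick sketch showing that no extra code is needed (the layer sizes roughly follow the newsgroup example but are placeholders, and the .h5 filename is made up):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(1000,), dtype="int32"),  # padded sequence length
    keras.layers.Embedding(20000, 100),                # vocab size, embedding dim
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(20, activation="softmax"),      # number of newsgroup classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

model.save("newsgroup_model.h5")                       # the HDF5 file includes the embedding matrix
restored = keras.models.load_model("newsgroup_model.h5")

# The embedding weights survive the round trip unchanged.
assert np.allclose(model.layers[0].get_weights()[0],
                   restored.layers[0].get_weights()[0])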