How to access BERT's intermediate layers? - nlp

I want to feed a [batch_size, 768, text_length] tensor into
the 6th layer of BERT.
How can I give input to the 6th layer?
Can I take just the 6th through last layers of BERT and use them?
Thank you.

If you want to use only the last 6 layers of BERT, you can create a new model that only consists of these layers, using the following code:
import torch
from transformers import BertModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertModel.from_pretrained("bert-base-uncased")
# BertModel exposes the encoder directly, so it's model.encoder, not model.bert.encoder
last_6_layers = torch.nn.ModuleList(model.encoder.layer[-6:]).to(device)
This creates a new module list last_6_layers that contains only the last 6 encoder layers of the BERT model. Note that these layers expect hidden states of shape [batch_size, text_length, 768], so a [batch_size, 768, text_length] tensor has to be transposed first. You can then feed your tensor through these layers and use the result for your specific task.
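For example, here is a minimal sketch of pushing a tensor through the extracted layers (the batch size of 8 and sequence length of 32 are made-up values):
import torch

# hypothetical input: batch 8, hidden size 768, sequence length 32
hidden = torch.randn(8, 768, 32).transpose(1, 2).to(device)  # -> [8, 32, 768]

with torch.no_grad():
    for layer in last_6_layers:
        # each BertLayer returns a tuple; the hidden states are its first element
        hidden = layer(hidden)[0]

print(hidden.shape)  # torch.Size([8, 32, 768])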

Related

How to add LSTM layer on top of Huggingface BERT model

I am working on a binary classification task and would like to try adding an LSTM layer on top of the last hidden layer of the Huggingface BERT model; however, I couldn't reach the last hidden layer. Is it possible to combine BERT with an LSTM?
tokenizer = BertTokenizer.from_pretrained(model_path)
train_inputs, train_labels, train_masks = data_prepare_BERT(
    train_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)
validation_inputs, validation_labels, validation_masks = data_prepare_BERT(
    dev_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)
# Load BertForSequenceClassification, the pretrained BERT model with a single linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    model_path, num_labels=len(lab2ind))
Indeed it is possible, but you need to implement it yourself. The BertForSequenceClassification class is a wrapper around BertModel: it runs the model, takes the hidden state corresponding to the [CLS] token, and applies a classifier on top of that.
In your case, you can use that class as a starting point and add an LSTM layer between the BertModel and the classifier. BertModel returns a tuple containing both the per-token hidden states and a pooled state for classification. Just take the tuple member other than the one used in the original class, i.e., the per-token hidden states.
Although it is technically possible, I would not expect any performance gain compared to using BertForSequenceClassification: fine-tuning the Transformer layers can learn anything that an additional LSTM layer is capable of.
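A minimal sketch of that idea (the LSTM hidden size of 256 is an arbitrary placeholder; model_path and lab2ind come from the question):
import torch
from transformers import BertModel

class BertLSTMClassifier(torch.nn.Module):
    def __init__(self, model_path, num_labels, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_path)
        self.lstm = torch.nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                                  batch_first=True, bidirectional=True)
        self.classifier = torch.nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask=None):
        # per-token hidden states (first tuple member), not the pooled [CLS] output
        sequence_output = self.bert(input_ids, attention_mask=attention_mask)[0]
        lstm_out, _ = self.lstm(sequence_output)
        # classify from the last LSTM time step
        return self.classifier(lstm_out[:, -1])

model = BertLSTMClassifier(model_path, num_labels=len(lab2ind))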

ResNet50 and VGG16 for data with 2 channels

Is there any way that I can try to modify ResNet50 and VGG16, where my data (spectrograms) is of the shape (64, 256, 2)?
I understand that I can take some layers out and modify them (output, dense), but I am not really sure about the input channels.
Can anyone suggest a way to accommodate 2 channels in the models? Help is much appreciated!
You can use a different number of channels in the input (and a different height and width), but in that case you cannot use the pretrained ImageNet weights; you have to train from scratch. You can create the models as follows:
from tensorflow import keras # or just import keras
vggnet = keras.applications.vgg16.VGG16(input_shape=(64,256,2), include_top=False, weights=None)
Note the weights=None argument: it means the weights are initialized randomly. If the number of channels were 3, you could use weights='imagenet', but with 2 channels that won't work, so you have to set it to None. The include_top=False argument is there so you can add the final classification layers for your own categories yourself. You could also create vgg19.VGG19 in the same way. For ResNet, you can similarly create it as follows:
resnet = keras.applications.resnet50.ResNet50(input_shape=(64, 256, 2), weights=None, include_top=False)
For other models and versions of vgg and resnet, please check here.
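Since include_top=False leaves the network without a classification head, here is a minimal sketch of attaching one yourself (the 10 output classes are a made-up placeholder):
from tensorflow import keras

# attach a global pooling + softmax head to the headless VGG16 from above
x = keras.layers.GlobalAveragePooling2D()(vggnet.output)
outputs = keras.layers.Dense(10, activation="softmax")(x)  # hypothetical class count
model = keras.Model(vggnet.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])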

Keras Model - Functional API - adding layers to existing model

I am trying to learn to use the Keras Model API for modifying a trained model for the purpose of fine-tuning it on the go:
A very basic model:
inputs = Input((x_train.shape[1:]))
x = BatchNormalization(axis=1)(inputs)
x = Flatten()(x)
outputs = Dense(10, activation='softmax')(x)
model1 = Model(inputs, outputs)
model1.compile(optimizer=Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
The architecture of it is
InputLayer -> BatchNormalization -> Flatten -> Dense
After I run some training batches on it, I want to add an extra Dense layer between the Flatten one and the output:
x = Dense(32,activation='relu')(model1.layers[-2].output)
outputs = model1.layers[-1](x)
However, when I run it, I get this:
ValueError: Input 0 is incompatible with layer dense_1: expected axis -1 of input shape to have value 784 but got shape (None, 32)
Could someone please explain what is going on, and how (or whether) I can add layers to an already trained model?
Thank you
A Dense layer is built strictly for a certain input dimension. That dimension cannot be changed after you define it (it would need a different number of weights).
So, if you really want to add layers before a Dense layer that is already used, you need to make sure that the output of the last new layer has the same shape as the Flatten layer's output. (The error says the old layer expects 784 inputs, so your new last Dense layer needs 784 units.)
Another approach
Since you're adding intermediate layers, it's pointless to keep the last layer: it was trained specifically for a certain input, and if you change the input, you need to train it again.
Well... since you need to train it again anyway, why keep it? Just create a new one that will be suited to the shapes of your new previous layers.
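A minimal sketch of both options, assuming model1 from the question is already trained:
from keras.layers import Dense
from keras.models import Model

# Option 1: keep the trained output layer by matching its expected input (784 units)
x = Dense(784, activation='relu')(model1.layers[-2].output)
outputs = model1.layers[-1](x)
model2 = Model(model1.input, outputs)

# Option 2: drop the old head and create a fresh output layer for the new shape
x = Dense(32, activation='relu')(model1.layers[-2].output)
outputs = Dense(10, activation='softmax')(x)
model3 = Model(model1.input, outputs)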

Is it possible to save a trained layer to use layer on Keras?

I haven't used Keras yet, and I'm deciding whether to use it or not.
I want to save a trained layer to use later. For example:
I train a model.
Then, I gain a trained layer t_layer.
I have another model to train which consists of layer1, layer2, layer3 .
I want to use t_layer as layer2 and not update this layer (i.e. t_layer does not learn any more).
This may be an odd attempt, but I want to try this. Is this possible on Keras?
Yes, it is.
You will probably have to save the layer's weights and biases instead of saving the layer itself, but it's possible.
Keras also allows you to save entire models.
Suppose you have a model in the variable model:
weightsAndBiases = model.layers[i].get_weights()
This is a list of numpy arrays, very probably with two arrays: weights and biases. You can simply use numpy.save() to store these two arrays, and later you can create a similar layer and give it the weights:
import numpy
from keras.layers import *
from keras.models import Model

inp = Input(....)
out1 = SomeKerasLayer(...)(inp)
out2 = AnotherKerasLayer(....)(out1)
....
model = Model(inp, out2)
# above is the usual process of creating a model

# supposing layer 2 is the layer you want (you can also use names)
weights = numpy.load(...path to your saved weights)
biases = numpy.load(... path to your saved biases)
model.layers[2].set_weights([weights, biases])
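For completeness, the saving side of this is just as short (the file names below are hypothetical):
import numpy

# save the two arrays returned by get_weights(); file names are placeholders
weightsAndBiases = model.layers[i].get_weights()
numpy.save("layer_weights.npy", weightsAndBiases[0])
numpy.save("layer_biases.npy", weightsAndBiases[1])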
You can make layers untrainable (must be done before the model compilation):
model.layers[2].trainable = False
Then you compile the model:
model.compile(.....)
And there you go, a model, whose one layer is untrainable and has weights and biases defined by you, taken from somewhere else.
Yes, it is a common practice in transfer learning; see here.
The piece_to_share below can be one or more layers.
piece_to_share = tf.keras.Model(...)
full_model = tf.keras.Sequential([piece_to_share, ...])
full_model.fit(...)
piece_to_share.save(...)
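For instance, a minimal concrete version of that pattern (the layer sizes and input shape are placeholders):
import tensorflow as tf

# hypothetical shared piece; sizes are placeholders
piece_to_share = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
])
piece_to_share.trainable = False  # freeze it, like t_layer in the question

full_model = tf.keras.Sequential([
    piece_to_share,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
full_model.compile(optimizer="adam", loss="binary_crossentropy")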

Keras, siamese network, how to abstract feature?

I'm trying to modify the Keras Siamese Network example to extract image features.
The problem is: how can I get the image features? The output of the last layer is only a number. What should I do to get the features before euclidean_distance is applied?
You can first train the model on the entire dataset and save it.
Then load the model back and set the output layers to be processed_a and processed_b.
Now call model.predict() on the entire dataset once again, and you'll have the features for each image in the dataset.
Have a look at this
Hope this helps!
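A minimal sketch of that procedure (input_a, input_b, processed_a, processed_b, and tr_pairs are names from the Keras siamese example and are assumed to be in scope after training):
from keras.models import Model

# a feature extractor exposing the shared embeddings instead of the distance
feature_model = Model(inputs=[input_a, input_b],
                      outputs=[processed_a, processed_b])
feats_a, feats_b = feature_model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])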
To get the embeddings from the Keras siamese network MNIST example after training:
model_a = Model(inputs=model.input, outputs=processed_a)
model_a.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
I did it as follows (reference from my github post):
My trained siamese model's architecture can be inspected with:
siamese_model.summary()
I then redefined the model I wanted to use for extracting embeddings. It should be the same sub-network you defined for one branch of the siamese model, except it will not have the multiple inputs:
siamese_embeddings_model = build_siamese_model(input_shape)
siamese_embeddings_model.summary()
Note that this newly redefined model is basically the same as the shared sub-network inside the trained siamese model (highlighted in yellow in the summary in the github post).
Then I just extracted the weights from my trained siamese model and set them in my new model:
embeddings_weights = siamese_model.layers[-3].get_weights()
siamese_embeddings_model.set_weights(embeddings_weights)
Then you can supply a new image to the new model to extract the embeddings:
vector = siamese_embeddings_model.predict(image)
len(vector[0]) will print 150 because of my final dense layer (which produces the output vector).
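Note that predict expects a batch dimension, so a single image has to be wrapped first, for example (the variable single_image is hypothetical):
import numpy as np

image = np.expand_dims(single_image, axis=0)  # add batch dim: [1, height, width, channels]
vector = siamese_embeddings_model.predict(image)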
