How to change the max sequence length for transformers.bert? (PyTorch)

I downloaded the bert-base pretrained model and edited its config.json, changing max_position_embeddings from 512 to 256:
"max_position_embeddings": 256,
Then I try to load the model:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    MODEL_PATH,
    num_labels=2,               # the number of output labels: 2 for binary classification
    output_attentions=False,
    output_hidden_states=False,
)
# Tell pytorch to run this model on the GPU.
model.cuda()
But it raises an error:
Error(s) in loading state_dict for BertForSequenceClassification:
size mismatch for bert.embeddings.position_embeddings.weight: copying a param with shape torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([256, 768]).
I know this is because I changed the max sequence length. What is the right way to change the max sequence length?

The error says that the saved weights cannot be loaded into the initialized model because the layer shapes differ.
If you want to fine-tune the model on a downstream task, you cannot change the pretrained model's config. Instead, set max_length in the encode_plus function; it will truncate the input sequence to max_length.
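For example, a minimal sketch (MODEL_PATH and text stand in for your own checkpoint path and input string):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
encoded = tokenizer.encode_plus(
    text,
    max_length=256,           # truncate here instead of editing the config
    truncation=True,
    padding='max_length',
    return_tensors='pt',
)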
But if you want to pretrain a model with a specific config, you should initialize the model with no weights, or look for appropriate weights in the Hugging Face repository.
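For that second case, a sketch of initializing from a modified config (randomly initialized weights, assuming you really want the smaller position-embedding table):

from transformers import BertConfig, BertForSequenceClassification

config = BertConfig.from_pretrained(MODEL_PATH, max_position_embeddings=256)
model = BertForSequenceClassification(config)  # random init, no pretrained weights loaded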

Related

compilation step in keras sequential model throwing the error "ValueError: Input 0 of layer sequential_9 is incompatible with the layer"

I'm trying to develop a classifier for two classes. I've implemented the model as follows:
model = keras.models.Sequential()  # using tensorflow's version of keras
model.add(keras.layers.InputLayer(input_shape=X_train_scaled[:,1].shape))
model.add(keras.layers.Dense(250, activation="relu"))
model.add(keras.layers.Dense(50, activation="relu"))
model.add(keras.layers.Dense(2, activation="softmax"))
model.summary()

# Compile the model
model.compile(loss='sparse_categorical_crossentropy',
              optimizer="sgd",
              metrics=["accuracy"])
The shapes of the inputs are:
X_train_scaled[:,1].shape, y_train.shape
((552,), (552,))
The entire error message is:
ValueError: Input 0 of layer sequential_9 is incompatible with the layer:
expected axis -1 of input shape to have value 552 but received input with shape (None, 1)
What am I doing wrong here?
The error message says that you defined a model which expects as input a shape of (batch_size, 552) and you are trying to feed it an array with a shape of (batch_size, 1).
The issue is most likely with
input_shape = X_train_scaled[:,1].shape)
This should most likely be:
input_shape = X_train_scaled.shape[1:]
i.e. you want to define the shape of your model to be the shape of the features (without the number of examples). The model is then fed in mini-batches: e.g. if you call model.fit(X_train_scaled, ...), keras creates mini-batches (of 32 examples by default) and updates the model weights for each mini-batch.
Also, be aware that the model returns a shape of (batch_size, 2). Since you compiled with sparse_categorical_crossentropy, y_train should contain integer class labels with shape (X_train.shape[0],); one-hot labels of shape (X_train.shape[0], 2) would require categorical_crossentropy instead.
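Putting it together, a minimal corrected sketch (assuming X_train_scaled has shape (552, n_features) and y_train holds integer class labels):

model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=X_train_scaled.shape[1:]))
model.add(keras.layers.Dense(250, activation="relu"))
model.add(keras.layers.Dense(50, activation="relu"))
model.add(keras.layers.Dense(2, activation="softmax"))
model.compile(loss='sparse_categorical_crossentropy', optimizer="sgd", metrics=["accuracy"])
model.fit(X_train_scaled, y_train)  # y_train: integer labels, shape (552,)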
The question was answered by Pedro

ValueError: Layer conv2d_41 was called with an input that isn't a symbolic tensor. All inputs to the layer should be tensors

I am trying transfer learning with a custom input to the backbone.
(I cannot do transfer learning the usual way because my input shape is N*N*8, so I need to add a small network_1 to get down to N*N*3.)
model_1
   |
   v
model_2
   |
   v
some added layers
My code (model_1 is my small network; model_2 is MobileNet, VGG16, DenseNet, ...):
model_1 = Sequential()
model_1.add(InputLayer(input_shape=(size, size, F), name="InputLayer"))
model_1.add(Convolution2D(3, 128, padding = 'same'))
from keras.applications.densenet import DenseNet169
model_2=DenseNet169(weights='imagenet',include_top=False)
model_2.layers.pop(0) # remove input_layer of model_2
model_1.add(model_2) # output model_1 is input model_2?
model_1 = GlobalAveragePooling2D()(model_1)
model_1 = Dropout(0.2)(model_1)
model_1 = Dense(256*256, activation='softmax')(model_1)
model_1 = Reshape(256, 256)(model_1)
I got errors:
ValueError: Layer global_average_pooling2d_3 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.engine.sequential.Sequential'>. Full input: [<keras.engine.sequential.Sequential object at 0x7f74f6621d68>]. All inputs to the layer should be tensors.
What is wrong in my code?
GlobalAveragePooling2D is a layer that performs an operation on a tensor (a multi-dimensional array). Passing it a model object like DenseNet therefore throws an error: a Model is not a tensor, so the layer has no idea what it's looking at.
To achieve what I think you're trying to achieve, call DenseNet on a tensor and pass the resulting output tensor into the next layer, instead of passing the model object itself. Good luck!
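A minimal sketch of that idea using the functional API (size and F are assumed defined; the 1x1 convolution is just one way to map F channels down to the 3 DenseNet expects):

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dropout, Dense, Reshape
from keras.models import Model
from keras.applications.densenet import DenseNet169

inputs = Input(shape=(size, size, F))
x = Conv2D(3, (1, 1), padding='same')(inputs)  # reduce F channels to 3
base = DenseNet169(weights='imagenet', include_top=False)
x = base(x)                                    # a Model can be called on a tensor
x = GlobalAveragePooling2D()(x)
x = Dropout(0.2)(x)
x = Dense(256 * 256, activation='softmax')(x)
outputs = Reshape((256, 256))(x)               # Reshape takes a tuple
model = Model(inputs=inputs, outputs=outputs)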

How to extract features from a layer of the pretrained ResNet model Keras

I trained a model with Resnet3D and I want to extract the outputs of a layer's neurons. I plan to use them with an SVM classifier. How can I extract these features and put them into a numpy array?
Load the weights with Keras:
model = Resnet3DBuilder.build_resnet_18((128, 96, 96, 3), nClass[0])
model.load_weights('drive/app/models/3d_resnet_modelq.hdf5')
Extract a layer:
dns = model.layers[-1].output
Now what should I do?
If you just want to visualise the features, in pure Keras you can define a Model with the desired layer as output:
from keras.models import Model

model_cut = Model(inputs=model.inputs, outputs=model.layers[-1].output)
features = model_cut.predict(x)  # assuming you have your images in x
Note that in order for this to work, model must have been compiled at least once.
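From there, feeding the extracted features to an SVM could look like this (a sketch assuming scikit-learn and a label array y; the reshape is only needed if the layer output is multi-dimensional):

from sklearn.svm import SVC

features = features.reshape(len(features), -1)  # flatten each sample's features
clf = SVC(kernel='rbf')
clf.fit(features, y)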

Modify ResNet50 output layer for regression

I am trying to create a ResNet50 model for a regression problem, with an output value ranging from -1 to 1.
I omitted the classes argument, and in my preprocessing step I resize my images to (224, 224, 3).
I try to create the model with
def create_resnet(load_pretrained=False):
    if load_pretrained:
        weights = 'imagenet'
    else:
        weights = None

    # Get base model
    base_model = ResNet50(weights=weights)

    optimizer = Adam(lr=1e-3)
    base_model.compile(loss='mse', optimizer=optimizer)

    return base_model
and then create the model, print the summary, and train with fit_generator:
history = model.fit_generator(batch_generator(X_train, y_train, 100, 1),
                              steps_per_epoch=300,
                              epochs=10,
                              validation_data=batch_generator(X_valid, y_valid, 100, 0),
                              validation_steps=200,
                              verbose=1,
                              shuffle=1)
I get an error though that says
ValueError: Error when checking target: expected fc1000 to have shape (1000,) but got array with shape (1,)
Looking at the model summary, this makes sense, since the final Dense layer has an output shape of (None, 1000)
fc1000 (Dense) (None, 1000) 2049000 avg_pool[0][0]
But I can't figure out how to modify the model. I've read through the Keras documentation and looked at several examples, but pretty much everything I see is for a classification model.
How can I modify the model so it is formatted properly for regression?
Your code throws the error because you're using the original fully-connected top layer, which was trained to classify images into one of 1000 classes. To make the network work, you need to replace this top layer with your own, with a shape compatible with your dataset and task.
Here is a small snippet I was using to create an ImageNet pre-trained model for the regression task (face landmarks prediction) with Keras:
NUM_OF_LANDMARKS = 136

def create_model(input_shape, top='flatten'):
    if top not in ('flatten', 'avg', 'max'):
        raise ValueError('unexpected top layer type: %s' % top)

    # connects base model with new "head"
    BottleneckLayer = {
        'flatten': Flatten(),
        'avg': GlobalAveragePooling2D(),
        'max': GlobalMaxPooling2D()
    }[top]

    base = InceptionResNetV2(input_shape=input_shape,
                             include_top=False,
                             weights='imagenet')

    x = BottleneckLayer(base.output)
    x = Dense(NUM_OF_LANDMARKS, activation='linear')(x)

    model = Model(inputs=base.inputs, outputs=x)
    return model
In your case, I guess you only need to replace InceptionResNetV2 with ResNet50. Essentially, you are creating a pre-trained model without top layers:
base = ResNet50(input_shape=input_shape, include_top=False)
And then attaching your custom layer on top of it:
x = Flatten()(base.output)
x = Dense(NUM_OF_LANDMARKS, activation='sigmoid')(x)
model = Model(inputs=base.inputs, outputs=x)
That's it. Note that sigmoid is bounded to (0, 1); for your target range of -1 to 1, a tanh activation would be the natural fit.
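Adapted to your case, a minimal sketch with a single regression output (the (224, 224, 3) input matches your preprocessing):

base = ResNet50(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
x = Flatten()(base.output)
x = Dense(1, activation='tanh')(x)  # tanh keeps the output in (-1, 1)
model = Model(inputs=base.inputs, outputs=x)
model.compile(loss='mse', optimizer=Adam(lr=1e-3))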
You also can check this link from the Keras repository that shows how ResNet50 is constructed internally. I believe it will give you some insights about the functional API and layers replacement.
Also, I would say that both regression and classification tasks are not that different if we're talking about fine-tuning pre-trained ImageNet models. The type of task mostly depends on your loss function and the top layer's activation function. Otherwise, you still have a fully-connected layer with N outputs but they are interpreted in a different way.

Embedding vs inserting word vectors directly to input layer

I used gensim to build a word2vec embedding of my corpus.
Currently I convert my (padded) input sentences to word vectors using the gensim model.
These vectors are used as input for the model:
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(MAX_SEQUENCE_LENGTH, dim)))
model.add(Bidirectional(
    LSTM(num_lstm, dropout=0.5, recurrent_dropout=0.4, return_sequences=True)
))
...
model.fit(training_sentences_vectors, training_labels, validation_data=validation_data)
Are there any drawbacks to using the word vectors directly, without a keras embedding layer?
I'm also currently adding extra (one-hot encoded) tags to the input tokens by concatenating them to each word vector; does this approach make sense?
In your current setup, the drawback is that your word vectors are not trainable, so you cannot fine-tune them for your task.
What I mean by this is that gensim has only learned a "language model": it understands your corpus and its contents, but it does not know how to optimize for whatever downstream task you are using keras for. The rest of your model's weights will adapt, but you will likely see a performance increase if you extract the embeddings from gensim, use them to initialize a keras Embedding layer, and then pass in indexes instead of word vectors at the input layer.
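For instance, a sketch of that initialization (assuming w2v is a trained gensim Word2Vec model):

import numpy as np
from keras.layers import Embedding

weights = np.array([w2v.wv[word] for word in w2v.wv.index2word])
embedding = Embedding(input_dim=weights.shape[0],   # vocabulary size
                      output_dim=weights.shape[1],  # vector dimension
                      weights=[weights],
                      trainable=True)               # fine-tune during training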
There's an elegant way to do what you need.
The problems with your solution are:
1. The input is large, (batch_size, MAX_SEQUENCE_LENGTH, dim), and may not fit in memory.
2. You won't be able to train and update the word vectors for your task.
You can instead get away with just (batch_size, MAX_SEQUENCE_LENGTH). The keras Embedding layer lets you pass in a word index and get back a vector: 42 -> Embedding layer -> [3, 5.2, ..., 33].
Conveniently, gensim's w2v model has a function get_keras_embedding which creates the needed embedding layer for you with the trained weights.
gensim_model = ...  # train it or load it
embedding_layer = gensim_model.wv.get_keras_embedding(train_embeddings=True)
embedding_layer.mask_zero = True  # no need for a separate Masking layer

model = Sequential()
model.add(embedding_layer)  # your embedding layer
model.add(Bidirectional(
    LSTM(num_lstm, dropout=0.5, recurrent_dropout=0.4, return_sequences=True)
))
But, you have to make sure the index for a word in the data is the same as the index for the word2vec model.
word2index = {}
for index, word in enumerate(gensim_model.wv.index2word):
    word2index[word] = index
Use the above word2index dictionary to convert your input data to have the same index as the gensim model.
For example, your data might be:
X_train = [["hello", "there"], ["General", "Kenobi"]]
new_X_train = []
for sent in X_train:
temp_sent = []
for word in sent:
temp_sent.append(word2index[word])
# Add the padding for each sentence. Here I am padding with 0
temp_sent += [0] * (MAX_SEQUENCE_LENGTH - len(temp_sent))
new_X_train.append(temp_sent)
X_train = numpy.as_array(new_X_train)
Now you can use X_train and it will be like: [[23, 34, 0, 0], [21, 63, 0, 0]]
The Embedding Layer will map the index to that vector automatically and train it if needed.
I think this is the best way of doing it but I'll dig into how gensim wants it to be done and update this post if needed.
