Finetune Text embeddings using BERT? - nlp

are the text emebddings also fine-tuned when fine-tuning for classification task? Or up to which layer are the encodings fine-tuned (sencond last layer)?

If you are using the original BERT repository published by Google, all layers are trainable; meaning: no freezing at all. You can check that by printing tf.trainable_variables().

Related

Can I fine-tune BERT using only masked language model and next sentence prediction?

So if I understand correctly there are mainly two ways to adapt BERT to a specific task: fine-tuning (all weights are changed, even pretrained ones) and feature-based (pretrained weights are frozen). However, I am confused.
When to use which one? If you have unlabeled data (unsupervised learning), should you then use fine-tuning?
If I want to fine-tuned BERT, isn't the only option to do that using masked language model and next sentence prediction? And also: is it necessary to put another layer of neural network on top?
Thank you.
Your first approach should be to try the pre-trained weights. Generally it works well. However if you are working on a different domain (e.g.: Medicine), then you'll need to fine-tune on data from new domain. Again you might be able to find pre-trained models on the domains (e.g.: BioBERT).
For adding layer, there are slightly different approaches depending on your task. E.g.: For question-answering, have a look at TANDA paper (Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection). It is a very nice easily readable paper which explains the transfer and adaptation strategy. Again, hugging-face has modified and pre-trained models for most of the standard tasks.

Unsupervised finetuning of BERT for embeddings only?

I would like to fine-tuning BERT for a specific domain on unlabeled data and get the output layer to check the similarity between them. How can I do it? Do I need to fine-tuning first a classifier task (or question answer, etc..) and get the embeddings? Or can I just use a pre-trained Bert model without task and fine-tuning with my own data?
There is no need to fine-tune for classification, especially if you do not have any supervised classification dataset.
You should continue training BERT the same unsupervised way it was originally trained, i.e., continue "pre-training" using the masked-language-model objective and next sentence prediction. Hugginface's implementation contains class BertForPretraining for this.

CNN with CTC loss

I want to extract features using a pretrained CNN model(ResNet50, VGG, etc) and use the features with a CTC loss function.
I want to build it as a text recognition model.
Anyone on how can i achieve this ?
I'm not sure if you are looking to finetune the pretrained models or to use the models for feature extraction. To do the latter freeze the petrained model weights (there are several ways to do this in PyTorch, the simplest being calling .eval() on the model), and feed the logits from the last layer of the model to your new output head. See the PyTorch tutorial here for a more in depth guide.

convolutional neural network text classification additional features

I am trying to implement CNN on text classification task. I understand that CNN can extract and abstract features from pure text.
What if I have some additional very useful features that not in the text? How should I add those features into the CNN?
Currently, what I am doing is concatenating the convolution layer results with an additional feature vector. And then feed them to the hidden layers. Is this a right way to do it?
Thanks!

Train some embeddings, keep others fixed

I do sequence classification with Keras, using an RNN and embeddings. My sequences are a bit weird. I have words mixed with special symbols. Words are associated with fixed, pre-trained embeddings, but the special symbol embeddings have to be modified during training.
In an Embedding layer during learning, how can I keep some embeddings fixed while updating others? Is there a way to mask those indices which shouldn't be modified? Or is this a case for a custom Embedding layer?
I do not believe that this is achievable with the existing Embedding layer. To get around it I would just create a custom layer that builds two embedding layers internally, and only puts the embedding matrix of one of them into the trainable_parameters.

Resources