Backpropagation in BERT - NLP

I would like to know: when people say "pre-trained BERT model", is it only the final classification neural network that is trained?
Or
Is the transformer itself also updated through backpropagation, along with the classification neural network?

During pre-training, the whole model is trained (all weights are updated). Moreover, BERT is pre-trained on a masked language model objective, not a classification objective.
In pre-training, you usually train a model on a huge amount of generic data. It therefore has to be fine-tuned with task-specific data and a task-specific objective.
So, if your task is classification on a dataset X, you fine-tune BERT accordingly by adding a task-specific layer (a classification layer; in BERT, a dense layer over the [CLS] token). While fine-tuning, you update the pre-trained model weights as well as the new task-specific layer, as in the sketch below.
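For concreteness, here is a minimal fine-tuning sketch with the Hugging Face `transformers` library (the dummy batch and the hyperparameters are placeholders). Passing `model.parameters()` to the optimizer means backpropagation updates the whole pre-trained encoder, not just the new classification head.

```python
# Minimal fine-tuning sketch; batch contents and hyperparameters are placeholders.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# model.parameters() contains BOTH the pre-trained transformer encoder weights
# and the freshly initialized classification head, so all of them are updated.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative training step with dummy inputs.
input_ids = torch.randint(0, model.config.vocab_size, (8, 64))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 2, (8,))

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()  # gradients flow through the entire transformer
optimizer.step()
optimizer.zero_grad()
```

If you only wanted to train the classification head, you would have to freeze the encoder explicitly (e.g. set `requires_grad = False` on `model.bert.parameters()`); by default everything is trainable.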

Related

NLP, Pre-trained models, BERT

I have a problem with training a transformer model. I am building a new transformer model, and it is training on a 105 GiB corpus of Arabic text. The hyperparameters are the same as BERT's, since the model is BERT. The loss often explodes, and its value keeps getting larger and larger, as shown in the picture. Is there any interpretation of, and solution for, this problem?
Picture: [exploding loss](https://i.stack.imgur.com/Q8qjU.jpg)

How to convert the output of pretrained Huggingface transformer model from classification to regression for fine-tuning on my data?

I am using a transformer model that was extended from Huggingface (DNABERT). This is a pretrained classification model whose output I would like to convert to regression, and then fine-tune the model on my own data. I imagine this process would be roughly the same for any BERT-based Huggingface classification model. How would I go about doing this?
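A common approach with a standard Hugging Face sequence-classification checkpoint is to reload it with a single-output regression head, as sketched below; the checkpoint name is a placeholder, and DNABERT's own wrapper code may need its own loading path.

```python
# Sketch only: "my-dnabert-checkpoint" is a placeholder name.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-dnabert-checkpoint")

# num_labels=1 together with problem_type="regression" gives a single-output
# head trained with MSE loss; ignore_mismatched_sizes discards the old
# classifier weights and initializes a fresh head.
model = AutoModelForSequenceClassification.from_pretrained(
    "my-dnabert-checkpoint",
    num_labels=1,
    problem_type="regression",
    ignore_mismatched_sizes=True,
)
# Fine-tune as usual; labels should now be float targets instead of class ids.
```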

How to add LSTM layer on top of Huggingface BERT model

I am working on a binary classification task and would like to try adding an LSTM layer on top of the last hidden layer of the Huggingface BERT model; however, I couldn't get at the last hidden layer. Is it possible to combine BERT with an LSTM?
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained(model_path)
train_inputs, train_labels, train_masks = data_prepare_BERT(
    train_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)
validation_inputs, validation_labels, validation_masks = data_prepare_BERT(
    dev_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)

# Load BertForSequenceClassification, the pretrained BERT model
# with a single linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    model_path, num_labels=len(lab2ind))
Indeed it is possible, but you need to implement it yourself. The BertForSequenceClassification class is a wrapper for BertModel. It runs the model, takes the hidden state corresponding to the [CLS] token, and applies a classifier on top of that.
In your case, you can use that class as a starting point and add an LSTM layer between the BertModel and the classifier. BertModel returns both the per-token hidden states and a pooled state for classification in a tuple; just take the other tuple member than the one used in the original class.
Although it is technically possible, I would not expect any performance gain compared to using BertForSequenceClassification: fine-tuning the Transformer layers can learn anything an additional LSTM layer is capable of.
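A rough sketch of that idea, assuming the Hugging Face `transformers` library (the class name, layer sizes, and pooling choice are illustrative):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertLSTMClassifier(nn.Module):
    def __init__(self, model_path, num_labels, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_path)
        self.lstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask=None):
        # last_hidden_state holds the per-token hidden states of shape
        # (batch, seq_len, hidden_size) -- the "other tuple member",
        # as opposed to the pooled [CLS] output.
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        lstm_out, _ = self.lstm(outputs.last_hidden_state)
        # Classify from the LSTM state at the first ([CLS]) position.
        return self.classifier(lstm_out[:, 0, :])
```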

How to use BERT pre-trained model in Keras Embedding layer

How do I use a pre-trained BERT model like bert-base-uncased as weights in the Embedding layer in Keras?
Currently, I am generating word embeddings using the BERT model, and it takes a lot of time. I am assigning those weights as in the code shown below:
model.add(Embedding(307200, 1536, input_length=1536, weights=[embeddings]))
I searched on the internet, but the methods I found are given in PyTorch. I need to do it in Keras. Please help.
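One common workaround is to skip the static Embedding layer entirely and use the TensorFlow version of BERT from `transformers` as a Keras layer, so the contextual embeddings are computed inside the model rather than precomputed. A minimal sketch, with illustrative shapes and a placeholder downstream head:

```python
import tensorflow as tf
from transformers import TFBertModel

max_len = 128
bert = TFBertModel.from_pretrained("bert-base-uncased")

input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

# The BERT encoder plays the role of the Embedding layer: it produces
# contextual embeddings of shape (batch, max_len, hidden_size) on the fly.
sequence_output = bert(input_ids, attention_mask=attention_mask).last_hidden_state

x = tf.keras.layers.GlobalAveragePooling1D()(sequence_output)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5), loss="binary_crossentropy")
```

Because BERT embeddings are contextual, they do not form a fixed (vocab_size, dim) weight matrix, which is why they cannot simply be dropped into `Embedding(weights=[...])`.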

Unsupervised fine-tuning of BERT for embeddings only?

I would like to fine-tune BERT for a specific domain on unlabeled data and use the output layer to check the similarity between texts. How can I do it? Do I need to first fine-tune on a classification task (or question answering, etc.) and then get the embeddings? Or can I just take a pre-trained BERT model without any task head and fine-tune it with my own data?
There is no need to fine-tune for classification, especially if you do not have any supervised classification dataset.
You should continue training BERT in the same unsupervised way it was originally trained, i.e., continue "pre-training" with the masked language model objective and next sentence prediction. Huggingface's implementation contains the class BertForPreTraining for this.
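A condensed sketch of such continued pre-training, using only the masked-language-model part (`BertForMaskedLM` with `DataCollatorForLanguageModeling`) rather than the full `BertForPreTraining` head; the corpus path and hyperparameters are placeholders:

```python
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Plain-text file with one document per line (placeholder path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Randomly masks 15% of the tokens, as in the original MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-adapted", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

Afterwards, load the saved encoder with `BertModel.from_pretrained` pointing at the saved checkpoint directory, and use, e.g., mean-pooled last hidden states as the embeddings for your similarity comparison.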
