I have a problem with training a transformer model. I am building a new BERT model from scratch and pre-training it on a 105 GiB corpus of Arabic text, using the same hyperparameters as the original BERT. The loss frequently explodes, growing larger and larger as training proceeds, as shown in the picture. Is there any interpretation of, and solution for, this problem?
Picture: [exploding loss](https://i.stack.imgur.com/Q8qjU.jpg)
I am using a transformer model that extends a Hugging Face model (DNABERT). It is a pretrained classification model whose output I would like to convert to regression, and then fine-tune on my own data. I imagine this process would be roughly the same for any BERT-based Hugging Face classification model. How would I go about doing this?
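A minimal sketch of one way to do this with the transformers library (the checkpoint path is a placeholder, and it assumes DNABERT uses the standard sequence-classification head; details may differ):

from transformers import AutoModelForSequenceClassification

# Placeholder path: point this at the actual DNABERT checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/dnabert-checkpoint",
    num_labels=1,                  # a single continuous output
    problem_type="regression",     # fine-tune with MSE loss instead of cross-entropy
    ignore_mismatched_sizes=True,  # re-initialize the old classification head
)

With num_labels=1 the head produces one scalar per example, and problem_type="regression" makes the model compute an MSE loss when labels are passed during fine-tuning.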
I am working on a binary classification task and would like to try adding an LSTM layer on top of the last hidden layer of a Hugging Face BERT model; however, I couldn't reach the last hidden layer. Is it possible to combine BERT with an LSTM?
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained(model_path)

# data_prepare_BERT tokenizes the files and builds input IDs, labels, and attention masks.
train_inputs, train_labels, train_masks = data_prepare_BERT(
    train_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)
validation_inputs, validation_labels, validation_masks = data_prepare_BERT(
    dev_file, lab2ind, tokenizer, content_col, label_col, max_seq_length)

# Load BertForSequenceClassification: the pretrained BERT model with a
# single linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    model_path, num_labels=len(lab2ind))
Indeed it is possible, but you need to implement it yourself. The BertForSequenceClassification class is a wrapper around BertModel: it runs the model, takes the hidden state corresponding to the [CLS] token, and applies a classifier on top of that.
In your case, you can use that class as a starting point and add an LSTM layer between the BertModel and the classifier. BertModel returns a tuple containing both the per-token hidden states and a pooled state for classification; just take the tuple member other than the one used in the original class.
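A minimal sketch of such a module (the class name, the mean-pooling, and the single bidirectional LSTM layer are illustrative choices, not a fixed recipe):

import torch
from torch import nn
from transformers import BertModel

class BertLSTMClassifier(nn.Module):
    def __init__(self, model_path, num_labels, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_path)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Take the per-token hidden states (outputs[0]), not the pooled [CLS] state.
        sequence_output = outputs[0]
        lstm_out, _ = self.lstm(sequence_output)
        # Mean-pool the LSTM outputs over the sequence before classifying.
        return self.classifier(lstm_out.mean(dim=1))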
Although it is technically possible, I would not expect any performance gain compared to using BertForSequenceClassification. Fine-tuning the Transformer layers can learn anything that an additional LSTM layer is capable of.
How do I use a pre-trained BERT model like bert-base-uncased as weights in the Embedding layer in Keras?
Currently, I am generating word embeddings using a BERT model, and it takes a lot of time. I am assigning those weights as in the code shown below:
model.add(Embedding(307200, 1536, input_length=1536, weights=[embeddings]))
I searched on the internet, but the methods I found are given in PyTorch. I need to do it in Keras. Please help.
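For reference, one common pattern (a sketch, not from the question itself) is to wire the TF version of BERT into a Keras model directly rather than copying weights into a static Embedding layer, since BERT embeddings are contextual and cannot be reproduced by a fixed embedding matrix:

import tensorflow as tf
from transformers import TFBertModel

# Sketch: use BERT itself as a Keras layer (the sequence length 128 is arbitrary).
bert = TFBertModel.from_pretrained("bert-base-uncased")
input_ids = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="attention_mask")
sequence_output = bert(input_ids, attention_mask=attention_mask)[0]
cls_state = sequence_output[:, 0, :]  # hidden state at the [CLS] position
output = tf.keras.layers.Dense(1, activation="sigmoid")(cls_state)
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)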
I would like to fine-tune BERT for a specific domain on unlabeled data and use the output layer to check the similarity between texts. How can I do it? Do I need to first fine-tune on a classification task (or question answering, etc.) and then get the embeddings? Or can I just take a pre-trained BERT model without any task and fine-tune it on my own data?
There is no need to fine-tune for classification, especially if you do not have any supervised classification dataset.
You should continue training BERT in the same unsupervised way it was originally trained, i.e., continue "pre-training" with the masked-language-modeling objective and next sentence prediction. Hugging Face's implementation contains the class BertForPreTraining for this.
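A minimal sketch of a single pre-training step with that class (the example sentence pair and the hand-placed mask are illustrative; real pre-training masks about 15% of tokens at random, sets the unmasked label positions to -100 so they are ignored by the loss, and samples negative sentence pairs for NSP):

import torch
from transformers import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

# A sentence pair for next sentence prediction, with one token masked for MLM.
encoding = tokenizer("The cat sat on the mat.", "It fell asleep there.",
                     return_tensors="pt")
labels = encoding["input_ids"].clone()      # MLM targets: the original token IDs
input_ids = encoding["input_ids"].clone()
input_ids[0, 2] = tokenizer.mask_token_id   # mask one token by hand (illustrative)

outputs = model(input_ids=input_ids,
                token_type_ids=encoding["token_type_ids"],
                attention_mask=encoding["attention_mask"],
                labels=labels,
                next_sentence_label=torch.tensor([0]))  # 0 = sentence B follows A
outputs.loss.backward()  # combined MLM + NSP loss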