Accuracy not increasing with BERT Large model - nlp

I used both the BERT_base_cased and BERT_large_cased models for multi-class text classification. With BERT_base_cased, I got satisfactory results. When I tried the BERT_large_cased model, the accuracy was the same for all epochs.
With BERT_base_cased, there is no such problem. But with BERT_large_cased, why is the accuracy the same in every epoch? Any help is really appreciated.

Related

NLP, Pre-trained models, BERT

I have a problem training a transformer model. I am building a new transformer model, and it is currently training on a 105 GiB corpus of Arabic text. The hyperparameters are the same as BERT's, since the model is BERT. The loss often explodes, growing larger and larger, as shown in the picture. Is there an explanation and a solution for this problem?
Picture: exploding loss (https://i.stack.imgur.com/Q8qjU.jpg)
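One commonly used mitigation for a loss that blows up is to clip gradients by their global norm before each optimizer step. The question does not say whether clipping was already enabled, so this is only a minimal sketch of the idea in plain Python; the function name, the example gradients, and the threshold of 1.0 are all hypothetical.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient vectors so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total == 0.0 or total <= max_norm:
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total

# Hypothetical gradients for two parameter tensors, flattened to lists.
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Logging the norm before clipping at every step can also show whether the spikes in the loss line up with spikes in the gradient norm.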

Training accuracy is high, but prediction accuracy on the training set is low

I am using Keras 2.2.4 to train a text-based emotion recognition model with three classes.
I passed accuracy as a metric when compiling the model. During training, the accuracy shown in the progress bar looked normal, around 86%, and the loss was around 0.35, so I believed it was working properly.
After training, I found that predictions on the test set were pretty bad: accuracy was only around 0.44. My first instinct was that the model was overfitting. However, out of curiosity, I also ran prediction on the training set, and the accuracy was just as bad as on the test set.
This suggests it might not be an overfitting issue, and I cannot come up with any reason why this happens. The gap between the training-set accuracy reported during training and the training-set accuracy measured after training is even more confusing.
Has anyone encountered the same situation, and what might the problem be?
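One way to narrow this down is to recompute accuracy by hand from the model's predictions on the training set, so that the number can be compared directly with the figure reported during training. A minimal sketch, assuming the prediction step returns one row of class probabilities per sample and that the labels are integer class ids in the same order (one-hot labels would need an argmax first); the helper name and the data are hypothetical.

```python
def accuracy_from_probs(probs, labels):
    """Fraction of rows whose highest-probability class equals the integer label."""
    correct = 0
    for row, label in zip(probs, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)

# Hypothetical softmax outputs for a three-class problem.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]  # integer class ids, same order as probs
acc = accuracy_from_probs(probs, labels)
```

If the hand-computed value differs sharply from the training-time metric, things worth checking include whether the inputs pass through the same preprocessing in both paths and whether the label encoding and sample order match.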

Would training a BERT Multi-Label Classifier for 100 labels decrease accuracy a lot?

I am trying to train a text classifier that can classify a sentence as being of a certain query type. I used the BERT model and trained a multi-label classifier, which does the job with 90% accuracy for about 20 labels.
My question is: if I have to train the model for 100 or 200 labels, would the accuracy be severely impacted?
If your class distributions do not overlap much and you have a good amount of training data representing each class, your accuracy should not be severely impacted. For a data-hungry model like BERT, it's all about data. If you have a large amount of data representing your 100 or 200 classes, you are good to go.
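A quick way to act on the advice above is to count how many training examples carry each label before scaling up to 100 or 200 labels. The sketch below assumes each example comes with a list of label names, as in a multi-label setup; the function names, the sample labels, and the threshold are hypothetical, and what counts as "enough" data per label depends on the task.

```python
from collections import Counter

def label_counts(examples):
    """Count how many examples carry each label (one example may carry several)."""
    counts = Counter()
    for labels in examples:
        counts.update(labels)
    return counts

def underrepresented(examples, min_count):
    """Labels that appear in fewer than min_count examples, sorted by name."""
    counts = label_counts(examples)
    return sorted(label for label, n in counts.items() if n < min_count)

# Hypothetical label lists, one per training sentence.
examples = [
    ["billing"],
    ["billing", "refund"],
    ["refund"],
    ["greeting"],
    ["billing"],
]
rare = underrepresented(examples, min_count=2)
```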

Good Accuracy + Low Val_loss but very bad predictions

Going straight to the problem...
I am using Keras flow_from_directory to load the data for sound classification. The data generator uses no augmentation and shuffle=True. Although most of my models have very good accuracy (92%) and a small val_loss, the confusion matrix shows that the model is not predicting the labels correctly.
I have tried simple and complex models with Keras flow_from_directory and the data generator on the UrbanSound8K dataset. I also tried batch normalization and bias and kernel regularizers to avoid overfitting.
The results look almost random.
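One possible cause, which the question does not confirm: if the confusion matrix compares predictions produced in shuffled order against labels kept in directory order (for example a generator's classes attribute), the two sequences are misaligned and even a good model looks random. The sketch below simulates this in plain Python with a hypothetical perfect classifier; all names and sizes are illustrative.

```python
import random

def accuracy(true, pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def confusion_matrix(true, pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        matrix[t][p] += 1
    return matrix

rng = random.Random(0)
n_classes, per_class = 10, 100

# Labels in directory order: all of class 0, then class 1, and so on.
labels_in_dir_order = [c for c in range(n_classes) for _ in range(per_class)]

# Samples are served in shuffled order; a hypothetical perfect model
# predicts the true label of whichever sample it is given.
order = list(range(len(labels_in_dir_order)))
rng.shuffle(order)
preds_in_served_order = [labels_in_dir_order[i] for i in order]

# Misaligned: predictions in served order vs labels in directory order.
misaligned = accuracy(labels_in_dir_order, preds_in_served_order)

# Aligned: reorder the labels to match the order the samples were served in.
labels_in_served_order = [labels_in_dir_order[i] for i in order]
aligned = accuracy(labels_in_served_order, preds_in_served_order)
aligned_cm = confusion_matrix(labels_in_served_order, preds_in_served_order, n_classes)
```

If this turns out to be the cause, building a separate generator with shuffle=False for evaluation keeps predictions and labels in the same order.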

Accuracy of fine-tuning BERT varied significantly based on epochs for intent classification task

I used BERT base uncased as an embedding model with simple cosine similarity for intent classification on my dataset (around 400 classes and 2200 utterances, train:test = 80:20). The base BERT model reaches 60% accuracy on the test dataset, but different numbers of fine-tuning epochs gave me quite unpredictable results.
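For readers unfamiliar with the setup described above, here is a minimal sketch of intent classification by cosine similarity. It assumes each utterance has already been turned into a fixed-size embedding vector and that a test utterance takes the intent of its most similar training utterance; the question does not spell out the exact matching rule, so that part is an assumption, and the vectors and intent names are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_intent(query_vec, train_vecs, train_intents):
    """Return the intent of the most similar training vector and its score."""
    scores = [cosine(query_vec, vec) for vec in train_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_intents[best], scores[best]

# Hypothetical 3-dimensional embeddings standing in for BERT sentence vectors.
train_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_intents = ["check_balance", "transfer_money", "close_account"]

intent, score = nearest_intent([0.1, 0.9, 0.2], train_vecs, train_intents)
```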
This is my setting:
max_seq_length=150
train_batch_size=16
learning_rate=2e-5
These are my experiments:
base model accuracy=0.61
epochs=2.0 accuracy=0.30
epochs=5.0 accuracy=0.26
epochs=10.0 accuracy=0.15
epochs=50.0 accuracy=0.20
epochs=75.0 accuracy=0.92
epochs=100.0 accuracy=0.93
I don't understand why it behaved like this. I expected that no amount of fine-tuning would be worse than the base model, because I fine-tuned and ran inference on the same dataset. Is there anything I misunderstand or should care about?
Well, generally you'll not be able to feed in all the data in your training set at once (I am assuming you have a huge dataset, so you'll have to use mini-batches). Hence, you split it into mini-batches. So the accuracy that is displayed is strongly influenced by the last mini-batch, or the last training step of the epoch.
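To illustrate the point about mini-batches, the sketch below shows how accuracy measured on a single batch can sit far from accuracy over the whole set. The data is hypothetical (1 marks a correct prediction, 0 a wrong one), and how a given training framework aggregates per-batch values into the number it displays is not covered here.

```python
def batch_accuracies(correct_flags, batch_size):
    """Accuracy of each consecutive mini-batch of correct/incorrect flags."""
    accs = []
    for start in range(0, len(correct_flags), batch_size):
        batch = correct_flags[start:start + batch_size]
        accs.append(sum(batch) / len(batch))
    return accs

# Hypothetical per-sample results for 12 samples, in the order they were seen.
flags = [1, 1, 1, 1,  1, 1, 0, 1,  0, 0, 1, 0]

per_batch = batch_accuracies(flags, batch_size=4)
last_batch = per_batch[-1]
overall = sum(flags) / len(flags)
```

Evaluating on the full held-out set after each run gives a steadier number to compare across epoch counts than any single batch.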

Resources