How to improve the accuracy of an AlexNet model? - python-3.x

I am building a model for COVID-19 detection using the AlexNet CNN architecture.
After building the model and training it on a dataset, I found that the model's accuracy is very low.
I am attaching a link to the code in Google Colab.
https://colab.research.google.com/drive/1MyACfSmaaZ0Gnfy3NgOm2kltvdMheWZP?usp=sharing
Please suggest how to improve its accuracy and predictions.

Related

Accelerate BERT training with HuggingFace Model Parallelism

I am currently using SageMaker to train BERT and am trying to reduce the training time. I use PyTorch and Hugging Face on an AWS g4dn.12xlarge instance.
However, when I run parallel training, the speedup is far from linear. I'm looking for some hints on distributed training to reduce the BERT training time in SageMaker.
You can use SageMaker Distributed Data Parallel (SMDDP) to run training on a multi-node, multi-GPU setup. Please refer to the link below for a BERT-based training example:
https://github.com/aws/amazon-sagemaker-examples/blob/main/training/distributed_training/pytorch/data_parallel/bert/pytorch_smdataparallel_bert_demo.ipynb
This one is with Hugging Face: https://github.com/aruncs2005/pytorch-ddp-sm-example
Please refer to the documentation here for step-by-step instructions:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-modify-sdp-pt.html
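As a rough sketch of the launcher side: the SageMaker Python SDK estimators accept a `distribution` argument, and the data parallel library is switched on through the `smdistributed` key. The helper name `build_estimator_kwargs`, the entry point `train.py`, and the instance settings below are placeholders rather than values from the question. Other required estimator arguments (role, framework versions) are omitted, and the linked documentation lists which instance types the library supports.

```python
def build_estimator_kwargs(instance_type, instance_count, entry_point="train.py"):
    """Return keyword arguments for a SageMaker PyTorch/HuggingFace estimator.

    Hypothetical helper: only the shape of the `distribution` dict follows the
    SageMaker SDK; everything else is a placeholder to adapt.
    """
    return {
        "entry_point": entry_point,
        "instance_type": instance_type,
        "instance_count": instance_count,
        # Enables the SageMaker distributed data parallel library.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


# Example values only; pick an instance type the library supports.
kwargs = build_estimator_kwargs("ml.p3.16xlarge", 2)
print(kwargs["distribution"])
```

The training script itself still has to be adapted as described in the documentation above; this dict only tells SageMaker to launch the job with the library enabled.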

Does training a tflite model require images annotated?

I am trying to implement a TFLite model for food detection and segmentation. This is the model I chose as suitable for my food image dataset: [https://tfhub.dev/s?deployment-format=lite&q=inception%20resnet%20v2].
I searched Google to understand how the images need to be annotated, but only ended up confused. I understand that the dataset is converted to TFRecords and then fed to the pretrained model. But doesn't training the model on a custom dataset require an annotation file? I don't see any information about this on TF Hub either.
Can anyone please help me with this?
The answer depends on which model you plan to train.
For a food detection and segmentation model, you do need annotations when training. These are supervised learning models, so without labeled training data they cannot learn.
If you were to train an autoencoder, the data would not need to be annotated. I hope the keywords used in this answer help you search for more information on the topic.
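To make "annotations" concrete: for detection, each image typically comes with bounding boxes and class labels; for segmentation, with a polygon or mask per object. The sketch below uses a COCO-style layout purely as an illustration. The file name, category, and coordinates are made up, and the exact format your training pipeline expects may differ.

```python
import json

# Hypothetical COCO-style annotation for one food image (all values invented).
annotation_file = {
    "images": [{"id": 1, "file_name": "plate_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "rice"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # Detection label: [x, y, width, height] in pixels.
            "bbox": [100, 120, 200, 150],
            # Segmentation label: polygon as a flat [x1, y1, x2, y2, ...] list.
            "segmentation": [[100, 120, 300, 120, 300, 270, 100, 270]],
        }
    ],
}


def labels_for_image(data, image_id):
    """Collect (category name, bbox) pairs for one image."""
    names = {c["id"]: c["name"] for c in data["categories"]}
    return [
        (names[a["category_id"]], a["bbox"])
        for a in data["annotations"]
        if a["image_id"] == image_id
    ]


print(json.dumps(labels_for_image(annotation_file, 1)))
```

An image with no entry under `annotations` gives the model nothing to learn from in a supervised setup, which is the point made in the answer above.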

Why are some weights of GPT2Model not initialized?

I am using the GPT2 pre-trained model for a research project and when I load the pre-trained model with the following code,
from transformers.models.gpt2.modeling_gpt2 import GPT2Model
gpt2 = GPT2Model.from_pretrained('gpt2')
I get the following warning message:
Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
From my understanding, it says that the weights of the above layers are not initialized from the pre-trained model. But attention layers ('attn') are very important in GPT2, and if we cannot get their actual weights from the pre-trained model, then what is the point of using a pre-trained model?
I would really appreciate it if someone could explain this to me and tell me how I can fix it.
The masked_bias was added by the Hugging Face community as a speed improvement over the original implementation. It should not negatively impact performance, as the original weights are loaded properly. Check this PR for further information.
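For intuition, here is a toy sketch of what a mask fill value does in causal attention; it is not the transformers implementation. As far as I know, `masked_bias` in GPT-2 is a fixed constant (assumed here to be -1e4) used to overwrite the scores of future positions before the softmax, which would explain why it is absent from the checkpoint. The function name and the scores are illustrative.

```python
import math

MASKED_BIAS = -1e4  # assumed constant fill value, not a trained weight


def causal_softmax(scores):
    """Row-wise softmax where position i may only attend to positions <= i."""
    out = []
    for i, row in enumerate(scores):
        # Overwrite scores of future positions with the constant fill value.
        masked = [s if j <= i else MASKED_BIAS for j, s in enumerate(row)]
        m = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


weights = causal_softmax([[0.5, 2.0, 1.0],
                          [1.0, 1.0, 3.0],
                          [0.2, 0.4, 0.6]])
for row in weights:
    print([round(w, 3) for w in row])
```

The first row puts all of its attention on position 0 even though the raw scores for positions 1 and 2 were higher, because those scores were replaced by the constant before normalisation.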

BERT weight calculation

I am trying to understand how BERT's weights are calculated. Please suggest some articles that can help me understand the internal workings of BERT. I have read these articles from Medium:
https://towardsdatascience.com/deconstructing-bert-distilling-6-patterns-from-100-million-parameters-b49113672f77
https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1
I am doing a small project to understand BERT pretraining and fine-tuning on data from different sources. My idea is to calculate the weights of each token within each source and average all the weights to get a global model. This global model can then be fine-tuned on the different sources.
How can I find these weights, and how can I average them across multiple sources?
Can I visualise them, and if so, how?
Also, note that I am using the TensorFlow implementation of BERT and plan to fine-tune it for the NER task.
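One common reading of "average the weights from multiple sources" is an element-wise mean of the model parameters trained on each source, as in federated averaging. That is a swapped-in interpretation, not something the question confirms. The sketch uses plain Python lists in place of real BERT tensors, and the parameter names and values are invented.

```python
def average_weights(models, sizes=None):
    """Element-wise (optionally size-weighted) mean of per-source parameters.

    models: list of dicts mapping parameter name -> flat list of floats.
    sizes:  optional list of per-source example counts used as weights.
    """
    if sizes is None:
        sizes = [1] * len(models)
    total = float(sum(sizes))
    averaged = {}
    for name in models[0]:
        length = len(models[0][name])
        averaged[name] = [
            sum(m[name][k] * n for m, n in zip(models, sizes)) / total
            for k in range(length)
        ]
    return averaged


# Two toy "models", one per data source, with identical parameter shapes.
source_a = {"encoder.w": [1.0, 2.0], "classifier.b": [0.0]}
source_b = {"encoder.w": [3.0, 6.0], "classifier.b": [1.0]}

global_model = average_weights([source_a, source_b])
print(global_model)
```

With a real Keras model, the same element-wise mean could presumably be applied to the arrays returned by `model.get_weights()`, provided every source model shares one architecture.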

Accuracy not increasing with BERT Large model

I used both the BERT_base_cased and BERT_large_cased models for multi-class text classification. With BERT_base_cased, I got satisfactory results. When I tried the BERT_large_cased model, the accuracy was the same for all epochs.
With BERT_base_cased, there is no such problem. Why is the accuracy the same in every epoch with BERT_large_cased? Any help is really appreciated.
