How to properly use BERT in keras for classification - keras

I am having an issue with using BERT for classification of text within my database. Previously, I have used GLoVE and ELMo that work quite ok. Also Random forests give me quite good F1-scores (over 0.85), however, when using BERT, I am stuck around 0.55. I was trying to modify learning rate for Adam optimizer, used anything between 0.001 to 0.000001, but nothing really helps.
This is my code: https://github.com/EuropeanSocialInnovationDatabase/ESID-main/blob/development/TextMining/Classifiers/DatabaseWithKickStarter/NNClassifierTest2.py
If anyone can pin the problem down, I would be really grateful.

Related

What type of optimization to perform on my multi-label text classification LSTM model with Keras?

I'm using Windows 10 machine. Libraries: Keras with Tensorflow 2.0 Embeddings: Glove(100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I am using different types of fine-tuning to achieve better results but with no luck so far.
The main problem I believe is the difference in class distributions of my dataset but after a lot of tries and errors, I couldn't implement stratified-k-split in Keras.
I am also experimenting with dropout layers, batch sizes, # of layers, learning rates, clip values, validation splits but I get a minimum boost or worst performance sometimes.
For metrics, I use mainly ROC and F1.
I also followed the suggestion from a StackOverflow member who said to delete some of my examples so I can balance my dataset but if I do that I will have a very low number of examples.
What would you suggest to me?
If someone can provide code based on my implementation for
stratified-k-split I would be grateful cause I have checked all the
online resources but can't implement it.
Any tips, suggestions will be really helpful.
Metrics Plots
Dataset form+Embedings form+train-test-split form
Dataset's labels distribution
My LSTM implementation

Wrong predictions from MNSIT keras model

I am new to neural networks so I tried my first neural network which is pretty close to one at keras learn page,given below:
https://github.com/aakarsh1011/Neural-Network/blob/master/MNSIT%20classification.ipynb
Kindlly look at the ending where I red a random image and tried to predict it which comes out as a bag, and when trained at epocs=5 it predicted it as a sandal.
Is something wrong with my code or labeling.
UPDATE - Being new to the field I didn't know the importance of epochs so I asked this question, I was afraid that I don't over-fit the model or train train too much. But there is no definite way to do this, it's all try and error. GOOD LUCK!
First of all, as far as I can see, your code is correct. Your model predicting the wrong item can be caused by the model not being trained for long enough. I would highly recommend you to set epochs=100 and you will be able to see the model's accuracy rise. You should generally always try to give your model as many epochs as possible for training. It will simply take some time. Try out some different numbers of epochs to find the one not taking too long, but still giving an acceptable result.

Looking for input on an accuracy rate that is different than the exact deep learning compiled code

I just began my Deep learning journey with Keras along with Tenserflow. I followed a tutorial that used a feed forward model on MNIST dataset. The strange part is that I used the same complied code, yet, I got a higher accuracy rate than the exact same code. I'm looking to understand why or how can this happen?

Keras layers explaination

I want to get a deep idea about how this keras layers works in a model. What does each layer doing in the model etc. I followed kers documentation and information isn't enough. If any of you know place to get more knowledge let me know.Thanks in advance
Keras layers are widely used CNN, DNN and RNN layers. There is atleast one research paper for each of them and there is a lot of educational material out there. If you are really curious you could look at keras' code. Some links for you:
https://github.com/keras-team/keras/tree/master/keras/layers
http://cs231n.github.io/convolutional-networks/
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence
http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

Can i turn the CIFAR-10 dataset to grayscale images and convert it to same dimension as MNIST dataset. Will the model be invalid or fail to learn?

I'm new in the field of Deep Neural Network. There are various deep learning frameworks nearby. Notably Theano, Torch7, Caffe, and recently open sourced TensorFlow. I have tried out a couple of tutorials with TensorFlow provided on their site. Specifically the MNIST dataset. I guess this is the hello world of every deep learning framework out there. I also viewed tutorials from here. This one was explained in detail, but they do not provide hands on experience with any deep learning frameworks. So which framework should be better for beginners? I looked up similar questions asked on Quora. Some said that theano is tougher to learn but it gives more control, Caffe is easier, but it gives less control over the network. And nothing on Tensorflow, as it is new, but from what i've seen the documentation is not That well written, also it seems tougher to understand. So as a newbie what should i choose to learn?
Another question, As I said, MNIST is the hello world of every deep learning framework, and many neural networks can be found for recognizing MNIST dataset. So, if I use the same network to detect other dataset, say CIFAR-10 dataset, will it work?? Let's just say that i turn the CIFAR-10 dataset to grayscale images and convert it to same dimension as MNIST dataset. Will the model be invalid or fail to learn? or have bad accuracy or what?

Resources