What is the math behind the Keras Tokenizer() function? - keras

I would like to know the mathematics behind Tokenizer()
Good morning, and thank you in advance for your answers. I am writing an essay on the mathematics behind a text classifier built with NLP and neural networks, and I would like to know exactly how the Keras Tokenizer works: whether cosine similarity is involved, and how the dictionary is built taking word frequency into account. If anyone knows the answer, or a book or article where it is explained, I will be eternally grateful.
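As a rough illustration of the frequency part of the question, here is a from-scratch sketch of frequency-ranked indexing. It is modelled on my reading of what Keras' Tokenizer.fit_on_texts and texts_to_sequences do with their default settings (lowercasing, stripping a fixed set of punctuation, splitting on whitespace). The functions below are plain hypothetical stand-ins, not the Keras API, so check the Keras source before quoting this in an essay. The only math involved is counting and sorting; nothing like cosine similarity is computed.

```python
from collections import Counter

# Punctuation filter modelled on Keras' documented default for Tokenizer
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_words(text, filters=FILTERS, lower=True):
    """Lowercase, turn filtered characters into spaces, split on whitespace."""
    if lower:
        text = text.lower()
    table = str.maketrans({c: " " for c in filters})
    return text.translate(table).split()


def build_word_index(texts, oov_token=None):
    """Rank words by corpus frequency; the most frequent word gets index 1.

    Index 0 is never assigned (it is conventionally kept for padding).
    Ties keep first-seen order because Counter preserves insertion order
    and sorted() is stable, even with reverse=True.
    """
    counts = Counter()
    for text in texts:
        counts.update(text_to_words(text))
    ranked = sorted(counts, key=counts.get, reverse=True)
    if oov_token is not None:
        ranked = [oov_token] + ranked  # the OOV token takes index 1
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index, num_words=None, oov_token=None):
    """Replace each word by its index; ids >= num_words are treated as unknown."""
    oov_index = word_index.get(oov_token) if oov_token is not None else None
    sequences = []
    for text in texts:
        seq = []
        for word in text_to_words(text):
            i = word_index.get(word)
            if i is None or (num_words is not None and i >= num_words):
                if oov_index is not None:
                    seq.append(oov_index)  # unknown or rare word -> OOV id
                continue  # otherwise the word is dropped
            seq.append(i)
        sequences.append(seq)
    return sequences


word_index = build_word_index(["The cat sat on the mat.", "The dog sat!"])
print(word_index)  # {'the': 1, 'sat': 2, 'cat': 3, 'on': 4, 'mat': 5, 'dog': 6}
print(texts_to_sequences(["the dog sat on a mat"], word_index, num_words=4))  # [[1, 2]]
```

With num_words=4 only the ids 1 to 3 survive, which is how limiting the vocabulary to the most frequent words falls out of the ranking.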

Related

How to get the probability of a particular token(word) in a sentence given the context

I'm trying to calculate the probability, or any other kind of score, for words in a sentence using NLP. I've tried this with the GPT-2 model from the Hugging Face Transformers library, but I couldn't get satisfactory results because the model is unidirectional, so it didn't seem to take the whole context into account. So I was wondering whether there is a way to calculate this using BERT, since it is bidirectional.
I found a related post the other day, but none of its answers were useful for me either.
I hope to get some ideas or a solution for this. Any help is appreciated. Thank you.
BERT is trained as a masked language model, i.e., it is trained to predict tokens that were replaced by a [MASK] token.
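import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = BertForMaskedLM.from_pretrained("bert-base-cased")
input_idx = tok.encode(f"The {tok.mask_token} were the best rock band ever.")
with torch.no_grad():  # inference only, no gradients needed
    logits = bert(torch.tensor([input_idx]))[0]  # shape: (1, seq_len, vocab_size)
prediction = logits[0].argmax(dim=-1)  # most likely token id at every position
mask_pos = input_idx.index(tok.mask_token_id)  # 2 here: [CLS] The [MASK] ...
print(tok.convert_ids_to_tokens(prediction[mask_pos].item()))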
It prints the token with id 11581, which is:
Beatles
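To get a normalized probability distribution over BERT's vocabulary, you can normalize the logits using the softmax function, i.e., F.softmax(logits, dim=-1) (assuming the standard import torch.nn.functional as F). The softmax has to be taken over the last dimension, which is the vocabulary, not over the sequence positions.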
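Written out, with z_{i,v} denoting the logit BERT produces for vocabulary token v at the masked position i and V the vocabulary (notation introduced here for illustration), the softmax gives:

```latex
p(v \mid \text{context}) = \frac{\exp(z_{i,v})}{\sum_{u \in V} \exp(z_{i,u})}, \qquad v \in V
```

This is the quantity F.softmax computes at each position when applied over the vocabulary dimension of the logits.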
The tricky thing is that words might be split into multiple subwords. You can simulate that by adding multiple [MASK] tokens, but then you have the problem of how to reliably compare the scores of predictions of different lengths. I would probably average the probabilities, but maybe there is a better way.
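A minimal sketch of the averaging idea, assuming you already have one probability per masked subword position (for example, read off the softmax output at each [MASK]). Alongside the plain arithmetic mean suggested above, it shows the geometric mean, i.e. the exponential of the average log-probability, which is another common length normalization. Both function names are illustrative, and neither score is a true joint probability, since the [MASK] positions are predicted in a single pass without conditioning on each other's values.

```python
import math


def mean_probability(subword_probs):
    """Arithmetic mean of the per-subword probabilities."""
    return sum(subword_probs) / len(subword_probs)


def geometric_mean_probability(subword_probs):
    """exp(mean log-probability): a length-normalized alternative."""
    mean_log = sum(math.log(p) for p in subword_probs) / len(subword_probs)
    return math.exp(mean_log)


# Hypothetical probabilities: a one-piece word vs. a word split into two pieces
one_piece = [0.2]
two_pieces = [0.5, 0.1]
for name, probs in [("one piece", one_piece), ("two pieces", two_pieces)]:
    # mean, geometric mean, and the raw product for comparison
    print(name, mean_probability(probs), geometric_mean_probability(probs), math.prod(probs))
```

Multiplying the probabilities instead (the last column) favours words split into fewer pieces, which is exactly the length problem described above; both means remove that bias, and the geometric mean is less forgiving of a single very unlikely piece.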

Which model should I use? - Multi label classification

I am a newbie in data science, so my question might be basic.
I have a dataset. The 1st column contains people's comments about issues (as text), and the 2nd column contains the class/label of that failure (as text). There are many failure types in the 2nd column.
I want to train a model so that, when a new comment describing an issue is entered, the model classifies the failure.
Can I use a Keras Sequential model, or should I use a different model? I would appreciate it if you could share a link related to my question.
You can certainly use a Keras Sequential model. As a beginner, try using Dense layers; you can also use Convolutional Neural Networks for this.
Also, try using the Tokenizer from tensorflow.keras.preprocessing.text to map each word to an integer so that the model can process the text.
For more information, search for text classification tutorials and for the Keras Tokenizer.
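The failure types in the 2nd column are text too, so they also need to become numbers before a Sequential model ending in a Dense(num_classes, activation="softmax") layer can be trained on them. A minimal, framework-free sketch, assuming each comment has exactly one failure type: integer ids pair with Keras' sparse_categorical_crossentropy loss, one-hot vectors with categorical_crossentropy. The label strings and helper names are made up for illustration.

```python
def encode_labels(labels):
    """Give each distinct text label an integer id, in order of first appearance."""
    label_to_id = {}
    for label in labels:
        label_to_id.setdefault(label, len(label_to_id))
    return [label_to_id[label] for label in labels], label_to_id


def one_hot(ids, num_classes):
    """Turn integer class ids into one-hot rows."""
    return [[1 if j == i else 0 for j in range(num_classes)] for i in ids]


# Hypothetical failure types from the 2nd column
ids, label_to_id = encode_labels(["network", "power", "network", "disk"])
print(ids)                             # [0, 1, 0, 2]
print(label_to_id)                     # {'network': 0, 'power': 1, 'disk': 2}
print(one_hot(ids, len(label_to_id)))  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Keep label_to_id around after training: the model's argmax output is an id, and you need the inverse mapping to turn it back into a failure type.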

How to properly use BERT in keras for classification

I am having an issue using BERT to classify the text in my database. Previously, I used GloVe and ELMo, which worked quite well. Random forests also give me quite good F1-scores (over 0.85); with BERT, however, I am stuck at around 0.55. I tried changing the learning rate of the Adam optimizer to values anywhere between 0.001 and 0.000001, but nothing really helps.
This is my code: https://github.com/EuropeanSocialInnovationDatabase/ESID-main/blob/development/TextMining/Classifiers/DatabaseWithKickStarter/NNClassifierTest2.py
If anyone can pin the problem down, I would be really grateful.

Replicating Semantic Analysis Model in Demo

Good day, I am a student who is interested in NLP. I came across the demo on AllenNLP's homepage, which states:
The model is a simple LSTM using GloVe embeddings that is trained on the binary classification setting of the Stanford Sentiment Treebank. It achieves about 87% accuracy on the test set.
Is there any sample code or tutorial I can follow to replicate this result, so that I can learn more about this subject? I am trying to obtain a regression output (instead of a classification).
I hope that someone can point me in the right direction. Any help is much appreciated. Thank you!
AllenAI publishes all the code for its examples and libraries, including AllenNLP, as open source on GitHub.
I found exactly how the example was run here: https://github.com/allenai/allennlp/blob/master/allennlp/tests/data/dataset_readers/stanford_sentiment_tree_bank_test.py
However, to turn it into a regression task, you'll have to make changes directly in PyTorch, the framework AllenNLP is built on.
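For a sense of what that change involves, here is a framework-free sketch of the two objectives. A classifier typically ends in a linear layer with one output per class and is trained with softmax cross-entropy (in PyTorch, nn.Linear(hidden_dim, num_classes) with nn.CrossEntropyLoss); a regressor ends in a single output trained with squared error (nn.Linear(hidden_dim, 1) with nn.MSELoss). Where exactly to make this swap inside AllenNLP's sentiment model is not shown here; hidden_dim, num_classes and the function names are illustrative.

```python
import math


def softmax_cross_entropy(logits, target_class):
    """Classification loss: -log softmax(logits)[target_class], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_class]


def squared_error(prediction, target):
    """Regression loss for a single real-valued output."""
    return (prediction - target) ** 2


# Classification: two class scores (e.g. negative/positive), true class 0
print(softmax_cross_entropy([2.0, 0.0], target_class=0))
# Regression: one predicted sentiment score vs. a real-valued target
print(squared_error(0.7, 1.0))
```

The rest of the network (embeddings, LSTM encoder) can stay the same; only the size of the final layer and the loss change.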

How do I use a trained Theano artificial neural network on single examples?

I have been following the http://deeplearning.net/tutorial/ tutorial on how to train an ANN to classify the MNIST digits. I am now at the "Convolutional Neural Networks" chapter. I want to use the trained network on single examples (MNIST images) and get its predictions. Is there a way to do that?
I have looked ahead in the tutorial and searched on Google but can't find anything.
Thanks a lot in advance for any kind of help!
The material in the earlier chapters of the Theano tutorial, before the Convolutional Neural Networks (CNN) chapter, gives a good overview of how Theano works and of some of the components the CNN sample code uses. It is reasonable to assume that students who reach this point understand Theano well enough to figure out how to modify the code to extract the model's predictions. Here are a few hints.
The CNN's output layer, called layer3, is an instance of the LogisticRegression class, introduced in an earlier chapter.
The LogisticRegression class has an attribute called y_pred. The comment next to the code that assigns this attribute's value says:
symbolic description of how to compute prediction as class whose
probability is maximal
Searching for places where y_pred is used in the logistic regression sample turns up a function called predict(). It does for the logistic regression sample exactly what you want to do for the CNN example.
If one follows the same approach, using layer3.y_pred as the output of a new Theano function, the model's predictions will become apparent.
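To make the hints concrete, here is a framework-free sketch of what y_pred amounts to for a single input vector, assuming the LogisticRegression layer computes p_y_given_x = softmax(dot(input, W) + b) and takes the argmax, as I recall from the tutorial's logistic regression chapter. In Theano itself, the equivalent would be compiling a theano.function whose output is layer3.y_pred, as the predict() hint suggests; the helper name and toy weights below are hypothetical.

```python
import math


def logistic_regression_predict(x, W, b):
    """Return (predicted class, class probabilities) for one input vector x.

    W is an n_in x n_out weight matrix given as a list of rows and b has
    n_out entries, so the scores are dot(x, W) + b as in the tutorial layer.
    """
    n_out = len(b)
    scores = [sum(x[i] * W[i][k] for i in range(len(x))) + b[k] for k in range(n_out)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]  # p_y_given_x
    y_pred = max(range(n_out), key=lambda k: probs[k])  # argmax
    return y_pred, probs


# Toy layer with 2 inputs and 3 classes (weights made up for illustration)
W = [[0.0, 2.0, 1.0],
     [5.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]
print(logistic_regression_predict([1.0, 0.0], W, b)[0])  # 1
print(logistic_regression_predict([0.0, 1.0], W, b)[0])  # 0
```

In the CNN, the x fed to layer3 is the output of the hidden layer rather than raw pixels, but the final step is the same "class whose probability is maximal".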
