I am trying to solve the problem of sequence completion. Suppose we have the ground-truth sequence (1, 2, 4, 7, 6, 8, 10, 12, 18, 20).
The input to our model is an incomplete sequence, e.g. (1, 2, 4, _, _, _, 10, 12, 18, 20). From this incomplete sequence, we want to predict the original (ground-truth) sequence. Which deep learning models can be used to solve this problem?
Is this a problem for an encoder-decoder LSTM architecture?
Note: we have thousands of complete sequences to train and test the model.
Any help is appreciated.
This is not exactly a sequence-to-sequence problem; it is a sequence labeling problem. I would suggest either stacking bidirectional LSTM layers followed by a classifier, or Transformer layers followed by a classifier.
An encoder-decoder architecture requires plenty of data to train properly and is particularly useful if the target sequence can be of arbitrary length, only vaguely depending on the source sequence length. It would eventually learn to do the job given enough data, but sequence labeling is a more straightforward formulation.
With sequence labeling, you can set a custom mask over the output, so the model will only predict the missing numbers. An encoder-decoder model would need to learn to copy most of the input first.
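A minimal NumPy sketch of the data and loss side of the sequence labeling approach, using hypothetical helper names (make_training_example, masked_mse): a span of a complete training sequence is hidden, and the loss is computed only over the hidden positions, so the model gets no credit for copying the visible values. It assumes 0.0 is a safe placeholder and treats the numbers as continuous values; with a classifier head, the same mask would be applied to the per-position cross-entropy instead. The model itself (e.g. stacked bidirectional LSTM or Transformer layers) is left out.

```python
import numpy as np

def make_training_example(seq, start, length):
    """Hide seq[start:start + length]; return (inputs, target, mask).

    Hidden positions are set to 0.0 in the inputs (an assumed placeholder)
    and marked with 1.0 in the mask so the loss can be restricted to them.
    """
    target = np.asarray(seq, dtype=float)
    mask = np.zeros_like(target)
    mask[start:start + length] = 1.0
    inputs = target * (1.0 - mask)  # zero out the hidden positions
    return inputs, target, mask

def masked_mse(pred, target, mask):
    """Mean squared error over the hidden (mask == 1) positions only."""
    sq_err = (pred - target) ** 2 * mask
    return sq_err.sum() / mask.sum()

seq = [1, 2, 4, 7, 6, 8, 10, 12, 18, 20]
x, y, m = make_training_example(seq, start=3, length=3)

# A "model" that just copies its input is only penalised at the hidden slots.
print(masked_mse(x, y, m))
```

Since thousands of complete sequences are available, many such (inputs, target, mask) triples can be generated per sequence by hiding different spans. Feeding the mask as an extra input feature is one way to let the model tell a hidden 0.0 from a real one.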
In your sequence completion task, are you trying to predict the next items in a sequence, or only the missing values?
Training a neural network with missing data is a problem in its own right.
If you're using Keras and an LSTM-type network to solve your problem, you should consider masking. You can refer to this Stack Overflow thread for more details: Multivariate LSTM with missing values
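A small NumPy sketch of what masking means here, assuming missing values arrive as NaN and that -1.0 (a hypothetical choice) never occurs as a real value. The helper keras_style_mask reproduces the rule the Keras Masking layer uses: a timestep is skipped by downstream layers only if all of its features equal mask_value. In a Keras model this corresponds to placing tf.keras.layers.Masking(mask_value=-1.0) before the LSTM layers.

```python
import numpy as np

MASK_VALUE = -1.0  # assumed sentinel; must never occur as a real value

def fill_missing(batch, mask_value=MASK_VALUE):
    """Replace NaNs (missing values) with the sentinel mask_value."""
    batch = np.asarray(batch, dtype=float)
    return np.where(np.isnan(batch), mask_value, batch)

def keras_style_mask(batch, mask_value=MASK_VALUE):
    """True for timesteps to keep, False where *every* feature equals
    mask_value, mirroring the Keras Masking layer's rule."""
    return np.any(batch != mask_value, axis=-1)

# One sequence with 4 timesteps and 2 features; NaN marks a missing value.
raw = np.array([[[1.0, 2.0],
                 [np.nan, np.nan],
                 [3.0, np.nan],
                 [4.0, 5.0]]])
filled = fill_missing(raw)
print(keras_style_mask(filled))
```

Note that a timestep with only some features missing (the third one above) is not masked; the sentinel is simply fed to the network as a value.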
Regarding predicting the missing values, why not try autoencoders?
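A minimal NumPy sketch of the autoencoder idea on hypothetical toy data: a linear denoising autoencoder receives sequences whose positions 3 to 5 are zeroed out and is trained to reconstruct the complete sequence through a 2-value bottleneck. The straight-line data, the bottleneck size, and the learning rate are all assumptions chosen so the example trains quickly; a practical model would use nonlinear layers and hide different positions in each sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 sequences of length 10, each a straight line
# a + b * t, so the hidden values are determined by the visible ones.
n, length, code_size = 500, 10, 2
t = np.linspace(0.0, 1.0, length)
a = rng.uniform(0.0, 1.0, size=(n, 1))
b = rng.uniform(0.0, 1.0, size=(n, 1))
clean = a + b * t                          # targets, shape (n, length)

missing = np.zeros(length, dtype=bool)
missing[3:6] = True                        # positions hidden in every input
corrupted = np.where(missing, 0.0, clean)  # inputs with the gaps zero-filled

# Linear autoencoder: encode to a 2-value code, decode back to 10 values.
W_enc = rng.normal(scale=0.1, size=(length, code_size))
W_dec = rng.normal(scale=0.1, size=(code_size, length))

def reconstruct(x):
    return (x @ W_enc) @ W_dec

def reconstruction_loss():
    return np.mean((reconstruct(corrupted) - clean) ** 2)

initial_loss = reconstruction_loss()
lr = 0.1
for step in range(5000):
    code = corrupted @ W_enc
    err = code @ W_dec - clean             # reconstruction error
    scale = 2.0 / err.size                 # derivative of the mean square
    grad_dec = scale * (code.T @ err)
    grad_enc = scale * (corrupted.T @ (err @ W_dec.T))
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_loss = reconstruction_loss()

# Fill the gaps of a new sequence 1.0 + 0.9 * t with positions 3-5 hidden.
new_seq = np.where(missing, 0.0, 1.0 + 0.9 * t)
print(initial_loss, final_loss)
print(np.round(reconstruct(new_seq[None, :]), 2))
```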
I want to fine-tune BERT on a specific domain. I have texts from that domain in text files. How can I use these to fine-tune BERT?
I am currently looking here.
My main objective is to get sentence embeddings using BERT.
The important distinction to make here is whether you want to fine-tune your model, or whether you want to expose it to additional pretraining.
The former is simply a way to train BERT to adapt to a specific supervised task, for which you generally need on the order of 1,000 or more labeled samples.
Pretraining, on the other hand, tries to help BERT better "understand" data from a certain domain by continuing its unsupervised training objective ([MASK]ing specific words and trying to predict which word should be there), for which you do not need labeled data.
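To make the pretraining objective concrete, a NumPy sketch of BERT-style masking of a token sequence: about 15% of positions are selected as prediction targets, and of those, 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. Unselected positions get the label -100, which PyTorch's cross-entropy loss ignores by default. The ids 103 ([MASK]) and 30522 (vocabulary size) assume the bert-base-uncased vocabulary, and special tokens such as [CLS] and [SEP] are not protected in this simplified version. In the transformers library, DataCollatorForLanguageModeling (with mlm_probability=0.15) implements this kind of masking for you.

```python
import numpy as np

MASK_ID = 103        # assumption: id of [MASK] in bert-base-uncased
VOCAB_SIZE = 30522   # assumption: bert-base-uncased vocabulary size
IGNORE = -100        # label value ignored by the loss

def mask_tokens(input_ids, rng, mlm_prob=0.15):
    """Select ~mlm_prob of the positions as targets; of those, 80% become
    [MASK], 10% a random token, and 10% stay unchanged."""
    input_ids = np.array(input_ids)          # copy, so the caller's ids are kept
    labels = input_ids.copy()
    selected = rng.random(input_ids.shape) < mlm_prob
    labels[~selected] = IGNORE               # only selected positions are predicted

    to_mask = selected & (rng.random(input_ids.shape) < 0.8)
    input_ids[to_mask] = MASK_ID

    # Half of the remaining selected positions (10% overall) get a random token.
    to_randomize = selected & ~to_mask & (rng.random(input_ids.shape) < 0.5)
    input_ids[to_randomize] = rng.integers(0, VOCAB_SIZE, size=int(to_randomize.sum()))
    return input_ids, labels

rng = np.random.default_rng(0)
masked, labels = mask_tokens([101, 2023, 2003, 1037, 7099, 6251, 102], rng)
print(masked, labels)
```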
If your ultimate objective is sentence embeddings, however, I would strongly suggest having a look at Sentence Transformers, which is based on a slightly outdated version of Huggingface's transformers library but primarily aims to generate high-quality embeddings. Note that there are ways to train with surrogate losses, where you try to emulate some form of loss that is relevant for embeddings.
Edit: The author of Sentence-Transformers recently joined Huggingface, so I expect support to greatly improve over the upcoming months!
@dennlinger gave an exhaustive answer. Additional pretraining is also referred to as "post-training", "domain adaptation" and "language modeling fine-tuning". Here you will find an example of how to do it.
But since you want good sentence embeddings, you are better off using Sentence Transformers. Moreover, they provide fine-tuned models that are already capable of understanding semantic similarity between sentences. The "Continue Training on Other Data" section describes how to further fine-tune such a model on your domain. You do have to prepare a training dataset that matches one of the available loss functions. For example, ContrastiveLoss requires a pair of texts and a label indicating whether the pair is similar.
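A NumPy sketch of a contrastive loss for a single pair of sentence embeddings, to show what the pair's label does during training. It uses cosine distance and a margin of 0.5; my understanding is that sentence-transformers' ContrastiveLoss has this form by default, but treat the exact details as an assumption and check the library's documentation. With that library, each training pair is wrapped as an InputExample(texts=[text_a, text_b], label=...).

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def contrastive_loss(emb_a, emb_b, label, margin=0.5):
    """Loss for one pair: label 1 = similar, 0 = dissimilar.

    Similar pairs are pulled together (distance towards 0); dissimilar
    pairs are only penalised while their distance is below `margin`.
    """
    d = cosine_distance(emb_a, emb_b)
    return 0.5 * (label * d ** 2 + (1 - label) * max(margin - d, 0.0) ** 2)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(contrastive_loss(u, v, 1))  # similar pair, far apart: penalised
print(contrastive_loss(u, v, 0))  # dissimilar pair, far apart: no loss
```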
I believe transfer learning is useful to train the model on a specific domain. First you load the pretrained base model and freeze its weights, then you add another layer on top of the base model and train that layer based on your own training data. However, the data would need to be labelled.
TensorFlow has a useful guide on transfer learning.
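A minimal NumPy sketch of the freeze-and-add-a-head recipe. The "pretrained base model" here is a hypothetical stand-in (a fixed random projection followed by tanh) and the labelled data are synthetic; only the logistic-regression head on top is trained, and the base weights are never updated. In Keras, the same effect is obtained by setting base_model.trainable = False before adding and training the new layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained base model: fixed weights + tanh.
W_base = rng.normal(scale=1.0 / np.sqrt(20), size=(20, 32))
W_base_before = W_base.copy()

def base_features(x):
    return np.tanh(x @ W_base)

# Hypothetical labelled domain data: binary labels from a simple rule.
X = rng.normal(size=(400, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Trainable head: logistic regression on top of the frozen features.
w_head = np.zeros(32)
b_head = 0.0
feats = base_features(X)  # the base is frozen, so features are computed once

lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    grad = p - y                        # d(cross-entropy) / d(logit)
    w_head -= lr * feats.T @ grad / len(y)
    b_head -= lr * grad.mean()
    # W_base is never updated: that is what "freezing" means.

accuracy = np.mean((feats @ w_head + b_head > 0) == (y == 1))
print(f"training accuracy of the head: {accuracy:.2f}")
```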
You are talking about pre-training. Fine-tuning on unlabeled data is called pre-training; to get started, you can take a look over here.
I'm trying to build a Keras model to classify text into 45 different classes. I'm a little confused about preparing my input data as required by Google's BERT model.
Some blog posts feed the data in as a tf dataset with input_ids, segment ids, and mask ids, as in this guide, while others use only input_ids and masks, as in this guide.
Also, the second guide notes that the segment mask and attention mask inputs are optional.
Can anyone explain whether or not those two are required for a multiclass classification task?
If it helps, each row of my data can consist of any number of sentences within a reasonably sized paragraph. I want to be able to classify each paragraph/input to a single label.
I can't seem to find many guides/blogs about using BERT with Keras (Tensorflow 2) for a multiclass problem, indeed many of them are for multi-label problems.
I guess it is too late to answer, but I had the same question. I went through the Hugging Face code and found that if attention_mask and token_type_ids (the segment ids) are None, then by default the model attends to all tokens and all tokens are given segment id 0.
If you want to check it out, you can find the code here
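A NumPy sketch of those defaults, with hypothetical helper names. For a single-segment classification input, all-zero token_type_ids are exactly what you would pass anyway, so omitting them is harmless. The all-ones default attention mask, however, also attends to padding tokens, so if you pad your paragraphs to a common length it is safer to pass a mask that is 0 at the padded positions (Hugging Face tokenizers return one as attention_mask). The sketch assumes the padding id is 0, as in the standard BERT vocabularies; the other ids are illustrative.

```python
import numpy as np

PAD_ID = 0  # assumption: [PAD] has id 0, as in the standard BERT vocabularies

def default_bert_inputs(input_ids, attention_mask=None, token_type_ids=None):
    """Fill in the optional inputs the way the answer describes:
    attend to every token and put every token in segment 0."""
    input_ids = np.asarray(input_ids)
    if attention_mask is None:
        attention_mask = np.ones_like(input_ids)
    if token_type_ids is None:
        token_type_ids = np.zeros_like(input_ids)
    return input_ids, attention_mask, token_type_ids

def padding_aware_mask(input_ids, pad_id=PAD_ID):
    """1 for real tokens, 0 for padding."""
    return (np.asarray(input_ids) != pad_id).astype(np.int64)

# Two paragraphs padded to the same length (ids are illustrative).
batch = np.array([[101, 7592, 2088, 102, 0, 0],
                  [101, 2023, 2003, 1037, 3231, 102]])
_, default_mask, segments = default_bert_inputs(batch)
print(default_mask[0])               # all ones: padding is attended to as well
print(padding_aware_mask(batch)[0])  # zeros at the padded positions
```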
Let me know if this clarifies it or you think otherwise.
I don't have much experience with training neural networks. I have 4-variable vectors as input and corresponding 3-variable vectors as output. I want to create and train a neural network on these inputs and outputs, which have some unknown (possibly nonlinear) correlation between them, so that when I feed it data it was not trained on, it predicts the correlated output.
I was wondering:
What type of model should I use in such scenarios? A restricted Boltzmann machine, regression, a GAN, etc.?
What library is easiest to learn and use for such a model, e.g. TensorFlow, PyTorch, etc.?
If images were involved, which can be processed as FFT arrays, would the model change?
I did find this answer, but I am not satisfied with it.
Please let me know if there are any functions or other points I should know about. Any help is much appreciated.
A multilayer perceptron is a good place to start.
Keras is the highest level/easiest to use library I have used.
If you are working with images or spatially structured data a convolutional neural network will probably work best.
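To show what a multilayer perceptron for this problem computes, a from-scratch NumPy sketch: 4 inputs, one hidden layer of 32 tanh units, and 3 linear outputs, trained with full-batch gradient descent on a mean-squared-error loss. The data and the nonlinear relation between inputs and outputs are hypothetical, as are the layer size and learning rate. In Keras the same model would be roughly Sequential([Dense(32, activation="tanh"), Dense(3)]) compiled with loss="mse".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 input variables -> 3 outputs with a nonlinear relation.
X = rng.uniform(-1.0, 1.0, size=(1000, 4))
Y = np.stack([np.sin(X[:, 0]) + X[:, 1] * X[:, 2],
              X[:, 3] ** 2,
              np.tanh(X[:, 0] - X[:, 3])], axis=1)

# One hidden layer of 32 tanh units and a linear output layer (regression).
n_hidden = 32
W1 = rng.normal(scale=0.5, size=(4, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 3))
b2 = np.zeros(3)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def mse(x, y):
    return np.mean((forward(x)[1] - y) ** 2)

initial_mse = mse(X, Y)
lr = 0.2
for epoch in range(2000):
    h, out = forward(X)
    d_out = 2.0 * (out - Y) / Y.size       # gradient of the mean squared error
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)
final_mse = mse(X, Y)

# Predict the 3 outputs for an input the network has not been trained on.
x_new = np.array([[0.2, -0.5, 0.1, 0.7]])
print(initial_mse, final_mse)
print(forward(x_new)[1])
```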
I am a newbie to convolutional neural nets, so this may be an ignorant question.
I have now followed many examples and tutorials on the MNIST example in TensorFlow. In the CNN examples, all the authors talk about using 'input filters' in the CNN, but none that I can find mention WHERE they come from. Can anyone answer where these come from? Or are they magically obtained from the input images?
Thanks! Chris
This is an image that one professor uses, but he does not explain whether he made the filters himself or TensorFlow extracts them automatically somehow.
Disclaimer: I am not an expert, more of an enthusiast.
To cut a long story short: filters are the CNN equivalent of weights, and all a neural network essentially does is learn their optimal values.
It does this by iterating through a training dataset, making predictions, comparing them to the label/value already assigned to each training unit (usually an image in the case of a CNN), and adjusting the weights to minimize the error function (the difference between the predicted value and the actual value).
The initial values of the filters/weights do not matter that much; they might affect the speed of convergence to a small degree, and I believe they are usually assigned random values.
It is the job of the neural network to figure out the optimal weights, not of the person implementing it.
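A minimal NumPy sketch with hypothetical data: a single 3x3 filter, started from random values, learns a vertical-edge detector purely from input/target pairs by gradient descent. Nobody hands the filter to the model; it only sees images and the outputs it should produce. In a real CNN (e.g. a TensorFlow Conv2D layer) there are many filters, the target is usually a class label rather than a filtered image, and backpropagation computes the gradients automatically, but the filters are learned in the same spirit.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, which is what CNN layers compute."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
images = rng.normal(size=(20, 12, 12))  # hypothetical training images

# Behaviour the layer should learn: a vertical-edge filter. The model never
# sees this array directly, only the (image, target) pairs it produces.
true_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])
targets = [conv2d_valid(img, true_filter) for img in images]

kernel = rng.normal(scale=0.1, size=(3, 3))  # random initial filter
lr = 0.05
for epoch in range(50):
    for img, target in zip(images, targets):
        err = conv2d_valid(img, kernel) - target  # prediction error
        # Gradient of 0.5 * sum(err**2) w.r.t. the kernel is the
        # cross-correlation of the input image with the error map.
        grad = conv2d_valid(img, err)
        kernel -= lr * grad / err.size

print(np.round(kernel, 3))
```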
I have been following the http://deeplearning.net/tutorial/ tutorial on how to train an ANN to classify MNIST digits. I am now at the "Convolutional Neural Networks" chapter. I want to use the trained network on single examples (MNIST images) and get the predictions. Is there a way to do that?
I have looked ahead in the tutorial and on Google but can't find anything.
Thanks a lot in advance for any kind of help!
The material in the earlier chapters of the Theano tutorial, before the Convolutional Neural Networks (CNN) chapter, gives a good overview of how Theano works and of some of the components the CNN sample code uses. It might be reasonable to assume that students reaching this point have developed their understanding of Theano sufficiently to figure out how to modify the code to extract the model's predictions. Here are a few hints.
The CNN's output layer, called layer3, is an instance of the LogisticRegression class, introduced in an earlier chapter.
The LogisticRegression class has an attribute called y_pred. The comment next to the code that assigns that attribute's value says
symbolic description of how to compute prediction as class whose
probability is maximal
Looking for places where y_pred is used in the logistic regression sample will reveal a function called predict(). It does for the logistic regression sample what you want for the CNN example.
If one follows the same approach, using layer3.y_pred as the output of a new Theano function, the model's predictions will become apparent.
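A NumPy sketch of what y_pred computes: the softmax probabilities p(y|x) of the logistic regression layer, followed by the index of the most probable class. The weights here are hypothetical random values rather than a trained model, and raw 784-pixel inputs are fed directly, whereas in the CNN the input to layer3 is the output of the previous layers. In the tutorial code, the Theano equivalent would be along the lines of predict = theano.function([x], layer3.y_pred), assuming x is the symbolic input matrix defined there. If that code reshapes x to a fixed batch_size, a single image has to be fed as part of a batch of that size, or the graph rebuilt with a batch size of 1.

```python
import numpy as np

def p_y_given_x(x, W, b):
    """Softmax class probabilities of a logistic regression layer."""
    logits = x @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def y_pred(x, W, b):
    """Class whose probability is maximal, one per row of x."""
    return np.argmax(p_y_given_x(x, W, b), axis=1)

# Hypothetical parameters for 784-pixel inputs and 10 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))
b = np.zeros(10)

single_image = rng.random(784)                     # one flattened 28x28 image
print(y_pred(single_image.reshape(1, -1), W, b))   # treated as a batch of one
```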