Is there any ideal parameter values for fasttext train_supervised function? - nlp

I working on NLP problem and try to make text classification with word embedding method. I am training my model with fasttext's train_supervised but is there any ideal or best parameter values for this function that you can advise me also I am using Kfold with some values how can I find best K-fold number in this problem ?
My solution is I am using fasttext's autotune function to find best param values for model to train but is there any possible suggestion to give me ? Following image shows my best params in the model. Finally , I am using fasttext's pretrained word vector model for my training.

Let me answer my own question you can look at the default and optimum parameters values by clicking the following link ;
https://fasttext.cc/docs/en/options.html
and also you can use fasttext's libraries autotune function (Automatic hyperparameter optimization) to find best parameters for your special train and validation dataset by clicking the following link ;
https://fasttext.cc/docs/en/autotune.html
and finally this is the pretrained word vectors provided by fasttext library to utilize in your model's training process also making positive progress for model , in the following link's site they are in the Model section ;
https://fasttext.cc/docs/en/crawl-vectors.html

Related

Use NLLB Machine Translation with Pytorch

I've managed to save a model for automatic translation with pytorch and I'm wondering how can I use it to translate input sentences.
I came across an article suggesting to use a class_mapping so I guess that I should use the dictionary available on that github page as python list https://github.com/facebookresearch/fairseq/tree/nllb
I used this line of code to load the model :
model = torch.load(PATH)
model.eval()
So my question is how can I use the model and the dictionary in order to "predict" the translation of a sentence ?
Thank you

How to get last layer before classification layer on fasttext?

I would like to solve a text similarity problem using fasttext. I am able to create model to get text classification labels using fasttext. But I would like to get document vector which is created for classification layer input generating fasttext model. Then using some similarity methods get the scores .
How can I do that ? Any help would be great
Thanks
I'm fairly sure FastText's supervised mode interim value is just an average of all the input text's word-vectors. So you can request the individual word-vectors, then average them together yourself.

Which model: Best estimator from gridsearchCV or all training data?

I am a little confused when it comes to gridsearch and fitting the final model. I split the in 2: training and testing. The testing set is only used for final evaluation. I perform grid search only using the training data.
Say one has done a grid search over several hyperparameters using cross-validation. The grid search gives the best combination of the hyperparameters. Next step is to train the model, and this is where I am confused. I see 2 possibilities:
1) Don't train the model. Use the parameters from the best model from the grid search.
or
2) Don't use the parameters from the best model from the grid search. Train the model on the full training set with the best hyperparameter combination from the grid search.
What is the correct approach, 1 or 2?
This is probably late, but might be useful for someone else who comes along.
GridSearchCV has an attribute called refit, which is set to True by default. This means that after performing k-fold cross-validation (i.e., training on a subset of the data you passed in), it refits the model using the best hyperparameters from the grid search, on the complete training set.
Presumably your question, from what I can glean, can be summarized as:
Suppose you use 5-fold cross-validation. Your model is then fitted only on 4 folds, as the fifth fold is used for validation. So would you need to retrain the model on the whole of train (i.e., the data from all 5 folds)?
The answer is no, provided you set refit to True, in which case GridSearchCV will perform the training over the whole of the training set using the best hyperparameters it has found after cross-validation. It will then return the trained estimator object, on which you can directly call the predict method, as you would normally do otherwise.
Refer: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
You train the model using the training set and the parameters obtained by the GridSearch.
And then you can test the model with the test set.

Use pretrained embedding in Spanish with Torchtext

I am using Torchtext in an NLP project. I have a pretrained embedding in my system, which I'd like to use. Therefore, I tried:
my_field.vocab.load_vectors(my_path)
But, apparently, this only accepts the names of a short list of pre-accepted embeddings, for some reason. In particular, I get this error:
Got string input vector "my_path", but allowed pretrained vectors are ['charngram.100d', 'fasttext.en.300d', ..., 'glove.6B.300d']
I found some people with similar problems, but the solutions I can find so far are "change Torchtext source code", which I would rather avoid if at all possible.
Is there any other way in which I can work with my pretrained embedding? A solution that allows to use another Spanish pretrained embedding is acceptable.
Some people seem to think it is not clear what I am asking. So, if the title and final question are not enough: "I need help using a pre-trained Spanish word-embedding in Torchtext".
It turns out there is a relatively simple way to do this without changing Torchtext's source code. Inspiration from this Github thread.
1. Create numpy word-vector tensor
You need to load your embedding so you end up with a numpy array with dimensions (number_of_words, word_vector_length):
my_vecs_array[word_index] should return your corresponding word vector.
IMPORTANT. The indices (word_index) for this array array MUST be taken from Torchtext's word-to-index dictionary (field.vocab.stoi). Otherwise Torchtext will point to the wrong vectors!
Don't forget to convert to tensor:
my_vecs_tensor = torch.from_numpy(my_vecs_array)
2. Load array to Torchtext
I don't think this step is really necessary because of the next one, but it allows to have the Torchtext field with both the dictionary and vectors in one place.
my_field.vocab.set_vectors(my_field.vocab.stoi, my_vecs_tensor, word_vectors_length)
3. Pass weights to model
In your model you will declare the embedding like this:
my_embedding = toch.nn.Embedding(vocab_len, word_vect_len)
Then you can load your weights using:
my_embedding.weight = torch.nn.Parameter(my_field.vocab.vectors, requires_grad=False)
Use requires_grad=True if you want to train the embedding, use False if you want to freeze it.
EDIT: It looks like there is another way that looks a bit easier! The improvement is that apparently you can pass the pre-trained word vectors directly during the vocabulary-building step, so that takes care of steps 1-2 here.

How do I use a trained Theano artificial neural network on single examples?

I have been following the http://deeplearning.net/tutorial/ tutorial on how to train an ANN to classify the MNIST numbers. I am now at the "Convolutional Neural Networks" chapter. I want to use the trained network on single examples (MNIST images) and get the predictions. Is there a way to do that?
I have looked ahead in the tutorial and on google but can't find anything.
Thanks a lot in advance for any kind of help!
The material in the Theano tutorial in the earlier chapters, before reaching the Convolutional Neural Networks (CNN) chapter, give a good overview of how Theano works and some of the components the CNN sample code uses. It might be reasonable to assume that students reaching this point have developed their understanding of Theano sufficiently to figure out how to modify the code to extract the model's predictions. Here's a few hints.
The CNN's output layer, called layer3, is an instance of the LogisticRegression class, introduced in an earlier chapter.
The LogisticRegression class has an attribute called y_pred. The comments next to the code which assigns that attribute's values says
symbolic description of how to compute prediction as class whose
probability is maximal
Looking for places where y_pred is used in the logistic regression sample will highlight a function called predict(). This does for the logistic regression sample what is desired of the CNN example.
If one follows the same approach, using layer3.y_pred as the output of a new Theano function, the model's predictions will become apparent.

Resources