Use pretrained models to further train current corpus - nlp

Is it possible to leverage a pretrained model, e.g. GloVe, and use it to further train on a corpus?
Any example would be very helpful.
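One way to do this is a minimal sketch along the following lines, assuming gensim >= 4.0 and a standard GloVe text file such as glove.6B.100d.txt (the corpus below is illustrative): build a Word2Vec vocabulary from your own corpus, seed it with the pretrained GloVe vectors, and then continue training.

import numpy as np
from gensim.models import Word2Vec, KeyedVectors

# Your own tokenized corpus: a list of token lists (illustrative).
corpus = [["the", "quick", "brown", "fox"], ["another", "example", "sentence"]]

# no_header=True lets gensim >= 4.0 read the raw GloVe text format directly.
glove = KeyedVectors.load_word2vec_format("glove.6B.100d.txt", binary=False, no_header=True)

model = Word2Vec(vector_size=glove.vector_size, min_count=1)
model.build_vocab(corpus)

# Copy the pretrained vector for every word that appears in both vocabularies,
# so training continues from GloVe instead of from random initialization.
for word, idx in model.wv.key_to_index.items():
    if word in glove:
        model.wv.vectors[idx] = glove[word]

model.train(corpus, total_examples=model.corpus_count, epochs=5)

Words not covered by GloVe simply keep their random initial vectors and are learned from your corpus alone.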

Related

How to convert the output of pretrained Huggingface transformer model from classification to regression for fine-tuning on my data?

I am using a transformer model that was extended from a Hugging Face model (DNABERT). This is a pretrained classification model whose output I would like to convert to regression, and then fine-tune that model on my own data. I imagine this process would be roughly the same for any BERT-based Hugging Face classification model. How would I go about doing this?
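One common approach, sketched below and not specific to DNABERT (the checkpoint name is a placeholder; a recent transformers version is assumed), is to reload the pretrained encoder with a fresh single-output head configured for regression:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "path/or/name-of-your-dnabert-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# num_labels=1 together with problem_type="regression" makes the model emit a
# single scalar trained with MSE loss; the old classification head is dropped
# and a new regression head is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=1,
    problem_type="regression",
    ignore_mismatched_sizes=True,
)

# Fine-tune as usual (e.g. with transformers.Trainer) on your own labeled data.

Only the head is reinitialized; the pretrained encoder weights are kept and fine-tuned together with it.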

How to post-train BERT model on custom dataset

I want to get BERT word embeddings that will be used in another downstream task later. I have a corpus for my custom dataset and want to further pre-train the pre-trained Hugging Face BERT base model. I think this is called post-training. How can I do this using Hugging Face transformers? Can I use transformers.BertForMaskedLM?
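Continued pre-training with the masked-language-modelling objective is typically done with BertForMaskedLM plus a masking data collator. A minimal sketch, assuming the transformers and datasets libraries are installed and my_corpus.txt is a placeholder for your corpus file:

from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Load the raw text corpus and tokenize it.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

# The collator randomly masks 15% of the tokens on the fly (the MLM objective).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-post-trained",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
model.save_pretrained("bert-post-trained")

The saved checkpoint can then be loaded with a plain BertModel to extract embeddings for the downstream task.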

How to use pre-trained FastText embeddings with existing Seq2Seq model?

I'm new to NLP and I am trying to understand how to use pre-trained word embeddings like fastText with an existing Seq2Seq model. The Seq2Seq model I'm working with is the following: the encoder is simple, and the decoder is a Pointer Generator Network with a CRF on top. Both of them use an embedding layer.
The question: if I have my own dataset and vocab, how do I use both my own vocab and the one from fastText? Do I have to use the fastText weights in both the encoder and the decoder?
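A common pattern, sketched below for PyTorch and assuming the fasttext Python package with the standard cc.en.300.bin vectors (vocab here is a hypothetical word-to-index map built from your own dataset), is to keep your own vocabulary, look up a fastText vector for each of its words, and load the resulting matrix into the embedding layer(s):

import numpy as np
import torch
import torch.nn as nn
import fasttext

ft = fasttext.load_model("cc.en.300.bin")   # pretrained fastText vectors
vocab = {"<pad>": 0, "the": 1, "cat": 2}    # illustrative: your own word -> index map

embedding_matrix = np.zeros((len(vocab), ft.get_dimension()), dtype="float32")
for word, idx in vocab.items():
    # fastText composes vectors from subwords, so even words that are
    # out-of-vocabulary for the pretrained model still get a meaningful vector.
    embedding_matrix[idx] = ft.get_word_vector(word)

# The same weights can be shared by the encoder and decoder, or copied into
# two separate layers; freeze=False lets them keep training with the model.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(embedding_matrix), freeze=False)

Because the matrix is indexed by your own vocab, nothing else in the Seq2Seq model needs to change; whether the encoder and decoder share it is a design choice rather than a requirement.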

How to use BERT pre-trained model in Keras Embedding layer

How do I use a pre-trained BERT model like bert-base-uncased as weights in the Embedding layer in Keras?
Currently, I am generating word embeddings with the BERT model, and it takes a lot of time. I am assigning those weights as in the code shown below:
model.add(Embedding(307200, 1536, input_length=1536, weights=[embeddings]))
I searched on the internet, but the methods I found are given in PyTorch. I need to do it in Keras. Please help.
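Rather than precomputing embeddings and loading them into a fixed Embedding layer, the usual approach is to put BERT itself into the Keras graph via the TensorFlow classes in transformers. A minimal sketch, assuming TensorFlow 2.x and the transformers library:

import tensorflow as tf
from transformers import TFBertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")

input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

# last_hidden_state has shape (batch, seq_len, 768) and plays the role of the
# Embedding layer's output; the BERT weights stay trainable unless frozen.
outputs = bert(input_ids, attention_mask=attention_mask).last_hidden_state
x = tf.keras.layers.GlobalAveragePooling1D()(outputs)
predictions = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model([input_ids, attention_mask], predictions)
model.compile(optimizer="adam", loss="binary_crossentropy")

This avoids the huge static weight matrix entirely, since BERT produces contextual embeddings on the fly during training and inference.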

How to create gensim word2vec model using pre trained word vectors?

I have created word vectors using a distributed word2vec algorithm. Now I have words and their corresponding vectors. How to build a gensim word2vec model using these words and vectors?
I am not sure whether you created the word2vec model using gensim or some other tool, but if I understand your question correctly you just want to load existing word2vec vectors with gensim. This is done in the following way:
import gensim
# binary=True for the binary (.bin) format, binary=False for the plain-text (.txt/.vec) format
model = gensim.models.KeyedVectors.load_word2vec_format(WORD2VEC_PATH, binary=True)
Note that load_word2vec_format expects a file path, not an open file handle. If, however, what you want is to train a word2vec model from scratch (i.e. from raw text) purely with gensim, there is a tutorial on how to train a word2vec model using gensim.
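If the words and vectors are already in memory rather than in a word2vec-format file, a KeyedVectors object can also be built directly. A minimal sketch, assuming gensim >= 4.0 (the words and vectors below are illustrative):

import numpy as np
from gensim.models import KeyedVectors

words = ["king", "queen", "apple"]                    # illustrative
vectors = np.random.rand(3, 100).astype("float32")    # illustrative, shape (len(words), dim)

kv = KeyedVectors(vector_size=vectors.shape[1])
kv.add_vectors(words, vectors)

# The usual API now works, and the vectors can be saved in word2vec format
# for later loading with load_word2vec_format.
print(kv.most_similar("king", topn=2))
kv.save_word2vec_format("my_vectors.txt", binary=False)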
