Best tool for text representation to deep learning - keras

Which tool is best for preparing my text as input to a deep learning model?
What is the difference between Word2Vec, GloVe, Keras, LSA, ...?

You should use pre-trained embeddings to represent a sentence as a vector or a matrix. Many sources offer pre-trained embeddings whose models were trained on different datasets (for instance, all of Wikipedia). These models come in different sizes, but each word is normally represented with 100 or 300 dimensions.
Pre-trained embeddings
Pre-trained embeddings 2
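As a rough illustration of what "representing a sentence as a matrix" means, here is a minimal sketch. It assumes the embeddings come in the GloVe-style text format (one token per line followed by its vector components). The names `SAMPLE_FILE`, `load_embeddings` and `sentence_to_matrix` are made up for this example, and the 3-dimensional vectors are toy values rather than real pre-trained ones.

```python
import io

# Tiny stand-in for a pre-trained file in the GloVe text format.
# Real files typically have 100 or 300 values per word; 3 are used here.
SAMPLE_FILE = io.StringIO(
    "deep 0.1 0.2 0.3\n"
    "learning 0.4 0.5 0.6\n"
    "text 0.7 0.8 0.9\n"
)

def load_embeddings(handle):
    """Read 'word v1 v2 ...' lines into a dict mapping word -> list of floats."""
    table = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table

def sentence_to_matrix(sentence, table, dim):
    """One row per token; tokens missing from the table get a zero vector."""
    return [table.get(tok, [0.0] * dim) for tok in sentence.lower().split()]

embeddings = load_embeddings(SAMPLE_FILE)
matrix = sentence_to_matrix("Deep learning rocks", embeddings, dim=3)
print(len(matrix), "rows of", len(matrix[0]), "values")
```

Using a zero vector for unknown tokens is just one simple choice made for this sketch.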

Related

Training SVM classifier (word embeddings vs. sentence embeddings)

I want to experiment with different embeddings such as Word2Vec, ELMo, and BERT, but I'm a little confused about whether to use word embeddings or sentence embeddings, and why. I'm using the embeddings as input features to an SVM classifier.
Thank you.
Both approaches can work well, depending on the dataset. As a rule of thumb, I would advise you to use word embeddings when your input is only a few words long, and sentence embeddings when your input is longer (e.g. large paragraphs).
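An SVM needs one fixed-length feature vector per example, so word embeddings have to be combined somehow before they can be used. One simple option (an assumption of this sketch, not something the answer prescribes) is to average the word vectors of the input. The function name `mean_pool` is made up for illustration.

```python
def mean_pool(token_vectors):
    """Average a list of equally sized word vectors into one feature vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

# Two toy 2-dimensional word vectors standing in for real embeddings.
features = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(features)
```

The resulting vector has the same length regardless of how many words the input has, which is what makes it usable as SVM input. Averaging discards word order, which is one reason dedicated sentence embeddings can do better on long inputs.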

Is it OK to combine domain specific word2vec embeddings and off the shelf ELMo embeddings for a downstream unsupervised task?

I am wondering if I am using word embeddings correctly.
I have combined contextualised word vectors with static word vectors because:
my domain corpus is too small to effectively train the model from scratch
my domain is too specialised to use general embeddings.
I used the off the shelf ELMo small model and trained word2vec model on a small domain specific corpus (around 500 academic papers). I then did a simple concatenation of the vectors from the two different embeddings.
I loosely followed the approach in this paper:
https://www.aclweb.org/anthology/P19-2041.pdf
But the approach in the paper trains the embeddings for a specific task. In my domain there is no labeled training data, so I am only training the embeddings on the corpus itself.
I am new to NLP, so apologies if I am asking a stupid question.
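For reference, the "simple concatenation" described in the question can be sketched as follows. This is only an illustration under assumptions: both models are taken to yield one vector per token for the same tokenization, and the helper names `concat_embeddings` and `concat_sequence` are hypothetical.

```python
def concat_embeddings(contextual_vec, static_vec):
    """Join one contextual vector and one static vector end to end."""
    return list(contextual_vec) + list(static_vec)

def concat_sequence(contextual_seq, static_seq):
    """Concatenate token by token; both sequences must align one-to-one."""
    if len(contextual_seq) != len(static_seq):
        raise ValueError("sequences must have the same number of tokens")
    return [concat_embeddings(c, s) for c, s in zip(contextual_seq, static_seq)]

# Toy vectors: 3 "contextual" dimensions plus 2 "static" dimensions per token.
combined = concat_sequence(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, 2.0], [3.0, 4.0]],
)
print(len(combined[0]))
```

The combined vector has as many dimensions as the two source vectors together.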

Using cosine similarity for classifying documents

I have a set of files in five different categories, and most of them are not labelled correctly. The objective is to predict the correct category of a file whenever one is uploaded. I used cosine similarity together with TF-IDF and predict the class of the document with which the cosine similarity is highest. So far I am getting good results, but I am not sure how well this will work down the road. Also, why isn't cosine similarity used to build document classifiers instead of machine learning models when the categories of the files are labelled correctly? I would really appreciate your feedback on my approach as well as an answer to the question.
Cosine similarity measures the cosine of the angle between two n-dimensional vectors. These vectors are mostly produced by embeddings, i.e. pretrained models that produce word embeddings or fixed-size vectors.
Cosine similarity is mostly used with vectors produced by word embeddings. If you are using something like Doc2Vec, then you get a vector for the whole document. These vectors could be categorized by using cosine similarity.
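To make the idea concrete, here is a minimal sketch of cosine similarity and of assigning a document to the class of its most similar labelled vector. The function names and the toy 2-dimensional vectors are assumptions for illustration; in practice the vectors would come from TF-IDF or Doc2Vec.

```python
import math

def cosine_similarity(a, b):
    """cos(angle) between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def nearest_label(query, labelled_vectors):
    """Return the label of the (label, vector) pair most similar to the query."""
    best = max(labelled_vectors, key=lambda item: cosine_similarity(query, item[1]))
    return best[0]

references = [("sports", [1.0, 0.0]), ("finance", [0.0, 1.0])]
predicted = nearest_label([0.9, 0.2], references)
print(predicted)
```

Because the measure depends only on the angle, a vector and a scaled copy of it have similarity 1, so document length does not dominate the comparison.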
In your case, you should try an LSTM text classifier with an Embedding layer. 1D convolution layers can also be useful.
Regarding TF-IDF: it is useful for text classification that depends on certain words in the corpus. Words with a higher term frequency and a lower document frequency have a higher TF-IDF score. The model learns to classify texts based on such scores.
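The scoring rule can be sketched as follows. This uses one common textbook variant, tf = count / document length and idf = log(N / document frequency); libraries often apply smoothing or normalization on top, so exact numbers will differ. The function name `tf_idf` and the toy documents are made up for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    ["spam", "offer", "offer"],
    ["meeting", "offer"],
    ["meeting", "agenda"],
]
scores = tf_idf(docs)
print(scores[0])
```

In the first document, "spam" occurs less often than "offer" but still scores higher, because it appears in only one document. A term that appears in every document gets a score of zero under this variant.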
RNNs are often a strong choice for classifying texts, and pretrained embeddings make the model more efficient to train.
Last but not least, you can give Bayes text classification a try. It has been very useful in spam classification.
Tip:
You can combine the methods above into a single text classification system, following a process like this:
1. Generate embeddings with Doc2Vec.
2. Compare the similarity of the input with other texts and thereby determine its class.
3. Use the embeddings in an LSTM network to produce class probabilities.
4. Apply Bayes text classification.
Steps 2, 3 and 4 give three predictions. If the majority prediction is CLASS1, the system outputs CLASS1.
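The final voting step could be sketched like this. The three input labels stand in for the outputs of the similarity, LSTM and Bayes steps; the function name `majority_vote` is made up for this example.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted most often."""
    if not predictions:
        raise ValueError("need at least one prediction")
    label, _count = Counter(predictions).most_common(1)[0]
    return label

# Hypothetical outputs of the three classifiers for one document.
final_label = majority_vote(["CLASS1", "CLASS2", "CLASS1"])
print(final_label)
```

With three voters and more than two classes, all three can disagree. A real system would need an explicit tie-breaking rule, for example trusting the classifier with the highest confidence.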

Does a pre-trained embedding matrix have <EOS> and <UNK> word vectors?

I want to build a seq2seq chatbot with a pre-trained embedding matrix. Do pre-trained embedding matrices, for example GoogleNews-vectors-negative300, FastText and GloVe, have specific word vectors for <EOS> and <UNK>?
A pre-trained embedding has a fixed vocabulary. Words that are not in that vocabulary are called OOV (out-of-vocabulary) words. The pre-trained embedding matrix will generally not provide an embedding for UNK. There are various ways to deal with UNK words:
Ignore the UNK word
Use some random vector
Use FastText as the pre-trained model, because it solves the OOV problem by constructing a vector for the unknown word from the vectors of the character n-grams that make up the word.
If the number of UNK words is low, accuracy won't be affected much. If it is higher, it is better to train your own embeddings or use FastText.
The "EOS" token can also be initialized as a random vector.
Make sure the two random vectors are not the same.
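The random-vector option can be sketched as follows. The token spellings `<UNK>` and `<EOS>`, the sampling range of -0.25 to 0.25, and the helper names are assumptions made for this example, not part of any particular pre-trained model.

```python
import random

def add_special_tokens(table, dim, tokens=("<UNK>", "<EOS>"), seed=0):
    """Give each missing special token its own random vector."""
    rng = random.Random(seed)  # fixed seed so the vectors are reproducible
    for token in tokens:
        if token not in table:
            table[token] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return table

def lookup(token, table):
    """Return the token's vector, falling back to the <UNK> vector."""
    return table.get(token, table["<UNK>"])

# Toy 3-dimensional vocabulary standing in for a pre-trained matrix.
vocab = {"hello": [0.1, 0.2, 0.3]}
add_special_tokens(vocab, dim=3)
print(lookup("unseen-word", vocab) == vocab["<UNK>"])
```

Since the vectors are drawn one after another from the same generator, `<UNK>` and `<EOS>` end up with different values, which matches the advice above.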

How do the input word2vec vectors get fine-tuned when training a CNN?

When I read the paper "Convolutional Neural Networks for Sentence Classification" (Yoon Kim, New York University), I noticed that it implements the "CNN-non-static" model: a model with pre-trained vectors from word2vec in which all vectors, including the randomly initialized ones for unknown words, are fine-tuned for each task.
So I do not understand how the pre-trained vectors are fine-tuned for each task. As far as I know, the input vectors, which are converted from strings by the pre-trained word2vec.bin, are just like an image matrix and cannot change while training the CNN. So if they can, how? Please help me out. Thanks a lot in advance!
The word embeddings are weights of the neural network, and can therefore be updated during backpropagation.
E.g. http://sebastianruder.com/word-embeddings-1/ :
Naturally, every feed-forward neural network that takes words from a vocabulary as input and embeds them as vectors into a lower dimensional space, which it then fine-tunes through back-propagation, necessarily yields word embeddings as the weights of the first layer, which is usually referred to as Embedding Layer.
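A toy sketch may make this concrete. The embedding matrix below is an ordinary weight table initialized from (made-up) pre-trained values. Looking up a token selects one row, and a gradient step changes that row like any other weight. The squared-error loss, the target vector and the name `sgd_step` are assumptions chosen to keep the example small; a real CNN would backpropagate the classification loss through its convolution layers instead.

```python
def sgd_step(embedding, token_id, target, lr=0.1):
    """One gradient step on the loss 0.5 * ||embedding[token_id] - target||^2."""
    row = embedding[token_id]
    grad = [e - t for e, t in zip(row, target)]  # gradient w.r.t. the looked-up row
    embedding[token_id] = [e - lr * g for e, g in zip(row, grad)]
    return embedding

# Rows play the role of pre-trained word vectors for token ids 0 and 1.
embedding = [[0.0, 0.0], [1.0, 1.0]]
sgd_step(embedding, token_id=1, target=[0.0, 0.0], lr=0.1)
print(embedding)
```

Only the row of the token that appeared in the input moves; the other row keeps its pre-trained value. Freezing the table (the "static" variant in the paper) amounts to skipping this update. In Keras this is typically controlled through the `trainable` setting of the Embedding layer.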

Resources