Convolutional neural network text classification with additional features - NLP

I am trying to implement a CNN for a text classification task. I understand that a CNN can extract and abstract features from the raw text.
What if I have some additional, very useful features that are not in the text? How should I add those features to the CNN?
Currently, I am concatenating the convolution layer outputs with an additional feature vector and then feeding the result to the hidden layers. Is this the right way to do it?
Thanks!
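For reference, a minimal Keras sketch of the concatenation approach described above; the vocabulary size, sequence length, and extra-feature dimension are illustrative assumptions, not values from the question:

from tensorflow.keras import layers, Model

# Two inputs: token ids for the text branch, plus hand-crafted extra features.
text_in = layers.Input(shape=(100,), name="tokens")          # token ids (assumed max length 100)
extra_in = layers.Input(shape=(8,), name="extra_features")   # assumed 8 additional features

x = layers.Embedding(input_dim=20000, output_dim=128)(text_in)
x = layers.Conv1D(64, kernel_size=5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)                           # fixed-size text representation

merged = layers.Concatenate()([x, extra_in])                 # CNN features + extra features
h = layers.Dense(64, activation="relu")(merged)              # hidden layers see both
out = layers.Dense(1, activation="sigmoid")(h)

model = Model(inputs=[text_in, extra_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")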

Related

How will wavelet transforms be useful with CNN for image classification?

I am planning to use the wavelet transform to extract textural features from images for classification purposes. However, I am not sure whether using the wavelet transform is a good choice, or which type of wavelet I should choose.
One example is the scattering transform. The difference from a regular CNN is that the filters don't emerge from training the network, but come from a fixed wavelet basis with desirable properties. The example above is not a 'deep' network either: the wavelet 'footprints' are fed to an SVM.
The convolution filters learned in the first layers of image classification networks usually look like edge detectors, much like wavelet filters. One way to use wavelets in a CNN is to place them in the first layers and freeze them: don't change these layers during training.
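As a rough Keras sketch of that idea; the wavelet_kernels array is a hypothetical stand-in for filters you would build elsewhere (e.g. with PyWavelets or a Gabor filter bank), and the input shape is an assumption:

import numpy as np
from tensorflow.keras import layers, models

# Placeholder for precomputed wavelet-like kernels of shape (k, k, channels, n_filters).
wavelet_kernels = np.random.randn(7, 7, 3, 16).astype("float32")

# First conv layer: no bias, not trainable, so the fixed filters stay frozen.
first = layers.Conv2D(filters=16, kernel_size=7, padding="same",
                      use_bias=False, trainable=False)

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    first,
    layers.ReLU(),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),   # these layers are trained as usual
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])

first.set_weights([wavelet_kernels])  # load the fixed filters; fit() never updates them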

Differences in encoder-decoder models between Keras and PyTorch

There seem to be significant, fundamental differences in the construction of encoder-decoder models between Keras and PyTorch. Here is Keras' enc-dec blog and here is PyTorch's enc-dec blog.
Some differences I noticed are the following:
Keras' model feeds the input directly to the LSTM layer, whereas PyTorch uses an embedding layer for both the encoder and decoder.
PyTorch uses an embedding layer with no activation in the encoder, but uses a ReLU activation for the embedding layer in the decoder.
Given these observations, my questions are the following:
Is my understanding below correct? The embedding layer is not strictly required, but it helps in finding a better, denser representation of the input. It is optional, and you can still build a good model without it (depending on the problem). This is why Keras chose not to use it in this particular example. Is this a sound reason, or is there more to the story?
Why use an activation for the embedding layer in the decoder but not the encoder?
Why use 'relu' as the activation instead of 'tanh', etc for the embedding layer? What's the intuition here? I've only seen 'relu' applied to data that has spatial relation, not temporal relation.
Your understanding of encoder-decoder models is off. First of all, note that Keras and PyTorch are two deep learning frameworks, while encoder-decoder is a type of neural network architecture. So you need to understand how an encoder-decoder works in the first place, and then adapt the architecture to your needs. Now, to your questions.
An embedding layer converts one-hot representations into low-dimensional vector representations. For example, say we have the sentence "I love programming" and we want to translate it into German using an encoder-decoder network. The first step is to convert the words of the input sentence into a sequence of vector representations, and this can be done with an embedding layer. Note that whether you use Keras or PyTorch doesn't matter. Think about it: how would you give a natural language sentence as input to an LSTM? Obviously, you first need to convert the words into vectors.
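A minimal PyTorch illustration of this point; the vocabulary and the sizes are made up for the example:

import torch
import torch.nn as nn

# Toy vocabulary: each word is an integer id.
vocab = {"<pad>": 0, "I": 1, "love": 2, "programming": 3}

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

token_ids = torch.tensor([[vocab["I"], vocab["love"], vocab["programming"]]])  # (batch=1, seq=3)
vectors = embedding(token_ids)     # (1, 3, 8): dense vectors the LSTM can consume
outputs, (h, c) = lstm(vectors)    # encoder states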
There is no rule that says you should use an activation in the embedding layer of the decoder but not of the encoder. Remember, activation functions are non-linear functions, so applying a non-linearity has its own consequences, but it has nothing to do with the encoder-decoder framework.
Again, the choice of activation function depends on other factors, not on whether it sits in the encoder or the decoder, nor on a specific type of neural network architecture. I suggest you read up on the characteristics of the popular activation functions used in neural networks. Also, do not jump to conclusions after observing a few use cases; such conclusions are misleading.

Multi input & output CNN

I have the following problem:
Input: a set of 6 images
Output: a probability for each image determining whether the image is the correct one out of the 6 images
I know how to create a CNN with Keras, but not how to have multiple images as an input.
How would one solve this problem?
One way I can think of is to use a pre-trained model (VGG16, etc.), extract vectors from some intermediate layer, concatenate the 6 vectors, then feed them into a neural network (or some other classifier) and train it as a multi-class classification task.
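A rough Keras sketch of that idea, assuming 224x224 RGB inputs and a frozen VGG16 backbone shared across the 6 images (all sizes are assumptions):

from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Shared, frozen feature extractor: global average pooling gives a 512-d vector per image.
base = VGG16(include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

inputs = [layers.Input(shape=(224, 224, 3), name=f"image_{i}") for i in range(6)]
features = [base(x) for x in inputs]              # same weights reused for every image
merged = layers.Concatenate()(features)           # one long feature vector
h = layers.Dense(256, activation="relu")(merged)
out = layers.Dense(6, activation="softmax")(h)    # probability that each image is the correct one

model = Model(inputs=inputs, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")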
You can also use an Autoencoder and take the anomaly detection approach.

CNN with CTC loss

I want to extract features using a pretrained CNN model (ResNet50, VGG, etc.) and use the features with a CTC loss function.
I want to build this as a text recognition model.
Any ideas on how I can achieve this?
I'm not sure if you are looking to finetune the pretrained models or to use them purely for feature extraction. To do the latter, freeze the pretrained model's weights by setting requires_grad=False on its parameters (calling .eval() on the model additionally fixes batch-norm and dropout behaviour, but does not by itself stop gradient updates), and feed the output of its last layer into your new output head. See the PyTorch tutorial here for a more in-depth guide.
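A rough PyTorch sketch along those lines; the vocabulary size, image size, and the way the feature map is collapsed into a time axis are all assumptions, not part of the original answer:

import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 37  # assumed: 26 letters + 10 digits + CTC blank (index 0)

# Frozen ResNet50 backbone used purely as a feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()  # keep batch-norm statistics fixed as well

head = nn.Linear(2048, NUM_CLASSES)           # per-timestep classifier
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

images = torch.randn(4, 3, 64, 256)           # batch of word images (N, C, H, W)
feats = backbone(images)                       # (N, 2048, H', W')
feats = feats.mean(dim=2).permute(2, 0, 1)     # collapse height, width becomes time: (T, N, 2048)
log_probs = head(feats).log_softmax(dim=2)     # (T, N, NUM_CLASSES), as CTCLoss expects

targets = torch.randint(1, NUM_CLASSES, (4, 5))                         # dummy label sequences
input_lengths = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lengths = torch.full((4,), 5, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)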

Train some embeddings, keep others fixed

I do sequence classification with Keras, using an RNN and embeddings. My sequences are a bit unusual: words mixed with special symbols. The words are associated with fixed, pre-trained embeddings, but the special-symbol embeddings have to be updated during training.
In an Embedding layer during learning, how can I keep some embeddings fixed while updating others? Is there a way to mask those indices which shouldn't be modified? Or is this a case for a custom Embedding layer?
I do not believe this is achievable with the existing Embedding layer. To get around it, I would create a custom layer that builds two embedding matrices internally and only exposes one of them as a trainable weight.
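A possible sketch of such a custom layer, assuming word ids 0..num_fixed-1 correspond to the pre-trained vectors and the special symbols come after them (the layer name and argument names are made up):

import tensorflow as tf
from tensorflow.keras import layers

class PartiallyFrozenEmbedding(layers.Layer):
    """Ids below num_fixed look up a frozen pre-trained matrix;
    the remaining ids (special symbols) look up a trainable matrix."""

    def __init__(self, pretrained_matrix, num_special, **kwargs):
        super().__init__(**kwargs)
        self.num_fixed, self.dim = pretrained_matrix.shape
        # Stored as a constant, so it never receives gradient updates.
        self.fixed = tf.constant(pretrained_matrix, dtype=tf.float32)
        # Only this matrix ends up in the layer's trainable weights.
        self.special = self.add_weight(
            name="special_embeddings",
            shape=(num_special, self.dim),
            initializer="uniform",
            trainable=True,
        )

    def call(self, ids):
        table = tf.concat([self.fixed, self.special], axis=0)  # full lookup table
        return tf.nn.embedding_lookup(table, ids)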
