I have a query regarding the extraction of VGG16/VGG19 features for my experiments.
The pre-trained VGG16 and VGG19 models were trained on the ImageNet dataset, which has 1000 classes (say c1, c2, ..., c1000). Normally we extract features from the first and second fully connected layers, designated 'FC1' and 'FC2'; these 4096-dimensional feature vectors are then used for computer vision tasks.
My question is: can we use these networks to extract features of an image that does not belong to any of the above 1000 classes? In other words, can we use them to extract features of an image with label c1001? Remember that c1001 is not among the ImageNet classes on which these networks were initially trained.
From the article available at https://www.pyimagesearch.com/2019/05/20/transfer-learning-with-keras-and-deep-learning/, I quote the following:
When performing feature extraction, we treat the pre-trained network
as an arbitrary feature extractor, allowing the input image to
propagate forward, stopping at pre-specified layer, and taking the
outputs of that layer as our features
From the above text, there is no restriction that the image must belong to one of the ImageNet classes.
Kindly spare some time to uncover this mystery.
In the research papers, the authors simply state that they used features extracted from a VGG16/VGG19 network pre-trained on the ImageNet dataset, without giving any further details.
I am giving a case study for reference:
The Animals with Attributes dataset (see https://cvml.ist.ac.at/AwA2/) is a very popular dataset with 50 animal classes for image recognition tasks. The authors extracted ILSVRC-pretrained ResNet101 features for the dataset images. This ResNet101 network was pre-trained on the 1000 ImageNet classes (the class list is available at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a#file-imagenet1000_clsidx_to_labels-txt).
Also, the AWA classes are as follows:
antelope, grizzly+bear, killer+whale, beaver, dalmatian, persian+cat, horse, german+shepherd, blue+whale, siamese+cat, skunk, mole, tiger, hippopotamus, leopard, moose, spider+monkey, humpback+whale, elephant, gorilla, ox, fox, sheep, seal, chimpanzee, hamster, squirrel, rhinoceros, rabbit, bat, giraffe, wolf, chihuahua, rat, weasel, otter, buffalo, zebra, giant+panda, deer, bobcat, pig, lion, mouse, polar+bear, collie, walrus, raccoon, cow, dolphin
Now, if we compare the classes in this dataset with the 1000 ImageNet classes, we find that classes like dolphin, cow, raccoon, bobcat, bat, seal, sheep, horse, grizzly bear, giraffe, etc. are not in ImageNet, and still the authors went ahead with extracting ResNet101 features. I believe the extracted features are generalizable, and that is why the authors consider them meaningful representations for the AWA images.
What is your take on this?
The idea is to get the representations for the images not belonging to ImageNet classes and use them along with their labels in some other classifier.
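To make this concrete, here is a minimal sketch of what I have in mind (assuming Keras with an ImageNet-pretrained VGG16; the file name new_class_image.jpg is just a placeholder for an image whose label, say c1001, is not an ImageNet class):

```python
# Minimal sketch: extract FC1 features for an arbitrary image whose label
# (say c1001) is not one of the 1000 ImageNet classes.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet")                          # pre-trained on ImageNet
fc1_extractor = Model(inputs=base.input,
                      outputs=base.get_layer("fc1").output)  # 4096-d layer

# "new_class_image.jpg" is a placeholder file name.
img = image.load_img("new_class_image.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

features = fc1_extractor.predict(x)                       # shape: (1, 4096)
# These 4096-d vectors, together with my own labels, would then go into
# some other classifier.
```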
Yes, you can, but there is a caveat.
Features in the early layers are supposed to encode very general patterns, like angles, lines, and simple shapes, and even the first fully connected layers remain fairly generic descriptors. You can assume those features generalize outside the class set the network was trained on.
There is one caveat, however: those features were learned so as to minimize the error on that particular 1000-class classification task. This means there is no guarantee that they are helpful for classifying an arbitrary new class.
For merely extracting features, you can feed any image you want into your pretrained VGG or other CNN. However, for the purpose of training a classifier on new classes, you have to take the additional steps described below.
The extracted features were determined by training exclusively on those 1000 classes and are therefore tied to them. You can use your network to predict on images that do not belong to those 1000 classes, but in the paragraphs below I explain why that is not the desired approach.
The key point here is that the set of extracted features can be used to detect the presence of other objects within a photo, but not "out of the box".
For example, edges and lines are features that are not related exclusively to those 1000 classes, but also to other ones, hence they are useful, general features.
Therefore, you can employ "transfer learning" to train on your own dataset of images, for example classes c1001, c1002, c1003.
Note, however, that you need to train on your own set before you can use the network to predict on your new images (new classes). Transfer learning refers to reusing the already learned features, which may be suitable for another problem, but you still need to train on your "new problem", say c1001, c1002, c1003, as sketched below.
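As an illustration of that step, here is a minimal Keras sketch, assuming a three-class setup (c1001, c1002, c1003) and a directory name that are purely for illustration: freeze the pre-trained convolutional base and train only a new classification head on your own images.

```python
# Sketch: reuse the ImageNet-pretrained VGG16 base and train a new head
# for three hypothetical new classes (c1001, c1002, c1003).
import tensorflow as tf
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                        # keep the learned features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),   # c1001, c1002, c1003
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# "my_new_classes/" is an assumed directory with one sub-folder per class.
# In practice you would also apply vgg16.preprocess_input to the images.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "my_new_classes/", image_size=(224, 224), label_mode="categorical")
model.fit(train_ds, epochs=5)
```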
For image classification you may need to fine-tune the model using relevant images for the new class label c1001.
But if you plan to use it for unsupervised learning or only for the feature extraction part, then there is no need to retrain the model. You can use the existing ImageNet pre-trained weights and extract features with them, since VGG16/19 learns general lower-level features in its initial layers and only the last few layers are specific to classification.
So basically, a pretrained model can be used for unsupervised learning and feature extraction purposes without retraining.
I want to fine-tune BERT on a specific domain. I have texts of that domain in text files. How can I use them to fine-tune BERT?
I am looking here currently.
My main objective is to get sentence embeddings using BERT.
The important distinction to make here is whether you want to fine-tune your model, or whether you want to expose it to additional pretraining.
The former is simply a way of training BERT to adapt to a specific supervised task, for which you generally need on the order of 1000 or more labeled samples.
Pretraining, on the other hand, is basically trying to help BERT better "understand" data from a certain domain by continuing its unsupervised training objective ([MASK]ing specific words and trying to predict what word should be there), for which you do not need labeled data.
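For the pretraining route, a minimal sketch using Huggingface's transformers and datasets libraries might look like the following (the file name domain_corpus.txt and all hyperparameters are placeholders, not a prescription):

```python
# Sketch: continue BERT's masked-language-model pretraining on domain text files.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "domain_corpus.txt" is a placeholder: one document (or sentence) per line.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# The collator randomly [MASK]s tokens, which is exactly the unsupervised objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```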
If your ultimate objective is sentence embeddings, however, I would strongly suggest you have a look at Sentence Transformers, which is based on a slightly outdated version of Huggingface's transformers library but primarily aims to generate high-quality embeddings. Note that there are ways to train with surrogate losses, where you try to emulate some form of loss that is relevant for embeddings.
Edit: The author of Sentence-Transformers recently joined Huggingface, so I expect support to greatly improve over the upcoming months!
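For just getting embeddings out of the box, the usage is essentially the following (the model name below is one of their pretrained models; any other would work the same way):

```python
# Sketch: obtain sentence embeddings with a pretrained Sentence Transformers model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # any pretrained model name works
sentences = ["BERT is a language model.", "I want domain-specific embeddings."]
embeddings = model.encode(sentences)              # one vector per sentence
print(embeddings.shape)
```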
#dennlinger gave an exhaustive answer. Additional pretraining is also referred as "post-training", "domain adaptation" and "language modeling fine-tuning". here you will find an example how to do it.
But since you want good sentence embeddings, you had better use Sentence Transformers. Moreover, they provide fine-tuned models that are already capable of understanding semantic similarity between sentences. The "Continue Training on Other Data" section is what you want in order to further fine-tune the model on your domain. You do have to prepare a training dataset according to one of the available loss functions. E.g. ContrastiveLoss requires a pair of texts and a label indicating whether the pair is similar.
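As a rough sketch of that continue-training setup (the example pairs and labels below are made up; label 1 means the pair is similar, 0 means dissimilar):

```python
# Sketch: continue training a Sentence Transformers model with ContrastiveLoss
# on hand-made (text_a, text_b, similar?) pairs from your own domain.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["The server crashed.", "The machine went down."], label=1),
    InputExample(texts=["The server crashed.", "The cake tastes great."], label=0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.ContrastiveLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
```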
I believe transfer learning is useful for training the model on a specific domain. First you load the pretrained base model and freeze its weights, then you add another layer on top of the base model and train that layer on your own training data. However, the data would need to be labelled.
TensorFlow has a useful guide on transfer learning.
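A minimal PyTorch sketch of that idea with Huggingface's transformers could look like the following (the two-label setup, the example texts, and the hyperparameters are just illustrative assumptions):

```python
# Sketch: freeze a pretrained BERT base and train only a small classification head.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
for param in bert.parameters():          # freeze the pretrained weights
    param.requires_grad = False

head = torch.nn.Linear(bert.config.hidden_size, 2)   # e.g. 2 labels
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

texts = ["example sentence one", "example sentence two"]   # placeholder labeled data
labels = torch.tensor([0, 1])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():                     # base is frozen, no gradients needed here
    pooled = bert(**inputs).last_hidden_state[:, 0]       # [CLS] token representation

loss = torch.nn.functional.cross_entropy(head(pooled), labels)
loss.backward()
optimizer.step()
```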
You are talking about pre-training. Further training on unlabeled data is called pre-training, and to get started you can take a look over here.
Is it possible to add a new face's features to a trained face recognition model without retraining it on the previous faces?
Currently I am using the FaceNet architecture.
Take a look at Siamese neural networks.
Actually, if you use such an approach you don't need to retrain the model.
Basically, you train a model to generate an embedding (a vector) that maps similar images close together and different ones far apart.
After you have this model trained, when you add a new face it will be far from the others but near the samples of the same person.
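As a rough sketch of what "adding" a face then looks like (embed() is a placeholder for whatever FaceNet-style model you already have, and the gallery and threshold are made up), enrolling a new person is just storing one more embedding and matching by distance:

```python
# Sketch: enroll new identities by storing embeddings; no retraining of the
# embedding model is needed. embed() stands in for your FaceNet forward pass.
import numpy as np

def embed(face_image):
    # Placeholder: replace with a forward pass through your FaceNet model
    # that returns a 1-D numpy embedding vector.
    raise NotImplementedError

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

gallery = {}                                  # name -> stored embedding

def enroll(name, face_image):
    gallery[name] = embed(face_image)         # just add one vector, no retraining

def identify(face_image, threshold=0.7):      # threshold is an assumption; tune it
    query = embed(face_image)
    best_name, best_sim = None, -1.0
    for name, ref in gallery.items():
        sim = cosine_similarity(query, ref)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else "unknown"
```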
Basically, by the mathematical theory behind machine learning models, you need to do another training iteration with only this new data...
But, in practice, those models, especially the sophisticated ones, rely on multiple training iterations and various techniques such as shuffling and noise reduction.
A good approach can be to train the model from its previous state with a subset of the data that includes the new data, for a couple of iterations.
I have done the implementation part of a convolutional neural network. But I am still confused about how to select the filters used to obtain the convolved features in a convolutional neural network. As I understand it, we detect features (like eyes, nose, mouth) to recognize a face in an image using convolution layers with the help of filters. Is it true that a filter contains eyes, nose, and mouth in order to recognize a face in an image?
There is no hard rule for this purpose.
In many university courses, and even in models implemented in papers, researchers use 3x3 or 5x5 filters with strides of 1 or 2.
The filter size is one of the hyperparameters you should tune for your model. But a good practice is to look at the documentation of models implemented by Google and others and find the sizes that work best for your conv layers.
The last thing you should know is that the purpose of using small filters is to reduce the number of parameters while keeping high-quality features. The filter weights themselves are learned during training; a filter does not literally contain eyes, noses, or mouths, although deeper layers learn to respond to such high-level patterns.
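For instance, a minimal Keras sketch (the layer sizes and the face/not-face output are arbitrary choices for illustration) that stacks 3x3 filters; the filter weights are initialized randomly and learned during training rather than hand-designed:

```python
# Sketch: a small CNN with 3x3 filters; the filter weights are learned, not
# hand-crafted templates of eyes or noses.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), strides=1, activation="relu",
                           input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), strides=1, activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. face / not-face
])
model.summary()   # shows how few parameters small 3x3 filters need
```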
Here is a link to models implemented using TensorFlow for different tasks.
Good luck
I am a newbie in the field of machine learning. I have taken Udacity's "Introduction to Machine Learning" course, so I know how to run basic classifiers using sklearn and Python. But all the classifiers taught in the course were trained on a single data type.
I have a problem wherein I want to classify a code commit as "clean" or "buggy".
I have a feature set which contains string data (like the name of a person), categorical data (say "clean" vs "buggy"), numeric data (like the number of commits), and timestamp data (like the time of commit). How can I train a classifier on these feature types simultaneously? Let's assume that I plan on using a Naive Bayes classifier and sklearn. Please help!
I am trying to implement the paper. Any help would be really appreciated.
Many machine learning classifiers like logistic regression, random forest, decision trees and SVM work fine with both continuous and categorical features. My guess is that you have two paths to follow. The first one is data pre-processing: for example, convert all string/categorical data (such as the name of a person) to integers. The second is ensemble learning.
Ensemble learning is when you combine different classifiers (each one dealing with one kind of heterogeneous feature) using, for example, a majority vote, so that they reach a consensus classification. Hope it helps.
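For the pre-processing path, a minimal sklearn sketch might look like the following (the column names, the toy data, and the choice of RandomForest are just for illustration; a Naive Bayes variant could be swapped in after encoding):

```python
# Sketch: handle string/categorical, numeric, and timestamp features together
# with a ColumnTransformer, then feed them to a single classifier.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Hypothetical commit data; column names are made up for illustration.
df = pd.DataFrame({
    "author":      ["alice", "bob", "alice", "carol"],
    "num_commits": [10, 3, 25, 7],
    "commit_time": pd.to_datetime(["2021-01-01 10:00", "2021-01-02 23:30",
                                   "2021-01-03 09:15", "2021-01-04 18:45"]),
    "label":       ["clean", "buggy", "clean", "buggy"],
})
df["commit_hour"] = df["commit_time"].dt.hour     # turn the timestamp into a number

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["author"]),
    ("num", StandardScaler(), ["num_commits", "commit_hour"]),
])
clf = Pipeline([("prep", preprocess), ("model", RandomForestClassifier())])
clf.fit(df[["author", "num_commits", "commit_hour"]], df["label"])
print(clf.predict(df[["author", "num_commits", "commit_hour"]]))
```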