I have a Neural Network with five inputs for a classification task. Two inputs out of those five are very important and have a direct relationship to the classification task. Therefore, I need to prioritize those two inputs within the network and give less priority to the other three. Is there a way in the neural network to facilitate my requirement?
If training works well, the NN should automatically pick up what's most important for your classification. That's the entire point of a NN (or ML in general); so that you don't have to manually tell it what's more important and what's not. After learning, you can verify that the model indeed does learn the correct order of importance between the features.
You can use any model explanation technique for this. ELI5, SHAP or LIME are some examples. All of these will tell you whether your model did indeed learn that the features you know are important are actually important to the network.
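As a rough illustration of what such tools report, here is a minimal sketch of permutation importance, a model-agnostic technique (ELI5 ships an implementation of the same idea). This is not the ELI5, SHAP or LIME API; `toy_model` is a hypothetical stand-in for a trained network that only depends on the first two of five inputs, and the data is random.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the dataset with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    # Hypothetical "trained" classifier: uses features 0 and 1, ignores 2, 3 and 4.
    return int(row[0] + row[1] > 1.0)

data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(5)] for _ in range(500)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
for i, s in enumerate(scores):
    print(f"feature {i}: importance {s:.3f}")
```

Features the model relies on show a clear drop in accuracy when shuffled, while ignored features score zero. That is the kind of check you can run after training to confirm the network picked up your two important inputs.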
You probably shouldn't try to manually incorporate such biases into the network (unless you have a very good reason for doing so, like incorporating spatial information of images via CNNs). Trust the learning xD
I was not able to understand one thing: when it says "fine-tuning of BERT", what does it actually mean?
1. Are we retraining the entire model again with new data?
2. Or are we training just the top few transformer layers with new data?
3. Or are we training the entire model, but using the pretrained weights as the initial weights?
4. Or are there already a few ANN layers on top of the transformer layers, and only those are trained while the transformer weights stay frozen?
I tried Google but I am still confused; I would appreciate it if someone could help me with this.
Thanks in advance!
I remember reading about a Twitter poll with similar context, and it seems that most people tend to accept your suggestion 3. (or variants thereof) as the standard definition.
However, this obviously does not speak for every single work, but I think it's fairly safe to say that 1. is usually not included when talking about fine-tuning. Unless you have vast amounts of (labeled) task-specific data, this step would be referred to as pre-training a model.
2. and 4. could be considered fine-tuning as well, but from personal/anecdotal experience, allowing all parameters to change during fine-tuning has provided significantly better results. Depending on your use case, this is also fairly simple to experiment with, since freezing layers is trivial in libraries such as Hugging Face Transformers.
In either case, I would really consider them as variants of 3., since you're implicitly assuming that we start from pre-trained weights in these scenarios (correct me if I'm wrong).
Therefore, trying my best at a concise definition would be:
Fine-tuning refers to the step of training any number of parameters/layers with task-specific and labeled data, from a previous model checkpoint that has generally been trained on large amounts of text data with unsupervised MLM (masked language modeling).
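To make the difference between the variants concrete, here is a minimal sketch in plain Python with a toy two-part model (a "base" weight standing in for the pre-trained transformer and a "head" standing in for the task layers). The model, the parameter names, the checkpoint values and the data are all made up for illustration; this is not how BERT or any library implements it. The only thing that changes between the variants is which parameters are allowed to receive updates.

```python
def predict(p, x):
    # "base" plays the role of the pre-trained encoder, "head" the task-specific layer.
    return p["head.w"] * (p["base.w"] * x) + p["head.b"]

def grads(p, x, y):
    """Gradients of 0.5 * (prediction - y)^2 with respect to each parameter."""
    feat = p["base.w"] * x
    err = predict(p, x) - y
    return {"base.w": err * p["head.w"] * x, "head.w": err * feat, "head.b": err}

def fine_tune(checkpoint, data, trainable, lr=0.05, epochs=200):
    p = dict(checkpoint)  # always start from the checkpoint, never from scratch
    for _ in range(epochs):
        for x, y in data:
            g = grads(p, x, y)
            for name in trainable:  # frozen parameters are simply skipped
                p[name] -= lr * g[name]
    return p

# Hypothetical "pre-trained" checkpoint and a small labeled task dataset (y = 3x + 1).
checkpoint = {"base.w": 2.0, "head.w": 0.1, "head.b": 0.0}
task_data = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

head_only = fine_tune(checkpoint, task_data, trainable=["head.w", "head.b"])
full = fine_tune(checkpoint, task_data, trainable=["base.w", "head.w", "head.b"])

print("head only:", head_only)  # base.w stays at its checkpoint value
print("full:     ", full)       # base.w moves away from the checkpoint
```

Both runs fit the task data in this toy case; on real models the choice of which layers to unfreeze is something to experiment with.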
I don't have much experience with training neural networks. I have input vectors with 4 variables and corresponding output vectors with 3 variables. I want to create a neural network that takes these inputs and outputs, which have some unknown (possibly nonlinear) correlation between them, and train it, so that when I feed it previously unseen data it predicts the correlated output.
I was wondering,
What type of model should I use in such scenarios? Is it a restricted Boltzmann machine, regression, a GAN, etc.?
What library is easiest to learn and implement for such a model, e.g. TensorFlow, PyTorch, etc.?
If images were involved, which can be processed as FFT arrays, would the model change?
I did find this answer, but I am not satisfied with it.
Please let me know if there are any functions or other points you would like me to know. Any help is much appreciated.
A multilayer perceptron is a good place to start.
Keras is the highest level/easiest to use library I have used.
If you are working with images or spatially structured data a convolutional neural network will probably work best.
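For the 4-input, 3-output case, a multilayer perceptron can be sketched as below. This is a minimal plain-Python version so that it runs without any dependencies; with Keras the same model is a few lines. The hidden size, learning rate and the `toy_target` function that generates the data are assumptions made up for this example.

```python
import math
import random

def init_params(n_in, n_hidden, n_out, rng):
    def layer(fan_in, fan_out):
        s = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-s, s) for _ in range(fan_in)] for _ in range(fan_out)]
        return weights, [0.0] * fan_out
    return [layer(n_in, n_hidden), layer(n_hidden, n_out)]

def forward(params, x):
    (w1, b1), (w2, b2) = params
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    output = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    return hidden, output

def train_step(params, x, target, lr):
    """One SGD step on 0.5 * squared error, with hand-written backpropagation."""
    (w1, b1), (w2, b2) = params
    hidden, output = forward(params, x)
    d_out = [o - t for o, t in zip(output, target)]
    # Backpropagate through the output layer and the tanh nonlinearity.
    d_hid = [sum(d_out[k] * w2[k][j] for k in range(len(d_out))) * (1.0 - hidden[j] ** 2)
             for j in range(len(hidden))]
    for k, d in enumerate(d_out):
        for j, h in enumerate(hidden):
            w2[k][j] -= lr * d * h
        b2[k] -= lr * d
    for j, d in enumerate(d_hid):
        for i, v in enumerate(x):
            w1[j][i] -= lr * d * v
        b1[j] -= lr * d

def mean_loss(params, data):
    return sum(0.5 * sum((o - t) ** 2 for o, t in zip(forward(params, x)[1], y))
               for x, y in data) / len(data)

def fit(data, n_hidden=8, lr=0.05, epochs=100, seed=0):
    rng = random.Random(seed)
    params = init_params(len(data[0][0]), n_hidden, len(data[0][1]), rng)
    history = [mean_loss(params, data)]
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            train_step(params, x, y, lr)
        history.append(mean_loss(params, data))
    return params, history

def toy_target(x):
    # Hypothetical nonlinear relationship between 4 inputs and 3 outputs.
    return [math.sin(x[0]), x[1] * x[2], x[3] ** 2]

data_rng = random.Random(1)
inputs = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(100)]
data = [(x, toy_target(x)) for x in inputs]

params, history = fit(data)
print(f"loss before: {history[0]:.4f}, after: {history[-1]:.4f}")
print("prediction for a new input:", forward(params, [0.1, 0.2, 0.3, 0.4])[1])
```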
The primary objective (my assigned work) is to do an image segmentation for the underwater images using a convolutional neural network. The camera shots taken from the underwater structure will have poor image quality due to severe noise and bad light exposure. In order to achieve higher classification accuracy, I want to do an automatic image enhancement for the images (see the attached file). So, I want to know, which CNN architecture will be best to do both tasks. Please kindly suggest any possible solutions to achieve the objective.
What do you need to segment? It'd be nice to see some labels of the segmentation.
You may not need to enhance the images; if your whole dataset has the same amount of noise, the network should generalize properly.
Regarding CNN architectures, it depends on the constraints you have on processing power and accuracy. If that is not a constraint, go with something like Mask R-CNN; that repo is a good starting point. Some results look like this:
Be mindful that it's a fairly complex architecture, so inference times might be a bit too high (but it's doable in real time, depending on your GPU).
Other, simpler architectures are FCNs (Fully Convolutional Networks), which are basically your CNN, but instead of fully connected layers:
you replace them with fully convolutional layers:
Images taken from HERE.
The advantage of these FCNs is that they are really easy to implement and modify, since you can go from simple architectures (FCN-AlexNet) to more complex and more accurate ones (FCN-VGG, FCN-ResNet).
Also, I don't think you mention a framework. There are many to choose from, depending on your familiarity with the languages; most of them can be used from Python:
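The idea of swapping fully connected layers for convolutional ones can be sketched with a 1x1 convolution, which applies the same small classifier at every pixel and therefore yields one label per pixel. This is a minimal plain-Python sketch; the feature map, the weights and the two classes are made-up values, not taken from any FCN implementation.

```python
def conv1x1(feature_map, weights, biases):
    """feature_map: H x W x C nested lists, weights: K x C. Returns H x W x K class scores."""
    return [[[sum(w * c for w, c in zip(w_k, pixel)) + b_k
              for w_k, b_k in zip(weights, biases)]
             for pixel in row]
            for row in feature_map]

def segment(feature_map, weights, biases):
    """Per-pixel argmax over class scores gives the segmentation mask."""
    scores = conv1x1(feature_map, weights, biases)
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]

# Hypothetical 2 x 3 feature map with 2 channels, and a 2-class 1x1 convolution.
features = [[[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],
            [[0.1, 0.9], [0.6, 0.4], [0.0, 1.0]]]
weights = [[1.0, 0.0],   # class 0 responds to channel 0
           [0.0, 1.0]]   # class 1 responds to channel 1
biases = [0.0, 0.0]

mask = segment(features, weights, biases)
print(mask)
```

Because the weights do not depend on the height or width of the feature map, the same layer works on inputs of any size, which a fully connected layer cannot do.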
TensorFlow
PyTorch
MXNet
But if you are a beginner, try starting with a GUI-based one. NVIDIA DIGITS is a great starting point and really easy to configure; it's based on Caffe, so it's fairly fast when deploying and can easily be integrated with accelerators like TensorRT.
I have done the implementation part of a convolutional neural network, but I am still confused about how to select the filters to obtain the convolved features. As far as I know, we detect features (like eyes, nose, mouth) to recognize a face in an image using a convolution layer with the help of filters. Is it true that the filters contain eyes, nose and mouth to recognize a face in an image?
There is no hard rule for this purpose.
In many university courses, and even in models implemented in papers, researchers use 3x3 or 5x5 filters with a stride of 1 or 2.
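To see what filter size and stride do to the output, here is a minimal plain-Python sketch of a "valid" 2D convolution (strictly a cross-correlation, which is what deep learning libraries compute) together with the usual output-size formula. The 4x4 image and the hand-written vertical-edge kernel are made up for illustration; in a trained CNN the kernel values are learned.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input size n and filter size k."""
    return (n + 2 * padding - k) // stride + 1

def conv2d(image, kernel, stride=1):
    """'Valid' 2D cross-correlation of a single-channel image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh, stride)
    out_w = conv_output_size(len(image[0]), kw, stride)
    return [[sum(kernel[a][b] * image[i * stride + a][j * stride + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A 28x28 input (MNIST-sized) with the filter sizes and strides mentioned above.
for k in (3, 5):
    for s in (1, 2):
        print(f"{k}x{k} filter, stride {s}: output side = {conv_output_size(28, k, s)}")

# Tiny example: an image with a vertical edge and a vertical-edge filter.
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 0, 1] for _ in range(3)]
print(conv2d(image, edge_kernel))            # stride 1 gives a 2x2 feature map
print(conv2d(image, edge_kernel, stride=2))  # stride 2 gives a 1x1 feature map
```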
It is one of the hyperparameters you should tune for your model. But the best approach in practice is to look at the documentation of models implemented by Google or others and find the best size with respect to your conv layers.
But the last thing you should know is that the purpose of using filters is to reduce the number of parameters while keeping high-quality features.
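The parameter saving can be checked with a little arithmetic. Below is a small sketch comparing a convolutional layer with a fully connected layer that produces an output of the same size; the 28x28 single-channel input and the 8 filters are assumptions chosen only for the example.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a conv layer: each filter is shared across all positions."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# 28x28 grayscale input, eight 3x3 filters, stride 1, no padding -> 26x26x8 output.
out_side = 28 - 3 + 1
conv_count = conv_params(3, 1, 8)
dense_count = dense_params(28 * 28 * 1, out_side * out_side * 8)

print("conv layer parameters: ", conv_count)   # 80
print("dense layer parameters:", dense_count)  # 4245280
```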
Here is a link to all models implemented using TensorFlow for different tasks.
Good luck
I am a newbie to convolutional neural nets... so this may be an ignorant question.
I have followed many examples and tutorials now on the MNIST example in TensorFlow. In the CNN examples, all authors talk about using the 'input filters' to run in the CNN. But no one that I can find mentions WHERE they come from. Can anyone answer where these come from? Or are they magically obtained from the input images?
Thanks! Chris
This is an image that one professor uses, but he does not explain whether he made them or whether TensorFlow auto-extracts them somehow.
Disclaimer: I am not an expert, more of an enthusiast.
To cut a long story short: filters are the CNN equivalent of weights, and all a neural network essentially does is learn their optimal values.
Which it does by iterating through a training dataset, making predictions, comparing them to the label/value already assigned to each training unit (usually an image in case of a CNN) and adjusting weights to minimize the error function (the difference between the predicted value and the actual value).
The initial values of the filters/weights are, I believe, usually assigned randomly; as far as I understand, they mostly affect how quickly training converges rather than what is eventually learned.
It is the job of the neural network to figure out the optimal weights, not of the person implementing it.
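As a small demonstration of the training loop described above, here is a minimal sketch in plain Python where a 3-tap 1D filter starts from random values and is adjusted by gradient descent until it reproduces the outputs of a known edge filter. The target filter `[-1, 0, 1]`, the random signals and the learning rate are assumptions made up for this example; a real CNN learns many 2D filters at once from labeled images, but the principle is the same.

```python
import random

def conv1d(signal, kernel):
    """'Valid' 1D cross-correlation."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def learn_filter(signals, targets, k=3, lr=0.1, epochs=50, seed=0):
    rng = random.Random(seed)
    kernel = [rng.uniform(-0.5, 0.5) for _ in range(k)]  # random starting values
    for _ in range(epochs):
        for signal, target in zip(signals, targets):
            out = conv1d(signal, kernel)
            errs = [o - t for o, t in zip(out, target)]
            # Gradient of the mean squared error (halved) with respect to each tap.
            for j in range(k):
                grad = sum(e * signal[i + j] for i, e in enumerate(errs)) / len(errs)
                kernel[j] -= lr * grad
    return kernel

# Training data: random signals and the responses of a hidden "true" edge filter.
true_filter = [-1.0, 0.0, 1.0]
data_rng = random.Random(7)
signals = [[data_rng.uniform(-1, 1) for _ in range(20)] for _ in range(50)]
targets = [conv1d(s, true_filter) for s in signals]

learned = learn_filter(signals, targets)
print([round(w, 3) for w in learned])
```

Nobody typed the filter values in: they were recovered from the examples alone, which is where the filters in the MNIST tutorials come from as well.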