Re-weight the input to a neural network - pytorch

For example, I feed a set of images into a CNN. And the default weight of these images is 1. How can I re-weight some of these images so that they have different weights? Can 'DataLoader' achieve this goal in pytorch?
I learned two other possibilities:
Defining a custom loss function, providing weights for each sample as I require.
Repeating samples in the training set, which will result in more frequent samples having a higher weight in the final loss.
Is there any other way, we can achieve that? Any suggestion would be appreciated.

I can think of two ways to achieve this.
Pass on the weight explicitly, when you backpropagate the gradients.
After you computed loss, and when you're about to backpropagate, you can pass a Tensor to backward() and all the subsequent gradients will be scaled by the corresponding element, i.e. do something like
loss = torch.pow(out-target,2)
loss.backward(my_weights) # is a Tensor of same dimension as loss
However, if you use want to assign individual weights in a batch, you can't use the custom loss functions from the nn.module which aggregates the loss over all samples in a batch.
Use torch.utils.data.sampler.WeightedRandomSampler
If you use PyTorch's data.utils anyway, this is simpler than multiplying your training set. However it doesn't assign exact weights, since it's stochastic. But if you're iterating over your training set a sufficient number of times, it's probably close enough.

Related

Large dataset - ANN

I am trying to classify around 400K data with 13 attributes. I have used python sklearn's SVM package, but it didn't work, and then I learned that SVM's are not suitable for large dataset classification. Then I used the (sklearn) ANN using the following MLPClassifier:
MLPClassifier(solver='adam', alpha=1e-5, random_state=1,activation='relu', max_iter=500)
and trained the system using 200K samples, and tested the model on the remaining ones. The classification worked well. However, my concern is that the system is over trained or overfit. Can you please guide me on the number of hidden layers and node sizes to make sure that there is no overfit? (I have learned that the default implementation has 100 hidden neurons. Is it ok to use the default implementation as is?)
To know if your are overfitting you have to compute:
Training set accuracy
Test set accuracy
Once you have calculated this scores, compare it. If training set score is much better than your test set score, then you are overfitting. This means that your model is "memorizing" your data, instead of learning from it to make future predictions.
If you are overfitting with Neuronal Networks you probably have to reduce the number of layers and reduce the number of neurons per layer. There isn't any strict rule that says the number of layer or neurons you need depending on you dataset size. Every dataset can behaves completely different with the same dataset size.
So, to conclude, if you are overfitting, you would have to evaluate your model accuracy using different parameters of layers and number of neurons, and, then, observe with which values you obtain the best results. There are some methods you can use to find the best parameters, is like gridsearchCV.

Get exact formula used by Pytorch autograd to compute gradients

I am implementing a custom CNN with some custom modules in it. I have implemented only the forward pass for the custom modules and left their backward pass to autograd.
I have manually computed the correct formulae for backpropagation through the parameters of the custom modules, and I wished to see whether they match with the formulae used internally by autograd to compute the gradients.
Is there any way to see this?
Thanks
Edit (To add a test case) :-
I have a complex affine layer where the weights and inputs are complex-valued matrices, and the operation is a matrix multiplication of the weight and input matrices.
The multiplication of two complex numbers is given by -
(a+ib)(c+id) = (ac-bd)+i(ad+bc)
I computed the backpropagation formula for this layer given we have the incoming gradient from the higher layer.
It comes out to be dL/dI(n) = (hermitian(W(n))).matmul(dL/dI(n+1))
where I(n) and W(n) are the input and weight of nth layer and I(n+1) is input of (n+1)th layer.
So I wished to check whether autograd is also computing dL/dI(n) using the same formula that I derived.
(Since Pytorch doesn't support complex-valued tensors backpropagation as for now, I have created my own representation of complex numbers by dealing with separate real and imaginary tensors)
I don't believe there is such a feature in pytorch, even because it would be quite unreadable. What you can do is to implement a custom backward method for your layer with the formula you derived, then know by design that the backpropagation is what you want.

When to use bias in Keras model?

I am new to modeling with Keras. I am trying to evaluate appropriate parameters for setting up the model. How do I know when you use bias vs when to turn it off?
The short answer is, always use bias variables when your model is small. Otherwise, it is still recommended to keep using bias in all neural network architectures.
Because each neurone performs like a simple logistic regression. In each neurone, the input values are multiplied with by the weights and the bias affects the initial level in the sigmoid function, which results the desired the non-linearity.
For example, if you have a zero input in your training data like X = [[0,0,...], [0,0,...],... ] , Y = 1, in a sigmoid function, the output will always be exactly Y=0.5 since X*W is zero. However, in large networks, each node can make a bias node out of the average activation of all of its inputs.

Probability Distribution of batch in keras

I am trying to train a CNN model on imbalanced dataset. I wanted to know how well a batch approximates the distribution in the training dataset. Is there any parameter in an inbuilt function in keras which could be specified to maintain the same distribution in batches?
It's possible to train and get good results depending on how severe the imbalance is.
But yes, there are easy ways to compensate this, such as using sample_weight and class_weight in the fit method.
From the documentation on the fit method:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
So, you can compensate three kinds of imbalance:
Class imbalance: when you have classes (results) that are more present than others
Sample imbalance: when some of the inputs are more important than others
Temporal (or 2D) imbalance: when some steps in a sequence are more important than others

value of steps per epoch passed to keras fit generator function

What is the need for setting steps_per_epoch value when calling the function fit_generator() when ideally it should be number of total samples/ batch size?
Keras' generators are infinite.
Because of this, Keras cannot know by itself how many batches the generators should yield to complete one epoch.
When you have a static number of samples, it makes perfect sense to use samples//batch_size for one epoch. But you may want to use a generator that performs random data augmentation for instance. And because of the random process, you will never have two identical training epochs. There isn't then a clear limit.
So, these parameters in fit_generator allow you to control the yields per epoch as you wish, although in standard cases you'll probably keep to the most obvious option: samples//batch_size.
Without data augmentation, the number of samples is static as Daniel mentioned.
Then, the number of samples for training is steps_per_epoch * batch size.
By using ImageDataGenerator in Keras, we make additional training data for data augmentation. Therefore, the number of samples for training can be set by yourself.
If you want two times training data, just set steps_per_epoch as (original sample size *2)/batch_size.

Resources