pytorch: index out of bound for index bigger than batch size - mnist

I am trying to predict some images from MNIST using Pytorch, and I cannot choose whatever index from 0-60,000.
First I added "drop_last=True" because I noticed the last incomplete batch was not discarded, and I thought that would solve the problem, but it didn't. If I have a batch size of 256, the max index I can choose to predict is 255. How should I solve this?
Here is how I get my images:
images = MNIST('mnist_data',transform=T, download=True, train=True)
image_loader = torch.utils.data.DataLoader(images,batch_size=256, drop_last=True, shuffle=True)
and how i try to make a prediction:
image_index = 258
value = (images[image_index])
prediction = Net().forward(value)
Then I get
IndexError: index 258 is out of bounds for dimension 0 with size 256

It's principally because you are trying to index to a value greater than the length of your array/tensor. You would want to increase batch size to fix it

Related

Nan loss in keras with triplet loss

I'm trying to learn an embedding for Paris6k images combining VGG and Adrian Ung triplet loss. The problem is that after a small amount of iterations, in the first epoch, the loss becomes nan, and then the accuracy and validation accuracy grow to 1.
I've already tried lowering the learning rate, increasing the batch size (only to 16 beacuse of memory), changing optimizer (Adam and RMSprop), checking if there are None values on my dataset, changing data format from 'float32' to 'float64', adding a little bias to them and simplify the model.
Here is my code:
base_model = VGG16(include_top = False, input_shape = (512, 384, 3))
input_images = base_model.input
input_labels = Input(shape=(1,), name='input_label')
embeddings = Flatten()(base_model.output)
labels_plus_embeddings = concatenate([input_labels, embeddings])
model = Model(inputs=[input_images, input_labels], outputs=labels_plus_embeddings)
batch_size = 16
epochs = 2
embedding_size = 64
opt = Adam(lr=0.0001)
model.compile(loss=tl.triplet_loss_adapted_from_tf, optimizer=opt, metrics=['accuracy'])
label_list = np.vstack(label_list)
x_train = image_list[:2500]
x_val = image_list[2500:]
y_train = label_list[:2500]
y_val = label_list[2500:]
dummy_gt_train = np.zeros((len(x_train), embedding_size + 1))
dummy_gt_val = np.zeros((len(x_val), embedding_size + 1))
H = model.fit(
x=[x_train,y_train],
y=dummy_gt_train,
batch_size=batch_size,
epochs=epochs,
validation_data=([x_val, y_val], dummy_gt_val),callbacks=callbacks_list)
The images are 3366 with values scaled in range [0, 1].
The network takes dummy values because it tries to learn embeddings from images in a way that images of the same class should have small distance, while images of different classes should have high distances and than the real class is part of the training.
I've noticed that I was previously making an incorrect class division (and keeping images that should be discarded), and I didn't have the nan loss problem.
What should I try to do?
Thanks in advance and sorry for my english.
In some case, the random NaN loss can be caused by your data, because if there are no positive pairs in your batch, you will get a NaN loss.
As you can see in Adrian Ung's notebook (or in tensorflow addons triplet loss; it's the same code) :
semi_hard_triplet_loss_distance = math_ops.truediv(
math_ops.reduce_sum(
math_ops.maximum(
math_ops.multiply(loss_mat, mask_positives), 0.0)),
num_positives,
name='triplet_semihard_loss')
There is a division by the number of positives pairs (num_positives), which can lead to NaN.
I suggest you try to inspect your data pipeline in order to ensure there is at least one positive pair in each of your batches. (You can for example adapt some of the code in the triplet_loss_adapted_from_tf to get the num_positives of your batch, and check if it is greater than 0).
Try increasing your batch size. It happened to me also. As mentioned in the previous answer, network is unable to find any num_positives. I had 250 classes and was getting nan loss initially. I increased it to 128/256 and then there was no issue.
I saw that Paris6k has 15 classes or 12 classes. Increase your batch size 32 and if the GPU memory occurs you can try with model with less parameters. You can work on Efficient B0 model for starting. It has 5.3M compared to VGG16 which has 138M parameters.
I have implemented a package for triplet generation so that every batch is guaranteed to include postive pairs. It is compatible with TF/Keras only.
https://github.com/ma7555/kerasgen (Disclaimer: I am the owner)

How to calculate unbalanced weights for BCEWithLogitsLoss in pytorch

I am trying to solve one multilabel problem with 270 labels and i have converted target labels into one hot encoded form. I am using BCEWithLogitsLoss(). Since training data is unbalanced, I am using pos_weight argument but i am bit confused.
pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.
Do i need to give total count of positive values of each label as a tensor or they mean something else by weights?
The PyTorch documentation for BCEWithLogitsLoss recommends the pos_weight to be a ratio between the negative counts and the positive counts for each class.
So, if len(dataset) is 1000, element 0 of your multihot encoding has 100 positive counts, then element 0 of the pos_weights_vector should be 900/100 = 9. That means that the binary crossent loss will behave as if the dataset contains 900 positive examples instead of 100.
Here is my implementation:
(new, based on this post)
pos_weight = (y==0.).sum()/y.sum()
(original)
def calculate_pos_weights(class_counts):
pos_weights = np.ones_like(class_counts)
neg_counts = [len(data)-pos_count for pos_count in class_counts]
for cdx, pos_count, neg_count in enumerate(zip(class_counts, neg_counts)):
pos_weights[cdx] = neg_count / (pos_count + 1e-5)
return torch.as_tensor(pos_weights, dtype=torch.float)
Where class_counts is just a column-wise sum of the positive samples. I posted it on the PyTorch forum and one of the PyTorch devs gave it his blessing.
Maybe is a little late, but here is how I calculate the same. Looking into the documentation:
For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100 = 3.
So an easy way to calcule the positive weight is using the tensor methods with your label vector "y", in my case train_dataset.data.y. And then calculating the total negative labels.
num_positives = torch.sum(train_dataset.data.y, dim=0)
num_negatives = len(train_dataset.data.y) - num_positives
pos_weight = num_negatives / num_positives
Then the weights can be used easily as:
criterion = torch.nn.BCEWithLogitsLoss(pos_weight = pos_weight)
PyTorch solution
Well, actually I have gone through docs and you can simply use pos_weight indeed.
This argument gives weight to positive sample for each class, hence if you have 270 classes you should pass torch.Tensor with shape (270,) defining weight for each class.
Here is marginally modified snippet from documentation:
# 270 classes, batch size = 64
target = torch.ones([64, 270], dtype=torch.float32)
# Logits outputted from your network, no activation
output = torch.full([64, 270], 0.9)
# Weights, each being equal to one. You can input your own here.
pos_weight = torch.ones([270])
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target) # -log(sigmoid(0.9))
Self-made solution
When it comes to weighting, there is no built-in solution, but you may code one yourself really easily:
import torch
class WeightedMultilabel(torch.nn.Module):
def __init__(self, weights: torch.Tensor):
self.loss = torch.nn.BCEWithLogitsLoss()
self.weights = weights.unsqueeze()
def forward(outputs, targets):
return self.loss(outputs, targets) * self.weights
Tensor has to be of the same length as the number of classes in your multilabel classification (270), each giving weight for your specific example.
Calculating weights
You just add labels of every sample in your dataset, divide by the minimum value and inverse at the end.
Sort of snippet:
weights = torch.zeros_like(dataset[0])
for element in dataset:
weights += element
weights = 1 / (weights / torch.min(weights))
Using this approach class occurring the least will give normal loss, while others will have weights smaller than 1.
It might cause some instability during training though, so you might want to experiment with those values a little (maybe log transform instead of linear?)
Other approach
You may think about upsampling/downsampling (though this operation is complicated as you would add/delete other classes as well, so advanced heuristics would be needed I think).
Just to provide a quick revision on #crypdick's answer, this implementation of the function worked for me:
def calculate_pos_weights(class_counts,data):
pos_weights = np.ones_like(class_counts)
neg_counts = [len(data)-pos_count for pos_count in class_counts]
for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
pos_weights[cdx] = neg_count / (pos_count + 1e-5)
return torch.as_tensor(pos_weights, dtype=torch.float)
Where data is the dataset you're trying to apply weights to.

I don't understand the code for training a classifier in pytorch

I don't understand the line labels.size(0). I'm new to Pytorch and been quite confused about the data structure.
correct = 0
total = 0
with torch.no_grad():
for data in testloader:
images, labels = data
outputs = net(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
100 * correct / total))`
labels is a Tensor with dimensions [N, 1], where N is equal to the number of samples in the batch. .size(...) returns a subclass of tuple (torch.Size) with the dimensions of the Tensor, and .size(0) returns an integer with the value of the first (0-based) dimension (i.e., N).
To answer your question
In PyTorch, tensor.size() allows you to check out the shape of a tensor.
In your code,
images, labels = data
images and labels will each contain N number of training examples depends on your batch size. If you check out the shape of labels, it should be [N, 1], where N is the size of mini-batch training example.
A bit of prescience for those who are new to training a neural network.
When training a neural network, practitioners will forward pass the dataset through the network and optimize the gradients.
Say your training dataset contain 1 million images, and your training script is designed in a way to pass all 1 million images in a single epoch. The problem with this approach is it will take a really long time for you to receive feedback from your neural network. This is where mini-batch training comes in.
In PyTorch, the DataLoader class allows us to split the dataset into multiple batches. If your training loader contains 1 Million examples and batch size is 1000, you will expect each epoch will iterate 1000 step through all the mini-batches. This way, you can observe and optimize the training performance better.

what if steps_per_epoch does not fit into numbers of samples?

using Keras fit_generator, steps_per_epoch should be equivalent to the total number available of samples divided by the batch_size.
But how would the generator or the fit_generator react if I choose a batch_size that does not fit n times into the samples? Does it yield samples until it cannot fill a whole batch_size anymore or does it just use a smaller batch_size for the last yield?
Why I ask: I divide my data into train/validation/test of different size (different %) but would use the same batch size for train and validation sets but especially for train and test sets. As they are different in size I cannot guarantee that batch size fit into the total amount of samples.
If it's your generator with yield
It's you who create the generator, so the behavior is defined by you.
If steps_per_epoch is greater than the expected batches, fit will not see anything, it will simply keep requesting batches until it reaches the number of steps.
The only thing is: you must assure your generator is infinite.
Do this with while True: at the beginning, for instance.
If it's a generator from ImageDataGenerator.
If the generator is from an ImageDataGenerator, it's actually a keras.utils.Sequence and it has the length property: len(generatorInstance).
Then you can check yourself what happens:
remainingSamples = total_samples % batch_size #confirm that this is gerater than 0
wholeBatches = total_samples // batch_size
totalBatches = wholeBatches + 1
if len(generator) == wholeBatches:
print("missing the last batch")
elif len(generator) == totalBatches:
print("last batch included")
else:
print('weird behavior')
And check the size of the last batch:
lastBatch = generator[len(generator)-1]
if lastBatch.shape[0] == remainingSamples:
print('last batch contains the remaining samples')
else:
print('last batch is different')
If you assign N to the parameter steps_per_epoch of fit_generator(), Keras will basically call your generator N times before considering one epoch done. It's up to your generator to yield all your samples in N batches.
Note that since for most models it is fine to have different batch sizes each iteration, you could fix steps_per_epoch = ceil(dataset_size / batch_size) and let your generator output a smaller batch for the last samples.
i had facing the same logical error
solved it with defining steps_per_epochs
BS = 32
steps_per_epoch=len(trainX) // BS
history = model.fit(train_batches,
epochs=initial_epochs,steps_per_epoch=steps_per_epoch,
validation_data=validation_batches)

PyTorch ValueError: Target and input must have the same number of elements

I am kinda new to PyTorch, but I am trying to understand how the sizes of target and input work in torch.nn.BCELoss() when computing loss function.
import torch
import torch.nn as nn
from torch.autograd import Variable
time_steps = 15
batch_size = 3
embeddings_size = 100
num_classes = 2
model = nn.LSTM(embeddings_size, num_classes)
input_seq = Variable(torch.randn(time_steps, batch_size, embeddings_size))
lstm_out, _ = model(input_seq)
last_out = lstm_out[-1]
print(last_out)
loss = nn.BCELoss()
target = Variable(torch.LongTensor(batch_size).random_(0, num_classes))
print(target)
err = loss(last_out.long(), target)
err.backward()
I received the following error:
Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 767
"Please ensure they have the same size.".format(target.size(), input.size()))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 770, in binary_cross_entropy
"!= input nelement ({})".format(target.nelement(), input.nelement()))
ValueError: Target and input must have the same number of elements. target nelement (3) != input nelement (6)
This error definitely comes from the different sizes of last_out (size 3x2) and target (size 3). So my question is how can I convert last_out into something like target (of size 3 and containing just 0s and 1s) to compute loss function?
The idea of nn.BCELoss() is to implement the following formula:
Both o and tare tensors of arbitrary (but same!) size and i simply indexes each element of the two tensor to compute the sum above.
Typically, nn.BCELoss() is used in a classification setting: o and i will be matrices of dimensions N x D. N will be the number of observations in your dataset or minibatch. D will be 1 if you are only trying to classify a single property, and larger than 1 if you are trying to classify multiple properties. t, the target matrix, will only hold 0 and 1, as for every property, there are only two classes (that's where the binary in binary cross entropy loss comes from). o will hold the probability with which you assign every property of every observation to class 1.
Now in your setting above, it is not clear how many classes you are considering and how many properties there are for you. If there is only one property as suggested by the shape of your target, you should only output a single quantity, namely the probability of being in class 1 from your model. If there are two properties, your targets are incomplete! If there are multiple classes, you should work with torch.nn.CrossEntropyLossinstead of torch.nn.BCELoss().
As an aside, often it is desirable for the sake of numerical stability to use torch.nn.BCEWithLogitsLossinstead of torch.nn.BCELoss() following nn.Sigmoid() on some outputs.

Resources