pytorch: how can I use a picture as the label in a DataLoader?

I want to do some image reconstruction using autoencoders in PyTorch; however, I haven't found a way to use an image as the label for an input image (the label image is different from the original one).
I've tried the ImageFolder method, but I think that's meant for classification, and I haven't been able to come up with a solution. Should I create a custom dataset for this?
Thanks in advance!

Write your own custom Dataset; below is a simple example.
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, input_imgs, label_imgs, transform):
        self.input_imgs = input_imgs
        self.label_imgs = label_imgs
        self.transform = transform

    def __len__(self):
        return len(self.input_imgs)

    def __getitem__(self, idx):
        input_img, label_img = self.input_imgs[idx], self.label_imgs[idx]
        return self.transform(input_img), self.transform(label_img)
And then pass an instance of it to DataLoader:
from torch.utils.data import DataLoader

# pass a dataset instance, not the class itself
dataloader = DataLoader(CustomDataset(input_imgs, label_imgs, transform), batch_size=4)
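For completeness, here is a minimal sketch of how the (input, label) image pairs would then be consumed in a reconstruction training loop; model, criterion, and optimizer are assumed to be an autoencoder, a reconstruction loss, and an optimizer you have defined elsewhere:
# a minimal sketch; `model`, `criterion`, and `optimizer` are assumed
for input_img, label_img in dataloader:
    output = model(input_img)            # reconstructed image
    loss = criterion(output, label_img)  # compare against the label image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()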

Related

Change image labels when using pytorch

I am loading an image dataset with PyTorch as seen below:
dataset = datasets.ImageFolder('...', transform=transform)
loader = DataLoader(dataset, batch_size=args.batchsize)
The dataset is in a folder with the structure seen below:
dataset/
    class_1/
    class_2/
    class_3/
As a result, each image in the class_1 folder has a label of 0, and so on.
However, I would like to change these labels and randomly assign a label to each image in the dataset. What I tried is:
new_labels = [random.randint(0, 3) for i in range(len(dataset.targets))]
dataset.targets = new_labels
However, this does not change the labels as I wanted, and it causes errors later in model training.
Is this the correct way to do it, or is there a more appropriate one?
You can have a transformation for the labels:
import random

class rand_label_transform(object):
    def __init__(self, num_labels):
        self.num_labels = num_labels

    def __call__(self, label):
        # ignore the original label and generate a new random one
        new_label = random.randint(0, self.num_labels - 1)
        return new_label
dataset = datasets.ImageFolder('...', transform=transform, target_transform=rand_label_transform(num_labels=3))
See ImageFolder for more details.
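As an aside on why the original attempt failed: ImageFolder keeps its (path, label) pairs in dataset.samples, and __getitem__ reads the label from there, so overwriting dataset.targets alone does not change what the loader returns. A minimal sketch of rewriting both attributes consistently (assuming new_labels is defined as in the question):
# rewrite both `samples` and `targets` so they stay consistent
dataset.samples = [(path, new) for (path, _), new in zip(dataset.samples, new_labels)]
dataset.targets = new_labels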

Subclass of PyTorch DataLoader for changing batch output

I'm interested in a way of applying a transform to a batch generated by a PyTorch DataLoader class. My minimal example is something like this:
class CustomLoader(torch.utils.data.DataLoader):
    def __iter__(self):
        result = super().__iter__()
        return some_function(result)
But this errors, since DataLoader.__iter__() returns a _MultiProcessingDataLoaderIter or _SingleProcessDataLoaderIter. Weirdly though, iterating over the returned object directly does yield Tensors, so any explanation there would be greatly appreciated!
I understand that, in general, transforms on the data should be done in the subclassed Dataset class. However, in my case the data is tabular and the transform uses NumPy, and applying it sample-wise is much slower (about 5x) than applying it to an entire batch, since these operations are presumably vectorized under the hood.
I know I can do something simple like
for X, y in loader:
    X = some_function(X)
But I'd also like to use the DataLoader with pytorch-lightning, so this isn't an option.
What is the proper way to subclass the PyTorch DataLoader?
DataLoader.__iter__() returns an iterator; to wrap it, make your override a generator and yield each batch instead of returning the iterator object. You can read more about generators here.
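For example, here is a minimal sketch of a generator-based subclass (some_function is the batch transform from the question):
import torch

class CustomLoader(torch.utils.data.DataLoader):
    def __iter__(self):
        # yielding makes this a generator, so each batch is transformed lazily
        for batch in super().__iter__():
            yield some_function(batch)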
Regarding your problem of applying a transform to a batch, you can also create a custom Dataset instead of subclassing DataLoader, and apply the transforms there.
class MyDataset(Dataset):
    def __init__(self, transforms=None):
        super().__init__()
        self.data = ...  # define your data here
        self.transforms = transforms

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if self.transforms:
            x = self.transforms(x)
        return x

# use your `MyDataset` class for creating your dataloader
dataloader = DataLoader(MyDataset(transforms=CustomTransforms()), batch_size=4)
You can use this dataloader with PyTorch Lightning Trainer as well.
If you are using PyTorch Lightning, I would suggest you join our Slack channel and ask questions on GitHub Discussions as well.
Thanks :)
EDIT (apply transforms to a batch):
If you are using PyTorch Lightning, then I would recommend using LightningDataModule, which provides the on_before_batch_transfer hook that can be used to apply transforms to a batch ;)
Here is an example:
def on_before_batch_transfer(self, batch, dataloader_idx):
    batch['x'] = transforms(batch['x'])
    return batch
Check out the documentation for more.

How to add new sample to CIFAR10 torchvision?

Hi, I want to add my own images to the CIFAR10 dataset in torchvision. How can I do that?
train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
train_data.add # or a workaround!
thanks
You can either create a custom dataset for CIFAR10 using the raw cifar10 images here, or you can still use the CIFAR10 dataset inside your new custom dataset and add your logic in the __getitem__() method.
This is a simple example to get you going:
class CIFAR10_2(torch.utils.data.Dataset):
    def __init__(self, dataset_path='/cifar10', transformations=None, should_download=True):
        self.dataset_train = torchvision.datasets.CIFAR10(dataset_path, download=should_download)
        self.transformations = transformations

    def __getitem__(self, index):
        # do as you wish, add your logic here
        (img, label) = self.dataset_train[index]
        # apply transformations, for example
        if self.transformations is not None:
            return self.transformations(img), label
        return img, label

    def __len__(self):
        return len(self.dataset_train)
You can get fancy and add logic for test, validation, etc., and do whatever you like.
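Alternatively, if the goal is literally to append extra samples, one option is torch.utils.data.ConcatDataset, which chains CIFAR10 with a dataset wrapping your own images; this is a sketch, and my_extra_dataset is a placeholder for whatever Dataset holds your new samples:
import torch
import torchvision

cifar = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
# `my_extra_dataset` is hypothetical: any Dataset returning (img, label) pairs
combined = torch.utils.data.ConcatDataset([cifar, my_extra_dataset])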

Plot several graphs in Pytorch tensorboard

I am training a dynamic neural network, meaning that each epoch I tweak the architecture and get a different computational graph.
I want to plot the graph for each epoch using tensorboard, but when I use SummaryWriter.add_graph() at the end of each epoch it simply overwrites the previous one.
Any ideas how to plot several graphs using PyTorch + TensorBoard? It seems achievable, as each graph has a "tag", but I found no option to change this tag to plot several of them.
Thanks,
Elad
If you still want to use SummaryWriter, there is the option of using the method add_scalars:
Example:
summary.add_scalars('loss/check_info', {
    'score': score[iteration],
    'score_nf': score_nf[iteration],
}, iteration)
Instead of using the "tag" feature, you can use the "run" feature.
To do so, launch TensorBoard from a parent directory in which your summaries are stored in distinct sub-directories.
In your example, you could save the summary of the first epoch in the directory "tensorboard_log_dir/epoch_1", then save the summary of the second epoch in "tensorboard_log_dir/epoch_2", etc.
That way, when using tensorboard --logdir=tensorboard_log_dir, you'll be able to switch from one computational graph to another via the "run" widget.
Here's a reproducible example:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter

dummy_input = (torch.zeros(1, 3),)

# Two different architectures (PyTorch)
class oneLinear(nn.Module):
    def __init__(self):
        super(oneLinear, self).__init__()
        self.l1 = nn.Linear(3, 5)

    def forward(self, x):
        x = self.l1(x)
        return x

class twoLinear(nn.Module):
    def __init__(self):
        super(twoLinear, self).__init__()
        self.l1 = nn.Linear(3, 5)
        self.l2 = nn.Linear(5, 5)

    def forward(self, x):
        x = self.l1(x)
        x = F.relu(self.l2(x))
        return x

# add each graph to a distinct sub-directory
with SummaryWriter('./tensorboard_log_dir/oneLinear') as w:
    w.add_graph(oneLinear(), dummy_input)

with SummaryWriter('./tensorboard_log_dir/twoLinear') as w:
    w.add_graph(twoLinear(), dummy_input)

load multi-modal data with pytorch

I'm trying to load multi-modal data (e.g. text and image) in PyTorch for image classification. I do not know how to load both simultaneously, as in the following skeleton:
def __init__(self, img_path, txt_path, transform=None, loader=default_loader):
    ...

def __len__(self):
    return len(self.img_name)

def __getitem__(self, item):
    ...
Can anyone help me?
In __getitem__, you can use a dictionary or a tuple to represent one sample of your data. Later, when you create a DataLoader from the dataset for training, PyTorch will automatically collate the samples into batches of dictionaries or tuples.
If you want to batch the samples in a more customized way, check out collate_fn in PyTorch.
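For instance, here is a minimal sketch of a custom collate_fn; it assumes each sample is a (text, img_tensor, label) tuple, and all names are illustrative:
import torch
from torch.utils.data import DataLoader

def multimodal_collate(samples):
    texts = [s[0] for s in samples]              # variable-length text stays a list
    imgs = torch.stack([s[1] for s in samples])  # image tensors stack into one batch
    labels = torch.tensor([s[2] for s in samples])
    return texts, imgs, labels

loader = DataLoader(dataset, batch_size=4, collate_fn=multimodal_collate)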
The method __getitem__(self, item) would help you do this.
For example:
def __getitem__(self, item):  # item can be thought of as an index
    text = textList[item]  # textList holds the text input for element `item`
    img = imgList[item]    # imgList holds the image input for element `item`
    input = [text, img]
    y = labels[item]       # labels holds the target for each (text, img) pair
    return input, y
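Putting the pieces together, here is a hedged sketch of a complete multi-modal Dataset matching the skeleton above; the parallel-list layout (one text entry, one image path, and one label per sample) and all names are assumptions:
from PIL import Image
from torch.utils.data import Dataset

class MultiModalDataset(Dataset):
    def __init__(self, img_paths, texts, labels, transform=None):
        # img_paths, texts, and labels are parallel lists: one entry per sample
        self.img_paths = img_paths
        self.texts = texts
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, item):
        img = Image.open(self.img_paths[item]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # return both modalities plus the target; the DataLoader will batch them
        return (self.texts[item], img), self.labels[item]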
