How to use multiprocessing in PyTorch? - pytorch

I'm trying to use PyTorch with a complex loss function. To speed up the code, I would like to use the PyTorch multiprocessing package.
As a first trial, I feed 10x1 features into the NN and get a 10x4 output.
After that, I want to pass the 10x4 output to a function that does some calculation. (The calculation will become more complex in the future.)
After the calculation, the function returns a 10x1 array in total. This array is stored as NN_energy and used to compute the loss function.
Besides, I would also like to know whether there is another way to create a backward-able array to hold the NN_energy values, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing

def func(msg,BOP):
    ans = (BOP[msg][0]+BOP[msg][1]/BOP[msg][2])*BOP[msg][3]
    return ans

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature , n_hidden_1) # hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2) # hidden layer
        self.predict = torch.nn.Linear(n_hidden_2, n_output ) # output layer
    def forward(self, x):
        x = torch.tanh(self.hidden_1(x)) # activation function for hidden layer
        x = torch.tanh(self.hidden_2(x)) # activation function for hidden layer
        x = self.predict(x) # linear output
        return x

if __name__ == '__main__': # apply_async
    Data_in = Variable( torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() )
    Ground_truth = Variable( torch.from_numpy( np.asarray(list(range(20,30))).reshape(10,1) ).float() )
    net = Net( n_feature=1 , n_hidden_1=15 , n_hidden_2=15 , n_output=4 ) # define the network
    optimizer = torch.optim.Rprop( net.parameters() )
    loss_func = torch.nn.MSELoss() # this is for regression mean squared loss
    NN_output = net(Data_in)
    args = range(0,10)
    pool = multiprocessing.Pool()
    return_data = pool.map( func, zip(args, NN_output) )
    pool.close()
    pool.join()
    NN_energy = net(Data_in)[0:10,0]
    for i in range(0,10):
        NN_energy[i] = return_data[i]
    loss = torch.sqrt( loss_func( NN_energy , Ground_truth ) ) # must be (1. nn output, 2. target)
    print(loss)
Error messages:
File
"C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py",
line 126, in reduce_tensor
raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which
requires_grad, since autograd does not support crossing process
boundaries. If you just want to transfer the data, call detach() on
the tensor before serializing (e.g., putting it on the queue).

First of all, the Torch Variable API has been deprecated for a very long time; just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong on many levels: calling np.asarray on a list is pointless since a copy is made anyway, and np.array takes a list as input by design. Also, np.arange returns a range directly as a NumPy array, and the same function is available in Torch. Finally, specifying both dimensions in reshape is unnecessary and error-prone; you could simply do reshape((-1, 1)), or even better unsqueeze(-1).
Here is the simplified expression: torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
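As a quick sanity check, the sketch below compares the original expression with the arange-based one. It assumes only that torch and numpy are installed, and it leaves out requires_grad because the input data itself does not need gradients for training; the network parameters carry them.

```python
import numpy as np
import torch

# Expression from the question
original = torch.from_numpy(np.asarray(list(range(0, 10))).reshape(10, 1)).float()

# Simplified expression: build the range directly in Torch and add a trailing axis
simplified = torch.arange(10, dtype=torch.float32).unsqueeze(-1)

print(simplified.shape)                   # torch.Size([10, 1])
print(torch.equal(original, simplified))  # True
```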
Using a multiprocessing pool is bad practice when batch processing is possible. Batching is both far more efficient and more readable. Indeed, performing N small algebraic operations in parallel is generally slower than performing a single larger algebraic operation, and even more so on GPU. More importantly, computing the gradient is not supported across processes, hence the error that you get. This is only partially true, though, because it has been supported for tensors on CPU since 1.6.0. Have a look at the official release changelog.
Could you post a more representative example of what func could look like, to make sure you really need it?
NB: Distributed autograd, which is what you are looking for, is now available in PyTorch as an experimental feature, in beta since 1.6.0. Have a look at the official documentation.
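For the func shown in the question, the batched alternative could look like the sketch below. It assumes the real calculation can be written with tensor operations on whole columns; func_batched and the nn.Sequential stand-in for Net are illustrative names, not part of the original code. Because everything stays inside autograd in a single process, nothing has to be detached or copied element by element, which also covers the question about building NN_energy.

```python
import torch

def func_batched(out):
    # out has shape (N, 4); apply the same formula as func to every row at once.
    return (out[:, 0] + out[:, 1] / out[:, 2]) * out[:, 3]

torch.manual_seed(0)
# Stand-in for the Net class from the question (1 -> 15 -> 15 -> 4 with tanh).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 15), torch.nn.Tanh(),
    torch.nn.Linear(15, 15), torch.nn.Tanh(),
    torch.nn.Linear(15, 4),
)
data_in = torch.arange(10, dtype=torch.float32).unsqueeze(-1)
ground_truth = torch.arange(20, 30, dtype=torch.float32).unsqueeze(-1)

# Shape (10, 1), still attached to the autograd graph
nn_energy = func_batched(net(data_in)).unsqueeze(-1)
loss = torch.sqrt(torch.nn.functional.mse_loss(nn_energy, ground_truth))
loss.backward()  # gradients flow through func_batched into the network
print(loss.item())
```

If the real calculation cannot be vectorized, the error message's own suggestion (calling detach() before sending tensors to other processes) still applies, at the cost of losing gradients through that step.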

Related

PyTorch Dataset / Dataloader from random source

I have a source of random (non-deterministic, non-repeatable) data, that I'd like to wrap in Dataset and Dataloader for PyTorch training. How can I do this?
__len__ is not defined, as the source is infinite (with possible repetition).
__getitem__ is not defined, as the source is non-deterministic.
When defining a custom dataset class, you'd ordinarily subclass torch.utils.data.Dataset and define __len__() and __getitem__().
However, for cases where you want sequential but not random access, you can use an iterable-style dataset. To do this, you instead subclass torch.utils.data.IterableDataset and define __iter__(). Whatever is returned by __iter__() should be a proper iterator; it should maintain state (if necessary) and define __next__() to obtain the next item in the sequence. __next__() should raise StopIteration when there's nothing left to read. In your case with an infinite dataset, it never needs to do this.
Here's an example:
import torch

class MyInfiniteIterator:
    def __iter__(self):
        # Iterators conventionally return themselves from __iter__().
        return self
    def __next__(self):
        return torch.randn(10)

class MyInfiniteDataset(torch.utils.data.IterableDataset):
    def __iter__(self):
        return MyInfiniteIterator()

dataset = MyInfiniteDataset()
dataloader = torch.utils.data.DataLoader(dataset, batch_size = 32)
for batch in dataloader:
    # ... Do some stuff here ...
    # ...
    # if some_condition:
    #     break
    break  # placeholder so the loop body is valid; replace with a real stopping condition
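The same idea can be written more compactly with a generator, since a generator function already returns a proper iterator. This is a minimal sketch rather than the answerer's code; RandomStream is a hypothetical name, and torch.randn(10) stands in for the real random source.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class RandomStream(IterableDataset):
    def __iter__(self):
        # The generator never returns, so the stream is endless.
        while True:
            yield torch.randn(10)

loader = DataLoader(RandomStream(), batch_size=32)
first_batch = next(iter(loader))
print(first_batch.shape)  # torch.Size([32, 10])
```

Note that with num_workers > 0 each worker process iterates its own copy of the dataset, which is usually acceptable for a purely random source.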

Subclass of PyTorch DataLoader for changing batch output

I'm interested in a way of applying a transform to a batch generated by a PyTorch DataLoader class. My minimal example is something like this:
class CustomLoader(torch.utils.data.DataLoader):
    def __iter__(self):
        result = super().__iter__()
        return some_function(result)
But this errors since DataLoader.__iter__() returns a _MultiProcessingDataLoaderIter or _SingleProcessDataLoaderIter rather than a batch. Weirdly though, directly returning the output does return a Tensor, so any explanation there would be greatly appreciated!
I understand that in general, transform to data should be done in the subclassed Dataset class. However, in my case the data is tabular and the transform is via numpy, and doing it on a sample-wise basis is much slower (5x) than doing it on an entire batch, since surely these operations are vectorized under the hood.
I know I can do something simple like
for X, y in loader:
    X = some_function(X)
But I'd also like to use the DataLoader with pytorch-lightning, so this isn't an option.
What is the proper way to subclass PyTorch Dataloaders?
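DataLoader.__iter__() returns an iterator object, not a batch, so some_function receives the iterator itself. Make your __iter__() a generator instead: loop over the parent iterator and yield each transformed batch rather than returning the result. You can read more about generators here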
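A minimal sketch of that generator approach follows. some_function here is a hypothetical stand-in for the batch-level NumPy transform, and the TensorDataset is toy data used only to make the example runnable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def some_function(X):
    # Hypothetical batch-level transform standing in for the real one.
    return X * 2.0

class CustomLoader(DataLoader):
    def __iter__(self):
        # The parent iterator yields collated batches; transform each as it passes through.
        for X, y in super().__iter__():
            yield some_function(X), y

dataset = TensorDataset(torch.ones(8, 3), torch.zeros(8))
loader = CustomLoader(dataset, batch_size=4)
batches = list(loader)
print(len(batches), batches[0][0].shape)
```

Since the class is still a DataLoader, len(loader) and the usual constructor arguments keep working; whether a given training framework accepts the subclass unchanged is something to verify against its documentation.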
Regarding your problem to apply a transform to a batch, you can create a custom Dataset instead of DataLoader and then apply the transforms.
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, transforms=None):
        super().__init__()
        self.data = ... # define your data here
        self.transforms = transforms
    def __getitem__(self, idx):
        x = self.data[idx]
        if self.transforms:
            x = self.transforms(x)
        return x
    def __len__(self):
        # needed by the DataLoader's default sampler
        return len(self.data)

# use your `MyDataset` class for creating your dataloader
dataloader = DataLoader(MyDataset(transforms=CustomTransforms()), batch_size=4)
You can use this dataloader with PyTorch Lightning Trainer as well.
If you are using PyTorch Lightning, I would suggest joining our Slack channel and asking questions on GitHub Discussions as well.
Thanks :)
EDIT: (Add transforms to Batch)
If you are using PyTorch Lightning, then I would recommend using LightningDataModule, which provides an on_before_batch_transfer hook that can be used to apply transforms to a batch ;)
Here is an example:
def on_before_batch_transfer(self, batch, dataloader_idx):
    batch['x'] = transforms(batch['x'])
    return batch
Check out the documentation for more.

Errors when Building up a Custom Loss Function

I am trying to build my own loss function as follows:
import numpy as np
from keras import backend as K

def MyLoss(self, x_input, x_reconstruct):
    a = np.copy(x_reconstruct)
    a = np.asarray(a, dtype='float16')
    a = np.floor(4*a)/4
    return K.mean(K.square(a - x_input), axis=-1)
In compilation, it says
ValueError: setting an array element with a sequence
Both x_input and x_reconstruct are [m, n, 1] np arrays. The last line of code is actually copied directly from Keras' built-in MSE loss function.
Also, I suppose loss is calculated per sample. If dimensions of the input and reconstructed input are both [m, n, 1], the result of Keras' built-in loss will also be a matrix sized [m, n]. So why does it work properly?
I then tried to use NumPy's functions directly:
def MyLoss(self, x_input, x_reconstruct):
    a = np.copy(x_reconstruct)
    a = np.asarray(a, dtype=self.precision)
    a = np.floor(4*a)/4
    Diff = a - x_input
    xx = np.mean(np.square(Diff), axis=-1)
    yy = np.sum(xx)
    return yy
yet the error persists. What mistake did I make? How should I write the code?
Borrowing the suggestion from "Make a Custom loss function in Keras in detail", I tried the following:
def MyLoss(self, x_input, x_reconstruct):
    if self.precision == 'float16':
        K.set_floatx('float16')
        K.set_epsilon(1e-4)
    a = K.cast_to_floatx(x_input)
    a = K.round(a*4.-0.5)/4.0
    return K.sum(K.mean(K.square(x_input-a), axis=-1))
But the same error happens.
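You cannot use NumPy arrays in your loss: Keras passes symbolic tensors to the loss function, so you have to use TensorFlow or Keras backend operations. Maybe try this:
import tensorflow as tf
import keras.backend as K

def MyLoss(x_input, x_reconstruct):
    # dtype must be the dtype object (or the string 'float16'), not the string 'tf.float16'
    a = tf.cast(x_reconstruct, dtype=tf.float16)
    a = tf.floor(4*a)/4
    # cast the target too so both operands share a dtype
    return K.mean(K.square(a - tf.cast(x_input, dtype=tf.float16)), axis=-1)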
I found the answer myself, and let me share it here
If I write code like this
def MyLoss(self, y_true, y_pred):
    if self.precision == 'float16':
        K.set_floatx('float16')
        K.set_epsilon(1e-4)
    return K.mean(K.square(y_true-K.round(y_pred*4.-0.5)/4.0), axis=-1)
It works. The trick is, I think, that I cannot use 'K.cast_to_floatx(y_true)'. Instead, simply use y_true directly. I still do not understand why...

Find wrongly categorized samples from validation step

I am using a keras neural net for identifying category in which the data belongs.
self.model.compile(loss='categorical_crossentropy',
                   optimizer=keras.optimizers.Adam(lr=0.001, decay=0.0001),
                   metrics=[categorical_accuracy])
Fit function:
history = self.model.fit(self.X,
                         {'output': self.Y},
                         validation_split=0.3,
                         epochs=400,
                         batch_size=32
                         )
I am interested in finding out which labels are getting categorized wrongly in the validation step. Seems like a good way to understand what is happening under the hood.
You can use model.predict_classes(validation_data) to get the predicted classes for your validation data, and compare these predictions with the actual labels to find out where the model was wrong. Something like this:
predictions = model.predict_classes(validation_data)
wrong = np.where(predictions != Y_validation)
If you are interested in looking 'under the hood', I'd suggest to use
model.predict(validation_data_x)
to see the scores for each class, for each observation of the validation set.
This should shed some light on which categories the model is not so good at classifying. The way to predict the final class is
scores = model.predict(validation_data_x)
preds = np.argmax(scores, axis=1)
Be sure to use the proper axis for np.argmax (I'm assuming observations are along axis 0 and classes along axis 1). Then compare preds with the real classes.
Also, if you want to see the overall accuracy on this dataset as a further check, use
model.evaluate(x=validation_data_x, y=validation_data_y)
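The comparison step can be sketched with plain NumPy. The scores and one-hot labels below are made-up values shaped like the output of model.predict and the targets used with categorical_crossentropy; they are assumptions for illustration only. Also note that with validation_split=0.3 Keras takes the validation samples from the end of the arrays before shuffling, so the validation inputs and labels correspond to roughly the last 30% of self.X and self.Y.

```python
import numpy as np

# Hypothetical class scores, shape (n_samples, n_classes)
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.2, 0.6]])
# Hypothetical one-hot labels for the same samples
y_true_onehot = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1],
                          [1, 0, 0]])

preds = np.argmax(scores, axis=1)          # predicted class per sample
labels = np.argmax(y_true_onehot, axis=1)  # true class per sample
wrong = np.flatnonzero(preds != labels)    # indices of misclassified samples

for i in wrong:
    print(f"sample {i}: predicted {preds[i]}, actual {labels[i]}, scores {scores[i]}")
```

Grouping the wrong indices by labels[wrong] then shows which true categories are confused most often.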
I ended up creating a metric which prints the "worst performing category id + score" on each iteration. Ideas from link
import tensorflow as tf
import numpy as np

class MaxIoU(object):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes

    def max_iou(self, y_true, y_pred):
        # Wraps np_max_iou method and uses it as a TensorFlow op.
        # Takes numpy arrays as its arguments and returns numpy arrays as
        # its outputs.
        return tf.py_func(self.np_max_iou, [y_true, y_pred], tf.float32)

    def np_max_iou(self, y_true, y_pred):
        # Compute the confusion matrix to get the number of true positives,
        # false positives, and false negatives
        # Convert predictions and target from categorical to integer format
        target = np.argmax(y_true, axis=-1).ravel()
        predicted = np.argmax(y_pred, axis=-1).ravel()
        # Trick from torchnet for bincounting 2 arrays together
        # https://github.com/pytorch/tnt/blob/master/torchnet/meter/confusionmeter.py
        x = predicted + self.num_classes * target
        bincount_2d = np.bincount(x.astype(np.int32), minlength=self.num_classes**2)
        assert bincount_2d.size == self.num_classes**2
        conf = bincount_2d.reshape((self.num_classes, self.num_classes))
        # Compute the per-class score from the confusion matrix
        true_positive = np.diag(conf)
        false_positive = np.sum(conf, 0) - true_positive
        false_negative = np.sum(conf, 1) - true_positive
        # Just in case we get a division by 0, ignore/hide the error and set the value to 0
        with np.errstate(divide='ignore', invalid='ignore'):
            # Note: the numerator is false_positive, so this is the false-positive share
            # per class; the standard IoU would use true_positive here.
            iou = false_positive / (true_positive + false_positive + false_negative)
        iou[np.isnan(iou)] = 0
        # Encodes "category id + score" in one float: integer part is the id, fraction is the score
        return np.max(iou).astype(np.float32) + np.argmax(iou).astype(np.float32)
usage:
custom_metric = MaxIoU(len(catagories))
self.model.compile(loss='categorical_crossentropy',
                   optimizer=keras.optimizers.Adam(lr=0.001, decay=0.0001),
                   metrics=[categorical_accuracy, custom_metric.max_iou])

Data loading with variable batch size?

I am currently working on patch-based super-resolution. Most papers divide an image into smaller patches and then use the patches as input to the models. I was able to create patches using a custom dataset. The code is given below:
import torch.utils.data as data
from torchvision.transforms import CenterCrop, ToTensor, Compose, ToPILImage, Resize, RandomHorizontalFlip, RandomVerticalFlip
from os import listdir
from os.path import join
from PIL import Image
import random
import os
import numpy as np
import torch

def is_image_file(filename):
    return any(filename.endswith(extension) for extension in [".png", ".jpg", ".jpeg", ".bmp"])

class TrainDatasetFromFolder(data.Dataset):
    def __init__(self, dataset_dir, patch_size, is_gray, stride):
        super(TrainDatasetFromFolder, self).__init__()
        self.imageHrfilenames = []
        self.imageHrfilenames.extend(join(dataset_dir, x)
                                     for x in sorted(listdir(dataset_dir)) if is_image_file(x))
        self.is_gray = is_gray
        self.patchSize = patch_size
        self.stride = stride

    def _load_file(self, index):
        filename = self.imageHrfilenames[index]
        hr = Image.open(self.imageHrfilenames[index])
        downsizes = (1, 0.7, 0.45)
        downsize = 2
        w_ = int(hr.width * downsizes[downsize])
        h_ = int(hr.height * downsizes[downsize])
        aug = Compose([Resize([h_, w_], interpolation=Image.BICUBIC),
                       RandomHorizontalFlip(),
                       RandomVerticalFlip()])
        hr = aug(hr)
        rv = random.randint(0, 4)
        hr = hr.rotate(90*rv, expand=1)
        filename = os.path.splitext(os.path.split(filename)[-1])[0]
        return hr, filename

    def _patching(self, img):
        img = ToTensor()(img)
        LR_ = Compose([ToPILImage(), Resize(self.patchSize//2, interpolation=Image.BICUBIC), ToTensor()])
        HR_p, LR_p = [], []
        for i in range(0, img.shape[1] - self.patchSize, self.stride):
            for j in range(0, img.shape[2] - self.patchSize, self.stride):
                temp = img[:, i:i + self.patchSize, j:j + self.patchSize]
                HR_p += [temp]
                LR_p += [LR_(temp)]
        return torch.stack(LR_p),torch.stack(HR_p)

    def __getitem__(self, index):
        HR_, filename = self._load_file(index)
        LR_p, HR_p = self._patching(HR_)
        return LR_p, HR_p

    def __len__(self):
        return len(self.imageHrfilenames)
Suppose the batch size is 1: the dataset takes an image and gives an output of size [x,3,patchsize,patchsize]. When the batch size is 2, I will have two different outputs of size [x,3,patchsize,patchsize] (for example, image 1 may give [50,3,patchsize,patchsize] and image 2 may give [75,3,patchsize,patchsize]). To handle this, a custom collate function was required that concatenates these two outputs along dimension 0. The collate function is given below:
def my_collate(batch):
    data = torch.cat([item[0] for item in batch],dim = 0)
    target = torch.cat([item[1] for item in batch],dim = 0)
    return [data, target]
This collate function concatenates along x (from the above example, I finally get [125,3,patchsize,patchsize]). For training purposes, I need to train the model using a minibatch size of, say, 25. Is there any method or function I can use to get an output of size [25,3,patchsize,patchsize] directly from the dataloader, using as many images as necessary as input to the Dataloader?
The following code snippet works for your purpose.
First, we define a ToyDataset which takes in a list of tensors (tensors) of variable length in dimension 0. This is similar to the samples returned by your dataset.
import torch
from torch.utils.data import Dataset
from torch.utils.data.sampler import RandomSampler

class ToyDataset(Dataset):
    def __init__(self, tensors):
        self.tensors = tensors
    def __getitem__(self, index):
        return self.tensors[index]
    def __len__(self):
        return len(self.tensors)
Secondly, we define a custom data loader. The usual PyTorch split between datasets and data loaders is roughly the following.
There is an indexed dataset, to which you can pass an index and which returns the associated sample.
There is a sampler which yields an index; different strategies for drawing indices give rise to different samplers.
The sampler is used by a batch_sampler to draw multiple indices at once (as many as specified by batch_size).
There is a dataloader which combines sampler and dataset to let you iterate over a dataset. Importantly, the data loader also owns a function (collate_fn) which specifies how the multiple samples retrieved from the dataset using the indices from the batch_sampler should be combined.
For your use case, this usual split does not work well, because instead of drawing a fixed number of indices, we need to draw indices until the objects associated with them reach the cumulative size we desire. This means we need to inspect the objects immediately and use this knowledge to decide whether to return a batch or keep drawing indices. This is what the custom data loader below does:
class CustomLoader(object):
    def __init__(self, dataset, my_bsz, drop_last=True):
        self.ds = dataset
        self.my_bsz = my_bsz
        self.drop_last = drop_last
        self.sampler = RandomSampler(dataset)

    def __iter__(self):
        batch = torch.Tensor()
        for idx in self.sampler:
            batch = torch.cat([batch, self.ds[idx]])
            while batch.size(0) >= self.my_bsz:
                if batch.size(0) == self.my_bsz:
                    yield batch
                    batch = torch.Tensor()
                else:
                    return_batch, batch = batch.split([self.my_bsz,batch.size(0)-self.my_bsz])
                    yield return_batch
        if batch.size(0) > 0 and not self.drop_last:
            yield batch
Here we iterate over the dataset, after drawing an index and loading the associated object, we concatenate it to the tensors we drew before (batch). We keep doing this until we reach the desired size, such that we can cut out and yield a batch. We retain the rows in batch, which we did not yield. Because it may be the case that a single instance exceeds the desired batch_size, we use a while loop.
You could modify this minimal CustomLoader to add more features in the style of PyTorch's dataloader. There is also no need to use a RandomSampler to draw indices; other samplers would work equally well. If your data is large, it would also be possible to avoid repeated concatenation, for example by collecting tensors in a list and keeping track of their cumulative length.
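The list-based variant mentioned above could look like the following sketch. ListBufferLoader is a hypothetical name, and the toy tensors exist only to exercise it; the idea is to concatenate only once enough rows have accumulated to emit a batch, instead of on every draw.

```python
import torch
from torch.utils.data import RandomSampler

class ListBufferLoader:
    def __init__(self, dataset, my_bsz, drop_last=True):
        self.ds = dataset
        self.my_bsz = my_bsz
        self.drop_last = drop_last
        self.sampler = RandomSampler(dataset)

    def __iter__(self):
        buffer, buffered = [], 0  # pending tensors and their total row count
        for idx in self.sampler:
            item = self.ds[idx]
            buffer.append(item)
            buffered += item.size(0)
            # Concatenate only when at least one full batch is available.
            while buffered >= self.my_bsz:
                merged = torch.cat(buffer)
                yield merged[:self.my_bsz]
                rest = merged[self.my_bsz:]
                buffer = [rest] if rest.size(0) > 0 else []
                buffered = rest.size(0)
        if buffered > 0 and not self.drop_last:
            yield torch.cat(buffer)

# Toy data: five tensors with different sizes in dimension 0 (27 rows in total)
sizes = [3, 7, 5, 10, 2]
tensors = [torch.randn(n, 2) for n in sizes]
batches = list(ListBufferLoader(tensors, my_bsz=4, drop_last=False))
print([b.size(0) for b in batches])
```

Every batch except possibly the last has exactly my_bsz rows; with drop_last=True the trailing partial batch is discarded.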
Here is an example, that demonstrates it works:
patch_size = 5
channels = 3
dim0sizes = torch.LongTensor(100).random_(1, 100)
# convert to plain Python ints before using them as sizes
data = torch.randn(size=(int(dim0sizes.sum()), channels, patch_size, patch_size))
tensors = torch.split(data, dim0sizes.tolist())
ds = ToyDataset(tensors)
dl = CustomLoader(ds, my_bsz=250, drop_last=False)
for i in dl:
    print(i.size(0))
(Related, but not exactly in topic)
For batch size adaptation you can use the code as exemplified in this repo. It is implemented for a different purpose (maximize GPU memory usage), but it is not too hard to translate to your problem.
The code does batch adaptation and batch spoofing.
To improve the previous answer, I found a repo that uses DataManger to achieve different patch sizes and batch sizes. It is basically initiating different dataloaders with different settings and a set_epoch function is used to set the appropriate dataloader for a given epoch.

Resources