I am trying to parallelly do inference using multi-core CPU and single GPU, however I just got following runtime errors. I have attached a simplified example which can reproduce the errors.
Runtime Error
THCudaCheck FAIL file=c:\a\w\1\s\tmp_conda_3.6_091443\conda\conda-bld\pytorch_1544087948354\work\torch\csrc\generic\StorageSharing.cpp line=232 error=71 : operation not supported
File "C:\Users\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 213, in reduce_tensor
(device, handle, storage_size_bytes, storage_offset_bytes) = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at c:\a\w\1\s\tmp_conda_3.6_091443\conda\conda-bld\pytorch_1544087948354\work\torch\csrc\generic\StorageSharing.cpp:232
A simplified example which can reproduce the errors
import torch
from torch import nn
# model used to do inference
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.fc1 = nn.Linear(100,1)
def forward(self,x):
return self.fc1(x)
# class running inference
class A(object):
def __init__(self):
pass
def do_something(self, model):
# do something
x = torch.randn(100).view(-1)
print(model.forward(x))
def run(self):
mp = torch.multiprocessing.get_context('spawn')
processes = []
for i in range(2):
p = mp.Process(target=self.do_something, args=(Model().cuda(),))
processes.append(p)
for p in processes:
p.start()
if __name__ == '__main__':
a = A()
a.run()
It would be greatly appreciated if anyone can help solve this problem. By the way, my PC runs on Windows 10 with one GTX 1070 GPU.
Related
I want to tune the learning rate for my PyTorch Lightning model. My code runs on a GPU cluster, so I can only write to certain folders that I bind mount. However, trainer.tuner.lr_find tries to write the checkpoint to the folder where my script runs and since this folder is not writable, it fails with the following error:
OSError: [Errno 30] Read-only file system: '/opt/xrPose/.lr_find_43df1c5c-0aed-4205-ac56-2fe4523ca4a7.ckpt'
Is there anyway to change the checkpoint path for lr_find? I checked the documentation but I couldn't find any information on that, in the part related to checkpointing.
My code is below:
res = trainer.tuner.lr_find(model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader, min_lr=1e-5)
logging.info(f"suggested learning rate: {res.suggestion()}")
model.hparams.learning_rate = res.suggestion()
You may need to specify default_root_dir when initialize Trainer:
trainer = Trainer(default_root_dir='./my_dir')
Description from the Official Documentation:
default_root_dir - Default path for logs and weights when no logger or
pytorch_lightning.callbacks.ModelCheckpoint callback passed.
Code example:
import numpy as np
import torch
from pytorch_lightning import LightningModule, Trainer
from torch.utils.data import DataLoader, Dataset
class MyDataset(Dataset):
def __init__(self) -> None:
super().__init__()
def __getitem__(self, index):
x = np.zeros((10,), np.float32)
y = np.zeros((1,), np.float32)
return x, y
def __len__(self):
return 100
class MyModel(LightningModule):
def __init__(self):
super().__init__()
self.model = torch.nn.Linear(10, 1)
def forward(self, x):
return self.model(x)
def training_step(self, batch, batch_idx):
x, y = batch
y_hat = self(x)
loss = torch.nn.MSELoss()(y_hat, y)
return loss
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
model = MyModel()
trainer = Trainer(default_root_dir='./my_dir')
train_dataloader = DataLoader(MyDataset())
trainer.tuner.lr_find(model, train_dataloader)
As it is defined in the lr_finder.py as:
# Save initial model, that is loaded after learning rate is found
ckpt_path = os.path.join(trainer.default_root_dir, f".lr_find_{uuid.uuid4()}.ckpt")
trainer.save_checkpoint(ckpt_path)
The only way of changing the directory for saving the checkpoint is to change the default_root_dir. But be aware that this is also the directory that the lightning logs are saved to.
You can easily change it with trainer = Trainer(default_root_dir='./NAME_OF_THE_DIR').
Let's say I have a DataLoader
dataloader = DataLoader(dataset, batch_size=32)
I want to define a neural network that can feed forward my custom function. I know in traditional fully-connected network, I could just using its Linear or other already existed functions. And, importantly, those functions can feed with DataLoader and automatically run with batch.
Now I want to define my own function, but I don't know how to write it without for loop. For example (I randomly write some custom function f(x)),
def f(x):
x = np.sin(np.exp(x)) + np.log(x) - 1/x
return x
class NeuralNet(nn.Module):
def __init__(self, input_dim):
super(NeuralNet, self).__init__()
self.net = f
def forward(self, x_batch):
result = torch.zeros(len(x_batch))
for i in len(x_batch):
result[i] = self.net(x_batch[i])
return result
Or for loop in f
def f(x_batch):
result = torch.zeros(len(x_batch))
for i in range(x_batch):
result[i] = np.sin(np.exp(x_batch[i])) + np.log(x_batch[i]) - 1/x_batch[i]
return result
class NeuralNet(nn.Module):
def __init__(self, input_dim):
super(NeuralNet, self).__init__()
self.net = f
def forward(self, x_batch):
return self.net(x_batch)
Is there any way to get rid of for loop, in order to do parallel calculation on GPU? Cause I think for loop is not going to utilize the advantage of GPU. Or did I misunderstand something?
Basic mathematical operators implemented in PyTorch work with n-dimension tensors out-of-the-box. If you define your f function this way, you can feed the whole batch in one go.
In your case you could do:
def f(x):
x = torch.sin(torch.exp(x)) + torch.log(x) - 1/x
return x
class NeuralNet(nn.Module):
def __init__(self):
super(NeuralNet, self).__init__()
self.net = f
def forward(self, x_batch):
result = self.net(x_batch)
return result
>>> net = NeuralNet()
>>> net(torch.tensor([[1,2,3]]))
If you stick with PyTorch operators (most have a backward function implemented) and don't switch to Numpy at some point. Then you will be able to call the backward pass on your model.
Of course, all of these computations can be carried out on the GPU!
Pennylane provides a torch layer
https://pennylane.readthedocs.io/en/stable/code/api/pennylane.qnn.TorchLayer.html
I need to create a DataLoader where the collator function would require to have non trivial computation, actually a double layer loop which is significantly slowing down the training process. For example, consider this toy code where I try to use numba to JIT the collate function:
import torch
import torch.utils.data
import numba as nb
class Dataset(torch.utils.data.Dataset):
def __init__(self):
self.A = np.zeros((100000, 300))
self.B = np.ones((100000, 300))
def __getitem__(self, index):
return self.A[index], self.B[index]
def __len__(self):
return self.A.shape[0]
#nb.njit(cache=True)
def _collate_fn(batch):
batch_data = np.zeros((len(batch), 300))
for i in range(len(batch)):
batch_data[i] = batch[i][0] + batch[i][1]
return batch_data
and then I create the DataLoader as follows:
train_dataset = Dataset()
train_loader = torch.utils.data.DataLoader(
train_dataset,
batch_size=256,
num_workers=6,
collate_fn=_collate_fn,
shuffle=True)
However this just gets stuck but works fine if I remove the JITing of the _collate_fn. I am not able to understand what is happening here. I don't have to stick to numba and can use anything which will help me overcome the loop inefficiencies in Python. TIA and Happy 12,021.
If my model contains only nn.Module layers such as nn.Linear, nn.DataParallel works fine.
x = torch.randn(100,10)
class normal_model(torch.nn.Module):
def __init__(self):
super(normal_model, self).__init__()
self.layer = torch.nn.Linear(10,1)
def forward(self, x):
return self.layer(x)
model = normal_model()
model = nn.DataParallel(model.to('cuda:0'))
model(x)
However, when my model contains a tensor operation such as the following
class custom_model(torch.nn.Module):
def __init__(self):
super(custom_model, self).__init__()
self.layer = torch.nn.Linear(10,5)
self.weight = torch.ones(5,1, device='cuda:0')
def forward(self, x):
return self.layer(x) # self.weight
model = custom_model()
model = torch.nn.DataParallel(model.to('cuda:0'))
model(x)
It gives me the following error
RuntimeError: Caught RuntimeError in replica 1 on device 1. Original
Traceback (most recent call last): File
"/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py",
line 60, in _worker
output = module(*input, **kwargs) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py",
line 541, in call
result = self.forward(*input, **kwargs) File "", line 7, in forward
return self.layer(x) # self.weight RuntimeError: arguments are located on different GPUs at
/pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:277
How to avoid this error when we have some tensor operations in our model?
I have no experience with DataParallel, but I think it might be because your tensor is not part of the model parameters. You can do this by writing:
torch.nn.Parameter(torch.ones(5,1))
Note that you don't have to move it to the gpu when initializing, because now when you call model.to('cuda:0') this is done automatically.
I can imagine that DataParallel uses the model parameters to move them to the appropriate gpu.
See this answer for more on the difference between a torch tensor and torch.nn.Parameter.
If you don't want the tensor values to be updated by backpropagation during training, you can add requires_grad=False.
Another way that might work is to override the to method, and initialize the tensor in the forward pass:
class custom_model(torch.nn.Module):
def __init__(self):
super(custom_model, self).__init__()
self.layer = torch.nn.Linear(10,5)
def forward(self, x):
return self.layer(x) # torch.ones(5,1, device=self.device)
def to(self, device: str):
new_self = super(custom_model, self).to(device)
new_self.device = device
return new_self
or something like this:
class custom_model(torch.nn.Module):
def __init__(self, device:str):
super(custom_model, self).__init__()
self.layer = torch.nn.Linear(10,5)
self.weight = torch.ones(5,1, device=device)
def forward(self, x):
return self.layer(x) # self.weight
def to(self, device: str):
new_self = super(custom_model, self).to(device)
new_self.device = device
new_self.weight = torch.ones(5,1, device=device)
return new_self
Adding to the answer from #Elgar de Groot since OP also wanted to freeze that layer. To do so you can still use torch.nn.Parameter but then you explicitly set requires_grad to false like this:
self.layer = torch.nn.Parameter(torch.ones(5,1))
self.layer.requires_grad = False
I am beginner pytorch user, and I am trying to use dataloader.
Actually, I am trying to implement this into my network but it takes a very long time to load. And so, I debugged my network to see if the network itself has the problem, but it turns out it has something to with my dataloader class. Here is the code:
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
class DiabetesDataset(Dataset):
def __init__(self, csv):
self.xy = pd.read_csv(csv)
def __len__(self):
return len(self.xy)
def __getitem__(self, index):
self.x_data = torch.Tensor(xy.iloc[:, 0:-1].values)
self.y_data = torch.Tensor(xy.iloc[:, [-1]].values)
return self.x_data[index], self.y_data[index]
dataset = DiabetesDataset("trial.csv")
train_loader = DataLoader(dataset=dataset,
batch_size=1,
shuffle=True,
num_workers=2)`
for a in train_loader:
print(a)
To verify that the dataloader causes all the delay, I created a dummy csv file with 2 columns of 1s and 2s, for a total of 10 samples for each columns. Then, I looped over the train_loader object, it has been more than 1 hr and it is still running, considering that the sample size is small and batch size is set to 1.
I am not sure as to what the error to my code is and it is causing this issue.
Any comments/inputs are greatly appreciated!
There are some bugs in your code - could you check if this works (it is working on my computer with your toy example):
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import torch
class DiabetesDataset(Dataset):
def __init__(self, csv):
self.xy = pd.read_csv(csv)
def __len__(self):
return len(self.xy)
def __getitem__(self, index):
x_data = torch.Tensor(self.xy.iloc[:, 0:-1].values)
y_data = torch.Tensor(self.xy.iloc[:, [-1]].values)
return x_data[index], y_data[index]
dataset = DiabetesDataset("trial.csv")
train_loader = DataLoader(
dataset=dataset,
batch_size=1,
shuffle=True,
num_workers=2)
if __name__ == '__main__':
for a in train_loader:
print(a)
Edit: Your code is not working because you are missing a self in the __getitem__ method (self.xy.iloc...) and because you do not have a if __name__ == '__main__ at the end of your script. For the second error, see RuntimeError on windows trying python multiprocessing