I'm trying to view the implementation of RoI Pooling in PyTorch.
Here is a code fragment showing how to use RoIPool in PyTorch:
import torch
from torchvision.ops.roi_pool import RoIPool
device = torch.device('cuda')
# create feature layer, proposals and targets
num_proposals = 10
feature_map = torch.randn(1, 64, 32, 32)
proposals = torch.zeros((num_proposals, 4))
proposals[:, 0] = torch.randint(0, 16, (num_proposals,))
proposals[:, 1] = torch.randint(0, 16, (num_proposals,))
proposals[:, 2] = torch.randint(16, 32, (num_proposals,))
proposals[:, 3] = torch.randint(16, 32, (num_proposals,))
roi_pool_obj = RoIPool(3, 2**-1)
roi_pool = roi_pool_obj(feature_map, [proposals])
I'm using PyCharm, so when I follow RoIPool from the second line, it opens a file located at ~/anaconda3/envs/CV/lib/python3.8/site-packages/torchvision/ops/roi_pool.py, which is exactly the same as the code in the documentation.
I've pasted the code below without the docstrings.
from typing import List, Union
import torch
from torch import nn, Tensor
from torch.jit.annotations import BroadcastingList2
from torch.nn.modules.utils import _pair
from torchvision.extension import _assert_has_ops
from ..utils import _log_api_usage_once
from ._utils import convert_boxes_to_roi_format, check_roi_boxes_shape
def roi_pool(
    input: Tensor,
    boxes: Union[Tensor, List[Tensor]],
    output_size: BroadcastingList2[int],
    spatial_scale: float = 1.0,
) -> Tensor:
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(roi_pool)
    _assert_has_ops()
    check_roi_boxes_shape(boxes)
    rois = boxes
    output_size = _pair(output_size)
    if not isinstance(rois, torch.Tensor):
        rois = convert_boxes_to_roi_format(rois)
    output, _ = torch.ops.torchvision.roi_pool(input, rois, spatial_scale, output_size[0], output_size[1])
    return output

class RoIPool(nn.Module):
    def __init__(self, output_size: BroadcastingList2[int], spatial_scale: float):
        super().__init__()
        _log_api_usage_once(self)
        self.output_size = output_size
        self.spatial_scale = spatial_scale

    def forward(self, input: Tensor, rois: Tensor) -> Tensor:
        return roi_pool(input, rois, self.output_size, self.spatial_scale)

    def __repr__(self) -> str:
        s = f"{self.__class__.__name__}(output_size={self.output_size}, spatial_scale={self.spatial_scale})"
        return s
So, in the code example:
When running roi_pool_obj = RoIPool(3, 2**-1), it creates an instance of RoIPool by calling its __init__ method, which only initializes two instance variables;
When running roi_pool = roi_pool_obj(feature_map, [proposals]), it must call the forward() method somehow (but I don't know how), which then calls the roi_pool() function above;
When the roi_pool() function runs, it does some checking first and then computes the output with the line output, _ = torch.ops.torchvision.roi_pool(input, rois, spatial_scale, output_size[0], output_size[1]).
But this doesn't show the details of how roi_pool is implemented, and PyCharm shows Cannot find declaration to go to when I try to follow torch.ops.torchvision.roi_pool.
To summarize, I have two questions:
How does forward() get called by running roi_pool = roi_pool_obj(feature_map, [proposals])?
How can I view the source code of torch.ops.torchvision.roi_pool, or where is the file containing its implementation located?
Last but not least, I've just started reading source code, which is pretty difficult for me. I'd appreciate it if you could also provide some advice or tutorials.
RoIPool is a subclass of torch.nn.Module. Source code:
https://github.com/pytorch/vision/blob/07ae61bf9c21ddd1d5f65d326aa9636849b383ca/torchvision/ops/roi_pool.py#L56
nn.Module defines the __call__ method, which in turn calls the forward method. Source code:
https://github.com/pytorch/pytorch/blob/b2311192e6c4745aac3fdd774ac9d56a36b396d4/torch/nn/modules/module.py#L1234
When you execute the roi_pool = roi_pool_obj(feature_map, [proposals]) statement, the __call__ method invokes the forward() of RoIPool. Source code:
https://github.com/pytorch/vision/blob/07ae61bf9c21ddd1d5f65d326aa9636849b383ca/torchvision/ops/roi_pool.py#L67
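In other words, Python's callable protocol does the work: m(x) is roughly type(m).__call__(m, x), and nn.Module's __call__ dispatches to your forward. A minimal sketch of the mechanism (toy classes for illustration; the real nn.Module.__call__ also runs hooks and other bookkeeping):

class MiniModule:
    # toy stand-in for nn.Module
    def __call__(self, *args, **kwargs):
        # calling the instance dispatches to the subclass's forward()
        return self.forward(*args, **kwargs)

class Doubler(MiniModule):
    def forward(self, x):
        return x * 2

m = Doubler()
print(m(21))  # 42: m(21) -> MiniModule.__call__ -> Doubler.forward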
RoIPool.forward calls torch.ops.torchvision.roi_pool:
https://github.com/pytorch/vision/blob/07ae61bf9c21ddd1d5f65d326aa9636849b383ca/torchvision/ops/roi_pool.py#L52
ops is an object that loads native libraries implemented in C++:
https://github.com/pytorch/pytorch/blob/b2311192e6c4745aac3fdd774ac9d56a36b396d4/torch/_ops.py#L537
so when you call torch.ops.torchvision, it will use the torchvision native library.
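You can see which compiled extension provides these ops by looking inside the installed torchvision package (a quick check, assuming a standard install; the exact filename varies by platform):

import os
import torchvision

# the native ops live in a compiled extension next to the Python sources,
# e.g. torchvision/_C.so on Linux or _C.pyd on Windows
pkg_dir = os.path.dirname(torchvision.__file__)
print([f for f in os.listdir(pkg_dir) if f.startswith('_C')])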
Here the roi_pool function is registered:
https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/csrc/ops/roi_pool.cpp#L53
Here you can find the actual implementation of roi_pool:
CPU:
https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/csrc/ops/cpu/roi_pool_kernel.cpp
GPU:
https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/csrc/ops/cuda/roi_pool_kernel.cu
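If you just want to understand the algorithm without reading the C++/CUDA kernels, here is a naive Python reference of RoI max-pooling for a single ROI. This is a sketch for intuition only, not the torchvision kernel (edge cases such as box clamping and rounding may differ):

import torch

def roi_pool_reference(feature, roi, output_size, spatial_scale):
    """Naive RoI max-pooling for one ROI (x1, y1, x2, y2); for intuition only."""
    c, h, w = feature.shape
    # scale box coordinates into feature-map space
    x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in roi.tolist()]
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    out = torch.zeros(c, output_size, output_size)
    for i in range(output_size):
        for j in range(output_size):
            # split the ROI into an output_size x output_size grid of bins
            hs = max(y1 + (i * roi_h) // output_size, 0)
            he = min(y1 + ((i + 1) * roi_h + output_size - 1) // output_size, h)
            ws = max(x1 + (j * roi_w) // output_size, 0)
            we = min(x1 + ((j + 1) * roi_w + output_size - 1) // output_size, w)
            if he > hs and we > ws:
                out[:, i, j] = feature[:, hs:he, ws:we].amax(dim=(-2, -1))
    return out

# usage: pool a 3x3 grid from a 64-channel map at half resolution
fmap = torch.randn(64, 32, 32)
pooled = roi_pool_reference(fmap, torch.tensor([4., 4., 20., 20.]), 3, 0.5)
print(pooled.shape)  # torch.Size([64, 3, 3])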
I was debugging my PyTorch code and found that an instance of the DataLoader class seems to behave like a global variable by default. I don't understand why this is the case, but I've set up a minimal working example below that should reproduce my observation:
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, df, n_feats, mode):
        data = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]).transpose()
        x = data[:, list(range(n_feats))]  # features
        y = data[:, -1]                    # target
        self.x = torch.FloatTensor(x)
        self.y = torch.FloatTensor(y)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

def prep_dataloader(df, n_feats, mode, batch_size):
    dataset = MyDataset(df, n_feats, mode)
    dataloader = DataLoader(dataset, batch_size, shuffle=False)
    return dataloader

df = None  # placeholder; df and mode are unused in this minimal example
tr_set = prep_dataloader(df, 1, 'train', 200)

def test():
    print(tr_set)

test()
As shown above, tr_set was created before the function test and is not passed to test. However, running the code above, I got the following result:
<torch.utils.data.dataloader.DataLoader object at 0x7fb6c2ea7610>
Originally, I was expecting to get an error like "NameError: name 'tr_set' is not defined". However, the function was aware of tr_set and printed it even though tr_set was not passed as an argument. I'm confused by this because tr_set seems to behave like a global variable here.
I'm wondering about the reason for this and possible ways that I can prevent it from becoming a global variable. Thank you!
(Update: in case it matters, I was running the code above in a Jupyter notebook.)
This has nothing to do with DataLoader or with how PyTorch works.
tr_set actually is a global variable in the Python sense: it is bound at the top level of the module (and in a Jupyter notebook, every cell shares the same module namespace), so it lives in the module's global namespace. It is not visible to other modules unless imported, but any function defined in the same module can read it: when Python doesn't find a name among a function's local variables, it falls back to the enclosing module's globals at call time. That is why test can print tr_set without it being passed in. To avoid relying on this, pass the loader explicitly as an argument, or create it inside a function (for example under an if __name__ == '__main__': guard) so it never lands in the global namespace.
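A minimal illustration of the lookup rule, and the usual way to avoid relying on it (pass the object explicitly):

x = 1  # module-level name, i.e. a global within this module

def show():
    print(x)   # no local x, so Python finds the global x at call time

show()         # prints 1

def safer(x):
    print(x)   # x is now a parameter; nothing is read from the globals

safer(2)       # prints 2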
I'm trying to freeze the batch-norm layers and analyse their inputs/outputs with forward hooks.
For the frozen BN layers, I just can't understand why the hooked output is different from the output I reproduce by feeding the hooked input back through the layer.
I'd really appreciate it if anyone could help.
Here's the code:
import torch
import torchvision
import numpy

def set_bn_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

image = torch.randn((1, 3, 224, 224))
res = torchvision.models.resnet50(pretrained=True)
res.apply(set_bn_eval)
b = res(image)

layer_out = []
layer_in = []

def layer_hook(mod, inp, out):
    layer_out.append(out)
    layer_in.append(inp[0])

for name, key in res.named_modules():
    hook = key.register_forward_hook(layer_hook)
    res(image)
    hook.remove()
    out = layer_out.pop()
    inp = layer_in.pop()
    try:
        assert out.equal(key(inp))
    except AssertionError:
        print(name)
        break
TL;DR: some operations only appear in the forward of the module, such as non-parametrized layers.
Some components are not registered in the child module list. This is usually the case for activation functions, but it will ultimately depend on the module implementation. In your case, ResNet's Bottleneck has its ReLUs applied in the forward definition, just after the batch normalization layer is called.
This means the output you will catch with the layer hook will be different from the tensor you compute from just the module and its input.
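To see why, here is a self-contained toy example (a hypothetical module, not ResNet itself): an in-place ReLU applied in forward mutates the very tensor the BN hook captured, so the stored output no longer matches a fresh recomputation:

import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm2d(4)
        self.relu = nn.ReLU(inplace=True)  # in-place, as in ResNet

    def forward(self, x):
        out = self.bn(x)
        return self.relu(out)  # mutates the tensor the bn hook stored

captured = []
block = ToyBlock().eval()
block.bn.register_forward_hook(lambda mod, inp, out: captured.append(out))
x = torch.randn(1, 4, 8, 8)
y = block(x)
print(captured[0].equal(y))            # True: the hook's tensor was relu'd in place
print(captured[0].equal(block.bn(x)))  # False: a fresh bn output has no relu applied

With that in mind, you can check the bn1 case in ResNet explicitly: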
import torch.nn.functional as F

for name, module in res.named_modules():
    if name != 'bn1':
        continue
    hook = module.register_forward_hook(layer_hook)
    res(image)
    hook.remove()
    inp = layer_in.pop()
    out = layer_out.pop()
    assert out.equal(F.relu(module(inp)))
Therefore, it's a bit tricky to actually implement since you can't rely entirely on the content of res.named_modules().
I am trying to convert a pre-trained torch model to ONNX, but receive the following error:
RuntimeError: step!=1 is currently not supported
I'm trying this on a pre-trained colorization model: https://github.com/richzhang/colorization
Here is the code I ran in Google Colab:
!git clone https://github.com/richzhang/colorization.git
cd colorization/

import torch
import colorizers

model = colorizer_siggraph17 = colorizers.siggraph17(pretrained=True).eval()

input_names = ["input"]
output_names = ["output"]
dummy_input = torch.randn(1, 1, 256, 256, device='cpu')

torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,
                  input_names=input_names, output_names=output_names)
I appreciate any help :)
UPDATE 1: @Proko's suggestion solved the ONNX export issue. Now I have a new, possibly related problem when I try to convert the ONNX model to TensorRT. I get the following error:
[TensorRT] ERROR: Network must have at least one output
Here is the code I used:
import torch
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import onnx

TRT_LOGGER = trt.Logger()

def build_engine(onnx_file_path):
    # initialize TensorRT engine and parse ONNX model
    builder = trt.Builder(TRT_LOGGER)
    builder.max_workspace_size = 1 << 25
    builder.max_batch_size = 1
    if builder.platform_has_fast_fp16:
        builder.fp16_mode = True
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # parse ONNX
    with open(onnx_file_path, 'rb') as model:
        print('Beginning ONNX file parsing')
        parser.parse(model.read())
    print('Completed parsing of ONNX file')

    # generate TensorRT engine optimized for the target platform
    print('Building an engine...')
    engine = builder.build_cuda_engine(network)
    context = engine.create_execution_context()
    print("Completed creating Engine")
    return engine, context

ONNX_FILE_PATH = 'siggraph17.onnx'  # exported using the code above
engine, _ = build_engine(ONNX_FILE_PATH)
I tried to force the build_engine function to use the output of the network by:
network.mark_output(network.get_layer(network.num_layers-1).get_output(0))
but it did not work.
I'd appreciate any help!
As I have mentioned in a comment, this is because slicing in torch.onnx export supports only step = 1, while the model contains two-step slicing:
self.model2(conv1_2[:,:,::2,::2])
Your only option for now is to rewrite the slicing as some other ops. You can do it using range and reshape to obtain the proper indices. Consider the following "step-less arange" function (I hope it is generic enough for anyone with a similar problem):
def sla(x, step):
    diff = x % step
    x += (diff > 0) * (step - diff)  # pad the length so it reshapes evenly
    return torch.arange(x).reshape((-1, step))[:, 0]
usage:
>>> sla(11, 3)
tensor([0, 3, 6, 9])
Now you can replace every slice like this:
conv2_2 = self.model2(conv1_2[:,:,self.sla(conv1_2.shape[2], 2),:][:,:,:, self.sla(conv1_2.shape[3], 2)])
NOTE: you should optimize this. The indices are calculated on every call, so it might be wise to pre-compute them.
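For example, the indices could be computed once and stored as buffers in a small helper module. This is a hypothetical class (it assumes fixed input sizes and reuses the sla function above):

import torch
import torch.nn as nn

class StridedSlice2d(nn.Module):
    """Hypothetical ONNX-friendly stand-in for x[:, :, ::step, ::step]."""
    def __init__(self, height, width, step=2):
        super().__init__()
        # precompute the indices once instead of on every forward call
        self.register_buffer("idx_h", sla(height, step))
        self.register_buffer("idx_w", sla(width, step))

    def forward(self, x):
        return x[:, :, self.idx_h, :][:, :, :, self.idx_w]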
I have tested it with my fork of the repo and I was able to save the model:
https://github.com/prokotg/colorization
What worked for me was to add opset_version=11 to torch.onnx.export.
I had first tried opset_version=10, but the API suggested 11, and with that it works.
So your function should be:
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,
                  opset_version=11, input_names=input_names, output_names=output_names)
I can't work out how to invoke my .tflite model, which does matmul, on the Coral accelerator using the Python API.
The .tflite model is generated from some example code here. It works well using the tf.lite.Interpreter() class, but I don't know how to transform it to work with the edgetpu classes. I have tried edgetpu.basic.basic_engine.BasicEngine() and changing the model's datatype from numpy.float32 to numpy.uint8, but that did not help. I am a complete beginner with TensorFlow and just want to use my TPU for matmul.
import numpy
import tensorflow as tf
import edgetpu
from edgetpu.basic.basic_engine import BasicEngine

def export_tflite_from_session(session, input_nodes, output_nodes, tflite_filename):
    print("Converting to tflite...")
    converter = tf.lite.TFLiteConverter.from_session(session, input_nodes, output_nodes)
    tflite_model = converter.convert()
    with open(tflite_filename, "wb") as f:
        f.write(tflite_model)
    print("Converted %s." % tflite_filename)

# This does matmul just fine but does not use the TPU
def test_tflite_model(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    interpreter = tf.lite.Interpreter(model_path=tflite_filename)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print("input details: %s" % input_details)
    print("output details: %s" % output_details)
    for i, input_tensor in enumerate(input_details):
        interpreter.set_tensor(input_tensor['index'], examples[i])
    interpreter.invoke()
    model_output = []
    for i, output_tensor in enumerate(output_details):
        model_output.append(interpreter.get_tensor(output_tensor['index']))
    return model_output

# This should use the TPU, but I don't know how to run the model or if it needs
# further processing. One matrix can be constant for my use case.
def test_tpu(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    # TODO edgetpu.basic
    interpreter = BasicEngine(tflite_filename)
    interpreter.allocate_tensors()  # does not work...

def main():
    tflite_filename = "model.tflite"
    shape_a = (2, 2)
    shape_b = (2, 2)
    a = tf.placeholder(dtype=tf.float32, shape=shape_a, name="A")
    b = tf.placeholder(dtype=tf.float32, shape=shape_b, name="B")
    c = tf.matmul(a, b, name="output")
    numpy.random.seed(1234)
    a_ = numpy.random.rand(*shape_a).astype(numpy.float32)
    b_ = numpy.random.rand(*shape_b).astype(numpy.float32)
    with tf.Session() as session:
        session_output = session.run(c, feed_dict={a: a_, b: b_})
        export_tflite_from_session(session, [a, b], [c], tflite_filename)

    tflite_output = test_tflite_model(tflite_filename, [a_, b_])
    tflite_output = tflite_output[0]

    # test the TPU
    tflite_output = test_tpu(tflite_filename, [a_, b_])

    print("Input example:")
    print(a_)
    print(a_.shape)
    print(b_)
    print(b_.shape)
    print("Session output:")
    print(session_output)
    print(session_output.shape)
    print("TFLite output:")
    print(tflite_output)
    print(tflite_output.shape)
    print(numpy.allclose(session_output, tflite_output))

if __name__ == '__main__':
    main()
You've only converted your model to TFLite; it is not compiled for the Edge TPU. From the docs:
At the first point in the model graph where an unsupported operation occurs, the compiler partitions the graph into two parts. The first part of the graph that contains only supported operations is compiled into a custom operation that executes on the Edge TPU, and everything else executes on the CPU
There are several specific requirements that the model must meet:
- quantization-aware training
- constant tensor sizes and model parameters at compile time
- tensors are 3-dimensional or smaller
- models only use operations supported by the Edge TPU
There is an online compiler, as well as a CLI version, that is useful for translating .tflite models into Edge TPU-compatible .tflite models.
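As a hedged sketch of the quantization step the compiler expects (assuming a converter and Edge TPU compiler recent enough to accept full-integer post-training quantization; session, a, b and c are the names from your code, and representative_dataset is a stand-in you should replace with realistic inputs):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # stand-in calibration data; use realistic inputs in practice
    for _ in range(100):
        yield [np.random.rand(2, 2).astype(np.float32),
               np.random.rand(2, 2).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_session(session, [a, b], [c])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()  # then run it through the Edge TPU compiler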
Your code is also incomplete. You've passed your model to the class here:
interpreter = BasicEngine(tflite_filename)
but you're missing the step of actually running inference on an input tensor. BasicEngine has no allocate_tensors method (that's the tf.lite.Interpreter API); you need to call its inference method instead (RunInference in the C++ API, run_inference on the Python BasicEngine).
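A hedged sketch of that last step with the old edgetpu Python API (method names are assumptions to check against the library's docs; the input must be a flattened 1-D uint8 array):

import numpy as np
from edgetpu.basic.basic_engine import BasicEngine

engine = BasicEngine("model_edgetpu.tflite")  # model compiled for the Edge TPU
# BasicEngine consumes a flattened 1-D uint8 input array
input_data = np.zeros(engine.required_input_array_size(), dtype=np.uint8)
latency_ms, output = engine.run_inference(input_data)
print(latency_ms, output)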
I'm trying to use PyTorch with a complex loss function. In order to accelerate the code, I hope that I can use the PyTorch multiprocessing package.
In a first trial, I put 10x1 features into the NN and get a 10x4 output.
After that, I want to pass the 10x4 output into a function to do some calculation. (The calculation will be more complex in the future.)
After calculating, the function will return a 10x1 array. This array will be set as NN_energy and used to calculate the loss function.
Besides, I also want to know if there is another method to create a backward-able array to store the NN_energy array, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing

def func(msg, BOP):
    ans = (BOP[msg][0] + BOP[msg][1] / BOP[msg][2]) * BOP[msg][3]
    return ans

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature, n_hidden_1)   # hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2)  # hidden layer
        self.predict = torch.nn.Linear(n_hidden_2, n_output)     # output layer

    def forward(self, x):
        x = torch.tanh(self.hidden_1(x))  # activation function for hidden layer
        x = torch.tanh(self.hidden_2(x))  # activation function for hidden layer
        x = self.predict(x)               # linear output
        return x

if __name__ == '__main__':  # apply_async
    Data_in = Variable(torch.from_numpy(np.asarray(list(range(0, 10))).reshape(10, 1)).float())
    Ground_truth = Variable(torch.from_numpy(np.asarray(list(range(20, 30))).reshape(10, 1)).float())
    net = Net(n_feature=1, n_hidden_1=15, n_hidden_2=15, n_output=4)  # define the network
    optimizer = torch.optim.Rprop(net.parameters())
    loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

    NN_output = net(Data_in)
    args = range(0, 10)
    pool = multiprocessing.Pool()
    return_data = pool.map(func, zip(args, NN_output))
    pool.close()
    pool.join()

    NN_energy = net(Data_in)[0:10, 0]
    for i in range(0, 10):
        NN_energy[i] = return_data[i]

    loss = torch.sqrt(loss_func(NN_energy, Ground_truth))  # must be (1. nn output, 2. target)
    print(loss)
Error message:

File "C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 126, in reduce_tensor
    raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
First of all, the torch Variable API has been deprecated for a very long time; just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong at many levels: np.asarray of a list is useless since a copy will be performed anyway, and np.array takes a list as input by design. Then, np.arange is available to return a range as a numpy array, and it also exists in Torch. Next, specifying both dimensions for reshape is useless and error-prone; you could simply do reshape((-1, 1)), or even better unsqueeze(-1).
Here is the simplified expression: torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
Using a multiprocessing pool is bad practice if batch processing is possible. Batching will be both more efficient and more readable. Indeed, performing N small algebraic operations in parallel is always slower than a single larger algebraic operation, and even more so on GPU. More importantly, computing the gradient is not supported by multiprocessing, hence the error that you get. Yet this is only partially true: it is supported for tensors on CPU since 1.6.0; have a look at the official release changelog.
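To make this concrete, here is a sketch of how the per-row computation in your func can be done as one batched tensor operation, keeping everything on the autograd graph with no pool at all (func_batched is a hypothetical name; it assumes func stays as simple as in your question):

import torch

def func_batched(BOP: torch.Tensor) -> torch.Tensor:
    # same arithmetic as func, applied to all rows at once:
    # (BOP[i][0] + BOP[i][1] / BOP[i][2]) * BOP[i][3]
    return (BOP[:, 0] + BOP[:, 1] / BOP[:, 2]) * BOP[:, 3]

NN_output = torch.randn(10, 4, requires_grad=True)  # stand-in for net(Data_in)
NN_energy = func_batched(NN_output).unsqueeze(-1)   # shape (10, 1), still differentiable
NN_energy.sum().backward()                          # gradients flow without multiprocessing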
Could you post a more representative example of what func could be, to make sure you really need multiprocessing?
NB: distributed autograd, which seems to be what you are looking for, is now available in PyTorch as an experimental feature, in beta since 1.6.0. Have a look at the official documentation.