Documenting jit-compiled PyTorch Class Method (Sphinx)

I am having a problem trying to document custom PyTorch models with Sphinx: methods that are jit-compiled show up without docstrings in the documentation. How do I fix this? I checked out Python Sphinx autodoc and decorated members and How to autodoc decorated methods with sphinx?, but the proposed solutions don't seem to work. When I try using .. automethod:: I get
AttributeError: '_CachedForward' object has no attribute '__getattr__'
Here's an MWE; at the moment I circumvent the problem by writing a my_forward method and calling it from forward.
import torch
from torch import jit, Tensor

class DummyModel(jit.ScriptModule):
    """My dummy model"""

    def __init__(self, const: float):
        super(DummyModel, self).__init__()
        self.const = torch.tensor(const)

    @jit.script_method
    def forward(self, x: Tensor) -> Tensor:
        """This method shows up as an :undoc-member: in the documentation"""
        return self.my_forward(x)

    def my_forward(self, x: Tensor) -> Tensor:
        """This method shows up as a :member: in the documentation"""
        return x + self.const

Set the PYTORCH_JIT environment variable to 0 when running Sphinx. This will disable script and tracing annotations (decorators).
See https://pytorch.org/docs/stable/jit.html#disable-jit-for-debugging.
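For example, one way this could look (a sketch, assuming torch has not already been imported by the time this line runs, since torch.jit reads the variable at import time):

# conf.py
import os
os.environ["PYTORCH_JIT"] = "0"  # must be set before `import torch` runs anywhere

# ... rest of the Sphinx configuration (extensions, autodoc settings, etc.) ...

Alternatively, set it on the command line when building the docs, e.g. PYTORCH_JIT=0 make html. With scripting disabled, the jit decorators become no-ops, so forward keeps its regular docstring and autodoc can pick it up.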

Related

Got error "AttributeError: 'TimestepEmbedSequential' object has no attribute '__globals__'" while torch.jit.script()

While trying to script Stable Diffusion model using torch.jit.script(), I got this following error:
AttributeError: 'TimestepEmbedSequential' object has no attribute '__globals__'
I'm trying to export this model to ONNX and found out that torch.onnx.export() will torch.jit.trace the models, which unrolls every loop, so I'm trying to script it first.
When I follow the traceback, the error occurs in this function in torch's _jit_internal.py while reading fn.__globals__:
def get_closure(fn):
    """
    Get a dictionary of closed over variables from a function
    """
    captures = {}
    captures.update(fn.__globals__)

    for index, captured_name in enumerate(fn.__code__.co_freevars):
        captures[captured_name] = fn.__closure__[index].cell_contents
    return captures
The code for scripting is as follows:
stablediffusion_wrapper = StableDiffusionWrapper(model, sampler, opt)
scripted_module = torch.jit.script(stablediffusion_wrapper, example_inputs=[(token_dummy, img_dummy)])
and the TimestepEmbedSequential module:
class TimestepEmbedSequential(nn.Sequential, TimestepBlock):
    """
    A sequential module that passes timestep embeddings to the children that
    support it as an extra input.
    """

    def forward(self, x, emb, context=None):
        for layer in self:
            if isinstance(layer, TimestepBlock):
                x = layer(x, emb)
            elif isinstance(layer, SpatialTransformer):
                x = layer(x, context)
            else:
                x = layer(x)
        return x
Any suggestions on how I can figure this out?
I tried adding the @torch.jit.export decorator to the parent class TimestepBlock, but it had no effect. Honestly, I have no idea what to look for with this problem. I would appreciate any suggestions. Thank you.

Subclass of PyTorch DataLoader for changing batch output

I'm interested in a way of applying a transform to a batch generated by a PyTorch DataLoader class. My minimal example is something like this:
class CustomLoader(torch.utils.data.DataLoader):
    def __iter__(self):
        result = super().__iter__()
        return some_function(result)
But this errors, since DataLoader.__iter__() returns a _MultiProcessingDataLoaderIter or _SingleProcessDataLoaderIter. Weirdly though, directly returning the output does give a Tensor, so any explanation there would be greatly appreciated!
I understand that in general, transforms on the data should be done in the subclassed Dataset class. However, in my case the data is tabular and the transform goes via numpy, and doing it on a sample-wise basis is much slower (5x) than doing it on an entire batch, since these operations are surely vectorized under the hood.
I know I can do something simple like
for X, y in loader:
    X = some_function(X)
But I'd also like to use the DataLoader with pytorch-lightning, so this isn't an option.
What is the proper way to subclass PyTorch DataLoaders?
__iter__() should be a generator: you will need to yield the results instead of returning them. You can read more about generators here
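For instance, a minimal sketch of that idea (assuming some_function accepts a whole collated batch):

from torch.utils.data import DataLoader

class CustomLoader(DataLoader):
    def __iter__(self):
        # Iterate over the batches produced by the parent DataLoader and yield
        # each transformed batch, which turns __iter__ into a generator.
        for batch in super().__iter__():
            yield some_function(batch)  # some_function assumed to work on a full batch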
Regarding your problem of applying a transform to a batch, you can create a custom Dataset instead of subclassing DataLoader and then apply the transforms there.
class MyDataset(Dataset):
    def __init__(self, transforms=None):
        super().__init__()
        self.data = ...  # define your data here
        self.transforms = transforms

    def __len__(self):  # needed by the default sampler
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if self.transforms:
            x = self.transforms(x)
        return x

# use your `MyDataset` class for creating your dataloader
dataloader = DataLoader(MyDataset(transforms=CustomTransforms()), batch_size=4)
You can use this dataloader with PyTorch Lightning Trainer as well.
If you are using PyTorch Lightning, I would suggest you join our Slack channel and ask questions on GitHub Discussions as well.
Thanks :)
EDIT: (Add transforms to Batch)
If you are using PyTorch Lightning, then I would recommend using LightningDataModule, which provides an on_before_batch_transfer hook that can be used to apply transforms to a batch ;)
Here is an example:
def on_before_batch_transfer(self, batch, dataloader_idx):
    batch['x'] = transforms(batch['x'])
    return batch
Check out the documentation for more.
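For context, a minimal sketch of where that hook could live (the TabularDataModule name and batch_transform argument are placeholders for illustration):

import pytorch_lightning as pl
from torch.utils.data import DataLoader

class TabularDataModule(pl.LightningDataModule):
    def __init__(self, dataset, batch_transform, batch_size=32):
        super().__init__()
        self.dataset = dataset
        self.batch_transform = batch_transform  # assumed to operate on a whole batch
        self.batch_size = batch_size

    def train_dataloader(self):
        return DataLoader(self.dataset, batch_size=self.batch_size)

    def on_before_batch_transfer(self, batch, dataloader_idx):
        # Runs on the collated batch before it is moved to the device.
        X, y = batch
        return self.batch_transform(X), y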

Understanding the GIoU loss function in tensorflow

The custom Loss function I am looking at is as follows:
@tf.keras.utils.register_keras_serializable(package="Addons")
class GIoULoss(LossFunctionWrapper):
    @typechecked
    def __init__(
        self,
        mode: str = "giou",
        reduction: str = tf.keras.losses.Reduction.AUTO,
        name: Optional[str] = "giou_loss",
    ):
        super().__init__(giou_loss, name=name, reduction=reduction, mode=mode)


@tf.keras.utils.register_keras_serializable(package="Addons")
def giou_loss(y_true: TensorLike, y_pred: TensorLike, mode: str = "giou") -> tf.Tensor:
My questions:
The majority of the custom loss functions I have looked at work without using LossFunctionWrapper. So under what circumstances is LossFunctionWrapper used, and when can you work without it?
I haven't been able to fully understand from the TensorFlow documentation what @tf.keras.utils.register_keras_serializable(package="Addons") does.
Any help?

How to use a Batchsampler within a Dataloader

I need to use a BatchSampler within a PyTorch DataLoader instead of calling __getitem__ of the dataset multiple times (remote dataset, each query is pricey). I cannot understand how to use the BatchSampler with any given dataset.
e.g.
class MyDataset(Dataset):
    def __init__(self, remote_ddf):
        self.ddf = remote_ddf

    def __len__(self):
        return len(self.ddf)

    def __getitem__(self, idx):
        return self.ddf[idx]  # --------> This is as expensive as a batch call

    def get_batch(self, batch_idx):
        return self.ddf[batch_idx]

my_loader = DataLoader(MyDataset(remote_ddf),
                       batch_sampler=BatchSampler(Sampler(), batch_size=3))
What I do not understand, and what I could not find any example of online or in the torch docs, is how to use my get_batch function instead of the __getitem__ function.
Edit:
Following Szymon Maszke's answer, this is what I tried, and yet __getitem__ gets one index per call instead of a list of size batch_size:
class Dataset(Dataset):
    def __init__(self):
        ...

    def __len__(self):
        ...

    def __getitem__(self, batch_idx):  # ------> here I get only one index
        return self.wiki_df.loc[batch_idx]

loader = DataLoader(
    dataset=dataset,
    batch_sampler=BatchSampler(
        SequentialSampler(dataset), batch_size=self.hparams.batch_size, drop_last=False),
    num_workers=self.hparams.num_data_workers,
)
You can't use get_batch instead of __getitem__, and I don't see the point of doing it like that.
torch.utils.data.BatchSampler takes indices from your Sampler() instance (in this case 3 of them) and returns them as a list, so they can be used in your MyDataset __getitem__ method (check the source code; most samplers and data-related utilities are easy to follow in case you need them).
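As a quick sanity check of that behaviour, you can inspect what BatchSampler yields on its own:

from torch.utils.data import BatchSampler, SequentialSampler

# Each element produced by BatchSampler is a list of indices.
print(list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]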
I assume your self.ddf supports list slicing (e.g. self.ddf[[25, 44, 115]] returns values correctly and uses only one expensive call). In this case simply switch get_batch into __getitem__ and you are good to go.
class MyDataset(Dataset):
    def __init__(self, remote_ddf):
        self.ddf = remote_ddf

    def __len__(self):
        return len(self.ddf)

    def __getitem__(self, batch_idx):
        return self.ddf[batch_idx]  # batch_idx is a list
EDIT: You have to pass the BatchSampler via the sampler argument, otherwise the batch will be divided into single indices. This should be fine:
loader = DataLoader(
    dataset=dataset,
    # This line below!
    sampler=BatchSampler(
        SequentialSampler(dataset), batch_size=self.hparams.batch_size, drop_last=False
    ),
    num_workers=self.hparams.num_data_workers,
)

python self keyword in a list

I am reading a tutorial about Keras and came across this class, which another class inherits from.
class LearningRateDecay:
    def plot(self, epochs, title="Learning Rate Schedule"):
        lrs = [self(i) for i in epochs]
        plt.style.use("ggplot")
        plt.figure()
        plt.plot(epochs, lrs)
        plt.title(title)
        plt.xlabel("Epoch #")
        plt.ylabel("Learning Rate")
in which epochs is defined like this:
N = 100
epochs = np.arange(0, N)
and is passed to the plot function like this:
a = LearningRateDecay
a.plot(epochs,"Learning Rate Schedule")
I can't understand what self(i) means. Is this about accessing self's elements or something else?
Any help is greatly appreciated.
Thank you
self(i) refers to calling an object, i.e. using its __call__ method.
For example, consider this class:
class MyClass:
    def __call__(self, arg):
        print(arg)

    def method(self):
        self('test')

a = MyClass()
a.method()  # Prints 'test'
a('other')  # Prints 'other'
In your example, this class probably has some defined behavior associated with __call__ (it can also be inherited from a parent class). Refer to the class documentation to learn about it.
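In the learning-rate-schedule tutorials that use this LearningRateDecay base class, subclasses typically implement __call__(epoch) to return the learning rate for a given epoch. A hypothetical sketch (the StepDecay name and its parameters are made up for illustration):

import numpy as np

class StepDecay(LearningRateDecay):
    # Hypothetical subclass: __call__ returns the learning rate for an epoch,
    # which is what plot() relies on when it evaluates self(i).
    def __init__(self, init_lr=0.01, factor=0.25, drop_every=10):
        self.init_lr = init_lr
        self.factor = factor
        self.drop_every = drop_every

    def __call__(self, epoch):
        exp = np.floor((1 + epoch) / self.drop_every)
        return float(self.init_lr * (self.factor ** exp))

schedule = StepDecay()
schedule.plot(np.arange(0, 100))  # plot() evaluates schedule(i) for every epoch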
