Changing the checkpoint path of lr_find - pytorch

I want to tune the learning rate for my PyTorch Lightning model. My code runs on a GPU cluster, so I can only write to certain folders that I bind mount. However, trainer.tuner.lr_find tries to write the checkpoint to the folder where my script runs, and since that folder is not writable, it fails with the following error:
OSError: [Errno 30] Read-only file system: '/opt/xrPose/.lr_find_43df1c5c-0aed-4205-ac56-2fe4523ca4a7.ckpt'
Is there any way to change the checkpoint path for lr_find? I checked the checkpointing part of the documentation but couldn't find any information on this.
My code is below:
res = trainer.tuner.lr_find(model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader, min_lr=1e-5)
logging.info(f"suggested learning rate: {res.suggestion()}")
model.hparams.learning_rate = res.suggestion()

You may need to specify default_root_dir when initializing the Trainer:
trainer = Trainer(default_root_dir='./my_dir')
Description from the Official Documentation:
default_root_dir - Default path for logs and weights when no logger or
pytorch_lightning.callbacks.ModelCheckpoint callback passed.
Code example:
import numpy as np
import torch
from pytorch_lightning import LightningModule, Trainer
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self) -> None:
        super().__init__()

    def __getitem__(self, index):
        x = np.zeros((10,), np.float32)
        y = np.zeros((1,), np.float32)
        return x, y

    def __len__(self):
        return 100

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = torch.nn.MSELoss()(y_hat, y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

model = MyModel()
trainer = Trainer(default_root_dir='./my_dir')
train_dataloader = DataLoader(MyDataset())
trainer.tuner.lr_find(model, train_dataloader)

As defined in lr_finder.py:
# Save initial model, that is loaded after learning rate is found
ckpt_path = os.path.join(trainer.default_root_dir, f".lr_find_{uuid.uuid4()}.ckpt")
trainer.save_checkpoint(ckpt_path)
The only way to change the directory the checkpoint is saved to is to change default_root_dir. But be aware that this is also the directory the Lightning logs are saved to.
You can easily change it with trainer = Trainer(default_root_dir='./NAME_OF_THE_DIR').
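If you need the logs to go somewhere other than the directory used for the lr_find checkpoint, one possible setup is to pass an explicit logger with its own save_dir; a sketch assuming the bundled TensorBoardLogger (the paths shown are hypothetical writable mounts):
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

# The temporary .lr_find_*.ckpt follows trainer.default_root_dir,
# while the logger writes under its own save_dir.
trainer = Trainer(
    default_root_dir='/writable/checkpoints',             # hypothetical writable mount
    logger=TensorBoardLogger(save_dir='/writable/logs'),  # hypothetical writable mount
)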

Related

Access all batch outputs at the end of epoch in callback with pytorch lightning

The documentation for on_train_epoch_end, https://pytorch-lightning.readthedocs.io/en/stable/extensions/callbacks.html#on-train-epoch-end, states:
To access all batch outputs at the end of the epoch, either:
Implement training_epoch_end in the LightningModule and access outputs via the module OR
Cache data across train batch hooks inside the callback implementation to post-process in this hook.
I am trying to use the first alternative with the following LightningModule and Callback setup:
import pytorch_lightning as pl
from pytorch_lightning import Callback

class LightningModule(pl.LightningModule):
    def __init__(self, *args):
        super().__init__()
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        return {'batch': batch}

    def training_epoch_end(self, training_step_outputs):
        # training_step_outputs has all my batches
        return

class MyCallback(Callback):
    def on_train_epoch_end(self, trainer, pl_module):
        # pl_module.batch ???
        return
How do I access the outputs via the pl_module in the callback? What is the recommended way of getting access to training_step_outputs in my callback?
You can store the outputs of each training batch in the callback's state and access them at the end of the training epoch. Here is an example:
from pytorch_lightning import Callback

class MyCallback(Callback):
    def __init__(self):
        super().__init__()
        self.state = []

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, unused=0):
        self.state.append(outputs)

    def on_train_epoch_end(self, trainer, pl_module):
        # access output using state
        all_outputs = self.state
Hope this helps you! 😀
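For completeness, a minimal usage sketch; the model and train_dataloader names are assumed to be defined as in your own setup:
from pytorch_lightning import Trainer

callback = MyCallback()
trainer = Trainer(callbacks=[callback], max_epochs=1)
trainer.fit(model, train_dataloader)
# After fit(), callback.state holds the outputs collected during training.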

How does keras.utils.Sequence work?

I am trying to create a data pipeline for U-Net for image segmentation. I came across the keras.utils.Sequence class, with which I can create a data pipeline, but I am unable to understand how it works.
Links for the code: Keras code, source code
def __iter__(self):
    """Create a generator that iterate over the Sequence."""
    for item in (self[i] for i in range(len(self))):
        yield item
I would highly appreciate it if anyone could tell me how this works.
You don't need a generator. The Sequence class is there to manage that. You need to define a class inherited from tensorflow.keras.utils.Sequence and define the methods
__init__, __getitem__ and __len__. In addition, you can define the method on_epoch_end, which is called at the end of each epoch and is usually used to shuffle the sample indexes.
There is an example in the link you gave, Tensorflow Sequence.
Below is another example of a Sequence.
Note that you can pass the data to the __init__ constructor, but you may as well read the data from files in the __getitem__ method, assuming you know where to read them, e.g. by passing the name of a directory or directories to the constructor. This is necessary if there is a lot of data.
from tensorflow import keras
import numpy as np

class SequenceExample(keras.utils.Sequence):
    def __init__(self, x_in, y_in, batch_size, shuffle=True):
        # Initialization
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.x = x_in
        self.y = y_in
        self.datalen = len(y_in)
        self.indexes = np.arange(self.datalen)
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __getitem__(self, index):
        # get batch indexes from shuffled indexes
        batch_indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        x_batch = self.x[batch_indexes]
        y_batch = self.y[batch_indexes]
        return x_batch, y_batch

    def __len__(self):
        # Denotes the number of batches per epoch
        return self.datalen // self.batch_size

    def on_epoch_end(self):
        # Updates indexes after each epoch
        self.indexes = np.arange(self.datalen)
        if self.shuffle:
            np.random.shuffle(self.indexes)
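A minimal usage sketch, assuming a compiled Keras model named model and in-memory arrays x_train/y_train (these names are illustrative):
# Keras calls __len__ and __getitem__ to draw batches,
# and on_epoch_end between epochs to reshuffle.
seq = SequenceExample(x_train, y_train, batch_size=32)
model.fit(seq, epochs=10)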

JIT the collate function in Pytorch

I need to create a DataLoader whose collate function requires non-trivial computation, actually a doubly nested loop, which is significantly slowing down the training process. For example, consider this toy code where I try to use numba to JIT the collate function:
import numpy as np
import torch
import torch.utils.data
import numba as nb

class Dataset(torch.utils.data.Dataset):
    def __init__(self):
        self.A = np.zeros((100000, 300))
        self.B = np.ones((100000, 300))

    def __getitem__(self, index):
        return self.A[index], self.B[index]

    def __len__(self):
        return self.A.shape[0]

@nb.njit(cache=True)
def _collate_fn(batch):
    batch_data = np.zeros((len(batch), 300))
    for i in range(len(batch)):
        batch_data[i] = batch[i][0] + batch[i][1]
    return batch_data
and then I create the DataLoader as follows:
train_dataset = Dataset()
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=256,
    num_workers=6,
    collate_fn=_collate_fn,
    shuffle=True)
However, this just gets stuck, but it works fine if I remove the JITing of _collate_fn. I am not able to understand what is happening here. I don't have to stick to numba and can use anything that will help me overcome the loop inefficiencies in Python. TIA and Happy 12,021.
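No accepted answer is shown here, but for this particular toy collate the doubly nested work can be expressed with plain NumPy vectorization, which sidesteps JIT compilation inside the DataLoader worker processes entirely; a minimal sketch, not a general fix for arbitrary collate logic:
import numpy as np
import torch

def _collate_fn(batch):
    # Stack the per-sample arrays once, then add them element-wise;
    # NumPy performs the loop in C, so no numba is needed.
    a = np.stack([item[0] for item in batch])
    b = np.stack([item[1] for item in batch])
    return torch.from_numpy(a + b)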

How to print each of the batch files being trained from ImageDataGenerator in Keras

I am using Keras ImageDataGenerator with flow_from_directory.
For training data, each of the 13 class folders has 10,000-20,000 jpg files. While training, although Keras shows the epoch, I want to print which of the image files are being trained/used in each batch. How do I do that?
Thanks
sedy
You will probably have a flooding problem, but you can create a wrapping generator:
from keras.utils import Sequence

class PrintingGenerator(Sequence):
    def __init__(self, keras_generator):
        self.keras_generator = keras_generator

    def __len__(self):
        return len(self.keras_generator)

    def __getitem__(self, i):
        x, y = self.keras_generator[i]
        # do the print
        return x, y

    def on_epoch_end(self):
        self.keras_generator.on_epoch_end()
generator = PrintingGenerator(original.flow_from_directory(...))
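One way to fill in the "# do the print" placeholder, assuming the wrapped generator is the DirectoryIterator returned by flow_from_directory (which exposes filenames, batch_size and, once a batch has been drawn, index_array), would be a __getitem__ along these lines (a sketch, not verified against every Keras version):
def __getitem__(self, i):
    x, y = self.keras_generator[i]
    # index_array holds the (possibly shuffled) sample order for this epoch
    bs = self.keras_generator.batch_size
    idx = self.keras_generator.index_array[i * bs:(i + 1) * bs]
    print([self.keras_generator.filenames[j] for j in idx])
    return x, y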

PyTorch Data Won't Fit in Memory - Example?

I am trying to find an example of training in PyTorch in batches from data on disk, akin to the Keras fit_generator. How would I alter the code below to read the CSV from disk instead of loading it into memory?
I have found that one can iterate over a custom data loader like the one below, but I am unsure how to do this without loading all the data into memory.
I would like to:
Train and validate the model with data held on disk
Use mini-batches of the full data on disk
Repeat x epochs
from sklearn.datasets import load_boston
import torch
from torch.utils.data import Dataset
import torch.utils.data as utils_data

class testLoader(Dataset):
    def __init__(self):
        # regular old numpy
        boston = load_boston()
        x = boston.data
        y = boston.target
        self.x = torch.from_numpy(x)
        self.y = torch.from_numpy(y)
        self.length = x.shape[0]
        self.vars = x.shape[1]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.length

training_samples = testLoader()
train_loader = utils_data.DataLoader(training_samples, batch_size=64, shuffle=True)
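This question has no answer shown here, but one common pattern is a Dataset that records each row's byte offset in __init__ and reads a single line per __getitem__, so the full CSV never sits in memory; a minimal sketch assuming a headerless numeric CSV (the path data.csv is hypothetical) whose last column is the target:
import torch
from torch.utils.data import DataLoader, Dataset

class CsvDataset(Dataset):
    def __init__(self, path):
        self.path = path
        self.offsets = []
        # One pass to record where each line starts; the rows themselves stay on disk.
        with open(path, 'rb') as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __getitem__(self, index):
        # Seek to the stored offset and parse just that one row.
        with open(self.path, 'rb') as f:
            f.seek(self.offsets[index])
            values = [float(v) for v in f.readline().decode().split(',')]
        x = torch.tensor(values[:-1])
        y = torch.tensor(values[-1:])
        return x, y

    def __len__(self):
        return len(self.offsets)

training_samples = CsvDataset('data.csv')
train_loader = DataLoader(training_samples, batch_size=64, shuffle=True)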
