PyTorch: how to change batch size during training?

I want to change the batch size during the training loop. I have tried re-instantiating a new DataLoader so that I can change the batch size through the 'batch_size' parameter, but this takes some time.
How can I change the batch size like below:
for batch, (X, y) in enumerate(dataloader):
    # do something here, like training the model
    dataloader.setBatchsize(newBatchsize)  # this is what I want to do
In the paper Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments, the authors say they change the batch size with a custom DataIter and BatchSampler, but I have no idea how to do that.
I hope colleagues and seniors can give some guidance.
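One way to get this behaviour is a batch sampler that re-reads a mutable batch size before emitting every batch, which is roughly what a custom BatchSampler gives you. Below is a minimal sketch, not the paper's implementation; VariableBatchSampler, set_batch_size, and train_dataset are illustrative names, not a PyTorch API.

import torch
from torch.utils.data import DataLoader, Sampler

class VariableBatchSampler(Sampler):
    def __init__(self, dataset_len, batch_size):
        self.dataset_len = dataset_len
        self.batch_size = batch_size

    def set_batch_size(self, batch_size):
        self.batch_size = batch_size

    def __iter__(self):
        indices = torch.randperm(self.dataset_len).tolist()
        i = 0
        while i < self.dataset_len:
            bs = self.batch_size       # re-read before every batch, so changes take effect
            yield indices[i:i + bs]
            i += bs

    def __len__(self):
        # only an estimate once the batch size changes mid-epoch
        return (self.dataset_len + self.batch_size - 1) // self.batch_size

# pass it as batch_sampler (batch_size/shuffle/drop_last must then be left unset)
sampler = VariableBatchSampler(len(train_dataset), batch_size=32)
dataloader = DataLoader(train_dataset, batch_sampler=sampler)

for step, (X, y) in enumerate(dataloader):
    # ... training step ...
    if step == 100:
        sampler.set_batch_size(64)     # later batches come out with the new size

With num_workers=0 the new size applies to the very next batch; with worker processes the sampler is still iterated in the main process, but prefetching may delay the change by a few batches.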

Related

How to sample along with another dataloader in PyTorch

Assume I have train/valid/test datasets with batch_size and shuffling set up as usual.
During train/valid/test, I want to sample a certain number (call it memory_size) of additional samples from the entire dataset for each sample.
For example, I set batch_size to 256, let the dataset be shuffled, and set memory_size to 80.
In every forward step, besides each sample from the dataset, I also want to draw memory_size samples from the entire original dataset and use them inside forward(). Call these new samples the Memory (yes, I want to adopt the idea from Memory Networks). The Memory may overlap between samples in the training set.
I'm using PyTorch and PyTorch Lightning. Can I create a new memory dataloader for each of train_dataloader, val_dataloader, and test_dataloader and load it alongside the original dataloader, or is there a better way to achieve what I want?
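One possible alternative to a second dataloader is a dataset wrapper that attaches the memory to each item. This is a minimal sketch under the assumption that the base dataset returns (tensor, label) pairs; MemoryWrapper is an illustrative name, not a PyTorch or Lightning API.

import torch
from torch.utils.data import Dataset, DataLoader

class MemoryWrapper(Dataset):
    def __init__(self, base_dataset, memory_size):
        self.base = base_dataset
        self.memory_size = memory_size

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        mem_idx = torch.randint(len(self.base), (self.memory_size,)).tolist()
        memory = torch.stack([self.base[i][0] for i in mem_idx])
        return x, y, memory              # memory: (memory_size, *x.shape)

train_loader = DataLoader(MemoryWrapper(train_set, memory_size=80),
                          batch_size=256, shuffle=True)
# each batch then yields x: (256, ...), y: (256, ...), memory: (256, 80, ...),
# and the same wrapper can back train/val/test dataloaders in Lightning

Overlap between memories is allowed automatically, since the indices are drawn independently for every sample.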

Keras curriculum learning- change shuffle method

I want to change the way training data is shuffled in Keras. Conventionally we shuffle the samples in each epoch, then batch them and fit the model. Now I want to first batch the samples and then shuffle these "batches" (the samples within each batch should not be shuffled). The reason is that I ordered all samples based on a criterion (curriculum learning) and I want to preserve that order.
Do you know how I can do this?
I found the answer via this link:
How does shuffle = 'batch' argument of the .fit() layer work in the background?
Short answer: we need to set shuffle='batch' in fit().
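For reference, the same effect can be obtained with tf.data by batching first and then shuffling whole batches. This is a minimal sketch; x_ordered, y_ordered, and model are assumptions and should already be in curriculum order.

import tensorflow as tf

BATCH_SIZE = 32
ds = tf.data.Dataset.from_tensor_slices((x_ordered, y_ordered))
ds = ds.batch(BATCH_SIZE)                                          # keep within-batch order
ds = ds.shuffle(buffer_size=100, reshuffle_each_iteration=True)    # shuffle the batch order
model.fit(ds, epochs=10)

# with plain NumPy arrays: model.fit(x_ordered, y_ordered, batch_size=32, shuffle='batch')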

What is the use of repeat() when creating a tf.data.Dataset object?

I am reproducing the code of TensorFlow's Time series forecasting tutorial.
They use tf.data to shuffle, batch, and cache the dataset. More precisely they do the following:
BATCH_SIZE = 256
BUFFER_SIZE = 10000
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()
I can't understand why they use repeat() and, even more so, why they don't specify the count argument of repeat. What is the point of making the process repeat indefinitely? And how can the algorithm read all the elements in an infinitely big dataset?
As can be seen in the TensorFlow Federated tutorials for image classification, the repeat method repeats the dataset, and the repetition count also determines the number of training epochs.
So use .repeat(NUM_EPOCHS), where NUM_EPOCHS is the number of epochs for training. When no count is given, repeat() yields the data indefinitely, which is why such pipelines are paired with a steps_per_epoch argument in model.fit that bounds each epoch.
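Here is a minimal sketch of how an unbounded repeat() is consumed: the dataset yields batches forever and model.fit decides when an "epoch" ends. The numbers and the model are illustrative, not taken from the tutorial.

EPOCHS = 10
STEPS_PER_EPOCH = 200        # batches drawn from the infinite stream per epoch

model.fit(train_univariate,
          epochs=EPOCHS,
          steps_per_epoch=STEPS_PER_EPOCH,
          validation_data=val_univariate,
          validation_steps=50)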

How to get current batch predictions inside the Keras training cycle?

I am using the _loss, _acc = model.train_on_batch(x, y) training approach. Now I want to get the current batch's model predictions (outputs) in order to find incorrect predictions and save them for later use in hard negative mining. How do I get the current outputs?
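A minimal sketch of one way to do this: call predict_on_batch on the same inputs right after train_on_batch. Note the predictions then reflect the weights after the update; model, x, y, and one-hot labels are assumptions.

import numpy as np

_loss, _acc = model.train_on_batch(x, y)
preds = model.predict_on_batch(x)                           # outputs for the current batch
wrong = np.argmax(preds, axis=1) != np.argmax(y, axis=1)    # misclassified samples
hard_negatives = x[wrong]                                   # keep for hard negative mining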

Keras- Loss per sample within batch

How do I get the per-sample loss while training instead of the total loss? The loss history is available, which gives the total batch loss, but it doesn't provide the loss for individual samples.
If possible I would like to have something like this:
on_batch_end(batch, logs, sample_losses)
Is something like this available, and if not, can you provide some hints on how to change the code to support it?
To the best of my knowledge it is not possible to get this information via callbacks, since the loss has already been computed by the time the callbacks are called (have a look at keras/engine/training.py). To simply inspect the losses you may override the loss function, e.g.:
import keras
import theano

def myloss(ytrue, ypred):
    x = keras.objectives.mean_squared_error(ytrue, ypred)
    return theano.printing.Print('loss for each sample')(x)

model.compile(optimizer='sgd', loss=myloss)  # any optimizer; the Print op logs per-sample losses
Actually this can be done using a callback. This is now included in the Keras documentation on callbacks. Define your own callback like this:
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
Then pass this callback to your model. You should get the per-batch losses appended to the callback's losses list.
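Usage sketch (x_train, y_train, and model are assumptions): note that the callback collects one mean loss value per batch, not per sample.

history_cb = LossHistory()
model.fit(x_train, y_train, batch_size=32, epochs=5, callbacks=[history_cb])
print(history_cb.losses)    # one averaged loss value per batch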
I have also not found any existing function in the Keras API that returns individual sample losses while still computing on a minibatch. It seems you have to hack Keras, or perhaps access the TensorFlow graph directly.
Set the batch size to 1 and use callbacks in model.evaluate, OR manually calculate the loss between the predictions (model.predict) and the ground truth.
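A minimal sketch of the manual route in tf.keras: predict on the batch and apply the loss with reduction disabled, which gives one value per sample. model, x_batch, and y_batch are assumptions.

import tensorflow as tf

preds = model.predict(x_batch)
loss_fn = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
per_sample_loss = loss_fn(y_batch, preds).numpy()    # shape: (batch_size,)
print(per_sample_loss)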
