Suppose I call np.random.seed(epoch) inside a PyTorch DistributedSampler’s set_epoch(epoch) method:
def set_epoch(self, epoch):
    np.random.seed(epoch)
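For context, here is a fuller sketch of what I mean (the subclass name and the surrounding usage are just illustrative):
import numpy as np
from torch.utils.data.distributed import DistributedSampler

class SeededSampler(DistributedSampler):
    def set_epoch(self, epoch):
        super().set_epoch(epoch)  # keep the sampler's own epoch-based shuffling
        np.random.seed(epoch)     # the extra call I'm asking about

# per-epoch usage:
# sampler = SeededSampler(dataset)
# loader = DataLoader(dataset, sampler=sampler, batch_size=32)
# for epoch in range(num_epochs):
#     sampler.set_epoch(epoch)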
Will that screw up the randomness of my samples?
How can one effectively stop the fit process of a Keras model via a callback? So far I have tried various approaches, including the one below.
class EarlyStoppingCallback(tf.keras.callbacks.Callback):
    def __init__(self, threshold):
        super(EarlyStoppingCallback, self).__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        accuracy = logs["accuracy"]
        if accuracy >= self.threshold:
            print("Stopping early!")
            self.model.stop_training = True
The callback is executed, but self.model.stop_training = True does not seem to have any effect: the print succeeds, yet the model continues training. Any idea how to resolve this issue?
My tensorflow version is: tensorflow==1.14.0
You're probably affected by the following issue: https://github.com/tensorflow/tensorflow/issues/37587.
In short: whenever model.predict or model.evaluate is called, model.stop_training is reset to False. I was able to reproduce this behavior using your EarlyStoppingCallback followed by another callback that calls model.predict on some fixed dataset.
The workaround is to place any callbacks that call model.predict or model.evaluate before the callbacks that might want to set model.stop_training to True. It also looks like the issue was fixed in TF 2.2.
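For example, a sketch of that ordering (PredictionCallback is a stand-in for whatever callback of yours calls model.predict or model.evaluate, and x_train/y_train/epochs are placeholders):
model.fit(
    x_train, y_train,
    epochs=100,
    callbacks=[
        PredictionCallback(),                   # calls model.predict and may reset stop_training
        EarlyStoppingCallback(threshold=0.95),  # runs after it, so stop_training=True sticks
    ],
)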
I have an existing model where I load some pre-trained weights and then run prediction (one image at a time) in PyTorch. I am trying to convert it to a PyTorch Lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can do pretty much the same, except skip the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So, my first question is: is this the correct way to use Lightning? How would Lightning know if it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes in as a numpy array. Is that something that is possible with a Lightning module, or does everything have to go through some sort of a dataloader?
At some point I want to extend this model implementation to do training as well, so I want to make sure I do it right. While most examples focus on training models, a simple example of just doing prediction at production time on a single image/data point would be useful.
I am using pytorch-lightning 0.7.5 with PyTorch 1.4.0 on a GPU with CUDA 10.1.
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just an nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference; simply call the model instance. Here's a toy example you can use:
import torchvision.models as models
from pytorch_lightning.core import LightningModule


class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)


model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a dedicated method; just do something like:
for frame in video:
    img = transform(frame)
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
    output = model(img).data.cpu().numpy()
    # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs.
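For instance, a rough sketch of what that could look like (untested; it assumes a recent Lightning version where training_step can return the loss directly, a classification setup, and my_dataset as a placeholder; on 0.7.5, training_step returns a dict instead):
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

class MyTrainableModel(MyModel):
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)  # uses forward() from MyModel above
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        return DataLoader(my_dataset, batch_size=32)  # my_dataset is a placeholder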
Even though the above answer suffices, note the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
You have to put both the model and the image on the right GPU. On a multi-GPU inference machine, this becomes a hassle.
To solve this, a .predict API was also introduced more recently; see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html
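A minimal sketch of that route (predict_dataloader and the device settings are placeholders; the exact Trainer arguments depend on your Lightning version):
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=1)
# Lightning moves the model and each batch to the right device itself:
predictions = trainer.predict(model, dataloaders=predict_dataloader)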
I have been using PyTorch for a while now and am making a general RL framework. I am running into the question of whether to use np.arrays or tensors.
When would you not want to use tensors when available? What would make you choose numpy over pytorch? Obviously tensors are important for ML models, but what if you want to just do basic image processing or list manipulation?
I am tempted to use Tensors whenever possible but do not know of any pitfalls. (graph confusion? memory leaks??)
For example, here is a basic, unfinished code snippet that collects actions for an env; I am not sure whether to stick with numpy or not.
@dataclass
class Action(object):
    """
    Handles actions, action space, and value verification.
    """
    taken_action: np.ndarray
    raw_action: np.ndarray
    n_possible_values: int
    action_space: gym.Space

    def __post_init__(self):
        # wrap scalar actions in an array
        if not isinstance(self.taken_action, np.ndarray):
            self.taken_action = np.array([self.taken_action])
@pytest.mark.parametrize("env", sorted([env.id for env in gym.envs.registry.all()]))
def test_action_data_structure(env):
    try:
        init_env = gym.make(env)
    except error.DependencyNotInstalled as e:
        print(e)
        return
    taken_action = init_env.action_space.sample()
    raw_action = np.random.rand(init_env.action_space.n)
    state, reward, done, info = init_env.step(taken_action)
    action_dataclass = Action(taken_action=taken_action, raw_action=raw_action,
                              n_possible_values=init_env.action_space.n,
                              action_space=init_env.action_space)
Well, the answer is: it depends. There is a tradeoff between speed and clarity when you debug your code. If your matrices are huge, it can pay off to jump between NumPy arrays and PyTorch tensors, because the GPU will run the operations so much faster that it outweighs the delay introduced by the conversion. So it is hard to say where the threshold (in dataset size) lies. I would try some simple operations to compare the options and then decide which method works best for you.
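One detail worth knowing here: on the CPU, torch.from_numpy() and Tensor.numpy() share memory, so the conversion itself is essentially free; it is the transfer to the GPU that costs time. A small sketch to illustrate (array sizes are arbitrary):
import numpy as np
import torch

a = np.zeros((1000, 1000), dtype=np.float32)
t = torch.from_numpy(a)  # zero-copy: shares the same buffer as `a`
t += 1.0                 # an in-place op on the tensor also changes `a`
assert a[0, 0] == 1.0
# the expensive part is moving the data to the GPU:
# t_gpu = t.to("cuda")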
In addition, I suggest you read this answer and also this blog post.
I have a script that performs a Gatys-like neural style transfer. It uses a style loss and a total variation loss. I'm using GradientTape() to compute my gradients. The losses that I have implemented seem to work fine, but a new loss that I added isn't being properly accounted for by the GradientTape(). I'm using TensorFlow with eager execution enabled.
I suspect it has something to do with how I compute the loss based on the input variable. The input is a 4D tensor (batch, h, w, channels). At the most basic level, the input is a floating-point image, and in order to compute this new loss I need to convert it to a binary image to compute the ratio of one pixel color to another. I don't want to actually change the image like that during every iteration, so I just make a copy of the tensor (in numpy form) and operate on that to compute the loss. I do not fully understand the limitations of the GradientTape, but I believe it "loses the thread" of how the input variable leads to the loss once it is converted to a numpy array.
Could I make a copy of the image tensor and perform binarizing operations & loss computation using that? Or am I asking tensorflow to do something that it just can not do?
My new loss function:
def compute_loss(self, **kwargs):
    loss = 0
    image = self.model.deprocess_image(kwargs['image'].numpy())
    binarized_image = self.image_decoder.binarize_image(image)
    volume_fraction = self.compute_volume_fraction(binarized_image)
    loss = np.abs(self.volume_fraction_target - volume_fraction)
    return loss
My implementation using the GradientTape:
def compute_grads_and_losses(self, style_transfer_state):
    """
    Computes gradients with respect to input image
    """
    with tf.GradientTape() as tape:
        loss = self.loss_evaluator.compute_total_loss(style_transfer_state)
        total_loss = loss['total_loss']
    return tape.gradient(total_loss, style_transfer_state['image']), loss
An example that I believe might illustrate my confusion. The strangest thing is that my code doesn't have any problem running; it just doesn't seem to minimize the new loss term whatsoever. But this example won't even run due to an attribute error: AttributeError: 'numpy.float64' object has no attribute '_id'.
Example:
import tensorflow.contrib.eager as tfe
import tensorflow as tf


def compute_square_of_value(x):
    a = turn_to_numpy(x['x'])
    return a**2


def turn_to_numpy(arg):
    return arg.numpy()  # just return arg to eliminate the error


tf.enable_eager_execution()

x = tfe.Variable(3.0, dtype=tf.float32)
data_dict = {'x': x}
with tf.GradientTape() as tape:
    tape.watch(x)
    y = compute_square_of_value(data_dict)

dy_dx = tape.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
Edit:
From my current understanding, the issue arises because my use of the .numpy() operation makes the GradientTape lose track of the variable it should compute the gradient from. My original reason for doing this is that my loss computation requires me to physically change values of the tensor, and I don't want to actually change the values of the tensor that is being optimized. Hence the use of the numpy() copy to compute the loss properly. Is there any way around this? Or shall I consider my loss calculation impossible to implement because of this constraint of having to perform essentially non-reversible operations on the input tensor?
The first issue here is that GradientTape only traces operations on tf.Tensor objects. When you call tensor.numpy() the operations executed there fall outside the tape.
The second issue is that your first example never calls tape.watch on the image you want to differentiate with respect to.
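As a minimal illustration (TF 2.x-style eager code, not the asker's exact setup), keeping everything as tf ops and watching the input lets the tape trace the computation:
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)          # needed because x is not a tf.Variable
    y = x ** 2             # stays a tf.Tensor, so the tape can trace it
    # y = x.numpy() ** 2   # would leave the tape and break the gradient

print(tape.gradient(y, x))  # -> tf.Tensor(6.0, shape=(), dtype=float32)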
How do I get the sample loss while training instead of the total loss? The loss history is available which gives the total batch loss but it doesn't provide the loss for individual samples.
If possible I would like to have something like this:
on_batch_end(batch, logs, sample_losses)
Is something like this available and if not can you provide some hints how to change the code to support this?
To the best of my knowledge it is not possible to get this information via callbacks since the loss is already computed once the callbacks are called (have a look at keras/engine/training.py). To simply inspect the losses you may override the loss function, e.g.:
import keras
import theano

def myloss(ytrue, ypred):
    x = keras.objectives.mean_squared_error(ytrue, ypred)
    return theano.printing.Print('loss for each sample')(x)

model.compile(loss=myloss)
Actually, this can be done using a callback. This is now included in the Keras documentation on callbacks. Define your own callback like this:
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
Then pass this callback to your model. You should get the per-batch losses appended to the LossHistory object.
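For example (model, x_train and y_train are whatever you already have):
history_cb = LossHistory()
model.fit(x_train, y_train, epochs=10, callbacks=[history_cb])
print(history_cb.losses)  # one (mean) loss value per batch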
I have also not found any existing functions in the Keras API that return individual sample losses while still computing on a minibatch. It seems you have to hack Keras, or maybe access the TensorFlow graph directly.
Set the batch size to 1 and use callbacks in model.evaluate, or manually calculate the loss between the predictions (model.predict) and the ground truth.
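A minimal sketch of the manual route, assuming a regression model trained with MSE (x_val and y_val are placeholders):
import numpy as np

preds = model.predict(x_val)
# mean squared error per sample; Keras averages these into the reported batch loss
sample_losses = np.mean(np.square(y_val - preds), axis=-1)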