How to reproduce RNN results on several runs? - pytorch

I call the same model on the same input twice in a row and I don't get the same result. The model has nn.GRU layers, so I suspect it has some internal state that should be released before the second run.
How do I reset the RNN hidden state so that it is the same as when the model was initially loaded?
UPDATE:
Some context:
I'm trying to run model from here:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L93
I'm calling generate:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L148
The code here actually uses PyTorch's random number generator:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L200
https://github.com/erogol/WaveRNN/blob/master/utils/distribution.py#L110
https://github.com/erogol/WaveRNN/blob/master/utils/distribution.py#L129
I have placed the following lines (I'm running the code on CPU):
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(0)
in
https://github.com/erogol/WaveRNN/blob/master/utils/distribution.py
after all imports.
I have checked GRU weights between runs and they are the same:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L153
I have also checked the logits and the sample between runs: the logits are the same but the samples are not, so @Andrew Naguib seems to have been right about random seeding. However, I'm not sure where the code that fixes the random seed should be placed.
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L200
UPDATE 2:
I have placed the seed initialization inside generate and now the results are consistent:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L148
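The behaviour described in UPDATE 2 can be sketched without PyTorch. The snippet below is only an illustration: it uses Python's built-in random module as a stand-in for torch's global generator, and sample_once and sample_seeded are hypothetical placeholders for the sampling step in generate. Seeding once at import time fixes the stream only from that point on, so two consecutive calls consume different parts of it; reseeding at the start of each call makes them match.

```python
import random

random.seed(0)  # seeding once at import time, as in the first attempt


def sample_once(n=3):
    # Hypothetical stand-in for the sampling step: draws n values
    # from the global generator without touching the seed.
    return [random.random() for _ in range(n)]


def sample_seeded(n=3, seed=0):
    # Reseed at the start of every call, as in UPDATE 2.
    random.seed(seed)
    return [random.random() for _ in range(n)]


first, second = sample_once(), sample_once()
print(first == second)  # compares two calls that continue one stream

first, second = sample_seeded(), sample_seeded()
print(first == second)  # compares two calls that each restart the stream
```

The same reasoning would apply to torch.manual_seed, assuming the sampling code draws from torch's default generator.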

I believe this may be closely related to random seeding. To ensure reproducible results (as stated in the PyTorch docs), you have to seed torch like this:
import torch
torch.manual_seed(0)
And also configure the cuDNN backend:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
If you're using numpy, you could also do:
import numpy as np
np.random.seed(0)
However, the documentation warns:
Deterministic mode can have a performance impact, depending on your model.
A script I regularly use, which has worked very well for reproducing results, is:
# imports
import random
import numpy as np
import torch
import torch.backends.cudnn as cudnn
# ...
# Set the random seed (`args` is assumed to come from argparse)
if args.random_seed is not None:
    # The following lines seed the pseudorandom number generators
    # involved, to ensure reproducible results
    random.seed(args.random_seed)
    np.random.seed(args.random_seed)
    torch.manual_seed(args.random_seed)
    # https://pytorch.org/docs/master/notes/randomness.html#cudnn
    if not args.cpu_only:
        torch.cuda.manual_seed(args.random_seed)
        cudnn.deterministic = True
        cudnn.benchmark = False
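The same idea can be wrapped in a small helper. The sketch below is a hedged variant, not the script above: seed_everything is a hypothetical name chosen here, and NumPy and PyTorch are imported lazily so that the function still runs where they are not installed.

```python
import random


def seed_everything(seed, deterministic_cudnn=True):
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed; nothing more to do
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    if deterministic_cudnn:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


seed_everything(0)
a = random.random()
seed_everything(0)
b = random.random()
print(a == b)  # both draws follow a reseed with the same value
```

Calling the helper at the start of each function that samples, rather than once at import, matches what made the results consistent in the question.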

You can define a model.init_hidden() method to reset the RNN hidden state. It is not built into PyTorch; you write it yourself:
def init_hidden(self):
    # Return a fresh, zero-initialized hidden state
    return torch.zeros(num_layers, batch_size, hidden_size)
So, before calling the same model on the same data the next time, you can call model.init_hidden() to reset the hidden state to its initial value.
This clears out the history, in other words, the state carried over from running on the data the first time. It does not touch the weights the model has learned.
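To make the difference between hidden state and weights concrete, here is a minimal sketch with a toy scalar recurrent cell. ToyRNN, its weights, and the input values are all made up for illustration; it is not a PyTorch module. As far as I know, nn.GRU itself starts from a zero hidden state on each call unless you pass one in explicitly, so this pattern matters mainly for models that store the state on self.

```python
import math


class ToyRNN:
    """Toy scalar recurrent cell that keeps its hidden state on self."""

    def __init__(self, w=0.5, u=0.8):
        self.w = w  # input weight; never modified by running the model
        self.u = u  # recurrent weight; never modified either
        self.init_hidden()

    def init_hidden(self):
        # Reset the hidden state to its initial value.
        self.h = 0.0

    def run(self, inputs):
        outputs = []
        for x in inputs:
            self.h = math.tanh(self.w * x + self.u * self.h)
            outputs.append(self.h)
        return outputs


model = ToyRNN()
data = [1.0, -0.5, 0.25]

first = model.run(data)
second = model.run(data)  # starts from the state left by the first run
model.init_hidden()
third = model.run(data)   # starts from the initial state again

print(first == second)
print(first == third)
```

The weights w and u are identical across all three runs; only the carried-over state h differs.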

Related

Is torch Conv2d filter random? Can I see the kernel shape?

I'm studying the PyTorch Conv2d module for convolution filters.
I wrote the code below to check what exactly Conv2d does to an image.
I found that the image filtered by Conv2d looks different on every run, seemingly at random, as in the attached picture. So I have the following questions:
Does the Conv2d kernel change randomly on each run? Why?
Can I see the shape of the kernel inside Conv2d?
The code is below.
from matplotlib import pyplot
from numpy import asarray
import numpy as np
import cv2
import torch
import torch.nn as nn
img = cv2.imread('data/dog.jpg') # 29 *30 *3
data = asarray(img)
conv1 = nn.Conv2d(3,1,3)
pyplot.subplot(1,2,2)
data = np.transpose(data, (2,0,1))
data = conv1(torch.Tensor(data))
data = np.transpose(data.detach().numpy(), (1,2,0))
pyplot.imshow(data, cmap='gray')
pyplot.show()
This is a question of reproducibility.
Just add the following line before creating the layer and it will use the same weights every time.
torch.manual_seed(0)
Every time you initialize a Conv2d layer, it is initialized with random weights. Setting PyTorch's manual_seed to a constant number generates the same sequence of random numbers every time you run your code. This is what reproducibility aims for. You can check PyTorch's reproducibility reference for more information on the topic.
TQCH's code helps you check the weights every time you run your code. You will notice that the weights printed by print(conv1.weight) change on every run, and that they stop changing between runs once you set the manual seed.
Each time you create a Conv2d object, the weights are randomly initialized. That's why you see different results. To inspect the shape of the kernel, run
print(conv1.weight.shape)

ValueError: Tensor Tensor("dense_4/Sigmoid:0", shape=(?, 1025), dtype=float32) is not an element of this graph

Today I suddenly started getting this error for no apparent reason while running model.fit(). This used to work before. I am using TF 2.3.0, more specifically its Keras module.
The function is called on validation inside a generator, which is fed into model.predict().
Basically, I load a checkpoint, I resume training the network, and I make a prediction on validation.
The error keeps occurring even when training a model from scratch and erasing all the related data. It's as if something has been hardcoded somewhere, as I was able to run model.fit() up until a few hours ago.
I saw several proposed solutions, but none of these variations really works for me, as they lead to trickier error messages.
I even tried installing a different version of TF, thinking that this was due to some old version, but the error still occurs.
I will answer my own question, as this one was particularly tricky and none of the solutions I found on the internet worked for me, probably because they are outdated.
I'll write down just the relevant part to add in the code, feel free to add more technical explanations.
I like using args for passing variables, but it can work without:
import tensorflow as tf
from tensorflow.python.keras.backend import set_session
from tensorflow.keras.models import load_model
from generator import generator  # custom generator (module layout assumed)
def main(args):
    # open new session and define TF graph
    args.sess = tf.compat.v1.Session()
    args.graph = tf.compat.v1.get_default_graph()
    set_session(args.sess)
    # define training generator
    train_generator = generator(args.train_data)
    # load model
    args.model = load_model(args.model_path)
    args.model.fit(train_generator)
Then, in the model prediction function:
# In my specific case, the predict_output() function is
# called inside the generator function
def predict_output(args, x):
    # reuse the graph and session that the model was loaded in
    with args.graph.as_default():
        set_session(args.sess)
        y = args.model.predict(x)
    return y

How do I make ray.tune.run reproducible?

I'm using Tune's class-based Trainable API. See the code sample:
from ray import tune
import numpy as np
np.random.seed(42)
# first run
tune.run(tune.Trainable, ...)
# second run, expecting same result
np.random.seed(42)
tune.run(tune.Trainable, ...)
The problem is that the tune.run results are still different, the likely reason being that each Ray actor still has a different seed.
Question: how do I make ray.tune.run reproducible?
(This answer focuses on the class API and Ray version 0.8.7. The function API does not support reproducibility due to implementation specifics.)
There are two main sources of non-deterministic results.
1. Search algorithm
Every search algorithm supports a random seed, although the interface to it may vary. The seed initializes the sampling of the hyperparameter space.
For example, if you're using AxSearch, it looks like this:
from ax.service.ax_client import AxClient
from ray.tune.suggest.ax import AxSearch
client = AxClient(..., random_seed=42)
client.create_experiment(...)
algo = AxSearch(client)
2. Trainable API
Training is distributed among worker processes, so seeding has to happen inside the tune.Trainable class. Depending on the logic you implement in tune.Trainable.train, you need to manually seed numpy, tf, or whatever other framework you use inside tune.Trainable.setup, passing the seed through the config argument of tune.run.
The following example is based on RLlib PR5197, which handled the same issue:
from ray import tune
import numpy as np
import random
class Tuner(tune.Trainable):
    def setup(self, config):
        seed = config['seed']
        np.random.seed(seed)
        random.seed(seed)
        ...
    ...
seed = 42
tune.run(Tuner, config={'seed': seed})
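The pattern of passing the seed through config and applying it in setup can be tried out without a Ray installation. The sketch below is a stand-in under stated assumptions: ToyTrainable and run_trial are hypothetical names that only mimic the setup(config)/step() shape, and it uses a per-instance random.Random generator instead of seeding the global state as the answer does.

```python
import random


class ToyTrainable:
    """Stand-in that mimics the setup(config)/step() shape; not a Ray class."""

    def setup(self, config):
        # A private generator keeps each trial independent of the
        # process-wide random state.
        self.rng = random.Random(config["seed"])

    def step(self):
        return {"score": self.rng.random()}


def run_trial(config, n_steps=3):
    # Hypothetical driver: build a trial, set it up, collect its scores.
    trial = ToyTrainable()
    trial.setup(config)
    return [trial.step()["score"] for _ in range(n_steps)]


run_a = run_trial({"seed": 42})
run_b = run_trial({"seed": 42})
run_c = run_trial({"seed": 7})
print(run_a == run_b)  # same seed in config
print(run_a == run_c)  # different seed in config
```

Because the seed travels inside config, it reaches whichever worker process ends up running the trial, which a seed set only in the driver script does not.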

example of doing simple prediction with pytorch-lightning

I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in pytorch. I am trying to basically convert it to a pytorch lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can pretty much do the same, except for the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So, my first question is whether this is the correct way to use lightning? How would lightning know if it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame) # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes as a numpy array. Is that something that would be possible from the lightning module or do things always have to use some sort of a dataloader?
At some point I want to extend this model implementation to do training as well, so I want to make sure I do it right. Most examples focus on training models, so a simple example of just doing prediction at production time on a single image/data point would be useful.
I am using 0.7.5 with pytorch 1.4.0 on GPU with cuda 10.1
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just a nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference, simply call the model instance. Here's a toy example you can use:
import torch
import torchvision.models as models
from pytorch_lightning.core import LightningModule
class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)
    def forward(self, x):
        return self.resnet(x)
model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method, just do something like:
for frame in video:
    img = transform(frame)
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
    output = model(img).data.cpu().numpy()
    # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs.
Even though the above answer suffices, note the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
Both the model and the image have to be moved to the right GPU. On a multi-GPU inference machine, this becomes a hassle.
To solve this, .predict was also recently introduced; see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html

What are common sources of randomness in Machine Learning projects with Keras?

Reproducibility is important. In a closed-source machine learning project I'm currently working on, it is hard to achieve. What are the parts to look at?
Setting seeds
Computers have pseudo-random number generators which are initialized with a value called the seed. For machine learning, you might need to do the following:
# I've heard the order here is important
# Note: this is the TensorFlow 1.x API; in TF 2.x these calls live under tf.compat.v1
import random
random.seed(0)
import numpy as np
np.random.seed(0)
import tensorflow as tf
tf.set_random_seed(0)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
from keras import backend as K
K.set_session(sess) # tell keras about the seeded session
# now import keras stuff
See also: Keras FAQ: How can I obtain reproducible results using Keras during development?
sklearn
sklearn.model_selection.train_test_split has a random_state parameter.
What to check
Do I load the data in the same order every time?
Do I initialize the model the same way?
Do I use external data that might change?
Do I use external state that might change (e.g. datetime.now)?
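For the first item on the checklist, one common pitfall is relying on the order in which the file system returns files. The sketch below shows one way to pin the order down; list_data_files and shuffled are hypothetical helper names, and the temporary directory with three text files only exists to make the example self-contained.

```python
import random
import tempfile
from pathlib import Path


def list_data_files(root, pattern="*.txt"):
    # Directory listing order is not guaranteed, so sort explicitly.
    return sorted(Path(root).glob(pattern))


def shuffled(items, seed):
    # Shuffle a copy with a private, seeded generator.
    items = list(items)
    random.Random(seed).shuffle(items)
    return items


with tempfile.TemporaryDirectory() as root:
    for name in ["b.txt", "a.txt", "c.txt"]:
        (Path(root) / name).write_text(name)
    files = [p.name for p in list_data_files(root)]
    order_1 = shuffled(files, seed=0)
    order_2 = shuffled(files, seed=0)

print(files)
print(order_1 == order_2)
```

Sorting first and shuffling with a fixed seed afterwards keeps the training order random-looking but repeatable across runs and machines.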