The result is not fixed after setting random seed in pytorch

import random
import numpy as np
import torch

def setup_seed(seed):
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)            # seed the CPU RNG
    torch.cuda.manual_seed_all(seed)   # seed all GPU RNGs
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True
I set the random seed when running the code, but I still cannot get a fixed result with PyTorch. I also use batch norm in my code, and I call model.eval() during evaluation and testing. I cannot figure out the reason for this.

I think the line torch.backends.cudnn.benchmark = True is causing the problem. It enables the cuDNN auto-tuner, which benchmarks several algorithms and picks the fastest one. For example, convolution can be implemented using one of these algorithms:
CUDNN_CONVOLUTION_FWD_ALGO_GEMM,
CUDNN_CONVOLUTION_FWD_ALGO_FFT,
CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING,
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
CUDNN_CONVOLUTION_FWD_ALGO_DIRECT,
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD,
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED,
Several of these algorithms are non-deterministic.
So set torch.backends.cudnn.benchmark = False for deterministic outputs (this may slow down execution).
There are also some PyTorch functions that cannot be made deterministic; refer to the reproducibility notes in the documentation.
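Putting the answer together, a minimal sketch of a corrected seeding helper (the same setup_seed function as in the question, but with the benchmark auto-tuner disabled):
import random
import numpy as np
import torch

def setup_seed(seed):
    random.seed(seed)                           # Python RNG
    np.random.seed(seed)                        # NumPy RNG
    torch.manual_seed(seed)                     # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)            # PyTorch GPU RNGs
    torch.backends.cudnn.deterministic = True   # force deterministic cuDNN algorithms
    torch.backends.cudnn.benchmark = False      # disable the auto-tuner

setup_seed(42)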

Related

Pytorch `torch.no_grad()` doesn't affect modules

I was under the (evidently wrong) impression from the documentation that torch.no_grad(), as a context manager, was supposed to make everything requires_grad=False. Indeed, that's what I intended to use torch.no_grad() for: a convenient context manager for instantiating a bunch of things that I want to stay constant throughout training. But that seems to hold only for torch.Tensors; it doesn't appear to affect torch.nn.Modules, as the following example code shows:
with torch.no_grad():
    linear = torch.nn.Linear(2, 3)
    for p in linear.parameters():
        print(p.requires_grad)
This will output:
True
True
That's a bit counterintuitive in my opinion. Is this the intended behaviour? If so, why? And is there a similarly convenient context manager in which I can be assured that anything I instantiate under it will not require gradient?
This is expected behavior, but I agree it is somewhat unclear from the documentation. Note that the documentation says:
In this mode, the result of every computation will have
requires_grad=False, even when the inputs have requires_grad=True.
This context disables gradient tracking on the output of any computation done within it. Technically, declaring/creating a layer is not a computation, so the parameters' requires_grad stays True. However, for any calculation you do inside this context you won't be able to compute gradients: the requires_grad of the calculation's output will be False. This is probably best explained by extending your code snippet as below:
with torch.no_grad():
    linear = torch.nn.Linear(2, 3)
    for p in linear.parameters():
        print(p.requires_grad)
    out = linear(torch.rand(10, 2))
    print(out.requires_grad)

out = linear(torch.rand(10, 2))
print(out.requires_grad)
True
True
False
True
Even though requires_grad is True for the layer's parameters, you won't be able to compute gradients for anything calculated inside the context, because those outputs have requires_grad=False.
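If the goal is modules whose parameters stay constant throughout training, a common pattern (not a context manager, but equally compact) is to freeze the parameters explicitly after construction; a minimal sketch:
import torch

linear = torch.nn.Linear(2, 3)
linear.requires_grad_(False)   # sets requires_grad=False on all parameters of the module

for p in linear.parameters():
    print(p.requires_grad)     # prints False, False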

Why is convolution in cuDNN non-deterministic?

The PyTorch documentation says that, when using cuDNN as the backend for a convolution, one has to set two options to make the implementation deterministic: torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False. Is this because of the way weights are initialized?
No, it is not about weight initialization. The non-determinism comes from cuDNN itself: with benchmark = True, cuDNN may pick a different (fastest-measured) algorithm on different runs, and some of its algorithms use non-deterministic reductions (e.g. atomic adds), where the order of floating-point additions, and therefore the rounding, can vary between runs.
When torch.backends.cudnn.deterministic is set to True, cuDNN will use deterministic algorithms for these operations, meaning that given the same input and parameters, the output will always be the same. This is useful when you need reproducible results, such as when debugging or when comparing different model architectures.
However, deterministic algorithms can come at a cost in performance, as some of the optimizations that make cuDNN fast are not compatible with determinism. Setting torch.backends.cudnn.deterministic to True may therefore result in slower training.
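For reference, a minimal sketch of the deterministic configuration; the torch.use_deterministic_algorithms call is available in newer PyTorch releases (1.8+) and goes beyond cuDNN, making remaining non-deterministic ops raise an error:
import torch

torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # don't auto-tune (algorithm choice could vary per run)

# Optional (PyTorch >= 1.8): error out on any remaining non-deterministic op
torch.use_deterministic_algorithms(True)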

How to reproduce RNN results on several runs?

I call the same model on the same input twice in a row and I don't get the same result. This model has nn.GRU layers, so I suspect it has some internal state that should be released before the second run?
How do I reset the RNN hidden state to make it the same as if the model had just been loaded?
UPDATE:
Some context:
I'm trying to run the model from here:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L93
I'm calling generate:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L148
Here it actually has some code that uses PyTorch's random generator:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L200
https://github.com/erogol/WaveRNN/blob/master/utils/distribution.py#L110
https://github.com/erogol/WaveRNN/blob/master/utils/distribution.py#L129
I have placed the following (I'm running the code on CPU):
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(0)
in
https://github.com/erogol/WaveRNN/blob/master/utils/distribution.py
after all imports.
I have checked GRU weights between runs and they are the same:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L153
I have also checked the logits and the sample between runs: the logits are the same but the samples are not, so @Andrew Naguib seems to be right about random seeding. But I'm not sure where the code that fixes the random seed should be placed?
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L200
UPDATE 2:
I have placed the seed initialization inside generate and now the results are consistent:
https://github.com/erogol/WaveRNN/blob/master/models/wavernn.py#L148
I believe this is highly related to random seeding. To ensure reproducible results (as stated in the PyTorch docs) you have to seed torch like this:
import torch
torch.manual_seed(0)
And also the cuDNN backend:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
If you're using numpy, you could also do:
import numpy as np
np.random.seed(0)
However, they warn you:
Deterministic mode can have a performance impact, depending on your model.
A script I regularly use, and which has worked very well for reproducing results, is:
# imports
import random
import numpy as np
import torch
from torch.backends import cudnn
# ...

""" Set Random Seed """
if args.random_seed is not None:
    """The following seeding lines ensure reproducible results by seeding
       the pseudorandom number generators involved in PyTorch"""
    random.seed(args.random_seed)
    np.random.seed(args.random_seed)
    torch.manual_seed(args.random_seed)
    # https://pytorch.org/docs/master/notes/randomness.html#cudnn
    if not args.cpu_only:
        torch.cuda.manual_seed(args.random_seed)
        cudnn.deterministic = True
        cudnn.benchmark = False
You can use model.init_hidden() to reset the RNN hidden state.
def init_hidden(self):
    # Initialize the hidden state with zeros
    return torch.zeros(num_layers, batch_size, hidden_size)
So, before calling the same model on the same data the next time, you can call model.init_hidden() to reset the hidden state to its initial value.
This clears out the history, in other words, the hidden state the model accumulated while running on the data the first time.
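To illustrate both suggestions together, here is a small self-contained sketch (on CPU, with made-up sizes, not the WaveRNN code itself): re-seeding before each call and starting from a fresh zero hidden state makes two consecutive runs identical.
import torch

torch.manual_seed(0)
gru = torch.nn.GRU(input_size=4, hidden_size=8, num_layers=1)

def run_once(seed=0):
    # Re-seed before each run and start from a fresh (zero) hidden state
    torch.manual_seed(seed)
    x = torch.randn(5, 1, 4)        # (seq_len, batch, input_size)
    h0 = torch.zeros(1, 1, 8)       # fresh hidden state
    out, _ = gru(x, h0)
    return out

print(torch.equal(run_once(), run_once()))  # True: both runs produce the same output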

TensorFlow - reproducing results when using dropout

I am training a neural network using dropout regularization. I save the weights and biases the network is initialized with, so that I can repeat the experiment when I get good results.
However, the use of dropout introduces some randomness in the network: since dropout drops units randomly, each time I rerun the network, different units are being dropped - even though I initialize the network with the exact same weights and biases (if I understand this correctly).
Is there a way to make the dropout deterministic?
There are two primary ways to perform dropout in tensorflow:
tf.nn.dropout (low-level)
tf.layers.dropout (high-level, uses tf.nn.dropout under the hood)
Both functions accept a seed parameter that is used to generate the random mask. By default seed=None, which means a random seed, i.e. non-deterministic behavior. To make the result deterministic, you either set the seed at the per-op level or call tf.set_random_seed (which sets the graph-level random seed), or, better, both.
Example:
import tensorflow as tf

tf.InteractiveSession()
tf.set_random_seed(0)

x = tf.ones([10])
y = tf.nn.dropout(x, keep_prob=0.5, seed=0)
for i in range(5):
    print(y.eval())

z = tf.layers.dropout(inputs=x, rate=0.5, training=True, seed=0)
for i in range(5):
    print(z.eval())
Caveat: in general, there are other sources of randomness in training scripts, so you also have to set the pure Python seed (random.seed) and the numpy seed (numpy.random.seed).
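For completeness, a minimal sketch (assuming the TF 1.x API used above) that seeds all three sources at the top of a training script:
import random
import numpy as np
import tensorflow as tf

SEED = 0
random.seed(SEED)         # pure-Python RNG
np.random.seed(SEED)      # NumPy RNG
tf.set_random_seed(SEED)  # graph-level TensorFlow RNG (tf.random.set_seed in TF 2.x)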

Different score every time I run sklearn model with random_state set

I'm trying to determine why every time I rerun a model I obtain a slightly different score. I've defined:
# numpy seed (don't know if needed, but figured it couldn't hurt)
np.random.seed(42)
# Also tried re-seeding every time I ran the `cross_val_predict()` block, but that didn't work either
# cross-validator with random_state set
cv5 = KFold(n_splits=5, random_state=42, shuffle=True)
# scoring as RMSE of natural logs (to match Kaggle competition I'm trying)
def custom_scorer(actual, predicted):
    actual = np.log1p(actual)
    predicted = np.log1p(predicted)
    return np.sqrt(np.sum(np.square(actual - predicted)) / len(actual))
Then I ran this once with cv=cv5:
# Running GridSearchCV
rf_test = RandomForestRegressor(n_jobs = -1)
params = {'max_depth': [20,30,40], 'n_estimators': [500], 'max_features': [100,140,160]}
gsCV = GridSearchCV(estimator=rf_test, param_grid=params, cv=cv5, n_jobs=-1, verbose=1)
gsCV.fit(Xtrain,ytrain)
print(gsCV.best_estimator_)
After running that to get gsCV.best_estimator_, I rerun this several times, and get slightly different scores each time:
rf_test = gsCV.best_estimator_
rf_test.random_state=42
ypred = cross_val_predict(rf_test, Xtrain, ytrain, cv=cv2)
custom_scorer(np.expm1(ytrain),np.expm1(ypred))
Example of (extremely small) score differences:
0.13200993923446158
0.13200993923446164
0.13200993923446153
0.13200993923446161
I'm trying to set seeds so I get the same score every time for the same model, in order to be able to compare different models. In Kaggle competitions very small differences in scores seem to matter (although admittedly not this small), but I'd just like to understand why. Does it have something to do with rounding in my machine when performing calculations? Any help is greatly appreciated!
Edit: I forgot the line rf_test.random_state=42 which made a much larger difference in score disparity, but even with this line included I still have minuscule differences.
A random forest is an ensemble of decision trees, and it uses randomness to choose the splits (and hence the shape) of those trees. Unless that randomness is seeded, it is very unlikely that you will build the same forest twice when you run your program, and I think this is what causes the slight variation you see.
You are using cv2 when testing your RandomForestRegressor. Have you set its random_state as well? Otherwise the splits used when testing the regressor will differ between runs.
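A small sketch of what both answers suggest: pass random_state to the estimator and to whatever splitter cv2 is (its definition isn't shown in the question, so a 2-fold KFold is assumed here), reusing the question's Xtrain, ytrain, and custom_scorer:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

# Seed both sources of randomness: the estimator and the CV splitter
rf_test = RandomForestRegressor(n_jobs=-1, random_state=42)
cv2 = KFold(n_splits=2, random_state=42, shuffle=True)

ypred = cross_val_predict(rf_test, Xtrain, ytrain, cv=cv2)
print(custom_scorer(np.expm1(ytrain), np.expm1(ypred)))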
