How to set the random seed of DeformConv2d in PyTorch

I have used the following code to fix the random seed, but every time I train my model with DeformConv2d from torchvision, the results come out different. Does anyone know how to set the random seed for DeformConv2d in torchvision? The version of PyTorch I am using is 1.8.0. This problem only occurs when I use DeformConv2d.
import random
import numpy as np
import torch

torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
torch.backends.cudnn.deterministic = True
I have tried using this code to set the random seed and trained several times, but the results still differ between runs. My expectation is that, with a specific seed, the results are the same for every training run.
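Seeding alone is often not enough here: the CUDA backward pass of torchvision's deform_conv2d is reported to accumulate gradients with atomic additions, which are non-deterministic, so identical seeds can still give different results. Below is a minimal sketch of a stricter deterministic setup, assuming PyTorch 1.8+ (which provides torch.use_deterministic_algorithms) and a CUDA version that honours CUBLAS_WORKSPACE_CONFIG; at worst the flag raises an error naming the non-deterministic op, which at least confirms the culprit.
import os
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and PyTorch, and request deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Needed for deterministic cuBLAS kernels on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # New in PyTorch 1.8: either forces deterministic implementations or
    # raises a RuntimeError identifying the op that has none.
    torch.use_deterministic_algorithms(True)


seed_everything(42)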

Related

Role that learning_rate plays in the reproducibility of PyTorch models

I have a Bayesian neural network which is implemented in PyTorch and is trained via an ELBO loss. I have faced some reproducibility issues even when I have the same seed and I set the following code:
import logging
import random

import numpy as np
import torch

# python
seed = args.seed
random.seed(seed)
logging.info("Python seed: %i" % seed)
# numpy
seed += 1
np.random.seed(seed)
logging.info("Numpy seed: %i" % seed)
# torch
seed += 1
torch.manual_seed(seed)
logging.info("Torch CPU seed: %i" % seed)
# torch cuda
seed += 1
torch.cuda.manual_seed_all(seed)
torch.cuda.manual_seed(seed)
logging.info("Torch CUDA seed: %i" % seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
I need to add that I use the XE (cross-entropy) loss, and this is not a deterministic loss in PyTorch. This is the only possible source of randomness I am aware of. What I have observed is that, when I use a large learning_rate (=0.1), I cannot reproduce my results and I see huge gaps. However, when the learning_rate is reduced by a factor of 10 (=0.01), the gap disappears. My intuition is that the culprit here is the non-deterministic loss, and the large lr is just a catalyst. What do you think? I appreciate any hints and intuitions.
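One way to test this intuition is to run two identically seeded trainings and measure how far the parameters drift apart at each learning rate; any tiny gradient noise from a non-deterministic op should be amplified by the larger step size. A minimal sketch with a stand-in linear model and synthetic data (the model, data and function name here are purely illustrative, not taken from the question):
import torch
import torch.nn as nn

def train_once(seed: int, lr: float, steps: int = 100) -> torch.Tensor:
    """Train a tiny model and return its flattened final parameters."""
    torch.manual_seed(seed)
    model = nn.Linear(10, 3)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(32, 10)
        y = torch.randint(0, 3, (32,))
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return torch.cat([p.detach().flatten() for p in model.parameters()])

# On CPU (fully deterministic) the gap is exactly 0; on a GPU where the loss
# backward is non-deterministic, the gap typically grows with the learning rate.
for lr in (0.1, 0.01):
    gap = (train_once(0, lr) - train_once(0, lr)).abs().max().item()
    print(f"lr={lr}: max parameter gap = {gap:.3e}")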

The same pretrained model with the same input gives different outputs across multiple runs

I load a pretrained ResNet152 from torchvision. I evaluate the model multiple times with the same input image, but each time the output is different. It's very strange. Does anyone know the reason? My code is
from torchvision import transforms
import torch
from torchvision import models
from PIL import Image

# load a pretrained model
model = models.resnet152(pretrained=True)

# load an image and preprocess it
preprocessor = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])

input_image = Image.open('lion2.jpg')
input_tensor = preprocessor(input_image)
input_batch = torch.unsqueeze(input_tensor, 0)

# run multiple times and print output
for k in range(5):
    model.train()
    out = model(input_batch)
    model.eval()
    out2 = model(input_batch)
    print(out2[0][:10].cpu().detach())
The outputs are
tensor([ 0.4722, -2.1463, -0.5993, -0.3880, -2.6292, 1.9123, -1.7939, -0.3289,
-0.3189, 0.5306])
tensor([ 0.4407, -2.0370, -0.7397, -0.4447, -2.6059, 1.9052, -1.9715, -0.6495,
-0.5361, 0.2618])
tensor([ 0.3874, -1.9249, -0.8254, -0.5408, -2.5266, 1.8302, -2.1151, -0.8739,
-0.7206, 0.0478])
tensor([ 0.3150, -1.8490, -0.9004, -0.6544, -2.4615, 1.7409, -2.2083, -1.0194,
-0.8352, -0.1017])
tensor([ 0.2310, -1.7754, -0.8858, -0.7081, -2.3238, 1.5943, -2.2625, -1.1185,
-0.9551, -0.2954])
(If I remove either model.train() or model.eval(), the output stays constant.)
The model torchvision.models.resnet152 contains batch normalization layers with track_running_stats set to True. This means that whenever the model is called in training mode (i.e., when model.train() is set), the running_mean and running_var parameters of such batch normalization layers get updated to include the data of the batch passed in that call.
In your example, each time you call the model under model.train() in the loop, the running_mean and running_var parameters of all the batch normalization layers are updated. These are then frozen at the updated values when you call model.eval(), and used in the second forward pass, which causes the outputs to be different.
Since you are passing exactly the same inputs every time in the loop, it implies that the running_mean and running_var will converge to a constant after a large number of iterations. You can check that as a result the outputs in eval() mode eventually become identical.
The standard way of evaluating networks typically involves calling model.eval() once and using it over the entire test set, and so does not exhibit this apparent discrepancy.
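A small sketch to see the effect directly, using a single BatchNorm2d layer instead of the full ResNet (the layer size and tensor shapes are just illustrative):
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)       # track_running_stats=True by default
x = torch.randn(4, 3, 8, 8)  # one fixed "batch"

print("initial running_mean:", bn.running_mean)

bn.train()
for _ in range(3):
    bn(x)                    # every train-mode call shifts the running stats
    print("after train() call:", bn.running_mean)

bn.eval()
out = bn(x)                  # eval() uses the (now shifted) running stats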

Do I have to recompile my GAN every batch to prevent the discriminator from learning?

I have a GAN like so
generator = Model(g_in, g_out)
generator.compile(...)
discriminator = Model(d_in, d_out)
discriminator.trainable = True
discriminator.compile(..)
discriminator.trainable = False
gan = Model(inputs=.., outputs=..)
gan.compile(..)
#iterate over epochs and batches, without compiling
It learns and gives acceptable output. However I get the warning:
"keras\engine\training.py:490: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set model.trainable without calling model.compile after ?
'Discrepancy between trainable weights and collected trainable'"
If I recompile the discriminator and gan every batch, the warning disappears, but one iteration takes much longer and training speed is slower.
for epoch:
    for batch:
        fakes = generator.predict_on_batch(batch)
        discriminator.trainable = True
        discriminator.compile(..)
        discriminator.train_on_batch(batch, ..)
        discriminator.train_on_batch(fakes, ..)
        discriminator.trainable = False
        discriminator.compile(..)
        gan.compile(..)
        gan.train_on_batch(batch, ..)
Which one of them is correct?
That's expected and there's no need to recompile every batch. Keras has an open bug about this: https://github.com/keras-team/keras/issues/8585
The replies there have some examples of how to bypass the warning; I'm not going to repeat them here. There's also a reply which gives great advice on how to verify you're really training what you're supposed to be training, if you feel unsure about the specifics of your model: https://github.com/keras-team/keras/issues/8585#issuecomment-385729276
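For reference, here is a sketch of the compile-once pattern the linked thread recommends, assuming tf.keras; the layer shapes, optimizer and losses are placeholders rather than anything taken from the question. The key point is that each model keeps the trainable flags it saw at its own compile() call, so nothing needs recompiling inside the batch loop.
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100

# Generator: its own compile state does not matter, it is trained through `gan`.
g_in = keras.Input(shape=(latent_dim,))
g_out = layers.Dense(28 * 28, activation="tanh")(g_in)
generator = keras.Model(g_in, g_out)

# Discriminator: compile it while it is trainable.
d_in = keras.Input(shape=(28 * 28,))
d_out = layers.Dense(1, activation="sigmoid")(d_in)
discriminator = keras.Model(d_in, d_out)
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze the discriminator *before* compiling `gan`.
discriminator.trainable = False
gan_in = keras.Input(shape=(latent_dim,))
gan = keras.Model(gan_in, discriminator(generator(gan_in)))
gan.compile(optimizer="adam", loss="binary_crossentropy")

# The training loop then calls discriminator.train_on_batch(...) and
# gan.train_on_batch(...) with no further compile() calls.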

Random results from pre-trained InceptionV3 CNN

I'm trying to create an InceptionV3 CNN which has previously been trained on ImageNet. While the creation and the loading of the checkpoint seem to be working correctly, the result seems to be random: every time I run the script, I get a different result, even though I don't change anything. The network is recreated from scratch, the same unchanged checkpoint is loaded and the same image is classified (which to my understanding should still lead to the same result, even if the network can't decide what the image actually is).
I just noticed that even if I try to classify the same image multiple times within the same execution of the script, I end up with a random result.
I create the CNN like this
from tensorflow.contrib.slim.nets import inception as nn_architecture
from tensorflow.contrib import slim

with slim.arg_scope([slim.conv2d, slim.fully_connected], normalizer_fn=slim.batch_norm,
                    normalizer_params={'updates_collections': None}):  # fix for the model not fitting the checkpoint, see https://github.com/tensorflow/models/issues/2977
    logits, endpoints = nn_architecture.inception_v3(
        input,               # input
        1001,                # num classes; maybe set to 0 or None to omit the logit layer and return its input instead
        True,                # is_training (dropout = zero if False, for eval)
        0.8,                 # dropout keep rate
        16,                  # min depth
        1.0,                 # depth multiplier
        layers_lib.softmax,  # prediction function
        True,                # spatial squeeze
        tf.AUTO_REUSE,       # reuse; use get_variable to get variables directly... probably
        'InceptionV3')       # scope
Afterwards, I load the ImageNet-trained checkpoint like this
saver = tf.train.Saver()
saver.restore(sess, CHECKPOINT_PATH)
Then I verify that it is working by classifying an image, which I squish from its original resolution to 299x299, the input size required by the network.
import numpy as np
from scipy.ndimage import zoom
from skimage import io

car = io.imread("data/car.jpg")
car_scaled = zoom(car, [299 / car.shape[0], 299 / car.shape[1], 1])
car_cnnable = np.array([car_scaled])
Then I try to classify the image and print which class it most likely belongs to, and with what likelihood.
predictions = sess.run(logits, feed_dict={images: car_cnnable})
predictions = np.squeeze(predictions) #shape (1, 1001) to shape (1001)
print(np.argmax(predictions))
print(predictions[np.argmax(predictions)])
The class is (or seems to be) random and the likelihood varies as well.
My last few executions were:
Class - likelihood
899 - 0.98858
660 - 0.887204
734 - 0.904047
675 - 0.886952
Here is my full code: https://gist.github.com/Syzygy2048/ddb8602652b547a71316ee0febfddbef
Since I set is_training to True, the dropout rate was applied every time the network was used. I was under the impression that this only happened during backpropagation.
To get it to work correctly, the code should be
logits, endpoints = nn_architecture.inception_v3(
    input,               # input
    1001,                # num classes; maybe set to 0 or None to omit the logit layer and return its input instead
    False,               # is_training (dropout = zero if False, for eval)
    0.8,                 # dropout keep rate
    16,                  # min depth
    1.0,                 # depth multiplier
    layers_lib.softmax,  # prediction function
    True,                # spatial squeeze
    tf.AUTO_REUSE,       # reuse; use get_variable to get variables directly... probably
    'InceptionV3')       # scope

Scikit-learn SGDClassifier: precision and recall values change each time

I have a question about the precision and recall values in scikit-learn. I am using SGDClassifier to classify my data.
To evaluate the performance, I am using the precision_recall_fscore_support function, but each time I run the program I get different values in the precision and recall matrices. How can I get the true values?
My code is:
from sklearn import preprocessing
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_recall_fscore_support

scalerI = preprocessing.StandardScaler()
X_train = scalerI.fit_transform(InputT)
X_test = scalerI.transform(InputCross)

clf = SGDClassifier(loss="log", penalty="elasticnet", n_iter=70)
y_rbf = clf.fit(X_train, TargetT)
y_hat = clf.predict(X_test)
a = clf.predict_proba(X_test)

p_and_rec = precision_recall_fscore_support(TargetCross, y_hat, beta=1)
Thank you
From the docs, SGDClassifier has a random_state param that is initialised to None; this is the seed used for the random number generator. You need to fix this value so the results are repeatable, so set random_state=0 or whatever favourite number you want:
clf = SGDClassifier(loss="log", penalty="elasticnet",n_iter=70, random_state=0)
should produce the same results for each run
From the docs:
random_state : int seed, RandomState instance, or None (default) The
seed of the pseudo random number generator to use when shuffling the
data.
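A quick sketch to confirm the behaviour on a synthetic dataset (the data, the max_iter value and the loss name here are illustrative; older scikit-learn versions spell the loss "log" and the iteration argument n_iter):
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def run(seed):
    clf = SGDClassifier(loss="log_loss", penalty="elasticnet",
                        max_iter=70, random_state=seed)
    clf.fit(X_train, y_train)
    return precision_recall_fscore_support(y_test, clf.predict(X_test), beta=1)

# With the same random_state, repeated runs give identical precision/recall.
print(run(0))
print(run(0))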
