I'm trying to train SRGAN (Super-Resolution GAN).
However, the discriminator's output converges to 0 or 1 regardless of the input.
The discriminator's loss function is only
D_loss = 0.5*(D_net(fake) + 1 - D_net(real))
and D_net(fake) and D_net(real) both become 0 or 1 (sigmoid output).
How can I fix it?
for epoch_idx in range(epoch_num):
    for batch_idx, data in enumerate(data_loader):
        D_net.zero_grad()

        #### make real, low, fake
        real = data[0]
        for img_idx in range(batch_size):
            low[img_idx] = trans_low_res(real[img_idx])
        fake = G_net(Variable(low).cuda())

        #### get Discriminator loss and train Discriminator
        real_D_out = D_net(Variable(real).cuda()).mean()
        fake_D_out = D_net(Variable(fake).cuda()).mean()
        D_loss = 0.5*(fake_D_out + 1 - real_D_out)
        D_loss.backward()
        D_optim.step()

        #### train Generator
        G_net.zero_grad()
        #### get new fake D out with updated Discriminator
        fake_D_out = D_net(Variable(fake).cuda()).mean()
        G_loss = generator_criterion(fake_D_out.cuda(), fake.cuda(), real.cuda())
        G_loss.backward()
        G_optim.step()
Batch : [10/6700] Discriminator_Loss: 0.0860 Generator_Loss : 0.1393
Batch : [20/6700] Discriminator_Loss: 0.0037 Generator_Loss : 0.1282
Batch : [30/6700] Discriminator_Loss: 0.0009 Generator_Loss : 0.0838
Batch : [40/6700] Discriminator_Loss: 0.0002 Generator_Loss : 0.0735
Batch : [50/6700] Discriminator_Loss: 0.0001 Generator_Loss : 0.0648
Batch : [60/6700] Discriminator_Loss: 0.5000 Generator_Loss : 0.0634
Batch : [70/6700] Discriminator_Loss: 0.5000 Generator_Loss : 0.0706
Batch : [80/6700] Discriminator_Loss: 0.5000 Generator_Loss : 0.0691
Batch : [90/6700] Discriminator_Loss: 0.5000 Generator_Loss : 0.0538
...
I am not sure I understand your problem correctly: do you mean that the sigmoid output from the discriminator is always either 0 or 1?
In your loss function, D_loss = 0.5 * (fake_D_out + 1 - real_D_out), you are optimizing directly on the sigmoid output, and it looks like the discriminator overfits to your data to the point where it accurately predicts 0 for fake and 1 for real examples.
There are some GAN hacks suggested by experts in this subject matter; you can find a list of tips and tricks here. I would suggest you use soft labels rather than hard labels (see ref).
You can use BCEWithLogitsLoss() and compute the loss based on soft labels instead of hard labels.
The difference between hard and soft labels:
import numpy as np

# hard labels
real = 1
fake = 0

# soft labels
real = np.random.uniform(0.7, 1.0)  # instead of 1
fake = np.random.uniform(0.0, 0.3)  # instead of 0
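For example, here is a minimal sketch of how the discriminator loss could be computed with BCEWithLogitsLoss and soft labels. It assumes D_net is changed to return raw logits (i.e. the final sigmoid is removed, since BCEWithLogitsLoss applies the sigmoid internally); the variable names mirror the ones in your loop:

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

real_logits = D_net(real.cuda())             # raw logits, no sigmoid at the end
fake_logits = D_net(fake.detach().cuda())    # detach so this pass doesn't touch G

# soft labels: close to 1 for real, close to 0 for fake
real_targets = torch.empty_like(real_logits).uniform_(0.7, 1.0)
fake_targets = torch.empty_like(fake_logits).uniform_(0.0, 0.3)

D_loss = 0.5 * (bce(real_logits, real_targets) + bce(fake_logits, fake_targets))
D_loss.backward()
D_optim.step()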
I would like to update the learning rate corresponding to each weight matrix and each bias in PyTorch during training. The answers here and here, and many other answers I found online, talk about doing this using the model's param_groups, which to the best of my knowledge applies learning rates to groups of parameters rather than to a specific layer's weight or bias. I also want to update the learning rates during training, not pre-set them with torch.optim.
Any help is appreciated.
Updates to model parameters are handled by an optimizer in PyTorch. When you define the optimizer, you have the option of partitioning the model parameters into different groups, called param groups. Each param group can have different optimizer settings; for example, one group of parameters could have a learning rate of 0.1 and another a learning rate of 0.01.
To do what you're asking, you can just make every parameter belong to a different param group. You'll need some way to keep track of which param group corresponds to which parameter. Once you've defined the optimizer with different groups you can update the learning rate whenever you want, including at training time.
For example, say we have the following simple linear model:
import torch
import torch.nn as nn
import torch.optim as optim


class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)

    def forward(self, x):
        return self.layer2(self.layer1(x))


model = LinearModel()
and suppose we want learning rates for each trainable parameter initialized according to the following:
learning_rates = {
    'layer1.weight': 0.01,
    'layer1.bias': 0.1,
    'layer2.weight': 0.001,
    'layer2.bias': 1.0}
We can use this dictionary to define a different learning rate for each parameter when we initialize the optimizer.
# Build param_groups where each group consists of a single parameter.
# `param_group_names` is created so we can keep track of which param_group
# corresponds to which parameter.
param_groups = []
param_group_names = []
for name, parameter in model.named_parameters():
    param_groups.append({'params': [parameter], 'lr': learning_rates[name]})
    param_group_names.append(name)

# The optimizer requires a default learning rate even if it's overridden by all param groups.
optimizer = optim.SGD(param_groups, lr=10)
Alternatively, we could omit the 'lr' entry and each param group would be initialized with the default learning rate (lr=10 in this case).
At training time if we wanted to update the learning rates we could do so by iterating over each of the optimizer.param_groups and updating the 'lr' entry for each of them. For example, in the following simplified training loop, we update the learning rates before each step.
for i in range(10):
    output = model(torch.zeros(1, 10))
    loss = output.sum()
    optimizer.zero_grad()
    loss.backward()

    # we can change the learning rate whenever we want for each param group
    print(f'step {i} learning rates')
    for name, param_group in zip(param_group_names, optimizer.param_groups):
        param_group['lr'] = learning_rates[name] / (i + 1)
        print(f' {name}: {param_group["lr"]}')

    optimizer.step()
which prints
step 0 learning rates
layer1.weight: 0.01
layer1.bias: 0.1
layer2.weight: 0.001
layer2.bias: 1.0
step 1 learning rates
layer1.weight: 0.005
layer1.bias: 0.05
layer2.weight: 0.0005
layer2.bias: 0.5
step 2 learning rates
layer1.weight: 0.0033333333333333335
layer1.bias: 0.03333333333333333
layer2.weight: 0.0003333333333333333
layer2.bias: 0.3333333333333333
step 3 learning rates
layer1.weight: 0.0025
layer1.bias: 0.025
layer2.weight: 0.00025
layer2.bias: 0.25
step 4 learning rates
layer1.weight: 0.002
layer1.bias: 0.02
layer2.weight: 0.0002
layer2.bias: 0.2
step 5 learning rates
layer1.weight: 0.0016666666666666668
layer1.bias: 0.016666666666666666
layer2.weight: 0.00016666666666666666
layer2.bias: 0.16666666666666666
step 6 learning rates
layer1.weight: 0.0014285714285714286
layer1.bias: 0.014285714285714287
layer2.weight: 0.00014285714285714287
layer2.bias: 0.14285714285714285
step 7 learning rates
layer1.weight: 0.00125
layer1.bias: 0.0125
layer2.weight: 0.000125
layer2.bias: 0.125
step 8 learning rates
layer1.weight: 0.0011111111111111111
layer1.bias: 0.011111111111111112
layer2.weight: 0.00011111111111111112
layer2.bias: 0.1111111111111111
step 9 learning rates
layer1.weight: 0.001
layer1.bias: 0.01
layer2.weight: 0.0001
layer2.bias: 0.1
The error message indicates that [torch.cuda.FloatTensor [256, 1, 4, 4]] is at version 2; expected version 1 instead, and execution breaks on d_loss.backward() — i.e., the backward call on my Discriminator.
UPDATE: Okay, I tracked it down to an optimizer.step() for my Generator that was happening before running .backward() on my Discriminator.
UPDATE 2: So once I got the model running on PyTorch 1.5 (by moving G's optimizer to after the d_loss.backward() call, as above), I noticed that losses were suddenly much higher during training. I let the model run for a few epochs and the images were basically noise. So, out of curiosity I switched back to my PyTorch 1.4 environment and ran the original for a few epochs, and the images were good again. It's a ClusterGAN that I'm training — so not the standard routine — and I'm wondering why this change is so detrimental to the output. Also, how can I get the model to run in PyTorch 1.5 without the degradation in performance? Presumably I have to keep the optimizer update where it was originally (right after ge_loss.backward(retain_graph=True)), but somehow avoid the error PyTorch 1.5 reports when we hit d_loss.backward() later in the code. I suppose I have to clone() something, but I'm not clear what... ?
[...]

# main training block
for epoch in range(n_epochs):
    for i, (imgs, itruth_label) in enumerate(dataloader):
        iter_count += 1

        # Ensure generator/encoder are trainable
        generator.train()
        encoder.train()

        # Zero gradients for models
        generator.zero_grad()
        encoder.zero_grad()
        discriminator.zero_grad()

        # Configure input
        real_imgs = Variable(imgs.type(Tensor))

        # ---------------------------
        #  Train Generator + Encoder
        # ---------------------------
        optimizer_GE.zero_grad()

        # Sample random latent variables
        zn, zc, zc_idx = sample_z(shape=imgs.shape[0],
                                  latent_dim=latent_dim,
                                  n_c=n_c)

        # Generate a batch of images
        gen_imgs = generator(zn, zc)

        # Discriminator output from real and generated samples
        D_gen = discriminator(gen_imgs)
        D_real = discriminator(real_imgs)

        # Step for Generator & Encoder, n_skip_iter times less than for discriminator
        did_update = False
        if (i % n_skip_iter == 0):
            # Encode the generated images
            enc_gen_zn, enc_gen_zc, enc_gen_zc_logits = encoder(gen_imgs)

            # Calculate losses for z_n, z_c
            zn_loss = mse_loss(enc_gen_zn, zn)
            zc_loss = xe_loss(enc_gen_zc_logits, zc_idx)

            # additional top-k step (from Sinha et al, 2020)
            if top_k <= D_gen.size()[0]:
                top_k_gen = torch.topk(D_gen, top_k, 0)
            else:
                top_k_gen = torch.topk(D_gen, D_gen.size()[0], 0)

            # Check requested metric
            if wass_metric:
                # Wasserstein GAN loss
                ge_loss = torch.mean(top_k_gen[0]) + betan * zn_loss + betac * zc_loss
            else:
                # Vanilla GAN loss
                valid = Variable(Tensor(gen_imgs.size(0), 1).fill_(1.0), requires_grad=False)
                v_loss = bce_loss(D_gen, valid)
                ge_loss = v_loss + betan * zn_loss + betac * zc_loss

            ge_loss.backward(retain_graph=True)

            # ---- ORIGINAL OPTIMIZER UPDATE ---- #
            optimizer_GE.step()
            scheduler.step(epoch + i / iters)
            did_update = True

        # ---------------------
        #  Train Discriminator
        # ---------------------
        optimizer_D.zero_grad()

        # Measure discriminator's ability to classify real from generated samples
        if wass_metric:
            # Gradient penalty term
            grad_penalty = calc_gradient_penalty(discriminator, real_imgs, gen_imgs)
            # Wasserstein GAN loss w/gradient penalty
            d_loss = torch.mean(D_real) - torch.mean(D_gen) + grad_penalty
        else:
            # Vanilla GAN loss
            fake = Variable(Tensor(gen_imgs.size(0), 1).fill_(0.0), requires_grad=False)
            real_loss = bce_loss(D_real, valid)
            fake_loss = bce_loss(D_gen, fake)
            d_loss = (real_loss + fake_loss) / 2

        d_loss.backward()

        # --- REVISED OPTIMIZER UPDATE FOR PyTorch 1.5 ------ #
        # if did_update:
        #     optimizer_GE.step()
        optimizer_D.step()
        # scheduler.step(epoch + i / iters)
[...]
If I understand correctly, the error occurs the second time you call .backward().
The problem is caused by backpropagating through D_gen and D_real twice: once for ge_loss and once for d_loss, with an in-place optimizer update in between.
I don't know exactly what you're doing with this model, but I guess you don't need to backpropagate through and update the discriminator's parameters while training the generator, right?
So, try this (a small toy sketch follows below):
1. Set requires_grad of D.parameters() to False in the Train Generator + Encoder stage.
2. Set requires_grad of D.parameters() to True in the Train Discriminator stage.
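Here is a self-contained toy sketch of that requires_grad toggle (the tiny linear G/D and the losses are illustrative stand-ins, not your ClusterGAN networks or losses):

import torch
import torch.nn as nn

G = nn.Linear(8, 8)                      # stand-in for generator (+ encoder)
D = nn.Linear(8, 1)                      # stand-in for discriminator
opt_G = torch.optim.SGD(G.parameters(), lr=0.01)
opt_D = torch.optim.SGD(D.parameters(), lr=0.01)
z = torch.randn(4, 8)
real = torch.randn(4, 8)

# --- Train Generator: freeze D so the G backward pass builds no grads for D ---
for p in D.parameters():
    p.requires_grad = False
opt_G.zero_grad()
fake = G(z)
g_loss = -D(fake).mean()                 # illustrative loss only
g_loss.backward()
opt_G.step()

# --- Train Discriminator: unfreeze D and recompute its outputs on detached fakes ---
for p in D.parameters():
    p.requires_grad = True
opt_D.zero_grad()
d_loss = D(fake.detach()).mean() - D(real).mean()   # illustrative loss only
d_loss.backward()
opt_D.step()

Note that this sketch also recomputes the discriminator outputs for the D step instead of reusing D_gen/D_real from the G step, which is another way to avoid backpropagating through a graph whose parameters were already updated in place (the situation PyTorch 1.5 complains about).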
I am using TensorFlow 2.0 and Python 3.8, and I want to use a learning rate scheduler for which I have a function. I have to train a neural network for 160 epochs with the following, where the learning rate is to be decreased by a factor of 10 at epochs 80 and 120, and the initial learning rate is 0.01.
def scheduler(epoch, current_learning_rate):
    if epoch == 79 or epoch == 119:
        return current_learning_rate / 10
    else:
        return min(current_learning_rate, 0.001)
How can I use this learning rate scheduler function with 'tf.GradientTape()'? I know how to use this using "model.fit()" as a callback:
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
How do I use this while using custom training loops with "tf.GradientTape()"?
Thanks!
The learning rate for different epochs can be set using the lr attribute of a TensorFlow Keras optimizer. The lr attribute still exists because TensorFlow 2 has backward compatibility for Keras (for more details, refer to the source code here).
Below is a small snippet of how the learning rate can be varied across epochs. self._train_step is similar to the train_step function defined here.
def set_learning_rate(epoch):
    if epoch > 180:
        optimizer.lr = 0.5e-6
    elif epoch > 160:
        optimizer.lr = 1e-6
    elif epoch > 120:
        optimizer.lr = 1e-5
    elif epoch > 3:
        optimizer.lr = 1e-4

def train(epochs, train_data, val_data):
    prev_val_loss = float('inf')
    for epoch in range(epochs):
        self.set_learning_rate(epoch)
        for images, labels in train_data:
            self._train_step(images, labels)
        for images, labels in val_data:
            self._test_step(images, labels)
Another alternative would be to use tf.keras.optimizers.schedules:
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [80*num_steps, 120*num_steps, 160*num_steps, 180*num_steps],
    [1e-3, 1e-4, 1e-5, 1e-6, 5e-6]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)
Note that here one can't provide epochs directly; instead, the boundaries have to be given as step counts, where num_steps (the number of optimizer steps per epoch) is len(train_data)/batch_size.
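For example, here is a small, self-contained sketch tailored to the schedule in the question (initial learning rate 0.01, dropped by a factor of 10 at epochs 80 and 120); the 50,000-sample dataset size and batch size of 128 are illustrative assumptions:

import tensorflow as tf

batch_size = 128
num_steps = 50000 // batch_size     # optimizer steps per epoch, assuming 50,000 training samples

learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [80 * num_steps, 120 * num_steps],
    [1e-2, 1e-3, 1e-4]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)

# The schedule can be queried at any global step to check it does what you expect:
print(learning_rate_fn(0).numpy())                # 0.01 at the start
print(learning_rate_fn(100 * num_steps).numpy())  # 0.001 between epochs 80 and 120
print(learning_rate_fn(150 * num_steps).numpy())  # 0.0001 after epoch 120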
A learning rate schedule needs a step value, which cannot be specified when using GradientTape followed by optimizer.apply_gradients().
So you should not pass the schedule directly as the learning_rate of the optimizer.
Instead, you can first call the schedule to get the value for the current step and then update the learning rate value in the optimizer:
import tensorflow as tf

optim = tf.keras.optimizers.SGD()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(1e-2, 1000, 0.9)

for step in range(1000):
    # evaluate the schedule at the current step and set it on the optimizer
    optim.learning_rate = lr_schedule(step)
    with tf.GradientTape() as tape:
        loss = compute_loss()    # your forward pass and loss computation here
    grads = tape.gradient(loss, model.trainable_variables)
    optim.apply_gradients(zip(grads, model.trainable_variables))
I have a question about the accuracy from Keras' evaluate_generator.
I will provide my code, and the question follows.
From the model part, at the end of the model:
o = (Reshape(( outputHeight*outputWidth, n_classes)))(o)
o = (Activation('softmax'))(o)
#create model
model = Model( img_input , o )
And the evaluation part:
K.set_learning_phase(0)
m = load_model(join(save_weights_path,"model.h5"))
batch_size = 1
test_path = "data/validation"
G =Generator(test_path,test_path,batch_size)
images = glob(join(test_path,"*.jpg"))
steps =len(images)
evaluator= m.evaluate_generator(G,steps = steps ,verbose = 1)
print("Accuracy :",evaluator[1])
And I get this result:
256/256 [==============================] - 10s 41ms/step
Accuracy : 0.7758417576551437
And then, I predict:
#feed data 256 images
X=Generator()
#predict
pr = m.predict( np.array([X]))[0]
#reshape
pr = pr.reshape(( output_height , output_width , n_classes ) ).argmax(axis=-1)
The result looks okay to me, but some of the predictions are not good...
The size of each image is 512*512.
I wonder: does the accuracy returned by evaluate_generator really mean that my model predicts 512*512*0.77 pixels correctly?
Thanks!
Edit:
I just did a single experiment: I left only one image in the folder and then ran evaluate_generator. The accuracy returned is:
1/1 [==============================] - 1s 1s/step
Accuracy : 0.5572433471679688
while when I tested it on my own:
import cv2 as cv
import numpy as np
from glob import glob

img_name = glob("*.jpg")
gt_name = glob("*.png")
img = cv.imread(img_name[0])
gt = cv.imread(gt_name[0])
c = (img == gt).all(axis=-1)
total = img.shape[0] * img.shape[1]
print(np.sum(c) / total)
Guess what:
0.41905975341796875
It turns out that I was wrong: I had saved my segmentation output as a JPG file, which changes some pixel values. After I found that out and saved it as a PNG file, the accuracy is the same as what Keras gives.
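As a side note, here is a minimal sketch of checking per-pixel accuracy directly in memory, which avoids lossy JPG round-trips entirely. It assumes pr and gt_labels are (H, W) integer class maps, e.g. your pr.reshape(...).argmax(axis=-1) output and the matching ground-truth label map; the dummy data below is only for illustration:

import numpy as np

def pixel_accuracy(pr, gt_labels):
    # fraction of pixels whose predicted class matches the ground truth
    assert pr.shape == gt_labels.shape
    return np.mean(pr == gt_labels)

# dummy 512x512 label maps with 5 classes, just to show the call
pr = np.random.randint(0, 5, size=(512, 512))
gt_labels = np.random.randint(0, 5, size=(512, 512))
print(pixel_accuracy(pr, gt_labels))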
I am currently working on a mini-project where I predict movie genres based on their posters. In my dataset each movie can have from 1 to 3 genres, so each instance can belong to multiple classes. I have a total of 15 classes (15 genres). Now I am facing the problem of how to do predictions for this particular setup using PyTorch.
In the PyTorch CIFAR tutorial, each instance can have only one class (for example, if an image is a car it should belong to the class of cars) and there are 10 classes in total. In this case, model training is defined in the following way (copying the code snippet from the PyTorch website):
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Question 1 (for the training part): what would you suggest to use as a loss function / final activation? I was thinking about BCEWithLogitsLoss(), but I am not sure how good it will be.
And then the accuracy of the predictions on the test set is computed in the following way:
For the entire network:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
and for each class:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
where the output is as follows:
Accuracy of plane : 36 %
Accuracy of car : 40 %
Accuracy of bird : 30 %
Accuracy of cat : 19 %
Accuracy of deer : 28 %
Accuracy of dog : 17 %
Accuracy of frog : 34 %
Accuracy of horse : 43 %
Accuracy of ship : 57 %
Accuracy of truck : 35 %
Now here is question 2:
How can I determine the accuracy so that it looks the following way?
For example:
The Matrix (1999) ['Action: 91%', 'Drama: 25%', 'Adventure: 13%']
The Others (2001) ['Drama: 76%', 'Horror: 65%', 'Action: 41%']
Alien: Resurrection (1997) ['Horror: 67%', 'Action: 64%', 'Drama: 43%']
The Martian (2015) ['Drama: 95%', 'Adventure: 81%']
Considering that a movie does not always have 3 genres (sometimes it has 2 and sometimes 1), as I see it I should find the 3, 2, or 1 maximum values of my output list, which is a list of 15 genres. So, for example, if my predicted genres are [Movie, Adventure], then some_kind_of_function(outputs) should give me the output
[1 0 0 0 0 0 0 0 0 0 0 1 0 0 0],
which I can compare afterwards with the ground truth.
I don't think torch.max will work in this case, because it gives only one maximum value from the weights array, so what's the best way to implement this?
Thank you in advance, I appreciate any help or suggestions :)
You're right: you're looking to perform binary classification (is poster X a drama movie or not? Is it an action movie or not?) for each poster-genre pair. BCEWithLogitsLoss (binary cross-entropy with logits) is the way to go.
Regarding the best metric to evaluate the resulting algorithm, that depends on what you are looking for, but you may want to investigate ideas like precision and recall or the F1 score. Personally, I would probably take the top 3 predictions for each poster (since 3 is the maximum number of genres assigned to a poster) and check whether the expected genres show up with high probability and whether the unexpected ones (in the case of a movie with only 2 ground-truth genres) end up in the last places, with significantly less probability assigned.
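For illustration, here is a minimal sketch of how the network's per-genre logits could be turned into probabilities and a multi-hot prediction to compare against the ground truth (the 0.5 threshold and the four genre names are illustrative choices, not from your dataset):

import torch

genres = ['Action', 'Drama', 'Adventure', 'Horror']   # illustrative subset of the 15 genres
logits = torch.tensor([2.3, -1.1, -1.9, 0.6])         # raw network outputs for one poster

probs = torch.sigmoid(logits)                          # independent probability per genre

# Option 1: threshold each genre independently to get a multi-hot vector
multi_hot = (probs > 0.5).int()                        # tensor([1, 0, 0, 1])

# Option 2: keep the top-k genres (at most 3 genres per movie)
topk = torch.topk(probs, k=3)
print([f'{genres[i]}: {probs[i].item():.0%}' for i in topk.indices.tolist()])

With 15 outputs and BCEWithLogitsLoss during training, the same sigmoid + threshold (or top-k) step gives you the [1 0 ... 1 ...] vector you describe.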