Does anyone knows the difference between the two lines below? - pytorch

I read the codes in pytorch tutorial resently and I find a interesting thing below:
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters(): **# 1**
param.requires_grad = False **# 1**
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9) **# 2**
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)`
I just wonder what's the difference between # 1 and # 2,
if I set #1, Can I just set #2 to code like this?:
optimizer_ft = optim.SGD(model_ft.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.1)
or if I just delete # 1 and leave # 2 alone?
I just wonder what is the difference between #1 and #2...

Yes, if you set #1 the code for #2 could go like this optimizer_ft = optim.SGD(model_ft.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.1) it will automatically get which parameters to set gradient True for.
See here: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
This helper function sets the .requires_grad attribute of the
parameters in the model to False when we are feature extracting. By
default, when we load a pretrained model all of the parameters have
.requires_grad=True, which is fine if we are training from scratch or
finetuning. However, if we are feature extracting and only want to
compute gradients for the newly initialized layer then we want all of
the other parameters to not require gradients. This will make more
sense later.
def set_parameter_requires_grad(model, feature_extracting):
if feature_extracting:
for param in model.parameters():
param.requires_grad = False
For
if I just delete # 1 and leave # 2 alone?
You could do that too but imagine if you had to finetune multiple layers in that case it would be redundant to use model_conv.new_layer.parameters() for every new layer so the first way that you said and used seems a better way to do it in that case.

Related

how effective is transfer learning? keeping only two specific output features without resetting features

I want to keep only two specific output features without resetting features.
Resetting features would lose the pre-trained weights.
For example, I don't want to do...
# https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html?highlight=transfer%20learning%20ant%20bees
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)
Here is code (following the transfer learning tutorial on Pytorch)
I want to do this to see how effective transfer learning is.
Even without transfer learning, a model might be effective. Removing 998 out of 1000 categories and leaving only two categories, ant and bee, could be a great categorical model since you are left with only two choices.
I do not want to re-train the model, I want to use the weights as it is, otherwise, it will be the same as transfer learning.
You can certainly try this. You can reduce the model output to just the two logits you want to compare with:
chosen_cats = torch.Tensor([ant_index, bee_index]).long()
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
outputs = torch.index_select(output, 1, chosen_cats)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
In this scenario, the preds will be 0 or 1, with 0 predicting ant and 1 predicting bee, so you will need to also modify your labels to reflect this.

How to stack same RNN for every layer?

I would like to know how to stack many layers of RNN but every layer are the same RNN. I want every layer share the same weight. I have read stack LSTM and RNN, but I found that each layer was not the same.
1 layer code:
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
first_layer_output = first_layer(Emb_output)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(first_layer_output )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
enter image description here
RNN 1 layer
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=True,stateful =True)
first_layer_output = first_layer(Emb_output)
first_layer_state = first_layer.states
second_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
second_layer_set_state = second_layer(first_layer_output, initial_state=first_layer_state)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(second_layer_set_state )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
enter image description here
Stack RNN 2 layer.
For example, I want to build two layers RNN, but the first layer and the second must have the same weight, such that when I update the weight in the first layer the second layer must be updated and share the same value. As far as I know, TF has RNN.state. It returns the value from the previous layer. However, when I use this, it seems that each layer is treated independently. The 2-layer RNN that I want should have trainable parameters equal to the 1-layer since they shared the same weight, but this did not work.
You can view the layer object as a container for the weights that knows how to apply the weights. You can use the layer object as many times as you want. Assuming the embedding and the RNN dimension are the same, you can do:
states = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden, use_bias=True, return_sequences=True)
for _ in range(10):
states = first_layer(states)
There is no reason to set stateful to true. This is used when you split long sequences into multiple batches and what the RNN to remember the state between batches, so you do not have yo manually set initial states. You can get the final state of the RNN (that you wany you want to use for classification) by simply indexing the last position from states.

PyTorch "modified by inplace operation" error

The error message indicates that [torch.cuda.FloatTensor [256, 1, 4, 4]] is at version 2; expected version 1 instead, and execution breaks on d_loss.backward() — i.e., the backward call on my Discriminator.
UPDATE: Okay, I tracked it down to an optimizer.step() for my Generator that was happening before running .backward() on my Discriminator.
UPDATE 2: So once I got the model running on PyTorch 1.5 (by moving G's optimizer to after the d_loss.backward() call, as above), I noticed that losses were suddenly much higher during training. I let the model run for a few epochs and the images were basically noise. So, out of curiosity I switched back to my PyTorch 1.4 environment and ran the original for a few epochs, and the images were good again. It's a ClusterGAN that I'm training — so not the standard routine — and I'm wondering why this change is so detrimental to the output. Also, how can I get the model to run in PyTorch 1.5 without the degradation in performance? Presumably I have to keep the optimizer update where it was originally (right after ge_loss.backward(retain_graph=True)), but somehow avoid the error PyTorch 1.5 reports when we hit d_loss.backward() later in the code. I suppose I have to clone() something, but I'm not clear what... ?
[...]
# main training block
for epoch in range(n_epochs):
for i, (imgs, itruth_label) in enumerate(dataloader):
iter_count += 1
# Ensure generator/encoder are trainable
generator.train()
encoder.train()
# Zero gradients for models
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
# Configure input
real_imgs = Variable(imgs.type(Tensor))
# ---------------------------
# Train Generator + Encoder
# ---------------------------
optimizer_GE.zero_grad()
# Sample random latent variables
zn, zc, zc_idx = sample_z(shape=imgs.shape[0],
latent_dim=latent_dim,
n_c=n_c)
# Generate a batch of images
gen_imgs = generator(zn, zc)
# Discriminator output from real and generated samples
D_gen = discriminator(gen_imgs)
D_real = discriminator(real_imgs)
# Step for Generator & Encoder, n_skip_iter times less than for discriminator
did_update = False
if (i % n_skip_iter == 0):
# Encode the generated images
enc_gen_zn, enc_gen_zc, enc_gen_zc_logits = encoder(gen_imgs)
# Calculate losses for z_n, z_c
zn_loss = mse_loss(enc_gen_zn, zn)
zc_loss = xe_loss(enc_gen_zc_logits, zc_idx)
# additional top-k step (from Sinha et al, 2020)
if top_k <= D_gen.size()[0]:
top_k_gen = torch.topk(D_gen, top_k, 0)
else:
top_k_gen = torch.topk(D_gen, D_gen.size()[0], 0)
# Check requested metric
if wass_metric:
# Wasserstein GAN loss
ge_loss = torch.mean(top_k_gen[0]) + betan * zn_loss + betac * zc_loss
else:
# Vanilla GAN loss
valid = Variable(Tensor(gen_imgs.size(0), 1).fill_(1.0), requires_grad=False)
v_loss = bce_loss(D_gen, valid)
ge_loss = v_loss + betan * zn_loss + betac * zc_loss
ge_loss.backward(retain_graph=True)
# ---- ORIGINAL OPTIMIZER UPDATE ---- #
optimizer_GE.step()
scheduler.step(epoch + i / iters)
did_update = True
# ---------------------
# Train Discriminator
# ---------------------
optimizer_D.zero_grad()
# Measure discriminator's ability to classify real from generated samples
if wass_metric:
# Gradient penalty term
grad_penalty = calc_gradient_penalty(discriminator, real_imgs, gen_imgs)
# Wasserstein GAN loss w/gradient penalty
d_loss = torch.mean(D_real) - torch.mean(D_gen) + grad_penalty
else:
# Vanilla GAN loss
fake = Variable(Tensor(gen_imgs.size(0), 1).fill_(0.0), requires_grad=False)
real_loss = bce_loss(D_real, valid)
fake_loss = bce_loss(D_gen, fake)
d_loss = (real_loss + fake_loss) / 2
d_loss.backward()
# --- REVISED OPTIMIZER UPDATE FOR PyTorch 1.5 ------ #
# if did_update:
# optimizer_GE.step()
optimizer_D.step()
# scheduler.step(epoch + i / iters)
[...]
If I understand correctly, the error occurs at the second time you call .backward().
The problem is caused by calling .backward() on D_gen and D_real twice.
I don't know exactly what you're doing with this model, but I guess you don't have to backward and update the parameters of Discriminators while training Generators, right?
So, try this:
1.set the requires_grad of D.parameters() to False in the Train Generator + Encoder stage
2.set the requires_grad of D.parameters() to True in the Train Discriminator stage

Keras fit vs. fit_generator extra smaples

I have training data and validation data stacked up in two tensors. At first, I ran a NN using keras.model.fit() function. for my purposes, I wish to move to keras.model.fit_generator(). I build a generator and I have noticed the number of samples is not a multiplication of the batch size.
My implementation to overcome this:
indices = np.arange(len(dataset))# generate indices of len(dataset)
num_of_steps = int(np.ceil(len(dataset)/batch_size)) #number of steps per epoch
extra = num_of_steps *batch_size-len(dataset)#find the size of extra samples needed to complete the next multiplication of batch_size
additional = np.random.randint(len(dataset),size = extra )#complete with random samples
indices = np.append(indices ,additional )
After randomizing the indices at each epoch I simply iterate this in batches skips and pool the correct data and labels.
I am observing a degradation in the performance of the model. When training with fit() I get 0.99 training accuracy and 0.93 validation accuracy while with fit_generator() I am getting 0.95 and 0.9 respectively. note, this is consistent and not a single experiment. I thought it might be due to fit() handling the extra samples required differently. Is my implementation reasonable? how does fit() handles datasets of a size different from a batch_size multiplication?
Sharing the full generator code:
def generator(self,batch_size,train):
"""
Generates batches of samples
:return:
"""
while 1:
nb_of_steps=0
if(train):
nb_of_steps = self._num_of_steps_train
indices = np.arange(len(self._x_train))
additional = np.random.randint(len(self._x_train), size=self._num_of_steps_train*batch_size-len(self._x_train))
else:
nb_of_steps = self._num_of_steps_test
indices = np.arange(len(self._x_test))
additional = np.random.randint(len(self._x_test), size=self._num_of_steps_test*batch_size-len(self._x_test))
indices = np.append(indices,additional)
np.random.shuffle(indices)
# print(indices.shape)
# print(nb_of_steps)
for i in range(nb_of_steps):
batch_indices=indices[i:i+batch_size]
if(train):
feat = self._x_train[batch_indices]
label = self._y_train[batch_indices]
else:
feat = self._x_test[batch_indices]
label = self._y_test[batch_indices]
feat = np.expand_dims(feat,axis=1)
# print(feat.shape)
# print(label.shape)
yield feat, label
It looks like you can simplify the generator significantly!
The number of steps etc can be set outside the loop as they do not really change. Moreover, it looks like the batch_indices is not going through the entire dataset. Finally, if your data fits in memory you might not need a generator at all, but will leave this to your judgement.
def generator(self, batch_size, train):
nb_of_steps = 0
if (train):
nb_of_steps = self._num_of_steps_train
indices = np.arange(len(self._x_train)) #len of entire dataset
else:
nb_of_steps = self._num_of_steps_test
indices = np.arange(len(self._x_test))
while 1:
np.random.shuffle(indices)
for i in range(nb_of_steps):
start_idx = i*batch_size
end_idx = min(i*batch_size+batch_size, len(indices))
batch_indices=indices[start_idx : end_idx]
if(train):
feat = self._x_train[batch_indices]
label = self._y_train[batch_indices]
else:
feat = self._x_test[batch_indices]
label = self._y_test[batch_indices]
feat = np.expand_dims(feat,axis=1)
yield feat, label
For a more robust generator consider creating a class for your set using the keras.utils.Sequence class. It will add a few extra lines of code, but it is certainly working with keras.

How to combine FCNN and RNN in Tensorflow?

I want to make a Neural Network, which would have recurrency (for example, LSTM) at some layers and normal connections (FC) at others.
I cannot find a way to do it in Tensorflow.
It works, if I have only FC layers, but I don't see how to add just one recurrent layer properly.
I create a network in a following way :
with tf.variable_scope("autoencoder_variables", reuse=None) as scope:
for i in xrange(self.__num_hidden_layers + 1):
# Train weights
name_w = self._weights_str.format(i + 1)
w_shape = (self.__shape[i], self.__shape[i + 1])
a = tf.multiply(4.0, tf.sqrt(6.0 / (w_shape[0] + w_shape[1])))
w_init = tf.random_uniform(w_shape, -1 * a, a)
self[name_w] = tf.Variable(w_init,
name=name_w,
trainable=True)
# Train biases
name_b = self._biases_str.format(i + 1)
b_shape = (self.__shape[i + 1],)
b_init = tf.zeros(b_shape)
self[name_b] = tf.Variable(b_init, trainable=True, name=name_b)
if i+1 == self.__recurrent_layer:
# Create an LSTM cell
lstm_size = self.__shape[self.__recurrent_layer]
self['lstm'] = tf.contrib.rnn.BasicLSTMCell(lstm_size)
It should process the batches in a sequential order. I have a function for processing just one time-step, which will be called later, by a function, which process the whole sequence :
def single_run(self, input_pl, state, just_middle = False):
"""Get the output of the autoencoder for a single batch
Args:
input_pl: tf placeholder for ae input data of size [batch_size, DoF]
state: current state of LSTM memory units
just_middle : will indicate if we want to extract only the middle layer of the network
Returns:
Tensor of output
"""
last_output = input_pl
# Pass through the network
for i in xrange(self.num_hidden_layers+1):
if(i!=self.__recurrent_layer):
w = self._w(i + 1)
b = self._b(i + 1)
last_output = self._activate(last_output, w, b)
else:
last_output, state = self['lstm'](last_output,state)
return last_output
The following function should take sequence of batches as input and produce sequence of batches as an output:
def process_sequences(self, input_seq_pl, dropout, just_middle = False):
"""Get the output of the autoencoder
Args:
input_seq_pl: input data of size [batch_size, sequence_length, DoF]
dropout: dropout rate
just_middle : indicate if we want to extract only the middle layer of the network
Returns:
Tensor of output
"""
if(~just_middle): # if not middle layer
numb_layers = self.__num_hidden_layers+1
else:
numb_layers = FLAGS.middle_layer
with tf.variable_scope("process_sequence", reuse=None) as scope:
# Initial state of the LSTM memory.
state = initial_state = self['lstm'].zero_state(FLAGS.batch_size, tf.float32)
tf.get_variable_scope().reuse_variables() # THIS IS IMPORTANT LINE
# First - Apply Dropout
the_whole_sequences = tf.nn.dropout(input_seq_pl, dropout)
# Take batches for every time step and run them through the network
# Stack all their outputs
with tf.control_dependencies([tf.convert_to_tensor(state, name='state') ]): # do not let paralelize the loop
stacked_outputs = tf.stack( [ self.single_run(the_whole_sequences[:,time_st,:], state, just_middle) for time_st in range(self.sequence_length) ])
# Transpose output from the shape [sequence_length, batch_size, DoF] into [batch_size, sequence_length, DoF]
output = tf.transpose(stacked_outputs , perm=[1, 0, 2])
return output
The issue is with a variable scopes and their property "reuse".
If I run this code as it is I am getting the following error:
' Variable Train/process_sequence/basic_lstm_cell/weights does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope? '
If I comment out the line, which tell it to reuse variables ( tf.get_variable_scope().reuse_variables() ) I am getting the following error:
'Variable Train/process_sequence/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?'
It seems, that we need "reuse=None" for the weights of the LSTM cell to be initialized and we need "reuse=True" in order to call the LSTM cell.
Please, help me to figure out the way to do it properly.
I think the problem is that you're creating variables with tf.Variable. Please, use tf.get_variable instead -- does this solve your issue?
It seems that I have solved this issue using the hack from the official Tensorflow RNN example (https://www.tensorflow.org/tutorials/recurrent) with the following code
with tf.variable_scope("RNN"):
for time_step in range(num_steps):
if time_step > 0: tf.get_variable_scope().reuse_variables()
(cell_output, state) = cell(inputs[:, time_step, :], state)
outputs.append(cell_output)
The hack is that when we run LSTM first time, tf.get_variable_scope().reuse is set to False, so that the new LSTM cell is created. When we run it next time, we set tf.get_variable_scope().reuse to True, so that we are using the LSTM, which was already created.

Resources