Hi, I'm going through the PyTorch tutorial on transfer learning.
(https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)
What is model.training for?
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
I cannot understand model.train(mode=was_training). Any help? Thank you so much.
I think this will help (link)
All nn.Modules have an internal training attribute, which is changed by calling model.train() or model.eval() to switch the behavior of the model.
The was_training variable stores the current training state of the model before model.eval() is called, so that the state can be restored at the end using model.train(mode=was_training).
You can find great answers in the PyTorch discussion forum ;)
I wonder why they use model.train in the test phase. Why do they put that code inside the with torch.no_grad() block? Isn't it obvious that was_training is False?
It is a slightly misleading use of train, because train can also be used to put the model in inference (evaluation) mode:
>>> model.train(mode=True)
>>> model.training
True   # <- train mode
>>> model.train(mode=False)
>>> model.training
False  # <- eval mode
I agree it is not ideal; a more appropriate formulation would simply have been:
>>> model.eval()
Related
I am trying to do a seq2seq prediction. For this, I have an LSTM layer followed by a fully connected layer. I employ teacher forcing during the training phase and would like to skip it (I may be wrong here) during the testing phase. I have not found a direct way of doing this, so I have taken the approach shown below.
def forward(self, inputs, future=0, teacher_force_ratio=0.2, target=None):
    outputs = []
    for idx in range(future):
        rnn_out, _ = self.rnn(inputs)
        output = self.fc1(rnn_out)
        if self.teacher_training:
            new_input = output if np.random.random() >= teacher_force_ratio else target[idx]
        else:
            new_input = output
        inputs = new_input
I use a bool variable teacher_training to check whether teacher forcing is needed or not. Is this correct? If yes, is there a better way to do this? Thanks.
In PyTorch, every class that extends nn.Module has a boolean training attribute. So instead of a custom teacher_training flag we can simply use self.training. This attribute is set automatically by the model's mode: model.train() sets it to True and model.eval() sets it to False.
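For illustration, here is a minimal sketch of the same forward pass using the built-in self.training flag instead of a custom attribute. Only self.rnn, self.fc1 and the argument names are taken from the question; the layer sizes and the stacking of the outputs are assumptions:

import numpy as np
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, input_size)

    def forward(self, inputs, future=0, teacher_force_ratio=0.2, target=None):
        outputs = []
        for idx in range(future):
            rnn_out, _ = self.rnn(inputs)
            output = self.fc1(rnn_out)
            outputs.append(output)
            # self.training is True after model.train() and False after model.eval(),
            # so teacher forcing is only applied during training
            if self.training and target is not None and np.random.random() < teacher_force_ratio:
                inputs = target[idx]
            else:
                inputs = output
        return torch.stack(outputs)

Calling model.train() or model.eval() then toggles the teacher forcing behaviour without any extra bookkeeping.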
I have implemented a new loss function in PyTorch.
# model_1 needs to be trained
outputs = model_1(input)
loss = myloss(outputs, labels)

# outputs says how much to resize an image
# labels gives the image file index
# Below I explain the myloss() function
org_file_name = pic_ + str(labels[0]) + ".png"
new_image = compress(org_image, outputs)
accuracy_loss = run_pretrained_yolov3(org_image, new_image)

# the next two lines modify the same DAG
prev_loss = torch.mean((outputs - labels) ** 2)
new_loss = (accuracy_loss / prev_loss.item()) * prev_loss
new_loss.backward()
Can anyone please help me understand how the loss gradient backpropagates through the computational graph?
[That is, inside the myloss() function I apply another pre-trained model, in testing mode, to get the difference / final loss value.] Now I want to know whether my new_loss gradient backpropagates through model_1 directly, or first through yolov3 and then through model_1? The pretrained yolov3 is used in testing mode only.
I have tried TensorBoard, but it does not provide that option. Any suggestions would be highly helpful.
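Not a full answer, but a minimal sketch of how to inspect which operations a backward pass will traverse, using hypothetical stand-in tensors rather than the real model_1 / yolov3 outputs. Anything turned into a plain Python number via .item() (or computed under torch.no_grad()) no longer appears in the graph:

import torch

# stand-in tensors, purely for illustration
x = torch.randn(4, requires_grad=True)
prev_loss = torch.mean(x ** 2)
accuracy_loss = 0.7  # a plain Python number, like a value obtained via .item()

new_loss = (accuracy_loss / prev_loss.item()) * prev_loss

# walk the autograd graph starting from the loss and print every backward node
def print_graph(fn, depth=0):
    if fn is None:
        return
    print('  ' * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        print_graph(next_fn, depth + 1)

print_graph(new_loss.grad_fn)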
I would like to know if Keras can be used as an interface to TensorFlow for just doing computation on my GPU.
I tested TF directly on my GPU. But for ML purposes I started using Keras, including the backend. I would find it 'comfortable' to do all my stuff in Keras instead of using two tools.
This is also a matter of curiosity.
I found some examples like this one:
http://christopher5106.github.io/deep/learning/2018/10/28/understand-batch-matrix-multiplication.html
However this example does not actually do the calculation.
It also does not get input data.
I duplicate the snippet here:
'''
from keras import backend as K
a = K.ones((3,4))
b = K.ones((4,5))
c = K.dot(a, b)
print(c.shape)
'''
I would simply like to know if I can get the result numbers from this snippet above, and how?
Thanks,
Michel
Keras doesn't have an eager mode like TensorFlow; it depends on models or functions with "placeholders" to receive and output data.
So it's a little more complicated than TensorFlow for basic calculations like this.
The most user-friendly solution would be to create a dummy model with a single Lambda layer. (Be careful with the first dimension: Keras will insist on treating it as a batch dimension and will require input and output to have the same batch size.)
def your_function_here(inputs):
    # if you have more than one input tensor, they arrive as a list:
    input1, input2, input3 = inputs

    # if you don't have a batch, you should probably have a first dimension = 1 and get:
    input1 = input1[0]

    # do your calculations here

    # if you used the batch_size=1 workaround above, add that dimension back:
    output = K.expand_dims(output, 0)
    return output
Create your model:
inputs = Input(input_shape)
#maybe inputs2 ....
outputs = Lambda(your_function_here)(list_of_inputs)
#maybe outputs2
model = Model(inputs, outputs)
And use it to predict the result:
print(model.predict(input_data))
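As a concrete sketch of that pattern applied to the matrix-multiplication snippet from the question (the shapes and the use of K.batch_dot are my choices, assuming the TensorFlow backend):

import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

# the first dimension is the batch: each sample is one (3,4) x (4,5) multiplication
a = Input((3, 4))
b = Input((4, 5))
c = Lambda(lambda t: K.batch_dot(t[0], t[1]))([a, b])

model = Model([a, b], c)

# feed actual numbers and get the result back as a numpy array
result = model.predict([np.ones((1, 3, 4)), np.ones((1, 4, 5))])
print(result.shape)  # (1, 3, 5), every entry equal to 4.0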
I defined a simple generative adversarial network that consists of a generator and a discriminator. The generator is compiled twice: once for non-adversarial training (without the discriminator extension), and once for adversarial training.
After I have built and compiled everything, I can ask my compiled models for their losses and metrics. This is what I get:
net.generator.loss -> 'mean_absolute_error'
net.generator.metrics -> []
net.combined.loss -> ['mean_absolute_error', 'binary_crossentropy']
net.combined.metrics -> []
So far so good, this seems to be plausible. But when I then use the train_on_batch method on net.generator or net.combined, the format of the returned loss does not match my expectations. I found out that I can check this by using model.metrics_names:
net.generator.metrics_names -> ['loss']
net.combined.metrics_names -> ['loss', 'sequential_15_loss', 'discriminator_loss']
My question is: why does my net.combined loss contain 3 elements instead of just the two I defined (loss=[generator_loss_fct, 'binary_crossentropy'])? I don't want it to be 3 elements long.
Additionally, the first two are almost always the same, or at least very similar.
Does someone understand this? If yes, please explain why it is like this and whether I did something wrong. :)
Thanks in advance!
# build and compile the generator
self.encoder = self._build_encoder(input_shape, encoder_filters, kernel_size, latent_size)
self.decoder = self._build_decoder(self.encoder.layers[-1].output_shape[1:], decoder_filters, kernel_size)
self.generator = Sequential([self.encoder, self.decoder])
# compile for non-adversarial training
self.generator.compile(loss=generator_loss_fct, optimizer=self.optimizer)
# get the inputs
masked_img = Input(self.input_shape, name='masked-image')
filled_img = self.generator(masked_img)
# build and compile the (global) discriminator
self.discriminator = self._build_discriminator(input_shape, discriminator_filters, kernel_size)
self.discriminator.compile(loss='binary_crossentropy', optimizer=self.optimizer, metrics=['accuracy'])
# let the discriminator judge the validity of the reconstruction
valid = self.discriminator(filled_img)
# we freeze the discriminator when training the generator
self.discriminator.trainable = False
# build and compile the combined adversarial model
self.combined = Model(masked_img, [filled_img, valid])
self.combined.compile(loss=[generator_loss_fct, 'binary_crossentropy'], loss_weights=[self.alpha, self.beta], optimizer=self.optimizer)
When you have a multioutput model, Keras will report the total loss, together with the loss corresponding to each output.
Besides, if, as you say, the first two losses are that close, your last loss probably contributes almost nothing.
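A minimal standalone sketch (toy layers, not the GAN above) of where the third entry comes from: metrics_names always starts with the total (weighted) loss, followed by one entry per output.

from keras.layers import Input, Dense
from keras.models import Model

inp = Input((4,))
out1 = Dense(1, name='out1')(inp)
out2 = Dense(1, name='out2', activation='sigmoid')(inp)

model = Model(inp, [out1, out2])
model.compile(loss=['mae', 'binary_crossentropy'], loss_weights=[0.9, 0.1], optimizer='adam')

print(model.metrics_names)
# ['loss', 'out1_loss', 'out2_loss']
# 'loss' = 0.9 * out1_loss + 0.1 * out2_loss, which is why train_on_batch returns three values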
If you are willing to train a GAN model you can take a look at this Keras example
I implemented a stacked autoencoder using the TensorFlow library. It works properly. Now I am trying to see the hidden layer activation values (y1, y2, y3, y4, y5), but I cannot find a way to do that. Here is my code.
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 6])

k = 190
l = 180
m = 150
n = 130
o = 100
num_of_epoch = 10

w1 = tf.Variable(tf.truncated_normal([784, k], stddev=0.1))
b1 = tf.Variable(tf.zeros([k]))
w2 = tf.Variable(tf.truncated_normal([k, l], stddev=0.1))
b2 = tf.Variable(tf.zeros([l]))
w3 = tf.Variable(tf.truncated_normal([l, m], stddev=0.1))
b3 = tf.Variable(tf.zeros([m]))
w4 = tf.Variable(tf.truncated_normal([m, n], stddev=0.1))
b4 = tf.Variable(tf.zeros([n]))
w5 = tf.Variable(tf.truncated_normal([n, o], stddev=0.1))
b5 = tf.Variable(tf.zeros([o]))
w6 = tf.Variable(tf.truncated_normal([o, 6], stddev=0.1))
b6 = tf.Variable(tf.zeros([6]))

y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
y2 = tf.nn.relu(tf.matmul(y1, w2) + b2)
y3 = tf.nn.relu(tf.matmul(y2, w3) + b3)
y4 = tf.nn.relu(tf.matmul(y3, w4) + b4)
y5 = tf.nn.relu(tf.matmul(y4, w5) + b5)
y = tf.nn.softmax(tf.matmul(y5, w6) + b6)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for i in range(num_of_epoch):
        train_data = {x: x_train, y_: y_train}
        sess.run(train_step, feed_dict=train_data)

    # accuracy on the training set
    currect_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(currect_prediction, tf.float32))
    sess.run(accuracy, feed_dict={x: x_train, y_: y_train})

    # accuracy on the test set
    currect_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(currect_prediction, tf.float32))
    sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
If you know a way, please share it with me or point me to a helpful link where I can find the right answer. Thanks in advance.
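One common approach (a sketch, reusing the session and feed dict from the code above) is to ask the session for the activation tensors directly, since y1 ... y5 are ordinary nodes in the graph:

# inside the "with tf.Session() as sess:" block, after training:
h1, h2, h3, h4, h5 = sess.run([y1, y2, y3, y4, y5], feed_dict={x: x_train, y_: y_train})
print(h1.shape)  # (number of training samples, 190): activations of the first hidden layer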