How to get the inner module of Unet? - pytorch

I created a UNet with the UnetGenerator. You can find the resulting structure here.
How do I get the module Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))?
How do I get the module (5): ConvTranspose2d(256, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))?
I want to get the inner modules to change the attributes of certain layers.
I tried something like net.modules(i).modules(i), but it doesn't work. I referred to the docs, but I haven't found a good way to do it.
My initial intention is to change the attributes of certain layers during training. I may add a custom layer myLayer, in which self.mode = 'normal'. During training, I hope I can set its attribute myLayer.mode = 'capture' to make it change its behaviour.

All subclasses of nn.Module have a method called children(), which you can access using the code below.
unet = UnetGenerator(512,512,4)
layers = list(unet.children())
len(layers)
For the network I created using the above code, I can access one of the layers inside the network and change its properties as below.
l = layers[0]                      # first top-level child of the generator
conv = list(l.children())[0][0]    # first layer inside that child's first sub-block
conv.kernel_size = (2, 2)          # change the attribute in place
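If you'd rather not index into children() by position, you can locate layers by type and configuration with named_modules(). A minimal sketch (the channel numbers are taken from the layers asked about above):
import torch.nn as nn

for name, module in unet.named_modules():
    if isinstance(module, nn.Conv2d) and module.in_channels == 128 and module.out_channels == 256:
        print(name)                   # dotted path of this layer inside the UNet
        module.kernel_size = (2, 2)   # change an attribute in place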
If you are training the network without using any pre-trained weights then you could make the changes in the source code before you create the network object.

To add to the correct answer of Vishnu:
If you want to make fixed changes, it will be better to change the code before the network object is created.
However, an important thing to mention about pytorch is that you can change these properties dynamically: most deep learning libraries, like tensorflow, use static graphs for performance reasons. That means the graph of the network is built once and executed on every forward pass.
pytorch, on the other hand, uses dynamic computational graphs, meaning that for each forward pass the graph is built on the fly. This gives you the opportunity to change the network architecture depending on some varying parameters.
You could, for example, decrease your kernel size if your loss falls under a certain value or if your epoch is an odd number (I do not imply that these are sensible applications). All these dynamic changes should happen in the implementation of the forward function, to which you can pass additional arguments, e.g.:
class MyNet(nn.Module):
    def __init__(self, input_nc, output_nc):
        super().__init__()
        # define layers
        # ...
        self.choice_A = nn.Conv2d(input_nc, output_nc, kernel_size=3)
        self.choice_B = nn.Conv2d(input_nc, output_nc, kernel_size=4)
        # continue init

    def forward(self, x, epoch):
        # start forward pass
        # ...
        if epoch % 2 == 0:
            x = self.choice_A(x)
        else:
            x = self.choice_B(x)
        # ...
        return x
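The same mechanism covers the question's myLayer.mode idea. A minimal sketch of a custom layer whose behaviour is switched from the training loop (the 'capture' behaviour shown here is only a placeholder):
import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.mode = 'normal'

    def forward(self, x):
        if self.mode == 'capture':
            self.captured = x.detach()  # e.g. keep a copy of the activation
        return x

# later, during training:
# my_layer.mode = 'capture'  # changes the behaviour of subsequent forward passes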

Related

In a Pytorch LSTM, does the base class take care of the hidden layer on its own, or must it be defined? (additional questions)

Let's take the following code:
'''
LSTM class
'''
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from torch.nn import CrossEntropyLoss

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)

    def forward(self, x):
        # receive an input, create a new hidden state, return output?
        # reset the hidden state?
        hidden = (torch.zeros(self.num_layers, self.hidden_size),
                  torch.zeros(self.num_layers, self.hidden_size))
        x, hidden = self.lstm(x, hidden)
        # since our observation has several sequences, we only want the
        # output after the last sequence of the observation
        x = x[:, -1]
        return x
I have several questions here, and if permitted would rather ask them all at once than wait 90 minutes between individual posts.
I've seen and followed quite a few examples of LSTMs in pytorch, and each example seems to treat different pieces a bit differently. Since I'm not an expert in either python or neural networks, this has led me to a lot of confusion. I'll ask my questions sequentially, in the order they appear in the code above.
I've seen the hidden layer defined, zeroed, left out, and ignored entirely in a few different implementations. I know what it's for, but in the implementation I've produced (which is itself an amalgamation of several tutorials) the hidden layer doesn't appear to be connected to anything. In the forward function we take a single input, pass it to the hidden layer (which is first zeroed), then call self.lstm on it. Is this the equivalent of letting the LSTM "handle" the hidden layer itself?
Will this properly produce a hidden state?
Am I correct that the optimization only occurs during the training loop? I was using this particular tutorial as an example:
https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
In the optimizer tutorial, I assume y is the true label for the observation; is that correct?
My intention is to use cross entropy loss, as that seems like the right one for what I'm doing with my data (the labels are not discrete, and are real positive floats within a range, and there are 3 of them), so the output size should be 3. Given the optimizer tutorial, all I need to do is hand the loss function the output from the training step and the correct label, and then backpropagate. Is that correct as well?
I know there's a lot here, so I'd appreciate answers to any of the questions from anyone inclined to help, even if you cannot answer all of them. Thank you for your time.
I've had a talk with a colleague and have been able to answer some of my own questions, so I'll post those here, since they may help others.
The hidden-state zero-out is NOT modifying the hidden layer; it zeroes out the hidden state at the start because the cells begin empty, as you'd expect in any object-oriented language. It turns out this is unnecessary: for quite some time, the pytorch default has been to zero out these values if they're not initialized manually.
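A minimal sketch of that default, with made-up hyperparameters and a batched input of shape (seq_len, batch, input_size):
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1)
x = torch.randn(5, 1, 8)                                # 5 timesteps, batch of 1
out_default, _ = lstm(x)                                # hidden state defaults to zeros
zeros = (torch.zeros(1, 1, 16), torch.zeros(1, 1, 16))  # (num_layers, batch, hidden_size)
out_explicit, _ = lstm(x, zeros)                        # same result as the default
assert torch.allclose(out_default, out_explicit)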
This simple implementation will produce a hidden state as written.
The answer to this is yes. We don't "grade" the result of the network until we get to the optimization section, or, more specifically, the loss function.
y is the true label; it was not identified in the tutorial. Also important to note: pred is not a finished "prediction" but a pytorch tensor holding the result of the network acting on the observation that was fed in, along with the autograd metadata needed for backpropagation. Printing pred shows the network's raw output values, not a decoded class prediction.
This is also correct. Pytorch handles the "distance measure" between the true label and the predicted label on its own.
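Tying those answers together, here is a minimal sketch of the loss-and-backpropagation step, using the names from the train_loop above. Note that nn.CrossEntropyLoss expects raw logits and integer class indices as targets, so labels that are continuous floats would need a different loss such as nn.MSELoss:
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class targets
pred = model(X)                  # raw network output, shape (batch, 3) for three classes
loss = loss_fn(pred, y)          # y: shape (batch,) of class indices in {0, 1, 2}
optimizer.zero_grad()
loss.backward()
optimizer.step()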

Normalization of input data in Keras

One common task in DL is to normalize input samples to zero mean and unit variance. One can "manually" perform the normalization using code like this:
mean = np.mean(X, axis = 0)
std = np.std(X, axis = 0)
X = [(x - mean)/std for x in X]
However, then one must keep the mean and std values around, to normalize the testing data, in addition to the Keras model being trained. Since the mean and std are learnable parameters, perhaps Keras can learn them? Something like this:
m = Sequential()
m.add(SomeKerasLayerForNormalizing(...))
m.add(Conv2D(20, (5, 5), input_shape=(21, 100, 3), padding='valid'))
# ... rest of network
m.add(Dense(1, activation='sigmoid'))
I hope you understand what I'm getting at.
Add BatchNormalization as the first layer and it works as expected, though not exactly like the OP's example. You can see the detailed explanation here.
Both the OP's example and batch normalization use a learned mean and standard deviation of the input data during inference. But the OP's example uses a simple mean that gives every training sample equal weight, while the BatchNormalization layer uses a moving average that gives recently-seen samples more weight than older samples.
Importantly, batch normalization works differently from the OP's example during training. During training, the layer normalizes its output using the mean and standard deviation of the current batch of inputs.
A second distinction is that the OP's code produces an output with a mean of zero and a standard deviation of one. Batch Normalization instead learns a mean and standard deviation for the output that improves the entire network's loss. To get the behavior of the OP's example, Batch Normalization should be initialized with the parameters scale=False and center=False.
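A minimal sketch of that configuration, reusing the shapes from the question (tf.keras layer names; the middle of the network is elided as in the original):
from tensorflow import keras
from tensorflow.keras import layers

m = keras.Sequential()
m.add(layers.BatchNormalization(scale=False, center=False, input_shape=(21, 100, 3)))
m.add(layers.Conv2D(20, (5, 5), padding='valid'))
# ... rest of network
m.add(layers.Dense(1, activation='sigmoid'))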
There's now a Keras layer for this purpose, Normalization. At time of writing it is in the experimental module, keras.layers.experimental.preprocessing.
https://keras.io/api/layers/preprocessing_layers/core_preprocessing_layers/normalization/
Before you use it, you call the layer's adapt method with the data X you want to derive the scale from (i.e. mean and standard deviation). Once you do this, the scale is fixed (it does not change during training). The scale is then applied to the inputs whenever the model is used (during training and prediction).
from keras.layers.experimental.preprocessing import Normalization
norm_layer = Normalization()
norm_layer.adapt(X)
model = keras.Sequential()
model.add(norm_layer)
# ... Continue as usual.
Maybe you can use sklearn.preprocessing.StandardScaler to scale your data. This object allows you to save the scaling parameters. Then you can use mixed-type inputs in your model, let's say:
Your_model
[param1_scaler, param2_scaler]
Here are two links:
https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
https://keras.io/getting-started/functional-api-guide/
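A minimal sketch of that approach, assuming X_train and X_test are 2-D numpy arrays; the fitted scaler object can be pickled and kept alongside the model:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learns mean and std from the training data
X_test = scaler.transform(X_test)        # reuses the training statistics
# model.fit(X_train, y_train, ...)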
There's BatchNormalization, which learns the mean and standard deviation of the input. I haven't tried using it as the first layer of the network, but as I understand it, it should do something very similar to what you're looking for.

Repeating part of Keras model, depending on number of inputs

I'm trying to use part of Google Deepmind's CGQN network in Keras (Deepmind paper). Depending on how many input images they give to the network, the network understands more about the 3D environment it is trying to predict. Here is a scheme of their network:
I would also like to use multiple input "images" like they did with the Mθ network. So my question is: using Keras, how can I reuse a part of the network an arbitrary number of times and then sum all of the outputs it generates, which will be used as input to the next part of the network?
Thanks in advance!
You can achieve this using the functional API, I'll just give a proof-of-concept here:
images_in = Input(shape=(None, 32, 32, 3)) # Some number of 32x32 colour images
# think of it as a video, a sequence of images for example
shared_conv = Conv2D(32, 2, ...) # some shared layer that you want to apply to every image
features = TimeDistributed(shared_conv)(images_in) # applies shared_conv to every image
Here TimeDistributed applies a given layer across the time dimension, which in our case means it applies shared_conv to every image, and you'll get an output for every image. There are more examples in the documentation; you can implement a shared set of layers / submodel, apply it to every image, and then take the reduced sum, as shown below.
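A minimal sketch of the summation step, continuing the proof-of-concept above (the Flatten and the Lambda reduction are my assumptions about what follows the shared convolution):
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, TimeDistributed, Flatten, Lambda
from tensorflow.keras.models import Model

images_in = Input(shape=(None, 32, 32, 3))                 # variable number of 32x32 colour images
features = TimeDistributed(Conv2D(32, 2))(images_in)       # shared conv applied to every image
flat = TimeDistributed(Flatten())(features)                # flatten each image's feature map
summed = Lambda(lambda t: tf.reduce_sum(t, axis=1))(flat)  # sum over the image axis
model = Model(images_in, summed)                           # summed can feed the next sub-network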

How can I use tensorboard with tf.estimator.Estimator

I am considering to move my code base to tf.estimator.Estimator, but I cannot find an example on how to use it in combination with tensorboard summaries.
MWE:
import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.INFO)
# Declare list of features, we only have one real-valued feature
def model(features, labels, mode):
    # Build a linear model and predict values
    W = tf.get_variable("W", [1], dtype=tf.float64)
    b = tf.get_variable("b", [1], dtype=tf.float64)
    y = W * features['x'] + b
    loss = tf.reduce_sum(tf.square(y - labels))
    # Summaries to display for TRAINING and TESTING
    tf.summary.scalar("loss", loss)
    tf.summary.image("X", tf.reshape(tf.random_normal([10, 10]), [-1, 10, 10, 1]))  # dummy, my inputs are images
    # Training sub-graph
    global_step = tf.train.get_global_step()
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train = tf.group(optimizer.minimize(loss), tf.assign_add(global_step, 1))
    return tf.estimator.EstimatorSpec(mode=mode, predictions=y, loss=loss, train_op=train)
estimator = tf.estimator.Estimator(model_fn=model, model_dir='/tmp/tf')
# define our data set
x=np.array([1., 2., 3., 4.])
y=np.array([0., -1., -2., -3.])
input_fn = tf.contrib.learn.io.numpy_input_fn({"x": x}, y, 4, num_epochs=1000)
for epoch in range(10):
    # train
    estimator.train(input_fn=input_fn, steps=100)
    # evaluate our model
    estimator.evaluate(input_fn=input_fn, steps=10)
How can I display my two summaries in tensorboard? Do I have to register a hook in which I use a tf.summary.FileWriter or something else?
EDIT:
Upon testing (in v1.1.0, and probably in later versions as well), it is apparent that tf.estimator.Estimator will automatically write summaries for you. I confirmed this using OP's code and tensorboard.
(Some poking around r1.4 leads me to conclude that this automatic summary writing occurs due to tf.train.MonitoredTrainingSession.)
Ultimately, the automatic summarizing is accomplished with the use of hooks, so if you wanted to customize the Estimator's default summarizing, you could do so using hooks. Below are the (edited) details from the original answer.
You'll want to use hooks, formerly known as monitors. (Linked is a conceptual/quickstart guide; the short of it is that the notion of hooking into / monitoring training is built into the Estimator API. A bit confusingly, though, the deprecation of monitors in favor of hooks doesn't seem to be documented anywhere except in a deprecation annotation in the actual source code...)
Based on your usage, it looks like r1.2's SummarySaverHook fits the bill.
summary_hook = tf.train.SummarySaverHook(
    SAVE_EVERY_N_STEPS,
    output_dir='/tmp/tf',
    summary_op=tf.summary.merge_all())
You may want to customize the hook's initialization parameters, for example by providing an explicit SummaryWriter or by writing every N seconds instead of every N steps.
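For example, a time-based variant (a sketch; save_secs replaces the positional save_steps argument used above):
summary_hook = tf.train.SummarySaverHook(
    save_secs=120,                      # write summaries every two minutes instead of every N steps
    output_dir='/tmp/tf',
    summary_op=tf.summary.merge_all())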
If you pass this into the EstimatorSpec, you'll get your customized Summary behavior:
return tf.estimator.EstimatorSpec(mode=mode, predictions=y, loss=loss,
                                  train_op=train,
                                  training_hooks=[summary_hook])
EDIT NOTE:
A previous version of this answer suggested passing the summary_hook into estimator.train(input_fn=input_fn, steps=5, hooks=[summary_hook]). This does not work because tf.summary.merge_all() has to be called in the same context as your model graph.
For me this worked without adding any hooks or merge_all calls. I just added some tf.summary.image(...) in my model_fn, and when I train the model they magically appear in tensorboard. I'm not sure what the exact mechanism is, however. I'm using TensorFlow 1.4.
estimator = tf.estimator.Estimator(model_fn=model, model_dir='/tmp/tf')
The argument model_dir='/tmp/tf' means the estimator writes all its logs to /tmp/tf. Then run tensorboard --logdir=/tmp/tf and open your browser at http://localhost:6006 to see the graphs.
You can create a SummarySaverHook with tf.summary.merge_all() as the summary_op in the model_fn itself. Pass this hook to the training_hooks param of the EstimatorSpec constructor in your model_fn.
I don't think what #jagthebeetle said is exactly applicable here. The hooks that you pass to the estimator.train method cannot be run for the summaries that you define in your model_fn, since they won't be added to the merge_all op: they remain bound to the scope of model_fn.

How to use model.reset_states() in Keras?

I have sequential data, and I declared an LSTM model which predicts y from x in Keras. If I call model.predict(x1) and model.predict(x2), is it correct to call model.reset_states() between the two predict() calls explicitly? And model.reset_states() clears the history of inputs, not the weights, right?
# data1
x1 = [2,4,2,1,4]
y1 = [1,2,3,2,1]
# data2
x2 = [5,3,2,4,5]
y2 = [5,3,2,3,2]
And in my actual code, I use model.evaluate(). In evaluate(), is reset_states called implicitly for each data sample?
model.evaluate(dataX, dataY)
reset_states clears only the hidden states of your network. It's worth mentioning that, depending on whether the option stateful=True was set in your network, the behaviour of this function might differ. If it's not set, all states are automatically reset after every batch computation in your network (so e.g. after calling fit, predict and evaluate). If it is set, you should call reset_states every time you want to make consecutive model calls independent.
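A minimal sketch of the stateful case, using the question's data (assuming model is a stateful LSTM model built for batches of one sequence of five timesteps with one feature):
import numpy as np

x1 = np.array([2, 4, 2, 1, 4], dtype='float32').reshape(1, 5, 1)
x2 = np.array([5, 3, 2, 4, 5], dtype='float32').reshape(1, 5, 1)

model.predict(x1)      # the LSTM state now reflects x1
model.reset_states()   # clear it, so x2 is treated as an independent sequence
model.predict(x2)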
If you explicitly use either:
model.reset_states()
to reset the states of all layers in the model, or
layer.reset_states()
to reset the states of a specific stateful RNN layer (such as an LSTM layer), implemented here:
def reset_states(self, states=None):
    if not self.stateful:
        raise AttributeError('Layer must be stateful.')
this means your layer(s) must be stateful.
In an LSTM you need to (see the sketch after this list):
explicitly specify the batch size you are using, by passing a batch_size argument to the first layer in your model or a batch_input_shape argument,
set stateful=True,
specify shuffle=False when calling fit().
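A minimal sketch tying the three requirements together (the layer sizes, shapes, and the data X, y are placeholders):
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, batch_input_shape=(16, 10, 4), stateful=True))  # fixed batch size of 16
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, batch_size=16, epochs=5, shuffle=False)  # shuffle=False keeps sequences in order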
The benefits of using stateful models are probably best explained here.
