How to use custom activation on a custom keras layer - python-3.x

I want to create a Gaussian activation function whose mean and standard deviation are trained. Using a custom layer, I've returned the mean and std, but I'm unable to extract the learned parameters and feed them to the activation unit.
class MyLayer(Layer):
    def __init__(self, output_dim, m, st, **kwargs):
        self.units = output_dim
        self.m = m
        self.st = st
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        self.m = self.add_weight(name='mean',
                                 shape=(1, 1),
                                 initializer='uniform',
                                 trainable=True)
        self.std = self.add_weight(name='std',
                                   shape=(1, 1),
                                   initializer='uniform',
                                   trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim, m, std)
How can I use a custom activation function along with this custom layer?
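One hedged way to wire this up (a sketch assuming a TensorFlow-backed Keras; the class name GaussianDense and the use of K.epsilon() for numerical safety are my own choices, not from the question): create the mean and std as weights in build, then apply the Gaussian inside call instead of returning the parameters.

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class GaussianDense(Layer):  # illustrative name
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(GaussianDense, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        # trainable mean and std of the Gaussian activation
        self.m = self.add_weight(name='mean', shape=(1, 1),
                                 initializer='zeros', trainable=True)
        self.std = self.add_weight(name='std', shape=(1, 1),
                                   initializer='ones', trainable=True)
        super(GaussianDense, self).build(input_shape)

    def call(self, x):
        z = K.dot(x, self.kernel)
        # Gaussian activation with the learned mean and std
        return K.exp(-K.square(z - self.m) / (2 * K.square(self.std) + K.epsilon()))

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

Keeping the activation inside call avoids having to pass the learned parameters out of the layer; compute_output_shape then only needs to report the dense output shape.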

Related

Where and how are the parameters initialised to calculate the gradients during loss.backwards() in Pytorch?

I have two neural networks, Child1 and Child2. The output of Child1 is fed to Child2, and the output of Child2 is fed back to Child1. Each one has its own cost function. I have created a Parent module to pack both modules into a single module. I did this because I need the backpropagation (derivative calculation) of each loss function to be done with respect to the weights of both networks combined (Child1 + Child2). Can I do it this way? Can I backpropagate through my Parent module and get the gradient of each loss function with respect to the combined weights?
class Parent(nn.Module):
    def __init__(self, in_features, z_dim):
        super().__init__()
        self.my_child1 = Child1(z_dim)
        self.my_child2 = Child2(in_features)

    def forward(self, input):
        input = self.my_child1(input)
        input = self.my_child2(input)
        return input

class Child1(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.child1 = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.LeakyReLU(0.01),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.child1(x)

class Child2(nn.Module):
    def __init__(self, z_dim, img_dim):
        super().__init__()
        self.child2 = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.LeakyReLU(0.01),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.child2(x)
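For what it's worth, here is a minimal sketch of the mechanics with two stand-alone toy modules (the module sizes, losses, and optimizer below are illustrative placeholders, not taken from the question): a single optimizer over the combined parameters, together with backward() on the sum of both losses, accumulates gradients into every weight that took part in the forward graph.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# toy stand-ins for Child1 and Child2, sized so they chain together
child1 = nn.Linear(8, 4)
child2 = nn.Linear(4, 2)
parent = nn.ModuleDict({'c1': child1, 'c2': child2})

# one optimizer over the combined weights of both children
optimizer = optim.SGD(parent.parameters(), lr=1e-2)

x = torch.randn(16, 8)
t1 = torch.randn(16, 4)   # target for child1's output
t2 = torch.randn(16, 2)   # target for child2's output

optimizer.zero_grad()
out1 = child1(x)
out2 = child2(out1)
loss1 = F.mse_loss(out1, t1)   # only touches child1's weights
loss2 = F.mse_loss(out2, t2)   # reaches child1's weights through out1
(loss1 + loss2).backward()
optimizer.step()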

Constructing parameter groups in pytorch

In the torch.optim documentation, it is stated that model parameters can be grouped and optimized with different optimization hyperparameters. It says that
For example, this is very useful when one wants to specify per-layer learning rates:

    optim.SGD([
        {'params': model.base.parameters()},
        {'params': model.classifier.parameters(), 'lr': 1e-3}
    ], lr=1e-2, momentum=0.9)

This means that model.base’s parameters will use the default learning rate of 1e-2, model.classifier’s parameters will use a learning rate of 1e-3, and a momentum of 0.9 will be used for all parameters.
I was wondering how to define such groups that have parameters() attribute. What came to my mind was something in the form of
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base()
        self.classifier()
        self.relu = nn.ReLU()

    def base(self):
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)

    def classifier(self):
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)
How should I modify the snippet above so that I can call model.base.parameters()? Is the only way to define an nn.ParameterList and explicitly add the weights and biases of the desired layers to that list? What is the best practice?
I will show three approaches to solving this. In the end though, it comes down to personal preference.
- Grouping parameters with nn.ModuleDict.
I noticed an answer here using nn.Sequential to group the layers, which lets you target different sections of the model through nn.Sequential's parameters attribute. However, base and classifier may be more than just sequential layers. I believe a more general approach is to leave the module as is and instead initialize an additional nn.ModuleDict that holds all parameters, ordered by optimization group, in separate nn.ModuleLists:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.relu = nn.ReLU()  # used in forward
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        self.params = nn.ModuleDict({
            'base': nn.ModuleList([self.fc1, self.fc2]),
            'classifier': nn.ModuleList([self.fc3, self.fc4])})

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)
Then you can define your optimizer with:
optim.SGD([
    {'params': model.params.base.parameters()},
    {'params': model.params.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Do note MyModel's parameters' generator won't contain duplicate parameters.
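A quick way to check that claim (assuming the MyModel class just above): count how many distinct tensors parameters() yields.

model = MyModel()
params = list(model.parameters())
print(len(params), len({id(p) for p in params}))  # equal counts: each weight appears exactly once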
- Creating an interface for accessing parameter groups.
A different solution is to provide an interface in the nn.Module to separate the parameters into groups:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.relu = nn.ReLU()  # used in forward
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)

    def base_params(self):
        return chain.from_iterable(m.parameters() for m in [self.fc1, self.fc2])

    def classifier_params(self):
        return chain.from_iterable(m.parameters() for m in [self.fc3, self.fc4])
This assumes chain has been imported from itertools (from itertools import chain).
Then define your optimizer with:
optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
- Using child nn.Modules.
Lastly, you can define your module sections as submodules (here this amounts to the same thing as the nn.Sequential method, but you can generalize it to arbitrary submodules).
class Base(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(1, 512),
                         nn.ReLU(),
                         nn.Linear(512, 264),
                         nn.ReLU())

class Classifier(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(264, 128),
                         nn.ReLU(),
                         nn.Linear(128, 964))

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = Base()
        self.classifier = Classifier()

    def forward(self, y0):
        features = self.base(y0)
        out = self.classifier(features)
        return out
Here again you can use the same interface as the first method:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
I would argue this is the best practice. However, it forces you to define each of your components as a separate nn.Module, which can be a hassle when experimenting with more complex models.
You can use torch.nn.Sequential to define base and classifier. Your class definition can then be:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 264), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(264, 128), nn.ReLU(), nn.Linear(128, 964))

    def forward(self, y0):
        return self.classifier(self.base(y0))
Then, you can access parameters using model.base.parameters() and model.classifier.parameters().
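As a small, hedged follow-up (using the MyModel just above), you can confirm the grouping by inspecting the optimizer's param_groups after construction:

model = MyModel()
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

for i, group in enumerate(optimizer.param_groups):
    # each group carries its own lr; momentum comes from the shared defaults
    print(i, group['lr'], sum(p.numel() for p in group['params']))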

How to implement some trainable parameters in the model of Keras like nn.Parameters() in Pytorch?

I want to implement some trainable parameters in my model with Keras. In PyTorch, we can do it using torch.nn.Parameter(), like below:
self.a = nn.Parameter(torch.ones(8))
self.b = nn.Parameter(torch.zeros(16,8))
I think doing this in PyTorch adds trainable parameters to the model. Now I'd like to know how to achieve a similar thing in Keras.
Any suggestions or advice are welcome!
Thanks! :)
P.S. I just wrote a custom layer in Keras, as below:
class Mylayer(Layer):
    def __init__(self, input_dim, output_dim, **kwargs):
        self.input_dim = input_dim
        self.output_dim = output_dim
        super(Mylayer, self).__init__(**kwargs)

    def build(self):
        self.kernel = self.add_weight(name='pi',
                                      shape=(self.input_dim, self.output_dim),
                                      initializer='zeros',
                                      trainable=True)
        self.kernel_2 = self.add_weight(name='mean',
                                        shape=(self.input_dim, self.output_dim),
                                        initializer='ones',
                                        trainable=True)
        super(Mylayer, self).build()

    def call(self, x):
        return x, self.kernel, self.kernel_2
I also want to know: if I do not change the tensor that passes through the layer, is it necessary to write compute_output_shape()?
You need to create the trainable weights in a custom layer:
class MyLayer(Layer):
    def __init__(self, my_args, **kwargs):
        # do whatever you need with my_args
        super(MyLayer, self).__init__(**kwargs)

    # you create the weights in build:
    def build(self, input_shape):
        # use the input_shape to infer the necessary shapes for weights
        # use self.whatever_you_registered_in_init to help you, like units, etc.
        self.kernel = self.add_weight(name='kernel',
                                      shape=the_shape_you_calculated,
                                      initializer='uniform',
                                      trainable=True)
        # create as many weights as necessary for this layer

        # build the layer - equivalent to self.built = True
        super(MyLayer, self).build(input_shape)

    # create the layer operation here
    def call(self, inputs):
        # do whatever operations are needed
        # example:
        return inputs * self.kernel  # make sure the shapes are compatible

    # tell keras about the output shape of your layer
    def compute_output_shape(self, input_shape):
        # calculate the output shape based on the input shape and your layer's rules
        return calculated_output_shape
Now use your layer in the model.
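To make the template concrete, here is a hedged, minimal example of my own (not from the answer above): a layer that scales its input element-wise with a trainable vector, used inside a small tf.keras model.

from tensorflow.keras.layers import Layer, Input, Dense
from tensorflow.keras.models import Model

class ScaleLayer(Layer):  # hypothetical example layer
    def build(self, input_shape):
        # one trainable scale per input feature
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1],),
                                      initializer='ones',
                                      trainable=True)
        super(ScaleLayer, self).build(input_shape)

    def call(self, inputs):
        return inputs * self.kernel

    def compute_output_shape(self, input_shape):
        return input_shape  # the shape is unchanged

inp = Input(shape=(16,))
out = Dense(4)(ScaleLayer()(inp))
model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')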
If you are using eager execution with TensorFlow and writing a custom training loop, you can work pretty much the same way you do in PyTorch: you can create weights outside of layers with tf.Variable and pass them to the gradient calculation methods.
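A minimal sketch of that eager-mode style, assuming TensorFlow 2.x (the computation and loss below are placeholders of my own):

import tensorflow as tf

# weights created outside any layer, analogous to nn.Parameter in PyTorch
a = tf.Variable(tf.ones([8]))
b = tf.Variable(tf.zeros([16, 8]))

x = tf.random.normal([16, 8])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    y = b * x + a                        # some computation that uses the variables
    loss = tf.reduce_mean(tf.square(y))
grads = tape.gradient(loss, [a, b])      # gradients w.r.t. the free-standing variables
optimizer.apply_gradients(zip(grads, [a, b]))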

Understanding Pytorch vanilla RNN architectures

Standard interpretation: in the original RNN, the hidden state and output are calculated as
In other words, we obtain the output from the hidden state.
According to Wiki, the RNN architecture can be unfolded like this:
The code I have been using looks like this:
class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_size, hidden_dim, 1)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        out, hidden = self.rnn(x)
        # getting output from the hidden state
        out = out.view(-1, self.hidden_dim)
        out = self.fc(out)
        return out, hidden
RNN as "pure" feed-forward layers: but today, I saw another implementation in the PyTorch tutorial.
Their code looks like this:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
The hidden-state calculation is the same as in the standard interpretation, but the output is calculated independently of the current hidden state h.
To me, the math behind this implementation is:
So, is this implementation different from the original RNN formulation?
I have been using RNNs for almost a year and thought I understood them, until I saw this PyTorch tutorial today. I am really confused now.
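For reference, one hedged way to write the two variants side by side (notation is mine, not taken from either source): the standard formulation reads the output off the new hidden state, while the tutorial's version computes both the new hidden state and the output directly from the concatenation of the current input and the previous hidden state.

standard: $h_t = \tanh(W_{ih} x_t + W_{hh} h_{t-1} + b_h), \qquad o_t = W_{ho} h_t + b_o$

tutorial: $h_t = W_h [x_t ; h_{t-1}] + b_h, \qquad o_t = \operatorname{logsoftmax}(W_o [x_t ; h_{t-1}] + b_o)$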

LSTM Architecture

I'm trying to understand whether my two-layer LSTM matches the one in the image. Can someone help me?
P.S. I have removed the mean pooling layer.
Input layer (green), hidden LSTM layers (blue), linear layer (yellow):
https://ibb.co/1qN6q1m
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layer = 2
        self.i2h = nn.LSTM(input_size, hidden_size, self.num_layer, dropout=0.5)
        self.i2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
        self.hidden = self.initHidden()

    def forward(self, input):
        lstmout, self.hidden = self.i2h(input.view(len(input), 1, -1), self.hidden)
        output = self.i2o(lstmout.view(len(input), -1))
        output = self.softmax(output)
        return output

    def initHidden(self):
        return (torch.zeros(self.num_layer, 1, self.hidden_size).to(device),
                torch.zeros(self.num_layer, 1, self.hidden_size).to(device))
