I am trying to introduce sparsity in the training samples. My data matrix has a size of (say) NxP, and I want to pass it through a Keras layer whose weights have the same shape as the input; that is, the trainable weight matrix W has shape NxP. I want this layer to compute the Hadamard product (element-wise multiplication) of the input matrix with W. How do I get a trainable layer for W in this case?
EDIT:
By the way, thank you so much for the quick reply. However, the Hadamard product I want is between two matrices: one is the input, call it X, which has shape NxP, and I want the kernel of the Hadamard layer to have the same shape as X, i.e. NxP as well. The element-wise multiplication of the two matrices is what the call function computes.
But the current implementation gives the kernel a size of P only. I also tried changing the shape of the kernel in build as follows:
self.kernel = self.add_weight(name='kernel',
                              shape=input_shape,
                              initializer='uniform',
                              trainable=True)
But it gives me the error below:
TypeError: Failed to convert object of type to Tensor. Contents: (None, 16). Consider casting elements to a supported type.
Here P is 16, and I will only get N at runtime; N corresponds to the number of training samples.
Thank you in advance for the help.
Take the example from the documentation on writing a custom layer, and in the call function just return x * self.kernel.
This is my POC:
from keras import backend as K
from keras.engine.topology import Layer
from keras.models import Sequential
from keras.layers import Dense, Activation
import numpy as np

np.random.seed(7)


class Hadamard(Layer):

    def __init__(self, **kwargs):
        super(Hadamard, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(1,) + input_shape[1:],
                                      initializer='uniform',
                                      trainable=True)
        super(Hadamard, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        print(x.shape, self.kernel.shape)
        return x * self.kernel

    def compute_output_shape(self, input_shape):
        print(input_shape)
        return input_shape


N = 10
P = 64

model = Sequential()
model.add(Dense(128, input_shape=(N, P), activation='relu'))
model.add(Dense(64))
model.add(Hadamard())
model.add(Activation('relu'))
model.add(Dense(32))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())

model.fit(np.ones((10, N, P)), np.ones((10, N, 1)))
print(model.predict(np.ones((20, N, P))))
If you need to use it as the first layer, you should include the input shape parameter:
N = 10
P = 64
model = Sequential()
model.add(Hadamard(input_shape=(N, P)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
This results in:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hadamard_1 (Hadamard) (None, 10, 64) 640
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
I am trying to implement a Bayesian CNN with MC Dropout in PyTorch. The main idea is that by applying dropout at test time and running many forward passes, you get predictions from a variety of different models.
I found an implementation of MC Dropout, but I do not really understand how they applied this method and how exactly they chose the correct prediction from the list of predictions.
Here is the code:
def mcdropout_test(model):
    model.train()
    test_loss = 0
    correct = 0
    T = 100
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output_list = []
        for i in xrange(T):
            output_list.append(torch.unsqueeze(model(data), 0))
        output_mean = torch.cat(output_list, 0).mean(0)
        test_loss += F.nll_loss(F.log_softmax(output_mean), target, size_average=False).data[0]  # sum up batch loss
        pred = output_mean.data.max(1, keepdim=True)[1]  # get the index of the max log-probability
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nMC Dropout Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


train()
mcdropout_test()
I have replaced
data, target = Variable(data, volatile=True), Variable(target)
by adding
with torch.no_grad(): at the beginning
And this is how I have defined my CNN:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 192, 5, padding=2)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(192, 192, 5, padding=2)
        self.fc1 = nn.Linear(192 * 8 * 8, 1024)
        self.fc2 = nn.Linear(1024, 256)
        self.fc3 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(p=0.3)

        nn.init.xavier_uniform_(self.conv1.weight)
        nn.init.constant_(self.conv1.bias, 0.0)
        nn.init.xavier_uniform_(self.conv2.weight)
        nn.init.constant_(self.conv2.bias, 0.0)
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.constant_(self.fc1.bias, 0.0)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.constant_(self.fc2.bias, 0.0)
        nn.init.xavier_uniform_(self.fc3.weight)
        nn.init.constant_(self.fc3.bias, 0.0)

    def forward(self, x):
        x = self.pool(F.relu(self.dropout(self.conv1(x))))  # recommended to add the relu
        x = self.pool(F.relu(self.dropout(self.conv2(x))))  # recommended to add the relu
        x = x.view(-1, 192 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(self.dropout(x)))
        x = self.fc3(self.dropout(x))  # no activation function needed for the last layer
        return x
Can anyone help me get the right implementation of the Monte Carlo Dropout method for a CNN?
Implementing MC Dropout in PyTorch is easy. All that needs to be done is to set the dropout layers of your model to train mode. This allows a different dropout mask to be used in each forward pass. Below is an implementation of MC Dropout in PyTorch illustrating how multiple predictions from the various forward passes are stacked together and used for computing different uncertainty metrics.
import sys

import numpy as np
import torch
import torch.nn as nn


def enable_dropout(model):
    """ Function to enable the dropout layers during test-time """
    for m in model.modules():
        if m.__class__.__name__.startswith('Dropout'):
            m.train()


def get_monte_carlo_predictions(data_loader,
                                forward_passes,
                                model,
                                n_classes,
                                n_samples):
    """ Function to get the monte-carlo samples and uncertainty estimates
    through multiple forward passes

    Parameters
    ----------
    data_loader : object
        data loader object from the data loader module
    forward_passes : int
        number of monte-carlo samples/forward passes
    model : object
        pytorch model
    n_classes : int
        number of classes in the dataset
    n_samples : int
        number of samples in the test set
    """
    dropout_predictions = np.empty((0, n_samples, n_classes))
    softmax = nn.Softmax(dim=1)
    for i in range(forward_passes):
        predictions = np.empty((0, n_classes))
        model.eval()
        enable_dropout(model)
        for i, (image, label) in enumerate(data_loader):
            image = image.to(torch.device('cuda'))
            with torch.no_grad():
                output = model(image)
                output = softmax(output)  # shape (batch_size, n_classes)
            predictions = np.vstack((predictions, output.cpu().numpy()))

        dropout_predictions = np.vstack((dropout_predictions,
                                         predictions[np.newaxis, :, :]))
        # dropout_predictions - shape (forward_passes, n_samples, n_classes)

    # Calculating mean across multiple MCD forward passes
    mean = np.mean(dropout_predictions, axis=0)  # shape (n_samples, n_classes)

    # Calculating variance across multiple MCD forward passes
    variance = np.var(dropout_predictions, axis=0)  # shape (n_samples, n_classes)

    epsilon = sys.float_info.min
    # Calculating entropy across multiple MCD forward passes
    entropy = -np.sum(mean * np.log(mean + epsilon), axis=-1)  # shape (n_samples,)

    # Calculating mutual information across multiple MCD forward passes
    mutual_info = entropy - np.mean(np.sum(-dropout_predictions * np.log(dropout_predictions + epsilon),
                                           axis=-1), axis=0)  # shape (n_samples,)

    # Return the per-sample statistics so they can be used by the caller.
    return mean, variance, entropy, mutual_info
Moving on to the implementation posted in the question above, multiple predictions from T different forward passes are obtained by first setting the entire model to train mode (model.train()). Note that this is not desirable, because unwanted stochasticity will be introduced in the predictions if the model contains layers other than dropout, such as batch norm. Hence the best way is to set only the dropout layers to train mode, as shown in the snippet above.
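For completeness, here is a hedged usage sketch with a toy model and random data (everything named toy_* below is a placeholder, not from the original answer; it also assumes a CUDA device is available, since the function above moves each batch to 'cuda'):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: a tiny classifier containing a Dropout layer, and a random "test set".
toy_model = nn.Sequential(nn.Flatten(),
                          nn.Dropout(p=0.3),
                          nn.Linear(3 * 8 * 8, 10)).cuda()
toy_images = torch.randn(32, 3, 8, 8)
toy_labels = torch.randint(0, 10, (32,))
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels), batch_size=8)

mean, variance, entropy, mutual_info = get_monte_carlo_predictions(
    data_loader=toy_loader, forward_passes=20, model=toy_model,
    n_classes=10, n_samples=32)

print(mean.shape)     # (32, 10) - averaged class probabilities per sample
print(entropy.shape)  # (32,)    - predictive entropy per sample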
I don't know the name of what I'm looking for, but I want to make a layer in Keras where each input is multiplied by its own independent weight and shifted by its own bias. E.g. if there were 10 inputs, there would be 10 weights and 10 biases, and each input would be multiplied by its weight and summed with its bias to get the 10 outputs.
For example, here is a simple Dense network:
from keras.layers import Input, Dense
from keras.models import Model
N = 10
input = Input((N,))
output = Dense(N)(input)
model = Model(input, output)
model.summary()
As you can see, this model has 110 parameters, because it is fully connected:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 10) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 110
=================================================================
Total params: 110
Trainable params: 110
Non-trainable params: 0
_________________________________________________________________
I want to replace output = Dense(N)(input) with something like output = SinglyConnected()(input), such that the model now has 20 parameters: 10 weights and 10 biases.
Create a custom layer:
from keras.engine.topology import Layer


class SingleConnected(Layer):

    # creator
    def __init__(self, **kwargs):
        super(SingleConnected, self).__init__(**kwargs)

    # creates weights
    def build(self, input_shape):
        weight_shape = (1,) * (len(input_shape) - 1)
        weight_shape = weight_shape + (input_shape[-1],)  # (....., input)
        self.kernel = self.add_weight(name='kernel',
                                      shape=weight_shape,
                                      initializer='uniform',
                                      trainable=True)
        self.bias = self.add_weight(name='bias',
                                    shape=weight_shape,
                                    initializer='zeros',
                                    trainable=True)
        self.built = True

    # operation:
    def call(self, inputs):
        return (inputs * self.kernel) + self.bias

    # output shape
    def compute_output_shape(self, input_shape):
        return input_shape

    # for saving the model - only necessary if you have parameters in __init__
    def get_config(self):
        config = super(SingleConnected, self).get_config()
        return config
Use the layer:
model.add(SingleConnected())
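As a quick sanity check against the parameter count in the question (a minimal sketch, reusing the SingleConnected layer defined above):

from keras.layers import Input
from keras.models import Model

N = 10
inp = Input((N,))
out = SingleConnected()(inp)
check_model = Model(inp, out)
check_model.summary()  # should report 20 trainable parameters: 10 weights + 10 biases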
Following the "Temporal Encoding" section on page 5 of https://arxiv.org/pdf/1503.08895.pdf (an excellent paper, by the way), I have, say, N embedded vectors of dimension M. So my Keras tensor has shape (batch size, N, M), and I want to add an N x M matrix of weights to each of the batch-size-many samples. To that end I've created my own Keras layer:
from keras import backend as K
from keras.engine.topology import Layer
from keras.layers import Add
from keras.initializers import RandomNormal
from constants import BATCH_SIZE


class Added_Weights(Layer):

    def __init__(self, input_dim, output_dim, **kwargs):
        self.output_dim = output_dim
        self.input_dim = input_dim
        super(Added_Weights, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(BATCH_SIZE, self.input_dim[0], self.input_dim[1]),
                                      initializer=RandomNormal(mean=0., stddev=0.05, seed=None),
                                      trainable=True)
        print("kernel has shape " + str(self.kernel.shape) + " or " + str(K.int_shape(self.kernel)))
        super(Added_Weights, self).build(input_shape)

    def call(self, x, **kwargs):
        return Add()([x, self.kernel])

    def compute_output_shape(self, input_shape):
        return (BATCH_SIZE, self.input_dim[0], self.input_dim[1])
And this WORKS, but the problem is that each of the BATCH_SIZE matrices has different weights. I need to add the same weights to each of the samples in the batch.
So I've tried a couple of things. Keras has a built-in RepeatVector layer, so I tried giving the kernel shape (N, M) and doing RepeatVector(BATCH_SIZE)(kernel), but for some reason that ends up with shape (N, BATCH_SIZE, M). I'd like to use a Reshape there, but Reshape() treats the first dimension as the batch size and won't allow me to modify it. Permute() has the same problem.
Another thought was to make the initial shape as it is in the code, and then loop over the tensor to set slices 1 through BATCH_SIZE-1 equal to slice 0, so they're all holding the same weights, but I'm not allowed to assign values to Keras tensors that way.
The only other thought I had was to just try it with shape (N, M) and hope Keras is smart enough to add it to each slice of the input, but after the Add() is applied to my (?, N, M) and the (N, M) kernel, somehow I end up with an (N, N, M) tensor, at which point we're dead.
I think you are overcomplicating things. Just define the weights as an N x M tensor in build and add them to the input tensor in call. I tweaked your code as follows:
from keras.engine.topology import Layer
from keras.models import Model
from keras.layers import Input
import numpy as np

N = 3
M = 4
BATCH_SIZE = 1


class Added_Weights(Layer):

    def __init__(self, **kwargs):
        super(Added_Weights, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], input_shape[2]),
                                      initializer='ones',  # TODO: Choose your initializer
                                      trainable=True)
        super(Added_Weights, self).build(input_shape)

    def call(self, x, **kwargs):
        # Implicit broadcasting occurs here.
        # Shape x: (BATCH_SIZE, N, M)
        # Shape kernel: (N, M)
        # Shape output: (BATCH_SIZE, N, M)
        return x + self.kernel

    def compute_output_shape(self, input_shape):
        return input_shape


a = Input(shape=(N, M))
layer = Added_Weights()(a)
model = Model(inputs=a,
              outputs=layer)

a = np.zeros(shape=(BATCH_SIZE, N, M))
pred = model.predict(a)
print(pred)
Note that self.kernel is being implicitly broadcast in call to match the shape of x, so the same weights are being added to each sample in the batch.
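As a quick check (a hedged sketch reusing the model built above), the layer should hold a single shared N x M kernel, i.e. 3 * 4 = 12 trainable parameters, rather than one matrix per sample:

# The summary should report 12 trainable parameters (one shared 3 x 4 kernel),
# confirming that no per-sample weights are created.
model.summary()
print(model.count_params())  # expected: 12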
I have created the following SimpleRNN using Keras:
X = X.reshape((X.shape[0], X.shape[1], 1))
tr_X, ts_X, tr_y, ts_y = train_test_split(X, y, train_size=.8)
batch_size = 1000
print('RNN model...')
model = Sequential()
model.add(SimpleRNN(64, activation='relu', batch_input_shape=(batch_size, X.shape[1], 1)))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print (model.summary())
print ('\n')
model.fit(tr_X, tr_y,
          batch_size=batch_size, epochs=1,
          shuffle=True, validation_data=(ts_X, ts_y))
For the model summary, I get the following:
Layer (type) Output Shape Param #
=================================================================
simple_rnn_1 (SimpleRNN) (1000, 64) 4224
_________________________________________________________________
dense_1 (Dense) (1000, 1) 65
=================================================================
Total params: 4,289
Trainable params: 4,289
Non-trainable params: 0
_________________________________________________________________
I have a dataset of 10,000 samples and 64 features, and my goal is to train a classification model on it (the class labels are binary, 0 and 1). Now, I am trying to understand what is going on here. As seen in the 'Output Shape' column, simple_rnn_1 has (1000, 64). I interpret this as 1000 rows (the batch) and 64 features. Assuming the code above is logically correct, my questions are:
How does the RNN handle this matrix (i.e., (1000, 64))? Does it feed in each column, something like in this figure?
Should the SimpleRNN() units always be equal to the number of features?
Thank you
In the code, you defined batch_input_shape with shape (batch_size, X.shape[1], 1), which means that you will feed the RNN batch_size examples, where each example contains X.shape[1] time steps (the number of pink boxes in your figure) and each time step has shape 1 (a scalar).
So yes, an input shape of (1000, 64, 1) will work exactly like you said: each column will be input to the RNN.
No! units is your output dimension. Usually more units means a more complex network (just like in a regular neural network), i.e. more parameters to learn. units is the size of the RNN's internal state.
(So, in your example, if you declare units=2000, your output will be (1000, 2000).)
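A minimal sketch illustrating this (the units values are arbitrary examples; batch_input_shape matches the question):

from keras.models import Sequential
from keras.layers import SimpleRNN

# The input handling is identical in both cases (batches of 64 time steps,
# 1 feature each); only the output/state size changes with `units`.
for units in (64, 2000):
    m = Sequential()
    m.add(SimpleRNN(units, activation='relu', batch_input_shape=(1000, 64, 1)))
    print(units, m.output_shape)  # (1000, 64) and (1000, 2000) respectively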
I have designed a layer in Keras. It is the first layer of the network. The input to this layer must be an RGB image, i.e. of shape (height, width, 3). However, when I run the code, I get the following error:
ValueError: Layer sequential_1 was called with an input that isn't a symbolic tensor. Received type: . Full input: [<main.CountPix object at 0x7fa9a5e81518>]. All inputs to the layer should be tensors.
How should I input my image or what should I modify in my layer?
from keras.engine.topology import Layer

class CountPix(Layer):
    def __init__(self, **kwargs):
        super(CountPix, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel', shape=(200, 200, 3), initializer='uniform', trainable=True)
        super(CountPix, self).build(input_shape)  # Be sure to call this somewhere!
You need to define an input.
from keras.layers import Input
input_X = Input(shape=(height, width, 3), dtype='float32', name='input_image')
Also, for your self.kernel line to work, you need to explicitly tell Keras the input shape of the layer, similar to this example:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, input_shape=(height, width, 3)))
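Applying the same idea to the question's layer, a hedged sketch (the height and width of 200 follow the kernel shape in the question; the random batch is purely illustrative):

import numpy as np
from keras.models import Sequential

height, width = 200, 200

count_model = Sequential()
count_model.add(CountPix(input_shape=(height, width, 3)))  # first layer gets input_shape
count_model.summary()

# Feed actual image arrays (not the layer object itself) to the model.
images = np.random.rand(4, height, width, 3).astype('float32')
features = count_model.predict(images)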