Replacing layer weights with torch.sparse tensors - pytorch

Let's say I have a layer defined as C = nn.Conv2d(1, 3, 3, bias=False), i.e., 1 input channel, 3 output channels, and a kernel size of 3x3. The internal weight of this layer is thus a tensor of shape (3, 1, 3, 3); I can access it with C.weight.data. Now suppose that this internal weight is very sparse: it is full of zeros and has only a few nonzero values. I can easily construct a sparse tensor from the weight by:
idx = C.weight.data.nonzero().T                 # indices of the nonzero entries, shape (4, nnz)
values = C.weight.data[C.weight.data != 0]      # the nonzero values themselves
sp_T = torch.sparse.FloatTensor(idx, values, C.weight.data.size())
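(As an aside, newer PyTorch versions can build the same thing in one call with to_sparse(), though I'm not sure how far back that API goes:)
sp_T = C.weight.data.to_sparse()   # same sparse COO tensor as constructed above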
Is it possible to store the conv layer's weights as this sparse tensor somehow? I tried simply doing C.weight.data = sp_T but it throws an error. It would be pretty convenient if we could store all the weights in a model in this sparsified way.

Related

CrossEntropyLoss for multi-label time series

I'm confused about how to apply cross-entropy loss to my time series model, where the output has shape [batch_size, classes, time_steps] and the target has shape [batch_size, time_steps, classes]. I'm trying to make the model determine the confidence of the 16 classes at each time step. With the following approach, I get a large loss and the model doesn't seem to be learning:
batch_size = 256
time_steps = 224
classes = 16
y_est = torch.randn((batch_size, classes, time_steps))
y_true = torch.randn((batch_size, time_steps, classes)).view(batch_size, classes, -1)
loss = torch.nn.functional.cross_entropy(y_est, y_true)
Do you think I've made a mistake here?
The PyTorch documentation for CrossEntropyLoss says:
Input shape: (N, C, d1, ..., dk)
Target shape: (N, d1, ..., dk)
where N is the batch size and C is the number of classes, with K >= 1 in the case of K-dimensional loss.
So based on the docs, the code should be:
batch_size = 256
time_steps = 224
classes = 16
y_est = torch.randn((batch_size, classes, time_steps))
y_true = torch.randint(0, classes, (batch_size, time_steps))  # class indices, not one-hot
loss = torch.nn.functional.cross_entropy(y_est, y_true)
As #Hatem described, your target tensor should have one dimension fewer than the prediction tensor, because it is not a one-hot encoding but a dense encoding (each value is the class label itself), whereas your prediction tensor contains a score distribution across all possible classes.
So here, since your prediction tensor y_est has shape (batch_size, classes, time_steps), your target tensor should have shape (batch_size, time_steps). If your target is in one-hot format, you can easily convert it to the required format with torch.argmax:
loss = F.cross_entropy(y_est, y_true.argmax(1))
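Putting it together, here is a minimal self-contained sketch (with made-up random data) that runs end to end:
import torch
import torch.nn.functional as F

batch_size, classes, time_steps = 256, 16, 224

y_est = torch.randn(batch_size, classes, time_steps)      # raw logits
y_true_onehot = F.one_hot(                                # fake one-hot targets
    torch.randint(0, classes, (batch_size, time_steps)), classes
)                                                         # (batch, time, classes)

# argmax over the class dimension recovers dense class indices
loss = F.cross_entropy(y_est, y_true_onehot.argmax(-1))
print(loss)   # a scalar around -log(1/16) ~ 2.77 for random logits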

Graph disconnected: cannot obtain value for tensor Tensor

I have to train a GAN network with Generator and Discriminator. My Generator Network is as below.
def Generator(image_shape=(512,512,3)):
    inputs = Input(image_shape)
    # 5 convolution layers
    # 5 deconvolution layers along with concatenation
    # output shape is (512,512,3)
    model = Model(inputs=inputs, outputs=outputs, name='Generator')
    return model, outputs
My Discriminator network is as below. The first step in the discriminator is to concatenate the discriminator's input with the generator's output.
def Discriminator(Generator_output, image_shape=(512,512,3)):
    inputs = Input(image_shape)
    concatenated_input = concatenate([Generator_output, inputs], axis=-1)
    # Now start applying convolution layers on concatenated_input
    # Deconvolution layers
    return Model(inputs=inputs, outputs=outputs, name='Discriminator')
Initiating the Architectures
G, Generator_output = Generator(image_shape=(512,512,3))
G.summary()
D = Discriminator(Generator_output, image_shape=(512,512,3))
D.summary()
My problem is that when I pass concatenated_input to the convolution layers, I get the following error:
Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 512, 512, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
If I remove the concatenation layer it works perfectly, but why doesn't it work after the concatenation layer, even though the shapes of inputs and Generator_output in the concatenation are the same, i.e. (512,512,3)?
The key insight that will help you here is that Models are just like layers in Keras, but self-contained. So to connect one model's output to another, you need to say the second model receives an input of matching shape, rather than directly passing that tensor:
def Discriminator(gen_output_shape, image_shape=(512,512,3)):
    inputs = Input(image_shape)
    gen_output = Input(gen_output_shape)
    concatenated_input = concatenate([gen_output, inputs], axis=-1)
    # Now start applying convolution layers on concatenated_input
    # Deconvolution layers
    return Model(inputs=[inputs, gen_output], outputs=outputs, name='Discriminator')
And then you can use it like a layer:
G, _ = Generator(image_shape=(512,512,3))
D = Discriminator((512,512,3), image_shape=(512,512,3))

generator_input = Input((512,512,3))
some_other_image_input = Input((512,512,3))
# the models are used like layers: the output of G is connected
# to the second input of D
discriminator_output = D([some_other_image_input, G(generator_input)])
D.summary()
gan = Model(inputs=[all,your,inputs], outputs=[outputs,for,training])
# you can still use G and D like separate models, save them, train them, etc.
To train them together, you can create another Model that has all the required inputs and calls the generator / discriminator. Think of it as a lock-and-key idea: every model has some inputs, and you can use models like layers in another Model as long as you provide the correct inputs.
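As a concrete illustration of that lock-and-key idea, here is a minimal sketch of the full wiring (the convolution stacks are single placeholder layers, since the real ones are elided in the question):
from keras.layers import Input, Conv2D, concatenate
from keras.models import Model

def Generator(image_shape=(512,512,3)):
    inputs = Input(image_shape)
    outputs = Conv2D(3, 3, padding='same')(inputs)    # stand-in for the real conv/deconv stack
    return Model(inputs, outputs, name='Generator')

def Discriminator(gen_output_shape, image_shape=(512,512,3)):
    inputs = Input(image_shape)
    gen_output = Input(gen_output_shape)
    x = concatenate([gen_output, inputs], axis=-1)
    outputs = Conv2D(1, 3, padding='same')(x)         # stand-in for the real discriminator stack
    return Model([inputs, gen_output], outputs, name='Discriminator')

G = Generator()
D = Discriminator((512,512,3))

gan_input = Input((512,512,3))
real_image = Input((512,512,3))
gan = Model(inputs=[gan_input, real_image],
            outputs=D([real_image, G(gan_input)]),    # G's output feeds D's second input
            name='GAN')
gan.summary()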

Does 1D Convolutional layer support variable sequence lengths?

I have a series of processed audio files that I am using as input to a CNN in Keras. Does the Keras 1D convolutional layer support variable sequence lengths? The Keras documentation makes this unclear.
https://keras.io/layers/convolutional/
At the top of the documentation it mentions you can use (None, 128) for variable-length sequences of 128-dimensional vectors. Yet at the bottom it declares that the input shape must be a
3D tensor with shape: (batch_size, steps, input_dim)
Given the following example, how should I input sequences of variable length into the network? Let's say I have two examples (a and b), each containing some number X of one-dimensional vectors of length 100, that I want to feed into the Conv1D layer as input:
a.shape = (100, 100)
b.shape = (200, 100)
Can I use an input shape of (2, None, 100)? Or do I need to concatenate these tensors into c, where
c.shape = (300, 100)
and then reshape it to something like
c_reshape.shape = (3, 100, 100)
where 3 is the batch size, 100 is the number of steps, and the second 100 is the input size? The documentation on the input shape is not very clear.
Keras supports variable lengths by using None in the respective dimension when defining the model.
Notice that often input_shape refers to the shape without the batch size.
So the 3D tensor with shape (batch_size, steps, input_dim) fits a model with input_shape=(steps, input_dim) perfectly.
All you need to do to make this model accept variable lengths is to use None in the steps dimension:
input_shape=(None, input_dim)
Numpy limitation
Now, there is a numpy limitation regarding variable lengths: you cannot create a single numpy array with a shape that mixes different lengths.
A few solutions are available:
Pad your sequences with dummy values until they all reach the same size, so you can put them into a numpy array of shape (batch_size, length, input_dim). Use Masking layers to ignore the dummy values.
Train with separate numpy arrays of shape (1, length, input_dim), each array having its own length.
Group your sequences by length into smaller arrays.
Be careful with layers that don't support variable sizes
In convolutional models using variable sizes you can't, for instance, use Flatten: if it were possible, the result of the flatten would have a variable size, and the following Dense layers would not be able to have a constant number of weights. This is impossible.
So, instead of Flatten, you should start using GlobalMaxPooling1D or GlobalAveragePooling1D layers.
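For illustration, here is a minimal sketch of such a model (the layer sizes are made up), trained with separate single-sample arrays as in the second option above:
import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

model = Sequential([
    Conv1D(32, 3, activation='relu', input_shape=(None, 100)),  # None = variable steps
    GlobalMaxPooling1D(),            # collapses the variable-length dimension
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

a = np.random.rand(1, 100, 100)     # one sample with 100 steps
b = np.random.rand(1, 200, 100)     # one sample with 200 steps
model.train_on_batch(a, np.array([[1.0]]))
model.train_on_batch(b, np.array([[0.0]]))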

How to handle variable shape bias in TensorFlow?

I was just modifying an LSTM network I had written, to print out the test error. The issue, I realized, is that the model I had defined depends on the batch size.
Specifically, the input is a tensor of shape [batch_size, time_steps, features]. The input enters the LSTM cell, and I turn the output into a list of time_steps 2D tensors, each of shape [batch_size, hidden_units]. Each 2D tensor is then multiplied by a weight vector of shape [hidden_units] to yield a vector of shape [batch_size], to which a bias vector of shape [batch_size] is added.
In words, I give the model N sequences, and I expect it to output a scalar for each time step for each sequence. That is, the output is a list of N vectors, one for each time step.
For training, I give the model batches of size 13. For the test data, I feed the entire data set, which consists of over 400 examples. Thus, an error is raised, since the bias has fixed shape batch_size.
I haven't found a way to make its shape variable without raising an error.
I can add the complete code if requested. Added the code anyway.
Thanks.
def basic_lstm(inputs, number_steps, number_features, number_hidden_units, batch_size):
    weights = {
        'out': tf.Variable(tf.random_normal([number_hidden_units, 1]))
    }
    biases = {
        'out': tf.Variable(tf.constant(0.1, shape=[batch_size, 1]))
    }
    lstm_cell = rnn.BasicLSTMCell(number_hidden_units)
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    hidden_layer_outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs,
                                                     initial_state=init_state,
                                                     dtype=tf.float32)
    results = tf.squeeze(tf.stack([tf.matmul(output, weights['out']) + biases['out']
                                   for output
                                   in tf.unstack(tf.transpose(hidden_layer_outputs, (1, 0, 2)))],
                                  axis=1))
    return results
You want the biases to have a shape of (batch_size,).
For example (using zeros instead of tf.constant, but it's a similar situation), I was able to specify the shape as a single integer:
biases = tf.Variable(tf.zeros(10,dtype=tf.float32))
print(biases.shape)
prints:
(10,)
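If the goal is a bias that does not depend on batch_size at all, one alternative (a sketch in the same TF1 style as the question, not part of the answer above) is to give the bias a batch-independent shape and let broadcasting handle the batch dimension:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])    # [batch_size, 1], batch size unknown
bias = tf.Variable(tf.constant(0.1, shape=[1]))    # fixed shape, independent of batch_size
y = x + bias                                       # broadcasts over any batch size

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, {x: [[1.0]] * 13}).shape)    # (13, 1)
    print(sess.run(y, {x: [[1.0]] * 400}).shape)   # (400, 1)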

What is an Embedding in Keras?

The Keras documentation isn't clear about what this actually is. I understand that we can use it to compress the input feature space into a smaller one. But how is this done from a neural design perspective? Is it an autoencoder, an RBM?
As far as I know, the Embedding layer is a simple matrix multiplication that transforms words into their corresponding word embeddings.
The weights of the Embedding layer are of shape (vocabulary_size, embedding_dimension). For each training sample, the input is a set of integers that represent certain words, with values in the range of the vocabulary size. The Embedding layer transforms each integer i into the i-th row of the embedding weights matrix.
In order to do this quickly as a matrix multiplication, the input integers are not stored as a list of integers but as a one-hot matrix. Therefore the input shape is (nb_words, vocabulary_size), with one non-zero value per row. If you multiply this by the embedding weights, you get the output with shape
(nb_words, vocab_size) x (vocab_size, embedding_dim) = (nb_words, embedding_dim)
So with a simple matrix multiplication you transform all the words in a sample into the corresponding word embeddings.
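To make that concrete, here is a small numpy sketch (with toy sizes) showing that the one-hot matrix multiplication and a plain row lookup give the same result:
import numpy as np

vocab_size, embedding_dim = 5, 3
W = np.random.rand(vocab_size, embedding_dim)   # the embedding weight matrix

words = np.array([2, 0, 4])                     # a sample of 3 word indices
one_hot = np.eye(vocab_size)[words]             # (nb_words, vocab_size)

print(np.allclose(one_hot @ W, W[words]))       # True: matmul == row indexing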
The Keras Embedding layer does not perform any matrix multiplication; it only:
1. creates a weight matrix of (vocabulary_size)x(embedding_dimension) dimensions
2. indexes this weight matrix
It is always useful to have a look at the source code to understand what a class does. In this case, we will have a look at the class Embedding which inherits from the base layer class called Layer.
(1) - Creating a weight matrix of (vocabulary_size)x(embedding_dimension) dimensions:
This happens in the build function of Embedding:
def build(self, input_shape):
    self.embeddings = self.add_weight(
        shape=(self.input_dim, self.output_dim),
        initializer=self.embeddings_initializer,
        name='embeddings',
        regularizer=self.embeddings_regularizer,
        constraint=self.embeddings_constraint,
        dtype=self.dtype)
    self.built = True
If you have a look at the base class Layer, you will see that the function add_weight above simply creates a matrix of trainable weights (in this case of (vocabulary_size)x(embedding_dimension) dimensions):
def add_weight(self,
               name,
               shape,
               dtype=None,
               initializer=None,
               regularizer=None,
               trainable=True,
               constraint=None):
    """Adds a weight variable to the layer.

    # Arguments
        name: String, the name for the weight variable.
        shape: The shape tuple of the weight.
        dtype: The dtype of the weight.
        initializer: An Initializer instance (callable).
        regularizer: An optional Regularizer instance.
        trainable: A boolean, whether the weight should
            be trained via backprop or not (assuming
            that the layer itself is also trainable).
        constraint: An optional Constraint instance.

    # Returns
        The created weight variable.
    """
    initializer = initializers.get(initializer)
    if dtype is None:
        dtype = K.floatx()
    weight = K.variable(initializer(shape),
                        dtype=dtype,
                        name=name,
                        constraint=constraint)
    if regularizer is not None:
        with K.name_scope('weight_regularizer'):
            self.add_loss(regularizer(weight))
    if trainable:
        self._trainable_weights.append(weight)
    else:
        self._non_trainable_weights.append(weight)
    return weight
(2) - Indexing this weight matrix
This happens in the call function of Embedding:
def call(self, inputs):
    if K.dtype(inputs) != 'int32':
        inputs = K.cast(inputs, 'int32')
    out = K.gather(self.embeddings, inputs)
    return out
This function returns the output of the Embedding layer, which is K.gather(self.embeddings, inputs). What tf.keras.backend.gather does is index the weights matrix self.embeddings (see the build function above) according to inputs, which should be lists of positive integers.
These lists can be obtained, for example, by passing your text/word inputs to the one_hot function of Keras, which encodes a text into a list of word indexes of size n (this is NOT one-hot encoding - see also this example for more info: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/).
Therefore, that's all. There is no matrix multiplication.
On the contrary, the Keras Embedding layer is useful precisely because it avoids performing a matrix multiplication and hence saves some computational resources.
Otherwise, you could just use a Keras Dense layer (after one-hot encoding your input data) to get a matrix of trainable weights (of (vocabulary_size)x(embedding_dimension) dimensions) and then simply do the multiplication to get an output that is exactly the same as the output of the Embedding layer.
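Here is a quick sketch of that equivalence (toy sizes; the Embedding weights are copied into the Dense layer so that both use the same matrix):
import numpy as np
from keras.layers import Input, Embedding, Dense
from keras.models import Model

vocab_size, embed_dim = 5, 3
ids = np.array([[2, 0, 4]])                      # one sample of 3 word indices
one_hot = np.eye(vocab_size)[ids]                # same sample, one-hot encoded

inp = Input((3,))
emb_model = Model(inp, Embedding(vocab_size, embed_dim)(inp))

inp2 = Input((3, vocab_size))
dense = Dense(embed_dim, use_bias=False)
dense_model = Model(inp2, dense(inp2))
dense.set_weights(emb_model.get_weights())       # share the same weight matrix

print(np.allclose(emb_model.predict(ids), dense_model.predict(one_hot)))  # True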
In Keras, the Embedding layer is NOT a simple matrix multiplication layer, but a look-up table layer (see call function below or the original definition).
def call(self, inputs):
    if K.dtype(inputs) != 'int32':
        inputs = K.cast(inputs, 'int32')
    out = K.gather(self.embeddings, inputs)
    return out
What it does is map each known integer n in inputs to a trainable feature vector W[n], whose dimension is the so-called embedded feature length.
In simple words (from the functionality point of view), it is equivalent to a one-hot encoder followed by a fully-connected layer, and the layer weights are trainable.
