Keras - passing a different parameter for each data point onto a Lambda layer

I am working on a CNN model in Keras with a TF backend. At the end of the final convolutional layer, I need to pool the output maps from the filters. Instead of using GlobalAveragePooling or any other sort of pooling, I have to pool according to time frames that exist along the width of the output map.
So say a sample output from one filter is n x m, n being the time frames and m the outputs along the features. Here I just need to pool the output from frames n1 to n2, where n1 and n2 <= n. So my output slice is (n2-n1) x m, on which I will apply the pooling. I came across the Lambda layer of Keras to do this. But I am stuck at the point that n1 and n2 will be different for each data point. So my question is: how can I pass a custom argument for each data point to a Lambda layer? Or am I approaching this the wrong way?
A sample snippet:
# for slicing a tensor
def time_based_slicing(x, crop_at):
    dim = x.get_shape()
    len_ = crop_at[1] - crop_at[0]
    return tf.slice(x, [0, crop_at[0], 0, 0], [1, len_, dim[2], dim[3]])

# for output shape
def return_out_shape(input_shape):
    return tuple([input_shape[0], None, input_shape[2], input_shape[3]])

# lambda layer addition
model.add(Lambda(time_based_slicing, output_shape=return_out_shape,
                 arguments={'crop_at': (2, 5)}))
The above argument crop_at needs to be custom for each data point when fitting in a loop. Any pointers/clues to this will be helpful.

Given that you know the indices of the time frames that belong to each data point beforehand, you can store them in a text file and pass them as an additional Input to your model:
slice_input = Input((2,))
And use those in your time_based_slicing function.

Switch away from the Sequential API - it starts to fall apart when you need multiple inputs: use the Functional API instead (https://keras.io/models/model/).
Assuming that your lambda functions are correct:
def time_based_slicing(inputs_list):
    x, crop_at = inputs_list
    ...  # will probably need to do some work to subset crop_at,
         # since it will now be a tensor instead of constants

inp = Input(your_shape)
inp_additional = Input((2,))
x = YOUR_CNN_LOGIC(inp)
out = Lambda(time_based_slicing)([x, inp_additional])
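A possible extension of that idea (my own sketch, not from the original answer): since each sample's slice n2 - n1 has a different length, a hard slice gives ragged shapes. One way to keep shapes static is to do a masked average pool between n1 and n2 inside the Lambda instead of slicing. All shapes and names below are illustrative:

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda

def masked_time_pool(inputs):
    x, crop = inputs                     # x: (batch, n, m, c), crop: (batch, 2)
    n = tf.shape(x)[1]
    t = tf.range(n)[tf.newaxis, :]       # (1, n) frame indices
    lo = tf.cast(crop[:, 0:1], t.dtype)  # (batch, 1)
    hi = tf.cast(crop[:, 1:2], t.dtype)
    mask = tf.cast((t >= lo) & (t < hi), x.dtype)  # (batch, n)
    mask = mask[:, :, tf.newaxis, tf.newaxis]      # broadcast over m and channels
    # average only over the frames between n1 and n2
    return tf.reduce_sum(x * mask, axis=1) / tf.maximum(
        tf.reduce_sum(mask, axis=1), 1e-8)

feat = Input((100, 40, 8))
crop = Input((2,), dtype='int32')
pooled = Lambda(masked_time_pool)([feat, crop])    # (batch, 40, 8)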

Related

How to get a 2D output from linear layer in pytorch?

I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
    in_features=num_inputs,
    out_features=(num_inputs, num_additional),
)
But this results in an error
A workaround would be to
torch.nn.Linear(
    in_features=num_inputs,
    out_features=num_inputs*num_additional,
)
and then change the view of the output
output.view(batch_size, num_inputs, num_additional)
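Put together, a minimal sketch of that workaround (the dimension sizes are made up for illustration):

import torch

batch_size, num_inputs, num_additional = 32, 10, 5
layer = torch.nn.Linear(in_features=num_inputs,
                        out_features=num_inputs * num_additional)
x = torch.randn(batch_size, num_inputs)
# flatten through the linear layer, then restore the extra dimension
output = layer(x).view(batch_size, num_inputs, num_additional)
print(output.shape)  # torch.Size([32, 10, 5])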
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for Linear could be changed
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
to accept more dimensions for the weight and bias initialization, and F.linear seems like it would need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch

class MultiDimLinear(torch.nn.Linear):
    def __init__(self, in_features, out_shape, **kwargs):
        self.out_shape = out_shape
        out_features = np.prod(out_shape)
        super().__init__(in_features, out_features, **kwargs)

    def forward(self, x):
        out = super().forward(x)
        return out.reshape((len(x), *self.out_shape))

if __name__ == '__main__':
    tmp = torch.empty((32, 10))
    linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
    out = linear(tmp)
    print(out.shape)  # (32, 10, 10)
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum can prevent summation across dimensions in tensor-to-tensor multiplication operations. This can allow separate multiplication operations to happen in parallel. [I do not know if this would necessarily result in GPU efficiency, if the operations still occur in the same kernel. In fact, it may be slower: https://github.com/pytorch/pytorch/issues/32591]
How this would work is to directly initialize the weight and bias tensors (look at the source code for the torch Linear layer for that code).
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented in a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers, d is the input feature size (it must match b), and e is the number of classes per classifier.
import torch

x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 6)
torch.einsum('ab, cbe -> ace', x, w)
In the last line, a and b are the dimensions of the input, as mentioned above. What might be the tricky part is that c, b, and e are the dimensions of the classifier weight tensor; I didn't use d, I used b instead. That is because the vector multiplication happens along that dimension for the input tensor and the weight tensor. So that's why the left side of the einsum equation is ab, cbe. The right side of the einsum equation is simply the dimensions to keep; anything excluded there is summed over.
The final dimensions we want are (a, c, e): a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to sum those values away, so to preserve their separation, the right side of the equation is ace.
For those unfamiliar with einsum, this will be harder to read than the workaround I described (though I highly recommend learning it, because it gets very easy and intuitive fast, even though it's a bit tricky at first: https://www.youtube.com/watch?v=pkVwUVEHmfI).
However, for parallelizing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example, suppose that in my previous example I didn't want to use a classification head yet; I just wanted to project to multiple dimensions.
import torch

x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*4).view(5, 4, 4)
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some non linear operations, activations, etc.
z = f(y)
z will still have the dimensions 2, 5, 4: batch size two, 5 hidden states per batch, and the dimension of those hidden states is 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresher: 2 is the batch size, 5 is the number of classifiers, and 2 is the number of outputs for each classifier.
The output dimensions f, g, j (2, 5, 2) will not be summed across, and thus will be preserved in the output.
As cited in the github link, this may be slower than just using regular linear layers. There may be efficiencies in a very large number of parallel operations.
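If you want to package the einsum version like a layer, one rough sketch is to wrap the weight and bias in an nn.Module. ParallelLinear is a made-up name, and the kaiming init here only approximates what nn.Linear does:

import math
import torch

class ParallelLinear(torch.nn.Module):
    # applies `heads` independent linear maps to the same input via einsum
    def __init__(self, in_features, out_features, heads):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(heads, in_features, out_features))
        self.bias = torch.nn.Parameter(torch.zeros(heads, out_features))
        torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, x):  # x: (batch, in_features)
        return torch.einsum('ab,cbe->ace', x, self.weight) + self.bias

x = torch.randn(32, 4)
layer = ParallelLinear(in_features=4, out_features=6, heads=5)
print(layer(x).shape)  # torch.Size([32, 5, 6])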

Retrieve only the last hidden state from lstm layer in pytorch sequential

I have a pytorch model:
model = torch.nn.Sequential(
    torch.nn.LSTM(40, 256, 3, batch_first=True),
    torch.nn.Linear(256, 256),
    torch.nn.ReLU()
)
And for the LSTM layer, I want to retrieve only the last hidden state from the batch to pass through the rest of the layers. Ex:
_, (hidden, _) = lstm(data)
hidden = hidden[-1]
Though, that example only works for a subclassed model. I need to somehow do this on a nn.Sequential() model so that when I save it, it can properly be converted to a tensorflow.js model. The reason I can't make and train this model in tensorflow.js is that I'm trying to implement this repo: Resemblyzer in tensorflow.js while still using the same weights as the pretrained Resemblyzer model, which was made in pytorch as a subclassed model. I thought of using the torchvision.transforms.Lambda() transformation, but I would assume that would make it incompatible with tensorflow.js. Is there any way to make this possible while still allowing the model to convert properly?
You could split up your Sequential, only doing so at inference time in the forward pass of your model. Once defined:
model = nn.Sequential(nn.LSTM(40, 256, 3, batch_first=True),
                      nn.Linear(256, 256),
                      nn.ReLU())
You can split it:
>>> lstm, fc = model[0], model[1:]
Then infer in two steps:
>>> out, (hidden, _) = lstm(data)
>>> hidden = hidden[-1]
>>> out = fc(out) # <- or fc(out[-1]) depending on what you want
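Alternatively (my own sketch; I have not checked whether a custom module survives the tensorflow.js conversion), you can keep everything inside nn.Sequential by inserting a tiny module that unwraps the LSTM's output tuple:

import torch.nn as nn

class LastHidden(nn.Module):
    # nn.Sequential passes the LSTM's (output, (h_n, c_n)) tuple straight through
    def forward(self, lstm_out):
        _, (hidden, _) = lstm_out
        return hidden[-1]  # hidden state of the last layer: (batch, hidden_size)

model = nn.Sequential(nn.LSTM(40, 256, 3, batch_first=True),
                      LastHidden(),
                      nn.Linear(256, 256),
                      nn.ReLU())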
Though the answer is provided above, I thought of elaborating on it, since the PyTorch LSTM documentation is confusing.
In TF, we directly get the last_state as the output. No further action needed.
Let us check the Torch output of LSTM:
There are 2 outputs - a sequence and a tuple. We are interested in the last state so we can ignore the sequence and focus on the tuple. The tuple consists of 2 values - the first is the hidden state of the last cell (of all layers in the LSTM) and the second is the cell state of the last cell (again of all layers in the LSTM). We are interested in the hidden state. So
_, tup = self.bilstm(inp)
We are interested in tup[0]. Let us dig further into this.
The shape of tup[0] is somewhat odd, with the batch size at the centre. On the left of the batch size is the number of layers in the LSTM (multiplied by 2 if it is a biLSTM). On the right is the dimension you provided while defining the LSTM. You can take the output from the last layer by simply doing tup[0][-1], which is the answer provided above.
Alternatively if you want to make use of hidden states across layers, you may try something like:
out = tup[0].swapaxes(0,1)
out = out.reshape(*out.shape[:-2], -1)
The first line produces a shape of (batch_size, num_layers, hidden_size). The second line produces a shape of (batch_size, num_layers x hidden_size).
(For example, say yours is a biLSTM with 3 layers and a hidden size of 100; you could concatenate the output so that you get one vector of 2 x 3 x 100 = 600 dimensions, and then run a simple linear layer on top of this to get the output you want.)
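As a concrete check of that arithmetic (sizes illustrative), here is the 3-layer biLSTM with hidden size 100 sketched end to end:

import torch

bilstm = torch.nn.LSTM(input_size=40, hidden_size=100, num_layers=3,
                       batch_first=True, bidirectional=True)
inp = torch.randn(8, 15, 40)            # (batch, seq, features)
_, (h, _) = bilstm(inp)                 # h: (num_layers * 2, batch, 100)
out = h.swapaxes(0, 1)                  # (batch, 6, 100)
out = out.reshape(*out.shape[:-2], -1)  # (batch, 600)
print(out.shape)                        # torch.Size([8, 600])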
There is another way to get the output of the LSTM. We discussed that the first output of an LSTM is a sequence:
sequence, tup = self.bilstm(inp)
This sequence is the output of the LAST hidden layer of the LSTM. It is a sequence because it contains hidden states of EVERY cell in this layer. So its length will be the input sequence length that you have provided. We could choose to take the hidden state of the last element in the sequence by doing a:
#shape of sequence is: batch_size, seq_size, dim
sequence = sequence.swapaxes(0,1)
#shape of sequence is: seq_size, batch_size, dim
sequence = sequence[-1]
#shape of sequence is: batch_size, dim (ie last seq is taken)
Needless to say, this will be the same value we got by taking the last layer from tup[0]. Well, not quite! If the LSTM is a biLSTM, then the sequence approach returns a 2 x hidden_size dim output (which is correct), whereas the tup[0][-1] approach will give us only a hidden_size dim output even for a biLSTM. The OP's LSTM is a non-biLSTM, so both answers hold true.

How (can?) I vectorize an NN training input, instead of using a for loop? Are there other large optimizations?

I am trying to create a simple neural network program/module. Because I am using Pythonista on an iPad Pro, the speed could use some improvement. My understanding is that for loops have a lot of overhead, so I was wondering if it is possible to train 50000 input:target sets using some form of vectorization.
I have been trying to find resources on how numpy arrays pass through functions, but it’s very difficult to wrap my head around. I have been trying to create one large array that holds the “train” function inputs as small lists, but the function is operating on the entire array, not the smaller ones individually.
# train function that accepts inputs and targets
# currently called inside a for loop
# input: [[a], [b]], target: [[c]]
def train(self, input_mat, target_mat):
    # generate hidden layer neuron values
    hidden = self.weights_in_hid.dot(input_mat)
    hidden += self.bias_hid
    hidden = sigmoid(hidden)
    # generate output neuron values
    output_mat = self.weights_hid_out.dot(hidden)
    output_mat += self.bias_out
    # activation function
    output_mat = sigmoid(output_mat)
    # more of the function continues
    # ...
# Datum converts simple lists into numpy matrices, so they don't have to be
# reinstantiated 50000 times
training_data = [
    Datum([0, 0], [0]),
    Datum([0, 1], [1]),
    Datum([1, 0], [1]),
    Datum([1, 1], [0]),
]
# ...
# XXX Does this for loop cause a lot of the slowdown?
for _ in range(50000):
    datum = rd.choice(training_data)
    brain.train(datum.inputs, datum.targets)
In the shown state everything works, but somewhat slowly. Whenever I try to pass all the data in one matrix, the function cannot vectorize it. It instead attempts to operate on the matrix as a whole, obviously raising an error on the first step, as the array is way too big.
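For what it's worth, the usual way to vectorize this kind of forward pass is to stack the column-vector inputs side by side into one (2, batch) matrix; the dot products and broadcast bias additions then process every column at once. A minimal standalone sketch with made-up weight shapes:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the four XOR inputs stacked as columns: shape (2, 4)
inputs = np.array([[0, 0, 1, 1],
                   [0, 1, 0, 1]])
weights_in_hid = np.random.randn(3, 2)  # hypothetical 3 hidden neurons
bias_hid = np.random.randn(3, 1)        # (3, 1) broadcasts across the batch

hidden = sigmoid(weights_in_hid.dot(inputs) + bias_hid)  # (3, 4): all samples at once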

How to fix 'TypeError: Object arrays are not currently supported' error in numpy python 3 (matrix multiplication)

I'm trying to make my own neural network "library" (if you can call it that) for myself to use, since I am hobby-learning about them.
I wrote this code that makes a propagatable neural network by feeding it a structure of the desired network, and it worked pretty well.
But then when I tried giving the model a different number of nodes, the code bugged out.
I've already tried editing the number of nodes in each layer to see where that takes me, and I've found that I only get this error when the first and the second layer have the same number of nodes but the output layer has a different number. I've also tried doing the matrix multiplication of the structure that triggers the bug on paper, and it gave me an actual result (which I've double-checked a lot of times). So now I know it has something to do with the practical and not the theoretical.
There's clearly something wrong with the matrix multiplication, I think.
The script's functions
I had to include these functions in the question so you can have better insight into how this code works.
is_iterable()
This function returns a boolean value that describes if the input is iterable
def is_iterable(x):
    try:
        x[0]
        return True
    except:
        return False
blueprint()
This function returns a copy of the input array but changes the elements that aren't iterable to 0's
def blueprint(x):
    return [blueprint(e) if is_iterable(e) else 0 for e in x]
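For example, a quick illustrative check of what blueprint() returns:

print(blueprint([1, [2, 3], 4]))  # [0, [0, 0], 0]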
build()
This function takes a model of your desired neural network structure as input and outputs suitably randomized biases and weights, separated into two different arrays.
The randomize() function returns a copy of the input array but changes the elements that aren't iterable to random floats between -1 and 1.
The build_weighs() function returns randomized weights based on a model of a neural network.
def build(x):
    def randomize(x):
        return np.array([randomize(n) if type(n) is list else random.uniform(-1, 1) for n in x])

    def build_weighs(x):
        y = []
        for i, l in enumerate(x):
            if i == len(x) - 1:
                break
            y.append([randomize(x[i + 1]) for n in l])
        return np.array(y)

    return (randomize(x), build_weighs(x))
apply_funcs()
This function applies a list of functions to a list of values and returns the results. If the function list contains a 0, the value at the same position is passed through unchanged.
def apply_funcs(x, f):
    y = x
    i = 0
    for xj, fj in zip(x, f):
        if fj == 0:
            y[i] = xj
        else:
            y[i] = fj(xj)
        i += 1
    return y
nn()
This is the class for making a neural network.
You can see that it has a function named prop for the forward propagation of the network.
class nn:
    def __init__(self, structure, a_funcs=None):
        self.structure = structure
        self.b = np.array(structure[0])
        self.w = np.array(structure[1])
        if a_funcs == None:
            a_funcs = blueprint(self.b)
        self.a_funcs = np.array(a_funcs)

    def prop(self, x):
        y = np.array(x)
        if y.shape != self.b[0].shape:
            raise ValueError("The input needs to be intact with the Input Nodes\nInput: {} != Input Nodes: {}".format(blueprint(y), blueprint(self.b[0])))
        wi = 0
        # A loop through the layers of the neural network
        for i in range(len(self.b)):
            # This if statement is here so that the weights get applied in the right order
            if i != 0:
                y = np.matmul(y, self.w[wi])
                wi += 1
            # Applying the biases of layer i to the current information
            y = np.add(y, self.b[i])
            # Applying the activation functions to the current information
            y = apply_funcs(y, self.a_funcs[i])
        return y
Defining a neural network structure and propagating it
n contains the structure, which is a 3-layer network with respectively 2 nodes, 2 nodes and 3 nodes.
n = [[0] * 2, [0] * 2, [0] * 3]
bot = nn(build(n))
print(bot.prop([1] * 2))
When I do this I expect the code to output an array of three semi-random numbers like this:
[-0.55889818 0.62762604 0.59222784]
but instead I get an error from numpy saying this:
File "C:\Users\Black\git\Changbot\oper.py.py", line 78, in prop
y = np.matmul(y, self.w[wi])
TypeError: Object arrays are not currently supported
And the weirdest thing about this is that (as I said earlier) I only get this error when the first and the second layer have the same number of nodes but the output layer has a different number. All the other times I get the expected output...
I have now again checked the values that are causing this error and I don't see any objects other than a list. It's the same when it's not bugging...
So I added this try-except statement:
try:
    y = np.matmul(np.array(y), self.w[wi])
except TypeError:
    print("y:{}\nself.w[wi]:{}".format(y, self.w[wi]))
It then outputs this:
y:[1.6888437]
self.w[wi]:[array([-0.19013173])]
which should be able to be multiplied with each other.
I have even tried copy pasting the values into an interpreter and multiplying them there, and it works there...
NOTE: THIS IS A VERY BAD TEST, AS THE COPY-PASTED ARRAYS DON'T HAVE THE SAME DTYPES AS THE ACTUAL ARRAYS
np.matmul([1.6888437], [np.array([-0.19013173])])
Output for the above:
[-0.32110277]
After looking at the answers
Okay. I have now found out that the object dtype arrays lie in the structure of the neural network, by doing this at the end of the script:
print("STRUCTURE:{}".format(n))
It then outputs this:
STRUCTURE:(array([array([0.6888437]), array([ 0.51590881, -0.15885684]),
array([-0.4821665 , 0.02254944, -0.19013173])], dtype=object), array([list([array([ 0.56759718, -0.39337455])]),
list([array([-0.04680609, 0.16676408, 0.81622577]), array([ 0.00937371, -0.43632431, 0.51160841])])],
dtype=object))
Solving the bug
I can understand from one of the answers to this post that np.array() tries to create as high-dimensional an array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
The object dtype gets created in the build() function, so I tried removing all the np.array() calls in it. Actually, I removed all of them from the whole script. And guess what? It worked! Thanks a thousand times to you contributors!
Btw Happy New Year
Regarding your copy-paste testing:
In [55]: np.matmul([1.6888437], [np.array([-0.19013173])])
Out[55]: array([-0.32110277])
But this isn't what your code is using. Instead we have to make arrays that match in dtype.
In [59]: x = np.array([1.6888437]); y = np.array([np.array([-0.19013173]),None])[:1]
In [60]: x
Out[60]: array([1.6888437])
In [61]: y
Out[61]: array([array([-0.19013173])], dtype=object)
I used the None funny business to force it to create an object dtype containing an array, which will print as [array([-0.19013173])].
Now I get your error:
In [62]: np.matmul(x,y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-b6212b061655> in <module>()
----> 1 np.matmul(x,y)
TypeError: Object arrays are not currently supported
Even where it does work, as with dot,
In [66]: np.dot(x,y)
Out[66]: array([-0.32110277])
the calculations with object dtype arrays are slower.
I won't try to figure out why you have an object dtype array at this point. But I think you should avoid those in code where speed matters.
If you construct an array from arrays or lists that differ in size, the result is likely to be object dtype with a lower number of dimensions. np.array tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
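A quick way to see this fallback (note that recent numpy versions raise an error on ragged input unless dtype=object is passed explicitly):

import numpy as np

# rows of equal length -> a clean 2-D float array
print(np.array([[1.0, 2.0], [3.0, 4.0]]).dtype)  # float64

# ragged rows -> a 1-D array of objects, one per row
ragged = np.array([np.array([0.5]), np.array([0.1, 0.2])], dtype=object)
print(ragged.dtype, ragged.shape)  # object (2,)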

How to correctly implement a batch-input LSTM network in PyTorch?

This release of PyTorch seems to provide the PackedSequence for variable-length inputs to recurrent neural networks. However, I found it a bit hard to use correctly.
Using pad_packed_sequence to recover the output of an RNN layer that was fed by pack_padded_sequence, we get a T x B x N tensor outputs, where T is the max time steps, B is the batch size and N is the hidden size. I found that for short sequences in the batch, the subsequent output will be all zeros.
Here are my questions.
For a single-output task where one would need the last output of all the sequences, a simple outputs[-1] will give a wrong result, since this tensor contains lots of zeros for short sequences. One would need to construct indices from the sequence lengths to fetch the individual last output for each sequence. Is there a simpler way to do that?
For a multiple-output task (e.g. seq2seq), usually one will add a linear layer N x O, reshape the batch outputs T x B x O into TB x O, and compute the cross entropy loss with the true targets TB (usually integers in a language model). In this situation, do these zeros in the batch output matter?
Question 1 - Last Timestep
This is the code that I use to get the output of the last timestep. I don't know if there is a simpler solution. If there is, I'd like to know it. I followed this discussion and grabbed the relevant code snippet for my last_timestep method. This is my forward.
class BaselineRNN(nn.Module):
    def __init__(self, **kwargs):
        ...

    def last_timestep(self, unpacked, lengths):
        # Index of the last output for each sequence.
        idx = (lengths - 1).view(-1, 1).expand(unpacked.size(0),
                                               unpacked.size(2)).unsqueeze(1)
        return unpacked.gather(1, idx).squeeze()

    def forward(self, x, lengths):
        embs = self.embedding(x)
        # pack the batch
        packed = pack_padded_sequence(embs, list(lengths.data),
                                      batch_first=True)
        out_packed, (h, c) = self.rnn(packed)
        out_unpacked, _ = pad_packed_sequence(out_packed, batch_first=True)
        # get the outputs from the last *non-masked* timestep for each sentence
        last_outputs = self.last_timestep(out_unpacked, lengths)
        # project to the classes using a linear layer
        logits = self.linear(last_outputs)
        return logits
Question 2 - Masked Cross Entropy Loss
Yes, by default the zero padded timesteps (targets) matter. However, it is very easy to mask them. You have two options, depending on the version of PyTorch that you use.
PyTorch 0.2.0: PyTorch now supports masking directly in the CrossEntropyLoss, with the ignore_index argument. For example, in language modeling or seq2seq, where I add zero padding, I mask the zero-padded words (targets) simply like this:
loss_function = nn.CrossEntropyLoss(ignore_index=0)
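For illustration, a tiny self-contained usage sketch (the shapes and values are made up):

import torch
import torch.nn as nn

logits = torch.randn(6, 10)                 # (T*B, num_classes)
targets = torch.tensor([3, 9, 7, 0, 1, 0])  # 0 marks zero-padded positions

loss_fn = nn.CrossEntropyLoss(ignore_index=0)
loss = loss_fn(logits, targets)  # padded targets contribute nothing to the loss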
PyTorch 0.1.12 and older: In the older versions of PyTorch, masking was not supported, so you had to implement your own workaround. A solution that I used was masked_cross_entropy.py, by jihunchoi. You may also be interested in this discussion.
A few days ago, I found this method which uses indexing to accomplish the same task with a one-liner.
I have my dataset batch first ([batch size, sequence length, features]), so for me:
unpacked_out = unpacked_out[np.arange(unpacked_out.shape[0]), lengths - 1, :]
where unpacked_out is the output of torch.nn.utils.rnn.pad_packed_sequence.
I have compared it with the method described here, which looks similar to the last_timestep() method Christos Baziotis is using above (also recommended here), and the results are the same in my case.
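For reference, a small self-contained check (with made-up shapes) that the one-liner and the gather-based last_timestep() pick the same timesteps:

import torch

batch, max_len, hidden = 4, 7, 3
unpacked_out = torch.randn(batch, max_len, hidden)
lengths = torch.tensor([7, 5, 2, 6])

# fancy-indexing one-liner: row i, timestep lengths[i] - 1
last = unpacked_out[torch.arange(batch), lengths - 1, :]

# gather-based equivalent, as in last_timestep() above
idx = (lengths - 1).view(-1, 1).expand(batch, hidden).unsqueeze(1)
assert torch.equal(last, unpacked_out.gather(1, idx).squeeze(1))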
