Is there a way to compute a circulant matrix in PyTorch?

I want a function similar to scipy.linalg.circulant (https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.circulant.html) to create a circulant matrix using PyTorch. I need this as part of my deep learning model, in order to reduce over-parametrization in some of my fully connected layers, as suggested in https://arxiv.org/abs/1907.08448 (Fig. 3).
The input of the function should be a 1D torch tensor, and the output should be the corresponding 2D circulant matrix.

You can make use of unfold to extract sliding windows, but to get the correct order you need to flip the tensor first (and unflip the result at the end), and concatenate the flipped tensor to itself before unfolding.
circ = lambda v: torch.cat([f := v.flip(0), f[:-1]]).unfold(0, len(v), 1).flip(0)
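A quick sanity check of the one-liner (a minimal sketch; note the walrus operator := requires Python 3.8+):
import torch

circ = lambda v: torch.cat([f := v.flip(0), f[:-1]]).unfold(0, len(v), 1).flip(0)

v = torch.tensor([0, 1, 2])
print(circ(v))
# tensor([[0, 2, 1],
#         [1, 0, 2],
#         [2, 1, 0]])
The result matches scipy.linalg.circulant([0, 1, 2]), whose convention places the input vector in the first column.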

Here is a generic function for PyTorch tensors that builds the circulant matrix along one dimension. It's based on unfold and works both for 1D vectors (giving a 2D circulant matrix) and for higher-dimensional tensors.
def circulant(tensor, dim):
    """get a circulant version of the tensor along the {dim} dimension.

    The additional axis is appended as the last dimension.
    E.g. tensor=[0,1,2], dim=0 --> [[0,1,2],[2,0,1],[1,2,0]]"""
    S = tensor.shape[dim]
    tmp = torch.cat([tensor.flip((dim,)),
                     torch.narrow(tensor.flip((dim,)), dim=dim, start=0, length=S - 1)], dim=dim)
    return tmp.unfold(dim, S, 1).flip((-1,))
Essentially, this is a PyTorch version of scipy.linalg.circulant that also works for multi-dimensional tensors. Note the orientation, though: as the docstring shows, the input ends up in the first row, which is the transpose of SciPy's convention (SciPy places the input in the first column).
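For instance, applied along dim=1 of a 2D tensor, each row gets its own circulant matrix appended as a trailing axis (a small sketch using the circulant function above):
import torch

t = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])
out = circulant(t, dim=1)
print(out.shape)  # torch.Size([2, 3, 3])
print(out[0])
# tensor([[0, 1, 2],
#         [2, 0, 1],
#         [1, 2, 0]])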
Also a similar question: Create array/tensor of cycle shifted arrays

Related

How to handle samples with multiple images in a pytorch image processing model?

My model training involves encoding multiple variants of the same image and then summing the produced representations over all variants of the image.
The data loader produces tensor batches of the shape: [batch_size,num_variants,1,height,width].
The 1 corresponds to image color channels.
How can I train my model with minibatches in pytorch?
I am looking for a proper way to forward all the batch_size×num_variants images through the network and sum the results over each group of variants.
My current solution involves flattening the first two dimensions and using a for loop to sum the representations, but I feel like there should be a better way, and I am not sure the gradients will flow through everything correctly.
Not sure I understood you correctly, but I guess this is what you want (say the batched image tensor is called image):
Nb, Nv, inC, inH, inW = image.shape
# treat each variant as if it's an ordinary image in the batch
image = image.reshape(Nb*Nv, inC, inH, inW)
output = model(image)
_, outC, outH, outW = output.shape
# reshapes the output such that dim==1 indicates variants
output = output.reshape(Nb, Nv, outC, outH, outW)
# sum over the variants and drop the summation dimension, giving [Nb, outC, outH, outW]
output = output.sum(dim=1, keepdim=False)
I used inC, outC, inH, etc. in case the input and output channels/sizes are different.
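A quick shape check of the above (a minimal sketch; the single-conv stand-in model and the sizes are placeholders, not the asker's actual network):
import torch
import torch.nn as nn

model = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # stand-in for the real model
image = torch.randn(2, 5, 1, 28, 28)  # [Nb=2, Nv=5, inC=1, inH=28, inW=28]

Nb, Nv, inC, inH, inW = image.shape
output = model(image.reshape(Nb * Nv, inC, inH, inW))
_, outC, outH, outW = output.shape
output = output.reshape(Nb, Nv, outC, outH, outW).sum(dim=1)
print(output.shape)  # torch.Size([2, 8, 28, 28])
Because the reshapes and the sum are ordinary differentiable tensor ops, autograd tracks gradients through them without any manual bookkeeping.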

ValueError regarding dimensions when declaring PyTorch tensor

I'm currently trying to convert a list of values into a PyTorch tensor and am facing some difficulties.
The exact code that's causing the error is:
input_tensor = torch.cuda.FloatTensor(data)
Here, data is a list with two elements: The first element is another list of NumPy arrays and the second element is a list of tuples. The sizes of both lists differ, and I believe this is causing the following error:
*** ValueError: expected sequence of length x at dim 2 (got y)
Usually y is larger than x. I've tried playing around with an IPython terminal to see what's wrong, and it appears that trying to convert data of this format directly into PyTorch tensors doesn't work. Taking each individual element of the data list and converting those into tensors works, though.
Does anybody know why this doesn't work, and could you perhaps provide some feedback on how to achieve my original goal? Thanks in advance.
Let's say that the first sublist of data contains n 1D arrays, each of size m, and the second sublist contains k tuples, each of size p.
When calling torch.FloatTensor(data), each sublist is converted to a 2D tensor, of shape (n, m) and of shape (k, p) respectively; then they are stacked together to form a 3D tensor. This is possible only if n=k and m=p -- think of a 3D tensor as a cuboid.
This is quite obvious, I think, so I guess you have m = p and want to create a 2D tensor of shape (n+k, m) by simply concatenating the two sublists:
torch.FloatTensor(np.concatenate(data))
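A minimal sketch reproducing the situation (the sizes n=3, m=4, k=2, p=4 are made up for illustration):
import numpy as np
import torch

arrays = [np.random.rand(4).astype(np.float32) for _ in range(3)]  # n=3 arrays of size m=4
tuples = [tuple(np.random.rand(4)) for _ in range(2)]              # k=2 tuples of size p=4
data = [arrays, tuples]

# torch.FloatTensor(data) fails here: the sublists have different
# lengths (3 vs 2), so they cannot be stacked into a 3D cuboid.
input_tensor = torch.FloatTensor(np.concatenate(data))
print(input_tensor.shape)  # torch.Size([5, 4])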

Get Keras LSTM output inside Tensorflow code

I'm working with time-variant graph embedding, where at each time step the adjacency matrix of the graph changes. The main idea is to perform the node embedding of each timestep of the graph by looking at a set of node features and the adjacency matrix. The node embedding step is long and complicated, and is not part of the core of the problem, so I will skip this part. Suffice it to say that I use a Graph Convolutional Network to embed the nodes.
Consider that I have a stack of B adjacency matrices A with sizes NxN, where B = batch size and N = number of nodes in the graph. The matrices are stacked according to a time series, where the matrix at index i comes before the matrix at index i+1. I have already embedded the nodes of the graph, which results in a matrix of dimensions B x N x E, where E = size of the embedding (a parameter). Note that the model has to deal with any graph, so N is not a parameter. Another important point is that each batch contains adjacency matrices from the same graph, so all matrices within a batch have the same number of nodes, but matrices in other batches may have a different number of nodes.
I now need to pass these embedding through an LSTM cell. I never used Keras before, so I'm having a hard time making the Keras LSTM blend in my Tensorflow code. What I want to do is: pass each node embedding through an LSTM such that the number of timesteps = B and the LSTM batch size = N, that is, the input to my LSTM has the shape [N, B, E], where N and B are only known through execution time. I want the output of my LSTM to have the shape of [B, E*E]. The embedding matrix is called here self.embed_mat. Here is my code:
def _LSTM_layer(self):
    with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE), tf.device(self.device):
        in_shape = tf.shape(self.embed_mat)
        lstm_input = tf.reshape(self.embed_mat, [in_shape[1], in_shape[0], EMBED_SIZE])  # lstm = [N, B, E]
        input_plh = K.placeholder(name="lstm_input", shape=(None, None, EMBED_SIZE))
        lstm = LSTM(EMBED_SIZE*EMBED_SIZE, input_shape=(None, None, EMBED_SIZE))
        get_output = K.function(inputs=[input_plh], outputs=[lstm(input_plh)])
        h = get_output([lstm_input])
I am a bit lost with the K.function part. All I want is the output tensor of the LSTM cell. I've seen that in order to get that with Keras, we need to use K.function, but I don't quite get what it does. When I call get_output([lstm_input]), I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'worker_global/A/shape' with dtype int64 and shape [?]
Here, A is the stacked adjacency matrices with dimension BxNxN. What is going on here? Does the value of N need to be known during the graph building step? I think I made some dumb mistake with the LSTM cell, but I can't figure out what it is.
Thanks in advance!
Suppose you have a Keras Sequential() model called "model", where "inp" is your first/input layer and "out" is an LSTM layer that happens, for the sake of this example, to be in the 4th position of the model. You can obtain the output of that LSTM layer for the data you call "lstm_input" above with the following code:
inp = model.layers[0].input
out = model.layers[3].output
inp_to_out = K.function([inp], [out])
output = inp_to_out([lstm_input])
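Here is a minimal self-contained sketch of that pattern (the model, layer sizes, and layer positions are hypothetical; it assumes standalone Keras running in TF1-style graph mode, as in the question):
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras import backend as K

model = Sequential([
    Embedding(input_dim=1000, output_dim=64, input_length=12),  # layers[0]
    LSTM(32, return_sequences=True),                            # layers[1]
    LSTM(32),                                                   # layers[2], the LSTM we want
    Dense(10, activation="softmax"),                            # layers[3]
])

inp = model.layers[0].input
out = model.layers[2].output
inp_to_out = K.function([inp], [out])

lstm_input = np.random.randint(0, 1000, size=(4, 12))  # (batch, timesteps)
output = inp_to_out([lstm_input])[0]                   # shape (4, 32)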

Concatenate two tensors with different shapes in Keras

In the following diagram, I have two different tensors: tensor1 and tensor2.
How do I merge (concatenate) these two tensors such that the input to the LSTM is now:
(tensor1[0], tensor1[1], concatenate(tensor1[2], tensor2[1]))?
It's impossible to concatenate them as they are; you need to manipulate or transform them somehow.
The most logical thing I can think of is repeating tensor 2 six times to fill the timesteps that it doesn't have.
If this is ok (transforming tensor 2 into a sequence of 6 constant steps), the solution is:
tensor2Repeated = RepeatVector(6)(tensor2)
tensor = Concatenate()([tensor1, tensor2Repeated])
Isn't it better to reduce redundancy? You only have to replicate the second tensor 3 times to produce the same amount of information as the first tensor; then you simply reshape. To concatenate an arbitrary number of tensors: compute the size of each tensor minus its last axis (multiply all the axes before the last to get the size), find the largest tensor m, then upsample or repeat each tensor x by ceiling(m.size / x.size). Finally, reshape each tensor with the same axes as m except for the last axis, which you either calculate yourself or let your framework infer implicitly with -1.
tensor2Repeated = RepeatVector(3)(tensor2)
tensor2Reshaped = Reshape((6, 1))(tensor2Repeated)  # Keras's Reshape layer omits the batch dimension (32)
tensor = Concatenate()([tensor1, tensor2Reshaped])
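A minimal end-to-end sketch with assumed shapes (tensor1 as (batch, 6, 4) and tensor2 as (batch, 2); the real shapes come from the asker's diagram, which is not reproduced here):
from keras.layers import Input, RepeatVector, Reshape, Concatenate

tensor1 = Input(shape=(6, 4))                    # (None, 6, 4)
tensor2 = Input(shape=(2,))                      # (None, 2)

t2_repeated = RepeatVector(3)(tensor2)           # (None, 3, 2)
t2_reshaped = Reshape((6, 1))(t2_repeated)       # (None, 6, 1), same 6 values per sample
merged = Concatenate()([tensor1, t2_reshaped])   # (None, 6, 5), ready for an LSTM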

How to correctly implement a batch-input LSTM network in PyTorch?

This release of PyTorch seems to provide PackedSequence for variable-length inputs to recurrent neural networks. However, I found it a bit hard to use correctly.
Using pad_packed_sequence to recover the output of an RNN layer that was fed by pack_padded_sequence, we get a T x B x N tensor of outputs, where T is the max number of time steps, B is the batch size and N is the hidden size. I found that for short sequences in the batch, the trailing outputs are all zeros.
Here are my questions.
For a single-output task, where one needs the last output of every sequence, simply taking outputs[-1] gives a wrong result, since this tensor contains lots of zeros for the short sequences. One needs to construct indices from the sequence lengths to fetch the individual last output of each sequence. Is there a simpler way to do that?
For a multiple-output task (e.g. seq2seq), one usually adds a linear layer N x O, reshapes the batch outputs T x B x O into TB x O, and computes the cross entropy loss against the true targets TB (usually integers, as in a language model). In this situation, do the zeros in the batch output matter?
Question 1 - Last Timestep
This is the code that I use to get the output of the last timestep. I don't know if there is a simpler solution; if there is, I'd like to know it. I followed this discussion and grabbed the relevant code snippet for my last_timestep method. This is my forward.
class BaselineRNN(nn.Module):
    def __init__(self, **kwargs):
        ...

    def last_timestep(self, unpacked, lengths):
        # Index of the last output for each sequence.
        idx = (lengths - 1).view(-1, 1).expand(unpacked.size(0),
                                               unpacked.size(2)).unsqueeze(1)
        return unpacked.gather(1, idx).squeeze()

    def forward(self, x, lengths):
        embs = self.embedding(x)
        # pack the batch
        packed = pack_padded_sequence(embs, list(lengths.data),
                                      batch_first=True)
        out_packed, (h, c) = self.rnn(packed)
        out_unpacked, _ = pad_packed_sequence(out_packed, batch_first=True)
        # get the outputs from the last *non-masked* timestep for each sentence
        last_outputs = self.last_timestep(out_unpacked, lengths)
        # project to the classes using a linear layer
        logits = self.linear(last_outputs)
        return logits
Question 2 - Masked Cross Entropy Loss
Yes, by default the zero padded timesteps (targets) matter. However, it is very easy to mask them. You have two options, depending on the version of PyTorch that you use.
PyTorch 0.2.0: PyTorch now supports masking directly in CrossEntropyLoss, via the ignore_index argument. For example, in language modeling or seq2seq, where I add zero padding, I mask the zero-padded words (targets) simply like this:
loss_function = nn.CrossEntropyLoss(ignore_index=0)
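For instance (a small self-contained sketch; the shapes are made up):
import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss(ignore_index=0)
logits = torch.randn(6, 10)                  # (T*B, num_classes)
targets = torch.tensor([3, 5, 0, 2, 0, 7])   # 0 marks zero-padded positions
loss = loss_function(logits, targets)        # padded targets contribute nothing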
PyTorch 0.1.12 and older: in the older versions of PyTorch, masking was not supported, so you had to implement your own workaround. The solution that I used was masked_cross_entropy.py, by jihunchoi. You may also be interested in this discussion.
A few days ago, I found this method which uses indexing to accomplish the same task with a one-liner.
I have my dataset batch first ([batch size, sequence length, features]), so for me:
unpacked_out = unpacked_out[np.arange(unpacked_out.shape[0]), lengths - 1, :]
where unpacked_out is the output of torch.nn.utils.rnn.pad_packed_sequence.
I have compared it with the method described here, which looks similar to the last_timestep() method Christos Baziotis is using above (also recommended here), and the results are the same in my case.
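A small self-contained sketch comparing the two approaches on dummy data (the shapes are made up):
import torch

B, T, H = 3, 5, 4
unpacked_out = torch.randn(B, T, H)           # batch-first padded RNN output
lengths = torch.tensor([5, 3, 2])

# the indexing one-liner
last = unpacked_out[torch.arange(B), lengths - 1, :]       # (B, H)

# the gather-based last_timestep equivalent
idx = (lengths - 1).view(-1, 1).expand(B, H).unsqueeze(1)  # (B, 1, H)
last_gather = unpacked_out.gather(1, idx).squeeze(1)       # (B, H)

assert torch.equal(last, last_gather)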
