PyTorch: Computing the norm of batched tensors

I have tensor t with shape (Batch_Size x Dims) and another tensor v with shape (Vocab_Size x Dims). I'd like to produce a tensor d with shape (Batch_Size x Vocab_Size), such that d[i,j] = norm(t[i] - v[j]).
Doing this for a single tensor t of shape (Dims) (no batch) is trivial: d = torch.norm(v - t, dim=1), since t is broadcast against v. How can I do this when the tensors have batches?

Insert unitary dimensions into v and t to make them (1 x Vocab_Size x Dims) and (Batch_Size x 1 x Dims) respectively. Next, take the broadcasted difference to get a tensor of shape (Batch_Size x Vocab_Size x Dims). Pass that to torch.norm along with the optional dim=2 argument so that the norm is taken along the last dimension. This will result in the desired (Batch_Size x Vocab_Size) tensor of norms.
d = torch.norm(v.unsqueeze(0) - t.unsqueeze(1), dim=2)
Edit: As pointed out by @KonstantinosKokos in the comments, due to the broadcasting rules used by NumPy and PyTorch, the leading unitary dimension on v does not need to be explicit, i.e. you can use
d = torch.norm(v - t.unsqueeze(1), dim=2)
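A quick sanity check of the shapes (the sizes below are made up for illustration); torch.cdist computes the same pairwise 2-norms and makes a handy cross-check:
import torch

Batch_Size, Vocab_Size, Dims = 4, 10, 8   # illustrative sizes
t = torch.randn(Batch_Size, Dims)
v = torch.randn(Vocab_Size, Dims)

# (Batch_Size, 1, Dims) - (Vocab_Size, Dims) broadcasts to (Batch_Size, Vocab_Size, Dims)
d = torch.norm(v - t.unsqueeze(1), dim=2)
print(d.shape)  # torch.Size([4, 10])

# torch.cdist returns the same pairwise Euclidean distances directly.
assert torch.allclose(d, torch.cdist(t, v), atol=1e-6)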

Related

Gradients of loss with respect to random parameters are the same in pytorch

In the code below, I perform a simple linear operation on an input tensor of ones and compute its binary cross-entropy loss, taking a vector of zeros as the expected output.
When computing the gradient of the loss with respect to w, the rows of w.grad are all the same and equal to the gradient with respect to b. This is counter-intuitive, since w and b have random values. What is the reason?
n_input, n_output = 5, 3
x = torch.ones(n_input)
y = torch.zeros(n_output) # expected output
w = torch.randn(n_input, n_output, requires_grad=True)
b = torch.randn(n_output, requires_grad=True)
z = torch.matmul(x,w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()
print(w.grad)
print(b.grad)
Output:
tensor([[0.2179, 0.4337, 0.1959],
[0.2179, 0.4337, 0.1959],
[0.2179, 0.4337, 0.1959],
[0.2179, 0.4337, 0.1959],
[0.2179, 0.4337, 0.1959]])
tensor([0.2179, 0.4337, 0.1959])
It's because your input is symmetric.
Imagine the issue from the point of view of a single perceptron (you have 3 of them in your setup):
every input is 1.0, so the individual weights of a neuron don't matter (it makes no difference which input a weight is attached to, since each input carries the same value 1.0), and all of them receive the same gradient.
If you diversify the input, everything works just fine:
n_input, n_output = 5, 3
x = torch.randn(n_input)
y = torch.ones(n_output)/2. # expected output
w = torch.randn(n_input, n_output, requires_grad=True)
b = torch.randn(n_output, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()
print(w.grad)
print(b.grad)
tensor([[-0.1939, 0.1657, -0.2501],
[ 0.0561, -0.0480, 0.0724],
[-0.3162, 0.2703, -0.4079],
[ 0.0947, -0.0809, 0.1221],
[-0.0140, 0.0120, -0.0181]])
tensor([-0.1263, 0.1080, -0.1630])
You have a single data point with an input feature size of 5. The operation performed is z = x @ w + b, followed by a binary cross-entropy from logits against an all-zero label. The binary cross-entropy is defined by:
bce = -[y_true*log(σ(y_pred)) + (1 - y_true)*log(1 - σ(y_pred))]
The gradient of the loss with respect to z is the partial derivative dL/dz; it has the same size as z, so let's write it as [dz1, dz2, dz3] (for binary cross-entropy with logits and mean reduction, dz_j = (σ(z_j) - y_j) / n_output).
To compute the gradients of the weight parameter w and the bias parameter b we have the following:
dL/dw = x.T @ dL/dz
dL/db = dL/dz (up to a shape change)
Therefore b.grad is simply
[dz1, dz2, dz3]
And, since x is made up of ones, x.T @ dL/dz ends up being a matrix whose five rows are all equal to dL/dz:
[[dz1, dz2, dz3],
[dz1, dz2, dz3],
[dz1, dz2, dz3],
[dz1, dz2, dz3],
[dz1, dz2, dz3]]
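A small sketch that makes this concrete, reusing the question's setup: retain the gradient on z and check that w.grad is the outer product of x with dL/dz, so with x all ones every row of w.grad equals b.grad.
import torch

n_input, n_output = 5, 3
x = torch.ones(n_input)
y = torch.zeros(n_output)
w = torch.randn(n_input, n_output, requires_grad=True)
b = torch.randn(n_output, requires_grad=True)
z = torch.matmul(x, w) + b
z.retain_grad()  # keep dL/dz so we can inspect it after backward()
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()
# dL/dw = outer(x, dL/dz); with x all ones, every row of w.grad equals dL/dz = b.grad.
assert torch.allclose(w.grad, torch.outer(x, z.grad))
assert torch.allclose(b.grad, z.grad)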

Sample a tensor of probability distributions in pytorch

I want to sample a tensor of probability distributions with shape (N, C, H, W), where dimension 1 (size C) contains normalized probability distributions with C possibilities. Is there a pytorch function to efficiently sample all the distributions in the tensor in parallel? I just need to sample each distribution once, so the result could either be a one-hot tensor with the same shape or a tensor of indices with shape (N, 1, H, W).
There was no single function to do the sampling that I saw, but I was able to sample the tensor in several steps: compute the (reverse) cumulative probabilities, sample each class independently, and then pick the first class that sampled a 1 along the distribution dimension:
# probabilities has shape (N, C, H, W); one_hot, self.tile_count and device() come from the
# surrounding code (not shown in the answer); tile_count acts as an out-of-range class index.
reverse_cumulative = torch.flip(torch.cumsum(torch.flip(probabilities, [1]), dim=1), [1])
# P(class c | no earlier class was chosen), for sequential sampling along dim 1.
cumulative = probabilities / reverse_cumulative
sampled = (torch.rand(cumulative.shape, device=device()) <= cumulative)
idxs = sampled * one_hot
idxs[~sampled] = self.tile_count        # push un-sampled positions past every valid class index
sampled_idxs = idxs.min(dim=1).indices  # first class that sampled a 1, per (N, H, W) location
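If you can rely on torch.distributions, a shorter alternative is to move the class dimension last and let Categorical sample every distribution in parallel; a minimal sketch (tensor sizes invented for illustration):
import torch

N, C, H, W = 2, 5, 4, 4
probs = torch.rand(N, C, H, W)
probs = probs / probs.sum(dim=1, keepdim=True)   # normalize along the class dimension

# Categorical treats the last dimension as the distribution, so move C to the end.
dist = torch.distributions.Categorical(probs=probs.permute(0, 2, 3, 1))
idx = dist.sample()                              # (N, H, W), one sampled index per location
idx = idx.unsqueeze(1)                           # (N, 1, H, W), as asked in the question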

Indexing Pytorch tensor

I have PyTorch code that generates a tensor in each iteration of a for loop, all of the same size. I want to assign each of those tensors to a row of a new tensor Y, which will contain all of them at the end. In other words, something like this:
for i in range(N):
    X = torch.Tensor([[1,2,3], [3,2,5]])
    # Y is a PyTorch tensor
    Y[i] = X
I wonder how I can implement this with PyTorch.
You can concatenate the tensors using torch.cat:
tensors = []
for i in range(N):
    X = torch.tensor([[1, 2, 3], [3, 2, 5]])
    tensors.append(X)
Y = torch.cat(tensors, dim=0)  # dim 0 is the rows of the tensor
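If you want each X to remain a separate (2, 3) slice rather than having its rows merged into Y, torch.stack adds a new leading dimension instead:
Y = torch.stack(tensors, dim=0)  # shape (N, 2, 3): Y[i] is the i-th X, matching the Y[i] = X in the question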

Multiply 3 matrix in Keras custom layer

I would like to create a custom Keras layer that calculates the product of 2 input matrices and 1 weight matrix (a diagonal matrix): x W y
x = Input((8,200)) # (?,8,200)
y = Input((10,200)) # (?,10,200)
W # Weight matrix defined with Keras, shape (200,)
I want the output tensor that computes x W y, with shape (?, 8, 10).
I tried:
K.dot(x*W, K.transpose(y)) # Raises a dimension error
K.dot(x*W, Permute((2,1))(y)) # (?, 8, ?, 10)
Without the first dimension (batch size) I see how to do it, but with it I'm a little lost.
You can use K.batch_dot, which is made for this purpose.
K.batch_dot(x*W, K.permute_dimensions(y, (0,2,1)), axes=[2, 1]) # (?, 8, 10)
will do the trick.
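A quick shape check of that expression with dummy tensors (assuming the tf.keras backend):
import numpy as np
from tensorflow.keras import backend as K

x = K.constant(np.random.rand(2, 8, 200))    # pretend batch of 2
y = K.constant(np.random.rand(2, 10, 200))
W = K.constant(np.random.rand(200))
out = K.batch_dot(x * W, K.permute_dimensions(y, (0, 2, 1)), axes=[2, 1])
print(K.int_shape(out))  # (2, 8, 10)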
You can specify the axis along which to take the dot product in a Keras Dot layer. The following code shows how to multiply your inputs x and y. If you want to add a weight matrix W you can do that in a similar way (by first multiplying x and W).
x = Input((8,200)) # (?,8,200)
y = Input((10,200)) # (?,10,200)
output = keras.layers.Dot(axes=-1)([x, y]) # (?,8,10)
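Building on the batch_dot answer above, here is a minimal sketch of a custom layer that owns the diagonal weight W itself (assuming tf.keras; the layer name DiagonalBilinear is made up for illustration):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

class DiagonalBilinear(keras.layers.Layer):
    """Computes x * diag(w) * y^T for x of shape (?, 8, 200) and y of shape (?, 10, 200)."""
    def build(self, input_shape):
        dims = input_shape[0][-1]  # 200, the shared feature dimension
        self.w = self.add_weight(name="w", shape=(dims,), initializer="ones")
        super().build(input_shape)

    def call(self, inputs):
        x, y = inputs
        # (?, 8, 200) . (?, 200, 10) -> (?, 8, 10)
        return K.batch_dot(x * self.w, K.permute_dimensions(y, (0, 2, 1)), axes=[2, 1])

x = keras.Input((8, 200))
y = keras.Input((10, 200))
out = DiagonalBilinear()([x, y])  # (?, 8, 10)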

Time prediction using specialised setup in Keras

I'm working on a project where I have to predict the future states of a 1D vector with y entries. I'm trying to do this with an ANN setup that combines LSTM units with a convolution layer. The method I'm using is based on the method used in a (pre-release paper). The suggested setup is as follows:
In the picture, c is the 1D vector with y entries; the ANN gets the n previous states as input and produces the o next states as output.
Currently, my ANN setup looks like this:
inputLayer = Input(shape = (n, y))
encoder = LSTM(200)(inputLayer)
x = RepeatVector(1)(encoder)
decoder = LSTM(200, return_sequences=True)(x)
x = Conv1D(y, 4, activation = 'linear', padding = 'same')(decoder)
model = Model(inputLayer, x)
Here n is the length of the input sequences and y is the length of the state array. As you can see, I'm repeating the encoded vector only 1 time, since I'm trying to predict only 1 time step into the future. Is this the right way to set up the network described above?
Furthermore, I have a numpy array (data) with a shape of (Sequences, Time Steps, State Variables) to train with. I was trying to divide it into randomly selected batches with a generator like this:
def BatchGenerator(batch_size, n, y, data):
    # Infinite loop.
    while True:
        # Allocate a new array for the batch of input-signals.
        x_shape = (batch_size, n, y)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        # Allocate a new array for the batch of output-signals.
        y_shape = (batch_size, 1, y)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        # Fill the batch with random sequences of data.
        for i in range(batch_size):
            # Select a random sequence.
            seq_idx = np.random.randint(data.shape[0])
            # Get a random start-index.
            # This points somewhere into the training-data.
            start_idx = np.random.randint(data.shape[1] - n)
            # Copy the sequence of data starting at this index.
            # Each sample inside x_batch has a shape of [n, y].
            x_batch[i, :, :] = data[seq_idx, start_idx:start_idx+n, :]
            # Each sample inside y_batch has a shape of [1, y] (as we predict only 1 time step in advance).
            y_batch[i, :, :] = data[seq_idx, start_idx+n, :]
        yield (x_batch, y_batch)
The problem is that it gives an error when I use a batch_size of more than 1. Could anyone help me set this data up so that it can be used optimally to train my neural network?
The model is now trained using:
generator = BatchGenerator(batch_size, n, y, data)
model.fit_generator(generator = generator, steps_per_epoch = steps_per_epoch, epochs = epochs)
Thanks in advance!
