torch suppress to kth largest values - pytorch

I have the following function, which works, but not for half-precision values (kthvalue raises a NotImplemented error).
def suppress_small_probabilities(probabilities: torch.FloatTensor, k: int) -> torch.FloatTensor:
    kth_largest, _ = (-probabilities).kthvalue(k, dim=-1, keepdim=True)
    return probabilities * (probabilities >= -kth_largest)
How would you do the equivalent without using kthvalue? I'm guessing topk has something to do with it, but I want to suppress the smaller values. probabilities is of size batch_size x 1000.

Implement your own topk, e.g.
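import torch
from torch import Tensor

def mytopk(xs: Tensor, k: int) -> Tensor:
    mask = torch.zeros_like(xs)
    batch_idx = torch.arange(0, len(xs))
    for _ in range(k):
        # Hide entries that were already selected, then take the row-wise maximum of the rest.
        _, index = torch.where(mask == 0, xs, -1e4).max(-1)
        mask[(batch_idx, index)] = 1
    return mask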
This returns a 0/1 mask tensor (same dtype as xs, not boolean) where the row-wise top-k elements have value 1 and the rest 0.
Then use the mask to select from your original tensor, e.g.
xs = torch.rand(3, 5, dtype=torch.float16)
# tensor([[0.0626, 0.9620, 0.5596, 0.4423, 0.1932],
# [0.5289, 0.0857, 0.7802, 0.7730, 0.4807],
# [0.8272, 0.5016, 0.1169, 0.4372, 0.1843]], dtype=torch.float16)
mask = mytopk(xs, 2)
# tensor([[0., 1., 1., 0., 0.],
# [0., 0., 1., 1., 0.],
# [1., 1., 0., 0., 0.]])
top_only = torch.where(mask == 1, xs, 0)
# tensor([[0.0000, 0.9620, 0.5596, 0.0000, 0.0000],
# [0.0000, 0.0000, 0.7802, 0.7730, 0.0000],
# [0.8271, 0.5016, 0.0000, 0.0000, 0.0000]], dtype=torch.float16)
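Since the question guesses that topk is involved, here is a minimal sketch of that route. It assumes that upcasting to float32 is acceptable (the conversion from float16 is exact) and the function name `suppress_small_probabilities_topk` is made up for illustration. Unlike the original `>=` comparison, it keeps exactly k entries per row, so ties at the k-th value are not all retained.

```python
import torch

def suppress_small_probabilities_topk(probabilities: torch.Tensor, k: int) -> torch.Tensor:
    # Upcast so that topk runs in float32; float16 -> float32 loses nothing.
    p32 = probabilities.float()
    values, indices = p32.topk(k, dim=-1)
    # Start from zeros and write the k largest values back at their positions.
    kept = torch.zeros_like(p32).scatter(-1, indices, values)
    # Return in the caller's dtype (e.g. float16).
    return kept.to(probabilities.dtype)

probs = torch.tensor([[0.1, 0.9, 0.5, 0.4],
                      [0.3, 0.2, 0.8, 0.7]])
result = suppress_small_probabilities_topk(probs, 2)
result_half = suppress_small_probabilities_topk(probs.to(torch.float16), 2)
```

Here `result` keeps 0.9 and 0.5 in the first row and 0.8 and 0.7 in the second, with everything else set to zero.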

Related

How do I mask a feed forward layer based on tensor in pytorch?

I have a really simple network with 2 inputs (x and m).
x is size 100
m is size 3
My network is simply...
f_1 = linear_layer(x)
f_2 = linear_layer(f_1)
f_3 = linear_layer(f_1)
f_4 = linear_layer(f_1)
f_5 = softmax(linear_layer(sum(f_2, f_3, f_4)))
Based on the vector m, I want to zero out and ignore f_2, f_3, f_4 in the final sum and in the resulting gradient calculation. Is there a way to create a mask based on vector m to achieve this?
Ok, here is how you do it. Use list comprehensions to make it more generic:
import torch
# example input and output
x = torch.ones(5)
y = torch.zeros(3)
# mask tensor
mask = torch.tensor([0, 1, 0])
# initial layer
z0 = torch.nn.Linear(5, 5)
# layers to potentially mask
z1 = torch.nn.Linear(5, 3)
z2 = torch.nn.Linear(5, 3)
z3 = torch.nn.Linear(5, 3)
# defines how the data passes through the layers, specific mask element is applied to each of the maskable layers
layer1_output = z0(x)
layer2_output = mask[0]*z1(layer1_output) + mask[1]*z2(layer1_output) + mask[2]*z3(layer1_output)
# loss function
loss = torch.nn.functional.binary_cross_entropy_with_logits(layer2_output, y)
# run it and see
loss.backward()
print(z0.weight.grad)
print(z1.weight.grad)
print(z2.weight.grad)
print(z3.weight.grad)
As shown below, the mask tensor selects which sub-networks take part in the computation: only z2, whose mask element is 1, receives a non-zero gradient.
tensor([[ 0.0354, 0.0354, 0.0354, 0.0354, 0.0354],
[-0.0986, -0.0986, -0.0986, -0.0986, -0.0986],
[-0.0372, -0.0372, -0.0372, -0.0372, -0.0372],
[-0.0168, -0.0168, -0.0168, -0.0168, -0.0168],
[-0.0133, -0.0133, -0.0133, -0.0133, -0.0133]])
tensor([[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.]])
tensor([[-0.0422, 0.1314, 0.1108, -0.1644, 0.0906],
[-0.0240, 0.0747, 0.0630, -0.0934, 0.0515],
[-0.0251, 0.0781, 0.0659, -0.0977, 0.0539]])
tensor([[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.]])
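The answer mentions list comprehensions as a way to make this more generic. A minimal sketch of that idea follows, assuming the maskable layers are held in an `nn.ModuleList`; the names `trunk` and `branches` are illustrative and not from the original answer.

```python
import torch
from torch import nn

torch.manual_seed(0)

x = torch.ones(5)
mask = torch.tensor([0.0, 1.0, 0.0])

trunk = nn.Linear(5, 5)
# Any number of maskable branches can be kept in one container.
branches = nn.ModuleList([nn.Linear(5, 3) for _ in range(len(mask))])

hidden = trunk(x)
# Each branch output is scaled by its mask element before summing.
out = sum(m * branch(hidden) for m, branch in zip(mask, branches))

out.sum().backward()
# True where a branch received an all-zero gradient.
grad_is_zero = [bool((b.weight.grad == 0).all()) for b in branches]
```

Branches with a mask element of 0 still get a gradient tensor, but it is filled with zeros, so an optimizer step based purely on these gradients leaves them unchanged.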

How to map element in pytorch tensor to id?

Given a tensor:
A = torch.tensor([2., 3., 4., 5., 6., 7.])
Then, give each element in A an id:
id = torch.arange(A.shape[0], dtype = torch.int) # tensor([0,1,2,3,4,5])
In other words, id of 2. in A is 0 and id of 3. in A is 1:
2. -> 0
3. -> 1
4. -> 2
5. -> 3
6. -> 4
7. -> 5
Then, I have a new tensor:
B = torch.tensor([3., 6., 6., 5., 4., 4., 4.])
Is there any way in PyTorch to map each element in B to its id?
In other words, I want to obtain tensor([1, 4, 4, 3, 2, 2, 2]), in which each element is id of the element in B.
What you ask can be done by iterating over B, comparing each element against all elements of A, and retrieving the index of the match:
In [*]: for x in B:
...: print(torch.where(x==A)[0][0])
...:
...:
tensor(1)
tensor(4)
tensor(4)
tensor(3)
tensor(2)
tensor(2)
tensor(2)
Here I used torch.where to find all the True elements in x==A, where x takes the value of each element of B in turn. This is really slow, but it lets you add handling for elements of B that do not appear in A.
A quick and dirty vectorized method to get what you want is:
In [*]: (B.view(-1,1) == A).int().argmax(dim=1)
Out[*]: tensor([1, 4, 4, 3, 2, 2, 2])
This trick takes advantage of the fact that argmax returns the index of the first maximum along dim=1.
Big warning: if an element does not exist in A, no error is raised and the result is silently 0 for that element.
In [*]: C = torch.tensor([100, 1000, 1, 3, 9999])
In [*]: (C.view(-1,1) == A).int().argmax(dim=1)
Out[*]: tensor([0, 0, 0, 1, 0])
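One way to guard against the silent zeros is to check whether each row of the comparison matrix contains any match at all. The sketch below uses -1 as a "not found" marker, which is an arbitrary choice made here rather than part of the original answer.

```python
import torch

A = torch.tensor([2., 3., 4., 5., 6., 7.])
C = torch.tensor([100., 1000., 1., 3., 9999.])

# matches[i, j] is True when C[i] == A[j].
matches = C.view(-1, 1) == A
ids = matches.int().argmax(dim=1)
# Rows without any match get -1 instead of a misleading 0.
ids = torch.where(matches.any(dim=1), ids, torch.full_like(ids, -1))
```

With these inputs only the value 3. is found, so `ids` holds 1 at that position and -1 everywhere else.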
I don't think there is such a function in PyTorch to map a tensor.
It seems quite unreasonable to solve this by comparing each value from B to all values from A.
Here are two possible solutions to solve this problem.
Using a dictionary as a map
You can use a dictionary. It is not much of a pure-PyTorch solution, but it will most probably be the fastest and safest way...
Just create a dict to map each element to an id, then use it to map B:
>>> map = {x.item(): i for i, x in enumerate(A)}
>>> torch.tensor([map[x.item()] for x in B])
tensor([1, 4, 4, 3, 2, 2, 2])
Change of basis approach
An alternative only using torch.Tensors. This will require the values you want to map - the content of A - to be integers because they will be used to index a tensor.
Encode the content of A into one-hot encodings:
>>> A_enc = torch.zeros((int(A.max())+1,)*2)
>>> A_enc[A, torch.arange(A.shape[0])] = 1
>>> A_enc
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.]])
We'll use A_enc as our basis to map integers:
>>> v = torch.argmax(A_enc, dim=0)
tensor([0, 0, 0, 1, 2, 3, 4, 5])
Now, given an integer, for instance x=3, we can encode it as a one-hot vector: x_enc = [0, 0, 0, 1, 0, 0, 0, 0]. Then, use v to map it. With a simple dot product you can get the mapping of x_enc: here <v, x_enc> gives 1, which is the desired result (first element of mapped-B). But instead of giving x_enc, we will compute the matrix multiplication between v and encoded-B. First encode B, then compute the matrix multiplication v @ B_enc:
>>> B_enc = torch.zeros(A_enc.shape[0], B.shape[0])
>>> B_enc[B, torch.arange(B.shape[0])] = 1
>>> B_enc
tensor([[0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 1., 0., 0., 0.],
[0., 1., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.]])
>>> v @ B_enc.long()
tensor([1, 4, 4, 3, 2, 2, 2])
Note - you will have to define your tensors with Long type.
There is a similar issue for numpy so my answer is heavily inspired by their solution. I will compare some of the mentioned methods using perfplot. I will also generalize the problem to apply a mapping to a tensor (yours is just a specific case).
For the analysis, I will assume the mapping contains all the unique elements in the tensor and that the number of mapped elements is small and constant.
import torch

def apply(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    mapping = {k.item(): v.item() for k, v in zip(a, ids)}
    return b.clone().apply_(lambda x: mapping.__getitem__(x))

def bucketize(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    mapping = {k.item(): v.item() for k, v in zip(a, ids)}
    # From `https://stackoverflow.com/questions/13572448`.
    palette, key = zip(*mapping.items())
    key = torch.tensor(key)
    palette = torch.tensor(palette)
    index = torch.bucketize(b.ravel(), palette)
    remapped = key[index].reshape(b.shape)
    return remapped

def iterate(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    mapping = {k.item(): v.item() for k, v in zip(a, ids)}
    return torch.tensor([mapping[x.item()] for x in b])

def argmax(a: torch.Tensor, ids: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (b.view(-1, 1) == a).int().argmax(dim=1)

if __name__ == "__main__":
    import perfplot
    a = torch.arange(2, 8)
    ids = torch.arange(0, 6)
    perfplot.show(
        setup=lambda n: torch.randint(2, 8, (n,)),
        kernels=[
            lambda x: apply(a, ids, x),
            lambda x: bucketize(a, ids, x),
            lambda x: iterate(a, ids, x),
            lambda x: argmax(a, ids, x),
        ],
        labels=["apply", "bucketize", "iterate", "argmax"],
        n_range=[2 ** k for k in range(25)],
        xlabel="len(a)",
    )
Running this yields the following plot:
Hence depending on the number of elements in your tensor you can pick either the argmax method (with the caveats mentioned and the restriction that you have to map the values from 0 to N), apply, or bucketize.
Now if we increase the number of elements to be mapped to, let's say, tens of thousands, i.e. a = torch.arange(2, 10002) and ids = torch.arange(0, 10000), we get the following results:
This means the speed advantage of bucketize only becomes visible for larger arrays, but it still outperforms the other methods (the argmax method was killed, so I had to remove it).
Lastly, if the mapping does not contain every value present in the tensor, we can first build an identity dictionary over all unique values of the tensor b and then update it with the mapping:
mapping = {x.item(): x.item() for x in torch.unique(b)}
mapping.update({k.item(): v.item() for k, v in zip(a, ids)})
Now, if the number of unique elements you want to map is orders of magnitude larger than the array, computing this may shift the value of n at which bucketize becomes faster than apply (for apply you can instead replace mapping.__getitem__(x) with mapping.get(x, x)).
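The bucketize kernel above relies on the keys being in ascending order, because torch.bucketize expects sorted boundaries. A minimal sketch for unsorted keys follows; the helper name `map_with_bucketize` is hypothetical, and it assumes every value to be mapped actually occurs in the keys.

```python
import torch

def map_with_bucketize(keys: torch.Tensor, ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    # torch.bucketize needs ascending boundaries, so sort the keys first.
    sorted_keys, order = torch.sort(keys)
    # Position of each value inside the sorted keys (exact matches assumed).
    pos = torch.bucketize(values, sorted_keys)
    # Reorder the ids to follow the sorted keys, then look them up.
    return ids[order][pos]

keys = torch.tensor([5., 2., 7., 3.])
ids = torch.arange(4)
values = torch.tensor([3., 7., 2., 5., 5.])
mapped = map_with_bucketize(keys, ids, values)
```

Values that are missing from `keys` are not detected by this sketch: they either map to a neighbouring key's id or produce an out-of-range index.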
I guess there is an easier way. Create an array as a mapper, convert your tensors to np.ndarray first, and then index into it.
import numpy as np
a_array = A.numpy().astype(int)
b_array = B.numpy().astype(int)
# Lookup table; its length must exceed the largest value in A.
mapper = np.zeros(10, dtype=int)
for i, x in enumerate(a_array):
    mapper[x] = i
out = torch.from_numpy(mapper[b_array])
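The same lookup-table idea can stay entirely in PyTorch. This is only a sketch under the assumption that the values are non-negative integers small enough to index a table; the -1 fill value marks entries that have no id and is a choice made here for illustration.

```python
import torch

A = torch.tensor([2, 3, 4, 5, 6, 7])
B = torch.tensor([3, 6, 6, 5, 4, 4, 4])

# One slot per possible value; -1 means "no id assigned".
lut = torch.full((int(A.max()) + 1,), -1, dtype=torch.long)
lut[A] = torch.arange(A.shape[0])

# Indexing the table with B performs the mapping in one step.
mapped = lut[B]
```

For the tensors from the question this gives tensor([1, 4, 4, 3, 2, 2, 2]).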

Why do 'loss.backward()' and 'weight.grad' return a tensor containing all zeros?

When I run 'loss.backward()' and 'weight.grad' I get a tensor containing all zeros. Also, 'weight.grad_fn' returns None.
However, it all seems to return the correct result for the second layer 'w2'.
If I play with simple operations such as x*2 or x**2 'backward()' and '.grad' return correct results
Here's my code:
import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms
# Getting MNIST data
num_workers = 0
batch_size = 64
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
dataiter = iter(train_loader)
images, labels = next(dataiter)
#####################################
#####################################
#### NN Part
def activation(x):
    return 1/(1+torch.exp(-x))
inputs = images  # already a torch.Tensor, so no conversion from NumPy is needed
# Flatten the inputs format from (64,1,28,28) into (64,784)
inputs = inputs.reshape(images.shape[0], int(images.shape[1]*images.shape[2]*images.shape[3]))
w1 = torch.randn(784, 256, requires_grad=True)# n_input, n_hidden
b1 = torch.randn(256)# n_hidden
w2 = torch.randn(256, 10, requires_grad=True)# n_hidden, n_output
b2 = torch.randn(10)# n_output
h = activation(torch.mm(inputs, w1) + b1)
y = torch.mm(h, w2) + b2
#print(h)
#print(y)
y.sum().backward()
print(w1.grad)
print(w1.grad_fn)
#print(w2.grad)
#print(w2.grad_fn)
By the way it gives me the same problem if I try to run it this way also:
images = images.reshape(images.shape[0], -1)
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
logits = model(images)
criterion = nn.NLLLoss()
loss = criterion(logits, labels)
print(loss)
print(loss.grad_fn)
print('Before backward pass: ', model[0].weight.grad)
loss.backward()
print('After: ', model[0].weight.grad)
#print('After: ', model[2].weight.grad)
#print('After: ', model[4].weight.grad)
The gradients of w1 are not all zero, there are simply a lot of zeros, especially around the border, because the MNIST images have a lot of black pixels (zeros). When multiplying with zero, the resulting gradients are also zero.
By printing w1.grad you only see a very small part of the values (borders), and you just can't see the non-zero values.
w1.grad
# => tensor([[0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# ...,
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.]])
# Indices of non-zero elements
w1.grad.nonzero()
# => tensor([[ 71, 0],
# [ 71, 1],
# [ 71, 2],
# ...,
# [746, 253],
# [746, 254],
# [746, 255]])
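The effect can be reproduced without MNIST. The sketch below uses a small synthetic batch in which the first two input features are always zero, standing in for black border pixels; the shapes are arbitrary. It also shows why w1.grad_fn is None: w1 is a leaf tensor created directly by the user, so no operation produced it.

```python
import torch

torch.manual_seed(0)

inputs = torch.rand(64, 6)
inputs[:, :2] = 0.0  # features that are zero in every sample ("black pixels")

w1 = torch.randn(6, 4, requires_grad=True)
w2 = torch.randn(4, 3, requires_grad=True)

h = torch.sigmoid(inputs @ w1)
y = h @ w2
y.sum().backward()

# A row of w1.grad is zero exactly when its input feature is always zero.
zero_rows = (w1.grad == 0).all(dim=1)
# Number of non-zero gradient entries, easier to read than the printed tensor.
nonzero_count = int(torch.count_nonzero(w1.grad))
```

Counting the non-zero entries, or calling `w1.grad.abs().sum()`, is a quicker sanity check than reading the truncated printout of a large gradient tensor.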

How to add to pytorch tensor at indices?

I have to admit, I'm a bit confused by the scatter* and index* operations - I'm not sure any of them do exactly what I'm looking for, which is very simple:
Given some 2-D tensor
z = tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
And a list (or tensor?) of 2-d indexes:
inds = tensor([[0, 0],
[1, 1],
[1, 2]])
I want to add a scalar to z at those indexes (and do it efficiently):
znew = z.something_add(inds, 3)
->
znew = tensor([[4., 1., 1., 1.],
[1., 4., 4., 1.],
[1., 1., 1., 1.]])
If I have to I can make that scalar a tensor of whatever shape (where all elements = 3), but I'd rather not...
You must provide two lists to your indexing. The first having the row positions and the second the column positions. In your example, it would be:
z[[0, 1, 1], [0, 1, 2]] += 3
torch.Tensor indexing follows Numpy. See https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing for more details.
This code achieves what you want:
z_new = z.clone() # copy the tensor
z_new[inds[:, 0], inds[:, 1]] += 3 # modify selected indices of new tensor
In PyTorch, you can index each axis of a tensor with another tensor.
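If the index list can contain the same position more than once, in-place `+=` with index tensors is not guaranteed to add once per occurrence. A sketch using `index_put_` with `accumulate=True`, which does sum repeated indices, is shown below; the duplicated `[1, 2]` entry is added here purely to illustrate the difference.

```python
import torch

z = torch.ones(3, 4)
inds = torch.tensor([[0, 0],
                     [1, 1],
                     [1, 2],
                     [1, 2]])  # [1, 2] appears twice on purpose

rows, cols = inds[:, 0], inds[:, 1]
# accumulate=True adds the value once per occurrence of an index pair.
z_new = z.clone().index_put_((rows, cols), torch.tensor(3.0), accumulate=True)
```

Here position (1, 2) ends up at 7.0 because it was listed twice, while (0, 0) and (1, 1) become 4.0.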

How to feed a model with "a list of outputs"?

Sorry for the title, but I couldn't come up with a better description here.
I am trying to apply batches for training on a model which should have 13 fully connected output layers. Each output layer has only two nodes (but are fully connected as stated).
Building the model's output looks like this:
outputs = list()
for i in range(num_labels):
    out_y = Dense(2, activation='softmax', name='out_{:d}'.format(i))(convolution_layer)
    outputs.append(out_y)
self.model = Model(input=inputs, output=outputs)
However, I can't manage to feed this model. I've tried to go with a [batch_size, 13, 1, 2] sized output array:
y = np.zeros((batch_size, 13, 1, 2))
But for a batch of size 2 I get:
ValueError: The model expects 13 input arrays, but only received one array. Found: array with shape (2, 13, 1, 2)
I've tried several other things, but it's simply not clear to me what the input for the model should look like.
How can I train this model?
I have also tried to pass a list of lists of numpy arrays:
where the first level of the batch represent the sample (here 2) and the second level is the sample with the list of 13 numpy arrays. Yet I am getting:
ValueError: Error when checking model target: you are passing a list as input to your model, but the model expects a list of 13 Numpy arrays instead. The list you passed was: [[array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 1., 0.]), array([
As suggested, I also tried to return a list() of numpy arrays of size [13,2]:
Where the error becomes:
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 13 arrays but instead got the following list of 2 arrays: [array([[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 1., 0.],
[ ...
The code
Below you can find the current code which generates one sample in sample_generator and a full batch in batch_generator (which uses sample_generator).
def batch_generator(w2v, file_path, meta_info, batch_size, sample_generator_fn, embedding_size):
    # Please note: The code shows now how I generate a list() of [13,2] ndarrays whereas the number of such ndarrays in that list is defined by batch_size.
    try:
        x = np.zeros((batch_size, meta_info.max_sequence_length, embedding_size, 1))
        y = list() #np.zeros((batch_size, 13, 1, 2))
        file = open(file_path)
        while True:
            x[:] = 0.0
            #y[:] = 0.0
            for batch in range(batch_size):
                sentence_info_json = file.readline()
                if sentence_info_json == '':
                    file.seek(0)
                    sentence_info_json = file.readline()
                sample = sample_generator_fn(w2v, sentence_info_json, meta_info)
                if not sample:
                    continue
                sentence_embedding = sample[0]
                final_length = len(sentence_embedding)
                x[batch, :final_length, :, 0] = sentence_embedding
                y.append(sample[1])
            shuffled = np.asarray(range(batch_size))
            np.random.shuffle(shuffled)
            x = x[shuffled]
            #y = y[shuffled]
            y = [y[i] for i in shuffled]
            yield x, y
    except Exception as e:
        print('Error in generator.')
        print(e)
        raise e
def sample_generator(w2v, sentence_info_json, meta_info):
    if not sentence_info_json:
        print('???')
    sentence_info = json.loads(sentence_info_json)
    tokens = [token['word'] for token in sentence_info['corenlp']['tokens']]
    sentence = Sentence(tokens=tokens)
    sentence_embedding = w2v.get_word_vectors(sentence.tokens.tolist())
    sentence_embedding = np.asarray([word_vector for word_vector in sentence_embedding if word_vector is not None])
    final_length = len(sentence_embedding)
    if final_length == 0:
        return None
    y = np.zeros((2, len(meta_info.category_dict)))
    y[1, :] = 1.
    #y_list = []
    y_tar = np.zeros((len(meta_info.category_dict), 2))
    for i in range(len(meta_info.category_dict)):
        y_tar[i][1] = 1.0
        # y_list.append(np.asarray([0.0, 1.0]))
    for opinion in sentence_info['opinions']:
        index = meta_info.category_dict[opinion['category']]
        y_tar[index][0] = 1.0
        y_tar[index][1] = 0.0
        #y_list[index][0] = 1.0
        #y_list[index][1] = 0.0
    return sentence_embedding, y_tar
As requested, the call to fit_generator()
cnn.model.fit_generator(generator=batch_generator(word2vec,
train_file, train_meta_info,
num_batches, sample_generator,
embedding_size),
samples_per_epoch=2000,
nb_epoch=2,
# validation_data=batch_generator(test_file_path, train_meta_info),
# nb_val_samples=100,
verbose=True)
Your output should be a list as specified in the error. Each element of the list should be a numpy array of size [batch_size, nb_outputs]. So a list of 13 elements of size [batch_size,2] in your case.
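In terms of the generator above, this means the per-sample [13, 2] arrays need to be regrouped per output layer before yielding. A minimal NumPy sketch of that regrouping follows; the dummy targets and variable names are made up for illustration.

```python
import numpy as np

batch_size, num_labels = 2, 13

# One [13, 2] target per sample, as sample_generator returns it (dummy values here).
per_sample = [np.tile([0.0, 1.0], (num_labels, 1)) for _ in range(batch_size)]

# Stack to (batch_size, 13, 2), then slice out one array per output layer.
y_batch = np.stack(per_sample)
targets = [y_batch[:, i, :] for i in range(num_labels)]
```

`targets` is then a list of 13 arrays, each of shape (batch_size, 2), which is the layout the answer describes for a model with 13 two-node outputs.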

Resources