kth-value per row in pytorch?

Given
import torch
A = torch.rand(9).view((3,3)) # tensor([[0.7455, 0.7736, 0.1772],\n[0.6646, 0.4191, 0.6602],\n[0.0818, 0.8079, 0.6424]])
k = torch.tensor([0,1,0])
A.kthvalue_vectorized(k) -> [0.1772, 0.6602, 0.0818]
Meaning I would like to operate on each row with a different k.
Neither kthvalue nor topk offers such an API.
Is there a vectorized way around that?
Remark: the kth value is not the value at the kth index, but the kth smallest element. From the PyTorch docs:
torch.kthvalue(input, k, dim=None, keepdim=False, out=None) -> (Tensor, LongTensor)
Returns a namedtuple (values, indices) where values is the k th smallest element of each row of the input tensor in the given dimension dim. And indices is the index location of each element found.

Assuming you don't need indices into the original matrix (if you do, just use fancy indexing for the second return value as well; a sketch of that follows the test case below), you could simply sort the values (along the last dimension by default) and return the appropriate entries like so:
def kth_smallest(tensor, indices):
    tensor_sorted, _ = torch.sort(tensor)
    return tensor_sorted[torch.arange(len(indices)), indices]
And this test case gives you your desired values:
tensor = torch.tensor(
    [[0.7455, 0.7736, 0.1772], [0.6646, 0.4191, 0.6602], [0.0818, 0.8079, 0.6424]]
)
print(kth_smallest(tensor, [0, 1, 0])) # -> [0.1772,0.6602,0.0818]
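If you also need the indices into the original matrix, here is a minimal sketch of the same idea (kth_smallest_with_indices is just an illustrative name) that additionally fancy-indexes the second return value of sort:
def kth_smallest_with_indices(tensor, indices):
    # sort along the last dimension, keeping the sorting permutation
    tensor_sorted, sort_idx = torch.sort(tensor)
    rows = torch.arange(len(indices))
    # values are the k-th smallest per row; positions point back into the original tensor
    return tensor_sorted[rows, indices], sort_idx[rows, indices]
values, positions = kth_smallest_with_indices(tensor, torch.tensor([0, 1, 0]))
print(values)     # tensor([0.1772, 0.6602, 0.0818])
print(positions)  # tensor([2, 2, 0])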

Related

How does this function traverse the token through slicing? Can you please explain how it selects the elements in the list?

The code I'm referring to:
predicted_index = torch.argmax(predictions[0, -1, :]).item()
This is a tensor, not a list. The major differences are:
a tensor has one specified dtype (usually float32 in PyTorch)
it is faster to run operations on
Your predictions are a 3D tensor, from which you are taking:
the 0th element along the first dimension
the last element along the second dimension (-1 index)
all of the elements along the third dimension (:)
Essentially you are left with a vector after the slicing.
torch.argmax returns the index at which the largest element resides, for example:
torch.argmax(torch.tensor([-1, 0, 1.5, 1, 0])) # would return 2
argmax is implemented in C++; it keeps track of the index of the largest value seen so far and returns it at the end (O(n) complexity).
.item() converts a single-element tensor to its Python counterpart (a float for any floating-point dtype, an int for the integer family, etc.).
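A small, hypothetical example tying these steps together (the shape [1, 4, 5] and the values are made up for illustration):
import torch

# pretend predictions has shape [batch, sequence, vocab] = [1, 4, 5]
predictions = torch.randn(1, 4, 5)

logits = predictions[0, -1, :]          # 1D vector of length 5
predicted_index = torch.argmax(logits)  # 0-dim tensor holding the index of the max
print(predicted_index.item())           # plain Python int, e.g. 3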

Find n smallest values in a list of tensors

I am trying to find the indices of the n smallest values in a list of tensors in pytorch. Since these tensors might contain many non-unique values, I cannot simply compute percentiles to obtain the indices. The ordering of non-unique values does not matter however.
I came up with the following solution but am wondering if there is a more elegant way of doing it:
import torch
n = 10
tensor_list = [torch.randn(10, 10), torch.zeros(20, 20), torch.ones(30, 10)]
all_sorted, all_sorted_idx = torch.sort(torch.cat([t.view(-1) for t in tensor_list]))
cum_num_elements = torch.cumsum(torch.tensor([t.numel() for t in tensor_list]), dim=0)
cum_num_elements = torch.cat([torch.tensor([0]), cum_num_elements])
split_indeces_lt = [all_sorted_idx[:n] < cum_num_elements[i + 1] for i, _ in enumerate(cum_num_elements[1:])]
split_indeces_ge = [all_sorted_idx[:n] >= cum_num_elements[i] for i, _ in enumerate(cum_num_elements[:-1])]
split_indeces = [all_sorted_idx[:n][torch.logical_and(lt, ge)] - c for lt, ge, c in zip(split_indeces_lt, split_indeces_ge, cum_num_elements[:-1])]
n_smallest = [t.view(-1)[idx] for t, idx in zip(tensor_list, split_indeces)]
Ideally a solution would pick a random subset of the non-unique values instead of picking the entries of the first tensor of the list.
PyTorch does provide a more elegant (I think) way to do it, with torch.unique_consecutive.
I'm going to work on a single tensor rather than a list of tensors, because as you did yourself, it only takes a cat to get there. Unraveling the indices afterwards is not hard either (see the sketch after the code).
# We want to find the n=3 min values and positions in t
n = 3
t = torch.tensor([1,2,3,2,0,1,4,3,2])
# To get a random occurrence, we create a random permutation
randomizer = torch.randperm(len(t))
# first, we sort t, and get the indices
sorted_t, idx_t = t[randomizer].sort()
# small util function to extract only the n smallest values and positions
head = lambda v,w : (v[:n], w[:n])
# use unique_consecutive to remove duplicates
uniques_t, counts_t = head(*torch.unique_consecutive(sorted_t, return_counts=True))
# counts_t.cumsum gives us the position of the unique values in sorted_t
uniq_idx_t = torch.cat([torch.tensor([0]), counts_t.cumsum(0)[:-1]], 0)
# And now, we have the positions of uniques_t values in t :
final_idx_t = randomizer[idx_t[uniq_idx_t]]
print(uniques_t, final_idx_t)
#>>> tensor([0,1,2]), tensor([4,0,1])
#>>> tensor([0,1,2]), tensor([4,5,8])
#>>> tensor([0,1,2]), tensor([4,0,8])
EDIT: I think the added permutation solves your requirement for a random occurrence.
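As for unraveling the flat indices back into per-tensor positions, here is a sketch under the assumption that flat_idx holds indices into torch.cat([t.view(-1) for t in tensor_list]) (e.g. all_sorted_idx[:n] from the question):
sizes = torch.tensor([t.numel() for t in tensor_list])
boundaries = sizes.cumsum(0)
offsets = torch.cat([torch.tensor([0]), boundaries[:-1]])
# which tensor in the list each flat index falls into
which_tensor = torch.bucketize(flat_idx, boundaries, right=True)
# position within that tensor's flattened view
local_idx = flat_idx - offsets[which_tensor]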

Matrix Position

I need to figure out how to return the matrix position of the largest value in a given matrix. For example:
[[1,2,3],
 [4,5,6],
 [7,8,9]]
A simple method of finding the maximum of the matrix would be:
maximum = max(max(matrix))
return maximum
For this matrix, the maximum is the int value: 9.
However, I am slightly lost when it comes to finding the value's exact matrix position. I know that in matrices the upper-left corner is considered (0,0), and the indices (i, j) (with i, j ∈ ℤ) increase by one with each step away from (0,0): i increases as you move down the rows and j increases as you move across the columns.
The correct output for this matrix should be (2,2).
Any pointers?
Using numpy:
import numpy as np
mat = np.arange(1,10).reshape(3,3) # Creates your example matrix
maxVal = np.amax(mat) # Returns 9 for your example
locMax = [np.where(mat == maxVal)[0][0], np.where(mat == maxVal)[1][0]] # Returns [2, 2] as a list
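Alternatively, a short sketch using np.unravel_index, which converts the flat position of the maximum straight into a (row, column) pair:
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)                     # your example matrix
row, col = np.unravel_index(np.argmax(mat), mat.shape)   # argmax on a 2D array returns a flat index
print(row, col)                                          # 2 2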

Rearranging a 3-D array using indices from sorting?

I have a 3-D array of random numbers of size [channels = 3, height = 10, width = 10].
Then I sorted it along the columns using PyTorch's sort and obtained the indices as well.
Now, I would like to recover the original matrix using these indices. I currently use for loops to do this (without considering batches). The code is:
import torch
torch.manual_seed(1)
ch = 3
h = 10
w = 10
inp_unf = torch.randn(ch,h,w)
inp_sort, indices = torch.sort(inp_unf,1)
resort = torch.zeros(inp_sort.shape)
for i in range(ch):
    for j in range(inp_sort.shape[1]):
        for k in range(inp_sort.shape[2]):
            temp = inp_sort[i,j,k]
            resort[i,indices[i,j,k],k] = temp
I would like it to be vectorized, considering batches as well, i.e. an input size of [batch, channel, height, width].
Using Tensor.scatter_()
You can directly scatter the sorted tensor back into its original state using the indices provided by sort():
torch.zeros(ch,h,w).scatter_(dim=1, index=indices, src=inp_sort)
The intuition is based on the previous answer below: since scatter() is basically the reverse of gather(), inp_reunf = inp_sort.gather(dim=1, index=reverse_indices) gives the same result as inp_reunf.scatter_(dim=1, index=indices, src=inp_sort).
Previous answer
Note: while correct, this is probably less performant, as it calls the sort() operation a second time.
You need to obtain the sorting "reverse indices", which can be done by "sorting the indices returned by sort()".
In other words, given x_sort, indices = x.sort(), you have x[indices] -> x_sort ; while what you want is reverse_indices such that x_sort[reverse_indices] -> x.
This can be obtained as follows: _, reverse_indices = indices.sort().
import torch
torch.manual_seed(1)
ch, h, w = 3, 10, 10
inp_unf = torch.randn(ch,h,w)
inp_sort, indices = inp_unf.sort(dim=1)
_, reverse_indices = indices.sort(dim=1)
inp_reunf = inp_sort.gather(dim=1, index=reverse_indices)
print(torch.equal(inp_unf, inp_reunf))
# True
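The same trick should carry over to a batched input of size [batch, channel, height, width]; here is a sketch, assuming the sort still runs along the height dimension (now dim=2):
import torch

torch.manual_seed(1)
b, ch, h, w = 2, 3, 10, 10
inp_unf = torch.randn(b, ch, h, w)
inp_sort, indices = inp_unf.sort(dim=2)  # sort along the height dimension
inp_reunf = torch.zeros_like(inp_sort).scatter_(dim=2, index=indices, src=inp_sort)
print(torch.equal(inp_unf, inp_reunf))   # True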

Value and Index of MAX in each column of matrix

I'm aware of:
id,value = max(enumerate(trans_p), key=operator.itemgetter(1))
I'm trying to find something equivalent for matrices, where I'm looking for the value and the row index of the max of each column of the matrix,
so the function could take in any matrix, such as:
np.array([[0,0,1],[2,0,0],[5,0,0]])
and return two vectors: a vector of row numbers where the max is found, and the max values themselves, one for each column. I'm trying to avoid a for-loop! Ideally the function returns two values, like this:
rowIdVect, maxVect = ...........
where the values for the example matrix above would be:
[2,0,0] #rowIdVect
[5,0,1] #maxVect
I can do this in two steps:
idVect = np.argmax( myMat , axis=0)
maxVect = np.max( myMat , axis=0)
But is there a syntax that would perform both at the same time? Note: I'm trying to improve run times.
You can use the index to find the corresponding values:
In [201]: arr=np.array([[0,0,1],[2,0,0],[5,0,0]])
In [202]: idx=np.argmax(arr, axis=0)
In [203]: np.max(arr, axis=0)
Out[203]: array([5, 0, 1])
In [204]: arr[idx,np.arange(3)]
Out[204]: array([5, 0, 1])
Is this worth it? I doubt that the use of argmax and/or max is a bottleneck in your calculations, but feel free to time it with realistic data.
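If you just want the two steps behind one call, a small sketch (colwise_max is a hypothetical helper name); np.take_along_axis does the same fancy indexing without hard-coding the number of columns:
import numpy as np

def colwise_max(arr):
    # one argmax pass, then gather the values at those row indices
    idx = np.argmax(arr, axis=0)
    vals = np.take_along_axis(arr, idx[np.newaxis, :], axis=0).squeeze(0)
    return idx, vals

rowIdVect, maxVect = colwise_max(np.array([[0, 0, 1], [2, 0, 0], [5, 0, 0]]))
print(rowIdVect)  # [2 0 0]
print(maxVect)    # [5 0 1]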
