max operation for selective elements, not for all - pytorch

I am working in PyTorch. Inside my inference code I added some peripheral code of my own. It works, but it is too slow, most likely because of the nested for loops, so I need a parallel, faster way of doing this.
A solution using tensors, NumPy, or plain Python lists is fine.
I wrote a function named selective_max that finds the maximum value in an array. However, I don't want the maximum over the whole array, only over the candidates designated by a mask array. Let me describe the gist of the function (the code itself is shown below).
Input
x [batch_size, dim, num_points, k] : x is the original input; it becomes [batch_size, num_points, dim, k] after x.permute(0,2,1,3).
batch_size has its usual deep learning meaning. Every mini-batch contains many points, and each point is represented by a feature vector of length dim. For each feature element there are k candidates, which are the targets of the max operation later.
mask [batch_size, num_points, k] : This array has the same shape as x without the dim axis. Each element is either 0 or 1, so I use it as a mask: the max operation is applied only to entries whose mask value is 1.
Please read the code below with this explanation in mind. I use three nested for loops. For a specific batch and a specific point, x is a [dim, k] array and mask is a [k] array of 0s and 1s. I extract the nonzero indices from the [k] array and use them to select the relevant elements of x, one feature dimension at a time ('for k in range(dim)').
Toy example
Suppose we are inside the second for loop, so x is [dim, k] and mask is [k]. For this toy example I assume dim=4 and 3 candidates, with x = [[3,2,1],[5,6,4],[9,8,7],[12,11,10]] and the mask written as k=[0,1,1]. The output should then be [2,6,8,11], not [3,6,9,12].
Previous attempt
I tried { mask.repeat(0,0,1,0) * x } (element-wise multiplication) followed by the max operation. But 0 might then become the maximum, because all the selected values of x can be negative, so this gives wrong results.
import numpy as np
import torch

def selective_max2(x, mask):
    # x: [batch_size, dim, num_points, k], mask: [batch_size, num_points, k]
    batch_size = x.size(0)
    dim = x.size(1)
    num_points = x.size(2)
    k = x.size(3)
    device = torch.device('cuda')
    x = x.permute(0, 2, 1, 3)  # [batch_size, num_points, dim, k]
    #print('permuted x dimension : ',x.size())
    x = x.detach().cpu().numpy()
    mask = mask.cpu().numpy()
    output = np.zeros((batch_size, num_points, dim))
    for i in range(batch_size):
        for j in range(num_points):
            # indices of the nonzero mask entries for this point
            query = np.nonzero(mask[i][j])
            for k in range(dim):  # one max per feature dimension
                # use query to pick only the masked candidates
                output[i][j][k] = np.max(x[i][j][k][query])
    output = torch.from_numpy(output).float().to(device=device)
    output = output.permute(0, 2, 1).contiguous()
    return output

Disclaimer: I've followed your toy example (while keeping things general) to write the following solution.
First, expand your mask k to the shape of x (treating both as PyTorch tensors):
k_expanded = k.expand_as(x)
Then select the elements of x where k_expanded is 1 and view the result as a 2D tensor with x.shape[0] rows and as many columns as there are 1s in k (computed by (k == 1).sum(0)). At this point we have exactly the candidates we want to take the maximum over. Finally, take the maximum of each row with max(1):
values, indices = x[k_expanded == 1].view(x.shape[0], (k == 1).sum(0)).max(1)
values
Out[29]: tensor([ 2, 6, 8, 11])
Benchmarks
def find_max_elements_inside_tensor_range(arr, mask, return_indices=False):
    mask_expanded = mask.expand_as(arr)
    values, indices = arr[mask_expanded == 1].view(arr.shape[0], (mask == 1).sum(0)).max(1)
    return (values, indices) if return_indices else values
I added a third parameter in case you also want the indices of the maxima (they are positions within the selected columns, not within the original k axis).
%timeit find_max_elements_inside_tensor_range(x, k)
38.4 µs ± 534 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
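Note: the above solution also works for tensors and masks of other shapes, as long as the mask can be expanded to the shape of the tensor and selects the same number of elements in every row, which the view call requires.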
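For the full batched problem, where the mask has shape [batch_size, num_points, k] and may select a different number of candidates per point, one common alternative is to fill the masked-out entries with -inf before taking the max. This also avoids the problem of 0 winning over negative values. The following is only a minimal sketch under those assumptions: the name selective_max_masked is made up for illustration, x is assumed to be floating point, and a point whose mask is all zeros yields -inf instead of raising an error.

```python
import torch

def selective_max_masked(x, mask):
    # x: [batch_size, dim, num_points, k] (floating point)
    # mask: [batch_size, num_points, k] with entries 0 or 1
    keep = mask.bool().unsqueeze(1)               # [batch_size, 1, num_points, k]
    filled = x.masked_fill(~keep, float("-inf"))  # masked-out candidates can never win
    return filled.max(dim=-1).values              # [batch_size, dim, num_points]

# Toy example from the question: dim=4, k=3, a single batch and a single point.
x_toy = torch.tensor([[3., 2., 1.], [5., 6., 4.], [9., 8., 7.], [12., 11., 10.]])
x_toy = x_toy.view(1, 4, 1, 3)
mask_toy = torch.tensor([0, 1, 1]).view(1, 1, 3)
toy_result = selective_max_masked(x_toy, mask_toy).flatten()
print(toy_result)
```

The output shape matches what selective_max2 returns, and everything stays on the device of x, so no round trip through NumPy is needed.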

Related

Pytorch: Custom thresholding activation function - gradient

I created an activation function class Threshold that should operate on one-hot-encoded image tensors.
The function performs min-max feature scaling on each channel followed by thresholding.
import torch
from torch import nn

class Threshold(nn.Module):
    def __init__(self, threshold=.5):
        super().__init__()
        if threshold < 0.0 or threshold > 1.0:
            raise ValueError("Threshold value must be in [0,1]")
        else:
            self.threshold = threshold

    def min_max_fscale(self, input):
        r"""
        applies min max feature scaling to input. Each channel is treated individually.
        input is assumed to be N x C x H x W (one-hot-encoded prediction)
        """
        for i in range(input.shape[0]):
            # N
            for j in range(input.shape[1]):
                # C
                min = torch.min(input[i][j])
                max = torch.max(input[i][j])
                input[i][j] = (input[i][j] - min) / (max - min)
        return input

    def forward(self, input):
        assert (len(input.shape) == 4), f"input has wrong number of dims. Must have dim = 4 but has dim {input.shape}"
        input = self.min_max_fscale(input)
        return (input >= self.threshold) * 1.0
When I use the function I get the following error, I assume because the gradients are not calculated automatically:
Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I already had a look at "How to properly update the weights in PyTorch?" but could not work out how to apply it to my case.
How can I calculate the gradients for this function?
Thanks for your help.
The issue is that you are manipulating and overwriting elements in place; this type of operation can't be tracked by autograd. Instead, you should stick with built-in functions. Your example is not that tricky to tackle: you want the minimum and maximum of each of the input.shape[0] x input.shape[1] channels, and then you scale the whole tensor in one go, i.e. in vectorized form. No for loops involved!
One way to compute min/max along multiple axes is to flatten those:
>>> x_f = x.flatten(2)
Then, find the min-max on the flattened axis while retaining all shapes:
>>> x_min = x_f.min(axis=-1, keepdim=True).values
>>> x_max = x_f.max(axis=-1, keepdim=True).values
The resulting min_max_fscale function would look something like:
class Threshold(nn.Module):
    def min_max_fscale(self, x):
        r"""
        Applies min max feature scaling to input. Each channel is treated individually.
        Input is assumed to be N x C x H x W (one-hot-encoded prediction)
        """
        x_f = x.flatten(2)
        x_min, x_max = x_f.min(-1, True).values, x_f.max(-1, True).values
        x_f = (x_f - x_min) / (x_max - x_min)
        return x_f.reshape_as(x)
Important note:
You will notice that you can now backpropagate through min_max_fscale... but not through forward. This is because forward applies a boolean condition (the comparison with the threshold), which is not a differentiable operation.
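To make that note concrete, here is a small sketch that applies the vectorized scaling as a standalone function (an assumption made for brevity, the original wraps it in the Threshold module) and checks where the autograd graph stops. The tensor sizes and the 0.5 threshold are arbitrary illustration values.

```python
import torch

def min_max_fscale(x):
    # Per-channel min-max scaling of an N x C x H x W tensor, no Python loops.
    x_f = x.flatten(2)
    x_min = x_f.min(dim=-1, keepdim=True).values
    x_max = x_f.max(dim=-1, keepdim=True).values
    return ((x_f - x_min) / (x_max - x_min)).reshape_as(x)

torch.manual_seed(0)
x = torch.rand(2, 3, 4, 4, requires_grad=True)

scaled = min_max_fscale(x)
scaled_tracks_grad = scaled.requires_grad   # scaling is still part of the graph

hard = (scaled >= 0.5) * 1.0                # comparison produces a bool tensor
hard_tracks_grad = hard.requires_grad       # the graph is cut at the comparison

print(scaled_tracks_grad, hard_tracks_grad)
```

Calling backward on a loss built only from hard reproduces the "does not require grad" error from the question, while a loss built from scaled backpropagates normally.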

Numpy finding the number of points within a specific distance in absolute value

I have a NumPy array. I want to find the number of points that lie within an epsilon distance of each point.
My current code is (for an n*2 array, but in general I expect the array to be n * m)
epsilon = np.array([0.5, 0.5])
np.array([1 / float(np.sum(np.all(np.abs(X - x) <= epsilon, axis=1))) for x in X])
But this code might not be efficient for an array of, say, 1 million rows and 50 columns. Is there a better, more efficient method?
For example, given the data
X = np.random.rand(10, 2)
you can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
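One caveat is that the broadcast above materializes an n x n x m intermediate array, which will not fit in memory for something like 1 million rows and 50 columns. A possible workaround, sketched here under the assumption that processing the rows in blocks is acceptable, is to broadcast one chunk of rows against the full array at a time. The function name inverse_neighbor_counts and the chunk parameter are made up for this illustration.

```python
import numpy as np

def inverse_neighbor_counts(X, epsilon, chunk=1024):
    # For each row of X, count the rows within epsilon along every axis
    # (the point itself included) and return 1 / count.
    n = X.shape[0]
    counts = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk):
        block = X[start:start + chunk]                    # (c, m)
        diff = np.abs(block[:, None, :] - X[None, :, :])  # (c, n, m)
        close = np.all(diff <= epsilon, axis=-1)          # (c, n)
        counts[start:start + chunk] = close.sum(axis=-1)
    return 1.0 / counts

rng = np.random.default_rng(0)
X = rng.random((10, 2))
epsilon = np.array([0.5, 0.5])
result = inverse_neighbor_counts(X, epsilon, chunk=4)
print(result)
```

The peak memory is then roughly chunk x n x m elements instead of n x n x m; the total amount of work is unchanged.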

Vectorized implementation of field-aware factorization

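I would like to implement the field-aware factorization model (FFM) in a vectorized way. In FFM, a prediction is made by the following equation
$$\phi_{\mathrm{FFM}}(w, x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left( w_{i, f_j} \cdot w_{j, f_i} \right) x_i x_j$$
where w are the embeddings that depend on the feature and the field of the other feature (f_j denotes the field of feature j). For more info, see equation (4) in FFM.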
To do so, I have defined the following parameter:
import torch
W = torch.nn.Parameter(torch.Tensor(n_features, n_fields, n_factors), requires_grad=True)
Now, given an input x of size (batch_size, n_features), I want to be able to compute the previous equation. Here is my current (non-vectorized) implementation:
total_inter = torch.zeros(x.shape[0])
for i in range(n_features):
    for j in range(i + 1, n_features):
        temp1 = torch.mm(
            x[:, i].unsqueeze(1),
            W[i, feature2field[j], :].unsqueeze(0))
        temp2 = torch.mm(
            x[:, j].unsqueeze(1),
            W[j, feature2field[i], :].unsqueeze(0))
        total_inter += torch.sum(temp1 * temp2, dim=1)
Unsurprisingly, this implementation is horribly slow since n_features can easily be as large as 1000! Note however that most of the entries of x are 0. All inputs are appreciated!
Edit:
If it can help in any ways, here are some implementations of this model in PyTorch:
pytorch-fm
ctr_model_zoo
Unfortunately, I cannot figure out exactly how they have done it.
Additional update:
I can now obtain the product of x and W in a more efficient way by doing:
temp = torch.einsum('ij, jkl -> ijkl', x, W)
Thus, my loop is now:
total_inter = torch.zeros(x.shape[0])
for i in range(n_features):
    for j in range(i + 1, n_features):
        temp1 = temp[:, i, feature2field[j], :]
        temp2 = temp[:, j, feature2field[i], :]
        total_inter += 0.5 * torch.sum(temp1 * temp2, dim=1)
It is however still too slow, since this loop runs for about 500 000 iterations.
Something that could potentially speed up the multiplication is using PyTorch sparse tensors.
Something else that might work is the following:
Create n arrays, one for each feature i, each holding that feature's field factors row by row, e.g. for feature i = 0
[ W[0, feature2field[0], :],
  W[0, feature2field[1], :],
  ...
  W[0, feature2field[n], :]]
Then multiply each of those arrays, let's call them F, with X
R[i] = F[i] * X
so that each element of R holds the result (an array) of multiplying F[i] with X.
Next, multiply each R[i] with its transpose
R[i] = R[i] * R[i].T
Now you can do the summation in a loop like before
for i in range(n_features):
    total_inter += torch.sum(R[i], dim=1)
Please take this with a grain of salt, as I haven't tested it. In any case, I think it will point you in the right direction.
One problem that might occur is in the transpose multiplication, where each element is also multiplied with itself and then added to the sum. I don't think it will affect the classifier, but in any case you can set the diagonal of the product and the elements above it to 0.
Also, although it is minor, please move the first unsqueeze operation out of the nested for loop.
I hope it helps.
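For comparison, here is one way the double loop from the question could be written without any Python loop. This is a sketch rather than the method used by the libraries mentioned in the question, and it assumes that an (n_features, n_features, n_factors) intermediate fits in memory. The function name ffm_interactions and the small sizes are made up for illustration; it follows the first loop in the question (no 0.5 factor).

```python
import torch

def ffm_interactions(x, W, feature2field):
    # x: (batch_size, n_features), W: (n_features, n_fields, n_factors)
    f = torch.as_tensor(feature2field)
    E = W[:, f, :]                        # E[i, j] = W[i, feature2field[j], :]
    M = (E * E.transpose(0, 1)).sum(-1)   # M[i, j] = <W[i, f_j], W[j, f_i]>
    M = torch.triu(M, diagonal=1)         # keep only the pairs with i < j
    return ((x @ M) * x).sum(dim=1)       # sum_{i<j} x_i * x_j * M[i, j]

torch.manual_seed(0)
n_features, n_fields, n_factors = 6, 3, 4
feature2field = [0, 0, 1, 1, 2, 2]
W = torch.randn(n_features, n_fields, n_factors)
x = torch.randn(5, n_features)
vectorized = ffm_interactions(x, W, feature2field)
print(vectorized)
```

Because M does not depend on x, it only has to be recomputed when W changes, and the per-batch cost reduces to one matrix product.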

how can I insert a Tensor into another Tensor in pytorch

I have pytorch Tensor with shape (batch_size, step, vec_size), for example, a Tensor(32, 64, 128), let's call it A.
I have another Tensor(batch_size, vec_size), e.g. Tensor(32, 128), let's call it B.
I want to insert B into a certain position at axis 1 of A. The insert positions are given in a Tensor(batch_size), named P.
I understand there is no empty tensor (like an empty list) in PyTorch, so I initialize A as zeros and add B at a certain position along axis 1 of A.
A = Variable(torch.zeros(batch_size, step, vec_size))
What I'm doing is like:
for i in range(batch_size):
    pos = P[i]
    A[i][pos] = A[i][pos] + B[i]
But I get an Error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Then I made a clone of A inside each loop iteration:
for i in range(batch_size):
    A_clone = A.clone()
    pos = P[i]
    A_clone[i][pos] = A_clone[i][pos] + B[i]
This is very slow for autograd. Is there a better solution? Thank you.
You can use a mask instead of cloning.
See the code below
import torch

# setup
batch, step, vec_size = 64, 10, 128
A = torch.rand((batch, step, vec_size))
B = torch.rand((batch, vec_size))
pos = torch.randint(step, (batch,)).long()
# computations
# create a mask that is 0 at the positions to be replaced and 1 elsewhere
mask = torch.ones((batch, step)).view(batch, step, 1).float()
mask[torch.arange(batch), pos] = 0
# expand B to have the same dimensions as A and compute the result
result = A*mask + B.unsqueeze(dim=1).expand([-1, step, -1])*(1-mask)
This way you avoid using for loops and cloning as well.
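The mask above replaces the row at each position with B, which equals adding when A starts out as zeros. If the row should be added to an arbitrary A, as in the loop from the question, a one-hot variant is a possible alternative. The following is a sketch with made-up small sizes; P plays the role of the position tensor from the question.

```python
import torch
import torch.nn.functional as F

batch, step, vec_size = 4, 5, 3
A = torch.zeros(batch, step, vec_size, requires_grad=True)
B = torch.rand(batch, vec_size, requires_grad=True)
P = torch.tensor([0, 2, 4, 2])

# (batch, step) selector with a single 1 per row at the insert position
one_hot = F.one_hot(P, num_classes=step).to(A.dtype)
# broadcast to (batch, step, vec_size); out of place, so autograd is unaffected
result = A + one_hot.unsqueeze(-1) * B.unsqueeze(1)

result.sum().backward()
print(result.shape)
```

Nothing is modified in place, so gradients flow to both A and B without cloning.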

How to set up the number of inputs neurons in sklearn MLPClassifier?

Given a dataset of n samples and m features, and using sklearn.neural_network.MLPClassifier, how can I set hidden_layer_sizes to start with m inputs? For instance, I understand that hidden_layer_sizes=(10,10) means there are 2 hidden layers of 10 neurons (i.e., units) each, but I don't know whether this also implies 10 inputs.
Thank you
This classifier/regressor, as implemented, does this automatically when you call fit.
This can be seen in its source code.
Excerpt:
n_samples, n_features = X.shape
# Ensure y is 2D
if y.ndim == 1:
    y = y.reshape((-1, 1))
self.n_outputs_ = y.shape[1]
layer_units = ([n_features] + hidden_layer_sizes +
               [self.n_outputs_])
You can see that the hidden_layer_sizes you pass is surrounded by layer dimensions derived from your data inside .fit(). This is why the documented length in the signature is n_layers - 2:
Parameters
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
The ith element represents the number of neurons in the ith hidden layer.
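A quick way to check this is to fit a small model and inspect the shapes of the learned weight matrices. The data below is random and purely illustrative (60 samples, m = 5 features, 3 classes); max_iter is kept small, so scikit-learn may print a convergence warning, which does not affect the shapes.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((60, 5))      # n = 60 samples, m = 5 features
y = np.arange(60) % 3        # 3 classes

clf = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=50, random_state=0)
clf.fit(X, y)

# one weight matrix per pair of consecutive layers: input -> 10 -> 10 -> output
weight_shapes = [w.shape for w in clf.coefs_]
print(weight_shapes)
print(clf.n_layers_)
```

The first matrix has m rows, so the input layer size comes from X and not from hidden_layer_sizes.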
