PyTorch differentiable mask

How would I go about blacking out a portion of an image or feature map such that AutoGrad can backprop through the operation?
Specifically I want to black out everything except for n layers of border pixels. So if we consider a single channel of the feature map which looks like:
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
]
I set a constant n=1 so my operation does the following to the input:
[
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 1, 1, 1],
]
In my case I'd be doing it to a multi channel feature map and all channels would be treated the same way.
If possible, I want to do it in a functional manner.

Considering the comments you added, i.e. that you don't need the output to be differentiable with respect to the mask (said differently, the mask is constant), you could just store the indices of the 1s in the mask and act only on the corresponding elements of whatever Tensor you're considering. Or, if you don't want to deal with fancy indexing, you could keep the mask as a Tensor of 0s and 1s and multiply it element-wise with whatever Tensor you're considering. Or, if you truly just need to compute a loss along the border pixels, extract the first and last rows and the first and last columns, taking care not to double-count the corners. This last solution is essentially the first one specialized to the border case.
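For the border case in the question, the mask approach fits in a small function. This is a sketch assuming a tensor whose last two dimensions are height and width (for example shape (N, C, H, W)) and n >= 0; `keep_border` is a hypothetical name, not a PyTorch API. Because the mask is a constant, autograd only has to differentiate the element-wise product, so the gradient with respect to x is the mask itself.

```python
import torch

def keep_border(x, n=1):
    # x: tensor of shape (..., H, W); every channel is treated the same way.
    h, w = x.shape[-2], x.shape[-1]
    # Constant 0/1 mask: 1 on the outer n rows/columns, 0 in the interior.
    mask = torch.zeros(h, w, dtype=x.dtype, device=x.device)
    mask[:n, :] = 1
    mask[h - n:, :] = 1
    mask[:, :n] = 1
    mask[:, w - n:] = 1
    # Broadcasts over the leading dims; differentiable with respect to x.
    return x * mask

x = torch.ones(2, 3, 4, 4, requires_grad=True)
y = keep_border(x, n=1)
y.sum().backward()
# x.grad is 1 on the border pixels and 0 in the interior of every channel.
```

Multiplying by the mask (instead of assigning zeros in place) keeps the operation purely functional, as the question asks.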
To address the question in your comment to my answer:
x = torch.tensor([[1.0,2,3],[4,5,6]], requires_grad = True)
print(x[:,0])
gives
tensor([1., 4.], grad_fn=<SelectBackward>)
So we see that slicing does not break the autograd engine (it still tracks the contribution to the gradient). It is not too surprising that this works automatically: slicing can be viewed as the (mathematical) function of projecting onto a subspace of R^n, whose gradient is easy to compute.
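To see the projection view concretely, backpropagating a loss that depends only on a slice puts gradient 1 on the selected entries and 0 everywhere else. A minimal sketch reusing the tensor from the example:

```python
import torch

x = torch.tensor([[1.0, 2, 3], [4, 5, 6]], requires_grad=True)
# The loss only uses the first column; the slice stays in the autograd graph.
loss = x[:, 0].sum()
loss.backward()
print(x.grad)
# tensor([[1., 0., 0.],
#         [1., 0., 0.]])
```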

Related

How can I apply a linear transformation on sparse matrix in PyTorch?

In PyTorch, we have nn.Linear, which applies a linear transformation to the incoming data:
y = WA+b
In this formula, W and b are our learnable parameters and A is my input data matrix. The matrix A in my case is too large to load fully into RAM, so I store it as a sparse matrix. Is it possible to perform such an operation on sparse matrices using PyTorch?
This is possible with PyTorch using sparse matrix multiply. In your case, I think you want something like:
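>>> import torch
>>> i = [[0, 1, 1],
...      [2, 0, 2]]
>>> v = [3., 4., 5.]  # float values, so A can be multiplied with a float W
>>> A = torch.sparse_coo_tensor(i, v, (2, 3))
>>> A.to_dense()
tensor([[0., 0., 3.],
        [4., 0., 5.]])
>>> W = torch.randn(4, 2)  # example dense weight of shape (out, 2), so W@A is defined
# compute W@A by computing ((A.T)@(W.T)).T because...
# at time of writing, the sparse matrix must be first in the matmul
>>> (A.t() @ W.t()).t()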
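If A stores one sample per row (the nn.Linear convention, y = A W^T + b), the sparse matrix is already first in the product and no transposes of A are needed. A minimal sketch of such a layer, assuming A is a 2-D sparse COO float tensor; `SparseLinear` is a hypothetical helper, not a built-in module:

```python
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    # Hypothetical layer: input A has shape (batch, in_features) and is sparse COO;
    # output is A @ W.T + b with shape (batch, out_features).
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, A):
        # torch.sparse.mm expects the sparse matrix as its first argument.
        return torch.sparse.mm(A, self.weight.t()) + self.bias

torch.manual_seed(0)
i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
A = torch.sparse_coo_tensor(i, v, (2, 3))
layer = SparseLinear(3, 4)
y = layer(A)  # shape (2, 4); gradients flow to weight and bias
```

Only the parameters need gradients here, so A itself can stay sparse and never be densified.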

Crossover and mutation in Differential Evolution

I'm trying to solve the Traveling Salesman problem using Differential Evolution. For example, if I have vectors:
[1, 4, 0, 3, 2, 5], [1, 5, 2, 0, 3, 5], [4, 2, 0, 5, 1, 3]
how can I do crossover and mutation? I saw something like a + F*(b - c), but I have no idea how to use this.
I ran into this question when looking for papers on solving the TSP problem using evolutionary computing. I have been working on a similar project and can provide a modified version of my written code.
For mutation, we can swap two indices in a vector. I assume that each vector represents an order of nodes you will visit.
import random

def swap(lst):
    n = len(lst)
    # pick two random positions; randrange(n) never returns n, so both are valid indices
    x = random.randrange(n)
    y = random.randrange(n)
    # swap the two entries in place
    lst[x], lst[y] = lst[y], lst[x]
    return lst
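In the swap mutation, x and y can coincide, in which case the route is returned unchanged. A variant that always draws two distinct positions with random.sample is sketched below; `swap_distinct` is a hypothetical name and it assumes the route has at least two cities.

```python
import random

def swap_distinct(lst):
    # Draw two *different* positions, so the mutation always changes the route.
    x, y = random.sample(range(len(lst)), 2)
    lst[x], lst[y] = lst[y], lst[x]
    return lst

route = [1, 4, 0, 3, 2, 5]
original = route.copy()
mutated = swap_distinct(route)  # same cities, exactly two positions exchanged
```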
For crossover with respect to the TSP, we would like to keep the general ordering of values in our permutations (we want a crossover with a positional bias). By doing so, we preserve good paths in good permutations. For this reason, I believe single-point crossover is the best option.
import random

def singlePoint(parent1, parent2):
    # choose a crossover point strictly inside the vector
    point = random.randint(1, len(parent1) - 2)

    def helper(v1, v2):
        # helper function to avoid duplicating this code for each offspring
        # keep the head of v1 up to the crossover point
        points = [v1[i] for i in range(0, point)]
        # add values from right of crossover point in v2
        # that are not already in points
        for i in range(point, len(v2)):
            pt = v2[i]
            if pt not in points:
                points.append(pt)
        # add values from head of v2 which are not in points
        # this ensures all values are in the vector
        for i in range(0, point):
            pt = v2[i]
            if pt not in points:
                points.append(pt)
        return points

    # notice how parent1 and parent2 are swapped on the second call
    offspring_1 = helper(parent1, parent2)
    offspring_2 = helper(parent2, parent1)
    return offspring_1, offspring_2
I hope this helps! Even if your project is done, this could come in handy; GAs are a great way to solve optimization problems.
If F = 0.6, a = [1, 4, 0, 3, 2, 5], b = [1, 5, 2, 0, 3, 5], c = [4, 2, 0, 5, 1, 3],
then a + F*(b - c) = [-0.8, 5.8, 1.2, 0, 3.2, 6.2].
Then change the smallest number in the array to 0, the second smallest to 1, and so on,
so it returns [0, 4, 2, 1, 3, 5].
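The "replace each value by its rank" step is a double argsort in NumPy: the first argsort gives the order of the values, and argsorting that order gives each element's rank. A minimal sketch of the mutation plus ranking described above; `de_mutant_to_permutation` is a hypothetical name:

```python
import numpy as np

def de_mutant_to_permutation(a, b, c, F=0.6):
    # DE mutant vector: a + F * (b - c)
    mutant = np.asarray(a, float) + F * (np.asarray(b, float) - np.asarray(c, float))
    # Rank the entries: smallest value -> 0, next smallest -> 1, ...
    return np.argsort(np.argsort(mutant, kind="stable"), kind="stable")

perm = de_mutant_to_permutation([1, 4, 0, 3, 2, 5], [1, 5, 2, 0, 3, 5], [4, 2, 0, 5, 1, 3])
print(perm.tolist())  # [0, 4, 2, 1, 3, 5]
```

The result is always a valid permutation of 0..n-1, which is what lets the real-valued DE update be used on TSP routes.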
This method is inefficient when used to solve the jsp problems.

Loss for binary sparsity

I have binary images (like the one below) at the output of my net. I need the '1's to be farther from each other (not connected), so that they form a sparse binary image (without white blobs), something like salt-and-pepper noise. I am looking for a way to define a loss (in PyTorch) that penalizes the density of the '1's.
Thanks.
It depends on how you're generating said image. Since neural networks have to be trained by backpropagation, I'm rather sure your binary image is not the direct output of your neural network (i.e. not the thing you're applying the loss to), because gradients can't flow through binary (discrete) variables. I suspect you use something like pixel-wise binary cross entropy and then threshold.
I assume your code works like this: you densely regress real-valued numbers and then apply thresholding, likely using a sigmoid to map from [-inf, inf] to [0, 1]. If so, you can do the following. Build a convolution kernel which is 0 in the center and 1 elsewhere, with a size related to how big you want your "sparsity gaps" to be.
kernel = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
Then you apply sigmoid to your real-valued output to squash it to [0, 1]:
squashed = torch.sigmoid(nn_output)
then you convolve squashed with kernel, which gives you the relaxed number of non-zero neighbors.
neighborhood = nn.functional.conv2d(squashed, kernel, padding=2)
and your loss will be the product of each pixel's value in squashed with the corresponding value in neighborhood:
sparsity_loss = (squashed * neighborhood).mean()
If you think of this loss applied to your binary image, for a given pixel p the product is nonzero if and only if p is 1 and at least one of its neighbors is 1 (it then equals the number of such neighbors), and it is 0 otherwise. Since we apply it to non-binary numbers in the [0, 1] range, it is a differentiable approximation of that.
Please note that I left out some of the details from the code above (like correctly reshaping kernel to work with nn.functional.conv2d).
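Filling in those details, a runnable sketch could look as follows. It assumes logits of shape (N, C, H, W) and treats each channel independently by folding channels into the batch dimension; `sparsity_loss` and its `gap` parameter (half the kernel width) are hypothetical names. The kernel is reshaped to the (out_channels, in_channels, kH, kW) layout that nn.functional.conv2d expects.

```python
import torch
import torch.nn.functional as F

def sparsity_loss(nn_output, gap=2):
    # Squash real-valued logits to [0, 1].
    squashed = torch.sigmoid(nn_output)
    n, c, h, w = squashed.shape
    size = 2 * gap + 1
    # (1, 1, size, size) kernel of ones with a 0 in the center.
    kernel = torch.ones(1, 1, size, size, dtype=squashed.dtype, device=squashed.device)
    kernel[0, 0, gap, gap] = 0
    # Fold channels into the batch so one single-channel kernel serves all channels.
    flat = squashed.reshape(n * c, 1, h, w)
    # Relaxed count of "on" neighbors; padding=gap keeps the spatial size.
    neighborhood = F.conv2d(flat, kernel, padding=gap)
    return (flat * neighborhood).mean()

logits = torch.randn(2, 3, 32, 32, requires_grad=True)
loss = sparsity_loss(logits)
loss.backward()
```

With gap=2 this is exactly the 5x5 kernel and padding=2 from the answer; a larger gap penalizes "on" pixels that are farther apart.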

python numpy stack matrices and add specific corner/column entries

Say we have two matrices A and B of size 2 by 2. Is there a command that can stack them horizontally and add A[:,1] to B[:,0], so that the resulting matrix C is 2 by 3 with C[:,0] = A[:,0], C[:,1] = A[:,1] + B[:,0], C[:,2] = B[:,1]? One step further: stacking them on the diagonal so that C[0:2,0:2] = A, C[1:3,1:3] = B, and C[1,1] = A[1,1] + B[0,0]. C is 3 by 3 in this case. Hard coding this routine is not hard, but I'm curious, since MATLAB has a similar function if my memory serves me well.
A straightforward approach is to copy or add the two arrays to a target:
In [882]: A=np.arange(4).reshape(2,2)
In [883]: C=np.zeros((2,3),int)
In [884]: C[:,:-1]=A
In [885]: C[:,1:]+=A # or B
In [886]: C
Out[886]:
array([[0, 1, 1],
[2, 5, 3]])
Another approach is to pad A at the end, pad B at the start, and sum; while there is a convenient pad function, it won't be any faster.
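The copy-then-add pattern generalizes to any number of overlapping columns. A minimal sketch; `overlap_hstack` and its overlap width `k` are hypothetical, and it assumes A and B have the same number of rows and k does not exceed either width:

```python
import numpy as np

def overlap_hstack(A, B, k=1):
    # Place A on the left, then add B so its first k columns overlap A's last k.
    width = A.shape[1] + B.shape[1] - k
    C = np.zeros((A.shape[0], width), dtype=np.result_type(A, B))
    C[:, :A.shape[1]] = A
    C[:, A.shape[1] - k:] += B
    return C

A = np.arange(4).reshape(2, 2)
C = overlap_hstack(A, A)  # array([[0, 1, 1], [2, 5, 3]])
```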
And for the diagonal
In [887]: C=np.zeros((3,3),int)
In [888]: C[:-1,:-1]=A
In [889]: C[1:,1:]+=A
In [890]: C
Out[890]:
array([[0, 1, 0],
[2, 3, 1],
[0, 2, 3]])
Again the 2 arrays could be padded and added.
I'm not aware of any specialized function to do this; even if there were, it probably would do the same thing. This isn't a common enough operation to justify a compiled version.
I have built up finite element sparse matrices by adding overlapping element matrices. The sparse formats for both MATLAB and scipy facilitate this (duplicate coordinates are summed).
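In scipy, that assembly style means listing each block's entries as (row, col, value) triplets with the block's offset added to the coordinates; converting the COO matrix to CSR sums the duplicate coordinates. A minimal sketch for the diagonal case in the question; `overlap_diag` is a hypothetical helper that shifts B by one row and one column:

```python
import numpy as np
from scipy import sparse

def overlap_diag(A, B):
    # Row/column coordinates of every entry of A and B.
    ar, ac = np.indices(A.shape)
    br, bc = np.indices(B.shape)
    # B is offset by (1, 1), so A[-1, -1] and B[0, 0] share a coordinate.
    rows = np.concatenate([ar.ravel(), br.ravel() + 1])
    cols = np.concatenate([ac.ravel(), bc.ravel() + 1])
    data = np.concatenate([A.ravel(), B.ravel()])
    shape = (A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1)
    # Converting COO to CSR sums entries with duplicate (row, col) coordinates.
    return sparse.coo_matrix((data, (rows, cols)), shape=shape).tocsr()

A = np.arange(4).reshape(2, 2)
C = overlap_diag(A, A).toarray()  # [[0, 1, 0], [2, 3, 1], [0, 2, 3]]
```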
============
In [896]: np.pad(A,[[0,0],[0,1]],mode='constant')+np.pad(A,[[0,0],[1,0]],mode='constant')
Out[896]:
array([[0, 1, 1],
[2, 5, 3]])
In [897]: np.pad(A,[[0,1],[0,1]],mode='constant')+np.pad(A,[[1,0],[1,0]],mode='constant')
Out[897]:
array([[0, 1, 0],
[2, 3, 1],
[0, 2, 3]])
What's the special MATLAB code for doing this?
In Octave I found:
prepad(A,3,0,axis=2)+postpad(A,3,0,axis=2)

Get top-n items of every row in a scipy sparse matrix

After reading this similar question, I still can't fully understand how to go about implementing the solution I'm looking for. I have a sparse matrix, i.e.:
import numpy as np
from scipy import sparse
arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
arr_csc = sparse.csc_matrix(arr)
I would like to efficiently get the top n items of each row, without converting the sparse matrix to dense.
The end result should look like this (assuming n=2):
top_n_arr = np.array([[0,5,3,0,0],[6,0,0,9,0],[0,0,0,6,8]])
top_n_arr_csc = sparse.csc_matrix(top_n_arr)
What is wrong with the linked answer? Does it not work in your case? or you just don't understand it? Or it isn't efficient enough?
I was going to suggest working out a means of finding the top values for a row of an lil format matrix, and apply that row by row. But I would just be repeating my earlier answer.
OK, my previous answer was a start, but lacked some details on iterating through the lil format. Here's a version; it could probably be cleaned up.
Make the array, and a lil version:
In [42]: arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
In [43]: arr_sp=sparse.csc_matrix(arr)
In [44]: arr_ll=arr_sp.tolil()
The row function from the previous answer:
def max_n(row_data, row_indices, n):
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i
Iterate over the rows of arr_ll, apply this function and replace the elements:
In [46]: for i in range(arr_ll.shape[0]):
    ...:     d,r=max_n(np.array(arr_ll.data[i]),np.array(arr_ll.rows[i]),2)[:2]
    ...:     arr_ll.data[i]=d.tolist()
    ...:     arr_ll.rows[i]=r.tolist()
    ...:
In [47]: arr_ll.data
Out[47]: array([[3, 5], [6, 9], [6, 8]], dtype=object)
In [48]: arr_ll.rows
Out[48]: array([[2, 1], [0, 3], [3, 4]], dtype=object)
In [49]: arr_ll.tocsc().toarray()
Out[49]:
array([[0, 5, 3, 0, 0],
[6, 0, 0, 9, 0],
[0, 0, 0, 6, 8]])
In the lil format, the data is stored in 2 object type arrays, as sublists, one with the data numbers, the other with the column indices.
Viewing the data attributes of a sparse matrix is handy when doing new things. Changing those attributes carries some risk, since it can mess up the whole matrix. But it looks like the lil format can be tweaked like this safely.
The csr format is better for accessing rows than csc. Its data is stored in 3 arrays, data, indices and indptr. The lil format effectively splits 2 of those arrays into sublists based on the information in indptr. csr is great for math (multiplication, addition, etc.), but not so good when changing the sparsity (turning nonzero values into zeros).
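The same filtering can also be done directly on the csr arrays without going through lil: row i's stored values occupy data[indptr[i]:indptr[i+1]], so the smaller ones can be zeroed in that slice and then physically removed with eliminate_zeros(). This is a sketch; `top_n_per_row` is a hypothetical helper, it only ranks the *stored* entries (fine for the positive values in the question), and ties at the cutoff are broken arbitrarily by argsort.

```python
import numpy as np
from scipy import sparse

def top_n_per_row(M, n):
    # Work on a CSR copy so the input matrix is left untouched.
    M = sparse.csr_matrix(M, copy=True)
    for i in range(M.shape[0]):
        start, end = M.indptr[i], M.indptr[i + 1]
        row = M.data[start:end]  # a view into M.data
        if row.size > n:
            # Zero every stored value except the n largest in this row.
            drop = np.argsort(row)[:row.size - n]
            row[drop] = 0
    # Remove the explicit zeros from the sparsity structure.
    M.eliminate_zeros()
    return M

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
top = top_n_per_row(sparse.csc_matrix(arr), 2)
```

Setting values to zero in place does not change the sparsity structure by itself, which is why the single eliminate_zeros() call at the end is needed.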
