Loss for binary sparsity

Loss for binary sparsity - pytorch

I have binary images (as the one below) at the output of my net. I need the '1's to be further from each other (not connected), so that they would form a sparse binary image (without white blobs). Something like salt-and-pepper noise. I am looking for a way to define a loss (in pytorch) that would punish based on the density of the '1's.
Thanks.
I

It depends on how you're generating said image. Since neural networks have to be trained by backpropagation, I'm rather sure your binary image is not the direct output of your neural network (ie not the thing you're applying loss to), because gradient can't blow through binary (discrete) variables. I suspect you do something like pixel-wise binary cross entropy or similar and then threshold.
I assume your code works like that: you densely regress real-valued numbers and then apply thresholding, likely using sigmoid to map from [-inf, inf] to [0, 1]. If it is so, you can do the following. Build a convolution kernel which is 0 in the center and 1 elsewhere, of size related to how big you want your "sparsity gaps" to be.
kernel = [
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 0, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
]
Then you apply sigmoid to your real-valued output to squash it to [0, 1]:
squashed = torch.sigmoid(nn_output)
then you convolve squashed with kernel, which gives you the relaxed number of non-zero neighbors.
neighborhood = nn.functional.conv2d(squashed, kernel, padding=2)
and your loss will be the product of each pixel's value in squashed with the corresponding value in neighborhood:
sparsity_loss = (squashed * neighborhood).mean()
If you think of this loss applied to your binary image, for a given pixel p it will be 1 if and only if both p and at least one of its neighbors have values 1 and 0 otherwise. Since we apply it to non-binary numbers in [0, 1] range, it will be the differentiable approximation of that.
Please note that I left out some of the details from the code above (like correctly reshaping kernel to work with nn.functional.conv2d).

Related

PyTorch differentiable mask

How would I go about blacking out a portion of an image or feature map such that AutoGrad can backprop through the operation?
Specifically I want to black out everything except for n layers of border pixels. So if we consider a single channel of the feature map which looks like:
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
]
I set a constant n=1 so my operation does the following to the input:
[
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 1, 1, 1],
]
In my case I'd be doing it to a multi channel feature map and all channels would be treated the same way.
If possible, I want to do it in a functional manner.

Considering the comments you added, i.e. that you don't need the output to be differentiable wrt. to the mask (said differently, the mask is constant), you could just store the indices of the 1s in the mask and act only on the corresponding elements of whatever Tensor you're considering. Or if you don't want to deal with fancy indexing, you could just keep the mask as a Tensor of 0s and 1s and do an element-wise multiplication of it with whatever Tensor you're considering. Or, if you truly just need to compute a loss along just the border pixels, just extract the first and last row, and first and last column, and avoid double-counting the corners. This latter solution is essentially just the first solution recast in a special case.
To address the question in your comment to my answer:
x = torch.tensor([[1.0,2,3],[4,5,6]], requires_grad = True)
print(x[:,0])
gives
tensor([1., 4.], grad_fn=<SelectBackward>)
, so we see that slicing does not mess with the autograd engine (it's still tracking the contribution to the gradient). It is not too surprising that this works automatically; slicing can be viewed as the (mathematical) function that of projecting onto a subspace of R^n, for which it's easy to compute the gradient.

Multilabel classification of a sequence, how to do it?

I am quite new to the deep learning field especially Keras. Here I have a simple problem of classification and I don't know how to solve it. What I don't understand is how the general process of the classification, like converting the input data into tensors, the labels, etc.
Let's say we have three classes, 1, 2, 3.
There is a sequence of classes that need to be classified as one of those classes. The dataset is for example
Sequence 1, 1, 1, 2 is labeled 2
Sequence 2, 1, 3, 3 is labeled 1
Sequence 3, 1, 2, 1 is labeled 3
and so on.
This means the input dataset will be
[[1, 1, 1, 2],
[2, 1, 3, 3],
[3, 1, 2, 1]]
and the label will be
[[2],
[1],
[3]]
Now one thing that I do understand is to one-hot encode the class. Because we have three classes, every 1 will be converted into [1, 0, 0], 2 will be [0, 1, 0] and 3 will be [0, 0, 1]. Converting the example above will give a dataset of 3 x 4 x 3, and a label of 3 x 1 x 3.
Another thing that I understand is that the last layer should be a softmax layer. This way if a test data like (e.g. [1, 2, 3, 4]) comes out, it will be softmaxed and the probabilities of this sequence belonging to class 1 or 2 or 3 will be calculated.
Am I right? If so, can you give me an explanation/example of the process of classifying these sequences?
Thank you in advance.

Here are a few clarifications that you seem to be asking about.
This point was confusing so I deleted it.
If your input data has the shape (4), then your input tensor will have the shape (batch_size, 4).
Softmax is the correct activation for your prediction (last) layer
given your desired output, because you have a classification problem
with multiple classes. This will yield output of shape (batch_size,
3). These will be the probabilities of each potential classification, summing to one across all classes. For example, if the classification is class 0, then a single prediction might look something like [0.9714,0.01127,0.01733].
Batch size isn't hard-coded to the network, hence it is represented in model.summary() as None. E.g. the network's last-layer output shape can be written (None, 3).
Unless you have an applicable alternative, a softmax prediction layer requires a categorical_crossentropy loss function.
The architecture of a network remains up to you, but you'll at least need a way in and a way out. In Keras (as you've tagged), there are a few ways to do this. Here are some examples:
Example with Keras Sequential
model = Sequential()
model.add(InputLayer(input_shape=(4,))) # sequence of length four
model.add(Dense(3, activation='softmax')) # three possible classes
Example with Keras Functional
input_tensor = Input(shape=(4,))
x = Dense(3, activation='softmax')(input_tensor)
model = Model(input_tensor, x)
Example including input tensor shape in first functional layer (either Sequential or Functional):
model = Sequential()
model.add(Dense(666, activation='relu', input_shape=(4,)))
model.add(Dense(3, activation='softmax'))
Hope that helps!

MNIST Tensorflow example

def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
This is the code from the Deep MNIST for experts tutorial on Tensorflow website.
I have two questions:
1) The documentation k-size is an integer list of length greater than 4 that refers to the size of the max-pool window. Shouldn't that be just [2,2] considering that it's a 2X2 window? I mean why is it [1, 2, 2, 1] instead of [2,2] ?
2) If we are taking a stride step on size one. Why do we need a vector of 4 values, wouldn't one value suffice?
strides = [1]
3) If padding = 'SAME' why does the image size decrease by half? ( from 28 X 28 to 14 X 14 in the first convolutional process )

I'm not sure which documentation you're referring to in this question. The maxpool window is indeed 2x2.
The step size can be different depending on the dimensions. The 4 vector is the most general case where suppose you wanted to skip images in the batch, skip different height and width and potentially even skip based on channels. This is hardly used but has been left in.
If you have a stride of 2 along each direction then you skip every other pixel that you could potentially use for max pooling. If you set the skip size to be [1,1,1,1] with padding same then you would indeed return a result of the same size. The padding "SAME" refers to zero padding the image such that you add a border of height kernel hieght and a width of size kernel width to the image.

Scikit-learn R2 always zero

I'm trying to test my Scikit-learn machine learning algorithm with a simple R^2 score, but for some reason it always returns zero.
import numpy
from sklearn.metrics import r2_score
prediction = numpy.array([0.1567, 4.7528, 1.1260, 0.2294]).reshape(1, -1)
training = numpy.array([0, 3, 1, 0]).reshape(1, -1)
r2 = r2_score(training, prediction, multioutput="raw_values")
print r2
[ 0. 0. 0. 0.]
This is a single four-part value, not four separate values. How do I get proper R^2 scores?

If you are trying to calculate the r2 value between two vectors you should just pass two one dimensional arrays. See the documentation
In the example you provided, the first item is compared to the first item, but note you only have one list in each the prediction and training, so it is calculating R2 for 0.1567 to 0, which is 0, then it calculates it for 4.7528 to 3 which is also 0 and so on... It sounds like you want the R2 for the two vectors like the following:
prediction = numpy.array([0.1567, 4.7528, 1.1260, 0.2294])
training = numpy.array([0, 3, 1, 0])
print(r2_score(training, prediction))
0.472439485
If you have multi-dimensional arrays you can use the multioutput flag to determine what the output should look like:
#modified from the scikit-learn example
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
print(r2_score(y_true, y_pred, multioutput='raw_values'))
array([ 0.96543779, 0.90816327])
Here the output is where the first item of each list in y_true is compared to the first item in each list of y_pred, the second item to the second and so on

Which Support Vectors returned in Multiclass SVM SKLearn

By default, SKLearn uses a One vs One classification scheme when training SVM's in the multiclass case.
I'm a bit confused as to, when you call attributes such as svm.n_support_ or svm.support_vectors_, which support vectors you're getting? For instance, in the case of iris dataset, there are 3 classes, so there should be a total of 3*(3-1)/2 = 3 different SVM classifiers built. Of which classifier are you getting support vectors back?

Update: dual_coef_ is the key, giving you the coefficients of the support vectors in the decision function. "Each of the support vectors is used in n_class - 1 classifiers. The n_class - 1 entries in each row correspond to the dual coefficients for these classifiers." .
Take a look at the very end of this section (1.4.1.1), the table clearly explains it http://scikit-learn.org/stable/modules/svm.html#multi-class-classification)
Implementation details are very confusing to me as well. Coefficients of the support vector in the decision function for multiclass are non-trivial.
But here is the rule of thumb I use whenever I want to go into detail of specific properties of chosen support vectors:
y[svm.support_]
outputs:
array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
This way you get to know (maybe for debugging purposes) which support vector corresponds to which class. And of course you can check support vectors:
X[svm.support_]
My intuition here is that, as its name indicates, you take subsets of samples of the involved categories. Let's say we have 3 categories A, B and C:
A vs. B --> it gives you several support vectors from A and B (a,a,a,b,b,...)
A vs. C --> same... a,a,a,c,c,c,c (maybe some 'a' are repeated from before)
B vs. C --> idem
So the svm.support_vectors_ returns all the support vectors but how it uses then in the decision_function is still tricky to me as I'm not sure if it could use for example support vectors from A vs. B when doing the pair A vs. C - and I couldn't find implementation details (http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsOneClassifier.html#sklearn.multiclass.OneVsOneClassifier.decision_function)

All support vectors of all 3 classifiers.
Look at svm.support_.shape it is 45.
19+19+7 = 45. All adds up.
Also if you look at svm.support_vectors_.shape it will be (45,4) - [n_SV, n_features]. Again makes sense, because we have 45 support vectors and 4 features in iris data set.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Loss for binary sparsity - pytorch

Related

PyTorch differentiable mask

Multilabel classification of a sequence, how to do it?

MNIST Tensorflow example

Scikit-learn R2 always zero

Which Support Vectors returned in Multiclass SVM SKLearn

Categories

Resources