Focal Loss + Label Smoothing - pytorch

I'm trying to implement focal loss with label smoothing. I used the Kornia focal loss implementation and tried to plug in label smoothing based on this cross-entropy + label smoothing implementation, but the loss it yields doesn't make sense.
Focal loss + LS (my implementation): train loss 2.9761913128770314, accuracy 0.40519300987212814
Focal loss (Kornia implementation): train loss 0.0602325857395604, accuracy 0.8621959099036829
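
For reference, a minimal sketch of one way the two pieces are usually combined: smooth the one-hot targets first, then apply the focal modulation to the per-class log-probabilities. The gamma and smoothing values below are placeholders, not taken from the question, and this is not the Kornia implementation itself.

import torch
import torch.nn.functional as F

def focal_loss_label_smoothing(logits, target, gamma=2.0, smoothing=0.1):
    # logits: (N, C) raw scores, target: (N,) integer class indices
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()

    # label smoothing: 1 - eps on the true class, eps / (C - 1) on the others
    with torch.no_grad():
        true_dist = torch.full_like(log_probs, smoothing / (num_classes - 1))
        true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)

    # focal modulation (1 - p)^gamma down-weights well-classified examples
    focal_weight = (1.0 - probs) ** gamma
    loss = -(true_dist * focal_weight * log_probs).sum(dim=1)
    return loss.mean()

# e.g. focal_loss_label_smoothing(torch.randn(4, 5), torch.tensor([0, 2, 1, 4]))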

Related

Label Smoothing in PyTorch - Using BCE loss -> doing it with the data itself

I am doing a binary classification task in PyTorch, so with labels 0 and 1.
Now I want to introduce label smoothing as another regularization technique.
Because I use the BCE loss, there is no built-in option for label smoothing as
there is in the cross-entropy loss (for more than the classes 0 and 1).
Now I am considering implementing it not in the loss but in the data itself.
Would it be right to just replace my y_true values, for example 0 -> 0.1 and 1 -> 0.9,
before they go into the loss?
You can replace the 0 with 0.1 and 1 with 0.9 if the label smoothing factor is 0.1:
criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred)+0.1) #0.1
criterion(disc_real_pred, torch.ones_like(disc_real_pred)-0.1) #0.9
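A minimal, self-contained sketch of the same idea with BCEWithLogitsLoss; the tensor shapes and the smoothing value are only illustrative:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
smoothing = 0.1

logits = torch.randn(8, 1)                      # raw model outputs
hard_targets = torch.randint(0, 2, (8, 1)).float()

# map 0 -> smoothing and 1 -> 1 - smoothing before computing the loss
smooth_targets = hard_targets * (1 - 2 * smoothing) + smoothing
loss = criterion(logits, smooth_targets)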

Gradient Descent with Linear regression in Sklearn

The LinearRegression model from sklearn uses a closed-form (normal equation) solution to find the parameters. However, with large datasets gradient descent is said to be more efficient. Is there any way to use linear regression from sklearn with gradient descent?
The function you are looking for is: sklearn.linear_model.SGDRegressor
You can modify the loss hyperparameter which will define the loss function to be used.
Be aware that the SGD in SGDRegressor stands for Stochastic Gradient Descent, which means the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (a.k.a. learning rate).
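As a sketch (the data and hyperparameters below are made up for illustration), fitting an ordinary-least-squares model by SGD could look like this; note that loss="squared_error" is the name in recent scikit-learn versions (older releases used "squared_loss"):

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + rng.randn(200) * 0.1

# squared-error loss makes this ordinary least squares fitted by SGD;
# scaling first because SGD is sensitive to feature magnitudes
model = make_pipeline(StandardScaler(),
                      SGDRegressor(loss="squared_error", max_iter=1000, tol=1e-3))
model.fit(X, y)
print(model.named_steps["sgdregressor"].coef_)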

Multi-class segmentation in Keras

I'm trying to implement a multi-class segmentation in Keras:
input image is grayscale (i.e. 1 channel)
ground truth image has 3 channels, each pixel is a one-hot vector of length 3
prediction is standard U-Net trained with categorical_crossentropy outputting 3 channels (softmax-ed)
What is wrong with this setup? The training loss has some weird behaviour:
in my lucky cases it behaves as expected (decreases)
90 % of the time it's stuck at ~0.9
My implementation can be found here
I don't think there is anything wrong with the code: if my ground truth is 1-channel (i.e. 0s everywhere and 1s somewhere) and I use binary_crossentropy + sigmoid as the final activation, I see no weird behaviour.
I'll answer my own question. The solution is to weight each class, i.e. use a weighted categorical cross-entropy loss, as sketched below.
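
A hand-rolled weighted categorical cross-entropy in Keras might look like this; the weight values in the commented compile line are placeholders, not the ones actually used in the post:

import tensorflow as tf
from tensorflow.keras import backend as K

def weighted_categorical_crossentropy(class_weights):
    # class_weights: one scalar per output channel, e.g. larger for rare classes
    w = K.constant(class_weights)

    def loss(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        # standard categorical cross-entropy with each class term rescaled
        per_pixel = -K.sum(y_true * w * K.log(y_pred), axis=-1)
        return K.mean(per_pixel)

    return loss

# model.compile(optimizer="adam",
#               loss=weighted_categorical_crossentropy([1.0, 5.0, 5.0]))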

What is the partial derivative of sklearn's SVM (Hinge) loss function with regards to the input?

Does sklearn have a method to get the gradient of the loss function w.r.t. the input for an SVM that you have trained? I am also using a Gaussian (RBF) kernel.

Gradient clipping in keras

I have a fully implemented LSTM RNN using Keras, and I want to use gradient clipping with the gradient norm limited to 5 (I'm trying to reproduce a research paper). I'm quite a beginner with regard to implementing neural networks; how would I implement this?
Is it just the following (I'm using the rmsprop optimizer)?
sgd = optimizers.rmsprop(lr=0.01, clipnorm=5)
model.compile(optimizer=sgd,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
According to the official documentation, any optimizer can take the optional arguments clipnorm and clipvalue. If clipnorm is provided, gradients are clipped whenever their norm exceeds the threshold.
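
For example, using the RMSprop class (argument names vary slightly across Keras versions, e.g. lr vs learning_rate), and assuming model is the already-built LSTM from the question:

from tensorflow.keras import optimizers

# clip the gradient norm to 5 before each parameter update
opt = optimizers.RMSprop(learning_rate=0.01, clipnorm=5.0)
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])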
