Looking for a loss function sensitive to edges for medical image quality enhancement - PyTorch

For an image-to-image translation task in which I want to generate high-quality images from low-quality images (MRI images), I need a loss function that emphasizes edges and produces images with sharper edges.
Do you have any recommendation for choosing a suitable loss function among PyTorch's loss functions?
https://pytorch.org/docs/stable/nn.html#loss-functions
I would really appreciate it if anyone could provide code for a predefined loss function suited to this task.
Thanks

I suppose you are using the MSE loss function at the moment? This loss function indeed tends to prefer "smoother" outputs rather than sharp edges.
For image generation tasks, consider using a perceptual loss, which correlates better with human perception of image quality.
For more details on this loss function see
R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (CVPR 2018).
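Since the question asks specifically about edges, here is a minimal sketch of a simpler alternative to a perceptual loss: an L1 pixel term plus an L1 term on Sobel edge maps. This is not the method from the paper above. The function names `sobel_edges` and `edge_aware_loss` and the `edge_weight` parameter are made up for illustration, and the sketch assumes single-channel inputs of shape (N, 1, H, W).

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Horizontal and vertical Sobel responses for a (N, 1, H, W) batch."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], dtype=img.dtype, device=img.device)
    ky = kx.t()                                   # vertical kernel is the transpose
    kernel = torch.stack([kx, ky]).unsqueeze(1)   # shape (2, 1, 3, 3)
    return F.conv2d(img, kernel, padding=1)       # shape (N, 2, H, W)

def edge_aware_loss(pred, target, edge_weight=1.0):
    """L1 on pixels plus L1 on Sobel edge maps (edge_weight is a tunable assumption)."""
    pixel_term = F.l1_loss(pred, target)
    edge_term = F.l1_loss(sobel_edges(pred), sobel_edges(target))
    return pixel_term + edge_weight * edge_term
```

The edge term penalizes differences in local intensity changes, so a blurry prediction is penalized more than under a pixel loss alone. The weight between the two terms would need tuning on your data.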

Related

Does GPyTorch use analytic gradients or automatic differentiation for training?

I am confused about how GPyTorch calculates the gradients with respect to the parameters of the model. For instance, let's say I am using ExactGP with a Gaussian likelihood, an RBF kernel, and a constant mean, and using MLE (maximum likelihood estimation) to find the parameters of the model (mean, kernel parameters, and noise). One way to calculate the gradient w.r.t. the parameters is the analytical gradient, which means taking the derivative of the negative log-likelihood with respect to each parameter and deriving a closed-form expression for it. Another way is to use the automatic differentiation provided by PyTorch.
The GPyTorch authors mention in their paper "GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration" that they use the analytical gradient, or at least this is what I understood from reading the paper. Am I correct? Also, I couldn't find the code where they implement the analytical gradient.
Could anyone help me understand this better, please?
The "automatic differentiation provided by PyTorch" does compute the analytic gradient: back-propagation applies the chain rule exactly, with no finite differencing or anything like that involved. It just does so automatically.
https://github.com/cornellius-gp/gpytorch/discussions/1949#discussioncomment-2384471
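To illustrate the point on a toy problem (a plain Gaussian negative log-likelihood, not GPyTorch's actual marginal likelihood), the sketch below compares the gradient from `backward()` with a hand-derived formula. The data `y` and the starting values are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
y = torch.randn(50, dtype=torch.float64)   # toy observations

mu = torch.tensor(0.3, dtype=torch.float64, requires_grad=True)
log_var = torch.tensor(-0.5, dtype=torch.float64, requires_grad=True)

def nll(mu, log_var):
    # Negative log-likelihood of y under N(mu, exp(log_var)), additive constants dropped.
    var = log_var.exp()
    return 0.5 * (log_var + (y - mu) ** 2 / var).sum()

# Gradient via automatic differentiation.
nll(mu, log_var).backward()

# Gradient via formulas derived by hand from the same expression.
with torch.no_grad():
    var = log_var.exp()
    grad_mu = -((y - mu) / var).sum()
    grad_log_var = 0.5 * (1.0 - (y - mu) ** 2 / var).sum()

print(torch.allclose(mu.grad, grad_mu), torch.allclose(log_var.grad, grad_log_var))
```

Both routes give the same numbers up to floating-point rounding, because autograd evaluates the exact derivative of each operation rather than approximating it.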

What might be the best loss function when the target is a Gaussian label?

I have a simple CNN with the inputs as
Cropped grayscale patches of size MxN centered on the object of interest. The intensity of each patch is rescaled to [0, 1].
Target Gaussian label of the same size MxN with values ranging in [5.0155e-173, 1]. This label is kept fixed throughout the training.
The goal is to learn the target label and use the learned model to detect the object in a test image. I am using Adam optimizer with various loss functions such as categorical_crossentropy, mean_squared_error, and mean_absolute_error but training halts soon probably due to the low values returned by all these loss functions (vanishing gradients?). Increasing the batch size from 1 to 16~32 sometimes helps in completing the iteration but gives undesired outcomes at test time.
Is it because the loss function is too sensitive to the lower values in the target and even treats them as outliers hence steering the whole learning process in the wrong direction?
I'll be grateful for your help in fixing the loss function in such a scenario.
I think the best choice here is a probability distribution pseudo-distance. The first one that comes to mind is the Kullback-Leibler divergence, which is already implemented in PyTorch and Keras (see [kldivloss](https://pytorch.org/docs/stable/nn.html#kldivloss) and Keras). Other well-known distances include the Jensen-Shannon divergence and the Earth Mover's distance (the same distance that was used in WGAN).
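A minimal sketch of how the KL divergence could be applied to a Gaussian heatmap label in PyTorch, assuming the network outputs one logit per pixel. The helper names `gaussian_label` and `spatial_kl_loss` are hypothetical. Note that `F.kl_div` expects log-probabilities as its first argument and probabilities as its second, so both maps are normalized over the spatial dimensions first.

```python
import torch
import torch.nn.functional as F

def gaussian_label(h, w, cy, cx, sigma):
    """Unnormalized 2D Gaussian heatmap of shape (h, w), peak value 1 at (cy, cx)."""
    ys = torch.arange(h, dtype=torch.float32).unsqueeze(1)
    xs = torch.arange(w, dtype=torch.float32).unsqueeze(0)
    return torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def spatial_kl_loss(logits, target):
    """KL(target || softmax(logits)) for (N, H, W) maps, treated as distributions over pixels."""
    n = logits.shape[0]
    log_pred = F.log_softmax(logits.reshape(n, -1), dim=1)   # log-probabilities
    target = target.reshape(n, -1)
    target = target / target.sum(dim=1, keepdim=True)        # normalize label to sum to 1
    return F.kl_div(log_pred, target, reduction="batchmean")

label = gaussian_label(16, 16, cy=8, cx=8, sigma=2.0).unsqueeze(0)
uniform_logits = torch.zeros(1, 16, 16)
print(spatial_kl_loss(uniform_logits, label))
```

Normalizing the label turns the tiny tail values into near-zero probabilities that contribute almost nothing to the loss, which may help with the sensitivity to low target values described in the question.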

Can someone explain what happens in content loss, style loss and total loss?

So I have been reading the paper published by Leon Gatys in 2016 explaining neural style transfer, but I still don't understand what is happening in content loss, style loss or total loss. Can someone explain in simple terms what is happening in those steps of the algorithm?
Check out my GitHub repository: https://github.com/Bibhash123/Neural-Style-Transfer
So in content loss we are finding the mean squared error between an intermediate feature map for the generated image and the same feature map for the content image. In the style loss we find the error between intermediate feature maps for the style image and the generated image using Gram matrices. The total loss is calculated as a weighted sum of the two losses above. What actually happens is that when the total loss is optimized, the content loss term ensures the presence of content features in the generated image. Using Gram matrices for the style loss means the optimization matches the correlations between feature channels of the style image rather than its exact spatial layout, so style features get distributed across the generated image. I am no expert and thus this answer might have mistakes, but this is what I understood so far.
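The three losses described above can be sketched as follows. This is a simplified single-layer version under my own assumptions: the feature maps are taken as given (C, H, W) tensors, the Gram normalization and the weights `alpha` and `beta` are illustrative choices, and the paper sums the style term over several layers.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """(C, H, W) feature map -> (C, C) matrix of channel correlations."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.t() / (c * h * w)   # normalization is an assumption

def content_loss(gen_feat, content_feat):
    # Compares feature maps position by position.
    return F.mse_loss(gen_feat, content_feat)

def style_loss(gen_feat, style_feat):
    # Compares channel correlations, ignoring where features occur spatially.
    return F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))

def total_loss(gen_feat, content_feat, style_feat, alpha=1.0, beta=1e3):
    # Weighted sum of the two terms.
    return alpha * content_loss(gen_feat, content_feat) + beta * style_loss(gen_feat, style_feat)
```

Shuffling the pixel positions of a feature map leaves its Gram matrix unchanged, which is why the style term captures texture statistics while the content term captures layout.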

Multilabel classification with class imbalance in PyTorch

I have a multilabel classification problem, which I am trying to solve with CNNs in PyTorch. I have 80,000 training examples and 7900 classes; every example can belong to multiple classes at the same time, and the mean number of classes per example is 130.
The problem is that my dataset is very imbalanced. For some classes, I have only ~900 examples, which is around 1%. For “overrepresented” classes I have ~12000 examples (15%). When I train the model I use BCEWithLogitsLoss from PyTorch with the pos_weight parameter. I calculate the weights the same way as described in the documentation: the number of negative examples divided by the number of positives.
As a result, my model overestimates almost every class… For both minor and major classes I get almost twice as many predictions as true labels. And my AUPRC is just 0.18. Even so, it’s much better than no weighting at all, since in that case the model predicts everything as zero.
So my question is, how do I improve the performance? Is there anything else I can do? I tried different batch sampling techniques (to oversample minority class), but they don’t seem to work.
I would suggest either one of these strategies:
Focal Loss
A very interesting approach for dealing with unbalanced training data through tweaking of the loss function was introduced in
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár, Focal Loss for Dense Object Detection (ICCV 2017).
They propose to modify the binary cross entropy loss in a way that decreases the loss and gradient of easily classified examples while "focusing the effort" on examples where the model makes gross errors.
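A minimal sketch of a focal loss for the multilabel (per-class sigmoid) setting, built on top of `binary_cross_entropy_with_logits`. The function name `binary_focal_loss` is my own, and the sketch leaves out the class-balancing factor alpha from the paper for brevity.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0):
    """Focal loss over independent binary labels; logits and targets share a shape."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                  # probability the model assigns to the true label
    modulator = (1.0 - p_t) ** gamma       # near 0 for easy examples, near 1 for hard ones
    return (modulator * bce).mean()

logits = torch.tensor([[4.0, -3.0, 0.2]])
targets = torch.tensor([[1.0, 0.0, 1.0]])
print(binary_focal_loss(logits, targets))
```

With `gamma=0` this reduces to plain binary cross entropy; larger values of `gamma` shrink the contribution of confidently correct predictions more aggressively.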
Hard Negative Mining
Another popular approach is to do "hard negative mining"; that is, propagate gradients only for part of the training examples - the "hard" ones.
See, e.g.:
Abhinav Shrivastava, Abhinav Gupta and Ross Girshick, Training Region-based Object Detectors with Online Hard Example Mining (CVPR 2016).
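A rough sketch of the idea in the multilabel setting: compute the per-element loss, keep only the largest values, and average those, so gradients flow only through the "hard" elements. This is a simplified top-k variant, not the region-based procedure from the paper; `hard_example_bce` and `keep_ratio` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def hard_example_bce(logits, targets, keep_ratio=0.25):
    """Mean BCE over the hardest keep_ratio fraction of (example, class) entries."""
    per_elem = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none").flatten()
    k = max(1, int(keep_ratio * per_elem.numel()))
    hardest, _ = torch.topk(per_elem, k)   # entries outside the top k get no gradient
    return hardest.mean()

torch.manual_seed(0)
logits = torch.randn(8, 10, requires_grad=True)
targets = (torch.rand(8, 10) < 0.1).float()
loss = hard_example_bce(logits, targets)
loss.backward()
```

After `backward()`, only the selected entries of `logits.grad` are nonzero, which is the "propagate gradients only for part of the training examples" behavior described above.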
@Shai has provided two strategies developed in the deep learning era. I would like to add some traditional machine learning options: over-sampling and under-sampling.
The main idea is to produce a more balanced dataset by resampling before you start training. Note that you will probably face some problems, such as losing data diversity (under-sampling) and overfitting the training data (over-sampling), but it might be a good starting point.
See the wiki link for more information.

Meaning of Weight Gradient in CNN

I developed a CNN using MatConvNet and am able to visualize the weights of the 1st layer. It looked very similar to what is shown here (also attached below in case I am not specific enough) http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
My question is, what are the weight gradients? I'm not sure what those are and am unable to generate them...
Weights in a NN
In a neural network, a series of linear functions represented as matrices are applied to features (usually with a nonlinearity between them). These functions are determined by the values in the matrices, referred to as weights.
You can visualize the weights of a normal neural network, but it usually means something slightly different to visualize the convolutional layers of a CNN. These layers are designed to learn a feature computation over the space.
When you visualize the weights, you're looking for patterns. A nice smooth filter may mean that the weights are well learned and "looking for something in particular". A noisy weight visualization may mean that you've undertrained your network, overfit it, need more regularization, or something else nefarious (a decent source for these claims).
From this decent review of weight visualizations, we can see patterns start to emerge from treating the weights as images:
Weight Gradients
"Visualizing the gradient" means taking the gradient matrix and treating it like an image [1], just like you took the weight matrix and treated it like an image before.
A gradient is just a derivative; for images, it's usually computed as a finite difference - grossly simplified, the X gradient subtracts pixels next to each other in a row, and the Y gradient subtracts pixels next to each other in a column.
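The finite-difference computation described in this paragraph can be sketched as below. The function name and the tiny test image are made up for illustration.

```python
import torch

def finite_difference_gradients(img):
    """img: (H, W) tensor -> (gx, gy) with shapes (H, W-1) and (H-1, W)."""
    gx = img[:, 1:] - img[:, :-1]   # difference between neighbors in a row
    gy = img[1:, :] - img[:-1, :]   # difference between neighbors in a column
    return gx, gy

# A 4x4 image with a vertical edge: left half 0, right half 1.
img = torch.zeros(4, 4)
img[:, 2:] = 1.0
gx, gy = finite_difference_gradients(img)
print(gx)   # nonzero only in the column where the edge sits
print(gy)   # all zeros, since nothing changes along the columns
```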
For the common example of a filter that extracts edges, we may see a strong gradient in a particular direction. By visualizing the gradients (taking the matrix of finite differences and treating it like an image), you can get a more immediate idea of how your filter is operating on the input. There are a lot of cutting edge techniques (eg, eg) for interpreting these results, but making the image pop up is the easy part!
A similar technique involves visualizing the activations after a forward pass over the input. In this case, you're looking at how the input was changed by the weights; by visualizing the weights, you're looking at how you expect them to change the input.
Don't over-think it - the weights are interesting because they let us see how the function behaves, and the gradients of the weights are just another feature to help explain what's going on. There's nothing sacred about that feature: here are some cool clustering features (t-SNE) from the Google paper that look at space separability.
[1] It can be more complicated if you introduce weight sharing, but not that much
My answer here covers this question https://stackoverflow.com/a/68988426/10661506
Long story short, the weight gradient of layer l is the gradient of the loss with respect to the weights of layer l.
If you have a correct implementation of backpropagation, you should have access to these gradients, as they are needed to compute the weight update at every layer.
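The question is about MatConvNet, but the same idea is easy to show in PyTorch as a sketch: after a backward pass, each layer's weight gradient is available and has the same shape as the weights, so it can be visualized the same way. The layer sizes, random data, and learning rate below are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
x = torch.randn(2, 1, 8, 8)        # toy input batch
target = torch.randn(2, 4, 6, 6)   # toy target matching the conv output shape

loss = F.mse_loss(conv(x), target)
loss.backward()                    # backpropagation fills in .grad for each parameter

weight_grad = conv.weight.grad     # gradient of the loss w.r.t. the layer's weights
print(conv.weight.shape, weight_grad.shape)

# The same gradient drives a plain gradient-descent update of the weights.
lr = 0.1
with torch.no_grad():
    updated_weight = conv.weight - lr * weight_grad
```

Each 3x3 slice of `weight_grad` can be displayed as a small image, exactly like the filters themselves.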

Resources