Understanding DRAGAN-Loss - loss-function

I am having trouble understanding the DRAGAN loss function in On Convergence and Stability of GANs,
because I don't understand what the term N_d(0, cI) means. I can't find it defined in the paper.
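For what it's worth, N_d(0, cI) is standard notation for a d-dimensional normal (Gaussian) distribution with zero mean vector and covariance matrix c·I, i.e. d independent N(0, c) coordinates; as far as I can tell, in the DRAGAN penalty it describes the noise added to real samples before the norm of the discriminator's gradient is penalized. Below is a minimal NumPy sketch of just that sampling step (the batch size, dimensionality and value of c are placeholders of my own, not values from the paper):

import numpy as np

# N_d(0, c*I): a d-dimensional Gaussian with zero mean and isotropic
# covariance c*I, i.e. each coordinate is an independent N(0, c) draw.
d = 784        # dimensionality of a flattened real sample (placeholder)
c = 10.0       # the constant c is a hyperparameter (placeholder value)
batch = 64

rng = np.random.default_rng(0)
x_real = rng.uniform(0.0, 1.0, size=(batch, d))        # stand-in batch of real data

# delta ~ N_d(0, c*I): a standard normal draw scaled by sqrt(c)
delta = np.sqrt(c) * rng.standard_normal(size=(batch, d))

# The perturbed points x_real + delta are where a DRAGAN-style gradient
# penalty on the discriminator would be evaluated (the penalty itself needs
# an autodiff framework to compute gradients of D, so it is omitted here).
x_perturbed = x_real + delta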

Related

Naive bayes classifiers are bad estimators

I read this in the Scikit-learn guide (https://scikit-learn.org/stable/modules/naive_bayes.html):
[...] although naive Bayes is known as a decent classifier,
it is known to be a bad estimator, so the probability outputs
from predict_proba are not to be taken too seriously.
Bad and decent at the same time?
OK, let's add some context with a quote from the 2nd edition of François Chollet's book on deep learning, which I believe will shed light on your question. The point is that Naive Bayes is one of the earliest machine-learning classifiers: it assumes that the features in the input data are all independent (the "naive" assumption) and still gives a decent result, but compared with more recent models such as neural networks, its performance on classification tasks is often much lower. In other words, it frequently gets the predicted class right (a decent classifier), while the probabilities behind that prediction are poorly calibrated because the independence assumption rarely holds (a bad estimator of probabilities); see the sketch after the quote.
Naive Bayes is a type of machine-learning classifier based on applying Bayes' theorem while assuming that the features in the input data are all independent (a strong, or “naive” assumption, which is where the name comes from). This form of data analysis predates computers and was applied by hand decades before its first computer implementation (most likely dating back to the 1950s). Bayes' theorem and the foundations of statistics date back to the eighteenth century, and these are all you need to start using Naive Bayes classifiers.
A closely related model is the logistic regression (logreg for short),
which is sometimes considered to be the “hello world” of modern
machine learning. Don’t be misled by its name—logreg is a
classification algorithm rather than a regression algorithm. Much like
Naive Bayes, logreg predates computing by a long time, yet it’s still
useful to this day, thanks to its simple and versatile nature. It’s
often the first thing a data scientist will try on a dataset to get a
feel for the classification task at hand.
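To see the "decent classifier, bad estimator" point in code, here is a quick scikit-learn sketch comparing Naive Bayes with logistic regression (my own illustration, not from the book or the scikit-learn guide; the synthetic dataset and the Brier score as a calibration measure are arbitrary choices):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss

# Synthetic binary classification problem (purely illustrative)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (GaussianNB(), LogisticRegression(max_iter=1000)):
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print(type(model).__name__,
          "accuracy:", round(accuracy_score(y_test, (proba > 0.5).astype(int)), 3),
          "| Brier score (lower = better-calibrated probabilities):",
          round(brier_score_loss(y_test, proba), 3))

On data where the independence assumption is violated, the two models often reach comparable accuracy while Naive Bayes' predict_proba output comes out worse calibrated (a higher Brier score), which is the behaviour the guide warns about.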

what is the concordance index (c-index)?

I am relatively new to statistics and I need some help with some basic concepts.
Could somebody answer the following questions about the c-index?
What is the c-index?
Why is it used over other methods?
The c-index is "A measure of goodness of fit for binary outcomes in a logistic regression model."
The reason we use the c-index is that it summarizes how well the model discriminates: for a randomly chosen pair of patients, one with the condition and one without, it is the probability that the model assigns the higher predicted risk to the patient who has it.
The C-statistic is actually NOT used very often, as it only gives you a general idea about a model; an ROC curve contains much more information about accuracy, sensitivity and specificity.
[Figure: ROC curve]
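One way to make the connection concrete: for binary outcomes the c-index is exactly the area under the ROC curve, i.e. the probability that a randomly chosen patient who has the condition receives a higher predicted risk than a randomly chosen patient who does not. A small sketch with made-up predictions (my own illustration, not part of the original answer):

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])                    # 1 = has the condition (toy labels)
risk   = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70])  # model's predicted risks (toy values)

print("c-index via ROC AUC:", roc_auc_score(y_true, risk))

# The same number from the definition: the fraction of (positive, negative)
# pairs in which the positive case gets the higher risk (ties count 1/2).
pos, neg = risk[y_true == 1], risk[y_true == 0]
pairs = [float(p > n) + 0.5 * float(p == n) for p in pos for n in neg]
print("c-index via pairwise comparison:", sum(pairs) / len(pairs))

Both lines print the same value, which is why the c-index is often reported straight from an ROC analysis.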

When and why would you want to use a Probability Density Function?

A wannabe data scientist here, trying to understand: as a data scientist, when and why would you use a probability density function (PDF)?
Sharing a scenario and a few pointers for learning about this and other such functions, like the CDF and PMF, would be really helpful. Do you know of any book that covers these functions from a practical standpoint?
Why?
Probability theory is very important for modern data-science and machine-learning applications because, in a lot of cases, it allows one to "open up a black box", shed some light on a model's inner workings and, with luck, find the ingredients needed to turn a poor model into a great one. Without it, a data scientist is very much restricted in what they are able to do.
A PDF is a fundamental building block of probability theory, absolutely necessary for any sort of probabilistic reasoning, along with expectation, variance, priors and posteriors, and so on.
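To make that a bit more tangible before the examples below, here is a tiny sketch of the kind of everyday use a PDF gets (my own toy scenario with fabricated sensor readings, assuming a normal model and using scipy):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
temps = rng.normal(loc=21.0, scale=2.0, size=1000)   # fake temperature readings

# Fit a normal PDF to the observed data (maximum likelihood)
mu, sigma = stats.norm.fit(temps)

new_reading = 28.0
density = stats.norm.pdf(new_reading, mu, sigma)     # height of the fitted PDF at 28
tail_prob = stats.norm.sf(new_reading, mu, sigma)    # P(X > 28), via the complement of the CDF

print(f"fitted N({mu:.2f}, {sigma:.2f}^2); pdf(28) = {density:.4f}; P(reading > 28) = {tail_prob:.4f}")

The PDF answers "how plausible is this exact value under my model?", while the CDF (and its complement) answers "how likely is something at least this extreme?", which is often what a practical question really asks.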
Here are some examples from StackOverflow, drawn from my own experience, where a practical issue boils down to understanding the data distribution:
Which loss-function is better than MSE in temperature prediction?
Binary Image Classification with CNN - best practices for choosing “negative” dataset?
How do neural networks account for outliers?
When?
The questions above provide some examples; here are a few more if you're interested, and the list is by no means complete:
What is the 'fundamental' idea of machine learning for estimating parameters?
Role of Bias in Neural Networks
How to find probability distribution and parameters for real data? (Python 3)
I personally try to find a probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc.), because this way I can move from blind guessing to making reasonable decisions.
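One standard example of such a probabilistic interpretation, to illustrate the loss-function point (my own worked addition, not part of the original answer): minimizing mean squared error is equivalent to maximum-likelihood estimation under an additive Gaussian noise assumption,

$$-\log \prod_{i=1}^{n} \mathcal{N}\!\left(y_i \mid f(x_i), \sigma^2\right) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 + \frac{n}{2}\log\!\left(2\pi\sigma^2\right),$$

so for a fixed sigma the negative log-likelihood differs from the MSE only by a positive scale factor and an additive constant; swap the Gaussian for a Laplace density and you get the mean absolute error instead.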
Reading
This is very opinion-based, but at least a few books are really worth mentioning: The Elements of Statistical Learning, An Introduction to Statistical Learning: with Applications in R, or Pattern Recognition and Machine Learning (if your primary interest is machine learning). That's just a start; there are dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.

Learning method in keras? [duplicate]

While digging through the topic of neural networks and how to efficiently train them, I came across the method of using very simple activation functions, such as the rectified linear unit (ReLU), instead of the classic smooth sigmoids. The ReLU function is not differentiable at the origin, so according to my understanding the backpropagation algorithm (BPA) is not suitable for training a neural network with ReLUs, since the chain rule of multivariable calculus applies only to smooth functions.
However, none of the papers about using ReLUs that I read address this issue. ReLUs seem to be very effective and seem to be used virtually everywhere while not causing any unexpected behavior. Can somebody explain to me why ReLUs can be trained at all via the backpropagation algorithm?
To understand how backpropagation is even possible with functions like ReLU, you need to understand the property of the derivative that makes the backpropagation algorithm work so well. That property is the first-order approximation:
f(x) ≈ f(x0) + f'(x0)(x - x0)
If you treat x0 as the current value of your parameter, you can tell (knowing the value of the cost function and its derivative) how the cost function will behave when you change the parameter a little. This is the most crucial thing in backpropagation.
Because this local approximation is what gradient-based training relies on, you need your cost function (and therefore your activation functions) to satisfy the property stated above. It's easy to check that ReLU satisfies it everywhere except in a small neighbourhood of 0, and this is the only problem with ReLU: we cannot use the property when we are close to 0.
To overcome that, you may set the value of the ReLU derivative at 0 to either 1 or 0. On the other hand, most researchers don't treat this as a serious problem, simply because landing exactly at (or extremely close to) 0 during ReLU computations is relatively rare.
So, from a purely mathematical point of view, using ReLU with the backpropagation algorithm is not fully justified; in practice, however, this odd behaviour around 0 usually doesn't make any difference.
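A minimal Python sketch of the point above (my own illustration, not from the original answer): define ReLU with the common convention that its "derivative" at 0 is taken to be 0, then check the first-order approximation away from 0, where it holds, and at 0, where it breaks down for a step in the positive direction:

def relu(x):
    return max(x, 0.0)   # scalar version, enough for this sketch

def relu_grad(x):
    # 1 for x > 0, otherwise 0 -- i.e. we simply pick 0 at exactly x = 0
    # (picking 1 there instead would work just as well in practice)
    return 1.0 if x > 0 else 0.0

dx = 1e-3
for x0 in (2.0, 0.0):
    approx = relu(x0) + relu_grad(x0) * dx   # f(x0) + f'(x0) * dx
    exact = relu(x0 + dx)
    print(f"x0 = {x0}: |exact - approx| = {abs(exact - approx):.6f}")

Away from 0 the linear approximation is exact for ReLU; at 0 it is off by the whole step size for a step into the positive side, which is exactly the behaviour around 0 the answer describes, and why implementations simply fix a value for the derivative there.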

bruto - gam mgcv pkg

I was reading the paper Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions by J.R. Leathwick, J. Elith, and T. Hastie.
It mentions that the bruto GAM implementation in the R ‘mda’ library helps identify which variables to include in the final GAM model and also identifies the optimal degree of smoothing for each variable.
After looking at some examples of the bruto implementation, I was able to determine that $type helps identify the variables in the GAM model.
However, I am not able to understand how bruto helps identify the degree of smoothing for each variable.
I was wondering if someone had tried this, and could help with an example.
