bruto ('mda' package) and GAM smoothing

I was reading the paper "Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions" by J.R. Leathwick, J. Elith, and T. Hastie.
It mentions that the bruto GAM implementation in the R 'mda' library helps identify which variables to include in the final GAM model, and also identifies the optimal degree of smoothing for each variable.
After looking at some examples of the bruto implementation, I was able to determine that $type identifies which variables enter the GAM model.
However, I am not able to understand how bruto identifies the degree of smoothing for each variable.
I was wondering if someone had tried this, and could help with an example.

Related

How to know which features contribute significantly in prediction models?

I am a novice in DS/ML. I am trying to solve the Titanic case study on Kaggle, but my approach has not been systematic so far. I have used correlation to find relationships between variables, and have tried KNN and random forest classification, but my models' performance has not improved. I have selected features based on the correlations between variables.
Please point me to scikit-learn methods that can be used to identify features which contribute significantly to the predictions.
Various boosting techniques can improve accuracy considerably; I suggest you try gradient boosting.
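As a sketch of two common scikit-learn routes to ranking features (a synthetic data set stands in for the Titanic one here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a Titanic-style feature matrix.
X, y = make_classification(n_samples=500, n_features=6,
                           n_informative=3, random_state=0)

# Impurity-based importances from a random forest (one score per column).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(rf.feature_importances_)

# Univariate ANOVA F-test as an alternative ranking; keeps the k best.
selector = SelectKBest(f_classif, k=3).fit(X, y)
print(selector.get_support())  # boolean mask of the selected features
```

With real data you would compare the two rankings and, more importantly, validate any selection by cross-validated score rather than by correlation alone.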

What are the algorithms used in the pROC package for ROC analysis?

I am trying to figure out which algorithms are used within the pROC package to conduct ROC analysis. For instance, what algorithm corresponds to the condition 'algorithm==2'? I only recently started using R in conjunction with Python because of the ease of finding CI estimates, significance test results, etc. My Python code uses Linear Discriminant Analysis (LDA) to get results on a binary classification problem. When using the pROC package to compute confidence interval estimates for AUC, sensitivity, specificity, etc., all I have to do is load my data and run the package. The AUC I get from pROC is the same as the AUC returned by my Python code that uses LDA. In order to report consistent results, I am trying to find out whether LDA is one of the algorithm choices within pROC. Any ideas on this, or on how to go about figuring it out, would be very helpful. Where can I access the source code for pROC?
The core algorithms of pROC are described in a 2011 BMC Bioinformatics paper. Some algorithms added later are described in the PDF manual. As with every CRAN package, the source code is available from the CRAN package page; like many R packages these days, it is also on GitHub.
To answer your question specifically: unfortunately I don't have a good reference for the algorithm used to calculate the points of the ROC curve with algorithm 2. By looking at it you will see that it is ultimately equivalent to the standard ROC curve algorithm, albeit more efficient as the number of thresholds increases, as I tried to explain in this answer to a question on Cross Validated. But you have to trust me (and most packages calculating ROC curves) on it.
Which binary classifier you use, whether LDA or anything else, is irrelevant to ROC analysis and outside the scope of pROC. ROC analysis is a generic way to assess predictions, scores, or more generally any signal coming out of a binary classifier. It doesn't assess the binary classifier itself, or the signal detector, only the signal. This makes it very easy to compare different classification methods, and it is instrumental to the success of ROC analysis in general.
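This point can be illustrated in Python with scikit-learn (rather than pROC): ROC analysis consumes only the true labels and one score per sample, regardless of which model produced the scores. The labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Only the binary labels and a score per sample are needed;
# the classifier that produced the scores never appears.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)
# AUC = fraction of (positive, negative) pairs ranked correctly.
auc = roc_auc_score(y_true, scores)
print(auc)
```

Swapping LDA for any other scorer changes `scores`, never the ROC machinery itself.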

what are the methods to estimate probabilities of production rules?

I know that n-grams are useful for estimating the probability of words. I want to know how to estimate the probabilities of production rules, and how many methods there are for doing so.
I could not find any good blog post on this topic. I am currently studying probabilistic context-free grammars and the CKY parsing algorithm.
As I understand your question, you are asking how to estimate the parameters of a PCFG model from data.
In short, it's easy to make empirical production-rule probability estimates when you have ground-truth parses in your training data. If you want to estimate the probability that S -> NP VP, it is Count(S -> NP VP) / Count(S -> *), where * is any possible right-hand side.
You can find a much more formal statement in lots of places on the web (search for "PCFG estimation" or "PCFG learning"). Here's a nice one from Michael Collins' lecture notes: http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf#page=9
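The relative-frequency estimate above takes only a few lines of Python; the toy treebank below is made up purely for illustration:

```python
from collections import Counter

# Toy treebank: each parse is a list of (lhs, rhs) productions.
parses = [
    [("S", "NP VP"), ("NP", "Det N"), ("VP", "V NP")],
    [("S", "NP VP"), ("NP", "Pron"), ("VP", "V")],
    [("S", "VP"),    ("VP", "V NP"), ("NP", "Det N")],
]

rule_counts = Counter(r for parse in parses for r in parse)
lhs_counts = Counter(lhs for parse in parses for lhs, _ in parse)

# P(lhs -> rhs) = Count(lhs -> rhs) / Count(lhs -> anything)
probs = {(lhs, rhs): c / lhs_counts[lhs]
         for (lhs, rhs), c in rule_counts.items()}

print(probs[("S", "NP VP")])  # 2 of the 3 S expansions, so 2/3
```

When gold parses are unavailable, the same quantities are estimated with EM (the inside-outside algorithm), which Collins' notes also cover.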

Binary semi-supervised classification with positive only and unlabeled data set

My data consist of comments (saved in files), and a few of them are labelled as positive. I would like to use semi-supervised and PU classification to classify these comments into positive and negative classes. Is there any public implementation of semi-supervised and PU learning methods in Python (scikit-learn)?
You could try to train a one-class SVM and see what kind of results that gives you. I haven't read the PU paper, but I think for all practical purposes you will be much better off labelling some points and then using semi-supervised methods.
If finding negative points is hard, I would use heuristics to find putative negative points (which I believe is similar to the techniques in the PU paper). You could either classify unlabelled vs. positive and then look only at the ones that score strongly as unlabelled, or learn a one-class SVM or similar and look for negative points among the outliers.
If you are interested in actually solving the task, I would much rather invest time in manual labelling than in implementing fancy methods.
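The one-class-SVM route sketched above might look like this in scikit-learn; the feature matrices here are synthetic stand-ins for vectorized comments:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical feature vectors for the labelled-positive comments only.
positives = rng.normal(loc=0.0, scale=1.0, size=(200, 5))

# Fit a one-class SVM on the positives alone; nu bounds the fraction
# of training points treated as outliers.
ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(positives)

# Score unlabelled comments: +1 = looks like the positive class,
# -1 = outlier, i.e. a putative negative worth inspecting or labelling.
unlabelled = rng.normal(loc=3.0, scale=1.0, size=(50, 5))
print(ocsvm.predict(unlabelled))
```

The outliers flagged here would become the putative negatives that seed a standard two-class or semi-supervised classifier.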

Bayesian statistics, machine learning: prior v.s hyperprior

Suppose I have a linear regression model
p(t | x; w) = N(t; m, D), where the mean m depends on the parameters w.
Being Bayesian, I can put a Gaussian prior on parameter w.
However, I've realized that for some models we can put a Gaussian-Wishart hyperprior on the Gaussian to be 'more' Bayesian. Is this correct? Are both of these models valid Bayesian models?
It seems to me that we can always put a hyperprior, a hyper-hyperprior, and so on, because the result is still a valid probabilistic model.
I am wondering what the difference is between putting a prior and putting a hyperprior on the prior. Are they both Bayesian?
Using a hyperprior is still "valid Bayesian" in the sense that this sort of hierarchical modelling comes naturally to Bayesian models, and just about any book or course on Bayesian modelling covers the use of hyperpriors.
It's completely fine to use a Normal-Wishart prior (or hyperprior) for the parameters of a Gaussian distribution. In some sense it is even "more Bayesian" to do so, if it models the phenomenon at hand more accurately.
I'm not sure what you mean by "are they both Bayesian" when it comes to the difference between using a prior and a hyperprior. Bayesian hierarchical models with hyperpriors are still Bayesian models.
Using hyperpriors only makes sense in a hierarchical Bayesian model. In that case you would be looking at multiple groups and estimating a group-specific coefficient w_group based on group-specific priors, with those coefficients drawn from a global hyperprior.
If your prior and hyperprior reside on the same hierarchical level, which seems to be the case you are thinking about, then the effect on the results is the same as using a single prior with a wider standard deviation. Since the stacking still incurs additional computational cost, it should be avoided.
There is a lot of statistical literature on how to pick non-informative priors. Often the theoretically best solutions are improper distributions (their total integral is infinite), and there is a large risk of getting improper posteriors without well-defined means or even medians. So for practical purposes, picking wide normal distributions usually works best.
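The claim that same-level stacking collapses into one wider prior can be checked numerically: if w | m ~ N(m, sigma^2) and m ~ N(0, tau^2), then marginally w ~ N(0, sigma^2 + tau^2). A small Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, tau = 1.0, 2.0

# Hierarchical draw: hyperprior on the mean, then prior on w given that mean.
m = rng.normal(0.0, tau, size=1_000_000)
w = rng.normal(m, sigma)

# Marginally, w ~ N(0, sigma^2 + tau^2): the same as one wider prior.
print(w.var())  # close to sigma**2 + tau**2 = 5.0
```

So the extra level buys nothing here; it only matters when the hyperprior is shared across groups, as in the hierarchical setting described above.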
