Questions about the univariate and multivariate approaches - statistics

Why is OMP called a multivariate approach? Where in the algorithm is the "multivariate" aspect reflected?
The same can be asked of ANOVA: why is ANOVA called a univariate approach? Where in the algorithm is the "univariate" aspect reflected?
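A small sketch may make the distinction concrete. In scikit-learn terms, an ANOVA-style F-test (here f_regression, the regression analogue of f_classif, used only so both methods can run on the same generated data) scores each feature against the target in isolation, which is why it is called univariate; OrthogonalMatchingPursuit selects features greedily, judging each candidate by the residual left after the already selected features, so the features are treated jointly, i.e. multivariately. This is an illustration on toy data, not a definitive explanation.

```python
# Sketch contrasting univariate scoring with OMP's joint (multivariate) selection.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression
from sklearn.linear_model import OrthogonalMatchingPursuit

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Univariate: each feature is scored against y on its own, ignoring the others.
F_scores, p_values = f_regression(X, y)

# Multivariate: OMP picks features greedily, and each new pick is judged on the
# residual left after the already selected features, i.e. features are treated jointly.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(X, y)
selected = np.flatnonzero(omp.coef_)

print("top univariate features:", np.argsort(F_scores)[::-1][:3])
print("OMP-selected features:  ", selected)
```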

Related

Lasso with Coordinate Descent in Scikit-Learn

I've tried to implement lasso regression with coordinate descent. Later on, the objective function will also include the first derivative of the function; all derivatives are computed by an automatic differentiation tool. As a first step I've implemented the lasso with simple cyclic coordinate descent, without including the derivative.
In a small example with 4 features and ~100 samples the algorithm converges to the right solution. But on my real dataset my solution and the solution of scikit-learn's lasso regression are different. Furthermore, scikit-learn's algorithm converges a lot faster. I used the default settings in the scikit-learn setup.
My question is: what is the difference between the default scikit-learn algorithm for lasso regression and simple coordinate descent? Is there a paper which describes the implemented algorithm?
BR
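For reference, below is a minimal sketch of what "simple cyclic coordinate descent" for the lasso typically looks like, written against scikit-learn's objective (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1. The 1/(2*n_samples) scaling of the squared loss is a frequent source of mismatch with hand-rolled implementations; scikit-learn's solver is also coordinate descent, but it adds a duality-gap stopping criterion and other optimisations (the coordinate-descent paper usually cited for it is Friedman, Hastie and Tibshirani, 2010, "Regularization Paths for Generalized Linear Models via Coordinate Descent"). This is an illustrative sketch on generated data, not the poster's code.

```python
# Minimal cyclic coordinate descent for the lasso, matching scikit-learn's objective
# (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1 (illustrative sketch only).
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=1000):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = y.copy()                      # y - Xw with w = 0
    col_sq = (X ** 2).sum(axis=0)            # per-feature squared norms
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * w[j]       # partial residual without feature j
            rho = X[:, j] @ residual
            w[j] = soft_threshold(rho, alpha * n_samples) / col_sq[j]
            residual -= X[:, j] * w[j]       # restore the full residual
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.5, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

w_cd = lasso_cd(X, y, alpha=0.1)
w_sk = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
print(w_cd)
print(w_sk)
```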

p-values of scikit-learn LogisticRegressionCV?

I'm using scikit-learn's LogisticRegressionCV. It looks like the coef_ attribute holds the logistic regression coefficients. Is there any way to get p-values, z-values, or some other measure of uncertainty for each feature? (For example, as discussed here in R.)
Unfortunately, scikit-learn does not provide any such methods for logistic regression (nor for linear regression, for that matter). I found this, which might be of interest to you, but honestly I would stick to R for such tasks if you can.
https://datascience.stackexchange.com/questions/15398/how-to-get-p-value-and-confident-interval-in-logisticregression-with-sklearn
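A common workaround, sketched below, is to refit the same design with statsmodels, which reports standard errors, z-values and p-values for every coefficient. Keep in mind that statsmodels' Logit is an unpenalized maximum-likelihood fit, whereas LogisticRegressionCV is regularized, so the coefficients (and hence the inference) will not correspond exactly; the data here is generated purely for illustration.

```python
# Sketch: refit with statsmodels to get per-coefficient standard errors,
# z-values and p-values (toy data for illustration only).
import statsmodels.api as sm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

X_const = sm.add_constant(X)               # add an intercept column
result = sm.Logit(y, X_const).fit(disp=0)  # unpenalized maximum-likelihood fit

print(result.summary())                    # coefficients, std errors, z, P>|z|, CIs
print(result.pvalues)                      # p-values as an array
```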

Can anyone explain all the parameters of sklearn SVM.SVC in a simplified manner?

I am trying to learn the SVC classifier of the SVM module in sklearn. I have learned to use it on various datasets and have even applied grid search to improve the results, but I have not yet understood some parameters like C and gamma.
If anyone could give me a simple but detailed explanation of each parameter, that would be great.
Since we are trying to minimize some objective function, we can add a 'size' measure of the coefficient vector itself to that function. C is essentially the inverse of the weight on that 'regularization' term. Decreasing C will prevent overfitting by forcing the coefficients to be sparse or small, depending on the penalty. Decreasing C too far, however, will promote underfitting.
Gamma is a parameter for the RBF kernel. Increasing gamma allows for a more complex decision boundary (which can lead to overfitting, but can also improve results; it depends on the data).
This scikit-learn tutorial graphically shows the effect of changing both hyperparameters.
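In practice the two are usually tuned together with a grid search; the sketch below (on the iris data, with grid values chosen arbitrarily for illustration) shows the typical setup.

```python
# Sketch: tuning C and gamma for an RBF-kernel SVC with a grid search
# (grid values are arbitrary illustrations, not recommendations).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling matters for the RBF kernel, so tune on standardized features.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],           # inverse regularization strength
    "svc__gamma": [0.001, 0.01, 0.1, 1],   # kernel width: larger = more complex boundary
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```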

Comparing a Poisson Regression to a Logistic Regression

I have data with an associated binary outcome variable. Naturally I ran a logistic regression in order to see parameter estimates and odds ratios. I was curious, though, so I converted the data from a binary outcome to count data. Then I ran a Poisson regression (and a negative binomial regression) on the count data.
I have no idea how to compare these different models, though; all the comparisons I have seen seem to be concerned only with nested models.
How would you go about deciding on the best model to use in this situation?
Essentially both models will be roughly equal. What really matters is your objective: what you really want to predict. If you want to determine whether cases are good or bad (1 or 0), then go for logistic regression. If you are really interested in how much the cases will do (counts), then use Poisson.
In other words, the only difference between these two models is the logistic transformation and the fact that logistic regression minimizes the deviance (-2 log likelihood). To put it simply, even if you run a linear regression (OLS) on the binary outcome, you should not see big differences from your logistic model, apart from the fact that the predictions may not lie between 0 and 1 (e.g. the area under the ROC curve will be similar to the logistic model's).
To sum up, don't worry about which of these two models is better; they should be roughly the same in the way they capture your features' information. Just think about what makes more sense to optimize, counts or probabilities. The answer might have been different if you were considering non-linear models (e.g. random forests or neural networks), but the two you are considering are both (almost) linear, so don't worry about it.
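The claim above about OLS on a binary outcome is easy to check; a rough sketch on generated data (purely illustrative) is below: the linear probability model and the logistic model rank cases almost identically, so their ROC AUCs come out very close.

```python
# Sketch: on the same binary outcome, OLS (a linear probability model) and
# logistic regression give very similar rankings, hence similar ROC AUCs
# (generated data, for illustration only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ols_scores = LinearRegression().fit(X_tr, y_tr).predict(X_te)   # may fall outside [0, 1]
logit_scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

print("OLS AUC:     ", roc_auc_score(y_te, ols_scores))
print("Logistic AUC:", roc_auc_score(y_te, logit_scores))
```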
One thing to consider is the sample design. If you are using a case-control study, then logistic regression is the way to go because of its logit link function, rather than the log link used in Poisson regression. This is because, where there is oversampling of cases, as in a case-control study, the odds ratio remains unbiased.

Setting feature weights for KNN

I am working with sklearn's implementation of KNN. While my input data has about 20 features, I believe some of the features are more important than others. Is there a way to:
1. set the feature weights for each feature when "training" the KNN learner;
2. learn what the optimal weight values are, with or without pre-processing the data.
On a related note, I understand that KNN generally does not require training, but since sklearn implements it using KD-trees, the tree must be generated from the training data. However, this sounds like it's turning KNN into a binary tree problem. Is that the case?
Thanks.
kNN is simply based on a distance function. When you say "feature two is more important than others" it usually means that a difference in feature two is worth, say, 10x a difference in the other coordinates. A simple way to achieve this is to multiply coordinate #2 by its weight. So you put into the tree not the original coordinates but the coordinates multiplied by their respective weights.
In case your features are combinations of the coordinates, you might need to apply an appropriate matrix transform to your coordinates before applying the weights; see PCA (principal component analysis). PCA is likely to help you with question 2.
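The weighting trick described above takes only a few lines with scikit-learn; the weights in this sketch are hand-chosen purely for illustration.

```python
# Sketch: scale each feature by a hand-chosen weight before fitting, so distances
# (and the tree built from them) reflect feature importance. Weights are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

feature_weights = np.array([10.0, 1.0, 1.0, 1.0, 1.0])   # feature 0 counts 10x as much

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X * feature_weights, y)               # weighted coordinates go into the tree

X_new = X[:3]
print(knn.predict(X_new * feature_weights))   # apply the same weights at query time
```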
The answer to question 2 is called "metric learning" and is currently not implemented in scikit-learn. Using the popular Mahalanobis distance amounts to rescaling the data using StandardScaler. Ideally you would want your metric to take the labels into account.
