Logistic regression - changes in the deviance - statistics

I'm reading about logistic regression and I came across a phrase that I can't understand. The sentence is as follows (from the book Introductory Statistics with R by Peter Dalgaard):
"Changes in the deviance caused by a model reduction will be approximately Chi-squared distributed with degrees of freedom equal to the change in the number of parameters"
Could someone explain this phrase to me? To assess this change, do I use the probability density function or the cumulative distribution function?
Thank you for your time.
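The quoted statement describes a likelihood-ratio test: to judge whether a deviance change is large, you compare it against the upper tail of the chi-squared distribution, i.e. you use the cumulative distribution function (via its complement, the survival function), not the density. A minimal sketch with made-up deviance values:

```python
# Hedged sketch (illustrative numbers): the drop in deviance between a
# larger model and a reduced model is compared against a chi-squared
# distribution.  The p-value comes from the upper tail (survival
# function, i.e. 1 - CDF), not from the density itself.
from scipy.stats import chi2

deviance_reduced = 112.3  # deviance of the smaller (reduced) model
deviance_full = 105.1     # deviance of the larger (full) model
df = 2                    # number of parameters removed

delta = deviance_reduced - deviance_full  # change in deviance
p_value = chi2.sf(delta, df)              # P(chi2_df >= delta)
print(p_value)
```

A small p-value means the deviance increase from dropping those parameters is larger than chance would explain, so the model reduction is not supported.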

Related

What to do if a significant factor in univariate linear regression becomes insignificant in multivariate analysis?

I am studying the impact of several factors (sleeping time, studying time, anxiety degree, depression degree, ...) on students' final exam marks.
When I did the univariate linear regression analysis, all models were significant (with final exam mark as the dependent variable), although some had a small R^2.
Then I put all predictor factors into one multiple linear regression model; most of the predictors were insignificant, with the exception of study time, which was significant and had a large R^2 in both the univariate and multivariate analyses.
How should I explain this in my paper? Is it okay to have this result, or should I search for another model?
It sounds like you have highly correlated predictors. This gives you a very unstable model, one where small changes in a few observations could produce large changes in the regression coefficients.
You should try various models that use subsets of your predictors, and select a final model that has a significant overall F statistic, and significant t stats for your included predictors.
In your paper, you could explain that anxiety score and depression score were too highly correlated to allow them both into the model and you’ve selected the best model that doesn’t contain both of these scores.
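One way to check for the suspected high correlation is with variance inflation factors (VIFs). A minimal sketch, assuming Python with statsmodels and made-up data for the predictors named in the question:

```python
# Hedged sketch (made-up data): diagnose multicollinearity with
# variance inflation factors.  Predictor names are illustrative.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.RandomState(0)
n = 200
anxiety = rng.normal(size=n)
depression = 0.9 * anxiety + 0.1 * rng.normal(size=n)  # nearly collinear
study_time = rng.normal(size=n)

X = pd.DataFrame({"anxiety": anxiety, "depression": depression,
                  "study_time": study_time})
X["const"] = 1.0  # VIF computation needs an intercept column

for i, col in enumerate(X.columns[:-1]):
    print(col, variance_inflation_factor(X.values, i))
# anxiety and depression should show large VIFs (rule of thumb: > 5-10),
# while study_time stays near 1
```

Predictors with large VIFs are the ones inflating each other's standard errors, which is why they can look insignificant jointly even though each is significant alone.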

What is the concordance index (c-index)?

I am relatively new to statistics and I need some help with some basic concepts.
Could somebody explain the following questions about the c-index?
What is the c-index?
Why is it used over other methods?
The c-index is "A measure of goodness of fit for binary outcomes in a logistic regression model."
We use the c-index because it measures how well the model discriminates between outcomes: it is the probability that a randomly chosen patient with the condition receives a higher predicted risk than a randomly chosen patient without it.
The C-statistic on its own is actually not used very often, as it only gives a general summary of a model; a full ROC curve conveys more information about the trade-off between sensitivity and specificity across thresholds.
(figure: example ROC curve)
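For a binary outcome the c-index equals the area under the ROC curve: the proportion of (positive, negative) pairs in which the positive case gets the higher predicted risk, with ties counted as one half. A small sketch with made-up predictions, checked against scikit-learn:

```python
# Hedged sketch (made-up risks): compute the c-index by pairwise
# concordance and confirm it matches the ROC AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1, 0, 1])                    # true outcomes
risk = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])    # predicted risks

pos = risk[y == 1]
neg = risk[y == 0]
# score 1 when the positive case is ranked higher, 0.5 on ties
c_index = np.mean([1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg])
print(c_index, roc_auc_score(y, risk))  # the two values agree
```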

Python AUC Calculation for Unsupervised Anomaly Detection (Isolation Forest, Elliptic Envelope, ...)

I am currently working on anomaly detection algorithms. I have read papers comparing unsupervised anomaly detection algorithms based on AUC values. For example, I have anomaly scores and anomaly classes from Elliptic Envelope and Isolation Forest. How can I compare these two algorithms based on AUC values?
I am looking for a python code example.
Thanks
Problem solved. The steps I took:
1) Gather the class and score after the anomaly function.
2) Convert the anomaly score to a 0-100 scale to compare different algorithms more easily.
3) roc_curve requires these variables to be arrays. My mistake was passing them as DataFrame columns, which returned NaN every time.
Python script:
from sklearn import metrics

# outlier_class and outlier_score must be arrays
fpr, tpr, thresholds_sorted = metrics.roc_curve(outlier_class, outlier_score)
aucvalue_sorted = metrics.auc(fpr, tpr)
aucvalue_sorted
Regards,
Seçkin Dinç
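For completeness, a hedged end-to-end sketch of comparing the two algorithms by AUC on a synthetic dataset (all data and settings here are illustrative); `score_samples` returns higher values for more normal points, so it is negated to get an anomaly score:

```python
# Hedged sketch (synthetic data): compare Isolation Forest and
# Elliptic Envelope by ROC AUC on the same labelled set.
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X_inliers = rng.normal(0, 1, size=(300, 2))
X_outliers = rng.uniform(-6, 6, size=(15, 2))
X = np.vstack([X_inliers, X_outliers])
y = np.array([0] * 300 + [1] * 15)  # 1 = anomaly

for model in (IsolationForest(random_state=0),
              EllipticEnvelope(random_state=0)):
    model.fit(X)
    # score_samples is higher for more normal points, so negate it
    # to obtain an anomaly score before computing the AUC
    scores = -model.score_samples(X)
    print(type(model).__name__, roc_auc_score(y, scores))
```

Note that step 2 above (rescaling scores to 0-100) does not change the AUC, since AUC depends only on the ranking of the scores, not their scale.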
Although you already solved your problem, my 2 cents :)
Once you've decided which algorithmic method to use to compare them (your "evaluation protocol", so to say), then you might be interested in ways to run your challengers on actual datasets.
This tutorial explains how to do it, based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)

Calculate probability MLLIB SVM multi-class

I would like to know how to calculate the probability using Spark MLLIB SVM in a multi-class classification problem.
The documentation shows there is no such function available.
LibSVM uses Platt-scaling.
My questions are:
Is there a function to calculate the probability somewhere?
If not, could someone help me implement such functionality?
I would simply take the average distances from all the support vectors for each category after training and compare the distance from a new data-point to hyperplanes from all the classifiers.
I think the SVMModel.predict() gives these distances, but I am uncertain.
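Since MLLIB does not expose probabilities here, one option is to implement Platt scaling yourself: fit a one-dimensional logistic regression that maps raw decision scores (signed distances to the hyperplane) to probabilities. A minimal sketch with made-up scores, using scikit-learn rather than Spark:

```python
# Hedged sketch of Platt scaling (made-up scores): fit a 1-D logistic
# regression mapping raw SVM decision scores to probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative decision values you would collect from a trained SVM,
# plus the true labels of those points.
scores = np.array([-2.0, -1.0, -0.5, 0.2, 1.0, 2.5]).reshape(-1, 1)
labels = np.array([0, 0, 0, 1, 1, 1])

platt = LogisticRegression()
platt.fit(scores, labels)
probs = platt.predict_proba(scores)[:, 1]  # estimated P(y = 1 | score)
```

For a multi-class problem you would fit one such sigmoid per one-vs-rest classifier and normalize the resulting per-class probabilities.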

calculating confidence while doing classification

I am using a Naive Bayes algorithm to predict movie ratings as positive or negative. I have been able to rate movies with 81% accuracy. I am, however, trying to assign a "confidence level" to each of the ratings as well.
I am trying to identify how I can tell the user something like "we think that the review is positive with 80% confidence". Can someone help me understand how I can calculate a confidence level to our classification result?
You can report the probability P(y|x) that Naive Bayes calculates. (But note that Naive Bayes isn't a very good probability model, even if it's not too bad a classifier.)
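A minimal sketch of that idea with scikit-learn (toy training texts, all illustrative): `predict_proba` returns P(y|x), which can be reported directly as the confidence:

```python
# Hedged sketch (toy data): report the Naive Bayes posterior P(y | x)
# as the confidence of each prediction.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["great movie", "loved it", "terrible film", "hated it"]
train_labels = [1, 1, 0, 0]  # 1 = positive review

vec = CountVectorizer()
clf = MultinomialNB()
clf.fit(vec.fit_transform(train_texts), train_labels)

p = clf.predict_proba(vec.transform(["loved movie"]))[0]
positive = p[list(clf.classes_).index(1)]
print(f"we think the review is positive with {positive:.0%} confidence")
# -> we think the review is positive with 80% confidence
```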
