I have to run a Monte Carlo (MC) simulation, but the parameters I vary should follow a lognormal distribution. My problem is that I don't know how to give them a lognormal distribution. Are the mean and the standard deviation found the same way as for the normal distribution? Please help me.
Thank you,
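On the mean and standard deviation: the two parameters usually quoted for a lognormal distribution (often written mu and sigma) are the mean and standard deviation of the logarithm of the variable, not of the variable itself. A minimal sketch in Python, using only the standard library, of how one might convert a target mean and standard deviation into mu and sigma and then draw samples. The function names and the target values 10 and 3 are made up for illustration.

```python
import math
import random
import statistics


def lognormal_params(mean, sd):
    """Convert the desired mean and standard deviation of a lognormal
    variable into mu and sigma of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def sample_lognormal(mean, sd, n, seed=0):
    """Draw n lognormal values whose own mean and sd match the targets."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, sd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Illustrative targets only: mean 10, standard deviation 3.
draws = sample_lognormal(mean=10.0, sd=3.0, n=100_000)
print(statistics.fmean(draws), statistics.stdev(draws))
```

The sample mean and standard deviation printed at the end should land close to the targets, which is a quick check that the conversion was applied in the right direction.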
Related
I am trying to figure out what algorithms are used within the pROC package to conduct ROC analysis. For instance what algorithm corresponds to the condition 'algorithm==2'? I only recently started using R in conjunction with Python because of the ease of finding CI estimates, significance test results etc. My Python code uses Linear Discriminant Analysis to get results on a binary classification problem. When using the pROC package to compute confidence interval estimates for AUC, sensitivity, specificity, etc., all I have to do is load my data and run the package. The AUC I get when using pROC is the same as the AUC that is returned by my Python code that uses Linear Discriminant Analysis (LDA). In order to be able to report consistent results I am trying to find out if LDA is one of the algorithm choices within pROC? Any ideas on this or how to go about figuring this out would be very helpful. Where can I access the source code for pROC?
The core algorithms of pROC are described in a 2011 BMC Bioinformatics paper. Some algorithms added later are described in the PDF manual. As with every CRAN package, the source code is available from the CRAN package page. Like many R packages these days, it is also on GitHub.
To answer your question specifically, unfortunately I don't have a good reference for the algorithm to calculate the points of the ROC curve with algorithm 2. By looking at it you will realize it is ultimately equivalent to the standard ROC curve algorithm, albeit more efficient when the number of thresholds increases, as I tried to explain in this answer to a question on Cross Validated. But you have to trust me (and most packages calculating ROC curves) on it.
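To make the equivalence concrete, here is a rough sketch in Python (not pROC's actual code) of two ways to get the same ROC points: rescanning the data for every threshold, and sorting once and then reading the counts off running sums. It assumes labels coded 0/1 and that a higher score points towards the positive class; the function names are my own.

```python
from itertools import accumulate


def roc_naive(scores, labels):
    """Rescan all observations for every threshold: O(n * thresholds)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_cumsum(scores, labels):
    """Sort once, then read the counts off running sums: O(n log n)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    s_sorted = [scores[i] for i in order]
    y_sorted = [labels[i] for i in order]
    tp = list(accumulate(y_sorted))
    fp = list(accumulate(1 - y for y in y_sorted))
    n_pos, n_neg = tp[-1], fp[-1]
    points = [(0.0, 0.0)]
    for i, s in enumerate(s_sorted):
        # Emit a point only at the last observation of a run of tied scores.
        if i + 1 == len(s_sorted) or s_sorted[i + 1] != s:
            points.append((fp[i] / n_neg, tp[i] / n_pos))
    return points


scores = [0.9, 0.8, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0, 1]
print(roc_naive(scores, labels) == roc_cumsum(scores, labels))
```

Each point is a (false positive rate, true positive rate) pair. Both functions return the same list; only the amount of work differs as the number of distinct thresholds grows.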
Which binary classifier you use, whether LDA or other, is irrelevant to ROC analysis, and outside the scope of pROC. ROC analysis is a generic way to assess predictions, scores, or more generally the signal coming out of a binary classifier. It doesn't assess the binary classifier itself, or the signal detector, only the signal itself. This makes it very easy to compare different classification methods, and is instrumental to the success of ROC analysis in general.
I am relatively new to statistics and need help with some basic concepts.
Could somebody answer the following questions about the c-index?
What is the c-index?
Why is it used over other methods?
The c-index is "A measure of goodness of fit for binary outcomes in a logistic regression model."
We use the c-index because it summarizes how well a model's predictions distinguish patients who have a condition from those who do not.
The C-statistic is actually NOT used very often, as it only gives you a general idea about a model; a ROC curve contains much more information about accuracy, sensitivity and specificity.
ROC curve
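For a binary outcome, the c-index is the same number as the area under the ROC curve: the fraction of (positive, negative) pairs in which the positive case gets the higher predicted score, with ties counted as half. A small Python sketch of that pairwise definition, assuming labels coded 0/1; the function name and the toy data are made up.

```python
def c_index(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive case
    has the higher score; tied scores count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0, 1]
print(c_index(scores, labels))
# Only the ordering of the scores matters, so rescaling them changes nothing.
print(c_index([10 * s + 3 for s in scores], labels))
```

Because only the ranking of the scores enters the calculation, the value is a single summary of discrimination, whereas the full ROC curve shows the sensitivity and specificity at each threshold.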
I'm reading about logistic regression and I came across a sentence that I can't understand. It is as follows (from the book Introductory Statistics with R, by Peter Dalgaard):
"Changes in the deviance caused by a model reduction will be approximately Chi-squared distributed with degrees of freedom equal to the change in the number of parameters"
Could someone explain this sentence to me? To calculate this change, do I use the probability density function or the cumulative distribution function?
Thank you for your time.
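On the density versus distribution function part: the p-value for a change in deviance is an upper-tail probability, that is, 1 minus the cumulative distribution function of the chi-squared distribution evaluated at the observed change, with degrees of freedom equal to the number of parameters dropped. A minimal Python sketch using only the standard library; the function names and the deviance values are hypothetical, and the tail probability is written out by hand only to keep the example self-contained.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with
    integer df, using Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def deviance_test(deviance_reduced, deviance_full, params_dropped):
    """Return the change in deviance and its approximate p-value."""
    change = deviance_reduced - deviance_full
    return change, chi2_sf(change, params_dropped)


# Hypothetical deviances for a reduced model and a full model that
# has two more parameters.
change, p_value = deviance_test(30.2, 25.4, params_dropped=2)
print(change, p_value)
```

If SciPy is available, `scipy.stats.chi2.sf(change, df)` gives the same tail probability without the hand-written function.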
I am studying SVMs and will implement one using Python's sklearn.svm.SVC.
As far as I know, the SVM problem can be formulated as a QP (quadratic programming) problem.
So I was wondering which QP solver sklearn uses to solve the SVM QP problem.
I think it may be SMO or a coordinate descent algorithm.
Please let me know exactly which algorithm is used in sklearn's SVM.
Off-the-shelf QP solvers have been used in the past, but for many years now dedicated code has been used instead (much faster and more robust). Those solvers are no longer general QP solvers; they are built for this one use case.
sklearn's SVC is a wrapper for libsvm (proof).
As the link says:
Since version 2.8, it implements an SMO-type algorithm proposed in this paper:
R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research 6, 1889-1918, 2005.
(link to paper)
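For reference, the QP in question is the dual of the C-SVC problem, which in the usual notation (training points x_i, labels y_i in {-1, +1}, kernel K, penalty C) can be written as:

```latex
\min_{\alpha} \quad \frac{1}{2}\,\alpha^{\top} Q\,\alpha - e^{\top}\alpha
\qquad \text{subject to} \qquad
y^{\top}\alpha = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \dots, n,
\qquad \text{where } Q_{ij} = y_i\, y_j\, K(x_i, x_j).
```

Here e is the vector of all ones. An SMO-type method updates only two of the alpha_i at a time, which is the smallest working set that can change while keeping the equality constraint satisfied.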
For the multivariate normal model, Jeffreys' rule for generating a prior distribution on (theta, sigma) gives p_j(theta, sigma) proportional to |sigma|^{-(p+2)/2}.
My book notes in a footnote that p_j cannot actually be a probability density for theta, sigma. Why is this?
It's "improper", meaning its integral is infinite, so it cannot be normalized to integrate to 1 as a probability density must. For example, as a function of theta the density is just a constant, whose integral over all of R^p is infinite. It's OK to use improper distributions as priors in Bayesian inference, as long as the posterior is a proper probability distribution.