ROC curves with BlueSky Statistics

I'm just beginning to explore the capabilities of BlueSky Statistics. I read in a review that ROC curves are among the available graphics but I cannot find them in the menus. Is it possible to draw them in BlueSky and also to get AUC and Youden's index?

To see ROC curves, you first need to create a model, which you can do using any of the dialogs under Model Fitting. Then, in the top right-hand corner of the screen, select the model and click the Score button. The ROC curve and the ROC table will be shown automatically where applicable. Note that the dependent variable must have exactly 2 levels for a ROC curve to display.

BlueSky does not yet overlay ROC curves, so you would have to do one model for each of your independent variables, then follow Val's answer. Overlaying ROC curves is on the roadmap for future versions.
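Since the question also asks about AUC and Youden's index, here is a minimal sketch of computing both outside BlueSky with scikit-learn. The data and the logistic regression model are toy stand-ins, not BlueSky output:

```python
# Sketch: ROC curve, AUC and Youden's index with scikit-learn on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # binary outcome

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]          # predicted probabilities

fpr, tpr, thresholds = roc_curve(y, scores)
auc = roc_auc_score(y, scores)

# Youden's J statistic: J = sensitivity + specificity - 1 = TPR - FPR.
# The optimal cut-point maximizes J over all thresholds.
j = tpr - fpr
best = int(np.argmax(j))
print(f"AUC = {auc:.3f}, Youden's J = {j[best]:.3f} "
      f"at threshold {thresholds[best]:.3f}")
```

The `(fpr, tpr)` pairs can be plotted directly to overlay ROC curves from several models on one axis, which works around the current BlueSky limitation.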

Related

Kaplan-Meier survival curve evaluation

I have generated a Kaplan-Meier survival curve on consumer data (the event of interest is 'Churn'), with separate survival curves for buyers and non-buyers. Before putting the model to use, I want to know how I can evaluate the validity of the curve.
I have already tried creating separate curves for two consumer cohorts (who joined in different years) over a span of 36 months, and noticed that these curves are not similar at all. I suspect this is not the right way to evaluate. Can somebody tell me what can be tried to evaluate the survival curve, apart from the statistical methods?
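One practical sanity check is to recompute the product-limit estimate by hand and compare it against the tool's output. Below is a minimal Kaplan-Meier estimator in plain NumPy, run on a small hypothetical dataset (times in months, event = 1 means churned, event = 0 means censored):

```python
# A minimal Kaplan-Meier (product-limit) estimator for right-censored data.
import numpy as np

def kaplan_meier(time, event):
    """Return (event_times, survival_estimates)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    times = np.unique(time[event == 1])          # distinct event times
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)              # subjects still in the study at t
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk              # product-limit update
        surv.append(s)
    return times, np.array(surv)

# Hypothetical churn data: 7 consumers, mixed events and censorings.
t, s = kaplan_meier([2, 3, 3, 5, 8, 8, 9], [1, 1, 0, 1, 1, 0, 0])
```

If the hand-computed step function matches the curve your software draws at each event time, the curve itself is being estimated correctly; differences between cohorts are then a property of the data, not of the method.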

Deviation analysis to vertex color using MeshLab

I would like to know if it is possible to do deviation analysis with MeshLab and transfer the result to vertex colors on a mesh. To expand on those two ideas:
1st: Is it possible to do deviation analysis with MeshLab? I have a scanned mesh that I want to compare with an "ideal" model. The difference between the two should generate a (grey or color) scale representing the distance from each point of the scanned model to the "ideal" one.
2nd: I want to take this information (the color/grey grading that shows how distant the points are) and transfer it to per-vertex color.
I don't know if that was clear, but if you know what deviation analysis means I think you get the idea. The difference is that I would like to generate a 3D mesh with the vertex colors provided by this deviation analysis.
It seems that MeshLab can compare two models and can handle vertex colorizing, but I don't know if it is possible to work with real measurements, transfer this information to vertex color, and export a mesh that shows it.
If it's possible and you know how, just point me in some direction. I'm not familiar with MeshLab, and clicking here and there attempting an impossible task can be very frustrating, so it would be good if someone could give me some tips.
Thanks.
Yes, MeshLab can compute deviation analysis between two similar surfaces (and the required alignment preprocessing too).
Estimating the deviation between two meshes means computing the Hausdorff distance.
There is a small tutorial on how to compute and visualize it in MeshLab here:
http://meshlabstuff.blogspot.com/2010/01/measuring-difference-between-two-meshes.html
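To make the idea concrete, here is an illustration of the core computation in plain NumPy: the distance from each scanned vertex to its nearest point on the reference, mapped to a per-vertex color ramp. This is only a sketch of what MeshLab's filter does internally; sampled point sets stand in for the two meshes, and MeshLab samples the reference surface far more carefully than nearest-vertex lookup:

```python
# Per-vertex deviation (nearest-point distance) mapped to a blue->red ramp.
import numpy as np

def vertex_deviation_colors(scan_pts, ref_pts):
    """Distance from each scan vertex to the nearest reference point,
    mapped to RGB: blue = close, red = far."""
    # Brute-force pairwise distances; a KD-tree scales better for real meshes.
    d = np.linalg.norm(scan_pts[:, None, :] - ref_pts[None, :, :], axis=2)
    dist = d.min(axis=1)                         # nearest-point distance
    t = dist / dist.max() if dist.max() > 0 else dist
    colors = np.stack([t, np.zeros_like(t), 1.0 - t], axis=1)  # RGB in [0, 1]
    return dist, colors

scan = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.2, 0.0]])
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
dist, colors = vertex_deviation_colors(scan, ref)
```

The resulting per-vertex colors are exactly the kind of attribute MeshLab stores on a mesh and exports in formats such as PLY.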

How do I visualise orthogonal parameter steps in gradient descent, using Matplotlib?

I have implemented multivariate linear regression, where the parameters theta0 (intercept), theta1, and theta2 are optimized by minimizing the MSE loss, with step sizes chosen by line search in gradient descent. How do I visually illustrate the mathematical property that the directions of steepest descent (negative gradients) of successive steps are orthogonal? I'm trying to generate a contour map similar to this image: Plot, but with respect to 2 parameters instead of 1 (if that's not possible, 2 separate plots would also be great).
Also, I originally wanted to perform multivariate linear regression with 4 features, but ultimately decided to use only the 2 most strongly correlated ones (after comparing their Pearson correlation coefficients) in order to be able to plot a graph. I'm not aware of any way to plot 4-dimensional data; does anyone know if this is possible and how?
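A sketch of the setup being asked about, on toy data (not the asker's dataset): gradient descent on a 2-parameter MSE loss with exact line search, which is what makes consecutive step directions orthogonal. The recorded `path` is what you would overlay on a `plt.contour` plot of the loss over a (theta1, theta2) grid:

```python
# Gradient descent with exact line search on a quadratic (MSE) loss.
# With exact line search, each step direction is orthogonal to the next.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=100)

H = 2 * X.T @ X / len(y)                 # Hessian of the MSE loss

def grad(theta):
    return 2 * X.T @ (X @ theta - y) / len(y)

theta = np.zeros(2)
path = [theta.copy()]
for _ in range(8):
    g = grad(theta)
    alpha = (g @ g) / (g @ H @ g)        # exact line-search step size
    theta = theta - alpha * g
    path.append(theta.copy())
path = np.array(path)

steps = np.diff(path, axis=0)            # successive steps: steps[i] @ steps[i+1] ~ 0
```

For the plot itself: evaluate the MSE on a `np.meshgrid` of theta1/theta2 values, call `plt.contour(xx, yy, loss)`, then `plt.plot(path[:, 0], path[:, 1], 'o-')`; the right-angle zigzag toward the minimum is the orthogonality property made visible. For 4 features there is no direct 4-D plot, but pairwise contour plots (one per parameter pair, holding the others at their optima) are a common workaround.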

Gaussian Mixture Models for pixel clustering

I have a small set of aerial images where the different terrains visible in each image have been labelled by human experts. For example, an image may contain vegetation, river, rocky mountains, farmland, etc. Each image may have one or more of these labelled regions. Using this small labelled dataset, I would like to fit a Gaussian mixture model for each of the known terrain types. After this is complete, I would have N GMMs, one for each of the N terrain types that I might encounter in an image.
Now, given a new image, I would like to determine for each pixel, which terrain it belongs to by assigning the pixel to the most probable GMM.
Is this the correct line of thought? And if so, how can I go about clustering an image using GMMs?
It's not clustering if you use labeled training data!
You can, however, easily reuse the labeling step of GMM clustering.
For this, compute the prior probabilities, means, and covariance matrices (and invert the covariances). Then classify each pixel of the new image by the maximum probability density (weighted by the prior probabilities) under the multivariate Gaussians estimated from the training data.
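The procedure above can be sketched with scikit-learn, fitting one `GaussianMixture` per labelled terrain and classifying by the highest prior-weighted density. The terrain names and RGB-like pixel features here are toy assumptions, not the asker's data:

```python
# One GMM per labelled terrain; classify pixels by max prior-weighted density.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
training = {                                       # terrain -> labelled pixel features
    "water": rng.normal([0.1, 0.2, 0.8], 0.05, size=(300, 3)),
    "vegetation": rng.normal([0.2, 0.7, 0.2], 0.05, size=(300, 3)),
}

n_total = sum(len(v) for v in training.values())
names, models, log_priors = [], [], []
for name, pixels in training.items():
    models.append(GaussianMixture(n_components=2, random_state=0).fit(pixels))
    log_priors.append(np.log(len(pixels) / n_total))   # class prior
    names.append(name)

def classify(pixels):
    # argmax over terrains of log p(x | terrain) + log p(terrain)
    scores = np.stack([m.score_samples(pixels) + lp
                       for m, lp in zip(models, log_priors)], axis=1)
    return [names[i] for i in scores.argmax(axis=1)]

labels = classify(np.array([[0.1, 0.2, 0.8], [0.2, 0.7, 0.2]]))
```

For a full image, reshape the H×W×3 array to (H*W, 3), classify, and reshape the labels back to H×W.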
Intuitively, your thought process is correct, and already having the labels makes this a lot easier.
For example, let's pick a very well-known non-parametric algorithm: k-Nearest Neighbors (https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm).
In this algorithm, for each new pixel you find the k pixels closest to the one you are currently evaluating, where closeness is determined by some distance function (usually Euclidean). You then assign the new pixel the most frequently occurring classification label among those neighbors.
I am not sure if you are looking for a specific algorithm recommendation, but k-NN would be a very good algorithm to begin testing this type of exercise with. I saw you tagged sklearn; scikit-learn has a very good k-NN implementation I suggest you read up on.
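The k-NN suggestion in scikit-learn looks roughly like this; again the terrain labels and pixel features are toy assumptions for illustration:

```python
# k-NN pixel classification with scikit-learn's KNeighborsClassifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Labelled training pixels: two terrain classes with distinct mean colors.
X = np.vstack([rng.normal([0.1, 0.2, 0.8], 0.05, size=(100, 3)),
               rng.normal([0.2, 0.7, 0.2], 0.05, size=(100, 3))])
y = np.array(["water"] * 100 + ["vegetation"] * 100)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Each new pixel gets the majority label of its 5 nearest training pixels.
pred = knn.predict([[0.1, 0.2, 0.8], [0.2, 0.7, 0.2]])
```

Note that plain k-NN requires a distance lookup against all training pixels per query, so for large images the fitted GMM approach (a fixed number of density evaluations per pixel) is usually cheaper.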

Looking for a detailed method to plot contours of confidence level

I'm trying to find a method or tutorial explaining how contours of different confidence levels (68%, 95%, 99.7%, etc.) are plotted.
Below is an example of the contours on a plot I would like to generate:
It represents the constraints on cosmological parameters (\Omega_\Lambda represents dark energy and \Omega_m the total matter density).
Once I have data sets on \Omega_\Lambda and \Omega_m, how can I produce these contours? I know what a confidence level is, but I only know the standard deviation.
If I plot the standard deviation of both parameters around the expected values, I get a cross (horizontal for \Omega_m and vertical for \Omega_\Lambda): but from this cross, how do I draw contours at different confidence levels?
In the figure above, the contours look like a 2D parametric curve of points (\Omega_\Lambda(t), \Omega_m(t)) with parameter t, but I don't think they are drawn that way.
You might want to check out Matplotlib's contour plot: the levels parameter seems to be what you need.
The plots in your example are not obtained from raw data, but from a statistical model of raw data. So you could first fit multivariate normal distributions to your data using numpy.mean and numpy.cov, then generate the multivariate normal pdf values with scipy.stats.multivariate_normal. You can also find a code snippet doing confidence ellipses here (which seems to be exactly the kind of thing you were looking for).
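That recipe can be sketched end to end: fit a 2-D Gaussian to the samples, then draw the pdf contours that enclose 68/95/99.7% of the probability mass. The contour levels come from the chi-squared (2 degrees of freedom) quantiles of the squared Mahalanobis distance. The sample mean and covariance below are invented stand-ins for real (\Omega_\Lambda, \Omega_m) chains:

```python
# Confidence-level contours for a fitted 2-D Gaussian.
import numpy as np
from scipy.stats import chi2, multivariate_normal
import matplotlib
matplotlib.use("Agg")                        # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0.7, 0.3],
                                  [[0.02, -0.01], [-0.01, 0.02]],
                                  size=2000)  # toy (Omega_Lambda, Omega_m) draws

mu = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False)
mvn = multivariate_normal(mu, cov)

xx, yy = np.meshgrid(np.linspace(0.2, 1.2, 200), np.linspace(-0.2, 0.8, 200))
pdf = mvn.pdf(np.dstack([xx, yy]))

# The contour enclosing mass p sits where the squared Mahalanobis distance
# equals chi2.ppf(p, df=2), i.e. at pdf value peak * exp(-chi2.ppf(p, 2) / 2).
peak = mvn.pdf(mu)
levels = sorted(peak * np.exp(-chi2.ppf([0.997, 0.95, 0.68], df=2) / 2))

plt.contour(xx, yy, pdf, levels=levels)
plt.xlabel(r"$\Omega_\Lambda$"); plt.ylabel(r"$\Omega_m$")
plt.savefig("contours.png")
```

This is why the standard-deviation "cross" is not enough: the off-diagonal covariance tilts the ellipses, and the chi-squared quantile (not the 1-D z-score) sets how far out each contour sits.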
