Finding sigma of Gaussian - statistics

Is there a simple equation which given the area of the shaded part and the mean, gives you the corresponding sigma for a normal distribuion?
P.S the shaded part corresponds to the area under the section of the Gaussian curve which lies on the negative x-axis. In my application this will correspond to the cross over probability.
Thanks

Do I understand correctly that you mean area left of x=0?
Area left of zero is simply \Phi((0 - \mu)/\sigma) where \mu is the mean of the distribution (1) and \sigma is the variance (what you are looking for). \Phi() is the normal cdf. You can easily (sort of) solve it for \sigma:
In case of normals \Phi((0 - \mu)/\sigma) = a is equivalent to \Phi(1/\sigma) = 1 - a (a is the area under the curve).
You cannot invert \Phi() easily but software will just do it. In R the inverse is qnorm() and \sigma will be 1/qnorm(1-a).

Related

Converting intensities to probabilities in ppp

Apologies for the overlap with existing questions; mine is at a more basic skill level. I am working with very sparse occurrences spanning very large areas, so I would like to calculate probability at pixels using the density.ppp function (as opposed to relrisk.ppp, where specifying presences+absences would be computationally intractable). Is there a straightforward way to convert density (intensity) to probabilities at each point?
Maxdist=50
dtruncauchy=function(x,L=60) L/(diff(atan(c(-1,1)*Maxdist/L)) * (L^2 + x^2))
dispersfun=function(x,y) dtruncauchy(sqrt(x^2+y^2))
n=1e3; PPP=ppp(1:n,1:n, c(1,n),c(1,n), marks=rep(1,n));
density.ppp(PPP,cutoff=Maxdist,kernel=dispersfun,at="points",leaveoneout=FALSE) #convert to probabilies?
Thank you!!
I think there is a misunderstanding about fundamentals. The spatstat package is designed mainly for analysing "mapped point patterns", datasets which record the locations where events occurred or things were located. It is designed for "presence-only" data, not "presence/absence" data (with some exceptions).
The relrisk function expects input data about the presence of two different types of events, such as the mapped locations of trees belonging to two different species, and then estimates the spatially-varying probability that a tree will belong to each species.
If you have 'presence-only' data stored in a point pattern object X of class "ppp", then density(X, ....) will produce a pixel image of the spatially-varying intensity (expected number of points per unit area). For example if the spatial coordinates were expressed in metres, then the intensity values are "points per square metre". If you want to calculate the probability of presence in each pixel (i.e. for each pixel, the probability that there is at least one presence point in the pixel), you just need to multiply the intensity value by the area of one pixel, which gives the expected number of points in the pixel. If pixels are small (the usual case) then the presence probability is just equal to this value. For physically larger pixels the probability is 1 - exp(-m) where m is the expected number of points.
Example:
X <- redwood
D <- density(X, 0.2)
pixarea <- with(D, xstep * ystep)
M <- pixarea * D
p <- 1 - exp(-M)
then M and p are images which should be almost equal, and can both be interpreted as probability of presence.
For more information see Chapter 6 of the spatstat book.
If, instead, you had a pixel image of presence/absence data, with pixel values equal to 1 or 0 for presence or absence respectively, then you can just use the function blur in the spatstat package to perform kernel smoothing of the image, and the resulting pixel values are presence probabilities.

How to optimize two ranges for the determination of the intersection point between two curves

I start this thread asking for your help in Excel.
The main goal is to determine the coordinates of the intersection point P=(x,y) between two curves (curve A, curve B) modeled by points.
The curves are non-linear and each defining point is determined using complex equations (equations are dependent by a lot of parameters chosen by user, as well as user will choose the number of points which will define the accuracy of the curves). That is to say that each curve (curve A and curve B) is always changing in the plane XY (Z coordinate is always zero, we are working on the XY plane) according to the input parameters and the number of the defining points is also depending by the user choice.
My first attempt was to determine the intersection point through the trend equations of each curve (I used the LINEST function to determine the coefficients of the polynomial equation) and by solving the solution putting them into a system. The problem is that Excel is not interpolating very well the curves because they are too wide, then the intersection point (the solution of the system) is very far from the real solution.
Then, what I want to do is to shorten the ranges of points to be able to find two defining trend equations for the curves, cutting away the portion of curves where cannot exist the intersection.
Today, in order to find the solution, I plot the curves on Siemens NX cad using multi-segment splines with order 3 and then I can easily find the coordinates of the intersection point. Please notice that I am using the multi-segment splines to be more precise with the approximation of the functions curve A and curve B.
Since I want to avoid the CAD tool and stay always on Excel, is there a way to select a shorter range of the defining points close to the intersection point in order to better approximate curve A and curve B with trend equations (Linest function with 4 points and 3rd order spline) and then find the solution?
I attach a picture to give you an example of Curve A and Curve B on the plane:
https://postimg.cc/MfnKYqtk
At the following link you can find the Excel file with the coordinate points and the curve plot:
https://www.mediafire.com/file/jqph8jrnin0i7g1/intersection.xlsx/file
I hope to solve this problem with your help, thank you in advance!
kalo86
Your question gave me some days of thinking and research.
With the help of https://pomax.github.io/bezierinfo/
§ 27 - Intersections (Line-line intersections)
and
§ 28 - Curve/curve intersection
your problem can be solved in Excel.
About the mystery of Excel smoothed lines you find details here:
https://blog.splitwise.com/2012/01/31/mystery-solved-the-secret-of-excel-curved-line-interpolation/
The author of this fit is Dr. Brian T. Murphy, PhD, PE from www.xlrotor.com. You find details here:
https://www.xlrotor.com/index.php/our-company/about-dr-murphy
https://www.xlrotor.com/index.php/knowledge-center/files
=>see Smooth_curve_bezier_example_file.xls
https://www.xlrotor.com/smooth_curve_bezier_example_file.zip
These knitted together you get the following results for the intersection of your given curves:
for the straight line intersection:
(x = -1,02914127711195 / y = 23,2340949174492)
for the smooth line intersection:
(x = -1,02947493047196 / y = 23,2370611219553)
For a full automation of your task you would need to add more details regarding the needed accuracy and what details you need for further processing (and this is actually not the scope of this website ;-).
Intersection of the straight lines:
Intersection of the smoothed lines:
comparison charts:
solution,
Thank you very much for the anwer, you perfectly centered my goal.
Your solution (for the smoothed lines) is very very close to what I determine in Siemens NX.
I'm going to read the documentation at the provided link https://pomax.github.io/bezierinfo/ in order to better understand the math behind this argument.
Then, to resume my request, you have been able to find the coordinates (x,y) of the intersection point between two curves without passing through an advanced CAD system with a very good precision.
I am starting to study now, best regards!
kalo86

estimate the slope of the straight part in boltzmann curve

I was working with one dataset and found the curve to be sigmoidal. i have fitted the curve and got the equation A2+((A1-A2)/1+exp((x-x0)/dx)) where:
x0 : Mid point of the curve
dx : slope of the curve
I need to find the slope and midpoint in order to give generalized equation. any suggestions?
You should be able to simplify the modeling of the sigmoid with a function of the following form:
The source includes code in R showing how to fit your data to the sigmoid curve, which you can adapt to whatever language you're writing in. The source also notes the following form:
Which you can adapt the linked R code to solve for. The nice thing about the general functions here will be that you can solve for the derivative from them. Also, you should note that the midpoint of the sigmoid is just where the derivative of dx (or dx^2) is 0 (where it goes from neg to pos or vice versa).
Assuming your equation is a misprint of
A2+(A1-A2)/(1+exp((x-x0)/dx))
then your graph does not reflect zero residual, since in your graph the upper shoulder is sharper than the lower shoulder.
Likely the problem is your starting values. Try using the native R function SSfpl, as in
nls(y ~ SSfpl(x,A2,A1,x0,dx))

Calculate equidistant point with minimal distance from 3 points in N-dimensional space

I'm trying to code the Ritter's bounding sphere algorithm in arbitrary dimensions, and I'm stuck on the part of creating a sphere which would have 3 given points on it's edge, or in other words, a sphere which would be defined by 3 points in N-dimensional space.
That sphere's center would be the minimal-distance equidistant point from the (defining) 3 points.
I know how to solve it in 2-D (circumcenter of a triangle defined by 3 points), and I've seen some vector calculations for 3D, but I don't know what the best method would be for N-D, and if it's even possible.
(I'd also appreciate any other advice about the smallest bounding sphere calculations in ND, in case I'm going in the wrong direction.)
so if I get it right:
Wanted point p is intersection between 3 hyper-spheres of the same radius r where the centers of hyper-spheres are your points p0,p1,p2 and radius r is minimum of all possible solutions. In n-D is arbitrary point defined as (x1,x2,x3,...xn)
so solve following equations:
|p-p0|=r
|p-p1|=r
|p-p2|=r
where p,r are unknowns and p0,p1,p2 are knowns. This lead to 3*n equations and n+1 unknowns. So get all the nonzero r solutions and select the minimal. To compute correctly chose some non trivial equation (0=r) from each sphere to form system of n+1 =equations and n+1 unknowns and solve it.
[notes]
To ease up the processing you can have your equations in this form:
(p.xi-p0.xi)^2=r^2
and use sqrt(r^2) only after solution is found (ignoring negative radius).
there is also another simpler approach possible:
You can compute the plane in which the points p0,p1,p2 lies so just find u,v coordinates of these points inside this plane. Then solve your problem in 2D on (u,v) coordinates and after that convert found solution form (u,v) back to your n-D space.
n=(p1-p0)x(p2-p0); // x is cross product
u=(p1-p0); u/=|u|;
v=u x n; v/=|v|; // x is cross product
if memory of mine serves me well then conversion n-D -> u,v is done like this:
P0=(0,0);
P1=(|p1-p0|,0);
P2=(dot(p2-p0,u),dot(p2-p0,v));
where P0,P1,P2 are 2D points in (u,v) coordinate system of the plane corresponding to points p0,p1,p2 in n-D space.
conversion back is done like this:
p=(P.u*u)+(P.v*v);
My Bounding Sphere algorithm only calculates a near-optimal sphere, in 3 dimensions.
Fischer has an exact, minimal bounding hyper-sphere (N dimensions.) See his paper: http://people.inf.ethz.ch/gaertner/texts/own_work/seb.pdf.
His (C++/Java)code: https://github.com/hbf/miniball.
Jack Ritter
jack#houseofwords.com

Calculating margin and bias for SVM's

I apologise for the newbishness of this question in advance but I am stuck. I am trying to solve this question,
I can do parts i)-1v) but I am stuck on v. I know to calculate the margin y, you do
y=2/||W||
and I know that W is the normal to the hyperplane, I just don't know how to calculate it. Is this always
W=[1;1] ?
Similarly, the bias, W^T * x + b = 0
how do I find the value x from the data points? Thank you for your help.
Consider building an SVM over the (very little) data set shown in Picture for an example like this, the maximum margin weight vector will be parallel to the shortest line connecting points of the two classes, that is, the line between and , giving a weight vector of . The optimal decision surface is orthogonal to that line and intersects it at the halfway point. Therefore, it passes through . So, the SVM decision boundary is:
Working algebraically, with the standard constraint that , we seek to minimize . This happens when this constraint is satisfied with equality by the two support vectors. Further we know that the solution is for some . So we have that:
Therefore a=2/5 and b=-11/5, and . So the optimal hyperplane is given by
and b= -11/5 .
The margin boundary is
This answer can be confirmed geometrically by examining picture.

Resources