Specification of distances for calculation of marked variogram - spatstat

In the spatstat package, why doesn't the markvario function allow distances greater than 1/4 of the window length for variogram calculation?
markvario(X, correction = c("isotropic", "Ripley", "translate"),
r = NULL, method = "density", ..., normalise=FALSE)
The argument r is a numeric vector giving the values of r at which the mark variogram gamma(r) should be evaluated.
The window side length for the longleaf dataset is 200 m, but the variogram plot only shows distances up to 50 m, even when r is specified as [0, 200].
plot(markvario(longleaf,r=seq(0,200, by=0.5)))
[Figure: marked variogram plot]

The function markvario returns an object of class fv (function values).
It’s effectively a data.frame with additional attributes. One attribute
tells the plot command (dispatched to plot.fv) the recommended plotting
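range. If you look at the bottom of the printed output, you will see that r
runs from 0 to 200 but the recommended range is 0 to 50. This recommended
range comes from spatstat's default rule of thumb, which limits r to one
quarter of the shorter side of the window (here 200/4 = 50 m), because
estimates at larger distances are much less reliable. You can override the
default with the xlim argument: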
library(spatstat)
rslt <- markvario(longleaf,r=seq(0,200, by=0.5))
rslt
#> Function value object (class 'fv')
#> for the function r -> gamma(r)
#> ...........................................................................
#> Math.label
#> r r
#> theo {gamma[]^{iid}}(r)
#> trans {hat(gamma)[]^{trans}}(r)
#> iso {hat(gamma)[]^{iso}}(r)
#> Description
#> r distance argument r
#> theo theoretical value (independent marks) for gamma(r)
#> trans translation-corrected estimate of gamma(r)
#> iso Ripley isotropic correction estimate of gamma(r)
#> ...........................................................................
#> Default plot formula: .~r
#> where "." stands for 'iso', 'trans', 'theo'
#> Recommended range of argument r: [0, 50]
#> Available range of argument r: [0, 200]
#> Unit of length: 1 metre
plot(rslt, xlim = c(0, 200))

Finding the mean of a distribution

My code generates a number of distributions (I plotted only one below to keep it legible). The Y axis represents a probability density function and the X axis is a simple array of values. In more detail:
Y = [0.02046505 0.10756612 0.24319883 0.30336375 0.22071875 0.0890625 0.015625 0 0 0]
And X is generated using np.arange(0,10,1) = [0 1 2 3 4 5 6 7 8 9]
I want to find the mean of this distribution (i.e. where the curve peaks on the X axis, not the mean of the Y values). I know how to use numpy's np.mean to find the mean of Y, but that's not what I need.
By eye, the mean here is about x = 3, but I would like to compute it with code to make it more accurate.
Any help would be great.
By definition, the mean of a discrete random variable x (its expected value) is sum(p(x[j]) * x[j]), where p(x[j]) is the probability of x[j]. Here the values of Y play the role of p(x[j]): they are sampled at unit spacing and sum to 1. You can implement this as follows:
>>> import numpy as np
>>> Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875, 0.0890625, 0.015625, 0, 0, 0])
>>> Y
array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
0.0890625 , 0.015625 , 0. , 0. , 0. ])
>>> X = np.arange(0, 10)
>>> Y.sum()
1.0
>>> (X * Y).sum()
2.92599253
So the (approximate) answer is 2.92599253.
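A minimal sketch of the same computation with np.average, which takes the weights directly and divides by their sum, so it still works if a sampled curve does not sum exactly to 1. Because the question also mentions where the curve peaks, the sketch computes that location (the mode) separately with np.argmax; for this distribution the mode and the mean differ slightly.

```python
import numpy as np

Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
              0.0890625, 0.015625, 0, 0, 0])
X = np.arange(0, 10)

# Weighted mean of X with weights Y; np.average divides by Y.sum(),
# so the result is unaffected if the curve is not normalised to 1.
mean = np.average(X, weights=Y)

# The x position of the highest point (the mode) is a different quantity.
peak_x = X[np.argmax(Y)]
```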

I have some questions about the return value of the make_circles function in python

X,y = make_circles(n_samples=50, shuffle=True, noise=None, random_state=0, factor=0.8)
I already know that there are two return values. But why does X[y == 0, 0] work? How is it indexed?
Each row of X has only two values. How can y be used to determine whether each sample belongs to class 0 or class 1?
Sklearn's make_circles returns a tuple X, y. X is a 2-D array in which each row holds the x and y coordinates of one point. y is a 1-D array stating whether the point in the corresponding row of X belongs to the inner circle (class 1) or the outer circle (class 0). In X[y==0, 0], the expression y==0 is a boolean array with one entry per row of X, so it selects the points of class 0 (the outer circle), and the second index 0 selects the first column, i.e. their x coordinates.
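A small NumPy-only sketch of this indexing. The arrays are hypothetical hand-made values standing in for make_circles output, so the example runs without scikit-learn:

```python
import numpy as np

# Hypothetical stand-in for make_circles output: each row of X is an (x, y)
# coordinate, and y holds the class label of the matching row.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.8, 0.0],
              [0.0, 0.8]])
y = np.array([0, 0, 1, 1])

mask = y == 0              # boolean array, one entry per row of X
outer_x = X[mask, 0]       # rows where mask is True, column 0 (x coordinate)
outer_ycoord = X[mask, 1]  # same rows, column 1 (y coordinate)

print(mask)     # [ True  True False False]
print(outer_x)  # [1. 0.]
```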

How to visualize feasible region for linear programming (with arbitrary inequalities) in Numpy/MatplotLib?

I need to implement a solver for linear programming problems. All of the restrictions are <= inequalities, such as
5x + 10y <= 10
There can be an arbitrary number of these restrictions. Also, x >= 0 and y >= 0 implicitly.
I need to find the optimal solution (maximum) and show the feasible region in matplotlib. I've found the optimal solution by implementing the simplex method, but I can't figure out how to draw the graph.
Some approaches I've found:
This link finds the minimum of the y values from each function and uses plt.fill_between() to draw the region. But it doesn't work when I change the order of the equations, and I'm not sure which y values to take the minimum of, so I can't use it for arbitrary restrictions.
Find the intersection point of every pair of restrictions and draw a polygon. This is not efficient.
An easier approach might be to have matplotlib compute the feasible region on its own (with you only providing the constraints) and then simply overlay the "constraint" lines on top.
import numpy as np
import matplotlib.pyplot as plt
# plot the feasible region
d = np.linspace(-2,16,300)
x,y = np.meshgrid(d,d)
plt.imshow( ((y>=2) & (2*y<=25-x) & (4*y>=2*x-8) & (y<=2*x-5)).astype(int) ,
extent=(x.min(),x.max(),y.min(),y.max()),origin="lower", cmap="Greys", alpha = 0.3);
# plot the lines defining the constraints
x = np.linspace(0, 16, 2000)
# y >= 2
y1 = (x*0) + 2
# 2y <= 25 - x
y2 = (25-x)/2.0
# 4y >= 2x - 8
y3 = (2*x-8)/4.0
# y <= 2x - 5
y4 = 2 * x -5
# Make plot
plt.plot(x, y1, label=r'$y\geq2$')
plt.plot(x, y2, label=r'$2y\leq25-x$')
plt.plot(x, y3, label=r'$4y\geq 2x - 8$')
plt.plot(x, y4, label=r'$y\leq 2x-5$')
plt.xlim(0,16)
plt.ylim(0,11)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.show()
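The answer above hard-codes its four constraints inside the imshow call. A minimal sketch of the same grid-mask idea for an arbitrary list of restrictions written as A @ [x, y] <= b, using NumPy only; the helper name feasible_mask is hypothetical, not a NumPy or matplotlib function. The implicit x >= 0 and y >= 0 are included as the rows -x <= 0 and -y <= 0 so that every row has the same form.

```python
import numpy as np

def feasible_mask(A, b, xs, ys):
    """Boolean grid that is True where A @ [x, y] <= b holds for every row.

    A: (m, 2) constraint coefficients, b: (m,) right-hand sides.
    xs, ys: 1-D sample coordinates along each axis.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    X, Y = np.meshgrid(xs, ys)  # shape (len(ys), len(xs))
    # Left-hand side of every constraint at every grid point: (m, len(ys), len(xs))
    lhs = A[:, 0, None, None] * X + A[:, 1, None, None] * Y
    # A point is feasible only if all m constraints hold there
    return np.all(lhs <= b[:, None, None], axis=0)

# The question's 5x + 10y <= 10, plus x >= 0 and y >= 0 rewritten as <= rows
A = [[5, 10], [-1, 0], [0, -1]]
b = [10, 0, 0]
xs = np.linspace(-1, 3, 401)
ys = np.linspace(-1, 2, 301)
mask = feasible_mask(A, b, xs, ys)
```

The mask can then be shaded the same way as in the answer, e.g. plt.imshow(mask.astype(int), extent=(xs.min(), xs.max(), ys.min(), ys.max()), origin="lower", cmap="Greys", alpha=0.3). Adding a restriction only means adding a row to A and an entry to b, so the order of the restrictions does not matter.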
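This is a vertex enumeration problem. You can use the function lineqs from the intvalpy library, which visualizes the solution set of a system of inequalities A x >= b for any number of lines and also displays the vertices of that set. Because lineqs works with >= inequalities, the code below multiplies A and b by -1 to turn each <= restriction into a >= one.
The last two rows of A encode x >= 0 and y >= 0.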
from intvalpy import lineqs
import numpy as np
A = -np.array([[5, 10],
[-1, 0],
[0, -1]])
b = -np.array([10, 0, 0])
lineqs(A, b, title='Solution', color='gray', alpha=0.5, s=10, size=(15,15), save=False, show=True)
[Figure: visual solution]
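The pairwise idea from the question (intersect every pair of restriction lines and keep the feasible intersections) is a simple brute-force form of vertex enumeration, and for a handful of 2-D constraints its cost is negligible. A minimal NumPy-only sketch, assuming the feasible region is bounded and non-empty; feasible_vertices is a hypothetical name, not part of intvalpy:

```python
import itertools
import numpy as np

def feasible_vertices(A, b, tol=1e-9):
    """Vertices of the 2-D region {p : A @ p <= b}, counter-clockwise.

    Intersects every pair of constraint boundary lines, keeps the
    intersections that satisfy all constraints, then sorts them by angle
    around their centroid. Assumes the region is bounded and non-empty.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    points = []
    for i, j in itertools.combinations(range(len(b)), 2):
        M = A[[i, j]]
        if abs(np.linalg.det(M)) < tol:  # parallel lines: no single intersection
            continue
        p = np.linalg.solve(M, b[[i, j]])
        if np.all(A @ p <= b + tol):     # keep only feasible intersections
            points.append(p)
    pts = np.unique(np.round(points, 9), axis=0)  # drop duplicate vertices
    centre = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - centre[1], pts[:, 0] - centre[0])
    return pts[np.argsort(angles)]

# The question's 5x + 10y <= 10 with x >= 0 and y >= 0 as <= rows
A = [[5, 10], [-1, 0], [0, -1]]
b = [10, 0, 0]
vertices = feasible_vertices(A, b)
```

Sorting by angle around the centroid is valid because a region defined by linear inequalities is convex, so the result can be passed straight to plt.fill(vertices[:, 0], vertices[:, 1], alpha=0.3) to shade the feasible region.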

Calculate the volume of 3d plot

The data comes from a measurement.
[Figure: plot of the measured data]
I tried using trapz twice, but I get an error: "ValueError: operands could not be broadcast together with shapes (1,255) (256,531)"
X has 256 points and Y has 532 points, and Z is a 2-D array of size 256 by 532. The code is below:
import numpy as np
img=np.loadtxt('focus_x.txt')
m=0
m=np.max(img)
Z=img/m
X=np.loadtxt("pixelx.txt",float)
Y=np.loadtxt("pixely.txt",float)
[X, Y] = np.meshgrid(X, Y)
volume=np.trapz(X,np.trapz(Y,Z))
The docs state that trapz should be used like this
intermediate = np.trapz(Z, x)
result = np.trapz(intermediate, y)
trapz reduces the dimensionality of its operand by one, integrating along the last axis by default and optionally using a 1-D array of abscissae to determine the subintervals of integration; it does not use a mesh grid.
A complete example.
First we compute, using sympy, the integral of a simple bilinear function over a rectangular domain (0, 5) × (0, 7)
In [1]: import sympy as sp, numpy as np
In [2]: x, y = sp.symbols('x y')
In [3]: f = 1 + 2*x + y + x*y
In [4]: f.integrate((x, 0, 5)).integrate((y, 0, 7))
Out[4]: 2555/4
Now we compute the trapezoidal approximation to the integral (as it happens, the approximation is exact for a bilinear function). First we need coordinate arrays:
In [5]: x, y = np.linspace(0, 5, 11), np.linspace(0, 7, 22)
(note that the sampling differs between the two directions and also differs from the default spacing of 1 used by trapz). Next we need a mesh grid on which to evaluate the integrand:
In [6]: X, Y = np.meshgrid(x, y)
In [7]: z = 1 + 2*X + Y + X*Y
and finally we compute the integral, multiplied by 4 so that it can be compared directly with 2555/4:
In [8]: 4*np.trapz(np.trapz(z, x), y)
Out[8]: 2555.0
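A minimal sketch wrapping the same double integration for measured data given as 1-D coordinate arrays, assuming Z is laid out with rows following y and columns following x (the layout np.meshgrid(x, y) produces). The helper name volume_under_surface is hypothetical. NumPy 2.0 added np.trapezoid and deprecated np.trapz, so the sketch uses whichever is available.

```python
import numpy as np

# Prefer np.trapezoid (NumPy >= 2.0); fall back to np.trapz on older versions.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

def volume_under_surface(Z, x, y):
    """Double trapezoidal integral of Z over the grid given by 1-D x and y.

    Z must have shape (len(y), len(x)): rows follow y, columns follow x.
    """
    Z = np.asarray(Z, dtype=float)
    if Z.shape != (len(y), len(x)):
        raise ValueError(f"Z has shape {Z.shape}, expected {(len(y), len(x))}")
    inner = trapezoid(Z, x, axis=1)  # integrate along x: one value per y row
    return trapezoid(inner, y)       # integrate the remaining 1-D array along y

# Same bilinear example as above: the exact integral is 2555/4 = 638.75
x, y = np.linspace(0, 5, 11), np.linspace(0, 7, 22)
X, Y = np.meshgrid(x, y)
z = 1 + 2*X + Y + X*Y
v = volume_under_surface(z, x, y)
```

In the question, Z has 256 rows and x has 256 points, so under this assumption Z.T would be passed instead; the explicit shape check reports such a mix-up more clearly than a broadcasting error.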

Correlation and regression analysis

How should I analyse the correlation between an ordinal variable with four levels (0, 1, 2, 3) and a continuous variable with a range of values? The scatter plot looks like four parallel horizontal rows of dots.
You could run a Spearman rank correlation test. Using R and the pspearman package:
require(pspearman)
x <- c(rep("a", 5), rep("b", 5), rep("c", 5), rep("d", 5))
x <- factor(x, levels=c("a", "b", "c", "d"), ordered=T)
y <- 1:20
spearman.test(x, y)
Spearman's rank correlation rho
data: x and y
S = 40.6203, p-value = 6.566e-06
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9694584
Warning message:
In spearman.test(x, y) : Cannot compute exact p-values with ties
For comparison, a non-significant correlation:
set.seed(123)
y2 <- rnorm(20)
spearman.test(x, y2)
Spearman's rank correlation rho
data: x and y2
S = 1144.329, p-value = 0.5558
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.139602
Warning message:
In spearman.test(x, y2) : Cannot compute exact p-values with ties
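For readers working in Python, a NumPy-only sketch of the same statistic: Spearman's rho is the Pearson correlation of the ranks, with tied values (such as the repeated ordinal levels here) given the average of their ranks. The helper names are hypothetical. The sketch reproduces the rho of the first R example but computes no p-value; if SciPy is available, scipy.stats.spearmanr returns both the coefficient and a p-value.

```python
import numpy as np

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    values = np.asarray(values)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        # Extend j over the run of values tied with sorted_vals[i]
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

x = np.repeat([0, 1, 2, 3], 5)  # ordinal codes, five samples per level
y = np.arange(1, 21)            # like y <- 1:20 in the R example
rho = spearman_rho(x, y)
```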
