Dependency in multidimensional marked point patterns - spatstat

As I understand it, if we have a multi-type point pattern, we can currently determine dependencies between points of various marks using functions like Jmulti, Gmulti etc.
Now, if each point is associated with multiple marks (say, as a data frame where each column is a mark variable) then how do we find dependency between points of different mark variables? Note that in this case, a point could have two different marks but have the same spatial coordinate.
I think in this case, the number of points having the same coordinates but different marks is in some sense a measure of dependency between the point patterns of different mark variables, but I am not sure if there are methods to do this analysis in spatstat.
Thanks for your clarification.

This is discussed in Chapter 15 of the spatstat book.
However I think you may be confusing two different things: (1) a point pattern in which each point carries several different mark variables, so that the marks for the pattern are represented by a data frame with one row for each point and one column for each mark variable; and (2) a marked point pattern in which there may be several points that have the same spatial coordinate but different mark values.
An example of (1) is the finpines dataset in spatstat in which each tree location is marked by the tree's height and diameter. An example of (2) would be a spatial pattern of road accidents in which each vehicle is represented by a point, so that two-vehicle accidents are represented by two points at the same location, perhaps with different labels.
To deal with (1), you could use functions like Kmulti, Gmulti, Jmulti. These functions always compare two groups of points, identified by the arguments I and J, which can be logical vectors. You can define any two subsets of your point pattern as the subsets I and J. For example, in the finpines data you could define I <- with(marks(finpines), height > 10 * diameter) which would select all the trees whose height in metres is greater than 10 times their diameter in cm, and similarly make another, different rule for J.
Other ways of investigating dependence in marked point patterns include the mark correlation function markcorr, nearest neighbour correlation nncorr, the conditional moments Emark, Vmark and other tools described in Chapter 15.
Finally a caution that summary functions do not "determine" dependence; they are only measures of correlation.


Is there a way to have the same pattern across all the faces of an icosahedron?

This is the scenario: I have an icosahedron, therefore I have 12 vertices and 20 faces.
Each vertex is the center of an "extruded" pentagon, whose triangles are faces of the icosahedron.
Let's say we want to name each of the vertices of each of these triangles from 1 to 3, always in a counterclockwise fashion, imagining that each vertex is not shared among different triangles.
(can't upload the image here for some reason sorry)
https://ibb.co/FmYfRG4
Is there a way to arrange the naming of the vertices inside each triangle so that every pentagon yields the same pattern of numbers along the five triangles?
As you can see by arranging the vertex names that way there would be the first pentagon with 1,1,1,1,1 but around it other pentagons couldn't have the same pattern.
EDIT: following Andrew Morton's comment I tried to write a possible sequence
I came up with two sequences of triangles: 1,2,3,1,3 for most pentagons, and 2,2,2,2,2 for the two caps.
I wonder if there is some additional optimization so that I only have one sequence instead of two, or whether there is some mathematical demonstration that this is impossible.

Relative risk estimation in spatstat

I am running into problems when computing the relative risk estimation (relrisk.ppp) of two point patterns: One with four marks in a rectangular region and the other with two marks in a circular region.
For the first pattern with four marks, I am able to get the relative risk, and the resulting object is a large imlist with 4 elements, one for each mark.
However, for the second pattern, it gives a list of 10 elements, of which the first, the matrix v, is empty with NA entries. I am racking my brain over what could possibly be wrong when the created point pattern objects seem to be identical. Any help will be appreciated. Thanks.
For your first dataset, the result is a list of image objects (a list of four objects of class im). For your second dataset, the result of relrisk.ppp is a single image (object of class im). This is the default behaviour when there are only two possible types of points (two possible mark values). See help(relrisk.ppp).
In all cases, you should just be able to plot and print the resulting object. You don't need to examine the internal data of the image.
More explanation: when there are only two possible types of points, the default behaviour of relrisk.ppp is to treat them as case-control data, where the points belonging to the first type are treated as controls (e.g. non-infected people), and the points of the second type are treated as cases (e.g. infected people). The ratio of intensities (cases divided by controls) is estimated as an image.
If you don't want this to happen, set the argument casecontrol=FALSE and then relrisk.ppp will always return a list of images, with one image for each possible mark. Each image gives the spatially-varying probability of that type of point.
It's all explained in help(relrisk.ppp) or in the book.

Why is string interpolation named the way it is?

The term interpolation is usually used in mathematical functions when determining a function for given values, which makes perfect sense. I don't see how that applies for strings, what is being interpolated? Am I missing something obvious?
Interpolation in mathematics is simply working out the things between two points(a). For example, cubic spline fitting over a series of points will give you a curve of some description (I consider a straight line to be a degenerate curve here so don't bother pointing out that some formulae generate such a beast) between each set of points, even though you have no actual data there.
Contrast this with extrapolation which will give you data beyond the endpoints. An example of that is seeing that, based on history, the stock market indices rise at x percent per annum so, in a hundred years, will be much higher than they are now.
So it's a short step to the most likely explanation as to why variable substitution within strings is called interpolation, since you're changing things within the bounds of the data:
xyzzy="42"
plugh="abc${xyzzy}xyz"
// now plugh is equal to "abc42xyz"
(a) The actual roots of the word are Latin inter + polare, those translating to "within" and "polish" (in the sense of modify or improve). See here for more detail.
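The substitution in the snippet above can be reproduced in Python as a minimal sketch. The variable names xyzzy and plugh are taken from the answer; using an f-string and string.Template is an illustrative choice here, not something the answer prescribes.

```python
from string import Template

xyzzy = "42"

# f-string: the value of xyzzy is inserted between the fixed parts "abc" and "xyz"
plugh = f"abc{xyzzy}xyz"

# string.Template uses the same ${name} placeholder syntax as the snippet above
plugh_from_template = Template("abc${xyzzy}xyz").substitute(xyzzy=xyzzy)

print(plugh)
print(plugh_from_template)
```

In both cases the surrounding text stays fixed and only the placeholder inside it is filled in, which is the "within the bounds of the data" idea described above.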

How to find the locus of common points of two segments that belong to the same line

I read various related answers. In particular, here (How do you detect where two line segments intersect?) it is explained very well how to find the intersection between two segments and how to check for parallelism and whether they belong to the same line. I wrote a Fortran program following that idea. The problem now consists of finding the union segment when the two segments belong to the same line. Here I found some C++ code (Detecting coincident subset of two coincident line segments), but it is not explained and I can't read C++, only Fortran (here is a useful image depicting the problem, posted in another question but with no useful answer: http://judark.myweb.hinet.net/parallel.JPG ). What is the best language-agnostic algorithm to find the locus of common points (i.e. the union segment, i.e. the two points defining this union) of two segments belonging to the same line? I have managed to do it with plenty of "if" statements, computing all the Manhattan distances between the points (http://en.wikipedia.org/wiki/Taxicab_geometry), but I was wondering if there is a better way to do it.
Thanks
A.
If the two segments are on the same line and do overlap, then the union is simply the segment between those two of the four end points which are farthest apart from one another. So simply compute all squared distances (no need to compute square roots) and identify the pair with maximal distance. This approach handles many degenerate cases nicely, including the case where all 4 points coincide and the union of two equal points is simply that point.
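The farthest-pair idea described above can be sketched as follows. This is a minimal illustration in Python rather than Fortran; the function name union_of_collinear_segments is hypothetical, and the code assumes the caller has already checked that the two segments are collinear and overlap.

```python
from itertools import combinations

def union_of_collinear_segments(p1, p2, q1, q2):
    """Return the two endpoints spanning the union of segments p1-p2 and q1-q2.

    Assumption: the segments lie on the same line and overlap; this function
    does not verify that.
    """
    points = [p1, p2, q1, q2]

    def squared_distance(pair):
        (ax, ay), (bx, by) = pair
        # Squared distance is enough for comparison, so no square root is taken.
        return (ax - bx) ** 2 + (ay - by) ** 2

    # Among the six pairs of endpoints, keep the pair that is farthest apart.
    return max(combinations(points, 2), key=squared_distance)
```

For example, the segments (0, 0)-(2, 2) and (1, 1)-(3, 3) give the pair (0, 0), (3, 3), and four coincident points give that point twice.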

Decomposition to Convex Polygons

This question is a little involved. I wrote an algorithm for breaking up a simple polygon into convex subpolygons, but now I'm having trouble proving that it's not optimal (where optimal means the minimal number of convex polygons, with Steiner points (added vertices) allowed). My prof is adamant that it can't be done with a greedy algorithm such as this one, but I can't think of a counterexample.
So, if anyone can prove my algorithm is suboptimal (or optimal), I would appreciate it.
The easiest way to explain my algorithm is with pictures (these are from an older suboptimal version).
My algorithm extends the line segments around the point i until they hit a point on the opposite edge.
If there is no vertex within this range, it creates a new one (the red point) and connects to that:
If there is one or more vertices in the range, it connects to the closest one. This usually produces a decomposition with the fewest number of convex polygons:
However, in some cases it can fail -- in the following figure, if it happens to connect the middle green line first, this will create an extra unneeded polygon. To fix this, I propose double-checking all the edges (diagonals) we've added to see whether they are all still necessary. Any edge that is not gets removed:
In some cases, however, this is not enough. See this figure:
Replacing a-b and c-d with a-c would yield a better solution. In this scenario, though, there are no edges to remove, so this poses a problem. In this case I suggest an order of preference: when deciding which vertex to connect a reflex vertex to, it should choose the vertex with the highest priority:
lowest) closest vertex
med) closest reflex vertex
highest) closest reflex that is also in range when working backwards (hard to explain) --
In this figure, we can see that the reflex vertex 9 chose to connect to 12 (because it was closest), when it would have been better to connect to 5. Both vertices 5 and 12 are in the range as defined by the extended line segments 10-9 and 8-9, but vertex 5 should be given preference because 9 is within the range given by 4-5 and 6-5, but NOT in the range given by 13-12 and 11-12. I.e., the edge 9-12 eliminates the reflex vertex at 9 but does NOT eliminate the reflex vertex at 12, whereas the edge 9-5 CAN eliminate the reflex vertex at 5 as well, so 5 should be given preference.
It is possible that the edge 5-12 will still exist with this modified version, but it can be removed during post-processing.
Are there any cases I've missed?
Pseudo-code (requested by John Feminella) -- this is missing the bits under Figures 3 and 5
assume vertices in `poly` are given in CCW order
let 'good reflex' (better term??) mean that if poly[i] is being compared with poly[j], then poly[i] is in the range given by the rays poly[j-1], poly[j] and poly[j+1], poly[j]
for each vertex poly[i]
    if poly[i] is reflex
        find the closest point of intersection given by the ray starting at poly[i-1] and extending in the direction of poly[i] (call this lower bound)
        repeat for the ray given by poly[i+1], poly[i] (call this upper bound)
        if there are no vertices along boundary of the polygon in the range given by the upper and lower bounds
            create a new vertex exactly half way between the lower and upper bound points (lower and upper will lie on the same edge)
            connect poly[i] to this new point
        else
            iterate along the vertices in the range given by the lower and upper bounds, for each vertex poly[j]
                if poly[j] is a 'good reflex'
                    if no other good reflexes have been found
                        save it (overwrite any other vertex found)
                    else
                        if it is closer than the other good reflex vertices, save it
                else
                    if no good reflexes have been found and it is closer than the other vertices found, save it
            connect poly[i] to the best candidate
        repeat entire algorithm for both halves of the polygon that was just split
// no reflex vertices found, then `poly` is convex
save poly
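The "if poly[i] is reflex" test in the pseudo-code is not spelled out in the post. One common way to do it, shown here as a sketch under the same assumption that vertices are in CCW order, is to check the sign of the cross product of the two edges meeting at the vertex. The function name reflex_vertices is hypothetical and this is not necessarily how the original implementation does it.

```python
def reflex_vertices(poly):
    """Return the indices of reflex vertices of a simple polygon.

    Assumption: poly is a list of (x, y) tuples in counterclockwise order.
    """
    n = len(poly)
    reflex = []
    for i in range(n):
        px, py = poly[i - 1]          # previous vertex (wraps around at i = 0)
        cx, cy = poly[i]              # current vertex
        nx, ny = poly[(i + 1) % n]    # next vertex
        # z-component of the cross product (cur - prev) x (next - cur)
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        # In a CCW polygon a right turn (negative cross product) is a reflex vertex.
        if cross < 0:
            reflex.append(i)
    return reflex

# L-shaped polygon in CCW order; only the inner corner (1, 1) is reflex.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(reflex_vertices(l_shape))
```

An empty result means the polygon is already convex, which corresponds to the final "save poly" step.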
Turns out there is one more case I didn't anticipate: [Figure 5]
My algorithm will attempt to connect vertex 1 to 4, unless I add another check to make sure it can. So I propose stuffing everything "in the range" onto a priority queue using the priority scheme I mentioned above, then take the highest priority one, check if it can connect, if not, pop it off and use the next. I think this makes my algorithm O(r n log n) if I optimize it right.
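The priority-queue idea could look roughly like the sketch below. Everything here is illustrative: the tier constants, the pick_candidate function and the can_connect predicate are hypothetical names, and the actual visibility check is left to the caller.

```python
import heapq

# Priority tiers from the scheme above; a smaller number is tried first.
GOOD_REFLEX, REFLEX, PLAIN = 0, 1, 2

def pick_candidate(candidates, can_connect):
    """Return the index of the best connectable vertex, or None.

    candidates: iterable of (tier, distance, vertex_index) tuples for the
                vertices "in the range".
    can_connect: hypothetical caller-supplied predicate that returns True
                 when a diagonal to vertex_index is valid.
    """
    heap = list(candidates)
    heapq.heapify(heap)  # tuples order by tier first, then by distance
    while heap:
        _tier, _distance, vertex = heapq.heappop(heap)
        if can_connect(vertex):
            return vertex
    return None
```

With candidates [(PLAIN, 1.0, 12), (GOOD_REFLEX, 3.0, 5), (REFLEX, 2.0, 7)], vertex 5 is tried first even though 12 is closer; if the check rejects 5, vertex 7 is tried next.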
I've put together a website that loosely describes my findings. I tend to move stuff around, so get it while it's hot.
I believe the regular five pointed star (e.g. with alternating points having collinear segments) is the counterexample you seek.
Edit in response to comments
In light of my revised understanding, a revised answer: try an acute five pointed star (e.g. one with arms sufficiently narrow that only the three points comprising the arm opposite the reflex point you are working on are within the range considered "good reflex points"). At least working through it on paper it appears to give more than the optimal. However, a final reading of your code has me wondering: what do you mean by "closest" (i.e. closest to what)?
Note
Even though my answer was accepted, it isn't the counterexample we initially thought. As @Mark points out in the comments, it goes from four to five at exactly the same time as the optimal does.
Flip-flop, flip flop
On further reflection, I think I was right after all. The optimal bound of four can be retained in an acute star by simply ensuring that one pair of arms has collinear edges. But the algorithm finds five, even with the patch-up.
I get this:
removing dead ImageShack link
When the optimal is this:
removing dead ImageShack link
I think your algorithm cannot be optimal because it makes no use of any measure of optimality. You use other metrics like 'closest' vertices, and checking for 'necessary' diagonals.
To drive a wedge between yours and an optimal algorithm, we need to exploit that gap by looking for shapes with close vertices which would decompose badly. For example (ignore the lines, I found this on the intertubenet):
concave polygon which forms a G or U shape http://avocado-cad.wiki.sourceforge.net/space/showimage/2007-03-19_-_convexize.png
You have no protection against the centre-most point being connected across the concave 'gap', which is external to the polygon.
Your algorithm is also quite complex, and may be overdoing it - just like complex code, you may find bugs in it because complex code makes complex assumptions.
Consider a more extensive initial stage to break the shape into more, simpler shapes - like triangles - and then an iterative or genetic algorithm to recombine them. You will need a stage like this to combine any unnecessary divisions between your convex polys anyway, and by then you may have limited your possible decompositions to only sub-optimal solutions.
At a guess something like:
1. decompose into triangles
2. non-deterministically generate a number of recombinations
3. calculate a quality metric (number of polys)
4. select the best x% of the recombinations
5. partially decompose each using triangles, and generate a new set of recombinations
6. repeat from 4 until some measure of convergence is reached
"but vertex 5 should be given preference because 9 is within the range given by 4-5 and 6-5"
What would you do if 4-5 and 6-5 were even more convex so that 9 didn't lie within their range? Then by your rules the proper thing to do would be to connect 9 to 12 because 12 is the closest reflex vertex, which would be suboptimal.
Found it :( They're actually quite obvious.
*dead imageshack img*
A four leaf clover will not be optimal if Steiner points are allowed... the red vertices could have been connected.
*dead imageshack img*
It won't even be optimal without Steiner points... 5 could be connected to 14, removing the need for 3-14, 3-12 AND 5-12. This could have been two polygons better! Ouch!
