Relative risk estimation in spatstat

I am running into problems when computing the relative risk estimate (relrisk.ppp) of two point patterns: one with four marks in a rectangular region and the other with two marks in a circular region.
For the first pattern with four marks, I am able to get the relative risk, and the resulting object is a large imlist with 4 elements, one for each mark.
However, for the second pattern, it gives a list of 10 elements, of which the first, a matrix v, appears to be empty, containing only NA entries. I am racking my brain over what could possibly be wrong when the created point pattern objects seem to be constructed in the same way. Any help will be appreciated. Thanks.

For your first dataset, the result is a list of image objects (a list of four objects of class im). For your second dataset, the result of relrisk.ppp is a single image (object of class im). This is the default behaviour when there are only two possible types of points (two possible mark values). See help(relrisk.ppp).
In all cases, you should just be able to plot and print the resulting object. You don't need to examine the internal data of the image.
More explanation: when there are only two possible types of points, the default behaviour of relrisk.ppp is to treat them as case-control data, where the points belonging to the first type are treated as controls (e.g. non-infected people), and the points of the second type are treated as cases (e.g. infected people). The ratio of intensities (cases divided by controls) is estimated as an image.
If you don't want this to happen, set the argument casecontrol=FALSE and then relrisk.ppp will always return a list of images, with one image for each possible mark. Each image gives the spatially-varying probability of that type of point.
It's all explained in help(relrisk.ppp) or in the book.
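If it helps to see what is actually being computed, here is a rough sketch of that case-control intensity ratio in Python (numpy/scipy). This only illustrates the idea, not spatstat's implementation; the point data, grid and bandwidths are made up:

import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical case and control locations in the unit square
rng = np.random.default_rng(0)
controls = rng.random((200, 2))
cases = rng.random((100, 2))

# Kernel estimates of the two intensity surfaces on a 64 x 64 grid
xx, yy = np.mgrid[0:1:64j, 0:1:64j]
grid = np.vstack([xx.ravel(), yy.ravel()])
lam_cases = gaussian_kde(cases.T)(grid) * len(cases)
lam_controls = gaussian_kde(controls.T)(grid) * len(controls)

# The default two-type relrisk output is an estimate of this ratio image
risk = (lam_cases / lam_controls).reshape(xx.shape)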

How to identify joints in the profile of a shape?

I'm working on a system to automatically take 2D profiles of components and assemble them into 3D shapes.
Imagine being given these pieces:
You want to make this shape:
I'm highlighting one of the components to show how they fit together.
I'm open to any suggestions on how to go about doing this, but the current approach I'm attempting first finds joints that may fit together just by looking at the 2D profile.
How could I go about identifying the "tabs" from the polyline profile?
The same technique should also work on assemblies like this one:
see How to compare two shapes?
So you are basically trying to find the "same" sequences in polylines encoded in polar increment format (turn angle, line length), and then just check whether the relative positions of the matched sequences are the same in both shapes ...
Beware that the locks might have some gap between the joined shapes to ensure assembly is possible ... in some cases the gap might even be negative (an overlap), depending on material and function, so you need to compare the sequences with some margin ...
Also, I would divide each shape into its sides to speed up the process, as a lock is most likely not crossing sides ...
You may define the "code" for a tab. For example:
3,C,5,C,3 would mean: three units of length, then turn 90° counter-clockwise, then five units of length, then turn 90° counter-clockwise, then three units of length.
Of course, more identifiers than C can be used, for different angles and so on.
A tab in another piece that fits the tab of the first piece has the same (or a very similar) 3,C,5,C,3 code.
So, finding the same code in both pieces may indicate a fit. Check whether adjacent codes in both pieces also fit, and you're done.
Notice that pieces can be rotated. This doesn't change a tab's code, but it may change the order of adjacent codes.
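To make the encoding concrete, here is a minimal Python sketch of turning a closed polyline into (length, turn) codes, with a comparison that allows the gap/overlap margin mentioned above. The helper names and tolerances are my own:

import numpy as np

def polar_code(poly):
    # Encode a closed polyline as segment lengths plus signed turn angles.
    p = np.asarray(poly, dtype=float)
    edges = np.roll(p, -1, axis=0) - p                 # edge vectors, wrapping around
    lengths = np.hypot(edges[:, 0], edges[:, 1])
    headings = np.arctan2(edges[:, 1], edges[:, 0])
    turns = np.roll(headings, -1) - headings           # turn taken at the end of each edge
    turns = (turns + np.pi) % (2 * np.pi) - np.pi      # wrap into (-pi, pi]
    return lengths, turns

def codes_match(len_a, turn_a, len_b, turn_b, tol_len=0.5, tol_turn=0.1):
    # Mating tabs are traversed in opposite directions, so lengths agree in
    # reverse order and turn angles appear reversed with opposite sign.
    return (np.all(np.abs(len_a - len_b[::-1]) <= tol_len) and
            np.all(np.abs(turn_a + turn_b[::-1]) <= tol_turn))

lengths, turns = polar_code([(0, 0), (4, 0), (4, 4), (0, 4)])   # a 4 x 4 square

Sliding windows of these codes along the two profiles, with codes_match as the comparison, gives candidate joints; the relative positions of the matches can then be checked as described above.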

Recognizing license plate characters using template characters in Python

For a university project I have to recognize characters from a license plate. I have to do this using Python 3. I am not allowed to use OCR functions or functions that rely on deep learning or neural networks. I have reached the point where I am able to segment the characters from a license plate and transform them to a uniform format. A few examples of segmented characters are here.
The format of the segmented characters is very dependent on the input. However, I can easily convert this to uniform dimensions using OpenCV. Additionally, I have a set of template characters and numbers that I can use to predict what character / number it is.
I therefore need a metric to express the similarity between the segmented character and the reference image. In this way, I can say that the reference image with the highest similarity score matches the segmented character. I have tried the following ways to compute the similarity.
For these operations I have made sure that the reference characters and the segmented characters have the same dimensions.
A bitwise XOR operator.
Inverting the reference characters and comparing them pixel by pixel: if a pixel matches, increment the similarity score; if a pixel does not match, decrement it.
Hashing both the segmented character and the reference character using 'imagehash', then comparing the hashes to see which ones are most similar.
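For reference, the pixel comparison and the hash comparison look roughly like this (numpy and the imagehash package; file names are placeholders):

import numpy as np
import imagehash
from PIL import Image

seg = np.array(Image.open("segment.png").convert("1"))     # binarized segmented character
ref = np.array(Image.open("template_B.png").convert("1"))  # binarized reference template

# Pixel-by-pixel score: +1 for every match, -1 for every mismatch
score = np.sum(seg == ref) - np.sum(seg != ref)

# Perceptual-hash distance: smaller means more similar
dist = imagehash.phash(Image.open("segment.png")) - imagehash.phash(Image.open("template_B.png"))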
None of these methods succeeds in giving me an accurate prediction for all characters. Most characters are usually predicted correctly, but the program consistently confuses characters like 8-B, D-0, 7-Z and P-R.
Does anybody have an idea how to predict the segmented characters, i.e. how to define a better similarity score?
Edit: Unfortunately, cv2.matchTemplate and cv2.matchShapes are not allowed for this assignment...
The general procedure for comparing two images consists of extracting features from the two images and then comparing them. What you are actually doing in the first two methods is treating the value of every pixel as a feature. The similarity measure is therefore a distance computation in a space of very high dimension. These methods are, however, sensitive to noise, and that requires very large datasets in order to obtain acceptable results.
For this reason, usually one attempts to reduce the space dimensionality. I'm not familiar with the third method, but it seems to go in this direction.
A way to reduce the space dimensionality is to define some custom features that are meaningful for the problem you are facing.
A possibility for the character classification problem could be to define features that measure the response of the input image on strategic subshapes of the characters (an upper horizontal line, a lower one, a circle in the upper part of the image, a diagonal line, etc.).
You could define a minimal set of shapes that, combined together, can generate every character. Then you would retrieve one feature per shape by measuring the response of the original image on that particular shape (i.e., integrating the signal of the input image inside the shape). Finally, you would determine the class the image belongs to by taking the nearest reference point in this smaller feature space.
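A simplified, concrete version of this idea replaces the hand-crafted shapes with a regular grid of zones and measures the mean ink in each zone. This is only a sketch (numpy assumed; templates stands for a hypothetical dict mapping each character to a binary image of the same size as the segments):

import numpy as np

def zone_features(img, zones=(4, 4)):
    # Mean ink per cell of a zones[0] x zones[1] grid: one response per subregion.
    h, w = img.shape
    zh, zw = h // zones[0], w // zones[1]
    blocks = img[:zh * zones[0], :zw * zones[1]].reshape(zones[0], zh, zones[1], zw)
    return blocks.mean(axis=(1, 3)).ravel()

def classify(segment, templates):
    # Return the template character whose feature vector is nearest to the segment's.
    f = zone_features(segment)
    return min(templates, key=lambda c: np.linalg.norm(f - zone_features(templates[c])))

Swapping the grid cells for masks shaped like the strategic strokes described above (upper bar, upper loop, diagonal, and so on) should separate pairs like 8-B and D-0 better than raw pixel distance, since those pairs differ only in a few such regions.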

Dependency in multidimensional marked point patterns

As I understand it, if we currently have a multi-type point pattern, we can determine dependencies between points of various marks using functions like Jmulti, Gmulti, etc.
Now, if each point is associated with multiple marks (say, as a data frame where each column is a mark variable) then how do we find dependency between points of different mark variables? Note that in this case, a point could have two different marks but have the same spatial coordinate.
I think in this case, the number of points having the same coordinates but different marks is in some sense a measure of dependency between the point patterns of different mark variables, but I am not sure if there are methods to do this analysis in spatstat.
Thanks for your clarification.
This is discussed in Chapter 15 of the spatstat book.
However I think you may be confusing two different things: (1) a point pattern in which each point carries several different mark variables, so that the marks for the pattern are represented by a data frame with one row for each point and one column for each mark variable; and (2) a marked point pattern in which there may be several points that have the same spatial coordinate but different mark values.
An example of (1) is the finpines dataset in spatstat in which each tree location is marked by the tree's height and diameter. An example of (2) would be a spatial pattern of road accidents in which each vehicle is represented by a point, so that two-vehicle accidents are represented by two points at the same location, perhaps with different labels.
To deal with (1), you could use functions like Kmulti, Gmulti, Jmulti. These functions always compare two groups of points, identified by the arguments I and J, which can be logical vectors. You can define any two subsets of your point pattern as the subsets I and J. For example, in the finpines data you could define I <- with(marks(finpines), height > 10 * diameter), which would select all the trees whose height in metres is greater than 10 times the diameter in centimetres, and similarly make another, different rule for J.
Other ways of investigating dependence in marked point patterns include the mark correlation function markcorr, nearest neighbour correlation nncorr, the conditional moments Emark, Vmark and other tools described in Chapter 15.
Finally a caution that summary functions do not "determine" dependence; they are only measures of correlation.
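For intuition about what a function like Gmulti summarises, here is a rough Python sketch of the underlying quantity (scipy assumed; the data are made-up stand-ins for finpines, and edge correction is ignored):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
coords = rng.random((126, 2))          # stand-in point locations
height = rng.uniform(2, 20, 126)       # stand-in mark columns
diameter = rng.uniform(0.5, 3, 126)

I = height > 10 * diameter             # logical vector defining subset I
J = ~I                                 # a complementary subset J

# Distance from each point of I to the nearest point of J; Gmulti estimates
# the distribution (CDF) of these distances, with edge correction
d, _ = cKDTree(coords[J]).query(coords[I])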

Quasi-Monte-Carlo vs. variable dimensionality?

I've been looking through the Matlab documentation on using quasi-random sampling of N-dimensional unit cubes. This represents a problem with N stochastic parameters. Based on the fact that it is a unit cube, I presume that I need to use the inverse CDF of each parameter to map from the [0,1] domain to the value range of each parameter.
I would like to try this on a problem for which I now use Monte Carlo. Unfortunately, the problem I'm analyzing does not have a fixed number of dimensions. For each instantiation of the problem, I generate a variable number of widgets (say) using a Poisson distribution. Only after that do I randomly generate the parameters for each widget. That whole process yields one instance of the problem to be analyzed, so the number of parameters varies from one instance to the next.
Is this kind of problem still amenable to Quasi-Monte-Carlo?
What I used once was to take the highest possible dimension of the problem, d, generate a Sobol sequence in d dimensions, and use however many coordinates were necessary for a particular sampling. I would say it helped somewhat...
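In Python terms (SciPy's qmc module; the dimension bound and the exponential marginal are made-up stand-ins), that workaround looks like:

import numpy as np
from scipy.stats import qmc, expon

d_max = 3 * 50                    # assume at most 50 widgets with 3 parameters each
sobol = qmc.Sobol(d=d_max, scramble=True)
u = sobol.random(1024)            # 1024 points in the d_max-dimensional unit cube

# For an instance that happened to draw 7 widgets, use only the first 21 coordinates
k = 3 * 7
params = expon.ppf(u[:, :k])      # inverse-CDF map onto each parameter's distribution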
From talking to a much smarter colleague: we need to consider the various combinations of widget counts for each widget type. For example, if we have 2 of widget type #1, 4 of widget type #2, 1 of widget type #3, etc., that constitutes one combination, and QMC can be applied to that one combination. We are assuming that the number of widgets of type #i is independent of the number of widgets of type #j for i <> j, so the probability of each combination is just the product of p(2 widgets of type #1), p(4 widgets of type #2), p(1 widget of type #3), etc. The individual probabilities are easy to get from their Poisson distributions (or their flat distributions, or whatever distribution is being used). If there are N widget types, this is just a joint PMF in N-space. This probability is then used to weight the QMC result for that particular combination. Note that even when the exact combination is nailed down, QMC is still needed, because each widget is associated with 3 stochastic parameters.
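A sketch of that weighting scheme (SciPy assumed; the rates, the truncation of the counts at 5, and the simulate function are all hypothetical placeholders for the real model):

import numpy as np
from itertools import product
from scipy.stats import qmc, poisson, norm

rates = [2.0, 4.0, 1.0]                    # Poisson rate per widget type (made up)

def simulate(counts, params):
    # Placeholder for the model evaluated at each QMC point.
    return params.sum(axis=1)

estimate = 0.0
for counts in product(range(6), repeat=len(rates)):       # truncate each count at 5
    weight = np.prod([poisson.pmf(c, r) for c, r in zip(counts, rates)])
    d = 3 * sum(counts)                    # 3 stochastic parameters per widget
    if weight < 1e-6 or d == 0:            # the all-zero combination needs its own
        continue                           # deterministic value; skipped here
    u = qmc.Sobol(d=d, scramble=True).random(256)         # QMC for this fixed dimension
    x = norm.ppf(u)                        # example inverse-CDF map for the parameters
    estimate += weight * simulate(counts, x).mean()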

Interact with five million dots on a screen

We need to display 5 million dots (or very simple graphics objects) on a screen at the same time, and we want to interact with each of the dots (e.g., change their colors or drag/drop them).
To achieve this, we currently run a for-loop through the 5 million items (worst case O(N)) to access and change the state of a dot according to the mouse coordinates (x, y). Due to the huge number of objects, this approach causes a lot of overhead: we have to run the five-million-iteration loop whenever a user selects a dot. I have already tested this approach, but it was almost impossible to build an interactive tool with it. Is there any way to rapidly and efficiently access the dots without running the full loop over millions of items and causing this performance problem?
You really haven’t given many details.
These questions quickly come to mind:
Are the dots all the same size?
Are the dots uniformly distributed across the canvas?
If one dot is “selected”, is only that one dot recolored or moved?
Why are you violating good data visualization rules by overwhelming the user? :)
With this lack of specificity in mind...
...Divide and conquer:
Divide your dot array into multiple parts.
Divide your dots onto multiple overlaying canvases.
Divide your dot array into multiple parts
This will allow you to examine far fewer array elements when searching for the one you need.
Create a container object with 1980 elements representing the 1980 “x” coordinates on the screen.
var container = {};
for (var x = 1; x <= 1980; x++) {
  container[x] = [];   // one bucket of dot objects per x-coordinate
}
Each container element is an array of dot objects with their dot centers on that x-coordinate.
Every dot object has enough info to locate and redraw itself.
A dot at x-coordinate == 125 might be defined like this:
{x:125,y:100,r:2,color:"red",canvas:1};
When you want to add a dot, push a dot object to the appropriate "x" element of the container object.
// add a dot with x screen coordinate == 952
container[952].push({x:952,y:100,r:2,color:"red",canvas:1});
Dots can be drawn based on the dot objects:
function drawDot(dot, context) {
  context.beginPath();
  context.fillStyle = dot.color;
  context.arc(dot.x, dot.y, dot.r, 0, Math.PI * 2, false);  // full circle
  context.closePath();
  context.fill();
}
When the user selects a dot, you can find it quickly by pulling the few container elements around the X where the user clicked:
function getDotsNearX(x, radius) {
  // pull the dot arrays for "x" plus/minus "radius"
  var dotArrays = [];
  for (var i = x - radius; i <= x + radius; i++) {
    if (container[i]) {            // skip x-coordinates outside the container
      dotArrays.push(container[i]);
    }
  }
  return dotArrays;
}
Now you can process the dots in these highly targeted arrays instead of all 5 million array elements.
When the user moves a dot to a new position, just pull the dot object out of its current container element and push it into the appropriate new "x" container element.
Divide your dots onto multiple overlaying canvases
To improve drawing performance, you will want to distribute your dots across multiple canvases overlaid on each other.
The dot element includes a canvas property to identify on which canvas this dot will be drawn.
Have you already taken a look at the KineticJS framework? There is a very impressive stress test with exactly the same drag-and-drop functionality you're looking for. If you use KineticJS, you can access every single dot with the following event listener, and of course change its color, size, etc.:
stage.on('mousedown', function(evt) {
  var circle = evt.targetNode;   // the node (dot) that was clicked
});
