Generating clustered spatstat marks for a ppp object - spatstat

This question is very close to what has been asked here. The answer is great if we want to generate random marks for an already existing point pattern: we draw from a multivariate normal distribution and associate the draws with each point.
However, I need to generate marks for my own point pattern that follow the pattern of the marks in the lansing dataset that comes with spatstat. In other words, I have a point pattern without marks and I want to simulate marks with a definite pattern (for example, to illustrate the concept of segregation for my own data). How do I make such marks? I understand the number of points could differ between lansing and my data set, but I am allowed to reduce the window or create more points. Thanks!

Here is another version of segregation in four different rectangular
regions.
library(spatstat)
p <- c(.6,.2,.1,.1)
prob <- rbind(p,
              p[c(4,1:3)],
              p[c(3:4,1:2)],
              p[c(2:4,1)])
X <- unmark(spruces)
labels <- factor(LETTERS[1:4])
subwins <- quadrats(X, 2, 2)
Xsplit <- split(X, subwins)
rslt <- NULL
for(i in seq_along(Xsplit)){
    Y <- Xsplit[[i]]
    marks(Y) <- sample(labels, size = npoints(Y),
                       replace = TRUE, prob = prob[i,])
    rslt <- superimpose(rslt, Y)
}
plot(rslt, main = "", cols = 1:4)
plot(subwins, add = TRUE)

Segregation refers to the fact that one species predominates in a
specific part of the observation window. An extreme example would be to
segregate completely based on e.g. the x-coordinate. This would generate strips
of points of different types:
library(spatstat)
X <- lansing
Y <- cut(X, X$x, breaks = 6, labels = LETTERS[1:6])
plot(Y, cols = 1:6)
Without knowing more details about the desired type of segregation it is
hard to suggest something more useful.
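For a point pattern of your own, the same quadrant idea can be sketched in base R on raw coordinates. This is only an illustration under stated assumptions: the coordinates are simulated on the unit square, the names x, y, quad and m are hypothetical, and the probability table is the one from the first answer. If the coordinates came from a ppp object X (X$x, X$y), the resulting factor could then be attached with marks(X) <- m.

```r
set.seed(42)

# Hypothetical unmarked pattern: n points on the unit square.
n <- 400
x <- runif(n)
y <- runif(n)

# Same probability table as in the answer above: each row is one region,
# and a different label dominates in each region.
p <- c(.6, .2, .1, .1)
prob <- rbind(p,
              p[c(4, 1:3)],
              p[c(3:4, 1:2)],
              p[c(2:4, 1)])
labels <- LETTERS[1:4]

# Region index 1..4 from a 2 x 2 split of the window at 0.5.
quad <- 1 + (x > 0.5) + 2 * (y > 0.5)

# Draw marks region by region with region-specific probabilities.
m <- character(n)
for (i in 1:4) {
  idx <- which(quad == i)
  m[idx] <- sample(labels, size = length(idx), replace = TRUE, prob = prob[i, ])
}
m <- factor(m, levels = labels)

# Row-wise proportions show which label tends to dominate in each region.
tab <- table(quad, m)
print(round(prop.table(tab, 1), 2))
```

Making the rows of prob more extreme (for example .9, .05, .03, .02) should give a visibly stronger segregation.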

Related

Anova for multiple point patterns not working for Strauss model

I just started getting into spatial analysis and am fitting some models to my data. My main goal is to test for spatial regularity (whether there is inhibition between points).
I created my hyperframe for the data below. There are 6 point patterns (Areas), 4 in subhabitat 1, and 2 in subhabitat 2.
ALL_ppp <- list(a1ppp, a2ppp, a3ppp, a4ppp, a5ppp, a6ppp)
H <- hyperframe(Area = c("A1","A2","A3","A4","A5","A6"), Subhabitat = c("sbh1","sbh1","sbh1","sbh1","sbh2","sbh2"), Points = ALL_ppp )
I then created some models. This model fits a Strauss process with a different interaction radius for each area, with intensity depending on subhabitat type. It is very similar to the example in the book on page 700.
radii <- c(mean(area1$diameter), mean(area2$diameter),mean(area3$diameter),mean(area4$diameter),mean(area5$diameter),mean(area6$diameter))
Rad <- hyperframe(R=radii)
Str <- with(Rad, Strauss(R))
Int <- hyperframe(str=Str)
fittest8 <- mppm(Points ~ Subhabitat, H, interaction=Int, iformula = ~str:Area)
I would like to conduct a formal test for significance for the Strauss interaction parameters using anova.mppm to test for regularity. However, I am not sure if I am doing this properly, as I cannot seem to get this to work. I have tried:
fittest8 <- mppm(Points ~ Subhabitat, H, interaction=Int, iformula = ~str:Area)
fitex <- mppm(Points ~ Subhabitat, H)
anova.mppm(fittest8, fitex, test = "Chi")
I get the error "Error: Coefficient ‘str’ is missing from new.coef" and cannot find a way to resolve this. Any advice would be greatly appreciated.
Thanks!
First, please learn how to make a minimal reproducible example. This will make it easier for people to help you solve the problem, without having to guess what was in your data.
In your example, the columns named Area and Subhabitat in the hyperframe H are character vectors, but in your code, the call to mppm would require that they are factors. I assume you converted them to factors in order to be able to fit the model fittest8. (This is another reason to make a working example.)
You said that your example was similar to one on page 700 of the spatstat book which does work. In that case, a good strategy is to modify your example to make it as similar as possible to the example that works, because this will narrow down the possible cause.
A working example of the problem, similar to the one in the book, is:
Str <- hyperframe(str=with(simba, Strauss(mean(nndist(Points)))))
fit1 <- mppm(Points ~ group, simba, interaction=Str, iformula=~str:group)
fit0 <- mppm(Points ~ group, simba)
anova(fit0, fit1, test="Chi")
which yields the same error: Error: Coefficient ‘str’ is missing from new.coef
The simplest way to avoid this is to replace the interaction formula ~str:group by ~str + str:group:
fit1x <- mppm(Points ~ group, simba, interaction=Str,
              iformula = ~str + str:group)
anova(fit0, fit1x, test="Chi")
or in your example
fittest8X <- mppm(Points ~ Subhabitat, H, interaction=Int,
                  iformula=~str + str:Area)
anova(fitex, fittest8X, test="Chi")
Note that fittest8X and fittest8 are equivalent models but are expressed in a slightly different way.
The interaction formula and the trend formula are connected in a complicated way and the software is not always successful in disentangling them. If you get this kind of problem again, try different versions of the interaction formula.
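To see why the two interaction formulas describe the same model but name their coefficients differently, it may help to look at how R's standard formula machinery codes them. The sketch below uses only base R's model.matrix with made-up values (the vectors str and group here are hypothetical stand-ins, not the spatstat objects); whether mppm hits the error for exactly this reason is an assumption.

```r
# Toy stand-ins: a numeric covariate and a two-level factor.
group <- factor(c("a", "a", "b", "b"))
str   <- c(1, 2, 3, 4)

# Interaction only: one slope per group, and no column called "str".
mm1 <- model.matrix(~ str:group)

# Main effect plus interaction: a baseline "str" column plus a
# difference column for the second group.
mm2 <- model.matrix(~ str + str:group)

print(colnames(mm1))
print(colnames(mm2))

# Both design matrices have the same rank, and stacking them side by
# side does not increase it, so they span the same column space.
r1  <- qr(mm1)$rank
r2  <- qr(mm2)$rank
r12 <- qr(cbind(mm1, mm2))$rank
print(c(r1, r2, r12))
```

Under the second parameterisation there is a coefficient literally named str, which is the name the error message says is missing.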

Specifying random effect nested under an interaction of fixed effects

Probably an easy one.
I have data with fixed and random effects I'd like to fit a mixed effects model to:
set.seed(1)
df <- data.frame(group = c(rep("A",40),rep("B",40)),
                 treatment = rep(c(rep("T",20),rep("CT",20)),2),
                 class = c(rep("AT1",10),rep("ACT1",10),rep("AT2",10),rep("ACT2",10),rep("BT1",10),rep("BCT1",10),rep("BT2",10),rep("BCT2",10)),
                 value = rnorm(80),
                 stringsAsFactors = FALSE)
df$group <- factor(df$group, levels = c("A","B"))
df$treatment <- factor(df$treatment, levels = c("CT","T"))
The fixed effects are group and treatment and the random effect is class, which to my understanding is nested within the group and treatment combinations.
The model I want to fit is:
value ~ group*treatment
Where the effect of interest is the group:treatment interaction.
Of course I want to account for class as a random effect, but I can't seem to find what the syntax for that is. I tried:
(1|group*treatment/class) and (1|group:treatment/class) but both give an error.
Defining a group:treatment column in df:
df <- df %>% dplyr::mutate(group_treatment = paste0(group,"_",treatment))
And fitting:
fit <- lmer(value ~ group*treatment + (1|group_treatment/class), data = df)
Does seem to work, but I'm wondering if that's the only way or whether there's a more explicit syntax for such cases of random effect nesting.
Any idea?
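One check that may be useful before choosing a syntax is whether the class labels are already unique across the group and treatment combinations. The base R sketch below rebuilds only the grouping columns from the data above; the helper names cell, cells_per_class and classes_per_cell are hypothetical. If every class sits in exactly one cell, the nesting is already encoded in the labels, and in lme4 a plain (1 | class) term, or the explicit (1 | group:treatment:class), would then normally define the same grouping as the pasted group_treatment column. That reading of the design is an assumption, not something the question confirms.

```r
# Rebuild only the grouping columns from the question (values are not needed).
df <- data.frame(
  group     = rep(c("A", "B"), each = 40),
  treatment = rep(rep(c("T", "CT"), each = 20), times = 2),
  class     = rep(c("AT1", "ACT1", "AT2", "ACT2",
                    "BT1", "BCT1", "BT2", "BCT2"), each = 10)
)

# One label per group-by-treatment combination.
cell <- interaction(df$group, df$treatment, drop = TRUE)

# For each class, count how many distinct cells it appears in.
cells_per_class <- tapply(as.character(cell), df$class,
                          function(z) length(unique(z)))

# TRUE means each class label belongs to exactly one cell (implicit nesting).
labels_unique <- all(cells_per_class == 1)
print(labels_unique)

# For reference: how many distinct classes each cell contains.
classes_per_cell <- tapply(df$class, cell, function(z) length(unique(z)))
print(classes_per_cell)
```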

Random effects modeling using mgcv and using lmer. Basically identical fits but VERY different likelihoods and DF. Which to use for testing?

I am aware that there is a duality between random effects and smooth curve estimation. At this link, Simon Wood describes how to specify random effects using mgcv. Of particular note is the following passage:
For example if g is a factor then s(g,bs="re") produces a random coefficient for each level of g, with the random coefficients all modelled as i.i.d. normal.
After a quick simulation, I can see this is correct, and that the model fits are almost identical. However, the likelihoods and degrees of freedom are VERY different. Can anyone explain the difference? Which one should be used for testing?
library(mgcv)
library(lme4)
set.seed(1)
x <- rnorm(1000)
ID <- rep(1:200,each=5)
y <- x
for(i in 1:200) y[which(ID==i)] <- y[which(ID==i)] + rnorm(1)
y <- y + rnorm(1000)
ID <- as.factor(ID)
# gam (mgcv)
m <- gam(y ~ x + s(ID,bs="re"))
gam.vcomp(m)
coef(m)[1:2]
logLik(m)
# lmer
m2 <- lmer(y ~ x + (1|ID))
sqrt(VarCorr(m2)$ID[1])
summary(m2)$coef[,1]
logLik(m2)
mean( abs( fitted(m)-fitted(m2) ) )
Full disclosure: I encountered this problem because I want to fit a GAM that also includes random effects (repeated measures), but need to know if I can trust likelihood-based tests under those models.

Applying function to cartesian product of two unequal vectors

I am trying to avoid looping by using a documented apply function, but have not been able to find any examples to suit my purpose. I have two vectors, x which is (1 x p) and y which is (1 x q), and would like to feed the Cartesian product of their elements into a function. Here is a parsimonious example:
require(kernlab)
x = c("cranapple", "pear", "orange-aid", "mango", "kiwi",
"strawberry-kiwi", "fruit-punch", "pomegranate")
y = c("apple", "cranberry", "orange", "peach")
sk <- stringdot(type="boundrange", length = l, normalized=TRUE)
sk_map = function(x, y){return(sk(x, y))}
I realize I could use an apply function over one dimension and loop for the other, but I feel like there has to be a way to do it in one step... any ideas?
Is this what you had in mind:
sk <- stringdot(type="boundrange", length = 2, normalized=TRUE)
# Create data frame with every combination of x and y
dat = expand.grid(x=x,y=y)
# Apply sk by row
sk_map = apply(dat, 1, function(dat_row) sk(dat_row[1],dat_row[2]))
You can use the outer function for this if your function is vectorized, and you can use the Vectorize function to create a vectorized function if it is not.
outer(x,y,FUN=sk)
or
outer(x,y, FUN=Vectorize(sk))
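To illustrate the outer plus Vectorize pattern without needing kernlab, here is a small base R sketch. The function shared_letters is a hypothetical stand-in for sk: it only accepts one string per argument, which is exactly the situation where Vectorize is needed.

```r
x <- c("cranapple", "pear", "mango")
y <- c("apple", "orange")

# Hypothetical scalar-only similarity: number of distinct letters
# that the two strings have in common.
shared_letters <- function(a, b) {
  stopifnot(length(a) == 1, length(b) == 1)
  length(intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]]))
}

# outer() passes whole vectors to FUN, so wrap the scalar function
# with Vectorize() to have it applied pair by pair.
res <- outer(x, y, FUN = Vectorize(shared_letters))
dimnames(res) <- list(x, y)
print(res)
```

The result is a length(x) by length(y) matrix, with rows indexed by x and columns by y.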

Evenly distributing duplicated strings among several lists using R

So let's say I have a character vector of length 150000. Strings in the vector are not unique; in fact they're sorta normally distributed, with the most frequent string being present 28 times, another 24, and over 1000 present more than 5 times. I want to divide the vector into 28 smaller vectors, distributing the strings among the smaller vectors such that no string is present more than twice in each smaller vector, ideally only once (or not present). I need to preserve every string, so I can't just do !duplicated(). Ideally the vectors would be about the same size.
How the heck would I do this?
I'm thinking something like start adding to the first vector until you encounter the first non-unique string, skip it, continue filling skipping non-unique strings until you've reached 150000/28 = 5357, then proceed through the other vectors the same way, removing strings from the parent vector once they've been allocated to a smaller one? Any issues with this? Efficient ways of doing it without a nasty forest of for loops?
This seemed like a pretty interesting problem, although maybe it only seemed interesting because I misunderstood it. The solution I've got here creates as many sub-vectors as the frequency of the most frequent item, and then puts each string into f of those sub-vectors, where f is that string's frequency. This is possibly more complicated than what you were actually asking for.
library(plyr)
# I created a file with 10000 random strings and a roughly similar frequency
# distribution using python, and now I can't remember exactly what I did
strings <- read.csv("random_strings.txt", header=FALSE,
stringsAsFactors=FALSE)$V1
freq_table <- table(strings)
num_sub_vectors <- max(freq_table)
# Create a list of empty character vectors
split_list <- alply(1:num_sub_vectors, 1, function(x) return(character(0)))
for (s in names(freq_table)) {
    # Put each string into f of the sub-vectors, where f is the string's
    # frequency
    freq <- freq_table[[s]]
    # Choose f random indexes to put this string into
    sub_vecs <- sample(1:num_sub_vectors, freq)
    for (sub in sub_vecs) {
        split_list[[sub]] <- c(split_list[[sub]], s)
    }
}
To test that it's worked, pick a string, s or a frequency f, and check that s occurs in f of the sub vectors. Repeat until you're confident.
> head(freq_table[freq_table==15])
strings
ad ak bj cg cl cy
15 15 15 15 15 15
> sum(sapply(split_list, function(x) "ad" %in% x))
[1] 15
This meets your requirements (each string only once per subvector) fairly concisely by just tallying how often each string occurs and then partitioning based on "strings that appear i or more times":
inputs <- c("foo", "bar", "baz", "bar", "baz", "bar", "bar")
histo <- table(inputs)
lapply(1:max(histo), function(i) { names(histo)[histo>=i] })
This will of course yield partitions of wildly varying sizes, but you're not very clear on what your requirements in that area are.
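If roughly equal sizes do matter, one possible alternative (a sketch that is not taken from either answer) is to sort the vector so that duplicates sit next to each other and then deal the elements out round-robin into k bins, where k is the highest frequency. Because no string has more than k copies and its copies are consecutive, each copy lands in a different bin, and bin sizes differ by at most one. The names k, bin and parts are hypothetical.

```r
inputs <- c("foo", "bar", "baz", "bar", "baz", "bar", "bar")

# Number of bins: the frequency of the most common string.
k <- max(table(inputs))

# Sorting puts identical strings in consecutive positions.
sorted <- sort(inputs)

# Deal positions out like cards: 1, 2, ..., k, 1, 2, ...
bin <- (seq_along(sorted) - 1) %% k + 1

# One character vector per bin.
parts <- split(sorted, bin)
print(parts)
```

For the asker's data, k would be 28, which matches the requested number of smaller vectors.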
