I have four variables: a point pattern of species occurrences,
rivers, pond polygons and land cover image data. I would like to
build a dataset similar to the murchison dataset from these shape
layers, but I have not managed to do so.
I need to make a data frame from the polygon shape layers of rivers
and ponds and the land cover image, together with the point pattern
of species occurrences. I tried using a hyperframe, but I am unable
to use a distance function from the rivers or the ponds.
library(maptools) # provides readShapeSpatial()
library(raster) # provides raster()
rivers <- readShapeSpatial("river.shp")
ponds <- readShapeSpatial("pond.shp")
fro <- read.table("fro.txt", header=TRUE)
image <- raster("image.tif")
I would like to combine these four files into a single spatstat
object like the murchison data that comes with the spatstat package.
If I can put them in one frame, then ponds, land cover and rivers
are the covariates.
I have used the analyst function, but it returns errors saying that
they cannot be used as covariates, for example that x is a list and
cannot be used as a covariate, particularly for ponds and rivers
when I call the dist function.
Why do you need a hyperframe? You refer to the murchison data, and that is
not a hyperframe. It is simply a standard R list (with extended classes
listof, anylist and solist for better printing and plotting in
spatstat, but the actual data structure is just a plain list).
To recreate the murchison data:
library(spatstat)
P <- murchison$gold # Points
L <- murchison$faults # Lines
W <- murchison$greenstone # Windows
mur <- solist(points = P, lines = L, windows = W)
mur
#> List of spatial objects
#>
#> points:
#> Planar point pattern: 255 points
#> window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
#>
#> lines:
#> planar line segment pattern: 3252 line segments
#> window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
#>
#> windows:
#> window: polygonal boundary
#> enclosing rectangle: [352782.9, 681699.6] x [6706467, 7100804] metres
To use the data in a model they don’t have to be collected in a single list,
but it may be convenient. The following two models are identical:
(mod1 <- ppm(P ~ W))
#> Nonstationary Poisson process
#>
#> Log intensity: ~W
#>
#> Fitted trend coefficients:
#> (Intercept) WTRUE
#> -21.918688 3.980409
#>
#> Estimate S.E. CI95.lo CI95.hi Ztest Zval
#> (Intercept) -21.918688 0.1666667 -22.24535 -21.592028 *** -131.51213
#> WTRUE 3.980409 0.1798443 3.62792 4.332897 *** 22.13252
(mod2 <- ppm(points ~ windows, data = mur))
#> Nonstationary Poisson process
#>
#> Log intensity: ~windows
#>
#> Fitted trend coefficients:
#> (Intercept) windowsTRUE
#> -21.918688 3.980409
#>
#> Estimate S.E. CI95.lo CI95.hi Ztest Zval
#> (Intercept) -21.918688 0.1666667 -22.24535 -21.592028 *** -131.51213
#> windowsTRUE 3.980409 0.1798443 3.62792 4.332897 *** 22.13252
If you insist on a hyperframe, you should have a column for each measured
variable. Hyperframes are primarily used when you have several replications
of an experiment, so they are not of much use here. The function call is simply:
murhyp <- hyperframe(points = P, lines = L, windows = W)
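The question also asks about using distance from rivers or ponds as a covariate. A minimal sketch of that idea with the murchison data follows, assuming that distfun() is an acceptable way to build the distance covariate and that rescaling to kilometres is wanted for numerical stability; the variable names gold, greenstone and dfault are chosen here for illustration only.

```r
library(spatstat)

# Rescale all three layers from metres to kilometres
mur <- solapply(murchison, rescale, s = 1000, unitname = "km")

gold <- mur$gold                 # point pattern (response)
greenstone <- mur$greenstone     # window, used as a TRUE/FALSE covariate
dfault <- distfun(mur$faults)    # function(x, y): distance to the nearest fault

# A distance function can be used directly on the right-hand side
fit <- ppm(gold ~ greenstone + dfault)
coef(fit)
```

With your own data the line pattern of rivers would take the place of mur$faults, once it has been converted to a spatstat psp object.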
I have two point pattern (ppp) objects, p1 and p2, containing X and Y points respectively. I have fitted a ppm model (with the location coordinates as independent variables) to p1 and then used it to predict the "intensity" at each of the Y points in p2.
Now I want to get the probability of an event occurring at that point/zone in p2. How can I use the predicted intensities for this purpose?
Can I do this using spatstat?
Are there any other alternatives?
The intensity is the expected number of points per unit area. In small areas (such as pixels) you can just multiply the intensity by the pixel area to get the probability of presence of a point in the pixel.
fit <- ppm(p1, .......)
inten <- predict(fit)
pixarea <- with(inten, xstep * ystep)
prob <- inten * pixarea
This rule is accurate provided the prob values are smaller than about 0.4.
In a larger region W, the expected number of points is the integral of the intensity function over that region:
EW <- integral(inten, domain=W)
The result EW is a numeric value, the expected total number of points in W. To get the probability of at least one point,
P <- 1- exp(-EW)
You can also compute prediction intervals for the number of points, using predict.ppm with argument interval="prediction".
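The two rules above can be sketched end to end on a built-in dataset. This is only an illustration under stated assumptions: japanesepines stands in for p1, the trend ~ x + y is an arbitrary choice, and the small square W is a hypothetical region of interest.

```r
library(spatstat)

X <- japanesepines                    # stand-in for p1 (window is the unit square)
fit <- ppm(X ~ x + y)                 # arbitrary log-linear trend in the coordinates
inten <- predict(fit)                 # pixel image of the fitted intensity

# Small-area rule: intensity times pixel area approximates the
# probability of a point falling in each pixel
pixarea <- inten$xstep * inten$ystep
prob <- inten * pixarea

# Larger region: expected count, then probability of at least one point
W <- owin(c(0, 0.1), c(0, 0.1))       # hypothetical region of interest
EW <- integral(inten, domain = W)
pAny <- 1 - exp(-EW)
```

Here prob is again a pixel image, while EW and pAny are single numbers for the region W.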
Your question, objective and current method are not very clear to me. It
would help if you could provide code and graphics that explain more
clearly what you have done and what you are trying to obtain. If you
cannot share your data, you can use e.g. the built-in dataset chorley as an
example (or simply simulate artificial data):
library(spatstat)
plot(chorley, cols = c(rgb(0,0,0,1), rgb(.8,0,0,.2)))
X <- split(chorley)
X1 <- X$lung
X2 <- X$larynx
mod <- ppm(X1 ~ polynom(x, y, 2))
inten <- predict(mod)
summary(inten)
#> real-valued pixel image
#> 128 x 128 pixel array (ny, nx)
#> enclosing rectangle: [343.45, 366.45] x [410.41, 431.79] km
#> dimensions of each pixel: 0.18 x 0.1670312 km
#> Image is defined on a subset of the rectangular grid
#> Subset area = 315.291058349571 square km
#> Subset area fraction = 0.641
#> Pixel values (inside window):
#> range = [0.002812544, 11.11172]
#> integral = 978.5737
#> mean = 3.103715
plot(inten)
Predicted intensities at the 58 locations in X2
intenX2 <- predict.ppm(mod, locations = X2)
summary(intenX2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.1372 4.0025 6.0544 6.1012 8.6977 11.0375
These predicted intensities intenX2[i] say that in a small neighbourhood
around each point X2[i] the estimated number of points from X1 is Poisson
distributed with mean intenX2[i] times the area of the small neighbourhood.
So in fact you have estimated a model where in any small area you have a
probability distribution for any number of points happening in that area. If
you want the distribution in a bigger region you just have to integrate the
intensity over that region.
To get a better answer you have to provide more details about your problem.
Created on 2018-12-12 by the reprex package (v0.2.1)
This question is very close to what has been asked here. The answer is great if we want to generate random marks for an already existing point pattern: we draw from a multivariate normal distribution and associate a value with each point.
However, I need to generate marks for my own point pattern that follow the marks given in the lansing dataset that comes with spatstat. In other words, I have a point pattern without marks and I want to simulate marks with a definite pattern (for example, to illustrate the concept of segregation for my own data). How do I make such marks? I understand the number of points could be different between lansing and my data set, but I am allowed to reduce the window or create more points. Thanks!
Here is another version of segregation in four different rectangular
regions.
library(spatstat)
p <- c(.6,.2,.1,.1)
prob <- rbind(p,
p[c(4,1:3)],
p[c(3:4,1:2)],
p[c(2:4,1)])
X <- unmark(spruces)
labels <- factor(LETTERS[1:4])
subwins <- quadrats(X, 2, 2)
Xsplit <- split(X, subwins)
rslt <- NULL
for(i in seq_along(Xsplit)){
Y <- Xsplit[[i]]
marks(Y) <- sample(labels, size = npoints(Y),
replace = TRUE, prob = prob[i,])
rslt <- superimpose(rslt, Y)
}
plot(rslt, main = "", cols = 1:4)
plot(subwins, add = TRUE)
Segregation refers to the fact that one species predominates in a
specific part of the observation window. An extreme example would be to
segregate completely based on e.g. the x-coordinate. This would generate strips
of points of different types:
library(spatstat)
X <- lansing
Y <- cut(X, X$x, breaks = 6, labels = LETTERS[1:6])
plot(Y, cols = 1:6)
Without knowing more details about the desired type of segregation it is
hard to suggest something more useful.
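If the aim is literally to borrow the spatial arrangement of types from lansing, one possible approach (not taken from the answers above, and only a sketch) is to give each of your points the mark of its nearest lansing tree. The pattern X below is simulated as a stand-in for your own data and is assumed to live in the same window as lansing; a real pattern would first have to be shifted and rescaled into that window.

```r
library(spatstat)
set.seed(42)

# Stand-in for your own unmarked pattern, placed in the lansing window
X <- runifpoint(200, win = Window(lansing))

# Index of the nearest lansing tree for every point of X
nearest <- nncross(X, lansing, what = "which")

# Copy the species of that nearest tree
marks(X) <- marks(lansing)[nearest]

plot(X, cols = 1:6, main = "")
```

Because nearby points tend to share the same nearest tree, X inherits roughly the same regions of predominance as lansing.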
Question 1:
I am trying to work with the plot() function on an AggExResult object and the clusters in the documentation (https://cran.r-project.org/web/packages/apcluster/apcluster.pdf) work as expected.
In my own data, I have an additional column in the input which provides a pre-defined “target” for classification purposes, and I am wondering if there is a way to have the dendrogram labels highlighted by color (e.g. red=class 0, blue=class 1), with the class of the targets being factors (or characters). I am ultimately trying to visually display how many clusters contain "pure" vs. "mixed" classes. Here is some slightly modified code from the online documentation to show roughly what my input data looks like:
cl1Targ <- matrix(nrow=50,ncol=1)
for(c1t in 1:nrow(cl1Targ)){ cl1Targ[c1t] <- as.factor(0) }
cl2Targ <- matrix(nrow=50,ncol=1)
for(c2t in 1:nrow(cl2Targ)){ cl2Targ[c2t] <- as.factor(1) }
## create two Gaussian clouds
#cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06))
#cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05))
cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06),cl1Targ)
cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05),cl2Targ)
x <- rbind(cl1,cl2)
colnames(x) <- c('Column 1','Column 2','Class_ID')
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)
## run affinity propagation
apres <- apcluster(sim, q=0.7)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)
## plot dendrogram
plot(aggres1, main='aggres1 w/ target') #
How would I color the dendrogram by the target defined in the input?
Question 2:
When I show() the example data’s APResult, I see the following:
show(apres)
APResult object
Number of samples = 100
Number of iterations = 165
Input preference = -0.01281384
Sum of similarities = -0.1222309
Sum of preferences = -0.1409522
Net similarity = -0.2631832
Number of clusters = 11
Exemplars:
8 17 24 37 43 52 58 68 92 95 99
Clusters:
Cluster 1, exemplar 8:
7 8 9 25 31 36 39 42 47 48
Cluster 2, exemplar 17:
6 11 13 15 17 18 19 23 32 35
Cluster 3, exemplar 24:
2 5 10 24 45
When I use my own data, I see the following (the row names are the drugs being clustered by gene expression mean fold change values):
show(apclr2q05_mean)
APResult object
Number of samples = 1045
Number of iterations = 429
Input preference = -390.0822
Sum of similarities = -89326.99
Sum of preferences = -83477.58
Net similarity = -172804.6
Number of clusters = 214
Exemplars:
amantadine_58mg6h_fc amiodarone_147mg3d_fc clarithromycin_56mg1d_fc fluconazole_394mg5d_fc ketoconazole_114mg5d_fc ketoconazole_2274mg1d_fc
pantoprazole_1100mg1d_fc pantoprazole_1100mg3d_fc quetiapine_500mg5d_fc roxithromycin_312mg5d_fc torsemide_3mg3d_fc acetazolamide_250mg3d_fc
Clusters:
Cluster 1, exemplar amantadine_58mg6h_fc:
amantadine_58mg6h_fc promazine_100mg1d_fc cyproteroneAcetate_2500mg6h_fc danazol_2g5d_fc ivermectin_7500ug1d_fc letrozole_250mg6h_fc
mefenamicAcid_93mg3d_fc olanzapine_23mg1d_fc secobarbital_20mg6h_fc zaleplon_100mg3d_fc
Cluster 2, exemplar amiodarone_147mg3d_fc:
amiodarone_147mg3d_fc amiodarone_147mg5d_fc aspirin_375mg5d_fc betaNapthoflavone_80mg5d_fc clofibrate_130mg3d_fc finasteride_800mg5d_fc
Cluster 3, exemplar clarithromycin_56mg1d_fc:
ciprofloxacin_72mg5d_fc ciprofloxacin_450mg6h_fc clarithromycin_56mg1d_fc clarithromycin_56mg3d_fc clarithromycin_56mg5d_fc
Cluster 4, exemplar fluconazole_394mg5d_fc:
fluconazole_394mg5d_fc
This is also what I would expect in terms of content, but I would like to format it for reporting purposes. I have tried to export it using dput(), but I get a lot of unnecessary extra information in the output file. I am wondering how I might export the same type of information as above, along with the object name and the target classifier mentioned above, into a table that would look like the following (with the name of the object added to the output):
Name of object = apclr2q05_mean
Number of samples = 1045
Number of iterations = 429
Input preference = -390.0822
Sum of similarities = -89326.99
Sum of preferences = -83477.58
Net similarity = -172804.6
Number of clusters = 214
Exemplars: Target
amantadine_58mg6h_fc 1
amiodarone_147mg3d_fc 1
clarithromycin_56mg1d_fc 1
fluconazole_394mg5d_fc 0
ketoconazole_114mg5d_fc 0
ketoconazole_2274mg1d_fc 0
Clusters:
Cluster 1, exemplar amantadine_58mg6h_fc:
Drug Target
amantadine_58mg6h_fc 1
promazine_100mg1d_fc 1
cyproteroneAcetate_2500mg6h_fc 1
danazol_2g5d_fc 0
ivermectin_7500ug1d_fc 0
Cluster 2, exemplar amiodarone_147mg3d_fc:
Drug Target
Etc…
A big THANK YOU to Ulrich for his quick response to these questions by email and we wanted to share our discussion with the community so I will let him respond with his solution so that he gets the credit he deserves :-)
As an update, I tried to implement the answer to Question 1 and the sample code works as expected, but I am having trouble getting this to work on my data. The input data has two parts. The first is a matrix with the numeric measurement data including column and row labels:
> fci[1:3,1:3]
M30596_PROBE1 AI231309_PROBE1 NM_012489_PROBE1
amantadine_58mg1d_fc 0.05630744 -0.10441722 0.41873201
amantadine_58mg6h_fc -0.42780274 -0.26222322 0.02703001
amantadine_220mg1d_fc 0.35260779 -0.09902214 0.04067055
The second is the "target" values in factor format, each of which corresponds to the same row in fci above:
> targs[1:3]
amantadine_58mg1d_fc amantadine_58mg6h_fc amantadine_220mg1d_fc
0 0 0
Levels: 0 1
From here, the tree was built as below:
# build the AggExResult:
aglomr1 <- aggExCluster(negDistMat(r=2), fci)
# convert the data
tree <- as.dendrogram(aglomr1)
# assign the color codes
colorCodes <- c("0"="red", "1"="green")
names(targs) <- rownames(fci)
xColor <- colorCodes[as.character(targs)]
names(xColor) <- rownames(fci)
# plot the colored tree
labels_colors(tree) <- xColor[order.dendrogram(tree)]
plot(tree, main="Colored Tree")
The tree was generated but the leaves were not colored. Doing some digging:
> head(xColor)
0 0 0 0 0 0
"red" "red" "red" "red" "red" "red"
That part seems to work as expected in terms of the targets having the correct colors assigned, but the rownames are not in xColor, and the line labels_colors(tree) <- xColor[order.dendrogram(tree)] does not return similar labels, but rather what appear to be row numbers, or NAs:
> head(order.dendrogram(tree))
[1] "295" "929" "488" "493" "233" "235"
> head(labels_colors(tree))
295 929 488 493 233 235
> head(xColor[order.dendrogram(tree)])
<NA> <NA> <NA> <NA> <NA> <NA>
NA NA NA NA NA NA
How would I get the line labels_colors(tree) <- xColor[order.dendrogram(tree)] to behave in the same way as in the example provided? Specifically, what I am trying to show is the leaf labels, such as amantadine_58mg1d_fc, highlighted in the color that corresponds to the target (0/1).
Here is my answer to your Question 1: the plot() method for 'AggExResult' objects internally uses the plot.dendrogram() method. Since this method does not allow for coloring leaves of dendrograms, this will not work. However, there is the 'dendextend' package which offers such a functionality. (BTW, I found that solution in another thread: Label and color leaf dendrogram in r) Since 'apcluster' offers some casts to 'hclust' and 'dendrogram' objects, this package's functionality can be used more or less directly.
So, here is some sample code:
library(apcluster)
## create two Gaussian clouds along with class labels 0/1
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- cbind(Columns=data.frame(rbind(cl1, cl2)),
"Class_ID"=factor(as.character(c(rep(0, 50), rep(1, 50)))))
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x[, 1:2], r=2)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)
## load 'dendextend' package
## install.packages("dendextend") ## if not yet installed
library(dendextend)
## convert object
tree <- as.dendrogram(aggres1)
## assign color codes
colorCodes <- c("0"="red", "1"="green")
xColor <- colorCodes[x$Class_ID]
names(xColor) <- rownames(x)
## plot color-labeled tree
labels_colors(tree) <- xColor[order.dendrogram(tree)]
plot(tree)
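The NA values reported in the update arise when the colour vector is indexed by something that does not match its names. A package-free sketch that looks colours up by leaf label is shown below. It is an assumption-laden illustration, not the apcluster method: hclust() is swapped in for aggExCluster(), the data and the names sample1, sample2, ... are made up, and it relies on the fact that plot.dendrogram(), while it has no argument for per-leaf colours, does honour a nodePar attribute set on individual nodes.

```r
set.seed(1)

# Made-up data: two groups of 10 samples with row names and 0/1 targets
x <- rbind(matrix(rnorm(20, 0.2, 0.05), ncol = 2),
           matrix(rnorm(20, 0.7, 0.05), ncol = 2))
rownames(x) <- paste0("sample", seq_len(nrow(x)))
targs <- factor(rep(c("0", "1"), each = 10))

# Colour lookup table, named by sample so it can be indexed by leaf label
colorCodes <- c("0" = "red", "1" = "green")
leafColor <- setNames(colorCodes[as.character(targs)], rownames(x))

tree <- as.dendrogram(hclust(dist(x)))

# Attach a label colour to every leaf, looked up by the leaf's own label
colorLeaf <- function(node) {
  if (is.leaf(node)) {
    lab <- attr(node, "label")
    attr(node, "nodePar") <- list(lab.col = unname(leafColor[lab]), pch = NA)
  }
  node
}
tree <- dendrapply(tree, colorLeaf)

plot(tree, main = "Leaves coloured by class")
```

Indexing by label rather than by position avoids any dependence on whether the leaf order is stored as numbers or as names.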
Here is my answer to your Question 2: Sorry, no such functionality is implemented in the 'apcluster' package. And since this is quite a special request, I am reluctant to include it in the package (let alone the fact that show() methods cannot have additional arguments). So, alternatively, I want to provide you with a custom function that allows for labeling/grouping exemplars and samples:
library(apcluster)
## create two Gaussian clouds along with class labels 0/1
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- cbind(Columns=data.frame(rbind(cl1, cl2)),
"Class_ID"=factor(as.character(c(rep(0, 50), rep(1, 50)))))
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x[, 1:2], r=2)
## special show() function with labeled data
show.ExClust.labeled <- function(object, labels=NULL)
{
    if (!is(object, "ExClust"))
        stop("'object' is not of class 'ExClust'")
    if (is.null(labels))
    {
        show(object)
        return(invisible(NULL))
    }
    cat("\n", class(object), " object\n", sep="")
    if (!is.finite(object@l) || !is.finite(object@it))
        stop("object is not result of an affinity propagation run; ",
             "it is pointless to create 'APResult' objects yourself.")
    cat("\nNumber of samples = ", object@l, "\n")
    if (length(object@sel) > 0)
    {
        cat("Number of sel samples = ", length(object@sel),
            paste(" (", round(100*length(object@sel)/object@l,1),
                  "%)\n", sep=""))
        cat("Number of sweeps = ", object@sweeps, "\n")
    }
    cat("Number of iterations = ", object@it, "\n")
    cat("Input preference = ", object@p, "\n")
    cat("Sum of similarities = ", object@dpsim, "\n")
    cat("Sum of preferences = ", object@expref, "\n")
    cat("Net similarity = ", object@netsim, "\n")
    cat("Number of clusters = ", length(object@exemplars), "\n\n")
    if (length(object@exemplars) > 0)
    {
        if (length(names(object@exemplars)) == 0)
        {
            cat("Exemplars:\n")
            df <- data.frame("Sample"=object@exemplars,
                             Label=labels[object@exemplars])
            print(df, row.names=FALSE)
            for (i in 1:length(object@exemplars))
            {
                cat("\nCluster ", i, ", exemplar ",
                    object@exemplars[i], ":\n", sep="")
                df <- data.frame(Sample=object@clusters[[i]],
                                 Label=labels[object@clusters[[i]]])
                print(df, row.names=FALSE)
            }
        }
        else
        {
            df <- data.frame("Exemplars"=names(object@exemplars),
                             Label=labels[names(object@exemplars)])
            print(df, row.names=FALSE)
            for (i in 1:length(object@exemplars))
            {
                cat("\nCluster ", i, ", exemplar ",
                    names(object@exemplars)[i], ":\n", sep="")
                df <- data.frame(Sample=names(object@clusters[[i]]),
                                 Label=labels[names(object@clusters[[i]])])
                print(df, row.names=FALSE)
            }
        }
    }
    else
    {
        cat("No clusters identified.\n")
    }
}
## create label vector (with proper names)
label <- x$Class_ID
names(label) <- rownames(x)
## run apcluster()
apres <- apcluster(sim, q=0.3)
## show with labels
show.ExClust.labeled(apres, label)
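For the reporting part of the question (writing the printed text to a file, with the object's name on top), a small base R sketch is given below. The helper writeReport() and the data frame clusters are hypothetical names invented for this illustration; the idea is simply that capture.output() collects whatever a printing call writes to the console, and deparse(substitute()) recovers the name of the object that was passed in.

```r
# Hypothetical helper: write the printed form of an object to a text file,
# preceded by a line giving the object's name
writeReport <- function(object, file) {
  objName <- deparse(substitute(object))    # name of the object as passed in
  body <- capture.output(print(object))     # the text print() would show
  writeLines(c(paste("Name of object =", objName), body), file)
  invisible(file)
}

# Made-up stand-in data, only to show the mechanics
clusters <- data.frame(Drug = c("drugA", "drugB"), Target = c(1, 0))
outFile <- tempfile(fileext = ".txt")
writeReport(clusters, outFile)
readLines(outFile)
```

The same pattern should carry over to a labelled display function by wrapping that call in capture.output() instead of print(object).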
Imagine I've got 100 numeric matrices with 5 columns each.
I keep the names of those matrices in a vector or list:
Mat <- c("GON1EU", "GON2EU", "GON3EU", "NEW4", ....)
I also have a vector of coefficients "coef",
coef <- c(1, 2, 2, 1, ...)
And I want to calculate a resulting vector in this way:
coef[1]*GON1EU[,1]+coef[2]*GON2EU[,1]+coef[3]*GON3EU[,1]+coef[4]*NEW4[,1]+.....
How can I do it in a compact way, using the vector of names?
Something like:
coef*(Object(Mat)[,1])
I think the key is how to call an object from a string holding its name and then use vector notation. But I don't know how.
get() allows you to refer to an object by a string. It will only get you so far though; you'll still need to construct the repeated call to get() on the list matrices etc. However, I wonder if an alternative approach might be feasible? Instead of storing the matrices separately in the workspace, why not store the matrices in a list?
Then you can use sapply() on the list to extract the first column of each matrix in the list. The sapply() step returns a matrix, which we multiply by the coefficient vector. The column sums of that matrix are the values you appear to want from your above description. At least I'm assuming that coef[1]*GON1EU[,1] is a vector of length(GON1EU[,1]), etc.
Here's some code implementing this idea.
vec <- 1:4 ## don't use coef - there is a function with that name
mat <- matrix(1:12, ncol = 3)
myList <- list(mat1 = mat, mat2 = mat, mat3 = mat, mat4 = mat)
colSums(sapply(myList, function(x) x[, 1]) * vec)
Here is some output:
> sapply(myList, function(x) x[, 1]) * vec
mat1 mat2 mat3 mat4
[1,] 1 1 1 1
[2,] 4 4 4 4
[3,] 9 9 9 9
[4,] 16 16 16 16
> colSums(sapply(myList, function(x) x[, 1]) * vec)
mat1 mat2 mat3 mat4
30 30 30 30
The above example suggests you create, or read in, your 100 matrices as components of a list from the very beginning of your analysis. This will require you to alter the code you used to generate the 100 matrices. Seeing as you already have your 100 matrices in your workspace, to get myList from these matrices we can use the vector of names you already have and use a loop:
Mat <- c("mat","mat","mat","mat")
## loop
myList <- vector("list", length(Mat)) ## pre-allocate the list
for(i in seq_along(Mat)) {
myList[[i]] <- get(Mat[i])
}
## or as lapply call - Kudos to Ritchie Cotton for pointing that one out!
## myList <- lapply(Mat, get)
myList <- setNames(myList, paste(Mat, 1:4, sep = ""))
## You only need:
myList <- setNames(myList, Mat)
## as you have the proper names of the matrices
I used "mat" repeatedly in Mat as that is the name of my matrix above. You would use your own Mat. If vec contains what you have in coef, and you create myList using the for loop above, then all you should need to do is:
colSums(sapply(myList, function(x) x[, 1]) * vec)
To get the answer you wanted.
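One thing to check against your own intent: in sapply(...) * vec, R recycles vec down each column, so row i of the result is multiplied by vec[i]. If instead each matrix is meant to get its own coefficient, as in the formula coef[1]*GON1EU[,1]+coef[2]*GON2EU[,1]+... from the question, a matrix product may be closer to what is wanted. The sketch below uses made-up matrices m1 to m4 so that the two readings give visibly different numbers.

```r
# Made-up matrices standing in for GON1EU, GON2EU, ...
m1 <- matrix(1:12, ncol = 3)
m2 <- m1 * 10
m3 <- m1 * 100
m4 <- m1 * 1000

Mat <- c("m1", "m2", "m3", "m4")   # vector of object names
vec <- c(1, 2, 2, 1)               # one coefficient per matrix

# Fetch the matrices by name into a named list
myList <- mget(Mat, envir = environment())

# One column per matrix, holding that matrix's first column
firstCols <- sapply(myList, function(m) m[, 1])

# Weighted sum across matrices: vec[1]*m1[,1] + vec[2]*m2[,1] + ...
result <- drop(firstCols %*% vec)
result
```

The result has one value per row of the matrices, rather than one value per matrix.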
See help(get) and that's that.
If you'd given us a reproducible example I'd have said a bit more. For example:
> a=1;b=2;c=3;d=4
> M=letters[1:4]
> M
[1] "a" "b" "c" "d"
> sum = 0 ; for(i in 1:4){sum = sum + i * get(M[i])}
> sum
[1] 30
Put whatever you need in the loop, or use apply over the vector M and get the object:
> sum(unlist(lapply(M,function(n){get(n)^2})))
[1] 30