Dissolve output of rasterToPolygons - geospatial

When using rasterToPolygons from the raster package, each cell that meets the fun criterion becomes its own polygon:
library(raster)
r <- raster(nrow=18, ncol=36)
r[] <- runif(ncell(r)) * 10
r[r>8] <- NA
pol <- rasterToPolygons(r, fun=function(x){x>6})
plot(pol)
However, I want every group of polygons that share a side or a corner to become one larger polygon, decreasing the total number of polygons. Is there any way to accomplish this?

OLD ANSWER:
You can use the argument dissolve=TRUE
library(raster)
r <- raster(nrow=18, ncol=36)
r[] <- sample(2, ncell(r), replace=TRUE)
pol <- rasterToPolygons(r, dissolve=TRUE)
plot(pol)
NEW ANSWER:
If you do not care about the values, you can do something like this
Your example data
library(raster)
r <- raster(nrow=18, ncol=36)
r[] <- runif(ncell(r)) * 10
r[r>8] <- NA
Set all cells you want to keep to a single value and all others to NA:
x <- reclassify(r, rbind(c(-Inf, 6, NA), c(6, Inf, 1)))
pol <- rasterToPolygons(x, dissolve=TRUE)
Note that pol now has only 1 (multi-)polygon. If you want to separate the non-connected parts, you can do
pols <- disaggregate(pol)
pols
#class : SpatialPolygonsDataFrame
#features : 80
Note that diagonally adjacent polygons are kept separate, as they cannot form a valid single polygon (it would be self-intersecting).

This can be accomplished by using the poly2nb function from the spdep package to define the neighbors of each polygon, using the function created below to build a vector of region assignments, using spCbind from the maptools package to bind the regions to pol, and finally dissolving over regions with the unionSpatialPolygons function from maptools. The basic logic of the created function is: if at least one of a polygon's neighbors has already been assigned to a group, assign the polygon and its neighbors to that group; otherwise assign the polygon and its neighbors to a new group.
library(raster)
library(spdep)
library(maptools)
r <- raster(nrow=18, ncol=36)
r[] <- runif(ncell(r)) * 10
r[r>8] <- NA
pol <- rasterToPolygons(r, fun=function(x){x>6}, dissolve=TRUE)
plot(pol)
nb <- poly2nb(pol)
create_regions <- function(data) {
  group <- rep(NA, length(data))
  group_val <- 0
  while (NA %in% group) {
    # take the first polygon not yet assigned to a group
    index <- min(which(is.na(group)))
    nb <- unlist(data[index])
    nb_value <- group[nb]
    is_na <- is.na(nb_value)
    if (sum(!is_na) != 0) {
      # a neighbor already has a group: join it
      prev_group <- nb_value[!is_na][1]
      group[index] <- prev_group
      group[nb[is_na]] <- prev_group
    } else {
      # no neighbor assigned yet: start a new group
      group_val <- group_val + 1
      group[index] <- group_val
      group[nb] <- group_val
    }
  }
  group
}
region <- create_regions(nb)
pol_rgn <- spCbind(pol, region)
pol2 <- unionSpatialPolygons(pol_rgn, region)
plot(pol2)
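To sanity-check the result, you can compare feature counts before and after grouping (a quick check using the objects above, not part of the original answer):
length(pol)   # number of features before grouping
length(pol2)  # number of merged regions after unionSpatialPolygons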

Related

how to make an optimal combinatorial selection in R

The problem I'm trying to solve is basically the same as the one in this post:
https://stats.stackexchange.com/questions/339935/python-library-for-combinatorial-optimization
And my current implementation uses indeed a genetic algorithm based optimizer.
However, I would like to solve it as a binary linear programming problem (at least try, even though it's 'NP-hard', apparently).
My question is how to formulate the LP in the best way, because I am not sure I am doing it right.
The following is a simplified version of what I'm dealing with, which however shows exactly where the problem lies.
We make m*n (in this case 6) objects by a combinatorial process taking m (3) objects of type 'R1' (say {A,B,C}) and n (2) objects of type 'R2' (say {X,Y}).
The 6 objects {AX,AY,BX,BY,CX,CY} are evaluated and each gets a score D, in this case {0.8,0.7,0.5,0.9,0.4,0.0}, in this order.
CL <- cbind(expand.grid(R2=LETTERS[24:25],R1=LETTERS[1:3],stringsAsFactors = FALSE),D=c(0.8,0.7,0.5,0.9,0.4,0.0))
Now we want to select 2 distinct R1's and 1 R2 such that the sum of D is maximal.
In this example, the answer is R1 = {A,B}, R2 = {Y}.
However, one would not reach that conclusion by taking, for instance, the 2 R1's and the R2 with the highest average D.
That would work for R1 here, but not for R2:
aggregate(D~R1,CL,mean)
# R1 D
#1 A 0.75
#2 B 0.70
#3 C 0.20
aggregate(D~R2,CL,mean)
# R2 D
#1 X 0.5666667
#2 Y 0.5333333
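For an instance this small, the optimum can be verified by brute force, enumerating every choice of 2 distinct R1's and 1 R2 (a quick check for illustration; the helper names are arbitrary):
r1_pairs <- combn(unique(CL$R1), 2, simplify = FALSE)
r2_opts <- unique(CL$R2)
scores <- sapply(r1_pairs, function(p)
  sapply(r2_opts, function(q) sum(CL$D[CL$R1 %in% p & CL$R2 == q])))
dimnames(scores) <- list(r2_opts, sapply(r1_pairs, paste, collapse = ""))
scores
#   AB  AC  BC
#X 1.3 1.2 0.9
#Y 1.6 0.7 0.9
The maximum, 1.6, is indeed R1 = {A,B} with R2 = {Y}.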
I know how to formulate this as a linear programming problem; only I am not sure my formulation is efficient, because basically it results in a problem with mn+m+n variables and 2(m+n)+2 constraints.
The main difficulty is that I need somehow to count the number of distinct R1's and R2's chosen, and I don't know any way of doing that apart from what I will show below (and is also described in my other post here).
This is what I would do:
CL["Entry"] <- seq_len(dim(CL)[[1]])
R1.mat <- table(CL$R1,CL$Entry)
R2.mat <- table(CL$R2,CL$Entry)
N_R1 <- dim(R1.mat)[[1]]
N_R2 <- dim(R2.mat)[[1]]
N_Entry <- dim(CL)[[1]]
constr.mat <- NULL
dir <- NULL
rhs <- NULL
constr.mat <- rbind(constr.mat,cbind(R1.mat,-diag(table(CL$R1)),matrix(0,N_R1,N_R2)))
dir <- c(dir,rep("<=",N_R1))
rhs <- c(rhs,rep(0,N_R1))
constr.mat <- rbind(constr.mat,cbind(R2.mat,matrix(0,N_R2,N_R1),-diag(table(CL$R2))))
dir <- c(dir,rep("<=",N_R2))
rhs <- c(rhs,rep(0,N_R2))
constr.mat <- rbind(constr.mat,constr.mat)
dir <- c(dir,rep(">=",N_R1+N_R2))
rhs <- c(rhs,1-table(CL$R1),1-table(CL$R2))
constr.mat <- rbind(constr.mat,c(rep(0,N_Entry),rep(1,N_R1),rep(0,N_R2)))
dir <- c(dir,"==")
rhs <- c(rhs,2)
constr.mat <- rbind(constr.mat,c(rep(0,N_Entry),rep(0,N_R1),rep(1,N_R2)))
dir <- c(dir,"==")
rhs <- c(rhs,1)
obj <- c(aggregate(D~Entry,CL,c)[["D"]],rep(0,N_R1+N_R2))
Which can be solved for instance by lpSolve:
sol <- lp("max", obj, constr.mat, dir, rhs, all.bin = TRUE,num.bin.solns = 1, use.rw=FALSE, transpose.constr=TRUE)
sol$solution
#[1] 0 1 0 1 0 0 1 1 0 0 1
showing that products {AY,BY} were selected, corresponding to R1 = {A,B} and R2 = {Y}:
CL[as.logical(sol$solution[1:N_Entry]),]
# R2 R1 D Entry
#2 Y A 0.7 2
#4 Y B 0.9 4
I found that on large problems lpSolve gets stuck for ages; Rsymphony seemed to perform better.
But again, my main question is: is this way of formulating the LP efficient? Should I do it differently?
Thanks!
EDIT
In the meantime, working on a somewhat related problem, I found that only one set of constraints may be sufficient, if one adds 'costs' (in this example, negative) to the objective vector for the 'distinct R1 and R2' variables.
So here, instead of:
obj <- c(aggregate(D~Entry,CL,c)[["D"]],rep(0,N_R1+N_R2))
I would do:
obj <- c(aggregate(D~Entry,CL,c)[["D"]],rep(-1,N_R1+N_R2))
This would make m+n constraints unnecessary.
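Concretely, the reduced model keeps only the m+n '<=' linking constraints plus the two cardinality rows; a sketch, reusing the objects built above (untested on large instances):
constr.mat2 <- rbind(
  cbind(R1.mat, -diag(table(CL$R1)), matrix(0, N_R1, N_R2)),
  cbind(R2.mat, matrix(0, N_R2, N_R1), -diag(table(CL$R2))),
  c(rep(0, N_Entry), rep(1, N_R1), rep(0, N_R2)),
  c(rep(0, N_Entry), rep(0, N_R1), rep(1, N_R2)))
dir2 <- c(rep("<=", N_R1 + N_R2), "==", "==")
rhs2 <- c(rep(0, N_R1 + N_R2), 2, 1)
obj2 <- c(aggregate(D~Entry,CL,c)[["D"]], rep(-1, N_R1 + N_R2))
sol2 <- lp("max", obj2, constr.mat2, dir2, rhs2, all.bin = TRUE)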
It still remains a huge problem to solve, even for relatively small m, n, so if anyone can advise how to do it better...
I had a look at lp.transport, but that would be limited to 2 dimensions (i.e. only R1 and R2, not R1, R2, R3 for instance), and I don't think you can constrain the number of distinct objects per category in that kind of solver.

Generating clustered spatstat marks for a ppp object

This question is very close to what has been asked here. The answer there is great if we want to assign random marks to an already existing point pattern: we draw from a multivariate normal distribution and associate a value with each point.
However, I need to generate marks that follow the pattern of the marks in the lansing dataset that comes with spatstat, but for my own point pattern. In other words, I have an unmarked point pattern and I want to simulate marks with a definite pattern (for example, to illustrate the concept of segregation for my own data). How do I create such marks? I understand the number of points may differ between lansing and my data set, but I am allowed to reduce the window or create more points. Thanks!
Here is another version of segregation in four different rectangular
regions.
library(spatstat)
p <- c(.6,.2,.1,.1)
prob <- rbind(p,
              p[c(4, 1:3)],
              p[c(3:4, 1:2)],
              p[c(2:4, 1)])
X <- unmark(spruces)
labels <- factor(LETTERS[1:4])
subwins <- quadrats(X, 2, 2)
Xsplit <- split(X, subwins)
rslt <- NULL
for (i in seq_along(Xsplit)) {
  Y <- Xsplit[[i]]
  marks(Y) <- sample(labels, size = npoints(Y),
                     replace = TRUE, prob = prob[i, ])
  rslt <- superimpose(rslt, Y)
}
plot(rslt, main = "", cols = 1:4)
plot(subwins, add = TRUE)
Segregation refers to the fact that one species predominates in a
specific part of the observation window. An extreme example would be to
segregate completely based on e.g. the x-coordinate. This would generate strips
of points of different types:
library(spatstat)
X <- lansing
Y <- cut(X, X$x, breaks = 6, labels = LETTERS[1:6])
plot(Y, cols = 1:6)
Without knowing more details about the desired type of segregation it is
hard to suggest something more useful.
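If a smooth gradient is preferred over hard strips, one further option (my own sketch, not from the original answers) is to assign marks probabilistically as a function of the x-coordinate:
library(spatstat)
X <- unmark(lansing)
pA <- X$x  # lansing lives on the unit square, so x is already in [0, 1]
marks(X) <- factor(ifelse(runif(npoints(X)) < pA, "A", "B"))
plot(X, cols = 1:2)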

How to optimise layout without 'area' option in igraph?

I'm trying to optimise the layout of a network I'm constructing with the following code:
gr <- read.table("data.txt", header = F, stringsAsFactors = F)
node.names <- gr[,1]
node.names <- toupper(substr(node.names, 2, nchar(node.names)))
gr <- gr[,-1]
edge.table.index <- which(gr>0.5, arr.ind=T)
d <- data.frame(first = node.names[edge.table.index[,1]], second = node.names[edge.table.index[,2]])
g <- graph.data.frame(d, directed=F)
g[from=V(g), to=V(g)] <- 0  # zero the diagonal, i.e. drop self-loops
layout.g <- layout.fruchterman.reingold(g)
plot(g, layout = layout.g, vertex.size = 5, vertex.label.cex=.7, vertex.color = "lightblue", vertex.label.family = "sans", edge.color="black", vertex.frame.color= "white")
In particular, what I'm trying to do is decrease the area on which the nodes are displayed, so that the connected components sit closer together. Previous versions of igraph had an 'area' option in layout.fruchterman.reingold(), but that no longer seems to be the case. I've tried playing with various options, like vertex size, vertex font size, and the width/height of the exported PDF file, but they don't really do the trick.
Does anyone have a suggestion? Here's a link to the data file, I hope it works:
data.txt
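One workaround worth trying (a sketch, not a tested answer for this data): lay out each connected component separately and let igraph pack them via layout_components, which tends to keep the components compact:
library(igraph)
layout.g <- layout_components(g, layout_with_fr)
plot(g, layout = layout.g, vertex.size = 5, vertex.label.cex = .7,
     vertex.color = "lightblue", edge.color = "black", vertex.frame.color = "white")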

Recreate ggplot's geom_smooth CI background - in R basic?

I wish to recreate this graph (from here) using R base graphics.
I have no clue how to do that. Any advice?
(My motivation is that I wish to create a plot where the line width (and/or color) will reflect another dimension. Until now - ggplot2 is the only place I found in R for how to do this. I would be happy to be able to do this also in base R)
See help(polygon) and example(polygon) (especially the Brownian motion example) -- varying width is pretty common in some fields to show variability through time.
The same example is also in demo(graphics):
## An example showing how to fill between curves.
par(bg="white")
n <- 100
x <- c(0,cumsum(rnorm(n)))
y <- c(0,cumsum(rnorm(n)))
xx <- c(0:n, n:0)
yy <- c(x, rev(y))
plot(xx, yy, type="n", xlab="Time", ylab="Distance")
polygon(xx, yy, col="gray")
title("Distance Between Brownian Motions")
I don't know if exactly replicating the graph is possible in base graphics; in grid graphics it is. Nevertheless, the following code gets you something close to what you want. Adapt it to your data set.
n <- 20
x <- rnorm(n)
y <- rnorm(n)
o <- order(x)
x <- x[o]
y <- y[o]
m <- loess(y~x, span = 1) #ggplot seems to smooth more than default
f <- predict(m, se = TRUE)
ci <- f$se * qt(0.975, f$df)
cih <- f$fit + ci
cil <- f$fit - ci
plot(x,y, ylim = c(min(cil,y), max(cih,y)))
lines(x, f$fit, lwd = 2)
xx <- c(x, rev(x))
yy <- c(cil, rev(cih))
polygon(xx, yy, col="#A9A9A930", border = NA)
OK, I spent a little too much time messing with this... note the last line is the ggplot version so you can compare the two.
#loess and error curves almost just like ggplot2
op <- par(las=1, mar = c(3,3,1,1))
n <- 30
x <- sort(rnorm(n)) #(varying density in predictor)
x <- x + abs(min(x))
x <- x/max(x)*2*pi
y <- sin(x)+rnorm(n) #(curvy)
m <- loess(y~x)
xx <- seq(min(x), max(x), (max(x)-min(x))/1000) #increase density of values to predict over to increase quality of curve
f <- predict(m, xx, se = TRUE)
ci <- f$se * qt(0.975, f$df)
cih <- f$fit + ci
cil <- f$fit - ci
plot(x,y, ylim = c(min(cil,y), max(cih,y)), cex.axis = 0.85, xlab = '', ylab = '', type = 'n')
title(xlab = 'x', ylab = 'y',line = 2)
grid(col = 'gray')
points(x,y, pch = 19, cex = 0.65)
lines(xx, f$fit, col = 'blue', lwd = 1.2)
xx <- c(xx, rev(xx))
yy <- c(cil, rev(cih))
polygon(xx, yy, col=rgb(0.1,0.1,0.1,0.25), border = NA)
par(op)
#qplot(x,y, geom = 'point') + stat_smooth()
And to get the smooth curve, look at loess and predict.loess
Would geom_ribbon in ggplot2 be what you need? It draws a band whose width can vary along the line.
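As for the original motivation of making the line width reflect another dimension in base R, one crude approach (a sketch, not from the answers above) is to draw the curve as many short segments, mapping lwd from the extra variable:
n <- 100
x <- seq(0, 2*pi, length.out = n)
y <- sin(x)
z <- abs(cos(x))  # the extra dimension that should control the width
plot(x, y, type = "n")
segments(head(x, -1), head(y, -1), x[-1], y[-1], lwd = 1 + 9*head(z, -1))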

How to Plot With Different Marker ( 'x' and 'o') Based on Condition in R

I have a data that looks like this:
for_y_axis <-c(0.49534,0.80796,0.93970,0.99998)
for_x_axis <-c(1,2,3,4)
count <-c(0,33,0,4)
What I want to do is plot the graph using for_x_axis and for_y_axis,
but mark a point with "o" if its count value is equal to 0 (zero)
and with "x" if its count value is greater than zero.
Is there a simple way to achieve that in R?
plot(for_x_axis, for_y_axis, pch = ifelse(count > 0, "x", "o"))
How does this sound? Adjust pch to your needs.
for_y_axis <- c(0.49534,0.80796,0.93970,0.99998)
for_x_axis <- c(1,2,3,4)
count <- c(0,33,0,4)
zerocount <- function(x) {
  ifelse(x == 0, 0, 1)
}
pts <- sapply(count, zerocount)
plot(for_x_axis, for_y_axis, type = "n")
points(for_x_axis, for_y_axis, pch = pts)
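If you want the points to literally look like "x" and "o", the numeric symbols pch = 4 (a cross) and pch = 1 (an open circle) are close matches:
plot(for_x_axis, for_y_axis, pch = ifelse(count > 0, 4, 1))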
