I am working with race-stratified population estimates and want to integrate race-stratified populations from three data sources (census, PEP, and ACS). I developed a model that uses information from all three sources to estimate the true population, defined as gamma.ctr for county c, time t, and race r (1 = white, 2 = non-white).
The problem is that the PEP data are not race-stratified, so I need a way to estimate race-stratified PEP data.
Previously, I used one of the other two sources (census or ACS) to estimate ethnicity proportions and multiplied the PEP data by these proportions to obtain a race-stratified PEP population as input data for the model.
Now I have decided to do this multiplication within the model, based on ethnicity proportions defined by gamma.ctr (the true population in county c, year t, and race r), which is updated by all the data sources rather than just one.
So I treat the input PEP data as peppop.ct (the population for county c and time t, not race-stratified) and define the ethnicity proportion as:
prob[c,t] <- gamma.ctr[c,t,1] / (gamma.ctr[c,t,1] + gamma.ctr[c,t,2])
I then multiply the PEP data by these proportions to obtain race-stratified estimates within the JAGS model:
for (c in 1:Narea){
  for (t in 1:nyears){
    prob.ct[c,t] <- gamma.ctr[c,t,1] / (gamma.ctr[c,t,1] + gamma.ctr[c,t,2])
    peppop.ctr[c,t,1] <- peppop.ct[c,t] * prob.ct[c,t]
    peppop.ctr[c,t,2] <- peppop.ct[c,t] * (1 - prob.ct[c,t])
  }
}
I want to use this peppop.ctr as a response variable later, like this:
for (c in 1:Narea){
  for (t in 1:nyears){
    for (r in 1:2){
      peppop.ctr[c,t,r] ~ dnorm(gamma.ctr[c,t,r], taupep.ctr[c,t,r])
    }
  }
}
But I receive this error:
Attempt to redefine node peppop.ctr[1,1,1]
I think the reason for this error is that peppop.ctr is defined twice on the left-hand side, and the error refers to the redefinition at the line:
peppop.ctr[c,t,1] <- peppop.ct[c,t] * prob.ct[c,t]
Is it possible to help me solve this error? I need to estimate peppop.ctr first and then use these estimates to update the gamma.ctr parameters. Any help is really appreciated.
You can use the zeros trick to both define a variable (e.g., y below) and then also use that variable as the dependent variable in some subsequent analysis. Here's an example:
library(runjags)
x <- rnorm(1000)
y <- 2 + 3 * x + rnorm(1000)
p <- runif(1000, .1, .9)
w <- y * p  # split y into two pieces...
z <- y - w  # ...that are passed as data and recombined inside the model
datl <- list(
  x = x,
  w = w,
  z = z,
  zeros = rep(0, length(x)),
  N = length(x)
)
mod <- "model{
y <- w + z
C <- 10000 # this just has to be large enough to ensure all phi[i]'s > 0
for (i in 1:N) {
L[i] <- dnorm(y[i], mu[i], tau)
mu[i] <- b[1] + b[2]*x[i]
phi[i] <- -log(L[i]) + C
zeros[i] ~ dpois(phi[i])
}
#sig ~ dunif(0, sd(y))
#tau <- pow(sig, -2)
tau ~ dgamma(1,.1)
b[1] ~ dnorm(0, .0001)
b[2] ~ dnorm(0, .0001)
}
"
out <- run.jags(model = mod, data=datl, monitor = c("b", "tau"), n.chains = 2)
summary(out)
#> Lower95 Median Upper95 Mean SD Mode MCerr MC%ofSD
#> b[1] 1.9334538 1.991586 2.051245 1.991802 0.03026722 NA 0.0002722566 0.9
#> b[2] 2.9019547 2.963257 3.023057 2.963190 0.03057184 NA 0.0002744883 0.9
#> tau 0.9939587 1.087178 1.183521 1.087744 0.04845667 NA 0.0004280217 0.9
#> SSeff AC.10 psrf
#> b[1] 12359 -0.010240572 1.0000684
#> b[2] 12405 -0.006480322 0.9999677
#> tau 12817 0.010135609 1.0000195
summary(lm(y ~ x))
#>
#> Call:
#> lm(formula = y ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.2650 -0.6213 -0.0032 0.6528 3.3956
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.99165 0.03034 65.65 <2e-16 ***
#> x 2.96340 0.03013 98.34 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9593 on 998 degrees of freedom
#> Multiple R-squared: 0.9065, Adjusted R-squared: 0.9064
#> F-statistic: 9671 on 1 and 998 DF, p-value: < 2.2e-16
Created on 2022-05-14 by the reprex package (v2.0.1)
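Mapped back onto your population model, the same pattern looks roughly like this (a sketch, not tested: zeros must be supplied as data as an Narea x nyears x 2 array of 0s, and the priors for gamma.ctr and taupep.ctr plus the census/ACS likelihood terms stay as you already have them):
model{
  C <- 10000  # large enough that all phi's stay positive
  for (c in 1:Narea){
    for (t in 1:nyears){
      prob.ct[c,t] <- gamma.ctr[c,t,1] / (gamma.ctr[c,t,1] + gamma.ctr[c,t,2])
      peppop.ctr[c,t,1] <- peppop.ct[c,t] * prob.ct[c,t]
      peppop.ctr[c,t,2] <- peppop.ct[c,t] * (1 - prob.ct[c,t])
      for (r in 1:2){
        L[c,t,r] <- dnorm(peppop.ctr[c,t,r], gamma.ctr[c,t,r], taupep.ctr[c,t,r])
        phi[c,t,r] <- -log(L[c,t,r]) + C
        zeros[c,t,r] ~ dpois(phi[c,t,r])
      }
    }
  }
  # ... priors for gamma.ctr and taupep.ctr, plus the census/ACS parts of the model ...
}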
I am trying to sample two parameters (as priors) from a categorical distribution ranging from 1 to 5000, theta[1] and theta[2], with the requirement that theta[1] < theta[2].
I have tried (among other things):
theta[1] ~ dcat(p1[])
p1[1:n] <- 1/n
theta[2] ~ dcat(p2[])
p2[1:theta[1]] <- 0
p2[sum(theta[1], 1):n] <- 1/sum(n, -theta[1])
with n = 5000,
so that theta[2] is sampled from a categorical distribution ranging from theta[1] + 1 to n.
The error is: unknown variable theta[1].
Any help would be appreciated.
If the only requirement on this categorical variable with n = 5000 is that theta[1] < theta[2], you can use the sort() function:
theta.star[1] ~ dcat(p1[])
theta.star[2] ~ dcat(p1[])
theta <- sort(theta.star)
The sort() function is the standard way to impose order constraints in JAGS (note that this enforces theta[1] <= theta[2]; ties remain possible because the two draws are independent).
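A minimal self-contained version of the model (the uniform weights have to be built element by element, since JAGS does not allow a scalar to fill a vector in one assignment; n = 5000 is passed as data):
model{
  for (i in 1:n){
    p1[i] <- 1/n             # uniform weights over 1..n
  }
  theta.star[1] ~ dcat(p1[])
  theta.star[2] ~ dcat(p1[])
  theta <- sort(theta.star)  # theta[1] <= theta[2]
}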
I have four variables: a point process pattern of species occurrences, rivers, pond polygons, and land image data. I would like to make a dataset similar to the Murchison dataset using these shape layers, but I have not managed to do it.
I need to make a data frame from these polygon shape layers of rivers, ponds, and land cover images, together with the point pattern data of species occurrences. I tried using a hyperframe, but I am unable to use a distance function from the rivers or the ponds.
rivers <- readShapeSpatial("river.shp")
ponds <- readShapeSpatial("pond.shp")
fro <- read.table("fro.txt", header = TRUE)
image <- raster("image.tif")
I would like to combine these four files into a single spatstat object like the murchison data that comes with the spatstat package. If I can put them in a frame, then ponds, land cover, and rivers are covariates.
I have used the analyst function, but it returns errors saying they cannot be used as covariates, for example that x is a list and cannot be used as a covariate, particularly for ponds and rivers when I call the dist function.
Why do you need a hyperframe? You refer to the murchison data, and that is not a hyperframe. It is simply a standard R list (with extended classes listof, anylist, and solist for better printing and plotting in spatstat, but the actual data structure is just a plain list).
To recreate the murchison data:
library(spatstat)
P <- murchison$gold # Points
L <- murchison$faults # Lines
W <- murchison$greenstone # Window
mur <- solist(points = P, lines = L, windows = W)
mur
#> List of spatial objects
#>
#> points:
#> Planar point pattern: 255 points
#> window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
#>
#> lines:
#> planar line segment pattern: 3252 line segments
#> window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
#>
#> windows:
#> window: polygonal boundary
#> enclosing rectangle: [352782.9, 681699.6] x [6706467, 7100804] metres
To use the data in a model they don’t have to be collected in a single list,
but it may be convenient. The following two models are identical:
(mod1 <- ppm(P ~ W))
#> Nonstationary Poisson process
#>
#> Log intensity: ~W
#>
#> Fitted trend coefficients:
#> (Intercept) WTRUE
#> -21.918688 3.980409
#>
#> Estimate S.E. CI95.lo CI95.hi Ztest Zval
#> (Intercept) -21.918688 0.1666667 -22.24535 -21.592028 *** -131.51213
#> WTRUE 3.980409 0.1798443 3.62792 4.332897 *** 22.13252
(mod2 <- ppm(points ~ windows, data = mur))
#> Nonstationary Poisson process
#>
#> Log intensity: ~windows
#>
#> Fitted trend coefficients:
#> (Intercept) windowsTRUE
#> -21.918688 3.980409
#>
#> Estimate S.E. CI95.lo CI95.hi Ztest Zval
#> (Intercept) -21.918688 0.1666667 -22.24535 -21.592028 *** -131.51213
#> windowsTRUE 3.980409 0.1798443 3.62792 4.332897 *** 22.13252
If you insist on a hyperframe, you should have a column for each measured variable, but hyperframes are primarily used when you have several replications of an experiment, which is not of much use here. The function call is simply:
murhyp <- hyperframe(points = P, lines = L, windows = W)
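If the real sticking point is using distance to the rivers or ponds as a covariate, distfun() is the usual spatstat tool. A sketch using the murchison objects from above (the same call works for a psp of river lines or a polygonal ponds window):
# Distance to the nearest fault as a spatial covariate
dfault <- distfun(L)
(mod3 <- ppm(P ~ dfault))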
I am using the gam function in the mgcv package to fit a spatially adaptive smooth for heterogeneous data. This is my R code for the fit:
library(MASS)
data(mcycle)
fit <- gam(accel ~ s(times, k = 20, bs = 'ad'), data = mcycle, method = 'REML')
The output contains 5 smoothing parameters. I am trying to extract the values of each smoothing parameter (S[[i]] for i = 1, ..., 5). I used fit$S[[1]] to get the first one, but it does not work. Could someone help me with this?
You want the $sp component:
> fit$sp
s(times)1 s(times)2 s(times)3 s(times)4 s(times)5
1.364206e+01 5.204389e-04 2.036490e-03 8.565542e+00 2.428618e+03
The $S component of the $smooth list contains the penalty matrices associated with the five smoothing parameters.
See ?gamObject and ?smooth.construct for further details on what is returned in the fit.
If you really want the penalty matrices, then look at the structure of the smooth component:
> str(fit$smooth, max = 1)
List of 1
$ :List of 26
..- attr(*, "class")= chr [1:2] "pspline.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
Even if there is only a single smooth, $smooth is a list, so we need fit$smooth[[1]] to access it. Now, if we look at the $S component of this smooth, we see
> str(fit$smooth[[1]]$S, max = 1)
List of 5
$ : num [1:19, 1:19] 0.4446 -0.2845 0.0913 0.0426 0.0943 ...
$ : num [1:19, 1:19] 0.3417 -0.2441 0.0845 0.0341 0.0654 ...
$ : num [1:19, 1:19] 0.0913 -0.0734 0.0271 0.0109 0.0141 ...
$ : num [1:19, 1:19] 4.13e-05 -3.46e-05 4.10e-05 1.32e-04 -3.96e-05 ...
$ : num [1:19, 1:19] 1.68e-06 2.43e-06 3.49e-06 4.62e-06 1.08e-05 ...
This indicates that there are five penalty matrices associated with this smooth and that each matrix is a component of the S list. Hence, for the ith penalty matrix we need
fit$smooth[[1]]$S[[i]]
So, for the second penalty matrix we need
fit$smooth[[1]]$S[[2]]
the first six rows and columns of which look like this:
> fit$smooth[[1]]$S[[2]][1:6, 1:6]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.34168394 -0.24407752 0.084500619 0.03412496 0.06538967 0.054028500
[2,] -0.24407752 0.36254851 -0.255915616 0.05368650 -0.03418746 -0.019895116
[3,] 0.08450062 -0.25591562 0.352961000 -0.21961696 0.04421239 0.001082056
[4,] 0.03412496 0.05368650 -0.219616955 0.35168761 -0.18138207 0.077301400
[5,] 0.06538967 -0.03418746 0.044212389 -0.18138207 0.25012833 -0.178018503
[6,] 0.05402850 -0.01989512 0.001082056 0.07730140 -0.17801850 0.264159096
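As an aside, if what you are ultimately after is the overall penalty actually applied in the fit, the smoothing parameters in $sp weight these matrices. A sketch, assuming the usual ordering correspondence between fit$sp and fit$smooth[[1]]$S for a single smooth:
# Total penalty: sum over i of sp[i] * S[[i]]  (a 19 x 19 matrix here)
S_total <- Reduce(`+`, Map(`*`, fit$sp, fit$smooth[[1]]$S))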
I am comparing two alternatives for calculating p-values with R's pnorm() function.
xbar <- 2.1
mu <- 2
sigma <- 0.25
n <- 35
# z-transformation
z <- (xbar - mu) / (sigma / sqrt(n))
# Alternative I using transformed values
pval1 <- pnorm(q = z)
# Alternative II using untransformed values
pval2 <- pnorm(q = xbar, mean = mu, sd = sigma)
How come the two calculated p-values are not the same? Shouldn't they be?
They are different because you use two different standard deviations.
In the z-transformation you use the standard error of the mean, sigma / sqrt(n), as the standard deviation, but in the untransformed calculation you use sd = sigma, ignoring n.
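To make the two agree, give pnorm() the standard error of the mean in the untransformed call:
# Now pval2 matches pval1
pval2 <- pnorm(q = xbar, mean = mu, sd = sigma / sqrt(n))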
I am interested in fitting the following nested random effect model in JAGS.
SAS code
proc nlmixed data=data1 qpoints=20;
parms beta0=2 beta1=1 ;
bounds vara >=0, varb_a >=0;
eta = beta0+ beta1*t+ b2+b3;
p = exp(eta)/(1+exp(eta));
model TestResult ~ binary(p);
random b2 ~ normal(0,vara) subject = HHcode;
random b3 ~ normal(0,varb_a) subject = IDNo_N(HHcode);
run;
My question: how do I specify the random-effects part?
I have repeated measurements on individuals, and these individuals are further nested within households. Note: the number of individuals per household varies!
Looking forward to hearing from you.
Let's assume that we have two vectors which indicate which house and which individual a data point belongs to (you will need to create these; in R you can make them by converting a factor to numeric via as.numeric). So, if we have 10 data points from 2 houses and 5 individuals, they would look like this:
house_vec = c(1,1,1,1,1,1,2,2,2,2) # 6 points for house 1, 4 for house 2
ind_vec = c(1,1,2,2,3,3,4,4,5,5) # everyone has two observations
N = 10 # number of data points
So, the above vectors tell us that there are 3 individuals in the first house (because the first 6 elements of house_vec are 1 and the first 6 elements of ind_vec range from 1 to 3) and that the second house has 2 individuals (the last 4 elements of house_vec are 2 and the last 4 elements of ind_vec are 4 and 5). With these vectors we can do nested indexing in JAGS to create your random effect structure. One caveat: each random-effect node may only be defined once, so the random effects are declared in loops over households and individuals rather than over data points (otherwise JAGS complains about redefining nodes). That requires one more vector giving the household of each individual, which you can build in R as house_of_ind <- house_vec[!duplicated(ind_vec)]. All of these would be supplied in the data list along with TestResult. Something like this would suffice:
house_of_ind = c(1,1,1,2,2) # household of each individual (passed as data)
for(j in 1:2){              # loop over the 2 households
  mu_house[j] ~ dnorm(0, taua)
}
for(k in 1:5){              # loop over the 5 individuals
  mu_ind[k] ~ dnorm(mu_house[house_of_ind[k]], taub_a)
}
# priors
taua ~ dgamma(0.01, 0.01) # precision
sda <- 1 / sqrt(taua) # derived standard deviation
taub_a ~ dgamma(0.01, 0.01) # precision
sdb_a <- 1 / sqrt(taub_a) # derived standard deviation
You only need to include mu_ind in the linear predictor, since it is informed by mu_house. So the rest of the model would look like this:
for(i in 1:N){
  logit(p[i]) <- beta0 + beta1 * t[i] + mu_ind[ind_vec[i]]  # t[i]: time of observation i
  TestResult[i] ~ dbern(p[i])
}
You would then need to set priors for beta0 and beta1.
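For example, vague normal priors are a common choice here (the precision 0.0001 is a conventional value, not something taken from your SAS code):
beta0 ~ dnorm(0, 0.0001)
beta1 ~ dnorm(0, 0.0001)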