I am running an N-mixture model in JAGS, trying to see if posterior predicted values of N are higher in one habitat than another. I am wondering how to obtain posterior probabilities of estimated population size for each habitat individually after running the model. So, e.g., if I wanted to sum across all sites, I'd put
totalN<-sum(N[]) in the JAGS model and identify "totalN" as one of my parameters. If I have 2 habitat levels over which to sum N, do I need a for loop or is there another way to define it?
Below is my model so far...
model{
priors
#abundance
beta0 ~ dnorm(0, 0.001) # log(lambda) intercept
beta1 ~ dnorm(0, 0.001) #this is my regression parameter for habitat
tau.T ~ dgamma(0.001, 0.001) #this is for random effect of transect
# detection
alpha.p ~ dgamma(0.01, 0.01)
beta.p ~ dgamma (0.01, 0.01)
Poisson model for abundance
for (i in 1:nsite){
loglam[i] <- beta1*habitat[i] + ranef[transect[i]]
loglam.lim[i] <- min(250, max(-250, loglam[i])) # 'Stabilize' log
lam[i] <- exp(loglam.lim[i])
N[i] ~ dpois(lam[i])
}
for (i in 1:14){
ranef[i]~dnorm(beta0,tau.T)
}
Measurement error model
for (i in 1:nsite){
for (j in 1:nrep){
y[i,j] ~ dbin(p[i,j], N[i])
p[i,j] ~ dbeta(alpha.p,beta.p) #detection probability follows a beta distribution
}
}
posterior predictions
Nperhabitat<-sum(N[habitat]) #this doesn't work, only estimates a single set of posterior densities for N
#and get a derived detection probability
}
I am going to assume here that habitat is a binary vector. I would add two additional vectors to your data that define which elements in habitat are 1 and which are 0. From there you can index N with those two vectors.
# done in R and added to the data list supplied to JAGS
hab_1 <- which(habitat == 1)
hab_0 <- which(habitat == 0)
# add to data list
data_list <- list(..., hab_1 = hab_1, hab_0 = hab_0)
Then, inside the JAGS model you would just add:
N_habitat_1 <- sum(N[hab_1])
N_habitat_0 <- sum(N[hab_0])
This is effectively telling JAGS to provide the total abundance per habitat type. If you have way more sites of one habitat vs another this abundance may hide that the density of individuals could actually be less. Thus, you may want to divide this abundance by the total number of sites of each habitat type:
dens_habitat_1 <- sum(N[hab_1]) / sum(habitat)
dens_habitat_0 <- sum(N[hab_0]) / sum(1 - habitat)
This is, of course, assuming that habitat is binary.
Related
Suppose I have a 2D numpy array:
X = np.array[
[..., ...],
[..., ...]]
And I want to standardize the data either with:
X = StandardScaler().fit_transform(X)
or:
X = (X - X.mean())/X.std()
The results are different. Why are they different?
Assuming X is a feature matrix of shape (n x m) (n instances and m features). We want to scale each feature so its instances are distributed with a mean of zero and with unit variance.
To do this you need to calculate the mean and standard deviation of each feature for the provided instances (column of X) and then calculate the scaled feature vectors. Currently you are calculating the mean and standard deviation of the whole dataset and scaling the data using these values: this will give you meaningless results in all but a few special cases (i.e., X = np.ones((100,2)) is such a special case).
Practically, to calculate these statistics for each feature you will need to set the axis parameter of the .mean() or .std() methods to 0. This will perform the calculations along the columns and return a (1 x m) shaped array (actually a (m,) array, but thats another story), where each value is the mean or standard deviation for the given column. You can then use numpy broadcasting to correctly scale the feature vectors.
The below example shows how you can correctly implement it manually. x1 and x2 are 2 features with 100 training instances. We store them in a feature matrix X.
x1 = np.linspace(0, 100, 100)
x2 = 10 * np.random.normal(size=100)
X = np.c_[x1, x2]
# scale the data using the sklearn implementation
X_scaled = StandardScaler().fit_transform(X)
# scale the data taking mean and std along columns
X_scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)
If you print the two you will see they match exactly, explicitly:
print(np.sum(X_scaled-X_scaled_manual))
returns 0.0.
In WinBUGS, I am specifying a model with a multinomial likelihood function, and I need to make sure that the multinomial probabilities are all between 0 and 1 and sum to 1.
Here is the part of the code specifying the likelihood:
e[k,i,1:9] ~ dmulti(P[k,i,1:9],n[i,k])
Here, the array P[] specifies the probabilities for the multinomial distribution.
These probabilities are to be estimated from my data (the matrix e[]) using multiple linear regressions on a series of fixed and random effects. For instance, here is the multiple linear regression used to predict one of the elements of P[]:
P[k,1,2] <- intercept[1,2] + Slope1[1,2]*Covariate1[k] +
Slope2[1,2]*Covariate2[k] + Slope3[1,2]*Covariate3[k]
+ Slope4[1,2]*Covariate4[k] + RandomEffect1[group[k]] +
RandomEffect2[k]
At compiling, the model produces an error:
elements of proportion vector of multinomial e[1,1,1] must be between zero and one
If I understand this correctly, this means that the elements of the vector P[k,i,1:9] (the probability vector in the multinomial likelihood function above) may be very large (or small) numbers. In reality, they all need to be between 0 and 1, and sum to 1.
I am new to WinBUGS, but from reading around it seems that somehow using a beta regression rather than multiple linear regressions might be the way forward. However, although this would allow each element to be between 0 and 1, it doesn't seem to get to the heart of the problem, which is that all the elements of P[k,i,1:9] must be positive and sum to 1.
It may be that the response variable can very simply be transformed to be a proportion. I have tried this by trying to divide each element by the sum of P[k,i,1:9], but so far no success.
Any tips would be very gratefully appreciated!
(I have supplied the problematic sections of the model; the whole thing is fairly long.)
The usual way to do this would be to use the multinomial equivalent of a logit link to constrain the transformed probabilities to the interval (0,1). For example (for a single predictor but it is the same principle for as many predictors as you need):
Response[i, 1:Categories] ~ dmulti(prob[i, 1:Categories], Trials[i])
phi[i,1] <- 1
prob[i,1] <- 1 / sum(phi[i, 1:Categories])
for(c in 2:Categories){
log(phi[i,c]) <- intercept[c] + slope1[c] * Covariate1[i]
prob[i,c] <- phi[i,c] / sum(phi[i, 1:Categories])
}
For identifibility the value of phi[1] is set to 1, but the other values of intercept and slope1 are estimated independently. When the number of Categories is equal to 2, this collapses to the usual logistic regression but coded for a multinomial response:
log(phi[i,2]) <- intercept[2] + slope1[2] * Covariate1[i]
prob[i,2] <- phi[i, 2] / (1 + phi[i, 2])
prob[i,1] <- 1 / (1 + phi[i, 2])
ie:
logit(prob[i,2]) <- intercept[2] + slope1[2] * Covariate1[i]
prob[i,1] <- 1 - prob[i,2]
In this model I have indexed slope1 by the category, meaning that each level of the outcome has an independent relationship with the predictor. If you have an ordinal response and want to assume that the odds ratio associated with the covariate is consistent between successive levels of the response, then you can drop the index on slope1 (and reformulate the code slightly so that phi is cumulative) to get a proportional odds logistic regression (POLR).
Edit
Here is a link to some example code covering logistic regression, multinomial regression and POLR from a course I teach:
http://runjags.sourceforge.net/examples/squirrels.R
Note that it uses JAGS (rather than WinBUGS) but as far as I know there are no differences in model syntax for these types of models. If you want to quickly get started with runjags & JAGS from a WinBUGS background then you could follow this vignette:
http://runjags.sourceforge.net/quickjags.html
I am interested in fitting the following nested random effect model in JAGS.
SAS code
proc nlmixed data=data1 qpoints=20;
parms beta0=2 beta1=1 ;
bounds vara >=0, varb_a >=0;
eta = beta0+ beta1*t+ b2+b3;
p = exp(eta)/(1+exp(eta));
model TestResult ~ binary(p);
random b2 ~ normal(0,vara) subject = HHcode;
random b3 ~ normal(0,varb_a) subject = IDNo_N(HHcode);
run;
My question: How to specify the random effect part?
I have repeated measurements on individuals. These individuals are further nested in the household. Note: The number of individuals per household vary!
Looking forward to hearing from you
Let's assume that we have two vectors which indicate which house and which individual a data point belongs to (these are things you will need to create, in R you can make these by changing a factor to numeric via as.numeric). So, if we have 10 data points from 2 houses and 5 individuals they would look like this.
house_vec = c(1,1,1,1,1,1,2,2,2,2) # 6 points for house 1, 4 for house 2
ind_vec = c(1,1,2,2,3,3,4,4,5,5) # everyone has two observations
N = 10 # number of data points
So, the above vectors tell us that there are 3 individuals in the first house (because the first 6 elements of house_vec are 1 and the first 6 elements of ind_vec range from 1 to 3) and the second house has 2 individuals (last 4 elements of house_vec are 2 and the last 4 elements of ind_vec are 4 and 5). With these vectors, we can do nested indexing in JAGS to create your random effect structure. Something like this would suffice. These vectors would be supplied in the data.list that you have to include with TestResult
for(i in 1:N){
mu_house[house_vec[i]] ~ dnorm(0, taua)
mu_ind[ind_vec[i]] ~ dnorm(mu_house[house_vec[i]], taub_a)
}
# priors
taua ~ dgamma(0.01, 0.01) # precision
sda <- 1 / sqrt(taua) # derived standard deviation
taub_a ~ dgamma(0.01, 0.01) # precision
sdb_a <- 1 / sqrt(taub_a) # derived standard deviation
You would only need to include mu_ind within the linear predictor, as it is informed by mu_house. So the rest of the model would look like.
for(i in 1:N){
logit(p[i]) <- beta0 + beta1 * t + mu_ind[ind_vec[i]]
TestResult[i] ~ dbern(p[i])
}
You would then need to set priors for beta0 and beta1
I'm doing a network meta-analysis including several clinical trials. The response is binomial. Each trial contains several treatments.
When I do a random effects model, the output from JAGS and WinBUGS is similar. When I do a fixed effects model, the DIC and pD components are way out, though the posteriors of the parameters I'm interested in are similar.
I've got similar models that have Gaussian response, not binomial, and JAGS and WinBUGS are in agreement.
The BUGS/JAGS code for the fixed effects model is lifted from page 61 of this and appears below. However, the same code runs and produces similar posteriors using WinBUGS and JAGS, it's just DIC and pD that differ markedly. So I don't think this code is the problem.
for(i in 1:ns){ # Loop over studies
mu[i] ~ dnorm(0, .0001)
# Vague priors for all trial baselines
for (k in 1:na[i]) { # Loop over arms
r[i, k] ~ dbin(p[i, k], n[i, k])
# binomial likelihood
logit(p[i, k]) <- mu[i] + d[t[i, k]] - d[t[i, 1]]
# model for linear predictor
rhat[i, k] <- p[i, k] * n[i, k]
# expected value of the numerators
dev[i, k] <-
2 * (r[i, k] * (log(r[i, k]) - log(rhat[i, k])) +
(n[i, k] - r[i, k]) * (log(n[i, k] - r[i, k]) +
- log(n[i, k] - rhat[i, k]) ))
# Deviance contribution
}
resdev[i] <- sum(dev[i, 1:na[i]])
# summed residual deviance contribution for this trial
}
totresdev <- sum(resdev[])
# Total Residual Deviance
d[1] <- 0
# treatment effect is zero for reference treatment
for (k in 2:nt){
d[k] ~ dnorm(0, .0001)
} # vague priors for treatment effects
I found an old post describing a known issue, but that's too old for me to think it's the same problem.
Are there any known issues with JAGS reporting the wrong DIC and pD? (Searching for "JAGS bugs" isn't all that helpful.)
I'd be grateful of any pointers.
There are a number of different ways to calculate pD, and the method used by JAGS differs to that used by WinBUGS. See the Details section of the following help file:
?rjags::dic
Specifically:
DIC (Spiegelhalter et al 2002) is calculated by adding the “effective number of parameters” (pD) to the expected deviance. The definition of pD used by dic.samples is the one proposed by Plummer (2002) and requires two or more parallel chains in the model.
The details are in the (lengthy, but well worth reading) discussion of the following paper:
Spiegelhalter, D., N. Best, B. Carlin, and A. van der Linde (2002), Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society Series B 64, 583-639.
In general: all methods to calculate pD (of which I am aware) are approximations, and if two such methods disagree then it is probably because the assumptions behind for one (or both) approximation(s) are not being met.
Matt
I am aware that there is a duality between random effects and smooth curve estimation. At this link, Simon Wood describes how to specify random effects using mgcv. Of particular note is the following passage:
For example if g is a factor then s(g,bs="re") produces a random coefficient for each level of g, with the radndom coefficients all modelled as i.i.d. normal.
After a quick simulation, I can see this is correct, and that the model fits are almost identical. However, the likelihoods and degrees of freedom are VERY different. Can anyone explain the difference? Which one should be used for testing?
library(mgcv)
library(lme4)
set.seed(1)
x <- rnorm(1000)
ID <- rep(1:200,each=5)
y <- x
for(i in 1:200) y[which(ID==i)] <- y[which(ID==i)] + rnorm(1)
y <- y + rnorm(1000)
ID <- as.factor(ID)
# gam (mgcv)
m <- gam(y ~ x + s(ID,bs="re"))
gam.vcomp(m)
coef(m)[1:2]
logLik(m)
# lmer
m2 <- lmer(y ~ x + (1|ID))
sqrt(VarCorr(m2)$ID[1])
summary(m2)$coef[,1]
logLik(m2)
mean( abs( fitted(m)-fitted(m2) ) )
Full disclosure: I encountered this problem because I want to fit a GAM that also includes random effects (repeated measures), but need to know if I can trust likelihood-based tests under those models.