I am trying to apply Benedict Escoto's method from the paper "Bayesian Claim Severity with Mixed Distributions," published in Variance. I seem to be running into a JAGS simulation problem. When I run the code, JAGS gives me the following error:
Error in node ones[1] Node inconsistent with parents
The data is two-fold. First, there is a list of ground-up insurance claims. Information on their age in years, deductibles (truncation) and whether they have been capped (censoring, True or False) is provided. Second, there is a list of prior means and corresponding probability weights for a mixed exponential distribution.
One thing I noticed is that it works for one set of data of priors for mixed exponential distribution, but fails for another.
It works with:
Mean Weight
50 0.3
100 0.25
500 0.25
1500 0.1
5000 0.07
20000 0.03
But it fails with:
Mean Weight
3 0.72
14 0.19
42 0.05
138 0.02
503 0.01
1501 0.01
So it may have to do with parameter requirements for mixed exponential.
Data packet for JAGS is assembled as below
jags.data <- list(claims=(claim.df$x[!claim.df$capped]
- claim.df$truncation[!claim.df$capped]),
capped.claims=(claim.df$x[claim.df$capped]
- claim.df$truncation[claim.df$capped]),
alpha=alpha,
means=actual.means,
ones=rep(1, length(claim.df$[claim.df$capped])),
ages=claim.df$age[!claim.df$capped],
capped.ages=claim.df$age[claim.df$capped],
trend.shape=trend.shape,
trend.rate=1/trend.scale)
Notice that object "ones" is given values of 1 for each capped claim.
The initial values are supplied as below:
jags.init <- list(means=list(weights=prior.weights),
equal=list(weights=rep(1/m,m)))
Some miscellaneous values are provided as follows:
m <- length(actual.means)
alpha0 <- 20
alpha <- prior.weights * alpha0
trend.prior.mu <- .05
trend.prior.sigma <- .01
trend.scale <- trend.prior.sigma^2 / (1+trend.prior.mu)
trend.shape <- (1+trend.prior.mu)/trend.scale
The JAGS model is coded as below:
model <- "model {
weights ~ ddirch(alpha)
trend.factor ~ dgamma(trend.shape, trend.rate)
for (i in 1:length(claims)) {
buckets[i] ~ dcat(weights)
mu[i] <- means[buckets[i]] / trend.factor^ages[i]
claims[i] ~ dexp(1/mu[i])
}
for (i in 1:length(capped.claims)) {
capped.buckets[i] ~ dcat(weights)
capped.mu[i] <- means[capped.buckets[i]]/trend.factor^capped.ages[i]
prob.capped[i] <- exp(-capped.claims[i]/capped.mu[i])
ones[i] ~ dbern(prob.capped[i])
}
}"
Dirichlet, Categorical and Gamma distributions are used for priors. Ones is Bernoulli distributed to characterize claims as capped or uncapped.
Finally, the model is run in JAGS with the following:
model.out <- autorun.jags(model, data=jags.data, inits=jags.init,
monitor=c("weights","trend.factor"),
startburnin=1000, startsample=5000,
n.chains=n.chains, interactive=FALSE, thin=thin.factor)
Would anyone have an idea what goes wrong? Thanks
Related
I am running an N-mixture model in JAGS, trying to see if posterior predicted values of N are higher in one habitat than another. I am wondering how to obtain posterior probabilities of estimated population size for each habitat individually after running the model. So, e.g., if I wanted to sum across all sites, I'd put
totalN<-sum(N[]) in the JAGS model and identify "totalN" as one of my parameters. If I have 2 habitat levels over which to sum N, do I need a for loop or is there another way to define it?
Below is my model so far...
model{
priors
#abundance
beta0 ~ dnorm(0, 0.001) # log(lambda) intercept
beta1 ~ dnorm(0, 0.001) #this is my regression parameter for habitat
tau.T ~ dgamma(0.001, 0.001) #this is for random effect of transect
# detection
alpha.p ~ dgamma(0.01, 0.01)
beta.p ~ dgamma (0.01, 0.01)
Poisson model for abundance
for (i in 1:nsite){
loglam[i] <- beta1*habitat[i] + ranef[transect[i]]
loglam.lim[i] <- min(250, max(-250, loglam[i])) # 'Stabilize' log
lam[i] <- exp(loglam.lim[i])
N[i] ~ dpois(lam[i])
}
for (i in 1:14){
ranef[i]~dnorm(beta0,tau.T)
}
Measurement error model
for (i in 1:nsite){
for (j in 1:nrep){
y[i,j] ~ dbin(p[i,j], N[i])
p[i,j] ~ dbeta(alpha.p,beta.p) #detection probability follows a beta distribution
}
}
posterior predictions
Nperhabitat<-sum(N[habitat]) #this doesn't work, only estimates a single set of posterior densities for N
#and get a derived detection probability
}
I am going to assume here that habitat is a binary vector. I would add two additional vectors to your data that define which elements in habitat are 1 and which are 0. From there you can index N with those two vectors.
# done in R and added to the data list supplied to JAGS
hab_1 <- which(habitat == 1)
hab_0 <- which(habitat == 0)
# add to data list
data_list <- list(..., hab_1 = hab_1, hab_0 = hab_0)
Then, inside the JAGS model you would just add:
N_habitat_1 <- sum(N[hab_1])
N_habitat_0 <- sum(N[hab_0])
This is effectively telling JAGS to provide the total abundance per habitat type. If you have way more sites of one habitat vs another this abundance may hide that the density of individuals could actually be less. Thus, you may want to divide this abundance by the total number of sites of each habitat type:
dens_habitat_1 <- sum(N[hab_1]) / sum(habitat)
dens_habitat_0 <- sum(N[hab_0]) / sum(1 - habitat)
This is, of course, assuming that habitat is binary.
Here I demonstrated a survival model with rcs term. I was wondering whether the anova()under rms package is the way to test the linearity association? And How can I interpret the P-value of the Nonlinear term (see 0.094 here), does that support adding a rcs() term in the cox model?
library(rms)
data(pbc)
d <- pbc
rm(pbc, pbcseq)
d$status <- ifelse(d$status != 0, 1, 0)
dd = datadist(d)
options(datadist='dd')
# rcs model
m2 <- cph(Surv(time, status) ~ rcs(albumin, 4), data=d)
anova(m2)
Wald Statistics Response: Surv(time, status)
Factor Chi-Square d.f. P
albumin 82.80 3 <.0001
Nonlinear 4.73 2 0.094
TOTAL 82.80 3 <.0001
The proper way to test is with model comparison of the log-likelihood (aka deviance) across two models: a full and reduced:
m2 <- cph(Surv(time, status) ~ rcs(albumin, 4), data=d)
anova(m2)
m <- cph(Surv(time, status) ~ albumin, data=d)
p.val <- 1- pchisq( (m2$loglik[2]- m$loglik[2]), 2 )
You can see the difference in the inference using the less accurate Wald statistic (which in your case was not significant anyway since the p-value was > 0.05) versus this more accurate method in the example that Harrell used in his ?cph help page. Using his example:
> anova(f)
Wald Statistics Response: S
Factor Chi-Square d.f. P
age 57.75 3 <.0001
Nonlinear 8.17 2 0.0168
sex 18.75 1 <.0001
TOTAL 75.63 4 <.0001
You would incorrectly conclude that the nonlinear term was "significant" at conventional 0.05 level. This despite the fact that code creating the model was constructed as entirely linear in age (on the log-hazard scale):
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
Create a reduced mode and compare:
f0 <- cph(S ~ age + sex, x=TRUE, y=TRUE)
anova(f0)
#-------------
Wald Statistics Response: S
Factor Chi-Square d.f. P
age 56.64 1 <.0001
sex 16.26 1 1e-04
TOTAL 75.85 2 <.0001
The difference in deviance is not significant with 2 degrees of freedom difference:
1-pchisq((f$loglik[2]- f0$loglik[2]),2)
[1] 0.1243212
I don't know why Harrell leaves this example in, because I've taken his RMS course and know that he endorses the cross-model comparison of deviance as the more accurate approach.
I am interested in fitting the following nested random effect model in JAGS.
SAS code
proc nlmixed data=data1 qpoints=20;
parms beta0=2 beta1=1 ;
bounds vara >=0, varb_a >=0;
eta = beta0+ beta1*t+ b2+b3;
p = exp(eta)/(1+exp(eta));
model TestResult ~ binary(p);
random b2 ~ normal(0,vara) subject = HHcode;
random b3 ~ normal(0,varb_a) subject = IDNo_N(HHcode);
run;
My question: How to specify the random effect part?
I have repeated measurements on individuals. These individuals are further nested in the household. Note: The number of individuals per household vary!
Looking forward to hearing from you
Let's assume that we have two vectors which indicate which house and which individual a data point belongs to (these are things you will need to create, in R you can make these by changing a factor to numeric via as.numeric). So, if we have 10 data points from 2 houses and 5 individuals they would look like this.
house_vec = c(1,1,1,1,1,1,2,2,2,2) # 6 points for house 1, 4 for house 2
ind_vec = c(1,1,2,2,3,3,4,4,5,5) # everyone has two observations
N = 10 # number of data points
So, the above vectors tell us that there are 3 individuals in the first house (because the first 6 elements of house_vec are 1 and the first 6 elements of ind_vec range from 1 to 3) and the second house has 2 individuals (last 4 elements of house_vec are 2 and the last 4 elements of ind_vec are 4 and 5). With these vectors, we can do nested indexing in JAGS to create your random effect structure. Something like this would suffice. These vectors would be supplied in the data.list that you have to include with TestResult
for(i in 1:N){
mu_house[house_vec[i]] ~ dnorm(0, taua)
mu_ind[ind_vec[i]] ~ dnorm(mu_house[house_vec[i]], taub_a)
}
# priors
taua ~ dgamma(0.01, 0.01) # precision
sda <- 1 / sqrt(taua) # derived standard deviation
taub_a ~ dgamma(0.01, 0.01) # precision
sdb_a <- 1 / sqrt(taub_a) # derived standard deviation
You would only need to include mu_ind within the linear predictor, as it is informed by mu_house. So the rest of the model would look like.
for(i in 1:N){
logit(p[i]) <- beta0 + beta1 * t + mu_ind[ind_vec[i]]
TestResult[i] ~ dbern(p[i])
}
You would then need to set priors for beta0 and beta1
I am running JAGS models through the R package runjags. I just updated to JAGS 4.0.0 from JAGS 3.4, and have noticed some unexpected behavior that seems to be related to the update.
First, when I run a model, I now get a warning message WARNING: Unused variable(s) in data table: followed by a list of data objects that are referenced in the model and provided as data. It doesn't seem to affect the results (but it is very puzzling). I have, however, noticed a few times while playing around with this that for some variables the posteriors were virtually identical to the priors (indicating that no updating occured). I can't seem to recreate the update failure right now, but below is a reproducible code example illustrating the odd warning message. The code example on the run.jags help page also produces the same warning.
Second, I thought I'd check to see if the same message pops up if I use the R package R2jags instead of runjags, but R2jags won't load because apparently rjags (one of the dependencies) is not compatible with JAGS 4.0 (its looking for JAGS 3.X). Also, in the runjags function run.jags, the argument method="rjags" doesn't seem to work anymore, but method="parallel" does work.
I'm using runjags_2.0.1-4 and R 3.2.2.
So my questions are:
1) Is rjags really incompatible with JAGS 4.0? The motivation to go to 4.0 was to use vectors as indices (see https://martynplummer.wordpress.com/2015/08/16/whats-new-in-jags-4-0-0-part-34-r-style-features/).
2) What is up with the unused variable(s) warning, and should I be concerned about it?
Thanks,
Glenn
Code:
#--- GENERATE DATA ------------------------
rm(list=ls())
# Number of sites and observations per site
N <- 200
nobs <- 3
# generate covariates and standardize (where appropriate)
set.seed(123)
forest <- rnorm(N)
# relationship between occupancy and covariates
b0 <- 0.5
b.for <- 0.5
psi <- plogis(b0 + b.for*forest)
# draw occupancy for each site
z <- rbinom(n=N, size=1,prob=psi)
# specify detection probablility
p <- 0.5
pz <- p*z
# generate the observations
Y <- rbinom(n=N, size=nobs,prob=pz)
#---- BUGS model ------------------------
model1 <- "model {
for (i in 1:N){
logit(eta[i]) <- b0 + b.for*forest[i]
z[i] ~ dbern(eta[i])
pz[i] <- z[i]*p
y[i] ~ dbin(pz[i],nobs)
} #i
b0.0 ~ dunif(0,1)
b0 <- log(b0.0/(1-b0.0))
b.for ~ dnorm(0,0.01)
p ~ dunif(0,1)
}"
occ.data1 <-list(y=Y,N=N,nobs=nobs,forest=forest)
inits1 <- function(){list(b0.0=runif(1),b.for=rnorm(1),p=runif(1),z=as.numeric(Y>0))}
parameters1 <- c("b0","b.for","p")
#---- RUN MODEL ------------------------
library(runjags)
ni <- 2000
nt <- 1
nb <- 1000
nc <- 3
ad <- 100
out <- run.jags(model=model1,data=occ.data1,monitor=parameters1,n.chains=nc,inits=inits1,burnin=nb,
sample=ni,adapt=ad,thin=nt,modules=c("glm","dic"),method="parallel")
To answer your questions:
1) rjags and JAGS used linked (non-interchangable) versions, and CRAN systems are still using JAGS_3.4.0 so the version of rjags on CRAN matches. This will be updated soon, and in the meantime you can grab the correct version of rjags from the sourceforge page as #jbaums notes.
2) This is a helpful message from JAGS/rjags telling you that you have specified something as data that the model isn't using. Remember that variable names are case sensitive i.e.
library('runjags')
model <- "model {
m ~ dunif(-1000,1000)
#data# M
#inits# m
#monitor# m
}"
M <- 0
m <- list(-10, 10)
results <- run.jags(model, method="interruptible", n.chains=2)
results <- run.jags(model, method="rjags", n.chains=2)
... gives you a warning because M does not match m. Also note that the warning looks a bit different from the two function calls - in the first it comes half-way down the JAGS output and in the second it comes as a warning in R after the function is completed.
As for 'should I be concerned' - yes if you think these variables should be in your model. If you can't find the problem try posting the code you are using - it got cut off from your original post.
Matt
I am looking for a library that provides a 'value with error' (eg x ± y). But searching for "Haskell xyz Error" only gives error handling libraries.
I would expect that such a library would provide common math operations (Num, Floating) where appropriate. The use case would be to get a error estimate from a calculation based on noisy sensor readings.
Update
I did some research and the term "propagation of uncertainty" came up. I found uncertainly-haskell which I'll try out soon. Are there other packages like this?
Have a look at the intervals package.
The Data.Eq.Approximate module seems to be a fit for getting approximate equality.
Data.Eq.Approximate
Contents
Type wrappers
Classes for tolerance type annotations
Absolute tolerance
Relative tolerance
Zero tolerance
Tolerance annotations using Digits
The purpose of this module is to provide newtype wrapper that allows one to effectively override the equality operator of a value so that it > is approximate rather than exact. For example, the type
type ApproximateDouble = AbsolutelyApproximateValue (Digits Five) Double
defines an alias for a wrapper containing Doubles such that two doubles are equal if they are equal to within five decimals of accuracy; for > example, we have that
1 == (1+10^^(-6) :: ApproximateDouble)
evaluates to True. Note that we did not need to wrap the value 1+10^^(-6) since AbsolutelyApproximateValue is an instance of Num. For > convenience, Num as well as many other of the numerical classes such as Real and Floating have all been derived for the wrappers defined in > this package so that one can conveniently use the wrapped values in the same way as one would use the values themselves.
Two kinds of wrappers are provided by this package.
The uncertain package seems to provide what you are looking for:
Some highlights from the readme:
Provides tools to manipulate numbers with inherent
experimental/measurement uncertainty, and propagates them through
functions based on principles from statistics.
Manipulate with error propagation
ghci> let x = 1.52 +/- 0.07
ghci> let y = 781.4 +/- 0.3
ghci> let z = 1.53e-1 `withPrecision` 3
ghci> cosh x
2.4 +/- 0.2
ghci> exp x / z * sin (y ** z)
10.9 +/- 0.9
ghci> pi + 3 * logBase x y
52 +/- 5
Create numbers
ghci> 1.52 +/- 0.07
1.52 +/- 7.0e-2
ghci> fromSamples [12.5, 12.7, 12.6, 12.6, 12.5]
12.58 +/- 7.0e-2
Comparisons
Note that this is very different from other libraries with similar
data types (like from intervals and rounding); these do not
attempt to maintain intervals or simply digit precisions; they instead
are intended to model actual experimental and measurement data with
their uncertainties, and apply functions to the data with the
uncertainties and properly propagating the errors with sound
statistical principles.
For a clear example, take
> (52 +/- 6) + (39 +/- 4)
91.0 +/- 7.0
In a library like intervals, this would result in 91 +/- 10
(that is, a lower bound of 46 + 35 and an upper bound of 58 + 43).
However, with experimental data, errors in two independent samples
tend to "cancel out", and result in an overall aggregate uncertainty
in the sum of approximately 7.