Nested random effect in JAGS/WinBUGS

I am interested in fitting the following nested random effect model in JAGS.
SAS code
proc nlmixed data=data1 qpoints=20;
parms beta0=2 beta1=1 ;
bounds vara >=0, varb_a >=0;
eta = beta0+ beta1*t+ b2+b3;
p = exp(eta)/(1+exp(eta));
model TestResult ~ binary(p);
random b2 ~ normal(0,vara) subject = HHcode;
random b3 ~ normal(0,varb_a) subject = IDNo_N(HHcode);
run;
My question: How to specify the random effect part?
I have repeated measurements on individuals. These individuals are further nested within households. Note: the number of individuals per household varies!
Looking forward to hearing from you.

Let's assume that we have two vectors which indicate which house and which individual a data point belongs to (these are things you will need to create; in R you can make them by converting a factor to numeric via as.numeric). So, if we have 10 data points from 2 houses and 5 individuals, they would look like this.
house_vec = c(1,1,1,1,1,1,2,2,2,2) # 6 points for house 1, 4 for house 2
ind_vec = c(1,1,2,2,3,3,4,4,5,5) # everyone has two observations
N = 10 # number of data points
So, the above vectors tell us that there are 3 individuals in the first house (because the first 6 elements of house_vec are 1 and the first 6 elements of ind_vec range from 1 to 3) and that the second house has 2 individuals (the last 4 elements of house_vec are 2 and the last 4 elements of ind_vec are 4 and 5). With these vectors we can do nested indexing in JAGS to create your random effect structure, and something like the following would suffice. Note that each random effect node must be defined exactly once, so the random-effect loops run over houses and individuals rather than over data points; assuming individuals are numbered consecutively, the house each individual belongs to can be built in R as house_of_ind <- house_vec[!duplicated(ind_vec)]. N_house, N_ind, and house_of_ind would be supplied in the data list that you have to include with TestResult.
# one random intercept per house
for(h in 1:N_house){
  mu_house[h] ~ dnorm(0, taua)
}
# one random intercept per individual, centered on its house effect
for(j in 1:N_ind){
  mu_ind[j] ~ dnorm(mu_house[house_of_ind[j]], taub_a)
}
# priors
taua ~ dgamma(0.01, 0.01) # precision
sda <- 1 / sqrt(taua) # derived standard deviation
taub_a ~ dgamma(0.01, 0.01) # precision
sdb_a <- 1 / sqrt(taub_a) # derived standard deviation
You would only need to include mu_ind within the linear predictor, as it is informed by mu_house. So the rest of the model would look like:
for(i in 1:N){
  logit(p[i]) <- beta0 + beta1 * t[i] + mu_ind[ind_vec[i]]
  TestResult[i] ~ dbern(p[i])
}
You would then need to set priors for beta0 and beta1, e.g. vague normals such as beta0 ~ dnorm(0, 0.001) and beta1 ~ dnorm(0, 0.001).

Related

Simulating hierarchical data (nested structure), containing categorical variable

I am simulating dyadic data, i.e., the cluster size is fixed at two.
I have a continuous IV (e.g., education score) and a categorical variable (e.g., gender); both are level-1 variables.
The target ICC is 0.3 (ICC = between-cluster variance / (within + between)).
Creating education was fine. What I did is:
generate a between-cluster column, plus a within-cluster column for member 1 and a within-cluster column for member 2, so that
education for member 1 = between weight * between-cluster + within weight * within_1
education for member 2 = between weight * between-cluster + within weight * within_2
library(MASS)  # for mvrnorm
library(dplyr)

cn <- 100 # number of pairs
mu <- c(0, 0, 0, 0)
vcovmatrix <- matrix(c(1, 0, 0, 0,
                       0, 1, 0, 1,
                       0, 0, 1, 0,
                       0, 1, 0, 1),
                     nrow = 4,
                     byrow = TRUE,
                     dimnames = list(c("withn_1", "btw_1", "withn_2", "btw_2"),
                                     c("withn_1", "btw_1", "withn_2", "btw_2")))
dat <- mvrnorm(cn,                 # sample size
               mu = mu,            # means
               Sigma = vcovmatrix) # covariance matrix
dat <- as.data.frame(dat)
dat$pid <- 1:cn # pair ID (group ID)
dat %>%
  dplyr::select(c(withn_1, btw_1, withn_2, btw_2)) %>%
  cor()
## Let's say that the ICC is 0.3. This means the between:within variance ratio is 3:7,
## since 3/(3+7) = 0.3, so the between component gets weight sqrt(0.3).
btw_weight <- sqrt(0.3)
withn_weight <- sqrt(0.7)
dat <- dat %>%
  mutate(
    edu_1 = withn_weight*withn_1 + btw_weight*btw_1,
    edu_2 = withn_weight*withn_2 + btw_weight*btw_2
  )
However, I cannot easily envision how to create gender.
Gender has two levels.
As I did for education, I can create a between-cluster effect for gender, plus gender_within_1 and gender_within_2, and then
gender_1 <- ICC weight * gender_between + ICC weight * gender_within_1
gender_2 <- ICC weight * gender_between + ICC weight * gender_within_2
but gender is a categorical variable, so how can I create my gender variable?

Calculating a custom probability distribution in python (numerically)

I have a custom (discrete) probability distribution defined somewhat in the form: f(x) / sum(f(x') for x' in X), where X is a given discrete set. Also, 0 <= x <= 1.
So I have been trying to implement it in Python 3.8.2, and the problem is that the numerator and denominator both come out to be really small, so Python's floating-point representation just takes them as 0.0.
After calculating these probabilities, I need to sample a random element from an array, where each index may be selected with the corresponding probability in the distribution. So if my distribution is [p1, p2, p3, p4] and my array is [a1, a2, a3, a4], then the probability of selecting a2 is p2, and so on.
So how can I implement this in an elegant and efficient way?
Is there any way I could use np.random.beta() in this case? The difference between the beta distribution and my actual distribution is only that the normalization constant differs and the domain is restricted to a few points.
Note: the probability mass function defined above is actually in the form given by Bayes' theorem, with f(x) = x^s * (1-x)^f, where s and f are fixed numbers for a given iteration. So the exact problem is that when s or f become really large, this goes to 0.
You could well compute things by working with logs. The point is that while both the numerator and denominator might underflow to 0, their logs won't unless your numbers are really astonishingly small.
You say
f(x) = x^s * (1-x)^t
so
logf(x) = s*log(x) + t*log(1-x)
and you want to compute, say,
p = f(x) / Sum{ y in X | f(y) }
so
p = exp( logf(x) - log Sum{ y in X | f(y) } )
  = exp( logf(x) - log Sum{ y in X | exp(logf(y)) } )
The only difficulty is in computing the second term, but this is a common problem (the log-sum-exp trick) and ready-made implementations exist, for example scipy.special.logsumexp.
On the other hand, computing logsumexp is easy enough to do by hand.
We want
S = log( sum{ i | exp(l[i])})
if L is the maximum of the l[i] then
S = log( exp(L)*sum{ i | exp(l[i]-L)})
= L + log( sum{ i | exp( l[i]-L)})
The last sum can be computed as written, because each term is now between 0 and 1 so there is no danger of overflow, and one of the terms (the one for which l[i]==L) is 1, and so if other terms underflow, that is harmless.
This may however lose a little accuracy. A refinement would be to recognize the set A of indices where
l[i]>=L-eps (eps a user set parameter, eg 1)
And then compute
N = Sum{ i in A | exp(l[i]-L)}
B = log1p( Sum{ i not in A | exp(l[i]-L)}/N)
S = L + log( N) + B
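Putting this together in Python (a minimal sketch; the support points xs, the exponents, and the helper names are illustrative assumptions, and scipy.special.logsumexp could replace the hand-rolled helper):
import numpy as np

def logsumexp(l):
    # S = L + log(sum(exp(l[i] - L))) with L = max(l), as derived above
    L = np.max(l)
    return L + np.log(np.sum(np.exp(l - L)))

def normalized_probs(xs, s, t):
    # logf(x) = s*log(x) + t*log(1-x); xs must lie strictly inside (0, 1)
    logf = s * np.log(xs) + t * np.log1p(-xs)
    return np.exp(logf - logsumexp(logf))  # exponents are <= 0, so the total cannot underflow

# support set and exponents large enough that f(x) itself would underflow to 0.0
xs = np.array([0.2, 0.4, 0.6, 0.8])
p = normalized_probs(xs, s=2000, t=3000)

# sampling an element of an array with the corresponding probabilities
arr = np.array(["a1", "a2", "a3", "a4"])
rng = np.random.default_rng()
print(rng.choice(arr, p=p))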

sklearn customized standardization of data

Suppose I have a 2D numpy array:
X = np.array([
    [..., ...],
    [..., ...]])
And I want to standardize the data either with:
X = StandardScaler().fit_transform(X)
or:
X = (X - X.mean())/X.std()
The results are different. Why are they different?
Assume X is a feature matrix of shape (n x m) (n instances and m features). We want to scale each feature so that its instances are distributed with a mean of zero and with unit variance.
To do this you need to calculate the mean and standard deviation of each feature over the provided instances (columns of X) and then calculate the scaled feature vectors. Currently you are calculating the mean and standard deviation of the whole dataset and scaling the data using these values: this will give you meaningless results in all but a few special cases (e.g., X = np.ones((100, 2)) is such a special case).
Practically, to calculate these statistics for each feature you need to set the axis parameter of the .mean() or .std() methods to 0. This will perform the calculations along the columns and return a (1 x m) shaped array (actually a (m,) array, but that's another story), where each value is the mean or standard deviation of the given column. You can then use numpy broadcasting to correctly scale the feature vectors.
The example below shows how you can correctly implement it manually. x1 and x2 are two features with 100 training instances. We store them in a feature matrix X.
import numpy as np
from sklearn.preprocessing import StandardScaler

x1 = np.linspace(0, 100, 100)
x2 = 10 * np.random.normal(size=100)
X = np.c_[x1, x2]

# scale the data using the sklearn implementation
X_scaled = StandardScaler().fit_transform(X)

# scale the data taking mean and std along columns
X_scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)
If you print the two you will see they match exactly, explicitly:
print(np.sum(X_scaled-X_scaled_manual))
returns 0.0.
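Since exact equality of floats is fragile in general, np.allclose (a standard NumPy function) is the more robust comparison:
print(np.allclose(X_scaled, X_scaled_manual))  # True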

estimated posteriors in JAGS by levels of a factor

I am running an N-mixture model in JAGS, trying to see if posterior predicted values of N are higher in one habitat than another. I am wondering how to obtain posterior probabilities of estimated population size for each habitat individually after running the model. So, e.g., if I wanted to sum across all sites, I'd put
totalN<-sum(N[]) in the JAGS model and identify "totalN" as one of my parameters. If I have 2 habitat levels over which to sum N, do I need a for loop or is there another way to define it?
Below is my model so far...
model{
  # priors
  # abundance
  beta0 ~ dnorm(0, 0.001)      # log(lambda) intercept
  beta1 ~ dnorm(0, 0.001)      # this is my regression parameter for habitat
  tau.T ~ dgamma(0.001, 0.001) # this is for the random effect of transect
  # detection
  alpha.p ~ dgamma(0.01, 0.01)
  beta.p ~ dgamma(0.01, 0.01)
  # Poisson model for abundance
  for (i in 1:nsite){
    loglam[i] <- beta1*habitat[i] + ranef[transect[i]]
    loglam.lim[i] <- min(250, max(-250, loglam[i])) # 'stabilize' the log
    lam[i] <- exp(loglam.lim[i])
    N[i] ~ dpois(lam[i])
  }
  for (i in 1:14){
    ranef[i] ~ dnorm(beta0, tau.T)
  }
  # measurement error model
  for (i in 1:nsite){
    for (j in 1:nrep){
      y[i,j] ~ dbin(p[i,j], N[i])
      p[i,j] ~ dbeta(alpha.p, beta.p) # detection probability follows a beta distribution
    }
  }
  # posterior predictions
  Nperhabitat <- sum(N[habitat]) # this doesn't work, only estimates a single set of posterior densities for N
  # and get a derived detection probability
}
I am going to assume here that habitat is a binary vector. I would add two additional vectors to your data that define which elements in habitat are 1 and which are 0. From there you can index N with those two vectors.
# done in R and added to the data list supplied to JAGS
hab_1 <- which(habitat == 1)
hab_0 <- which(habitat == 0)
# add to data list
data_list <- list(..., hab_1 = hab_1, hab_0 = hab_0)
Then, inside the JAGS model you would just add:
N_habitat_1 <- sum(N[hab_1])
N_habitat_0 <- sum(N[hab_0])
This is effectively telling JAGS to report the total abundance per habitat type. If you have many more sites of one habitat type than the other, total abundance can hide the fact that the density of individuals may actually be lower. Thus, you may want to divide each abundance by the number of sites of that habitat type:
dens_habitat_1 <- sum(N[hab_1]) / sum(habitat)
dens_habitat_0 <- sum(N[hab_0]) / sum(1 - habitat)
This is, of course, assuming that habitat is binary.

Euler beam, solving differential equation in python

I must solve the Euler-Bernoulli differential beam equation, which is:
w''''(x) = q(x)
with boundary conditions:
w(0) = w(l) = 0
and
w''(0) = w''(l) = 0
[picture of the beam omitted]
The continuous force q is 2 N/mm.
I have to use the shooting method and scipy.integrate.odeint().
I can't even manage to start, as I do not understand how to write the differential equation as a system of equations.
Can someone who understands solving differential equations with boundary conditions in Python please help?
Thanks :)
The shooting method
To solve the fourth order ODE BVP with scipy.integrate.odeint() using the shooting method you need to:
1.) Separate the 4th order ODE into 4 first order ODEs by substituting:
u0 = w
u1 = u0' = w'    # 1
u2 = u1' = w''   # 2
u3 = u2' = w'''  # 3
u3' = w'''' = q  # 4
2.) Create a function to carry out the derivation logic and connect that function to integrate.odeint() like this:
def calc(u, x, q):
    return [u[1], u[2], u[3], q]

# pseudocode: the known w(0), w''(0) plus guessed w'(0), w'''(0)
w = integrate.odeint(calc, [w(0), guess, w''(0), guess], xList, args=(q,))
Explanation:
We are sending the boundary conditions at x=0 ([w(0), w'(0), w''(0), w'''(0)]) to odeint(), which repeatedly calls the function calc to obtain the derivatives used to advance the current state of w. Note that we are guessing the initial values for w'(0) and w'''(0) while entering the known w(0) = 0 and w''(0) = 0.
Each integration step advances the current state of w roughly like this:
# the current w(x) value is the previous value plus the change of w over dx
w(x) = w(x-dx) + (dw/dx)*dx
# the derivatives are advanced the same way
dw(x)/dx = dw(x-dx)/dx + (d^2w(x)/dx^2)*dx
# etc.
This is why the calc function returns [u[1], u[2], u[3], q] instead of [u[0], u[1], u[2], u[3]]: u[1] is the first derivative, so it is the rate of change of w, and so on.
3.) Now we are able to set up our shooting method. We will be sending different initial boundary values for w'(0) and w'''(0) to odeint() and then check the end result of the returned w(x) profile to determine how close w(L) and w''(L) got to 0 (the known boundary conditions).
The program for the shooting method:
import numpy as np
from scipy.integrate import odeint

# a function to return the derivatives of w
def returnDerivatives(u, x, q):
    return [u[1], u[2], u[3], q]

# a shooting function which takes in two variables and returns the w(x) profile for x=[0,L]
def shoot(u2, u4):
    # the number of x points to calculate the integration -> determines the size of dx
    # bigger number means more x's -> better precision -> longer execution time
    xSteps = 1001
    # length of the beam
    L = 1.0 # 1m
    xSpace = np.linspace(0, L, xSteps)
    q = 0.02 # constant [N/m]
    # integrate and return the profile of w(x) and its derivatives, from x=0 to x=L
    return odeint(returnDerivatives, [0, u2, 0, u4], xSpace, args=(q,))

# the tolerance for our results
tolerance = 0.01

# how many numbers to consider for u2 and u4 (the guessed boundary conditions)
u2_u4_maxNumbers = 1327 # bigger number, better precision, slower program
# you can also divide into separate variables like u2_maxNum and u4_maxNum

# these are already tested ranges (the best results are somewhere in here)
u2Numbers = np.linspace(-0.1, 0.1, u2_u4_maxNumbers)
u4Numbers = np.linspace(-0.5, 0.5, u2_u4_maxNumbers)

# extracted values of each passing w(x) profile => [u2Best, u4Best, w(L), w''(L)]
# which will help us determine if the w(x) profile is inside tolerance
resultList = []
# the corresponding full profiles => columns [w(x), w'(x), w''(x), w'''(x)]
resultW = []

# generate numbers for u2 and u4 and send them to odeint()
for u2 in u2Numbers:
    for u4 in u4Numbers:
        U = shoot(u2, u4)
        # get only the last row of the profile to determine if it passes the tolerance check
        result = U[-1]
        # only check w(L) == 0 and w''(L) == 0, as those are the known boundary conditions
        if (abs(result[0]) < tolerance) and (abs(result[2]) < tolerance):
            # if the result passed the tolerance check, extract some values from the
            # last row of the w(x) profile which we will need later for comparisons
            resultList.append([u2, u4, result[0], result[2]])
            # add the w(x) profile to the list of profiles that passed the tolerance
            # Note: the order of resultList is the same as the order of resultW
            resultW.append(U)

# go through resultList (values extracted from the last row of each w(x) profile)
for i in range(len(resultList)):
    x = resultList[i]
    # both boundary conditions are 0 for w(L) and w''(L), so we simply add the
    # two absolute values to determine how much the sum differs from 0
    y = abs(x[2]) + abs(x[3])
    if i == 0 or y < minNum:
        minNum = y # remember the smallest difference from 0
        index = i  # remember the index of the best profile

# print out the integral of w(x) over the beam
total = 0
for row in resultW[index]:
    total = total + row[0]
print("The integral of w(x) over the beam is:")
print(total/1001) # total/xSteps
This outputs:
The integral of w(x) over the beam is:
0.000135085272117
To print out the best profile for w(x) that we found:
print(resultW[index])
which outputs something like:
# w(x) w'(x) w''(x) w'''(x)
[[ 0.00000000e+00 7.54147813e-04 0.00000000e+00 -9.80392157e-03]
[ 7.54144825e-07 7.54142917e-04 -9.79392157e-06 -9.78392157e-03]
[ 1.50828005e-06 7.54128237e-04 -1.95678431e-05 -9.76392157e-03]
...,
[ -4.48774290e-05 -8.14851572e-04 1.75726275e-04 1.01560784e-02]
[ -4.56921910e-05 -8.14670764e-04 1.85892353e-04 1.01760784e-02]
[ -4.65067671e-05 -8.14479780e-04 1.96078431e-04 1.01960784e-02]]
To double check the results from above we will also solve the ODE using the numerical method.
The numerical method
To solve the problem using the numerical method we first need to solve the differential equations. We will get four constants which we need to find with the help of the boundary conditions. The boundary conditions will be used to form a system of equations to help find the necessary constants.
For example:
w''''(x) = q(x)
means that we have this:
d^4(w(x))/dx^4 = q(x)
Since q(x) is constant (call it q), after integrating we have:
d^3(w(x))/dx^3 = q*x + C
After integrating again:
d^2(w(x))/dx^2 = q*0.5*x^2 + C*x + D
After another integration:
dw(x)/dx = q/6*x^3 + C*0.5*x^2 + D*x + E
And finally the last integration yields:
w(x) = q/24*x^4 + C/6*x^3 + D*0.5*x^2 + E*x + F
Then we take a look at the boundary conditions (now we have expressions from above for w''(x) and w(x)), with which we make a system of equations to solve for the constants.
w''(0) => 0 = q*0.5*0^2 + C*0 + D
w''(L) => 0 = q*0.5*L^2 + C*L + D
This gives us the constants:
D = 0 # from the first equation
C = - 0.01 * L # from the second (after inserting D=0)
After repeating the same for w(0)=0 and w(L)=0 we obtain:
F = 0 # from first
E = 0.01/12.0 * L^3 # from second
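As a quick check of these constants, one can solve the boundary-condition system symbolically (a sketch added here using sympy; the original derivation above is by hand, and q = 0.02 matches the program below):
import sympy as sp

x, L = sp.symbols('x L', positive=True)
C, D, E, F = sp.symbols('C D E F')
q = sp.Rational(2, 100)  # 0.02

w = q/24*x**4 + C/6*x**3 + D/2*x**2 + E*x + F
constants = sp.solve(
    [w.subs(x, 0), w.subs(x, L),                                 # w(0) = w(L) = 0
     sp.diff(w, x, 2).subs(x, 0), sp.diff(w, x, 2).subs(x, L)],  # w''(0) = w''(L) = 0
    [C, D, E, F])
print(constants)  # expected: {C: -L/100, D: 0, E: L**3/1200, F: 0}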
Now, after we have solved the equation and found all of the integration constants we can make the program for the numerical method.
The program for the numerical method
We will make a FOR loop to go through the entire beam for every dx at a time and sum up (integrate) w(x).
L = 1.0       # beam length [m]
step = 1001.0 # how many steps to take (determines dx)
q = 0.02      # constant [N/m]
integralOfW = 0.0
result = []   # w(x) values for plotting

for i in range(int(L*step)):
    x = i/step
    # current fragment of w: the closed-form w(x), divided by step for the integral
    w = (q/24.0*pow(x, 4) - 0.02/12.0*pow(x, 3) + 0.01/12*pow(L, 3)*x)/step
    # add up the fragments of w for the integral calculation
    integralOfW += w
    # add the current value of w(x) to the result list for plotting
    result.append(w*step)
print("The integral of w(x) over the beam is:")
print(integralOfW)
which outputs:
The integral of w(x) over the beam is:
0.00016666652805511192
Now to compare the two methods
Result comparison between the shooting method and the numerical method
The integral of w(x) over the beam:
Shooting method -> 0.000135085272117
Numerical method -> 0.00016666652805511192
That's a pretty good match. The plots of w(x) from the two methods (omitted here) agree as well, which confirms that the results of the shooting method are correct.
To get even better results for the shooting method increase xSteps and u2_u4_maxNumbers to bigger numbers and you can also narrow down the u2Numbers and u4Numbers to the same set size but a smaller interval (around the best results from previous program runs). Keep in mind that setting xSteps and u2_u4_maxNumbers too high will cause your program to run for a very long time.
You need to transform the ODE into a first order system. Setting u0 = w, one possible and commonly used system is:
u0' = u1,
u1' = u2,
u2' = u3,
u3' = q(x)
This can be implemented as
def ODEfunc(u,x): return [ u[1], u[2], u[3], q(x) ]
Then make a function that shoots with experimental initial conditions and returns the components of the second boundary condition:
def shoot(u01, u03): return odeint(ODEfunc, [0, u01, 0, u03], [0, l])[-1,[0,2]]
Now you have a function of two variables with two components and you need to solve this 2x2 system with the usual methods. As the system is linear, the shooting function is linear as well and you only need to find the coefficients and solve the resulting linear system.
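For example, because the shooting function is affine in its two arguments, three evaluations determine it completely and the exact initial slopes come from a 2x2 linear solve. A minimal sketch (assuming numpy/scipy, with illustrative values q = 0.02 and l = 1 as in the first answer):
import numpy as np
from scipy.integrate import odeint

l = 1.0                # beam length (illustrative)
def q(x): return 0.02  # constant load (illustrative)

def ODEfunc(u, x):
    return [u[1], u[2], u[3], q(x)]

def shoot(u01, u03):
    return odeint(ODEfunc, [0, u01, 0, u03], [0, l])[-1, [0, 2]]

# shoot(a, b) = s0 + A @ [a, b], so recover s0 and A from three shots
s0 = shoot(0, 0)
A = np.column_stack([shoot(1, 0) - s0, shoot(0, 1) - s0])
# solve s0 + A @ [a, b] = 0 for the true w'(0) and w'''(0)
u01, u03 = np.linalg.solve(A, -s0)
print(u01, u03)  # for q = 0.02, l = 1: about 8.33e-4 and -1.0e-2 (E and C above)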
