Initializing a Matrix to NA in Rcpp

There is a way to initialize a NumericVector with NA values, like this:
NumericVector x(10, NumericVector::get_na());
Is there a similar way to initialize a matrix to NA values?

Here is a version that does not waste memory.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix na_matrix(int n) {
    NumericMatrix m(n, n);
    std::fill(m.begin(), m.end(), NumericVector::get_na());
    return m;
}
FWIW, in Rcpp11 you can use a more expressive syntax:
NumericMatrix m(n, n, NA);
This works thanks to a constructor that takes a fill value.

Sort of.
Matrices in R really are vectors with a dimension attribute. So here is one way for a square matrix, which generalizes easily to rectangular matrices:
R> cppFunction('NumericMatrix foo2(int a) {
+ NumericVector v = NumericVector(a*a,NumericVector::get_na());
+ return NumericMatrix(a,a,v.begin());
+ }')
R> foo2(2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
R>
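The idea that a matrix is just a flat vector plus dimensions can also be sketched outside Rcpp. The following is a minimal plain C++ sketch, not Rcpp code: `FlatMatrix`, `na_matrix` and `all_missing` are hypothetical names chosen for illustration, and a generic quiet NaN stands in for R's NA (R's NA_real_ is a specific NaN payload, so in real Rcpp code keep NumericVector::get_na() or NA_REAL).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for an R numeric matrix: one flat buffer plus dimensions.
struct FlatMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;  // column-major, like R

    // Element (i, j), zero-based; column j starts at offset j * nrow.
    double& at(std::size_t i, std::size_t j) {
        assert(i < nrow && j < ncol);
        return data[j * nrow + i];
    }
};

// Build an nrow x ncol matrix whose buffer is created already filled with the
// missing-value marker, mirroring NumericVector v(a*a, NumericVector::get_na()).
FlatMatrix na_matrix(std::size_t nrow, std::size_t ncol) {
    // Assumption of this sketch: a generic quiet NaN, not R's exact NA bit pattern.
    const double missing = std::numeric_limits<double>::quiet_NaN();
    return FlatMatrix{nrow, ncol, std::vector<double>(nrow * ncol, missing)};
}

// True if every cell currently holds NaN.
bool all_missing(const FlatMatrix& m) {
    return std::all_of(m.data.begin(), m.data.end(),
                       [](double x) { return std::isnan(x); });
}
```

Because the dimensions are only metadata on top of the buffer, the same construction covers rectangular matrices: only nrow and ncol change.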
Edit: But for almost all real work I use Armadillo matrix classes as they are easy to use, mature, performant and pretty feature-complete. They also have a number of constructors which fill, but (currently?) none for NA. But it is trivial to call the fill() method on the matrix constructed:
R> cppFunction('arma::mat foo3(int a) {
+ return arma::mat(a,a).fill(NA_REAL);
+ }', depends="RcppArmadillo")
R> foo3(2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
R>

Related

Question about array multiplication in JAGS

I am working with race-stratified population estimates and want to integrate race-stratified populations from three data sources (census, PEP, and ACS). I developed a model that uses information from all three sources to estimate the true population, defined as gamma.ctr for county c, time t, and race r (1 = white, 2 = non-white).
The problem is that the PEP data are not race-stratified, so I need a way to estimate race-stratified PEP data.
Previously, I used one of the other two sources (census or ACS) to estimate ethnicity proportions and multiplied the PEP data by these proportions to obtain race-stratified PEP populations as input data for the model.
Now I want to do this multiplication within the model, based on ethnicity proportions defined by gamma.ctr (the true population in county c, year t, and race r), which is updated by all data sources rather than just one.
So I treat the input PEP data as peppop.ct (the population for county c and time t, not race-stratified) and define the ethnicity proportion as:
prob[c,t]=gamma.ctr[c,t,1]/(gamma.ctr[c,t,1]+gamma.ctr[c,t,2])
I want to multiply PEP data by these proportions to find race-stratified estimates within the JAGS model:
for (c in 1:Narea){
  for (t in 1:nyears){
    prob.ct[c,t]<-gamma.ctr[c,t,1]/(gamma.ctr[c,t,1]+gamma.ctr[c,t,2])
    peppop.ctr[c,t,1]<-peppop.ct[c,t] * prob.ct[c,t]
    peppop.ctr[c,t,2]<-peppop.ct[c,t] * (1-prob.ct[c,t])
  }
}
I want to use this peppop.ctr as a response variable later, like this:
for (t in 1:nyears){
peppop.ctr[c,t,r] ~ dnorm(gamma.ctr[c,t,r], taupep.ctr[c,t,r])
}
But I receive this error:
Attempt to redefine node peppop.cpr[1,1,1]
I think the reason for this error is that peppop.ctr is defined twice on the left-hand side, and the error points to the redefinition of peppop.ctr in the line:
peppop.ctr[c,t,1]<-peppop.ct[c,t] * prob.ct[c,t]
Is it possible to help me solve this error? I need to estimate peppop.ctr first and then use these estimates to update the gamma.ctr parameters. Any help is much appreciated.
You can use the zeros trick to both define a variable (e.g., y below) and then also use that variable as the dependent variable in some subsequent analysis. Here's an example:
library(runjags)
x <- rnorm(1000)
y <- 2 + 3 * x + rnorm(1000)
p <- runif(1000, .1, .9)
w <- y*p
z <- y-w
datl <- list(
x=x,
w=w,
z=z,
zeros = rep(0, length(x)),
N = length(x)
)
mod <- "model{
y <- w + z
C <- 10000 # this just has to be large enough to ensure all phi[i]'s > 0
for (i in 1:N) {
L[i] <- dnorm(y[i], mu[i], tau)
mu[i] <- b[1] + b[2]*x[i]
phi[i] <- -log(L[i]) + C
zeros[i] ~ dpois(phi[i])
}
#sig ~ dunif(0, sd(y))
#tau <- pow(sig, -2)
tau ~ dgamma(1,.1)
b[1] ~ dnorm(0, .0001)
b[2] ~ dnorm(0, .0001)
}
"
out <- run.jags(model = mod, data=datl, monitor = c("b", "tau"), n.chains = 2)
summary(out)
#> Lower95 Median Upper95 Mean SD Mode MCerr MC%ofSD
#> b[1] 1.9334538 1.991586 2.051245 1.991802 0.03026722 NA 0.0002722566 0.9
#> b[2] 2.9019547 2.963257 3.023057 2.963190 0.03057184 NA 0.0002744883 0.9
#> tau 0.9939587 1.087178 1.183521 1.087744 0.04845667 NA 0.0004280217 0.9
#> SSeff AC.10 psrf
#> b[1] 12359 -0.010240572 1.0000684
#> b[2] 12405 -0.006480322 0.9999677
#> tau 12817 0.010135609 1.0000195
summary(lm(y ~ x))
#>
#> Call:
#> lm(formula = y ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.2650 -0.6213 -0.0032 0.6528 3.3956
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.99165 0.03034 65.65 <2e-16 ***
#> x 2.96340 0.03013 98.34 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9593 on 998 degrees of freedom
#> Multiple R-squared: 0.9065, Adjusted R-squared: 0.9064
#> F-statistic: 9671 on 1 and 998 DF, p-value: < 2.2e-16
Created on 2022-05-14 by the reprex package (v2.0.1)
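A short derivation of why the zeros trick in the model above recovers the intended likelihood, using the names from that model (L[i], phi[i], C). Each zeros[i] is an observed 0 with a Poisson distribution of mean phi[i], so its likelihood contribution is

```latex
\[
P(\text{zeros}_i = 0 \mid \phi_i)
  = \frac{\phi_i^{0}\, e^{-\phi_i}}{0!}
  = e^{-\phi_i}
  = e^{-(-\log L_i + C)}
  = L_i \, e^{-C}.
\]
```

Because the factor e^{-C} does not depend on the parameters, the product over i is proportional to the product of the L[i], that is, the normal likelihood of the derived y. The constant C only has to be large enough to keep every phi[i] positive, since a Poisson mean must be positive.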

Spatially adaptive smoothing using mgcv package in R

I am using the gam function in the mgcv package to fit a spatially adaptive smooth to heterogeneous data. This is my R code for the fit:
library(mgcv)
library(MASS)
data(mcycle)
fit <- gam(accel ~ s(times, k = 20, bs = 'ad'), data = mcycle, method = 'REML')
The output contains 5 smoothing parameters. I am trying to extract the value of each smoothing parameter (S[[i]] for $i = 1, \dots, 5$). I used fit$S[[1]] to get the first smoothing parameter value, but it does not work. Could someone help me with this?
You want the $sp component
> fit$sp
s(times)1 s(times)2 s(times)3 s(times)4 s(times)5
1.364206e+01 5.204389e-04 2.036490e-03 8.565542e+00 2.428618e+03
The $S component of the $smooth list contains the penalty matrices associated with the five smoothing parameters.
See ?gamObject and ?smooth.construct for further details on what is returned in the fit.
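To make the distinction between the smoothing parameters and the penalty matrices concrete, here is the general shape of a penalized least-squares criterion with several penalties on one smooth. The notation (y, X, \beta, \lambda_j, S_j) is generic and introduced here for illustration only; mgcv may rescale the penalty matrices internally, so read this as a sketch of the structure rather than the exact objective that was fitted.

```latex
\[
\hat{\beta} = \arg\min_{\beta} \;
  \lVert y - X\beta \rVert^{2}
  + \sum_{j=1}^{5} \lambda_j \, \beta^{\top} S_j \, \beta
\]
```

In this reading, the \lambda_j play the role of the five values in fit$sp, and the S_j play the role of the five penalty matrices acting on the smooth's coefficients.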
If you really want the penalty matrices, then look at the structure of the smooth component:
> str(fit$smooth, max = 1)
List of 1
$ :List of 26
..- attr(*, "class")= chr [1:2] "pspline.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
Even if there is a single smooth, the $smooth is a list. So we need fit$smooth[[1]] to access this smooth. Now if we look at the $S component of the smooth we see
> str(fit$smooth[[1]]$S, max = 1)
List of 5
$ : num [1:19, 1:19] 0.4446 -0.2845 0.0913 0.0426 0.0943 ...
$ : num [1:19, 1:19] 0.3417 -0.2441 0.0845 0.0341 0.0654 ...
$ : num [1:19, 1:19] 0.0913 -0.0734 0.0271 0.0109 0.0141 ...
$ : num [1:19, 1:19] 4.13e-05 -3.46e-05 4.10e-05 1.32e-04 -3.96e-05 ...
$ : num [1:19, 1:19] 1.68e-06 2.43e-06 3.49e-06 4.62e-06 1.08e-05 ...
Which indicates that there are five penalty matrices associated with this smooth and that each matrix is a component of the S list. Hence, for the ith penalty matrix we need
fit$smooth[[1]]$S[[ i ]]
Hence for the second penalty matrix we need
fit$smooth[[1]]$S[[2]]
the first six rows and columns of which look like this
> fit$smooth[[1]]$S[[2]][1:6, 1:6]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.34168394 -0.24407752 0.084500619 0.03412496 0.06538967 0.054028500
[2,] -0.24407752 0.36254851 -0.255915616 0.05368650 -0.03418746 -0.019895116
[3,] 0.08450062 -0.25591562 0.352961000 -0.21961696 0.04421239 0.001082056
[4,] 0.03412496 0.05368650 -0.219616955 0.35168761 -0.18138207 0.077301400
[5,] 0.06538967 -0.03418746 0.044212389 -0.18138207 0.25012833 -0.178018503
[6,] 0.05402850 -0.01989512 0.001082056 0.07730140 -0.17801850 0.264159096

Latent variable with Gaussian mixture model to impute the missing data

I'm currently trying to impute missing data using a Gaussian mixture model.
My reference paper is:
http://mlg.eng.cam.ac.uk/zoubin/papers/nips93.pdf
I am currently focusing on a bivariate dataset with 2 Gaussian components.
This is the code that defines the weight for each Gaussian component:
myData = faithful[,1:2]; # the data matrix
for (i in (1:N)) {
  prob1 = pi1*dmvnorm(na.exclude(myData[,1:2]),m1,Sigma1); # probabilities of sample points under model 1
  prob2 = pi2*dmvnorm(na.exclude(myData[,1:2]),m2,Sigma2); # same for model 2
  Z<-rbinom(no,1,prob1/(prob1 + prob2 )) # Z is the latent variable that assigns each data point to a component
  pi1<-rbeta(1,sum(Z)+1/2,no-sum(Z)+1/2)
  if (pi1>1/2) {
    pi1<-1-pi1
    Z<-1-Z
  }
}
This is my code to define the missing values:
> whichMissXY<-myData[ which(is.na(myData$waiting)),1:2]
> whichMissXY
eruptions waiting
11 1.833 NA
12 3.917 NA
13 4.200 NA
14 1.750 NA
15 4.700 NA
16 2.167 NA
17 1.750 NA
18 4.800 NA
19 1.600 NA
20 4.250 NA
My problem is how to impute the missing data in the "waiting" variable based on a particular component.
This code is my first attempt to impute the missing data using conditional mean imputation. I know it is definitely wrong: the imputed values do not lie within the relevant component and produce outliers.
miss.B2 <- which(is.na(myData$waiting))
for (i in miss.B2) {
myData[i, "waiting"] <- m1[2] + ((rho * sqrt(Sigma1[2,2]/Sigma1[1,1])) * (myData[i, "eruptions"] - m1[1] ) + rnorm(1,0,Sigma1[2,2]))
#print(miss.B[i,])
}
I would appreciate any advice on how to improve the imputation technique so that it works with the latent/hidden variable of the Gaussian mixture model.
Thank you in advance
This is a solution for one type of covariance structure.
devtools::install_github("alexwhitworth/emclustr")
library(emclustr)
data(faithful)
set.seed(23414L)
ff <- apply(faithful, 2, function(j) {
na_idx <- sample.int(length(j), 50, replace=F)
j[na_idx] <- NA
return(j)
})
ff2 <- em_clust_mvn_miss(ff, nclust=2)
# hmm... seems I don't return the imputed values.
# note to self to update the code
plot(faithful, col= ff2$mix_est)
And the parameter outputs
$it
[1] 27
$clust_prop
[1] 0.3955708 0.6044292
$clust_params
$clust_params[[1]]
$clust_params[[1]]$mu
[1] 2.146797 54.833431
$clust_params[[1]]$sigma
[1] 13.41944
$clust_params[[2]]
$clust_params[[2]]$mu
[1] 4.317408 80.398192
$clust_params[[2]]$sigma
[1] 13.71741
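The question's conditional-mean idea can be made component-aware. The following plain C++ sketch is not the emclustr method and assumes the mixture parameters are already estimated: it computes each component's responsibility from the observed coordinate alone, picks a component with those probabilities, and draws the missing coordinate from that component's conditional normal. The `Component` struct and all function names are hypothetical. Note that std::normal_distribution, like R's rnorm, takes a standard deviation rather than a variance.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Parameters of one bivariate Gaussian component (hypothetical layout).
struct Component {
    double pi;             // mixing weight
    double m1, m2;         // means of (eruptions, waiting)
    double s11, s12, s22;  // covariance entries
};

// Univariate normal density with the given mean and variance.
double normal_pdf(double x, double mean, double var) {
    const double kPi = 3.14159265358979323846;
    return std::exp(-0.5 * (x - mean) * (x - mean) / var) /
           std::sqrt(2.0 * kPi * var);
}

// P(Z = k | x1): responsibilities based on the observed coordinate only.
std::vector<double> responsibilities(const std::vector<Component>& comps,
                                     double x1) {
    std::vector<double> r;
    double total = 0.0;
    for (const Component& c : comps) {
        r.push_back(c.pi * normal_pdf(x1, c.m1, c.s11));
        total += r.back();
    }
    assert(total > 0.0);
    for (double& v : r) v /= total;
    return r;
}

// Conditional mean of x2 given x1 within one component.
double cond_mean(const Component& c, double x1) {
    return c.m2 + c.s12 / c.s11 * (x1 - c.m1);
}

// Conditional variance of x2 given x1 within one component.
double cond_var(const Component& c) {
    return c.s22 - c.s12 * c.s12 / c.s11;
}

// Draw one imputed x2: choose a component by responsibility, then sample
// from its conditional normal (sqrt converts variance to standard deviation).
template <class Rng>
double impute_x2(const std::vector<Component>& comps, double x1, Rng& rng) {
    const std::vector<double> r = responsibilities(comps, x1);
    std::discrete_distribution<int> pick(r.begin(), r.end());
    const Component& c = comps[pick(rng)];
    std::normal_distribution<double> draw(cond_mean(c, x1),
                                          std::sqrt(cond_var(c)));
    return draw(rng);
}
```

Choosing the component from the responsibilities, instead of always using component 1, is what keeps an imputed value for a short eruption near the short-wait cluster and one for a long eruption near the long-wait cluster.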

Element-Wise Matrix Multiplication in Rcpp

I am working on code that requires element-wise matrix multiplication. I am trying to implement this in Rcpp since the code requires some expensive loops. I am fairly new to Rcpp and may be missing something, but I cannot get the element-wise matrix multiplication to work.
// [[Rcpp::export]]
NumericMatrix multMat(NumericMatrix m1, NumericMatrix m2) {
NumericMatrix multMatrix = m1 * m2 // How can this be implemented ?
}
I may be missing something very trivial, and wanted to ask if there was any method to do this (other than using loops to iterate over each element and multiply).
Thanks in advance.
You probably want to use RcppArmadillo (or RcppEigen) for actual math on matrices.
R> library(RcppArmadillo)
R> cppFunction("arma::mat schur(arma::mat& a, arma::mat& b) {
+ return(a % b); }", depends="RcppArmadillo")
R> schur(matrix(1:4,2,2), matrix(4:1,2,2))
[,1] [,2]
[1,] 4 6
[2,] 6 4
R>
Element-wise multiplication is also called Schur (or Hadamard) multiplication. In Armadillo, the % operator supports it; see the Armadillo docs for more.
If you want to fake it, you can follow what's done here and use Rcpp's sugar on regular vectors, and convert them to matrices as needed:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector multMat(NumericMatrix m1, NumericMatrix m2) {
NumericVector multMatrix = m1 * m2;
multMatrix.attr("dim") = Dimension(m1.nrow(), m1.ncol());
return multMatrix;
}
/*** R
multMat( matrix(1:9, nrow=3), matrix(1:9, nrow=3) )
*/
But, as Dirk said, you're better off using RcppArmadillo for matrix operations.
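For readers curious what such a loop-free call amounts to, here is a minimal sketch in plain C++ with no Rcpp or Armadillo dependency; `hadamard` is a hypothetical helper name. Since both matrices are flat buffers of equal length in the same layout, the element-wise product is a single pass over the two buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Element-wise (Hadamard) product of two matrices stored as flat buffers
// of the same length and the same (e.g. column-major) layout.
std::vector<double> hadamard(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(a.size() == b.size());  // dimensions must match
    std::vector<double> out(a.size());
    // Multiply a[i] by b[i] for every i and store the result in out[i].
    std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                   std::multiplies<double>());
    return out;
}
```

With the flattened inputs {1, 2, 3, 4} and {4, 3, 2, 1}, which correspond to matrix(1:4,2,2) and matrix(4:1,2,2), this yields {4, 6, 6, 4}, matching the schur() output shown earlier.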

R: Call matrixes from a vector of string names?

Imagine I've got 100 numeric matrices with 5 columns each.
I keep the names of those matrices in a vector or list:
Mat <- c("GON1EU", "GON2EU", "GON3EU", "NEW4", ....)
I also have a vector of coefficients "coef",
coef <- c(1, 2, 2, 1, ...)
And I want to calculate a resulting vector in this way:
coef[1]*GON1EU[,1]+coef[2]*GON2EU[,1]+coef[3]*GON3EU[,1]+coef[4]*NEW4[,1]+.....
How can I do it in a compact way, using the vector of names?
Something like:
coef*(Object(Mat)[,1])
I think the key is how to call an object from a string containing its name and then use vector notation, but I don't know how.
get() allows you to refer to an object by a string. It will only get you so far though; you'll still need to construct the repeated calls to get() on the vector of matrix names. However, I wonder if an alternative approach might be feasible? Instead of storing the matrices separately in the workspace, why not store the matrices in a list?
Then you can use sapply() on the list to extract the first column of each matrix in the list. The sapply() step returns a matrix with one column per matrix in the list. Matrix-multiplying it by the coefficient vector weights each column by its coefficient and sums across the columns, which gives the values you appear to want from your above description. At least I'm assuming that coef[1]*GON1EU[,1] is a vector of length(GON1EU[,1]), etc.
Here's some code implementing this idea.
vec <- 1:4 ## don't use coef - there is a function with that name
mat <- matrix(1:12, ncol = 3)
myList <- list(mat1 = mat, mat2 = mat, mat3 = mat, mat4 = mat)
drop(sapply(myList, function(x) x[, 1]) %*% vec)
Here is some output:
> sapply(myList, function(x) x[, 1])
     mat1 mat2 mat3 mat4
[1,]    1    1    1    1
[2,]    2    2    2    2
[3,]    3    3    3    3
[4,]    4    4    4    4
> drop(sapply(myList, function(x) x[, 1]) %*% vec)
[1] 10 20 30 40
The above example suggests you create, or read in, your 100 matrices as components of a list from the very beginning of your analysis. This will require you to alter the code you used to generate the 100 matrices. Seeing as you already have your 100 matrices in your workspace, to get myList from these matrices we can use the vector of names you already have and use a loop:
Mat <- c("mat","mat","mat","mat")
myList <- vector("list", length(Mat))
## loop
for(i in seq_along(Mat)) {
  myList[[i]] <- get(Mat[i])
}
## or as lapply call - Kudos to Ritchie Cotton for pointing that one out!
## myList <- lapply(Mat, get)
myList <- setNames(myList, paste(Mat, 1:4, sep = ""))
## You only need:
myList <- setNames(myList, Mat)
## as you have the proper names of the matrices
I used "mat" repeatedly in Mat as that is the name of my matrix above. You would use your own Mat. If vec contains what you have in coef, and you create myList using the for loop above, then all you should need to do is:
drop(sapply(myList, function(x) x[, 1]) %*% vec)
to get the answer you wanted.
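The target quantity from the question, coef[1]*GON1EU[,1] + coef[2]*GON2EU[,1] + ..., can also be written as explicit loops over a name-keyed container, which is the same design choice as keeping the matrices in a named list. This is a plain C++ sketch for illustration only: `Column` and `weighted_first_columns` are hypothetical names, and each matrix is reduced to its first column to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Each matrix is represented only by its first column (a simplification).
using Column = std::vector<double>;

// result[i] = sum over k of coef[k] * (first column of the matrix names[k])[i]
std::vector<double> weighted_first_columns(
        const std::map<std::string, Column>& matrices,
        const std::vector<std::string>& names,
        const std::vector<double>& coef) {
    assert(names.size() == coef.size());  // one coefficient per name
    std::vector<double> result;
    for (std::size_t k = 0; k < names.size(); ++k) {
        // Look the matrix up by its name; at() throws if the name is absent.
        const Column& col = matrices.at(names[k]);
        if (result.empty()) result.assign(col.size(), 0.0);
        assert(col.size() == result.size());  // all columns share one length
        for (std::size_t i = 0; i < col.size(); ++i) {
            result[i] += coef[k] * col[i];
        }
    }
    return result;
}
```

The result has one entry per row of the matrices, not one per matrix. With four matrices whose first column is 1, 2, 3, 4 and coefficients 1, 2, 3, 4, it is 10, 20, 30, 40.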
See help(get) and that's that.
If you'd given us a reproducible example I'd have said a bit more. For example:
> a=1;b=2;c=3;d=4
> M=letters[1:4]
> M
[1] "a" "b" "c" "d"
> sum = 0 ; for(i in 1:4){sum = sum + i * get(M[i])}
> sum
[1] 30
Put whatever you need in the loop, or use apply over the vector M and get the object:
> sum(unlist(lapply(M,function(n){get(n)^2})))
[1] 30
