using data.table with multiple threads in R - multithreading

Is there a way to utilize multiple threads for computation using data.table in R? For example let's say i have the following data.table:
dtb <- data.table(id=rep(1:10000, 1000), x=1:1e7)
setkey(dtb, id)
f <- function(m) { #some really complicated function }
res <- dtb[,f(x), by=id]
Is there a way to get R to multithread this if f takes a while to compute? What about in the case that f is quick, will multithreading help or is most of the time going to be taken by data.table in splitting things up into groups?

I am not sure that this is "multi-threading", but perhaps you meant to include a multi-core solution? If so, then look at this earlier answer: Performing calculations by subsets of data in R found with a search for "[r] [data.table] parallel"
Edit: (doubling of speed on a 4 core machine, but my system monitor suggests this only used 2 cores during the mclapply call.) Code copied from this thread: http://r.789695.n4.nabble.com/Access-to-local-variables-in-quot-j-quot-expressions-tt2315330.html#a2315337
calc.fake.dt.mclapply <- function (dt) {
mclapply(6*c(1000,1:4,6,8,10),
function(critical.age) {
dt$tmp <- pmax((dt$age < critical.age) * dt$x, 0)
dt[, cumsum.lag(tmp), by = grp]$V1})
}
mk.fake.df <- function (n.groups=10000, n.per.group=70) {
data.frame(grp=rep(1:n.groups, each=n.per.group),
age=rep(0:(n.per.group-1), n.groups),
x=rnorm(n.groups * n.per.group),
## These don't do anything, but only exist to give
## the table a similar size to the real data.
y1=rnorm(n.groups * n.per.group),
y2=rnorm(n.groups * n.per.group),
y3=rnorm(n.groups * n.per.group),
y4=rnorm(n.groups * n.per.group)) }
df <- mk.fake.df
df <- mk.fake.df()
calc.fake.dt.lapply <- function (dt) { # use base lapply for testing
lapply(6*c(1000,1:4,6,8,10),
function(critical.age) {
dt$tmp <- pmax((dt$age < critical.age) * dt$x, 0)
dt[, cumsum.lag(tmp), by = grp]$V1})
}
mk.fake.dt <- function (fake.df) {
fake.dt <- as.data.table(fake.df)
setkey(fake.dt, grp, age)
fake.dt
}
dt <- mk.fake.dt()
require(data.table)
dt <- mk.fake.dt(df)
cumsum.lag <- function (x) {
x.prev <- c(0, x[-length(x)])
cumsum(x.prev)
}
system.time(res.dt.mclapply <- calc.fake.dt.mclapply(dt))
user system elapsed
1.896 4.413 1.210
system.time(res.dt.lapply <- calc.fake.dt.lapply(dt))
user system elapsed
1.391 0.793 2.175

Related

RcppArmadillo: diagonal matrix multiplication is very slow

Let x be a vector and M a matrix.
In R, I can do
D <- diag(exp(x))
crossprod(M, D%M)
and in RcppArmadillo, I have the following which is much slower.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat multiple_mnv(const arma::vec& x, const arma::mat& M) {
arma::colvec diagonal(x.size())
for (int i = 0; i < x.size(); i++)
{
diagonal(i) = exp(x[i]);
}
arma::mat D = diagmat(diagonal);
return M.t()*D*M;
}
Why is this so slow? How can I speed this up?
Welcome to Stack Overflow manju. For future questions, please be advised that a minimal reproducible example is expected, and in fact is in your best interest to provide; it helps others help you. Here's an example of how you could provide example data for others to work with:
## Set seed for reproducibility
set.seed(123)
## Generate data
x <- rnorm(10)
M <- matrix(rnorm(100), nrow = 10, ncol = 10)
## Output code for others to copy your objects
dput(x)
dput(M)
This is the data I will work with to show that your C++ code is in fact not slower than R. I used your C++ code (adding in a missing semicolon):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat foo(const arma::vec& x, const arma::mat& M) {
arma::colvec diagonal(x.size());
for ( int i = 0; i < x.size(); i++ )
{
diagonal(i) = exp(x[i]);
}
arma::mat D = diagmat(diagonal);
return M.t() * D * M;
}
Note also that I had to make some of my own choices about the type of the return object and types of the function arguments (this is one of the places where a minimal reproducible example could help you: What if these choices affect my results?) I then create an R function to do what foo() does:
bar <- function(v, M) {
D <- diag(exp(v))
return(crossprod(M, D %*% M))
}
Note also that I had to fix a typo you had, changing D%M to D %*% M. Let's double check they give the same results:
all.equal(foo(x, M), bar(x, M))
# [1] TRUE
Now let's explore how fast they are:
library(microbenchmark)
bench <- microbenchmark(cpp = foo(x, M), R = foo(x, M), times = 1e5)
bench
# Unit: microseconds
# expr min lq mean median uq max
# cpp 22.185 23.015 27.00436 23.204 23.461 31143.30
# R 22.126 23.028 25.48256 23.216 23.475 29628.86
Those look pretty much the same to me! We can also look at a density plot of the times (throwing out the extreme value outliers to make things a little clearer):
cpp_times <- with(bench, time[expr == "cpp"])
R_times <- with(bench, time[expr == "R"])
cpp_time_dens <- density(cpp_times[cpp_times < quantile(cpp_times, 0.95)])
R_time_dens <- density(R_times[R_times < quantile(R_times, 0.95)])
plot(cpp_time_dens, col = "blue", xlab = "Time (in nanoseconds)", ylab = "",
main = "Comparing C++ and R execution time")
lines(R_time_dens, col = "red")
legend("topright", col = c("blue", "red"), bty = "n", lty = 1,
legend = c("C++ function (foo)", "R function (bar)"))
Why?
As helpfully pointed out by Dirk Eddelbuettel in the comments, in the end both R and Armadillo are going to be calling a LAPACK or BLAS routine anyways -- you shouldn't expect much difference unless you can give Armadillo a hint on how to be more efficient.
Can we make the Armadillo code faster?
Yes! As pointed out by mtall in the comments, we can give Armadillo the hint that we're dealing with a diagonal matrix. Let's try; we'll use the following code:
// [[Rcpp::export]]
arma::mat baz(const arma::vec& x, const arma::mat& M) {
return M.t() * diagmat(arma::exp(x)) * M;
}
And benchmark it:
all.equal(foo(x, M), baz(x, M))
# [1] TRUE
library(microbenchmark)
bench <- microbenchmark(cpp = foo(x, M), R = foo(x, M),
cpp2 = baz(x, M), times = 1e5)
bench
# Unit: microseconds
# expr min lq mean median uq max
# cpp 22.822 23.757 27.57015 24.118 24.632 26600.48
# R 22.855 23.771 26.44725 24.124 24.638 30619.09
# cpp2 20.035 21.218 25.49863 21.587 22.123 36745.72
We see a small but sure improvement; let's take a look graphically as we did before:
cpp_times <- with(bench, time[expr == "cpp"])
cpp2_times <- with(bench, time[expr == "cpp2"])
R_times <- with(bench, time[expr == "R"])
cpp_time_dens <- density(cpp_times[cpp_times < quantile(cpp_times, 0.95)])
cpp2_time_dens <- density(cpp2_times[cpp2_times < quantile(cpp2_times, 0.95)])
R_time_dens <- density(R_times[R_times < quantile(R_times, 0.95)])
xlims <- range(c(cpp_time_dens$x, cpp2_time_dens$x, R_time_dens$x))
ylims <- range(c(cpp_time_dens$y, cpp2_time_dens$y, R_time_dens$y))
ylims <- ylims * c(1, 1.15)
cols <- c("#0072b2", "#f0e442", "#d55e00")
cols <- c("#e69f00", "#56b4e9", "#009e73")
labs <- c("C++ original", "C++ improved", "R")
plot(cpp_time_dens, col = cols[1], xlim = xlims, ylim = ylims,
xlab = "Time (in nanoseconds)", ylab = "",
main = "Comparing C++ and R execution time")
lines(cpp2_time_dens, col = cols[2])
lines(R_time_dens, col = cols[3])
legend("topleft", col = cols, bty = "n", lty = 1, legend = labs, horiz = TRUE)

How to call model.matrix or equivalent from RCPP, possibly in threaded code?

we were hoping to use threads to get things going faster in an algorithm with many loops whose results are not interdependent.
within the code we hoped to port to rcpp, there is a call to model.matrix.
This did not appear straightforward to port.
Investigating this further (as to what code this runs for our use case), revealed that the S3 method for lm objects does some preparatory work on the variable and then calls the default version of the function as can be seen in this copy-paste of the code:
function (object, ...)
{
if (n_match <- match("x", names(object), 0L))
object[[n_match]]
else {
data <- model.frame(object, xlev = object$xlevels, ...)
if (exists(".GenericCallEnv", inherits = FALSE))
NextMethod("model.matrix", data = data, contrasts.arg = object$contrasts)
else {
dots <- list(...)
dots$data <- dots$contrasts.arg <- NULL
do.call("model.matrix.default", c(list(object = object,
data = data, contrasts.arg = object$contrasts),
dots))
}
}
}
the default version of the function farms at least some of its functionality out to a compiled C function:
function (object, data = environment(object), contrasts.arg = NULL,
xlev = NULL, ...) {
t <- if (missing(data))
terms(object)
else terms(object, data = data)
if (is.null(attr(data, "terms")))
data <- model.frame(object, data, xlev = xlev)
else {
reorder <- match(vapply(attr(t, "variables"), deparse2,
"")[-1L], names(data))
if (anyNA(reorder))
stop("model frame and formula mismatch in model.matrix()")
if (!identical(reorder, seq_len(ncol(data))))
data <- data[, reorder, drop = FALSE]
}
int <- attr(t, "response")
if (length(data)) {
contr.funs <- as.character(getOption("contrasts"))
namD <- names(data)
for (i in namD) if (is.character(data[[i]]))
data[[i]] <- factor(data[[i]])
isF <- vapply(data, function(x) is.factor(x) || is.logical(x),
NA)
isF[int] <- FALSE
isOF <- vapply(data, is.ordered, NA)
for (nn in namD[isF]) if (is.null(attr(data[[nn]], "contrasts")))
contrasts(data[[nn]]) <- contr.funs[1 + isOF[nn]]
if (!is.null(contrasts.arg)) {
if (!is.list(contrasts.arg))
warning("non-list contrasts argument ignored")
else {
if (is.null(namC <- names(contrasts.arg)))
stop("'contrasts.arg' argument must be named")
for (nn in namC) {
if (is.na(ni <- match(nn, namD)))
warning(gettextf("variable '%s' is absent, its contrast will be ignored",
nn), domain = NA)
else {
ca <- contrasts.arg[[nn]]
if (is.matrix(ca))
contrasts(data[[ni]], ncol(ca)) <- ca
else contrasts(data[[ni]]) <- contrasts.arg[[nn]]
}
}
}
}
}
else {
isF <- FALSE
data[["x"]] <- raw(nrow(data))
}
ans <- .External2(C_modelmatrix, t, data)
if (any(isF))
attr(ans, "contrasts") <- lapply(data[isF], attr,
"contrasts")
ans
}
is there some way of calling C_modelmatrix from Rcpp at all, whether it is single OR multi-threaded? Is there any library or package that does essentially the same thing from within Rcpp so I don't have to reinvent the wheel here? I'd rather not have to fully re-implement everything that model.matrix does if I can avoid it.
as we don't actually have functioning code, there isn't any to show for this yet.
The relevant portion of the function we were trying to speed up calls model.matrix like this: ("model.y is an lm", data are both copies of an original object returned by model.frame(model.y) )
ymat.t <- model.matrix(terms(model.y), data=pred.data.t)
ymat.c <- model.matrix(terms(model.y), data=pred.data.c)
this isn't really a results based question, more of an approach/methods based question
You can call model.matrix from within C++, but you cannot do so in a multi-threaded way.
There will also be overhead, but if the function call is needed deep within the middle of your code, it could be worth it as a convenience.
Example:
// [[Rcpp::export]]
RObject call(RObject x, RObject y){
Environment env = Environment::global_env();
Function f = env["model.matrix"];
RObject res = f(x,y);
return res;
}

Is possible to define a random limit for a loop in JAGS?

I am trying to implement a Weibull proportional hazards model with a cure fraction following the approach outlined by Hui, Ibrahim and Sinha (1999) - A New Bayesian Model for Survival Data with a Surviving Fraction. However, I am not sure if it is possible to define a random limit for a looping in JAGS.
library(R2OpenBUGS)
library(rjags)
set.seed(1234)
censored <- c(1, 1)
time_mod <- c(NA, NA)
time_cens <- c(5, 7)
tau <- 4
design_matrix <- rbind(c(1, 0, 0, 0), c(1, 0.2, 0.2, 0.04))
jfun <- function() {
for(i in 1:nobs) {
censored[i] ~ dinterval(time_mod[i], time_cens[i])
time_mod[i] <- ifelse(N[i] == 0, tau, min(Z))
for (k in 1:N[i]){
Z[k] ~ dweib(1, 1)
}
N[i] ~ dpois(fc[i])
fc[i] <- exp(inprod(design_matrix[i, ], beta))
}
beta[1] ~ dnorm(0, 10)
beta[2] ~ dnorm(0, 10)
beta[3] ~ dnorm(0, 10)
beta[4] ~ dnorm(0, 10)
}
inits <- function() {
time_init <- rep(NA, length(time_mod))
time_init[which(!status)] <- time_cens[which(!status)] + 1
out <- list(beta = rnorm(4, 0, 10),
time_mod = time_init,
N = rpois(length(time_mod), 5))
return(out)
}
data_base <- list('time_mod' = time_mod, 'time_cens' = time_cens,
'censored' = censored, 'design_matrix' = design_matrix,
'tau' = tau,
'nobs' = length(time_cens[!is.na(time_cens)]))
tc1 <- textConnection("jmod", "w")
write.model(jfun, tc1)
close(tc1)
# Calling JAGS
tc2 <- textConnection(jmod)
j <- jags.model(tc2,
data = data_base,
inits = inits(),
n.chains = 1,
n.adapt = 1000)
I observed the below error:
Error in jags.model(tc2, data = data_base, inits = inits(), n.chains = 1, :
RUNTIME ERROR:
Compilation error on line 6.
Unknown variable N
Either supply values for this variable with the data
or define it on the left hand side of a relation.
I am not entirely certain, but I am pretty sure that you cannot declare a random number of nodes in BUGS in general, so it would not be a specific JAGS' quirk.
Nevertheless, you can get a way around that.
Since BUGS is a declarative language instead of a procedural one, it is enough to declare an arbitrary but deterministic number of nodes (let's say "large enough") and then associate only a random number of them with a distribution and with observed data, leaving the remaining nodes deterministic.
Once you have observed the maximum value of N[i] (let's say N.max), you can pass it as a parameter to JAGS and then change this code of yours:
for (k in 1:N[i]){
Z[k] ~ dweib(1, 1)
}
into this:
for (k in 1:N.max){
if (k <= N[i]){
Z[k] ~ dweib(1, 1)
} else {
Z[k] <- 0
}
}
I hope this will do the trick in your case. So please give feedback latter about it.
Needless to say, if you have some non-zero, observed data associated to a deterministic Z[k], then all hell breaks loose inside Jags...

ForkJoinPool for parallel processing

I am trying to run run some code 1 million times. I initially wrote it using Threads but this seemed clunky. I started doing some more reading and I came across ForkJoin. This seemed like exactly what I needed but I cant figure out how to translate what I have below into "scala-style". Can someone explain the best way to use ForkJoin in my code?
val l = (1 to 1000000) map {_.toLong}
println("running......be patient")
l.foreach{ x =>
if(x % 10000 == 0) println("got to: "+x)
val thread = new Thread {
override def run {
//my code (API calls) here. writes to file if call success
}
}
}
The easiest way is to use par (it will use ForkJoinPool automatically):
val l = (1 to 1000000) map {_.toLong} toList
l.par.foreach { x =>
if(x % 10000 == 0) println("got to: " + x) //will be executed in parallel way
//your code (API calls) here. will also be executed in parallel way (but in same thread with `println("got to: " + x)`)
}
Another way is to use Future:
import scala.concurrent._
import ExecutionContext.Implicits.global //import ForkJoinPool
val l = (1 to 1000000) map {_.toLong}
println("running......be patient")
l.foreach { x =>
if(x % 10000 == 0) println("got to: "+x)
Future {
//your code (API calls) here. writes to file if call success
}
}
If you need work stealing - you should mark blocking code with scala.concurrent.blocking:
Future {
scala.concurrent.blocking {
//blocking API call here
}
}
It will tell ForkJoinPool to compensate blocked thread with new one - so you can avoid thread starvation (but there is some disadvantages).
In Scala, you can use Future and Promise:
val l = (1 to 1000000) map {
_.toLong
}
println("running......be patient")
l.foreach { x =>
if (x % 10000 == 0) println("got to: " + x)
Future{
println(x)
}
}

Truly declarative language?

Does anyone know of a truly declarative language? The behavior I'm looking for is kind of what Excel does, where I can define variables and formulas, and have the formula's result change when the input changes (without having set the answer again myself)
The behavior I'm looking for is best shown with this pseudo code:
X = 10 // define and assign two variables
Y = 20;
Z = X + Y // declare a formula that uses these two variables
X = 50 // change one of the input variables
?Z // asking for Z should now give 70 (50 + 20)
I've tried this in a lot of languages like F#, python, matlab etc, but every time I tried this they come up with 30 instead of 70. Which is correct from an imperative point of view, but I'm looking for a more declarative behavior if you know what I mean.
And this is just a very simple calculation. When things get more difficult it should handle stuff like recursion and memoization automagically.
The code below would obviously work in C# but it's just so much code for the job, I'm looking for something a bit more to the point without all that 'technical noise'
class BlaBla{
public int X {get;set;} // this used to be even worse before 3.0
public int Y {get;set;}
public int Z {get{return X + Y;}}
}
static void main(){
BlaBla bla = new BlaBla();
bla.X = 10;
bla.Y = 20;
// can't define anything here
bla.X = 50; // bit pointless here but I'll do it anyway.
Console.Writeline(bla.Z);// 70, hurray!
}
This just seems like so much code, curly braces and semicolons that add nothing.
Is there a language/ application (apart from Excel) that does this? Maybe I'm no doing it right in the mentioned languages, or I've completely missed an app that does just this.
I prototyped a language/ application that does this (along with some other stuff) and am thinking of productizing it. I just can't believe it's not there yet. Don't want to waste my time.
Any Constraint Programming system will do that for you.
Examples of CP systems that have an associated language are ECLiPSe, SICSTUS Prolog / CP package, Comet, MiniZinc, ...
It looks like you just want to make Z store a function instead of a value. In C#:
var X = 10; // define and assign two variables
var Y = 20;
Func<int> Z = () => X + Y; // declare a formula that uses these two variables
Console.WriteLine(Z());
X = 50; // change one of the input variables
Console.WriteLine(Z());
So the equivalent of your ?-prefix syntax is a ()-suffix, but otherwise it's identical. A lambda is a "formula" in your terminology.
Behind the scenes, the C# compiler builds almost exactly what you presented in your C# conceptual example: it makes X into a field in a compiler-generated class, and allocates an instance of that class when the code block is entered. So congratulations, you have re-discovered lambdas! :)
In Mathematica, you can do this:
x = 10; (* # assign 30 to the variable x *)
y = 20; (* # assign 20 to the variable y *)
z := x + y; (* # assign the expression x+y to the variable z *)
Print[z];
(* # prints 30 *)
x = 50;
Print[z];
(* # prints 70 *)
The operator := (SetDelayed) is different from = (Set). The former binds an unevaluated expression to a variable, the latter binds an evaluated expression.
Wanting to have two definitions of X is inherently imperative. In a truly declarative language you have a single definition of a variable in a single scope. The behavior you want from Excel corresponds to editing the program.
Have you seen Resolver One? It's like Excel with a real programming language behind it.
Here is Daniel's example in Python, since I noticed you said you tried it in Python.
x = 10
y = 10
z = lambda: x + y
# Output: 20
print z()
x = 20
# Output: 30
print z()
Two things you can look at are the cells lisp library, and the Modelica dynamic modelling language, both of which have relation/equation capabilities.
There is a Lisp library with this sort of behaviour:
http://common-lisp.net/project/cells/
JavaFX will do that for you if you use bind instead of = for Z
react is an OCaml frp library. Contrary to naive emulations with closures it will recalculate values only when needed
Objective Caml version 3.11.2
# #use "topfind";;
# #require "react";;
# open React;;
# let (x,setx) = S.create 10;;
val x : int React.signal = <abstr>
val setx : int -> unit = <fun>
# let (y,sety) = S.create 20;;
val y : int React.signal = <abstr>
val sety : int -> unit = <fun>
# let z = S.Int.(+) x y;;
val z : int React.signal = <abstr>
# S.value z;;
- : int = 30
# setx 50;;
- : unit = ()
# S.value z;;
- : int = 70
You can do this in Tcl, somewhat. In tcl you can set a trace on a variable such that whenever it is accessed a procedure can be invoked. That procedure can recalculate the value on the fly.
Following is a working example that does more or less what you ask:
proc main {} {
set x 10
set y 20
define z {$x + $y}
puts "z (x=$x): $z"
set x 50
puts "z (x=$x): $z"
}
proc define {name formula} {
global cache
set cache($name) $formula
uplevel trace add variable $name read compute
}
proc compute {name _ op} {
global cache
upvar $name var
if {[info exists cache($name)]} {
set expr $cache($name)
} else {
set expr $var
}
set var [uplevel expr $expr]
}
main
Groovy and the magic of closures.
def (x, y) = [ 10, 20 ]
def z = { x + y }
assert 30 == z()
x = 50
assert 70 == z()
def f = { n -> n + 1 } // define another closure
def g = { x + f(x) } // ref that closure in another
assert 101 == g() // x=50, x + (x + 1)
f = { n -> n + 5 } // redefine f()
assert 105 == g() // x=50, x + (x + 5)
It's possible to add automagic memoization to functions too but it's a lot more complex than just one or two lines. http://blog.dinkla.net/?p=10
In F#, a little verbosily:
let x = ref 10
let y = ref 20
let z () = !x + !y
z();;
y <- 40
z();;
You can mimic it in Ruby:
x = 10
y = 20
z = lambda { x + y }
z.call # => 30
z = 50
z.call # => 70
Not quite the same as what you want, but pretty close.
not sure how well metapost (1) would work for your application, but it is declarative.
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
x = 10
y = 20
z = function() return x + y; end
x = 50
= z()
70
It's not what you're looking for, but Hardware Description Languages are, by definition, "declarative".
This F# code should do the trick. You can use lazy evaluation (System.Lazy object) to ensure your expression will be evaluated when actually needed, not sooner.
let mutable x = 10;
let y = 20;
let z = lazy (x + y);
x <- 30;
printf "%d" z.Value

Resources