I am trying to parallelize a SAT solver in Scala using Futures.
The algorithm for solving a SAT problem is loosely like this (pseudo-code):

def has_solution(x):
    if x is known to be satisfiable:
        return true
    else if x is known to be unsatisfiable:
        return false
    else:
        x1 = left_branch(x)
        x2 = right_branch(x)
        return has_solution(x1) or has_solution(x2)
So I see an opportunity to parallelize the computation whenever I branch the problem.
How can I do this with Futures? I need to wait for results from has_solution(x1) and has_solution(x2), and:
Return true as soon as either branch returns true
Return false if both branches return false
My current approach is the following:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object DPLL {
  def apply(formula: Formula): Future[Boolean] = {
    var tmp = formula
    if (tmp.isEmpty) {
      Future { true }
    } else if (tmp.hasEmptyClause) {
      Future { false }
    } else {
      for (unitClause <- tmp.unitClauses) tmp = tmp.propagateUnit(unitClause)
      for (pureLiteral <- tmp.pureLiterals) tmp = tmp.assign(pureLiteral)
      if (tmp.isEmpty)
        Future { true }
      else if (tmp.hasEmptyClause)
        Future { false }
      else {
        val nextLiteral = tmp.chooseLiteral
        // Here is where branching takes place and where I'd like to
        // wait for the computations as described above:
        for (f1 <- DPLL(tmp.assign(nextLiteral));
             f2 <- DPLL(tmp.assign(-nextLiteral)))
          yield f1 || f2
      }
    }
  }
}
This seems wrong when I run it, because I can never achieve full utilization of my cores (8).
I have an intuition that I should not be using futures for this kind of computation; maybe futures are suited only to asynchronous computations. Should I try a lower-level threading or actor-based approach instead? Thanks.
This code runs sequentially because of the for block! The computation of f2 starts only after the computation of f1 has finished.
for {
  f1 <- DPLL(tmp.assign(nextLiteral))
  f2 <- DPLL(tmp.assign(-nextLiteral))
} yield f1 || f2
The block above translates to the following flatMap/map sequence, and flatMap/map only runs the given function once the value is present:
DPLL(tmp.assign(nextLiteral)).flatMap(f1 =>
  DPLL(tmp.assign(-nextLiteral)).map(f2 =>
    f1 || f2))
One easy trick for starting the computations in parallel is to assign them to values first and then access those values in the for comprehension:
val comp1 = DPLL(tmp.assign(nextLiteral))
val comp2 = DPLL(tmp.assign(-nextLiteral))
for {
  f1 <- comp1
  f2 <- comp2
} yield f1 || f2
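This still waits for both branches. The question also asked to return true as soon as either branch returns true; here is a minimal sketch of that short-circuit using a Promise (my own addition, not part of the original answer; error handling is omitted, so if one branch fails before the other produces true, the returned future never completes):

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Future, Promise}

// Completes with true as soon as either future yields true,
// and with false once both futures have completed with false.
def eitherTrue(f1: Future[Boolean], f2: Future[Boolean]): Future[Boolean] = {
  val p = Promise[Boolean]()
  // The first branch to produce true wins the race.
  f1.foreach(r => if (r) p.trySuccess(true))
  f2.foreach(r => if (r) p.trySuccess(true))
  // Once both are done, settle the result (false || false == false).
  f1.zip(f2).foreach { case (r1, r2) => p.trySuccess(r1 || r2) }
  p.future
}

You would then return eitherTrue(comp1, comp2) instead of the for comprehension.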
Related
We were hoping to use threads to speed things up in an algorithm with many loops whose results are not interdependent.
Within the code we hoped to port to Rcpp, there is a call to model.matrix.
This did not appear straightforward to port.
Investigating what code this actually runs for our use case revealed that the S3 method for lm objects does some preparatory work on the variable and then calls the default version of the function, as can be seen in this copy-paste of the code:
function (object, ...)
{
if (n_match <- match("x", names(object), 0L))
object[[n_match]]
else {
data <- model.frame(object, xlev = object$xlevels, ...)
if (exists(".GenericCallEnv", inherits = FALSE))
NextMethod("model.matrix", data = data, contrasts.arg = object$contrasts)
else {
dots <- list(...)
dots$data <- dots$contrasts.arg <- NULL
do.call("model.matrix.default", c(list(object = object,
data = data, contrasts.arg = object$contrasts),
dots))
}
}
}
The default version of the function farms at least some of its functionality out to a compiled C function:
function (object, data = environment(object), contrasts.arg = NULL,
xlev = NULL, ...) {
t <- if (missing(data))
terms(object)
else terms(object, data = data)
if (is.null(attr(data, "terms")))
data <- model.frame(object, data, xlev = xlev)
else {
reorder <- match(vapply(attr(t, "variables"), deparse2,
"")[-1L], names(data))
if (anyNA(reorder))
stop("model frame and formula mismatch in model.matrix()")
if (!identical(reorder, seq_len(ncol(data))))
data <- data[, reorder, drop = FALSE]
}
int <- attr(t, "response")
if (length(data)) {
contr.funs <- as.character(getOption("contrasts"))
namD <- names(data)
for (i in namD) if (is.character(data[[i]]))
data[[i]] <- factor(data[[i]])
isF <- vapply(data, function(x) is.factor(x) || is.logical(x),
NA)
isF[int] <- FALSE
isOF <- vapply(data, is.ordered, NA)
for (nn in namD[isF]) if (is.null(attr(data[[nn]], "contrasts")))
contrasts(data[[nn]]) <- contr.funs[1 + isOF[nn]]
if (!is.null(contrasts.arg)) {
if (!is.list(contrasts.arg))
warning("non-list contrasts argument ignored")
else {
if (is.null(namC <- names(contrasts.arg)))
stop("'contrasts.arg' argument must be named")
for (nn in namC) {
if (is.na(ni <- match(nn, namD)))
warning(gettextf("variable '%s' is absent, its contrast will be ignored",
nn), domain = NA)
else {
ca <- contrasts.arg[[nn]]
if (is.matrix(ca))
contrasts(data[[ni]], ncol(ca)) <- ca
else contrasts(data[[ni]]) <- contrasts.arg[[nn]]
}
}
}
}
}
else {
isF <- FALSE
data[["x"]] <- raw(nrow(data))
}
ans <- .External2(C_modelmatrix, t, data)
if (any(isF))
attr(ans, "contrasts") <- lapply(data[isF], attr,
"contrasts")
ans
}
Is there some way of calling C_modelmatrix from Rcpp at all, whether single- OR multi-threaded? Is there a library or package that does essentially the same thing from within Rcpp, so I don't have to reinvent the wheel here? I'd rather not fully re-implement everything model.matrix does if I can avoid it.
As we don't actually have functioning code, there isn't any to show yet.
The relevant portion of the function we are trying to speed up calls model.matrix like this (model.y is an lm; pred.data.t and pred.data.c are both copies of an original object returned by model.frame(model.y)):
ymat.t <- model.matrix(terms(model.y), data=pred.data.t)
ymat.c <- model.matrix(terms(model.y), data=pred.data.c)
This isn't really a results-based question; it's more of an approach/methods question.
You can call model.matrix from within C++, but you cannot do so in a multi-threaded way.
There will also be overhead, but if the function call is needed deep inside your code, it could be worth it for the convenience.
Example:
#include <Rcpp.h>
using namespace Rcpp;

// Look up model.matrix and invoke it from C++. This re-enters the
// R interpreter, so it must only ever run on the main R thread.
// [[Rcpp::export]]
RObject call(RObject x, RObject y){
    Environment env = Environment::global_env();
    Function f = env["model.matrix"];
    RObject res = f(x, y);
    return res;
}
I need to implement a parallel method, which takes two computation blocks, a and b, and starts each of them in a new thread. The method must return a tuple with the result values of both the computations. It should have the following signature:
def parallel[A, B](a: => A, b: => B): (A, B)
I managed to solve the exercise using a straightforward Java-like approach. Then I decided to come up with a solution based on an implicit class. Here it is:
object ParallelApp extends App {
  implicit class ParallelOps[A](a: => A) {
    var result: A = _

    def spawn(): Unit = {
      val thread = new Thread {
        override def run(): Unit = {
          result = a
        }
      }
      thread.start()
      thread.join()
    }
  }

  def parallel[A, B](a: => A, b: => B): (A, B) = {
    a.spawn()
    b.spawn()
    (a.result, b.result)
  }

  println(parallel(1 + 2, "a" + "b"))
}
For some unknown reason, I receive the output (null,null). Could you please point out where the problem is?
Spoiler alert: it's not complicated. It's funny, like a magic trick (if you consider reading the documentation on the Java Memory Model "funny", that is). If you haven't figured it out yet, I would highly recommend trying to figure it out first, otherwise it won't be funny. Someone should make a "division-by-zero proves 2 = 4" riddle out of it.
Consider the following shorter example:
implicit class Foo[A](a: A) {
  var result: String = "not initialized"
  def computeResult(): Unit = result = "Yay, result!"
}
val a = "a string"
a.computeResult()
println(a.result)
When run, it prints
not initialized
despite the fact that we invoked computeResult() and set result to "Yay, result!". The problem is that the two invocations a.computeResult() and a.result belong to two completely independent instances of Foo. The implicit conversion is performed twice, and the second implicitly created object doesn't know anything about the changes in the first implicitly created object. It has nothing to do with threads or JMM at all.
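To make the shorter example behave as expected, construct the wrapper explicitly once and reuse that single instance (a minimal sketch of the point above):

// One explicit instance instead of two independent implicit conversions.
val foo = new Foo("a string")
foo.computeResult()
println(foo.result) // prints "Yay, result!"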
By the way: your code is not parallel. Calling join right after start gains you nothing; your main thread simply goes idle and waits until the other thread finishes. At no point are there two threads doing any useful work concurrently.
EDIT: Fixed a bug pointed out by Andrey Tyukin
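For reference, here is a minimal thread-based sketch of the exercise (my own, not an official solution): start both threads first and join both afterwards, so the two computations actually overlap.

def parallel[A, B](a: => A, b: => B): (A, B) = {
  var resultA: Option[A] = None
  var resultB: Option[B] = None
  val threadA = new Thread { override def run(): Unit = resultA = Some(a) }
  val threadB = new Thread { override def run(): Unit = resultB = Some(b) }
  threadA.start(); threadB.start() // both threads now run concurrently
  threadA.join(); threadB.join()   // wait for both results
  (resultA.get, resultB.get)
}

println(parallel(1 + 2, "a" + "b")) // (3,ab)

join establishes a happens-before edge under the JMM, so the writes to resultA and resultB are guaranteed to be visible after the joins.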
One way to solve your problem is to use Scala Futures
Documentation. Tutorial.
Useful Klang Blog.
You'll typically need some combination of these imports:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}
import scala.concurrent.duration._
An asynchronous example:
def parallelAsync[A, B](a: => A, b: => B): Future[(A, B)] = {
  // As per Andrey Tyukin's comments, the line below runs the two
  // futures sequentially, so we get no benefit from it. I will leave
  // it here so others will not fall into my trap:
  //for {i <- Future(a); j <- Future(b) } yield (i,j)
  Future(a) zip Future(b)
}
parallelAsync(1 + 2, "a" + "b").onComplete {
  case Success(x) => println(x)
  case Failure(e) => e.printStackTrace()
}
If you must block until both are complete, you can use this:
def parallelSync[A, B](a: => A, b: => B): (A, B) = {
  // see the comment above
  //val f = for { i <- Future(a); j <- Future(b) } yield (i,j)
  val tuple = Future(a) zip Future(b)
  Await.result(tuple, 5.seconds)
}
println(parallelSync(3 + 4, "c" + "d"))
When running these little examples, don't forget to sleep a bit at the end so the program doesn't exit before the results come back:
Thread.sleep(3000)
I am trying to parallelise a p-norm calculation over an array.
To achieve that I tried the following. I understand I can solve this differently, but I am interested in understanding where the race condition occurs:
val toSum = Array(0, 1, 2, 3, 4, 5, 6)

// Calculate the sum over a segment of an array
def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
  val res = { for (i <- s until t) yield scala.math.pow(a(i), p) }.reduceLeft(_ + _)
  res.toInt
}

// Calculate the p-norm over an Array a
def parallelpNorm(a: Array[Int], p: Double): Double = {
  var acc = 0L
  // The worker who should calculate the sum over a slice of an array
  class sumSegmenter(s: Int, t: Int) extends Thread {
    override def run() {
      // Calculate the sum over the slice
      val subsum = sumSegment(a, p, s, t)
      // Add the sum of the slice to the accumulator in a synchronized fashion
      val x = new AnyRef {}
      x.synchronized {
        acc = acc + subsum
      }
    }
  }
  val split = a.size / 2
  val seg_one = new sumSegmenter(0, split)
  val seg_two = new sumSegmenter(split, a.size)
  seg_one.start()
  seg_two.start()
  seg_one.join()
  seg_two.join()
  scala.math.pow(acc, 1.0 / p)
}

println(parallelpNorm(toSum, 2))
The expected output is 9.5393920142, but some runs instead give me 9.273618495495704 or even 2.23606797749979.
Any recommendations on where the race condition could be happening?
The problem has been explained in the other answer (each thread synchronizes on its own freshly created AnyRef, so the lock protects nothing), but a better way to avoid this race condition and improve performance is to use an AtomicInteger:
import java.util.concurrent.atomic.AtomicInteger

// Calculate the p-norm over an Array a
def parallelpNorm(a: Array[Int], p: Double): Double = {
  val acc = new AtomicInteger(0)
  // The worker who should calculate the sum over a slice of an array
  class sumSegmenter(s: Int, t: Int) extends Thread {
    override def run() {
      // Calculate the sum over the slice
      val subsum = sumSegment(a, p, s, t)
      // Add the sum of the slice to the accumulator atomically
      acc.getAndAdd(subsum)
    }
  }
  val split = a.length / 2
  val seg_one = new sumSegmenter(0, split)
  val seg_two = new sumSegmenter(split, a.length)
  seg_one.start()
  seg_two.start()
  seg_one.join()
  seg_two.join()
  scala.math.pow(acc.get, 1.0 / p)
}
Modern processors can do atomic operations without blocking, which can be much faster than explicit synchronisation. In my tests this runs twice as fast as the original code (with the correct placement of x).
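One caveat: the accumulator in the question was a Long (var acc = 0L). If the partial sums can overflow an Int, java.util.concurrent.atomic.AtomicLong provides the same getAndAdd API for Long values.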
Move val x = new AnyRef{} outside sumSegmenter (that is, into parallelpNorm). The problem is that each thread is using its own mutex rather than sharing one.
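A minimal sketch of that fix (assuming sumSegment and the rest of the code stay exactly as in the question):

def parallelpNorm(a: Array[Int], p: Double): Double = {
  var acc = 0L
  val lock = new AnyRef // one lock shared by both workers
  class sumSegmenter(s: Int, t: Int) extends Thread {
    override def run(): Unit = {
      val subsum = sumSegment(a, p, s, t)
      // Both threads synchronize on the same object, so the
      // read-modify-write of acc is now mutually exclusive.
      lock.synchronized { acc = acc + subsum }
    }
  }
  val split = a.length / 2
  val seg_one = new sumSegmenter(0, split)
  val seg_two = new sumSegmenter(split, a.length)
  seg_one.start(); seg_two.start()
  seg_one.join(); seg_two.join()
  scala.math.pow(acc, 1.0 / p)
}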
Suppose I have the following nested for loop:
val test = mutableSetOf<Set<Int>>()
for (a in setA) {
    for (b in setB) {
        if (a.toString().slice(2..3) == b.toString().slice(0..1)) {
            test.add(setOf(a, b))
        }
    }
}
In Python, I could do a simple comprehension:
test = {(a, b) for a in setA for b in setB if str(a)[2:4] == str(b)[0:2]}
I'm having a helluva time converting this to Kotlin syntax. I know that for a single for loop with a conditional I could use filter and map (using the idiom newSet = oldSet.filter { conditional }.map { it }), but I can't for the life of me figure out how to do the nesting this way.
This is what IDEA proposes:
for (a in setA)
    setB
        .filter { a.toString().slice(2..3) == it.toString().slice(0..1) }
        .mapTo(test) { setOf(a, it) }
I do not think there is much to do about it. There is no native approach similar to the Python one, but the Kotlin version is actually already very similar in length; only the functions and their names make it longer.
If we take a look at this hypothetical example:
for (a in setA) setB.f { a.t().s(2..3) == it.t().s(0..1) }.m(test) { setOf(a, it) }
It is not far from the Python example; the Python syntax is just very different.
(The functions for that hypothetical example:)
fun <T> Iterable<T>.f(predicate: (T) -> Boolean) = filter(predicate)
fun String.s(range: IntRange) = slice(range)
fun <T, R, C : MutableCollection<in R>> Iterable<T>.m(destination: C, transform: (T) -> R) = mapTo(destination, transform)
fun Int.t() = toString()
If Kotlin doesn't have it, add it. Here is a cartesian product of the two sets as a sequence:
fun <F, S> Collection<F>.cartesian(other: Collection<S>): Sequence<Pair<F, S>> =
    this.asSequence().map { f -> other.asSequence().map { s -> f to s } }.flatten()
Then use that in one of many ways:
// close to your original nested loop version:
setA.cartesian(setB).filter { (a, b) ->
    a.toString().slice(2..3) == b.toString().slice(0..1)
}.forEach { (a, b) -> test.add(setOf(a, b)) }

// or, add the pair instead of a set if that makes sense as an alternative
setA.cartesian(setB).filter { (a, b) ->
    a.toString().slice(2..3) == b.toString().slice(0..1)
}.forEach { test2.add(it) }

// or, add the results of the full expression to the set at once
test.addAll(setA.cartesian(setB).filter { (a, b) ->
    a.toString().slice(2..3) == b.toString().slice(0..1)
}.map { (a, b) -> setOf(a, b) })

// or, the same as the last using a pair instead of a 2-member set
test2.addAll(setA.cartesian(setB).filter { (a, b) ->
    a.toString().slice(2..3) == b.toString().slice(0..1)
})
The above examples use these variables:
val test = mutableSetOf<Set<Int>>()
val test2 = mutableSetOf<Pair<Int,Int>>()
val setA = setOf<Int>()
val setB = setOf<Int>()
I have two lists like this:
def a = [100,200,300]
def b = [30,60,90]
I want a Groovier way of manipulating a like this:
1) The first element of a should be changed to a[0] - 2*b[0]
2) The second element of a should be changed to a[1] - 4*b[1]
3) The third element of a should be changed to a[2] - 8*b[2]
(provided that both a and b have the same length, 3)
If the list is changed to a map like this, let's say:
def a1 = [100:30, 200:60, 300:90]
how could one do the same operation in this case?
Thanks in advance.
For a List, I'd go with:
def result = []
a.eachWithIndex { item, index ->
    result << item - ((2**(index + 1)) * b[index])
}
For a Map it's a bit easier, but it still requires external state:
int i = 1
def result = a.collect { k, v -> k - ((2**i++) * v) }
It's a pity Groovy doesn't have an analog of zip for this case - something like zipWithIndex or collectWithIndex.
Using collect
In response to Victor in the comments, you can do this using collect:
def a = [100,200,300]
def b = [30,60,90]
// Introduce a list `c` of the multiplier
def c = (1..a.size()).collect { 2**it }
// Transpose these lists together, and calculate
[a, b, c].transpose().collect { x, y, z ->
    x - y * z
}
Using inject
You can also use inject, passing in a map that carries the multiplier and the result, then fetching the result out at the end:
def result = [a, b].transpose().inject([mult: 2, result: []]) { acc, vals ->
    acc.result << vals.with { av, bv -> av - (acc.mult * bv) }
    acc.mult *= 2
    acc
}.result
And similarly, you can use inject for the map:
def result = a1.inject([mult: 2, result: []]) { acc, key, val ->
    acc.result << key - (acc.mult * val)
    acc.mult *= 2
    acc
}.result
Using inject has the advantage that you don't need external variables declared, but the disadvantage of being harder to read (and, as Victor points out in the comments, it makes static analysis of the code hard to impossible for IDEs and groovypp).
def a1 = [100:30, 200:60, 300:90]
a1.eachWithIndex { item, index ->
    println item.key - ((2**(index + 1)) * item.value)
}