Hi, I wonder how to measure thread pool execution time in Scala.
Here is an example.
val pool = java.util.concurrent.Executors.newFixedThreadPool(2)
val start_time = System.nanoTime()
1 to 10 foreach { x =>
  pool.execute(new Runnable {
    def run(): Unit = {
      try {
        Thread.sleep(2000)
        println("n: %s, thread: %s".format(x, Thread.currentThread.getId))
      } finally {
        pool.shutdown()
      }
    }
  })
}
val end_time = System.nanoTime()
println("time is "+(end_time - start_time)/(1e6 * 60 * 60))
But I don't think this is working properly.
Is there a way to measure the time correctly?
There are several threads involved in your snippet:
the main thread, where you create the fixed thread pool and run the loop;
10 tasks (sharing the pool's 2 threads) that each sleep for 2 seconds and print some output.
As soon as the main thread has submitted the 10 tasks, it finishes its own work and prints the time. It does not wait for the parallel tasks to complete.
What you have to do is wait for the results of all tasks and only then calculate the total time.
I would suggest learning a bit about the concept of Future, which allows you to wait for results properly.
So your code might look like the following:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
import scala.concurrent.duration.Duration
val start_time = System.nanoTime()
val zz = 1 to 10 map { x =>
  Future {
    Thread.sleep(2000)
    println("n: %s, thread: %s".format(x, Thread.currentThread.getId))
  }
}
Await.result(Future.sequence(zz), Duration.Inf)
val end_time = System.nanoTime()
println("time is " + (end_time - start_time) / 1e6 + " ms") // nanoseconds / 1e6 = milliseconds
I've used the default global Scala thread pool here.
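If you'd rather keep your original fixed pool of 2 threads instead of the global one, you can wrap any ExecutorService in an ExecutionContext. A minimal sketch (the pool size and the 2000 ms sleep are just carried over from your example):
import java.util.concurrent.Executors
import scala.concurrent._
import scala.concurrent.duration.Duration

val pool = Executors.newFixedThreadPool(2)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

val start_time = System.nanoTime()
val tasks = 1 to 10 map { x =>
  Future {
    Thread.sleep(2000)
    println("n: %s, thread: %s".format(x, Thread.currentThread.getId))
  }
}
// Wait for every task before stopping the clock, then release the pool.
Await.result(Future.sequence(tasks), Duration.Inf)
println("time is " + (System.nanoTime() - start_time) / 1e6 + " ms")
pool.shutdown()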
Related
I'm trying to count to 200,000 using four threads: I want the first two threads to count up to 200_000, and to subtract back down after the first two have finished counting.
I know I can do it with join, but when I start using join it doesn't count correctly because of the add function (counter++).
It looks like this:
import java.util.concurrent.Semaphore
import kotlin.concurrent.thread

val s = Semaphore(1)
var count = 0

thread {
    repeat(100_000) {
        s.acquire()
        add()
        s.release()
        println("${Thread.currentThread().name} :$count")
    }
}
thread {
    repeat(100_000) {
        s.acquire()
        subtract()
        s.release()
        println("${Thread.currentThread().name} :$count")
    }
}

fun add() {
    count++
}

fun subtract() {
    count--
}
The add/subtract functions only modify the counter variable declared above the threads.
I am quite new to Scala, but I am learning about threads and multithreading.
As the title says, I am trying to implement a way to divide the work across a variable number of threads.
We are given this code:
/** Executes the provided function for each entry in the input sequence in parallel.
  *
  * @param input the input sequence
  * @param parallelism the number of threads to use
  * @param f the function to run
  */
def parallelForeach[A](input: IndexedSeq[A], parallelism: Int, f: A => Unit): Unit = ???
I tried implementing it like this:
def parallelForeach[A](input: IndexedSeq[A], parallelism: Int, f: A => Unit): Unit = {
  if (parallelism < 1) {
    throw new IllegalArgumentException("a degree of parallelism < 1 is not allowed for parallel foreach")
  }
  val threads = (0 until parallelism).map { threadId =>
    val startIndex = threadId * input.size / parallelism
    val endIndex = (threadId + 1) * input.size / parallelism
    val task: Runnable = () => {
      (startIndex until endIndex).foreach { A =>
        val key = input.grouped(input.size / parallelism)
        val x: Unit = input.foreach(A => f(A))
        x
      }
    }
    new Thread(task)
  }
  threads.foreach(_.start())
  threads.foreach(_.join())
}
for this test:
test("parallel foreach should perform the given function once for each element in the sequence") {
val counter = AtomicLong(0L)
parallelForeach((1 to 100), 16, counter.addAndGet(_))
assert(counter.get() == 5050)
But, as you can guess, it doesn't work this way as my result isn't 5050 but 505000.
Now here is my question. How do I implement a way to use multithreading efficiently, so there are for example 16 different threads working at the same time?
Check your test: "1 to 100".
With your code, every thread calls input.foreach(A => f(A)) — that is, it walks the entire 100-element sequence — once for each index in its slice. Across all threads that is 100 full passes, so the expected sum 5050 is accumulated 100 times, which is why your result is 100 times too large (505000). Each thread should apply f only to its own slice; see the sketch below.
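For reference, here is one way the slicing could look so that each thread applies f only to its own index range (a minimal sketch, not the only possible fix):
def parallelForeach[A](input: IndexedSeq[A], parallelism: Int, f: A => Unit): Unit = {
  if (parallelism < 1) {
    throw new IllegalArgumentException("a degree of parallelism < 1 is not allowed for parallel foreach")
  }
  val threads = (0 until parallelism).map { threadId =>
    // Each thread owns one contiguous slice of the index range.
    val startIndex = threadId * input.size / parallelism
    val endIndex = (threadId + 1) * input.size / parallelism
    val task: Runnable = () => (startIndex until endIndex).foreach(i => f(input(i)))
    new Thread(task)
  }
  threads.foreach(_.start())
  threads.foreach(_.join())
}
With this version every element is visited exactly once, so the test's counter ends at 5050.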
If I write code like this:
foreachRDD { rdd =>
  // operation1
  val before = time.now()
  val result = rdd.map(r => /* some operation */)
  val finalTime = time.now() - before
  // operation2
  val before2 = time.now()
  val result2 = result.map(r => /* some operation */)
  val finalTime2 = time.now() - before2
  ...
  // Some action
}
I think finalTime and finalTime2 are computed on the driver and give me the real time it takes to execute each of these operations. Am I right? Or where are these operations really executed?
I think you can use the time function, but it is only available from Spark 2.1.0 onward (you can add it manually for lower versions). Keep in mind that rdd.map is a lazy transformation: timing the map call itself only measures how long it takes to build the lineage on the driver, not to run it; the actual work happens on the executors when an action is triggered.
val spark = SparkSession
  .builder()
  .appName("Spark test")
  .master("local[*]")
  .getOrCreate()

val df = ???

spark.time(df.show()) // some block of operation here
You can see its definition here:
/**
 * Executes some code block and prints to stdout the time taken to execute the block. This is
 * available in Scala only and is used primarily for interactive testing and debugging.
 *
 * @since 2.1.0
 */
def time[T](f: => T): T = {
  val start = System.nanoTime()
  val ret = f
  val end = System.nanoTime()
  // scalastyle:off println
  println(s"Time taken: ${(end - start) / 1000 / 1000} ms")
  // scalastyle:on println
  ret
}
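For Spark versions before 2.1.0 you can paste that helper into your own code. Either way, be careful what you wrap: transformations such as map are lazy and return almost immediately, so always time an action. A small usage sketch, assuming the df from above:
// Wrapping only a transformation would measure plan construction, not execution.
val n = spark.time {
  df.count() // count() is an action, so the timed block includes the real work
}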
I used Future to implement a multi-threaded function in Scala. But when the number of futures is larger than the number of CPU cores, the threads are split into groups: the threads in one group complete, and only then do the threads in the other groups start. My code and output are listed below. Is there something wrong with my code, and how can I fix it?
import scala.collection.mutable._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
import scala.concurrent.duration._
import scala.language.postfixOps

object FutrueTest {
  def main(args: Array[String]) {
    val threads = 10
    def ft(): Future[String] = Future {
      for (i <- 1 until 3) {
        Thread.sleep(1000)
        println(Thread.currentThread().getName + "\t" + i)
      }
      Thread.currentThread().getName + " end..."
    }
    var fs = Set[Future[String]]()
    for (j <- 1 until threads) {
      val f = ft
      f.onComplete {
        case _ => "Thread :" + j + " complete"
      }
      fs += f
    }
    fs.foreach(f => {
      Await.ready(f, Duration.Inf)
    })
  }
}
Output in the terminal:
ForkJoinPool-1-worker-13 1
ForkJoinPool-1-worker-15 1
ForkJoinPool-1-worker-11 1
ForkJoinPool-1-worker-1 1
ForkJoinPool-1-worker-3 1
ForkJoinPool-1-worker-7 1
ForkJoinPool-1-worker-9 1
ForkJoinPool-1-worker-5 1
ForkJoinPool-1-worker-1 2
ForkJoinPool-1-worker-15 2
ForkJoinPool-1-worker-9 2
ForkJoinPool-1-worker-3 2
ForkJoinPool-1-worker-11 2
ForkJoinPool-1-worker-13 2
ForkJoinPool-1-worker-7 2
ForkJoinPool-1-worker-5 2
ForkJoinPool-1-worker-15 1
ForkJoinPool-1-worker-15 2
Process finished with exit code 0
You can create your own execution context.
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

object CustomExecutionContext {
  private val availableProcessors = Runtime.getRuntime.availableProcessors()
  implicit val nDExecutionContext: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(availableProcessors * N)) // N is the number of threads per core
}
Another solution: control the number of futures executing in parallel using a FixedThreadPool. It will start 10 futures first and start the others as those complete.
implicit val ec = ExecutionContext.fromExecutor(java.util.concurrent.Executors.newFixedThreadPool(10))
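With that implicit in scope, at most 10 futures run at any moment; the rest queue up behind them. A quick self-contained sketch of the behavior (the task count and sleep duration are arbitrary):
import java.util.concurrent.Executors
import scala.concurrent._
import scala.concurrent.duration.Duration

implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))

val fs = (1 to 30).map { i =>
  Future {
    Thread.sleep(1000)
    println(s"task $i on ${Thread.currentThread().getName}")
  }
}
// Only 10 tasks run concurrently, so total runtime is roughly 3 seconds.
Await.result(Future.sequence(fs), Duration.Inf)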
Third solution: you can use a throttled execution context.
Use this instead of the global execution context:
implicit val ec = ThrottledExecutionContext(maxConcurrents = 4)(scala.concurrent.ExecutionContext.global)
It will limit the parallelism.
Fourth solution: you can use Akka FSM to throttle.
I am trying to run some code 1 million times. I initially wrote it using threads, but this seemed clunky. I started doing some more reading and came across ForkJoin. This seemed like exactly what I needed, but I can't figure out how to translate what I have below into "Scala style". Can someone explain the best way to use ForkJoin in my code?
val l = (1 to 1000000) map { _.toLong }
println("running......be patient")
l.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x)
  val thread = new Thread {
    override def run {
      // my code (API calls) here. writes to file if call success
    }
  }
}
The easiest way is to use par (it will use ForkJoinPool automatically):
val l = (1 to 1000000).map(_.toLong).toList

l.par.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x) // will be executed in parallel
  // your code (API calls) here. It will also run in parallel (in the same thread as the println above)
}
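If you need to control how many worker threads par uses, you can replace the default task support (a sketch assuming Scala 2.12+, where ForkJoinTaskSupport takes a java.util.concurrent.ForkJoinPool):
import scala.collection.parallel.ForkJoinTaskSupport
import java.util.concurrent.ForkJoinPool

val parList = l.par
// Cap the parallel collection at 16 worker threads.
parList.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(16))
parList.foreach { x =>
  // your code (API calls) here
}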
Another way is to use Future:
import scala.concurrent._
import ExecutionContext.Implicits.global // brings in the default ForkJoinPool-backed context

val l = (1 to 1000000) map { _.toLong }
println("running......be patient")
l.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x)
  Future {
    // your code (API calls) here. writes to file if call success
  }
}
If you need work stealing, you should mark blocking code with scala.concurrent.blocking:
Future {
  scala.concurrent.blocking {
    // blocking API call here
  }
}
It tells the ForkJoinPool to compensate for the blocked thread by spawning a new one, so you can avoid thread starvation (but this has some disadvantages too).
In Scala, you can use Future (and Promise, though the example below only needs Future):
val l = (1 to 1000000) map { _.toLong }
println("running......be patient")
l.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x)
  Future {
    println(x)
  }
}
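Note that with this approach the main thread does not wait for the futures, so a non-interactive program may exit before they finish. If you need to block until everything completes, collect the futures and await them, e.g.:
import scala.concurrent._
import scala.concurrent.duration.Duration
import ExecutionContext.Implicits.global

val futures = l.map { x =>
  Future {
    println(x)
  }
}
// Block until every future has finished before the program exits.
Await.result(Future.sequence(futures), Duration.Inf)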