Scala: thread is unable to access variable outside of thread? - multithreading

object Test extends App {
val x = 0
val threadHello = (1 to 5).map(_ => new Thread(() => {
println("Hello")
println(x) // this results in join never resolving "collecting data"
}))
threadHello.foreach(_.start())
threadHello.foreach(_.join())
println(x)
}
I'm still learning about concurrency in Scala but I'm having an issue where thread.join() never resolves and the program ends up running forever, UNLESS I comment out the println(x) statement.
Debugging reveals that the thread is never able to access the value of x, and I'm not sure why this is an issue.
The problem highlighted when debugging in IntelliJ

Your code actually runs just fine for me, in Scala 2.13.
That said, I suspect your problem has to do with the initialization order of vals in scala.App. If so, you should be able to work around it by making x lazy, like so:
object Test extends App {
lazy val x = 0
val threadHello = (1 to 5).map(_ => new Thread(() => {
println("Hello")
println(x) // this results in join never resolving "collecting data"
}))
threadHello.foreach(_.start())
threadHello.foreach(_.join())
println(x)
}
Alternatively, just don't use scala.App:
object Main {
def main(args: Array[String]) {
val x = 0
val threadHello = (1 to 5).map(_ => new Thread(() => {
println("Hello")
println(x) // this results in join never resolving "collecting data"
}))
threadHello.foreach(_.start())
threadHello.foreach(_.join())
println(x)
}
}

Related

Count written records in Spark with threads

I am using onTaskEnd Spark listener to get the number of records written into file like this:
import spark.implicits._
import org.apache.spark.sql._
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
var recordsWritten: Long = 0L
val rowCountListener: SparkListener = new SparkListener() {
override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
synchronized {
recordsWritten += taskEnd.taskMetrics.outputMetrics.recordsWritten
}
}
}
def rowCountOf(proc: => Unit): Long = {
recordsWritten = 0L
spark.sparkContext.addSparkListener(rowCountListener)
try {
proc
} finally {
spark.sparkContext.removeSparkListener(rowCountListener)
}
recordsWritten
}
val rc = rowCountOf { (1 to 100).toDF.write.csv(s"test.csv") }
println(rc)
=> 100
However trying to run multiple actions in threads this obviously breaks:
Seq(1, 2, 3).par.foreach { i =>
val rc = rowCountOf { (1 to 100).toDF.write.csv(s"test${i}.csv") }
println(rc)
}
=> 600
=> 700
=> 750
I can have each thread declare its own variable, but spark context is still shared and I am unable to reckognize to which thread does specific SparkListenerTaskEnd event belong to. Is there any way to make it work?
(Right, maybe I could just make it separate spark jobs. But it's just a single piece of the program, so for the sake of simplicity I would prefer to stay with threads. In the worst case I'll just execute it serially or forget about counting records...)
A bit hackish but you could use accumulators as a filtering side-effect
val acc = spark.sparkContext.longAccumulator("write count")
df.filter { _ =>
acc.add(1)
true
}.write.csv(...)
println(s"rows written ${acc.count}")

What's the difference between the output of the following concurrent Scala programs

// First
import concurrent.Future
import concurrent.ExecutionContext.Implicits.global
for {
_ <- Future { Thread.sleep(3000); println("a") }
_ <- Future { Thread.sleep(2000); println("b") }
_ <- Future { Thread.sleep(1000); println("c") }
} {}
// Second

import concurrent.Future
import concurrent.ExecutionContext.Implicits.global
val future1 = Future { Thread.sleep(3000); println("a") }
val future2 = Future { Thread.sleep(2000); println("b") }
val future3 = Future { Thread.sleep(1000); println("c") }
for {
_ <- future1
_ <- future2
_ <- future3
} {}

You can use the command below to have a better understanding of what Scala compiler do under the hood:
$ scalac -Xprint:typer MainClass.scala
'First' will be desugared into:
scala.concurrent.Future.apply[Unit]({
java.lang.Thread.sleep(3000L); scala.Predef.println("a")
})(scala.concurrent.ExecutionContext.Implicits.global)
.foreach[Unit](((_: Unit) =>
scala.concurrent.Future.apply[Unit]({
java.lang.Thread.sleep(2000L); scala.Predef.println("b")
})(scala.concurrent.ExecutionContext.Implicits.global)
.foreach[Unit](((_: Unit) => scala.concurrent.Future.apply[Unit]({
java.lang.Thread.sleep(1000L); scala.Predef.println("c")
})(scala.concurrent.ExecutionContext.Implicits.global)
.foreach[Unit](((_: Unit) => ()))(scala.concurrent.ExecutionContext.Implicits.global)))(scala.concurrent.ExecutionContext.Implicits.global)))
(scala.concurrent.ExecutionContext.Implicits.global);
'Second' into
val future1: scala.concurrent.Future[Unit] = scala.concurrent.Future.apply[Unit]({
java.lang.Thread.sleep(3000L);
scala.Predef.println("a")
})(scala.concurrent.ExecutionContext.Implicits.global);
val future2: scala.concurrent.Future[Unit] = scala.concurrent.Future.apply[Unit]({
java.lang.Thread.sleep(2000L);
scala.Predef.println("b")
})(scala.concurrent.ExecutionContext.Implicits.global);
val future3: scala.concurrent.Future[Unit] = scala.concurrent.Future.apply[Unit]({
java.lang.Thread.sleep(1000L);
scala.Predef.println("c")
})(scala.concurrent.ExecutionContext.Implicits.global);
{
future1.flatMap[Unit](((_: Unit) => future2.flatMap[Unit](((_: Unit) => future3.map[Unit](((_: Unit) => ()))(scala.concurrent.ExecutionContext.Implicits.global)))(scala.concurrent.ExecutionContext.Implicits.global)))(scala.concurrent.ExecutionContext.Implicits.global);
()
}
In 'First' case the next Future will be created inside the '.foreach' of the first Future and so on.
In 'Second' case all 3 futures will be created first, executed in parallel and then flatMap'ped.
Since the for expression is desugared to a series of nested flatMap/map calls, the Future instances in the first examples will run sequentially.
While the code in the second example will, depending on the ExecutionContext in scope, run the Future instances in parallel.
Another thing to keep in mind is that scala Futures are strict, which means that you can't separate Future definition from its execution. You can read about this and other Future weaknesses here:
Scala Futures vs Monix Tasks

how to cap kotlin coroutines maximum concurrency

I've got a Sequence (from File.walkTopDown) and I need to run a long-running operation on each of them. I'd like to use Kotlin best practices / coroutines, but I either get no parallelism, or way too much parallelism and hit a "too many open files" IO error.
File("/Users/me/Pictures/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.filter { ... only big images... }
.map { file ->
async { // I *think* I want async and not "launch"...
ImageProcessor.fromFile(file)
}
}
This doesn't seem to run it in parallel, and my multi-core CPU never goes above 1 CPU's worth. Is there a way with coroutines to run "NumberOfCores parallel operations" worth of Deferred jobs?
I looked at Multithreading using Kotlin Coroutines which first creates ALL the jobs then joins them, but that means completing the Sequence/file tree walk completly bfore the heavy processing join step, and that seems... iffy! Splitting it into a collect and a process step means the collection could run way ahead of the processing.
val jobs = ... the Sequence above...
.toSet()
println("Found ${jobs.size}")
jobs.forEach { it.await() }
This isn't specific to your problem, but it does answer the question of, "how to cap kotlin coroutines maximum concurrency".
EDIT: As of kotlinx.coroutines 1.6.0 (https://github.com/Kotlin/kotlinx.coroutines/issues/2919), you can use limitedParallelism, e.g. Dispatchers.IO.limitedParallelism(123).
Old solution: I thought to use newFixedThreadPoolContext at first, but 1) it's deprecated and 2) it would use threads and I don't think that's necessary or desirable (same with Executors.newFixedThreadPool().asCoroutineDispatcher()). This solution might have flaws I'm not aware of by using Semaphore, but it's very simple:
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
/**
* Maps the inputs using [transform] at most [maxConcurrency] at a time until all Jobs are done.
*/
suspend fun <TInput, TOutput> Iterable<TInput>.mapConcurrently(
maxConcurrency: Int,
transform: suspend (TInput) -> TOutput,
) = coroutineScope {
val gate = Semaphore(maxConcurrency)
this#mapConcurrently.map {
async {
gate.withPermit {
transform(it)
}
}
}.awaitAll()
}
Tests (apologies, it uses Spek, hamcrest, and kotlin test):
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.test.TestCoroutineDispatcher
import org.hamcrest.MatcherAssert.assertThat
import org.hamcrest.Matchers.greaterThanOrEqualTo
import org.hamcrest.Matchers.lessThanOrEqualTo
import org.spekframework.spek2.Spek
import org.spekframework.spek2.style.specification.describe
import java.util.concurrent.atomic.AtomicInteger
import kotlin.test.assertEquals
#OptIn(ExperimentalCoroutinesApi::class)
object AsyncHelpersKtTest : Spek({
val actionDelay: Long = 1_000 // arbitrary; obvious if non-test dispatcher is used on accident
val testDispatcher = TestCoroutineDispatcher()
afterEachTest {
// Clean up the TestCoroutineDispatcher to make sure no other work is running.
testDispatcher.cleanupTestCoroutines()
}
describe("mapConcurrently") {
it("should run all inputs concurrently if maxConcurrency >= size") {
val concurrentJobCounter = AtomicInteger(0)
val inputs = IntRange(1, 2).toList()
val maxConcurrency = inputs.size
// https://github.com/Kotlin/kotlinx.coroutines/issues/1266 has useful info & examples
runBlocking(testDispatcher) {
print("start runBlocking $coroutineContext\n")
// We have to run this async so that the code afterwards can advance the virtual clock
val job = launch {
testDispatcher.pauseDispatcher {
val result = inputs.mapConcurrently(maxConcurrency) {
print("action $it $coroutineContext\n")
// Sanity check that we never run more in parallel than max
assertThat(concurrentJobCounter.addAndGet(1), lessThanOrEqualTo(maxConcurrency))
// Allow for virtual clock adjustment
delay(actionDelay)
// Sanity check that we never run more in parallel than max
assertThat(concurrentJobCounter.getAndAdd(-1), lessThanOrEqualTo(maxConcurrency))
print("action $it after delay $coroutineContext\n")
it
}
// Order is not guaranteed, thus a Set
assertEquals(inputs.toSet(), result.toSet())
print("end mapConcurrently $coroutineContext\n")
}
}
print("before advanceTime $coroutineContext\n")
// Start the coroutines
testDispatcher.advanceTimeBy(0)
assertEquals(inputs.size, concurrentJobCounter.get(), "All jobs should have been started")
testDispatcher.advanceTimeBy(actionDelay)
print("after advanceTime $coroutineContext\n")
assertEquals(0, concurrentJobCounter.get(), "All jobs should have finished")
job.join()
}
}
it("should run one at a time if maxConcurrency = 1") {
val concurrentJobCounter = AtomicInteger(0)
val inputs = IntRange(1, 2).toList()
val maxConcurrency = 1
runBlocking(testDispatcher) {
val job = launch {
testDispatcher.pauseDispatcher {
inputs.mapConcurrently(maxConcurrency) {
assertThat(concurrentJobCounter.addAndGet(1), lessThanOrEqualTo(maxConcurrency))
delay(actionDelay)
assertThat(concurrentJobCounter.getAndAdd(-1), lessThanOrEqualTo(maxConcurrency))
it
}
}
}
testDispatcher.advanceTimeBy(0)
assertEquals(1, concurrentJobCounter.get(), "Only one job should have started")
val elapsedTime = testDispatcher.advanceUntilIdle()
print("elapsedTime=$elapsedTime")
assertThat(
"Virtual time should be at least as long as if all jobs ran sequentially",
elapsedTime,
greaterThanOrEqualTo(actionDelay * inputs.size)
)
job.join()
}
}
it("should handle cancellation") {
val jobCounter = AtomicInteger(0)
val inputs = IntRange(1, 2).toList()
val maxConcurrency = 1
runBlocking(testDispatcher) {
val job = launch {
testDispatcher.pauseDispatcher {
inputs.mapConcurrently(maxConcurrency) {
jobCounter.addAndGet(1)
delay(actionDelay)
it
}
}
}
testDispatcher.advanceTimeBy(0)
assertEquals(1, jobCounter.get(), "Only one job should have started")
job.cancel()
testDispatcher.advanceUntilIdle()
assertEquals(1, jobCounter.get(), "Only one job should have run")
job.join()
}
}
}
})
Per https://play.kotlinlang.org/hands-on/Introduction%20to%20Coroutines%20and%20Channels/09_Testing, you may also need to adjust compiler args for the tests to run:
compileTestKotlin {
kotlinOptions {
// Needed for runBlocking test coroutine dispatcher?
freeCompilerArgs += "-Xuse-experimental=kotlin.Experimental"
freeCompilerArgs += "-Xopt-in=kotlin.RequiresOptIn"
}
}
testImplementation 'org.jetbrains.kotlinx:kotlinx-coroutines-test:1.4.1'
The problem with your first snippet is that it doesn't run at all - remember, Sequence is lazy, and you have to use a terminal operation such as toSet() or forEach(). Additionally, you need to limit the number of threads that can be used for that task via constructing a newFixedThreadPoolContext context and using it in async:
val pictureContext = newFixedThreadPoolContext(nThreads = 10, name = "reading pictures in parallel")
File("/Users/me/Pictures/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.filter { ... only big images... }
.map { file ->
async(pictureContext) {
ImageProcessor.fromFile(file)
}
}
.toList()
.forEach { it.await() }
Edit:
You have to use a terminal operator (toList) befor awaiting the results
I got it working with a Channel. But maybe I'm being redundant with your way?
val pipe = ArrayChannel<Deferred<ImageFile>>(20)
launch {
while (!(pipe.isEmpty && pipe.isClosedForSend)) {
imageFiles.add(pipe.receive().await())
}
println("pipe closed")
}
File("/Users/me/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.forEach { pipe.send(async { ImageFile.fromFile(it) }) }
pipe.close()
This doesn't preserve the order of the projection but otherwise limits the throughput to at most maxDegreeOfParallelism. Expand and extend as you see fit.
suspend fun <TInput, TOutput> (Collection<TInput>).inParallel(
maxDegreeOfParallelism: Int,
action: suspend CoroutineScope.(input: TInput) -> TOutput
): Iterable<TOutput> = coroutineScope {
val list = this#inParallel
if (list.isEmpty())
return#coroutineScope listOf<TOutput>()
val brake = Channel<Unit>(maxDegreeOfParallelism)
val output = Channel<TOutput>()
val counter = AtomicInteger(0)
this.launch {
repeat(maxDegreeOfParallelism) {
brake.send(Unit)
}
for (input in list) {
val task = this.async {
action(input)
}
this.launch {
val result = task.await()
output.send(result)
val completed = counter.incrementAndGet()
if (completed == list.size) {
output.close()
} else brake.send(Unit)
}
brake.receive()
}
}
val results = mutableListOf<TOutput>()
for (item in output) {
results.add(item)
}
return#coroutineScope results
}
Example usage:
val output = listOf(1, 2, 3).inParallel(2) {
it + 1
} // Note that output may not be in same order as list.
Why not use the asFlow() operator and then use flatMapMerge?
someCoroutineScope.launch(Dispatchers.Default) {
File("/Users/me/Pictures/").walkTopDown()
.asFlow()
.filter { ... only big images... }
.flatMapMerge(concurrencyLimit) { file ->
flow {
emit(runInterruptable { ImageProcessor.fromFile(file) })
}
}.catch { ... }
.collect()
}
Then you can limit the simultaneous open files while still processing them concurrently.
To limit the parallelism to some value there is limitedParallelism function starting from the 1.6.0 version of the kotlinx.coroutines library. It can be called on CoroutineDispatcher object. So to limit threads for parallel execution we can write something like:
val parallelismLimit = Runtime.getRuntime().availableProcessors()
val limitedDispatcher = Dispatchers.Default.limitedParallelism(parallelismLimit)
val scope = CoroutineScope(limitedDispatcher) // we can set limitedDispatcher for the whole scope
scope.launch { // or we can set limitedDispatcher for a coroutine launch(limitedDispatcher)
File("/Users/me/Pictures/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.filter { ... only big images... }
.map { file ->
async {
ImageProcessor.fromFile(file)
}
}.toList().awaitAll()
}
ImageProcessor.fromFile(file) will be executed in parallel using parallelismLimit number of threads.
This will cap coroutines to workers. I'd recommend watching https://www.youtube.com/watch?v=3WGM-_MnPQA
package com.example.workers
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce
import kotlin.system.measureTimeMillis
class ChannellibgradleApplication
fun main(args: Array<String>) {
var myList = mutableListOf<Int>(3000,1200,1400,3000,1200,1400,3000)
runBlocking {
var myChannel = produce(CoroutineName("MyInts")) {
myList.forEach { send(it) }
}
println("Starting coroutineScope ")
var time = measureTimeMillis {
coroutineScope {
var workers = 2
repeat(workers)
{
launch(CoroutineName("Sleep 1")) { theHardWork(myChannel) }
}
}
}
println("Ending coroutineScope $time ms")
}
}
suspend fun theHardWork(channel : ReceiveChannel<Int>)
{
for(m in channel) {
println("Starting Sleep $m")
delay(m.toLong())
println("Ending Sleep $m")
}
}

Nicifying execution contex's thread pool's output for logging/debuging in scala

Is there is nice way to rename a pool in/for an executon context to produce nicer output in logs/wile debugging. Not to be look like ForkJoinPool-2-worker-7 (because ~2 tells nothing about pool's purose in app) but WorkForkJoinPool-2-worker-7.. wihout creating new WorkForkJoinPool class for it?
Example:
object LogSample extends App {
val ex1 = ExecutionContext.global
val ex2 = ExecutionContext.fromExecutor(null:Executor) // another global ex context
val system = ActorSystem("system")
val log = Logging(system.eventStream, "my.nice.string")
Future {
log.info("1")
}(ex1)
Future {
log.info("2")
}(ex2)
Thread.sleep(1000)
// output, like this:
/*
[INFO] [09/14/2015 21:53:34.897] [ForkJoinPool-2-worker-7] [my.nice.string] 2
[INFO] [09/14/2015 21:53:34.897] [ForkJoinPool-1-worker-7] [my.nice.string] 1
*/
}
You need to implement custom thread factory, something like this:
class CustomThreadFactory(prefix: String) extends ForkJoinPool.ForkJoinWorkerThreadFactory {
def newThread(fjp: ForkJoinPool): ForkJoinWorkerThread = {
val thread = new ForkJoinWorkerThread(fjp) {}
thread.setName(prefix + "-" + thread.getName)
thread
}
}
val threadFactory = new CustomThreadFactory("custom prefix here")
val uncaughtExceptionHandler = new UncaughtExceptionHandler {
override def uncaughtException(t: Thread, e: Throwable) = e.printStackTrace()
}
val executor = new ForkJoinPool(10, threadFactory, uncaughtExceptionHandler, true)
val ex2 = ExecutionContext.fromExecutor(executor) // another global ex context
val system = ActorSystem("system")
val log = Logging(system.eventStream, "my.nice.string")
Future {
log.info("2") //[INFO] [09/15/2015 18:22:43.728] [custom prefix here-ForkJoinPool-1-worker-29] [my.nice.string] 2
}(ex2)
Thread.sleep(1000)
Ok. Seems this is not possible (particulary for default global iml) due to current scala ExecutonContext implementation.
What I could do is just copy that impl and replace:
class DefaultThreadFactory(daemonic: Boolean) ... {
def wire[T <: Thread](thread: T): T = {
thread.setName("My" + thread.getId) // ! add this one (make 'My' to be variable)
thread.setDaemon(daemonic)
thread.setUncaughtExceptionHandler(uncaughtExceptionHandler)
thread
}...
because threadFactory there
val threadFactory = new DefaultThreadFactory(daemonic = true)
is harcoded ...
(seems Vladimir Petrosyan was first showing nicer way :) )

spark-streaming and connection pool implementation

The spark-streaming website at https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams mentions the following code:
dstream.foreachRDD { rdd =>
rdd.foreachPartition { partitionOfRecords =>
// ConnectionPool is a static, lazily initialized pool of connections
val connection = ConnectionPool.getConnection()
partitionOfRecords.foreach(record => connection.send(record))
ConnectionPool.returnConnection(connection) // return to the pool for future reuse
}
}
I have tried to implement this using org.apache.commons.pool2 but running the application fails with the expected java.io.NotSerializableException:
15/05/26 08:06:21 ERROR OneForOneStrategy: org.apache.commons.pool2.impl.GenericObjectPool
java.io.NotSerializableException: org.apache.commons.pool2.impl.GenericObjectPool
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
...
I am wondering how realistic it is to implement a connection pool that is serializable. Has anyone succeeded in doing this ?
Thank you.
To address this "local resource" problem what's needed is a singleton object - i.e. an object that's warranted to be instantiated once and only once in the JVM. Luckily, Scala object provides this functionality out of the box.
The second thing to consider is that this singleton will provide a service to all tasks running on the same JVM where it's hosted, so, it MUST take care of concurrency and resource management.
Let's try to sketch(*) such service:
class ManagedSocket(private val pool: ObjectPool, val socket:Socket) {
def release() = pool.returnObject(socket)
}
// singleton object
object SocketPool {
var hostPortPool:Map[(String, Int),ObjectPool] = Map()
sys.addShutdownHook{
hostPortPool.values.foreach{ // terminate each pool }
}
// factory method
def apply(host:String, port:String): ManagedSocket = {
val pool = hostPortPool.getOrElse{(host,port), {
val p = ??? // create new pool for (host, port)
hostPortPool += (host,port) -> p
p
}
new ManagedSocket(pool, pool.borrowObject)
}
}
Then usage becomes:
val host = ???
val port = ???
stream.foreachRDD { rdd =>
rdd.foreachPartition { partition =>
val mSocket = SocketPool(host, port)
partition.foreach{elem =>
val os = mSocket.socket.getOutputStream()
// do stuff with os + elem
}
mSocket.release()
}
}
I'm assuming that the GenericObjectPool used in the question is taking care of concurrency. Otherwise, access to each pool instance need to be guarded with some form of synchronization.
(*) code provided to illustrate the idea on how to design such object - needs additional effort to be converted into a working version.
Below answer is wrong!
I'm leaving the answer here for reference, but the answer is wrong for the following reason. socketPool is declared as a lazy val so it will get instantiated with each first request for access. Since the SocketPool case class is not Serializable, this means that it will get instantiated within each partition. Which makes the connection pool useless because we want to keep connections across partitions and RDDs. It makes no difference wether this is implemented as a companion object or as a case class. Bottom line is: the connection pool must be Serializable, and apache commons pool is not.
import java.io.PrintStream
import java.net.Socket
import org.apache.commons.pool2.{PooledObject, BasePooledObjectFactory}
import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool}
import org.apache.spark.streaming.dstream.DStream
/**
* Publish a Spark stream to a socket.
*/
class PooledSocketStreamPublisher[T](host: String, port: Int)
extends Serializable {
lazy val socketPool = SocketPool(host, port)
/**
* Publish the stream to a socket.
*/
def publishStream(stream: DStream[T], callback: (T) => String) = {
stream.foreachRDD { rdd =>
rdd.foreachPartition { partition =>
val socket = socketPool.getSocket
val out = new PrintStream(socket.getOutputStream)
partition.foreach { event =>
val text : String = callback(event)
out.println(text)
out.flush()
}
out.close()
socketPool.returnSocket(socket)
}
}
}
}
class SocketFactory(host: String, port: Int) extends BasePooledObjectFactory[Socket] {
def create(): Socket = {
new Socket(host, port)
}
def wrap(socket: Socket): PooledObject[Socket] = {
new DefaultPooledObject[Socket](socket)
}
}
case class SocketPool(host: String, port: Int) {
val socketPool = new GenericObjectPool[Socket](new SocketFactory(host, port))
def getSocket: Socket = {
socketPool.borrowObject
}
def returnSocket(socket: Socket) = {
socketPool.returnObject(socket)
}
}
which you can invoke as follows:
val socketStreamPublisher = new PooledSocketStreamPublisher[MyEvent](host = "10.10.30.101", port = 29009)
socketStreamPublisher.publishStream(myEventStream, (e: MyEvent) => Json.stringify(Json.toJson(e)))

Resources