Kotlin to achieve multithread request hedging? - multithreading

Spring's reactor has an interesting feature : Hedging . It means spawning many requests and get the first returned result , and automatically clean other contexts. Josh Long recently has been actively promoting this feature. Googling Spring reactor hedging shows relative results. If anybody is curious , here is the sample code . In short , Flux.first() simplifies all the underlaying hassles , which is very impressive.
I wonder how this can be achieved with Kotlin's coroutine and multithread , (and maybe with Flow or Channel ) . I thought of a simple scenario : One service accepts longUrl and spawns the longUrl to many URL shorten service ( such as IsGd , TinyUrl ...) , and returns the first returned URL ... (and terminates / cleans other thread / coroutine resources)
There is an interface UrlShorter that defines this work :
interface UrlShorter {
fun getShortUrl(longUrl: String): String?
}
And there are three implementations , one for is.gd , another for tinyUrl , and the third is a Dumb implementation that blocks 10 seconds and return null :
class IsgdImpl : UrlShorter {
override fun getShortUrl(longUrl: String): String? {
logger.info("running : {}", Thread.currentThread().name)
// isGd api url blocked by SO , it sucks . see the underlaying gist for full code
val url = "https://is.gd/_create.php?format=simple&url=%s".format(URLEncoder.encode(longUrl, "UTF-8"))
return Request.Get(url).execute().returnContent().asString().also {
logger.info("returning {}", it)
}
}
}
class TinyImpl : UrlShorter {
override fun getShortUrl(longUrl: String): String? {
logger.info("running : {}", Thread.currentThread().name)
val url = "http://tinyurl.com/_api-create.php?url=$longUrl" // sorry the URL is blocked by stackoverflow , see the underlaying gist for full code
return Request.Get(url).execute().returnContent().asString().also {
logger.info("returning {}", it)
}
}
}
class DumbImpl : UrlShorter {
override fun getShortUrl(longUrl: String): String? {
logger.info("running : {}", Thread.currentThread().name)
TimeUnit.SECONDS.sleep(10)
return null
}
}
And there is a UrlShorterService that takes all the UrlShorter implementations , and try to spawn coroutines and get the first result .
Here is what I've thought of :
#ExperimentalCoroutinesApi
#FlowPreview
class UrlShorterService(private val impls: List<UrlShorter>) {
private val es: ExecutorService = Executors.newFixedThreadPool(impls.size)
private val esDispatcher = es.asCoroutineDispatcher()
suspend fun getShortUrl(longUrl: String): String {
return method1(longUrl) // there are other methods , with different ways...
}
private inline fun <T, R : Any> Iterable<T>.firstNotNullResult(transform: (T) -> R?): R? {
for (element in this) {
val result = transform(element)
if (result != null) return result
}
return null
}
The client side is simple too :
#ExperimentalCoroutinesApi
#FlowPreview
class UrlShorterServiceTest {
#Test
fun testHedging() {
val impls = listOf(DumbImpl(), IsgdImpl(), TinyImpl()) // Dumb first
val service = UrlShorterService(impls)
runBlocking {
service.getShortUrl("https://www.google.com").also {
logger.info("result = {}", it)
}
}
}
}
Notice I put the DumbImpl first , because I hope it may spawn first and blocking in its thread. And other two implementations can get result.
OK , here is the problem , how to achieve hedging in kotlin ? I try the following methods :
private suspend fun method1(longUrl: String): String {
return impls.asSequence().asFlow().flatMapMerge(impls.size) { impl ->
flow {
impl.getShortUrl(longUrl)?.also {
emit(it)
}
}.flowOn(esDispatcher)
}.first()
.also { esDispatcher.cancelChildren() } // doesn't impact the result
}
I hope method1 should work , but it totally executes 10 seconds :
00:56:09,253 INFO TinyImpl - running : pool-1-thread-3
00:56:09,254 INFO DumbImpl - running : pool-1-thread-1
00:56:09,253 INFO IsgdImpl - running : pool-1-thread-2
00:56:11,150 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
00:56:13,604 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
00:56:19,261 INFO UrlShorterServiceTest$testHedging$1 - result = // tiny url blocked by SO , it sucks
Then , I thought other method2 , method3 , method4 , method5 ... but all not work :
/**
* 00:54:29,035 INFO IsgdImpl - running : pool-1-thread-3
* 00:54:29,036 INFO DumbImpl - running : pool-1-thread-2
* 00:54:29,035 INFO TinyImpl - running : pool-1-thread-1
* 00:54:30,228 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 00:54:30,797 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
* 00:54:39,046 INFO UrlShorterServiceTest$testHedging$1 - result = // idGd url blocked by SO , it sucks
*/
private suspend fun method2(longUrl: String): String {
return withContext(esDispatcher) {
impls.map { impl ->
async(esDispatcher) {
impl.getShortUrl(longUrl)
}
}.firstNotNullResult { it.await() } ?: longUrl
}
}
/**
* 00:52:30,681 INFO IsgdImpl - running : pool-1-thread-2
* 00:52:30,682 INFO DumbImpl - running : pool-1-thread-1
* 00:52:30,681 INFO TinyImpl - running : pool-1-thread-3
* 00:52:31,838 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 00:52:33,721 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
* 00:52:40,691 INFO UrlShorterServiceTest$testHedging$1 - result = // idGd url blocked by SO , it sucks
*/
private suspend fun method3(longUrl: String): String {
return coroutineScope {
impls.map { impl ->
async(esDispatcher) {
impl.getShortUrl(longUrl)
}
}.firstNotNullResult { it.await() } ?: longUrl
}
}
/**
* 01:58:56,930 INFO TinyImpl - running : pool-1-thread-1
* 01:58:56,933 INFO DumbImpl - running : pool-1-thread-2
* 01:58:56,930 INFO IsgdImpl - running : pool-1-thread-3
* 01:58:58,411 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 01:58:59,026 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
* 01:59:06,942 INFO UrlShorterServiceTest$testHedging$1 - result = // idGd url blocked by SO , it sucks
*/
private suspend fun method4(longUrl: String): String {
return withContext(esDispatcher) {
impls.map { impl ->
async {
impl.getShortUrl(longUrl)
}
}.firstNotNullResult { it.await() } ?: longUrl
}
}
I am not familiar with Channel , sorry for the exception ↓
/**
* 01:29:44,460 INFO UrlShorterService$method5$2 - channel closed
* 01:29:44,461 INFO DumbImpl - running : pool-1-thread-2
* 01:29:44,460 INFO IsgdImpl - running : pool-1-thread-3
* 01:29:44,466 INFO TinyImpl - running : pool-1-thread-1
* 01:29:45,765 INFO TinyImpl - returning // tiny url blocked by SO , it sucks
* 01:29:46,339 INFO IsgdImpl - returning // idGd url blocked by SO , it sucks
*
* kotlinx.coroutines.channels.ClosedSendChannelException: Channel was closed
*
*/
private suspend fun method5(longUrl: String): String {
val channel = Channel<String>()
withContext(esDispatcher) {
impls.forEach { impl ->
launch {
impl.getShortUrl(longUrl)?.also {
channel.send(it)
}
}
}
channel.close()
logger.info("channel closed")
}
return channel.consumeAsFlow().first()
}
OK , I don't know if there are any other ways ... but all above are not working... All blocks at least 10 seconds ( blocked by DumbImpl) .
The whole source code can be found on github gist .
How can hedging be achieved in kotlin ? By Deferred or Flow or Channel or any other better ideas ? Thank you.
After submitting the question , I found all tinyurl , isGd url are blocked by SO . It really sucks !

If the actual work you want to do in parallel consists of network fetches, you should choose an async networking library so you can properly use non-blocking coroutines with it. For example, as of version 11 the JDK provides an async HTTP client which you can use as follows:
val httpClient: HttpClient = HttpClient.newHttpClient()
suspend fun httpGet(url: String): String = httpClient
.sendAsync(
HttpRequest.newBuilder().uri(URI.create(url)).build(),
BodyHandlers.ofString())
.await()
.body()
Here's a function that accomplishes request hedging given a suspendable implementation like above:
class UrlShortenerService(
private val impls: List<UrlShortener>
) {
suspend fun getShortUrl(longUrl: String): String? = impls
.asFlow()
.flatMapMerge(impls.size) { impl ->
flow<String?> {
try {
impl.getShortUrl(longUrl)?.also { emit(it) }
}
catch (e: Exception) {
// maybe log it, but don't let it propagate
}
}
}
.onCompletion { emit(null) }
.first()
}
Note the absence of any custom dispatchers, you don't need them for suspendable work. Any dispatcher will do, and all the work can run in a single thread.
The onCompletion parts steps into action when your all URL shorteners fail. In that case the flatMapMerge stage doesn't emit anything and first() would deadlock without the extra null injected into the flow.
To test it I used the following code:
class Shortener(
private val delay: Long
) : UrlShortener {
override suspend fun getShortUrl(longUrl: String): String? {
delay(delay * 1000)
println("Shortener $delay completing")
if (delay == 1L) {
throw Exception("failed service")
}
if (delay == 2L) {
return null
}
return "shortened after $delay seconds"
}
}
suspend fun main() {
val shorteners = listOf(
Shortener(4),
Shortener(3),
Shortener(2),
Shortener(1)
)
measureTimeMillis {
UrlShortenerService(shorteners).getShortUrl("bla").also {
println(it)
}
}.also {
println("Took $it ms")
}
}
This exercises the various failure cases like returning null or failing with an exception. For this code I get the following output:
Shortener 1 completing
Shortener 2 completing
Shortener 3 completing
shortened after 3 seconds
Took 3080 ms
We can see that the shorteners 1 and 2 completed but with a failure, shortener 3 returned a valid response, and shortener 4 was cancelled before completing. I think this matches the requirements.
If you can't move away from blocking requests, your implementation will have to start num_impls * num_concurrent_requests threads, which is not great. However, if that's the best you can have, here's an implementation that hedges blocking requests but awaits on them suspendably and cancellably. It will send an interrupt signal to the worker threads running the requests, but if your library's IO code is non-interruptible, these threads will hang waiting for their requests to complete or time out.
val es = Executors.newCachedThreadPool()
interface UrlShortener {
fun getShortUrl(longUrl: String): String? // not suspendable!
}
class UrlShortenerService(
private val impls: List<UrlShortener>
) {
suspend fun getShortUrl(longUrl: String): String {
val chan = Channel<String?>()
val futures = impls.map { impl -> es.submit {
try {
impl.getShortUrl(longUrl)
} catch (e: Exception) {
null
}.also { runBlocking { chan.send(it) } }
} }
try {
(1..impls.size).forEach { _ ->
chan.receive()?.also { return it }
}
throw Exception("All services failed")
} finally {
chan.close()
futures.forEach { it.cancel(true) }
}
}
}

This is essentially what the select APi was designed to do:
coroutineScope {
select {
impls.forEach { impl ->
async {
impl.getShortUrl(longUrl)
}.onAwait { it }
}
}
coroutineContext[Job].cancelChildren() // Cancel any requests that are still going.
}
Note that this will not handle exceptions thrown by the service implementations, you will need to use a supervisorScope with a custom exception handler and a filtering select loop if you want to actually handle those.

Related

Get return value from thread, is this Kotlin code thread safe?

I would like to run some treads, wait till all of them are finished and get the results.
Possible way to do that would be in the code below. Is it thread safe though?
import kotlin.concurrent.thread
sealed class Errorneous<R>
data class Success<R>(val result: R) : Errorneous<R>()
data class Fail<R>(val error: Exception) : Errorneous<R>()
fun <R> thread_with_result(fn: () -> R): (() -> Errorneous<R>) {
var r: Errorneous<R>? = null
val t = thread {
r = try { Success(fn()) } catch (e: Exception) { Fail(e) }
}
return {
t.join()
r!!
}
}
fun main() {
val tasks = listOf({ 2 * 2 }, { 3 * 3 })
val results = tasks
.map{ thread_with_result(it) }
.map{ it() }
println(results)
}
P.S.
Are there better built-in tools in Kotlin to do that? Like process 10000 tasks with pool of 10 threads?
It should be threads, not coroutines, as it will be used with legacy code and I don't know if it works well with coroutines.
Seems like Java has Executors that doing exactly that
fun <R> execute_in_parallel(tasks: List<() -> R>, threads: Int): List<Errorneous<R>> {
val executor = Executors.newFixedThreadPool(threads)
val fresults = executor.invokeAll(tasks.map { task ->
Callable<Errorneous<R>> {
try { Success(task()) } catch (e: Exception) { Fail(e) }
}
})
return fresults.map { future -> future.get() }
}

Stop Thread in Kotlin

First of all, I'm new in Kotlin, so please be nice :).
It's also my first time posting on StackOverflow
I want to literally STOP the current thread that I created but nothing works.
I tried quit(), quitSafely(), interrupt() but nothing works.
I created a class (Data.kt), in which I create and initialize a Handler and HandlerThread as follows :
class Dispatch(private val label: String = "main") {
var handler: Handler? = null
var handlerThread: HandlerThread? = null
init {
if (label == "main") {
handlerThread = null
handler = Handler(Looper.getMainLooper())
} else {
handlerThread = HandlerThread(label)
handlerThread!!.start()
handler = Handler(handlerThread!!.looper)
}
}
fun async(runnable: Runnable) = handler!!.post(runnable)
fun async(block: () -> (Unit)) = handler!!.post(block)
fun asyncAfter(milliseconds: Long, function: () -> (Unit)) {
handler!!.postDelayed(function, milliseconds)
}
fun asyncAfter(milliseconds: Long, runnable: Runnable) {
handler!!.postDelayed(runnable, milliseconds)
}
companion object {
val main = Dispatch()
private val global = Dispatch("global")
//fun global() = global
}
}
And now, in my DataManager, I use these to do asynchronous things :
fun getSomething(forceNetwork: Boolean ) {
val queue1 = Dispatch("thread1") // Create a thread called "thread1"
queue1.async {
for (i in 0..2_000_000) {
print("Hello World")
// Do everything i want in the current thread
}
// And on the main thread I call my callback
Dispatch.main.async {
//callback?.invoke(.........)
}
}
}
Now, in my MainActivity, I made 2 buttons :
One for running the function getSomething()
The other one is used for switching to another Controller View :
val button = findViewById<Button>(R.id.button)
button.setOnClickListener {
DataManager.getSomething(true)
}
val button2 = findViewById<Button>(R.id.button2)
button2.setOnClickListener {
val intent = Intent(this, Test::class.java) // Switch to my Test Controller
intent.setFlags(Intent.FLAG_ACTIVITY_NO_HISTORY)
startActivity(intent)
finish()
}
Is there a way to stop the thread, because when I switch to my second View, print("Hello World") is still triggered, unfortunately.
Thanks for helping me guys I hope that you understand !
A thread needs to periodically check a (global) flag and when it becomes true then the thread will break out from the loop. Java threads cannot be safely stopped without its consent.
Refer to page 252 here http://www.rjspm.com/PDF/JavaTheCompleteReference.pdf that describes the true story behind the legend.
I think that a truly interruptible thread is only possible through the support of the operating system kernel. The actual true lock is held deep down by the CPU hardware microprocessor.

how to cap kotlin coroutines maximum concurrency

I've got a Sequence (from File.walkTopDown) and I need to run a long-running operation on each of them. I'd like to use Kotlin best practices / coroutines, but I either get no parallelism, or way too much parallelism and hit a "too many open files" IO error.
File("/Users/me/Pictures/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.filter { ... only big images... }
.map { file ->
async { // I *think* I want async and not "launch"...
ImageProcessor.fromFile(file)
}
}
This doesn't seem to run it in parallel, and my multi-core CPU never goes above 1 CPU's worth. Is there a way with coroutines to run "NumberOfCores parallel operations" worth of Deferred jobs?
I looked at Multithreading using Kotlin Coroutines which first creates ALL the jobs then joins them, but that means completing the Sequence/file tree walk completly bfore the heavy processing join step, and that seems... iffy! Splitting it into a collect and a process step means the collection could run way ahead of the processing.
val jobs = ... the Sequence above...
.toSet()
println("Found ${jobs.size}")
jobs.forEach { it.await() }
This isn't specific to your problem, but it does answer the question of, "how to cap kotlin coroutines maximum concurrency".
EDIT: As of kotlinx.coroutines 1.6.0 (https://github.com/Kotlin/kotlinx.coroutines/issues/2919), you can use limitedParallelism, e.g. Dispatchers.IO.limitedParallelism(123).
Old solution: I thought to use newFixedThreadPoolContext at first, but 1) it's deprecated and 2) it would use threads and I don't think that's necessary or desirable (same with Executors.newFixedThreadPool().asCoroutineDispatcher()). This solution might have flaws I'm not aware of by using Semaphore, but it's very simple:
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
/**
* Maps the inputs using [transform] at most [maxConcurrency] at a time until all Jobs are done.
*/
suspend fun <TInput, TOutput> Iterable<TInput>.mapConcurrently(
maxConcurrency: Int,
transform: suspend (TInput) -> TOutput,
) = coroutineScope {
val gate = Semaphore(maxConcurrency)
this#mapConcurrently.map {
async {
gate.withPermit {
transform(it)
}
}
}.awaitAll()
}
Tests (apologies, it uses Spek, hamcrest, and kotlin test):
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.test.TestCoroutineDispatcher
import org.hamcrest.MatcherAssert.assertThat
import org.hamcrest.Matchers.greaterThanOrEqualTo
import org.hamcrest.Matchers.lessThanOrEqualTo
import org.spekframework.spek2.Spek
import org.spekframework.spek2.style.specification.describe
import java.util.concurrent.atomic.AtomicInteger
import kotlin.test.assertEquals
#OptIn(ExperimentalCoroutinesApi::class)
object AsyncHelpersKtTest : Spek({
val actionDelay: Long = 1_000 // arbitrary; obvious if non-test dispatcher is used on accident
val testDispatcher = TestCoroutineDispatcher()
afterEachTest {
// Clean up the TestCoroutineDispatcher to make sure no other work is running.
testDispatcher.cleanupTestCoroutines()
}
describe("mapConcurrently") {
it("should run all inputs concurrently if maxConcurrency >= size") {
val concurrentJobCounter = AtomicInteger(0)
val inputs = IntRange(1, 2).toList()
val maxConcurrency = inputs.size
// https://github.com/Kotlin/kotlinx.coroutines/issues/1266 has useful info & examples
runBlocking(testDispatcher) {
print("start runBlocking $coroutineContext\n")
// We have to run this async so that the code afterwards can advance the virtual clock
val job = launch {
testDispatcher.pauseDispatcher {
val result = inputs.mapConcurrently(maxConcurrency) {
print("action $it $coroutineContext\n")
// Sanity check that we never run more in parallel than max
assertThat(concurrentJobCounter.addAndGet(1), lessThanOrEqualTo(maxConcurrency))
// Allow for virtual clock adjustment
delay(actionDelay)
// Sanity check that we never run more in parallel than max
assertThat(concurrentJobCounter.getAndAdd(-1), lessThanOrEqualTo(maxConcurrency))
print("action $it after delay $coroutineContext\n")
it
}
// Order is not guaranteed, thus a Set
assertEquals(inputs.toSet(), result.toSet())
print("end mapConcurrently $coroutineContext\n")
}
}
print("before advanceTime $coroutineContext\n")
// Start the coroutines
testDispatcher.advanceTimeBy(0)
assertEquals(inputs.size, concurrentJobCounter.get(), "All jobs should have been started")
testDispatcher.advanceTimeBy(actionDelay)
print("after advanceTime $coroutineContext\n")
assertEquals(0, concurrentJobCounter.get(), "All jobs should have finished")
job.join()
}
}
it("should run one at a time if maxConcurrency = 1") {
val concurrentJobCounter = AtomicInteger(0)
val inputs = IntRange(1, 2).toList()
val maxConcurrency = 1
runBlocking(testDispatcher) {
val job = launch {
testDispatcher.pauseDispatcher {
inputs.mapConcurrently(maxConcurrency) {
assertThat(concurrentJobCounter.addAndGet(1), lessThanOrEqualTo(maxConcurrency))
delay(actionDelay)
assertThat(concurrentJobCounter.getAndAdd(-1), lessThanOrEqualTo(maxConcurrency))
it
}
}
}
testDispatcher.advanceTimeBy(0)
assertEquals(1, concurrentJobCounter.get(), "Only one job should have started")
val elapsedTime = testDispatcher.advanceUntilIdle()
print("elapsedTime=$elapsedTime")
assertThat(
"Virtual time should be at least as long as if all jobs ran sequentially",
elapsedTime,
greaterThanOrEqualTo(actionDelay * inputs.size)
)
job.join()
}
}
it("should handle cancellation") {
val jobCounter = AtomicInteger(0)
val inputs = IntRange(1, 2).toList()
val maxConcurrency = 1
runBlocking(testDispatcher) {
val job = launch {
testDispatcher.pauseDispatcher {
inputs.mapConcurrently(maxConcurrency) {
jobCounter.addAndGet(1)
delay(actionDelay)
it
}
}
}
testDispatcher.advanceTimeBy(0)
assertEquals(1, jobCounter.get(), "Only one job should have started")
job.cancel()
testDispatcher.advanceUntilIdle()
assertEquals(1, jobCounter.get(), "Only one job should have run")
job.join()
}
}
}
})
Per https://play.kotlinlang.org/hands-on/Introduction%20to%20Coroutines%20and%20Channels/09_Testing, you may also need to adjust compiler args for the tests to run:
compileTestKotlin {
kotlinOptions {
// Needed for runBlocking test coroutine dispatcher?
freeCompilerArgs += "-Xuse-experimental=kotlin.Experimental"
freeCompilerArgs += "-Xopt-in=kotlin.RequiresOptIn"
}
}
testImplementation 'org.jetbrains.kotlinx:kotlinx-coroutines-test:1.4.1'
The problem with your first snippet is that it doesn't run at all - remember, Sequence is lazy, and you have to use a terminal operation such as toSet() or forEach(). Additionally, you need to limit the number of threads that can be used for that task via constructing a newFixedThreadPoolContext context and using it in async:
val pictureContext = newFixedThreadPoolContext(nThreads = 10, name = "reading pictures in parallel")
File("/Users/me/Pictures/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.filter { ... only big images... }
.map { file ->
async(pictureContext) {
ImageProcessor.fromFile(file)
}
}
.toList()
.forEach { it.await() }
Edit:
You have to use a terminal operator (toList) befor awaiting the results
I got it working with a Channel. But maybe I'm being redundant with your way?
val pipe = ArrayChannel<Deferred<ImageFile>>(20)
launch {
while (!(pipe.isEmpty && pipe.isClosedForSend)) {
imageFiles.add(pipe.receive().await())
}
println("pipe closed")
}
File("/Users/me/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.forEach { pipe.send(async { ImageFile.fromFile(it) }) }
pipe.close()
This doesn't preserve the order of the projection but otherwise limits the throughput to at most maxDegreeOfParallelism. Expand and extend as you see fit.
suspend fun <TInput, TOutput> (Collection<TInput>).inParallel(
maxDegreeOfParallelism: Int,
action: suspend CoroutineScope.(input: TInput) -> TOutput
): Iterable<TOutput> = coroutineScope {
val list = this#inParallel
if (list.isEmpty())
return#coroutineScope listOf<TOutput>()
val brake = Channel<Unit>(maxDegreeOfParallelism)
val output = Channel<TOutput>()
val counter = AtomicInteger(0)
this.launch {
repeat(maxDegreeOfParallelism) {
brake.send(Unit)
}
for (input in list) {
val task = this.async {
action(input)
}
this.launch {
val result = task.await()
output.send(result)
val completed = counter.incrementAndGet()
if (completed == list.size) {
output.close()
} else brake.send(Unit)
}
brake.receive()
}
}
val results = mutableListOf<TOutput>()
for (item in output) {
results.add(item)
}
return#coroutineScope results
}
Example usage:
val output = listOf(1, 2, 3).inParallel(2) {
it + 1
} // Note that output may not be in same order as list.
Why not use the asFlow() operator and then use flatMapMerge?
someCoroutineScope.launch(Dispatchers.Default) {
File("/Users/me/Pictures/").walkTopDown()
.asFlow()
.filter { ... only big images... }
.flatMapMerge(concurrencyLimit) { file ->
flow {
emit(runInterruptable { ImageProcessor.fromFile(file) })
}
}.catch { ... }
.collect()
}
Then you can limit the simultaneous open files while still processing them concurrently.
To limit the parallelism to some value there is limitedParallelism function starting from the 1.6.0 version of the kotlinx.coroutines library. It can be called on CoroutineDispatcher object. So to limit threads for parallel execution we can write something like:
val parallelismLimit = Runtime.getRuntime().availableProcessors()
val limitedDispatcher = Dispatchers.Default.limitedParallelism(parallelismLimit)
val scope = CoroutineScope(limitedDispatcher) // we can set limitedDispatcher for the whole scope
scope.launch { // or we can set limitedDispatcher for a coroutine launch(limitedDispatcher)
File("/Users/me/Pictures/").walkTopDown()
.onFail { file, ex -> println("ERROR: $file caused $ex") }
.filter { ... only big images... }
.map { file ->
async {
ImageProcessor.fromFile(file)
}
}.toList().awaitAll()
}
ImageProcessor.fromFile(file) will be executed in parallel using parallelismLimit number of threads.
This will cap coroutines to workers. I'd recommend watching https://www.youtube.com/watch?v=3WGM-_MnPQA
package com.example.workers
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce
import kotlin.system.measureTimeMillis
class ChannellibgradleApplication
fun main(args: Array<String>) {
var myList = mutableListOf<Int>(3000,1200,1400,3000,1200,1400,3000)
runBlocking {
var myChannel = produce(CoroutineName("MyInts")) {
myList.forEach { send(it) }
}
println("Starting coroutineScope ")
var time = measureTimeMillis {
coroutineScope {
var workers = 2
repeat(workers)
{
launch(CoroutineName("Sleep 1")) { theHardWork(myChannel) }
}
}
}
println("Ending coroutineScope $time ms")
}
}
suspend fun theHardWork(channel : ReceiveChannel<Int>)
{
for(m in channel) {
println("Starting Sleep $m")
delay(m.toLong())
println("Ending Sleep $m")
}
}

Bridge channel to a sequence

This code is based on Coroutines guide example: Fan-out
val inputProducer = produce<String>(CommonPool) {
(0..inputArray.size).forEach {
send(inputArray[it])
}
}
val resultChannel = Channel<Result>(10)
repeat(threadCount) {
launch(CommonPool) {
inputProducer.consumeEach {
resultChannel.send(getResultFromData(it))
}
}
}
What is the right way to create a Sequence<Result> that will provide results?
You can get the channel .iterator() from the ReceiveChannel and then wrap that channel iterator into a Sequence<T>, implementing its normal Iterator<T> that blocks waiting for the result on each request:
fun <T> ReceiveChannel<T>.asSequence(context: CoroutineContext) =
Sequence {
val iterator = iterator()
object : AbstractIterator<T>() {
override fun computeNext() = runBlocking(context) {
if (!iterator.hasNext())
done() else
setNext(iterator.next())
}
}
}
val resultSequence = resultChannel.asSequence(CommonPool)
I had the same problem, and in the end I came up with this rather unusual/convoluted solution:
fun Channel<T>.asSequence() : Sequence<T> {
val itr = this.iterator()
return sequence<Int> {
while ( runBlocking {itr.hasNext()} ) yield( runBlocking<T> { itr.next() } )
}
}
I do not think it is particular efficient (go with the one provided by #hotkey), but it has a certain appeal to me at least.

Number of threads in Akka keep increasing. What could be wrong?

Why does the thread count keep increasing ?
LOOK AT THE BOTTOM RIGHT in this image.
The overall flow is like this:
Akka HTTP Server API
-> on http request, sendMessageTo DataProcessingActor
-> sendMessageTo StorageActor
-> sendMessageTo DataBaseActor
-> sendMessageTo IndexActor
This is the definition of Akka HTTP API ( in pseudo-code ):
Main {
path("input/") {
post {
dataProcessingActor forward message
}
}
}
Below are the actor definitions ( in pseudo-code ):
DataProcessingActor {
case message =>
message = parse message
storageActor ! message
}
StorageActor {
case message =>
indexActor ! message
databaseActor ! message
}
DataBaseActor {
case message =>
val c = get monogCollection
c.store(message)
}
IndexActor {
case message =>
elasticSearch.index(message)
}
After I run this setup, and on sending multiple HTTP requsts to "input/" HTTP endpoint, I get errors:
for( i <- 0 until 1000000) {
post("input/", someMessage+i)
}
Error:
[ERROR] [04/22/2016 13:20:54.016] [Main-akka.actor.default-dispatcher-15] [akka.tcp://Main#127.0.0.1:2558/system/IO-TCP/selectors/$a/0] Accept error: could not accept new connection
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at akka.io.TcpListener.acceptAllPending(TcpListener.scala:107)
at akka.io.TcpListener$$anonfun$bound$1.applyOrElse(TcpListener.scala:82)
at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
at akka.io.TcpListener.aroundReceive(TcpListener.scala:32)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
EDIT 1
Here is the application.conf file being used:
akka {
loglevel = "INFO"
stdout-loglevel = "INFO"
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
actor {
default-dispatcher {
throughput = 10
}
}
actor {
provider = "akka.remote.RemoteActorRefProvider"
}
remote {
enabled-transports = ["akka.remote.netty.tcp"]
netty.tcp {
hostname = "127.0.0.1"
port = 2558
}
}
}
I figured out that ElasticSearch was the problem. I am using Java API for ElasticSearch, and that is leaking sockets because of the way it was being used from Java API. Now resolved as described here.
Below is the Elastic Search client service using Java API
trait ESClient { def getClient(): Client }
case class ElasticSearchService() extends ESClient {
def getClient(): Client = {
val client = new TransportClient().addTransportAddress(
new InetSocketTransportAddress(Config.ES_HOST, Config.ES_PORT)
)
client
}
}
This is the actor which was causing the leak:
class IndexerActor() extends Actor {
val elasticSearchSvc = new ElasticSearchService()
lazy val client = elasticSearchSvc.getClient()
override def preStart = {
// initialize index, and mappings etc.
}
def receive() = {
case message =>
// do indexing here
indexMessage(ES.client, message)
}
}
NOTE: Every time an actor instance is created, a new connection is being made.
Every invocation of new ElasticSearchService() was creating a new connection to ElasticSearch. I moved that into a separate object as shown below, and also the actor uses this object instead:
object ES {
val elasticSearchSvc = new ElasticSearchService()
lazy val client = elasticSearchSvc.getClient()
}
class IndexerActor() extends Actor {
override def preStart = {
// initialize index, and mappings etc.
}
def receive() = {
case message =>
// do indexing here
indexMessage(ES.client, message)
}
}

Resources