java.util.concurrent.RejectedExecutionException: rejected from java.util.concurrent.ThreadPoolExecutor

I have integrated an executor service to execute tasks concurrently, and when it is used without Future, everything works fine. To check that the tasks complete and that none of the executor threads fail, I added a Future and am trying to check its failure state.
Sample Code:
val pool = Executors.newFixedThreadPool(THREAD_COUNT)
implicit val context = ExecutionContext.fromExecutorService(pool)
var index = 0
while (index < employeeJobDataList.size()) {
    val startIndex = index
    var endIndex = index + BATCH_THRESHOLD
    if (startIndex + BATCH_THRESHOLD >= employeeJobDataList.size()) {
        endIndex = employeeJobDataList.size()
    }
    val future = Future({
        upsertsFailedBatches.addAll(batchSaveInDDB(employeeJobDataList.subList(startIndex, endIndex), failedList))
    })
    future.onFailure { case e: Throwable => errorFromExecutorService.append(e.toString) }
    index = index + BATCH_THRESHOLD
}
context.shutdown()
while (!context.isTerminated()) {
    Thread.sleep(THREAD_SLEEP_TIME_MILLI)
}
Error details:
java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable@74a938eb rejected from java.util.concurrent.ThreadPoolExecutor@5e5fcd94[Shutting down, pool size = 10, active threads = 10, queued tasks = 87323, completed tasks = 213]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
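The stack trace shows what is being rejected: a scala.concurrent.impl.CallbackRunnable (the registered onFailure callback). It is submitted from PromiseCompletingRunnable.run, so it happens when a batch future completes on a worker thread. At that moment the pool is already "Shutting down" with 87323 tasks still queued. context.shutdown() runs immediately after the loop submits the batches, so any callback dispatched to the same executor afterwards is rejected by the default AbortPolicy. The general fix is to wait for all futures, and for whatever handles their failures, before calling shutdown().

Below is a minimal Java sketch of that ordering, not the asker's Scala code. It assumes a hypothetical saveBatch in place of batchSaveInDDB. It records failures in a non-async handle callback and calls shutdown() only after CompletableFuture.allOf(...) has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BatchRunner {
    // Hypothetical stand-in for batchSaveInDDB: the batch starting at item 20 fails.
    static int saveBatch(List<Integer> batch) {
        if (batch.get(0) == 20) {
            throw new IllegalStateException("batch 20 failed");
        }
        return batch.size();
    }

    static Queue<String> run(List<Integer> items, int batchSize, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<String> errors = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (int start = 0; start < items.size(); start += batchSize) {
                List<Integer> batch = items.subList(start, Math.min(start + batchSize, items.size()));
                futures.add(CompletableFuture
                        .supplyAsync(() -> saveBatch(batch), pool)
                        // Non-async callback: runs on the thread that completes the future
                        // (or the caller if it is already done), not as a new pool submission.
                        .handle((count, e) -> {
                            if (e != null) {
                                Throwable cause = (e instanceof CompletionException && e.getCause() != null)
                                        ? e.getCause() : e;
                                errors.add(cause.getMessage());
                            }
                            return null;
                        }));
            }
            // Wait for every batch and its error handling before shutting the pool down.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return errors;
    }
}
```

With this ordering, shutdown() is only reached once nothing else needs to be scheduled on the pool, so no callback can be rejected.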

Related

How to kill all threads caused by Scala .par when one throws an exception?

This code blocks; the exceptions that are thrown don't kill the loop:
List(1, 2, 3, 4, 5).par.foreach { i =>
    println("i = " + i)
    if (i == 5) {
        println("Sleeping forever")
        java.lang.Thread.sleep(Long.MaxValue)
    }
    throw new IllegalArgumentException("foo")
} must throwA[IllegalArgumentException]
Is there a way to use .par but make it blow up properly?
I think that when using the parallel collections you should expect the exception to surface only once all the threads have finished and been joined back into the current thread. I suspect this because the implementation of the foreach method (Cmd-click on foreach) uses a method named executeAndWaitResult.
Here are some other questions that seem similar; perhaps they help:
interrupt scala parallel collection
How to cancel Future in Scala?
https://immutables.pl/2016/10/08/parallel-futures-and-exceptions/
This seems to work, though it is verbose:
implicit class PimpedListScala[T](l: List[T]) {
    def parForeachThrowOnException(f: T => Unit, sleepMillis: Long): Unit = {
        var exception: Option[Throwable] = None
        val future = Future(
            l.par.foreach(e =>
                try {
                    f(e)
                } catch {
                    case t: Throwable =>
                        if (exception.isEmpty) exception = Some(t)
                }
            )
        )
        while (exception.isEmpty && !future.isCompleted) {
            java.lang.Thread.sleep(sleepMillis)
        }
        exception.foreach(throw _)
    }
}
I've also tried this:
def parForeachThrowOnException(f: T => Unit): Unit =
    Await.result(Future.traverse(l)(a => Future(f(a))), Duration.Inf)
but this works unpredictably. In a live experiment, it took a full 2 hours for the first exception thrown to propagate up and kill the application.
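If the goal is to fail as soon as any task throws, one option is to consume task outcomes in completion order rather than waiting for a join over all of them. The sketch below is in Java, not Scala, and swaps .par for an ExecutorCompletionService: take() returns whichever task finished first, get() rethrows its exception, and shutdownNow() then interrupts the tasks still running. parForeachFailFast is a hypothetical name. A task that ignores interruption will still keep its thread busy after shutdownNow().

```java
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class FailFast {
    // Applies f to every item in parallel and rethrows the first failure
    // as soon as that task finishes, without waiting for the others.
    static <T> void parForeachFailFast(List<T> items, Consumer<T> f, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Void> completed = new ExecutorCompletionService<>(pool);
        try {
            for (T item : items) {
                completed.submit(() -> {
                    f.accept(item);
                    return null;
                });
            }
            for (int i = 0; i < items.size(); i++) {
                // take() hands back futures in completion order; get() throws
                // ExecutionException if that task threw.
                completed.take().get();
            }
        } finally {
            // Interrupts tasks that are still running (e.g. one sleeping "forever").
            pool.shutdownNow();
        }
    }
}
```

With the items 1 to 5 from the question (item 5 sleeping), the call throws an ExecutionException wrapping the IllegalArgumentException right away. It does not hang on the sleeping task.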

Running blocking CPU bound tasks on Kotlin coroutines

I have been experimenting with Kotlin and running blocking CPU tasks on Kotlin coroutines. When things are blocking, such as big CPU-intensive computations, we don't really have suspension; instead we need to launch work on different threads and let it run in parallel.
I managed to get the following code working as expected with async + the Default dispatcher, but wondered whether it would work with withContext, and it did not.
fun cpuBlockingTasks() = runBlocking {
    val time = measureTimeMillis {
        val t1 = cpuTask(id = 1, blockTime = 500)
        val t2 = cpuTask(id = 2, blockTime = 2000)
        println("The answer is ${t1 + t2}")
    }
    println("Time taken: $time")
}
suspend fun cpuTask(id: Int, blockTime: Long): Int = withContext(Dispatchers.Default) {
    println("work $id start ${getThreadName()}")
    val res = doSomeCpuIntensiveTask(blockTime)
    println("work $id end ${getThreadName()}")
    res
}
fun doSomeCpuIntensiveTask(time: Long): Int {
    Thread.sleep(time) // to mimic actual thread blocking / CPU work
    return 1
}
This code takes more than 2500 ms and runs sequentially on the same thread. I was expecting it to kick off the first coroutine on a thread, immediately return to the caller, and kick off the second on a different thread, but it did not work like that. Does anyone know why that is, and how it can be fixed without launching an async coroutine in the caller function?
This is the output:
work 1 start ForkJoinPool.commonPool-worker-5 @coroutine#1
work 1 end ForkJoinPool.commonPool-worker-5 @coroutine#1
work 2 start ForkJoinPool.commonPool-worker-5 @coroutine#1
work 2 end ForkJoinPool.commonPool-worker-5 @coroutine#1
The answer is 2
Time taken: 2523
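You are not creating new coroutines in cpuTask 1 and cpuTask 2; you are only switching context. withContext suspends the caller until its block completes, so the two calls run one after the other. It can easily be fixed with async: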
fun cpuBlockingTasks() = runBlocking {
    val time = measureTimeMillis {
        val t1 = async { cpuTask(id = 1, blockTime = 500) }
        val t2 = async { cpuTask(id = 2, blockTime = 2000) }
        println("The answer is ${t1.await() + t2.await()}")
    }
    println("Time taken: $time") // Time taken: 2026
}
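The same distinction exists with plain Java futures, and seeing it there may make the difference clearer. In the sketch below, blockingTask is a hypothetical stand-in for doSomeCpuIntensiveTask, and a two-thread pool stands in for Dispatchers.Default. Waiting on each future before starting the next behaves like the sequential withContext calls, so the durations add up. Starting both futures first and joining afterwards behaves like async/await, so the total is roughly the longer of the two.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StartThenAwait {
    // Stand-in for doSomeCpuIntensiveTask: blocks the calling thread, then returns 1.
    static int blockingTask(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 1;
    }

    // Like calling withContext twice: the second task is not even submitted
    // until the first one has finished.
    static long sequentialMillis(ExecutorService pool, long a, long b) {
        long start = System.nanoTime();
        int r1 = CompletableFuture.supplyAsync(() -> blockingTask(a), pool).join();
        int r2 = CompletableFuture.supplyAsync(() -> blockingTask(b), pool).join();
        System.out.println("sequential answer = " + (r1 + r2));
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Like async { } + await(): both tasks are started first, then awaited.
    static long concurrentMillis(ExecutorService pool, long a, long b) {
        long start = System.nanoTime();
        CompletableFuture<Integer> f1 = CompletableFuture.supplyAsync(() -> blockingTask(a), pool);
        CompletableFuture<Integer> f2 = CompletableFuture.supplyAsync(() -> blockingTask(b), pool);
        System.out.println("concurrent answer = " + (f1.join() + f2.join()));
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("sequential: " + sequentialMillis(pool, 500, 2000) + " ms");
            System.out.println("concurrent: " + concurrentMillis(pool, 500, 2000) + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```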

io.grpc.StatusRuntimeException: ABORTED when using Grpc client in Spark

I'm trying to run a Spark (2.2) job that gets some data from a server using gRPC (1.1.2) client calls. I get this error when I run this code through Spark, but running the same job on a small set works fine. From what I researched, I understand that the ABORTED message is caused by some concurrency issue, so I'm guessing the client is unable to create more than a certain number of stubs, but I'm not sure how to proceed. Also, I know for a fact that the gRPC server handles a large number of requests well, and I'm well below the number of requests it can handle. Any ideas?
Adding more information as requested:
My client CatalogGrpcClient has these methods to handle channels and the request:
private List<ManagedChannel> getChannels() {
    return IntStream.range(0, numChannels).mapToObj(x ->
        ManagedChannelBuilder.forAddress(channelHost, channelPort).usePlaintext(true).build()
    ).collect(Collectors.toList());
}
private ManagedChannel getChannel() {
    return channels.get(ThreadLocalRandom.current().nextInt(channels.size()));
}
private ListingRequest populateRequest(ListingRequest.Builder req, String requestId) {
    return req.setClientSendTs(System.currentTimeMillis())
        .setRequestId(StringUtils.defaultIfBlank(req.getRequestId(), requestId))
        .setSchemaVersion(StringUtils.defaultIfBlank(req.getSchemaVersion(), schema))
        .build();
}
private List<ListingResponse> getGrpcListingWithRetry(ListingRequest.Builder request,
                                                      String requestIdStr,
                                                      int retryLimit,
                                                      int sleepBetweenRetry) {
    int retryCount = 0;
    while (retryCount < retryLimit) {
        try {
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(
                        CatalogServiceGrpc.newBlockingStub(getChannel())
                            .getListings(populateRequest(request, requestIdStr)),
                        Spliterator.ORDERED),
                    false)
                .collect(Collectors.toList());
        } catch (Exception e) {
            System.out.println("Exception " + e.getCause().getMessage());
            retryCount = retryCount + 1;
            try {
                Thread.sleep(sleepBetweenRetry);
            } catch (InterruptedException e1) {
                e1.printStackTrace();
            }
        }
    }
    throw new StatusRuntimeException(Status.ABORTED);
}
I use the method getCatalogListingData in the method extract, which is used in the Spark job to map the results to a case class:
def extract(itemIds: List[Long], validAspects: Broadcast[Array[String]]): List[ItemDetailModel] = {
    var itemsDetails = List[ItemDetailModel]()
    val client = new CatalogGrpcClient()
    implicit val formats = DefaultFormats
    val listings = client.getCatalogListingData(itemIds.map(x => x.asInstanceOf[java.lang.Long]).asJava).asScala
    ...
    ...
    itemsDetails
}
Here's the Spark code that calls extract. itemsMissingDetails is a DataFrame with a column "item" containing unique item ids. The zipWithIndex and the following map group the ids so that each request to the gRPC service carries 50 item ids.
itemsMissingDetails
    .rdd
    .zipWithIndex
    .map(x => (x._2 / 50, List(x._1.getLong(0))))
    .reduceByKey(_ ++ _)
    .flatMap(items => extract(items._2, validAspects))
    .toDF
    .write
    .format("csv")
    .option("header", true)
    .option("sep", "\t")
    .option("escapeQuotes", false)
    .save(path)
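The batching in the Spark job (zipWithIndex, then x._2 / 50 as the key, then reduceByKey(_ ++ _)) amounts to integer division of each element's position by the batch size. Below is a small plain-Java sketch of the same idea on a local list. It is not Spark, and IndexBatching is an illustrative name. One difference to keep in mind: in Spark, reduceByKey does not guarantee the order of ids within a batch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexBatching {
    // Groups items into batches of batchSize by integer-dividing each item's index,
    // mirroring the (index / 50) key used before reduceByKey in the Spark job.
    static <T> Map<Integer, List<T>> batches(List<T> items, int batchSize) {
        return IntStream.range(0, items.size())
                .boxed()
                .collect(Collectors.groupingBy(
                        i -> i / batchSize,
                        TreeMap::new,
                        Collectors.mapping(items::get, Collectors.toList())));
    }
}
```

For 120 ids and a batch size of 50, this yields three batches of 50, 50 and 20 ids, each of which would become one request.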
The ABORTED error is actually thrown by my client after a long time (~30 min to 1 hour). When I start this job, it gets the info I need from the gRPC service for a few thousand items on every worker. After that, the job hangs (on each worker), and after a really long wait (~30 min to 1 hour) it either fails with the above exception or proceeds further. I haven't been able to reproduce the StatusRuntimeException consistently.
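Because the client throws a fresh StatusRuntimeException(Status.ABORTED) once retries run out, the underlying failure is lost. On top of that, e.getCause() in the catch block may be null, in which case the logging line itself throws. A sketch of a hypothetical generic retry helper that logs the exception itself and keeps the last failure as the cause is below. In the real client, the final throw could instead be Status.ABORTED.withCause(last).asRuntimeException() from grpc-java. This only makes the failure visible; it does not by itself explain the hang.

```java
import java.util.concurrent.Callable;

class Retry {
    // Hypothetical helper: calls action up to maxAttempts times, sleeping between
    // attempts, and keeps the last failure as the cause of the final exception.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // Log the exception itself; e.getCause() may be null.
                System.err.println("attempt " + attempt + " failed: " + e);
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw new RuntimeException("all " + maxAttempts + " attempts failed", last);
    }
}
```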

How to get multiple async results within a given timeout with GPars?

I'd like to retrieve multiple "costly" results using parallel processing but within a specific timeout.
I'm using GPars Dataflow.task, but it looks like I'm missing something, as the process returns only when all dataflow variables are bound.
def timeout = 500
def mapResults = [:]
GParsPool.withPool(3) {
    def taskWeb1 = Dataflow.task {
        mapResults.web1 = new URL('http://web1.com').getText()
    }.join(timeout, TimeUnit.MILLISECONDS)
    def taskWeb2 = Dataflow.task {
        mapResults.web2 = new URL('http://web2.com').getText()
    }.join(timeout, TimeUnit.MILLISECONDS)
    def taskWeb3 = Dataflow.task {
        mapResults.web3 = new URL('http://web3.com').getText()
    }.join(timeout, TimeUnit.MILLISECONDS)
}
I did see in the GPars Timeouts doc a way to use Select to get the fastest result within the timeout.
But I'm looking for a way to retrieve as many results as possible within the given time frame.
Is there a better "GPars" way to achieve this?
Or with Java 8 Future/Callable?
Since you're interested in Java 8 based solutions too, here's a way to do it:
int timeout = 250;
ExecutorService executorService = Executors.newFixedThreadPool(3);
try {
    Map<String, CompletableFuture<String>> map =
        Stream.of("http://google.com", "http://yahoo.com", "http://bing.com")
            .collect(
                Collectors.toMap(
                    // the key will be the URL
                    Function.identity(),
                    // the value will be the CompletableFuture text fetched from the url
                    (url) -> CompletableFuture.supplyAsync(
                        () -> readUrl(url, timeout),
                        executorService
                    )
                )
            );
    executorService.awaitTermination(timeout, TimeUnit.MILLISECONDS);
    //print the resulting map, cutting the text at 100 chars
    map.entrySet().stream().forEach(entry -> {
        CompletableFuture<String> future = entry.getValue();
        boolean completed = future.isDone()
            && !future.isCompletedExceptionally()
            && !future.isCancelled();
        System.out.printf("url %s completed: %s, error: %s, result: %.100s\n",
            entry.getKey(),
            completed,
            future.isCompletedExceptionally(),
            completed ? future.getNow(null) : null);
    });
} catch (InterruptedException e) {
    //rethrow
} finally {
    executorService.shutdownNow();
}
This gives you one future per URL and also lets you see whether any of the tasks failed with an exception. The code can be simplified if you're not interested in these exceptions, only in the contents of successful retrievals:
int timeout = 250;
ExecutorService executorService = Executors.newFixedThreadPool(3);
try {
    Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
    Stream.of("http://google.com", "http://yahoo.com", "http://bing.com")
        .forEach(url -> {
            CompletableFuture
                .supplyAsync(
                    () -> readUrl(url, timeout),
                    executorService
                ).thenAccept(content -> map.put(url, content));
        });
    executorService.awaitTermination(timeout, TimeUnit.MILLISECONDS);
    //print the resulting map, cutting the text at 100 chars
    map.entrySet().stream().forEach(entry -> {
        System.out.printf("url %s completed, result: %.100s\n",
            entry.getKey(), entry.getValue());
    });
} catch (InterruptedException e) {
    //rethrow
} finally {
    executorService.shutdownNow();
}
Both snippets wait about 250 milliseconds (slightly longer, because of submitting the tasks to the executor service) before printing the results. On my network, about 250 milliseconds was the threshold at which some of these URLs could be fetched, but not necessarily all of them. Feel free to adjust the timeout to experiment.
For the readUrl(url, timeout) method you could use a utility library like Apache Commons IO. The tasks submitted to the executor service will get an interrupt signal (from shutdownNow() in the finally block) even if you don't explicitly take the timeout parameter into account. I could provide an implementation for that, but I believe it's out of scope for the main issue in your question.
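As an alternative sketch, ExecutorService.invokeAll(tasks, timeout, unit) does this bookkeeping itself. It returns when all tasks are done or the timeout expires, cancels the tasks that have not completed, and returns the futures in the same order as the tasks. The slow(...) tasks below are hypothetical stand-ins for readUrl, and CollectWithinTimeout is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class CollectWithinTimeout {
    // Hypothetical stand-in for readUrl: sleeps, then returns the given text.
    static Callable<String> slow(String text, long millis) {
        return () -> {
            Thread.sleep(millis);
            return text;
        };
    }

    // Runs all tasks and returns only the results that succeeded within timeoutMillis.
    static Map<String, String> collect(Map<String, Callable<String>> tasks, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<String> keys = new ArrayList<>(tasks.keySet());
            List<Callable<String>> callables = new ArrayList<>();
            for (String key : keys) {
                callables.add(tasks.get(key));
            }
            // Returns when all tasks are done or the timeout expires; unfinished tasks
            // are cancelled. Futures come back in the same order as the callables.
            List<Future<String>> futures = pool.invokeAll(callables, timeoutMillis, TimeUnit.MILLISECONDS);
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                Future<String> future = futures.get(i);
                if (future.isCancelled()) {
                    continue; // did not finish in time
                }
                try {
                    results.put(keys.get(i), future.get());
                } catch (ExecutionException e) {
                    // the task itself failed; leave it out of the results
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Unlike awaitTermination without a prior shutdown(), invokeAll returns early as soon as every task has finished, so a fast run does not always pay the full timeout.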

collecting results asynchronously from gpars parallel executor

We've got some code in Java using ThreadPoolExecutor and CompletionService. Tasks are submitted in large batches to the pool; results go to the completion service where we collect completed tasks when available without waiting for the entire batch to complete:
ThreadPoolExecutor _executorService =
    new ThreadPoolExecutor(MAX_NUMBER_OF_WORKERS, MAX_NUMBER_OF_WORKERS,
        0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(20));
// Result stands for the tasks' return type
CompletionService<Result> _completionService =
    new ExecutorCompletionService<>(_executorService);
//submit tasks
_completionService.submit(someTask);
//get results
while (...) {
    Future<Result> result = _completionService.poll(timeout, TimeUnit.MILLISECONDS);
    if (result != null) {
        //process result
    }
}
The total number of workers in the pool is MAX_NUMBER_OF_WORKERS; tasks submitted when no worker is available are queued; up to 20 tasks may be queued, after which tasks are rejected.
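For reference, here is a runnable Java version of the pattern just described, using hypothetical sleep-then-return tasks. It uses a fixed-size ThreadPoolExecutor with a bounded LinkedBlockingQueue, so submissions beyond the queue's capacity throw RejectedExecutionException under the default AbortPolicy. An ExecutorCompletionService hands back results in completion order. CompletionOrderDemo and its parameters are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CompletionOrderDemo {
    // Submits one task per delay (each sleeps, then returns its index) and
    // returns the indices in the order the tasks finished.
    static List<Integer> run(long[] delaysMillis, int workers, int queueCapacity)
            throws InterruptedException, ExecutionException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity)); // bounded queue, default AbortPolicy
        CompletionService<Integer> completionService = new ExecutorCompletionService<>(pool);
        List<Integer> finished = new ArrayList<>();
        int submitted = 0;
        try {
            for (int i = 0; i < delaysMillis.length; i++) {
                final int id = i;
                final long delay = delaysMillis[i];
                try {
                    completionService.submit(() -> {
                        Thread.sleep(delay);
                        return id;
                    });
                    submitted++;
                } catch (RejectedExecutionException e) {
                    System.out.println("task " + id + " rejected: all workers busy and queue full");
                }
            }
            while (finished.size() < submitted) {
                Future<Integer> result = completionService.poll(1, TimeUnit.SECONDS);
                if (result != null) {
                    finished.add(result.get()); // completion order, not submission order
                }
            }
        } finally {
            pool.shutdown();
        }
        return finished;
    }
}
```

For example, delays of 600, 100 and 350 ms on three workers come back as indices 1, 2, 0. With two workers and a queue capacity of 1, the fourth of four submissions is rejected.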
What is the GPars counterpart to this approach?
Reading the documentation on GPars parallelism, I found many potential options: collectManyParallel(), anyParallel(), fork/join, etc., and I'm not sure which ones to even test. I was hoping to find some mention of "completion" or "completion service" as a comparison in the docs, but found nothing. I'm looking for pointers on where to start from people experienced with GPars.
Collecting results on the fly while throttling producers calls for a dataflow solution. Here is a runnable demo:
import groovyx.gpars.dataflow.DataflowQueue
import groovyx.gpars.group.DefaultPGroup
import groovyx.gpars.scheduler.DefaultPool
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit
int MAX_NUMBER_OF_WORKERS = 10
ThreadPoolExecutor _executorService =
    new ThreadPoolExecutor(MAX_NUMBER_OF_WORKERS, MAX_NUMBER_OF_WORKERS, 1000, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(200));
final group = new DefaultPGroup(new DefaultPool(_executorService))
final results = new DataflowQueue()
//submit tasks
30.times { value ->
    group.task(new Runnable() {
        @Override
        void run() {
            println 'Starting ' + Thread.currentThread()
            sleep 5000
            println 'Finished ' + Thread.currentThread()
            results.bind(value)
        }
    });
}
group.task {
    results << -1 //stop the consumer eventually
}
//get results
while (true) {
    def result = results.val
    println result
    if (result == -1) break
    //process result
}
group.shutdown()
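For comparison with the plain-Java side of the question, the same "consume results as they arrive" loop can be sketched with a java.util.concurrent BlockingQueue standing in for DataflowQueue. In this variant, the -1 stop marker is enqueued from a CompletableFuture.allOf(...).thenRun(...) callback. That means it is only added after every producer has finished, so the consumer cannot stop before the last result arrives. QueueConsumerDemo and its parameters are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class QueueConsumerDemo {
    static final int STOP = -1; // stop marker, like results << -1 in the GPars demo

    static List<Integer> run(int tasks, int workers, long workMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();
        List<CompletableFuture<Void>> producers = new ArrayList<>();
        try {
            for (int i = 0; i < tasks; i++) {
                final int value = i;
                producers.add(CompletableFuture.runAsync(() -> {
                    try {
                        Thread.sleep(workMillis); // simulated work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    results.add(value); // publish as soon as this task is done
                }, pool));
            }
            // Enqueue the stop marker only after every producer has completed,
            // so it is guaranteed to be the last element the consumer sees.
            CompletableFuture.allOf(producers.toArray(new CompletableFuture<?>[0]))
                    .thenRun(() -> results.add(STOP));
            List<Integer> consumed = new ArrayList<>();
            while (true) {
                int result = results.take(); // blocks until the next result arrives
                if (result == STOP) {
                    break;
                }
                consumed.add(result); // process result here
            }
            return consumed;
        } finally {
            pool.shutdown();
        }
    }
}
```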
