I'm using an ExecutorService as follows:
ExecutorService exe = Executors.newWorkStealingPool(parts.size());
...
Stream<Future<String>> futures = parts.stream().map(part -> exe.submit(() -> processPartition(part)));
...
String[] ret = futures.map(t -> {
    try {
        return t.get();
    } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
    }
}).toArray(String[]::new);
The code inside processPartition() is executing only one task at a time.
What is going on?
I spent several hours troubleshooting this and then finally found the answer 2 minutes after posting.
The problem is in this pattern:
Stream<Future<String>> futures = [...]
By using a stream, no task is submitted until the corresponding map(t -> ...) call is evaluated: streams are lazy, so each submit() happens only when the terminal operation pulls that element through, and it is immediately followed by the blocking get() on the same Future.
FIX:
List<Future<String>> futures = [...] .collect(Collectors.toList());
That forces all the tasks to be submitted before the first get() is called.
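A minimal sketch of the corrected pattern, assuming the parts, exe, and processPartition() from the question: collecting first submits every task, and only then does a second stream block on the results.

```java
// Collect eagerly: every task is submitted here, before any get().
List<Future<String>> futures = parts.stream()
        .map(part -> exe.submit(() -> processPartition(part)))
        .collect(Collectors.toList());

// Now block on each result; the tasks are already running in parallel.
String[] ret = futures.stream().map(f -> {
    try {
        return f.get();
    } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
    }
}).toArray(String[]::new);
```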
I am doing this simulation using the spark runner:
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline p = Pipeline.create(options);
p.apply(Create.of(1))
        .apply(ParDo.of(new DoFn<Integer, Integer>() {
            @ProcessElement
            public void apply(@Element Integer element, OutputReceiver<Integer> outputReceiver) {
                IntStream.range(0, 4_000_000).forEach(outputReceiver::output);
            }
        }))
        .apply(Reshuffle.viaRandomKey())
        .apply(ParDo.of(new DoFn<Integer, Integer>() {
            @ProcessElement
            public void apply(@Element Integer element, OutputReceiver<Integer> outputReceiver) {
                try {
                    // simulate an RPC call of 10 ms
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                outputReceiver.output(element);
            }
        }));
PipelineResult result = p.run();
result.waitUntilFinish();
I am running using --runner=SparkRunner --sparkMaster=local[8] but only 1 thread is used after the reshuffle.
Why is the Reshuffle not working?
If I replace the reshuffle with this:
.apply(MapElements.into(kvs(integers(), integers())).via(e -> KV.of(e % 8, e)))
.apply(GroupByKey.create())
.apply(Values.create())
.apply(Flatten.iterables())
Then I get 8 threads running.
BR, Rafael.
It looks like Reshuffle on Beam on Spark boils down to the implementation at
https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java#L191
I wonder if in this case both rdd.context().defaultParallelism() and rdd.getNumPartitions() are 1. I've filed https://issues.apache.org/jira/browse/BEAM-10834 to investigate.
In the meantime, you can use GroupByKey to get the desired parallelism, as you've indicated. (If you don't literally have integers, you could try using the hash of your element, Math.random(), or even an incrementing counter as the key.)
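For non-integer elements, a hedged sketch of that hash-key workaround (the String element type and the shard count of 8 are assumptions, not part of the original pipeline):

```java
// Spread arbitrary elements across a fixed number of keys so that
// GroupByKey can fan the work out; 8 shards is an arbitrary choice.
.apply(MapElements.into(kvs(integers(), strings()))
    .via(e -> KV.of(Math.abs(e.hashCode()) % 8, e)))
.apply(GroupByKey.create())
.apply(Values.create())
.apply(Flatten.iterables())
```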
I would like to run some threads, wait until all of them are finished, and get the results.
A possible way to do that is the code below. Is it thread-safe, though?
import kotlin.concurrent.thread

sealed class Errorneous<R>
data class Success<R>(val result: R) : Errorneous<R>()
data class Fail<R>(val error: Exception) : Errorneous<R>()

fun <R> thread_with_result(fn: () -> R): (() -> Errorneous<R>) {
    var r: Errorneous<R>? = null
    val t = thread {
        r = try { Success(fn()) } catch (e: Exception) { Fail(e) }
    }
    return {
        t.join()
        r!!
    }
}

fun main() {
    val tasks = listOf({ 2 * 2 }, { 3 * 3 })
    val results = tasks
        .map { thread_with_result(it) }
        .map { it() }
    println(results)
}
P.S.
Are there better built-in tools in Kotlin to do that? Like processing 10,000 tasks with a pool of 10 threads?
It should be threads, not coroutines, as it will be used with legacy code, and I don't know whether that works well with coroutines.
It seems Java has Executors that do exactly that:
import java.util.concurrent.Callable
import java.util.concurrent.Executors

fun <R> execute_in_parallel(tasks: List<() -> R>, threads: Int): List<Errorneous<R>> {
    val executor = Executors.newFixedThreadPool(threads)
    val fresults = executor.invokeAll(tasks.map { task ->
        Callable<Errorneous<R>> {
            try { Success(task()) } catch (e: Exception) { Fail(e) }
        }
    })
    executor.shutdown() // let the pool threads exit once all tasks are done
    return fresults.map { future -> future.get() }
}
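For comparison, a minimal sketch of the same pattern in plain Java, since the Kotlin function above is a thin wrapper over it (the task list here is illustrative):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTasks {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Callable<Integer>> tasks = List.of(() -> 2 * 2, () -> 3 * 3);
        List<Future<Integer>> futures = pool.invokeAll(tasks); // waits for all tasks
        for (Future<Integer> f : futures) {
            try {
                System.out.println("Success: " + f.get());
            } catch (ExecutionException e) {
                System.out.println("Fail: " + e.getCause());
            }
        }
        pool.shutdown();
    }
}
```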
My question is really simple: is this program valid as a simulation of the producer-consumer problem?
public class ProducerConsumer {
    public static void main(String[] args) {
        Consumers c = new Consumers(false, null);
        Producer p = new Producer(true, c);
        c.p = p;
        p.start();
        c.start();
    }
}

class Consumers extends Thread {
    boolean hungry; // I want to eat
    Producer p;

    public Consumers(boolean hungry, Producer p) {
        this.hungry = hungry;
        this.p = p;
    }

    public void run() {
        while (true) {
            // While the producer wants to produce, don't go
            while (p.nice == true) {
                // Simulation of the waiting, to check that it doesn't wait
                // and eat at the same time or hit any bad interleavings
                System.out.println("Consumer doesn't eat");
                try {
                    sleep(500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            for (int i = 0; i < 3; i++) {
                try {
                    sleep(1000);
                    // Because the consumer is eating, the producer gets bored
                    // and wants to produce; that's the meaning of nice.
                    // This line makes the producer automatically wait in its
                    // while loop as soon as it has finished producing.
                    p.nice = true;
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.println("Consumer eat");
            }
            hungry = false;
            System.out.println("\nConsumer doesn't eat anymore\n");
        }
    }
}

class Producer extends Thread {
    boolean nice;
    Consumers c;

    public Producer(boolean nice, Consumers c) {
        this.nice = nice;
        this.c = c;
    }

    public void run() {
        while (true) {
            /**
             * I begin with the producer, so the producer doesn't enter the
             * loop: no food has been produced yet and hungry starts out
             * false, because that's how this program works. So on the
             * first pass the producer doesn't enter the loop.
             */
            while (c.hungry == true) {
                try {
                    sleep(500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.println("Producer doesn't produce");
            }
            /**
             * While the consumer waits in the while loop of its run method
             * (meaning nice is true), the producer produces, and during
             * production the consumer becomes hungry, which makes the loop
             * enterable for the producer. The advantage of this is that
             * the producer already knows it has to step aside after
             * producing; the consumer doesn't need to tell it. hungry
             * becomes true, and it has no effect for the first round.
             */
            for (int i = 0; i < 3; i++) {
                try {
                    sleep(1000);
                    c.hungry = true;
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.println("Producer produce");
            }
            /**
             * After a while the producer has produced and the consumer is
             * still in its loop, so we can tell it to go. But we have to
             * make sure the producer doesn't pass its own loop before the
             * consumer gets out, since setting nice back to true too early
             * would leave the consumer stuck again. That's the role of
             * c.hungry in the for loop: because the producer knows it has
             * a client, it enters its waiting loop directly and so can't
             * starve the client.
             */
            System.out.println("\nProducer doesn't produce anymore\n");
            nice = false;
        }
    }
}
I didn't use any synchronization, wait, or notify, so for a parallel-programming problem this seems very strange, but when I run it there aren't any deadlocks, starvation, or bad interleavings: the producer produces, then stops; the consumer eats, then stops; and so on, as many times as I want.
Have I cheated somewhere?
Thanks!
First of all, be careful with the naming: "Consumers" is misleading, since you are only simulating a lone consumer. nice could also be renamed to "producing".
Secondly, you're using while (condition) sleep, which is basically a less efficient, unprotected version of a semaphore wait, so you did use a form of wait.
E.G.
while (p.nice == true) {
    System.out.println("Consumer doesn't eat");
    try {
        sleep(500);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
is your P()
System.out.println("\nProducer doesn't produce anymore\n");
nice = false;
is your V()
This method, however, is both inefficient (the waiting thread is either busy-waiting or sleeping for a moment when it could already proceed) and unprotected (there is no protection against simultaneous access to nice and hungry, so you won't be able to extend this program with more Consumers or Producers).
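To make the P()/V() analogy concrete, here is a minimal, hedged sketch of the same hand-off written with java.util.concurrent.Semaphore instead of the sleep loops. It is not your program, just the pattern it approximates:

```java
import java.util.concurrent.Semaphore;

public class SemaphoreHandoff {
    // One permit means "your turn"; the producer goes first.
    private static final Semaphore canProduce = new Semaphore(1);
    private static final Semaphore canEat = new Semaphore(0);

    public static void main(String[] args) {
        new Thread(() -> {
            while (true) {
                canProduce.acquireUninterruptibly(); // P(): wait for the consumer
                for (int i = 0; i < 3; i++) System.out.println("Producer produce");
                canEat.release();                    // V(): let the consumer go
            }
        }).start();
        new Thread(() -> {
            while (true) {
                canEat.acquireUninterruptibly();     // P(): wait for the producer
                for (int i = 0; i < 3; i++) System.out.println("Consumer eat");
                canProduce.release();                // V(): let the producer go
            }
        }).start();
    }
}
```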
Hope this helps.
I downloaded some existing code from the internet and ran it with a few modifications. In one scenario, I did not get what I was looking for. Here is the code -
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
public class MyRecursiveAction extends RecursiveAction {
    private long workload = 0;

    public MyRecursiveAction(long workload) {
        this.workload = workload;
    }

    @Override
    protected void compute() {
        if (this.workload > 16) {
            System.out.println("Splitting workload :: " + this.workload);
            List<MyRecursiveAction> subtasks = new ArrayList<MyRecursiveAction>();
            subtasks.addAll(createSubtasks());
            for (RecursiveAction subtask : subtasks) {
                subtask.fork();
            }
        } else {
            System.out.println("Doing work myself1 " + this.workload);
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            System.out.println("Done it ya " + this.workload);
        }
    }

    private List<MyRecursiveAction> createSubtasks() {
        List<MyRecursiveAction> subTasks = new ArrayList<>();
        MyRecursiveAction subtask1 = new MyRecursiveAction(this.workload / 2);
        MyRecursiveAction subtask2 = new MyRecursiveAction(this.workload / 2);
        subTasks.add(subtask1);
        subTasks.add(subtask2);
        return subTasks;
    }

    public static void main(String[] args) {
        MyRecursiveAction myRecursiveAction = new MyRecursiveAction(24);
        ForkJoinPool forkJoinPool = new ForkJoinPool(4);
        forkJoinPool.invoke(myRecursiveAction);
    }
}
Check the following excerpt -
System.out.println("Doing work myself1 " + this.workload);
try {
Thread.sleep(1000L);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Done it ya " + this.workload);
I added a sleep of 1 second and then printed another statement. However, when I run the code, I don't see that statement getting printed, and I don't understand why. Why does it not get printed? In fact, the result of the execution is -
Splitting workload :: 24
Doing work myself1 12
Doing work myself1 12
I was expecting the following line as well - "Done it ya".
Make workload static and volatile:
private static volatile long workload = 0;
Lose the this.workload for just workload.
Alter if statement to:
if(workload > 0) {
Then you will get to "Done it ya".
I have found the reason why the last line was not getting printed: fork() works asynchronously, so it is altogether a different thread that sleeps for some time. In asynchronous programming, the main thread does not wait for the response to come back unless we add some construct in the code to make it wait. In this case, by the time the worker thread wakes up after 1 second, the main thread is already done.
To force the main thread to wait for the execution of the other threads, we need to use join().
ForkJoinTask.join(): this method blocks until the result of the computation is done.
So if I add the following block after the fork loop,
for (RecursiveAction subtask : subtasks) {
    subtask.join();
}
the main thread waits and we get all the expected lines printed on the console.
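As a side note, a hedged sketch of the more idiomatic form: ForkJoinTask.invokeAll(Collection) forks the subtasks and joins them all in one call, so compute() could be written like this (the rest of the class is assumed unchanged):

```java
@Override
protected void compute() {
    if (this.workload > 16) {
        System.out.println("Splitting workload :: " + this.workload);
        // invokeAll forks every subtask and joins them all before returning.
        invokeAll(createSubtasks());
    } else {
        System.out.println("Doing work myself1 " + this.workload);
        try {
            Thread.sleep(1000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("Done it ya " + this.workload);
    }
}
```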
There is a lot of data that should be put into a Hazelcast map, and I want to prevent others from reading the map while the data is being put into it.
Is there any way to achieve this?
For example:
map a = map(1,000,000,000) // a has 1,000,000,000 elements
map b = map(2,000) // b has 2,000 elements
I want to put all of b into a.
The elements of b should only be accessible after all of them have been put into map a;
if the elements of map b haven't been put into map a entirely, they shouldn't be accessible.
Use case:
map a = {1,2,3,4,5}
map b = {a,b,c,d,e}
print a // result {1,2,3,4,5}
foreach item in b
    a.put item
    print a // result {1,2,3,4,5}
end foreach
print a // result {1,2,3,4,5,a,b,c,d,e}
I want to merge these two maps, but map b's elements must not be accessible via map a before the merge has finished.
My solution:
Thank you to everyone for your help.
After reading the Hazelcast manual, I chose TransactionalMap to solve this problem.
TransactionalMap has READ_COMMITTED isolation: it can suspend threads that read map(1) while the transaction is updating map(1).
``` java
static Runnable tx = new Runnable() {
    @Override
    public void run() {
        try {
            logger.info("start transaction...");
            TransactionContext txCxt = hz.newTransactionContext();
            txCxt.beginTransaction();
            TransactionalMap<Object, Object> map = txCxt.getMap("map");
            try {
                logger.info("before put map(1)");
                Thread.sleep(300);
                map.put("1", "1"); // reader1 is blocked
                logger.info("after put map(1)");
                Thread.sleep(500);
                map.put("2", "2"); // reader2 is blocked
                logger.info("after put map(2)");
                Thread.sleep(500);
                txCxt.commitTransaction();
                logger.info("transaction committed");
            } catch (RuntimeException t) {
                txCxt.rollbackTransaction();
                throw t;
            }
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            logger.info("Finished testmap size:{}, testmap(1):{}, testmap(2):{} ", testmap.size(), testmap.get("1"),
                    testmap.get("2"));
            Hazelcast.shutdownAll();
            logger.info("system exit.");
            System.exit(0);
        }
    }
};
```
What's your motivation / use-case? You can use transactions, but that could have a bad impact on performance. Alternatively you could use manual locking - see ILock.
However, both of these techniques should be a last resort, used only when you have no way to design your application differently.
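A minimal sketch of the manual-locking idea, assuming the Hazelcast 3.x ILock API and readers that cooperate by taking the same lock before reading map a (hz, a, and b stand in for the question's instance and maps):

```java
// Guard the bulk merge with a distributed lock. This only holds readers
// off if they also acquire "map-a-merge-lock" before reading from map a.
ILock mergeLock = hz.getLock("map-a-merge-lock");
mergeLock.lock();
try {
    b.forEach(a::put); // merge b into a while cooperating readers wait
} finally {
    mergeLock.unlock();
}
```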
One way to achieve this is to lock the segments of map b while adding to it; once pushing the entries into map a is complete, you can unlock the segments.
There will be performance implications with this method, though, as it requires the extra step of locking and unlocking.