I am having a hard time programming parallel services. The goal is to retrieve data from Facebook with asynchronous API calls and afterwards iterate over the retrieved data synchronously performing GORM actions.
The first step of fetching data asynchronously seems to work fine with:
List<Activity> activityList = Activity.findAllByFacebookPageIsNotNullAndFetchEvents(true, [max: 100])
PromiseList promiseList = new PromiseList()
activityList.each { Activity activity->
promiseList << { fetchEventData(activity.facebookPage, null) }
}
Now I am trying to iterate over the results, like:
promiseList.onComplete { List results ->
results.each { ArrayList eventSet ->
eventSet.each { LazyMap eventData ->
createEvent(eventData)
}
}
}
The createEvent() method tries to save a new Event. This operation fails with:
2017-04-11 10:56:47.018 ERROR --- [ctor Thread 129] o.h.engine.jdbc.spi.SqlExceptionHelper : No operations allowed after statement closed.
2017-04-11 10:56:47.024 ERROR --- [ctor Thread 124] o.h.engine.jdbc.spi.SqlExceptionHelper : No operations allowed after statement closed.
2017-04-11 10:56:47.024 ERROR --- [ctor Thread 125] o.h.engine.jdbc.spi.SqlExceptionHelper : Cannot convert value '2017-01-11 23:31:39' from column 3 to TIMESTAMP.
2017-04-11 10:56:47.025 ERROR --- [ctor Thread 105] o.h.engine.jdbc.spi.SqlExceptionHelper : No operations allowed after statement closed.
2017-04-11 10:56:47.026 ERROR --- [ctor Thread 103] o.h.engine.jdbc.spi.SqlExceptionHelper : No operations allowed after statement closed.
2017-04-11 10:56:47.026 ERROR --- [ctor Thread 107] o.h.engine.jdbc.spi.SqlExceptionHelper : No operations allowed after statement closed.
So I guess createEvent() is called from various threads instead of the "main" thread.
Can someone please tell me how to do this the right way?
Edit:
I also tried:
List<ArrayList> promiseResult = promiseList.get()
promiseResult.each { ArrayList<LazyMap> eventList ->
eventList.each {
Event.findByFacebookId((String) it['id'])
//createEvent(it)
}
}
This fails with a java.lang.NullPointerException.
Try this
Event.withNewSession {
Event.withNewTransaction {
// Event update code here
}
}
Thanks for your answers. I think they got me on the right track. Maybe I wasn't too clear about what I wanted to achieve. The GORM calls did not need to be asynchronous, although that still seems like a good idea, since my method is way too slow :D
However, I achieved the desired behaviour by using waitAll() and doing the DB handling afterwards.
A working example is:
List<Activity> activityList = Activity.findAllByFacebookPageIsNotNullAndFetchEvents(true, [max: 100])
List promises = []
activityList.each { Activity activity->
promises << task { fetchEventData(activity.facebookPage, null) } // query website asynchronously; this is really fast!
}
def promisesResults = waitAll(promises)
promisesResults.each { ArrayList<LazyMap> eventList ->
eventList.each { LazyMap eventData ->
try {
createEvent(eventData) // DB actions; this is pretty slow
} catch (e) {
log.error(e.toString())
}
}
}
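The same shape (fetch in parallel, block until everything has arrived, then do the database work on the calling thread) can be sketched outside Grails. Below is a minimal illustration in plain Java using CompletableFuture instead of Grails promises. The fetchEventData and createEvent methods here are hypothetical stand-ins, not the methods from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FetchThenPersist {
    // Hypothetical stand-in for the remote call; returns fake event ids for a page.
    static List<String> fetchEventData(String page) {
        return List.of(page + "-event-1", page + "-event-2");
    }

    // Hypothetical stand-in for the GORM save; records which thread did the work.
    static final List<String> savedOnThread = new ArrayList<>();

    static void createEvent(String event) {
        savedOnThread.add(Thread.currentThread().getName());
    }

    static int run(List<String> pages) {
        // Step 1: start every fetch asynchronously.
        List<CompletableFuture<List<String>>> futures = new ArrayList<>();
        for (String page : pages) {
            futures.add(CompletableFuture.supplyAsync(() -> fetchEventData(page)));
        }
        // Step 2: block until all fetches are done (the waitAll step).
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        // Step 3: persist sequentially on the calling thread.
        int saved = 0;
        for (CompletableFuture<List<String>> future : futures) {
            for (String event : future.join()) {
                try {
                    createEvent(event);
                    saved++;
                } catch (RuntimeException e) {
                    System.err.println(e);
                }
            }
        }
        return saved;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("pageA", "pageB", "pageC"))); // prints 6
    }
}
```

Because step 3 only starts after join() returns, every createEvent call runs on the thread that called run(), which is the property the working Groovy example relies on.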
In my Android application I have code that should run periodically in its own coroutine and should be cancelable.
For this I have the following functions:
startJob(): Initializes the job, sets up invokeOnCompletion() and starts the work loop in the respective scope
private fun startJob() {
if (::myJob.isInitialized && myJob.isActive) {
return
}
myJob = Job()
myJob.invokeOnCompletion {
it?.message.let {
var msg = it
if (msg.isNullOrBlank()) {
msg = "Job stopped. Reason unknown"
}
myJobCompleted(msg)
}
}
CoroutineScope(Dispatchers.IO + myJob).launch {
workloop()
}
}
workloop(): The main work loop. Do some work in a loop with a set delay in each iteration:
private suspend fun workloop() {
while (true) {
// doing some stuff here
delay(setDelayInMilliseconds)
}
}
myJobCompleted(): Does some finalizing. For now it simply logs a message for testing:
private fun myJobCompleted(msg: String) {
try {
mainActivityReference.logToGUI(msg)
}
catch (e:Exception){
println("debug: " + e.message)
}
}
Running this and calling myJob.cancel() throws the following exception in myJobCompleted():
debug: Only the original thread that created a view hierarchy can touch its views.
I'm curious as to why this code isn't running on the main thread, since startJob() IS called from the main thread?
Furthermore: is there an option similar to using a CancellationTokenSource in C#, where the job is not immediately cancelled, but a cancellation request can be checked in each iteration of the while loop?
Immediately breaking off the job, regardless of what it is doing (although it will pretty much always be waiting for the delay on cancellation) doesn't seem like a good idea to me.
It is not the contract of Job.invokeOnCompletion to run on the same thread where Job is created. Moreover, such a contract would be impossible to implement.
You can't expect an arbitrary piece of code to run on an arbitrary thread just because there was some earlier method invocation on that thread. The ability of the Android main GUI thread to execute code submitted from the outside is special, and involves the existence of a top-level event loop.
In the world of coroutines, what controls thread assignment is the coroutine context, while clearly you are outside of any context when creating the job. So the way to fix it is to explicitly launch(Dispatchers.Main) a coroutine from within invokeOnCompletion.
About your question on cancellation: you can use withContext(NonCancellable) to surround the part of the code you want to protect from cancellation.
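For the cooperative style the question asks about, where the loop itself decides when to stop, the general idea is a flag that is read once per iteration (in coroutines the analogous check is the isActive property). Here is a minimal sketch of that idea in plain Java threads rather than Kotlin coroutines. The class and method names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CooperativeLoop {
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    final AtomicInteger completedIterations = new AtomicInteger();

    // Only records the request; the loop notices it at its next check.
    void requestCancel() {
        cancelRequested.set(true);
    }

    void workLoop(long delayMillis) throws InterruptedException {
        while (!cancelRequested.get()) {            // checked once per iteration
            completedIterations.incrementAndGet();  // one full unit of work
            Thread.sleep(delayMillis);
        }
    }

    // Starts the loop, waits for at least one iteration, cancels, and returns
    // the number of completed iterations, or -1 if the loop failed to stop.
    static int runAndCancel() {
        CooperativeLoop loop = new CooperativeLoop();
        Thread worker = new Thread(() -> {
            try {
                loop.workLoop(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            while (loop.completedIterations.get() < 1) {
                Thread.sleep(1);
            }
            loop.requestCancel();
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive() ? -1 : loop.completedIterations.get();
    }

    public static void main(String[] args) {
        System.out.println("iterations before stop: " + runAndCancel());
    }
}
```

An iteration that has started always finishes its unit of work. The request only takes effect at the top of the loop.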
I have a situation like this where I make some web requests in parallel. Sometimes I make these calls and all requests see the same error (e.g. no-network):
void main() {
Observable.just("a", "b", "c")
.flatMap(s -> makeNetworkRequest())
.subscribe(
s -> {
// TODO
},
error -> {
// handle error
});
}
Observable<String> makeNetworkRequest() {
return Observable.error(new NoNetworkException());
}
class NoNetworkException extends Exception {
}
Depending on the timing, if one request emits the NoNetworkException before the others can, Retrofit/RxJava will dispose/interrupt** the others. I'll see one of the following logs (not all three) for each request remaining in progress++:
<-- HTTP FAILED: java.io.IOException: Canceled
<-- HTTP FAILED: java.io.InterruptedIOException
<-- HTTP FAILED: java.io.InterruptedIOException: thread interrupted
I'll be able to handle the NoNetworkException error in the subscriber and everything downstream will get disposed of and all is OK.
However, based on timing, if two or more web requests emit NoNetworkException, then the first one will trigger the events above, disposing of everything downstream. The second NoNetworkException will have nowhere to go and I'll get the dreaded UndeliverableException. This is the same as example #1 documented here.
In the above article, the author suggested using an error handler. Obviously retry/retryWhen don't make sense if I expect to hear the same errors again. I don't understand how onErrorResumeNext/onErrorReturn help here, unless I map them to something recoverable to be handled downstream:
Observable.just("a", "b", "c")
.flatMap(s ->
makeNetworkRequest()
.onErrorReturn(error -> {
// eat actual error and return something else
return "recoverable error";
}))
.subscribe(
s -> {
if (s.equals("recoverable error")) {
// handle error
} else {
// TODO
}
},
error -> {
// handle error
});
but this seems wonky.
I know another solution is to set a global error handler with RxJavaPlugins.setErrorHandler(). This doesn't seem like a great solution either. I may want to handle NoNetworkException differently in different parts of my app.
So what other options do I have? What do other people do in this case? This must be pretty common.
** I don't fully understand who is interrupting/disposing of whom. Is RxJava disposing of all other requests in flatMap, which in turn causes Retrofit to cancel requests? Or does Retrofit cancel requests, resulting in each request in flatMap emitting one of the above IOExceptions? I guess it doesn't really matter for answering the question, I'm just curious.
++ It's possible that not all a, b, and c requests are in flight depending on thread pool.
Have you tried using flatMap() with delayErrors = true?
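With delayErrors set, flatMap lets every inner source terminate before it signals a failure, and several failures arrive together in one CompositeException instead of the later ones going undelivered. The idea itself can be shown without RxJava. The sketch below is only an analogue using CompletableFuture from the JDK, and makeNetworkRequest is a hypothetical stand-in that fails for the inputs "b" and "c".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class DelayErrorsSketch {
    static class NoNetworkException extends RuntimeException {
        NoNetworkException(String message) {
            super(message);
        }
    }

    // Hypothetical request: "b" and "c" fail, everything else succeeds.
    static CompletableFuture<String> makeNetworkRequest(String s) {
        return CompletableFuture.supplyAsync(() -> {
            if (s.equals("b") || s.equals("c")) {
                throw new NoNetworkException(s);
            }
            return s.toUpperCase();
        });
    }

    // Lets every request finish, then hands back successes and failures together.
    static void runAll(List<String> inputs, List<String> results, List<Throwable> errors) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String s : inputs) {
            futures.add(makeNetworkRequest(s)); // all requests start up front
        }
        for (CompletableFuture<String> future : futures) {
            try {
                results.add(future.join());
            } catch (CompletionException e) {
                errors.add(e.getCause()); // collected, not thrown, so no error is lost
            }
        }
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        List<Throwable> errors = new ArrayList<>();
        runAll(List.of("a", "b", "c"), results, errors);
        System.out.println(results + " / " + errors.size() + " errors"); // prints [A] / 2 errors
    }
}
```

The caller sees both NoNetworkExceptions in one place and can decide locally how to handle them, without a global error handler.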
I'm using Spark Streaming to process a stream by processing each partition (saving events to HBase), then ack the last event in each RDD from the driver to the receiver, so the receiver can ack it to its source in turn.
public class StreamProcessor {
    final AckClient ackClient;

    public StreamProcessor(AckClient ackClient) {
        this.ackClient = ackClient;
    }

    public void process(final JavaReceiverInputDStream<Event> inputDStream) {
        inputDStream.foreachRDD(rdd -> {
            JavaRDD<Event> lastEvents = rdd.mapPartitions(events -> {
                // ------ this code executes on the worker -------
                // process events one by one; I don't use ackClient here
                // return the event with the max delivery tag here
            });
            // ------ this code executes on the driver -------
            Event lastEvent = .. // find event with max delivery tag across partitions
            ackClient.ack(lastEvent); // use ackClient to ack last event
        });
    }
}
The problem here is that I get the following error (even though everything seems to work fine):
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1435)
at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:602)
at org.apache.spark.api.java.JavaRDDLike$class.mapPartitions(JavaRDDLike.scala:141)
at org.apache.spark.api.java.JavaRDD.mapPartitions(JavaRDD.scala:32)
...
Caused by: java.io.NotSerializableException: <some non-serializable object used by AckClient>
...
It seems that Spark is trying to serialize AckClient to send it to the workers, but I thought that only code inside mapPartitions is serialized/shipped to the workers, and that the code at the RDD level (i.e. inside foreachRDD but not inside mapPartitions) would not be serialized/shipped to the workers.
Can someone confirm if my thinking is correct or not? And if it is correct, should this be reported as a bug?
You are correct, this was fixed in 1.1. However, if you look at the stack trace, the cleaner that is throwing is being invoked in mapPartitions:
at org.apache.spark.SparkContext.clean(SparkContext.scala:1435)
at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:602)
So, the problem has to do with your mapPartitions. Make sure that the closure you pass to it isn't accidentally capturing this (the enclosing StreamProcessor, and with it the AckClient), as that is a common issue.
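How a lambda ends up dragging its enclosing object along can be reproduced with plain Java serialization, no Spark required. In this sketch SerializableFn, Processor and this AckClient are hypothetical stand-ins for illustration. The only difference between the two lambdas is whether a field is read through the implicit this or copied to a local variable first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {
    // A serializable function type, similar in spirit to the ones shipped to workers.
    interface SerializableFn extends Serializable {
        String apply(String input);
    }

    // Hypothetical driver-only client; deliberately not Serializable.
    static class AckClient {
        void ack(String event) { }
    }

    // Hypothetical stand-in for StreamProcessor; also not Serializable.
    static class Processor {
        final AckClient ackClient = new AckClient();
        final String prefix;

        Processor(String prefix) {
            this.prefix = prefix;
        }

        // Reads the field through the implicit this, so the whole Processor is captured.
        SerializableFn capturingThis() {
            return event -> prefix + event;
        }

        // Copies the field to a local first, so only the String is captured.
        SerializableFn capturingLocal() {
            final String localPrefix = prefix;
            return event -> localPrefix + event;
        }
    }

    // Returns true if Java serialization accepts the object.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Processor p = new Processor("processed:");
        System.out.println(isSerializable(p.capturingThis()));  // false
        System.out.println(isSerializable(p.capturingLocal())); // true
    }
}
```

The first lambda never mentions ackClient, yet serializing it fails because the captured Processor (and everything it holds) has to be written too. Copying what the closure needs into local variables, or building the closure in a static method, avoids that.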
I'm trying to create my first app in Swift which involves making multiple requests to a website. These requests are each done using the block
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in ... })
task.resume()
From what I understand this block uses a thread different to the main thread.
My question is, what is the best way to design code that relies on the values in that block? For instance, the ideal design (however not possible due to the fact that the thread executing these blocks is not the main thread) is
func prepareEmails() {
var names = getNames()
var emails = getEmails()
...
sendEmails()
}
func getNames() -> NSArray {
var names = nil
....
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
names = ...
})
task.resume()
return names
}
func getEmails() -> NSArray {
var emails = nil
....
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
emails = ...
})
task.resume()
return emails
}
However, in the above design, getNames() and getEmails() will most likely return nil, as the task will not have updated emails/names by the time the function returns.
The alternative design (which I currently implement) effectively removes the 'prepareEmails' function and does everything sequentially in the task functions:
func prepareEmails() {
getNames()
}
func getNames() {
...
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
getEmails(names)
})
task.resume()
}
func getEmails(names: NSArray) {
...
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
sendEmails(emails, names)
})
task.resume()
}
Is there a more effective design than the latter? This is my first experience with concurrency, so any advice would be greatly appreciated.
The typical pattern when calling an asynchronous method that has a completionHandler parameter is to use the completionHandler closure pattern yourself. So the methods don't return anything, but rather call a closure with the returned information as a parameter:
func getNames(completionHandler:(NSArray!)->()) {
....
let task = NSURLSession.sharedSession().dataTaskWithRequest(request) {data, response, error -> Void in
let names = ...
completionHandler(names)
}
task.resume()
}
func getEmails(completionHandler:(NSArray!)->()) {
....
let task = NSURLSession.sharedSession().dataTaskWithRequest(request) {data, response, error -> Void in
let emails = ...
completionHandler(emails)
}
task.resume()
}
Then, if you need to perform these sequentially, as suggested by your code sample (i.e. if the retrieval of emails was dependent upon the names returned by getNames), you could do something like:
func prepareEmails() {
getNames() { names in
getEmails() {emails in
sendEmails(names, emails) // I'm assuming the names and emails are in the input to this method
}
}
}
Or, if they can run concurrently, then you should do so, as it will be faster. The trick is how to make a third task dependent upon two other asynchronous tasks. The two traditional alternatives are:
Wrap each of these asynchronous tasks in its own asynchronous NSOperation, and then create a third operation dependent upon those other two operations. This is probably beyond the scope of the question, but you can refer to the Operation Queue section of the Concurrency Programming Guide or see the Asynchronous vs Synchronous Operations and Subclassing Notes sections of the NSOperation Class Reference.
Use dispatch groups, entering the group before each request, leaving the group within the completion handler of each request, and then adding a dispatch group notification block (called when all of the group "enter" calls are matched by their corresponding "leave" calls):
func prepareEmails() {
let group = dispatch_group_create()
var emails: NSArray!
var names: NSArray!
dispatch_group_enter(group)
getNames() { results in
names = results
dispatch_group_leave(group)
}
dispatch_group_enter(group)
getEmails() {results in
emails = results
dispatch_group_leave(group)
}
dispatch_group_notify(group, dispatch_get_main_queue()) {
if names != nil && emails != nil {
self.sendEmails(names, emails)
} else {
// one or both of those requests failed; tell the user
}
}
}
Frankly, if there's any way to retrieve both the emails and names in a single network request, that's going to be far more efficient. But if you're stuck with two separate requests, you could do something like the above.
Note, I wouldn't generally use NSArray in my Swift code, but rather use an array of String objects (e.g. [String]). Furthermore, I'd put in error handling where I return the nature of the error if either of these fail. But hopefully this illustrates the concepts involved in (a) writing your own methods with completionHandler blocks; and (b) invoking a third bit of code dependent upon the completion of two other asynchronous tasks.
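The "run two requests concurrently, then act once both have finished" shape is not specific to GCD. As a sketch in Java (a different language from the Swift above, used here purely for illustration), CompletableFuture.thenCombine plays the role of the dispatch group notification. The getNames, getEmails and sendEmails methods below are hypothetical stand-ins that return canned data.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CombineTwoRequests {
    // Hypothetical asynchronous request for the names.
    static CompletableFuture<List<String>> getNames() {
        return CompletableFuture.supplyAsync(() -> List.of("Ann", "Bob"));
    }

    // Hypothetical asynchronous request for the emails.
    static CompletableFuture<List<String>> getEmails() {
        return CompletableFuture.supplyAsync(() -> List.of("ann@example.com", "bob@example.com"));
    }

    // Hypothetical third step that needs both results.
    static String sendEmails(List<String> names, List<String> emails) {
        StringBuilder sent = new StringBuilder();
        for (int i = 0; i < Math.min(names.size(), emails.size()); i++) {
            sent.append(names.get(i)).append(" <").append(emails.get(i)).append(">;");
        }
        return sent.toString();
    }

    static CompletableFuture<String> prepareEmails() {
        CompletableFuture<List<String>> names = getNames();   // starts immediately
        CompletableFuture<List<String>> emails = getEmails(); // runs concurrently with names
        // Runs sendEmails only after both futures have completed successfully.
        return names.thenCombine(emails, CombineTwoRequests::sendEmails);
    }

    public static void main(String[] args) {
        String outcome = prepareEmails()
                .exceptionally(error -> "failed: " + error) // one place for either failure
                .join();
        System.out.println(outcome); // prints Ann <ann@example.com>;Bob <bob@example.com>;
    }
}
```

If either request fails, the combined future completes exceptionally and sendEmails is never called, which mirrors the nil check inside the notification block.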
The answers above (particularly Rob's DispatchQueue based answer) describe the concurrency concepts necessary to run two tasks in parallel and then respond to the result. The answers lack error handling for clarity because traditionally, correct solutions to concurrency problems are quite verbose.
Not so with HoneyBee.
HoneyBee.start()
.setErrorHandler(handleErrorFunc)
.branch {
$0.chain(getNames)
+
$0.chain(getEmails)
}
.chain(sendEmails)
This code snippet manages all of the concurrency, routes all errors to handleErrorFunc and looks like the concurrent pattern that is desired.
My code runs four functions (using Parallel.Invoke) to fill in the fields of a class such as:
class Person
{
    int Age;
    string name;
    long ID;
    bool isVegetarian;

    public static Person GetPerson(int LocalID)
    {
        Person person;
        Parallel.Invoke(() => { GetAgeFromWebServiceX(person); },
                        () => { GetNameFromWebServiceY(person); },
                        () => { GetIDFromWebServiceZ(person); },
                        () =>
                        {
                            // connect to my database and get information if vegetarian (using LocalID)
                            ....
                            if (!person.isVegetarian)
                                return null; // does not compile: this lambda cannot return a value
                            ....
                        });
    }
}
My question is: I cannot return null from inside the lambda if the person is not a vegetarian, but I want to be able to stop all threads, stop processing and just return null. How can this be achieved?
To exit the Parallel.Invoke as early as possible you'd have to do three things:
Schedule the action that detects whether you want to exit early as the first action. It's then scheduled sooner (maybe as first, but that's not guaranteed) so you'll know sooner whether you want to exit.
Throw an exception when you detect the error and catch an AggregateException as Jon's answer indicates.
Use cancellation tokens. However, this only makes sense if you have an opportunity to check their IsCancellationRequested property.
Your code would then look as follows:
var cts = new CancellationTokenSource();
try
{
Parallel.Invoke(
new ParallelOptions { CancellationToken = cts.Token },
() =>
{
if (!person.IsVegetarian)
{
cts.Cancel();
throw new PersonIsNotVegetarianException();
}
},
() => { GetAgeFromWebServiceX(person, cts.Token); },
() => { GetNameFromWebServiceY(person, cts.Token); },
() => { GetIDFromWebServiceZ(person, cts.Token); }
);
}
catch (AggregateException e)
{
var cause = e.InnerExceptions[0];
// Check if cause is a PersonIsNotVegetarianException.
}
However, as I said, cancellation tokens only make sense if you can check them. So there should be an opportunity inside GetAgeFromWebServiceX to check the cancellation token and exit early, otherwise, passing tokens to these methods doesn't make sense.
Well, you can throw an exception from your action, catch AggregateException in GetPerson (i.e. put a try/catch block around Parallel.Invoke), check for it being the right kind of exception, and return null.
That fulfils everything except stopping all the threads. I think it's unlikely that you'll easily be able to stop already running tasks unless you start getting into cancellation tokens. You could stop further tasks from executing by keeping a boolean value to indicate whether any of the tasks so far has failed, and make each task check that before starting... it's somewhat ugly, but it will work.
I suspect that using "full" tasks instead of Parallel.Invoke would make all of this more elegant though.
Surely you need to load your Person from the database first anyway? As it is your code calls the Web services with a null.
If your logic really is sequential, do it sequentially and only do in parallel what makes sense.
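A sketch of that ordering, written in Java rather than C# and with every lookup replaced by a hypothetical stub: do the database check first, return null straight away for non-vegetarians, and only then fan out to the web services.

```java
import java.util.concurrent.CompletableFuture;

public class PersonLoader {
    static class Person {
        int age;
        String name;
        long id;
        boolean vegetarian;
    }

    // Hypothetical stubs standing in for the database and the three web services.
    static boolean isVegetarianInDatabase(int localId) { return localId % 2 == 0; }
    static int fetchAge(int localId) { return 30 + localId; }
    static String fetchName(int localId) { return "person-" + localId; }
    static long fetchId(int localId) { return 1000L + localId; }

    static Person getPerson(int localId) {
        // Sequential part: the check that can end the whole operation comes first.
        if (!isVegetarianInDatabase(localId)) {
            return null;
        }
        Person person = new Person();
        person.vegetarian = true;

        // Parallel part: three independent lookups, each writing a different field.
        CompletableFuture<Void> age = CompletableFuture.runAsync(() -> person.age = fetchAge(localId));
        CompletableFuture<Void> name = CompletableFuture.runAsync(() -> person.name = fetchName(localId));
        CompletableFuture<Void> id = CompletableFuture.runAsync(() -> person.id = fetchId(localId));
        CompletableFuture.allOf(age, name, id).join(); // wait for all three
        return person;
    }

    public static void main(String[] args) {
        System.out.println(getPerson(1));      // prints null
        System.out.println(getPerson(2).name); // prints person-2
    }
}
```

With this ordering nothing has to be cancelled, because no web service call is started for a person who will be rejected anyway.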