Hazelcast Scheduled Executor always starts task on newest member

I am using the Hazelcast Scheduled Executor Service to run a specific task on one service instance only. To achieve this behavior we take advantage of the com.hazelcast.scheduledexecutor.TaskUtils.named(java.lang.String, java.lang.Runnable) decorator to avoid duplicate tasks.
Hazelcast dependencies used:
implementation "com.hazelcast:hazelcast-all:4.2"
implementation "com.hazelcast:hazelcast-kubernetes:2.2.2" // DNS Lookup
We use a Hazelcast cluster with one member per service instance.
Code example:
public void scheduleTask() {
    IScheduledExecutorService es = hazelcastInstance.getScheduledExecutorService("myScheduledExecutor");
    try {
        es.scheduleAtFixedRate(named("taskName", task), 0, 30, SECONDS);
    } catch (DuplicateTaskException ex) {
        System.out.println("Task was already scheduled!");
    }
}
The above example partially achieves the desired behavior. The only issue is that every time a new instance spins up, the scheduled executor moves the task to that new instance. This is not ideal, since we would like the task to execute, for example, once every 6 hours.
Is there any way to configure the scheduled executor so that it keeps running the task on the instance where it originally started, and moves it to another instance only if the original one goes down?

This is not directly possible, but there is an approximation.
IScheduledExecutorService does not provide a way to specify an ordering for which members to use.
You can submit a task to a specific member, but this won't fail over to a member of your choosing if that specific member dies.
... various other options ...
Or the option in your code, scheduleAtFixedRate, which will pick a member, and this may be a different member each time the cluster changes size. It should not always be the newest member except by coincidence.
What you could do is have a scheduled task that selects a member to run an ordinary task upon. One task that launches another.
In the scheduled task, the run() could call hazelcastInstance.getCluster().getMembers() to get the list of members in the cluster. All it needs is some logic to pick a member, and then do hazelcastInstance.getExecutorService("default").executeOnMember(runnable, member).
You might pick a member with a specific attribute that you configure, or a member with a specific IP. Perhaps simplest is to pick the oldest, since this doesn't change until the oldest member leaves, at which point the second-oldest becomes the oldest: easy failover.
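Putting that together, a minimal sketch of the launcher task, assuming Hazelcast 4.x APIs; RealTask is a hypothetical stand-in for the actual work and must be Serializable:

import java.io.Serializable;
import com.hazelcast.cluster.Member;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;

public class LaunchOnOldestMember implements Runnable, Serializable, HazelcastInstanceAware {

    private transient HazelcastInstance hazelcastInstance;

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.hazelcastInstance = hazelcastInstance;
    }

    @Override
    public void run() {
        // Members are returned in a consistent order with the oldest first,
        // so this pick is stable until the oldest member leaves the cluster.
        Member oldest = hazelcastInstance.getCluster().getMembers().iterator().next();
        hazelcastInstance.getExecutorService("default")
                .executeOnMember(new RealTask(), oldest);
    }

    // Hypothetical placeholder for the actual work.
    public static class RealTask implements Runnable, Serializable {
        @Override
        public void run() {
            System.out.println("Running on the chosen member");
        }
    }
}

You would schedule LaunchOnOldestMember itself with scheduleAtFixedRate and the named() decorator, exactly as in the question; only the inner task is pinned to the chosen member.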

The workaround I used was to dispose of the already-scheduled task and schedule it again with an initial delay, which I calculate based on the last time the task ran.
Something like:
public void scheduleTask() {
    IScheduledExecutorService es = hazelcastInstance.getScheduledExecutorService("myScheduledExecutor");
    try {
        disposePreviousScheduledTasks(es);
        es.scheduleAtFixedRate(named("taskName", task), initialDelay, 30, SECONDS);
    } catch (DuplicateTaskException ex) {
        System.out.println("Task was already scheduled!");
    }
}
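For illustration, the initial delay could be derived from a last-run timestamp that the task records in a distributed map; the map name "taskTimestamps" and its key are assumptions, not part of the original code:

// Hypothetical helper: the task is assumed to write System.currentTimeMillis()
// into the "taskTimestamps" map under its own name each time it runs.
private long computeInitialDelaySeconds(long periodSeconds) {
    IMap<String, Long> timestamps = hazelcastInstance.getMap("taskTimestamps");
    Long lastRunMillis = timestamps.get("taskName");
    if (lastRunMillis == null) {
        return 0; // never ran before: start immediately
    }
    long elapsedSeconds = (System.currentTimeMillis() - lastRunMillis) / 1000;
    return Math.max(0, periodSeconds - elapsedSeconds);
}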

Related

Appropriate solution for long running computations in Azure App Service and .NET Core 3.1?

What is an appropriate solution for long running computations in Azure App Service and .NET Core 3.1 in an application that has no need for a database and no IO to anything outside of this application? It is a computation task.
Specifically, the following is unreliable and needs a solution.
[Route("service")]
[HttpPost]
public Outbound Post(Inbound inbound)
{
Debug.Assert(inbound.Message.Equals("Hello server."));
Outbound outbound = new Outbound();
long Billion = 1000000000;
for (long i = 0; i < 33 * Billion; i++) // 230 seconds
;
outbound.Message = String.Format("The server processed inbound object.");
return outbound;
}
This sometimes returns a null object to the HttpClient (not shown). A smaller workload will always succeed; for example, 3 billion iterations always succeeds. A bigger number would be nice; specifically, 240 billion is a requirement.
I think in the year 2020 a reasonable goal in Azure App Service with .NET Core might be to have a parent thread count to 240 billion with the help of 8 child threads, so each child counts to 30 billion, and the parent divides an 8 MB inbound object into smaller objects, one inbound to each child. Each child receives a 1 MB inbound and returns a 1 MB outbound to the parent. The parent re-assembles the results into an 8 MB outbound.
Obviously the elapsed time would be about one-eighth (12.5%) of what a single-threaded implementation would need. The time to cut up and re-assemble objects is small compared to the computation time, and I am assuming the time to transmit the objects is also very small compared to the computation time, so the 12.5% expectation is roughly accurate.
If I can get 4 or 8 cores, that would be good. If I can get threads that give me, say, 50% of the cycles of a core, then I would need maybe 8 or 16 threads. If each thread gives me 33% of the cycles of a core, then I would need 12 or 24 threads.
I am considering the BackgroundService class but I am looking for confirmation that this is the correct approach. Microsoft says...
BackgroundService is a base class for implementing a long running IHostedService.
Obviously, if something is long running it would be better to make it finish sooner by using multiple cores via System.Threading, but this documentation seems to mention System.Threading only in the context of starting tasks via System.Threading.Timer. My example code shows there is no timer needed in my application; an HTTP POST serves as the occasion to do work. Typically I would use System.Threading.Thread to instantiate multiple objects in order to use multiple cores. I find the absence of any mention of multiple cores to be a glaring omission in the context of a solution for work that takes a long time, but maybe there is some reason Azure App Service doesn't deal with this matter. Perhaps I am just not able to find it in tutorials and documentation.
The initiation of the task is the illustrated HTTP POST controller. Suppose the longest job takes 10 minutes. The HTTP client (not shown) sets the timeout limit to 1000 seconds, which is much more than 10 minutes (600 seconds), to leave a margin of safety. HttpClient.Timeout is the relevant property. For the moment I am presuming the HTTP timeout is a real limit, rather than some sort of non-binding (fake) limit such that some other constraint makes the user wait 9 minutes and receive an error message. A real, binding limit is one for which I can say "but for this timeout it would have succeeded". If the HTTP timeout is not the real binding limit and something else constrains the system, I can adjust my HTTP controller to instead have three (3) POST methods: POST1 would mean start a task with the inbound object, POST2 means tell me if it is finished, and POST3 means give me the outbound object.
What is an appropriate solution for long running computations in Azure App Service and .NET Core 3.1 in an application that has no need for a database and no IO to anything outside of this application? It is a computation task.
Prologue
A few years ago I ran into a pretty similar problem. We needed a service that could process large amounts of data. Sometimes the processing would take 10 seconds; other times it could take an hour.
At first we did it the way your question illustrates: send a request to the service; the service processes the data from the request and returns the response when finished.
Issues At Hand
This was fine when the job took around a minute or less, but for anything above that, the server would shut down the session and the caller would report an error.
Servers have a default of around 2 minutes to produce a response before giving up on the request. They don't quit processing the request... but they do quit the HTTP session. It doesn't matter what parameters you set on your HttpClient; the server is the one that dictates how long is too long.
Reasons For Issues
All this is for good reasons. Server sockets are extremely expensive, and you have a finite number to go around. The server is trying to protect your service by severing requests that take longer than a specified time, in order to avoid socket starvation issues.
Typically you want your HTTP requests to take only a few milliseconds. If they take longer than this, you will eventually run into socket issues if your service has to fulfil other requests at a high rate.
Solution
We decided to go the route of IHostedService, specifically the BackgroundService. We use this service in conjunction with a queue. This way you can set up a queue of jobs and the BackgroundService will process them one at a time (in some instances we have the service processing multiple queue items at once; in others we scaled horizontally, producing two or more queues).
Why an ASP.NET Core service running a BackgroundService? I wanted to handle this without tightly coupling to any Azure-specific constructs, in case we needed to move out of Azure to some other cloud service (back in the day we were contemplating this for other reasons we had at the time).
This has worked out quite well for us and we haven't seen any issues since.
The work flow goes like this:
Caller sends a request to the service with some parameters
Service generates a "job" object and returns an ID immediately via 202 (accepted) response
Service places this job in to a queue that is being maintained by a BackgroundService
Caller can query the job status and get information about how much has been done and how much is left to go using this job ID
Service finishes the job, puts the job in to a "completed" state and goes back to waiting on the queue to produce more jobs
Keep in mind your service has the capability to scale horizontally where there would be more than one instance running. In this case I am using Redis Cache to store the state of the jobs so that all instances share the same state.
I also added in a "Memory Cache" option to test things locally if you don't have a Redis Cache available. You could run the "Memory Cache" service on a server, just know that if it scales then your data will be inconsistent.
Example
Since I'm married with kids, I really don't do much on Friday nights after everyone goes to bed, so I spent some time putting together an example that you can try out. The full solution is also available for you to try out.
QueuedBackgroundService.cs
This class implementation serves two specific purposes: One is to read from the queue (the BackgroundService implementation), the other is to write to the queue (the IQueuedBackgroundService implementation).
public interface IQueuedBackgroundService
{
    Task<JobCreatedModel> PostWorkItemAsync(JobParametersModel jobParameters);
}

public sealed class QueuedBackgroundService : BackgroundService, IQueuedBackgroundService
{
    private sealed class JobQueueItem
    {
        public string JobId { get; set; }
        public JobParametersModel JobParameters { get; set; }
    }

    private readonly IComputationWorkService _workService;
    private readonly IComputationJobStatusService _jobStatusService;

    // Shared between BackgroundService and IQueuedBackgroundService.
    // The queueing mechanism could be moved out to a singleton service. I am doing
    // it this way for simplicity's sake.
    private static readonly ConcurrentQueue<JobQueueItem> _queue =
        new ConcurrentQueue<JobQueueItem>();
    private static readonly SemaphoreSlim _signal = new SemaphoreSlim(0);

    public QueuedBackgroundService(IComputationWorkService workService,
        IComputationJobStatusService jobStatusService)
    {
        _workService = workService;
        _jobStatusService = jobStatusService;
    }

    /// <summary>
    /// Transient method via IQueuedBackgroundService
    /// </summary>
    public async Task<JobCreatedModel> PostWorkItemAsync(JobParametersModel jobParameters)
    {
        var jobId = await _jobStatusService.CreateJobAsync(jobParameters).ConfigureAwait(false);
        _queue.Enqueue(new JobQueueItem { JobId = jobId, JobParameters = jobParameters });
        _signal.Release(); // signal for background service to start working on the job
        return new JobCreatedModel { JobId = jobId, QueuePosition = _queue.Count };
    }

    /// <summary>
    /// Long running task via BackgroundService
    /// </summary>
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while(!stoppingToken.IsCancellationRequested)
        {
            JobQueueItem jobQueueItem = null;
            try
            {
                // wait for the queue to signal there is something that needs to be done
                await _signal.WaitAsync(stoppingToken).ConfigureAwait(false);

                // dequeue the item
                jobQueueItem = _queue.TryDequeue(out var workItem) ? workItem : null;
                if(jobQueueItem != null)
                {
                    // put the job in to a "processing" state
                    await _jobStatusService.UpdateJobStatusAsync(
                        jobQueueItem.JobId, JobStatus.Processing).ConfigureAwait(false);

                    // the heavy lifting is done here...
                    var result = await _workService.DoWorkAsync(
                        jobQueueItem.JobId, jobQueueItem.JobParameters,
                        stoppingToken).ConfigureAwait(false);

                    // store the result of the work and set the status to "finished"
                    await _jobStatusService.StoreJobResultAsync(
                        jobQueueItem.JobId, result, JobStatus.Success).ConfigureAwait(false);
                }
            }
            catch(OperationCanceledException)
            {
                // WaitAsync throws OperationCanceledException (not its subclass
                // TaskCanceledException) when stoppingToken is signaled, so catch the base type
                break;
            }
            catch(Exception ex)
            {
                try
                {
                    // something went wrong. Put the job in to an errored state and continue on
                    if(jobQueueItem != null)
                    {
                        await _jobStatusService.StoreJobResultAsync(jobQueueItem.JobId, new JobResultModel
                        {
                            Exception = new JobExceptionModel(ex)
                        }, JobStatus.Errored).ConfigureAwait(false);
                    }
                }
                catch(Exception)
                {
                    // TODO: log this
                }
            }
        }
    }
}
It is injected like so:
services.AddHostedService<QueuedBackgroundService>();
services.AddTransient<IQueuedBackgroundService, QueuedBackgroundService>();
ComputationController.cs
The controller used to read/write jobs looks like this:
[ApiController, Route("api/[controller]")]
public class ComputationController : ControllerBase
{
    private readonly IQueuedBackgroundService _queuedBackgroundService;
    private readonly IComputationJobStatusService _computationJobStatusService;

    public ComputationController(
        IQueuedBackgroundService queuedBackgroundService,
        IComputationJobStatusService computationJobStatusService)
    {
        _queuedBackgroundService = queuedBackgroundService;
        _computationJobStatusService = computationJobStatusService;
    }

    [HttpPost, Route("beginComputation")]
    [ProducesResponseType(StatusCodes.Status202Accepted, Type = typeof(JobCreatedModel))]
    public async Task<IActionResult> BeginComputation([FromBody] JobParametersModel obj)
    {
        return Accepted(
            await _queuedBackgroundService.PostWorkItemAsync(obj).ConfigureAwait(false));
    }

    [HttpGet, Route("computationStatus/{jobId}")]
    [ProducesResponseType(StatusCodes.Status200OK, Type = typeof(JobModel))]
    [ProducesResponseType(StatusCodes.Status404NotFound, Type = typeof(string))]
    public async Task<IActionResult> GetComputationResultAsync(string jobId)
    {
        var job = await _computationJobStatusService.GetJobAsync(jobId).ConfigureAwait(false);
        if(job != null)
        {
            return Ok(job);
        }
        return NotFound($"Job with ID `{jobId}` not found");
    }

    [HttpGet, Route("getAllJobs")]
    [ProducesResponseType(StatusCodes.Status200OK,
        Type = typeof(IReadOnlyDictionary<string, JobModel>))]
    public async Task<IActionResult> GetAllJobsAsync()
    {
        return Ok(await _computationJobStatusService.GetAllJobsAsync().ConfigureAwait(false));
    }

    [HttpDelete, Route("clearAllJobs")]
    [ProducesResponseType(StatusCodes.Status200OK)]
    [ProducesResponseType(StatusCodes.Status401Unauthorized)]
    public async Task<IActionResult> ClearAllJobsAsync([FromQuery] string permission)
    {
        if(permission == "this is flakey security so this can be run as a public demo")
        {
            await _computationJobStatusService.ClearAllJobsAsync().ConfigureAwait(false);
            return Ok();
        }
        return Unauthorized();
    }
}
Working Example
For as long as this question is active, I will maintain a working example you can try out. For this specific example, you can specify how many iterations you would like to run. To simulate long-running work, each iteration is 1 second. So, if you set the iteration value to 60, it will run that job for 60 seconds.
While it's running, run the computationStatus/{jobId} or getAllJobs endpoint. You can watch all the jobs update in real time.
This example is far from a fully-functioning-covering-all-edge-cases-full-blown-ready-for-production example, but it's a good start.
Conclusion
After a few years of working in the back-end, I have seen a lot of issues arise by not knowing all the "rules" of the back-end. Hopefully this answer will shed some light on issues I had in the past and hopefully this saves you from having to deal with said problems.
One option could be to try out Azure Durable Functions, which are more oriented to long-running jobs that warrant checkpoints and state, as opposed to attempting to finish within the context of the triggering request. It also has the concept of fan-out/fan-in, in case what you're describing could be divided into smaller jobs with an aggregated result.
If just raw compute is the goal, Azure Batch might be a better option since it facilitates that scaling.
I assume the actual work that needs to be done is something other than iterating over a loop doing nothing, so in terms of possible parallelization I can't offer much help right now. Is the work CPU-intensive or IO-related?
When it comes to long running work in an Azure App Service, one of the options is to use a WebJob. A possible solution would be to post the request for computation to a queue (Storage Queue or Azure Service Bus Queue). The WebJob then processes those messages and possibly puts a new message on another queue that the requester can use to handle the results.
If the time needed for processing is guaranteed to be less than 10 minutes, you could replace the WebJob with a queue-triggered Azure Function. It is a serverless offering on Azure with great scaling possibilities.
Another option is indeed using a Worker Service or an instance of an IHostedService and doing some queue processing there.
Since you're saying that your computation succeeds at fewer iterations, a simple solution is to save your results periodically and resume the computation.
For example, say you need to perform 240 billion iterations and you know that the highest number of iterations you can perform reliably is 3 billion. I would set up the following:
A slave that actually performs the task (240 billion iterations)
A master that periodically receives input from the slave about progress
The slave can periodically send a message to the master (say once every 2 billion iterations?). This message should contain whatever is relevant to resume the computation should it be interrupted.
The master should keep track of the slave. If the master determines that the slave has died / crashed / whatever, the master should simply create a new slave which resumes computation from the last reported position.
How exactly you implement the master and slave is a matter of personal preference; a rough sketch follows.
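A rough sketch of that checkpointing idea (in Java for brevity, although the question targets .NET; the shared AtomicLong stands in for whatever durable channel carries progress reports between master and slave, and all names are illustrative):

import java.util.concurrent.atomic.AtomicLong;

// Illustrative slave: it reports a checkpoint at a fixed interval, and a new
// incarnation resumes from the last reported position after a crash.
class CheckpointingSlave implements Runnable {
    static final long TOTAL_ITERATIONS = 240_000_000_000L;
    static final long CHECKPOINT_INTERVAL = 2_000_000_000L;

    private final AtomicLong lastCheckpoint; // progress visible to the master

    CheckpointingSlave(AtomicLong lastCheckpoint) {
        this.lastCheckpoint = lastCheckpoint;
    }

    @Override
    public void run() {
        // resume from wherever the previous incarnation got to
        for (long i = lastCheckpoint.get(); i < TOTAL_ITERATIONS; i++) {
            if (i % CHECKPOINT_INTERVAL == 0) {
                lastCheckpoint.set(i); // whatever is needed to resume goes here
            }
        }
        lastCheckpoint.set(TOTAL_ITERATIONS); // done
    }
}

The master then only has to watch for stalled progress and start a new CheckpointingSlave with the same checkpoint if the previous one dies.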
Rather than have a single loop perform 240 billion iterations, if you can split your computation across nodes, I would try to simultaneously compute the solution in parallel across as many nodes as possible.
I personally use node.js for multicore projects. Although you are using asp.net, I include this example of node.js to illustrate the architecture that works for me.
Node.js on multi-core machines
https://dzone.com/articles/multicore-programming-in-nodejs
As Noah Stahl has mentioned in his answer, Azure Durable Functions and Azure Batch seem like options to help you achieve your goal on your platform. Please see his answer for more details.
The standard answer is to use asynchronous messaging. I have a blog series on the topic. This is particularly the case since you're already in Azure.
You already have an Azure web app service, but now you want to run code outside of a request - "request-extrinsic code". The proper way to run that code is in a separate process - Azure Functions or Azure WebJobs are a good match for Azure webapps.
First, you want a durable queue. Azure Storage Queues are a good fit since you're in Azure anyway. Then your webapi can just write a message into the queue and return. The important part here is that this is a durable queue, not an in-memory queue.
Meanwhile, the Azure Function / WebJob is processing that queue. It will pick up the work from the queue and execute it.
The final piece of the puzzle is the completion notification. This is a pretty common approach:
I can adjust my HTTP controller to instead have three (3) POST methods. Thus POST1 would mean start a task with the inbound object. POST2 means tell me if it is finished. POST3 means give me the outbound object.
To do this, your background processor should save the "in-progress" / "complete/result" state somewhere where the webapi process can access it. If you already have a shared database (and it makes sense to keep results), then this may be the easiest choice. I would also consider using Azure Cosmos DB, which has a nice time-to-live setting so the background service can inject the results that are "good for 24 hours" or whatever, after which they're automatically cleaned up.

Using Hibernate, Spring Data JPA in multithreading [duplicate]

I am using Spring Batch and partitioning to do parallel processing, with Hibernate and Spring Data JPA for the database. For the partition step, the reader, processor and writer are step-scoped, so I can inject the partition key and range (from-to) into them. Now in the processor, I have one synchronized method and expected this method to run once at a time, but that is not the case.
I set it to have 10 partitions; all 10 item readers read the right partitioned range. The problem comes with the item processor. The code below has the same logic I use.
public class AccountProcessor implements ItemProcessor<Item, Item> {
    @Override
    public Item process(Item item) {
        createAccount(item);
        return item;
    }

    // account has unique constraints: username, gender, and email
    /*
    When one thread executes this method, it will create one account
    and save it. If the next thread comes in and tries to save the same account,
    it should find the account created by the first thread and do an update.
    But that doesn't happen; instead findIfExist returns null
    and it tries to do another insert of duplicate data.
    */
    private synchronized void createAccount(Item item) {
        Account account = accountRepo.findIfExist(item.getUsername(), item.getGender(), item.getEmail());
        if (account == null) {
            // account doesn't exist
            account = new Account();
            account.setUsername(item.getUsername());
            account.setGender(item.getGender());
            account.setEmail(item.getEmail());
            account.setMoney(10000);
        } else {
            account.setMoney(account.getMoney() - 10);
        }
        accountRepo.save(account);
    }
}
The expected output is that only one thread runs this method at any given time, so that there is no duplicate insertion in the db and no DataIntegrityViolationException.
The actual result is that the second thread can't find the first account, tries to create a duplicate account, and saves it to the db, which causes a DataIntegrityViolationException (unique constraints error).
Since I synchronized the method, threads should execute it in order: the second thread should wait for the first thread to finish and then run, which means it should be able to find the first account.
I tried many approaches, like a volatile Set containing all unique accounts, doing saveAndFlush to commit asap, using ThreadLocal, and none of these work.
Need some help.
Since you made the item processor step-scoped, you don't really need synchronization, as each step will have its own instance of the processor.
But it looks like you have a design problem rather than an implementation issue. You are trying to synchronize threads to act in a certain order in a parallel setup. When you decide to go parallel and divide the data into partitions, giving each worker (either local or remote) a partition to work on, you must accept that these partitions will be processed in an undefined order and that there should be no relation between the records of each partition or between the work done by each worker.
When one thread executes this method, it will create one account
and save it. If the next thread comes in and tries to save the same account,
it should find the account created by the first thread and do an update. But that doesn't happen; instead findIfExist returns null and it tries to do another insert of duplicate data.
That's because the transaction of thread1 may not be committed yet, hence thread2 won't find the record you think has been inserted by thread1.
It looks like you are trying to create or update some accounts with a partitioned setup. I'm not sure if this setup is suitable for the problem at hand.
As a side note, I would not call accountRepo.save(account); in an item processor but rather do that in an item writer, as sketched below.
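A sketch of that suggestion, assuming Spring Batch 4's List-based ItemWriter contract and the AccountRepository from the question:

import java.util.List;
import org.springframework.batch.item.ItemWriter;

// Persistence moves out of the processor and into the chunk-oriented writer,
// where it runs inside the chunk transaction managed by Spring Batch.
public class AccountItemWriter implements ItemWriter<Account> {

    private final AccountRepository accountRepo;

    public AccountItemWriter(AccountRepository accountRepo) {
        this.accountRepo = accountRepo;
    }

    @Override
    public void write(List<? extends Account> accounts) {
        accountRepo.saveAll(accounts);
    }
}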
Hope this helps.

Why do sagas (aka, process managers) contain an internal state and why are they persisted to the event store?

A lot of articles on CQRS imply that sagas have an internal state and must be saved to the event store. I don't see why this is necessary.
For example, say I have three aggregates: Order, Invoice and Shipment. When a customer places an order, the order process starts. However, the shipment cannot be sent until the invoice has been paid and the shipment has first been prepared.
1. A customer places an order with the PlaceOrder command.
2. The OrderCommandHandler calls OrderRepository::placeOrder().
3. The OrderRepository::placeOrder() method returns an OrderPlaced event, which is stored in the EventStore and sent along the EventBus.
4. The OrderPlaced event contains the orderId and pre-allocates an invoiceId and shipmentId.
5. The OrderProcess ("saga") receives the OrderPlaced event, creating the invoice and preparing the shipment if necessary (achieving idempotence in the event handler).
6a. At some point in time, the OrderProcess receives the InvoicePaid event. It checks whether the shipment has been prepared by looking up the shipment in the ShipmentRepository, and if so, sends the shipment.
6b. At some point in time, the OrderProcess receives the ShipmentPrepared event. It checks whether the invoice has been paid by looking up the invoice in the InvoiceRepository, and if so, sends the shipment.
To all the experienced DDD/CQRS/ES gurus out there, can you please tell me what concept I'm missing and why this design of a "stateless saga" will not work?
class OrderCommandHandler {
    public function handle(PlaceOrder $command) {
        $event = $this->orderRepository->placeOrder($command->orderId, $command->customerId, ...);
        $this->eventStore->store($event);
        $this->eventBus->emit($event);
    }
}

class OrderRepository {
    public function placeOrder($orderId, $customerId, ...) {
        $invoiceId = randomString();
        $shipmentId = randomString();
        return new OrderPlaced($orderId, $customerId, $invoiceId, $shipmentId);
    }
}

class InvoiceRepository {
    public function createInvoice($invoiceId, $customerId, ...) {
        // Etc.
        return new InvoiceCreated($invoiceId, $customerId, ...);
    }
}

class ShipmentRepository {
    public function prepareShipment($shipmentId, $customerId, ...) {
        // Etc.
        return new ShipmentPrepared($shipmentId, $customerId, ...);
    }
}

class OrderProcess {
    public function onOrderPlaced(OrderPlaced $event) {
        if (!$this->invoiceRepository->hasInvoice($event->invoiceId)) {
            $invoiceEvent = $this->invoiceRepository->createInvoice($event->invoiceId, $event->customerId, ...);
            $this->eventStore->store($invoiceEvent);
            $this->eventBus->emit($invoiceEvent);
        }
        if (!$this->shipmentRepository->hasShipment($event->shipmentId)) {
            $shipmentEvent = $this->shipmentRepository->prepareShipment($event->shipmentId, $event->customerId, ...);
            $this->eventStore->store($shipmentEvent);
            $this->eventBus->emit($shipmentEvent);
        }
    }

    public function onInvoicePaid(InvoicePaid $event) {
        $order = $this->orderRepository->getOrder($event->orderId);
        $shipment = $this->shipmentRepository->getShipment($order->shipmentId);
        if ($shipment && $shipment->isPrepared()) {
            $this->sendShipment($shipment);
        }
    }

    public function onShipmentPrepared(ShipmentPrepared $event) {
        $order = $this->orderRepository->getOrder($event->orderId);
        $invoice = $this->invoiceRepository->getInvoice($order->invoiceId);
        if ($invoice && $invoice->isPaid()) {
            $this->sendShipment($this->shipmentRepository->getShipment($order->shipmentId));
        }
    }

    private function sendShipment(Shipment $shipment) {
        $shipmentEvent = $shipment->send();
        $this->eventStore->store($shipmentEvent);
        $this->eventBus->emit($shipmentEvent);
    }
}
Commands can fail.
That's the primary problem; the entire reason we have aggregates in the first place is so that they can protect the business from invalid state changes. So what happens in onOrderPlaced() if the createInvoice command fails?
Furthermore (though somewhat related), you are lost in time. Process managers handle events; events are things that have already happened in the past. Ergo, process managers are running in the past. In a very real sense, they can't even talk to anyone that has seen a more recent event than the one they are processing right now (in fact, they might be the first handler to see this event, meaning everybody else is a step in the past).
This is why you can't run commands synchronously; your event handler is in the past, and the aggregate can't protect its invariant unless it is running in the present. You need the asynchronous dispatch to get the command running against the correct version of the aggregate.
Next problem: when you dispatch the command asynchronously, you can't directly observe the result. It might fail, or get lost en route, and the event handler won't know. The only way that it can determine that the command succeeded is by observing a generated event.
A consequence is that the process manager cannot distinguish a command that failed from a command that succeeded (but whose event hasn't become visible yet). To support a finite SLA, you need a timing service that wakes up the process manager from time to time to check on things.
When the process manager wakes up, it needs state to know if it has already finished the work.
With state, everything is so much simpler to manage. The process manager can re-issue possibly lost commands to be sure that they get through, without also flooding the domain with commands that have already succeeded. You can model the clock without throwing clock events into the domain itself.
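To make the role of that state concrete, here is a minimal sketch (in Java for illustration; CommandBus, ProcessStateRepository and the timing-service callback are hypothetical stand-ins, not an established framework API):

import java.io.Serializable;

interface CommandBus { void dispatch(Object command); }

interface ProcessStateRepository {
    OrderProcessState load(String orderId);
    void save(String orderId, OrderProcessState state);
}

class SendShipment {
    final String orderId;
    SendShipment(String orderId) { this.orderId = orderId; }
}

// The persisted state: what the process manager has observed so far.
class OrderProcessState implements Serializable {
    boolean invoicePaid;
    boolean shipmentPrepared;
    boolean shipmentSent; // set only when a ShipmentSent event is observed
}

class StatefulOrderProcess {
    private final ProcessStateRepository stateRepo;
    private final CommandBus commandBus;

    StatefulOrderProcess(ProcessStateRepository stateRepo, CommandBus commandBus) {
        this.stateRepo = stateRepo;
        this.commandBus = commandBus;
    }

    void onInvoicePaid(String orderId) {
        OrderProcessState state = stateRepo.load(orderId);
        state.invoicePaid = true;
        maybeSendShipment(orderId, state);
        stateRepo.save(orderId, state);
    }

    void onShipmentPrepared(String orderId) {
        OrderProcessState state = stateRepo.load(orderId);
        state.shipmentPrepared = true;
        maybeSendShipment(orderId, state);
        stateRepo.save(orderId, state);
    }

    void onShipmentSent(String orderId) {
        OrderProcessState state = stateRepo.load(orderId);
        state.shipmentSent = true; // the command is now known to have succeeded
        stateRepo.save(orderId, state);
    }

    // Called by the timing service. Because the state records what has been
    // observed, re-issuing here cannot flood the domain with commands that
    // already succeeded.
    void onTimeout(String orderId) {
        maybeSendShipment(orderId, stateRepo.load(orderId));
    }

    private void maybeSendShipment(String orderId, OrderProcessState state) {
        if (state.invoicePaid && state.shipmentPrepared && !state.shipmentSent) {
            commandBus.dispatch(new SendShipment(orderId));
        }
    }
}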
What you are referring to seems to be along the lines of orchestration (with a process manager) vs choreography.
Choreography works absolutely fine, but you will not have a process manager as a first-class citizen. Each command handler determines what to do. Even my current project (December 2015) uses choreography quite a bit, with a webMethods integration broker. Messages may even carry some of the state along with them. However, when anything needs to take place in parallel you are rather shafted.
A relevant service orchestration vs choreography question demonstrates these concepts quite nicely. One of the answers contains a nice pictorial representation and, as stated in the answer, more complex interactions typically require state for the process.
I find that you typically will require state when interacting with services and endpoints beyond your control. Human interaction, such as authorizations, also require this type of state.
If you can get away with not having state specifically for a process manager, it may be OK. However, later on you may run into issues. For example, some low-level/core/infrastructure service may span various processes, which may cause issues in a choreography scenario.

Distributed/Parallel computing using App Engine (Java API)

I want to use the master-slave (worker) paradigm to solve a problem. I have read that opening new threads manually (for example, using a thread pool) is not available and that I need to use a queue. Attached is a code example:
class MyDeferred implements DeferredTask {
    @Override
    public void run() {
        // Do something interesting
    }
}

MyDeferred task = new MyDeferred();
// Set instance variables etc. as you wish

Queue queue = QueueFactory.getDefaultQueue();
queue.add(withPayload(task));
How can I get the results of the workers (which were added to the queue)?
I need this information in order to solve the bigger problem.
Actually, you can use threads on GAE, but there are limitations. If you need long-running threads, you can use background threads, but this requires you to use backend instances.
If you opt to use the task queue, then keep in mind that tasks do not "return" to the caller. To aggregate results you'll need to use the datastore.
You will have to write the results into the datastore.
Just as a starting point to think about it, you might pass a JobId as a parameter to the tasks, have each task write an entity with the result and the JobId, and then later query the datastore for the given JobId to get all the results.
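For illustration, using the App Engine low-level Datastore API (the kind and property names are made up for this sketch):

import java.util.List;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;

public class ResultStore {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // Called from inside each DeferredTask when its share of the work is done.
    public void storeResult(String jobId, long partialResult) {
        Entity entity = new Entity("TaskResult");
        entity.setProperty("jobId", jobId);
        entity.setProperty("result", partialResult);
        datastore.put(entity);
    }

    // Called by the "master" to collect whatever the workers have produced so far.
    public List<Entity> resultsFor(String jobId) {
        Query query = new Query("TaskResult")
                .setFilter(new Query.FilterPredicate("jobId", Query.FilterOperator.EQUAL, jobId));
        return datastore.prepare(query).asList(FetchOptions.Builder.withDefaults());
    }
}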

How do I create a scheduler which never executes more than one Task at a time using async-await?

I want to implement a class or pattern that ensures that I never execute more than one Task at a time for a certain set of operations (HTTP calls). The invocations of the Tasks can come from different threads at random times. I want to make use of the async-await pattern so that the caller can handle exceptions by wrapping the call in a try-catch.
Here's an illustration of the intended flow of execution:
Pseudo code from caller:
try {
    Task someTask = GetTask();
    await SomeScheduler.ThrottledRun(someTask);
}
catch(Exception ex) {
    // Handle exception
}
The Task class here might instead be an Action class, depending on the solution.
Note that when I use the word "schedule" in this question I'm not necessarily using it in relation to the .NET Task Scheduler. I don't know the async-await library well enough to know at what angle and with what tools to approach this problem. The TaskScheduler might be relevant here, and it may not be. I've read the TAP pattern document and found patterns that almost solve this problem, but not quite (the chapter on interleaving).
There is a new ConcurrentExclusiveSchedulerPair type in .NET 4.5 (I don't remember if it was included in the Async CTP), and you can use its ExclusiveScheduler to restrict execution to one Task at a time.
Consider structuring your problem as a Dataflow. It's easy to just pass a TaskScheduler into the block options for the parts of the dataflow you want restricted.
If you don't want to (or can't) use Dataflow, you can do something similar yourself. Remember that in TAP, you always return started tasks, so you don't have the "creation" separated from the "scheduling" like you do in TPL.
You can use ConcurrentExclusiveSchedulerPair to schedule Actions (or async lambdas without return values) like this:
public static ConcurrentExclusiveSchedulerPair schedulerPair =
    new ConcurrentExclusiveSchedulerPair();
public static TaskFactory exclusiveTaskFactory =
    new TaskFactory(schedulerPair.ExclusiveScheduler);
...
public static Task RunExclusively(Action action)
{
    return exclusiveTaskFactory.StartNew(action);
}
public static Task RunExclusively(Func<Task> action)
{
    return exclusiveTaskFactory.StartNew(action).Unwrap();
}
There are a few things to note about this:
A single instance of ConcurrentExclusiveSchedulerPair only coordinates Tasks that are queued to its schedulers. A second instance of ConcurrentExclusiveSchedulerPair would be independent from the first, so you have to ensure the same instance is used in all parts of your system you want coordinated.
An async method will - by default - resume on the same TaskScheduler that started it. So this means if one async method calls another async method, the "child" method will "inherit" the parent's TaskScheduler. Any async method may opt out of continuing on its TaskScheduler by using ConfigureAwait(false) (in that case, it continues directly on the thread pool).
