Gearman callback with nested jobs - gearman

I have a gearman job that runs and itself executes more jobs, which in turn may execute more jobs. I would like some kind of callback when all nested jobs have completed. I could easily do this, but my implementation would tie up workers (spinning until children complete), which I do not want.
Is there a workaround? There is no concept of "groups" in Gearman AFAIK, so I can't add jobs to a group and have something fire once that group has completed.

As you say, there's nothing built into Gearman to handle this. If you don't want to tie up a worker (having that worker add the tasks and track their completion for you), you'll have to do out-of-band status tracking.
One way to do this is to keep a group identifier in memcached: increment the total-task counter when you add a new task to the group, and increment the finished-task counter when a task finishes. You can then poll memcached to see the current state of execution (tasks finished vs. tasks total).
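A minimal sketch of that counter scheme. To keep the example self-contained, a locked dict stands in for memcached; against a real server you would use atomic server-side operations instead (e.g. pymemcache's `add()`/`incr()`), and the key names here are purely illustrative:

```python
import threading

class GroupTracker:
    """Track total vs. finished subtasks per job group."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}  # key -> int; stand-in for memcached storage

    def _incr(self, key):
        # memcached equivalent: client.add(key, 0) then client.incr(key, 1)
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def task_added(self, group_id):
        self._incr(f"{group_id}:total")

    def task_finished(self, group_id):
        self._incr(f"{group_id}:done")

    def is_complete(self, group_id):
        with self._lock:
            total = self._counts.get(f"{group_id}:total", 0)
            done = self._counts.get(f"{group_id}:done", 0)
        return total > 0 and done == total

tracker = GroupTracker()
tracker.task_added("job42")       # parent job registers itself
tracker.task_added("job42")       # a nested child job is registered
tracker.task_finished("job42")
print(tracker.is_complete("job42"))  # False: one task still running
tracker.task_finished("job42")
print(tracker.is_complete("job42"))  # True: time to fire the callback
```

Each worker increments the total counter *before* submitting a child job, so the group can never look complete while submissions are still in flight.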

Related

How to represent Activity Diagram with multiple scheduled background jobs?

I would like to know the correct way to visualize background jobs that run on a schedule controlled by a scheduling service.
In my opinion the correct way would be to have an action representing the scheduling itself, followed by a fork node which splits the flow for each of the scheduled jobs.
Example: on one schedule, Service X is supposed to collect data from an API every day; on another schedule, Service Y is supposed to aggregate the collected data.
I've tried to research older threads and find any diagram representing a similar activity.
Your current diagram says that:
first the scheduler does something (e.g. identifying the jobs to launch)
then passes control in parallel to all the jobs it wants to launch
no other jobs are scheduled after the scheduler has finished its task
the first job that finishes interrupts all the others.
The way it should work would be:
first the scheduler is set up
then the setup launches the real scheduler, which will run in parallel to the scheduled jobs
scheduled jobs can finish (end flow, i.e. a circle with an X inside) without terminating everything
the activity would stop (flow final) only when the scheduler is finished.
Note that the UML specifications do not specify how parallelism is implemented. And neither does your scheduler: whether it is true parallelism using multithreaded or multiple CPUs, or time slicing where interrupts switch between tasks that in reality execute in small sequential pieces, is not relevant for this model.
The remaining challenges are:
the scheduler could launch additional jobs. One way of doing it could be to fork back to itself and to a new job.
the scheduler could launch a variable number of jobs in parallel. A better way to represent this is with a «parallel» expansion region, with the input corresponding to task objects, and actions that consume the tasks by executing them.
if the scheduler runs in parallel to the expansion region, you could also imagine that the scheduler provides additional input at any moment (new tasks to be processed).

How to force sequential execution of tasks in separate instances in Activiti workflows

I want to find a way to execute the script tasks of separate instances of the same workflow sequentially.
In my case, multiple instances of the same workflow are started in parallel on one resource by a script task, based on some attributes of the resource that the master flow is opened on, and the script tasks of those instances run in parallel, which I don't want. I tried both settings of the "Asynchronous" flag, but the script tasks still execute in parallel. For now I'm working around it by passing a varying sleep() duration as a variable when starting the instances, which basically works, but that is not good practice, so I'm hoping someone more experienced can suggest a cleaner way to solve this.
Use inline signal or message events to signal from one process to the other. On completion of one task in one process, signal the release of the next task in the next process.
Continue until all tasks are complete.
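The hand-off pattern the answer describes can be illustrated outside Activiti with plain events: each instance waits for a signal before running its task and signals the next instance when done. This is a generic sketch of the signaling idea, not Activiti API:

```python
import threading

def run_instance(name, my_turn, next_turn, log):
    my_turn.wait()            # block until this instance's task is released
    log.append(name)          # run the script task (stand-in for real work)
    if next_turn is not None:
        next_turn.set()       # signal the next instance to proceed

log = []
events = [threading.Event() for _ in range(3)]
threads = [
    threading.Thread(
        target=run_instance,
        args=(f"instance-{i}", events[i],
              events[i + 1] if i + 1 < 3 else None, log),
    )
    for i in range(3)
]
for t in threads:
    t.start()
events[0].set()               # release the first task
for t in threads:
    t.join()
print(log)  # ['instance-0', 'instance-1', 'instance-2']
```

Even though all three "instances" run concurrently, the signal chain forces their tasks into strict sequence, which is exactly what the signal/message events do between process instances.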

Parallel processing with RabbitMQ

Does anyone have suggestions or resources on how to scale parallel task processing with fair dispatch using RabbitMQ? There can be over 200 users with 2000+ tasks each, and I want tasks to be processed and consumed fairly across users while each user's tasks can still run in parallel. Each task takes about 1s.
Below is how the app is currently handling that. But I feel like there is a much better way.
So, a quick overview of what the app currently does. Users can create lists of todo tasks. Each task takes about 1.5s max. We have parallel processing of these tasks in place using RabbitMQ. There are six workers per server, each with a prefetch value of 1. There are about 200 queues (task_queue_1, etc.), and each task_queue is dedicated to some range of todo tasks.
For example, task_queue_1 is dedicated to the first 10 items of a list, and so on. Each of those items, tasks 1 through 10, gets a priority from 10 down to 1 to ensure the first task gets processed first, and so on.
The queues themselves also have priority. Workers pull from task_queue_1 first, but if messages in task_queue_2 have been waiting for more than one minute, it gets higher priority, and so on. This is handled through a stack and ordering.
This has been working fine, but I realized it's not exactly scalable. If a bunch of users submit lists with over 2k tasks, some users are forced to wait because the workers are busy processing tasks from queue_2, queue_3, etc. whose wait time has exceeded one minute.
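One common alternative to a fixed range-based queue layout is a queue per user with round-robin consumption, so a user with 2000 tasks cannot starve a user with 5. This is a pure-Python simulation of that fair-dispatch policy only (in RabbitMQ you would approximate it with per-user queues and consumers using `basic_qos(prefetch_count=1)`); the names here are illustrative:

```python
from collections import deque

def fair_dispatch(user_queues):
    """Yield one task per user per round until all queues drain."""
    rotation = deque(user_queues.items())
    while rotation:
        user, q = rotation.popleft()
        if q:
            yield user, q.popleft()
        if q:  # user still has work: back to the end of the rotation
            rotation.append((user, q))

queues = {
    "alice": deque(["a1", "a2", "a3", "a4"]),
    "bob": deque(["b1"]),
}
order = [task for _, task in fair_dispatch(queues)]
print(order)  # ['a1', 'b1', 'a2', 'a3', 'a4']
```

Bob's single task is served after Alice's first task rather than after all four, which is the fairness property the 200-queue scheme is trying to buy with its one-minute aging rule.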

Celery: ensure tasks are executed sequentially

In a project I'm currently working on, I issue Celery tasks every once in a while. These tasks are for specific clients, so there are tasks for e.g. clientA, for clientB and for clientC. There are some additional conditions:
Tasks for the same client may never be executed in parallel.
Tasks for the same client must be executed in sequence, i.e. message queue order.
The Celery cookbook (see also this article) shows a locking mechanism that ensures only one instance of a given task executes at a time. This mechanism could easily be adapted so that only one task per client executes at a time, which satisfies the first condition.
The second condition is harder to ensure. Since tasks are generated from different processes, I can't use task chaining. I could perhaps modify the locking mechanism to retry tasks while they wait for the lock, but that still cannot guarantee order (because of retry timeouts, and also because of a race condition in acquiring the lock).
For now, I have limited my concurrency to 1 to ensure order, but some of the tasks take a long time and this scales quite badly.
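One approach that satisfies both conditions without global concurrency 1 is to route every task for a given client to the same dedicated queue, each consumed by a worker with concurrency 1: per-client order is preserved by the queue itself, while different clients run in parallel. A sketch of the routing function, with illustrative names and an assumed queue count:

```python
import hashlib

# Assumption: one Celery worker with --concurrency=1 consumes each of the
# NUM_QUEUES queues. All tasks for a client hash to the same queue, so they
# execute strictly in enqueue order; other clients use other queues.
NUM_QUEUES = 4

def queue_for(client_id: str) -> str:
    digest = hashlib.md5(client_id.encode("utf-8")).hexdigest()
    return f"client_queue_{int(digest, 16) % NUM_QUEUES}"

# The same client always routes to the same queue:
print(queue_for("clientA") == queue_for("clientA"))  # True
# In Celery you would pass this at dispatch time, e.g.
# some_task.apply_async(args=..., queue=queue_for(client_id))
```

The trade-off: two busy clients can hash to the same queue and serialize against each other, so `NUM_QUEUES` should be sized to the number of workers you can afford.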

Several tasks run periodically in one .NET winform app

I need to run several tasks periodically in my app, but they have different periods. How should I do this?
Run each task in a separate timer thread.
Run all the periodic tasks in the same timer thread, and check the time to see whether each task should be activated.
Do you have any better solution?
It mostly depends on how many tasks you have to run.
With two or three tasks it makes sense to have a separate timer for each task, but that gets unwieldy with more tasks.
With a good number of tasks, I would have a single timer that checks a list of tasks to see whether any are ready to run. That way, to add a task you just add it to the list. Having a list of tasks also makes it easy for the tasks to be data-driven.
It sounds like you should execute each task on its own thread. That will ease the configuration of timing and the control of starting/stopping each task.
Using the Timer control is a good option if the tasks should execute at fixed intervals.
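The single-timer, task-list approach can be sketched like this (the question is about .NET WinForms, but the structure is language-agnostic; Python is used here only for brevity, and all names are illustrative). One timer tick scans the list and runs whatever is due:

```python
class PeriodicTask:
    """A task with its own period, tracked by the shared timer."""

    def __init__(self, name, period, action):
        self.name, self.period, self.action = name, period, action
        self.next_run = 0  # due immediately on the first tick

def tick(tasks, now):
    """Called by the single timer; runs every task whose deadline passed."""
    for task in tasks:
        if now >= task.next_run:
            task.action()
            task.next_run = now + task.period

ran = []
tasks = [
    PeriodicTask("fast", period=1, action=lambda: ran.append("fast")),
    PeriodicTask("slow", period=3, action=lambda: ran.append("slow")),
]
for now in range(5):  # simulate 5 timer ticks, one per time unit
    tick(tasks, now)
print(ran)  # ['fast', 'slow', 'fast', 'fast', 'fast', 'slow', 'fast']
```

Adding a task is just appending to the list, which is what makes this variant data-driven; in WinForms the `for now in range(5)` loop would be replaced by the Timer's tick event.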