I would like to know the correct way to visualize background jobs that run on a schedule controlled by a scheduling service.
In my opinion, the correct way is to have an action representing the scheduling itself, followed by a fork node that splits the flow for each of the scheduled jobs.
Example: on one schedule, Service X collects data from an API every day; on another schedule, Service Y aggregates the collected data.
I've tried searching older topics for a diagram representing a similar activity.
Your current diagram says that:
first the scheduler does something (e.g. identifying the jobs to launch)
then passes control in parallel to all the jobs it wants to launch
no other jobs are scheduled after the scheduler finished its task
the first job that finishes interrupts all the others.
The way it should work would be:
first the scheduler is set up
then the setup launches the real scheduler, which will run in parallel to the scheduled jobs
scheduled jobs can finish (flow final, i.e. a circle with an X inside) without terminating everything
the activity would stop (activity final) only when the scheduler is finished.
Note that the UML specification does not prescribe how parallelism is implemented, and neither does your scheduler. Whether it is true parallelism on multithreaded or multiple CPUs, or time slicing in which interrupts switch between tasks that are actually executed in small sequential pieces, is not relevant for this model.
The remaining challenges are:
the scheduler could launch additional jobs. One way of modeling this is to fork back to the scheduler itself and to a new job.
the scheduler could launch a variable number of jobs in parallel. A better way to represent this is a «parallel» expansion region, with the input corresponding to task objects, and actions that consume the tasks by executing them.
if the scheduler runs in parallel to the expansion region, you could also imagine that the scheduler provides additional input (new tasks to be processed) at any moment.
I want to find a way to execute the script tasks of separate instances of the same workflow sequentially.
In my case, multiple workflow instances are started in parallel on one resource by a script task, based on some attributes of the resource that the master flow is opened on. The script tasks of those instances then run in parallel, which I don't want. I tried both settings of the "Asynchronous" flag, but the script tasks still execute in parallel. For now, the function that starts those instances saves a sleep() duration as a variable, with a different value depending on a condition. It basically works, but it is not best practice, so maybe some more experienced colleagues can help me find a "nicer" way to solve my problem.
Use inline signal or message events to signal from one process to the other. On completion of one task in one process, signal the release of the next task in the next process.
Continue until all tasks are complete.
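To illustrate the hand-off idea outside of any workflow engine, here is a minimal sketch in plain Python. It is only an analogy: the threads stand in for the workflow instances and `threading.Event` stands in for the signal or message events. The function name `run_in_sequence` and the task names are made up for the example.

```python
import threading

def run_in_sequence(task_names):
    """Start one thread per name, but let each run only after the previous one signals."""
    order = []
    # events[i] releases worker i; the extra last event marks overall completion.
    events = [threading.Event() for _ in range(len(task_names) + 1)]

    def worker(index, name):
        events[index].wait()       # block until the previous instance signals
        order.append(name)         # stand-in for the real script task
        events[index + 1].set()    # signal the release of the next instance

    threads = [
        threading.Thread(target=worker, args=(i, name))
        for i, name in enumerate(task_names)
    ]
    # Start in reverse order to show that the start order does not matter.
    for t in reversed(threads):
        t.start()
    events[0].set()                # release the first instance
    events[-1].wait()              # wait until the last instance has finished
    for t in threads:
        t.join()
    return order
```

All instances can be started at once, as in the question, and no sleep() is needed because each one simply waits for its predecessor's signal.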
I have a Gearman job that runs and itself executes more jobs, which in turn may execute more jobs. I would like some kind of callback when all nested jobs have completed. I can easily do this, but my implementations would tie up workers (spin until children are complete), which I do not want to do.
Is there a workaround? There is no concept of "groups" in Gearman AFAIK, so I can't add jobs to a group and have something fire once that group has completed.
As you say, there's nothing built into Gearman to handle this. If you don't want to tie up a worker (and let that worker add tasks and track their completion for you), you'll have to do out-of-band status tracking.
A way to do this is to keep a group identifier in memcached, and increment the number of finished subtasks when a task finishes, and increment the number of total tasks when you add a new one for the same group. You can then poll memcached to see the current state of execution (tasks finished vs tasks total).
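The counting scheme could look roughly like the sketch below. To keep it self-contained it uses an in-memory dictionary guarded by a lock as a stand-in for memcached, and the class and method names (`GroupTracker`, `task_added`, `task_finished`) are hypothetical. With a real memcached you would use its atomic increment operation on two keys per group instead.

```python
import threading

class GroupTracker:
    """In-memory stand-in for the shared counters (the answer suggests memcached)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._total = {}       # group id -> number of tasks added
        self._finished = {}    # group id -> number of tasks finished

    def task_added(self, group_id):
        # Call this when a job (or nested job) is submitted for the group.
        with self._lock:
            self._total[group_id] = self._total.get(group_id, 0) + 1

    def task_finished(self, group_id):
        # Call this at the very end of each job.
        with self._lock:
            self._finished[group_id] = self._finished.get(group_id, 0) + 1

    def status(self, group_id):
        with self._lock:
            return self._finished.get(group_id, 0), self._total.get(group_id, 0)

    def is_complete(self, group_id):
        finished, total = self.status(group_id)
        return total > 0 and finished == total
```

One detail matters for correctness: a job has to register its children with `task_added` before it reports itself finished. Otherwise a poller could briefly see finished equal to total while nested jobs are still about to be added.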
My question might sound a bit naive but I'm pretty new with multi-threaded programming.
I'm writing an application which processes incoming external data. For each data that arrives a new task is created in the following way:
System.Threading.Tasks.Task.Factory.StartNew(() => methodToActivate(data));
The items of data arrive very fast (every second, half second, etc.), so many tasks are created. Handling each task might take around a minute. When testing it, I saw that the number of threads keeps increasing. How can I limit the number of tasks created, so that the number of actual working threads is stable and efficient? My computer is only dual core.
Thanks!
One of your issues is that the default scheduler sees tasks that last for a minute and assumes that they are blocked on other tasks that have yet to be executed. To try to unblock things, it schedules more pending tasks, hence the thread growth. There are a couple of things you can do here:
Make your tasks shorter (probably not an option).
Write a scheduler that deals with this scenario and doesn't add more threads.
Use SetMaxThreads to prevent unbounded thread pool growth.
See the section on Thread Injection here:
http://msdn.microsoft.com/en-us/library/ff963549.aspx
You should look into using the producer/consumer pattern with a BlockingCollection<T> around a ConcurrentQueue<T> where you set the BoundedCapacity to something that makes sense given the characteristics of your workload. You can make your BoundedCapacity configurable and then tweak as you run through some profiling sessions to find the sweet spot.
While it's true that the TPL will take care of queueing up the tasks you create, creating too many tasks does not come without penalties. Also, what's the point in producing more work than you can consume? You want to produce enough work that the consumers will never be starved, but you don't want to get too far ahead of yourself, because that just wastes resources and potentially steals those very same resources from your consumers.
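As a language-neutral illustration of the bounded producer/consumer idea, here is a small Python sketch. It is not the .NET API: `queue.Queue(maxsize=...)` plays the role that a `BlockingCollection<T>` with a `BoundedCapacity` would play, and `process_all`, `workers` and `capacity` are names invented for the example.

```python
import queue
import threading

def process_all(items, handler, workers=2, capacity=4):
    """Feed items to a fixed number of consumer threads through a bounded queue."""
    work = queue.Queue(maxsize=capacity)   # put() blocks while the queue is full
    results = []
    results_lock = threading.Lock()
    sentinel = object()                    # tells a consumer to stop

    def consume():
        while True:
            item = work.get()
            if item is sentinel:
                break
            value = handler(item)
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)                     # producer waits here when consumers lag
    for _ in threads:
        work.put(sentinel)                 # one stop marker per consumer
    for t in threads:
        t.join()
    return results
```

The number of worker threads stays fixed, and the producer is slowed down automatically once `capacity` items are waiting, so it cannot run far ahead of the consumers. Both values are the kind of knobs you would tune while profiling.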
You can create a custom TaskScheduler for the Task Parallel Library and then schedule tasks on it by passing an instance to the TaskFactory constructor.
Here's one example of how to do that: Task Scheduler with a maximum degree of parallelism.
I was just wondering whether there is a field concerned with task-control programming (at least that's what I call it).
For a better explanation of task-control consider the following scenario:
An application (master-thread) waits for a command - which might be a particular action or a set of actions the application should perform.
When a command is received, the master-thread creates a task (= spawns an independent thread which actually performs the action) and adds a record to its task-list, thus keeping track of the time of execution, thread handle, task priority, etc.
The master-thread waits for any other incoming commands while taking care of all the tasks, e.g. it kills tasks that run too long, prioritizes tasks with higher priorities, kills a task at the request of another task, limits the number of currently running tasks, allows task scheduling, cleans up finished tasks (threads), and so on.
The model is pretty similar to what we can see in OS dealing with running processes.
Are there any good practices for programming such task models, or has any theoretical work been done in this field? Maybe my question is too general, but I at least wanted to know whether anyone has experience working with such models or whether there is a better approach.
Thanks for any answers.
There are a number of patterns which may help you here.
http://parlab.eecs.berkeley.edu/wiki/patterns/patterns
Task queue and master/worker would be reasonable places to start.
I need to run several tasks periodically in my app, but they have different periods. How should I do this?
Run each task in a separate timer thread.
Run all the periodic tasks in the same timer thread, but check the time to see whether each task should be activated.
Do you have any better solution?
It would mostly depend on how many tasks you have to run.
With two or three tasks it would make sense to have a separate timer for each task, but that gets unwieldy with more tasks.
If there are a good number of tasks, I would have a single timer that checks a list of tasks to see whether any are ready to run. That way, to add a task, you just add it to the list. Having a list of tasks also makes it easy for the tasks to be data driven.
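A rough sketch of the single-timer approach in Python might look like this. The class name `PeriodicScheduler` and its methods are made up for illustration, and the current time is passed in explicitly so the logic is easy to test; in a real app the one timer would call `tick(time.monotonic())` at a suitable interval.

```python
import heapq

class PeriodicScheduler:
    """Keeps a list of periodic tasks; one timer calls tick() to run whatever is due."""

    def __init__(self):
        self._heap = []   # entries: (next_due_time, sequence, period, callback)
        self._seq = 0     # tie-breaker so callbacks themselves are never compared

    def add(self, period, callback, now=0.0):
        if period <= 0:
            raise ValueError("period must be positive")
        heapq.heappush(self._heap, (now + period, self._seq, period, callback))
        self._seq += 1

    def tick(self, now):
        # Run every task whose due time has passed, then reschedule it.
        # If tick() is called late, a task catches up by running more than once.
        while self._heap and self._heap[0][0] <= now:
            due, seq, period, callback = heapq.heappop(self._heap)
            callback()
            heapq.heappush(self._heap, (due + period, seq, period, callback))
```

Adding a task is then just one `add(period, callback)` call, so the task list can be built from configuration data.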
Sounds like you should execute each task on its own thread. It will ease configuring the timing and controlling the start/stop of each task.
Using the Timer control is a good option, if the tasks should execute every given time delta.