I am still fairly new to parallel computing, so I am not too sure which tool to use for the job.
I have a System.Threading.Tasks.Task that needs to wait for n tasks to finish before starting. The tricky part is that some of its dependencies may be started after this task is created (you are guaranteed never to hit zero outstanding dependent tasks until they are all done).
Here is roughly what happens:
1. Parent thread creates somewhere between 1 and (NUMBER_OF_CPU_CORES - 1) worker tasks.
2. Parent thread creates a task to be run when all of the worker tasks are finished.
3. Parent thread creates a monitoring thread.
4. The monitoring thread may kill a worker task or spawn a new task depending on load.
I can figure out everything up to step 4. How do I get the task from step 2 to wait until any new worker tasks created in step 4 have also finished?
You can pass an array of the Tasks you're waiting on to TaskFactory.ContinueWhenAll, along with the new task to start after they're all done.
edit: Possible workaround for your dynamically generated tasks problem: have a two-step continuation; every "dependent task" you start should have a chained ContinueWith which checks the total number of tasks still running, and if it's zero, launches the actual continuation task. That way, every task will do the check when it's done, but only the last one will launch the next phase. You'll need to synchronize access to the "remaining tasks" counter, of course.
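For the dynamically spawned workers, a minimal C# sketch of that counter idea (the Coordinator type and its member names are illustrative, not from the question):

using System;
using System.Threading;
using System.Threading.Tasks;

class Coordinator
{
    private int remaining;                  // worker tasks registered but not yet finished
    private readonly Action finalPhase;     // the "step 2" work to run when everything is done

    public Coordinator(Action finalPhase)
    {
        this.finalPhase = finalPhase;
    }

    public Task StartWorker(Action work)
    {
        Interlocked.Increment(ref remaining);        // register before the worker can possibly finish
        return Task.Factory.StartNew(work)
            .ContinueWith(_ =>
            {
                // Only the worker that drops the counter to zero launches the next phase.
                if (Interlocked.Decrement(ref remaining) == 0)
                    Task.Factory.StartNew(finalPhase);
            });
    }
}

Because the question guarantees the count never reaches zero until all workers are done, incrementing before StartNew and decrementing in the continuation is enough, and Interlocked keeps the counter consistent without an explicit lock. If the set of workers were fixed up front, TaskFactory.ContinueWhenAll alone would do.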
I wonder how Camunda manages multiple instances of a sub-process.
For example, take a BPMN model in which a multi-instance sub-process iterates over a large collection, say 500 instances.
I have a function in a web app that calls the endpoint to complete the current user task, and then performs another call to the Camunda engine to get all tasks (in the first API call's callback). I am supposed to get a list of 500 sub-process user tasks (the ones generated by the multi-instance activity).
What if the get-tasks call is performed before the Camunda engine has successfully instantiated all the sub-processes?
Do I get a partial list of tasks?
How can I detect that the main process and the sub-processes are ready?
I don't really know whether Camunda can handle this by itself, so I thought of the following solution, knowing that I can only add code through the Modeler environment with Groovy (JavaScript as well, but all the code parts already added are Groovy):
Use a throw event in the sub-process that is caught in the main process, then, for each signal emitted, count the tasks that are ready and compare the count with the expected number of tasks.
Thanks
I would likely spawn the tasks as parallel processes (or 500 of them) and then go to a next step in which I signal, or otherwise set a state, indicating that the spawning is complete. I would then join the parallel processes together again and have, after the join, a task that signals or otherwise sets a state indicating that all the parallel processes are done. See https://docs.camunda.org/manual/7.12/reference/bpmn20/gateways/parallel-gateway/. This way you know exactly at which point (after spawning is done and before the join) you have a chance of getting your 500 spawned sub-processes.
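A minimal Groovy sketch of those two signalling steps, assuming Camunda script tasks where execution is the current DelegateExecution (the variable names are illustrative):

// Script task placed right after the multi-instance activity: spawning is complete.
execution.setVariable("spawningCompleted", true)

// Script task placed right after the joining parallel gateway: all 500 instances are done.
execution.setVariable("allSubProcessesDone", true)

The web app can then check these process variables before it asks the engine for the full task list.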
I have one DAG that has three task streams (licappts, agents, agentpolicy):
For simplicity I'm calling these three distinct streams. The streams are independent in the sense that a failure in agentpolicy doesn't mean the other two (licappts and agents) should be affected by it.
But for the sourceType_emr_task_1 tasks (i.e., licappts_emr_task_1, agents_emr_task_1, and agentpolicy_emr_task_1) I can only run one of these tasks at a time. For example I can't run agents_emr_task_1 and agentpolicy_emr_task_1 at the same time even though they are two independent tasks that don't necessarily care about each other.
How can I achieve this functionality in Airflow? For now the only thing I can think of is to wrap that task in a script that somehow locks a global variable, then if the variable is locked I'll have the script do a Thread.sleep(60 seconds) or something, and then retry. But that seems very hacky and I'm curious if Airflow offers a solution for this.
I'm open to restructuring the ordering of my DAG if needed to achieve this. One thing I thought about doing was to make a hard coded ordering of
Dag Starts -> ... -> licappts_emr_task_1 -> agents_emr_task_1 -> agentpolicy_emr_task_1 -> DAG Finished
But I don't think combining the streams this way is a good idea, because then, for example, agentpolicy_emr_task_1 has to wait for the other two to finish before it can start, and there could be times when agentpolicy_emr_task_1 is ready to go before the other two have finished their earlier tasks.
So ideally I want whichever sourceType_emr_task_1 task is ready first to start, and then block the other streams from running their sourceType_emr_task_1 task until it has finished.
Update:
Another solution I just thought of: if there is a way for me to check the status of another task, I could create a script for sourceType_emr_task_1 that checks whether either of the other two sourceType_emr_task_1 tasks has a status of running. If so, it sleeps and periodically re-checks until none of the others are running, at which point it starts its own process. I'm not a big fan of this approach, though, because I feel it could cause a race condition where both read (at the same time) that none are running and both start.
You could use a pool to ensure the parallelism for those tasks is 1.
For each of the *_emr_task_1 tasks, set the pool kwarg to something like pool='emr_task' (see the sketch after these steps).
Then just go into the webserver -> Admin -> Pools -> Create:
Set the pool name to match the pool used in your operator, and set Slots to 1.
This will ensure the scheduler will only allow tasks to be queued for that pool up to the number of slots configured, regardless of the parallelism of the rest of Airflow.
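A minimal sketch of the DAG side, assuming a pool named emr_task already exists with 1 slot; the operator choice, commands, and dates here are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator in older releases

with DAG(dag_id="source_streams", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    licappts_emr_task_1 = BashOperator(
        task_id="licappts_emr_task_1",
        bash_command="echo run licappts EMR step",
        pool="emr_task",  # all three *_emr_task_1 tasks share this single-slot pool
    )
    agents_emr_task_1 = BashOperator(
        task_id="agents_emr_task_1",
        bash_command="echo run agents EMR step",
        pool="emr_task",
    )
    agentpolicy_emr_task_1 = BashOperator(
        task_id="agentpolicy_emr_task_1",
        bash_command="echo run agentpolicy EMR step",
        pool="emr_task",
    )

Whichever of the three becomes ready first gets the slot; the scheduler holds the other two in the queued state until it frees up, which is exactly the "first ready runs, the rest wait" behaviour asked for.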
With Activiti it is possible to model parallel tasks; however, these tasks are internally executed sequentially (by the same thread).
I need to execute the tasks in an asynchronous way, and then "join" them once they are finished.
The process is:
preparation -> execute task 1
            -> execute task 2 at the same time
            -> then, once both are finished, go on
It is a matter of optimization, because tasks 1 and 2 are web-service calls and may require a lot of time.
From everything I have read, this is not possible with Activiti. Using async tasks, it is not possible to join them properly (i.e., detect that both are finished). The first task to finish is OK, but the second throws an OptimisticLockException and is restarted (which is not acceptable).
Maybe I have misunderstood something and this is possible, or even easy? Did anyone succeed at it?
I am not sure if I understand your question correctly, but Activiti does support async processing.
To join two async branches, you can add another task that waits until both async tasks are completed.
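One common reading of "another task that waits" is simply a parallel-gateway join; a minimal BPMN sketch, assuming Activiti 5.x attribute names and placeholder delegate classes (whether this avoids the OptimisticLockException depends on how the job executor is configured, which this answer does not detail):

<parallelGateway id="fork" />
<serviceTask id="webServiceCall1" activiti:class="com.example.Ws1Delegate" activiti:async="true" />
<serviceTask id="webServiceCall2" activiti:class="com.example.Ws2Delegate" activiti:async="true" />
<parallelGateway id="join" />
<sequenceFlow id="f1" sourceRef="fork" targetRef="webServiceCall1" />
<sequenceFlow id="f2" sourceRef="fork" targetRef="webServiceCall2" />
<sequenceFlow id="f3" sourceRef="webServiceCall1" targetRef="join" />
<sequenceFlow id="f4" sourceRef="webServiceCall2" targetRef="join" />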
I have n tasks in a waiting list.
Each task has associated with it an entry that contains some meta information:
Task1 A,B
Task2 A
Task3 B,C
Task4 A,B,C
And an associated hashmap that contains entries like:
A 1
B 2
C 2
This implies that if a task whose meta information contains A is already running, then no other task containing A can run at the same time.
However, since B has a limit of 2 tasks, either task1 and task3 can run together, or task3 and task4.
But task1, task3, and task4 cannot run together, since the limits of both A and B would be violated (the limit of C would not be).
If I need to select tasks to run in different threads, what logic/algorithm would you suggest? And when should this logic be invoked? I view the task list as a shared resource that might need to be locked while tasks are selected from it. Right now, I think this logic would have to be invoked when a task is added to the list and also when a running task has completed. But this could block the addition of new elements to the list, unless I make a copy of the list before running the logic.
How would your logic change if I were to give higher priority to tasks that contain more entries, like 'A,B,C', than to tasks like 'A,B'?
This is kind of a continuation of "Choosing a data structure for a variant of producer consumer problem" and "How to access the underlying queue of a ThreadpoolExecutor in a thread safe way", just in case anyone is wondering about the background of the problem.
Yes, this is nasty. I immediately thought of an array/list of semaphores, initialized from the hashmap from which any thread attempting to execute a task would have to get units as defined by the metadata. About a second later, I realized that such a design would deadlock pretty quick!
I think that one dedicated producer thread is going to have to iterate a 'readyJobs' list in an attempt to find a task that can execute with the resources currently available. It could do this both when new tasks become available and after a task is completed, so releasing resources. The producer thread could wait on one input queue (a thread-safe producer-consumer queue), onto which are queued both new tasks from [wherever] and completed tasks that are queued back from the worker threads (a callback fired by the worker threads pushes the completed task to the input queue?). Adding new tasks might be blocked briefly, but only while the input queue is blocked by some other task being added.
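A minimal Java sketch of the bookkeeping that producer thread would do, assuming a Map of per-resource limits like the hashmap above; all names are illustrative:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ResourceBook {
    private final Map<String, Integer> limits;                 // e.g. A -> 1, B -> 2, C -> 2
    private final Map<String, Integer> inUse = new HashMap<>();

    ResourceBook(Map<String, Integer> limits) {
        this.limits = limits;
    }

    // Called by the producer thread for each candidate in 'readyJobs':
    // returns true (and reserves the units) only if every needed resource has a free slot.
    synchronized boolean tryAcquire(Set<String> needed) {
        for (String r : needed) {
            if (inUse.getOrDefault(r, 0) >= limits.getOrDefault(r, 0)) {
                return false;
            }
        }
        for (String r : needed) {
            inUse.merge(r, 1, Integer::sum);
        }
        return true;
    }

    // Called when a completed task comes back on the input queue.
    synchronized void release(Set<String> needed) {
        for (String r : needed) {
            inUse.merge(r, -1, Integer::sum);
        }
    }
}

The producer walks readyJobs in priority order, hands every task for which tryAcquire succeeds to a worker thread, and calls release when the completed task comes back; because a single thread does the selecting, there is no deadlock, and the synchronized methods keep the counters consistent.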
In the case of assigning priorities, you could insert-sort the 'readyJobs' list as you wish, so that higher-priority tasks are checked first to see whether they can run with the resources available. If they cannot, the rest of the list is iterated and a lower-priority job might be able to run.
I hope that you do not want to 'preempt' lower-priority tasks so as to release resources early - that would get really, really messy :(
Rgds,
Martin
I coded a monitoring program in RPG that checks if the fax/400 is operational.
And now I want this program to check every 15 minutes.
Instead of placing a job every 15 minutes in the job scheduler (which would be ugly to manage), I made the program wait between checks using DLYJOB.
Now how can I make this program "place itself" in memory so it keeps running?
(I thought of using SBMJOB, but I can't figure in which job queue I could place it.)
A good job queue to use for an endlessly running job would be QSYSNOMAX. That allows unlimited numbers of jobs to be running.
You could submit the job to that queue from your startup program (QSTRUP) and it will simply remain running all the time.
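A minimal sketch of that submit (library, program, and job names are placeholders), run once from the startup program:

SBMJOB CMD(CALL PGM(MYLIB/FAXMON)) JOB(FAXMON) JOBQ(QSYSNOMAX)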
Here is what I have done in the past. There are two approaches to this:
Submit a new job every time the program runs, with a DLYJOB before it runs.
Create a loop and only end it given a certain condition.
What I did with a Monitor MSGW program was the following:
PGM
DCL VAR(&TIME) TYPE(*CHAR) LEN(6)
DCL VAR(&STOPTIME) TYPE(*CHAR) LEN(6) VALUE('200000')
/* Setup my program (run only once) */
START:
/* Perform my actions */
RTVSYSVAL SYSVAL(QTIME) RTNVAR(&TIME)
IF COND(&TIME *GE &STOPTIME) THEN(GOTO CMDLBL(END))
DLYJOB DLY(180)
GOTO CMDLBL(START)
END:
ENDPGM
This will run continuously until 8:00 pm. Then I add this to the job scheduler to submit every morning.
As far as which job queue: I am using QINTER, but it could really run anywhere. Make sure you choose a subsystem with enough available jobs, as this will take one up.
The downside of running in QINTER is that if the program starts to hit 100% CPU, it will use up all of your interactive CPU and effectively lock up your system.
I know of three ways to do that.
1) Using a data queue: there is a parameter to tell it to wait endlessly, or for a given time interval.
2) Using the OVRDBF command: there is a parameter there to tell the file not to end at EOF, making your program keep on waiting.
3) Easiest to implement: SBMJOB to call a program that loops forever, e.g., with DOW 1=1. You can insert code to wait a certain time interval before it iterates. Inside the loop you have your logic that checks for the fax, processes it, and then goes back to waiting (see the sketch below).
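A minimal CL sketch of option 3 (the checking program name is a placeholder; DLYJOB's DLY is in seconds, so 900 is 15 minutes):

PGM
LOOP:
    CALL PGM(MYLIB/CHKFAX)   /* check whether the fax/400 is operational */
    DLYJOB DLY(900)          /* wait 15 minutes before the next check */
    GOTO CMDLBL(LOOP)
ENDPGM

Submit it once, for example with SBMJOB to QSYSNOMAX as suggested above, and it keeps checking until the job is ended manually.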