Groovy: best way to perform long running jobs in parallel


I'm going through the GPars library in Groovy and trying to figure out how to achieve the following goals:
Run a number of jobs, defined as closures, in parallel;
Support a timeout for each job or for the batch as a whole;
Support cancellation;
Allow a job to fork other long-running jobs, also with timeouts.
There seem to be several ways to do this: dataflow, asynchronous invocation, fork/join... Which approach is best in terms of reliability and flexibility?
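
For comparison, the requirements above can be sketched with the plain java.util.concurrent API, which is callable from Groovy as-is. This is not GPars and not a recommendation over it, only a minimal illustration of a batch timeout plus cancellation. The class name TimedJobs and the method runAll are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, named for illustration only.
public class TimedJobs {

    // Runs all jobs in parallel and waits at most timeoutMillis for the whole batch.
    // Jobs that have not finished by then are cancelled (their threads are interrupted).
    // Returns one entry per job: the job's result, or null if it was cancelled or failed.
    public static List<String> runAll(List<Callable<String>> jobs, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            // invokeAll with a timeout cancels every task that is still running at the deadline.
            List<Future<String>> futures =
                    pool.invokeAll(jobs, timeoutMillis, TimeUnit.MILLISECONDS);
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                if (f.isCancelled()) {
                    results.add(null); // timed out and was cancelled
                    continue;
                }
                try {
                    results.add(f.get()); // already complete, so this does not block
                } catch (ExecutionException e) {
                    results.add(null); // the job threw an exception
                }
            }
            return results;
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> jobs = new ArrayList<>();
        jobs.add(() -> "fast");
        jobs.add(() -> { Thread.sleep(5_000); return "slow"; });
        System.out.println(runAll(jobs, 500));
    }
}
```

A job can itself call runAll with its own (shorter) timeout to fork nested jobs. Cancellation relies on thread interruption, so a job only stops early if it blocks in interruptible calls or checks its interrupt flag.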

Related

Could Durable Functions reduce my execution time?

I can execute a process "x" in parallel using the Azure Durable Functions fan-out/fan-in pattern.
If I split my single process "x" into multiple processes using this concept, can I reduce the function's execution time?

In general, the Azure Functions Premium plan allows higher timeout values. So if you don't want to deal with the issue, just upgrade ;-)
Azure Durable Functions might or might not reduce your total runtime.
BUT every "activity call" is a new function execution with its own timeout.
Either fanning out or calling the activities serially will prevent timeout issues, as long as no single activity exceeds the function timeout.
If, however, you have an activity that runs for an extended period, you will need Premium functions anyway. But your solution with "batch" processing looks quite promising for avoiding this.

With the fan-out/fan-in approach, tasks run in parallel instead of sequentially, so the total duration is roughly that of your longest single task (plus orchestration overhead). It's the best approach when the requests do not need information from each other to be processed.
If you don't want the tasks to run on Durable Functions, you could use Task Asynchronous Programming (TAP) to build tasks, call the relevant methods, and wait for all tasks to finish.
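
To make the fan-out/fan-in idea concrete, here is a minimal sketch in plain Java with CompletableFuture. It is not the Durable Functions API and not TAP, just the same pattern in miniature. The names FanOutFanIn and process are invented for the example, and the 200 ms sleep stands in for real work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical example class, named for illustration only.
public class FanOutFanIn {

    // Stand-in for one independent unit of work (assumed to take about 200 ms).
    static int process(int item) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return item * item;
    }

    // Fan out: start one task per item. Fan in: wait for all of them and sum the results.
    public static int fanOutFanIn(List<Integer> items) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, items.size()));
        try {
            List<CompletableFuture<Integer>> futures = items.stream()
                    .map(i -> CompletableFuture.supplyAsync(() -> process(i), pool))
                    .collect(Collectors.toList());
            // join() blocks until each task is done; the tasks themselves run concurrently.
            return futures.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int total = fanOutFanIn(Arrays.asList(1, 2, 3, 4));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total=" + total + " elapsedMs=" + elapsedMs);
    }
}
```

With one thread per item, the elapsed time should be close to that of a single task rather than the sum of all four, which is the effect described above. It only holds when the tasks are independent and there is enough capacity to run them at once.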

single slurm array vs multiple sbatch calls

I can run N embarrassingly parallel jobs by using a Slurm array like:
#SBATCH --array=1-N
Alternatively, I think I can achieve the same from a scheduling perspective (i.e. jobs scheduled independently and as soon as resources become available) by manually launching N jobs, for example with a simple Bash script containing a loop.
Since the latter is far more flexible, I don't see the utility in using the --array option built into Slurm.
Am I missing something?

Arrays offer a simple way to create parametrised jobs without writing the Bash loop. An array
(obviously) creates the jobs and assigns each of them a parameter;
takes care of output file name parametrisation;
makes it much easier to submit a dependent job that should run after all those jobs have completed;
makes the output of squeue less cluttered.
Furthermore, the jobs in an array can be managed as a whole: the squeue, scancel, etc. commands can work on the whole array, as opposed to writing another loop to cancel them, for instance. This is even more useful when you have multiple arrays running at the same time; you do not need to track each individual job yourself.
Finally, especially for large arrays, it makes the scheduler's work easier and can increase job throughput.
If you need flexibility, then job arrays are not the solution, but maybe a workflow manager could help you.

multithreading in jenkins pipeline

I'm new to Groovy.
I have a Jenkins pipeline that loads hundreds of mocks.
I want to speed up my build times, so I thought maybe I should use multithreading for the job.
The problem is, I couldn't find any examples of multithreading within a single stage.
I only found the parallel option, but as I understand it, it's only suitable for running multiple stages at once, not for running multiple threads within the same stage.
Any ideas?
stage("mocks") {
//load mocks
}
Thanks!

Running Cukes in Parallel with JRuby

I'm trying to run cucumber scenarios in parallel from inside my gem. From other answers, I've found I can execute cucumber scenarios with the following:
runtime = Cucumber::Runtime.new
runtime.load_programming_language('rb')
#result = Cucumber::Cli::Main.new(['features\my_feature:20']).execute!(runtime)
The above code works fine when I run one scenario at a time, but when I run scenarios in parallel using something like Celluloid or Peach, I get Ambiguous Step errors. It seems my step definitions are being loaded for each parallel test, and Cucumber thinks I have multiple step definitions of the same kind.
Any ideas how I can run these things in parallel?

Cucumber is not thread safe. Each scenario must be run in a separate thread with its own Cucumber runtime. Celluloid may try to run multiple scenarios on the same actor at the same time.
There is a project called cukeforker that can run scenarios in parallel, but it only supports MRI on Linux and OS X. It forks a subprocess per scenario.
I've created a fork of cukeforker called jcukeforker that supports both MRI and JRuby on Linux. Jcukeforker distributes scenarios to subprocesses, and the subprocesses are reused. Subprocesses are used instead of threads to guarantee that each test has its own global variables. This is important when running the subprocess on a vncserver, which requires the DISPLAY variable to be set.

JCL job dependency without scheduler

I'm trying to implement a JCL, in a JES2 environment, that launches a set of jobs with dependencies in it, for example:
JOB_A -> JOB_B )
JOB_C -> JOB_D ) -> JOB_E
In other words, JOB_E is only launched when JOB_B and JOB_D are finished.
I can launch JOB_B and JOB_D through the internal reader in JOB_A and JOB_C, but I can't create the dependencies for JOB_E.
I tried to use JCL resource locking: lock a data set in JOB_B and JOB_D that JOB_E needs, so that JOB_E would only start once all data sets are available. However, JCL only requests data sets at the step level and releases them afterwards. If JCL could request all data sets before the job starts, I could implement some sort of mutex in the jobs, for example:
JOB_A locks data set DSN_A
JOB_B waits to get data set DSN_A
JOB_C locks data set DSN_C
JOB_D waits to get data set DSN_C
JOB_E waits to get data set DSN_A and DSN_C
How to do this?
I need this to test a set of JCLs in a development environment without access to a scheduler.

Your comment that you need this to test in a development environment without access to a scheduler makes me wonder if your shop has a scheduler for the production environment. If it does, then your testing will not actually test what will be used in your production environment. Just something to think about if you haven't already.
In answer to your question, one technique is to use a utility such as IEBGENER in the last step of one job to submit a subsequent job.
For example, the last step of JOB_A would execute IEBGENER with SYSUT1 containing the execution JCL for JOB_B and SYSUT2 pointing at INTRDR. Getting JOB_E to run so that it doesn't interfere with any of the other jobs might be tricky, though, as JOB_E needs to run after both JOB_B and JOB_D complete.
Another technique would be to use Rexx in batch mode to submit your jobs using the internal reader and then use the SDSF Rexx interface to watch for when they complete. Essentially you will be writing a special-purpose job scheduler, specific to your set of jobs.
Update, ten years later...
As of z/OS 2.2, IBM has added JES2 Execution Control Statements, which "define the execution sequencing of a group of jobs and the jobs themselves". Before this feature can be used, some configuration must be done by your z/OS Systems Programmer.

I'm wondering why you'd invest precious time testing a set of jobs when the PROD set is entirely different and will be handled by some xyz scheduler. Don't mind if I sound crazy, but lemme propose mine too:
Assumption: your jobs take manageable CPU and do NOT need to run in parallel.
A triggers B triggers C triggers D triggers E (I know it's not worthy, but your testing goes fine). I just put it here thinking about what I would do if I were you. I mainly need my testing to go quick and fine. Lemme know your cliche.
Now, lemme appreciate you both for the solution of managing job submission by means of REXX, in effect creating a virtual, purpose-built scheduler.
