I need to run identical jobs on a schedule, and they differ only in a few strings.
As you may know, there is no convenient way to create identical jobs with different parameters. For now I would prefer a "codeless" way to do this, or at least one with as little code as possible.
So let's imagine the configurations are stored as rows in a JobsConfigurations table of the website's database.
How can I get the name of the currently running job, so I can pick the right configuration from the table?
Thanks for the help!
See https://github.com/projectkudu/kudu/wiki/Web-Jobs#environment-settings
The WEBJOBS_NAME environment variable will give you the name of the current WebJob.
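For example, here is a minimal sketch in Python (the same idea works in whatever language the WebJob is written in). The JobsConfigurations table name comes from the question; the column names and the SQLCONNSTR_JobsDb connection-string setting are assumptions to adjust to your schema:

    # Look up this WebJob's row using the WEBJOBS_NAME environment variable.
    import os
    import pyodbc

    job_name = os.environ["WEBJOBS_NAME"]                    # name of the currently running WebJob

    conn = pyodbc.connect(os.environ["SQLCONNSTR_JobsDb"])   # assumed connection-string app setting
    row = conn.cursor().execute(
        "SELECT Setting1, Setting2 FROM JobsConfigurations WHERE JobName = ?",
        job_name,
    ).fetchone()

    if row is None:
        raise RuntimeError(f"No configuration found for WebJob '{job_name}'")

    setting1, setting2 = row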
I have a practical use case: three PySpark notebooks that all share one common parameter.
I need to schedule all three notebooks to run in sequence.
Is there any way to run them by setting the parameter value once, since it is the same for all of them?
Please suggest the best way to do this.
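If these notebooks run on Databricks (an assumption, the question doesn't say), one common pattern is a small driver notebook that reads the shared parameter once and runs the three notebooks in sequence, passing it to each. A rough sketch; the notebook paths and the run_date parameter name are placeholders:

    # Driver notebook: read the shared parameter once, then run the children in order.
    run_date = dbutils.widgets.get("run_date")   # the single shared parameter

    for path in ["/Repos/etl/notebook_1", "/Repos/etl/notebook_2", "/Repos/etl/notebook_3"]:
        # dbutils.notebook.run(path, timeout_seconds, arguments) runs each child
        # notebook synchronously, so they execute strictly in sequence.
        dbutils.notebook.run(path, 3600, {"run_date": run_date})

Each child notebook would then pick the value up with dbutils.widgets.get("run_date").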
I know it might be a bit of a confusing title, but I couldn't come up with anything better.
The problem ...
I have an ADF pipeline with 3 activities: first a Copy to a DB, then two stored procedure activities. All are triggered daily and use WindowEnd to read the right directory or to pass a date to the SP.
There is no way I can get an import date into the XML files that we are receiving.
So I'm trying to add it in the first SP.
The problem is that once the first activity of the pipeline is done, two others start:
the second activity of the same slice (the SP that adds the dates), but, when history is being loaded, also the Copy of the same pipeline for another slice.
So I'm getting mixed-up data.
As you can see in the 'Last Attempt Start' column.
Does anybody have an idea how to avoid this?
[Screenshot: ADF Monitoring]
In case somebody hits a similar problem:
I've solved it by working with daily-named tables.
Each slice puts its data into a staging table with a _YYYYMMDD suffix, which can be set as "tableName": "$$Text.Format('[stg].[filesin_1_{0:yyyyMMdd}]', SliceEnd)".
So parallelism is no longer a problem.
The only disadvantage is that the SPs that come after this first step have to work with dynamic SQL, since the table name they select from is variable.
But that wasn't a big coding problem.
Works like a charm!
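For anyone wondering what the dynamic part looks like: the idea is simply to build the staging-table name from the slice date and execute the statement as a string. A rough sketch of the same idea in Python (in the real pipeline this lives in a T-SQL stored procedure; the column names and connection string below are made up):

    from datetime import date
    import pyodbc

    slice_end = date(2017, 3, 15)                              # would come from ADF's SliceEnd
    staging_table = f"[stg].[filesin_1_{slice_end:%Y%m%d}]"    # e.g. [stg].[filesin_1_20170315]

    # Build the statement with the per-slice table name and run it.
    sql = (f"INSERT INTO dbo.filesin (col1, col2, import_date) "
           f"SELECT col1, col2, '{slice_end}' FROM {staging_table}")

    conn = pyodbc.connect("DSN=mydb")                          # placeholder connection
    conn.cursor().execute(sql)
    conn.commit()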
I am working with NGS data and the newest test files are massive.
Normally our pipeline uses just one node, and the output from the different tools goes to that node's ./scratch folder.
Using just one node is not possible with the current massive data set, which is why I would like to use at least two nodes to solve issues such as speed, not all jobs being submitted, etc.
Using multiple nodes or even multiple partitions is easy; I know which parameters to use for that.
So my issue is not about missing parameters, but about how to handle the following I/O problem in Slurm:
Let's say I have tool-A. Tool-A runs 700 jobs across two nodes (340 jobs on node1 and 360 jobs on node2), and the output is saved to ./scratch on each node separately.
Tool-B then uses the results from tool-A, which now sit on two different nodes.
What is the best approach to fix that?
- Is there a parameter that tells Slurm which jobs belong together and where to find the input for tool-B?
- Would it be smarter to change the output location from ./scratch to a local folder?
- Or would it be better to merge the output of tool-A from both nodes onto one node?
- Any other ideas?
I hope I made my issue simple to understand... please excuse me if that is not the case!
My naive suggestion would be: why not share a scratch NFS volume across all nodes? That way all output data from tool-A would be accessible to tool-B regardless of the node. It might not be the best solution for read/write speed, but to my mind it is the easiest for your situation.
A more software-oriented solution (not too hard to develop) would be to implement a database that tracks where the files have been generated; a small sketch follows below.
I hope it helps!
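A minimal sketch of that tracking-database idea, using SQLite purely for illustration (in a cluster without any shared filesystem you would point this at a small server-based database instead; the paths, table and sample names are placeholders):

    import sqlite3
    import socket

    # The database file must live somewhere both tools can reach (or be a DB server).
    db = sqlite3.connect("/shared/file_locations.db")
    db.execute("CREATE TABLE IF NOT EXISTS outputs (sample TEXT, node TEXT, path TEXT)")

    # tool-A side: register an output file right after writing it.
    db.execute("INSERT INTO outputs VALUES (?, ?, ?)",
               ("sample_042", socket.gethostname(), "/scratch/toolA/sample_042.bam"))
    db.commit()

    # tool-B side: ask which node holds a given sample's output instead of guessing.
    node, path = db.execute("SELECT node, path FROM outputs WHERE sample = ?",
                            ("sample_042",)).fetchone()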
... just for those coming across this via search engines: if you cannot use any kind of shared filesystem (NFS, GPFS, Lustre, Ceph) and your data sets are not too massive, you can use "staging", meaning data transfer before and after your job actually runs.
Though this is termed "casting" in the Slurm universe, it generally means you define:
- files to be copied to all nodes assigned to your job BEFORE the job starts;
- files to be copied from the nodes assigned to your job AFTER the job completes.
This can be a way to get everything needed back and forth from/to your job's nodes even without a shared file system.
Check the man page of "sbcast" and amend your sbatch job scripts accordingly.
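A rough sketch of the stage-in part from inside a job step, written in Python calling the Slurm CLI just for illustration (in practice this is usually a couple of lines in the sbatch script itself; the file names and paths are placeholders):

    import subprocess

    # sbcast copies the file to every node allocated to the current job, so each
    # node's local scratch has its own copy before the tool starts.
    subprocess.run(["sbcast", "--force",
                    "/home/me/reference/genome.fa",   # source on a filesystem the submit host can read
                    "/scratch/genome.fa"],            # destination on each node's local scratch
                   check=True)

    # ... launch tool-B here, reading /scratch/genome.fa locally on every node ...

Copying results back after the job can be done with "sgather", if your installation ships it, or with plain scp/rsync from each node.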
I am trying to learn SQLyog Job Agent (SJA).
I am on a Linux machine and use SJA from within a bash script via the command ./sja myschema.xml.
I need to sync a database of almost 100 tables with its local clone.
Since a single table stores some config data that I do not wish to sync, it seems I need to write a myschema.xml that lists all of the remaining 99 tables.
The question is: is there a way to say "sync all tables except this one"?
I hope my question is clear. I appreciate your help.
If you are using the latest version of SQLyog: at the end of the database synchronisation wizard you are given the table selection shown below and the option to generate an XML job file reflecting the operation you've opted to perform. This will in effect list the other 99 tables in the XML file itself, but it will give you what you are looking for, and I don't think you would be doing anything in particular with an individual table, since you are specifying all tables in a single element.
How do we get the current date in a PS file name qualifier using JCL?
Example output file name: Z000417.BCV.TEST.D120713 (YYMMDD format).
This can be done, but not necessarily in a straightforward manner. The straightforward manner would be to use a system symbol in your JCL. Unfortunately this only works for batch jobs if it has been enabled for the job class on more recent versions of z/OS.
Prior to z/OS v2, IBM's stated reason this didn't work is that your job could be submitted on a machine in London, the JCL could be interpreted on a machine in Sydney, and the job could actually execute on a machine in Chicago. Which date (or time) should be on the dataset? There is no one correct answer, and so we all created our own solutions to the problem that incorporates the answer we believe to be correct for our organization.
If you are able to use system symbols in your batch job JCL, there is a list of valid symbols available to you.
One way to accomplish your goal is to use a job scheduling tool. I am familiar with Control-M, which uses what are called "auto-edit variables." These are special constructs that the product provides. The Control-M solution would be to code your dataset name as
Z000417.BCV.TEST.D%%ODATE.
Some shops implement a scheduled job that creates a member in a shared PDS. The member consists of a list of standard JCL SET statements...
// SET YYMMDD=120713
// SET CCYYMMDD=20120713
// SET MMDDYY=071312
...and so on. This member is created once a day, at midnight, by a job scheduled for that purpose. The job executes a program written in that shop to create these SET statements.
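Here is a sketch of what such an in-house program could look like, in Python purely for illustration (the member name, output location, and exact list of formats are assumptions; a real shop might use COBOL, REXX, etc. and write straight into the PDS member):

    from datetime import date

    today = date.today()
    set_statements = [
        f"// SET YYMMDD={today:%y%m%d}",     # e.g. 120713
        f"// SET CCYYMMDD={today:%Y%m%d}",   # e.g. 20120713
        f"// SET MMDDYY={today:%m%d%y}",     # e.g. 071312
    ]

    # Write the lines that make up the member; how they end up in a shared PDS
    # (e.g. SHARED.JCLLIB(DATESETS)) depends on the shop.
    with open("datesets.jcl", "w") as member:
        member.write("\n".join(set_statements) + "\n")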
Another answer is that you could use ISPF file tailoring in batch to accomplish your goal. This would work because the date would be set in the JCL before the job is submitted. While this will work, I don't recommend it unless you're already familiar with file tailoring and running ISPF in batch in your shop; I think it's overly complicated for something that is simple to accomplish in the other ways outlined in this reply.
You could use a GDG instead of a dataset with a date in its name. If what you're looking for is a unique name, that's what GDGs accomplish (among other things).
The last idea that comes to my mind is to create your dataset with a name not containing the date, then use a Unix System Services script to construct an ALTER command (specifying the NEWNAME parameter) for IDCAMS, then execute IDCAMS to rename your dataset.
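For that last approach, the script's only real job is to build the IDCAMS control statement with today's date in the NEWNAME. A sketch in Python (dataset names are placeholders; the output would be fed to an IDCAMS step as its SYSIN):

    from datetime import date

    old_name = "Z000417.BCV.TEST.NODATE"                     # placeholder
    new_name = f"Z000417.BCV.TEST.D{date.today():%y%m%d}"    # e.g. Z000417.BCV.TEST.D120713

    # One ALTER control statement; the leading blank keeps it out of column 1.
    print(f" ALTER {old_name} NEWNAME({new_name})")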
If you are loading the jobs using the JOBTRAC/CONTROL-M schedulers, getting the date in the required format is fairly easy. The format could be 'OSYMD', which the scheduler will replace on the fly before it loads the job. It offers many formats to cover most needs.
You can also make use of a JCL utility whose name I don't remember exactly. It takes the file name from a SYSIN dataset and uses it as the DSN of the output. The SYSIN dataset can be created in a previous step using simple DFSORT &DATE commands. Let me know if you need the syntax; I prefer to google it and try it hands-on.