Puppet: How to loop to check the status of an exec command

In one of my Puppet manifests, I am running the exec command to run a job on a remote server and dump the output into a file. The file contents are JSON with a couple of fields. One field is the status, which indicates whether the job is complete, and another is the jobid, which provides the id of the job. If the status is complete, I use the jobid to query the server for more information. If the status is not complete, I need to keep looping until the job is complete.
I realize that exec has "try_sleep", but since I need to parse the JSON file after the exec, I don't believe I can use it.
How do I go about solving this one?
As requested by Alex, adding the sequence below:
1. Use exec to execute some statement > /tmp/phase1.json
2. Parse /tmp/phase1.json and extract the fields status and jobid
3. If status is not complete, keep looping until it is
4. If status is complete, take the jobid and perform further processing
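One way people handle this is to move the loop into a small wrapper script that the exec resource runs; a minimal sketch, assuming jq is available to parse the JSON and that the fields really are named status and jobid (some_statement stands in for the real remote command):
#!/bin/sh
# run the remote job and capture its JSON description
some_statement > /tmp/phase1.json
# poll until the status field reports complete
while [ "$(jq -r '.status' /tmp/phase1.json)" != "complete" ]; do
  sleep 30
  some_statement > /tmp/phase1.json
done
# the job is complete; pick up the jobid for further processing
jobid=$(jq -r '.jobid' /tmp/phase1.json)
echo "job $jobid is complete"
An exec resource could then run this wrapper with a suitably large timeout, so the whole poll happens inside a single resource rather than across Puppet runs.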

Related

How to cancel a JCL job (Mainframe) in SDSF? (OZA1) error

I received a JCL Error after submitting a job.
20.46.44 JOB08763 $HASP165 WPR062M ENDED AT OZA1 - JCL ERROR CN(INTERNAL).
and in SDSF I am seeing this
How can I fix this (Cancel the job)? What is the reason for this error?
Thanks in advance.
If you are authorized to do so, you can cancel a job in SDSF by putting a C in the "N P" column and pressing the Enter key. But, that's your TSO session (the JobID starts with TSU) and you probably don't want to cancel it. The message you received indicated the job you submitted had a JCL error and ended, so there's no need to cancel it because it's no longer running.
The job shown in the screen shot is your current TSO session; you don't want to cancel this, do you? (BTW, please post text instead of images whenever possible).
The jobname of the one in the screen shot is WPR062 and the jobid is TSU08747. The TSU prefix in the jobid tells you it's a TSO session.
The job (not TSO session) in error which gave you this message:
20.46.44 JOB08763 $HASP165 WPR062M ENDED AT OZA1 - JCL ERROR CN(INTERNAL)
has jobname WPR062M with jobid JOB08763. The JOB prefix tells you it's a batch job.
You need to look at the job's output in SDSF to find out what caused the JCL error.
For completeness:
Started tasks have a jobid prefix of STC.
If your system is configured to allow more than 99,999 active jobids, the prefixes become a single character, i.e. T for TSO sessions, J for batch jobs, and S for started tasks.
As already stated, the SDSF output you are looking at is showing your TSO UserID. That is long-running and is not the job that is in error.
According to the error message
20.46.44 JOB08763 $HASP165 WPR062M ENDED AT OZA1 - JCL ERROR CN(INTERNAL)
The actual jobname is WPR062M. To investigate the issue, I suggest that you use the command PREFIX WPR062* and then the H command. The output you are looking for is in the Held queue.
Investigate that job by putting an S in the command column (note: I don't have SDSF on my system, but the command column is located on the left side of the screen).
In that job's output will be the reason for the JCL error.
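Put together, the sequence on the SDSF command line would look something like this (exact panel layout varies by installation, so treat this as a sketch):
PREFIX WPR062*
H
then type S in the command column next to WPR062M (JOB08763) and press Enter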

Is it possible to change a job ID to something human-readable?

I'd like to send myself a text when a job is finished. I understand how to change the job name so that the .o and .e files have the appropriate name. But I'm not sure if there's a way to change the job ID from a string of numbers to a specified key so I know which job it is. I usually have a lot of different jobs going at once, so it's difficult to remember all the different job ID numbers. Is there a way in the .pbs script to change the job ID so that when I get the message I can see which job it is rather than just a string of numbers?
If you are using Torque and add the -N flag, then you can add a name to the job. It will still use the numeric portion of the job id as part of the output and error filenames, but this allows you to add something to help you distinguish among your jobs. For example:
$ echo ls | qsub -N whatevernameyouplease
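If you'd rather keep the name inside the .pbs script itself, the same option can be supplied as a PBS directive at the top of the file; a minimal sketch (the walltime line is just an illustrative placeholder):
#!/bin/bash
#PBS -N whatevernameyouplease
#PBS -l walltime=00:05:00
# the job body; the name above will appear in qstat and in the .o/.e file names
ls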

Find out ID of 'at' job from within it

When I schedule a job with 'at' it is assigned an id, viz:
job 44 at 2014-01-28 17:30
When that job runs I would like to get at that id from within it. This is on Centos, FWIW. I have established that no environment variable contains the ID. When the Perl code in that job runs I would like it to be able to print the job ID (44 in this example).
Yes, I know that atq shows an = next to jobs that are executing, but there might be more than one of those at a time.
I could do something like pass a unique argument to the job when scheduling it, capture the ID, save that and the argument to a file somewhere, read that from the job. That's a lot of work I'd rather not go to if I don't have to, and it seems like this should be simple but I'm drawing a blank.
What follows was figured out by reading the sources of at-3.14. The way at puts the job id and the run time into the file name should be similar for any version, but I haven't checked this.
To begin with, at encodes the job id and the time when a particular job should be run into the name of the file describing the job. The file name has the format aJJJJJTTTTTTTT, where JJJJJ is a 5-character hexadecimal string, the job id, and TTTTTTTT is an 8-character hexadecimal string, the time when the job should be run, stored as seconds since the epoch.
At jobs are run by feeding a job description file as the standard input to sh -c. Fortunately the Linux kernel provides a symbolic link, /proc/self/fd/0, which will point to the standard input of the process currently being executed (play with ls -l /proc/self/fd/0 in case you need to assure yourself that this indeed is so).
A file describing a job has been deleted by the time a job is run. However, the file is still available for the kernel because it has been duplicated with dup(2) before being used as the standard input for a job. So, actually we are resolving a symbolic link to a file name which is not visible any more. In the perl script at the end we need to take this into account as readlink will return something like /foo/bar/baz (deleted) instead of /foo/bar/baz. And we're interested in just the file name which has all the information we need.
The reason why the symbolic link points to a deleted file is that the at daemon unlinks the original before executing the job. Unlinking is done only after creating a copy, a hard link, whose name begins with = instead of a. With this the at daemon tries to ensure there will be only one running copy of a job: the daemon will not execle(2), i.e. it will bail out, should the link(2) fail. Because the original file has been subject to open(2) and dup(2), the inode is still there for the kernel to use, since it still has hard links pointing to it.
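A quick way to convince yourself of all this is to schedule a throwaway job that records where its standard input points (the spool path varies by distribution):
echo 'ls -l /proc/self/fd/0 > /tmp/at-fd0.txt 2>&1' | at now + 1 minute
Once it runs, /tmp/at-fd0.txt should show the link resolving to a deleted file in the at spool directory.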
After a fairly long and possibly confusing introduction, here is how to put it all together:
#!/usr/bin/perl
use strict;
use warnings;
# Our standard input is the job description file; resolve where it points.
# The target looks like "/var/spool/at/aJJJJJTTTTTTTT (deleted)" because
# the daemon has already unlinked the original.
my $job_file = readlink("/proc/self/fd/0");
# Drop the " (deleted)" suffix: keep everything up to the first space.
if (index($job_file, " ") > 0) {
    $job_file = substr($job_file, 0, index($job_file, " "));
}
# Strip the directory part, leaving just aJJJJJTTTTTTTT.
my $tmp = substr($job_file, rindex($job_file, "/") + 1);
# Keep only the 5 hex digits of the job id that follow the leading "a".
$tmp =~ s/^a([0-9a-f]{5})[0-9a-f]+/$1/;
my $job_id = hex($tmp);
if ($job_id > 0) {
    printf("My AT job id is %d.\n", $job_id);
}
# end of file.
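To try the script out, save it somewhere such as /usr/local/bin/at-job-id.pl (a path picked purely for illustration), make it executable, and schedule it:
chmod +x /usr/local/bin/at-job-id.pl
echo '/usr/local/bin/at-job-id.pl > /tmp/at-id.txt' | at now + 1 minute
The id written to /tmp/at-id.txt should match the one atq printed when the job was queued.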

Handle "race-condition" between 2 cron tasks. What is the best approach?

I have a cron task that runs periodically. This task depends on a condition being valid in order to complete its processing. In case it matters, this condition is just a SELECT for specific records in the database. If the condition is not satisfied (i.e. the SELECT does not return the result set expected), then the script exits immediately.
This is bad, as the condition will become valid soon enough (I don't know how soon, but it will become valid once another script runs).
So I would like somehow to make the script more robust. I thought of 2 solutions:
1. Put in a while loop and sleep until the condition is valid. This should work, but it has the downside that once the script is in the loop, it is out of control. So I thought that, in addition, after waking up it could check whether a specific file exists; if it does, it "understands" that the user wants to force-stop it.
2. Once the script figures out that the condition is not valid yet, it appends a second script to the crontab and stops. That second script continually polls for the condition and, once the condition is valid, restarts the first script to resume its processing. This seems to work, but I am not sure it is a good solution. E.g. perhaps programmatically modifying the crontab is a bad idea?
Anyway, I thought that perhaps this problem is common and has a standard solution, much better than the two I came up with. Does anyone have a better proposal? Which of my ideas would be best? I am not very experienced with cron tasks, so there could be things/problems I am overlooking.
Instead of programmatically appending to the crontab, you might want to consider using at to schedule the job to run again at some time in the future. If the script determines that it cannot do its job now, it can simply schedule itself to run again a few minutes (or a few hours, as the case may be) later by way of an at command.
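A minimal sketch of that idea, where check_condition stands in for whatever wraps the real SELECT test:
#!/bin/sh
# if the precondition isn't met yet, reschedule ourselves and bail out
if ! check_condition; then
  echo "$0" | at now + 5 minutes
  exit 0
fi
# ...normal processing continues here once the condition holds...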
Following up from our conversation in comments, you can take advantage of conditional execution in a cron entry. Supposing you want to branch based on time of day, you might use the output from date.
For example: this would always invoke the first command, then invoke the second command only if the clock hour is currently 11:
echo 'ScriptA running' ; [ $(date +%H) = 11 ] && echo 'ScriptB running'
More examples!
To check the return value from the first command:
echo 'ScriptA' ; [ $? -eq 0 ] && echo 'ScriptB'
To instead check the STDOUT, you can use a colon as a no-op and branch by capturing output with the same $() construct we used with date:
: ; [ "$(echo 'ScriptA')" = 'ScriptA' ] && echo 'ScriptB'
One downside of the last example: STDOUT from the first command won't be printed to the console. You could capture it into a variable which you echo out, or write it to a file with tee, if that's important.
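For instance, capturing the first command's output so it can both drive the branch and still be seen might look like this:
out=$(echo 'ScriptA')                # capture instead of printing directly
echo "$out" | tee -a /tmp/cron.log   # replay to the console and keep a copy
[ "$out" = 'ScriptA' ] && echo 'ScriptB'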

Collecting return code and stdout string from running SAS program in Linux KornShell script

Some developers and I are using KornShell (ksh) to run SAS programs in a Linux environment. The script invokes a SAS command line and I wish to collect the stdout from the SAS execution (a string defined and written by SAS) as well as the Linux return code (0/1).
My Code (collects stdout into envar, but return_code is always 0 because the envar assignment was successful):
envar=$(./sas XXXX/filename.sas -log $LOG_FILE)
return_code=$?
Is there a way to collect both the return code and the stdout without having to submit this command twice?
SAS does not write anything to STDOUT when it is run as a non-interactive process. The log file contains the record of statements executed and step statistics; "printed" output (such as from proc print) is written to a "listing" file. By default, that file will be created using the name of your source file appended with ".lst" (in your case, filename.lst).
You are providing a file to accept the log output using the -log system option. The related option to define the listing file is the -print option. Of course, if the program does not create any listing output, such an option isn't needed.
And as you've discovered, the value returned by $? is the execution return code from SAS. Any non-zero value will indicate some sort of error occurred during program execution.
If you want to influence the return code, you can use the ABORT data step statement in your SAS program. That will immediately halt the SAS program and set the return code to something meaningful to you. For example, suppose you want to terminate further processing if a particular PROC SQL step fails:
data _null_;
  rc = symgetn('SQLRC');
  put rc=;
  if rc > 0 then ABORT RETURN 10;
run;
This would set the return code to 10, and you could use your outer script to send an email to the appropriate person. Such a custom return code value must be greater than 6 and less than 976; other values are reserved for SAS (see the SAS documentation for the ABORT statement).
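On the shell side, acting on that custom code might look like the following sketch (the recipient address is a placeholder, and mailx availability varies by system):
./sas XXXX/filename.sas -log $LOG_FILE -print $LST_FILE
rc=$?
if [ $rc -eq 10 ]; then
  mailx -s "SAS job aborted with rc=10" someone@example.com < $LOG_FILE
fi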
