Is there a way to update parameters for each scheduled flow run? - prefect

I'm trying to find a way to update the parameters being used for each iteration of a scheduled flow.
For example, say I have a flow that's scheduled to run once every Monday for a year. For the first Monday the flow needs to run with a parameter of, say, 5. The next Monday it needs to be run with a parameter of 7, etc. The parameter needed for each week's run would change by a constant number.
Based on the docs, it looks like I could create a clock with the corresponding parameter for each run, but that seems excessive for flows that are scheduled for many runs.
Is there a simpler way of doing this in Prefect?

This sounds like you want to encapsulate some form of business logic into your Parameter calculation, which is a great use case for adding a new Task to your Flow:
import prefect
from prefect import task, Flow, Parameter

varying_param = Parameter("param", default=None)  # None as the default

@task(name="Varying Parameter")
def param_calculation(p):
    time = prefect.context.scheduled_start_time
    # if a value was provided, use it
    if p is not None:
        return p
    # otherwise, do some calculations (e.g. based on `time`) to decide
    # what value is appropriate and return it

with Flow("Minimal example") as flow:
    param_value = param_calculation(varying_param)
    # now use this value in downstream tasks
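For the specific pattern in the question (a value that changes by a constant amount each week), the body of that task could derive the value from the scheduled start time. A minimal sketch, where the first-run date, base value, and weekly increment are illustrative assumptions rather than anything from the original answer:

import datetime
import prefect
from prefect import task

FIRST_MONDAY = datetime.datetime(2021, 1, 4, tzinfo=datetime.timezone.utc)  # assumed first scheduled Monday
BASE_VALUE = 5    # value for the first run (assumption)
WEEKLY_STEP = 2   # constant change per week (assumption)

@task(name="Varying Parameter")
def param_calculation(p):
    # honour an explicitly provided value, as in the answer above
    if p is not None:
        return p
    # scheduled_start_time is placed in context by Prefect for each run
    scheduled = prefect.context.scheduled_start_time
    weeks_elapsed = int((scheduled - FIRST_MONDAY).total_seconds() // (7 * 24 * 3600))
    return BASE_VALUE + WEEKLY_STEP * weeks_elapsed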

Related

In Prefect, can a task value be cached for the duration of the flow run?

I have a flow in which I use a .map(); as such, I "loop" over multiple inputs. However, some of the inputs I need to generate only once, and I notice that my flow keeps re-generating them.
Is it possible to cache/checkpoint the result of a task (which is used in other tasks) for the duration of the run?
My understanding is that it's possible to cache for a specific amount of time like so:
import datetime
from prefect import task

@task(cache_for=datetime.timedelta(hours=1))
def some_task():
    ...
However, if the run takes less than the cache_for time, would the cache still hold for the next run? (If not, I guess caching with a long duration will work.)
Yes, there are a few different ways to achieve this type of caching:
Use a different cache validator
In addition to configuring your cache expiration (as you've done above), you can also choose to configure a cache validator. In your case, you might use either an input or parameter validator.
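For instance, against the Prefect 1.x API this could look like the sketch below, using the all_inputs validator so the cached result is only reused when the task's inputs are unchanged (treat the choice of validator as an assumption about your use case):

import datetime
from prefect import task
from prefect.engine.cache_validators import all_inputs

@task(
    cache_for=datetime.timedelta(hours=1),
    cache_validator=all_inputs,  # reuse the cache only for identical inputs
)
def some_task(x):
    ...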
Use a cache key
You can "share" a cache amongst tasks (both within a single Flow and across Flows) by specifying a cache_key on your tasks:
@task(cache_for=datetime.timedelta(hours=1), cache_key="my-key")
def some_task():
    ...
This will then look up your candidate Cached states by key instead of by task ID.
Use a file-based target
Lastly, and increasingly the most popular setup, is to use a file-based target for your task. You can then template this target string with things like flow_run_id and the inputs provided to your task. Whenever the task runs, it first checks for the existence of data at the specified target location, and if found, does not rerun. For example:
@task(target="{flow_run_id}/{scheduled_start_time:%Y-%d-%m}/results.bytes")
def some_task():
    ...
This template has the effect of re-using the data at the target if both of the following are true:
the task is rerun within the same day
the task is rerun as a part of the same flow run
You can then share this template across multiple tasks (or in your case, across all the mapped children).
Note that you can also provide inputs and parameters to your target template if you desire.
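As a sketch of that last point, assuming a task input named region (a hypothetical name, not from the original answer), the input's value can be interpolated into the target alongside context values:

from prefect import task

@task(target="{flow_run_id}/{region}/results.bytes")
def some_task(region):
    ...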

Dynamic NSubprocess - start additional subprocess *after* NSubprocess is started

We know that a subprocess should be started for n items when the task is activated; however, during the lifetime of the NSubprocess task, additional items could be found.
Let's say we have Order and OrderItem. When a customer submits an order and the flow gets to the fulfill_orderitems (NSubprocess - FulfillOrderItem flow) task, we start the subprocess for each order item.
However, during the lifetime of this fulfill_orderitems task the customer can contact us to add an additional item to the order. At this point we are forced to create a new Order process, which is a bit tedious; whereas if we could simply initiate a new FulfillOrderItem subprocess, things would remain much simpler.
Obviously, once fulfill_orderitems is done and the customer calls afterwards to add an additional item, we'd have to either roll back (impractical) or make a new Order process.
This is just a simple example, but generally speaking this behavior is quite useful. One could argue this should be core functionality of NSubprocess (i.e. being able to add an additional n+1 subprocess during the lifetime of the task). How would one go about doing this?
We need to handle 2 things (from what I can see):
A function that starts this additional process. From what I understand, it could be as simple as:

@Activation.status.transition(source=STATUS.STARTED)
def start_subprocess(self, item):
    self.flow_task.subflow_task.run(self.task, item)
The view where the item is submitted through a form (or, alternatively, submitting it through code directly). This bit I have trouble coming up with. It should be relatively simple, because it is very similar to what StartSubprocess does (but we need to call the aforementioned start_subprocess(item)?). However, I'm more interested in being able to call the method directly (e.g. through DRF).

How do you set an event to occur at a particular time?

I'm trying to create a program that will initiate a few other functions at a certain second tonight, for New Year's, but I can't find any answers that work in Python 3. Does anyone have an answer to this? The functions need to be executed at exactly 11:58:43.
You can use the datetime module in Python to get the present date and time in any required format. Then you can easily implement what you are looking for, provided your Python program keeps running and checking for this time to be reached.
This checking loop will check the time once per second if you put a 1-second delay in your checking function:
import time
time.sleep(1) # sleeps for 1 second.
I suggest a better method for your purpose:
On Windows: use the built-in Task Scheduler to run your Python program.
On Linux: take a look at 'cron' jobs to run any tasks on any specified time pattern. It's very easy to implement.
Thanks to the help of our friend above, Natesh bhat (who will keep his correct answer), I made this script:
import time

# Actual start time
timetogo = "23:58:43"
# Time for testing (uncomment)
# timetogo = "18:00:00"

currenttime = time.strftime("%H:%M:%S")
print(currenttime)

while True:
    currenttime = time.strftime("%H:%M:%S")
    if currenttime == timetogo:
        your_function_here()
        break
    time.sleep(1)  # check once per second instead of busy-waiting
Feel free to use it for yourself.
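If you'd rather not poll every second, an alternative (a sketch, not taken from the answers above) is to compute the number of seconds until the target time with the datetime module and sleep once:

import datetime
import time

def run_at(target_time, func):
    # compute the delay from now until the next occurrence of target_time
    now = datetime.datetime.now()
    target = datetime.datetime.combine(now.date(), target_time)
    if target <= now:
        target += datetime.timedelta(days=1)  # already past today, wait for tomorrow
    time.sleep((target - now).total_seconds())
    func()

# run the placeholder function from the script above at 23:58:43 local time
run_at(datetime.time(23, 58, 43), your_function_here)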

AnyLogic: Look ahead simulation

Is it possible to perform look ahead simulation in AnyLogic?
Specifically:
Simulate till time T.
Using 2 values of a variable, simulate for both values till T+t in parallel.
Evaluate the system state at T+t, choose the value of variable which leads to better performance.
Continue simulating from T using the selected value for the variable.
This is the basic functionality I am trying to implement. The variable values can be taken from a decision tree, which should not affect the implementation.
Please let me know if someone has done something like this.
Yes, it is possible with some Java code. You may:
Pause parent experiment, save snapshot at time T;
Create two new experiments from parent experiment;
Load snapshots in two new experiments;
Continue execution of both experiments till time T + t;
Send notification to parent experiment, compare the results, assign the best value and continue simulation.
Some steps can be done manually with UI controls or by code; some can only be done by code.

A number of jobs run daily on the mainframe; I need to fetch their start and end times automatically with some other JCL or REXX. Is it possible?

A number of jobs run daily on the mainframe; I need to fetch their start and end times automatically with some other JCL or REXX. Is it possible?
Yes, it's possible. As @SaggingRufus indicates, your job scheduler (CA JMR, Control-M, etc.) may provide this functionality - talk to your technical staff and ask.
It's possible to do this via the Rexx interface to SDSF and then scheduling a job to execute your Rexx code. The ID under which the Rexx code executes must have authority to look at the jobs for which you wish to retrieve information. There is also a Java interface to SDSF.
Another mechanism that may be available to you is SMF, but that's not going to be an easy road unless you've also got MXG.
Talk to your technical staff and explain what you want and why, they may have already solved this problem.
The standard way to do what you want is to use SMF 30 records. You can do this in REXX but it will be a little bit tricky if you don't understand the format of SMF records. Your site may have a tool like SAS which will make it trivial.
SMF 30 subtype 1 is written when a job (or any address space) starts.
SMF 30 subtype 5 is written when a job ends.
There are several other subtype records written such as job step termination deltas. The SMF 30s contain absolutely everything you could possibly want to know about a batch job. If you just wanted to know how much elapsed or CPU time a job has taken then just read the subtype 5 and look at the performance section.
If you really must use REXX then there are products that have REXX APIs that access SMF data such as IBM Transaction Analysis Workbench for z/OS. Disclaimer: I'm one of the developers of that product.
This solution will work if your site uses CA JMR
//SCANJMR JOB (11111),'JMRSCAN',
// CLASS=T,MSGCLASS=X,MSGLEVEL=(1,1)
//SCAN EXEC JMRSCAN
//JMRPRINT DD DSN=&&OUTDATASET,
// DISP=(NEW,CATLG,),
// UNIT=SYSDA,SPACE=(TRK,(20,20)),
// DCB=(LRECL=133,RECFM=FB,BLKSIZE=6118)
//JMRIN DD *
FUNCTION LIST=ALL JOBMASK=* SDATE=16/11/16
EDATE=16/11/16
/*
Then all you need to do is get a count of how many records are in this file.
If not, other job schedulers may provide similar functionality.
Another way could be to add simple steps to the jobs that run a REXX program that stores the date.
These steps needn't even be in the actual production job(s); you could schedule them as jobs with the production job as a successor and then as a predecessor.
REXX has built-in time and date functions; an example of their use is:
rc = audit('OACG22X Invoked by' userid() 'at' time() 'on' date()'.')
You could update the report data either by using a DISP of MOD or by reading it in and then rewriting it with the new record added; EXECIO is the REXX command you'd use.
When you've run the report, this would then clear the data or perhaps cycle a GDG (create an empty +1).
The following REXX is pretty close to what could be used (albeit quite inflated; you would basically be interested in the EXECIOs and the generation of out.1 using the current date and time - note this only maintains one record in the output):
/* REXX - CYCLE TAPES WITHIN A POOL FOR EMHA800W BATCH JOB */
/*--------------------------------------------------------------------*/
/* read in data from tape cycle dataset                               */
/*--------------------------------------------------------------------*/
"EXECIO 1 DISKR CYCTAPE (stem in. FINIS"
LastTape = SUBSTR(in.1,1,6)
If LastTape = "XXXXXX" Then NewTape = "SAP001"
Else Do
  TapeNum = SUBSTR(in.1,5,2)
  If DATATYPE(TapeNum,"N") Then Do
    NewNum = TapeNum + 1
    If NewNum > 4 Then NewNum = 1
    RetCde = NewNum
    NewNum = RIGHT(NewNum,2,"0")
    NewTape = "SAP0"||NewNum
  End
  Else RetCde = 100
End
out.1 = NewTape||" "||DATE("E")||" "||TIME("N")
"EXECIO 1 DISKW CYCTAPEO (stem out. FINIS"
Say "Return Code will be "||RetCde
Return RetCde
Running REXX via batch is detailed here: How can I run my Rexx program as a batch job?
I haven't used Zeke but from a very brief search it appears that you may be able to inspect the EMR (Event Master Record).
