Set job plan by Python script in Maximo

How can I set a job plan on a work order using Python automation scripting?
This is my code:
from psdi.server import MXServer
woSet = MXServer.getMXServer().getMboSet("WORKORDER", mbo.getUserInfo())
woSet.setWhere("WONUM='ODLSCAFF87'")
wo = woSet.getMbo()
mbo.setValue("JPNUM", "PDLSCAFF")
woSet.save()
But it doesn't work because it doesn't find WORKORDER.JPNUM.
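A hedged guess at the cause: the script fetches the work order into wo but then calls setValue on mbo (the Mbo the script was launched against), so the JPNUM lookup happens on the wrong object. A minimal sketch of the corrected flow, assuming the script runs with a user context that can modify the work order:

from psdi.server import MXServer

woSet = MXServer.getMXServer().getMboSet("WORKORDER", mbo.getUserInfo())
woSet.setWhere("WONUM='ODLSCAFF87'")
woSet.reset()                           # apply the where clause
wo = woSet.getMbo(0)                    # first matching work order
if wo is not None:
    wo.setValue("JPNUM", "PDLSCAFF")    # set the job plan on the fetched work order, not on mbo
woSet.save()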

Related

Run Databricks job from notebook

I want to know whether it is possible to run a Databricks job from a notebook using code, and how to do it.
I have a job with multiple tasks and many contributors, and a job created to execute it all. Now we want to run the job from a notebook to test new features without creating a new task in the job, and also to run the job multiple times in a loop, for example:
for i in [1,2,3]:
    run job with parameter i
Regards
What you need to do is the following:
Install the databricksapi package: %pip install databricksapi==1.8.1
Create your job and return an output. You can do that by exiting the notebook like this:
import json
dbutils.notebook.exit(json.dumps({"result": f"{_result}"}))
If you want to pass a DataFrame, you have to pass it as a JSON dump too; there is official Databricks documentation about that. A sketch is shown below.
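A minimal sketch of returning a small DataFrame this way, assuming a Spark DataFrame df that fits in driver memory (df is hypothetical here):

import json
# convert the DataFrame to plain Python objects, then serialize it for notebook.exit
dbutils.notebook.exit(json.dumps({"result": df.toPandas().to_dict(orient="records")}))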
Get the job id; you will need it later. You can find it in the job details page in Databricks.
In the executor notebook you can use the following code:
import json
import time
from databricksapi import Jobs  # from the databricksapi package installed above

def run_ks_job_and_return_output(params):
    context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
    # pull the workspace URL and API token out of the notebook context
    url = context['extraContext']['api_url']
    token = context['extraContext']['api_token']
    jobs_instance = Jobs.Jobs(url, token)  # initialize a Jobs instance
    runs_job_id = jobs_instance.runJob(****************, 'notebook', params)  # **** is the job id
    run_is_not_completed = True
    while run_is_not_completed:
        current_run = [run for run in jobs_instance.runsList('completed')['runs']
                       if run['run_id'] == runs_job_id['run_id']
                       and run['number_in_job'] == runs_job_id['number_in_job']]
        if len(current_run) == 0:
            time.sleep(30)  # poll until the run shows up in the completed list
        else:
            run_is_not_completed = False
            current_run = current_run[0]
            print(f"Result state: {current_run['state']['result_state']}, "
                  f"you can check the resulting output at the following link: {current_run['run_page_url']}")
            note_output = jobs_instance.runsGetOutput(runs_job_id['run_id'])['notebook_output']
            return note_output
run_ks_job_and_return_output({'parm1': 'george', 'variable': 'values1'})
If you want to run the job many times in parallel you can do the following. (First be sure that you have increased the maximum concurrent runs in the job settings.)
from multiprocessing.pool import ThreadPool

pool = ThreadPool(1000)
results = pool.map(
    lambda j: run_ks_job_and_return_output({'table': 'george',
                                            'variable': 'values1',
                                            'j': j}),
    [str(x) for x in range(2, len(snapshots_list))])  # snapshots_list comes from the author's own context
It is also possible to save the whole HTML output, but maybe you are not interested in that; in any case, I will answer that in another post on Stack Overflow.
Hope it helps.
You can use the following steps:
Note-01:
dbutils.widgets.text("foo", "fooDefault", "fooEmptyLabel")
dbutils.widgets.text("foo2", "foo2Default", "foo2EmptyLabel")
result = dbutils.widgets.get("foo") + "-" + dbutils.widgets.get("foo2")

def display():
    print("Function Display: " + result)

dbutils.notebook.exit(result)
Note-02:
thislist = ["apple", "banana", "cherry"]
for x in thislist:
    dbutils.notebook.run("Note-01 path", 60, {"foo": x, "foo2": 'Azure'})
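A small usage sketch, since dbutils.notebook.run returns whatever string the child notebook passes to dbutils.notebook.exit (the path string is the same placeholder as above):

thislist = ["apple", "banana", "cherry"]
for x in thislist:
    # capture the child notebook's exit value for each run
    out = dbutils.notebook.run("Note-01 path", 60, {"foo": x, "foo2": 'Azure'})
    print(out)  # e.g. "apple-Azure"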

Can we load environment variables from ~/.bash_profile without having to source ~/.bash_profile?

I have a Python script named scraper.py that scrapes information from the web on certain days. I automated it to run on those days with a cron job. Every time the script runs, I want to send a notification to Slack to confirm that the scrape was successful, so I created a separate script, helper_functions.py, that sends messages to Slack. Because it uses an API key that I can't share in the script (I push it to GitHub), I stored the key in ~/.bash_profile. The script runs perfectly fine if I do source ~/.bash_profile from the terminal, but when I close my session, the code breaks. Is there a way to make it work without sourcing ~/.bash_profile?
Here are the scripts.
scraper.py
import datetime
import scrapy  # needed for scrapy.Spider below
import helper_functions as hf

hf.slack_msg("Start scrape")

class IndexSpider(scrapy.Spider):
    name = "index"
    start_urls = [
        "https://finance.yahoo.com"
    ]

    def parse(self, response):
        index = response.css("span.Trsdu\(0\.3s\)::text").getall()
        yield {
            'datetime': datetime.datetime.now().strftime("%Y-%m-%d %X"),
            's&p_500': index[0],
            's&p_500_delta': index[1],
            's&p_500_delta(%)': index[2],
            'dow_30': index[3],
            'dow_30_delta': index[4],
            'dow_30_delta(%)': index[5],
            'nasdaq': index[6],
            'nasdaq_delta': index[7],
            'nasdaq_delta(%)': index[8],
        }

hf.slack_msg("End scrape")
helper_functions.py
import json
import os
import requests  # needed for requests.post below

def slack_msg(msg):
    data = {
        "text": msg
    }
    webhook = os.environ.get("SLACK_API_KEY")
    requests.post(webhook, json.dumps(data))
An active line in a crontab is either an environment setting or a cron command. An environment setting is of the form
name = value
so you can add the variable directly in the crontab:
SLACK_API_KEY='theapikey'
* * * * * scraper.py
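An alternative sketch, if you would rather keep the key in ~/.bash_profile: source the profile inline as part of the cron command itself (the interpreter and script paths here are hypothetical):

* * * * * . ~/.bash_profile; /usr/bin/python3 /path/to/scraper.py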
Here is one idea:
put the token in its own file, e.g. ~/.secrets/slack_api_key.txt
modify ~/.bash_profile to do export SLACK_API_KEY=$(cat ~/.secrets/slack_api_key.txt)
modify helper_functions.py so it first looks for the environment variable and falls back to reading the file when the variable is not defined, as in the sketch below
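A minimal sketch of that fallback inside helper_functions.py, keeping the file path from the steps above:

import os

def get_slack_webhook():
    # prefer the environment variable; fall back to the secrets file
    return os.environ.get("SLACK_API_KEY") or \
        open(os.path.expanduser("~/.secrets/slack_api_key.txt")).read().strip()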

Restarting nested notebook runs in Databricks Job Workflow

I have a scheduled Databricks job which runs 5 different notebooks sequentially, and each notebook contains, let's say, 5 different command cells. When the job fails in notebook 3, cmd cell 3, I can recover from the failure properly, but I'm not sure whether there is any way to restart the scheduled job from notebook 3, cell 4, or even from the beginning of notebook 4 if I've manually completed the remaining cmds in notebook 3. Here's an example of one of my jobs:
%python
import sys

try:
    dbutils.notebook.run("/01. SMETS1Mig/" + dbutils.widgets.get("env_parent_directory") + "/02 Processing Curated Staging/02 Build - Parameterised/Load CS Feedback Firmware STG", 6000, {
        "env_ingest_db": dbutils.widgets.get("env_ingest_db"),
        "env_stg_db": dbutils.widgets.get("env_stg_db"),
        "env_tech_db": dbutils.widgets.get("env_tech_db")
    })
except Exception as error:
    sys.exit(f'Failure in Load CS Feedback Firmware STG ({error})')

try:
    dbutils.notebook.run("/01. SMETS1Mig/" + dbutils.widgets.get("env_parent_directory") + "/03 Processing Curated Technical/02 Build - Parameterised/Load CS Feedback Firmware TECH", 6000, {
        "env_ingest_db": dbutils.widgets.get("env_ingest_db"),
        "env_stg_db": dbutils.widgets.get("env_stg_db"),
        "env_tech_db": dbutils.widgets.get("env_tech_db")
    })
except Exception as error:
    sys.exit(f'Failure in Load CS Feedback Firmware TECH ({error})')

try:
    dbutils.notebook.run("/01. SMETS1Mig/" + dbutils.widgets.get("env_parent_directory") + "/02 Processing Curated Staging/02 Build - Parameterised/STA_6S - CS Firmware Success", 6000, {
        "env_ingest_db": dbutils.widgets.get("env_ingest_db"),
        "env_stg_db": dbutils.widgets.get("env_stg_db"),
        "env_tech_db": dbutils.widgets.get("env_tech_db")
    })
except Exception as error:
    sys.exit(f'Failure in STA_6S - CS Firmware Success ({error})')
You should not use sys.exit, because it quits the Python interpreter; just let the exception bubble up if it happens.
You must change the architecture of your application and add some sort of idempotency to the ETL (online course), which would mean propagating a date to the child notebooks or something like that.
Run %pip install retry at the beginning of the notebook to install the retry package, then:

from retry import retry

@retry(Exception, tries=3)
def idempotent_run(notebook, timeout=6000, **args):
    # Approximate code to be used for inspiration; adjust it to your needs.
    # It is not guaranteed to work for your case.
    did_it_run_before = spark.sql(f"SELECT COUNT(*) FROM meta.state WHERE notebook = '{notebook}' AND args = '{sorted(args.items())}'").first()[0]
    if did_it_run_before > 0:
        return
    result = dbutils.notebook.run(notebook, timeout, args)
    spark.sql(f"INSERT INTO meta.state SELECT '{notebook}' AS notebook, '{sorted(args.items())}' AS args")
    return result
pd = dbutils.widgets.get("env_parent_directory")

# call this within the respective cells
idempotent_run(
    f"/01. SMETS1Mig/{pd}/03 Processing Curated Technical/02 Build - Parameterised/Load CS Feedback Firmware TECH",
    # set this to something that defines the frequency of the job
    this_date='2020-09-28',
    env_ingest_db=dbutils.widgets.get("env_ingest_db"),
    env_stg_db=dbutils.widgets.get("env_stg_db"),
    env_tech_db=dbutils.widgets.get("env_tech_db"))
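The sketch above assumes a meta.state table already exists; a hypothetical one-time setup, using the same two columns the queries above read and write:

spark.sql("CREATE TABLE IF NOT EXISTS meta.state (notebook STRING, args STRING)")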

How to properly invoke Python 3 script from SPSS syntax window using SCRIPT command (+ additional problems during runtime)

I would like to run two Python 3 scripts from the SPSS syntax window. It is possible to do this using a BEGIN PROGRAM-END PROGRAM block or the SCRIPT command. This time I need to find a solution using the second command.
Simplified code:
*** MACROS.
define export_tabs (!positional !tokens (1))
output modify
/select logs headings texts warnings pagetitles outlineheaders notes
/deleteobject delete = yes.
OUTPUT EXPORT
/CONTENTS EXPORT = visible LAYERS = printsetting MODELVIEWS = printsetting
/XLSX DOCUMENTFILE = "doc.xlsx"
OPERATION = createsheet
sheet = !quote(!unquote(!1))
LOCATION = lastcolumn NOTESCAPTIONS = no
!enddefine.
define matrix_tab (!positional !charend('/')
/!positional !charend('/')
/!positional !charend('/')
/!positional !charend('/')
/stat = !tokens (1))
!do !i !in (!3)
ctables
/mrsets countduplicates = no
/vlabels variables = !concat(!1,_,!2,_,!i) display = label
/table !concat(!1,_,!2,_,!i)
[rowpct.responses.count !concat(!unquote(!stat),"40.0"), totals[count f40.0]]
/slabels position = column visible = no
/clabels rowlabels = opposite
/categories variables = !concat(!1,_,!2,_,!i) order = a key = value
empty = include total = yes label = "VALID COUNT" position = after
/titles title = !upcase(!4).
!doend
!enddefine.
*** REPORT.
* Sheet 1.
output close all.
matrix_tab $Q1 / 1 / 1 2 / "QUESTION 1" / stat="pct".
script "C:\path\script 1.py".
script "C:\path\script 2.py".
export_tabs "Q1".
* Sheet 2.
output close all.
matrix_tab $Q2 / 2 / 3 4 / "QUESTION 2" / stat="pct".
script "C:\path\script 1.py".
script "C:\path\script 2.py".
export_tabs "Q2".
When I run the block for the first sheet, everything works fine. However, when I run the block for the second sheet, SPSS doesn't execute the Python scripts and jumps straight to the export_tabs macro (a synchronization problem?). I thought the problem was possibly in the way I executed the SCRIPT command, so I tried this:
script "C:\path\script 1.py" pythonversion = 3.
script "C:\path\script 2.py" pythonversion = 3.
but in effect SPSS, even though the syntax window coloured these parts of the syntax, returned this error message:
>Error # 3251 in column 152. Text: pythonversion
>The SCRIPT command contains unrecognized text following the the file
>specification. The optional parameter must be a quoted string enclosed in
>parentheses.
>Execution of this command stops.
Has anyone had this problem and/or an idea why it happens?
NOTE: Both Python scripts run smoothly from the Python 3.4.3 shell installed with my version of SPSS, so I don't think the core of the problem lies in those scripts.
This seems to be a defect in the way this keyword was implemented. I have been able to replicate it and have logged a defect with IBM SPSS Statistics Development.
In this case, the order matters. Rather than this:
script "C:\path\script 2.py" pythonversion = 3.
Try instead:
script pythonversion = 3 "C:\path\script 2.py".

Scheduling a task at multiple timings (with different parameters) using celery beat, but the task runs only once (with random parameters)

What I am trying to achieve
Write a scheduler that uses a database to schedule similar tasks at different timings.
For this I am using celery beat; the code snippet below gives an idea:
try:
    reader = MongoReader()
except:
    raise
try:
    tasks = reader.get_scheduled_tasks()
except:
    raise

celerybeat_schedule = dict()
for task in tasks:
    celerybeat_schedule[task["task_id"]] = dict()
    celerybeat_schedule[task["task_id"]]["task"] = task["task_name"]
    celerybeat_schedule[task["task_id"]]["args"] = (task,)
    celerybeat_schedule[task["task_id"]]["schedule"] = get_task_schedule(task)

app.conf.update(BROKER_URL=rabbit_mq_endpoint,
                CELERY_TASK_SERIALIZER='json',
                CELERY_ACCEPT_CONTENT=['json'],
                CELERYBEAT_SCHEDULE=celerybeat_schedule)
So these are three steps:
- reading all tasks from the datastore
- creating the celery beat schedule dictionary, populated with each task's properties: task_name (the method that will run), parameters (the data to pass to the method), and schedule (when to run); a sketch of one populated entry follows this list
- updating the celery configuration with this dictionary
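A hypothetical sketch of what one populated entry might look like, assuming get_task_schedule returns a timedelta (the key and values here are made up for illustration):

from datetime import timedelta

celerybeat_schedule["task_42"] = {
    "task": "regular_print",                                # task_name from the datastore
    "args": ({"task_id": "task_42", "parameter": "Hi"},),   # the whole task dict as the single argument
    "schedule": timedelta(minutes=5),                       # produced by get_task_schedule(task)
}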
Expected scenario
Given that all entries run the same celery task (which just prints), have the same schedule of every 5 minutes, and have different parameters specifying what to print, let's say the db has:
task name       parameter   schedule
regular_print   Hi          {"minutes": 5}
regular_print   Hello       {"minutes": 5}
regular_print   Bye         {"minutes": 5}
I expect all three to be printed every 5 minutes.
What happens
Only one of Hi, Hello, Bye prints (possibly a random one; certainly not all three in sequence).
Please help.
Thanks a lot in advance :)
I was able to resolve this using version 4 of celery. Below is a sample similar to what worked for me; you can also find it in the celery documentation for version 4.
import os
from datetime import timedelta
from celery import Celery

# taking the address and user/password from the environment (you can also use direct values)
ex_host_queue = os.environ["EX_HOST_QUEUE"]
ex_port_queue = os.environ["EX_PORT_QUEUE"]
ex_user_queue = os.environ["EX_USERID_QUEUE"]
ex_pass_queue = os.environ["EX_PASSWORD_QUEUE"]
broker = "amqp://" + ex_user_queue + ":" + ex_pass_queue + "@" + ex_host_queue + ":" + ex_port_queue + "//"

# celery initialization
app = Celery(__name__, backend=broker, broker=broker)
app.conf.task_default_queue = 'scheduler_queue'
app.conf.update(
    task_serializer='json',
    accept_content=['json'],  # ignore other content
    result_serializer='json'
)

task = {"task_id": 1, "a": 10, "b": 20}

# method to update the scheduler
def add_scheduled_task(task):
    print("scheduling task")
    del task["_id"]
    print("adding task_id")
    name = task["task_name"]
    app.add_periodic_task(timedelta(minutes=1), scheduler_task.s(task), name=task["task_id"])

@app.task(name='scheduler_task')
def scheduler_task(data):
    print(str(data["a"] + data["b"]))
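For reference, a hedged sketch of how celery 4 documents wiring up periodic tasks at startup, via the on_after_configure signal (the task dicts here are made-up examples):

from datetime import timedelta

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # register one periodic run per task definition, each with its own parameters
    for t in [{"task_id": 1, "a": 10, "b": 20},
              {"task_id": 2, "a": 1, "b": 2}]:
        sender.add_periodic_task(timedelta(minutes=5), scheduler_task.s(t), name=str(t["task_id"]))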
