I set up a periodic task using celery beat. The task runs and I can see the result in the console.
I want to have a Python script that collects the results returned by the tasks.
I could do it like this:
#client.py
from cfg_celery import app
task_id = '337fef7e-68a6-47b3-a16f-1015be50b0bc'
try:
    x = app.AsyncResult(task_id)
    print(x.get())
except Exception:
    print('some error')
Anyway, as you can see, for this test I had to copy the task_id printed to the celery beat console and hardcode it in my script. Obviously this is not going to work in production.
I hacked around it by setting a fixed task_id in the celery config file:
#cfg_celery.py
from celery import Celery
app = Celery('celery_config',
             broker='redis://localhost:6379/0',
             include=['taskos'],
             backend='redis'
             )
app.conf.beat_schedule = {
    'something': {
        'task': 'tasks.add',
        'schedule': 10.0,
        'args': (16, 54),
        'options': {'task_id': "my_custom_id"},
    }
}
This way I can read it like this:
#client.py
from cfg_celery import app
task_id = 'my_custom_id'
try:
    x = app.AsyncResult(task_id)
    print(x.get())
except Exception:
    print('some error')
The problem with this approach is that I lose the previous results (those produced before client.py is called), since each run stores its result under the same task_id and overwrites the last one.
Is there some way I can read a list of the task_id's in the celery backend?
If I have more than one periodic task, can I get a list of task_id's for each periodic task?
Can I use app.tasks.keys() to accomplish this, and if so, how?
PS: I'm not a native English speaker, and I'm new to celery, so please be kind if I use some terminology incorrectly.
OK. I am not sure if nobody answered this because it is difficult or because my question is too dumb.
Anyway, what I wanted to do was to get the results of my celery-beat tasks from another Python process.
Within the same process there was no problem: I could access the task id and everything was easy from there on. But from another process I didn't find a way to retrieve a list of the finished tasks.
I tried python-RQ (it is nice), but when I saw that I couldn't do that with RQ either, I came to understand that I had to make use of redis storage capabilities manually. So I got what I wanted by doing this:
- Use 'bind=True' to be able to introspect from within the task function.
- Once I have the result of the function, I write it to a list in redis (with a small trick to limit the size of this list).
- Now I can, from an independent process, connect to the same redis server and retrieve the results stored in that list.
My files ended up being like this:
cfg_celery.py : here I define the way the tasks are going to be called.
#cfg_celery.py
from celery import Celery
appo = Celery('celery_config',
              broker='redis://localhost:6379/0',
              include=['taskos'],
              backend='redis'
              )
'''
urlea is decorated as a periodic_task, so there is no need to register it here.
But since add needs args, I register it manually to pass them in.
'''
appo.conf.beat_schedule = {
    'q_loco': {
        'task': 'taskos.add',
        'schedule': 10.0,
        'args': (16, 54),
        # 'options' : {'task_id':"lcura"},
    }
}
taskos.py : these are the tasks.
#taskos.py
from cfg_celery import appo
from celery.decorators import periodic_task
from redis import Redis
from datetime import timedelta
import requests, time

rds = Redis()

@appo.task(bind=True)
def add(self, a, b):
    # result of the operation. very dummy.
    result = a + b
    # store it in redis (note: recent redis-py versions reject raw tuples,
    # so you may need to serialize r, e.g. with json.dumps)
    r = (self.request.id, time.time(), result)
    rds.lpush('my_results', r)
    # for this test I want at most 5 results stored in redis
    length = rds.llen('my_results')
    while length > 5:
        x = rds.rpop('my_results')
        print('popping out', x)
        length = rds.llen('my_results')
    time.sleep(1)
    return a + b
@periodic_task(run_every=20)
def urlea(url='https://www.fullstackpython.com/'):
    inicio = time.time()
    R = dict()
    try:
        resp = requests.get(url)
        R['vato'] = url + " = " + str(resp.status_code * 10)
        R['num palabras'] = len(resp.text.split())
    except Exception:
        R['vato'] = None
        R['num palabras'] = 0
    print('u {} : {}'.format(url, time.time() - inicio))
    time.sleep(0.8)  # trick so the difference shows more clearly
    return R
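As an aside, redis can also cap a list in a single call with LTRIM instead of the rpop loop in add above; a minimal sketch against the same 'my_results' list:
from redis import Redis

rds = Redis()
rds.lpush('my_results', 'some_serialized_result')
# LTRIM keeps only the given index range, so this retains the 5 newest entries
rds.ltrim('my_results', 0, 4)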
consumer.py : the independent process that can get the results.
#consumer.py
from redis import Redis
nombre_lista = 'my_results'  # the list name
rds = Redis()
tamaño = rds.llen(nombre_lista)  # current length of the list
ultimos_resultados = list()  # the latest results
for i in range(tamaño):
    ultimos_resultados.append(rds.rpop(nombre_lista))
print(ultimos_resultados)
I am relatively new to programming and I hope that this answer can help noobs like me. If I got something wrong feel free to make the corrections as necessary.
I want to know if it is possible to run a Databricks job from a notebook using code, and how to do it
I have a job with multiple tasks and many contributors, and we have a job created to execute it all. Now we want to run the job from a notebook to test new features without creating a new task in the job, and also to run the job multiple times in a loop, for example:
for i in [1,2,3]:
    run job with parameter i
Regards
What you need to do is the following:
Install the databricksapi: %pip install databricksapi==1.8.1
Create your job and return an output. You can do that by exiting the notebook like this:
import json
dbutils.notebook.exit(json.dumps({"result": f"{_result}"}))
If you want to pass a DataFrame, you have to pass it as a JSON dump too; there is official documentation about that from Databricks, check it out.
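For example, a minimal sketch of that idea (assuming a small Spark DataFrame named df, and that the caller will json.loads the exit string):
import json

# collect the (small) DataFrame into plain dicts and serialize them,
# so the parent can reconstruct the rows from the notebook's exit value
rows = [row.asDict() for row in df.collect()]
dbutils.notebook.exit(json.dumps({"result": rows}))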
Get the job id; you will need it later. You can get it from the job's details in Databricks.
In the executor notebook you can use the following code.
import json
import time
from databricksapi import Jobs  # assuming the package exposes its Jobs module this way

def run_ks_job_and_return_output(params):
    context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
    # context
    url = context['extraContext']['api_url']
    token = context['extraContext']['api_token']
    jobs_instance = Jobs.Jobs(url, token)  # initialize a jobs_instance
    runs_job_id = jobs_instance.runJob(****************, 'notebook',
                                       params)  # **** is the job id
    run_is_not_completed = True
    while run_is_not_completed:
        current_run = [run for run in jobs_instance.runsList('completed')['runs']
                       if run['run_id'] == runs_job_id['run_id']
                       and run['number_in_job'] == runs_job_id['number_in_job']]
        if len(current_run) == 0:
            time.sleep(30)
        else:
            run_is_not_completed = False
            current_run = current_run[0]
    print(f"Result state: {current_run['state']['result_state']}, you can check the resulting output at the following link: {current_run['run_page_url']}")
    note_output = jobs_instance.runsGetOutput(runs_job_id['run_id'])['notebook_output']
    return note_output

run_ks_job_and_return_output({'parm1': 'george',
                              'variable': "values1"})
If you want to run the job many times in parallel you can do the following. (First be sure that you have increased the max concurrent runs in the job settings.)
from multiprocessing.pool import ThreadPool

pool = ThreadPool(1000)
# snapshots_list here comes from the surrounding notebook
results = pool.map(lambda j: run_ks_job_and_return_output({'table': 'george',
                                                           'variable': "values1",
                                                           'j': j}),
                   [str(x) for x in range(2, len(snapshots_list))])
There is also the possibility to save the whole html output, but maybe you are not interested in that. In any case I will answer that in another post on StackOverflow.
Hope it helps.
You can use the following steps:
Note-01:
dbutils.widgets.text("foo", "fooDefault", "fooEmptyLabel")
dbutils.widgets.text("foo2", "foo2Default", "foo2EmptyLabel")
result = dbutils.widgets.get("foo")+"-"+dbutils.widgets.get("foo2")
def display():
print("Function Display: "+result)
dbutils.notebook.exit(result)
Note-02:
thislist = ["apple", "banana", "cherry"]
for x in thislist:
dbutils.notebook.run("Note-01 path", 60, {"foo": x,"foo2":'Azure'})
I am a newbie. In my current project, when the front end decides to start the modbus service, I create a process for the modbus service. Values are obtained in the parent process and passed along via ZeroMQ PUB/SUB, and I now want to update the values of the modbus registers in the modbus service process.
I tried the method from pymodbus's updating_server.py example, using twisted.internet.task.LoopingCall() to update the register values, but this makes it impossible for me to connect to my server with the client, and I don't know why.
When I use LoopingCall() to establish the server, this is the log when the client connects.
Then I tried to put both the updating and StartTcpServer in the async loop, but the update only ran the first time after startup and was never entered again.
Currently I'm using LoopingCall() to handle the updates, but I don't think this is a good approach.
This is the code where I initialize the PUB socket and all the tag readers.
from loop import cycle
import asyncio
from multiprocessing import Process
from persistence import models as pmodels
from persistence import service as pservice
from persistence import basic as pbasic
import zmq
from zmq.asyncio import Context
from common import logging
from server.modbustcp import i3ot_tcp as sertcp
import common.config as cfg
import communication.admin as ca
import json
import os
import signal
from datetime import datetime
from server.opcuaserver import i3ot_opc as seropc
async def main():
    future = []
    task = []
    global readers, readers_old, task_flag
    logger.debug("connecting to database and create table.")
    pmodels.connect_create()
    logger.debug("init read all address to create loop task.")
    cycle.init_readers(readers)
    ctx = Context()
    publisher = ctx.socket(zmq.PUB)
    logger.debug("init publish [%s].", addrs)
    publisher.bind(addrs)
    readers_old = readers.copy()
    for reader in readers:
        task.append(asyncio.ensure_future(
            cycle.run_readers(readers[reader], publisher)))
    if not len(task):
        task_flag = True
    logger.debug("task length [%s - %s].", len(task), task)
    opcua_server = LocalServer(seropc.opc_server, "opcua")
    future = [
        start_get_all_address(),
        start_api(),
        create_address_loop(publisher, task),
        modbus_server(),
        opcua_server.run()
    ]
    logger.debug("run loop...")
    await asyncio.gather(*future)

asyncio.run(main(), debug=False)
This is what gets the device tag values and publishes them.
async def run_readers(reader, publisher):
    while True:
        await reader.run(publisher)

class DataReader:
    def __init__(self, freq, clients):
        self._addresses = []
        self._frequency = freq
        self._stop_signal = False
        self._clients = clients
        self.signature = sign_data_reader(self._addresses)

    async def run(self, publisher):
        while not self._stop_signal:
            for addr in self._addresses:
                await addr.read()
                data = {
                    "type": "value",
                    "data": addr._final_value
                }
                publisher.send_pyobj(data)
                if addr._status:
                    if addr.alarm_log:
                        return_alarm_log = pbasic.get_log_by_time(addr.alarm_log['date'])
                        if return_alarm_log:
                            data = {
                                "type": "alarm",
                                "data": return_alarm_log
                            }
                            publisher.send_pyobj(data)
                    self.data_send(addr)
            logger.debug("run send data")
            await asyncio.sleep(int(self._frequency))

    def stop(self):
        self._stop_signal = True
These are the modbus server imports:
from common import logging
from pymodbus.server.asynchronous import StartTcpServer
from pymodbus.device import ModbusDeviceIdentification
from pymodbus.datastore import ModbusSequentialDataBlock
from pymodbus.datastore import ModbusSlaveContext, ModbusServerContext
from persistence import service as pservice
from persistence import basic as pbasic
import zmq
import common.config as cfg
import struct
import os
import signal
from datetime import datetime
from twisted.internet.task import LoopingCall
def updating_writer(a):
    logger.info("in updates of modbus tcp server.")
    context = a[0]
    # while True:
    if check_pid(os.getppid()) is False:
        os.kill(os.getpid(), signal.SIGKILL)
    url = ("ipc://{}".format(cfg.get('ipc', 'pubsub')))
    logger.debug("connecting to [%s].", url)
    ctx = zmq.Context()
    subscriber = ctx.socket(zmq.SUB)
    subscriber.connect(url)
    subscriber.setsockopt(zmq.SUBSCRIBE, b"")
    slave_id = 0x00
    msg = subscriber.recv_pyobj()
    logger.debug("updates.")
    if msg['data']['data_type'] in modbus_server_type and msg['type'] == 'value':
        addr = pservice.get_mbaddress_to_write_value(msg['data']['id'])
        if addr:
            logger.debug(
                "local address and length [%s - %s].",
                addr['local_address'], addr['length'])
            values = get_value_by_type(msg['data']['data_type'], msg['data']['final'])
            logger.debug("modbus server updates values [%s].", values)
            register = get_register(addr['type'])
            logger.debug(
                "register [%d] local address [%d] and value [%s].",
                register, addr['local_address'], values)
            context[slave_id].setValues(register, addr['local_address'], values)
    # time.sleep(1)
def tcp_server(pid):
    logger.info("Get server configure and device's tags.")
    st = datetime.now()
    data = get_servie_and_all_tags()
    if data:
        logger.debug("register address space.")
        register_address_space(data)
    else:
        logger.debug("no data to create address space.")
    length = register_number()
    store = ModbusSlaveContext(
        di=ModbusSequentialDataBlock(0, [0] * length),
        co=ModbusSequentialDataBlock(0, [0] * length),
        hr=ModbusSequentialDataBlock(0, [0] * length),
        ir=ModbusSequentialDataBlock(0, [0] * length)
    )
    context = ModbusServerContext(slaves=store, single=True)
    identity = ModbusDeviceIdentification()
    identity.VendorName = 'pymodbus'
    identity.ProductCode = 'PM'
    identity.VendorUrl = 'http://github.com/bashwork/pymodbus/'
    identity.ProductName = 'pymodbus Server'
    identity.ModelName = 'pymodbus Server'
    identity.MajorMinorRevision = '2.2.0'
    # ----------------------------------------------------------------------- #
    # set loop call and run server
    # ----------------------------------------------------------------------- #
    try:
        logger.debug("thread start.")
        loop = LoopingCall(updating_writer, (context, ))
        loop.start(1, now=False)
        # process = Process(target=updating_writer, args=(context, os.getpid(),))
        # process.start()
        address = (data['tcp_ip'], int(data['tcp_port']))
        nt = datetime.now() - st
        logger.info("modbus tcp server begin has used [%s] s.", nt.seconds)
        pservice.write_server_status_by_type('modbus', 'running')
        StartTcpServer(context, identity=identity, address=address)
    except Exception as e:
        logger.debug("modbus server start error [%s].", e)
        pservice.write_server_status_by_type('modbus', 'closed')
This is the code I created for the modbus process.
def process_stop(p_to_stop):
    global ptcp_flag
    pid = p_to_stop.pid
    os.kill(pid, signal.SIGKILL)
    logger.debug("process has closed.")
    ptcp_flag = False

def ptcp_create():
    global ptcp_flag
    pid = os.getpid()
    logger.debug("sentry pid [%s].", pid)
    ptcp = Process(target=sertcp.tcp_server, args=(pid,))
    ptcp_flag = True
    return ptcp

async def modbus_server():
    logger.debug("get modbus server's status.")
    global ptcp_flag
    name = 'modbus'
    while True:
        ser = pservice.get_server_status_by_name(name)
        if ser['enabled']:
            if ser['tcp_status'] == 'closed' or ser['tcp_status'] == 'running':
                tags = pbasic.get_tag_by_name(name)
                if len(tags):
                    if ptcp_flag is False:
                        logger.debug("[%s] status [%s].", ser['tcp_name'], ptcp_flag)
                        ptcp = ptcp_create()
                        ptcp.start()
                    else:
                        logger.debug("modbus server is running ...")
                else:
                    logger.debug("no address to create [%s] server.", ser['tcp_name'])
                    pservice.write_server_status_by_type(name, "closed")
            else:
                logger.debug("[%s] server is running ...", name)
        else:
            if ptcp_flag:
                process_stop(ptcp)
                logger.debug("[%s] has been closed.", ser['tcp_name'])
                pservice.write_server_status_by_type(name, "closed")
            logger.debug("[%s] server not allowed to run.", name)
        await asyncio.sleep(5)
This is the command that Docker runs.
/usr/bin/docker run --privileged --network host --name scout-sentry -v /etc/scout.cfg:/etc/scout.cfg -v /var/run:/var/run -v /sys:/sys -v /dev/mem:/dev/mem -v /var/lib/scout:/data --rm shulian/scout-sentry
This is the Docker configuration file /etc/scout.cfg.
[scout]
mode=product
[logging]
level=DEBUG
[db]
path=/data
[ipc]
cs=/var/run/scout-cs.sock
pubsub=/var/run/pubsub.sock
I want the modbus value update function to be triggered whenever a message comes in from ZeroMQ, and to have the registers update correctly.
Let's start from the inside out.
Q : ...this will make it impossible for me to connect to my server with the client. I don't know why?
ZeroMQ is a smart broker-less messaging / signaling middleware or better a platform for smart-messaging. In case one feels not so much familiar with the art of Zen-of-Zero as present in ZeroMQ Architecture, one may like to start with ZeroMQ Principles in less than Five Seconds before diving into further details.
The Basis :
The Scalable Formal Communication Archetype, borrowed from ZeroMQ PUB/SUB, does not come at zero-cost.
This means that each infrastructure setup ( both on the PUB-side and on the SUB-side ) takes some rather remarkable time, and no one can be sure of when the AccessNode configuration results in an RTO-state. So the SUB-side (as proposed above) ought to be either a permanent entity, or the user shall not expect it to be RTO in zero-time after a twisted.internet.task.LoopingCall() gets reinstated.
Preferred way: instantiate your (semi-)persistent zmq.Context(), get it configured so as to serve the <aContextInstance>.socket( zmq.PUB ) as needed, a minimum safeguarding setup being the <aSocketInstance>.setsockopt( zmq.LINGER, 0 ) and all transport / queuing / security-handling details, that the exosystem exposes to your code ( whitelisting and secure sizing and resources protection being the most probable candidates - but details are related to your application domain and the risks that you are willing to face being prepared to handle them ).
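A minimal sketch of that preferred way ( the endpoint string below is just an illustrative choice, not from the original code ):
import zmq

ctx = zmq.Context()                    # a (semi-)persistent Context, reused app-wide
pub = ctx.socket(zmq.PUB)
pub.setsockopt(zmq.LINGER, 0)          # never hang on close with undelivered messages
pub.bind("ipc:///tmp/example.pubsub")  # hypothetical endpoint; pick your own transport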
ZeroMQ strongly discourages sharing ( zero-sharing ) <aContextInstance>.socket()-instances, yet the zmq.Context()-instance can be shared / re-used (ref. ZeroMQ Principles... ) / passed to more than one thread ( if needed ).
All <aSocketInstance>{.bind()|.connect()}-methods are expensive, so try to set up the infrastructure AccessPoint(s) and their due error-handling well before one tries to use their mediated communication services.
Each <aSocketInstance>.setsockopt( zmq.SUBSCRIBE, ... ) is expensive in that it may take ( depending on the (local/remote) version ) a form of non-local, distributed behaviour - the local side "sets" the subscription, yet the remote side has to "be informed" about such a state-change and "implement" the operations in line with the actual (propagated) state. While in earlier versions all messages were dispatched from the PUB-side and all the SUB-side(s) were flooded with such data and left to "filter" it in a local-side internal Queue, the newer versions "implement" the Topic-Filter on the PUB-side, which further increases the latency of setting the new modus-operandi in action.
Next comes the modus-operandi: how <aSocketInstance>.recv() gets results:
In their default API-state, .recv()-methods are blocking, potentially infinitely blocking, if no messages arrive.
Solution: avoid blocking-forms of calling ZeroMQ <aSocket>.recv()-methods, by always using the zmq.NOBLOCK-modes thereof, or rather test the presence or absence of any expected message(s) with the <aSocket>.poll( zmq.POLLIN, <timeout> )-methods available, with zero or controlled timeouts. This makes you the master, who decides about the flow of code-execution. Not doing so, you knowingly let your code depend on an external sequence ( or absence ) of events, and your architecture is prone to awful problems with handling infinite blocking-states ( or potentially unsalvageable many-agents' distributed-behaviour live-locks or dead-locks ).
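A minimal sketch of that non-blocking pattern ( my illustration, matching the hypothetical endpoint above ):
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")        # subscribe to everything
sub.connect("ipc:///tmp/example.pubsub")  # hypothetical endpoint, matching the PUB-side

# poll with a controlled timeout, then decide: the caller stays in control
if sub.poll(100, zmq.POLLIN):             # wait at most 100 ms for a message
    msg = sub.recv(zmq.NOBLOCK)           # guaranteed not to block here
else:
    pass                                  # no message arrived; carry on with other work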
Avoid uncontrolled cross-breeding of event-loops - like passing ZeroMQ-driven loops into an external "callback"-alike handler or async-decorated code-blocks, where the stack of (non-)blocking logics may wreak havoc on the original idea just by throwing the system into an unresolvable state, where events miss the expected sequence of events and live-locks are unsalvageable, or only the first pass happens to go through.
Stacking asyncio-code with twisted-LoopingCall()-s and async/await-decorated code + ZeroMQ blocking .recv()-s is either a Piece-of-Filligrane-Precise-Art-of-Truly-a-Zen-Master, or a sure ticket to Hell - with all respect to the Art-of-Truly-Zen-Masters :o)
So, yes, complex thinking is needed -- welcome to the realms of distributed-computing!
I have a dag that creates a spark-task and executes a certain script located in a particular directory. There are two tasks like this. Both of these tasks need to receive the same ID, generated in the DAG file, before the tasks are executed. If I simply store and pass a value solely via the Python script, the IDs come out different, which is to be expected. So I am trying to push the value to XCom with a PythonOperator task.
I need to pull the values from XCom and update a 'params' dictionary with that information in order to be able to pass it to my spark task.
Could you please help me? I am hitting my head against the wall and just can't figure it out.
I tried the following:
- creating a function just to retrieve the data from XCom and return it, and assigning this function to the params variable, but that doesn't work: I cannot return from a Python function inside the DAG file when that function uses xcom_pull;
- assigning an empty list, appending to it from the python function, and then providing the final list directly to my spark task. That doesn't work either. Please help!
Thanks a lot in advance for any help related to this. I will need this value to be the same for this and multiple other spark tasks that may come into the same DAG file.
DAG FILE
import..
from common.base_tasks import spark_task
default_args = {
    'owner': 'airflow',
    'start_date': days_ago(1),
    'email_on_failure': True,
    'email_on_retry': False,
}
dag = DAG(
    dag_id='dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1)
)
log_level = "info"
id_info = {
    "id": str(uuid.uuid1()),
    "start_time": str(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S,%f'))
}

# this stores the value to XCOM successfully
def store_id(**kwargs):
    kwargs['ti'].xcom_push(key='id_info', value=id_info)

store_trace_task = PythonOperator(task_id='store_id', provide_context=True, python_callable=store_id, dag=dag)

extra_config = {'log_level': log_level}
config = '''{"config":"data"}'''
params = {'config': config, 'extra_config': json.dumps(extra_config)}
# ---------- this doesn't work ----------
pars = []
pars.append(params)

def task1_pull_params(**kwargs):
    tracing = kwargs['ti'].xcom_pull(task_ids='store_trace_task')
    pars.append(tracing)
    # params = {
    #     'parsed_config': parsed_config,
    #     'extra_config': json.dumps(extra_config),
    #     'trace_data': tracing
    # }
    # return params  # return pushes to xcom, xcom_push does the same

task1_pull_params = PythonOperator(task_id='task1_pull_params', provide_context=True, python_callable=task1_pull_params, dag=dag)
store_trace_task >> task1_pull_params
# returning the value from the function and assigning the result to the params variable below also doesn't work
# params = task1_pull_params
# this prints only what's outside of the function, i.e. params
print("===== pars =====> ", pars)
pipeline_task1 = spark_task(
    name='task1',
    script='app.py',
    params=params,
    dag=dag
)
task1_pull_params >> pipeline_task1
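For reference, a common way around this is to defer the pull to runtime with Jinja templating, since the DAG file itself is parsed before any task runs. A minimal sketch, assuming spark_task hands params to a templated field of the underlying operator (whether it does is an assumption about that wrapper):
templated_params = {
    'config': config,
    'extra_config': json.dumps(extra_config),
    # rendered at runtime, after store_id has pushed its XCom value
    'trace_data': "{{ ti.xcom_pull(task_ids='store_id', key='id_info') }}",
}

pipeline_task1 = spark_task(
    name='task1',
    script='app.py',
    params=templated_params,  # only works if this field is in template_fields
    dag=dag
)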
I have Python 3.6 code that connects to MQTT and subscribes to a topic. Every time the callback function "on_message" gets triggered, it instantiates a class that has a single method which does the following: opens the db file, saves the received data, and closes the db file.
The Python script described above works almost fine. It receives about 7 MQTT messages per second, so for each message it needs to [Open_DB - Save_Data - Close_DB]. Some messages get a PUBACK but are not saved, perhaps due to so many unnecessary operations, so I want to improve this:
I spent a lot of time (I am not an expert) trying to create a class that would open the db once, write many thousands of times to it, and only close the db file when done; a class that would have three methods:
1. MyDbClass.open_db_file()
2. MyDbClass.save_data()
3. MyDbClass.close_db_file()
The problem, as you may guess, is that it is not possible to call MyDbClass.save_data() from within the "on_message" callback, even when the object has been placed in a global variable. Here is the non-working code with the proposed idea, which I cleaned up for easier reading:
# -----------------------------
# This code has been cleaned up for faster reading

import paho.mqtt.client as mqtt
import time
import json
import sqlite3

# Global Variables
db_object = ""

class MyDbClass():
    def __init__(self):
        pass

    def open_db_file(self, db_file):
        self.db_conn = sqlite3.connect(db_file)
        return self.db_conn

    def save_data(self, json_data):
        self.time_stamp = time.strftime('%Y%m%d%H%M%S')
        self.data = json.loads(json_data)
        self.sql = '''INSERT INTO trans_reqs (received, field_a, field_b, field_c)
                      VALUES (?, ?, ?, ?)'''
        self.fields_values = (self.time_stamp, self.data['one'], self.data['two'], self.data['three'])
        self.cur = self.db_conn.cursor()
        self.cur.execute(self.sql, self.fields_values)
        self.db_conn.commit()

    def close_db_file(self):
        self.cur.close()
        self.db_conn.close()

def on_mqtt_message(client, userdata, msg):
    global db_object
    m_decode = msg.payload.decode("utf-8", "ignore")
    db_object.save_data(m_decode)

def main():
    global db_object
    # Database to use - trying to create an object to manage DB tasks (from MyDbClass)
    db_file = "my_filename.sqlite"
    db_object = MyDbClass.open_db_file(db_file)

    # MQTT -- Set variables
    broker_address = "..."
    port = 1883
    client_id = "..."
    sub_topic = "..."
    sub_qos = 1

    # MQTT -- Instantiate the MQTT Client class and set callbacks
    client = mqtt.Client(client_id)
    client.on_connect = on_mqtt_connect
    client.on_disconnect = on_mqtt_disconnect
    client.on_message = on_mqtt_message
    client.on_log = on_mqtt_log
    client.clean_session = True
    #client.username_pw_set(usr, password=pwd)  # set username and password
    print('Will connect to broker ', broker_address)
    client.connect(broker_address, port=port, keepalive=45)
    client.loop_start()
    client.subscribe(sub_topic, sub_qos)
    try:
        while True:
            time.sleep(.1)
    except KeyboardInterrupt:
        # Disconnects MQTT
        client.disconnect()
        client.loop_stop()
        print("....................................")
        print("........ User Interrupted ..........")
        print("....................................")
    db_object.close_db_file()
    client.loop_stop()
    client.disconnect()

if __name__ == "__main__":
    main()
Any help on how to do this will be greatly appreciated!
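For reference, a minimal sketch of the pattern this question aims at (an illustration, not a confirmed fix): create the instance first and call methods on it; and since paho invokes on_message from its network thread while the connection is created in the main thread, sqlite3 needs check_same_thread=False (with writes still serialized, which a single callback provides):
import sqlite3

class MyDbClass:
    def open_db_file(self, db_file):
        # check_same_thread=False allows the paho network thread (which runs
        # on_message) to use a connection created in the main thread
        self.db_conn = sqlite3.connect(db_file, check_same_thread=False)

    def save_data(self, row):
        cur = self.db_conn.cursor()
        cur.execute("INSERT INTO trans_reqs (received) VALUES (?)", (row,))
        self.db_conn.commit()

    def close_db_file(self):
        self.db_conn.close()

db_object = MyDbClass()                       # instantiate first,
db_object.open_db_file("my_filename.sqlite")  # then call the methods on the instance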
What I am trying to achieve
Write a scheduler that uses a database to schedule similar tasks at different timings.
For this I am using celery beat; the code snippet below gives an idea:
try:
    reader = MongoReader()
except:
    raise

try:
    tasks = reader.get_scheduled_tasks()
except:
    raise

celerybeat_schedule = dict()
for task in tasks:
    celerybeat_schedule[task["task_id"]] = dict()
    celerybeat_schedule[task["task_id"]]["task"] = task["task_name"]
    celerybeat_schedule[task["task_id"]]["args"] = (task,)
    celerybeat_schedule[task["task_id"]]["schedule"] = get_task_schedule(task)

app.conf.update(BROKER_URL=rabbit_mq_endpoint, CELERY_TASK_SERIALIZER='json', CELERY_ACCEPT_CONTENT=['json'], CELERYBEAT_SCHEDULE=celerybeat_schedule)
So these are the three steps:
- reading all tasks from the datastore
- creating a dictionary, the celery beat schedule, populated with each task's properties: task_name (the method that will run), parameters (the data to pass to the method), and schedule (when to run)
- updating the celery configuration with this dictionary
Expected scenario
Given that all entries run the same celery task (one that just prints), have the same schedule (run every 5 minutes), and differ only in the parameter specifying what to print, let's say the db has:
task name       parameter   schedule
regular_print   Hi          {"minutes" : 5}
regular_print   Hello       {"minutes" : 5}
regular_print   Bye         {"minutes" : 5}
I expect all three to be printed every 5 minutes.
What happens
Only one of Hi, Hello, Bye prints (possibly at random, certainly not in sequence).
Please help,
Thanks a lot in advance :)
I was able to resolve this using version 4 of celery. Below is a sample similar to what worked for me; you can also find it in the celery documentation for version 4.
# taking the address and user/pass from the environment (you can use direct values)
import os
from datetime import timedelta
from celery import Celery

ex_host_queue = os.environ["EX_HOST_QUEUE"]
ex_port_queue = os.environ["EX_PORT_QUEUE"]
ex_user_queue = os.environ["EX_USERID_QUEUE"]
ex_pass_queue = os.environ["EX_PASSWORD_QUEUE"]
broker = "amqp://" + ex_user_queue + ":" + ex_pass_queue + "@" + ex_host_queue + ":" + ex_port_queue + "//"

# celery initialization
app = Celery(__name__, backend=broker, broker=broker)
app.conf.task_default_queue = 'scheduler_queue'
app.conf.update(
    task_serializer='json',
    accept_content=['json'],  # Ignore other content
    result_serializer='json'
)
task = {"task_id":1,"a":10,"b":20}
##method to update scheduler
def add_scheduled_task(task):
print("scheduling task")
del task["_id"]
print("adding task_id")
name = task["task_name"]
app.add_periodic_task(timedelta(minutes=1),add.s(task), name = task["task_id"])
#app.task(name='scheduler_task')
def scheduler_task(data):
print(str(data["a"]+data["b"]))