Dispy, initiating a SharedJobCluster on a compute node - dispy

I am creating a compute cluster in python using dispy. One of my use cases would be very nicely solved by starting a process on a compute node that itself starts a distributed process. As such, I have implemented the SharedJobCluster on the primary scheduler, and also in the function that will be sent to the cluster (which should in turn, start a series of distributed processes). However, when the second SharedJobCluster is initiated, the code hangs and does not move past this line (nor show any errors).
Minimum working example:
def clusterfun():
    import dispy
    import test2
    import logging
    log_filename = 'worker.log'
    logging.basicConfig(filename=log_filename,
                        level=logging.DEBUG,
                        format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
                        datefmt='[%m-%d-%Y %H:%M:%S]')
    logging.info("Starting cluster...")
    # THE FOLLOWING LINE HANGS
    cluster = dispy.SharedJobCluster(test2.clusterfun2, port=0, scheduler_node='127.0.0.1')
    logging.info("Started cluster...")
    job = cluster.submit()
    logging.info("Submitted job...")
    return job()

if __name__ == '__main__':
    import dispy
    #
    # Start the Compute cluster
    #
    cluster = dispy.SharedJobCluster(clusterfun, port=0, depends=['test2.py'], scheduler_node='127.0.0.1')
    job = cluster.submit()
    print(job())
test2.py contains:
def clusterfun2():
    return "Foo"
For reference, I am currently running dispyscheduler.py, dispynode, and this Python code all on the same machine. This setup works, except when trying to initiate an embedded distributed task.
The worker.log output contains "Starting cluster..." but nothing else.
If I check the status of the node it says that it is running 1 job, but it never completes.

Related

How to shut down CherryPy if there are no incoming connections for a specified time?

I am using CherryPy to speak to an authentication server. The script runs fine if all the entered information is correct. But if the user makes a mistake typing their ID, the internal HTTP error screen fires correctly, yet the server keeps running and nothing else in the script will run until the CherryPy engine is closed, so I have to kill the script manually. Is there some code I can put in the index along the lines of
if timer > 10 and connections == 0:
    close cherrypy (< I have a method for this already)
I'm mostly a data mangler, so I'm not used to web servers. Googling shows lots of hits for closing CherryPy when there are too many connections, but not when there have been no connections for a specified (short) time. I realise the point of a web server is usually to hang around waiting for connections, so this may be an odd case. All the same, any help is welcome.
Interesting use case. You can use the CherryPy plugin infrastructure to do something like that. Take a look at this ActivityMonitor plugin implementation: it shuts down the server if it is not handling anything and hasn't seen any request in a specified amount of time (in this case 10 seconds).
You may have to adjust the logic for how to shut it down, or do something else, in the _verify method.
If you want to read a bit more about the publish/subscribe architecture take a look at the CherryPy Docs.
import time
import threading
import cherrypy
from cherrypy.process.plugins import Monitor

class ActivityMonitor(Monitor):
    def __init__(self, bus, wait_time, monitor_time=None):
        """
        bus: cherrypy.engine
        wait_time: Seconds since last request that we consider to be active.
        monitor_time: Seconds that we'll wait before verifying the activity.
                      If not defined, wait half the `wait_time`.
        """
        if monitor_time is None:
            # if monitor time is not defined, then verify half
            # the wait time since the last request
            monitor_time = wait_time / 2
        super().__init__(
            bus, self._verify, monitor_time, self.__class__.__name__
        )
        # use a lock to make sure the thread that triggers the before_request
        # and after_request does not collide with the monitor method (_verify)
        self._active_request_lock = threading.Lock()
        self._active_requests = 0
        self._wait_time = wait_time
        self._last_request_ts = time.time()

    def _verify(self):
        # verify that we don't have any active requests and
        # shutdown the server in case we haven't seen any activity
        # since self._last_request_ts + self._wait_time
        with self._active_request_lock:
            if (not self._active_requests and
                    self._last_request_ts + self._wait_time < time.time()):
                self.bus.exit()  # shutdown the engine

    def before_request(self):
        with self._active_request_lock:
            self._active_requests += 1

    def after_request(self):
        with self._active_request_lock:
            self._active_requests -= 1
            # update the last time a request was served
            self._last_request_ts = time.time()

class Root:
    @cherrypy.expose
    def index(self):
        return "Hello user: current time {:.0f}".format(time.time())

def main():
    # here is how to use the plugin:
    ActivityMonitor(cherrypy.engine, wait_time=10, monitor_time=5).subscribe()
    cherrypy.quickstart(Root())

if __name__ == '__main__':
    main()

Python RQ-Scheduler not giving any output

I am unable to get rq_scheduler working. Here is a simple example:
app.py
from flask import Flask
import datetime
from redis import Redis
from rq import Queue
from rq_scheduler import Scheduler
from tasks import example
app=Flask(__name__)
app.secret_key='abc'
app.redis = Redis.from_url('redis://')
app.task_queue = Queue('test', connection=app.redis)
scheduler = Scheduler(queue=app.task_queue,connection=app.redis)
#app.task_queue.enqueue('tasks.example',2)
#scheduler.enqueue_at(datetime.datetime(2020,4,16,10,46), example, 2)
scheduler.enqueue_in(datetime.timedelta(seconds=1), example, 2)
if __name__=='__main__':
    app.run(host='0.0.0.0', port=5000, debug=True)
tasks.py
import time
def example(seconds):
    print('Starting task')
    for i in range(seconds):
        print(i)
        time.sleep(1)
    print('Task completed')
In the app directory in terminal, I start the following in separate tabs:
$ redis-server
$ rq worker test
$ rqscheduler
$ python app.py
The first queue.enqueue works fine. Both scheduler tasks do nothing. What is wrong?
I suspect that you may be getting confused because rqscheduler by default checks for new jobs only once a minute. You can tweak this with the -i flag to set the interval in seconds, and also add the -v flag for more verbose output:
rqscheduler -i 1 -v
However I also noticed another issue with the above Flask code...
Probably because the dev server spawns a separate process, I found that scheduler.enqueue_in was enqueuing the job twice. This wouldn't be an issue if enqueue_in were called inside a view function, but where you have it placed it runs when the application starts.
So when launching with the dev server it gets executed twice, and then once more every time the autoreloader senses a code change. After starting the dev server and then saving a change to the code, 3 jobs in total have been enqueued.
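To make the double execution visible, or to sidestep it during development, one option is to check the WERKZEUG_RUN_MAIN environment variable, which Werkzeug's reloader sets to "true" in the child process that actually serves requests. The following is only a minimal sketch: the helper name should_enqueue_at_startup is hypothetical, and it does not prevent the extra enqueue that happens each time the reloader restarts the child.

```python
import os


def should_enqueue_at_startup(debug, environ=os.environ):
    """Return True in the one process that should run startup-time side effects.

    With the reloader active (debug=True) the module is imported twice: once
    in the watcher parent and once in the serving child. Werkzeug marks the
    child by setting WERKZEUG_RUN_MAIN="true".
    """
    if not debug:
        # No reloader, so there is only one process.
        return True
    return environ.get("WERKZEUG_RUN_MAIN") == "true"
```

In app.py this could wrap the module-level call, for example `if should_enqueue_at_startup(debug=True): scheduler.enqueue_in(...)`. Moving the call into a view function, as mentioned above, avoids the problem without relying on the reloader's internals.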
For the purpose of testing this, it may be advisable just to have a simple python script which doesn't actually run the Flask app:
# enqueue_test.py
import datetime  # needed for datetime.timedelta below
from redis import Redis
from rq import Queue
from rq_scheduler import Scheduler
from tasks import example
r = Redis.from_url('redis://localhost:6379')
q = Queue('test', connection=r)
scheduler = Scheduler(queue=q, connection=r)
scheduler.enqueue_in(datetime.timedelta(seconds=1), example, 2)

Pytest function testing multi-processing task queue service

I have a task queue processing service that I'm trying to run pytest function testing on. When running it in 'production', I start this from the command line, e.g. python main.py.
I can't figure out how to start this task service from pytest to do function testing on it. How do I start up the service inside pytest so that I can then add a job to it and see if the job gets processed and added to the database when completed?
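import multiprocessing

task_processing = {}  # assumed module-level registry of worker processes

def main():
    store = "jobs"
    worker_id = 1
    # Process tasks
    task_processing[store] = multiprocessing.Process(
        target=process_tasks, args=(store, worker_id)
    )
    # start the same process object that was just created
    task_processing[store].start()

if __name__ == "__main__":
    main()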
Just make sure you access the main function correctly:
from main import main

def test_main():
    main()
    ...

Is there a way to run a Python Flask function at a specific interval and display the output on the local server?

I am working on a Python program using Flask in which I want to extract keys from a dictionary. These keys are in text format. I want to repeat this whole process at a fixed interval and display the output in the local browser each time.
I have tried this using flask_apscheduler. The program runs and shows the output, but only once; it does not repeat after the interval.
This is the Python program I tried:
@app.route('/trend', methods=['POST', 'GET'])
def run_tasks():
    for i in range(0, 1):
        app.apscheduler.add_job(func=getTrendingEntities, trigger='cron', args=[i], id='j'+str(i), second = 5)
    return "Code run perfect"

@app.route('/loc', methods=['POST', 'GET'])
def getIntentAndSummary(self, request):
    if request.method == "POST":
        reqStr = request.data.decode("utf-8", "strict")
        reqStrArr = reqStr.split()
        reqStr = ' '.join(reqStrArr)
        text_1 = []
        requestBody = json.loads(reqStr)
        if requestBody.get('m') is not None:
            text_1.append(requestBody.get('m'))
        return jsonify(text_1)

if (__name__ == "__main__"):
    app.run(port = 8000)
The problem is that you're calling add_job every time the /trend page is requested. The job should only be added once, as part of the initialization, before starting the scheduler (see below).
It would also make more sense to use the 'interval' trigger instead of 'cron', since you want your job to run every 5 seconds. Here's a simple working example:
from flask import Flask
from flask_apscheduler import APScheduler
import datetime

app = Flask(__name__)

# function executed by scheduled job
def my_job(text):
    print(text, str(datetime.datetime.now()))

if (__name__ == "__main__"):
    scheduler = APScheduler()
    scheduler.add_job(func=my_job, args=['job run'], trigger='interval', id='job', seconds=5)
    scheduler.start()
    app.run(port = 8000)
Sample console output:
job run 2019-03-30 12:49:55.339020
job run 2019-03-30 12:50:00.339467
job run 2019-03-30 12:50:05.343154
job run 2019-03-30 12:50:10.343579
You can then modify the job attributes by calling scheduler.modify_job().
As for the second problem which is refreshing the client view every time the job runs, you can't do that directly from Flask. An ugly but simple way would be to add <meta http-equiv="refresh" content="1" > to the HTML page to instruct the browser to refresh it every second. A much better implementation would be to use SocketIO to send new data in real-time to the web client.
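As a rough illustration of the meta-refresh approach, the sketch below keeps the job's latest output in a small thread-safe holder and renders a page that tells the browser to reload itself. The names LatestResult and render_page are hypothetical, and only the standard library is used; in a Flask app the scheduled job would call set() and a view function would return the rendered string.

```python
import html
import threading


class LatestResult:
    """Thread-safe holder for the most recent output of the scheduled job."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = "no data yet"

    def set(self, value):
        with self._lock:
            self._value = str(value)

    def get(self):
        with self._lock:
            return self._value


def render_page(latest, refresh_seconds=5):
    """Build an HTML page that asks the browser to reload itself periodically."""
    return (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{int(refresh_seconds)}">'
        "</head><body>"
        f"<p>{html.escape(latest.get())}</p>"
        "</body></html>"
    )
```

Matching refresh_seconds to the job interval avoids reloading the page more often than the data can change.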
I would recommend that you start a daemonized thread and import your application variable; you can then use with app.app_context() in order to log to your console.
It's a little bit more fiddly, but it allows the application to run separated across different threads.
I use this method to fire off a bunch of HTTP requests concurrently. The alternative is to wait for each response before making a new one.
I'm sure you've realised that the thread will become occupied if you run an infinitely running command.
Make sure to daemonize the thread so that it is killed along with your web app when you stop it, rather than keeping the process alive.
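A minimal sketch of such a daemon thread, using only the standard library, might look like the following. The helper name start_background_worker is hypothetical, and the Flask-specific part (wrapping the task body in with app.app_context()) is left out. Because daemon threads are stopped abruptly at interpreter exit, the sketch also returns a stop event for cases where an orderly stop is wanted.

```python
import threading
import time


def start_background_worker(task, interval_seconds):
    """Run task() repeatedly in a daemon thread; return (thread, stop_event)."""
    stop_event = threading.Event()

    def loop():
        # Event.wait returns True once stop_event is set, which ends the loop;
        # otherwise it times out after interval_seconds and the task runs again.
        while not stop_event.wait(interval_seconds):
            task()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event


if __name__ == "__main__":
    worker, stop = start_background_worker(lambda: print("tick"), 1.0)
    time.sleep(3.5)
    stop.set()       # ask the loop to finish
    worker.join()    # wait for it to exit cleanly
```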

APScheduler resets after every deploy

I have a script which, when run, adds RSS feed parsing tasks to some Celery queues. I have now implemented APScheduler to run the script every 2 hours to get new data from the feeds.
My implementation looks like this:
#!/usr/bin/env python
import atexit
import logging
import os
from logging import getLogger
from apscheduler.schedulers.blocking import BlockingScheduler
logger = getLogger('scheduled_parser')
PARSER_SCHEDULER = 'parser_scheduler'
def main():
    scheduler = BlockingScheduler(job_defaults={'coalesce': True})
    scheduler.add_jobstore('sqlalchemy',alias='scheduler_config', url=os.environ.get("DATABASE_URL"))
    scheduler.add_job(run_parser, 'interval', seconds=int(os.environ.get("SCHEDULER_RUN_FREQUENCY")),
                      id=PARSER_SCHEDULER, replace_existing=True)
    scheduler.start()
    atexit.register(lambda: scheduler.shutdown())

def run_parser():
    < code to add items to queues>

if __name__ == "__main__":
    logging.basicConfig()
    logger.setLevel(logging.INFO)
    main()
My code is deployed on Heroku and I have the following in my Procfile:
clock: python scheduled_parser
<celery worker processes>
I am having the following issues:
1. I am storing the scheduler job in persistent storage and I can even see it in my db, but when I do scheduler.get_job(PARSER_SCHEDULER,'scheduler_config') I get None.
2. Whenever I deploy on Heroku, I think the next run is being updated. For example, if the parser is set to run every 2 hours and the next run is going to be at 4:00pm, and I deploy on Heroku at 3:00pm, then my next run happens at 5:00pm instead of 4:00pm.
Not sure about your issue #1, but I think issue #2 is that on every deploy, this line is going to replace the job, thus resetting the schedule:
scheduler.add_job(run_parser, 'interval', seconds=int(os.environ.get("SCHEDULER_RUN_FREQUENCY")),
                  id=PARSER_SCHEDULER, replace_existing=True)
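The arithmetic behind the 5:00pm run can be checked with plain datetime values. This is only an illustration of the reasoning, not APScheduler code: the calendar date is arbitrary, the helper name first_run_after_replace is hypothetical, and it assumes the interval trigger is given no explicit start date, in which case the first run falls one full interval after the job is (re)created.

```python
from datetime import datetime, timedelta


def first_run_after_replace(replaced_at, interval):
    """First run of an interval job that was re-created at `replaced_at`."""
    return replaced_at + interval


interval = timedelta(hours=2)
scheduled_next_run = datetime(2024, 1, 1, 16, 0)  # 4:00pm, as in the question
deploy_time = datetime(2024, 1, 1, 15, 0)         # 3:00pm deploy replaces the job

new_next_run = first_run_after_replace(deploy_time, interval)
print(scheduled_next_run.strftime("%H:%M"))  # 16:00
print(new_next_run.strftime("%H:%M"))        # 17:00
```

Each deploy therefore pushes the next run out to deploy time plus the full interval, regardless of when the previous schedule would have fired.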
