How to use ApScheduler correctly in FastAPI? - python-3.x

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
import time
from loguru import logger
from apscheduler.schedulers.background import BackgroundScheduler

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

test_list = ["1"] * 10

def check_list_len():
    global test_list
    while True:
        time.sleep(5)
        logger.info(f"check_list_len:{len(test_list)}")

@app.on_event('startup')
def init_data():
    scheduler = BackgroundScheduler()
    scheduler.add_job(check_list_len, 'cron', second='*/5')
    scheduler.start()

@app.get("/pop")
async def list_pop():
    global test_list
    test_list.pop(1)
    logger.info(f"current_list_len:{len(test_list)}")

if __name__ == '__main__':
    uvicorn.run(app="main3:app", host="0.0.0.0", port=80, reload=False, debug=False)
Above is my code. I want to pop elements from a list via a GET request and have a periodic task that constantly checks the number of elements in the list, but when I run it, the following error keeps appearing:
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:48:50 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:48:50.016 | INFO | main3:check_list_len:23 - check_list_len:10
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:48:55 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:48:55.018 | INFO | main3:check_list_len:23 - check_list_len:10
INFO: 127.0.0.1:55961 - "GET /pop HTTP/1.1" 200 OK
2021-11-25 09:48:57.098 | INFO | main3:list_pop:35 - current_list_len:9
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:49:00 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:49:00.022 | INFO | main3:check_list_len:23 - check_list_len:9
It looks like two scheduled tasks were started and only one of them succeeds, but I only started one task. How do I avoid this?

You're getting the behavior you asked for. You've configured apscheduler to run check_list_len every five seconds, but you've also written that function so it never terminates: it just sleeps for five seconds in an endless loop. Because the first invocation never finishes, apscheduler refuses to start another one, which is what the "maximum number of running instances reached (1)" message is telling you.
Remove the infinite loop inside your utility function when using apscheduler - it'll call the function every five seconds for you:
def check_list_len():
    global test_list  # you really don't need this either, since you're not reassigning the variable
    logger.info(f"check_list_len:{len(test_list)}")

Related

Data acquisition and parallel analysis

With this example, I am able to start 10 processes and then continue to do "stuff".
import random
import time
import multiprocessing

if __name__ == '__main__':
    """Demonstration of GIL-friendly asynchronous development with Python's multiprocessing module"""

    def process(instance):
        total_time = random.uniform(0, 2)
        time.sleep(total_time)
        print('Process %s : completed in %s sec' % (instance, total_time))
        return instance

    for i in range(10):
        multiprocessing.Process(target=process, args=(i,)).start()

    for i in range(2):
        print("im doing stuff")
output:
>>
im doing stuff
im doing stuff
Process 8 : completed in 0.5390905372395016 sec
Process 6 : completed in 1.2313793332779521 sec
Process 2 : completed in 1.3439237625459899 sec
Process 0 : completed in 2.171809500083049 sec
Process 5 : completed in 2.6980031493633887 sec
Process 1 : completed in 3.3807358192422416 sec
Process 3 : completed in 4.597366303348297 sec
Process 7 : completed in 4.702447947943171 sec
Process 4 : completed in 4.8355495004170965 sec
Process 9 : completed in 4.9917788543156245 sec
I'd like to have a main while True loop that does the data acquisition, starts a new process at each iteration (with the new data), and checks whether any process has finished so it can look at its output.
How can I verify that a process has ended, and what is its return value? Edit: while other processes in the list are still executing.
If I had to summarize my problem: how can I know which process in a list of processes has finished, while some are still executing or new ones are being added?
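One way to do this kind of bookkeeping (a sketch, not taken from the thread above; it swaps multiprocessing.Process for concurrent.futures, which tracks completion for you):

import random
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

def process(instance):
    total_time = random.uniform(0, 2)
    time.sleep(total_time)
    return instance, total_time

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        # submit() returns a Future for each started task
        futures = [pool.submit(process, i) for i in range(10)]
        # as_completed() yields each Future as soon as its process finishes,
        # whatever the submission order; result() gives the return value.
        for future in as_completed(futures):
            instance, total_time = future.result()
            print('Process %s : completed in %s sec' % (instance, total_time))

Inside a while True acquisition loop you can instead poll future.done() on the list of pending futures without blocking, pop the finished ones, and keep submitting new work.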

python threading event.wait() using same object in multiple threads

Where is the documentation for the Python 3 threading library's event.wait() method that explains how one event can be used multiple times in different threads?
The example below shows that the same event can be used in multiple threads, each with a different wait() duration, probably because each has its own lock under the hood.
But this functionality is not documented in an obvious way on the threading page.
This works great, but it's not clear why it works or whether it will continue to work in future Python versions.
Are there ways this could break unexpectedly?
Can an inherited event work properly in multiple classes as long as it's used in separate threads?
import logging
import threading
import time

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',)

def worker(i, dt, e):
    tStart = time.time()
    e.wait(dt)
    logging.debug('{0} tried to wait {1} seconds but really waited {2}'.format(i, dt, time.time() - tStart))

e = threading.Event()
maxThreads = 10

for i in range(maxThreads):
    dt = 1 + i  # (s)
    w = threading.Thread(target=worker, args=(i, dt, e))
    w.start()
output:
[DEBUG] (Thread-1 ) 0 tried to wait 1 seconds but really waited 1.0003676414489746
[DEBUG] (Thread-2 ) 1 tried to wait 2 seconds but really waited 2.00034761428833
[DEBUG] (Thread-3 ) 2 tried to wait 3 seconds but really waited 3.0001776218414307
[DEBUG] (Thread-4 ) 3 tried to wait 4 seconds but really waited 4.000180244445801
[DEBUG] (Thread-5 ) 4 tried to wait 5 seconds but really waited 5.000337362289429
[DEBUG] (Thread-6 ) 5 tried to wait 6 seconds but really waited 6.000308990478516
[DEBUG] (Thread-7 ) 6 tried to wait 7 seconds but really waited 7.000143051147461
[DEBUG] (Thread-8 ) 7 tried to wait 8 seconds but really waited 8.000152826309204
[DEBUG] (Thread-9 ) 8 tried to wait 9 seconds but really waited 9.00012469291687
[DEBUG] (Thread-10 ) 9 tried to wait 10 seconds but really waited 10.000144481658936
Since e is a threading.Event, you are declaring it locally for each thread (all 10 threads are executed almost in parallel).
You can check it here:
import logging
import threading
import time

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)s) %(message)s',)

def worker(i, dt, e):
    tStart = time.time()
    logging.info('Program will wait for {} time while trying to print the change from {} to {}'.format(dt, i, dt))
    e.wait(dt)
    logging.debug('{0} tried to wait {1} seconds but really waited {2}'.format(i, dt, time.time() - tStart))

e = threading.Event()
maxThreads = 10

for i in range(maxThreads):
    dt = 1 + i  # (s)
    w = threading.Thread(target=worker, args=(i, dt, e))
    w.start()
It's not the locking; it's just about the value passed into the thread target.
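To make that concrete, a small sketch (not from the original answer): a single shared Event can be waited on by many threads at once, each with its own timeout, and one set() call wakes all of them early:

import threading
import time

e = threading.Event()

def worker(i, dt):
    t0 = time.time()
    was_set = e.wait(dt)  # True if the event was set, False if the timeout expired
    print(f"thread {i}: wait({dt}) returned {was_set} after {time.time() - t0:.2f}s")

threads = [threading.Thread(target=worker, args=(i, 10)) for i in range(3)]
for t in threads:
    t.start()

time.sleep(1)
e.set()  # all three threads wake up after about 1 s instead of the full 10 s timeout
for t in threads:
    t.join()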

How to profile a vim plugin written in python

Vim offers the :profile command, which is really handy. But it is limited to Vim script -- when it comes to plugins implemented in Python it isn't that helpful.
Currently I'm trying to understand what is causing a large delay in Denite. As it doesn't happen in vanilla Vim, but only under some specific conditions which I'm not sure how to reproduce, I couldn't find which setting/plugin is interfering.
So I turned to profiling, and this is what I got from :profile:
FUNCTION  denite#vim#_start()
    Defined: ~/.vim/bundle/denite.nvim/autoload/denite/vim.vim line 33
Called 1 time
Total time:   5.343388
 Self time:   4.571928

count  total (s)   self (s)
    1   0.000006   python3 << EOF
def _temporary_scope():
    nvim = denite.rplugin.Neovim(vim)
    try:
        buffer_name = nvim.eval('a:context')['buffer_name']
        if nvim.eval('a:context')['buffer_name'] not in denite__uis:
            denite__uis[buffer_name] = denite.ui.default.Default(nvim)
        denite__uis[buffer_name].start(
            denite.rplugin.reform_bytes(nvim.eval('a:sources')),
            denite.rplugin.reform_bytes(nvim.eval('a:context')),
        )
    except Exception as e:
        import traceback
        for line in traceback.format_exc().splitlines():
            denite.util.error(nvim, line)
        denite.util.error(nvim, 'Please execute :messages command.')
_temporary_scope()
if _temporary_scope in dir():
    del _temporary_scope
EOF
    1   0.000017   return []
(...)

FUNCTIONS SORTED ON TOTAL TIME
count  total (s)   self (s)  function
    1   5.446612   0.010563  denite#helper#call_denite()
    1   5.396337   0.000189  denite#start()
    1   5.396148   0.000195  <SNR>237_start()
    1   5.343388   4.571928  denite#vim#_start()
(...)
I tried to use the Python profiler directly by wrapping the main call:
import cProfile
cProfile.run(_temporary_scope(), '/path/to/log/file')
, but no luck -- just a bunch of errors from cProfile. Perhaps it is because of the way Python is started from Vim, as it is hinted here that it only works on the main thread.
I guess there should be an easier way of doing this.
The Python profiler does work when the whole code block is enclosed as a string,
cProfile.run("""
(...)
""", '/path/to/log/file')
, but it is not that helpful. Maybe it is all that is possible.
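Another option that avoids building a code string (a sketch using the standard cProfile API, not something taken from the Denite source) is to drive a Profile object around the call inside the python3 block; cProfile.run() expects a string of code, which is one reason passing _temporary_scope() to it directly fails:

import cProfile
import pstats

pr = cProfile.Profile()
pr.enable()
_temporary_scope()  # the call we want to profile, as defined in the snippet above
pr.disable()
pr.dump_stats('/path/to/log/file')

# later, outside Vim:
stats = pstats.Stats('/path/to/log/file')
stats.sort_stats('cumulative').print_stats(20)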

Celery beat is not showing or executing scheduled tasks

I am using celery and celery beat to handle task execution and scheduled tasks in a Python project. I am not using django.
Execution of celery tasks is working as expected. However, I have run into a wall trying to get scheduled tasks (celery beat) to run.
I have followed the celery documentation to add my task to app.conf.beat_schedule. If I print out the beat schedule after adding my task, I can see that it has been added successfully.
from celery import Celery
from celery.task import task

# Celery init
app = Celery('tasks', broker='pyamqp://guest@localhost//')

# get the latest device reading from the appropriate provider
@app.task(bind=True, retry_backoff=True)
def get_reading(self, provider, path, device, config, location, callback):
    logger.info("get_reading() called")
    module = importlib.import_module('modules.%s' % provider)
    try:
        module.get_reading(path, device, config, location, callback)
    except Exception as e:
        self.retry(exc=e)

# add the periodic task
def add_get_reading_periodic_task(provider, path, device, config, location, callback, interval=600.0):
    app.conf.beat_schedule = {
        "poll-provider": {
            "task": "get_reading",
            "schedule": interval,
            "args": (provider, path, device, config, location, callback)
        }
    }
    logger.info(app.conf.beat_schedule)
    logger.info("Added task 'poll-provider' for %s to beat schedule" % provider)
Looking at my application log, I can see that app.conf.beat_schedule has been updated with the data passed to add_get_reading_periodic_task():
2017-08-17 11:07:13,216 - gateway - INFO - {'poll-provider': {'task': 'get_reading', 'schedule': 10, 'args': ('provider1', '/opt/provider1', None, {'location': {'lan.local': {'uri': 'http://192.168.1.10'}}}, 'lan.local', {'url': 'http://localhost:8080', 'token': '*******'})}}
2017-08-17 11:07:13,216 - gateway - INFO - Added task 'poll-provider' for provider1 to beat schedule
I'm manually running celery worker and celery beat simultaneously (in different terminal windows) on the same application file:
$ celery worker -A gateway --loglevel=INFO
$ celery beat -A gateway --loglevel=DEBUG
If I call get_reading.delay(...) within my application, it is executed by the celery worker as expected.
However, the celery beat process never shows any indication that the scheduled task is registered:
celery beat v4.0.2 (latentcall) is starting.
__ - ... __ - _
LocalTime -> 2017-08-17 11:05:15
Configuration ->
. broker -> amqp://guest:**@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%DEBUG
. maxinterval -> 5.00 minutes (300s)
[2017-08-17 11:05:15,228: DEBUG/MainProcess] Setting default socket timeout to 30
[2017-08-17 11:05:15,228: INFO/MainProcess] beat: Starting...
[2017-08-17 11:05:15,248: DEBUG/MainProcess] Current schedule:
<ScheduleEntry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>
[2017-08-17 11:05:15,248: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes
[2017-08-17 11:05:15,250: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
[2017-08-17 11:10:15,351: DEBUG/MainProcess] beat: Synchronizing schedule...
[2017-08-17 11:10:15,355: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
[2017-08-17 11:15:15,400: DEBUG/MainProcess] beat: Synchronizing schedule...
[2017-08-17 11:15:15,402: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
[2017-08-17 11:20:15,502: DEBUG/MainProcess] beat: Synchronizing schedule...
[2017-08-17 11:20:15,504: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
This is seemingly confirmed by running celery inspect scheduled:
-> celery#localhost.lan: OK
- empty -
I have tried starting celery beat both before and after adding the scheduled task to app.conf.beat_schedule, and in both cases the scheduled task never appears in celery beat.
I read that celery beat did not support dynamic reloading of the configuration until version 4, but I am running celery beat 4.0.2
What am I doing wrong here? Why isn't celery beat showing my scheduled task?
Have you tried using the code as described in the documentation?
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')
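Applied to the task from the question, that might look roughly like this (a sketch; the argument values are illustrative placeholders):

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # register get_reading to run every 600 seconds
    sender.add_periodic_task(
        600.0,
        get_reading.s('provider1', '/opt/provider1', None, {}, 'lan.local', {}),
        name='poll-provider',
    )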
create a model for creating periodic tasks:
class TaskScheduler(models.Model):
    periodic_task = models.OneToOneField(
        PeriodicTask, on_delete=models.CASCADE, blank=True, null=True)
    node = models.OneToOneField(
        Node, on_delete=models.CASCADE, blank=True, null=True)

    @staticmethod
    def schedule_every(task_name, period, every, sheduler_name, args, queue=None):
        # schedules a task by name every "every" "period". So an example call would be:
        # TaskScheduler('mycustomtask', 'seconds', 30, [1,2,3])
        # that would schedule your custom task to run every 30 seconds with the arguments 1, 2 and 3 passed to the actual task.
        permissible_periods = ['days', 'hours', 'minutes', 'seconds']
        if period not in permissible_periods:
            raise Exception('Invalid period specified')
        # create the periodic task and the interval
        # create some name for the period task
        ptask_name = sheduler_name
        interval_schedules = IntervalSchedule.objects.filter(
            period=period, every=every)
        if interval_schedules:  # just check if interval schedules exist like that already and reuse em
            interval_schedule = interval_schedules[0]
        else:  # create a brand new interval schedule
            interval_schedule = IntervalSchedule()
            interval_schedule.every = every  # should check to make sure this is a positive int
            interval_schedule.period = period
            interval_schedule.save()
        if queue:
            ptask = PeriodicTask(name=ptask_name, task=task_name,
                                 interval=interval_schedule, queue=queue)
        else:
            ptask = PeriodicTask(name=ptask_name, task=task_name,
                                 interval=interval_schedule)
        if args:
            ptask.args = args
        ptask.save()
        return TaskScheduler.objects.create(periodic_task=ptask)

    @staticmethod
    def schedule_cron(task_name, at, sheduler_name, args):
        # schedules a task by name every day at the given "at" time.
        # create some name for the period task
        ptask_name = sheduler_name
        crons = CrontabSchedule.objects.filter(
            hour=at.hour, minute=at.minute)
        if crons:  # just check if CrontabSchedule exist like that already and reuse em
            cron = crons[0]
        else:  # create a brand new CrontabSchedule
            cron = CrontabSchedule()
            cron.hour = at.hour
            cron.minute = at.minute
            cron.save()
        ptask = PeriodicTask(name=ptask_name,
                             crontab=cron, task=task_name)
        if args:
            ptask.args = args
        ptask.save()
        return TaskScheduler.objects.create(periodic_task=ptask)

    def stop(self):
        """ pauses the task """
        ptask = self.periodic_task
        ptask.enabled = False
        ptask.save()

    def start(self):
        ptask = self.periodic_task
        ptask.enabled = True
        ptask.save()

    def terminate(self):
        self.stop()
        ptask = self.periodic_task
        PeriodicTask.objects.get(name=ptask.name).delete()
        self.delete()
        ptask.delete()
and then from your code:
# give the periodic task a unique name
scheduler_name = "%s_%s" % ('node_heartbeat', str(node.pk))
# organize the task arguments
args = json.dumps([node.extern_id, node.pk])
# create the periodic task in the heartbeat queue
task_scheduler = TaskScheduler().schedule_every(
    'core.tasks.pdb_heartbeat', 'seconds', 15, scheduler_name, args, 'heartbeat')
task_scheduler.node = node
task_scheduler.save()
The original class I came across here a long time ago; I've added schedule_cron.

threads and function 'print'

I'm trying to parallelize a script that prints out how many documents, pictures and videos there are in a directory, as well as some other information. I've put the serial script at the end of this message. Here's one example that shows how it outputs the information about the given directory:
7 documents use 110.4 kb ( 1.55 % of total size)
2 pictures use 6.8 Mb ( 98.07 % of total size)
0 videos use 0.0 bytes ( 0.00 % of total size)
9 others use 26.8 kb ( 0.38 % of total size)
Now, I would like to use threads to minimize the execution time. I've tried this :
import threading
import tools
import time
import os
import os.path

directory_path = "Users/usersos/Desktop/j"
cv = threading.Lock()

type_ = ["documents", "pictures", "videos"]
e = {}
e["documents"] = [".pdf", ".html", ".rtf", ".txt"]
e["pictures"] = [".png", ".jpg", ".jpeg"]
e["videos"] = [".mpg", ".avi", ".mp4", ".mov"]

class type_thread(threading.Thread):
    def __init__(self, n, e_):
        super().__init__()
        self.extensions = e_
        self.name = n

    def __run__(self):
        files = tools.g(directory_path, self.extensions)
        n = len(files)
        s = tools.size1(files)
        p = s * 100 / tools.size2(directory_path)
        cv.acquire()
        print("{} {} use {} ({:10.2f} % of total size)".format(n, self.name, tools.compact(s), p))
        cv.release()

types = [type_thread(t, e[t]) for t in type_]

for t in types:
    t.start()
for t in types:
    t.join()
When I run that, nothing is printed out! And when I type 't' followed by the return key in the interpreter, I get <type_thread(videos, stopped 4367323136)>. What's more, sometimes the interpreter returns the right statistics with these same keys.
Why is that?
Initial script (serial):
import tools
import time
import os
import os.path

type_ = ["documents", "pictures", "videos"]
all_ = type_ + ["others"]
e = {}
e["documents"] = [".pdf", ".html", ".rtf", ".txt"]
e["pictures"] = [".png", ".jpg", ".jpeg"]
e["videos"] = [".mpg", ".avi", ".mp4", ".mov"]

def statistic(directory_path):
    #----------------------------- Computing ---------------------------------
    d = {t: tools.g(directory_path, e[t]) for t in type_}
    d["others"] = [os.path.join(root, f) for root, _, files_names in os.walk(directory_path) for f in files_names if os.path.splitext(f)[1].lower() not in e["documents"] + e["pictures"] + e["videos"]]
    n = {t: len(d[t]) for t in type_}
    n["others"] = len(d["others"])
    s = {t: tools.size1(d[t]) for t in type_}
    s["others"] = tools.size1(d["others"])
    s_dir = tools.size2(directory_path)
    p = {t: s[t] * 100 / s_dir for t in type_}
    p["others"] = s["others"] * 100 / s_dir
    #----------------------------- Printing ---------------------------------
    for t in all_:
        print("{} {} use {} ({:10.2f} % of total size)".format(n[t], t, tools.compact(s[t]), p[t]))
    return s_dir
Method start() seems not to work. When I replace
for t in types:
    t.start()
for t in types:
    t.join()
with
for t in types:
    t.__run__()
it works fine (at least for now; I don't know whether it still will once I add other commands).
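For what it's worth, threading.Thread.start() only ever calls a method named run(), so a class that defines __run__() gives start() nothing to do. A minimal sketch of the renamed method (same body as in the question, relying on the question's tools module, cv lock and directory_path):

class type_thread(threading.Thread):
    def __init__(self, n, e_):
        super().__init__()
        self.extensions = e_
        self.name = n

    def run(self):  # start() invokes run() in the new thread
        files = tools.g(directory_path, self.extensions)
        n = len(files)
        s = tools.size1(files)
        p = s * 100 / tools.size2(directory_path)
        with cv:  # the lock keeps the print calls from interleaving
            print("{} {} use {} ({:10.2f} % of total size)".format(
                n, self.name, tools.compact(s), p))

With run() defined, the original for t in types: t.start() / t.join() loops work as intended.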
