I am trying to set up a delayed task task whose timing will depend on several parameters (either passed to it or obtained from a redis database).
the pseudocode would look like this:
def main():
scheduler = BackgroundScheduler()
scheduler.add_job(delayed_task,
id=task_id,
next_run_time=somedate,
args=(task_id, some_data))
scheduler.start()
do_something_else()
def delayed_task(id, passed_data):
rd = connect_to_redis()
redis_data = rd.fetch_data(id)
publish_data(passed_data, redis_data)
updated_run_time = parse(redis_data)
#obtain a scheduler object here
scheduler.modify_job(id, next_run_time=updated_run_time)
The question is the following: is there a way to access the scheduler from a task?
The scheduler cannot be passed as parameter to the task, as this will raise
TypeError: can't pickle _thread.lock objects
For the same reason, I can't put all this in a class and have it called a method since the arguments of the method include self, which is the class containing the scheduler, and will thus result in the same issue.
Is it possible to regain an instance of the scheduler from outside, like I can generate a new connexion to redis?
Related
I am currently developing some code that deals with big multidimensional arrays. Of course, Python gets very slow if you try to perform these computations in a serialized manner. Therefore, I got into code parallelization, and one of the possible solutions I found has to do with the multiprocessing library.
What I have come up with so far is first dividing the big array in smaller chunks and then do some operation on each of those chunks in a parallel fashion, using a Pool of workers from multiprocessing. For that to be efficient and based on this answer I believe that I should use a shared memory array object defined as a global variable, to avoid copying it every time a process from the pool is called.
Here I add some minimal example of what I'm trying to do, to illustrate the issue:
import numpy as np
from functools import partial
import multiprocessing as mp
import ctypes
class Trials:
# Perform computation along first dimension of shared array, representing the chunks
def Compute(i, shared_array):
shared_array[i] = shared_array[i] + 2
# The function you actually call
def DoSomething(self):
# Initializer function for Pool, should define the global variable shared_array
# I have also tried putting this function outside DoSomething, as a part of the class,
# with the same results
def initialize(base, State):
global shared_array
shared_array = np.ctypeslib.as_array(base.get_obj()).reshape(125, 100, 100) + State
base = mp.Array(ctypes.c_float, 125*100*100) # Create base array
state = np.random.rand(125,100,100) # Create seed
# Initialize pool of workers and perform calculations
with mp.Pool(processes = 10,
initializer = initialize,
initargs = (base, state,)) as pool:
run = partial(self.Compute,
shared_array = shared_array) # Here the error says that shared_array is not defined
pool.map(run, np.arange(125))
pool.close()
pool.join()
print(shared_array)
if __name__ == '__main__':
Trials = Trials()
Trials.DoSomething()
The trouble I am encountering is that when I define the partial function, I get the following error:
NameError: name 'shared_array' is not defined
For what I understand, I think that means that I cannot access the global variable shared_array. I'm sure that the initialize function is executing, as putting a print statement inside of it gives back a result in the terminal.
What am I doing incorrectly, is there any way to solve this issue?
I'm working on my first ever REST API, so apologies in advance if I've missed something basic. I have a function that takes a JSON request from another server, processes it (makes a prediction based on the data), and returns another JSON with the results. I'd like to keep a log on the server's local disk of all requests to this endpoint along with their results, for evaluation purposes and for retraining the model. However, for the purposes of minimising the latency of returning the result to the user, I'd like to return the response data first, and then write it to the local disk. It's not obvious to me how to do this properly, as the FastAPI paradigm necessitates that the result of a POST method is the return value of the decorated function, so anything I want to do with the data has to be done before it is returned.
Below is a minimal working example of what I think is my closest attempt at getting it right so far, using a custom object with a log decorator - my idea was just to assign the result to the log object as a class attribute, then use another method to write it to disk, but I can't figure out how to make sure that that function gets called after get_data every time.
import json
import uvicorn
from fastapi import FastAPI, Request
from functools import wraps
from pydantic import BaseModel
class Blob(BaseModel):
id: int
x: float
def crunch_numbers(data: Blob) -> dict:
# does some stuff
return {'foo': 'bar'}
class PostResponseLogger:
def __init__(self) -> None:
self.post_result = None
def log(self, func, *args, **kwargs):
#wraps(func)
def func_to_log(*args, **kwargs):
post_result = func(*args, **kwargs)
self.post_result = post_result
# how can this be done outside of this function ???
self.write_data()
return post_result
return func_to_log
def write_data(self):
if self.post_result:
with open('output.json', 'w') as f:
json.dump(self.post_result, f)
def main():
app = FastAPI()
logger = PostResponseLogger()
#app.post('/get_data/')
#logger.log
def get_data(input_json: dict, request: Request):
result = crunch_numbers(input_json)
return result
uvicorn.run(app=app)
if __name__ == '__main__':
main()
Basically, my question boils down to: "is there a way, in the PostResponseLogger class, to automatically call self.write_data after every call to self.log?", but if I'm using the wrong approach altogether, any other suggestions are also welcome.
You could have a Background Task for that purpose. A background task "will run only once the response has been sent" (see Starlette documentation). "This is useful for operations that need to happen after a request, but that the client doesn't really have to be waiting for the operation to complete before receiving the response" (see FastAPI documentation).
You can define a task function to run in the background for writing the log data, as shown below:
def write_log_data():
logger.write_data()
Then, import BackgroundTasks and define a parameter in your endpoint with a type declaration of BackgroundTasks. Inside of your endpoint, pass your task function (i.e., write_log_data, as defined above) to the background_tasks object with the method .add_task():
from fastapi import BackgroundTasks
#app.post('/get_data/')
#logger.log
def get_data(input_json: dict, request: Request, background_tasks: BackgroundTasks):
result = crunch_numbers(input_json)
background_tasks.add_task(write_log_data)
return result
The same principle could be applied if a middleware was used to capture and log the response data, as described in this answer, or a custom APIRoute class, as demonstrated in this answer.
For future reference, if you (or anyone) ever need to use async/await syntax, and run into concurrency issues (such as the event loop getting blocked) while performing some heavy background computation, please have a look at this answer, which explains the difference between defining an endpoint or a background task function with async def and def (briefly, async def endpoints/background tasks will run in the event loop, whereas def functions will run in an external threadpool that is then awaited), as well as provides solutions when it comes to running blocking I/O-bound or CPU-bound operations in such functions.
This is the problem I am having. I wont share code because of condfidentiality but instead I will provide some dummy example.
Assume that we have a class as follows:
class SayHello:
def __init__(self, name, id):
self.name=name
self.id=id
#public func
def doSomething(self, arg1, arg2 ):
DoAHugeTaskWithArgument
Lets say now that in an other modules we have this:
class CallOperations:
def __init__(self):
self.dummydict={1: {"james":20, "peter":30, "victor":40, "john":45, "ali":21, "tom":41, "hector":37}, 2:{"james":23, "peter":31, "victor":44, "john":46, "ali":23, "tom":44, "hector":35} }
def runProcessors(self):
#runprocess
for _, v in self.dummydict.items():
Instances = [SayHello(g,b) for g ,b in v.items()]
with ProcessPoolExecutor(max_workers=2) as executor:
future = [executor.submit(ins.doSomething, 2, 1235) for ins in Instances]
So the problem starts here. I want to know what instances are running doSomething() funtion in their respective process. I want to set a variable = 1 when the function of that instance is running in the process and set it to zero when it is completed.
Each instance has its own name and id. Is there way to find out the name of the running instance in the process?
This problem is making me very confused and can not find a proper solution.
Thank you alot.
If I understand your question correctly, you want to know when an instance of SayHello is executing and when it is not. You can set a variable (1 or 0) by using a Manager - but the usefulness of this can be debated. You might want to use a lock instead.
I had to tweak your code a bit but this is a running example. It picks one of your tasks as the one to monitor in the while loop. It is a dummy loop that never exits but you'll get the idea. It will keep polling the variable of one of your instances and you can see it change when that task is running, and then revert back to zero.
from time import sleep
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager
class SayHello:
def __init__(self, name, id):
self.name=name
self.id=id
self.status = Manager().Value("i",0)
#public func
def doSomething(self, arg1, arg2 ):
self.status.value = 1
sleep(5)
self.status.value = 0
class CallOperations:
def __init__(self):
self.dummydict={1: {"james":20, "peter":30, "victor":40, "john":45, "ali":21, "tom":41, "hector":37}, 2:{"james":23, "peter":31, "victor":44, "john":46, "ali":23, "tom":44, "hector":35} }
def runProcessors(self):
#runprocess
for _, v in self.dummydict.items():
Instances = [SayHello(g,b) for g ,b in v.items()]
f = Instances[3]
executor = ProcessPoolExecutor(max_workers=2)
future = [executor.submit(ins.doSomething, 2, 1235) for ins in Instances]
while True:
print(f.status.value)
# Insert break condition here
sleep(0.5)
executor.shutdown()
foo = CallOperations()
foo.runProcessors()
The problem with this is that it can lead to a race condition depending on what you do in your main program. If you want to do any operations on the instance when it is in passive state, it might progress to active just after you check the variable but before you have completed your actions in the main program.
Locks come to rescue here, as you can also create a shared lock Manager().Lock(). If your DoSomething() tries to acquire the lock and your main process does the same when operating on a passive instance, you avoid this problem. Of course your main program could then block the executor from processing if it reserves locks for lengthy operations, as then your two workers would be stuck waiting on locks if the execution processed to those instances where locks are being held by the main program. This case would not be suitable for parallel processing implemented using executors.
EDIT: if you are only interested in the running status, you can check the Future.running() status of your future objects, in this case items in your future array.
I am trying to flow through tasks pragmattically using custom viewswhich will be controlled by an api. I have figured out how to flow through a process if i pass in the a task object to a flow_function
task = Task.objects.get(id=task_id)
response = hello_world_approve(task, json=json, user=user)
#flow_func
def hello_world_approve(activation, **kwargs):
activation.process.approved = True
activation.process.save()
activation.prepare()
activation.done()
return activation
However i would like to be able to get the current task from the process object instead, like so
process = HelloWorldFlow.process_class.objects.get(id=task_id)
task = process.get_current_task()
Is this the way i should be going about this and is it possible or is there another approach i am missing?
It's available as activation.task
I have a loading widget that consists of two labels, one is the status label and the other one is the label that the animated gif will be shown in. If I call show() method before heavy stuff gets processed, the gif at the loading widget doesn't update itself at all. There's nothing wrong with the gif btw(looping problems etc.). The main code(caller) looks like this:
self.loadingwidget = LoadingWidgetForm()
self.setCentralWidget(self.loadingwidget)
self.loadingwidget.show()
...
...
heavy stuff
...
...
self.loadingwidget.hide()
The widget class:
class LoadingWidgetForm(QWidget, LoadingWidget):
def __init__(self, parent=None):
super().__init__(parent=parent)
self.setupUi(self)
self.setWindowFlags(self.windowFlags() | Qt.FramelessWindowHint)
self.setAttribute(Qt.WA_TranslucentBackground)
pince_directory = SysUtils.get_current_script_directory() # returns current working directory
self.movie = QMovie(pince_directory + "/media/loading_widget_gondola.gif", QByteArray())
self.label_Animated.setMovie(self.movie)
self.movie.setScaledSize(QSize(50, 50))
self.movie.setCacheMode(QMovie.CacheAll)
self.movie.setSpeed(100)
self.movie.start()
self.not_finished=True
self.update_thread = Thread(target=self.update_widget)
self.update_thread.daemon = True
def showEvent(self, QShowEvent):
QApplication.processEvents()
self.update_thread.start()
def hideEvent(self, QHideEvent):
self.not_finished = False
def update_widget(self):
while self.not_finished:
QApplication.processEvents()
As you see I tried to create a seperate thread to avoid workload but it didn't make any difference. Then I tried my luck with the QThread class by overriding the run() method but it also didn't work. But executing QApplication.processEvents() method inside of the heavy stuff works well. I also think I shouldn't be using seperate threads, I feel like there should be a more elegant way to do this. The widget looks like this btw:
Processing...
Full version of the gif:
Thanks in advance! Have a good day.
Edit: I can't move the heavy stuff to a different thread due to bugs in pexpect. Pexpect's spawn() method requires spawned object and any operations related with the spawned object to be in the same thread. I don't want to change the working flow of the whole program
In order to update GUI animations, the main Qt loop (located in the main GUI thread) has to be running and processing events. The Qt event loop can only process a single event at a time, however because handling these events typically takes a very short time control is returned rapidly to the loop. This allows the GUI updates (repaints, including animation etc.) to appear smooth.
A common example is having a button to initiate loading of a file. The button press creates an event which is handled, and passed off to your code (either via events directly, or via signals). Now the main thread is in your long-running code, and the event loop is stalled — and will stay stalled until the long-running job (e.g. file load) is complete.
You're correct that you can solve this with threads, but you've gone about it backwards. You want to put your long-running code in a thread (not your call to processEvents). In fact, calling (or interacting with) the GUI from another thread is a recipe for a crash.
The simplest way to work with threads is to use QRunner and QThreadPool. This allows for multiple execution threads. The following wall of code gives you a custom Worker class that makes it simple to handle this. I normally put this in a file threads.py to keep it out of the way:
import sys
from PyQt5.QtCore import QObject, QRunnable
class WorkerSignals(QObject):
'''
Defines the signals available from a running worker thread.
error
`tuple` (exctype, value, traceback.format_exc() )
result
`dict` data returned from processing
'''
finished = pyqtSignal()
error = pyqtSignal(tuple)
result = pyqtSignal(dict)
class Worker(QRunnable):
'''
Worker thread
Inherits from QRunnable to handler worker thread setup, signals and wrap-up.
:param callback: The function callback to run on this worker thread. Supplied args and
kwargs will be passed through to the runner.
:type callback: function
:param args: Arguments to pass to the callback function
:param kwargs: Keywords to pass to the callback function
'''
def __init__(self, fn, *args, **kwargs):
super(Worker, self).__init__()
# Store constructor arguments (re-used for processing)
self.fn = fn
self.args = args
self.kwargs = kwargs
self.signals = WorkerSignals()
#pyqtSlot()
def run(self):
'''
Initialise the runner function with passed args, kwargs.
'''
# Retrieve args/kwargs here; and fire processing using them
try:
result = self.fn(*self.args, **self.kwargs)
except:
traceback.print_exc()
exctype, value = sys.exc_info()[:2]
self.signals.error.emit((exctype, value, traceback.format_exc()))
else:
self.signals.result.emit(result) # Return the result of the processing
finally:
self.signals.finished.emit() # Done
To use the above, you need a QThreadPool to handle the threads. You only need to create this once, for example during application initialisation.
threadpool = QThreadPool()
Now, create a worker by passing in the Python function to execute:
from .threads import Worker # our custom worker Class
worker = Worker(fn=<Python function>) # create a Worker object
Now attach signals to get back the result, or be notified of an error:
worker.signals.error.connect(<Python function error handler>)
worker.signals.result.connect(<Python function result handler>)
Then, to execute this Worker, you can just pass it to the QThreadPool.
threadpool.start(worker)
Everything will take care of itself, with the result of the work returned to the connected signal... and the main GUI loop will be free to do it's thing!