Designing a system to centrally manage a series of events on different systems - python-3.x

I have a problem at work where I need to perform a series of sequential tasks on different devices. The devices do not need to interact with each other, and each task can be performed on each device independently.
Assume I have tasks (A->B->C->D) (e.g. the end of A triggers B, the end of B triggers C, and so on), and devices (dev1, dev2) that can execute these tasks independently of each other.
How can I design a centralized system that executes each task on each device? I cannot use threading or multiprocessing due to infra limitations.
I'm looking for design suggestions (classes) and how to go about structuring it.
The first approach I thought of was brute force, where I blindly loop over the devices and perform each task on each one.
For the second approach I was reading about the State design pattern, but I was not sure how to implement it.
EDIT: I have implemented the answer I provided below. However, I would like to know the correct way to transfer information between states. I know states need to be mutually exclusive, but each task needs to access certain resources that are common to all of the states. How can I structure this?

I have used the State design pattern to handle this. I have a concrete Device class with a method called perform_task. This method changes behavior based on the state the device is in; at any given point it can be in TaskA, TaskB, etc.
class Device:
    _state = None

    def __init__(self):
        """Constructor method"""
        self.switch_to(TaskA())

    def switch_to(self, state):
        self._state = state
        self._state.context = self

    def perform_task(self):
        self._state.perform_task()
Then I have an abstract State class with abstract methods, followed by the concrete state classes themselves.
from abc import ABC, abstractmethod

class State(ABC):
    @property
    def context(self):
        return self._context

    @context.setter
    def context(self, context):
        self._context = context

    @abstractmethod
    def perform_task(self):
        pass

class TaskA(State):
    def perform_task(self):
        # Do something, then move on to the next task.
        self.context.switch_to(TaskB())

class TaskB(State):
    def perform_task(self):
        # Do something.
        pass
This way we can extend it to any number of states in the future and handle new conditions too.
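Regarding the EDIT about transferring information between states: one common approach (a sketch, not part of the original answer) is to keep the shared resources on the Device context itself, so each state reads and writes through self.context rather than holding its own copy. The shared dict and the 'a_result' key below are invented for illustration:

class Device:
    def __init__(self):
        self.shared = {}            # resources common to all states live here
        self.switch_to(TaskA())
    # switch_to / perform_task as above ...

class TaskA(State):
    def perform_task(self):
        # Leave results on the shared context for later states.
        self.context.shared['a_result'] = 'output of A'
        self.context.switch_to(TaskB())

class TaskB(State):
    def perform_task(self):
        # Pick up what TaskA left behind.
        print(self.context.shared['a_result'])

This keeps the states themselves stateless; everything they need to hand off lives on the device.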

I would probably try something with Flask for a super simple API, plus a client app on the devices that polls the central API for data and posts results back, so the central server knows the progress and what is currently running. The client app would be a super simple loop with a sleep in it, so it won't sit at 100% CPU unnecessarily.
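A minimal sketch of that idea (the endpoint names and the task bookkeeping are made up for illustration):

# server.py - the central API that hands out tasks and collects results
from flask import Flask, jsonify, request

app = Flask(__name__)
TASKS = {"dev1": ["A", "B", "C", "D"], "dev2": ["A", "B", "C", "D"]}
RESULTS = {}

@app.route("/next-task/<device>")
def next_task(device):
    pending = TASKS.get(device, [])
    return jsonify({"task": pending[0] if pending else None})

@app.route("/result/<device>", methods=["POST"])
def post_result(device):
    RESULTS.setdefault(device, []).append(request.get_json())
    TASKS[device].pop(0)          # task finished, move on to the next one
    return jsonify({"ok": True})

# client.py - the polling loop running on each device
import time
import requests

DEVICE = "dev1"
API = "http://central-server:5000"

while True:
    task = requests.get(f"{API}/next-task/{DEVICE}").json()["task"]
    if task:
        result = f"ran {task}"    # do the real work here
        requests.post(f"{API}/result/{DEVICE}", json={"task": task, "result": result})
    else:
        time.sleep(5)             # nothing to do; sleep so we don't spin the CPU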

Related

How do I retrieve data from a Django DB before sending off Celery task to remote worker?

I have a celery shared_task that is scheduled to run at certain intervals. Every time this task runs, it first needs to retrieve data from the Django DB in order to complete the calculation. This task may or may not be sent to a celery worker on a separate machine, so inside the celery task I can't make any queries to a local Django database.
So far I have tried using signals to accomplish this, since I know that functions decorated with @before_task_publish are executed before the task is even published to the message queue. However, I don't know how to actually get the data to the task.
from celery import shared_task
from celery.signals import before_task_publish

@shared_task
def run_computation(data):
    perform_computation(data)

@before_task_publish.connect
def receiver_before_task_publish(sender=None, headers=None, body=None, **kwargs):
    data = create_data()
    # How do I get the data to the task from here?
Is this the right way to approach this in the first place? Or would I be better off making an API route that the celery task can call to retrieve the data?
I'm posting the solution that worked for me; thanks for the help, @solarissmoke.
What works best for me is utilizing Celery "chain" callbacks and separate RabbitMQ queues to designate what is computed locally and what is computed on the remote worker.
My solution looks something like this:
from celery import signature

@app.task
def create_data_task():
    # This creates the data to be passed to the analysis function.
    return create_data()

@app.task
def perform_computation_task(data):
    # This performs the computation with the given data.
    return perform_computation(data)

@app.task
def store_results(result):
    # This would store the result in the DB; for now we just print it.
    print(result)

@app.task
def run_all_computation():
    task = (
        signature("path.to.tasks.create_data_task", queue="default")
        | signature("path.to.tasks.perform_computation_task", queue="remote_computation")
        | signature("path.to.tasks.store_results", queue="default")
    )
    task()
It's important to note that these tasks are not run serially; they are separate tasks executed by the workers and therefore do not block a single thread. Each subsequent task is only triggered as a callback of the previous one. I declared two celery queues in RabbitMQ: a default one called "default", and one specifically for remote computation called "remote_computation". How to point celery workers at newly created queues is described explicitly here.
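For completeness, a sketch of how those two queues might be declared in the Celery configuration, and how workers could be pointed at them (the module path is the same placeholder used above):

from kombu import Queue

app.conf.task_queues = (
    Queue("default"),
    Queue("remote_computation"),
)
app.conf.task_default_queue = "default"

# Then run one worker per queue, e.g.:
#   celery -A path.to.tasks worker -Q default
#   celery -A path.to.tasks worker -Q remote_computation   (on the remote machine)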
It is possible to modify the task data in place from the before_task_publish handler so that it gets passed to the task. I will say upfront that there are many reasons why this is not a good idea:
from celery.signals import before_task_publish

@before_task_publish.connect
def receiver_before_task_publish(sender=None, headers=None, body=None, **kwargs):
    data = create_data()
    # Modify the body of the task message.
    # body is a tuple, the first entry of which is a tuple of positional
    # arguments to the task; we replace the first argument (data) with our own.
    body[0][0] = data
    # Alternatively modify the kwargs, which is a bit more explicit:
    body[1]['data'] = data
This works, but it should be obvious why it's risky and prone to breakage. Assuming you have control over the task call sites, I think it would be better to drop the signal altogether and just have a simple function that does the work for you, i.e.:
def create_task():
    data = create_data()
    run_computation.delay(data)
And then in your calling code, just call create_task() instead of calling the task directly (which is presumably what you do right now).

Asynchronous communication between a few 'loops'

I have 3 classes that represent nearly isolated processes which can run concurrently (they are meant to be persistent, like 3 main() loops).
class DataProcess:
    ...
    def runOnce(self):
        ...

class ComputeProcess:
    ...
    def runOnce(self):
        ...

class OtherProcess:
    ...
    def runOnce(self):
        ...
Here's the pattern I'm trying to achieve:
start various streams
start each process
allow each process to publish to any stream
allow each process to listen to any stream (at various points in its loop) and behave accordingly (allowing its current task to be interrupted or not, etc.)
For example, one 'process' listens for external data. Another process does computation on some of that data. The computation process might be busy for a while, so by the time it comes back and checks the stream, many values may have piled up. I don't want to just use a queue, because I don't actually want to be forced to process each item in order. I'd rather be able to implement logic like: "if there are one or more things waiting, just run your process one more time; otherwise go do this interruptible task while you wait for something to show up."
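A sketch of that "drain if anything is waiting, otherwise do interruptible work" logic with the standard-library queue module (the two called functions are hypothetical placeholders):

import queue

inbox = queue.Queue()

def compute_loop():
    while True:
        if not inbox.empty():
            # One or more values piled up: drain them all and run once.
            pending = []
            while True:
                try:
                    pending.append(inbox.get_nowait())
                except queue.Empty:
                    break
            run_computation(pending)       # hypothetical heavy step
        else:
            do_interruptible_task()        # hypothetical background work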
That's a lot, right? So I was thinking of using an actor model until I discovered RxPy. I saw that a stream is like a Subject:
from reactivex.subject import BehaviorSubject

newData = BehaviorSubject(None)    # BehaviorSubject requires an initial value
newModel = BehaviorSubject(None)
Then I thought I'd start 3 threads, one for each of my high-level processes:
import threading

threads = {
    'data': threading.Thread(target=data),
    'compute': threading.Thread(target=compute),
    'other': threading.Thread(target=other),
}

for thread in threads.values():
    thread.start()
and I thought the functions of those threads should listen to the streams:
def data():
    while True:
        DataProcess().runOnce()    # publishes to stream inside process

def compute():
    def run(value):
        ComputeProcess().runOnce()
    newData.subscribe(run)         # pass the callback, don't call it
    newModel.subscribe(run)

def other():
    ''' not done '''
    OtherProcess().runOnce()
OK, so that's what I have so far. Will this pattern give me what I'm looking for?
Should I use threading in conjunction with RxPy, or just use RxPy's scheduler facilities to achieve concurrency? If so, how?
I hope this question isn't too vague. I suppose I'm looking for the simplest framework in which I can have a small number of computational-memory units (like objects, since they have internal state) that communicate with each other and work in parallel (or concurrently). At the highest level I want to treat these computational-memory units (which I've called processes above) as individuals who mostly work on their own stuff but occasionally broadcast or send a message to a specific other individual, requesting or providing information.
Am I perhaps actually looking for an actor-model framework? Or is this RxPy setup versatile enough to achieve that without extreme complexity?
Thanks so much!

Why don't some widgets update on Qt5?

I am trying to create a PyQt5 application, where I use certain labels to display status variables. To update them, I have implemented a custom pyqtSignal manually. However, while debugging I find that the value of the GUI QLabel has changed, but the new value doesn't get reflected on the main window.
Some answers suggested calling QApplication().processEvents() occasionally. However, this instantaneously crashes the application, and it also freezes the application.
Here's a code sample (all required libraries are imported; this is just the part causing the problem, as the actual code is huge):
from multiprocessing import Process

def sub(signal):
    i = 0
    while True:
        if i % 5 == 0:
            signal.update(i)
        i += 1

class CustomSignal(QObject):
    signal = pyqtSignal(int)

    def update(self, value):
        self.signal.emit(value)

class MainApp(QWidget):
    def __init__(self):
        super().__init__()
        self.label = QLabel("0")
        self.customSignal = CustomSignal()
        self.subp = Process(target=sub, args=(self.customSignal,))
        self.subp.start()
        self.customSignal.signal.connect(self.updateValue)

    def updateValue(self, value):
        print("old value", self.label.text())
        self.label.setText(str(value))
        print("new value", self.label.text())
The output of the print statements is as expected. However, the text in label does not change.
The update function in CustomSignal is called by some thread.
I've applied the same method to update a progress bar, which works fine.
Is there any other fix for this, other than processEvents()?
The OS is Ubuntu 16.04.
The key problem lies in the very concept behind the code.
Processes have their own address space and don't share data with other processes unless some inter-process communication mechanism is used. Presumably the multiprocessing module was used instead of the threading module to get concurrency around Python's GIL and speed up the program. However, a subprocess cannot access the data of the parent process.
I have tested two solutions for this case, and they seem to work.
threading module: even though threading in Python is limited by the GIL, it is still sufficient to some extent for basic concurrency demands. Note the difference between concurrency and speedup.
QThread: since you are using PyQt, there isn't any issue in using QThread, which is the better option here because it uses the operating system's native threads rather than going through Python in the middle.
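A minimal sketch of the QThread variant, reusing the counter logic from the question (the msleep is just to keep the example from spinning):

from PyQt5.QtCore import QThread, pyqtSignal

class Worker(QThread):
    valueChanged = pyqtSignal(int)

    def run(self):
        i = 0
        while not self.isInterruptionRequested():
            if i % 5 == 0:
                self.valueChanged.emit(i)   # emits across threads safely
            i += 1
            self.msleep(100)

# in MainApp.__init__, instead of the Process:
#     self.worker = Worker()
#     self.worker.valueChanged.connect(self.updateValue)
#     self.worker.start()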
Try adding
self.label.repaint()
immediately after updating the text, like this:
self.label.setText(str(value))
self.label.repaint()

Should observers each be notified in a separate thread?

I know it sounds heavyweight, but I'm trying to solve a hypothetical situation. Imagine you have N observers of some object, each one interested in the object's state. When applying the Observer pattern, the observable object tends to iterate through its observer list, invoking each observer's notify()/update() method.
Now imagine that a specific observer has a lot of work to do with the state of the observable object. That will slow down the notification of the remaining observers, for example.
So, in order to avoid slowing down notifications to all observers, one thing we can do is notify each observer in a separate thread. For that to work, I suppose a thread per observer is needed. That is a painful overhead to accept just to avoid the notification slow-down caused by heavy work. And worse than a slow-down, the thread approach can leave us with dead threads caused by infinite loops. It would be great to read what experienced programmers have to say about this one.
What do people with years of design experience think?
Is this a problem without a substantial solution?
Is it a really bad idea? Why?
Example
This is a vague example, which I haven't even tested, in order to demonstrate and hopefully clarify the basic idea:
from queue import Queue
from threading import Thread

class Observable(object):
    def __init__(self):
        self.queues = {}

    def addObserver(self, observer):
        if observer not in self.queues:
            self.queues[observer] = Queue()
            ot = ObserverThread(observer, self.queues[observer])
            ot.start()

    def removeObserver(self, observer):
        if observer in self.queues:
            self.queues[observer].put('die')
            del self.queues[observer]

    def notifyObservers(self, state):
        for queue in self.queues.values():
            queue.put(state)

class ObserverThread(Thread):
    def __init__(self, observer, queue):
        Thread.__init__(self)
        self.observer = observer
        self.queue = queue

    def run(self):
        running = True
        while running:
            state = self.queue.get()
            if state == 'die':
                running = False
            else:
                self.observer.stateChanged(state)
You're on the right track.
It is common for each observer to own its own input queue and its own message-handling thread (or better: the queue owns the thread, and the observer owns the queue). See the Active Object pattern.
There are some pitfalls however:
If you have 100s or 1000s of observers, you may need to use a thread pool pattern.
Note that you lose control over the order in which events are processed (which observer handles the event first). This may be a non-issue, or it may open a Pandora's box of very hard-to-detect bugs. It depends on your specific application.
You may have to deal with situations where observers are deleted before notifiers. This can be somewhat tricky to handle correctly.
You'll need to implement messages instead of calling functions. Message generation may require more resources, as you may need to allocate memory, copy objects, etc. You may even want to optimize by implementing a message pool for common message types (and you may as well implement a message factory that wraps such pools).
To further optimize, you'll probably want to generate one message and send it to all observers (instead of generating many copies of the same message). You may need some reference-counting mechanism for your messages.
Let each observer decide for itself whether its reaction is heavyweight, and if so, start a thread or submit a task to a thread pool. Performing the notification in a single separate thread is not a good solution: while it frees the observable object, it limits the processing power available for notifications to that one thread. If you do not trust your observers, create a thread pool and submit each notification to it as a task (see the sketch below).
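A sketch of that pool-based variant using concurrent.futures (the stateChanged method follows the naming of the example above; the class name is invented):

from concurrent.futures import ThreadPoolExecutor

class PooledObservable:
    def __init__(self, max_workers=4):
        self.observers = []
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def addObserver(self, observer):
        self.observers.append(observer)

    def notifyObservers(self, state):
        # Each notification becomes a pool task, so one slow observer
        # cannot hold up the others (beyond the pool's capacity).
        for observer in self.observers:
            self.pool.submit(observer.stateChanged, state)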
In my opinion, when you have a large number of observers for an observable that involves heavy processing, the best thing to do is to have a notify() method on the Observer.
Use of notify(): it just sets a dirty flag in the observer to true. Then, whenever the observer thread finds it appropriate, it queries the observable for the required updates.
This does not require heavy processing on the observable's side, and it shifts the load to the observers' side.
Now it is up to the observers to decide when to observe.
The answer of @Pathai is valid in a lot of cases.
One is when you are observing changes in a database. In many cases you can't reconstruct the final state from the snapshots alone, especially if your state is fetched as a complex query from the database and the snapshot is an update to the database.
To implement it, I'd suggest using an Event object:
import threading

class Observer:
    def __init__(self):
        self.event = threading.Event()

# in the observer's loop:
while self.event.wait():
    # do something, then reset the flag
    self.event.clear()

# in the observable, to wake the observer:
observer.event.set()

Is there a way to use cherrypy's Monitor to perform a single task and then stop?

I have a web application that requests a report which takes more than 10 minutes to run. Apart from improving that performance, for now I would prefer to set up a thread that runs the report and mails it to the user, returning a message to that effect to the user immediately.
I have been looking at cherrypy.process.plugins.Monitor, but I'm not clear whether it is the correct choice (what should I do with the frequency parameter?).
Monitor is not the correct choice; it's for running the same task repeatedly on a schedule. You're probably better off just calling threading.Thread(target=run_report).start(). You can then return 202 Accepted to the user, along with a URL where the client can watch the status and/or retrieve the newly created report resource when it's ready.
The one caveat is that you might want your new thread to shut down gracefully when the cherrypy.engine stops. Have a look at the various plugins for examples of how to hook into the 'stop' channel on the bus. The other option would be to make your thread daemonic, if you don't care whether it terminates abnormally.
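A sketch of that thread-plus-202 approach (run_report and the response wording are placeholders):

import threading
import cherrypy

class ReportApp:
    @cherrypy.expose
    def report(self):
        # Fire off the long-running job and answer immediately.
        threading.Thread(target=run_report, daemon=True).start()
        cherrypy.response.status = 202
        return "Report is being generated; you will receive it by mail."

Making the thread daemonic matches the "don't care if it terminates abnormally" option above; a bus plugin would be the graceful alternative.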
Besides agreeing with fumanchu's answer, I would like to add that the frequency parameter of cherrypy.process.plugins.Monitor is actually the period, expressed in seconds (the name is misleading).
Another possible solution could be having a monitor executed periodically over a set of ongoing computations, each of which can be checked for completion. The code would be something like:
import threading
import cherrypy
from cherrypy.process.plugins import Monitor

class Scheduler:
    def __init__(self):
        self.lock = threading.Lock()
        self.computations = list()   # to which we append pending work
        self.mon = Monitor(cherrypy.engine, self.check_computations,
                           frequency=whatever)
        self.mon.start()

    def check_computations(self):
        with self.lock:
            for i in self.computations:
                check(i)   # single per-computation check function
Caveats:
The computation time of check matters: you don't want a heavy workload inside this periodic routine.
Beware of how you use the lock:
it is protecting the computations list;
if you access the list (even indirectly) from within check, your program gets into a deadlock. This could be the case if you want to unsubscribe something from the computations list.
