Managing different Python scripts with a different priority for each - python-3.x

I need to run different Python processes, in a certain order of priority.
Specifically, I have 3 processes, and I need them to work this way:
An object detection script, used to locate a person and their position. I need this one to run continuously at a high FPS;
another process that, once some conditions are met (when the person is present in the picture in the required position), starts taking screenshots of the image for a certain amount of time;
another script that analyzes the screenshots taken by the second one.
I wrote the 3 scripts already and they work fine, but the problem is that process 3 is particularly computationally demanding, and I don't want it to prevent processes 1 and 2 from running smoothly.
My idea is that I could give the highest priority to process 1 and send screenshots taken by process 2... to a queue, or something like this.
When the person is not detected in the picture, I could run process 3 and empty the queue as the screenshots are analyzed. However, script 3 should still run with limited resources, so that the FPS of script 1 isn't affected too much and it can still detect whether the person enters the picture again.
I'm afraid this might all be a little vague, but could you please suggest a way or tool I could use to manage the processes this way?
So far, I tried simply saving the screenshots to a folder, but I don't know how to limit the resource usage of process 3.
I'm familiar with the basic usage of Docker, so I was thinking that maybe I could:
run the processes in different containers, limiting resources allocated to the 3rd one (?);
use a message broker (Kafka, RabbitMQ?) to store screenshots;
but again, I'm a newbie when it comes to this stuff (speaking of which, I hope I tagged this question correctly), so I don't know if it's an efficient way to do this (or if it can be done this way, for that matter).
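To illustrate, something like this minimal sketch is what I have in mind, using only the standard library. The detector/analyzer functions, the payload, and the frame rate are just placeholders, and os.nice is Unix-only (psutil offers an equivalent on Windows):

import multiprocessing as mp
import os
import time

def detector(queue):
    # Process 1: runs continuously at default (higher) priority.
    while True:
        # ... run object detection; when the person is in position,
        # push a screenshot onto the queue for later analysis.
        queue.put(b"screenshot-bytes")  # placeholder payload
        time.sleep(0.03)  # illustrative ~30 FPS loop

def analyzer(queue):
    os.nice(19)  # lower this process's scheduling priority (Unix-only)
                 # so the heavy analysis yields CPU to the detector
    while True:
        screenshot = queue.get()  # blocks until a screenshot arrives
        # ... computationally demanding analysis here

if __name__ == "__main__":
    q = mp.Queue()
    mp.Process(target=analyzer, args=(q,), daemon=True).start()
    detector(q)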

Related

Sleep() Methods and OS - Scheduler (Camunda/Groovy)

I've got a question for you guys, and it's not as specific as usual, which could make it a little annoying to answer.
The tool I'm working with is Camunda, in combination with Groovy scripts, and the goal is to reduce the maximum CPU load (or peak load). I'm doing this by "stretching" the workload over a certain time frame, since the platform seems to be unhappy with huge workload inputs in a short amount of time. The resulting problem is that Camunda won't react smoothly when someone tries to operate it at the UI level.
So I wrote a small script which basically just lets each individual process determine its own "time to sleep" before running, if a certain threshold is exceeded. This is based on how many other processes are trying to run at the same time as the individual process.
It looks like:
Process wants to start -> Process asks how many other processes are running ->
waitingTime = numberOfProcesses * timeToSleep * iterationOfMeasures
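Expressed as runnable Python for clarity (the actual script is Groovy, and the constants here are hypothetical placeholders for the real configuration values):

import time

TIME_TO_SLEEP = 0.5        # hypothetical base delay per competing process
ITERATION_OF_MEASURES = 2  # hypothetical weighting factor
THRESHOLD = 4              # hypothetical contention threshold

def throttle(number_of_processes):
    # Each process delays its own start in proportion to how many
    # other processes are currently running, spreading the load
    # over time instead of letting it spike.
    if number_of_processes > THRESHOLD:
        time.sleep(number_of_processes * TIME_TO_SLEEP * ITERATION_OF_MEASURES)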
[Figure: CPU usage over time; curves 1 and 3 without the script, curves 2 and 4 with the script]
Testing it, I saw that I could stretch the workload and smooth out the UI lag. But now I need to describe exactly why this works.
The questions are:
What does a sleep method do, exactly?
What does the sleep method do at the CPU level?
How does an OS scheduler react to a sleep method?
Namely: does the scheduler reschedule, or does it simply "wait" for the given time?
How can I recreate and test the questions given above?
The main goal is not for you to answer these, but could you give me a hint for finding the right literature to answer them? Maybe you remember a book which helped you understand this kind of thing, or a professor recommended something to you. (Mine won't answer, and I can't blame him.)
I'm grateful for any hints and/or recommendations!
I'm sure you could use a timer event:
https://docs.camunda.org/manual/7.15/reference/bpmn20/events/timer-events/
It allows you to postpone the next task trigger for some time, defined by an expression.
About sleep in Java/Groovy: https://www.javamex.com/tutorials/threads/sleep.shtml
Using sleep blocks the current thread in Groovy/Java/Camunda, so instead of doing something useful, the thread is simply blocked.
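You can see the scheduler-level difference with a tiny experiment; this sketch is in Python, but the same distinction applies to Thread.sleep() in Java/Groovy. A sleeping thread is taken off the run queue and costs almost no CPU, while a busy-waiting thread stays runnable and keeps receiving time slices:

import threading
import time

def busy_wait(seconds):
    # Stays runnable the whole time: the scheduler keeps granting it
    # time slices, so one core sits near 100%.
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass

def sleeping_wait(seconds):
    # time.sleep() asks the OS to move the thread off the run queue;
    # it uses essentially no CPU until the timer fires and the
    # scheduler makes it runnable again.
    time.sleep(seconds)

threading.Thread(target=busy_wait, args=(5,)).start()
threading.Thread(target=sleeping_wait, args=(5,)).start()
# Watch CPU usage while this runs: only the busy-wait thread burns a core.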

Node.js - multiple workers on 1 CPU? (Cluster)

The situation is as following:
I have a node.js server with a script which takes pretty long before it finishes.
The script gets an ID, looks up in a database which pictures belong to this ID, and then caches the images; once all images are cached, it finishes.
Now the problem is that it's possible for 2 or more people to use this feature at the same time. And once multiple people try to get all their images, the images get mixed together: person A gets the pictures of persons A + B, and person B also gets the pictures of A + B.
Now I know that a worker requires 1 CPU. I edited this so I can have multiple workers on 1 CPU, but it only switches between workers when the CPU usage is really high.
I want it to switch workers when someone is already busy getting their images and someone else is also trying to get his/her images (which are different for every person).
How can this be done? Because the cluster only switches workers when the CPU usage is high. Or did I understand this incorrectly?
Clustering is not made for that.
You use clusters to avoid situations where one core is 100% busy while the other cores are barely doing anything.
You have a problem with improperly handling concurrent requests in your code, and clustering will not solve that. Even if you have a cluster of 1000 workers, there can still be a situation where you get 1001 requests and all bets are off.
Working with Node, you always have to take concurrency into account, because if you don't, you will not be able to use a simple solution like adding clustering to solve your problems.
You didn't show even a single line of code, so it's impossible to tell you what's wrong with it, but there is clearly a problem with improper request handling. Maybe you use global variables? Maybe you store some state in the wrong scope? The situation that you describe should never happen in any Node application, and the solution you're asking about would not solve it anyway. You need to fix your code.
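To make the suspected bug concrete, here is the anti-pattern in miniature. It is sketched in Python rather than Node for consistency with the rest of this page, and lookup_images is a hypothetical stand-in for the database call, but the scoping mistake is identical in JavaScript:

def lookup_images(user_id):
    # Hypothetical stand-in for the database lookup in the question.
    return [f"{user_id}-img-{i}" for i in range(3)]

pending_images = []  # module-level state shared by ALL requests

def handle_request_buggy(user_id):
    # Concurrent requests interleave their appends into the same list,
    # so person A ends up with A's and B's images.
    pending_images.extend(lookup_images(user_id))
    return pending_images

def handle_request_fixed(user_id):
    images = []  # fresh, request-local list
    images.extend(lookup_images(user_id))
    return images

print(handle_request_buggy("A") is handle_request_buggy("B"))  # True: same list!
print(handle_request_fixed("A"))  # only A's images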

Designing concurrency in a Python program

I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
Motivation
It seems like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing module makes me reconsider: there doesn't seem to be any simple way to share large data structures between processes. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple processes to access the same list/dict/etc. for both reading and writing at the same time? Could I just launch multiple instances of my star generator, give them access to the dict that holds all the stars, and have new objects appear to just pop into existence in the dict from the perspective of other processes? (That is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main process had put it there itself.)
If not, is there any practical way to allow multiple processes to read the same data structure at the same time, but feed their resultant data back to a main process to be rolled into that same data structure safely?
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always have to explicitly send data from one process to another, as I would with processes communicating over a TCP stream? I know there are objects that abstract away that sort of thing, but I'm asking if it can be done away with entirely; would the object each process is looking at actually be the same block of memory?
How flexible are the objects that the modules provide to abstract away the communication between processes? Can I use them as a drop-in replacement for data structures used in existing code and not notice any differences? If I do such a thing, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, then pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one...
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
Even though your problem seems very complicated, there is a very easy solution. You can hide away all the complicated stuff of sharing your objects across processes using a proxy.
The basic idea is that you create a manager that manages all the objects that should be shared across processes. This manager then creates its own process, where it waits for some other process to instruct it to change the object. But enough said; it looks like this:
import multiprocessing as m

def yourfunction(stars):
    stars["sol"] = "generated"  # changes travel through the proxy to the manager

if __name__ == "__main__":
    manager = m.Manager()
    starsdict = manager.dict()
    process = m.Process(target=yourfunction, args=(starsdict,))
    process.start()  # start(), not run(): run() would execute in this process
The object stored in starsdict is not the real dict; instead, it sends all the changes and requests you make on it to its manager. This is called a "proxy", and it has almost exactly the same API as the object it mimics. These proxies are pickleable, so you can pass them as arguments to functions in new processes (as shown above) or send them through queues.
You can read more about this in the documentation.
I don't know how proxies react if two processes access them simultaneously. Since they're made for parallelism, I'd guess they should be safe, even though I've heard they're not. It would be best if you test this yourself or look for it in the documentation.
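One way to run that test yourself: each individual proxy operation is handled by the manager one at a time, but a read-modify-write that spans two proxy calls is not atomic, as this sketch demonstrates (the counter name and the process count are arbitrary):

import multiprocessing as m

def incrementer(shared, n):
    for _ in range(n):
        shared["count"] = shared["count"] + 1  # two proxy calls: not atomic

if __name__ == "__main__":
    manager = m.Manager()
    shared = manager.dict(count=0)
    procs = [m.Process(target=incrementer, args=(shared, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(shared["count"])  # usually less than 4000: the updates race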

Waiting on many parallel shell commands with Perl

Concise-ish problem explanation:
I'd like to be able to run multiple (we'll say a few hundred) shell commands, each of which starts a long running process and blocks for hours or days with at most a line or two of output (this command is simply a job submission to a cluster). This blocking is helpful so I can know exactly when each finishes, because I'd like to investigate each result and possibly re-run each multiple times in case they fail. My program will act as a sort of controller for these programs.
for all commands in parallel {
    submit_job_and_wait()
    tries = 1
    while !job_was_successful and tries < 3 {
        resubmit_with_extra_memory_and_wait()
        tries++
    }
}
What I've tried/investigated:
So far I was thinking it would be best to create a thread for each submission which just blocks waiting for input. There is enough memory for quite a few waiting threads. But from what I've read, Perl threads are closer to duplicate processes than threads in other languages, so creating hundreds of them is not feasible (nor does it feel right).
There also seem to be a variety of event-loop-ish cooperative systems like AnyEvent and Coro, but these seem to require you to rely on asynchronous libraries; otherwise you can't really do anything concurrently. I can't figure out how to run multiple shell commands with them. I've tried using AnyEvent::Util::run_cmd, but after I submit multiple commands, I have to specify the order in which I want to wait for them. I don't know in advance how long each submission will take, so I can't recv without sometimes getting very unlucky. This isn't really parallel.
my $cv1 = run_cmd("qsub -sync y 'sleep $RANDOM'");
my $cv2 = run_cmd("qsub -sync y 'sleep $RANDOM'");
# Now should I $cv1->recv first or $cv2->recv? Who knows!
# Out of 100 submissions, I may have to wait on the longest one before processing any.
My understanding of AnyEvent and friends may be wrong, so please correct me if so. :)
The other option is to run the job submission in its non-blocking form and have it communicate its completion back to my process, but the inter-process communication required to accomplish and coordinate this across different machines daunts me a little. I'm hoping to find a local solution before resorting to that.
Is there a solution I've overlooked?
You could instead use scientific workflow software such as FireWorks or Pegasus, which are designed to help scientists submit large numbers of computing jobs to shared or dedicated resources. They can also do much more, so they might be overkill for your problem, but they are still worth a look.
If your goal is to find the tightest memory requirements for your job, you could also simply submit your job with a large amount of requested memory, and then extract the actual memory usage from accounting (qacct) or, cluster policy permitting, log on to the compute node(s) where your job is running and view the memory usage with top or ps.
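If you do want to stay with the controller approach from your pseudocode, the pattern maps naturally onto a pool of blocked threads. Here is a sketch in Python rather than Perl, assuming a qsub-style submission that blocks until the job finishes; the retry logic is deliberately simplified:

import subprocess
from concurrent.futures import ThreadPoolExecutor

def submit_with_retries(cmd, max_tries=3):
    # Each worker thread blocks on its own submission; blocked threads
    # are cheap, so a few hundred in a pool is fine.
    for attempt in range(1, max_tries + 1):
        if subprocess.run(cmd, shell=True).returncode == 0:
            return attempt
        # On failure, one would rewrite cmd here to request more memory
        # (cluster-specific, e.g. a qsub -l resource flag).
    return None

commands = ["qsub -sync y 'sleep %d'" % n for n in (10, 20, 30)]
with ThreadPoolExecutor(max_workers=len(commands)) as pool:
    for cmd, attempt in zip(commands, pool.map(submit_with_retries, commands)):
        print(cmd, "->", "failed" if attempt is None else "ok on try %d" % attempt)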

wxpython using gauge pulse with threaded long running processes

The program I am developing uses threads to deal with long-running processes. I want to be able to use Gauge Pulse to show the user that, whilst a long-running thread is in progress, something is actually taking place. Otherwise, visually nothing will happen for quite some time when processing large files, and the user might think that the program is doing nothing.
I have placed a gauge within the status bar of the program. My problem is this: I am having trouble calling gauge Pulse. No matter where I place the code, it either runs too fast and then halts, or runs at the correct speed for a few seconds and then halts.
I've tried placing the one line of code below into the thread itself. I have also tried creating another thread from within the long-running process thread to call the code below. I still get the same sort of problems.
I do not think that I could use wx.CallAfter, as this would defeat the point: Pulse needs to be called whilst the process is running, not after the fact. I also tried using time.sleep(2), which is not good either, as it slows the process down, which is something I want to avoid. Even with time.sleep(2) I still had the same problems.
Any help would be massively appreciated!
progress_bar.Pulse()
You will need to find some way to send update requests to the main GUI from your thread during the long-running process. For example, if you were downloading a very large file using a thread, you would download it in chunks and, after each chunk completes, send an update to the GUI.
If you are running something that doesn't really allow chunks, such as creating a large PDF with fop, then I suppose you could use a wx.Timer() that just tells the gauge to pulse every so often. Then, when the thread finishes, it would send a message to stop the timer object from updating the gauge.
The former is best for showing progress, while the latter works if you just want to show the user that your app is doing something. See also:
http://wiki.wxpython.org/LongRunningTasks
http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
http://www.blog.pythonlibrary.org/2013/09/04/wxpython-how-to-update-a-progress-bar-from-a-thread/
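A minimal sketch of the timer approach (the frame class and the 5-second task are hypothetical placeholders for your real work; the key points are that the timer pulses the gauge on the GUI thread, and the worker only ever touches wx objects via wx.CallAfter):

import threading
import time
import wx

class PulseFrame(wx.Frame):
    def __init__(self):
        super().__init__(None, title="Long-running task")
        self.gauge = wx.Gauge(self)
        self.timer = wx.Timer(self)
        self.Bind(wx.EVT_TIMER, lambda event: self.gauge.Pulse(), self.timer)
        self.timer.Start(100)  # pulse every 100 ms on the GUI thread
        threading.Thread(target=self.long_task, daemon=True).start()

    def long_task(self):
        time.sleep(5)  # stand-in for the real long-running work
        # Only the GUI thread may touch wx objects, so stop the timer
        # via wx.CallAfter instead of calling it here directly.
        wx.CallAfter(self.timer.Stop)

app = wx.App()
PulseFrame().Show()
app.MainLoop()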
