I am new to Python. I have been trying to develop a GUI-based tool to monitor a set of databases. I want to pull data with multiple threads to make the DB reads faster. I found that threads can be managed with the threading module, with concurrent.futures, or with a queue. In my tool there will be frequent DB reads, and the GUI will be updated accordingly. My question is: which is the best option for threading here, and how do I manage the life cycle of the threads?
I tried a few examples from different websites, with the following results.
Threads created with the threading module update the GUI nicely, but I don't know how to manage 30 of them.
Threads created with concurrent.futures.ThreadPoolExecutor are managed by the executor, but the GUI only updates after all the threads have completed their tasks.
The thing with Python threading is that there isn't really a proper way to stop a thread from the outside; you have to ask it to stop itself. I am guessing you're using threading or _thread.
What I would do is create a list of flags and have each task check its own index in that list: process ID = item in the list, so thread 0 keeps running while item 0 of the list "running" is True, and so on.
Here is an example using _thread:
import _thread
import time

running = []

def task(id):
    # keep working while this task's flag is True
    while running[id]:
        # do something useful here
        time.sleep(1)

# Create 5 tasks
for i in range(5):
    running.append(True)
    _thread.start_new_thread(task, (i,))

# Now let's stop tasks 2 and 4 (indices 1 and 3).
running[1] = False
running[3] = False
# The threads will end once the current pass through the while loop has finished.

# To restart tasks 2 and 4, reset their flags first, then start new threads.
running[1] = True
running[3] = True
_thread.start_new_thread(task, (1,))
_thread.start_new_thread(task, (3,))
This is my rudimentary way of managing tasks.
It may or may not work for you.
I am not a professional. But it works.
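A more idiomatic variant of the same idea (my addition, not part of the original answer) uses threading.Thread with one threading.Event per task, which avoids the low-level _thread API and lets you join() a task after you stop it:

import threading
import time

def task(stop_event):
    # keep working until this task is asked to stop
    while not stop_event.is_set():
        # do something useful here
        time.sleep(1)

# create 5 tasks, each with its own stop flag
events = [threading.Event() for _ in range(5)]
threads = [threading.Thread(target=task, args=(e,), daemon=True) for e in events]
for t in threads:
    t.start()

# stop tasks 2 and 4 (indices 1 and 3) and wait for them to finish
for i in (1, 3):
    events[i].set()
    threads[i].join()

# to restart them, create fresh Event/Thread pairs (a finished thread cannot be restarted)
for i in (1, 3):
    events[i] = threading.Event()
    threads[i] = threading.Thread(target=task, args=(events[i],), daemon=True)
    threads[i].start()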
I would like to cache a large amount of data in a Flask application. Currently it runs on K8S pods with the following gunicorn.ini:
bind = "0.0.0.0:5000"
workers = 10
timeout = 900
preload_app = True
To avoid caching the same data in those 10 workers, I would like to know if Python supports a way to multi-thread instead of multi-process. This would be very easy in Java, but I am not sure if it is possible in Python. I know that you can share a cache between Python instances using the file system or other methods, but it would be a lot simpler if it were all shared in the same process space.
Edit:
There are a couple of posts suggesting that threads are supported in Python: this comment by Filipe Correia, and this answer in the same question.
Based on the above comment, the Gunicorn design document talks about workers and threads:
Since Gunicorn 19, a threads option can be used to process requests in multiple threads. Using threads assumes use of the gthread worker.
Based on how Java works, to share some data among threads, I would need one worker and multiple threads. Based on this other link I know it is possible. So I assume I can change my gunicorn configuration as follows:
bind = "0.0.0.0:5000"
workers = 1
threads = 10
timeout = 900
preload_app = True
This should give me 1 worker and 10 threads, which should be able to process the same number of requests as the current configuration. However, the question is: would the cache still be instantiated once and shared among all the threads? How or where should I instantiate the cache to make sure it is shared among all of them?
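For illustration, this is roughly what I mean by a shared cache: a module-level object created once at import time and reused by all the gthread threads of the single worker (a sketch only; the lock, the cache dict and load_data_from_db are placeholder names of mine):

import threading
from flask import Flask

app = Flask(__name__)

# created once when the module is imported; with workers = 1 every request
# thread lives in the same process and sees this same dict
_cache = {}
_cache_lock = threading.Lock()

def load_data_from_db(key):
    return "expensive value for %s" % key  # placeholder for the real query

def get_cached(key):
    with _cache_lock:                      # protect the dict from concurrent writes
        if key not in _cache:
            _cache[key] = load_data_from_db(key)
        return _cache[key]

@app.route("/data/<key>")
def data(key):
    return get_cached(key)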
would like to ... multi-thread instead of multi-process.
I'm not sure you really want that. Python is rather different from Java.
workers = 10
One way to read that is "ten cores", sure. But another way is "wow, we get ten GILs!" The global interpreter lock must be held before the interpreter interprets a new bytecode instruction. Ten interpreters offer significant parallelism, executing ten instructions simultaneously.
Now, there are workloads dominated by async I/O, or where the interpreter calls into a C extension to do the bulk of the work. If a C thread can keep running, doing useful work in the background, and the interpreter gathers the result later, terrific. But that's not most workloads.
tl;dr: You probably want ten GILs, rather than just one.
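To make that concrete (my own illustration, not part of the original answer): a pure-Python, CPU-bound function gains essentially nothing from extra threads in a single process, because every thread contends for the same GIL.

import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n=2_000_000):
    # pure-Python busy work; holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(label, round(time.perf_counter() - start, 2), "s")

# four calls one after another in a single thread
timed("serial ", lambda: [cpu_bound() for _ in range(4)])

# four calls spread over four threads: roughly the same wall time,
# because only one thread can execute Python bytecode at any moment
with ThreadPoolExecutor(max_workers=4) as ex:
    timed("threads", lambda: list(ex.map(cpu_bound, [2_000_000] * 4)))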
To avoid caching the same data in those 10 workers
Right! That makes perfect sense.
Consider pushing the cache into a storage layer, or a daemon like Redis. Or access memory-resident cache, in the context of your own process, via mmap or shmat.
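A minimal sketch of the Redis approach, assuming the redis-py client and a placeholder load_data_from_db() for the expensive query (both are my assumptions, not from the original post):

import pickle
import redis

r = redis.Redis(host="localhost", port=6379)

def load_data_from_db(key):
    return {"key": key, "rows": []}          # placeholder for the expensive query

def get_cached(key, ttl=900):
    raw = r.get(key)
    if raw is not None:
        return pickle.loads(raw)             # cache hit, shared by every worker
    value = load_data_from_db(key)           # cache miss: compute once...
    r.set(key, pickle.dumps(value), ex=ttl)  # ...and publish it for all workers
    return value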
When running Flask under Gunicorn, you are certainly free to set threads greater than 1, though it's likely not what you want. YMMV. Measure and see.
So I have a Python 3.7 program that uses the threading library to run tasks concurrently:
import threading
import time

def myFunc(stName, ndName, ltName):
    pass  # logic here

names = open('names.txt').read().splitlines()  # more than 30k names
for i in names:
    processThread = threading.Thread(target=myFunc, args=(i, name2nd, lName,))
    processThread.start()
    time.sleep(0.4)
I have to open multiple windows to complete the tasks with different inputs, but eventually I run into a very laggy situation where I can't even browse my OS X machine. I tried to use the multiprocessing library to solve the issue, but unfortunately multiprocessing does not seem to work correctly on OS X.
Can anyone advise?
This behavior is to be expected. If myFunc is a CPU-intensive task that takes time, you are potentially starting up to 30k threads doing this task, which will use all of the machine's resources.
Another potential issue with your code is that threads are expensive in terms of memory (each thread reserves about 8 MB for its stack by default). Creating 30k threads could reserve up to 240 GB of memory, which your machine probably doesn't have, and may lead to a MemoryError or a failure to create new threads.
Finally, another issue with that code is that your main routine starts all those threads but does not wait for any of them to finish executing, so the last threads started will most likely not run to completion.
I would recommend using a ThreadPoolExecutor to solve all those issues:
from concurrent.futures import ThreadPoolExecutor

def myFunc(stName, ndName, ltName):
    pass  # logic here

names = open('names.txt').read().splitlines()  # more than 30k names

num_workers = 8
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    # submit one task per name; the pool runs at most num_workers of them at a time
    futures = [executor.submit(myFunc, i, name2nd, lName) for i in names]
You can play with num_workers to find the balance between the resources this program uses and the execution speed that fits you.
I have a script, parts of which are able to run in parallel (Python 3.6.6).
The goal is to decrease execution time as much as possible.
One of the parts connects to Redis, gets the data for two keys, runs pickle.loads on each, and returns the processed objects.
What's the best solution for such tasks?
I've tried Queue() already, but Queue.get_nowait() locks the script, and after {process}.join() it also stops execution even though the task is done. Using pool.map raises TypeError: can't pickle _thread.lock objects.
All I could achieve is running all the parts in parallel, but I still cannot collect the results.
cPickle.load() will release the GIL, so you can use it in multiple threads easily, but cPickle.loads() will not, so don't use that. (On Python 3 there is no separate cPickle module; the plain pickle module uses the C implementation under the hood.)
Basically, put your data from Redis into a BytesIO, then pickle.load() from there. Do this in multiple threads using concurrent.futures.ThreadPoolExecutor.
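A rough sketch of that approach, assuming the redis-py client and placeholder key names:

import io
import pickle
from concurrent.futures import ThreadPoolExecutor

import redis

r = redis.Redis(host="localhost", port=6379)

def fetch_and_unpickle(key):
    raw = r.get(key)                      # assumes the key exists and holds pickled bytes
    return pickle.load(io.BytesIO(raw))   # unpickle from a file-like object

# fetch and unpickle both keys in parallel threads
with ThreadPoolExecutor(max_workers=2) as executor:
    obj_a, obj_b = executor.map(fetch_and_unpickle, ["key_a", "key_b"])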
I'm trying to create a GUI for some of my Python scripts at work using PyQt5.
I'm interested in running a series of tasks in separate processes (not threads). I've been using the concurrent.futures ProcessPoolExecutor to execute the jobs. I've tried using the iterator from concurrent.futures.as_completed() to update the value in my QProgressBar.
def join(self):
    for fut in concurrent.futures.as_completed(self._tasks):
        try:
            self.results.put(fut.result())
            self.dialogBox.setValue(self.results.qsize())
        except concurrent.futures.CancelledError:
            break
However, my method seems to block the GUI even though the work is running in other processes.
Is it possible to update the progress bar without blocking the GUI?
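One way around the blocking (a sketch of mine, not a confirmed fix; the widget and attribute names are assumed) is to poll the futures from a QTimer, so the Qt event loop keeps running instead of sitting inside as_completed():

import concurrent.futures
from PyQt5.QtCore import QTimer

class Monitor:
    def __init__(self, tasks, dialogBox, results):
        self._tasks = tasks          # list of Future objects from the executor
        self.dialogBox = dialogBox   # a QProgressBar
        self.results = results       # a queue.Queue for finished results
        self._collected = set()
        # check the futures every 100 ms from the event loop instead of blocking
        self._timer = QTimer()
        self._timer.timeout.connect(self._poll)
        self._timer.start(100)

    def _poll(self):
        for fut in self._tasks:
            if fut.done() and id(fut) not in self._collected:
                self._collected.add(id(fut))
                if not fut.cancelled():
                    self.results.put(fut.result())
                self.dialogBox.setValue(self.results.qsize())
        if len(self._collected) == len(self._tasks):
            self._timer.stop()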
I'm learning Python, so I'm not an expert.
I have 3 different scripts that basically do the same thing.
Each script attaches a consumer to a RabbitMQ queue and processes the queue.
I would like to build a wrapper to run these 3 scripts as a daemon that starts automatically with the system.
My wrapper should also contain the logic to handle errors, start a new child process if one of the subprocesses dies, and collect the output of each subprocess.
The structure is something like this:
main.py
|-->consumer_one.py
|-->consumer_two.py
|-->consumer_three.py
Could you suggest a package that manages process forking in a simple way?
Thank you so much.
You may want to use the concurrent.futures standard library module. It is quite simple to use and very easy to manage.
Here is a quick and dirty example:
from concurrent.futures import ProcessPoolExecutor
import time

import consumer_one   # each consumer module is assumed to expose a start() function
import consumer_two
import consumer_three

if __name__ == '__main__':
    pool = ProcessPoolExecutor()
    jobs = [pool.submit(module.start)
            for module in (consumer_one, consumer_two, consumer_three)]
    print(jobs)

    # poll until one of the consumers finishes or crashes
    while not any(job.done() for job in jobs):
        time.sleep(1)
        print("all is well...")

    print("someone has died!")  # I guess now you can do something much more clever :)
    pool.shutdown()
    exit(1)
Read the docs for more info:
https://docs.python.org/3/library/concurrent.futures.html
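A variant that avoids the sleep loop (my suggestion, not part of the original answer) uses concurrent.futures.wait to block until the first consumer returns or raises:

from concurrent.futures import ProcessPoolExecutor, wait, FIRST_COMPLETED

import consumer_one
import consumer_two
import consumer_three

if __name__ == '__main__':
    pool = ProcessPoolExecutor()
    jobs = [pool.submit(m.start) for m in (consumer_one, consumer_two, consumer_three)]
    # block until any consumer finishes or crashes
    done, not_done = wait(jobs, return_when=FIRST_COMPLETED)
    for job in done:
        print("a consumer has stopped:", job.exception() or "exited normally")
    pool.shutdown(wait=False)  # don't wait here for the surviving consumers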
After a few tests, I think the best solution is to install the package http://supervisord.org/
In my scenario, I can manage restarts more easily if the services die; I can also keep separate logs for each process and attach specific event listeners.
Supervisor has a lot of good functions for managing asynchronous services.
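For reference, a minimal supervisord configuration for the three consumers might look like this (the paths and program names are assumptions, not taken from the original post):

; /etc/supervisor/conf.d/consumers.conf
[program:consumer_one]
command=python /opt/app/consumer_one.py
autostart=true
autorestart=true                             ; restart the consumer if it dies
stdout_logfile=/var/log/consumer_one.out.log
stderr_logfile=/var/log/consumer_one.err.log

[program:consumer_two]
command=python /opt/app/consumer_two.py
autostart=true
autorestart=true
stdout_logfile=/var/log/consumer_two.out.log
stderr_logfile=/var/log/consumer_two.err.log

[program:consumer_three]
command=python /opt/app/consumer_three.py
autostart=true
autorestart=true
stdout_logfile=/var/log/consumer_three.out.log
stderr_logfile=/var/log/consumer_three.err.log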