Python script with Multithreading was killed due to out of memory - python-3.x

I used the threading module to process data faster. The Python program gets killed because memory usage keeps increasing over time. Here is a simple example that reproduces the issue. What is wrong with this code? Where is the memory leak happening? Thanks for your help.
import threading
import time

def f1():
    return

def f2():
    for i in range(1, 300):
        t = threading.Thread(target=f1)
        t.start()
    return

def main():
    while True:
        for i in range(1, 200):
            t = threading.Thread(target=f2)
            t.start()
        time.sleep(0.5)

if __name__ == '__main__':
    main()

You've got threads creating threads? 200 threads each creating 300 threads, for a total of roughly 60,000 threads, with another batch started every half second?
Any machine will likely run out of memory trying to do this.
Your code has no memory leak, and there is nothing 'wrong' with it, except what you are trying to do is just, well, completely wrong.
So perhaps you should explain a bit of background about what you're trying to achieve and why.
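If the underlying goal is simply to run many small tasks concurrently, a fixed-size pool avoids creating a new thread per task. A minimal sketch (my own illustration, not from the original answer), where do_work is a hypothetical stand-in for the real per-item processing:

import concurrent.futures
import time

def do_work(item):
    # hypothetical placeholder for the real per-item processing
    return item

def main():
    # a bounded pool reuses a handful of threads instead of
    # creating tens of thousands of them
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        while True:
            futures = [executor.submit(do_work, i) for i in range(300)]
            for f in concurrent.futures.as_completed(futures):
                f.result()
            time.sleep(0.5)

if __name__ == '__main__':
    main()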

Related

Python: running many subprocesses from different threads is slow

I have a program with 1 process that starts a lot of threads.
Each thread might use subprocess.Popen to run some command.
I see that the time to run the command increases with the number of threads.
Example:
>>> def foo():
...     s = time.time()
...     subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
...     print(time.time() - s)
...
>>> foo()
0.028950929641723633
>>> [threading.Thread(target=foo).start() for _ in range(10)]
0.058995723724365234
0.07323050498962402
0.09158825874328613
0.11541390419006348 # !!!
0.08147192001342773
0.05238771438598633
0.0950784683227539
0.10175108909606934 # !!!
0.09703755378723145
0.06497764587402344
Is there another way of executing a lot of commands from a single process in parallel that doesn't decrease performance?
Python's threads are, of course, concurrent, but they do not really run in parallel because of the GIL. Therefore, they are not suitable for CPU-bound applications. If you need to truly parallelize something and allow it to run on all CPU cores, you will need to use multiple processes. Here is a nice answer discussing this in more detail: What are the differences between the threading and multiprocessing modules?.
For the above example, multiprocessing.pool may be a good choice (note that there is also a ThreadPool available in this module).
from multiprocessing.pool import Pool
import subprocess
import time

def foo(*args):
    s = time.time()
    subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
    return time.time() - s

if __name__ == "__main__":
    with Pool(10) as p:
        result = p.map(foo, range(10))
        print(result)
        # [0.018695592880249023, 0.009021520614624023, 0.01150059700012207, 0.02113938331604004, 0.014114856719970703, 0.01342153549194336, 0.011168956756591797, 0.014746427536010742, 0.013572454452514648, 0.008752584457397461]
        result = p.map_async(foo, range(10))
        print(result.get())
        # [0.00636744499206543, 0.011589527130126953, 0.010645389556884766, 0.0070612430572509766, 0.013571739196777344, 0.009610414505004883, 0.007040739059448242, 0.010993719100952148, 0.012415409088134766, 0.0070383548736572266]
However, if your function is similar to the example in that it mostly just launches other processes and doesn't do much computation itself, I doubt parallelizing it will make much of a difference, because the subprocesses can already run in parallel. Perhaps the slowdown occurs because the whole system gets overwhelmed for a moment by all those processes (high CPU usage, or too many disk reads/writes attempted within a short time). I would suggest taking a close look at system resources (Task Manager, etc.) while running the program.
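For comparison, the ThreadPool mentioned earlier in this answer drops in with the same map interface; a minimal sketch (my addition, not part of the original answer):

from multiprocessing.pool import ThreadPool
import subprocess
import time

def foo(*args):
    s = time.time()
    subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, universal_newlines=True).communicate()
    return time.time() - s

if __name__ == "__main__":
    # ThreadPool uses threads instead of processes but exposes the same map API
    with ThreadPool(10) as p:
        print(p.map(foo, range(10)))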
Maybe it has nothing to do with Python: launching a new process means opening new file descriptors, since basically everything is a file on Linux.
Take a look at your limit for open files with this command (the default is often 1024):
ulimit -n
and try raising it with this command to see if your code gets faster:
ulimit -n 2048
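If you want to check the same limit from inside the Python program, the standard resource module exposes it. A small sketch (my addition, not part of the original answer; resource is Unix-only):

import resource

# the soft limit is what currently applies; the hard limit is the ceiling
# an unprivileged process may raise the soft limit to
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# raise the soft limit, staying within the hard limit
resource.setrlimit(resource.RLIMIT_NOFILE, (min(2048, hard), hard))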

How to get more events running with asyncio

I am currently learning asynchronous programming and wrote this program that sends requests asynchronously with Python 3 asyncio.
When running it, my program is not that fast, and I am trying to figure out how to do better.
To find the number of events running, I thought about checking the kernel task thread count in Activity Monitor. It appears I am only running 222 threads, for a total of 2% of the CPU.
Is there a way to max out the thread count?
Can I make it faster with cleaner code? As seen below, my code works but is kind of hacky.
import asyncio
import requests

def main():
    loop = asyncio.get_event_loop()
    for i in enumerate(list):
        f = loop.run_in_executor(make_request())
        if i == end:
            response = yield from f

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Thank you.
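No answer is quoted here, but the idiomatic shape this question is reaching for looks roughly like the following sketch (my own illustration with a hypothetical URL list; blocking requests.get calls are dispatched to the loop's default thread pool via run_in_executor and awaited together with asyncio.gather):

import asyncio
import requests

URLS = ["https://example.com"] * 10  # hypothetical work list

async def fetch_all(urls):
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses the loop's default thread pool,
    # so the blocking requests.get calls run concurrently
    futures = [loop.run_in_executor(None, requests.get, url) for url in urls]
    return await asyncio.gather(*futures)

if __name__ == '__main__':
    responses = asyncio.run(fetch_all(URLS))
    print(len(responses))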

Generator function causing exceptions to be caught after all processes complete

I wrote this short POC to help understand the issue I am having, with the hope that someone can explain what is going on and how I can fix it and/or make it more efficient.
My goal in using iterators, itertools and generators is that I didn't want to store a huge list in memory; as I scale up, the list will become unmanageable, and I didn't want to have to loop over the entire list to do something every single time. Note, I am fairly new to the idea of generators, iterators and multiprocessing and wrote this code today, so if you can clearly tell I am misunderstanding the workflow of how these things are supposed to work, please educate me and help make my code better.
You should be able to run the code as is and see the problem I am facing. I expect that as soon as the exception is caught, it gets raised and the script dies, but what I see happening is that the exception gets caught and the other processes continue.
If I comment out the generateRange generator, create a dummy list, and pass it into futures = (map(executor.submit, itertools.repeat(execute), mylist)), the exception does get caught and exits the script as intended.
My guess is that the generator/iterator has to finish generating the range before the script can die, which, to my understanding, was not supposed to be the case.
The reason I opted to use a generator function/iterators was that you can access the objects only when they are needed.
Is there a way for me to stop the generator from continuing and let the exception be raised appropriately?
Here is my POC:
import concurrent.futures
import time

PRIMES = [0] * 80
child = []

def is_prime(n):
    print("Enter")
    time.sleep(5)
    print("End")
    1 / 0

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        for i in PRIMES:
            child.append(executor.submit(is_prime, i))
        for future in concurrent.futures.as_completed(child):
            if future.exception() is not None:
                print("Throw an exception")
                raise future.exception()

if __name__ == '__main__':
    main()
EDIT: I updated the POC with something simpler.
It is not possible to cancel running futures immediately, but this at least makes it so only a few processes are run after the exception is raised:
import concurrent.futures
import time

PRIMES = [0] * 80
child = []

def is_prime(n):
    print("Enter")
    time.sleep(5)
    print("End")
    1 / 0

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        for i in PRIMES:
            child.append(executor.submit(is_prime, i))
        for future in concurrent.futures.as_completed(child):
            if future.exception() is not None:
                for fut in child:
                    fut.cancel()
                print("Throw an exception")
                raise future.exception()

if __name__ == '__main__':
    main()
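On Python 3.9 and later there is a shortcut for the same idea: the executor can drop everything still pending in one call. A sketch of that variant (my addition, not from the original answer; the running future still has to finish, exactly as with fut.cancel()):

import concurrent.futures
import time

PRIMES = [0] * 80

def is_prime(n):
    print("Enter")
    time.sleep(5)
    print("End")
    1 / 0

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        futures = [executor.submit(is_prime, i) for i in PRIMES]
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is not None:
                # Python 3.9+: cancel all queued futures in one call instead of
                # cancelling them individually
                executor.shutdown(wait=False, cancel_futures=True)
                raise future.exception()

if __name__ == '__main__':
    main()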

watchdog with Pool in python 3

I have a simple watchdog in python 3 that reboots my server if something goes wrong:
import time, os
from multiprocessing import Pool

def watchdog(x):
    time.sleep(x)
    os.system('reboot')
    return

def main():
    while True:
        p = Pool(processes=1)
        p.apply_async(watchdog, (60, ))  # start watchdog with 60s interval

        # here some code that has a small chance to block permanently...
        # reboot is ok because many other scripts run independently
        # that will get problems too if this one blocks too long, and
        # this will reset everything together and autostart it all again
        # the block happens 1-2 times a month, mostly within an http-request

        p.terminate()
        p.join()
    return

if __name__ == '__main__':
    main()
p = Pool(processes=1) is declared every time the while loop starts.
Now here is the question: is there any smarter way?
If I call p.terminate() to prevent the reboot, the Pool becomes closed for any other work. Or is there nothing wrong with declaring a new Pool every time, because of garbage collection?
Use a process. Processes support all of the features you are using, so you don't need to make a pool with size one. While processes do have a warning about using the terminate() method (since it can corrupt pipes, sockets, and locking primitives), you are not using any of those items and don't need to care. (In any event, Pool.terminate() probably has the same issues with pipes etc. even though it lacks a similar warning.)
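A minimal sketch of the Process-based version the answer suggests (my own rewrite of the question's loop, not code from the original answer):

import time, os
from multiprocessing import Process

def watchdog(x):
    time.sleep(x)
    os.system('reboot')

def main():
    while True:
        # one short-lived process per iteration; terminate() is supported on
        # Process, and no pipes, queues, or locks are shared with it
        p = Process(target=watchdog, args=(60,))
        p.start()

        # ... the code that might block goes here ...

        p.terminate()
        p.join()

if __name__ == '__main__':
    main()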

Python threads making them take 100% of my cpu

All,
How do I make this script take 100% of my CPU? If this post is bad, please explain why! Any help would be much appreciated.
import threading
import sys

def isPrime(number):
    # isPrime checks whether an int is prime
    if not isinstance(number, int):
        # check that number is an int
        raise Exception("Please enter an int. Function: isPrime.")
    # create the range of numbers to check
    rangeOfNumbers = range(1, number + 1, 1)
    # count how many multiplications give the number; for a prime it would be 2
    multiplicationCount = 0
    # two for loops to run through all possibilities
    for n1 in rangeOfNumbers:
        for n2 in rangeOfNumbers:
            if (n1 * n2 == number):
                multiplicationCount += 1
    if (multiplicationCount == 2):
        print(number)
        return True
    else:
        return False

if __name__ == "__main__":
    if not sys.version_info[0] == 3:
        raise Exception("Please Upgrade or Downgrade your python to python 3.")
    number = 0
    while True:
        threads = []
        for i in range(100):
            number += 1
            thread = threading.Thread(target=isPrime, args=[number])
            thread.start()
            threads = []
            threads.append(thread)
        for thread in threads:
            thread.join()
isPrime does no I/O or other operation that could relinquish the CPU (except print). It therefore consumes 100% of one CPU core. Since enough such jobs are kicked off, measured CPU usage should stay at about 100% of one core. Note that, since Python has the additional limitation that only one thread can execute bytecode at a time (the Global Interpreter Lock), no parallelism is achieved here.
Look into Python's multiprocessing module to achieve real parallelism. It spawns new Python processes, thus allowing multiple primality tests to execute at the same time.
Lastly, your code does not properly wait for all threads:
while True:
    threads = []
    for i in range(100):
        number += 1
        thread = threading.Thread(target=isPrime, args=[number])
        thread.start()
        threads = []  # threads being reset here!
        threads.append(thread)
    for thread in threads:
        thread.join()
(This is probably not intentional.) It means that you keep creating threads in an infinite loop but only ever wait for the last one of each batch to finish. This is going to run you out of memory at some point. It would be much more catastrophic if Python had real threading, though ...
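A minimal sketch of the multiprocessing version the answer points to (my own illustration reusing the question's brute-force check, not code from the original answer):

import multiprocessing

def isPrime(number):
    # same brute-force check as in the question, just without threading
    count = 0
    for n1 in range(1, number + 1):
        for n2 in range(1, number + 1):
            if n1 * n2 == number:
                count += 1
    return count == 2

if __name__ == "__main__":
    # one worker process per CPU core by default, so the primality tests
    # actually run in parallel across cores
    with multiprocessing.Pool() as pool:
        for number, prime in zip(range(1, 101), pool.map(isPrime, range(1, 101))):
            if prime:
                print(number)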
