using dask distributed computing via jupyter notebook - python-3.x

I am seeing strange behavior from dask when using it from a Jupyter notebook. I am starting a local client and giving it a list of jobs to do. My real code is a bit complex, so here is a simple example:
from dask.distributed import Client
def inc(x):
    return x + 1
if __name__ == '__main__':
    c = Client()
    futures = [c.submit(inc, i) for i in range(1,10)]
    result = c.gather(futures)
    print(len(result))
The problem is that I notice the following:
1. Dask starts more than 9 processes for this example.
2. After the code has run and finished (nothing in the notebook is running), the processes created by dask are not killed (and the client is not shut down). When I run top, I can see all those processes still alive.
I saw in the documentation that there is a client.close() option, but interestingly enough, no such method exists in version 0.15.2.
The only time the dask processes are killed is when I stop the Jupyter notebook. This issue is causing strange and unpredictable performance. Is there any way to kill the processes or shut down the client when no code is running in the notebook?

The default Client allows for optional parameters which are passed to LocalCluster (see the docs) and allow you to specify, for example, the number of processes you wish. Also, it is a context manager, which will close itself and end processes when you are done.
with Client(n_workers=2) as c:
    futures = [c.submit(inc, i) for i in range(1,10)]
    result = c.gather(futures)
    print(len(result))
When this ends, the processes will be terminated.

Related

Python Multiprocessing GPU receiving SIGSEGV message

I am looking to run two processes at the same time. The processes use AI models, one of which is almost 1 GB. From my research, the best way seems to be multiprocessing. This is a Linux server with an 8-core CPU and one GPU. Because of the model size, I need the GPU to process these files. archivo_diar is the path to the file and modelo has been loaded beforehand. The code goes like this:
from multiprocessing import Process
def diariza(archivo_diar, pipeline):
    dz = pipeline(archivo_diar, pipeline)
def transcribe_archivo(archivo_modelo, modelo):
    resultado = modelo.transcribe(archivo_diar)
    print(resultado)
p1 = Process(target=transcribe_archivo, args=(archivos_diar, modelo))
p1.start()
After p1.start() is run, I get the following message:
SIGSEGV received at time = 16766367473 on cpu 7*
PC: # 0x7fb2c29705 144 GOMP_pararallel
What I have found so far is that it is a memory-related problem, but I have not seen any case related to multiprocessing. As I understand it, the child process inherits memory from the main process, and modelo, the heavy file, is already loaded in memory, so that should not be the cause.
As you can see, the two processes (the two functions) are different; what I read is that in such cases the best approach is to use Pool. I also tried using a pool like this:
pool = Pool(processes=4)
pool.imap_unordered(transcribe_archivo, [archivo_diar, modelo])
And I got the following error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use 'spawn' start method.
I tried using
multiprocessing.set_start_method('spawn')
and when I do pool.join() it hangs forever.
Does anyone know the reason for this?
Thanks.

Multiprocessing with Multiple Functions: Need to add a function to the pool from within another function

I am measuring the metrics of an encryption algorithm that I designed. I have declared 2 functions and a brief sample is as follows:
import sys, random, timeit, psutil, os, time
from multiprocessing import Process
from subprocess import check_output
pid=0
def cpuUsage():
    global running
    while pid == 0:
        time.sleep(1)
    running=true
    p = psutil.Process(pid)
    while running:
        print(f'PID: {pid}\t|\tCPU Usage: {p.memory_info().rss/(1024*1024)} MB')
        time.sleep(1)
def Encryption():
    global pid, running
    pid = os.getpid()
    myList=[]
    for i in range(1000):
        myList.append(random.randint(-sys.maxsize,sys.maxsize)+random.random())
    print('Now running timeit function for speed metrics.')
    p1 = Process(target=metric_collector())
    p1.start()
    p1.join()
    number=1000
    unit='msec'
    setup = '''
import homomorphic,random,sys,time,os,timeit
myList={myList}
'''
    enc_code='''
for x in range(len(myList)):
    myList[x] = encryptMethod(a, b, myList[x], d)
'''
    dec_code='''
\nfor x in range(len(myList)):
    myList[x] = decryptMethod(myList[x])
'''
    time=timeit.timeit(setup=setup,
                       stmt=(enc_code+dec_code),
                       number=number)
    running=False
    print(f'''Average Time:\t\t\t {time/number*.0001} seconds
Total time for {number} Iters:\t\t\t {time} {unit}s
Total Encrypted/Decrypted Values:\t {number*len(myList)}''')
    sys.exit()
if __name__ == '__main__':
    print('Beginning Metric Evaluation\n...\n')
    p2 = Process(target=Encryption())
    p2.start()
    p2.join()
I am sure there's an implementation error in my code; I'm just having trouble grabbing the PID for the encryption method, and I am trying to keep the overhead from other calls as small as possible so I can get an accurate reading of just the methods being called by timeit. If you know a simpler implementation, please let me know. Trying to figure out how to measure all of the metrics has been killing me softly.
I've tried acquiring the pid a few different ways, but I only want to measure performance when timeit is run. Good chance I'll have to break this out separately and run it that way (instead of multiprocessing) to evaluate the function properly, I'm guessing.
There are at least three major problems with your code. The net result is that you are not actually doing any multiprocessing.
The first problem is here, and in a couple of other similar places:
p2 = Process(target=Encryption())
What this code passes to Process is not the function Encryption but the returned value from Encryption(). It is exactly the same as if you had written:
x = Encryption()
p2 = Process(target=x)
What you want is this:
p2 = Process(target=Encryption)
This code tells Python to create a new Process and execute the function Encryption() in that Process.
The second problem has to do with the way Python handles memory for Processes. Each Process lives in its own memory space. Each Process has its own local copy of global variables, so you cannot set a global variable in one Process and have another Process be aware of this change. There are mechanisms to handle this important situation, documented in the multiprocessing module. See the section titled "Sharing state between processes." The bottom line here is that you cannot simply set a global variable inside a Process and expect other Processes to see the change, as you are trying to do with pid. You have to use one of the approaches described in the documentation.
The third problem is this code pattern, which occurs for both p1 and p2.
p2 = Process(target=Encryption)
p2.start()
p2.join()
This tells Python to create a Process and to start it. Then you immediately wait for it to finish, which means that your current Process must stop at that point until the new Process is finished. You never allow two Processes to run at once, so there is no performance benefit. The only reason to use multiprocessing is to run two things at the same time, which you never do. You might as well not bother with multiprocessing at all since it is only making your life more difficult.
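Putting the three fixes together, a minimal sketch might look like the following. It assumes the PID only needs to travel from the worker to the monitor, so a multiprocessing.Value (an int in shared memory) and an Event are enough; monitor and worker are hypothetical names, and the sum() call is a placeholder for the code actually being measured.

```python
import multiprocessing as mp
import os
import time

def monitor(shared_pid, done):
    # Child 1: wait until the worker has published its PID (or finished).
    while shared_pid.value == 0 and not done.is_set():
        time.sleep(0.01)
    samples = 0
    while not done.is_set():
        samples += 1              # a real monitor would query psutil here
        time.sleep(0.05)
    print(f"monitored PID {shared_pid.value}: {samples} samples")

def worker(shared_pid, done):
    # Child 2: publish its own PID through shared memory, then do the work.
    try:
        shared_pid.value = os.getpid()
        sum(i * i for i in range(2_000_000))  # placeholder for the timed code
    finally:
        done.set()                # always signal, even if the work fails

if __name__ == "__main__":
    shared_pid = mp.Value("i", 0)  # int in shared memory, seen by both children
    done = mp.Event()
    # Pass the functions themselves (no parentheses) as the targets.
    procs = [mp.Process(target=f, args=(shared_pid, done)) for f in (monitor, worker)]
    for p in procs:
        p.start()                  # start both first...
    for p in procs:
        p.join()                   # ...then wait, so the two actually overlap
```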
Finally, I am not sure why you decided to use multiprocessing in the first place. The functions that measure memory usage and execution time are almost certainly very fast, and I would expect them to be much faster than any method of synchronizing one Process with another. If you're worried about errors caused by the time the diagnostic functions themselves take, I doubt that you can make things better with multiprocessing. Why not just start with a simple program and see what results you get?
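As a starting point for that simpler program, here is a single-process sketch. encrypt_method and decrypt_method are hypothetical stand-ins for the real homomorphic functions. timeit is given a callable, so no setup string is needed, and tracemalloc (standard library) reports the peak memory reached by Python allocations during one run, which is a different measurement from the process RSS that psutil reports.

```python
import random
import sys
import timeit
import tracemalloc

# Hypothetical stand-ins for the real encryptMethod/decryptMethod.
def encrypt_method(x):
    return x + 1.0

def decrypt_method(x):
    return x - 1.0

def round_trip(values):
    # Encrypt then decrypt every value, mirroring enc_code + dec_code.
    return [decrypt_method(encrypt_method(v)) for v in values]

my_list = [random.randint(-sys.maxsize, sys.maxsize) + random.random() for _ in range(1000)]

number = 100
# timeit returns the total time in seconds for `number` calls.
total_seconds = timeit.timeit(lambda: round_trip(my_list), number=number)

# tracemalloc only sees allocations made while tracing is active.
tracemalloc.start()
round_trip(my_list)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Average time per run: {total_seconds / number * 1000:.3f} ms")
print(f"Peak traced memory:   {peak / 1024:.1f} KiB")
```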

Threads will not close off after program completion

I have a script that receives temperature data using requests. Since I had to make many requests (around 13000), I decided to explore multi-threading, which I am new to.
The program works by grabbing longitude/latitude data from a CSV file and then making a request to retrieve the temperature data.
The problem that I am facing is that the script does not finish fully when the last temperature value is retrieved.
Here is the code. I have shortened it so it is easy to see what I am doing:
num_threads = 16
q = Queue(maxsize=0)
def get_temp(q):
    while not q.empty():
        work = q.get()
        if work is None:
            break
        ## rest of my code here
        q.task_done()
At main:
def main():
    for o in range(num_threads):
        logging.debug('Starting Thread %s', o)
        worker = threading.Thread(target=get_temp, args=(q,))
        worker.setDaemon(True)
        worker.start()
    logging.info("Main Thread Waiting")
    q.join()
    logging.info("Job complete!")
I do not see any errors on the console, and the temperatures are successfully being written to another file. I have tried running a test CSV file with only a few longitude/latitude references, and the script seems to finish executing fine.
So is there a way of shedding light as to what might be happening in the background? I am using Python 3.7.3 on PyCharm 2019.1 on Linux Mint 19.1.
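q.join() waits before continuing to the next line, and it does not wait for the threads themselves: it blocks until every item that was put on the queue has been marked with q.task_done(). If any item is never marked done (for example, because an exception in the elided request code ends a thread before it reaches task_done(), or because a thread takes a None item and breaks without calling it), the main thread waits forever.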
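A sketch of a worker loop that cannot leave q.join() waiting, assuming the request code can raise: block on q.get() instead of checking q.empty() (another thread can empty the queue between the check and the get), stop each thread with a None sentinel, and call task_done() in a finally block so it runs for every item, including the sentinel and items that fail. fetch_temperature and run are hypothetical names; fetch_temperature stands in for the real request.

```python
import queue
import threading

NUM_THREADS = 4

def fetch_temperature(lat, lon):
    # Hypothetical stand-in for the real HTTP request; returns a dummy value.
    return round(lat + lon, 2)

def get_temp(q, results, lock):
    while True:
        work = q.get()              # block instead of racing on q.empty()
        try:
            if work is None:        # sentinel: no more work for this thread
                return
            lat, lon = work
            temp = fetch_temperature(lat, lon)
            with lock:
                results.append(temp)
        except Exception as exc:
            # Report the failure instead of silently losing the item.
            print(f"failed on {work!r}: {exc}")
        finally:
            q.task_done()           # runs for every item, so q.join() can return

def run(coords):
    q = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=get_temp, args=(q, results, lock), daemon=True)
               for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for c in coords:
        q.put(c)
    for _ in threads:
        q.put(None)                 # one sentinel per worker thread
    q.join()                        # returns once every put() has a task_done()
    for t in threads:
        t.join()
    return results
```

Because each thread exits on its sentinel, the program can finish without relying on daemon threads being killed at shutdown.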

Multiprocessing hangs when applying function to pandas dataframe Python 3.7.1

I am trying to parallelize a function on my pandas dataframe and I'm running into an issue where it seems that the multiprocessing library is hanging. I am doing this all within a Jupyter notebook with myFunction() existing in a separate .py file. Can someone point out what I am doing wrong here?
Surprisingly, this piece of code has worked previously on my Windows 7 machine with the same version of python. I have just copied the file over to my Mac laptop.
I also use tqdm to monitor the progress; the behavior is the same with or without it.
#This function handles the multiprocessing
from multiprocessing import Pool, cpu_count
import numpy as np
import pandas as pd
import tqdm
def parallelize_dataframe(df, func):
    num_partitions = cpu_count()*2 # number of partitions to split dataframe
    num_cores = cpu_count() # number of cores on your machine
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    return pd.concat(list(tqdm.tqdm_notebook(pool.imap(func, df_split),total=num_partitions)))
#My function that I am applying to the dataframe is in another file
#myFunction retrieves a JSON from an API for each ID in myDF and converts it to a dataframe
from myFuctions import myFunction
#Code that calls the parallelize function
finalDF = parallelize_dataframe(myDF,myFunction)
The expected result is a concatenation of the list of dataframes retrieved by myFunction(). This has worked in the past, but now the process seems to hang indefinitely without any error message.
Q : Can someone point out what I am doing wrong here?
You expected macOS to use the same process-instantiation mechanism that Windows used in the past.
The multiprocessing module does not do the same things on every supported OS: its documentation flags some start methods as unsafe on some platforms, and the default start method has changed over time on macOS- and Linux-based systems.
Next steps to try to move forward:
re-read how to set up the start method explicitly in the multiprocessing documentation (avoid a hidden dependency of the code's behavior on "new" default values)
test whether you can avoid the cases where multiprocessing spawns a full copy of the Python interpreter process as many times as you instruct (memory allocations can quickly become devastatingly large if many replicas get instantiated beyond the localhost's RAM, just because the number of CPU cores grows)
test whether the "worker" code is not compute-intensive but rather driven by the latency of remote API calls. In that case, asyncio/await-based tools will help more with latency masking than multiprocessing, which for IO-latency-dominated use cases is inefficient: it spawns rather expensive full copies of many Python processes that just sit waiting for remote-API answers.
last but not least, performance-sensitive code runs best outside any mediating ecosystem, such as interactivity-focused Jupyter notebooks.
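For the latency-bound case, a minimal standard-library sketch of the asyncio approach. fetch_record is a hypothetical placeholder for myFunction's API call (a real version needs an async HTTP client), and the limit of 8 concurrent requests is an arbitrary choice. The resulting list of dicts could then be turned into a DataFrame with pd.DataFrame(records).

```python
import asyncio
import time

async def fetch_record(item_id):
    # Hypothetical stand-in for one remote-API call; the sleep simulates
    # network latency. A real version would use an async HTTP client.
    await asyncio.sleep(0.1)
    return {"id": item_id, "value": item_id * 10}

async def fetch_all(ids, limit=8):
    sem = asyncio.Semaphore(limit)  # cap the number of requests in flight

    async def bounded(item_id):
        async with sem:
            return await fetch_record(item_id)

    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(bounded(i) for i in ids))

start = time.perf_counter()
records = asyncio.run(fetch_all(range(20)))
elapsed = time.perf_counter() - start
print(f"{len(records)} records in {elapsed:.2f} s")
```

Inside a Jupyter cell, where an event loop is already running, asyncio.run() raises a RuntimeError; there you would write records = await fetch_all(range(20)) instead.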

python3 multiprocessing.Pool with maxtasksperchild=1 does not terminate

When using multiprocessing.Pool in python 3.6 or 3.7 with maxtasksperchild=1, I noticed that some processes spawned by the pool are hanging and do not quit, even though the callback to their tasks was already executed. As a result, Pool.join() will block forever, even though all tasks are finished. In the process tree, running but idle child processes can be seen. The problem does not occur if maxtasksperchild=None.
The problem seems to be related to what the callback precisely does. The docs point out that the callback "should return immediately", as it will block other threads managing the pool.
A minimal example to reproduce this behavior on my machine is as follows: (Give it a few tries or increase the number of tasks when it does not block forever.)
from multiprocessing import Pool
from os import getpid
from random import random
from time import sleep
def do_stuff():
    pass
def cb(arg):
    sleep(random()) # can be replaced with print('foo')
p = Pool(maxtasksperchild=1)
number_of_tasks = 100 # a value may depend on your machine -- for mine 20 is sufficient to trigger the behavior
for i in range(number_of_tasks):
    p.apply_async(do_stuff, callback=cb)
p.close()
print("joining ... (this should take just seconds)")
print("use the following command to watch the process tree:")
print(" watch -n .2 pstree -at -p %i" % getpid())
p.join()
Contrary to what I expected, p.join() in the last line will block forever even though do_stuff and cb were both called 100 times.
I am aware that sleep(random()) violates the docs, but is print() also taking 'too long'? The way the docs are written suggests that a non-blocking callback function is required for performance and efficiency, but they do not make clear that a 'slow' callback function will break the pool entirely.
Is print() forbidden in any multiprocessing.Pool callback function? (How to replace that functionality? What is "returning immediately", what is not?)
If yes, should the python documentation be updated to make this clear?
If yes, is it good python practice to rely on "fast" execution of python threads? Does this violate the rule that one should not make assumptions on execution order of threads?
Should I report this to the python bug tracker?

Resources