I can implement Python multiprocessing with Spyder Windows PC, but why? - python-3.x

I'm curious about this and need some advice on how it can happen. Yesterday I tried to use multiprocessing in a Python script running in Spyder on a Windows PC. Here is the code I first tried:
import multiprocessing
import time
start = time.perf_counter()
def do_something():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done sleeping')
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start,2)} second(s)')
It returns an error:
AttributeError: Can't get attribute 'do_something' on <module '__main__' (built-in)
I then searched for a way out of this problem (as did my boss) and found this suggestion:
Python's multiprocessing doesn't work in Spyder IDE
So I followed it, installed PyCharm, and ran the code there. It seemed to work in that I no longer got the AttributeError, but I got this new error instead:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I googled again and finally found this:
RuntimeError on windows trying python multiprocessing
What I have to do is add this one line
if __name__ == '__main__':
before starting the processes:
import multiprocessing
import time
start = time.perf_counter()
def do_something():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done sleeping')
if __name__ == '__main__':
    p1 = multiprocessing.Process(target=do_something)
    p2 = multiprocessing.Process(target=do_something)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start,2)} second(s)')
And it works now. Moreover, it works not only in PyCharm: I can now run this code in Spyder too. That is what I'm so curious about: how come it also works in Spyder? The behavior is consistent, because I also ran this code on my other PC, which runs Windows Server 2016 with Spyder, and got the same result.
Can anyone help explain what happens here and why it works?
Thank you.

There's a lot to unpack here, so I'll just give a brief overview. There's also some missing information, like how you have Spyder/PyCharm configured and what operating system you use, so I'll have to make some assumptions...
Based on the error messages, you are probably using macOS or Windows, which means the default way Python creates a child process is called spawn. This means it will start a completely new process from the Python executable ("python.exe" on Windows, for example). It will then send a message to the new process telling it what function to execute (target), and optionally what arguments to call that function with. The new process has to import the main file to have access to that function, however, so if you are running the Python interpreter in interactive mode, there is no "main" file to import, and you get the first error message: AttributeError.
The second error is also related to the importing of the "main" file. When you import a file, it basically just runs the file like any other Python script. If you were to create a new child process during import, that child would then also create a new child when it imports the same file. You would end up recursively creating child processes until the computer crashed, so Python disallows creating additional child processes during the import phase of a child process, hence the RuntimeError.
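To make the re-import visible, here is a minimal sketch, assuming the script is saved to a file and that the spawn start method is used (it is forced here with get_context("spawn") so the behavior matches Windows). The names work and run_children are made up for this illustration. The module-level print runs once in the parent and once more in every spawned child, where the main file is imported under the name "__mp_main__" rather than "__main__", which is why the guarded block is skipped there.

```python
import multiprocessing

# Runs on every import of this file: once in the parent (as "__main__") and,
# under spawn, once more in each child (as "__mp_main__").
print(f"importing main file as {__name__!r}")


def work(x):
    """Hypothetical worker; it lives at module top level so a child can find it."""
    return x * 2


def run_children(values):
    """Start one spawned child per value and return the children's exit codes."""
    ctx = multiprocessing.get_context("spawn")  # the only start method on Windows
    procs = [ctx.Process(target=work, args=(v,)) for v in values]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Only the parent gets here; the children skip this block on import,
    # so no child ever tries to start children of its own.
    print(run_children([1, 2]))
```

Removing the guard and calling run_children at module level should reproduce the RuntimeError from the question, since each child would then try to start children while it is still importing the file.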

Related

Python multiprocessing manager showing error when used in flask API

I am pretty confused about the best way to do what I am trying to do.
What do I want?
An API call to the Flask application.
The Flask route starts 4-5 processes using the Process class and combines the results (computed on a sliced pandas DataFrame) using a shared Manager().list().
The computed results are returned to the client.
My implementation:
pos_iter_list = get_chunking_iter_list(len(position_records), 10000)
manager = Manager()
data_dict = manager.list()
processes = []
for i in range(len(pos_iter_list) - 1):
    temp_list = data_dict[pos_iter_list[i]:pos_iter_list[i + 1]]
    p = Process(
        target=transpose_dataset,
        args=(temp_list, name_space, align_namespace, measure_master_id, df_searchable, products,
              channels, all_cols, potential_col, adoption_col, final_segment, col_map, product_segments,
              data_dict)
    )
    p.start()
    processes.append(p)
for p in processes:
    p.join()
My directory structure:
- main.py (Flask entry point)
- helper.py (contains the function where the above code is executed and which calls the transpose_dataset function)
The error I get when running it:
RuntimeError: No root path can be found for the provided module "__mp_main__". This can happen because the module came from an import hook that does not provide file name information or because it's a namespace package. In this case the root path needs to be explicitly provided.
Not sure what went wrong here; the manager list works fine when called from a sample.py file using if __name__ == '__main__':
Update: The same piece of code works fine on my MacBook but not on Windows.
A sample flask API call:
@app.route(PREFIX + "ping", methods=['GET'])
def ping():
    man = mp.Manager()
    data = man.list()
    processes = []
    for i in range(0,5):
        pr = mp.Process(target=test_func, args=(data, i))
        pr.start()
        processes.append(pr)
    for pr in processes:
        pr.join()
    return json.dumps(list(data))
Stack has an ongoing bug preventing me from commenting, so I'll just write up an answer.
Python has two main ways to start a new process: "spawn" and "fork". Fork is a system call that is only available on *nix (read: Linux or macOS), so spawn is the only option on Windows. Since Python 3.8, spawn is the default on macOS, but fork is still available there. The big difference is that fork basically makes a copy of the existing process, while spawn starts a whole new process (like opening a new cmd window). There's a lot of nuance to why and how, but with spawn, the child has to import the main file in order to be able to run the function you want it to run. Importing a file is tantamount to executing that file and then typically binding its namespace to a variable: import flask will run the flask/__init__.py file and bind its global namespace to the variable flask. Often, however, there is code that is only used by the main process and doesn't need to be imported or executed in the child process. In some cases running that code again actually breaks things, so you need to prevent it from running outside of the main process. This is what the "magic" variable __name__ is for: it is only equal to "__main__" in the main file (and not in child processes or in imported modules).
In your specific case, you're creating a new app = Flask(__name__), which does some validation and checks before you ever run the server. One of these setup/validation steps is what trips when the file is run from the child process. Fixing it by not letting it run at all is, in my opinion, the cleaner solution, but you can also fix it by giving Flask a value that it won't trip over and then never starting that secondary server (again by protecting it with if __name__ == "__main__":).

Python multiprocessing refuses to execute code and exits instantly

I was working on a larger piece of code and kept hitting a problem where the program just seemed to end without doing anything. I narrowed down the problem and reproduced it below. As far as I understand, the code should print the sentence, and it seems to work in online IDEs while failing in mine. I feel like I'm missing something super simple.
Failing On: IDLE Python 3.8.3 32-bit from python.org
Works On: onlinegdb.com
Code:
import multiprocessing
def x():
    print("This is x func")
if __name__ == '__main__':
    multiprocessing.freeze_support()
    p = multiprocessing.Process(target=x)
    p.start()
    p.join()
Output:
>>>
I think the issue is that IDLE just doesn't display output from anything outside the main process. You need to use a console that shows the output from the main process and all other processes. Reference: Python multiprocessing example not working
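If switching to a console is not an option, one possible workaround (a sketch, not something the referenced answer prescribes) is to have the child hand its message back to the parent through a multiprocessing.Queue and let the parent do the printing. The helper name run_and_collect is made up for this example.

```python
import multiprocessing


def x(queue):
    """Child: hand the message to the parent instead of printing it."""
    queue.put("This is x func")


def run_and_collect():
    """Run x in a child process and return what it sent back."""
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=x, args=(queue,))
    p.start()
    message = queue.get()  # read before join so the child can flush its queue
    p.join()
    return message


if __name__ == '__main__':
    # The print happens in the main process, whose output IDLE does display.
    print(run_and_collect())
```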

Why Can't Jupyter Notebooks Handle Multiprocessing on Windows?

On my Windows 10 machine (and seemingly other people's as well), Jupyter Notebook can't seem to handle some basic multiprocessing functions like pool.map(). I can't figure out why this might be, even though a solution has been suggested: calling the function to be mapped as a script from another file. My question, though, is why does this not work? Is there a better way to do this kind of thing beyond saving the function in another file?
Note that the solution was suggested in a similar question here. But I'm left wondering why this bug occurs, and whether there is another easier fix. To show what goes wrong, I'm including below a very simple version that hangs on my computer where the same function runs with no problems when the built-in function map is used.
import multiprocessing as mp
# create a grid
iterable = [3, 5, 10]
def add_3(iterable):
    a = iterable + 3
    return a
# Below runs no problem
results = list(map(add_3, iterable))
print(results)
# multiprocessing attempt (hangs)
def main():
    pool = mp.Pool(2)
    results = pool.map(add_3, iterable)
    return results
if __name__ == "__main__": #Required not to spawn deviant children
    results = main()
Edit: I've just tried this in Spyder and I've managed to get it to work. Unsurprisingly running the following didn't work.
if __name__ == "__main__": #Required not to spawn deviant children
    results = main()
print(results)
But running it as the following does work because map uses the yield command and isn't evaluated until called which gives the typical problem.
if __name__ == "__main__": #Required not to spawn deviant children
    results = main()
    print(results)
edit edit:
From what I've read on the issue, it turns out that the issue is largely because of the IPython shell that Jupyter uses. I think there might be an issue with how __name__ is set. Either way, using Spyder or a different IDE solved the problem, as long as you're not still running the multiprocessing function in an IPython shell.
I faced a similar problem. I can't use multiprocessing with a function defined in the same script. The solution that works is to put the function in a different notebook file and import it using ipynb:
from multiprocessing import Pool
from ipynb.fs.full.script_name import function_name
pool = Pool()
result = pool.map(function_name,iterable_argument)
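The same idea also works with a plain .py module instead of a second notebook. The sketch below shows what such a module could contain; the file name add_worker.py and the helper parallel_add_3 are assumptions made for this illustration. Because add_3 lives in a real file, a spawned child process can import it, which is not possible for a function defined only in a notebook cell.

```python
# Contents of a hypothetical module file, e.g. add_worker.py,
# saved next to the notebook.
import multiprocessing as mp


def add_3(value):
    """Worker function; child processes can import it because it lives in a file."""
    return value + 3


def parallel_add_3(values, processes=2):
    """Map add_3 over values with a small process pool."""
    with mp.Pool(processes) as pool:
        return pool.map(add_3, values)


if __name__ == "__main__":
    # Lets the module also be run directly as a script.
    print(parallel_add_3([3, 5, 10]))
```

In the notebook you would then run from add_worker import parallel_add_3 and call parallel_add_3(iterable) in a cell.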

How to run PyQt5 GUIs in non-blocking threads?

I have a PyQt5 GUI class that I want to be able to create multiple instances of either from an interactive console or normal run. I need these GUIs to be non-blocking so that they can be used while subsequent code runs.
I've tried calling app.exec_() in separate threads for each GUI like this answer, but the program sometimes crashes, as the comment on the answer warned it would:
Run pyQT GUI main app in seperate Thread
And now I'm trying to get the code below working which I made based on this answer:
Run Pyqt GUI main app as a separate, non-blocking process
But when I run it, the windows pop up and immediately disappear:
import sys
from PyQt5 import QtWidgets, QtGui, QtCore
import time
class MainWindow(QtWidgets.QWidget):
    def __init__(self):
        # call super class constructor
        super(MainWindow, self).__init__()
        # build the objects one by one
        layout = QtWidgets.QVBoxLayout(self)
        self.pb_load = QtWidgets.QPushButton('Load')
        self.pb_clear= QtWidgets.QPushButton('Clear')
        self.edit = QtWidgets.QTextEdit()
        layout.addWidget(self.edit)
        layout.addWidget(self.pb_load)
        layout.addWidget(self.pb_clear)
        # connect the callbacks to the push-buttons
        self.pb_load.clicked.connect(self.callback_pb_load)
        self.pb_clear.clicked.connect(self.callback_pb_clear)
    def callback_pb_load(self):
        self.edit.append('hello world')
    def callback_pb_clear(self):
        self.edit.clear()
def show():
    app = QtWidgets.QApplication.instance()
    if not app:
        app = QtWidgets.QApplication(sys.argv)
    win = MainWindow()
    win.show()
if __name__ == '__main__':
    show()
    show()
EDIT - I don't see how this question is a duplicate. The 'duplicate' questions are only slightly related and don't provide solutions to my problem at all.
I want to be able to create multiple instances of a GUI (MainWindow in my example) by calling the show() function from either an interactive session or script, and I want those windows to stay on my screen while subsequent code is running.
EDIT2 - When I run the code as a script I can do what I want by using multiprocessing, see this demo:
https://www.screencast.com/t/5WvJNVSLm9OR
However I still need help because I want it to also work in interactive Python console sessions, and multiprocessing does not work in that case.
It isn't necessary to use separate threads or processes for this. You just need a way to maintain a reference to each new window when importing the script in a Python interactive session. A simple list can be used for this. It is only necessary to explicitly start an event loop when running the script from the command line; in an interactive session, it will be handled automatically by PyQt.
Here is an implementation of this approach:
...
_cache = []
def show(title=''):
    if QtWidgets.QApplication.instance() is None:
        _cache.append(QtWidgets.QApplication(sys.argv))
    win = MainWindow()
    win.setWindowTitle(title)
    win.setAttribute(QtCore.Qt.WA_DeleteOnClose)
    win.destroyed.connect(lambda: _cache.remove(win))
    _cache.append(win)
    win.show()
if __name__ == '__main__':
    show('Foo')
    show('Bar')
    sys.exit(QtWidgets.QApplication.instance().exec_())
This is a minor addendum to @ekhumoro's answer. I don't have enough reputation to add a comment, so I had to write this as an answer.
@ekhumoro's answer almost fully answers @Esostack's question, but doesn't work in the IPython console. After many hours of searching for the answer to this question myself, I came across a comment from @titusjan in a three-year-old thread (here), also responding to a good answer from @ekhumoro.
The missing part of @ekhumoro's answer, which causes the GUI windows to freeze in IPython specifically, is that IPython should be set to use the Qt GUI, either at launch or once running.
To make this work with IPython, do either of the following:
Launch IPython with ipython --gui=qt5
In a running IPython console, run the magic command %gui qt5
To fix it from a Python script, you can run this function:
def fix_ipython():
    from IPython import get_ipython
    ipython = get_ipython()
    if ipython is not None:
        ipython.magic("gui qt5")

Requests module crashes python when numpy is loaded and using process

Strange title I know, but it is exactly what I see. I am trying to run a requests (2.13.0) command from within a forked process (Mac OSX) using the multiprocessing module. I also happen to use numpy in my code (1.15.1) running on python 3.7. Here are my observations (see code below):
1) Without importing numpy: All works fine
2) Once I import numpy: Code crashes on starting of the forked process. Message given is:
objc[45539]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[45539]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
3) I could make it work again by making a requests call from within the main process once before starting the new process (see the commented section in the code).
4) On python 2.7, all seems to work fine in all cases above.
Sample minimal code to reproduce:
from multiprocessing import Process
import requests
import numpy # remove this import and it works fine on 3.7
def _worker():
    full_url = "http://www.google.com"
    result = requests.get(full_url)
    print(result.text)
    return 0
def run():
    p=Process(target=_worker)
    p.start()
    p.join()
# Add these lines and the code works in 3.7 even with numpy imported
#try:
# requests.get('http://www.google.com')
#except:
# pass
run()
print('I am done')
