Requests module crashes python when numpy is loaded and using process - python-3.x

Strange title I know, but it is exactly what I see. I am trying to run a requests (2.13.0) command from within a forked process (Mac OSX) using the multiprocessing module. I also happen to use numpy in my code (1.15.1) running on python 3.7. Here are my observations (see code below):
1) Without importing numpy: All works fine
2) Once I import numpy: Code crashes on starting of the forked process. Message given is:
objc[45539]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[45539]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
3) I could make it work again by calling a requests call from within the main process once before starting the new process (see commented section in code
4) On python 2.7, all seems to work fine in all cases above.
Sample minimal code to reproduce:
from multiprocessing import Process
import requests
import numpy # remove this import and it works fine on 3.7
def _worker():
full_url = "http://www.google.com"
result = requests.get(full_url)
print(result.text)
return 0
def run():
p=Process(target=_worker)
p.start()
p.join()
# Add these lines and the code works in 3.7 even with numpy imported
#try:
# requests.get('http://www.google.com')
#except:
# pass
run()
print('I am done')

Related

How to force os.stat re-read file stats by same path

I have a code that is architecturally close to posted below (unfortunately i can't post full version cause it's proprietary). I have an self-updating executable and i'm trying to test this feature. We assume that full path to this file will be in A.some_path after executing input. My problem is that assertion failed, because on second call os.stat still returning the previous file stats (i suppose it thinks that nothing could changed so it's unnecessary). I have tried to launch this manually and self-updating works completely fine and the file is really removing and recreating with stats changing. Is there any guaranteed way to force os.stat re-read file stats by the same path, or alternative option to make it works (except recreating an A object)?
from pathlib import Path
import unittest
import os
class A:
some_path = Path()
def __init__(self, _some_path):
self.some_path = Path(_some_path)
def get_path(self):
return self.some_path
class TestKit(unittest.TestCase):
def setUp(self):
pass
def check_body(self, a):
some_path = a.get_path()
modification_time = os.stat(some_path).st_mtime
# Launching self-updating executable
self.assertTrue(modification_time < os.stat(some_path).st_mtime)
def check(self):
a = A(input('Enter the file path\n'))
self.check_body(a)
def Tests():
suite = unittest.TestSuite()
suite.addTest(TestKit('check'))
return suite
def main():
tests_suite = Tests()
unittest.TextTestRunner().run(tests_suite)
if __name__ == "__main__":
main()
I have found the origins of the problem: i've tried to launch self-updating via os.system which wait till the process done. But first: during the self-updating we launch several detached proccesses and actually should wait unitl all them have ended, and the second: even the signal that the proccess ends doesn't mean that OS really completely realease the file, and looks like on assertTrue we are not yet done with all our routines. For my task i simply used sleep, but normal solution should analyze the existing proccesses in the system and wait for them to finish, or at least there should be several attempts with awaiting.

Python multiprocessing manager showing error when used in flask API

I am pretty confused about the best way to do what I am trying to do.
What do I want?
API call to the flask application
Flask route starts 4-5 multiprocess using Process module and combine results(on a sliced pandas dataframe) using a shared Managers().list()
Return computed results back to the client.
My implementation:
pos_iter_list = get_chunking_iter_list(len(position_records), 10000)
manager = Manager()
data_dict = manager.list()
processes = []
for i in range(len(pos_iter_list) - 1):
temp_list = data_dict[pos_iter_list[i]:pos_iter_list[i + 1]]
p = Process(
target=transpose_dataset,
args=(temp_list, name_space, align_namespace, measure_master_id, df_searchable, products,
channels, all_cols, potential_col, adoption_col, final_segment, col_map, product_segments,
data_dict)
)
p.start()
processes.append(p)
for p in processes:
p.join()
My directory structure:
- main.py(flask entry point)
- helper.py(contains function where above code is executed & calls transpose_dataset function)
Error that i am getting while running the same?
RuntimeError: No root path can be found for the provided module "mp_main". This can happen because the module came from an import hook that does not provide file name information or because it's a namespace package. In this case the root path needs to be explicitly provided.
Not sure what went wong here, manager list works fine when called from a sample.py file using if __name__ == '__main__':
Update: The same piece of code is working fine on my MacBook and not on windows os.
A sample flask API call:
#app.route(PREFIX + "ping", methods=['GET'])
def ping():
man = mp.Manager()
data = man.list()
processes = []
for i in range(0,5):
pr = mp.Process(target=test_func, args=(data, i))
pr.start()
processes.append(pr)
for pr in processes:
pr.join()
return json.dumps(list(data))
Stack has an ongoing bug preventing me from commenting, so I'll just write up an answer..
Python has 2 (main) ways to start a new process: "spawn", and "fork". Fork is a system command only available in *nix (read: linux or macos), and therefore spawn is the only option in windows. After 3.8 spawn will be the default on MacOS, but fork is still available. The big difference is that fork basically makes a copy of the existing process while spawn starts a whole new process (like just opening a new cmd window). There's a lot of nuance to why and how, but in order to be able to run the function you want the child process to run using spawn, the child has to import the main file. Importing a file is tantamount to just executing that file and then typically binding its namespace to a variable: import flask will run the flask/__ini__.py file, and bind its global namespace to the variable flask. There's often code however that is only used by the main process, and doesn't need to be imported / executed in the child process. In some cases running that code again actually breaks things, so instead you need to prevent it from running outside of the main process. This is taken into account in that the "magic" variable __name__ is only equal to "__main__" in the main file (and not in child processes or when importing modules).
In your specific case, you're creating a new app = Flask(__name__), which does some amount of validation and checks before you ever run the server. It's one of these setup/validation steps that it's tripping over when run from the child process. Fixing it by not letting it run at all is imao the cleaner solution, but you can also fix it by giving it a value that it won't trip over, then just never start that secondary server (again by protecting it with if __name__ == "__main__":)

I can implement Python multiprocessing with Spyder Windows PC, but why?

I'm so curious about this and need some advise about how can this happen? Yesterday I've tried to implement multiprocessing in Python script which is running on Spyder in Window PC. Here is the code I've first tried.
import multiprocessing
import time
start = time.perf_counter()
def do_something():
print('Sleeping 1 second...')
time.sleep(1)
print('Done sleeping')
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start,2)} second(s)')
It's return an error.
AttributeError: Can't get attribute 'do_something' on <module '__main__' (built-in)
Then I search for survival from this problem and also my boss. And found this suggestion
Python's multiprocessing doesn't work in Spyder IDE
So I've followed it and installed Pycharm and try to run the code on PyCharm and it's seem to be work I didn't get AttributeError, however I got this new one instead of
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I've googled again then finally I got this
RuntimeError on windows trying python multiprocessing
what I have to do is adding this one line
if __name__ == '__main__':
before starting multiprocessing.
import multiprocessing
import time
start = time.perf_counter()
def do_something():
print('Sleeping 1 second...')
time.sleep(1)
print('Done sleeping')
if __name__ == '__main__':
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start,2)} second(s)')
And it's work now moreover, it's not working only on PyCharm, now I can run this code on Spyder too. So that is why I have so curious? how come Spyder also work? This is quite persist because I'm also run this code on my other PC which is Window server 2016 with Spyder , I'm also do something.
Anyone can help explain what happen here why it's work?
Thank you.
There's a lot to unpack here, so I'll just give a brief overview. There's also some missing information like how you have spyder/pycharm configured, and what operating system you use, so I'll have to make some assumptions...
Based on the error messages you are probably using MacOS or Windows which means the default way python creates a child process is called spawn. This means it will start a completely new process from the python executable ("python.exe" on windows for example). It will then send a message to the new process telling it what function to execute (target), and optionally what arguments to call that function with. The new process will have to import the main file to have access to that function however, so if you are running the python interpreter in interactive mode, there is no "main" file to import, and you get the first error message: AttributeError.
The second error is also related to the importing of the "main" file. When you import a file, it basically just runs the file like any other python script. If you were to create a new child process during import that child would then also create a new child when it imports the same file. You would end up recursively creating infinite child processes until the computer crashed, so python disallows creating additional child processes during the import phase of a child process hence the RuntimeError.

Python multiprocessing refuses to execute code and exits instantly

I was working on a larger piece of code and I kept getting an error when the program just seemed to end without doing anything. I narrowed down the problem and reproduced it below. As far as I understand the code should print the sentence and it seems to work on other online IDE's while failing on mine. Feel like I'm missing something super simple.
Failing On: IDLE Python 3.8.3 32-bit from python.org
Works On: onlinegdb.com
Code:
import multiprocessing
def x():
print("This is x func")
if __name__ == '__main__':
multiprocessing.freeze_support()
p = multiprocessing.Process(target=x)
p.start()
p.join()
Output:
>>>
I think the issue is IDLE just doesn't output stuff from stuff outside the main process. Need to use consoles which would output everything from the main and all other processes. Reference : Python multiprocessing example not working

How to run PyQt5 GUIs in non-blocking threads?

I have a PyQt5 GUI class that I want to be able to create multiple instances of either from an interactive console or normal run. I need these GUIs to be non-blocking so that they can be used while subsequent code runs.
I've tried calling app.exec__() in separate threads for each GUI like this answer, but the program sometimes crashes as the comment on the answer warned it would:
Run pyQT GUI main app in seperate Thread
And now I'm trying to get the code below working which I made based on this answer:
Run Pyqt GUI main app as a separate, non-blocking process
But when I run it the windows pop and and immediately disappear
import sys
from PyQt5 import QtWidgets, QtGui, QtCore
import time
class MainWindow(QtWidgets.QWidget):
def __init__(self):
# call super class constructor
super(MainWindow, self).__init__()
# build the objects one by one
layout = QtWidgets.QVBoxLayout(self)
self.pb_load = QtWidgets.QPushButton('Load')
self.pb_clear= QtWidgets.QPushButton('Clear')
self.edit = QtWidgets.QTextEdit()
layout.addWidget(self.edit)
layout.addWidget(self.pb_load)
layout.addWidget(self.pb_clear)
# connect the callbacks to the push-buttons
self.pb_load.clicked.connect(self.callback_pb_load)
self.pb_clear.clicked.connect(self.callback_pb_clear)
def callback_pb_load(self):
self.edit.append('hello world')
def callback_pb_clear(self):
self.edit.clear()
def show():
app = QtWidgets.QApplication.instance()
if not app:
app = QtWidgets.QApplication(sys.argv)
win = MainWindow()
win.show()
if __name__ == '__main__':
show()
show()
EDIT - I don't see how this question is a duplicate. The 'duplicate' questions are only slightly related and don't provide solutions to my problem at all.
I want to be able to create multiple instances of a GUI (MainWindow in my example) by calling the show() function from either an interactive session or script, and I want those windows to stay on my screen while subsequent code is running.
EDIT2 - When I run the code as a script I can do what I want by using multiprocessing, see this demo:
https://www.screencast.com/t/5WvJNVSLm9OR
However I still need help because I want it to also work in interactive Python console sessions, and multiprocessing does not work in that case.
It isn't necessary to use separate threads or processes for this. You just need a way to maintain a reference to each new window when importing the script in a python interactive session. A simple list can be used for this. It is only necessary to explictly start an event-loop when running the script from the command-line; in an interactive session, it will be handled automatically by PyQt.
Here is an implementation of this approach:
...
_cache = []
def show(title=''):
if QtWidgets.QApplication.instance() is None:
_cache.append(QtWidgets.QApplication(sys.argv))
win = MainWindow()
win.setWindowTitle(title)
win.setAttribute(QtCore.Qt.WA_DeleteOnClose)
win.destroyed.connect(lambda: _cache.remove(win))
_cache.append(win)
win.show()
if __name__ == '__main__':
show('Foo')
show('Bar')
sys.exit(QtWidgets.QApplication.instance().exec_())
This is a minor addendum to #ekhumoro's answer. I don't have enough reputation to only add a comment so I had to write this as an answer.
#ekhumoro's answer almost fully answers #Esostack's question, but doesn't work in the Ipython console. After many hours of searching for the answer to this question myself, I came across a comment from #titusjan in a three year old thread (here) also responding to a good answer from #ekhumoro.
The missing part to #ekhumoro's answer which results in the gui windows freezing for Ipython specifically is that Ipython should be set to use the qt gui at launch or once running.
To make this work with Ipython:
Launch Ipython with ipython --gui=qt5
In a running Ipython console run the magic command %gui qt5
To fix it from a Python script you can run this function
def fix_ipython():
from IPython import get_ipython
ipython = get_ipython()
if ipython is not None:
ipython.magic("gui qt5")

Resources