I am trying to import sklearn.cluster and scipy.spatial into a 3D CAD/CAM modeling program called NX.
I created a virtual environment with Anaconda for Python 3.3.2 (conda create -n py33) and installed sklearn via conda. I am using Windows 7 on a 64-bit Intel machine. For the most part, I have been able to use numpy methods and attributes successfully, although some (e.g. np.array_equiv) will lock up NX.
When I run a Python file with import sklearn.cluster, it crashes NX. I have not used any sklearn classes or methods yet; the import line alone crashes NX. I face a similar issue with import scipy.spatial, even without using any scipy.spatial methods or classes.
According to the documentation, putting the comment #nx:threaded at the very top of the Python file should resolve the issue, but it has not.
It is my understanding that Python 3.2+ has a new GIL implementation. Importing threaded extension modules into NX can be problematic, as the documentation below states:
https://docs.plm.automation.siemens.com/tdoc/nx/10.0.3/release_notes/#uid:xid385122
Running threaded extension modules with Python
The embedded Python interpreter in NX runs Python scripts using subinterpreter threads to isolate the execution environments of different scripts that are running simultaneously. For example, you can use the startup script, utilize user exits, and explicitly run a journal in a session. Running each of these scripts in a separate subinterpreter keeps each of these environments separate from each other to avoid possible illegal access and collisions.
However, this approach has some drawbacks. There are a few third-party extension modules (non-NXOpen extension modules, such as matplotlib) that use C threads to perform operations. These extension modules could be imported safely, but when a function is called that starts a C thread, the subinterpreter hangs or crashes. These extensions run safely only in the main interpreter thread. Unfortunately, Python does not provide a safe way to run such extension modules in subinterpreter threads and does not provide any way to switch from a subinterpreter thread to the main interpreter thread when a script is already executing.
To support such threaded extension modules, NX must know if a Python script is using any of these modules before preparing its interpreter. So if the script is using these kinds of threaded extension modules or importing a module that is using threaded extension modules directly or indirectly, you should add a comment with the text nx:threaded anywhere in the first three lines. For example:
# nx:threaded
# some comments nx:threaded some comments
# some comments
# nx:threaded
# some comments
This instructs NX to prepare its embedded Python interpreter to run the script in the main thread instead of the subinterpreter thread to avoid a possible problem. Pure Python threads do not have those kinds of problems with subinterpreters and should be used without this extra comment. This comment could be added to any Python script whether it is a startup script, a user exit script, or a normal journal.
Do not use this comment unnecessarily. It runs all the scripts in the main interpreter thread and may exhibit some unusual behavior, such as illegal data access and object deallocation. Use this comment only when threaded extension modules are imported and used.
Try the #nx: threaded token.
It may happen that some extension modules start native threads at import time. Somehow, the sklearn.cluster import is interfering with, or modifying the data of, the embedded subinterpreter state. The same problem can be seen with numpy too. The documentation might not be 100% correct; I think adding the "nx: threaded" token should solve your problem.
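For reference, a rough sketch of how the token from the documentation sits at the top of the journal, using the imports from the question:

# nx:threaded
# NX scans the first three lines for the token above and, if found,
# runs this script in the main interpreter thread
import sklearn.cluster
import scipy.spatial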
Related
When I launch my main script on the cluster in ddp mode (2 GPUs), PyTorch Lightning duplicates whatever is executed in the main script, e.g. prints or other logic. I need some extended training logic that I would like to handle myself, e.g. do something (once!) after Trainer.fit(). But with the duplication of the main script, this doesn't work as I intend. I also tried wrapping it in if __name__ == "__main__", but that doesn't change the behavior. How could one solve this problem? Or, how can I run some logic around my Trainer object without the duplicates?
I have since moved on to using the native "ddp" with multiprocessing in PyTorch. As far as I understand, PyTorch Lightning (PTL) just runs your main script multiple times on multiple GPUs. This is fine if you only want to fit your model in one call of your script. However, a huge drawback in my opinion is the lost flexibility during the training process: the only way of interacting with your experiment is through these (badly documented) callbacks. Honestly, it is much more flexible and convenient to use native multiprocessing in PyTorch. In the end it was much faster and easier to implement, plus you don't have to search for ages through the PTL documentation to achieve simple things.
I think PTL is going in a good direction by removing much of the boilerplate; however, in my opinion, the Trainer concept needs some serious rework. It is too closed and violates PTL's own principle of "reorganizing PyTorch code, keeping native PyTorch code".
If you are considering PTL for easy multi-GPU training, I personally would strongly suggest refraining from it; for me it was a waste of time, and learning native PyTorch multiprocessing was the better investment.
Asked this at the GitHub repo: https://github.com/PyTorchLightning/pytorch-lightning/issues/8563
There are different accelerators for training: while DDP (DistributedDataParallel) runs the script once per GPU, ddp_spawn and dp don't.
However, certain plugins like DeepSpeedPlugin are built on DDP, so changing the accelerator doesn't stop the main script from running multiple times.
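For example, a minimal sketch of picking a mode that does not relaunch the script (argument names are those of the PTL version current in this thread; newer releases pass this via strategy= instead):

import pytorch_lightning as pl

# ddp_spawn forks its worker processes itself instead of re-running the script
trainer = pl.Trainer(gpus=2, accelerator="ddp_spawn")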
You could quit the duplicated sub-processes by putting the following code after Trainer.fit:
import sys

if model.global_rank != 0:
    sys.exit(0)
where model is an instance of a class inheriting from LightningModule, which has a property global_rank specifying the rank of the machine; we can roughly think of it as the GPU id or the process id. Everything after this code will only be executed in the main process, i.e., the process with global_rank = 0.
For more information, please refer to the documentation: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#global_rank
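Putting it together, a rough sketch of the overall layout (MyModel is a hypothetical LightningModule subclass; Trainer arguments are those of the PTL version current in this thread):

import sys
import pytorch_lightning as pl

model = MyModel()  # hypothetical LightningModule subclass
trainer = pl.Trainer(gpus=2, accelerator="ddp")
trainer.fit(model)

# every duplicated process except rank 0 exits here
if model.global_rank != 0:
    sys.exit(0)

# anything below runs exactly once, in the rank-0 process
do_something_once()  # hypothetical one-time post-training logic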
Use an environment variable as a flag (a plain global would not survive the script being relaunched, but child processes inherit the environment):
import os

if __name__ == "__main__":
    # relaunched per-GPU processes inherit the env var; only the
    # primary (first) run sees it unset
    is_primary = os.environ.get("IS_PTL_PRIMARY") is None
    os.environ["IS_PTL_PRIMARY"] = "yes"
    ## code to run on each GPU
    ...
    if is_primary:
        ## code to run only once
        ...
From the PyTorch Lightning official documentation on DDP, we know that PL intentionally calls the main script multiple times to spin off the child processes that take charge of the GPUs:
They use the environment variables "LOCAL_RANK" and "NODE_RANK" to denote GPUs. So we can add conditions to bypass the code blocks that we don't want to be executed repeatedly. For example:
import os

if __name__ == "__main__":
    # neither variable is set in the very first (primary) invocation
    if "LOCAL_RANK" not in os.environ and "NODE_RANK" not in os.environ:
        # code you only want to run once
        ...
SubProcess is a pro feature, but there is no documentation on how to use it. Is there any working example of using the SubProcess node?
Does that mean I can use SubProcess to chain up different workflows? How does one workflow pass information to another workflow?
Some code is better than no code, I presume:
...
spawn_next_flow = flow.SubProcess(another_flow)
...
I am trying to build a python GUI app which needs to run bash scripts and capture their output. Most of the commands used in these scripts require further inputs at run time, as they deal with network and remote systems.
Python's subprocess module allows one to easily run external programs, while providing various options for convenience and customizability.
To run simple programs that do not require interaction, the functions call(), check_call(), and check_output() (argument details omitted here) are very useful.
For more complex use cases, where interaction with the running program is required, Popen objects can be used; you can customize input/output pipes as well as many other options, and the aforementioned functions are wrappers around these objects. You can interact with a running process through the provided methods poll(), wait(), communicate(), etc.
Additionally, if communicate() doesn't work for your use case, you can get the file descriptors of the pipes via Popen.stdin, Popen.stdout, and Popen.stderr, and interact with them directly with select. I prefer polling objects :)
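As a rough sketch of the interactive case (the script name and the input line are placeholders):

import subprocess

# launch a hypothetical bash script with all three standard streams piped
proc = subprocess.Popen(
    ["bash", "my_script.sh"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # exchange str rather than bytes
)

# feed the script one line of input and collect everything it prints
out, err = proc.communicate(input="yes\n")
print("stdout:", out)
print("stderr:", err)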
I have written a nice parallel job processor that accepts jobs (functions, their arguments, timeout information, etc.) and submits them to a Python multiprocessing pool. I can provide the full (long) code if requested, but the key step (as I see it) is the asynchronous application to the pool:
job.resultGetter = self.pool.apply_async(
    func=job.workFunction,
    kwds=job.workFunctionKeywordArguments,
)
I am trying to use this parallel job processor with a large body of legacy code and, perhaps naturally, have run into pickling problems:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
This type of problem is observable when I try to submit a problematic object as an argument for a work function. The real problem is that this is legacy code and I am advised that I can make only very minor changes to it. So... is there some clever trick or simple modification I can make somewhere that could allow my parallel job processor code to cope with these traditionally unpicklable objects? I have total control over the parallel job processor code, so I am open to, say, wrapping every submitted function in another function. For the legacy code, I should be able to add the occasional small method to objects, but that's about it. Is there some clever approach to this type of problem?
Use dill and pathos.multiprocessing instead of pickle and multiprocessing.
see here:
What can multiprocessing and dill do together?
http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
How to pickle functions/classes defined in __main__ (python)
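A minimal sketch of the pathos route (the Legacy class is just a stand-in for the legacy code):

from pathos.multiprocessing import ProcessingPool

class Legacy:
    # stands in for legacy objects that the stdlib pickle rejects
    # (bound methods on Python 2, lambdas, nested functions, ...)
    def work(self, x):
        return x * 2

if __name__ == "__main__":
    obj = Legacy()
    pool = ProcessingPool(nodes=2)
    # dill serializes the bound method by value, so this succeeds where
    # a pickle-based multiprocessing pool raises PicklingError
    print(pool.map(obj.work, [1, 2, 3]))  # [2, 4, 6]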
I've made a class in Python 3.x that acts as a server. One method manages sending and receiving data via UDP/IP using the socket module (the data is stored in self.cmd and self.msr respectively). I want to be able to modify the self.msr and self.cmd variables from within the Python interpreter online. For example:
>>> from myserver import MyServer
>>> s = MyServer()
>>> s.background_recv_send() # runs in the background, constantly calling s.recv_msr(), s.send_cmd()
>>> process_data(s.msr) # I use the latest received data
>>> s.cmd[0] = 5 # this will be sent automatically
>>> s.msr # I can see what the newest data is
So far, s.background_recv_send() does not exist. I need to manually call s.recv_msr() each time I want to update the value of s.msr (s.recv_msr uses a blocking socket), and then call s.send_cmd() to send s.cmd.
In this particular case, which module makes more sense: multiprocessing or threading?
Any hints on how I could best solve this? I have no experience with either processes or threads (I've read a lot, but I am still unsure which way to go).
In this case, threading makes the most sense. In short, multiprocessing is for running code in separate processes on different processors (useful for CPU-bound work, since it sidesteps the GIL), while threading is for doing things in the background within one process, which suits I/O-bound work like blocking socket calls.
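A rough sketch of what background_recv_send could look like, assuming recv_msr() and send_cmd() exist on the class as described in the question (the _worker attribute name is made up):

import threading

class MyServer:
    # ... existing __init__, recv_msr, send_cmd ...

    def background_recv_send(self):
        def loop():
            while True:
                self.recv_msr()  # blocking receive updates self.msr
                self.send_cmd()  # sends the current self.cmd
        # daemon=True lets the interpreter exit without joining the thread
        self._worker = threading.Thread(target=loop, daemon=True)
        self._worker.start()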