Hacks to create new instance of rpy2 singleton? - rpy2

Since rpy2 can be used in parallel, there should be some way to create a new R singleton; otherwise multiprocessing would cause errors.
Is there a way I can start a new rpy2 instance myself using a hack?

Python's multiprocessing module lets one achieve parallelism through parallel processes. Each such process will run its own embedded R.
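For illustration, a minimal sketch of that pattern, assuming rpy2 and R are installed; importing rpy2 inside the worker function means each pool process initialises its own embedded R:

# Each multiprocessing worker is a separate OS process, so each one gets its
# own embedded R. Importing rpy2 inside the worker (not at module level)
# avoids initialising R in the parent before the pool forks/spawns.
import multiprocessing

def r_mean(values):
    import rpy2.robjects as robjects  # initialises this process's embedded R
    return robjects.r["mean"](robjects.FloatVector(values))[0]

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(r_mean, [[1.0, 2.0, 3.0], [10.0, 20.0]]))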

Related

processing data in parallel python

I have a script, parts of which can run in parallel at some point. Python 3.6.6.
The goal is to decrease execution time as much as possible.
One of the parts is a connection to Redis: getting the data for two keys, calling pickle.loads for each, and returning the processed objects.
What's the best solution for such tasks?
I've tried Queue() already, but Queue.get_nowait() locks the script, and after {process}.join() it also stops execution even though the task is done. Using pool.map raises TypeError: can't pickle _thread.lock objects.
All I could achieve is running all parts in parallel, but I still cannot collect the results.
cPickle.load() will release the GIL so you can use it in multiple threads easily. But cPickle.loads() will not, so don't use that.
Basically, put your data from Redis into a StringIO then cPickle.load() from there. Do this in multiple threads using concurrent.futures.ThreadPoolExecutor.
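For reference, a rough sketch of that approach adapted for Python 3.6, where the C pickle implementation already backs the standard pickle module (there is no separate cPickle); the Redis connection, key names, and worker count below are illustrative assumptions:

import io
import pickle
from concurrent.futures import ThreadPoolExecutor

import redis  # assumes the redis-py client

r = redis.Redis()

def load_key(key):
    # Fetch the raw bytes and deserialise via a file-like object, as suggested
    # above (io.BytesIO is the binary counterpart of StringIO in Python 3).
    raw = r.get(key)
    return pickle.load(io.BytesIO(raw))

with ThreadPoolExecutor(max_workers=2) as executor:
    obj_a, obj_b = executor.map(load_key, ["key_a", "key_b"])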

unable to import scikit-learn into NX Open API

I am trying to import sklearn.cluster and scipy.spatial into a 3D CAD/CAM modeling program called NX.
I created a virtual environment with Anaconda for Python 3.3.2 (conda create -n py33) and I installed sklearn via conda. I am using Windows 7 on an Intel 64-bit machine. For the most part, I have been able to use numpy methods and attributes successfully, although some (np.array_equiv) will lock up NX.
When I run a python file with import sklearn.cluster, it will crash NX. I have not used any sklearn classes or methods yet. The import line alone will crash NX. I also face a similar issue with import scipy.spatial. I did not use any scipy.spatial methods or classes.
According to the documentation, putting a comment #nx:threaded at the very top of the Python file should resolve the issue, but it has not.
It is my understanding that Python 3.2+ has a new GIL implementation. Importing threaded extension modules into NX can be problematic, as the documentation below states:
https://docs.plm.automation.siemens.com/tdoc/nx/10.0.3/release_notes/#uid:xid385122
Running threaded extension modules with Python
The embedded Python interpreter in NX runs Python scripts using subinterpreter threads to isolate the execution environments of different scripts that are running simultaneously. For example, you can use the startup script, utilize user exits, and explicitly run a journal in a session. Running each of these scripts in a separate subinterpreter keeps each of these environments separate from each other to avoid possible illegal access and collisions.
However, this approach has some drawbacks. There are a few third-party extension modules (non-NXOpen extension modules, such as matplotlib) that use C threads to perform operations. These extension modules could be imported safely, but when a function is called that starts a C thread, the subinterpreter hangs or crashes. These extensions run safely only in the main interpreter thread. Unfortunately, Python does not provide a safe way to run such extension modules in subinterpreter threads and does not provide any way to switch from a subinterpreter thread to the main interpreter thread when a script is already executing.
To support such threaded extension modules, NX must know if a Python script is using any of these modules before preparing its interpreter. So if the script is using these kinds of threaded extension modules or importing a module that is using threaded extension modules directly or indirectly, you should add a comment with the text nx:threaded anywhere in the first three lines. For example:
# nx:threaded
# some comments nx:threaded some comments
# some comments
# nx:threaded
# some comments
This instructs NX to prepare its embedded Python interpreter to run the script in the main thread instead of the subinterpreter thread to avoid a possible problem. Pure Python threads do not have those kinds of problems with subinterpreters and should be used without this extra comment. This comment could be added to any Python script whether it is a startup script, a user exit script, or a normal journal.
Do not use this comment unnecessarily. It runs all the scripts in the main interpreter thread and may exhibit some unusual behavior, such as illegal data access and object deallocation. Use this comment only when threaded extension modules are imported and used.
Try the # nx:threaded token.
It may happen that some extension modules start native threads at import time. Somehow, the sklearn.cluster import is interfering with or modifying the state of the embedded subinterpreter. The same problem can be seen with numpy too. The documentation might not be 100% correct; I think adding the "nx:threaded" token should solve your problem.
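For illustration, a hypothetical journal header showing where the marker would sit; apart from the marker comment itself, the module names below are assumptions rather than details from the question:

# nx:threaded
# The marker above must appear within the first three lines of the journal.
import NXOpen            # NX Open journal entry point (illustrative)
import sklearn.cluster   # the threaded extension import that crashed NX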

How to handle multiple threads, single outcome in a functional way?

This is not a 'pure' functional question as it involves side effects. I have a function that may take, say, 10 seconds to complete. The function generates data in a database (for example). If it is run twice at the same time it will create duplicate data. Let's say that the function can be triggered by clicking a button in the browser. If two people click within seconds of each other then the function can be running twice concurrently.
In Java and similar systems I would synchronise on a semaphore. In Node or Django I can take advantage of the single threading to drop parallel runs.
running = False

def long_running_process():
    global running
    # only run once
    if running:
        return
    try:
        running = True
        # ... go to it ...
    finally:
        running = False
The requirement in Python for a global reference is a clear hint that this function requires state - and so is imperative by nature.
So, to my questions.
How do the 'pure' functional programs that demand immutability handle this problem?
And how could I implement that in Python (for example)?
Is my best option to use a reactive Python library?
I already know that the Haskell people will tell me to create a state Monad, but how would Clojure or Elixir or ... handle it?
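For comparison, a minimal sketch (not part of the original question) of the same guard made safe for real Python threads, using a non-blocking lock instead of the bare global flag:

import threading

_run_lock = threading.Lock()

def long_running_process():
    # acquire(blocking=False) returns False if another thread already holds the
    # lock, so a second concurrent call returns instead of duplicating the work.
    if not _run_lock.acquire(blocking=False):
        return
    try:
        pass  # ... go to it ...
    finally:
        _run_lock.release()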

in a pickle: how to serialise legacy objects for submission to a Python multiprocessing pool

I have written a nice parallel job processor that accepts jobs (functions, their arguments, timeout information, etc.) and submits them to a Python multiprocessing pool. I can provide the full (long) code if requested, but the key step (as I see it) is the asynchronous application to the pool:
job.resultGetter = self.pool.apply_async(
    func=job.workFunction,
    kwds=job.workFunctionKeywordArguments
)
I am trying to use this parallel job processor with a large body of legacy code and, perhaps naturally, have run into pickling problems:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
This type of problem is observable when I try to submit a problematic object as an argument for a work function. The real problem is that this is legacy code and I am advised that I can make only very minor changes to it. So... is there some clever trick or simple modification I can make somewhere that could allow my parallel job processor code to cope with these traditionally unpicklable objects? I have total control over the parallel job processor code, so I am open to, say, wrapping every submitted function in another function. For the legacy code, I should be able to add the occasional small method to objects, but that's about it. Is there some clever approach to this type of problem?
Use dill and pathos.multiprocessing instead of pickle and multiprocessing.
See here:
What can multiprocessing and dill do together?
http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
How to pickle functions/classes defined in __main__ (python)
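A rough sketch of that suggestion, assuming the dill and pathos packages are installed; the class and method names are illustrative rather than taken from the original job processor:

import dill
from pathos.multiprocessing import ProcessingPool


class LegacyWorker:
    """Stand-in for a legacy object whose bound methods must be submitted."""

    def __init__(self, offset):
        self.offset = offset

    def work(self, value):
        return value + self.offset


if __name__ == "__main__":
    worker = LegacyWorker(offset=10)
    # dill can serialise the bound method that Python 2's pickle rejects above.
    payload = dill.dumps(worker.work)
    # pathos pools use dill internally, so bound methods can be submitted directly.
    pool = ProcessingPool(nodes=2)
    print(pool.map(worker.work, [1, 2, 3]))  # -> [11, 12, 13]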

pthread support in R

I was looking around for a way to run an R function as a separate thread in the background.
As R is written in C, I was hoping that some packages would support threading using pthreads.
So far I haven't found anything good; some of the packages I tested were broken or implemented other concepts.
So my requirement is as simple as running an R script as a separate pthread inside an R console.
How can I run a function or a script as a separate thread?
PS: I am not looking for fork-like features.
Thanks
Vineeth
In the R extensions manual:
There is no direct support for the POSIX threads
Alternatively, you can use several R processes in parallel. In Linux you can simply fork a process by running an R script from a terminal and adding &, e.g.:
Rscript spam.R &
If you insist on doing this from within R:
system("Rscript spam.R", wait = FALSE)
Or you could have a look at the parallel package to run R operations in parallel.
Given your comments I think you could have a look at the HighPerformance task view. Quoting from that:
The bigmemory package by Kane and Emerson permits storing large objects such as matrices in memory (as well as via files) and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
This indicates that the bigmemory package might prove interesting in letting multiple R instances access the same data stored in memory. Then you could use forking to create multiple R instances.

Resources