How do I disable %%cython within a Python Jupyter notebook script?

I'm creating an NLP-related Jupyter notebook that I hope to release for public use. I'm using Cython magic commands in some cells to increase speed, putting %%cython at the beginning of those cells.
What I want to do is make the use of such Cython magic commands (and related Cython keywords such as cdef) a configurable parameter that users can set in an earlier cell.
I've tried conditionals to let users "turn off" Cython, but these don't seem to work because %%cython must be the first line of the cell.
Here's the code:
%%cython
import numpy as np  # access to NumPy from the Python layer
import math

cdef:
    # function definition here
Here's my attempt at a solution:
Configuration cell:
turn_on_cython = True  # may be changed by the user to False
Subsequent cell:
if turn_on_cython:
    %%cython
This doesn't work: a cell magic is only recognized as the very first line of a cell, so the %%cython inside the if block is a syntax error.
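One possible workaround (a sketch, not from the original question): keep the Cython source in a string and hand it to the cell magic programmatically through IPython's run_cell_magic, which sidesteps the rule that %%cython must be the first line of a cell. The names use_cython and fast_sqrt are illustrative.

# Configuration cell
use_cython = True  # user-configurable

# Subsequent cell
cython_source = '''
import math

def fast_sqrt(double x):
    return math.sqrt(x)
'''

if use_cython:
    ip = get_ipython()                       # available inside any IPython/Jupyter session
    ip.run_line_magic('load_ext', 'Cython')  # equivalent to %load_ext Cython
    ip.run_cell_magic('cython', '', cython_source)  # equivalent to a %%cython cell
else:
    import math
    def fast_sqrt(x):  # pure-Python fallback with the same behavior
        return math.sqrt(x)

Either branch leaves a fast_sqrt in the notebook namespace, so later cells don't need to care which one ran.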

Related

Why do I have to import a library twice in Python (IDLE and the imported file)?

I am running the Python 3.7.6 shell and have the numpy library installed correctly.
In my shell I type:
import numpy as np
and can use numpy however I desire. I then proceed to import my_lib.py, which contains:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
In my shell I can call the function softmax(x), but I immediately get the error:
NameError: name 'np' is not defined
My hypothesis here would be that I've imported numpy into 'shell scope' and I've also imported softmax(x) into 'shell scope', so everything should be happy. To fix the problem I have to add
import numpy as np
to 'my_lib.py'.
How come I have to import numpy twice?
The code in each module can only use identifiers (names) that have been defined in or imported into that module. The global dict of each module only contains names global to that module; it might better be called the module dict, but the name dates back to before modules existed in computing.
You might benefit from reading https://docs.python.org/3/tutorial/modules.html, and probably other parts of the tutorial as well.
(None of this has anything to do with the editor you use to write code or the IDE or shell you use to pass code to Python.)
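A minimal demonstration of per-module namespaces (the file name helper.py is hypothetical):

# helper.py
def show_globals():
    # Prints the names in helper.py's own module dict,
    # not in the dict of whichever module imported it.
    print(sorted(n for n in globals() if not n.startswith('__')))

# interactive session
import numpy as np     # binds 'np' in the shell's (the __main__ module's) global dict
import helper
helper.show_globals()  # prints ['show_globals'] -- no 'np' in sight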

How to implement a Meta Path Importer on Python 3

I'm struggling to refactor some working import-hook functionality that served us very well on Python 2 for the last few years... And honestly, I wonder if something is broken in Python 3? But I'm unable to find any reports of that, so my confidence that I'm doing something wrong myself is still stronger! OK. Code:
Here is a cooked-down version for Python 3, using PathFinder from importlib.machinery:
import sys
from importlib.machinery import PathFinder

class MyImporter(PathFinder):
    def __init__(self, name):
        self.name = name

    def find_spec(self, fullname, path=None, target=None):
        print('MyImporter %s find_spec fullname: %s' % (self.name, fullname))
        return super(MyImporter, self).find_spec(fullname, path, target)

sys.meta_path.insert(0, MyImporter('BEFORE'))
sys.meta_path.append(MyImporter('AFTER'))
print('sys.meta_path:', sys.meta_path)

# import an example module
import json
print(json)
So you see: I insert one instance of the class right at the front and one at the end of sys.meta_path. It turns out ONLY the first one triggers! I never see any calls to the last one. That was different in Python 2!
Looking at the implementation in six, I thought, well, THEY must know how to do this properly! ... 🤨 But I don't see it working there either! When I try to step in or just add some prints... nada!
On top of that: if I actually put my importer first in the sys.meta_path list, trigger on certain imports, and patch my module (which all works fine), it still gets overridden by the other importers in the list!
* How can I prevent that?
* Do I need to do that? It seems dirty!
I have been studying meta_path heavily in Python 3.8.
The entire import mechanism has been moved from C to Python and manifests itself as sys.meta_path, which contains three importers. The Python import machinery is cleverly stupid, i.e. uncomplex.
The source code of the entire Python import machinery can be found in importlib/.
__import__() is still the central hook called when you write "import mymod":
* __import__() first checks whether the module has already been imported, in which case it retrieves it from sys.modules.
* If that doesn't work, it calls find_spec() on each "spec finder" in meta_path.
* If a "spec finder" succeeds, it returns a "spec", which is what the next stage needs.
* If none of them finds the module, the import fails.
sys.meta_path is a list of "spec finders":
* 0: the builtin spec finder, for modules built into the interpreter (sys, _sre)
* 1: the frozen importer: it pulls the import machinery itself (importlib) from frozen bytecode
* 2: the path finder: it finds both library modules (os, re, inspect) and your application modules, based on sys.path
So regarding the question above: that shouldn't be happening. If your spec finder is first in meta_path and returns a valid spec, the module is found, and the remaining entries in sys.meta_path won't even be asked.
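A minimal sketch (mine, not the answerer's) showing that interplay: a meta path finder that only logs lookups and returns None, which means "not mine" and lets the search continue down sys.meta_path.

import sys
from importlib.abc import MetaPathFinder

class LoggingFinder(MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        print('find_spec called for:', fullname)
        return None  # decline, so the remaining finders are tried

sys.meta_path.insert(0, LoggingFinder())

import json  # logs 'json' (plus submodules), assuming it wasn't already in sys.modules

Conversely, a finder that does return a spec ends the search on the spot, which is why the 'AFTER' instance in the question never fires: every module is resolved before the search reaches it.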

How do I reload a module ONLY if it was NOT just loaded by the currently executing script?

I've got a script I'm using to initialize and train a neural network with Keras, and I've initialized random seeds for 100% reproducible training results as I test and optimize my code. Meanwhile, I've also been using importlib.reload() to reload the custom modules that I'm changing as I develop. The problem is that my random sequence may be different the FIRST time I call the script in an ipython session vs. subsequent times. I found a solution (see below), but it seems pretty inelegant and heavy-handed. I'm wondering if there's a more efficient or more pythonic way, or perhaps I should handle the random seeds differently altogether?
import sys

modulesUnderDevelopment = ['common_config', 'cnn_tools', 'prep_dataset']
moduleShorthand = ['cc', 'ct', 'pd']
reloadTracker = dict()

# Record which modules have not yet been imported, before we do any importing.
for moduleName, shorthand in zip(modulesUnderDevelopment, moduleShorthand):
    reloadTracker[moduleName] = {'imported': moduleName in sys.modules,
                                 'shorthand': shorthand}

# Import modules. Anything that's already been imported won't be re-imported by
# these lines, so any randomness that occurs during import will NOT occur if the
# module was already imported. HOWEVER, if the module has not been imported, any
# randomness in the module import will be executed.
print('imports are beginning')
import common_config as cc
import cnn_tools as ct
import prep_dataset as pd
print('imports are ending')

import importlib

# Now reload the key modules that I'm optimizing and regularly changing. These
# reloads will ALWAYS happen, so the randomness in them will always be executed.
print('reloads are beginning')
for key, item in reloadTracker.items():
    if not item['imported']:
        continue
    if key == 'common_config':
        importlib.reload(cc)
    elif key == 'cnn_tools':
        importlib.reload(ct)
    elif key == 'prep_dataset':
        importlib.reload(pd)
print('reloads are complete')
Meanwhile, my modules contain code like this:
## prep_dataset ##
import numpy as np
print('starting prep_dataset -- {}'.format(np.random.random()))

## cnn_tools ##
import numpy as np
print('starting cnn_tools -- {}'.format(np.random.random()))
The FIRST time I start ipython and run this script, it prints this:
imports are beginning
starting cnn_tools -- 0.5507979025745755
Using TensorFlow backend.
starting prep_dataset -- 0.7081478226181048
imports are ending
reloads are beginning
reloads are complete
Subsequent times, it prints this:
imports are beginning
imports are ending
reloads are beginning
starting cnn_tools -- 0.5507979025745755
starting prep_dataset -- 0.7081478226181048
reloads are complete
So that solves my problem - the random number stream is the same both times. But it seems really clumsy... Conversely, if I don't go through all this trouble and just import followed by reload, I get a DIFFERENT result the first time I run the script - 2 extra random numbers are drawn from the stream.
starting cnn_tools -- 0.5507979025745755
Using TensorFlow backend.
starting prep_dataset -- 0.7081478226181048
reloads are beginning
starting cnn_tools -- 0.2909047389129443
starting prep_dataset -- 0.510827605197663
reloads are complete
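The same behavior can be had more compactly with a helper that runs each module's top-level code exactly once per script run: fresh modules are imported, already-loaded ones are reloaded. This is a sketch of the idea (import_or_reload is my name for it; the module names are the question's own):

import importlib
import sys

def import_or_reload(module_name):
    # Either branch executes the module's top-level code exactly once,
    # so the random stream it consumes is identical on first and later runs.
    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)

cc = import_or_reload('common_config')
ct = import_or_reload('cnn_tools')
pd = import_or_reload('prep_dataset')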

How to enable parallel in scipy.optimize.differential_evolution?

I am trying to find the global minimum of a function using differential_evolution from scipy.optimize. As explained in the scipy reference guide, I should set in the options:
updating='deferred', workers=<number of cores>
However, when I run the code, it freezes and does nothing. How can I solve this issue, or is there any better way for parallelizing the global optimizer?
The following is in my code:
scipy.optimize.differential_evolution(objective, bnds, args=(),
                                      strategy='best1bin', maxiter=1e6,
                                      popsize=15, tol=0.01, mutation=(0.5, 1),
                                      recombination=0.7, seed=None,
                                      callback=None, disp=False, polish=True,
                                      init='latinhypercube', atol=0,
                                      updating='deferred', workers=2)
I came across the same problem myself. Support for parallelism in scipy.optimize.differential_evolution was added in version 1.2.0, and the version I had was too old. When I searched for the documentation, the top result also referred to the old version. The current documentation can be found at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html.
I use virtualenv and pip for package management, and to upgrade to the latest version of scipy I just had to run pip install --upgrade scipy. If you use anaconda, you might need to run e.g. conda install scipy=1.4.1.
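To check which version you actually have before relying on the new argument:

import scipy
print(scipy.__version__)  # the workers argument requires scipy >= 1.2.0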
To activate the parallelism, set the workers flag to something > 1 for a specific number of cores, or to workers=-1 to use all available cores.
One caveat: don't make the same mistake I did and try to run the differential evolution directly at the top level of a Python script on Windows, because it won't run. This is due to how multiprocessing.Pool works. Specifically, instead of the following:
import scipy.optimize

def minimize_me(x, *args):
    ...  # Your code
    return result

# DO NOT DO IT LIKE THIS
...  # Prepare all the arguments
# This will give errors
result = scipy.optimize.differential_evolution(minimize_me, bounds=function_bounds, args=extraargs,
                                               disp=True, polish=False, updating='deferred', workers=-1)
print(result)
use the code below:
import scipy.optimize

def minimize_me(x, *args):
    ...  # Your code
    return result

# DO IT LIKE THIS
if __name__ == "__main__":
    ...  # Prepare all the arguments
    result = scipy.optimize.differential_evolution(minimize_me, bounds=function_bounds, args=extraargs,
                                                   disp=True, polish=False, updating='deferred', workers=-1)
    print(result)
See this post for more info about parallel execution on Windows: Compulsory usage of if __name__=="__main__" in windows while using multiprocessing
Note that even if you're not on Windows, using if __name__ == "__main__": is good practice anyway.
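Putting it together, a minimal self-contained sketch (the Rosenbrock test function is my own stand-in, not from the question) that evaluates the population in parallel on all cores:

import scipy.optimize

def rosenbrock(x):
    # Classic test function; global minimum at (1, 1)
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

if __name__ == "__main__":
    bounds = [(-5, 5), (-5, 5)]
    result = scipy.optimize.differential_evolution(
        rosenbrock, bounds=bounds,
        updating='deferred', workers=-1,  # parallel evaluation on all cores
        disp=True, polish=False)
    print(result.x, result.fun)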

Keras with Theano on GPU

While trying to run my Keras code on a GPU (CUDA installed), I am not able to execute the following statement, which has been suggested in many online references.
set THEANO_FLAGS="mode=FAST_RUN,device=gpu,floatX=float32" & python theanogpu_example.py
I am getting the following error.
ValueError: Invalid value ("FAST_RUN,device=gpu,floatX=float32") for configuration
variable "mode". Valid options are ('Mode', 'DebugMode', 'FAST_RUN', 'NanGuardMode',
'FAST_COMPILE', 'DEBUG_MODE')
I have also tried the other approach suggested, from inside the code:
import theano
theano.config.device = 'gpu'
theano.config.floatX = 'float32'
I get the following error.
Exception: Can't change the value of this config parameter after initialization!
Apart from knowing how to make it run, I would also take this opportunity to ask a simpler question: how do I find out in Windows what my device is, i.e. 'gpu', 'gpu0', or 'gpu1'? I have tried all three in my case but none has worked.
Any suggestions will be appreciated.
The best way is to set THEANO_FLAGS before running the code, because the config variables cannot be changed after Theano has been imported. Try this:
import os

# Must be set before Theano is imported; the config is frozen at import time.
os.environ['THEANO_FLAGS'] = "device=cuda,force_device=True,floatX=float32"
import theano
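To confirm the flags took effect, you can inspect the config right after the import (a quick check, assuming the gpuarray backend is installed):

import theano
print(theano.config.device)  # 'cuda' if the flag was picked up
print(theano.config.floatX)  # 'float32'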
