OverflowError when subclassing rv_continuous - python-3.x

import scipy.stats as st
import numpy as np  # generic math functions

# https://scicomp.stackexchange.com/q/1658
class LorentzGen(st.rv_continuous):
    """Lorentz distribution"""
    def _pdf(self, x):
        gamma = 0.27
        return 2 * gamma / (np.pi * (gamma ** 2 + x ** 2))

transverse_fields = LorentzGen(a=0)
gaussian_gen = st.norm()

L = 2
list_of_temps = np.linspace(1, 10, 40)
for T, temp in enumerate(list_of_temps):
    print(f"Run {T}")
    for t in range(5000):
        if t % 500 == 0:
            print(f"Trial {t}")
        h_x = [[-transverse_fields.rvs(), xx] for xx in range(L)]  # OverflowError: (34, 'Result too large')
        # h_y = [[-gaussian_gen.rvs(), xx] for xx in range(L)]  # Works
In the above code I have implemented my own probability distribution (essentially a half-Lorentzian on x ∈ [0, ∞)), adapted from the answer on scicomp.SE, which I call transverse_fields.
I need to generate a large number of values from transverse_fields and use them inside a nested for loop. The issue is that beyond a certain number of iterations, here around "Run 1, Trial ~3500", I get a cascade of errors:
C:\ProgramData\Anaconda3\lib\site-packages\scipy\integrate\quadpack.py:385: IntegrationWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg, IntegrationWarning)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py:2831: RuntimeWarning: overflow encountered in ? (vectorized)
outputs = ufunc(*inputs)
Traceback (most recent call last):
File "C:/<redacted>/stackoverflow.py", line 26, in <module>
h_x = [[-transverse_fields.rvs(), xx] for xx in range(L)] # Result Overflow
File "C:/<redacted>/stackoverflow.py", line 26, in <listcomp>
h_x = [[-transverse_fields.rvs(), xx] for xx in range(L)] # Result Overflow
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 954, in rvs
vals = self._rvs(*args)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 889, in _rvs
Y = self._ppf(U, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 902, in _ppf
return self._ppfvec(q, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2755, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2831, in _vectorize_call
outputs = ufunc(*inputs)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1587, in _ppf_single
while self._ppf_to_solve(right, q, *args) < 0.:
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1569, in _ppf_to_solve
return self.cdf(*(x, )+args)-q
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1745, in cdf
place(output, cond, self._cdf(*goodargs))
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1621, in _cdf
return self._cdfvec(x, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2755, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2831, in _vectorize_call
outputs = ufunc(*inputs)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1618, in _cdf_single
return integrate.quad(self._pdf, self.a, x, args=args)[0]
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\integrate\quadpack.py", line 341, in quad
points)
File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\integrate\quadpack.py", line 448, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
File "C:/<redacted>/stackoverflow.py", line 11, in _pdf
return 2 * gamma / (np.pi * (gamma ** 2 + x ** 2))
OverflowError: (34, 'Result too large')
Process finished with exit code 1
Note that the error does not occur if I reduce the number of trials t to a smaller value like 50, nor does it occur if list_of_temps has fewer values, e.g. np.linspace(1, 10, 4), even though in the original setup with np.linspace(1, 10, 40) the error pops up as early as Run 1.
With the original setup, there is also no overflow error when I use the standard Gaussian distribution function from scipy.stats.
Similar issues I've seen on SO attribute this to the range of the for loop being too large, but I don't quite see how that applies here. In any case, I also don't understand how to apply Decimal as suggested in the answer to the linked question.
How can I fix this?
I'm running Python 3.6.5 with Anaconda on 64bit Windows 10.

It seems to be an issue with the way I'm declaring my probability distribution with transverse_fields/LorentzGen.
My solution was to use the built-in cauchy distribution in scipy.stats with a modified scale.
Also, since I wanted a half-Lorentzian, I simply take the absolute value with np.abs(...) when drawing a random number from transverse_fields.
import scipy.stats as st
import numpy as np  # generic math functions

transverse_fields = st.cauchy(scale=0.27)

L = 2
list_of_temps = np.linspace(1, 10, 40)
for T, temp in enumerate(list_of_temps):
    print(f"Run {T}")
    for t in range(5000):
        if t % 500 == 0:
            print(f"Trial {t}")
        h_x = [[-np.abs(transverse_fields.rvs()), xx] for xx in range(L)]  # Now works
This is satisfactory enough for me now, but I would still appreciate someone explaining why my way of subclassing rv_continuous gave me the aforementioned errors.
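One likely explanation (my inference, not confirmed in the thread): with only _pdf defined, rv_continuous falls back on numerical integration (integrate.quad) for the CDF and on bracketed root-finding for the PPF that rvs() needs. For this heavy-tailed PDF the root-finder keeps widening its bracket (the while self._ppf_to_solve(right, ...) loop in the traceback) until x ** 2 inside _pdf exceeds the largest Python float, which is exactly the line that raises the OverflowError. Supplying the analytic CDF and inverse CDF avoids both numerical steps. A minimal sketch, keeping the hard-coded gamma = 0.27 from the question:
import numpy as np
import scipy.stats as st

GAMMA = 0.27  # same hard-coded width as in the question

class LorentzGen(st.rv_continuous):
    """Half-Lorentzian on [0, inf) with analytic CDF/PPF (sketch)."""

    def _pdf(self, x):
        return 2 * GAMMA / (np.pi * (GAMMA ** 2 + x ** 2))

    def _cdf(self, x):
        # Analytic CDF: integral of _pdf from 0 to x.
        return (2 / np.pi) * np.arctan(x / GAMMA)

    def _ppf(self, q):
        # Analytic inverse CDF, so rvs() never touches quad or root-finding.
        return GAMMA * np.tan(np.pi * q / 2)

transverse_fields = LorentzGen(a=0)
samples = transverse_fields.rvs(size=1000)  # no IntegrationWarning or OverflowError
As an aside, scipy.stats already ships this distribution as halfcauchy, so st.halfcauchy(scale=0.27).rvs() should be equivalent to taking np.abs() of Cauchy draws as in the answer above.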

Related

How to successfully install pymc3 on windows 10 64 bits?

To begin with, I followed the instructions in the installation guide and deployed a new virtual environment from the yml file presented on that page, but I still ran into the same problem when I executed the code below. I have tried many ways to solve it; please help me figure out what is going wrong.
I have also gone through the issues on the official pymc3 website, but the problem persists.
The reproducible example:
import pymc3 as pm
import numpy as np
import pandas as pd
import scipy.stats as stats
from datetime import datetime
import theano.tensor as T

early = 10
late = 12
y = np.r_[np.random.poisson(early, 25), np.random.poisson(late, 75)]
niter = 10000
t = range(len(y))

with pm.Model() as change_point:
    cp = pm.DiscreteUniform('change_point', lower=0, upper=len(y), testval=len(y)//2)
    mu0 = pm.Exponential('mu0', 1/y.mean())
    mu1 = pm.Exponential('mu1', 1/y.mean())
    mu = T.switch(t < cp, mu0, mu1)
    Y_obs = pm.Poisson('Y_obs', mu=mu, observed=y)
    trace = pm.sample(niter)

pm.traceplot(trace, varnames=['change_point', 'mu0', 'mu1'])
Here is the error report:
You can find the C code in this temporary file: C:\Users\Mick\AppData\Local\Temp\theano_compilation_error_xk8zcr1g
Traceback (most recent call last):
File "C:\....py", line 48, in <module>
mu = T.switch(t < cp, mu0, mu1)
File "C:\...\lib\site-packages\theano\tensor\var.py", line 41, in __gt__
rval = theano.tensor.basic.gt(self, other)
File "C:\...\lib\site-packages\theano\graph\op.py", line 253, in __call__
compute_test_value(node)
File "C:\...\lib\site-packages\theano\graph\op.py", line 126, in compute_test_value
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
File "C:\...\lib\site-packages\theano\graph\op.py", line 634, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
File "C:\...\lib\site-packages\theano\graph\op.py", line 600, in make_c_thunk
outputs = cl.make_thunk(
File "C:\...\lib\site-packages\theano\link\c\basic.py", line 1203, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
File "C:\...\lib\site-packages\theano\link\c\basic.py", line 1138, in __compile__
thunk, module = self.cthunk_factory(
File "C:\...\lib\site-packages\theano\link\c\basic.py", line 1634, in cthunk_factory
module = get_module_cache().module_from_key(key=key, lnk=self)
File "C:\...\lib\site-packages\theano\link\c\cmodule.py", line 1191, in module_from_key
module = lnk.compile_cmodule(location)
File "C:\...\lib\site-packages\theano\link\c\basic.py", line 1543, in compile_cmodule
module = c_compiler.compile_str(
File "C:\...\lib\site-packages\theano\link\c\cmodule.py", line 2546, in compile_str
raise Exception(
Exception: ('Compilation failed (return status=1): C:\\...\\AppData\\Local\\Temp\\ccujaONv.s: Assembler messages:\r. C:\\...\\AppData\\Local\\Temp\\ccujaONv.s:89: Error: invalid register for .seh_savexmm\r. ', 'FunctionGraph(Elemwise{gt,no_inplace}(<TensorType(int64, (True,))>, TensorConstant{[ 0 1 2 .. 97 98 99]}))')

KeyError: 0 - Function works initially, but returns error when called on other data

I have a function:
def create_variables(name, probabilities, labels):
    print('function called')
    model = Metrics(probabilities, labels)
    prec_curve = model.precision_curve()
    kappa_curve = model.kappa_curve()
    tpr_curve = model.tpr_curve()
    fpr_curve = model.fpr_curve()
    pr_auc = auc(tpr_curve, prec_curve)
    roc_auc = auc(fpr_curve, tpr_curve)
    auk = auc(fpr_curve, kappa_curve)
    return [name, prec_curve, kappa_curve, tpr_curve, fpr_curve, pr_auc, roc_auc, auk]
I have the following variables:
svm = pd.read_csv('SVM.csv')
svm_prob_1 = svm.probability[svm.fold_number == 1]
svm_prob_2 = svm.probability[svm.fold_number == 2]
svm_label_1 = svm.true_label[svm.fold_number == 1]
svm_label_2 = svm.true_label[svm.fold_number == 2]
I want to execute the following lines:
svm1 = create_variables('svm_fold1', svm_prob_1, svm_label_1)
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
Python works as expected for svm1. However, when it starts processing svm2, I receive the following error:
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
function called
Traceback (most recent call last):
File "<ipython-input-742-702cfac4d100>", line 1, in <module>
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
File "<ipython-input-741-b8b5a84f0298>", line 6, in create_variables
prec_curve = model.precision_curve()
File "<ipython-input-734-dd9c309be961>", line 59, in precision_curve
self.tp, self.tn, self.fp, self.fn = self.confusion_matrix(self.preds)
File "<ipython-input-734-dd9c309be961>", line 72, in confusion_matrix
if pred == self.labels[i]:
File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py", line 1068, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 992, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
svm_prob_1 and svm_prob_2 are both of the same shape and contain non-zero values. svm_label_2 contains 0's and 1's and has the same length as svm_prob_2.
Furthermore, the error seems to come from svm_label_2: after replacing it with svm_label_1, the following line does work:
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_1)
Based on the code below, there seems to be no difference between svm_label_1 and svm_label_2 though.
type(svm_label_1)
Out[806]: pandas.core.series.Series
type(svm_label_2)
Out[807]: pandas.core.series.Series
min(svm_label_1)
Out[808]: 0
min(svm_label_2)
Out[809]: 0
max(svm_label_1)
Out[810]: 1
max(svm_label_2)
Out[811]: 1
sum(svm_label_1)
Out[812]: 81
sum(svm_label_2)
Out[813]: 89
len(svm_label_1)
Out[814]: 856
len(svm_label_2)
Out[815]: 856
Does anyone know what's going wrong here?
I don't know why it works, but converting svm_label_2 into a list worked:
svm_label_2 = list(svm.true_label[svm.fold_number == 2])
Since svm_label_1 and svm_label_2 are of the same type, I don't understand why the latter raised an error and the former did not, so I still welcome any explanation of this behaviour.
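A likely explanation (my assumption, not confirmed in the thread): boolean filtering keeps the original DataFrame index, so the fold-2 rows are not labelled 0..n-1, and indexing a pandas Series with an integer, as in self.labels[i], is a label lookup rather than a positional one; labels[0] therefore raises KeyError: 0 for fold 2 but not for fold 1, whose rows presumably sit at the top of the CSV. Converting to a list makes the access positional; either line in this sketch would achieve the same while keeping pandas objects:
# Relabel the filtered Series 0..n-1 so integer lookups work again.
svm_label_2 = svm.true_label[svm.fold_number == 2].reset_index(drop=True)

# Alternatively, inside confusion_matrix, index by position instead of by label:
# if pred == self.labels.iloc[i]: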

matplotlib is now giving an 'Unknown property' AttributeError since update to Python 3

I am using astroplan to set up some astronomical observations. Previously, when I ran my code under Python 2.7, it plotted the target on the sky properly. Now that I have moved to Python 3.7, I get an AttributeError from the same code.
I took my larger program and stripped out everything that did not seem to be needed to trigger the error. The code below reproduces the complaint.
from astroplan import Observer, FixedTarget
import astropy.units as u
from astropy.time import Time
import matplotlib.pyplot as plt
from astroplan.plots import plot_sky
import numpy as np
time = Time('2015-06-16 12:00:00')
subaru = Observer.at_site('subaru')
vega = FixedTarget.from_name('Vega')
sunset_tonight = subaru.sun_set_time(time, which='nearest')
vega_rise = subaru.target_rise_time(time, vega) + 5*u.minute
start = np.max([sunset_tonight, vega_rise])
plot_sky(vega, subaru, start)
plt.show()
The expected result was a simple plot of the target, in this case the star Vega, on the sky as seen by the Subaru telescope in Hawaii. The astroplan docs include a tutorial that shows what the plot should look like, at the very end of this page:
https://astroplan.readthedocs.io/en/latest/tutorials/summer_triangle.html
Instead, I now get the following error:
Traceback (most recent call last):
File "plot_sky.py", line 16, in <module>
plot_sky(vega, subaru, start)
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/astropy/utils/decorators.py", line 842, in plot_sky
func = make_function_with_signature(func, name=name, **wrapped_args)
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/astropy/units/decorators.py", line 222, in wrapper
return_ = wrapped_function(*func_args, **func_kwargs)
File "/local/data/fugussd/rkbarry/.local/lib/python3.7/site-packages/astroplan/plots/sky.py", line 216, in plot_sky
ax.set_thetagrids(range(0, 360, 45), theta_labels, frac=1.2)
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/matplotlib/projections/polar.py", line 1268, in set_thetagrids
t.update(kwargs)
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/matplotlib/text.py", line 187, in update
super().update(kwargs)
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/matplotlib/artist.py", line 916, in update
ret = [_update_property(self, k, v) for k, v in props.items()]
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/matplotlib/artist.py", line 916, in <listcomp>
ret = [_update_property(self, k, v) for k, v in props.items()]
File "/usr1/local/anaconda_py3/ana37/lib/python3.7/site-packages/matplotlib/artist.py", line 912, in _update_property
raise AttributeError('Unknown property %s' % k)
AttributeError: Unknown property frac
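No fix is recorded in this excerpt, but the traceback suggests the cause: recent matplotlib releases removed the frac keyword of set_thetagrids(), which this astroplan version still passes, so it falls through to the text-property handler and triggers "Unknown property frac". A hedged workaround sketch (my assumption, not an official fix) is to upgrade astroplan to a release that no longer passes frac, or to strip the keyword before astroplan's call reaches matplotlib, e.g. by running this before plot_sky:
from matplotlib.projections.polar import PolarAxes

_orig_set_thetagrids = PolarAxes.set_thetagrids

def _set_thetagrids_no_frac(self, *args, **kwargs):
    kwargs.pop('frac', None)  # keyword no longer accepted by matplotlib
    return _orig_set_thetagrids(self, *args, **kwargs)

PolarAxes.set_thetagrids = _set_thetagrids_no_frac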

Numpy/Numba raises error when allocating very large empty set to CUDA

I am writing a Mandelbrot set generator with Numba/NumPy. One of the optimizations is to use cudatoolkit to push the calculations to CUDA through Numba. The script works for low-resolution sets; however, it raises an error when trying to calculate large ones.
import numpy as np
from pylab import imshow, show
import time
from numba import cuda
from numba import *
import matplotlib

def mandel(x, y, max_iters):
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z*z + c
        if (z.real*z.real + z.imag*z.imag) >= 4:
            return i
    return max_iters

mandel_gpu = cuda.jit(device=True)(mandel)

@cuda.jit
def mandel_kernel(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    startX, startY = cuda.grid(2)
    gridX = cuda.gridDim.x * cuda.blockDim.x
    gridY = cuda.gridDim.y * cuda.blockDim.y
    for x in range(startX, width, gridX):
        real = min_x + x * pixel_size_x
        for y in range(startY, height, gridY):
            imag = min_y + y * pixel_size_y
            image[y, x] = mandel_gpu(real, imag, iters) / iters

gimage = np.zeros((65536, 65536), dtype=np.uint8)
# gimage = np.zeros((1024, 1024), dtype=np.uint8)
blockdim = (32, 8)
griddim = (32, 16)

start = time.time()
d_image = cuda.to_device(gimage)
mandel_kernel[griddim, blockdim](-2.0, 2.0, -2.0, 2.0, d_image, 10000)
d_image.to_host()
dt = time.time() - start

print("Mandelbrot created in " + str(dt) + " seconds")
imshow(gimage, 'gray')
show()
# matplotlib.image.imsave("mandel.png", gimage)
Above 46000 by 46000 pixels, Python raises the following error:
Traceback (most recent call last):
File "C:\_main\Files\Mandel\mandel_cuda.py", line 46, in <module>
d_image = cuda.to_device(gimage)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\api.py", line 103, in to_device
to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 688, in auto_device
devobj.copy_to_device(obj, stream=stream)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 181, in copy_to_device
sentry_contiguous(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 657, in sentry_contiguous
core = array_core(ary)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 647, in array_core
return ary[tuple(core_index)]
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 479, in __getitem__
return self._do_getitem(item)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 493, in _do_getitem
newdata = self.gpu_data.view(*extents[0])
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1227, in view
raise RuntimeError("non-empty slice into empty slice")
RuntimeError: non-empty slice into empty slice
The script was run on a 1050 Ti with 4 GB of VRAM. At 46000 by 46000 pixels the VRAM usage is only about 2.1 GB, so there should be plenty of VRAM left for renders somewhat above that size.

It seems to be a VRAM overflow issue. For the first 30 seconds of the render, additional VRAM is used to hold the empty array; during initialization the 4 GB limit is quickly reached, crashing the script.
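As an aside (my own arithmetic, not from the original post), the array sizes line up with that reading: one byte per uint8 pixel means 65536 by 65536 is exactly 4 GiB, the card's entire VRAM before the CUDA context and display take their share, while 46000 by 46000 is only about 2 GiB:
import numpy as np

# One byte per pixel for uint8, so bytes == width * height.
for n in (46000, 65536):
    nbytes = n * n * np.dtype(np.uint8).itemsize
    print(f"{n} x {n}: {nbytes / 2**30:.2f} GiB")
# 46000 x 46000: 1.97 GiB
# 65536 x 65536: 4.00 GiB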

Python Deap GP Evaluating individual causes error

I am currently experiencing an issue whenever I try to evaluate an individual using the GP portion of DEAP.
I receive the following error:
Traceback (most recent call last):
File "ImageGP.py", line 297, in <module>
pop, logs = algorithms.eaSimple(pop, toolbox, 0.9, 0.1, 60, stats=mstats, halloffame=hof, verbose=True)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/algorithms.py", line 148, in eaSimple
for ind, fit in zip(invalid_ind, fitnesses):
File "ImageGP.py", line 229, in evalFunc
func = toolbox.compile(expr=individual)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/gp.py", line 451, in compile
return eval(code, pset.context, {})
File "<string>", line 1
lambda oValue,oAvg13,oAvg17,oAvg21,sobelVal(v),sobelVal(h),edgeVal,blotchVal: [[[0, 75, 82.2857142857, 83.0, 82.9090909091, 4, 12, 4, 180], ... Proceed to print out all of my data ... [0, 147, 151.244897959, 150.728395062, 150.73553719, 248, 244, 5, 210]]]
^
SyntaxError: invalid syntax
If anyone has any ideas about what could be causing this problem, then I would really appreciate some advice. My current evaluation function looks like this:
def evalFunc(individual, data, points):
    func = toolbox.compile(expr=individual)
    total = 1.0
    for point in points:
        tmp = [float(x) for x in data[point[1]][point[0]][1:9]]
        total += int((0 if (func(*tmp)) < 0 else 1) == points[2])
    print("Fitness: " + str(total))
    return total,
Here data contains the values being used (the 8 variables listed in the error) and each point specifies the x and y coordinates from which to pull those 8 values. Thank you for your suggestions!
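No answer appears in this excerpt, but the traceback offers a strong hint (my reading, so treat it as an assumption): gp.compile() builds the string "lambda <argument names>: <expression>" and eval()s it, and the argument list in the error contains names such as sobelVal(v) and sobelVal(h), which are not valid Python identifiers, hence the SyntaxError. Renaming the primitive-set arguments to plain identifiers should avoid it. A sketch with a hypothetical primitive set mirroring the question's 8 inputs:
from deap import gp
import operator

# Hypothetical primitive set; the essential point is that every argument
# name passed to renameArguments must be a valid Python identifier.
pset = gp.PrimitiveSet("MAIN", 8)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.mul, 2)
pset.renameArguments(ARG0="oValue", ARG1="oAvg13", ARG2="oAvg17",
                     ARG3="oAvg21", ARG4="sobelValV", ARG5="sobelValH",
                     ARG6="edgeVal", ARG7="blotchVal")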
