rpy2 automatic NumPy conversion NA/NaN/Inf in foreign function call

When trying to call the BayesTree R package from Python with simple data, I am getting the error "NA/NaN/Inf in foreign function call" even though every value in the data is a finite real number.
Source Code
import numpy as np
# R interface for python
import rpy2
# For importing R packages
from rpy2.robjects.packages import importr
# Activate conversion from numpy to R
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
train_x_py = np.array([[0.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])
# Any 3-length float vector fails for training y
train_y_py = np.array([1.0, 2.0, 3.0])
test_x_py = np.array([[0.2, 0.0],
                      [0.2, 0.2],
                      [1.0, 0.2]])
# Create R versions of the training and testing data
train_x = rpy2.robjects.r.matrix(train_x_py, nrow=3, ncol=2)
train_y = rpy2.robjects.vectors.FloatVector(train_y_py)
test_x = rpy2.robjects.r.matrix(test_x_py, nrow=3, ncol=2)
print(train_x)
print(train_y)
print(test_x)
BayesTree = importr('BayesTree')
response = BayesTree.bart(train_x, train_y, test_x,
                          verbose=False, ntree=100)
# The return value at index 7 is the estimated response
response = response[7]
print(response)
Code Output / Error
     [,1] [,2]
[1,]    0    0
[2,]    0    1
[3,]    1    1
[1] 1 2 3
     [,1] [,2]
[1,]  0.2  0.0
[2,]  0.2  0.2
[3,]  1.0  0.2
Traceback (most recent call last):
  File "broken_rpy2.py", line 32, in <module>
    verbose=False, ntree=100)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2/robjects/functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2/robjects/functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in (function (x.train, y.train, x.test = matrix(0, 0, 0), sigest = NA, :
  NA/NaN/Inf in foreign function call (arg 7)
Line 32, to which the traceback refers, is:
response = BayesTree.bart(train_x, train_y, test_x,
                          verbose=False, ntree=100)
System Setup
Operating System: Mac OS X Sierra 10.12.6
Python Version: Python 3.6.1
R Version: R 3.4.1
Python Packages: pip 9.0.1, rpy2 2.8.6, numpy 1.13.0
Question
Is this my own user error, or is this a bug in rpy2?

This is a problem in the R package "BayesTree". You can reproduce the problem in R directly with the following code (assuming you have installed the BayesTree package).
train_x = matrix(c(0,0,1,0,1,1),nrow=3,ncol=2)
train_y = as.vector(c(1,2,3))
test_x = matrix(c(.2,.2,1.,.0,.2,.2),nrow=3,ncol=2)
result = bart(train_x,train_y,test_x,verbose=FALSE,ntree=100)
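As a quick check from the Python side, the inputs really do contain no NA/NaN/Inf (a minimal sketch using the arrays from the question):

# All inputs are finite, so the NaN must arise inside bart itself
assert np.all(np.isfinite(train_x_py))
assert np.all(np.isfinite(train_y_py))
assert np.all(np.isfinite(test_x_py))

One plausible (unverified) culprit: bart's sigest default of NA is estimated from a linear fit of y.train on x.train, and with only three rows that fit has no residual degrees of freedom, so the estimate itself can be NaN. If that is the cause, supplying sigest explicitly may sidestep it:

# Hypothetical workaround: pass an explicit residual-sd estimate
response = BayesTree.bart(train_x, train_y, test_x,
                          verbose=False, ntree=100, sigest=1.0)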

Related

Pytorch Dataloader can't iterate image folders

I'm trying to load this data set: https://github.com/jaddoescad/ants_and_bees
However, there is an error when I try to iterate over the data loader:
training_dataset = datasets.ImageFolder('ants_and_bees/train', transform=transform_train)
validation_dataset = datasets.ImageFolder('ants_and_bees/val', transform=transform)
training_loader = torch.utils.data.DataLoader(training_dataset, batch_size=20, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=20, shuffle=False)

def im_convert(tensor):
    image = tensor.cpu().clone().detach().numpy()
    image = image.transpose(1, 2, 0)
    image = image * np.array((0.5, 0.5, 0.5)) + np.array((0.5, 0.5, 0.5))
    image = image.clip(0, 1)
    return image

classes = ('ant', 'bee')
dataiter = iter(training_loader)
images, labels = next(dataiter)
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 10, idx+1, xticks=[], yticks=[])
    plt.imshow(im_convert(images[idx]))
    ax.set_title(classes[labels[idx].item()])
The error message doesn't help much; I read about some similar problems here but couldn't find a solution.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-58-fb882084a0d1> in <module>
      1 dataiter = iter(training_loader)
----> 2 images, labels = next(dataiter)
      3 fig = plt.figure(figsize=(25, 4))
      4
      5 for idx in np.arange(20):

... 10 frames ...

/usr/local/lib/python3.8/dist-packages/PIL/TgaImagePlugin.py in _open(self)
     64         flags = i8(s[17])
     65
---> 66         self.size = i16(s[12:]), i16(s[14:])
     67
     68         # validate header fields

AttributeError: can't set attribute
The code is from this Pytorch tutorial https://github.com/rslim087a/PyTorch-for-Deep-Learning-and-Computer-Vision-Course-All-Codes-/blob/master/PyTorch%20for%20Deep%20Learning%20and%20Computer%20Vision%20Course%20(All%20Codes)/Transfer_Learning.ipynb
I'm running on Google Colab.
Note: this seems to be a problem with Colab or the Python version there.
I was able to run the code locally in a Python 3.9.13 environment.
Please add this to your code:
transform_train = transforms.Compose([
    transforms.ToTensor()
])
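If transform for the validation set is also undefined in your notebook, a minimal definition along the same lines (an assumption about your setup, mirroring transform_train) would be:

transform = transforms.Compose([
    transforms.ToTensor()  # convert PIL images to tensors so the DataLoader can batch them
])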

Why am I getting "NotImplementedError()" when building a custom optimizer in Tensorflow

I am working on image classification, and I am trying to implement a custom optimizer (based on a paper published in an Elsevier journal) in Tensorflow.
I tried to modify the code as below. I have some other functions, but they are all related to preprocessing, model architecture, etc. My optimizer code follows:
import os
os.environ['TF_KERAS'] = '1'
from tensorflow import keras
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
import cv2
import imutils
import matplotlib.pyplot as plt
from os import listdir
from sklearn.metrics import confusion_matrix,classification_report
import logging, warnings
import numpy as np
from tensorflow.python.training import optimizer
from tensorflow.python.ops import math_ops, state_ops, control_flow_ops, variable_scope
from tensorflow.python.framework import ops
class BPVAM(optimizer.Optimizer):
    """Back-propagation algorithm with variable adaptive momentum.

    Variables are updated in two steps:
        1) v(t + 1) = alpha * v(t) - lr * g(t)
        2) w(t + 1) = w(t) + v(t + 1)
    where
        - v(t + 1): delta for update at step t + 1
        - w(t + 1): weights at step t + 1 (after update)
        - g(t): gradients at step t
        - lr: learning rate
        - alpha: momentum parameter
    In the algorithm alpha is not fixed. It is variable and it is parametrized by:
        alpha(t) = lambda / (1 - beta ^ t)
    """
    def __init__(
            self,
            lr: float = 0.001,
            lam: float = 0.02,
            beta: float = 0.998,
            use_locking: bool = False,
            name: str = 'BPVAM'
    ):
        """
        Args:
            lr: learning rate
            lam: momentum parameter
            beta: momentum parameter
            use_locking:
            name:
        """
        super(BPVAM, self).__init__(use_locking, name)
        self._lr = lr
        self._lambda = lam
        self._beta = beta
        self._lr_tensor = None
        self._lambda_tensor = None
        self._beta_tensor = None

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, 'v', self._name)
            self._get_or_make_slot(v,
                                   ops.convert_to_tensor(self._beta),
                                   'beta',
                                   self._name)

    def _prepare(self):
        self._lr_tensor = ops.convert_to_tensor(self._lr, name='lr')
        self._lambda_tensor = ops.convert_to_tensor(self._lambda, name='lambda')

    def _apply_dense(self, grad, var):
        lr_t = math_ops.cast(self._lr_tensor, var.dtype.base_dtype)
        lambda_t = math_ops.cast(self._lambda_tensor, var.dtype.base_dtype)
        v = self.get_slot(var, 'v')
        betas = self.get_slot(var, 'beta')
        beta_t = state_ops.assign(betas, betas * betas)
        alpha = lambda_t / (1 - beta_t)
        v_t = state_ops.assign(v, alpha * v - lr_t * grad)
        var_update = state_ops.assign_add(var, v_t, use_locking=self._use_locking)
        return control_flow_ops.group(*[beta_t, v_t, var_update])
After I create my optimizer and run
myopt = BPVAM()
model.compile(optimizer=myopt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
I get this error message:
Traceback (most recent call last):
  File "/Users/classification.py", line 264, in <module>
    model.fit(x=X_train, y=y_train, batch_size=32, epochs=50, validation_data=(X_val, y_val))
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 780, in fit
    steps_name='steps_per_epoch')
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 157, in model_iteration
    f = _make_execution_function(model, mode)
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 532, in _make_execution_function
    return model._make_execution_function(mode)
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2276, in _make_execution_function
    self._make_train_function()
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2219, in _make_train_function
    params=self._collected_trainable_weights, loss=self.total_loss)
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/keras/optimizers.py", line 753, in get_updates
    grads, global_step=self.iterations)
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 614, in apply_gradients
    update_ops.append(processor.update_op(self, grad))
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 171, in update_op
    update_op = optimizer._resource_apply_dense(g, self._v)
  File "/Users/venv/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 954, in _resource_apply_dense
    raise NotImplementedError()
NotImplementedError
I cannot understand where the problem is. I am using Tensorflow 1.14.0 and Python 3.7. I created a virtual environment and tried other Tensorflow and Python versions, but it still does not work.
In order to use a class that inherits from tensorflow.python.training.optimizer.Optimizer you have to implement at least the following methods:
_apply_dense
_resource_apply_dense
_apply_sparse
Check out the source code of the Optimizer for more information.
Since you are trying to implement a custom momentum method, you might want to subclass MomentumOptimizer directly.
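A minimal sketch of that fix, assuming the dense update in _apply_dense above is also valid for resource variables (which this Keras code path uses), is to delegate:

def _resource_apply_dense(self, grad, var):
    # Resource variables dispatch here rather than to _apply_dense;
    # reuse the same update logic for both paths.
    return self._apply_dense(grad, var)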

Use TransformDataset without using AnalyzeAndTransformDataset

I am trying to use TensorFlow Transform, and I would like to serialise a whole pipeline composed of different transformations. Say I have a transformation that doesn't have to be fitted (such as a feature interaction between numeric columns). I want to use the TransformDataset function directly on the preprocessing function I have already defined, but it seems this is not possible.
If I run something like this:
import pprint
import tempfile
import apache_beam as beam
import pandas as pd
import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils
NUMERIC_FEATURE_KEYS = ['a', 'b', 'c']
impute_dictionary = dict(b=1.0, c=0.0)
RAW_DATA_FEATURE_SPEC = dict([(name, tf.io.FixedLenFeature([], tf.float32)) for name in NUMERIC_FEATURE_KEYS])
RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC))
def interaction_fn(inputs):
    outputs = inputs.copy()
    new_numeric_feature_keys = []
    for i in range(len(NUMERIC_FEATURE_KEYS)):
        for j in range(i, len(NUMERIC_FEATURE_KEYS)):
            if i == j:
                outputs[f'{NUMERIC_FEATURE_KEYS[i]}_squared'] = outputs[NUMERIC_FEATURE_KEYS[i]] * outputs[NUMERIC_FEATURE_KEYS[i]]
                new_numeric_feature_keys.append(f'{NUMERIC_FEATURE_KEYS[i]}_squared')
            else:
                outputs[f'{NUMERIC_FEATURE_KEYS[i]}_{NUMERIC_FEATURE_KEYS[j]}'] = outputs[NUMERIC_FEATURE_KEYS[i]] * outputs[NUMERIC_FEATURE_KEYS[j]]
                new_numeric_feature_keys.append(f'{NUMERIC_FEATURE_KEYS[i]}_{NUMERIC_FEATURE_KEYS[j]}')
    NUMERIC_FEATURE_KEYS.extend(new_numeric_feature_keys)
    return outputs

if __name__ == '__main__':
    temp = tempfile.gettempdir()
    data = pd.DataFrame(dict(
        a=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        b=[1.0, 1.0, 1.0, 2.0, 0.0, 1.0],
        c=[0.9, 2.0, 1.0, 0.0, 0.0, 0.0]
    ))
    data.to_parquet('data_no_nans.parquet')
    x = {}
    for col in data.columns:
        x[col] = tf.constant(data[col], dtype=tf.float32, name=col)
    with beam.Pipeline() as pipeline:
        with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
            raw_data = pipeline | 'ReadTrainData' >> beam.io.ReadFromParquet('data_no_nans.parquet')
            raw_dataset = (raw_data, RAW_DATA_METADATA)
            transformed_data, _ = (raw_data, interaction_fn) | tft_beam.TransformDataset()
            transformed_data | beam.Map(pprint.pprint)
I get the error
2020-02-11 15:49:37.025525: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-11 15:49:37.132944: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f87ddda6d30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-11 15:49:37.132959: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
Traceback (most recent call last):
  File "/Users/andrea.marchini/Hackathon/tfx_test/foo.py", line 56, in <module>
    transformed_data, _ = (raw_data, interaction_fn) | tft_beam.TransformDataset()
  File "/Users/andrea.marchini/.local/share/virtualenvs/tfx_test-jg7eSsGQ/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 482, in __ror__
    pvalueish, pvalues = self._extract_input_pvalues(left)
  File "/Users/andrea.marchini/.local/share/virtualenvs/tfx_test-jg7eSsGQ/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 908, in _extract_input_pvalues
    dataset_and_transform_fn)
TypeError: cannot unpack non-iterable PCollection object
Is TransformDataset supposed to be used only on the result of AnalyzeAndTransformDataset?
Maybe you could try this:
transformed_data = (raw_dataset, interaction_fn) | tft_beam.TransformDataset()
I think it tried to unpack raw_data, which does not contain the metadata. Moreover, TransformDataset returns only one variable, not two.
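A sketch of the suggested change against the original pipeline (untested; it assumes the (data, metadata) pair is what the transform expects):

# The failing call passed the bare PCollection:
#     transformed_data, _ = (raw_data, interaction_fn) | tft_beam.TransformDataset()
# Pass the (data, metadata) pair built earlier and bind the single return value:
transformed_data = (raw_dataset, interaction_fn) | tft_beam.TransformDataset()
transformed_data | beam.Map(pprint.pprint)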

The WebAgg backend requires Tornado

I found this Python code for Bézier curves on GitHub; the link is shown below:
https://gist.github.com/Juanlu001/7284462
When I run the code, I always get the error shown below:
Traceback (most recent call last):
  File "C:\Users\raineen\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\backends\backend_webagg.py", line 27, in <module>
    import tornado
ModuleNotFoundError: No module named 'tornado'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\raineen\Desktop\Bezier.py", line 8, in <module>
    import matplotlib.pyplot as plt
  File "C:\Users\raineen\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\pyplot.py", line 2372, in <module>
    switch_backend(rcParams["backend"])
  File "C:\Users\raineen\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\pyplot.py", line 207, in switch_backend
    backend_mod = importlib.import_module(backend_name)
  File "C:\Users\raineen\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\raineen\AppData\Local\Programs\Python\Python37\lib\site-packages\matplotlib\backends\backend_webagg.py", line 29, in <module>
    raise RuntimeError("The WebAgg backend requires Tornado.")
RuntimeError: The WebAgg backend requires Tornado.
I installed Tornado with the line below, but I still get the same error:
python -m pip install tornado
So how can I fix this error?
The code is below:
import matplotlib
matplotlib.use('webagg')
import numpy as np
from scipy.special import binom
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
class BezierBuilder(object):
    """Bézier curve interactive builder.
    """
    def __init__(self, control_polygon, ax_bernstein):
        """Constructor.
        Receives the initial control polygon of the curve.
        """
        self.control_polygon = control_polygon
        self.xp = list(control_polygon.get_xdata())
        self.yp = list(control_polygon.get_ydata())
        self.canvas = control_polygon.figure.canvas
        self.ax_main = control_polygon.axes
        self.ax_bernstein = ax_bernstein

        # Event handler for mouse clicking
        self.cid = self.canvas.mpl_connect('button_press_event', self)

        # Create Bézier curve
        line_bezier = Line2D([], [],
                             c=control_polygon.get_markeredgecolor())
        self.bezier_curve = self.ax_main.add_line(line_bezier)

    def __call__(self, event):
        # Ignore clicks outside axes
        if event.inaxes != self.control_polygon.axes:
            return

        # Add point
        self.xp.append(event.xdata)
        self.yp.append(event.ydata)
        self.control_polygon.set_data(self.xp, self.yp)

        # Rebuild Bézier curve and update canvas
        self.bezier_curve.set_data(*self._build_bezier())
        self._update_bernstein()
        self._update_bezier()

    def _build_bezier(self):
        x, y = Bezier(list(zip(self.xp, self.yp))).T
        return x, y

    def _update_bezier(self):
        self.canvas.draw()

    def _update_bernstein(self):
        N = len(self.xp) - 1
        t = np.linspace(0, 1, num=200)
        ax = self.ax_bernstein
        ax.clear()
        for kk in range(N + 1):
            ax.plot(t, Bernstein(N, kk)(t))
        ax.set_title("Bernstein basis, N = {}".format(N))
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)

def Bernstein(n, k):
    """Bernstein polynomial.
    """
    coeff = binom(n, k)

    def _bpoly(x):
        return coeff * x ** k * (1 - x) ** (n - k)

    return _bpoly

def Bezier(points, num=200):
    """Build Bézier curve from points.
    """
    N = len(points)
    t = np.linspace(0, 1, num=num)
    curve = np.zeros((num, 2))
    for ii in range(N):
        curve += np.outer(Bernstein(N - 1, ii)(t), points[ii])
    return curve

if __name__ == '__main__':
    # Initial setup
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Empty line
    line = Line2D([], [], ls='--', c='#666666',
                  marker='x', mew=2, mec='#204a87')
    ax1.add_line(line)

    # Canvas limits
    ax1.set_xlim(0, 1)
    ax1.set_ylim(0, 1)
    ax1.set_title("Bézier curve")

    # Bernstein plot
    ax2.set_title("Bernstein basis")

    # Create BezierBuilder
    bezier_builder = BezierBuilder(line, ax2)

    plt.show()
You can change your backend to one that is supported in Windows without using a third-party library such as Tornado. Change matplotlib.use('webagg') to matplotlib.use('tkagg').
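For reference, assuming Tk is available (it ships with the standard python.org installer on Windows), the change is just the backend selection at the top of the script:

import matplotlib
matplotlib.use('tkagg')  # Tk-based backend; no Tornado dependency
import matplotlib.pyplot as plt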
Looks like you are missing an import statement for tornado after installing it with pip.
But you may have another issue as well. According to the tornado documentation, it requires the following versions of Python:
Prerequisites: Tornado runs on Python 2.6, 2.7, 3.2, and 3.3
You are running Python 3.7, so that could be the reason. Try running the code with one of those versions.

Modifying old GaussianProcessor example to run with GaussianProcessRegressor

I have an example from a data science book that I am trying to run in a Jupyter notebook. The code snippet looks like this:
from sklearn.gaussian_process import GaussianProcess
# define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8])
ydata = model(xdata)
# Compute the Gaussian process fit
gp = GaussianProcess(corr='cubic', theta0=1e-2, thetaL=1e-4, thetaU=1E-1,
                     random_start=100)
gp.fit(xdata[:, np.newaxis], ydata)
xfit = np.linspace(0, 10, 1000)
yfit, MSE = gp.predict(xfit[:, np.newaxis], eval_MSE=True)
dyfit = 2 * np.sqrt(MSE) # 2*sigma ~ 95% confidence region
GaussianProcess has been deprecated and replaced with GaussianProcessRegressor, so I tried to fix the code snippet to look like this:
from sklearn.gaussian_process import GaussianProcessRegressor
# define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8])
ydata = model(xdata)
# Compute the Gaussian process fit
gp = GaussianProcessRegressor(random_state=100)
gp.fit(xdata[:, np.newaxis], ydata)
xfit = np.linspace(0, 10, 1000)
yfit, MSE = gp.predict(xfit[:, np.newaxis])
dyfit = 2 * np.sqrt(MSE) # 2*sigma ~ 95% confidence region
but I get a ValueError
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-c04ac57d1897> in <module>
     11
     12 xfit = np.linspace(0, 10, 1000)
---> 13 yfit, MSE = gp.predict(xfit[:, np.newaxis])
     14 dyfit = 2 * np.sqrt(MSE)  # 2*sigma ~ 95% confidence region

ValueError: too many values to unpack (expected 2)
I'm a bit unsure why the predict function complains here.
The error has the answer.
At yfit, MSE = gp.predict(xfit[:, np.newaxis]) you are trying to assign the result of predict to two variables, while by default predict returns only a single numpy.ndarray.
To solve this issue, run
yfit = gp.predict(xfit[:, np.newaxis])
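If the goal is to keep the confidence band from the original example, GaussianProcessRegressor can return the standard deviation of the predictive distribution alongside the mean; a small sketch of that variant:

yfit, std = gp.predict(xfit[:, np.newaxis], return_std=True)
dyfit = 2 * std  # 2*sigma ~ 95% confidence region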
