Protocol problem with PyMc3 on jupyter notebook - python-3.x

I am working with the following code, but I get an error:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
with pm.Model() as model:
    alpha = 1.0 / count_data.mean()  # Recall count_data is the
                                     # variable that holds our txt counts
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)
with model:
    idx = np.arange(n_count_data)  # Index
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)
with model:
    observation = pm.Poisson("obs", lambda_, observed=count_data)
with model:
    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step)
The error is:
ValueError: must use protocol 4 or greater to copy this object; since __getnewargs_ex__ returned keyword arguments.
I am on Windows 10 with Python 3.5.6, PyMC3 3.5, and IPython 6.5.0. Any help is deeply appreciated. Thanks in advance.

It sounds like this exception is being thrown by the joblib library, which uses pickle to send the model to different processes. The easiest fix is to use only a single core, by changing the last line to
trace = pm.sample(10000, tune=5000, step=step, cores=1, chains=4)
It will be hard to diagnose the problem with joblib without more details. Creating a fresh conda environment might help.

The workaround suggested by colcarroll did not work for me. The behavior you are seeing is related to PR#3140 of PyMC3, which you may want to track there. The solution and/or workaround may depend on how you are running theano (with or without GPU support).

Related

Pycaret - 'Make_Time_Features' object has no attribute 'list_of_features'

I am trying to create a model using pycaret just as:
from pycaret.classification import *
clf1 = setup(data = dt, target = 'group')
lr = create_model('lr')
Then I get:
AttributeError: 'Simple_Imputer' object has no attribute 'fill_value_categorical'
So, following here, I added:
clf1 = setup(data = dt, target = 'group', imputation_type='iterative' )
lr = create_model('lr')
Then I get:
AttributeError: 'Make_Time_Features' object has no attribute 'list_of_features'
My version of sklearn is 0.23.2 and pycaret is 2.3.2
You mentioned my previous question here.
I just faced the same issue as you on Colab. It is 100% an issue with library versions.
Initially, I got this error for SMOTE:
AttributeError: 'SMOTE' object has no attribute '_validate_data'
After installing and reinstalling libraries, I got exactly your error.
How did I resolve it?
1. Started Colab and imported all the common libraries (pd, np, scikit, etc.).
2. Installed PyCaret via pip install, then ran import pycaret and from pycaret.classification import *.
3. Colab warned that there were issues with scipy, sklearn, and lightgbm, and asked me to restart the runtime.
4. Restarted my runtime on Colab.
5. Imported all libraries again, as in step 1.
6. Ran only import pycaret and from pycaret.classification import *.
My final code:
# Initialize the setup with SMOTE
clf_smote = setup(
    data,
    session_id = 123,
    target = 'Target',
    remove_multicollinearity = True,
    multicollinearity_threshold = 0.95,
    fix_imbalance = True
)
I did not use imputation_type='iterative' as in my question above.
It worked, but this is only what worked for me. It would be great to have a more detailed guide on how to deal with such issues when using this amazing library.
Interestingly, for me pip install scikit-learn==0.23.2 did the trick. The problem was the version.
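Since both answers come down to mismatched library versions, it can help to print what is actually installed before and after the runtime restart. A minimal sketch using only the standard library (Python 3.8+). The package list is taken from the Colab warning above, and installed_version is a hypothetical helper name, not part of PyCaret:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used by pip (for example "scikit-learn", not "sklearn").
for name in ("pycaret", "scikit-learn", "scipy", "lightgbm"):
    print(name, installed_version(name))
```

If the printed scikit-learn version differs from the one you pinned, the runtime is most likely still using the old installation and needs a restart.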

Using tfp.mcmc.MetropolisHastings for physical model

I am new to TensorFlow and would like to use the TensorFlow Probability library to model a physical problem. TensorFlow Probability comes with tfp.mcmc.MetropolisHastings, which is the algorithm I want to use.
I provided my initial distribution. In my case this is a 2D grid, and on each grid point sits a 'spin' (the physics doesn't really matter right now) that can be either +1 or -1.
The proposal of a new state x' should be the old grid with one of these spins flipped, so at one point +1 becomes -1 or vice versa. I can pass the step size argument, but my x is not a scalar I can simply increase. How do I model this? Is there a way I can pass an update rule that is not just increasing a value by a certain step size?
I just answered a similar question: "Perform a RandomWalk step with Tensorflow Probability's RandomWalkMetropolis function".
RandomWalkMetropolis accepts a constructor argument new_state_fn, which is a custom proposal function that consumes the previous state and returns a proposal.
# TF/TFP Imports
!pip install --quiet tfp-nightly tf-nightly
import tensorflow.compat.v2 as tf
tf.enable_v2_behavior()
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
import matplotlib.pyplot as plt
def log_prob(x):
    return tfd.Normal(0, 1).log_prob(x)
def custom_proposal(state, extra):
    return state + tfd.Uniform(-.5, .75).sample()
kernel = tfp.mcmc.RandomWalkMetropolis(log_prob, new_state_fn=custom_proposal)
state = tfd.Normal(0, 1).sample()
extra = kernel.bootstrap_results(state)
samples = []
for _ in range(1000):
    state, extra = kernel.one_step(state, extra)
    samples.append(state)
plt.hist(samples, bins=20)
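The example above uses a continuous proposal. For the spin grid in the question, the proposal only has to return a copy of the grid with one randomly chosen spin negated. Here is a library-agnostic sketch of that update rule in plain Python. It does not use the TFP API; flip_one_spin, the 4x4 grid size, and the nearest-neighbour Ising energy with periodic boundaries are all assumptions made for illustration:

```python
import math
import random


def flip_one_spin(grid, rng):
    """Proposal: copy the grid and flip a single randomly chosen spin."""
    proposal = [row[:] for row in grid]
    i = rng.randrange(len(grid))
    j = rng.randrange(len(grid[0]))
    proposal[i][j] = -proposal[i][j]
    return proposal


def energy(grid):
    """Assumed target: nearest-neighbour Ising energy, periodic boundaries."""
    n, m = len(grid), len(grid[0])
    total = 0
    for i in range(n):
        for j in range(m):
            total -= grid[i][j] * (grid[(i + 1) % n][j] + grid[i][(j + 1) % m])
    return total


def metropolis_step(grid, beta, rng):
    """Accept the flipped grid with probability min(1, exp(-beta * dE))."""
    proposal = flip_one_spin(grid, rng)
    delta = energy(proposal) - energy(grid)
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        return proposal
    return grid


rng = random.Random(0)
grid = [[rng.choice((-1, 1)) for _ in range(4)] for _ in range(4)]
for _ in range(1000):
    grid = metropolis_step(grid, beta=0.5, rng=rng)
```

Because the single-flip proposal is symmetric, the plain Metropolis acceptance rule is enough. In TFP the same idea would go into the new_state_fn passed to RandomWalkMetropolis, written with tensor operations instead of lists.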

Numba @jit fails to speed up the performance of this function. Any way to fix that?

I am quite new to the numba package in Python. I am not sure whether I am using numba.jit correctly, but the code runs too slowly: the line Z1 = mmd(X,Y,20) takes 23.7 s per loop.
What is the correct way to optimize the code? I would appreciate your help. Thank you.
Here is my code:
import pandas as pd
import numba as nb
import numpy as np
@nb.jit
def mmd(array1, array2, n):
    n1 = array1.shape[0]
    MMD = np.empty(n1, dtype = 'float64')
    for i in range(n-1,n1):
        MMD[i] = np.average(abs(array1[i+1-n:i+1] - array2[i]))
    return MMD
X = np.array([i**2 for i in range(1000000)])
Y = np.array([i for i in range(1000000)])
Z1 = mmd(X,Y,20)
EDIT: simplified the code even further
EDIT2: tried @nb.jit(nopython = True), then there is an error message:
KeyError: "<class 'numba.targets.cpu.CPUTargetOptions'> does not support option: 'nonpython'"
also tried:
@nb.jit(nb.float32[:](nb.float32[:],nb.float32[:],nb.int8))
To make Numba work well you need to use "nopython" mode, as you mentioned. To enable this, simply run the program with jit replaced by njit (or equivalently, jit(nopython=True)), and fix the errors one by one:
np.empty() doesn't support the dtype='float64' argument in Numba. That's OK though, because float64 is the default. Just remove it.
np.average() is not supported in Numba. That's OK, since we are not passing any weights anyway, it's the same as np.mean(). Replace it.
The built-in abs() is not supported in Numba. Use np.abs() instead.
We end up with this:
@nb.njit
def mmd(array1, array2, n):
    n1 = array1.shape[0]
    MMD = np.empty(n1)
    for i in range(n-1,n1):
        MMD[i] = np.mean(np.abs(array1[i+1-n:i+1] - array2[i]))
    return MMD
And it is 100x faster.
Bonus tips:
You can initialize your sample data more concisely and faster like this:
Y = np.arange(1000000)
X = Y * Y
The first n-1 values in the result are uninitialized garbage, because the loop starts at index n-1. You might want to clean that up somehow.
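One way to clean up the leading entries is to start from an array of NaN rather than np.empty, so positions without a full window are explicit. A sketch in plain NumPy on a small input. np.full is, to my knowledge, also supported by Numba in nopython mode, so adding the nb.njit decorator should work unchanged, but treat that as an assumption and check it on your Numba version:

```python
import numpy as np


def mmd(array1, array2, n):
    n1 = array1.shape[0]
    # NaN marks the leading positions that have no complete window of n values.
    result = np.full(n1, np.nan)
    for i in range(n - 1, n1):
        result[i] = np.mean(np.abs(array1[i + 1 - n:i + 1] - array2[i]))
    return result


Y = np.arange(10)
X = Y * Y
Z1 = mmd(X, Y, 3)
print(Z1)
```

Downstream code can then skip the incomplete windows with np.isnan(Z1) instead of relying on knowing n.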

Scikit-Learn: set_params() for custom estimator sets nested parameter before component

I implemented several custom estimators, following the developer guide, so that all of them are inheriting from BaseEstimator. Some of these use other scikit-learn estimators or transformers as attributes (say for example, to build an ensemble). Inheriting from BaseEstimator should give me the convenience of accessing the parameters through get_params() and setting them through set_params() as described here, in the form component__parameter, for example for use in grid search. Find below a minimal example.
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression
class MyForecaster(BaseEstimator):
    def __init__(self, base_estimator=LinearRegression()):
        self.base_estimator = base_estimator
    def fit(self, X, y):
        pass
    def predict(self, X, y):
        pass
# instantiate forecaster and set parameters
mf = MyForecaster()
mf.set_params(**{"base_estimator" : "ElasticNet", "base_estimator__alpha": 0.05})
This fails with:
ValueError: Invalid parameter alpha for estimator LinearRegression. Check the list of available parameters with `estimator.get_params().keys()`.
This indicates that it sets the parameter on the nested attribute first, instead of first checking whether I want to overwrite the "higher level" attribute (ElasticNet has the parameter alpha, LinearRegression does not).
One way to handle this would be to override set_params() for each estimator, to make sure that it is handled correctly.
Is there any "built in" way to achieve this that I simply overlooked? Is this really the intended behavior of scikit-learn?
Edit:
So indeed, by a very big coincidence, a very similar issue seems to have been fixed in version 0.19.1. However, my particular case still fails; only the case with Pipelines is fixed!
To make it reproducible, I copied the current code of set_params() into my minimal example (I only added the comment in line 20):
1   def set_params(self, **params):
2       if not params:
3           # Simple optimization to gain speed (inspect is slow)
4           return self
5       valid_params = self.get_params(deep=True)
6
7       nested_params = defaultdict(dict)  # grouped by prefix
8       for key, value in params.items():
9           key, delim, sub_key = key.partition('__')
10          if key not in valid_params:
11              raise ValueError('Invalid parameter %s for estimator %s. '
12                               'Check the list of available parameters '
13                               'with `estimator.get_params().keys()`.' %
14                               (key, self))
15
16          if delim:
17              nested_params[key][sub_key] = value
18          else:
19              setattr(self, key, value)
20              #valid_params[key] = value
21
22      for key, sub_params in nested_params.items():
23          valid_params[key].set_params(**sub_params)
24
25      return self
It fails because line 19 sets the attribute but does not update valid_params, so valid_params still holds the old estimator when its nested parameters are set in line 23. So I added line 20, which would fix this.
The current fix in 0.19.1 passes its tests because it was only tested for Pipelines. There, set_params() is overridden to first call _set_params() of _BaseComposition, where apparently this is handled.
Should I raise this in the scikit-learn github or reopen the other issue?
This is a bug. It was reported a week ago, and has already been fixed and backported to v0.19.1, which was released yesterday.
The easiest fix is to update scikit-learn to v0.19.1 (or to the master dev branch).
So the fix mentioned in @TomDLT's answer addressed a very similar issue, and it led to the fix above most likely making it into a future version of sklearn (9999).
So, for now: if you come across the problem in the meantime, either use the code above to override set_params() or wait for the fix.
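The effect of the extra line can be checked without scikit-learn. The sketch below reimplements the set_params() logic quoted in the question on toy classes, with the valid_params update (line 20 in the listing) enabled. ToyParams, Linear, Elastic, and Forecaster are hypothetical stand-ins, not scikit-learn classes:

```python
from collections import defaultdict


class ToyParams:
    """Hypothetical stand-in for BaseEstimator with explicit parameter names."""
    param_names = ()

    def get_params(self):
        return {name: getattr(self, name) for name in self.param_names}

    def set_params(self, **params):
        valid_params = self.get_params()
        nested_params = defaultdict(dict)  # grouped by prefix
        for key, value in params.items():
            key, delim, sub_key = key.partition("__")
            if key not in valid_params:
                raise ValueError("Invalid parameter %s for %r" % (key, self))
            if delim:
                nested_params[key][sub_key] = value
            else:
                setattr(self, key, value)
                valid_params[key] = value  # the added line: remember the new component
        for key, sub_params in nested_params.items():
            valid_params[key].set_params(**sub_params)
        return self


class Linear(ToyParams):
    param_names = ("fit_intercept",)

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept


class Elastic(ToyParams):
    param_names = ("alpha",)

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class Forecaster(ToyParams):
    param_names = ("base_estimator",)

    def __init__(self, base_estimator=None):
        self.base_estimator = Linear() if base_estimator is None else base_estimator


forecaster = Forecaster()
forecaster.set_params(base_estimator=Elastic(), base_estimator__alpha=0.05)
print(type(forecaster.base_estimator).__name__, forecaster.base_estimator.alpha)
```

Without the marked line, valid_params["base_estimator"] would still be the Linear instance when the nested alpha is applied, which reproduces the "Invalid parameter" error from the question.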

tf.global_variables_initializer() does not work

Hello Tensorflow users/developers,
Even though I call the initializer function, the reporter tells me that none of my variables are initialized. I created them using tf.get_variable(). Here is where my session and graph objects are created:
with tf.Graph().as_default():
    # Store all scores (each score is a loss-per-episode)
    init = tf.global_variables_initializer()
    all_scores, scores = [], []
    # Build common tensors used throughout entire session
    nn.build(seq_len)
    # Generate inference and loss models
    [loss, train_op] = nn.generate_models()
    with tf.Session() as sess:
        try:
            st = time.time()
            # Initialize all variables (Note that! not operation tensors; but variable tensors)
            print('Initializing variables...')
            sess.run(init)
            print('Training starts...')
            for e, (input_, target) in sample_generator:
                feed_dict = nn.prepare_dict(input_, target)
                # Run one step of the model. The return values are the activations
                # from the `train_op` (which is discarded) and the `loss` Op.
                x = sess.run(tf.report_uninitialized_variables(tf.global_variables()))
                print(x)
                _, score = sess.run([train_op, loss],
                                    feed_dict=feed_dict)
                all_scores.append(score)
                scores.append(score)
                # Assess your predictions against target
                if e > 0 and not (e%100):
                    print('Episode %05d: %.6f' % (e, np.mean(scores).tolist()[0]))
                    scores.clear()
        except KeyboardInterrupt:
            print('Elapsed time: %ld' % (time.time()-st))
            pass
I've called this method millions of times before, and it worked perfectly, but right now it is leaving me in the lurch. What do you think the cause might be? Any suggestion would really be appreciated.
P.S. I tried calling tf.local_variables_initializer() too, but the reporter told me that I don't have any local variables at all.
Thanks in advance.
Thanks for the reply.
Well, I've figured it out. I shouldn't have executed the following assignment before building my model:
init = tf.global_variables_initializer()
For anyone else's information: you may think, "I'll execute and get the result of this operation called 'init' only when I run it in a Session, so it doesn't matter where I do the assignment above."
No, that is not true. TensorFlow decides which variables are to be initialized at the moment this assignment is executed. Thus, call it after you have built your entire model.
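The reason is that the initializer op is built from the list of variables that exist in the graph at the moment it is created. The toy model below mimics that snapshot behavior in plain Python. It does not use TensorFlow, and VARIABLES, make_variable, make_initializer, and uninitialized are hypothetical names chosen for the illustration:

```python
VARIABLES = []  # stands in for the graph's collection of global variables


def make_variable(name):
    """Register a new, not yet initialized variable."""
    var = {"name": name, "initialized": False}
    VARIABLES.append(var)
    return var


def make_initializer():
    """Return a function that initializes only the variables known right now."""
    snapshot = list(VARIABLES)

    def run():
        for var in snapshot:
            var["initialized"] = True

    return run


def uninitialized():
    """Names of registered variables that have not been initialized yet."""
    return [var["name"] for var in VARIABLES if not var["initialized"]]


# Wrong order: the initializer is created before the model is built.
early_init = make_initializer()
make_variable("weights")
make_variable("bias")
early_init()
print(uninitialized())  # ['weights', 'bias']

# Right order: create the initializer after all variables exist.
late_init = make_initializer()
late_init()
print(uninitialized())  # []
```

In the question's code, moving the init assignment below nn.build() and nn.generate_models() corresponds to the second ordering.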
If it does not exist, I suspect you accidentally downgraded your TensorFlow version.
Can you try tf.initialize_all_variables?
If this does not work, can you post what version you are using?
I got the same error. However, this is my solution: skip the separate init = tf.global_variables_initializer() assignment
and just use:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
