I am trying to fit a quantile regression with statsmodels. The same code that works on my laptop fails in the cloud, saying the object has no fit method. But the documentation shows that it does have a fit method. What is causing this? I am running it inside a Zeppelin notebook.
from statsmodels.regression.quantile_regression import QuantReg
from statsmodels.tools.tools import add_constant
X = temp[['X']]
y = temp['y']
X = add_constant(X)
mod = QuantReg(y, X)
res = mod.fit(q = 0.5)
This is the error message I am getting:
AttributeError: 'Interactive' object has no attribute 'fit'
It seems that the variable name mod might be hitting a namespace conflict, so that it ends up bound to an 'Interactive' object rather than to the statsmodels model. Renaming the variable from mod to something else, such as mod1, might help here.
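One way to check for such a conflict is to look at what the name is actually bound to just before calling fit. The sketch below is only an illustration: Interactive and Model are hypothetical stand-ins defined here (not the Zeppelin or statsmodels classes), and describe is a made-up helper.

```python
def describe(obj):
    """Return the fully qualified type name and whether the object has a fit attribute."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}", hasattr(obj, "fit")


class Interactive:
    """Hypothetical stand-in for whatever object the environment bound to `mod`."""


class Model:
    """Hypothetical stand-in for a model class such as QuantReg."""

    def fit(self, q=0.5):
        return q


mod = Interactive()        # the name was taken over by something else
quantreg_model = Model()   # a distinct name avoids the clash

print(describe(mod))
print(describe(quantreg_model))
```

If the type printed for the variable is not the expected model class, the name has been rebound somewhere, and a different variable name is the simplest fix to try.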
I am going through https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d trying to get the force_plot to print.
I'm running Python 3.8.5 on Ubuntu 20.04
I run this code:
shap.initjs()
# Write in a function
random_picks = np.arange(1,330,50) # Every 50 rows
S = X_test.iloc[random_picks]
def shap_plot(j):
    explainerModel = shap.TreeExplainer(xg_clf)
    shap_values_Model = explainerModel.shap_values(S)
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], S.iloc[[j]])
    return p
z = shap_plot(3)
and I get <shap.plots._force.AdditiveForceVisualizer object at 0x7f1568cac070> in return.
I'm not a Python expert, so I've tried inspecting the result with:
display(z)
which isn't defined,
and print(z), which just returns the name of the object and doesn't help me see what was plotted.
I've also tried using matplotlib, which is already loaded:
def shap_plot(j):
    explainerModel = shap.TreeExplainer(xg_clf)
    shap_values_Model = explainerModel.shap_values(S)
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], S.iloc[[j]])
    plt.savefig('tmp.svg')
    plt.close()
    return p
shap_plot(3)
but this just gives an empty image.
If there is an error, I don't see it.
How can I get this shap.force_plot to show the image?
The solution is in the manual:
help(shap.force_plot)
which shows
matplotlib : bool
Whether to use the default Javascript output, or the (less developed) matplotlib output. Using matplotlib can be helpful in scenarios where rendering Javascript/HTML is inconvenient.
Indeed, running a notebook is very inconvenient for my purposes.
So, in order to save an image:
def shap_plot(j):
    explainerModel = shap.TreeExplainer(xg_clf)
    shap_values_Model = explainerModel.shap_values(S)
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], S.iloc[[j]], matplotlib=True, show=False)
    plt.savefig('tmp.svg')
    plt.close()
    return p
I defined a subclass of Atom from rdkit.Chem and gave it an instance attribute, but I cannot get that subclass instance back from an RWMol object in rdkit.
Below is sample code that reproduces my problem:
from rdkit import Chem
class MyAtom(Chem.Atom):
    def __init__(self, symbol, **kwargs):
        super().__init__(symbol, **kwargs)
        self.my_attribute = 0

    def get_my_attribute(self):
        return self.my_attribute

if __name__ == '__main__':
    rw_mol = Chem.RWMol()
    # Create a MyAtom object and add it to the RWMol. It cannot be retrieved as a MyAtom afterwards.
    my_atom = MyAtom('C')
    my_atom.my_attribute = 3
    rw_mol.AddAtom(my_atom)
    atom_in_mol = rw_mol.GetAtoms()[0]
    # The newly defined attributes are accessible on my_atom.
    print(my_atom.get_my_attribute())
    # The two lines below raise: AttributeError: 'Atom' object has no attribute 'get_my_attribute'
    print(atom_in_mol.get_my_attribute())
    print(atom_in_mol.my_attribute)
    # type(my_atom): <class '__main__.MyAtom'>
    # type(atom_in_mol): <class 'rdkit.Chem.rdchem.Atom'>
    # Why are these two atom types different? With polymorphism, I would expect both objects to have the same type.
I would expect this code to run, but the last two lines raise an error because the type of atom_in_mol is Chem.Atom. Shouldn't it be MyAtom? I also cannot access my_attribute directly.
The rdkit Python library is a wrapper around C++. Is that the cause of the problem? Can I not use inheritance with this library?
Note: I searched the rdkit documentation and found a SetProp method for saving values on atoms. It uses a dictionary to store the values. It works, but it is too slow for my project. I want to use instance attributes to store my extra values. Is there a solution to this inheritance problem, or a different, faster approach?
The Python RDKit library is a C++ wrapper, so it sometimes does not follow conventional Python object handling.
To go deeper, you will have to dig through the source code:
rw_mol.AddAtom(my_atom)
The call above executes the AddAtom method in rdkit/Code/GraphMol/Wrap/Mol.cpp, which in turn calls the addAtom method in rdkit/Code/GraphMol/RWMol.h, which then calls the addAtom method in rdkit/Code/GraphMol/ROMol.cpp with the default arguments updateLabel = true and takeOwnership = false.
The takeOwnership = false condition causes the atom passed in as the argument to be copied:
// rdkit/Code/GraphMol/ROMol.cpp
if (!takeOwnership)
  atom_p = atom_pin->copy();
else
  atom_p = atom_pin;
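Finally, look at what the copy method does in rdkit/Code/GraphMol/Atom.cpp:
Atom *Atom::copy() const {
  auto *res = new Atom(*this);
  return res;
}
So it instantiates a new base Atom object and returns it, which is why the atom stored in the molecule is a plain Atom without the attributes of the Python subclass.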
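The copy behaviour described above can be illustrated with a pure-Python analogy. This is only a sketch: Atom, MyAtom and Mol below are hypothetical stand-ins written for this example, not the RDKit classes, and the side table keyed by atom index is one possible workaround rather than an RDKit feature.

```python
class Atom:
    """Hypothetical stand-in for the wrapped C++ atom (not rdkit.Chem.Atom)."""

    def __init__(self, symbol):
        self.symbol = symbol


class MyAtom(Atom):
    """Subclass carrying an extra instance attribute."""

    def __init__(self, symbol):
        super().__init__(symbol)
        self.my_attribute = 0


class Mol:
    """Hypothetical container that copies atoms on insertion."""

    def __init__(self):
        self._atoms = []

    def add_atom(self, atom):
        # Mimic `new Atom(*this)`: build a fresh base-class Atom, dropping the subclass.
        self._atoms.append(Atom(atom.symbol))
        return len(self._atoms) - 1  # index of the stored copy

    def get_atom(self, idx):
        return self._atoms[idx]


mol = Mol()
extra = {}  # side table: atom index -> extra value

my_atom = MyAtom("C")
my_atom.my_attribute = 3
idx = mol.add_atom(my_atom)
extra[idx] = my_atom.my_attribute  # keep the extra value outside the atom

stored = mol.get_atom(idx)
print(type(stored).__name__)            # the stored object is a base Atom
print(hasattr(stored, "my_attribute"))  # the subclass attribute did not survive the copy
print(extra[idx])                       # the value is still reachable through the index
```

In RDKit itself, RWMol.AddAtom returns the index of the newly added atom, so the same side-table idea could be keyed on that index. Whether this is faster than SetProp for a given workload would need to be measured, and the table would have to be updated if atoms are removed and the indices change.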
When trying to run a ScriptRunConfig using:
src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      arguments=['--input-data-dir', ds.as_mount(),
                                 '--reg', '0.99'],
                      run_config=run_config)
run = experiment.submit(config=src)
it fails with the following when I submit the job:
... lots of things... and then
TypeError: Object of type 'DataReference' is not JSON serializable
However, if I run it with the Estimator, it works. One difference is that with a ScriptRunConfig we pass the parameters as a list, whereas with the Estimator they are a dictionary.
Thanks for any pointers!
Using a DataReference in ScriptRunConfig is a bit more involved than just calling ds.as_mount(). You will need to convert it into a string in arguments and then update the RunConfiguration's data_references section with the DataReferenceConfiguration created from ds. Please see here for an example notebook on how to do that.
If you are just reading from the input location and not doing any writes to it, please check out Dataset. It allows you to do exactly what you are doing without doing anything extra. Here is an example notebook that shows this in action.
Below is a short version of the notebook:
from azureml.core import Dataset, Datastore, ScriptRunConfig
# more imports and code
ds = Datastore(workspace, 'mydatastore')
dataset = Dataset.File.from_files(path=(ds, 'path/to/input-data/within-datastore'))
src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      arguments=['--input-data-dir', dataset.as_named_input('input').as_mount(),
                                 '--reg', '0.99'],
                      run_config=run_config)
run = experiment.submit(config=src)
You can also see the how-to-migrate-from-estimators-to-scriptrunconfig page in the official documentation.
The core code for using a DataReference in ScriptRunConfig is:
# if you want to pass a DataReference object, such as the below:
datastore = ws.get_default_datastore()
data_ref = datastore.path('./foo').as_mount()
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      arguments=['--data-folder', str(data_ref)],  # cast the DataReference object to str
                      compute_target=compute_target,
                      environment=pytorch_env)
# Set a dict of the DataReference(s) you want on the `data_references` attribute
# of the ScriptRunConfig's underlying RunConfiguration object.
src.run_config.data_references = {data_ref.data_reference_name: data_ref.to_config()}
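The TypeError from the question, and the reason the str() cast avoids it, can be reproduced without Azure ML. The sketch below is an assumption-laden illustration: FakeDataReference is a hypothetical class written for this example (not part of azureml), and the string it produces is a made-up placeholder format.

```python
import json


class FakeDataReference:
    """Hypothetical stand-in for a DataReference; not an azureml class."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Made-up placeholder format, used only for this illustration.
        return f"<data-reference:{self.name}>"


ref = FakeDataReference("foo")

# Passing the object itself: the json module does not know how to encode it.
try:
    json.dumps(["--data-folder", ref])
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Passing its string form: a list of plain strings serializes without trouble.
serialized = json.dumps(["--data-folder", str(ref)])
print(serialized)
```

The string on its own only carries a name, which is consistent with the answers above also registering the reference's configuration in data_references.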
I am calling one function from inside another function in a loop, and I have a problem when the first function (Mod1) uses values defined in the second function (Mod2) to compute the result of the equation called Overall.
The error returned is: TypeError: 'numpy.float64' object cannot be interpreted as an integer
The code is as follows:
import pandas as pd

def Mod1(A, B, C):
    # Read the property tables and drop rows that are entirely empty.
    As = pd.read_csv('AProps.csv', index_col='AName')
    As = As.dropna(axis=0, how='all')
    AvailAs = As.index.tolist()
    Bs = pd.read_csv('BProps.csv', index_col='BName')
    Bs = Bs.dropna(axis=0, how='all')
    AvailBs = Bs.index.tolist()
    # Look up the properties for the chosen A and B entries.
    Prop1_A = As['Prop1'][A]
    Prop2_A = As['Prop2'][A]
    Prop3_A = As['Prop3'][A]
    Prop4_A = Prop1_A / (2 * (1 + Prop3_A))
    Prop1_B = Bs['Prop1'][B]
    Prop3_B = Bs['Prop3'][B]
    Prop4 = Prop1_B / (2 * (1 + Prop3_B))
    Overall = Prop1_A * C + Prop1_B * (1 - C)
    return Overall

def Mod2(NumItems):
    A_List = []
    B_List = []
    C_List = []
    # Collect the inputs for every item.
    for i in range(NumItems):
        A_List.append(input('Choose the A_Input for item {0}: '.format(i+1)))
        B_List.append(input('Choose the B_input for item {0}: '.format(i+1)))
        C_List.append(input('Choose the C_input for item {0}: '.format(i+1)))
    # Run Mod1 once per item with the collected inputs.
    for i in range(NumItems):
        Mod1(A_List[i], B_List[i], C_List[i])
Just to be clear, the intention of the code is that Mod2 creates lists of inputs which are then used within Mod1. In Mod1, inputs A and B are used to pull values from the csv files, which are read into As and Bs. Input C is used in the calculations within Mod1, i.e. Overall.
When I run Mod1 manually, there are no issues at all. But when run in conjunction with Mod2 as outlined above...
The code in Mod1 runs through the lookups of Prop1_A and Prop1_B from As and Bs without issue, but when it reaches the calculation of Overall, the aforementioned error is raised.
I am certain that the issue is with the way the value for C is interpreted, i.e. as a float64, but I don’t understand why Python might be expecting to see an integer there.
I am using Spyder 3.2.6 as installed with the Anaconda package.
Any help given is gratefully received.
I solved the problem by explicitly converting all the values selected from the csv files (Prop_1 etc.) in Mod1 to floats. A simple answer really.
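A minimal sketch of that kind of explicit conversion is shown below. It assumes, beyond what the answer states, that the value for C may also arrive as text, since input() in Python 3 always returns a str. The function name overall and the sample numbers are made up for illustration.

```python
def overall(prop1_a, prop1_b, c):
    """Weighted mix of two properties; every input is coerced to float first."""
    prop1_a = float(prop1_a)
    prop1_b = float(prop1_b)
    c = float(c)  # also accepts text such as "0.25" coming from input()
    return prop1_a * c + prop1_b * (1 - c)


raw_c = "0.25"  # what input() would hand back if the user typed 0.25
result = overall(10.0, 20.0, raw_c)
print(result)
```

Converting at the top of the function keeps the arithmetic independent of whether the values came from a csv lookup or from the keyboard.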
I am trying to find out which features in my dataset affect the principal components, and to observe how my data was fitted by my Kernel PCA algorithm.
I tried to use the X_transformed_fit_ attribute, which is listed in the documentation, but I got this error: AttributeError: 'KernelPCA' object has no attribute 'X_transformed_fit_'
My code for KPCA is below:
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components = 2, kernel = 'cosine', fit_inverse_transform = False)
X = kpca.fit_transform(X)
kpca.X_transformed_fit_
If this is not the right way to interpret the composition of my KPCA, then how can I understand how these principal components are constructed?
The reason I am investigating this is that I will follow this step with clustering algorithms (K-means, agglomerative HC), and I want to understand the characteristics of the distinct clusters they produce (by understanding the structure of the principal components).
The attribute X_transformed_fit_ is only available when you set the parameter fit_inverse_transform to True.
Try:
kpca = KernelPCA(n_components = 2, kernel = 'cosine', fit_inverse_transform = True)
X = kpca.fit_transform(X)
kpca.X_transformed_fit_