Tensorflow causes errors in scikit-learn - python-3.x

When I import scikit-learn before importing tensorflow, I don't have any issues. Running this block of code produces an output of 1.7766212763101197e-12.
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
import tensorflow as tf
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
However, if I import tensorflow before importing scikit-learn, the code no longer gives a correct result. When I run this code block
import tensorflow as tf
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
I get an output of 130091393261440.25.
Why is that? My versions for the packages are:
numpy - 1.13.1
sklearn - 0.19.0
tensorflow - 1.3.0

Import order should not normally affect output, since Python modules are self-contained, except where they share dependencies.
I was unable to reproduce your error, and get an output of 1.7951539777252834e-12 for both code blocks.
This is an interesting problem, and I am curious to see if others can provide a better explanation of why you are seeing this issue.
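If you are trying to reproduce this as well, it may help to first confirm that your package versions match those listed in the question; a minimal check:
import numpy
import sklearn
import tensorflow

# Compare these against the versions reported in the question.
print("numpy:", numpy.__version__)
print("sklearn:", sklearn.__version__)
print("tensorflow:", tensorflow.__version__)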

Note: this answer addresses the question's title for those looking to use TensorFlow within Scikit-Learn; it does not address the import-order error described above.
You can use TensorFlow within Scikit-Learn pipelines using Neuraxle.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
Problem: You Can't Parallelize or Save Pipelines Using Steps That Can't Be Serialized As-Is by Joblib (e.g., a TensorFlow step)
Here, a step means a transformer or estimator in a scikit-learn Pipeline.
This problem will only surface past some point of using Scikit-Learn. This is the point of no return: you've coded your entire production pipeline, but once you've trained it and selected the best model, you realize that what you've just coded can't be serialized.
This means that, once trained, your pipeline can't be saved to disk, because one of its steps imports objects from a Python library implemented in another language and/or uses GPU resources. Your code smells weird, and you start panicking over what was a full year of research and development.
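A minimal sketch of the problem, using a stand-in unpicklable attribute (a lambda) in place of an actual TensorFlow session or GPU handle:
import pickle
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class UnpicklableStep(BaseEstimator, TransformerMixin):
    # Stand-in for a step holding a resource that pickle/joblib
    # cannot serialize (e.g., a GPU session or foreign-language object).
    def __init__(self):
        self.resource = lambda x: x  # lambdas cannot be pickled

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

pipeline = Pipeline([("tf_like_step", UnpicklableStep())])
try:
    pickle.dumps(pipeline)
except Exception as e:
    print("Serialization failed:", e)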
Solution with Code Examples:
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it were used with Scikit-Learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline.
The trick is performed by using Neuraxle-TensorFlow, which makes use of Neuraxle's savers.
Read also: https://stackoverflow.com/a/60557192/2476920

Related

mxnet import nd or np to use arrays

I'm starting to study mxnet and gluon, but I'm getting somewhat confused about the usage of np/nd arrays.
As suggested on the gluon website, I installed mxnet and gluon by running:
pip install --upgrade mxnet gluoncv
which installs mxnet version 1.5.1.post0. In this case, to use arrays I need:
from mxnet import ndarray as nd
On the other hand, I found a deep learning book based on mxnet, where they have you install a later mxnet version by:
pip install mxnet==1.6.0b20190915
and in this case you can import either ndarray or np directly, so:
from mxnet import ndarray as nd
from mxnet import np
both work (while, with mxnet 1.5.1, from mxnet import np fails).
Why is it possible to import np in the new version, if we had nd already? Are there any differences between arrays created from nd or np?
It seems that I can use mxnet features (such as attach_grad()) in both cases. For example, the following works:
from mxnet import np
array = np.array([1, 2, 3])
array.attach_grad()
Thanks!
Here is the RFC explaining the motivation of introducing mx.np module. To give a couple of highlighted differences between mx.np and mx.nd modules:
The mx.np module adopts the official NumPy operator API, which has evolved for almost twenty years, to ease the transition for users coming from the NumPy world to deep learning. The development of the mx.nd module, by contrast, did not follow a well-established spec for defining operator signatures, which sometimes caused confusion (e.g., dim vs. axis).
Operators registered in mx.np are highly compatible with the official NumPy operators, while mx.nd operators are not. For example, zero-dim/scalar tensors, zero-size tensors, boolean tensors, and boolean indexing are supported in mx.np but not in mx.nd.
You can consider mx.np an enhanced version of mx.nd in terms of usability, functionality, and performance. mx.nd will be gradually deprecated in future releases.
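A small sketch contrasting the two (assuming mxnet >= 1.6, where the NumPy-compatible interface is available):
from mxnet import np, npx
npx.set_np()  # enable NumPy-compatible semantics

a = np.array([1, 2, 3])  # mx.np follows the official NumPy operator API
s = np.array(3.0)        # zero-dim (scalar) arrays work in mx.np
print(a[a > 1])          # boolean tensors and boolean indexing work too

a.attach_grad()          # autograd features remain available on mx.np arrays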

Python 3.x - AttributeError: 'function' object has no attribute 'Kfold'

I'm following along with an older tutorial that uses an SVM optimized with a genetic algorithm. I originally thought the issue was just with versions of Python and/or scikit-learn, but I'm now unsure what the issue could be, as it keeps displaying the same error. I'm currently using python-scikit-learn 0.20.3-1 on Antergos and found a link here that unfortunately didn't seem to help.
So far I've found a few links and examples that have had me alter different aspects of the code, which overall just jumbled everything up. This GitHub page was useful in at least understanding the version difference, as was the first link. This blog post was also neat, but again didn't really help me narrow down the exact issue as to why it's reading out the error. I even tried looking at the sklearn documentation but I still couldn't get it.
These are what I've imported:
import numpy as np
import pandas as pd
import random as rd
from sklearn.model_selection import cross_validate
from sklearn import preprocessing
from sklearn import svm
I had "kfold" defined earlier in the program as such:
kfold = 3
This is the exact line it seems to be having issues with:
kf = cross_validate.KFold(Cnt1,n_splits=kfold)
What it should be doing is simply applying cross validation. However, the error reads:
AttributeError: 'function' object has no attribute 'KFold'
I can't tell if the issue is that I'm not understanding what I should be altering via the links I've given, or if it's a different error born of ignorance. Is there something I'm missing in order to get this to work?
KFold is in the sklearn.model_selection module, not in sklearn.model_selection.cross_validate. (cross_validate is a function, which is why the error says a 'function' object has no attribute 'KFold'.)
So you should import
from sklearn import model_selection
and then use it like
model_selection.KFold(...)
or you can import KFold directly:
from sklearn.model_selection import KFold
just like in the KFold documentation example.
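A minimal sketch of the corrected call (assuming Cnt1 was the sample count required by the old sklearn.cross_validation.KFold(n, n_folds=...) API, which the new API no longer takes):
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # placeholder data
kfold = 3
kf = KFold(n_splits=kfold)  # no sample-count argument in the new API
for train_idx, test_idx in kf.split(X):
    print(train_idx, test_idx)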

Alternative to partial dependence plot?

As far as I can see, sklearn has deprecated the partial dependence functionality. I tried to run a simple example:
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.inspection import plot_partial_dependence
X, y = make_friedman1()
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
plot_partial_dependence(clf, X, [0, (0, 1)])
But I get the following error message: ImportError: No module named 'sklearn.inspection'
To me, partial dependence (and marginal effects) plots are very important (in combination with relative importances) for better understanding machine learning results and predictions.
Is there an alternative available? Or rather, how can I plot the partial dependence?
I think there might be some confusion about sklearn versions. Just as a suggestion, I would check yours (e.g., import sklearn; sklearn.__version__). For example, if it is v0.20.3 by chance, aren't you looking for partial_dependence and plot_partial_dependence from sklearn.ensemble.partial_dependence instead of sklearn.inspection?
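A sketch of that older usage for sklearn 0.20.x (where these utilities lived in sklearn.ensemble.partial_dependence before moving to sklearn.inspection in 0.22):
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
# In 0.20.x the partial dependence utilities live here:
from sklearn.ensemble.partial_dependence import plot_partial_dependence

X, y = make_friedman1()
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
plot_partial_dependence(clf, X, [0, (0, 1)])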
I had the same issue and resolved it by simply updating sklearn, which now contains sklearn.inspection. I'm using Anaconda; if you are too, simply type this in the Anaconda Prompt:
conda update --all
to update all packages. Restart your Jupyter notebook and it should work.

What are common sources of randomness in Machine Learning projects with Keras?

Reproducibility is important. In a closed-source machine learning project I'm currently working on, it is hard to achieve. What are the parts to look at?
Setting seeds
Computers have pseudo-random number generators which are initialized with a value called the seed. For machine learning, you might need to do the following:
# I've heard the order here is important
import random
random.seed(0)
import numpy as np
np.random.seed(0)
import tensorflow as tf
tf.set_random_seed(0)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
from keras import backend as K
K.set_session(sess) # tell keras about the seeded session
# now import keras stuff
See also: Keras FAQ: How can I obtain reproducible results using Keras during development?
sklearn
sklearn.model_selection.train_test_split has a random_state parameter.
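For example, a minimal sketch with placeholder data:
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(20).reshape(10, 2), np.arange(10)  # placeholder data
# A fixed random_state makes the split reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)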
What to check
Am I loading the data in the same order every time? (A sketch addressing this follows the list.)
Do I initialize the model the same way?
Do I use external data that might change?
Do I use external state that might change (e.g., datetime.now())?
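On the first point, a common pitfall is that os.listdir returns files in an order that depends on the filesystem; a minimal sketch of a deterministic loader:
import os

def list_data_files(data_dir):
    # os.listdir order is not guaranteed; sort it for a reproducible order.
    return sorted(os.listdir(data_dir))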

name 'classification_model' is not defined

I'm trying to build a model in Python 3.5 and am following an example that can be found here.
I have imported all the required libraries from sklearn.
However, I'm getting the following error.
Code:
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold #For K-fold cross validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']
classification_model(model, loan,predictor_var,outcome_var)
When I run the above code I get the following error:
NameError: name 'classification_model' is not defined
I'm not sure how to resolve this, as I have tried importing sklearn and all its sub-libraries.
P.S. I'm new to Python, so I'm still figuring out the basic steps.
Depending on the exact details this may not be what you want, but I have never had a problem with:
import sklearn.linear_model as sk

logreg = sk.LogisticRegressionCV()
# Fit on the DataFrame columns, not on the lists of column names:
logreg.fit(loan[predictor_var], loan[outcome_var])
This means you have to explicitly separate your training and test sets, but having fit to a training set (the final line of my code), you can then use the methods detailed in the documentation [1].
For example, you can find out what score (how many predictions were correct) you get on unseen data with the .score method.
[1] http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
It appears this code came from this tutorial.
The issue is exactly as the error describes. classification_model is currently undefined. You need to create this function yourself before you can call it. Check out this part of that tutorial so you can see how it's defined. Good luck!
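For illustration, a simplified sketch of what such a function might look like (the exact definition is in the linked tutorial; the names model, data, predictors, and outcome follow the question's usage):
from sklearn import metrics
from sklearn.model_selection import KFold

def classification_model(model, data, predictors, outcome):
    # Fit the model and report training accuracy.
    model.fit(data[predictors], data[outcome])
    predictions = model.predict(data[predictors])
    print("Accuracy: %.3f" % metrics.accuracy_score(data[outcome], predictions))

    # k-fold cross-validation for a less optimistic estimate.
    kf = KFold(n_splits=5)
    scores = []
    for train_idx, test_idx in kf.split(data):
        model.fit(data[predictors].iloc[train_idx], data[outcome].iloc[train_idx])
        scores.append(model.score(data[predictors].iloc[test_idx],
                                  data[outcome].iloc[test_idx]))
    print("Cross-validation: %.3f" % (sum(scores) / len(scores)))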
