Plotting decision tree, graphvizm pydotplus - scikit-learn

I'm following the tutorial for decision tree on scikit documentation.
I have pydotplus 2.0.2 but it is telling me that it does not have write method - error below. I've been struggling for a while with it now, any ideas, please? Many thanks!
from sklearn import tree
from sklearn.datasets import load_iris
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
from IPython.display import Image
dot_data = tree.export_graphviz(clf, out_file=None)
import pydotplus
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
Image(graph.create_png())
and my error is
/Users/air/anaconda/bin/python /Users/air/PycharmProjects/kiwi/hemr.py
Traceback (most recent call last):
File "/Users/air/PycharmProjects/kiwi/hemr.py", line 10, in <module>
dot_data = tree.export_graphviz(clf, out_file=None)
File "/Users/air/anaconda/lib/python2.7/site-packages/sklearn/tree/export.py", line 375, in export_graphviz
out_file.write('digraph Tree {\n')
AttributeError: 'NoneType' object has no attribute 'write'
Process finished with exit code 1
----- UPDATE -----
Using the fix with out_file, it throws another error:
Traceback (most recent call last):
File "/Users/air/PycharmProjects/kiwi/hemr.py", line 13, in <module>
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
File "/Users/air/anaconda/lib/python2.7/site-packages/pydotplus/graphviz.py", line 302, in graph_from_dot_data
return parser.parse_dot_data(data)
File "/Users/air/anaconda/lib/python2.7/site-packages/pydotplus/parser.py", line 548, in parse_dot_data
if data.startswith(codecs.BOM_UTF8):
AttributeError: 'NoneType' object has no attribute 'startswith'
---- UPDATE 2 -----
Also, se my own answer below which solves another problem

The problem is that you are setting the parameter out_file to None.
If you look at the documentation, if you set it at None it returns the string file directly and does not create a file. And of course a string does not have a write method.
Therefore, do as follows :
dot_data = tree.export_graphviz(clf)
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)

Method graph_from_dot_data() didn't work for me even after specifying proper path for out_file.
Instead try using graph_from_dot_file method:
graph = pydotplus.graphviz.graph_from_dot_file("iris.dot")

I met the same error this morning. I use python 3.x and here is how I solve the problem.
from sklearn import tree
from sklearn.datasets import load_iris
from IPython.display import Image
import io
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
# Let's give dot_data some space so it will not feel nervous any more
dot_data = io.StringIO()
tree.export_graphviz(clf, out_file=dot_data)
import pydotplus
graph = pydotplus.graphviz.graph_from_dot_data(dot_data.getvalue())
# make sure you have graphviz installed and set in path
Image(graph.create_png())
if you use python 2.x, I believe you need to change "import io" as:
import StringIO
and,
dot_data = StringIO.StringIO()
Hope it helps.

Also another problem was the backend settings to my Graphviz!! It is solved nicely here. you just need to lookup that settings file and change backend, or in the code mpl.use("TkAgg") as suggested there in the comments. After I only got error that pydotplot couldn't find my Graphviz executable, hence I reinstalled Graphviz via homebrew: brew install graphviz which solved the issue and I can make plots now!!

What really helped me solve the problem was:-
I executed the code from the same user through which graphviz was installed. So executing from any other user would give your error

i would suggest avoid graphviz & use the following alternate approach
from sklearn.tree import plot_tree
plt.figure(figsize=(60,30))
plot_tree(clf, filled=True);

Related

Programming a simple Stock prediction service with Alpha Vantage in Python. I get this error

This is the program for the stock prediction to be simply printed...
from alpha_vantage.timeseries import TimeSeries
# Your key here
key = 'yourkeyhere'
ts = TimeSeries(key)
aapl, meta = ts.get_daily(symbol='AAPL')
print(aapl['2020-22-5'])
I get this error...
Traceback (most recent call last):
File "C:/Users/PycharmProjects/AlphaVantageTest/AlphaVantageTest.py", line 7, in <module>
print(aapl['2020-22-5'])
KeyError: '2020-22-5'
Since that didn't work, I tried getting a little more technical with it...
from alpha_vantage.timeseries import TimeSeries
from alpha_vantage.techindicators import TechIndicators
from matplotlib.pyplot import figure
import matplotlib.pyplot as plt
# Your key here
key = 'W01B6S3ALTS82VRF'
# Chose your output format, or default to JSON (python dict)
ts = TimeSeries(key, output_format='pandas')
ti = TechIndicators(key)
# Get the data, returns a tuple
# aapl_data is a pandas dataframe, aapl_meta_data is a dict
aapl_data, aapl_meta_data = ts.get_daily(symbol='AAPL')
# aapl_sma is a dict, aapl_meta_sma also a dict
aapl_sma, aapl_meta_sma = ti.get_sma(symbol='AAPL')
# Visualization
figure(num=None, figsize=(15, 6), dpi=80, facecolor='w', edgecolor='k')
aapl_data['4. close'].plot()
plt.tight_layout()
plt.grid()
plt.show()
I get these errors...
Traceback (most recent call last):
File "C:/Users/PycharmProjects/AlphaVantageTest/AlphaVantageTest.py", line 9, in <module>
ts = TimeSeries(key, output_format='pandas')
File "C:\Users\PycharmProjects\AlphaVantageTest\venv\lib\site-packages\alpha_vantage\alphavantage.py", line 66, in __init__
raise ValueError("The pandas library was not found, therefore can "
ValueError: The pandas library was not found, therefore can not be used as an output format, please install manually
How can I improve my program so that I don't receive these errors? None of these programs are bad syntax wise. Thank you to anyone that can help.
You need to install pandas. If you're just using pip, you can run pip install pandas if you are using conda to manage your envs you can use conda install pandas.
Glad it worked. According to this meta overflow post: What if I answer a question in a comment?
I am posting my comment as an answer so you can mark the question as answered.

Function not working, Syntax errors and more

The other day I was working on a project for an Image captioning model on Keras. But when I am running it, I am facing a host of error. Note that I am using Atom Editor and a virtual environment in Python, Running everything from a Command-line.
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
In this line, I am receiving this error==>
File "C:\Users\neelg\Documents\Atom_projects\Main\Img_cap.py", line 143
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
^
SyntaxError: invalid syntax
I think that the syntax is correct regarding the function, yet the error persists. So, in a seperate file I copied the function and tried to isolate problem.
Code for the standalone function:-
from pickle import load
import os
def load_photo_features(filename, dataset):
all_features = load(open(filename, 'rb'))
features = {k: all_features[k] for k in dataset}
return features
filename = 'C:/Users/neelg/Documents/Atom_projects/Main/Flickr8k_text/Flickr8k.trainImages.txt'
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
Now, A different type of problem crops up:
Traceback (most recent call last):
File "C:\Users\neelg\Documents\Atom_projects\Main\testing.py", line 10, in <module>
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
TypeError: 'module' object is not callable
Any help? I am trying to import the Flickr_8k dataset, which contains random pictures and another small dataset which are the labels of those photographs...
P.S=>Pls send suggestions after testing the code on tour own editors before submitting because I suspect there is some core problem arising due to the System encoding(As suggested by some others). Also, it is not possible to load the whole code due to it's length and requirement of multiple files.
This error comes from the fact that you're calling os.path which is a module not a function. Just remove it, you don't need it in this use-case, a string is enough for filename in open
I was about to ask you the same question with #ted why do you use os.path when you are trying to load the file.
Normally, I am using the following code for loading from pickle:
def load_obj(filename):
with open(filename, "rb") as fp:
return pickle.load(fp, enconding = 'bytes')
Furthermore, if I try something like that it works:
from pickle import load
import os
import pdb
def load_photo_features(filename):
all_features = load(open(filename, 'rb'))
pdb.set_trace()
#features = {k: all_features[k] for k in dataset}
#return features
train_features = load_photo_features('train.pkl')
I do not know what is the dataset input to proceed, but loading of the pickle file works fine.

module 'seaborn' has no attribute 'distplot'

I've some code like:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('StudentsPerformance.csv')
#print(data.isnull().sum()) // checking if there are some missing values or not
#print(data.dtypes)checking datatypes of the dataset
# ANALYSİS VALUES OF THE COLUMN'S
"""print(data['gender'].value_counts())
print(data['parental level of education'].value_counts())
print(data['race/ethnicity'].value_counts())
print(data['lunch'].value_counts())
print(data['test preparation course'].value_counts())"""
# Adding column total and average to the dataset
data['total'] = data['math score'] + data['reading score'] + data['writing score']
data['average'] = data ['total'] / 3
sns.distplot(data['average'])
I would like to see distplot of average for visualization but I run the program that gives me an error like
Traceback (most recent call last): File
"C:/Users/usersample/PycharmProjects/untitled1/sample.py", line 22, in
sns.distplot(data['average']) AttributeError: module 'seaborn' has no attribute 'distplot'
I've tried to reinstall and install seaborn and upgrade the seaborn to 0.9.0 but it doesn't work.
head of my data female,"group B","bachelor's
degree","standard","none","72","72","74" female,"group C","some
college","standard","completed","69","90","88" female,"group
B","master's degree","standard","none","90","95","93" male,"group
A","associate's degree","free/reduced","none","47","57","44"
this might be due to removal of paths in environment variables section. Try considering to add your IDE scripts and python folder. I am using pycharm IDE, and did the same and its working fine.

Why is my image_path undefined when using export_graphviz? - Python 3

I'm trying to run this machine learning tree algorithm code in IPython:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
from sklearn.tree import export_graphviz
export_graphviz(tree_clf, out_file=image_path("iris_tree.dot"),
feature_names=iris.feature_names[2:],
class_names=iris.target_names,
rounded=True,
filled=True
)
But I get this error when run in IPython:
I'm unfamiliar with export_graphviz, does anyone have any idea how to correct this?
I guess you are following "Hands on Machine Learning with Scikit-Learn and TensorFlow" book by Aurelien Geron. I encountered with the same problem while trying out "Decision Trees" chapter. You can always refer to his GitHub notebooks . For your code, you may refer "decision tree" notebook.
Below I paste the code from notebook. Please do go ahead and have a look at the notebook also.
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals
# Common imports
import numpy as np
import os
# to make this notebook's output stable across runs
np.random.seed(42)
# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "decision_trees"
def image_path(fig_id):
return os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id)
def save_fig(fig_id, tight_layout=True):
print("Saving figure", fig_id)
if tight_layout:
plt.tight_layout()
plt.savefig(image_path(fig_id) + ".png", format='png', dpi=300)
To get rid of all the mess simply remove image_path,
now out_file="iris_tree.dot", after running that command a file will be saved in your folder named iris_tree. Open that file in Microsoft Word and copy all of its content. Now open your browser and type "webgraphviz" and then click on the first link. Then delete whatever is written in white space and paste your code which is copied from iris_tree. Then click "generate graph". Scroll down and your graph is ready.
I know you might have got what you were looking for. But in case you don't, all you need to do is just replace:
out_file=image_path("iris_tree.dot")
with:
out_file="iris_tree.dot"
This will create the .dot file in the same directory in which your current script is.
You can also give the absolute path to where you want to save the .dot file as:
out_file="/home/cipher/iris_tree.dot"
you must correct
out_file=image_path("iris_tree.dot"),
in below code line:
out_file="C:/Users/VIDA/Desktop/python/iris_tree.dot",
You can directly type instead of using the webgraphviz, if you are using sklearn version 0.20.
import graphviz
with open ("iris_tree.dot") as f:
dot_graph = f.read()
display (graphviz.Source(dot_graph))
With sklearn 0.22 you have to change again. See sklearn users guide.
I have a sklearn with the version of 0.20.1, and I got the example to work through the line below.
export_graphviz(
tree_clf,
out_file = "iris_tree.dot",
feature_names = iris.feature_names[2:])

Creating PDF using pydot

I got the below code from Visualizing a Decision Tree - Machine Learning
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
test_idx = [0, 50 , 100]
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx , axis=0)
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)
print(test_target)
print(clf.predict(test_data))
#viz_code
from sklearn.externals.six import StringIO
import pydot
dot_data = StringIO()
tree.export_graphviz(clf,
out_file=dot_data,
feature_names = iris.feature_names,
class_names = iris.target_names,
filled = True, rounded = True,
impurity = False)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
I tried to run it in my python 3.5 but i get an error saying that graph is a list.
Traceback (most recent call last):
File "Iris.py", line 31, in <module>
graph.write_pdf("iris.pdf")
AttributeError: 'list' object has no attribute 'write_pdf'
Press any key to continue . . .
How come graph here is a list?
I think this is a duplicate, here is answered the same question link
because pydot.graph_from_dot_data return a list the solution is:
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph[0].write_pdf("iris.pdf")
This solved the problem for me with Python 3.6.5 :: Anaconda, Inc.
Pydot will not work in Python3.
You can use Pydotplus (graph.write_pdf("iris.pdf") AttributeError: 'list' object has no attribute 'write_pdf'") for python3 instead of pydot.
Although, the code shown on youtube is for Python2. So, it will be better if you use Python2.

Resources