Graphviz.Source not rendering in Jupyter Notebook - scikit-learn

After exporting a .dot file using scikit-learn's handy export_graphviz function, I am trying to render it with Graphviz in a cell of my Jupyter Notebook:
import graphviz
from IPython.display import display
with open("tree_1.dot") as f:
    dot_graph = f.read()
display(graphviz.Source(dot_graph))
However, the Out[ ] is just an empty cell.
I am using graphviz 0.5 (installed with pip, then with conda), IPython 5.1, and Python 3.5.
The dot file looks correct; here are the first characters:
digraph Tree {\nnode [shape=box, style="filled", color=
IPython's display seems to work for other objects, including Matplotlib plots and Pandas dataframes.
I should note the example on Graphviz' site also doesn't work.

It's possible that changes have been made since you posted this, so you might want to update your libraries if you can.
The relevant versions I used are:
Python 2.7.10
IPython 5.1.0
graphviz 0.7.1
If you have a well-formed .dot file, you can display it in the Jupyter Out[.] cell as follows:
import graphviz
with open("tree_1.dot") as f:
    dot_graph = f.read()
# remove the display(...)
graphviz.Source(dot_graph)
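Why a bare last expression can render at all: Jupyter displays the value of a cell's final expression by looking for rich-representation hooks on the object, such as `_repr_svg_()`. The sketch below uses a hypothetical stand-in class, `SvgBox`, which is not part of graphviz, to show the mechanism without needing Graphviz installed. That `graphviz.Source` relies on a hook of this kind is an assumption here, and the exact hook name may differ between releases.

```python
class SvgBox:
    """Hypothetical object that Jupyter can render as an SVG image."""

    def __init__(self, label):
        self.label = label

    def _repr_svg_(self):
        # IPython/Jupyter calls this hook, if present, to obtain SVG markup
        # for the value of the last expression in a cell.
        return (
            '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="40">'
            '<rect width="120" height="40" fill="none" stroke="black"/>'
            f'<text x="10" y="25">{self.label}</text>'
            "</svg>"
        )


box = SvgBox("AL")
svg_markup = box._repr_svg_()
```

In a notebook, ending a cell with `box` on its own line would draw the labelled rectangle.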

This solution lets you pass DOT text directly (without saving it to a file first):
# convert a DOT source into graph directly
import graphviz
from IPython.display import display
source= '''\
digraph sample {
A[label="AL"]
B[label="BL"]
C[label="CL"]
A->B
B->C
B->D
D->C
C->A
}
'''
print(source)
gvz = graphviz.Source(source)
# produce PDF
# gvz.view()
print(gvz.source)
display(gvz)

Try using pydotplus.
import pydotplus
Either (1.1) import the .dot file from disk:
pydot_graph = pydotplus.graph_from_dot_file("clf.dot")
or (1.2) use the export_graphviz output directly:
from sklearn import tree
dt = tree.DecisionTreeClassifier()
dt = dt.fit(x, y)
dt_graphviz = tree.export_graphviz(dt, out_file=None)
pydot_graph = pydotplus.graph_from_dot_data(dt_graphviz)
(2.) and then display the pydot graph using:
from IPython.display import Image
Image(pydot_graph.create_png())

Try reinstalling graphviz:
conda remove graphviz
conda install python-graphviz
graphviz.Source(dot_graph).view()
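Reinstalling can help when only half of the toolchain is present: the `graphviz` Python package is a wrapper that calls the separate Graphviz `dot` program, so both need to be visible to the notebook's interpreter. A minimal sketch for checking both, using only the standard library (the helper name `graphviz_status` is made up for illustration):

```python
import importlib.util
import shutil


def graphviz_status():
    """Report whether the Python package and the `dot` executable are visible."""
    return {
        # True if `import graphviz` would find a module in this environment.
        "python_package": importlib.util.find_spec("graphviz") is not None,
        # Full path of the Graphviz `dot` binary on PATH, or None if not found.
        "dot_executable": shutil.which("dot"),
    }


status = graphviz_status()
print(status)
```

Run it inside the notebook itself, since the kernel may use a different environment and PATH than your terminal.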

Related

backtesting.py plotting function not working

I'm trying to learn backtesting.py. When I run the sample code below, it produces the following messages. Could anyone help? I tried uninstalling the Bokeh package and reinstalling an older version, but that doesn't work.
BokehDeprecationWarning: Passing lists of formats for DatetimeTickFormatter scales was deprecated in Bokeh 3.0. Configure a single string format for each scale
C:\Users\paul_\AppData\Local\Programs\Python\Python310\lib\site-packages\bokeh\models\formatters.py:399: UserWarning: DatetimeFormatter scales now only accept a single format. Using the first prodvided: '%d %b'
warnings.warn(f"DatetimeFormatter scales now only accept a single format. Using the first prodvided: {fmt[0]!r} ")
BokehDeprecationWarning: Passing lists of formats for DatetimeTickFormatter scales was deprecated in Bokeh 3.0. Configure a single string format for each scale
C:\Users\paul_\AppData\Local\Programs\Python\Python310\lib\site-packages\bokeh\models\formatters.py:399: UserWarning: DatetimeFormatter scales now only accept a single format. Using the first prodvided: '%m/%Y'
warnings.warn(f"DatetimeFormatter scales now only accept a single format. Using the first prodvided: {fmt[0]!r} ")
GridPlot(id='p11925', ...)
import bokeh
import datetime
import pandas_ta as ta
import pandas as pd
from backtesting import Backtest
from backtesting import Strategy
from backtesting.lib import crossover
from backtesting.test import GOOG
class RsiOscillator(Strategy):
    upper_bound = 70
    lower_bound = 30
    rsi_window = 14

    # Do as much initial computation as possible
    def init(self):
        self.rsi = self.I(ta.rsi, pd.Series(self.data.Close), self.rsi_window)

    # Step through bars one by one
    # Note that multiple buys are a thing here
    def next(self):
        if crossover(self.rsi, self.upper_bound):
            self.position.close()
        elif crossover(self.lower_bound, self.rsi):
            self.buy()

bt = Backtest(GOOG, RsiOscillator, cash=10_000, commission=.002)
stats = bt.run()
bt.plot()
An issue was opened for this in the GitHub repo:
https://github.com/kernc/backtesting.py/issues/803
A comment in the issue suggests downgrading bokeh to 2.4.3:
python3 -m pip install bokeh==2.4.3
This worked for me.
I had a similar issue using the Spyder IDE.
I found that I needed to call the following for the plot to show in Spyder:
import backtesting
backtesting.set_bokeh_output(notebook=False)
I updated Python to version 3.11 and downgraded bokeh to 2.4.3.
This worked for me.
Downgrading Bokeh didn't work for me.
But, after importing backtesting in Jupyter, I needed to do:
backtesting.set_bokeh_output(notebook=False)
The expected plot was then generated in a new interactive browser tab.
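Since the answers above depend on which Bokeh series is installed, it can help to check that first. This is a minimal standard-library sketch and not part of backtesting.py or Bokeh; the helper name `installed_major_version` is hypothetical, and it assumes the version string starts with a plain integer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package):
    """Return the installed major version of `package`, or None if absent."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


bokeh_major = installed_major_version("bokeh")
if bokeh_major is None:
    print("bokeh is not installed in this environment")
elif bokeh_major >= 3:
    print("bokeh 3.x or later detected; the deprecation warnings in the question mention 3.0")
else:
    print(f"bokeh {bokeh_major}.x detected")
```

Run it in the same kernel that runs the backtest, so that it reports the copy of Bokeh that backtesting.py actually imports.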

Is there any other way to load data?

I am new to data science and Python programming. I am having trouble loading a CSV file in a Jupyter notebook.
This is for Windows 10. I have already tried restarting the kernel and clearing the output.
import numpy as np
import pandas as pd
data = pd.read_csv("C/users/SHIVAM/desktop/brazil.csv.csv")
I expected the dataset to be loaded in the Jupyter notebook. Instead, it raises a file-not-found error.
Windows paths conventionally use a backslash (\) as the separator, and in an ordinary Python string each backslash has to be escaped by doubling it (\\). More importantly, you're missing the colon after the drive letter: C:
Your path should look like this: 'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv' or, using your code:
import numpy as np
import pandas as pd
data = pd.read_csv('C:\\users\\SHIVAM\\desktop\\brazil.csv.csv')
All of this assumes that this is really the path you want and that the file is actually there; make sure that it is.
Some of these path separator problems can be avoided by using something like pathlib, which is intended to be cross-platform:
>>> from pathlib import Path
>>> p = Path('C:/users/SHIVAM/desktop/brazil.csv.csv')
>>> p
WindowsPath('C:/users/SHIVAM/desktop/brazil.csv.csv')
>>> str(p)
'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv'
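To see what the missing colon does, the paths can be compared without touching the file system. This is a small sketch using `PureWindowsPath`, which only manipulates the path text and therefore runs on any operating system; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# The path from the question: without the colon, "C" is just a folder name.
broken = PureWindowsPath("C/users/SHIVAM/desktop/brazil.csv.csv")
# The corrected path; forward slashes are accepted and normalised.
fixed = PureWindowsPath("C:/users/SHIVAM/desktop/brazil.csv.csv")
# A raw string avoids having to double every backslash.
raw = PureWindowsPath(r"C:\users\SHIVAM\desktop\brazil.csv.csv")

print(repr(broken.drive), broken.is_absolute())
print(repr(fixed.drive), fixed.is_absolute())
print(fixed == raw)
```

The broken path has no drive and is treated as relative to the notebook's working directory, which is why the file is not found.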

Is there any way I can save the plot as .jpg [duplicate]

I am using matplotlib (within pylab) to display figures, and I want to save them in .jpg format. When I simply use the savefig command with a jpg extension, it returns:
ValueError: Format "jpg" is not supported.
Supported formats: emf, eps, pdf, png, ps, raw, rgba, svg, svgz.
Is there a way to do this?
You can save an image as 'png' and use the Python Imaging Library (PIL) to convert this file to 'jpg':
from PIL import Image
import matplotlib.pyplot as plt
plt.plot(range(10))
plt.savefig('testplot.png')
# matplotlib writes an RGBA PNG; JPEG has no alpha channel, so convert first
Image.open('testplot.png').convert('RGB').save('testplot.jpg', 'JPEG')
To clarify and update @neo's useful answer and the original question: a clean solution consists of installing Pillow, which is an updated version of the Python Imaging Library (PIL). This is done using
pip install pillow
Once Pillow is installed, the standard Matplotlib commands
import matplotlib.pyplot as plt
plt.plot([1, 2])
plt.savefig('image.jpg')
will save the figure into a JPEG file and will not generate a ValueError any more.
Contrary to @amillerrhodes' answer, as of Matplotlib 3.1, JPEG files are still not supported on their own: if I remove the Pillow package, I still receive a ValueError about an unsupported file type.
Just install pillow with pip install pillow and it will work.
I just updated matplotlib to 1.1.0 on my system and it now allows me to save to jpg with savefig.
To upgrade to matplotlib 1.1.0 with pip, use this command:
pip install -U 'http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-1.1.0/matplotlib-1.1.0.tar.gz/download'
EDIT (to respond to comment):
pylab is simply an aggregation of the matplotlib.pyplot and numpy namespaces (as well as a few others) into a single namespace.
On my system, pylab is just this:
from matplotlib.pylab import *
import matplotlib.pylab
__doc__ = matplotlib.pylab.__doc__
You can see that pylab is just another namespace in your matplotlib installation. Therefore, it doesn't matter whether you import it via pylab or via matplotlib.pyplot.
If you are still running into problems, then I'm guessing the macosx backend doesn't support saving plots to jpg. You could try using a different backend.
Matplotlib can handle jpg directly and transparently if you have PIL installed. You don't need to call PIL yourself; Matplotlib does that on its own. If Python cannot find PIL, it will raise an error.
I'm not sure about all versions of Matplotlib, but according to the official documentation for v3.5.0, savefig lets you pass settings through to the underlying Pillow library, which does the image saving anyway. So if you want a jpg with specific compression settings, for example:
import matplotlib.pyplot as plt
plt.plot(...) # Plot stuff
plt.savefig('filename.jpg', pil_kwargs={
    'quality': 20,
    'subsampling': 10
})
This should give you a highly compressed jpg as the output.
Just for completeness: if you also want to control the quality (i.e. compression level) of the saved result, it gets a bit more complicated, because passing plt.savefig(..., quality=5) directly does not seem to affect the output size or quality. One option is to save the result as a png first, reload it with PIL, and save it again as a jpeg using PIL's quality parameter, similar to what is suggested in Yann's answer.
Alternatively, one can avoid this detour through a file on disk by using BytesIO (following the answer to this question):
from io import BytesIO
import matplotlib.pyplot as plt
from PIL import Image
buf = BytesIO()
plt.plot(...) # Plot something here
plt.savefig(buf)
Image.open(buf).convert("RGB").save("testplot.jpg", quality=5)

How to set dash style to majorGridlines with openpyxl

Python 3.7, Openpyxl 2.5.12, O.S. Windows 7.
I would like to get majorGridlines with dot style. Is it possible to get this with openpyxl?
I have checked openpyxl.drawing.line.LineProperties class and I have seen there is an option called prstDash = "dot". I have managed to get dash and dot styles with different series of a ScatterChart() like:
serie.graphicalProperties.line.dashStyle = "sysDot"
However, I am not able to apply this property to majorGridlines. Is there any way of doing it?
The best thing to do is to create a sample file with the styling you need and compare it with one created or processed by openpyxl. openpyxl implements the OOXML specification pretty closely, so it should be possible to work out how to do what you need.
Update (2022):
Openpyxl 3.0.10, Python 3.8.3
# required imports
from openpyxl.chart import ScatterChart
from openpyxl.chart.shapes import GraphicalProperties
from openpyxl.drawing.line import LineProperties
from openpyxl.chart.axis import ChartLines
# add a dash style to majorGridlines (minorGridlines works the same way)
chart = ScatterChart()
chart.y_axis.majorGridlines = ChartLines(
    spPr=GraphicalProperties(ln=LineProperties(prstDash='dash'))
)

No graphviz output in console of Spyder

I'm working on a salary dataset. Everything works fine, except that when I use the python-graphviz module in Spyder 3.3.2 to show the decision tree graph, the console window shows only an image icon. The same code works on other systems. What am I missing?
The output image is here: Console Output
from sklearn.tree import DecisionTreeClassifier
dtf = DecisionTreeClassifier()
dtf.fit(X_train, y_train)
from sklearn.tree import export_graphviz
export_graphviz(dtf, out_file="tree.dot",
                class_names=["Less than 50k", "More than 50k"])
import graphviz
with open("tree.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)
(Spyder maintainer here) This seems to be a limitation of QtConsole, which is the package that powers our IPython consoles.
Please open an issue on the repo referenced above about this so we don't forget to fix it in the future.
