I downloaded the nltk (book) using the code:
import nltk
nltk.download()
However, when I try to test the NLTK data using the code below, it doesn't work:
from nltk.corpus import brown
brown.words()
In the NLTK Downloader window, open the "Corpora" tab and check that brown has been installed. It should be highlighted green with its status set to "installed". Next, check that you can find the unzipped brown folder under C:\Users\User\AppData\Roaming\nltk_data. Finally, assuming everything is present and working (i.e. you're not receiving any errors when running the code), you will not see any output in PyCharm unless you call print: a bare expression such as brown.words() is only echoed in an interactive console, not when a script is run. Instead, try this:
import nltk
from nltk.corpus import brown
nltk.download()
print(brown.words())
# Output
>>> ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
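The folder check described above can also be scripted. This is only a minimal sketch: find_corpus is a hypothetical helper (not an NLTK function), the search directories are passed in explicitly, and it assumes a package counts as installed if either an unzipped folder or a .zip archive exists under a corpora subfolder.

```python
import os

def find_corpus(name, search_paths):
    """Return the first location of corpora/<name> under search_paths, else None.

    find_corpus is a hypothetical helper, not part of NLTK.
    """
    for base in search_paths:
        folder = os.path.join(base, "corpora", name)
        archive = folder + ".zip"
        if os.path.isdir(folder):
            return folder       # unzipped corpus folder
        if os.path.isfile(archive):
            return archive      # downloaded but still zipped
    return None
```

In a real session you would probably call it as find_corpus("brown", nltk.data.path) and print the result; None would suggest the corpus is not where NLTK looks for it.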
I downloaded some of the datasets from nltk using
import nltk
import nltk.corpus
nltk.download()
Now I want to list all the downloaded datasets, but I don't know how.
You need to find the path where the downloads are stored. NLTK searches the directories listed in nltk.data.path.
Also, try using nltk.data.find, which returns the location of a resource:
import os
import nltk
print(os.listdir(nltk.data.find("corpora")))
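To see every category at once (not only corpora), one option is to walk all the search directories. The following is a sketch under assumptions: list_downloaded is a hypothetical helper (not an NLTK function), and it treats "brown" and "brown.zip" as the same package.

```python
import os

def list_downloaded(search_paths):
    """Map each category (e.g. 'corpora') to the sorted package names found in it.

    list_downloaded is a hypothetical helper, not part of NLTK.
    """
    found = {}
    for base in search_paths:
        if not os.path.isdir(base):
            continue  # some search directories may not exist on this machine
        for category in os.listdir(base):
            cat_dir = os.path.join(base, category)
            if not os.path.isdir(cat_dir):
                continue
            for entry in os.listdir(cat_dir):
                # strip a trailing .zip so archive and folder count once
                name = entry[:-4] if entry.endswith(".zip") else entry
                found.setdefault(category, set()).add(name)
    return {category: sorted(names) for category, names in found.items()}
```

Passing nltk.data.path as search_paths should give a dictionary of everything the downloader has stored in those directories.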
I'm working on a salary dataset. Everything works fine, except that when I use the python-graphviz module in Spyder 3.3.2 to show the decision tree graph, the console window shows only an image icon. The same code works on other systems. What am I missing here?
The output image is here: Console Output
from sklearn.tree import DecisionTreeClassifier
dtf = DecisionTreeClassifier()
dtf.fit(X_train, y_train)
from sklearn.tree import export_graphviz
export_graphviz(dtf, out_file="tree.dot", class_names=["Less than 50k",
"More than 50k"])
import graphviz
with open("tree.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)
(Spyder maintainer here) This seems to be a limitation of QtConsole, which is the package that powers our IPython consoles.
Please open an issue on the repo referenced above about this so we don't forget to fix it in the future.
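Until that is fixed, one possible workaround (my assumption, not something the maintainer suggested) is to render the .dot file to an image file and open it outside the console. The sketch below calls the Graphviz dot program directly; render_dot is a hypothetical helper, and it assumes the Graphviz binaries are installed and on PATH.

```python
import shutil
import subprocess

def render_dot(dot_path, out_path, fmt="png"):
    """Render a .dot file with the Graphviz `dot` program.

    Returns out_path on success, or None if `dot` is not on PATH.
    render_dot is a hypothetical helper, not part of graphviz or Spyder.
    """
    dot_exe = shutil.which("dot")
    if dot_exe is None:
        return None  # Graphviz binaries are missing or not on PATH
    # equivalent to running: dot -Tpng tree.dot -o tree.png
    subprocess.run([dot_exe, "-T" + fmt, dot_path, "-o", out_path], check=True)
    return out_path
```

For the question's file this would be render_dot("tree.dot", "tree.png"), after which tree.png can be opened in any image viewer.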
On my mac, I installed multiple versions of Sklearn as shown below:
Sklearn 0.19.1
~/anaconda2/pkgs/scikit-learn-0.19.1-py27h9788993_0/lib/python2.7/site-packages/sklearn
Sklearn 0.20.0
~/anaconda2/pkgs/scikit-learn-0.20.0-py27h4f467ca_1/lib/python2.7/site-packages/sklearn
When I start Jupyter, it automatically uses sklearn 0.20.0. I was wondering whether there is a way to run sklearn 0.19.1 instead.
Thanks a lot,
Jeff
This should work. I'm not saying it is elegant, but it is what I would personally try first. sys.path is the list of all the places Python searches when importing modules, so you first remove any entries for the version you don't want and then put in the one you do want.
In a cell before you import from sklearn:
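import os
import sys
# drop every sys.path entry that points at the unwanted 0.20.0 package
# (filtering avoids the index shift caused by popping while iterating)
sys.path[:] = [p for p in sys.path if 'scikit-learn-0.20.0-py27h4f467ca_1' not in p]
# add the site-packages directory that contains the sklearn 0.19.1 package;
# '~' is not expanded automatically, so expand it explicitly
sys.path.insert(0, os.path.expanduser('~/anaconda2/pkgs/scikit-learn-0.19.1-py27h9788993_0/lib/python2.7/site-packages'))
# now if you import from sklearn, it should come from 0.19.1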
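To see why the position in sys.path matters, here is a small self-contained sketch that does not touch sklearn at all. It builds two throwaway packages named fakepkg_demo (a made-up name) with different version strings and shows that the directory searched first wins. It also shows why this has to happen before the first import: a module that is already imported stays cached in sys.modules.

```python
import importlib
import os
import sys
import tempfile

def make_fake_package(root, name, version):
    """Create <root>/<name>/__init__.py that only defines __version__."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("__version__ = %r\n" % version)
    return root  # the directory to put on sys.path is the parent of the package

def import_preferring(site_dir, name):
    """Import <name> with site_dir searched first, ignoring any cached copy."""
    sys.modules.pop(name, None)      # forget a previously imported copy
    sys.path.insert(0, site_dir)     # index 0 is searched before everything else
    importlib.invalidate_caches()    # needed for directories created at runtime
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(site_dir)    # leave sys.path as we found it

old_dir = make_fake_package(tempfile.mkdtemp(), "fakepkg_demo", "0.19.1")
new_dir = make_fake_package(tempfile.mkdtemp(), "fakepkg_demo", "0.20.0")

print(import_preferring(old_dir, "fakepkg_demo").__version__)  # 0.19.1
print(import_preferring(new_dir, "fakepkg_demo").__version__)  # 0.20.0
```

A real package with compiled extensions and dependencies, such as sklearn, may not switch as cleanly as this toy example, so a separate conda environment per version is probably the more robust route.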
After exporting a .dot file using scikit-learn's handy export_graphviz function, I am trying to render it with Graphviz in a cell of my Jupyter Notebook:
import graphviz
from IPython.display import display
with open("tree_1.dot") as f:
    dot_graph = f.read()
display(graphviz.Source(dot_graph))
However, the Out[ ] is just an empty cell.
I am using graphviz 0.5 (installed with pip, then conda), IPython 5.1, and Python 3.5.
The dot file looks correct; here are the first characters:
digraph Tree {\nnode [shape=box, style="filled", color=
IPython display seems to work for other objects, including Matplotlib plots and Pandas dataframes.
I should note that the example on Graphviz's site also doesn't work.
It's possible that changes have been made since you posted this, so you might want to update your libraries if you can.
The relevant versions I used are:
Python 2.7.10
IPython 5.1.0
graphviz 0.7.1
If you have a well-formed .dot file, you can display it in the Jupyter Out[ ] cell as follows:
import graphviz
with open("tree_1.dot") as f:
    dot_graph = f.read()
# remove the display(...)
graphviz.Source(dot_graph)
This solution lets you pass DOT text directly (without saving it to a file first):
# convert a DOT source into graph directly
import graphviz
from IPython.display import display
source= '''\
digraph sample {
A[label="AL"]
B[label="BL"]
C[label="CL"]
A->B
B->C
B->D
D->C
C->A
}
'''
print(source)
gvz = graphviz.Source(source)
# produce PDF
# gvz.view()
print(gvz.source)
display(gvz)
Try using pydotplus.
import pydotplus
from sklearn import tree
(1.1) Either import the .dot file from disk:
pydot_graph = pydotplus.graph_from_dot_file("clf.dot")
(1.2) or use the export_graphviz output directly:
dt = tree.DecisionTreeClassifier()
dt = dt.fit(x, y)
dt_graphviz = tree.export_graphviz(dt, out_file=None)
pydot_graph = pydotplus.graph_from_dot_data(dt_graphviz)
(2) Then display the pydot graph using:
from IPython.display import Image
Image(pydot_graph.create_png())
Try reinstalling graphviz:
conda remove graphviz
conda install python-graphviz
graphviz.Source(dot_graph).view()
For example, I want to get the word "india" (NOUN) from "indian" (ADJ).
I can find india from indian using the WordNet browser, but I don't know how to do it in Python using NLTK.
You can do something like this:
from nltk.corpus import wordnet as wn
syns = wn.synset('indian.a.01')
# in NLTK 3, lemmas() and name() are methods rather than attributes
print(syns.lemmas()[0].derivationally_related_forms())
print(syns.lemmas()[0].derivationally_related_forms()[0].name())
You get:
[Lemma('india.n.01.India')]
India