How to list all downloaded datasets from nltk - python-3.x

I downloaded some of the datasets from nltk using
import nltk
import nltk.corpus
nltk.download()
Now I want to list all the downloaded datasets, but I don't know how.

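You need to find the directory where the downloads are stored. NLTK searches the directories listed in nltk.data.path, so the data will be in one of those.
You can also use nltk.data.find to locate a category folder such as corpora and list its contents: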
import os
import nltk
print(os.listdir(nltk.data.find("corpora")))
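The snippet above only covers the corpora folder in the first matching directory. As a rough sketch of listing everything across all search directories, the helper below walks a list of paths and groups what it finds by category. The function name list_downloaded is made up for illustration, and it assumes the usual on-disk layout of category/package (for example corpora/brown), which is worth confirming on your own machine.

```python
import os

def list_downloaded(search_paths):
    """Map each category folder (corpora, tokenizers, ...) to the package names found on disk.

    Hypothetical helper: assumes the layout <search path>/<category>/<package>.
    """
    found = {}
    for root in search_paths:
        if not os.path.isdir(root):
            continue  # search lists often contain directories that do not exist
        for category in sorted(os.listdir(root)):
            category_dir = os.path.join(root, category)
            if not os.path.isdir(category_dir):
                continue
            for entry in os.listdir(category_dir):
                # brown and brown.zip are the same package, so strip the .zip suffix
                name = entry[:-4] if entry.endswith(".zip") else entry
                found.setdefault(category, set()).add(name)
    return {category: sorted(names) for category, names in found.items()}

# Typical use, with nltk installed:
#   import nltk
#   print(list_downloaded(nltk.data.path))
```

Taking the search paths as an argument keeps the helper independent of nltk itself, so it can be pointed at any directory.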

Related

How do I import the saved module in Google Colab?

I built an NER model using spaCy in Google Colab and saved it to disk using the nlp.to_disk() function.
nlp.to_disk("RCM.model")
The model is saved under Files. How should I load the RCM model for testing?
I tried the code below, but it didn't work.
from google.colab import drive
my_module = drive.mount('/content/RCM.model', force_remount=True)
drive.mount only mounts Google Drive; it does not load a model. If you saved a model with nlp.to_disk, you can load it using spacy.load.
import spacy
nlp = spacy.load("RCM.model")  # the argument should be the path to the directory
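Since the argument has to be a path to the saved directory, a quick check before loading can rule out a wrong working directory. This is only a sketch: the helper name looks_like_spacy_model is invented here, and it assumes that nlp.to_disk writes a meta.json file at the top level of the output directory.

```python
import os

def looks_like_spacy_model(path):
    """Return True if path is a directory that contains a meta.json file.

    Hypothetical helper; the meta.json check is an assumption about
    what nlp.to_disk writes.
    """
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "meta.json"))

# Typical use, with spaCy installed:
#   if looks_like_spacy_model("RCM.model"):
#       nlp = spacy.load("RCM.model")
#   else:
#       print("RCM.model not found in", os.getcwd())
```

If the check fails, printing os.getcwd() shows which directory the relative path is being resolved against.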

ImportError: cannot import name 'PCA' from 'matplotlib.mlab'

According to this task:
Principal Component Analysis (PCA) in Python
I included this line
from matplotlib.mlab import PCA
but I get the error message:
cannot import name 'PCA' from 'matplotlib.mlab'
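I'm using Python 3.7 and I have no idea how I can use the PCA function from matplotlib. Has it been deprecated in newer versions of matplotlib, or is PCA included in another library?
It may be late to reply, but I will place this here anyway. matplotlib.mlab.PCA was deprecated and later removed from matplotlib, so the import fails on recent versions. Use the PCA class from scikit-learn instead: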
import numpy as np
from sklearn.decomposition import PCA
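For completeness, here is a minimal sketch of how the scikit-learn class is typically used once imported. The data is random and made up purely for illustration, and the choice of two components is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data for illustration: 100 samples with 5 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

pca = PCA(n_components=2)
# fit_transform centers the data and projects it onto the top 2 components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
```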

How to import tfrecord files in a pandas dataframe?

I have a tfrecord file and would like to import it into a pandas dataframe or numpy array.
I found tools to read tfrecords but they only work inside a tensorflow session, which is not the use case I have...
Thanks for any help I could get !
In Colab you can run the following (on your own command line, drop the leading !):
!pip install pandas-tfrecords
After installation you can use:
import pandas as pd
import pandas_tfrecords as pdtfr
pdtfr.tfrecords_to_pandas(file_paths=r'/folder/file.tfrecords')
Good luck!
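If installing an extra package is not an option, the record framing of a TFRecord file can also be read with the standard library alone. The sketch below assumes the usual layout of each record (an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload) and skips checksum verification. The function name read_tfrecord_payloads is made up. It only yields raw bytes; turning each payload into columns still needs a protobuf parser for whatever message type the file holds.

```python
import struct

def read_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Hypothetical helper: checksums are read but not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte checksum of the length
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # 4-byte checksum of the payload
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record")
            yield payload
```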

How can I solve cannot import name 'fetch_openml' from 'sklearn.datasets'?

I'm learning sklearn, but I can't use fetch_openml(). It says,
ImportError: cannot import name 'fetch_openml' from 'sklearn.datasets'
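fetch_openml was added in scikit-learn 0.20, so this ImportError usually means an older version is installed; upgrading scikit-learn should fix it. In the new version of sklearn, it's even easier to fetch OpenML datasets. For example, you can import it and fetch the MNIST dataset as: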
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
print(X.shape, y.shape)
For more details, check the official example.
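Because the import only exists from scikit-learn 0.20 onward, a small version check can tell you whether an upgrade is needed before you try it. This is a sketch with an invented helper name, supports_fetch_openml, which compares only the major and minor version numbers.

```python
import re

def supports_fetch_openml(version):
    """Return True if a scikit-learn version string is 0.20 or newer.

    Hypothetical helper: only the leading major.minor part is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 20)

# Typical use, with scikit-learn installed:
#   import sklearn
#   print(sklearn.__version__, supports_fetch_openml(sklearn.__version__))
```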
You can use this:
from sklearn.datasets import fetch_openml
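Apparently, fetch_mldata has been deprecated in newer sklearn. If a small digits dataset is enough, you can use load_digits, but note that it loads scikit-learn's bundled 8x8 digits dataset, not the full 28x28 MNIST data.
To solve this problem in Jupyter on an older scikit-learn that still provides fetch_mldata (it was deprecated in 0.20 and later removed), follow these steps:
Download the file mnist-original from https://osf.io/jda6s/
After downloading, copy the file into C:\Users\YOURUSERNAME\scikit_learn_data\mldata
In the Jupyter notebook, run: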
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('mnist-original')

How to run a specific sklearn version?

On my Mac, I installed multiple versions of sklearn as shown below:
Sklearn 0.19.1
~/anaconda2/pkgs/scikit-learn-0.19.1-py27h9788993_0/lib/python2.7/site-packages/sklearn
Sklearn 0.20.0
~/anaconda2/pkgs/scikit-learn-0.20.0-py27h4f467ca_1/lib/python2.7/site-packages/sklearn
When starting jupyter, it automatically runs the sklearn 0.20.0. I was wondering whether there is a way to run sklearn 0.19.1.
Thanks a lot,
Jeff
This should work. I'm not saying it is elegant, but it is what I would personally try first. sys.path is the list of places Python searches when importing modules, so you first remove any entries for the version you don't want and then insert the one you do want.
In a cell before you import from sklearn:
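import os
import sys
# Drop every sys.path entry that points at the 0.20.0 package
# (filtering avoids the index shifts caused by popping while iterating)
sys.path[:] = [p for p in sys.path if 'scikit-learn-0.20.0-py27h4f467ca_1' not in p]
# sys.path does not expand '~', and the entry must be the site-packages
# directory that contains sklearn, not the sklearn folder itself
sys.path.insert(0, os.path.expanduser('~/anaconda2/pkgs/scikit-learn-0.19.1-py27h9788993_0/lib/python2.7/site-packages'))
# now if you import from sklearn, it should come from 0.19.1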
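After changing sys.path it helps to confirm which copy would actually be picked up. The sketch below uses importlib.util.find_spec, which exists only on Python 3 (the question uses Python 2.7, where printing sklearn.__file__ after the import serves the same purpose). The helper name module_origin is made up, and json stands in for sklearn so the example runs anywhere.

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found.

    Hypothetical helper built on importlib.util.find_spec (Python 3 only).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# json is used here only as a stand-in; pass "sklearn" in practice
print(module_origin("json"))
```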
