Issue importing MNIST dataset from sklearn [duplicate] - scikit-learn

As written on p. 79 of Hands-On Machine Learning with Scikit-Learn and Tensorflow, I tried to import the MNIST data the way the book told me to. The code didn't work. I looked at other Stack Overflow questions and a GitHub issue, but none of those solutions worked, so I wanted to ask if anyone knows the solution. Thanks again for helping me solve this issue.
from sklearn.datasets import fetch_mldata
minst = fetch_mldata('MNIST Original')

The function fetch_mldata has been replaced in sklearn; you should use fetch_openml instead:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
However, I am not sure whether the book is up to date with recent versions of sklearn. You should probably either downgrade sklearn to the version used in the book or use a book that is up to date with the current version.
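For readers following the book with a recent scikit-learn, a minimal sketch of the replacement might look like the following. It assumes the OpenML copy named 'mnist_784' (version 1) and casts the labels to integers, since fetch_openml returns them as strings. The load_mnist helper and its fetcher parameter are hypothetical names introduced here so the function can be tried without a download; they are not part of sklearn.

```python
import numpy as np


def load_mnist(fetcher=None):
    """Return (X, y) for MNIST with integer labels.

    `fetcher` is a hypothetical hook for substituting a stand-in loader.
    By default the real sklearn.datasets.fetch_openml is used, which
    downloads the data on first call.
    """
    if fetcher is None:
        from sklearn.datasets import fetch_openml
        fetcher = fetch_openml
    # as_frame=False asks for NumPy arrays rather than a pandas DataFrame.
    mnist = fetcher("mnist_784", version=1, as_frame=False)
    X = mnist.data
    # OpenML stores the digit labels as strings ('0'..'9'); cast to integers.
    y = mnist.target.astype(np.uint8)
    return X, y
```

Calling X, y = load_mnist() should then give the 70,000 images as rows of 784 pixel values, with y holding the digit labels as integers.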

Related

python 3.8 error with cross validation Implementation

When I run the following code from a tutorial, I keep getting the following error at the end in almost every video I attempt.
Source: https://pythonprogramming.net/k-means-from-scratch-2-machine-learning-tutorial/?completed=/k-means-from-scratch-machine-learning-tutorial/
I get the following error:
from sklearn import preprocessing, cross_validation
ImportError: cannot import name 'cross_validation' from 'sklearn
I did pip installs, changing the way cross_validation is stated based on other suggestions but I still can't solve it.
I could not find cross_validation as a library in sklearn.
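The sklearn.cross_validation module was deprecated in version 0.18 and removed in 0.20; its contents now live in sklearn.model_selection. Import what you need from there, for example from sklearn.model_selection import cross_validate, as per the documentation. This has already been answered here.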
I also suggest going through the documentation of the functions you use to gain a better understanding of what you are doing.
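As a rough illustration, here is how an old cross_validation.train_test_split call might be ported. That the tutorial uses train_test_split is an assumption (the question does not show the call), and the toy arrays below are made up for the example.

```python
import numpy as np
# Old tutorials: from sklearn import cross_validation
# Current scikit-learn: the same helpers are imported from model_selection.
from sklearn.model_selection import train_test_split

# Toy data standing in for the tutorial's dataset: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Old call: cross_validation.train_test_split(X, y, test_size=0.2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

With test_size=0.2, two of the ten samples are held out for testing and eight remain for training.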

Python 3.x - AttributeError: 'function' object has no attribute 'Kfold'

I'm following along an older tutorial with an SVM optimized with a genetic algorithm. I originally thought the issue was just with versions of python and/or scikit, but I'm now unsure of what the issue could be as it continues displaying the same error. I'm currently using python-scikit-learn 0.20.3-1 on Antergos and found a link here that unfortunately didn't seem to help.
So far I've found a few links and examples that have had me alter different aspects of the code, which overall just jumbled everything up. This GitHub page was useful in at least understanding the version difference, as was the first link. This blog post was also neat, but again didn't really help me narrow down the exact issue as to why it's reading out the error. I even tried looking at the sklearn documentation but I still couldn't get it.
These are what I've imported:
import numpy as np
import pandas as pd
import random as rd
from sklearn.model_selection import cross_validate
from sklearn import preprocessing
from sklearn import svm
I had "kfold" defined earlier in the program as such:
kfold = 3
As well, this is the exact line it seems to be having issues with:
kf = cross_validate.KFold(Cnt1,n_splits=kfold)
What it should be doing is simply applying cross validation. However, the error reads:
AttributeError: 'function' object has no attribute 'KFold'
I can't tell if the issue is that I'm not understanding what I should be altering via the links I've given, or if it's a different error born of ignorance. Is there something I'm missing in order to get this to work?
KFold is a class in the sklearn.model_selection module, not an attribute of the sklearn.model_selection.cross_validate function.
So you should import
from sklearn import model_selection
and then use it like
model_selection.KFold(...)
or you can import the function
from sklearn.model_selection import KFold
just like in the KFold Doc example.
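Note that the constructor also changed when the class moved: the old sklearn.cross_validation.KFold took the number of samples as its first argument, while the current one takes only n_splits and receives the data through split(). A small sketch with a made-up array (the question's Cnt1 is not shown, so it is not used here):

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 6 samples, 2 features (illustrative only).
X = np.arange(12).reshape(6, 2)

kfold = 3
kf = KFold(n_splits=kfold)  # no sample count or data in the constructor

# split() yields one (train_indices, test_indices) pair per fold.
folds = [(train_idx, test_idx) for train_idx, test_idx in kf.split(X)]

for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Without shuffling, the folds are contiguous blocks, so the first test fold here is samples 0 and 1.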

Problem in importing statsmodels.api in jupyter

I am trying to import statsmodels.api to run a linear regression in a Jupyter notebook (Python 3 kernel), but I am getting the following error:
https://gist.github.com/ashusopku/e7e4da92babfdab5952f6836f2d4af69
Please help as I cannot find any solution to the problem elsewhere.

what's the difference between "import keras" and "import tensorflow.keras"

I was wondering what the difference is between importing Keras from TensorFlow with import tensorflow.keras and pip installing keras on its own and importing it with import keras. Both seem to work nicely so far; the only difference I noticed is that I get Using TensorFlow backend. in the command line every time I run the version that uses keras.
tensorflow.keras is a version of the Keras API implemented specifically for use with TensorFlow. It is part of the TensorFlow repo and, as of TF 2.0, is the main high-level API, replacing tf.layers and slim.
The only reason to use standalone keras is to maintain framework-agnostic code, i.e. to use it with another backend.

Tensorflow causes errors in scikit-learn

When I import scikit-learn before importing tensorflow I don't have any issues. Running this block of code produces an output of 1.7766212763101197e-12.
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
import tensorflow as tf
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
However, if I import tensorflow before importing scikit-learn my code no longer functions. When I run this code-block
import tensorflow as tf
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
I get an output of 130091393261440.25.
Why is that? My versions for the packages are:
numpy - 1.13.1
sklearn - 0.19.0
tensorflow - 1.3.0
Import order should not affect output, as Python modules are self-contained, except in the case of dependencies.
I was unable to reproduce your error, and get an output of 1.7951539777252834e-12 for both code blocks.
This is an interesting problem and I am curious to see if others can provide a better response for why you are seeing this issue.
Note: this answer addresses the title, for readers looking to use TensorFlow within Scikit-Learn, rather than only the import errors you've had.
You can use TensorFlow within Scikit-Learn pipelines using Neuraxle.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
Problem: You can’t parallelize or save pipelines that use steps which can’t be serialized “as-is” by Joblib (e.g. a TensorFlow step)
Here, a step is a transformer or estimator in a scikit-learn Pipeline.
This problem will only surface past some point of using Scikit-Learn. This is the point of no-return: you’ve coded your entire production pipeline, but once you trained it and selected the best model, you realize that what you’ve just coded can’t be serialized.
This means that, once trained, your pipeline can’t be saved to disk because one of its steps imports things from a weird Python library coded in another language and/or uses GPU resources. Your code smells weird and you start panicking over what was a full year of research and development.
Solution with Code Examples:
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it was used with Scikit-Learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline.
The trick is performed by using Neuraxle-TensorFlow.
This is to make use of Neuraxle's savers.
Read also: https://stackoverflow.com/a/60557192/2476920
