ImportError: cannot import name 'TfidVectorizer' in anaconda

I can't import multinomialNB and make_pipeline from sklearn.naive_bayes and sklearn.pipeline respectively (screenshot attached). I'm using Python 3. I most recently uninstalled and reinstalled Anaconda from "https://conda.io/docs/user-guide/install/index.html".
I have also installed and uninstalled it from other sources.
I tried installing the packages separately as well. sklearn, scipy, and the other packages install and upgrade without problems, but this piece of code keeps giving the same error.
I have tried every solution I could find on the internet and on Stack Overflow.
#importing necessary packages
from sklearn.feature_extraction.text import TfidVectorizer
from sklearn.naive_bayes import multinomialNB
from sklearn.pipeline import make_pipeline
#creating a model based on multinomial naive-bayes
model = make_pipeline(TfidVectorizer(), multinomialNB())
#training the model with train data
model.fit(train.data, train.target)
#creating labels for test data
labels = model.predict(test.data)

Your imports contain spelling mistakes: Python names are case-sensitive and must match scikit-learn's spelling exactly. Also, include the full error message the next time you ask about an error.
from sklearn.feature_extraction.text import TfidfVectorizer # notice the spelling with the f before Vectorizer
from sklearn.naive_bayes import MultinomialNB # notice the Caps on the M
from sklearn.pipeline import make_pipeline
EDIT: Also, please read up on how to create a minimal, reproducible example; it will make your life a lot easier when trying to get answers from SO in the future.
Welcome to SO!
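
When an ImportError says "cannot import name", a misspelled or mis-capitalized name is a common cause. As a rough sketch, the standard library's difflib can list the closest names a module actually exports. The helper name suggest_names is hypothetical, and the example runs against the standard-library collections module so that it works without scikit-learn installed.

```python
import difflib
import importlib


def suggest_names(module_name, wanted, n=3):
    """Return up to n public names in the module that closely resemble `wanted`.

    Hypothetical helper for illustration; not part of any library.
    """
    module = importlib.import_module(module_name)
    public = [name for name in dir(module) if not name.startswith("_")]
    return difflib.get_close_matches(wanted, public, n=n)


# 'orderedDict' does not exist in collections (wrong capitalization),
# so look up what the module really calls it.
matches = suggest_names("collections", "orderedDict")
print(matches)
```

With scikit-learn installed, suggest_names("sklearn.feature_extraction.text", "TfidVectorizer") should surface TfidfVectorizer in the same way.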

Related

Why should I import keras again even after importing TensorFlow 2?

In TensorFlow version 2, when I want to use tf.keras, all the example code imports keras again even though tensorflow has already been imported, as in the example below.
import tensorflow as tf
from tensorflow import keras
tf.kera...
Can't I just skip the second line, from tensorflow import keras? Why should I import keras separately if I only use it in the form tf.keras?

Python 3.x - AttributeError: 'function' object has no attribute 'Kfold'

I'm following along an older tutorial with an SVM optimized with a genetic algorithm. I originally thought the issue was just with versions of python and/or scikit, but I'm now unsure of what the issue could be as it continues displaying the same error. I'm currently using python-scikit-learn 0.20.3-1 on Antergos and found a link here that unfortunately didn't seem to help.
So far I've found a few links and examples that have had me alter different aspects of the code, which overall just jumbled everything up. This GitHub page was useful in at least understanding the version difference, as was the first link. This blog post was also neat, but again didn't really help me narrow down the exact issue as to why it's reading out the error. I even tried looking at the sklearn documentation but I still couldn't get it.
These are what I've imported:
import numpy as np
import pandas as pd
import random as rd
from sklearn.model_selection import cross_validate
from sklearn import preprocessing
from sklearn import svm
I had "kfold" defined earlier in the program as such:
kfold = 3
As well, this is the exact line it seems to be having issues with:
kf = cross_validate.KFold(Cnt1,n_splits=kfold)
What it should be doing is simply applying cross validation. However, the error reads:
AttributeError: 'function' object has no attribute 'KFold'
I can't tell if the issue is that I'm not understanding what I should be altering via the links I've given, or if it's a different error born of ignorance. Is there something I'm missing in order to get this to work?

KFold is a class in the sklearn.model_selection module, not an attribute of sklearn.model_selection.cross_validate. cross_validate is itself a function, which is why the error mentions a 'function' object.
So you should import
from sklearn import model_selection
and then use it like
model_selection.KFold(...)
or you can import the class directly
from sklearn.model_selection import KFold
just like in the example in the KFold documentation.
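
A minimal sketch of the current API, assuming a recent scikit-learn and NumPy are installed. The array X is toy data standing in for the asker's Cnt1. Note that in sklearn.model_selection the data is not passed to the KFold constructor; it goes to split(), which yields index arrays for each fold.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data standing in for Cnt1: 6 samples, 2 features each.
X = np.arange(12).reshape(6, 2)

kfold = 3
kf = KFold(n_splits=kfold)  # only the fold settings go here, not the data

# split() yields (train_indices, test_indices) pairs, one per fold.
folds = list(kf.split(X))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

The index arrays can then be used to slice the data, for example X[train_idx] and X[test_idx].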

Tensorflow causes errors in scikit-learn

When I import scikit-learn before importing tensorflow I don't have any issues. Running this block of code produces an output of 1.7766212763101197e-12.
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
import tensorflow as tf
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
However, if I import tensorflow before importing scikit-learn my code no longer functions. When I run this code-block
import tensorflow as tf
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
I get an output of 130091393261440.25.
Why is that? My versions for the packages are:
numpy - 1.13.1
sklearn - 0.19.0
tensorflow - 1.3.0

Import order should not normally affect the output, since Python modules are self-contained except where they share dependencies.
I was unable to reproduce your error: I get an output of 1.7951539777252834e-12 for both code blocks.
This is an interesting problem, and I am curious to see whether others can better explain why you are seeing this issue.
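
One way to narrow this down, offered as a sketch rather than a diagnosis: repeat the reconstruction with plain NumPy, computing the factors from an SVD instead of from PCA. If this check also gives a huge error after importing tensorflow first, the problem would seem to lie below scikit-learn, in the linear algebra routines NumPy itself calls; that interpretation is an assumption. The variable names mirror the question.

```python
import numpy as np

rng = np.random.RandomState(123)
X = rng.randn(100, 15)
X = X - X.mean(axis=0)  # center the columns, as in the question

# Thin SVD: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)
w = U * S  # scores, analogous (up to sign) to PCA().fit_transform(X)
h = Vt     # components, analogous (up to sign) to PCA().components_

# Sum of absolute reconstruction errors; should be floating-point residue.
error = np.sum(np.abs(X - np.dot(w, h)))
print(error)
```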

Note: this answer addresses the title, for readers who want to use TensorFlow within Scikit-Learn; it does not deal with the import errors you've had.
You can use TensorFlow within Scikit-Learn pipelines using Neuraxle.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
Problem: You can’t Parallelize nor Save Pipelines Using Steps that Can’t be Serialized “as-is” by Joblib (e.g.: a TensorFlow step)
Here, a step is a transformer or estimator in a scikit-learn Pipeline.
This problem will only surface past some point of using Scikit-Learn. This is the point of no return: you've coded your entire production pipeline, but once you have trained it and selected the best model, you realize that what you've just coded can't be serialized.
This means that, once trained, your pipeline can't be saved to disk because one of its steps imports things from a weird Python library coded in another language and/or uses GPU resources. Your code smells weird and you start panicking over what was a full year of research and development.
Solution with Code Examples:
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it was used with Scikit-Learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline.
The trick is performed by using Neuraxle-TensorFlow.
This is to make use of Neuraxle's savers.
Read also: https://stackoverflow.com/a/60557192/2476920

sklearn.cross_validation is deprecated warning triggered by sklearn.model_selection

When I first changed my code to "model_selection", the warnings quit. Over the weekend I updated Anaconda and now any import of sklearn triggers the "cross_validation" warning.
I've found several examples of this error on the net, none addressing this specific issue. If it has been addressed, it's due to my old brain not being able to properly form the question. Apologies in advance. Clarity greatly appreciated.
#!/usr/bin/env python
# tpot pipeline
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
import numpy as np
from numpy import loadtxt

Cannot resolve import "MultiLabelBinarizer"

I am new to the "scikit-Learn" API and wish to implement a multilabel classification problem. After importing the following packages:
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report
I get an error which says 'Unresolved import: MultiLabelBinarizer'. The other imports seem to work fine. I wonder why 'MultiLabelBinarizer' cannot be imported, given that the 'sklearn' package was properly installed. Any help will be appreciated.

I found out the reason, in case someone comes across the same problem. The error was due to the fact that I was running the above code on 'sklearn' version 0.14 (installed by default on Ubuntu 14.04 LTS) instead of 0.16. I believe the MultiLabelBinarizer class is only available from 'sklearn' version 0.16 onward (I have not tried 0.15, if it exists).
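
A quick way to check which version is actually installed, sketched with the standard library only (Python 3.8+). The helper names installed_version and version_tuple are hypothetical, and the (0, 16) threshold is simply the version named in the answer above, not independently verified.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Leading numeric components only, e.g. '0.14.1' -> (0, 14, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as '0rc1'
        parts.append(int(piece))
    return tuple(parts)


# Threshold taken from the answer above; treat it as an assumption.
MINIMUM = (0, 16)

found = installed_version("scikit-learn")
if found is None:
    print("scikit-learn is not installed in this environment")
elif version_tuple(found) < MINIMUM:
    print("scikit-learn", found, "is older than the version named in the answer")
else:
    print("scikit-learn", found, "meets the version named in the answer")
```

Running this with the same interpreter that raises the import error helps confirm that the script is not picking up an older system-wide install.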
