Not able to write the Count Vectorizer vocabulary - python-3.x

I want to save and load the CountVectorizer vocabulary. This is my code:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
Cv_vec = cv.fit(X['review'])
X_cv=Cv_vec.transform(X['review']).toarray()
dictionary_filepath='CV_dict'
pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'w'))
It shows me
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-407-3a9b06f969a9> in <module>()
1 dictionary_filepath='CV_dict'
----> 2 pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'w'))
TypeError: write() argument must be str, not bytes
I want to save the vocabulary of the count vectorizer and load it again. Can anyone help me with this, please?

Open the file in binary mode ('wb') when pickling an object, because pickle writes bytes rather than str. It is also better to use a context manager so the file is closed automatically, i.e.
import pickle
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=1500)
Cv_vec = cv.fit(X['review'])
X_cv = Cv_vec.transform(X['review']).toarray()
dictionary_filepath = 'CV_dict.pkl'
with open(dictionary_filepath, 'wb') as fout:
    pickle.dump(Cv_vec.vocabulary_, fout)
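The question also asks how to load the vocabulary back. A minimal sketch of the full round trip follows, using only the standard library. The small dictionary is a hypothetical stand-in for Cv_vec.vocabulary_ (which maps each term to a column index), and the temporary directory is only there to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for Cv_vec.vocabulary_ (term -> column index).
vocabulary = {"good": 0, "movie": 1, "bad": 2}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "CV_dict.pkl")

    # Write in binary mode: pickle produces bytes, not str.
    with open(path, "wb") as fout:
        pickle.dump(vocabulary, fout)

    # Read back in binary mode as well.
    with open(path, "rb") as fin:
        loaded_vocabulary = pickle.load(fin)

# The loaded mapping could then be reused with
# CountVectorizer(vocabulary=loaded_vocabulary).
```

Passing the loaded dictionary through the vocabulary parameter of CountVectorizer should keep the column order identical to the one used at training time.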

Related

Trying to print image count

I am new to Python and I am trying to build a CNN for a project. I mounted Google Drive and I am trying to load images from a Drive directory. After that, I want to count the images in that directory. Here is my code:
import pathlib
dataset_dir = "content/drive/My Drive/Species_Samples"
data_dir = tf.keras.utils.get_file('Species_Samples', origin=dataset_dir, untar=True)
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir('*/*.png')))
print(image_count)
However, I get the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-78-e5d9409807d9> in <module>()
----> 1 image_count = len(list(data_dir('*/*.png')))
2 print(image_count)
TypeError: 'PosixPath' object is not callable
Can you help, please?
After the suggestion, my code looks like this:
import pathlib
data_dir = pathlib.Path("content/drive/My Drive/Species_Samples/")
count = len(list(data_dir.rglob("*.png")))
print(count)
You are trying to glob files, but a Path object cannot be called like a function. Use one of the glob methods that pathlib provides instead:
import pathlib
data_dir = pathlib.Path("/path/to/dir/")
count = len(list(data_dir.rglob("*.png")))
In this case .rglob is a recursive glob.
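To make the difference between the two glob methods concrete, here is a small sketch that builds a throwaway directory tree and counts the matches. The directory and file names are made up for illustration.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Hypothetical layout: one image at the top, one a level down, one two levels down.
    (root / "cats").mkdir()
    (root / "cats" / "extra").mkdir()
    (root / "top.png").touch()
    (root / "cats" / "a.png").touch()
    (root / "cats" / "extra" / "b.png").touch()

    # glob("*/*.png") matches only files exactly one directory below root.
    one_level = len(list(root.glob("*/*.png")))

    # rglob("*.png") walks every subdirectory, including root itself.
    recursive = len(list(root.rglob("*.png")))

print(one_level, recursive)
```

If the images always sit exactly one folder below the dataset directory, data_dir.glob('*/*.png') is the closest match to the pattern in the question; rglob is the safer choice when the nesting depth varies.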

Linear Regression Prediction on Python3

I am trying to use LinearRegression on a data set using Python 3. I want to see the influence of Order Size on the metric OTIF (On Time In Full). The metric is the percentage of deliveries that arrive on time and in full. I get an error when I try to use LinearRegression.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# path of data
path = 'c:/Data/ame.csv'
df = pd.read_csv(path)
df.head()
from sklearn.linear_model import LinearRegression
lm = LinearRegression
lm
X = df[['Order Units']]
Y = df['OTIF%']
lm.fit(X,Y)
Yhat=lm.predict(X)
Yhat[0:5]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-39-b4b21bd8b84e> in <module>
----> 1 Yhat=lm.predict(X)
2 Yhat[0:5]
TypeError: predict() missing 1 required positional argument: 'X'
I think the issue is that you are not creating a LinearRegression object. Writing lm = LinearRegression only gives the class another name; you must call the constructor to get an instance of the class. Try this:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
X = df[['Order Units']]
Y = df['OTIF%']
lm.fit(X,Y)
Yhat=lm.predict(X)
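The error message makes more sense once you see what happens when a method is called on the class itself. The sketch below uses a hypothetical TinyModel class as a stand-in for an estimator, so it runs without scikit-learn; it is an illustration of the mechanism, not of how LinearRegression is implemented.

```python
class TinyModel:
    """Hypothetical stand-in for an estimator class."""

    def fit(self, X, y):
        # "Training" here just stores the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the stored mean for every row.
        return [self.mean_ for _ in X]


X = [[1], [2], [3]]
y = [2.0, 4.0, 6.0]

# Calling the method on the class: X is bound to `self`,
# so the real X parameter is reported as missing.
try:
    TinyModel.predict(X)
    class_call_error = None
except TypeError as exc:
    class_call_error = str(exc)

# Calling the method on an instance works as intended.
model = TinyModel()
predictions = model.fit(X, y).predict(X)

print(class_call_error)
print(predictions)
```

The parentheses in TinyModel() are what create the instance, exactly as in lm = LinearRegression().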

Tensorflow TypeError: 'numpy.ndarray' object is not callable

While trying to predict with the model, I get this numpy.ndarray error. It might be caused by the return statement of the prepare function. What can be done to get rid of this error?
import cv2
import tensorflow as tf
CATEGORIES = ["Dog", "Cat"]
def prepare(filepath):
    IMG_SIZE = 50 # 50 in txt-based
    img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
    return new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 1)
model = tf.keras.models.load_model("64x3-CNN.model")
prediction = model.predict([prepare('dog.jpg')])
print(prediction) # will be a list in a list.
I tried giving the full path, but the same error persists.
TypeError Traceback (most recent call last)
<ipython-input-45-f9de27e9ff1e> in <module>
15
16 prediction = model.predict([prepare('dog.jpg')])
---> 17 print(prediction) # will be a list in a list.
18 print(CATEGORIES[int(prediction[0][0])])
TypeError: 'numpy.ndarray' object is not callable
Not sure what the rest of your code looks like. But if you use 'print' as a variable in Python 3 you can get this error:
import numpy as np
x = np.zeros((2,2))
print = np.ones((2,2))
print(x)
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'numpy.ndarray' object is not callable
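If this is the cause, the built-in function has to be made visible again. A minimal sketch of the mechanism follows; it uses a plain list instead of a NumPy array so that it runs with the standard library alone, which means the message names 'list' rather than 'numpy.ndarray'.

```python
import builtins

# Rebinding the name hides the built-in print function.
print = [1.0, 2.0]

try:
    print("hello")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Point the name back at the built-in. In a notebook, running `del print`
# in a cell or restarting the kernel has the same effect.
print = builtins.print
restored = print is builtins.print

print(error_message)
```

In a notebook, the assignment to print may have happened in an earlier cell that has since been edited or deleted, so the name can stay shadowed even though the offending line is no longer visible.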
Another suggestion was to wrap the printed values in str(). This only changes the argument passed to print, so it cannot help when the name print itself refers to an array. The proposed change was from:
17 print(prediction) # will be a list in a list.
18 print(CATEGORIES[int(prediction[0][0])])
to:
17 print(str(prediction)) # will be a list in a list.
18 print(str(CATEGORIES[int(prediction[0][0])]))

'XGBModel' object has no attribute 'evals_result_'

I am trying to use xgboost on a dataset. I have seen the same syntax in various blogs, but I get an error when calling clf.evals_result().
Here is my code:
from xgboost import XGBRegressor as xgb
from sklearn.metrics import mean_absolute_error as mae
evals_result ={}
eval_s = [(x, y),(xval,yval)]
clf = xgb(n_estimators=100,learning_rate=0.03,tree_method='gpu_hist',lamda=0.1,eval_metric='mae',eval_set=eval_s,early_stopping_rounds=0,evals_result=evals_result)
clf.fit(x,y)
r = clf.evals_result()
Here is the error I am receiving:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-138-2d6867968043> in <module>
1
----> 2 r = clf.evals_result()
3
4 p = clf.predict(xval)
/opt/conda/lib/python3.6/site-packages/xgboost/sklearn.py in evals_result(self)
399 'validation_1': {'logloss': ['0.41965', '0.17686']}}
400 """
--> 401 if self.evals_result_:
402 evals_result = self.evals_result_
403 else:
AttributeError: 'XGBRegressor' object has no attribute 'evals_result_'
I got exactly the same error. The solution is to pass eval_set to the fit function rather than to the constructor of the model:
clf.fit(x,y,eval_set=eval_s)
Then you can run clf.evals_result()

Unable to call the fit function on randomforest regressor python sklearn

I'm unable to call the fit function on the RandomForestRegressor and even the intellisense is only showing the predict and some other parameters. Below is my code, traceback call and an image showing the content of the intellisense.
import pandas
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor
def predict():
    Fvector = 'C:/Users/Oussema/Desktop/Cred_Data/VEctors/FinalFeatureVector.csv'
    data = np.genfromtxt(Fvector, dtype=float, delimiter=',', names=True)
    AnnotArr = np.array(data['CredAnnot'])  # 1D array containing the ground truth (50000 rows)
    TempTestArr = np.array([data['GrammarV'],data['TweetSentSc'],data['URLState']])  # feature vectors, shape (3, 50000), values in the range [0, 1]
    FeatureVector = TempTestArr.transpose()  # transposed to get the shape (50000, 3)
    RF_model = RandomForestRegressor(n_estimators=20, max_features = 'auto', n_jobs = -1)
    RF_model.fit(FeatureVector,AnnotArr)
    print(RF_model.oob_score_)
predict()
IntelliSense content:
[1]: https://i.stack.imgur.com/XweOo.png
Traceback call
Traceback (most recent call last):
File "C:\Users\Oussema\source\repos\Regression_Models\Regression_Models\Random_forest_TCA.py", line 15, in <module>
predict()
File "C:\Users\Oussema\source\repos\Regression_Models\Regression_Models\Random_forest_TCA.py", line 14, in predict
print(RF_model.oob_score_)
AttributeError: 'RandomForestRegressor' object has no attribute 'oob_score_'
You need to set the oob_score param to True when initializing the RandomForestRegressor.
As per the documentation:
oob_score : bool, optional (default=False)
whether to use out-of-bag samples to estimate the R^2 on unseen data.
So the attribute oob_score_ is only available if you do this:
def predict():
    ....
    ....
    RF_model = RandomForestRegressor(n_estimators=20,
                                     max_features = 'auto',
                                     n_jobs = -1,
                                     oob_score=True) #<= This is what you want
    ....
    ....
    print(RF_model.oob_score_)

Resources