LabelEncoder instance is not fitted yet - python-3.x

I have code for predicting on unseen data in a sentence classification task.
The code is:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence
from sklearn.preprocessing import LabelEncoder

maxlen = 1152

### PREDICT NEW UNSEEN DATA ###
tokenizer = Tokenizer()
label_enc = LabelEncoder()

X_test = ['this is boring', 'wow i like this you did a great job']
X_test = tokenizer.texts_to_sequences(X_test)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

# model is the trained Sequential model from the training step
a = (model.predict(X_test) > 0.5).astype(int).ravel()
print(a)

reverse_pred = label_enc.inverse_transform(a.ravel())
print(reverse_pred)
But I am getting this error:
[1 1]
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
<ipython-input-33-7e12dbe8aec1> in <module>()
39 print(a)
40
---> 41 reverse_pred = label_enc.inverse_transform(a.ravel())
42 print(reverse_pred)
1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
965
966 if not attrs:
--> 967 raise NotFittedError(msg % {'name': type(estimator).__name__})
968
969
NotFittedError: This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
I have used a Sequential model, and model.fit is called as history = model.fit(...) in the training part. Why am I getting this error?

Following the sklearn documentation and what is reported here, you simply have to fit your encoder before calling an inverse transform:
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit the encoder on the known labels before any inverse transform
y = ['positive', 'negative', 'positive', 'negative', 'positive', 'negative']
label_enc = LabelEncoder()
label_enc.fit(y)

# Simulate binary model output and map it back to the original string labels
model_predictions = np.random.uniform(0, 1, 3)
model_predictions = (model_predictions > 0.5).astype(int).ravel()
model_predictions = label_enc.inverse_transform(model_predictions)
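In practice, the encoder (and the tokenizer) used at prediction time should be the exact instances that were fitted during training, not freshly created ones. A minimal sketch of one way to do that, assuming the fitted encoder is persisted with joblib (the file name and the y_train_labels variable are placeholders):

import joblib
from sklearn.preprocessing import LabelEncoder

# At training time: fit the encoder on the training labels, then persist it
label_enc = LabelEncoder()
label_enc.fit(y_train_labels)                 # y_train_labels: your training label column
joblib.dump(label_enc, 'label_enc.joblib')

# At prediction time: reload the already-fitted encoder and invert predictions
label_enc = joblib.load('label_enc.joblib')
reverse_pred = label_enc.inverse_transform(a.ravel())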

Related

How can I solve this SVM predict model problem?

I'm having a problem with my SVM predict model:
from sklearn.svm import SVC
svm_model = SVC(kernel='rbf', C=8, gamma=0.1)
svm_model.fit(X_train_std, y_train)
y_pred = svm_model.predict(X_test_std)
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:993: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-398f1caaa8e8> in <module>
3 svm_model = SVC(kernel='rbf', C=8, gamma=0.1)
4
----> 5 svm_model.fit(X_train_std, y_train)
6
7 y_pred = svm_model.predict(X_test_std)
2 frames
/usr/local/lib/python3.8/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
195 "multilabel-sequences",
196 ]:
--> 197 raise ValueError("Unknown label type: %r" % y_type)
198
199
ValueError: Unknown label type: 'continuous'
I thought it was a problem with the type of y, so I tried:
train = pd.get_dummies(train, columns=['LSTAT'], drop_first=True)
but the problem did not go away.
Can somebody help me?
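The ValueError: Unknown label type: 'continuous' means y_train holds continuous (float) values, while SVC is a classifier and expects discrete class labels. A minimal sketch of the two usual ways out, assuming X_train_std and y_train as in the question:

import numpy as np
from sklearn.svm import SVC, SVR

y_flat = np.ravel(y_train)  # 1-D target; also silences the column-vector warning

# Option 1: the target is continuous, so this is regression -> use SVR
svr_model = SVR(kernel='rbf', C=8, gamma=0.1)
svr_model.fit(X_train_std, y_flat)

# Option 2: classification is really wanted -> discretize the target first,
# e.g. above/below the median value (a hypothetical binning choice)
y_cls = (y_flat > np.median(y_flat)).astype(int)
svm_model = SVC(kernel='rbf', C=8, gamma=0.1)
svm_model.fit(X_train_std, y_cls)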

ValueError: continuous is not supported for RandomForestRegressor

After preprocessing the raw data with a Pipeline, I was able to get clean data in the output. Next, I need to pass the cleaned data through the model for training. Both the data preprocessing and model training steps can be encapsulated in a Pipeline as follows:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score

completed_pl = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("classifier", RandomForestRegressor()),
    ]
)

# training
completed_pl.fit(X_train, y_train)

# accuracy
y_train_pred = completed_pl.predict(X_train)
print(f"Accuracy on train: {accuracy_score(list(y_train), list(y_train_pred)):.2f}")

y_pred = completed_pl.predict(X_test)
print(f"Accuracy on test: {accuracy_score(list(y_test), list(y_pred)):.2f}")
I have used the load_boston dataset from sklearn. And the error is:
ValueError Traceback (most recent call last)
<ipython-input-86-d0b1928cf1a7> in <module>
12 # accuracy
13 y_train_pred = completed_pl.predict(X_train)
---> 14 print(f"Accuracy on train: {accuracy_score(list(y_train), list(y_train_pred)):.2f}")
15
16 y_pred = completed_pl.predict(X_test)
1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
102 # No metrics support "multiclass-multioutput" format
103 if y_type not in ["binary", "multiclass", "multilabel-indicator"]:
--> 104 raise ValueError("{0} is not supported".format(y_type))
105
106 if y_type in ["binary", "multiclass"]:
ValueError: continuous is not supported
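accuracy_score is a classification metric, and the load_boston target is a continuous house price, so the metric's target check rejects it. A minimal sketch of the usual fix, swapping in regression metrics (reusing completed_pl and the data splits from the question):

from sklearn.metrics import r2_score, mean_squared_error

y_train_pred = completed_pl.predict(X_train)
print(f"R^2 on train: {r2_score(y_train, y_train_pred):.2f}")

y_pred = completed_pl.predict(X_test)
print(f"R^2 on test: {r2_score(y_test, y_pred):.2f}")
print(f"MSE on test: {mean_squared_error(y_test, y_pred):.2f}")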

Huggingface trainer.train() throws "IndexError: Target -1 is out of bounds" during Slovak sentence sentiment analysis using SlovakBert

My goal is to train a classifier able to do sentiment analysis in the Slovak language, using the SlovakBert model and the HuggingFace library.
The code is executed on Google Colaboratory.
My dataset is read from one csv file for testing:
https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_games.csv
and one csv file for training:
https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_accomodation.csv
The data has two columns: a column of sentences and a second column of labels indicating the sentiment of each sentence. Labels have the values -1, 0, or 1.
After executing trainer.train(), there is an error:
Num examples = 89
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 36
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-11-44bfec8c5f70> in <module>()
40 )
41 #Then fine-tune your model by calling train():
---> 42 trainer.train()
43
44 trainer.evaluate()
7 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
2994 if size_average is not None or reduce is not None:
2995 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2996 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
2997
2998
IndexError: Target -1 is out of bounds.
Code:
!pip install transformers==4.10.0 -qqq
!pip install datasets -qqq

import numpy as np
import pandas as pd
from datasets import load_metric, load_dataset, Dataset
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification, AutoTokenizer, DataCollatorWithPadding

# Links to dataset
test = 'https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_games.csv'
train = 'https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_accomodation.csv'
model_name = 'gerulata/slovakbert'

# Load data
dataset = load_dataset('csv', data_files={'train': train, 'test': test}, on_bad_lines='skip', column_names=["text", "label"], delimiter=",")

# Prepare dataset
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples['text'])

tokenized_datasets = dataset.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length")

# Load model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Metrics
metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Specify where checkpoints are saved and evaluate at the end of each epoch
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

# Create a Trainer object with the model, training arguments, train/test datasets, and evaluation function
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)

# Then fine-tune the model by calling train():
trainer.train()
trainer.evaluate()
What is the reason for this error, and how can it be solved?
Edit: I tried to change the line:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
to:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, id2label={"LABEL_0": -1, "LABEL_1": 0, "LABEL_2": 1})
but the same error is thrown.
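The IndexError comes from PyTorch's cross_entropy, which expects target class indices in the range 0..num_labels-1, so a label of -1 is out of bounds; id2label only renames the model's outputs and does not remap the targets. A minimal sketch of the usual fix, shifting the labels before tokenization (assuming the column names used above):

# Shift labels from {-1, 0, 1} to {0, 1, 2} so they are valid class indices
def remap_labels(examples):
    examples['label'] = [label + 1 for label in examples['label']]
    return examples

dataset = dataset.map(remap_labels, batched=True)
# ...then tokenize and train as before; optionally set
# id2label={0: 'negative', 1: 'neutral', 2: 'positive'} on the model config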

Using GridSearchCV for NLP: missing positional argument 'self'

I am working on an NLP problem. I've been testing various models, and the process has been working fine.
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier().fit(X_train_tfidf, y_train)
y_predicted_tfidf = classifier.predict(X_test_tfidf)
from sklearn.metrics import precision_score
precision = precision_score(y_test, y_predicted_tfidf, pos_label=None,average='weighted')
print(precision)
>>> 0.79708294305
Now I am trying to employ grid search to fine-tune parameters, and I am running into an error.
from sklearn.model_selection import GridSearchCV
parameters = {'alpha': [0.00001, 0.0001, 0.001, 0.001, 0.01] }
gs_classifier = GridSearchCV(SGDClassifier, parameters, n_jobs=-1)
gs_classifier = gs_classifier.fit(X_train_tfidf, y_train)
Which results in the following output:
TypeError Traceback (most recent call last)
<ipython-input-25-95b85f78662f> in <module>()
1 gs_classifier = GridSearchCV(SGDClassifier, parameters, n_jobs=-1)
----> 2 gs_classifier = gs_classifier.fit(X_train_tfidf, y_train)
anaconda/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups)
943 train/test set.
944 """
--> 945 return self._fit(X, y, groups,
...
/anaconda/lib/python3.6/site-packages/sklearn/base.py in clone(estimator, safe)
65 % (repr(estimator), type(estimator)))
66 klass = estimator.__class__
---> 67 new_object_params = estimator.get_params(deep=False)
68 for name, param in six.iteritems(new_object_params):
69 new_object_params[name] = clone(param, safe=False)
TypeError: get_params() missing 1 required positional argument: 'self'
I've tried various combinations of parameters and all result in the same error. For this example I've kept it simple and am just using a range of alpha values.
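The TypeError is the classic symptom of passing the estimator class rather than an instance: GridSearchCV calls get_params() on whatever it is given, and on the bare class that call has no self bound. A minimal sketch of the fix, adding the parentheses to instantiate SGDClassifier:

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'alpha': [0.00001, 0.0001, 0.001, 0.01]}

# Pass an estimator *instance*, not the class itself
gs_classifier = GridSearchCV(SGDClassifier(), parameters, n_jobs=-1)
gs_classifier = gs_classifier.fit(X_train_tfidf, y_train)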

Error when fitting linear binary classifier with TensorFlow: ValueError: No gradients provided for any variable, check your graph

I get an error when trying to fit a linear binary classifier using a step function and MSE, instead of softmax and cross-entropy loss. I can't get past it, probably due to shape inconsistencies. I provide a code sample below. Please help.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification as gen_data
from sklearn.model_selection import train_test_split

rng = np.random

# Setting hyperparameters
n_observations = 100
lr = 0.005
n_iter = 100

# Generate input data
xs, ys = gen_data(n_features=2, n_redundant=0, n_informative=2,
                  random_state=0, n_clusters_per_class=1)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(xs, ys, test_size=.4)
X_train = np.float32(X_train)
X_test = np.float32(X_test)

# Graph
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
W = tf.Variable(np.float32(rng.randn(2)), name="weight")
b = tf.Variable(np.float32(rng.randn()), name="bias")

def step(x):
    is_greater = tf.greater(x, 0)
    as_float = tf.to_float(is_greater)
    doubled = tf.multiply(as_float, 2)
    return tf.subtract(doubled, 1)

Y_pred = step(tf.add(tf.multiply(X, W), b))
cost = tf.reduce_mean(tf.squared_difference(Y_pred, Y))

# Using built-in optimization algorithm to train the model:
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cost)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
for step in range(n_iter):
    sess.run(train_step, feed_dict={X: X_train, Y: y_train})
    print("iter: {0}; weight: {1}; bias: {2}".format(step,
                                                     sess.run(W),
                                                     sess.run(b)))
This is the error:
ValueErrorTraceback (most recent call last)
<ipython-input-17-5a0c4711802c> in <module>()
26
27 # Using built-in optimization algorithm to train the model:
---> 28 train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cost)
29
30 # Using TF differentiation from scratch to implement a step-by-step optimizer
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.pyc in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
405 "No gradients provided for any variable, check your graph for ops"
406 " that do not support gradients, between variables %s and loss %s." %
--> 407 ([str(v) for _, v in grads_and_vars], loss))
408
409 return self.apply_gradients(grads_and_vars, global_step=global_step,
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'weight:0' shape=(2,) dtype=float64_ref>", "<tf.Variable 'bias:0' shape=() dtype=float32_ref>", "<tf.Variable 'weight_1:0' shape=(2,) dtype=float64_ref>", "<tf.Variable 'bias_1:0' shape=() dtype=float32_ref>",
The error is raised before any training step runs: the traceback points at minimize(cost), i.e. graph-construction time. The step function is the culprit: tf.greater and tf.to_float have no usable gradient, so there is no differentiable path from the loss back to W and b, and the optimizer finds no gradients to apply. (The duplicated weight_1/bias_1 variables in the message also suggest the graph cell was executed twice; tf.reset_default_graph() or a kernel restart clears that.) The usual fix is to train with a differentiable surrogate such as a sigmoid and only apply a hard threshold at prediction time.
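A minimal sketch of that change in the question's TF 1.x style, assuming the 0/1 labels that make_classification produces (the dot product replacing the element-wise multiply also fixes the shape mismatch):

# Differentiable surrogate: sigmoid during training instead of the hard step
logits = tf.reduce_sum(tf.multiply(X, W), axis=1) + b    # shape (n,)
Y_prob = tf.sigmoid(logits)                              # smooth, has gradients
cost = tf.reduce_mean(tf.squared_difference(Y_prob, Y))  # Y expected in {0, 1}
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cost)

# Hard 0/1 decision only at prediction time (no gradient needed here)
Y_pred = tf.cast(tf.greater(Y_prob, 0.5), tf.float32)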
