Problem with forward step in pytorch model - pytorch

Traceback saying that:
* Epoch 1/20
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-4f0f868c6227> in <module>()
1 max_epochs = 20
2 optim = torch.optim.Adam(model.parameters(), lr=1e-3)
----> 3 train(model, optim, bce_loss, max_epochs, data_tr, data_val)
2 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
TypeError: forward() takes 2 positional arguments but 3 were given
But in train section of my net I did not pass more than two arguments(including self argument)
this is link to my code
Maybe problem not in train

Your model has sub-modules for which you (implicitly) call forward with more than one argument:
x = self.dec_conv3(self.unpool3(x, indices3))
Your unpool are simply MaxPool layers - they do not expect two input arguments.

Related

NotImplementedError for predict keras for vae

So I've been using this example for convolutional vae here for mnist:
https://keras.io/examples/generative/vae/
vae.predict(mnist_digits)
NotImplementedError Traceback (most recent call last)
<ipython-input-8-2e6bf7edcacc> in <module>()
----> 1 vae.predict(mnist_digits)
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
NotImplementedError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1801, in predict_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1790, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1783, in run_step **
outputs = model.predict_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1751, in predict_step
return self(x, training=False)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 475, in call
raise NotImplementedError('Unimplemented `tf.keras.Model.call()`: if you '
NotImplementedError: Exception encountered when calling layer "vae" (type VAE).
Unimplemented `tf.keras.Model.call()`: if you intend to create a `Model` with the Functional API, please provide `inputs` and `outputs` arguments. Otherwise, subclass `Model` with an overridden `call()` method.
Call arguments received:
• inputs=tf.Tensor(shape=(None, 28, 28, 1), dtype=float32)
• training=False
• mask=None
And similarly, with
vae(mnist_digits)
I get the following:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-8-aa5f4fb52e20> in <module>()
----> 1 vae(mnist_digits)
1 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py in call(self, inputs, training, mask)
473 a list of tensors if there are more than one outputs.
474 """
--> 475 raise NotImplementedError('Unimplemented `tf.keras.Model.call()`: if you '
476 'intend to create a `Model` with the Functional '
477 'API, please provide `inputs` and `outputs` '
NotImplementedError: Exception encountered when calling layer "vae" (type VAE).
Unimplemented `tf.keras.Model.call()`: if you intend to create a `Model` with the Functional API, please provide `inputs` and `outputs` arguments. Otherwise, subclass `Model` with an overridden `call()` method.
Call arguments received: • inputs=tf.Tensor(shape=(70000, 28, 28, 1), dtype=float32) • training=None • mask=None
Do I have to create a custom predict function to resolve this.
The predict method is not inside your VAE class but inside the encoder or decoder as they are your Keras Models.
If you want to see how different digit classes are clustered:
z, _, _ = vae.encoder.predict(data)
If you want to sample digits from your latent space:
decoded = vae.decoder.predict(z)

Can not find the pytorch model when loading BERT model in Python

I am following this article to find the text similarity.
The code I have is this:
from sentence_transformers import SentenceTransformer
from tqdm import tqdm
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pandas as pd
documents = [
"Vodafone Wins ₹ 20,000 Crore Tax Arbitration Case Against Government",
"Voda Idea shares jump nearly 15% as Vodafone wins retro tax case in Hague",
"Gold prices today fall for 4th time in 5 days, down ₹6500 from last month high",
"Silver futures slip 0.36% to Rs 59,415 per kg, down over 12% this week",
"Amazon unveils drone that films inside your home. What could go wrong?",
"IPHONE 12 MINI PERFORMANCE MAY DISAPPOINT DUE TO THE APPLE B14 CHIP",
"Delhi Capitals vs Chennai Super Kings: Prithvi Shaw shines as DC beat CSK to post second consecutive win in IPL",
"French Open 2020: Rafael Nadal handed tough draw in bid for record-equaling 20th Grand Slam"
]
model = SentenceTransformer('sentence-transformers/bert-base-nli-mean-tokens')
I get an error when running the above code:
Full:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\tarfile.py in nti(s)
188 s = nts(s, "ascii", "strict")
--> 189 n = int(s.strip() or "0", 8)
190 except ValueError:
ValueError: invalid literal for int() with base 8: 'ld_tenso'
During handling of the above exception, another exception occurred:
InvalidHeaderError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\tarfile.py in next(self)
2298 try:
-> 2299 tarinfo = self.tarinfo.fromtarfile(self)
2300 except EOFHeaderError as e:
~\anaconda3\envs\py3_nlp\lib\tarfile.py in fromtarfile(cls, tarfile)
1092 buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093 obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
1094 obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
~\anaconda3\envs\py3_nlp\lib\tarfile.py in frombuf(cls, buf, encoding, errors)
1034
-> 1035 chksum = nti(buf[148:156])
1036 if chksum not in calc_chksums(buf):
~\anaconda3\envs\py3_nlp\lib\tarfile.py in nti(s)
190 except ValueError:
--> 191 raise InvalidHeaderError("invalid header")
192 return n
InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
ReadError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in _load(f, map_location,
pickle_module, **pickle_load_args)
594 try:
--> 595 return legacy_load(f)
596 except tarfile.TarError:
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in legacy_load(f)
505
--> 506 with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as
tar, \
507 mkdtemp() as tmpdir:
~\anaconda3\envs\py3_nlp\lib\tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
1590 raise CompressionError("unknown compression type %r" % comptype)
-> 1591 return func(name, filemode, fileobj, **kwargs)
1592
~\anaconda3\envs\py3_nlp\lib\tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
1620 raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1621 return cls(name, mode, fileobj, **kwargs)
1622
~\anaconda3\envs\py3_nlp\lib\tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
1483 self.firstmember = None
-> 1484 self.firstmember = self.next()
1485
~\anaconda3\envs\py3_nlp\lib\tarfile.py in next(self)
2310 elif self.offset == 0:
-> 2311 raise ReadError(str(e))
2312 except EmptyHeaderError:
ReadError: invalid header
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1210 try:
-> 1211 state_dict = torch.load(resolved_archive_file, map_location="cpu")
1212 except Exception:
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
425 pickle_load_args['encoding'] = 'utf-8'
--> 426 return _load(f, map_location, pickle_module, **pickle_load_args)
427 finally:
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
598 # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 599 raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
600 # if not a tarfile, reset file offset and proceed
RuntimeError: C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\pytorch_model.bin is a zip archive (did you mean to use torch.jit.load()?)
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-3-bba56aac60aa> in <module>
----> 1 model = SentenceTransformer('sentence-transformers/bert-base-nli-mean-tokens')
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\SentenceTransformer.py in __init__(self, model_name_or_path, modules, device, cache_folder)
88
89 if os.path.exists(os.path.join(model_path, 'modules.json')): #Load as SentenceTransformer model
---> 90 modules = self._load_sbert_model(model_path)
91 else: #Load with AutoModel
92 modules = self._load_auto_model(model_path)
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\SentenceTransformer.py in _load_sbert_model(self, model_path)
820 for module_config in modules_config:
821 module_class = import_from_string(module_config['type'])
--> 822 module = module_class.load(os.path.join(model_path, module_config['path']))
823 modules[module_config['name']] = module
824
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\models\Transformer.py in load(input_path)
122 with open(sbert_config_path) as fIn:
123 config = json.load(fIn)
--> 124 return Transformer(model_name_or_path=input_path, **config)
125
126
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\models\Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case, tokenizer_name_or_path)
27
28 config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
---> 29 self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
30 self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path if tokenizer_name_or_path is not None else model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
31
~\anaconda3\envs\py3_nlp\lib\site-packages\transformers\models\auto\auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
393 if type(config) in cls._model_mapping.keys():
394 model_class = _get_model_class(config, cls._model_mapping)
--> 395 return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
396 raise ValueError(
397 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
~\anaconda3\envs\py3_nlp\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1212 except Exception:
1213 raise OSError(
-> 1214 f"Unable to load weights from pytorch checkpoint file for '{pretrained_model_name_or_path}' "
1215 f"at '{resolved_archive_file}'"
1216 "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
OSError: Unable to load weights from pytorch checkpoint file for 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\' at 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\pytorch_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Short:
OSError: Unable to load weights from pytorch checkpoint file for 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens' at 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\pytorch_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
I do have the pytorch_model.bin in the '.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens' folder.
Why am I getting this error?
The reason for the error seems to be that the pre-trained model weight files are not available or loadable.
You can try that one to load pretrained model weight file:
from transformers import AutoModel
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')
Reference: https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens
Also, the model's hugging face page says:
This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: SBERT.net - Pretrained Models
Maybe you might want to take a look.
You may need to use the model without sentence_transformers.
The following code is tweaked from https://www.sbert.net/examples/applications/computing-embeddings/README.html
As I understand it, from the exception you need to pass from_tf=True to AutoModel.
from transformers import AutoTokenizer, AutoModel
import torch
#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
token_embeddings = model_output[0] #First element of model_output contains all token embeddings
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
return sum_embeddings / sum_mask
#Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
#Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens',from_tf=True)
#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
#Compute token embeddings
with torch.no_grad():
model_output = model(**encoded_input)
#Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

Why does ndcg_score result in nan values?

Consider the following code:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report, ndcg_score, make_scorer
from sklearn.svm import SVC
X_data = pd.DataFrame(np.random.randint(0,1,size=(100, 4)), columns=list('ABCD'))
X_data = sp.csr_matrix(X_data.to_numpy())
Y_data = pd.DataFrame(np.random.choice([0,1,5], 100), columns=['Y'])
# Set the parameters by cross-validation
param_grid = {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
'C': [1, 10, 100, 1000]}
clf = GridSearchCV(SVC(), param_grid, scoring=ndcg_score, refit=True, verbose=3, n_jobs=-1, error_score='raise')
test = clf.fit(X_data, Y_data)
I am wondering why this would raise the following error:
Fitting 5 folds for each of 8 candidates, totalling 40 fits
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
r = call_item()
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\model_selection\_validation.py", line 625, in _fit_and_score
test_scores = _score(estimator, X_test, y_test, scorer, error_score)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
scores = scorer(estimator, X_test, y_test)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\utils\validation.py", line 74, in inner_f
return f(**kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\metrics\_ranking.py", line 1564, in ndcg_score
y_true = check_array(y_true, ensure_2d=False)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\test\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\utils\validation.py", line 710, in check_array
array = array.astype(np.float64)
TypeError: float() argument must be a string or a number, not 'SVC'
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-45-93a8890b095c> in <module>
18
19 clf = GridSearchCV(SVC(), param_grid, scoring=ndcg_score, refit=True, verbose=3, n_jobs=-1, error_score='raise')
---> 20 test = clf.fit(X_data, Y_data)
21 #print(test.best_score_)
~\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1294 def _run_search(self, evaluate_candidates):
1295 """Search all candidates in param_grid"""
-> 1296 evaluate_candidates(ParameterGrid(self.param_grid))
1297
1298
~\Anaconda3\envs\kaggleSVM\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
793 n_splits, n_candidates, n_candidates * n_splits))
794
--> 795 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
796 X, y,
797 train=train, test=test,
~\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~\Anaconda3\envs\kaggleSVM\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~\Anaconda3\envs\kaggleSVM\lib\concurrent\futures\_base.py in result(self, timeout)
442 raise CancelledError()
443 elif self._state == FINISHED:
--> 444 return self.__get_result()
445 else:
446 raise TimeoutError()
~\Anaconda3\envs\kaggleSVM\lib\concurrent\futures\_base.py in __get_result(self)
387 if self._exception:
388 try:
--> 389 raise self._exception
390 finally:
391 # Break a reference cycle with the exception in self._exception
TypeError: float() argument must be a string or a number, not 'SVC'
I am not quite sure why this would result in a TypeError.
I cannot recreate the error you are reporting, but using error_score="raise" and n_jobs=1 (not strictly necessary, but the output is a little easier to read), and wrapping ndcg_score with make_scorer with needs_proba=True, I get this one:
Only ('multilabel-indicator', 'continuous-multioutput', 'multiclass-multioutput') formats are supported. Got multiclass instead
which supports my first comment: NDCG assumes multilabel format. That suggests you need to understand whether NDCG is really appropriate for your task, and if so either turn your problem into a multilabel one or write a custom scorer that converts the multiclass output into a multilabel (one-hot encoded) one before computing the score.

issue using imbalanced dataset with logloss and RFECV

I am using imbalanced dataset(54:38:7%) with RFECV for feature selection like this:
# making a multi logloss metric
from sklearn.metrics import log_loss, make_scorer
log_loss_rfe = make_scorer(score_func=log_loss, greater_is_better=False)
# initiating Light GBM classifier
lgb_rfe = LGBMClassifier(objective='multiclass', learning_rate=0.01, verbose=0, force_col_wise=True,
random_state=100, n_estimators=5_000, n_jobs=7)
# initiating RFECV
rfe = RFECV(estimator=lgb_rfe, min_features_to_select=2, verbose=3, n_jobs=2, cv=3, scoring=log_loss_rfe)
# fitting it
rfe.fit(X=X_train, y=y_train)
And I got an error, presumably because the subsamples sklearn's RFECV has made doesn't have all of the classes from my data. I had no issues fitting the very same data outside of RFECV.
Here's the complete error:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
r = call_item()
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
return self.fn(*self.args, **self.kwargs)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/feature_selection/_rfe.py", line 37, in _rfe_single_fit
return rfe._fit(
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/feature_selection/_rfe.py", line 259, in _fit
self.scores_.append(step_score(estimator, features))
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/feature_selection/_rfe.py", line 39, in <lambda>
lambda estimator, features: _score(
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 674, in _score
scores = scorer(estimator, X_test, y_test)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 199, in __call__
return self._score(partial(_cached_call, None), estimator, X, y_true,
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 242, in _score
return self._sign * self._score_func(y_true, y_pred,
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/home/ubuntu/ds_jup_venv/lib/python3.8/site-packages/sklearn/metrics/_classification.py", line 2265, in log_loss
raise ValueError("y_true and y_pred contain different number of "
ValueError: y_true and y_pred contain different number of classes 3, 2. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2]
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-9-5feb62a6f457> in <module>
1 rfe = RFECV(estimator=lgb_rfe, min_features_to_select=2, verbose=3, n_jobs=2, cv=3, scoring=log_loss_rfe)
----> 2 rfe.fit(X=X_train, y=y_train)
~/ds_jup_venv/lib/python3.8/site-packages/sklearn/feature_selection/_rfe.py in fit(self, X, y, groups)
603 func = delayed(_rfe_single_fit)
604
--> 605 scores = parallel(
606 func(rfe, self.estimator, X, y, train, test, scorer)
607 for train, test in cv.split(X, y, groups))
~/ds_jup_venv/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~/ds_jup_venv/lib/python3.8/site-packages/joblib/parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~/ds_jup_venv/lib/python3.8/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
1 frames
/usr/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
ValueError: y_true and y_pred contain different number of classes 3, 2. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2]
How to fix this to be able to select features recursively?
Log-loss needs the probability predictions, not the class predictions, so you should add
log_loss_rfe = make_scorer(score_func=log_loss, needs_proba=True, greater_is_better=False)
The error is because without that, the passed y_pred is one-dimensional (classes 0,1,2) and sklearn assumes it's a binary classification problem and those predictions are probability of the positive class. To deal with that, it adds on the probability of the negative class, but then there are only two columns compared to your three classes.
Consider applying stratified cross-validation, which will try to preserve the fraction of samples for each class. Experiment with one of these scikit-learn cross-validators:
sklearn.model_selection.StratifiedKFold,
StratifiedShuffleSplit,
RepeatedStratifiedKFold, replacing cv=3 in your RFECV with the chosen cross-validator.
Edit
I have missed the fact that StratifiedKFold is the default cross-validator in RFECV. Actually, the error is related to log_loss_rfe, which was defined with needs_proba=False. Credit to #BenReiniger!

Bert forward reports error on GPU; but runs fine on CPU

I would like to encode articles using Bert. My code runs fine on CPU, but failed on GPU. The GPU is a on a remote server.
My strategy is to tokenize the articles using CPU first. Then move both model and tokens to GPU. I am not sure if this is a good approach. Any suggestion of improvements are welcomed. Thank you in advance.
def assign_gpu(token):
token_tensor = token['input_ids'].to('cuda')
token_typeid = token['token_type_ids'].to('cuda')
attention_mask = token['attention_mask'].to('cuda')
output = {'input_ids': token_tensor,
'token_type_ids': token_typeid,
'attention_mask': attention_mask}
return output
bs = 16
data_dl = DataLoader(PatentDataset(df), batch_size=bs, shuffle=False)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased/')
model = BertModel.from_pretrained('bert-base-uncased/')
inputs_list = []
for a in data_dl:
inputs_list.append(tokenizer.batch_encode_plus(a, pad_to_max_length=True, return_tensors='pt'))
# GPU part
model.cuda()
model.eval()
out_list = []
with torch.no_grad():
for i, inputs in enumerate(inputs_list):
inputs = assign_gpu(inputs)
output = model(**inputs)[0][:, 0, :]
out_list.append(output)
The error message is
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-f6979d077f60> in <module>
6 for i, inputs in enumerate(inputs_list):
7 inputs = assign_gpu(inputs)
----> 8 output = model(**inputs)[0][:, 0, :]
9 out_list.append(output)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~/volta_pypkg/lib/python3.6/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states)
751
752 embedding_output = self.embeddings(
--> 753 input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
754 )
755 encoder_outputs = self.encoder(
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~/volta_pypkg/lib/python3.6/site-packages/transformers/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds)
180 token_type_embeddings = self.token_type_embeddings(token_type_ids)
181
--> 182 embeddings = inputs_embeds + position_embeddings + token_type_embeddings
183 embeddings = self.LayerNorm(embeddings)
184 embeddings = self.dropout(embeddings)
RuntimeError: CUDA error: device-side assert triggered
Another issue is that if I rerun the GPU for loop on a new cell in the interactive Jupyter notebook, it reports error on the assign_gpu part. I am not sure if it's caused by my remote server or some error in my code.
inputs = assign_gpu(inputs_list[0])
output = model(**inputs)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-ec07b46d1a63> in <module>
----> 1 inputs = assign_gpu(inputs_list[0])
2 output = model(**inputs)
<ipython-input-1-cb4bdd793d7f> in assign_gpu(token)
29
30 def assign_gpu(token):
---> 31 token_tensor = token['input_ids'].to('cuda')
32 token_typeid = token['token_type_ids'].to('cuda')
33 attention_mask = token['attention_mask'].to('cuda')
RuntimeError: CUDA error: device-side assert triggered

Resources