Dealing with infs in Seq2Seq Trainer - python-3.x

I am trying to fine-tune a Hugging Face model on a shellcode dataset (https://huggingface.co/datasets/SoLID/shellcode_i_a32).
The training code is a basic Hugging Face Trainer setup, but we keep running into nan/inf issues:
from transformers import (
    DataCollatorForSeq2Seq,
    PreTrainedTokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# `model` and `tokenized_datasets` are assumed to be defined earlier (not shown in the question)
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tkn1.json", padding_side="right")
special_tokens = {'pad_token': "[PAD]"}
tokenizer.add_special_tokens(special_tokens)
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    save_total_limit=3,
    per_device_train_batch_size=128,
    num_train_epochs=5,
    warmup_ratio=0.06,
    learning_rate=1.0e-04,
    # fp16=True,
    debug=["underflow_overflow"],  # report activations and abort on the first inf/nan
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["test"],  # note: the "test" split is used for training here
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()
# eval_loss = trainer.evaluate()
# print(f">>> Perplexity: {math.exp(eval_loss['eval_loss']):.2f}")
The output looks like this:
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Detected inf/nan during batch_number=0
Last 1 forward frames:
abs min abs max metadata
shared Embedding
5.42e-06 2.04e+04 weight
0.00e+00 1.46e+03 input[0]
1.56e-03 2.04e+04 output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-120-ff4a54906908> in <module>
33 # trainer.train()
34 # print(tokenizer.)
---> 35 trainer.train()
36 # eval_loss = trainer.evaluate()
37 # print(f">>> Perplexity: {math.exp(eval_loss['eval_loss']):.2f}")
9 frames
/usr/local/lib/python3.8/dist-packages/transformers/debug_utils.py in forward_hook(self, module, input, output)
278
279 # now we can abort, as it's pointless to continue running
--> 280 raise ValueError(
281 "DebugUnderflowOverflow: inf/nan detected, aborting as there is no point running further. "
282 "Please scroll up above this traceback to see the activation values prior to this event."
ValueError: DebugUnderflowOverflow: inf/nan detected, aborting as there is no point running further. Please scroll up above this traceback to see the activation values prior to this event.
The very first layer (the shared embedding) seems to produce inf/nan values as soon as training starts, and the run never gets much further than that.
We have tried tweaking our training arguments but have hit a brick wall here. Any help is appreciated!
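To read the debug table above: each row reports the smallest and largest absolute value of a tensor (the module's weight, its input, its output), and the run aborts as soon as any tensor contains inf or nan. Below is a minimal stdlib sketch of those two checks, assuming a plain list of floats stands in for a tensor; the real `DebugUnderflowOverflow` in `transformers.debug_utils` works on torch tensors, and the sample values are hypothetical, chosen to echo the magnitudes in the report.

```python
import math
from typing import List, Tuple


def abs_min_max(values: List[float]) -> Tuple[float, float]:
    """Smallest and largest magnitude, like the 'abs min' / 'abs max' columns."""
    magnitudes = [abs(v) for v in values]
    return min(magnitudes), max(magnitudes)


def has_inf_or_nan(values: List[float]) -> bool:
    """True if any entry is +/-inf or nan, the condition that triggers the abort."""
    return any(math.isinf(v) or math.isnan(v) for v in values)


# Hypothetical values standing in for a tensor (not real model data)
weights = [5.42e-06, -2.04e+04, 1.56e-03]
print(abs_min_max(weights))                      # (5.42e-06, 20400.0)
print(has_inf_or_nan(weights))                   # False
print(has_inf_or_nan(weights + [float("inf")]))  # True
```

A large but finite `abs max` is not an error by itself; only a non-finite value makes the check fire.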

Related

Classification Metrics for Sequential tagging in NLP

I am writing sequence-tagging code for an NLP task in Python on Google Colab. Because my model uses a CRF layer, I used the CRF metrics (sklearn_crfsuite.metrics) to get the results. I executed the following code:
import numpy as np
from sklearn_crfsuite import metrics as crf_metrics

pred_cat = model.predict(X_te)
pred = np.argmax(pred_cat, axis=-1)        # predicted tag index per token
y_te_true = np.argmax(y_te, axis=-1)       # true tag index per token (from one-hot labels)
tags = ['O', 'BOC', 'IOC', '<pad>']
print(crf_metrics.flat_classification_report(y_true=y_te_true, y_pred=pred, labels=tags))
I am getting the following error:
TypeError Traceback (most recent call last)
<ipython-input-21-510856efb26d> in <module>()
1 from sklearn_crfsuite import metrics as crf_metrics
----> 2 print(crf_metrics.flat_classification_report(y_true=y_te_true,y_pred=pred,labels=tags))
1 frames
/usr/local/lib/python3.7/dist-packages/sklearn_crfsuite/metrics.py in flat_classification_report(y_true, y_pred, labels, **kwargs)
66 """
67 from sklearn import metrics
---> 68 return metrics.classification_report(y_true, y_pred, labels, **kwargs)
69
70
TypeError: classification_report() takes 2 positional arguments but 3 were given
Can anyone please throw some light on this?
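The traceback shows the cause: `flat_classification_report` forwards `labels` to scikit-learn positionally (`metrics.classification_report(y_true, y_pred, labels, **kwargs)`), while the installed `classification_report` accepts only two positional arguments. Separately, `np.argmax` yields integer indices, so the string names in `tags` would not match them anyway. A minimal sketch of a workaround is to flatten the sequences yourself and map indices back to tag names first. It assumes `tags` lists the tags in the same order as the one-hot columns of `y_te`; the sample indices are hypothetical.

```python
from itertools import chain
from typing import List, Sequence


def indices_to_tags(index_seqs: Sequence[Sequence[int]], tags: Sequence[str]) -> List[List[str]]:
    """Replace each tag index (e.g. from np.argmax) with its tag name, sentence by sentence."""
    return [[tags[int(i)] for i in seq] for seq in index_seqs]


def flatten(tag_seqs: Sequence[Sequence[str]]) -> List[str]:
    """Concatenate per-sentence tag lists into one flat list, as the flat_* metrics do."""
    return list(chain.from_iterable(tag_seqs))


tags = ['O', 'BOC', 'IOC', '<pad>']
# Hypothetical argmax output for two padded sentences of length 3
y_true_idx = [[0, 1, 2], [0, 0, 3]]
y_pred_idx = [[0, 1, 1], [0, 0, 3]]
flat_true = flatten(indices_to_tags(y_true_idx, tags))
flat_pred = flatten(indices_to_tags(y_pred_idx, tags))
print(flat_true)  # ['O', 'BOC', 'IOC', 'O', 'O', '<pad>']
```

The flat lists can then go straight to scikit-learn with `labels` passed by keyword, e.g. `sklearn.metrics.classification_report(flat_true, flat_pred, labels=tags)`, which avoids the positional call inside `flat_classification_report`.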

save and load fine-tuned bert classification model using tensorflow 2.0

I am trying to save a fine-tuned binary classification model based on the pretrained BERT module 'uncased_L-12_H-768_A-12'. I'm using TF2.
The following code sets up the model structure:
bert_classifier, bert_encoder =bert.bert_models.classifier_model(bert_config, num_labels=2)
then:
# import pre-trained model structure from the check point file
checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
Then I compiled and fit the model:
bert_classifier.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=metrics)
bert_classifier.fit(
    Text_train, Label_train,
    validation_data=(Text_val, Label_val),
    batch_size=32,
    epochs=1)
Finally, I saved the model to the model folder, which automatically generates a file named saved_model.pb inside it:
bert_classifier.save('/content/drive/My Drive/model')
I also tried this:
tf.saved_model.save(bert_classifier, export_dir='/content/drive/My Drive/omg')
Now I try to load the model and apply it to test data:
from tensorflow import keras
ttt = keras.models.load_model('/content/drive/My Drive/model')
I got:
KeyError Traceback (most recent call last)
<ipython-input-77-93f80aa585da> in <module>()
----> 1 tf.keras.models.load_model(filepath='/content/drive/My Drive/omg', custom_objects={'Transformera':bert_classifier})
9 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py in _revive_graph_network(self, metadata, node_id)
392 else:
393 model = models_lib.Functional(
--> 394 inputs=[], outputs=[], name=config['name'])
395
396 # Record this model and its layers. This will later be used to reconstruct
KeyError: 'name'
This error message doesn't tell me what to do... please kindly advise.
I also tried saving the model in h5 format, but when I load it with
ttt = keras.models.load_model('/content/drive/My Drive/model.h5')
I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-36-12f76139ec24> in <module>()
----> 1 ttt = keras.models.load_model('/content/drive/My Drive/model.h5')
5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
294 cls = get_registered_object(class_name, custom_objects, module_objects)
295 if cls is None:
--> 296 raise ValueError('Unknown ' + printable_module_name + ': ' + class_name)
297
298 cls_config = config['config']
ValueError: Unknown layer: BertClassifier
It seems you have the answer right in the question: '/content/drive/My Drive/model' will fail because of the space character.
You could try escaping the space: '/content/drive/My\ Drive/model'.
Another option: I had exactly the same problem with saving and loading, and what helped was saving only the weights of the pre-trained model instead of the whole model.
Take a look here: https://keras.io/api/models/model_saving_apis/, especially the methods save_weights() and load_weights().
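A minimal sketch of the weights-only approach. It assumes a tiny hypothetical Dense model in place of `bert_classifier`; in the question you would rebuild the real architecture with `bert.bert_models.classifier_model(bert_config, num_labels=2)` before calling `load_weights`. The `.weights.h5` file name is an assumption chosen because both older tf.keras and Keras 3 accept it.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_classifier() -> tf.keras.Model:
    """Hypothetical stand-in for bert_classifier: the architecture is rebuilt from code."""
    inputs = tf.keras.Input(shape=(4,))
    outputs = tf.keras.layers.Dense(2)(inputs)
    return tf.keras.Model(inputs, outputs)


model = build_classifier()
# ... model.compile(...) and model.fit(...) would go here ...
weights_path = os.path.join(tempfile.mkdtemp(), "classifier.weights.h5")
model.save_weights(weights_path)     # stores variable values only, not the Python classes

restored = build_classifier()        # rebuild the same architecture in code
restored.load_weights(weights_path)  # then copy the saved values into it

x = np.ones((1, 4), dtype="float32")
print(np.allclose(model(x).numpy(), restored(x).numpy()))  # True
```

Because only variable values are stored, loading never has to deserialize custom layer classes such as `BertClassifier`. The trade-off is that the code that builds the model must be available at load time.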

XLNetForSequenceClassification Pretrained model unable to load

I tried loading the pretrained XLNet model, but the error below occurred. I've done this before and it worked; now it doesn't. Any suggestions on how to fix this problem?
model = XLNetForSequenceClassification.from_pretrained("xlnet-large-cased", num_labels = 2)
model.to(device)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-55-d6f698a3714b> in <module>()
----> 1 model = XLNetForSequenceClassification.from_pretrained("xlnet-large-cased", num_labels = 2)
2 model.to(device)
3 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/sparse.py in __init__(self, num_embeddings, embedding_dim, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse, _weight)
95 self.scale_grad_by_freq = scale_grad_by_freq
96 if _weight is None:
---> 97 self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
98 self.reset_parameters()
99 else:
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 1024]
You should import XLNetForSequenceClassification from transformers and not from pytorch-transformers. First, make sure transformers is installed:
> pip install transformers
Then, in your code:
from transformers import XLNetForSequenceClassification
model = XLNetForSequenceClassification.from_pretrained("xlnet-large-cased", num_labels = 2)
This should work.
If you haven't changed anything internally, this is most likely a version mismatch. Have you upgraded any relevant modules? If so, going back to the previous version should solve it.
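To check for the version mismatch suggested above, or to see whether both `pytorch-transformers` and `transformers` are installed, here is a minimal sketch using the standard library's `importlib.metadata`. It requires Python 3.8+; the notebook in the question runs Python 3.6, where `pip show transformers` gives the same information.

```python
from importlib import metadata  # Python 3.8+
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("transformers", "pytorch-transformers", "torch"):
    print(f"{name}: {installed_version(name)}")
```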

Interfacing with tensorflow model results in an error

I'm currently using a DNN regressor with 11 input neurons, multiple hidden layers, and 11 output neurons.
I have finished training the model. However, when I try to get a prediction from the model with this code:
a = np.array(0,0,0,0,0,0,0,0,0,0,0)
y_pred = regressor.predict(a, as_iterable=False)
print(y_pred)
it raises an error:
InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 11 values, but the requested shape has 121
[[node dnn/input_from_feature_columns/input_layer/X/Reshape (defined at regressor_full.py:177) ]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "regressor_full.py", line 206, in <module>
a = np.array(0,0,0,0,0,0,0,0,0,0,0)
ValueError: only 2 non-keyword arguments accepted.
I'm just trying to feed in 11 input variables and predict 11 output variables (as the model was trained to do), but it raises an error.
I've tried changing as_iterable to True, but that doesn't change anything. What is causing the error, and how do I fix it? Thank you for your time.
As requested, here is the code that defines the model:
import tensorflow.contrib.learn as skflow
regressor = skflow.DNNRegressor(feature_columns=feature_columns,
label_dimension=11,
hidden_units=hidden_layers,
model_dir=MODEL_PATH,
dropout=dropout,
config=test_config,
activation_fn = tf.nn.relu,
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate = learning_rate)
)
Try passing the values as a single list, a = np.array([0,0,0,0,0,0,0,0,0,0,0]), and also set the dtype explicitly to match your feature columns, which might solve the issue.
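A minimal sketch of both points. The `ValueError` comes from `np.array` accepting at most two positional arguments (the data and the dtype), so the eleven values must be passed as one list. The reshape error's 121 equals 11 × 11, which is consistent with the eleven values being read as eleven separate examples. Reshaping to one row of 11 features is therefore likely what the estimator expects. Both the `(1, 11)` shape and `float32` are assumptions; use the dtype your feature columns declare.

```python
import numpy as np

# np.array takes at most two positional arguments (object, dtype), so the
# eleven values go in as ONE list rather than eleven separate arguments.
a = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=np.float32)
print(a.shape)      # (11,)

# Assumption: the model expects a batch of examples with 11 features each,
# so a single example becomes a batch of one row: shape (1, 11).
batch = a.reshape(1, -1)
print(batch.shape)  # (1, 11)
```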

I am trying to classify text using NLTK Naive Bayes Classifier. I am getting ValueError: too many values to unpack (expected 2)

I cleaned the text and generated bigrams. The bigrams display correctly, but when I try to train and test a text classifier using NLTK's Naive Bayes, I get the error shown in the title.
import nltk
from nltk.util import ngrams
#generating bigrams for all narratives
bigrams_all=ngrams(df,2)
#printing bigrams of one narrative
ninety_seven=df.loc[97].loc['FSR Narrative']
nine_bi=ngrams(ninety_seven,2)
print(nine_bi)
print([" ".join(t) for t in nine_bi])
# set that we'll train our classifier with
training_set = df[:1280]
# set that we'll test our classifier with
test_set = df[1280:]
classifier = nltk.NaiveBayesClassifier.train(training_set)
The error trace:
ValueError Traceback (most recent call last)
<ipython-input-13-745201c14989> in <module>()
113 training_set = df[1280:]
114
--> 115 classifier = nltk.NaiveBayesClassifier.train(training_set)
C:\Anaconda\envs\py35\lib\site-packages\nltk\classify\naivebayes.py in train(cls, labeled_featuresets, estimator)
195 # Count up how many times each feature value occurred, given
196 # the label and featurename.
--> 197 for featureset, label in labeled_featuresets:
198 label_freqdist[label] += 1
199 for fname, fval in featureset.items():
ValueError: too many values to unpack (expected 2)
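The last frame shows what `NaiveBayesClassifier.train` expects: it runs `for featureset, label in labeled_featuresets`, so it needs a list of `(dict, label)` pairs. Iterating over a pandas DataFrame yields its column names instead. A name such as `'FSR Narrative'` is a string of more than two characters, hence "too many values to unpack". Below is a minimal sketch of building the expected structure from bigram features; the rows and labels are hypothetical, since the question doesn't show a label column.

```python
from typing import Dict, Iterable, List, Tuple

FeatureSet = Dict[str, bool]


def bigram_features(tokens: List[str]) -> FeatureSet:
    """One boolean feature per adjacent token pair."""
    return {f"has({a} {b})": True for a, b in zip(tokens, tokens[1:])}


def labeled_featuresets(rows: Iterable[Tuple[str, str]]) -> List[Tuple[FeatureSet, str]]:
    """Turn (text, label) rows into the (featureset, label) pairs that train() unpacks."""
    return [(bigram_features(text.lower().split()), label) for text, label in rows]


# Hypothetical (narrative, label) rows; in the question these would come from df
rows = [
    ("Engine failed on takeoff", "fault"),
    ("Routine inspection completed", "ok"),
]
data = labeled_featuresets(rows)
print(data[0])
# ({'has(engine failed)': True, 'has(failed on)': True, 'has(on takeoff)': True}, 'fault')

for featureset, label in data:  # same unpacking as in naivebayes.py
    assert isinstance(featureset, dict)
```

With real data, you would split `data` (not `df`) into training and test lists and pass the training list to `nltk.NaiveBayesClassifier.train`.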
