I have converted a RoBERTa PyTorch model to an ONNX model and quantized it. I am able to get the scores from the ONNX model for a single input data point (one sentence at a time). I want to understand how to get batch predictions using an ONNX Runtime inference session by passing multiple inputs to the session. Below is the example scenario.
Model: roberta-quant.onnx, an ONNX-quantized version of the RoBERTa PyTorch model
Code used to convert RoBERTa to ONNX:
torch.onnx.export(model,
                  args=tuple(inputs.values()),      # model input
                  f=export_model_path,              # where to save the model
                  opset_version=11,                 # the ONNX opset version to export the model to
                  do_constant_folding=True,         # whether to execute constant folding for optimization
                  input_names=['input_ids',         # the model's input names
                               'attention_mask'],
                  output_names=['output_0'],        # the model's output names
                  dynamic_axes={'input_ids': symbolic_names,       # variable-length axes
                                'attention_mask': symbolic_names,
                                'output_0': {0: 'batch_size'}})
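For reference, the names not shown in the snippet (inputs and symbolic_names) are presumably defined along the lines of the usual Hugging Face export recipe. A hedged sketch, with 'roberta-base' and the example sentence as placeholders:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# tokenized inputs whose values() are passed to torch.onnx.export above
inputs = tokenizer.encode_plus("An example sentence",
                               max_length=128,
                               padding='max_length',
                               truncation=True,
                               return_tensors='pt')

# variable-length axes referenced in dynamic_axes
symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}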
Input sample to ONNXRuntime inference session:
{
'input_ids': array([[ 0, 510, 35, 21071, ....., 1, 1, 1, 1, 1, 1]]),
'attention_mask': array([[1, 1, 1, 1, ......., 0, 0, 0, 0, 0, 0]])
}
Running the ONNX model for 400 data samples (sentences) using an ONNX Runtime inference session:
import onnxruntime

session = onnxruntime.InferenceSession("roberta_quantized.onnx", providers=['CPUExecutionProvider'])

for i in range(400):
    ort_inputs = {
        'input_ids': input_ids[i].cpu().reshape(1, max_seq_length).numpy(),           # max_seq_length=128 here
        'attention_mask': attention_masks[i].cpu().reshape(1, max_seq_length).numpy()  # key matches the exported input name
    }
    ort_outputs = session.run(None, ort_inputs)
In the above code I am looping through the 400 sentences sequentially to get the scores in ort_outputs. Please help me understand how I can perform batch processing here using the ONNX model, i.e. send the input_ids and attention_masks for multiple sentences in a single call and get the scores for all of them back in ort_outputs.
Thanks in advance!
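Since the model was exported with a dynamic batch_size axis, one way to batch is to stack several sequences into a single (batch, max_seq_length) array and call session.run once per batch. A minimal sketch, assuming input_ids and attention_masks are the same tensors used above and that the quantized graph really does accept a variable first dimension:

import numpy as np

# sanity-check the input names and dynamic axes the exported graph expects
print([(i.name, i.shape) for i in session.get_inputs()])

batch_size = 16
all_scores = []

for start in range(0, 400, batch_size):
    end = min(start + batch_size, 400)
    ort_inputs = {
        # stack per-sentence tensors into one (batch, max_seq_length) array
        'input_ids': np.stack([input_ids[i].cpu().numpy() for i in range(start, end)]).astype(np.int64),
        'attention_mask': np.stack([attention_masks[i].cpu().numpy() for i in range(start, end)]).astype(np.int64),
    }
    # output_0 now holds one row of scores per sentence in the batch
    ort_outputs = session.run(None, ort_inputs)
    all_scores.append(ort_outputs[0])

scores = np.concatenate(all_scores, axis=0)   # shape (400, num_labels)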
Related
I downloaded a bert-base pretrained model and edited its config.json, changing max_position_embeddings from 512 to 256:
"max_position_embeddings": 256,
Then I try to load the model:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    MODEL_PATH,
    num_labels = 2,               # the number of output labels -- 2 for binary classification
    output_attentions = False,
    output_hidden_states = False,
)

# Tell pytorch to run this model on the GPU.
model.cuda()
But it raises an error:
Error(s) in loading state_dict for BertForSequenceClassification:
size mismatch for bert.embeddings.position_embeddings.weight: copying a param with shape torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([256, 768]).
I know the reason is that I changed the max sequence length. What is the right way to change the max sequence length?
The error says that the saved weights cannot be loaded into the initialized model because the shapes of the layers differ.
If you want to finetune the model on a downstream task, you cannot change the pretrained model's config. Instead you should set max_length in encode_plus, which will truncate the input sequence to max_length.
But if you want to pretrain a model with a specific config, you should initialize the model without pretrained weights, or find appropriate weights on the Hugging Face model hub.
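A minimal sketch of the truncation approach, assuming a standard bert-base-uncased checkpoint and an illustrative sentence:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

encoded = tokenizer.encode_plus(
    "An example sentence to classify",
    max_length=256,          # truncate/pad to 256 tokens instead of editing config.json
    truncation=True,
    padding='max_length',
    return_tensors='pt')

# encoded['input_ids'] and encoded['attention_mask'] now have shape (1, 256)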
I have split my data into train/test before doing cross-validation on the training data to validate my hyperparameters. I have an unbalanced dataset and want to perform SMOTE oversampling on each iteration, so I have established a pipeline using imblearn.
My understanding is that oversampling should be done after dividing the data into k-folds to prevent information leaking. Is this order of operations (data split into k-folds, k-1 folds oversampled, predict on remaining fold) preserved when using Pipeline in the setup below?
import numpy as np
import xgboost as xgb
from scipy import stats
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

model = Pipeline([
    ('sampling', SMOTE()),
    ('classification', xgb.XGBClassifier())
])

param_dist = {'classification__n_estimators': stats.randint(50, 500),
              'classification__learning_rate': stats.uniform(0.01, 0.3),
              'classification__subsample': stats.uniform(0.3, 0.6),
              'classification__max_depth': [3, 4, 5, 6, 7, 8, 9],
              'classification__colsample_bytree': stats.uniform(0.5, 0.5),
              'classification__min_child_weight': [1, 2, 3, 4],
              'sampling__ratio': np.linspace(0.25, 0.5, 10)}

random_search = RandomizedSearchCV(model,
                                   param_dist,
                                   cv=StratifiedKFold(n_splits=5),
                                   n_iter=10,
                                   scoring=scorer_cv_cost_savings)  # custom scorer defined elsewhere
random_search.fit(X_train.values, y_train)
Your understanding is right. When you pass the pipeline as the model, .fit() is applied only to the k-1 training folds and testing is done on the remaining kth fold, so the SMOTE sampling is applied only to the training data of each split.
The documentation for imblearn.pipeline's .fit() says:
Fit the model
Fit all the transforms/samplers one after the other and transform/sample the data,
then fit the transformed/sampled data using the final estimator.
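For intuition, the cross-validation the pipeline performs is roughly equivalent to this manual loop (a sketch, assuming X_train and y_train can be treated as plain NumPy arrays; SMOTE is fit only on the k-1 training folds, never on the held-out fold):

import numpy as np
import xgboost as xgb
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import StratifiedKFold

X = np.asarray(X_train)
y = np.asarray(y_train)

for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    # oversample only the training folds
    X_res, y_res = SMOTE().fit_resample(X[train_idx], y[train_idx])
    clf = xgb.XGBClassifier().fit(X_res, y_res)
    # the held-out fold is scored untouched, so no synthetic samples leak into it
    preds = clf.predict(X[test_idx])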
I am trying to convert a PyTorch model to ONNX, in order to use it later with TensorRT. I followed this tutorial, https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html, but my kernel dies every time.
This is the code I implemented:
# Some standard imports
import io
import numpy as np
from torch import nn
import torch.onnx

from deepformer.nets.quicknat import quickNAT

param = {
    'num_channels': 64,
    'num_filters': 64,
    'kernel_h': 5,
    'kernel_w': 5,
    'kernel_c': 1,
    'stride_conv': 1,
    'pool': 2,
    'stride_pool': 2,
    'num_classes': 1,
    'padding': 'reflection'
}

net = quickNAT(param)
checkpoint_path = 'checkpoint_epoch36_loss0.78.t7'
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
checkpoints = torch.load(checkpoint_path, map_location=map_location)
net.load_state_dict(checkpoints['net'])
net.train(False)

# Input to the model
x = torch.rand(1, 64, 256, 1600, requires_grad=True)

# Export the model
torch_out = torch.onnx._export(net,              # model being run
                               x,                # model input (or a tuple for multiple inputs)
                               "quicknat.onnx",  # where to save the model (can be a file or file-like object)
                               export_params=True)  # store the trained parameter weights inside the model file
What output do you get? It seems SuperResolution is supported by the export operators in PyTorch, as mentioned in the documentation.
Are you sure the input to your model is:
x = torch.rand(1, 64, 256, 1600, requires_grad=True)
That looks like the variable you used for training. For deployment you run the network on one or more images, so the dummy input used to export to ONNX is usually something like:
dummy_input = torch.randn(1, 3, 720, 1280, device='cuda')
with 1 being the batch size, 3 being the channels of the image (RGB), and then the size of the image, in this case 720x1280. Check that input; I guess you don't have a 64-channel image as input, right?
Also, it'd be helpful if you posted the terminal output to see where it fails.
Good luck!
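For reference, a minimal export sketch along those lines (a sketch, not your exact setup: the 1x3x720x1280 shape is just the RGB example above, and the channel count should be whatever quickNAT actually expects):

import torch
import torch.onnx

net.eval()

# batch of 1, 3-channel image, 720x1280 -- adjust to the real deployment input
dummy_input = torch.randn(1, 3, 720, 1280)

torch.onnx.export(net,
                  dummy_input,
                  "quicknat.onnx",
                  export_params=True)   # store trained weights inside the ONNX file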
I'm using TensorFlow's Estimator library in Python. I want to train a student network using a pre-trained teacher, and I'm facing the following issue.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)

student_classifier.train(
    input_fn=train_input_fn,
    steps=20,
    hooks=None)
This code returns a generator object that is passed to the student classifier. Inside the generator, we have the inputs and labels (in batches of 100) as tensors. The problem is, I want to pass the same values to the teacher model and extract its softmax outputs. Unfortunately, the model input requires a numpy array, as follows:
student_classifier = tf.estimator.Estimator(
    model_fn=student_model_fn, model_dir="./models/mnist_student")

def student_model_fn(features, labels, mode):
    sess = tf.InteractiveSession()
    tf.train.start_queue_runners(sess)
    data = features['x'].eval()
    out = labels.eval()
    sess.close()

    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

    eval_teacher_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": data},
        y=out,
        num_epochs=1,
        shuffle=False)
This requires x and y to be numpy arrays, so I converted them via the ugly hack of using a session to turn the tensors into numpy arrays. Is there a better way of doing this?
P.S. I tried tf.estimator.Estimator.get_variable_value(), but it retrieves weights from the model, not the input and output.
Convert the Tensor to a numpy array using tf.make_ndarray.
tf.make_ndarray() creates a numpy ndarray with the same shape and data as the tensor.
Sample working code:
import tensorflow as tf
a = tf.constant([[1,2,3],[4,5,6]])
proto_tensor = tf.make_tensor_proto(a)
tf.make_ndarray(proto_tensor)
output:
array([[1, 2, 3],
[4, 5, 6]], dtype=int32)
# output has shape (2,3)
Input tensors to a Model must be Keras tensors. Found:
Tensor("my_layer/Identity:0", shape=(?, 10, 1152, 16), dtype=float32)
(missing Keras metadata).
Hi, I get this error when trying to take one layer's intermediate variable and use it as the input to a parallel network, so that one layer's intermediate tensor becomes the input of the other network.
def call(self, inputs, training=None):
    inputs_expand = K.expand_dims(inputs, 1)
    tensor_b = K.tile(inputs_expand, [1, 16, 1, 1])
    tensor_a = K.map_fn(lambda x: K.batch_dot(x, self.Weights, [2, 3]), elems=tensor_b)
    # I need this tensor_a.
    # I tried many things but ended up putting it in a member variable.
    self.tensor_a = K.identity(tensor_a)
    ....
Outside, when trying to build the parallel model, I do this:
a_model = models.Model([my_layer.tensor_a], [my_layer.c])
I could not find any good solution to this problem. How can I turn this tensor into a Keras tensor?
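One possible direction, shown as a minimal hypothetical sketch (the shapes, the Dense head, and the omitted batch_dot with self.Weights are placeholders): any tensor fed to a keras.models.Model must come out of a Keras layer, so wrapping the backend ops in Lambda layers gives them the missing Keras metadata, and the parallel network then consumes the result through its own Input.

import numpy as np
from keras import layers, models
import keras.backend as K

inp = layers.Input(shape=(10, 1152))                      # placeholder input shape

# same backend ops as in call(), but wrapped in Lambda layers so the
# results are Keras tensors (they carry the metadata the error mentions)
expanded = layers.Lambda(lambda t: K.expand_dims(t, 1))(inp)
tensor_b = layers.Lambda(lambda t: K.tile(t, [1, 16, 1, 1]))(expanded)

# tensor_b can now legally be the output of one model ...
feature_model = models.Model(inputs=inp, outputs=tensor_b)

# ... and the parallel network consumes it through its own Input
parallel_in = layers.Input(shape=(16, 10, 1152))
parallel_out = layers.Dense(1)(layers.Flatten()(parallel_in))
parallel_model = models.Model(inputs=parallel_in, outputs=parallel_out)

x = np.random.rand(2, 10, 1152).astype('float32')
intermediate = feature_model.predict(x)                   # shape (2, 16, 10, 1152)
scores = parallel_model.predict(intermediate)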