MLflow webserver returns 400 status, "Incompatible input types for column X. Can not safely convert float64 to <U0." - scikit-learn

I am implementing an anomaly detection web service using MLflow and sklearn.pipeline.Pipeline(). The aim of the model is to detect web crawlers from server logs, and the response_length column is one of my features. After serving the model, I test the web service by sending the request below, which contains the first 20 columns of the training data.
$ curl --location --request POST '127.0.0.1:8000/invocations' \
--header 'Content-Type: text/csv' \
--data-binary '@datasets/test.csv'
But response of the web server has status code 400 (BAD REQUEST) and this JSON body:
{
"error_code": "BAD_REQUEST",
"message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}
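(For anyone reproducing this, the same request can be sent from Python. A minimal sketch using the requests library, with the same URL and CSV path as the curl call above:)
import requests

# Send the raw CSV bytes to the MLflow scoring endpoint, mirroring the curl call.
with open("datasets/test.csv", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/invocations",
        headers={"Content-Type": "text/csv"},
        data=f.read())
print(resp.status_code, resp.text)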
Here is the log from the MLflow Tracking component for the model training run:
[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total= 4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total= 0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370
---------- logged params ----------
{'memory': 'None',
'pca': 'PCAEstimator()',
'rule_based': 'RuleBasedEstimator()',
'steps': "[('transform', <log_transformer.LogTransformer object at "
"0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
'RuleBasedEstimator())]',
'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
'verbose': 'True'}
---------- logged metrics ----------
{}
---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}
---------- logged artifacts ----------
['model/MLmodel',
'model/conda.yaml',
'model/model.pkl',
'model/requirements.txt']
Could anyone tell me how I can fix this model serving problem?

The problem is the one flagged by the mlflow.utils.autologging_utils WARNING above.
When the model is logged, the input signature inferred from the training data is saved in the MLmodel file.
You should change the response_length input type in the signature from string to double, i.e. replace
{"name": "response_length", "type": "string"}
with
{"name": "response_length", "type": "double"}
so the value doesn't need to be converted. After serving the model with the edited MLmodel file, the web server worked as expected.
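A more durable fix (a sketch, assuming you call mlflow.sklearn.log_model yourself; pipeline and train_df are placeholder names for your fitted Pipeline and a realistic sample of the training data) is to log the model with an explicitly inferred signature, as the warning in the training log suggests, instead of editing the MLmodel file by hand:
import mlflow
from mlflow.models.signature import infer_signature

# Infer the signature from a realistic data sample so that response_length
# is recorded with its true numeric type rather than as a string.
signature = infer_signature(train_df, pipeline.predict(train_df))

with mlflow.start_run():
    mlflow.sklearn.log_model(pipeline, "model", signature=signature)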

Related

Clarifications on training job parameters with Tensorflow

I'm using the new TensorFlow object detection API.
I need to replicate the training parameters used in a paper, but I'm a bit confused.
The paper states:
When training neural network models, their base configuration is similar to that used to
train on the COCO 2017 dataset. For the unambiguous comparison of the selected models, the total number of
training steps was set to 100 equal to 100′000 iterations of learning.
Inside model_main_tf2.py, which is the script used to start the training, I can read the following:
"""Creates and runs TF2 object detection models.
For local training/evaluation run:
PIPELINE_CONFIG_PATH=path/to/pipeline.config
MODEL_DIR=/tmp/model_outputs
NUM_TRAIN_STEPS=10000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main_tf2.py -- \
--model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--pipeline_config_path=$PIPELINE_CONFIG_PATH \
--alsologtostderr
"""
Also, you can specify the num_steps and total_steps parameters in the pipeline.config file (used by the training script):
train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .16
          total_steps: 50000
          warmup_learning_rate: 0
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}
So, what I'm not understanding is how to map what is written in the paper onto the TensorFlow parameters.
What are num_steps and total_steps inside the pipeline.config file?
And what is the NUM_TRAIN_STEPS argument?
Does it override the steps in the config file, or is it a completely different thing?
If more details are needed feel free to ask.
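For reference, inside model_main_tf2.py the flag is simply forwarded to the training loop, roughly like this (a paraphrased sketch of the TF2 Object Detection API code, not a verbatim quote; check your local copy of the script):
# Paraphrased from model_main_tf2.py: the CLI flag is handed to train_loop.
model_lib_v2.train_loop(
    pipeline_config_path=FLAGS.pipeline_config_path,
    model_dir=FLAGS.model_dir,
    # When --num_train_steps is omitted this is None, and the value of
    # train_config.num_steps from pipeline.config is used instead;
    # when it is set, it takes precedence over the config file.
    train_steps=FLAGS.num_train_steps)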

Natural Nodejs NLP classifier gives heap dump when training more than 2000 data points

We are using the natural Node.js library to classify our users' queries in the following way:
const natural = require("natural");
const cls = new natural.LogisticRegressionClassifier();
cls.addDocument("book", "stationary");
cls.addDocument("pen", "stationary");
// ... 5000+ data points ...
cls.addDocument("last one", "last one");
cls.train(); // this call throws a heap error and the program crashes
pino.info("StockNames training successfully completed.");
The train() function works fine when the dataset size is under a few hundred documents, but it throws heap errors and crashes when the dataset size is in the few thousands. Any suggestions? Thanks.

My tensorflow object detection model produced the average precision of zero for first class

I have trained an object detection model for three classes: id=1 (LR), id=2 (PM), id=3 (YR). The model produced AP(LR)=0.002, AP(PM)=0.84, AP(YR)=1.00. After that I changed the ids to id=1 (YR), id=2 (PM), id=3 (LR), and the model gives AP(YR)=0.002, AP(PM)=0.79, AP(LR)=0.89.
Is it treating the first class as a dummy class, or is there another reason for this? Please help me out.
The following are the changes I made in the .config file to get the average precision:
eval_config: {
  metrics_set: "pascal_voc_detection_metrics"
  use_moving_averages: false
  batch_size: 1
  num_visualizations: 20
  max_num_boxes_to_visualize: 10
  visualize_groundtruth_boxes: true
  eval_interval_secs: 30
}

How to use resnet by reading the .ckpt file of my learned weights with pytorch

In PyTorch, how can I write code that loads my .ckpt file instead of
model = torchvision.models.resnet50(pretrained=True)
Here is my attempt:
model = torchvision.models.resnet50(pretrained=False)
PATH = "/content/drive/MyDrive/Colab Notebooks/mlearning2/multi_logs/resnet_2/version_0/checkpoints/epoch=1-step=2543.ckpt"
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu')))
But it did not work, and the following error appeared:
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.0.conv3.weight", "layer1.0.bn3.weight", "layer1.0.bn3.bias", "layer1.0.bn3.running_mean", "layer1.0.bn3.running_var", "layer1.0.downsample.0.weight", "layer1.0.downsample.1.weight", "layer1.0.downsample.1.bias", "layer1.0.downsample.1.running_mean", "layer1.0.downsample.1.running_var", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.1.conv3.weight", "layer1.1.bn3.weight", "layer1.1.bn3.bias", "layer1.1.bn3.running_mean", "layer1.1.bn3.running_var", "layer1.2.conv1.weight", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.conv2.weight", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer1.2.conv3.weight", "layer1.2.bn3.weight", "layer1.2.bn3.bias", "layer1.2.bn3.running_mean", "layer1.2.bn3.running_var", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1...
Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "callbacks", "optimizer_states", "lr_schedulers", "hparams_name", "hyper_parameters".
How can I do it?
@Shai I tried to run
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu'))['state_dict'])
but got another error.
Your saved checkpoint contains not only a snapshot of the model's trained weights but also other useful information about the state of the training (e.g., the state of the optimizer).
Try selecting only the relevant part of the saved checkpoint:
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu'))['state_dict'])
Update
Based on the modification you made and the new error you received, it seems like the model that was saved is model.backbone = torchvision.models.resnet50().
You need to instantiate your model in the same manner as done during training.
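For example, if the model was indeed saved with the resnet stored as model.backbone, one way to reuse the checkpoint with a plain torchvision resnet50 is to strip that prefix from the weight keys (a sketch under that assumption, not a guaranteed fix):
import torch
import torchvision

PATH = "/content/drive/MyDrive/Colab Notebooks/mlearning2/multi_logs/resnet_2/version_0/checkpoints/epoch=1-step=2543.ckpt"

model = torchvision.models.resnet50(pretrained=False)
ckpt = torch.load(PATH, map_location=torch.device('cpu'))

# PyTorch Lightning nests the weights under 'state_dict'; if the Lightning
# module held the network as self.backbone, every key carries that prefix.
state_dict = {k.replace('backbone.', '', 1): v
              for k, v in ckpt['state_dict'].items()}
model.load_state_dict(state_dict)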

Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers

Update #1 (original question and details below):
As per the suggestion of @MatthijsHollemans below, I've tried to run this after removing dynamic_axes from the initial create_onnx step below. This removed both:
Description of image feature 'input_image' has missing or non-positive width 0.
and
Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
Unfortunately this opens up two sub-questions:
I still want to have a functional ONNX model. Is there a more appropriate way to make H and W dynamic? Or should I be saving two versions of the ONNX model, one without dynamic_axes for the CoreML conversion, and one with them for use as a valid ONNX model? (One way to do the latter is sketched at the end of this post.)
Although this solves the compilation error in Xcode (specified below), it introduces the following runtime issues:
Finalizing CVPixelBuffer 0x282f4c5a0 while lock count is 1.
[espresso] [Espresso::handle_ex_plan] exception=Invalid X-dimension 1/480 status=-7
[coreml] Error binding image input buffer input_image: -7
[coreml] Failure in bindInputsAndOutputs.
I am calling this the same way I was calling the fixed-size model, which does still work fine. The image dimensions are 640 x 480.
As specified below, the model should accept any image of 64x64 or larger.
For flexible shape models, do I need to provide the input differently in Xcode?
Original Question (parts still relevant)
I have been slowly working on converting a style transfer model from pytorch > onnx > coreml. One of the issues that has been a struggle is flexible/dynamic input + output shape.
This method (besides i/o renaming) has worked well on iOS 12 & 13 when using a static input shape.
I am using the following code to do the onnx > coreml conversion:
def create_coreml(name):
    mlmodel = convert(
        model="onnx/" + name + ".onnx",
        preprocessing_args={'is_bgr': True},
        deprocessing_args={'is_bgr': True},
        image_input_names=['input_image'],
        image_output_names=['stylized_image'],
        minimum_ios_deployment_target='13'
    )
    spec = mlmodel.get_spec()
    img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange()
    img_size_ranges.add_height_range((64, -1))
    img_size_ranges.add_width_range((64, -1))
    flexible_shape_utils.update_image_size_range(
        spec,
        feature_name='input_image',
        size_range=img_size_ranges)
    flexible_shape_utils.update_image_size_range(
        spec,
        feature_name='stylized_image',
        size_range=img_size_ranges)
    mlmodel = coremltools.models.MLModel(spec)
    mlmodel.save("mlmodel/" + name + ".mlmodel")
Although the conversion 'succeeds' there are a couple of warnings (spaces added for readability):
Translation to CoreML spec completed. Now compiling the CoreML model.
/usr/local/lib/python3.7/site-packages/coremltools/models/model.py:111:
RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was:
Error compiling model:
"Error reading protobuf spec. validator error: Description of image feature 'input_image' has missing or non-positive width 0.".
RuntimeWarning)
Model Compilation done.
/usr/local/lib/python3.7/site-packages/coremltools/models/model.py:111:
RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was:
Error compiling model:
"compiler error: Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
".
RuntimeWarning)
If I ignore these warnings and try to compile the model for the latest target (13.0), I get the following error in Xcode:
coremlc: Error: compiler error: Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
Here is what the problematic area appears to look like in Netron (screenshot not included here):
My main question is how can I get these two warnings out of the way?
Happy to provide any other details.
Thanks for any advice!
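One idea that may address the first validator error, since it complains about a width of 0 (a hedged sketch that edits the spec protobuf directly; the imageType fields come from the Core ML model spec, and 64 is an arbitrary placeholder default, not a value from the original code):
# Hypothetical workaround: give the flexible image input/output a concrete
# default size in the spec description before saving the model.
spec.description.input[0].type.imageType.width = 64
spec.description.input[0].type.imageType.height = 64
spec.description.output[0].type.imageType.width = 64
spec.description.output[0].type.imageType.height = 64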
Below is my pytorch > onnx conversion:
def create_onnx(name):
    prior = torch.load("pth/" + name + ".pth")
    model = transformer.TransformerNetwork()
    model.load_state_dict(prior)
    dummy_input = torch.zeros(1, 3, 64, 64)  # I wasn't sure what I would set the H and W to here?
    torch.onnx.export(model, dummy_input, "onnx/" + name + ".onnx",
                      verbose=True,
                      opset_version=10,
                      input_names=["input_image"],      # These are being renamed from garbled originals.
                      output_names=["stylized_image"],  # ^
                      dynamic_axes={'input_image': {2: 'height', 3: 'width'},
                                    'stylized_image': {2: 'height', 3: 'width'}})
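Regarding the first sub-question above: one straightforward option is indeed to export two ONNX files, a fixed-shape one for the Core ML conversion and a dynamic one for standalone ONNX use (a sketch of that idea, reusing model, dummy_input, and name from create_onnx above; the "_fixed"/"_dynamic" filename suffixes are placeholders):
# Fixed-shape export for the Core ML conversion path:
torch.onnx.export(model, dummy_input, "onnx/" + name + "_fixed.onnx",
                  verbose=True, opset_version=10,
                  input_names=["input_image"],
                  output_names=["stylized_image"])
# Dynamic-shape export kept as a standalone, valid ONNX model:
torch.onnx.export(model, dummy_input, "onnx/" + name + "_dynamic.onnx",
                  verbose=True, opset_version=10,
                  input_names=["input_image"],
                  output_names=["stylized_image"],
                  dynamic_axes={'input_image': {2: 'height', 3: 'width'},
                                'stylized_image': {2: 'height', 3: 'width'}})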
