Difference between predictions of exported TensorRT engine and PyTorch pth models - onnx

Problem: inference results from DeepStream and local PyTorch inference do not match (using the same PNG images).
When testing what percentage of predictions match between the engine and the .pth model, only 26% of ~180k images matched.
How I reproduce results: I save images after they go through streammux (416x416, .png format). For each image I also save the bounding box coordinates where YoloV4 detected objects. To test predictions I load each image and its bounding box coordinates, crop the object based on those coordinates, and run the resulting crop through the .pth model.
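For reference, a minimal sketch of that local check (illustrative only; it assumes OpenCV for loading, the same test_transforms defined below, integer pixel box coordinates, and a model already loaded in eval mode):
import cv2
import torch

def classify_crop(image_path, box, model, test_transforms):
    # box = (x1, y1, x2, y2) in the saved 416x416 image (illustrative layout)
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # albumentations expects RGB
    x1, y1, x2, y2 = box
    crop = img[y1:y2, x1:x2]
    tensor = test_transforms(image=crop)["image"].unsqueeze(0)  # 1x3x224x224
    with torch.no_grad():
        probs = torch.softmax(model(tensor), dim=-1)
    return int(probs.argmax()), float(probs.max())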
Version: Deepstream 5.1
Model training: I train EfficientNetB0 locally with PyTorch and use the following transformations for loading data (we are training 128 classes):
import albumentations as A
from albumentations.pytorch import ToTensorV2
train_transforms = A.Compose(
    [
        A.Resize(height=224, width=224),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.RandomGamma(gamma_limit=(75, 90), p=0.8),
        A.GridDropout(ratio=0.47, p=0.6),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
I run model inference locally with the following preprocessing:
test_transforms = A.Compose(
    [
        A.Resize(height=224, width=224),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
To export the model:
Convert the trained model to .onnx:
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0  # assuming torchvision's EfficientNet implementation

model = efficientnet_b0(pretrained=False)
pt_model = torch.load(path_to_torch_model, map_location=torch.device("cpu"))
n_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(n_features, classes)
model.load_state_dict(pt_model)
model = nn.Sequential(model, nn.Softmax(-1))
dummy_input = torch.randn(batch_size, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    path_to_onnx,
    verbose=False,
    input_names=["input_names"],
    output_names=["output_names"],
    export_params=True,
)
I checked that the converted ONNX model gives the same results as the PyTorch model.
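(For reference, a parity check along these lines can be done with onnxruntime; a minimal sketch, assuming the exported file and the input name used above:)
import numpy as np
import onnxruntime as ort
import torch

model.eval()  # match the eval-mode graph that torch.onnx.export produces
sess = ort.InferenceSession(path_to_onnx)
x = torch.randn(batch_size, 3, 224, 224)
onnx_out = sess.run(None, {"input_names": x.numpy()})[0]
with torch.no_grad():
    torch_out = model(x).numpy()
print(np.abs(onnx_out - torch_out).max())  # should be on the order of 1e-5 or smaller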
Export the .onnx to an engine file with the following command:
docker container run \
    --gpus all \
    --rm \
    --volume $(pwd):/workspace/ \
    --volume $(pwd):/data/ \
    --workdir /workspace/ \
    nvcr.io/nvidia/tensorrt:21.02-py3 \
    trtexec --explicitBatch \
        --onnx=best_23.onnx \
        --saveEngine=efficientnet.engine \
        --fp16 \
        --workspace=4096
Deepstream configuration:
RTSP stream → Streammux (reshaping to 416x416) → YoloV4 (bounding boxes) → Classification
Deepstream classification config:
[property]
gpu-id=0
offsets=103.53;116.28;123.675
net-scale-factor=0.01735207357279195
labelfile-path=../classifier/labels.txt
model-engine-file=../classifier/efficientnet.engine
infer-dims=3;224;224
network-mode=2
network-type=1
num-detected-classes=128
interval=0
classifier-threshold=0
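(As a side note on the config above: the offsets are the ImageNet channel means scaled by 255, listed here in the reverse of RGB order, and net-scale-factor is close to 1/(0.226*255), i.e. a single averaged std rather than the per-channel std used in A.Normalize. It is worth double-checking the channel order against the model-color-format DeepStream actually uses. A purely illustrative check of that arithmetic:)
# Illustrative arithmetic only: comparing the config values with the ImageNet mean/std
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]
print([round(m * 255, 3) for m in means])   # [123.675, 116.28, 103.53] -> the offsets above, reversed
print(1 / (sum(stds) / len(stds) * 255))    # ~0.0173520... -> net-scale-factor (one averaged std)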
Questions:
How can I make the preprocessing used during training in Python match the preprocessing DeepStream applies at inference? I suspect the albumentations resize uses a different interpolation than DeepStream does.
Are there any other mistakes that I haven't noticed?
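(One way to narrow this down might be to mimic DeepStream-style preprocessing locally and compare predictions against the albumentations pipeline; a minimal sketch, assuming the crop is an RGB numpy array and reusing the values from the config above. The interpolation and channel order DeepStream actually uses should be verified separately:)
import cv2
import numpy as np
import torch

def deepstream_like_preprocess(crop_rgb):
    # Bilinear resize is an assumption here; DeepStream's scaler may differ
    img = cv2.resize(crop_rgb, (224, 224), interpolation=cv2.INTER_LINEAR).astype(np.float32)
    offsets = np.array([123.675, 116.28, 103.53], dtype=np.float32)  # per-channel means, assumed RGB order
    scale = 0.01735207357279195                                      # single scale factor from the config
    img = (img - offsets) * scale
    return torch.from_numpy(img.transpose(2, 0, 1)).unsqueeze(0)     # 1x3x224x224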

Related

How can I test my own image on my CNN model?

I'm a beginner programmer trying out image classification using CNN. I'm aiming to build a model which classifies if an image is an aluminum can or not, and I want to test it with my own image.
I've resized the images with the code below:
import os
from PIL import Image
from sklearn.model_selection import train_test_split
# image_dataset_from_directory lives in tf.keras.utils (tf.keras.preprocessing in older TF versions)
from tensorflow.keras.utils import image_dataset_from_directory

# Resizing to 128x128
files = os.listdir("../input/aluminum-can-image-data/Aluminum Cans")
for f in files:
    img = Image.open("../input/aluminum-can-image-data/Aluminum Cans/" + f)
    img = img.resize((128, 128))

ds_train_ = image_dataset_from_directory(
    '../input/aluminum-can-image-data',
    labels='inferred',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
)
ds_valid_ = image_dataset_from_directory(
    '../input/aluminum-can-image-data',
    labels='inferred',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
)
ds_train, ds_valid = train_test_split(files, test_size=0.2, random_state=1)
I want to build code which, given a single image, shows the percentage of how likely that image is to be an aluminum can. Any help with the code to build this function would be highly appreciated!
Let us suppose the model you use is named "model" and you have 2 output labels: "Aluminium" and "Not Aluminium".
As you need the prediction for a single image, you have to use np.expand_dims(image, axis=0) to add a batch dimension so the input works with the model.
Code:
class_names = ["alum", "not_alum"]
prediction = model.predict(np.expand_dims(image, axis=0))
confidence = round(100 * np.max(prediction[0]), 2)
argclass = np.argmax(prediction, axis=1)
print(class_names[argclass[0]])
print(confidence)

Predicting single image using Tensorflow not being accurate

I'm trying to build a CNN model in order to classify an image, but whenever the training is done and I try to feed it a single image (from the training dataset) it always misclassifies that image.
Please take a look at the code I wrote below.
Thank you in advance.
First, I declared an Image Data Generator for both my training and testing sets:
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=20, horizontal_flip=True,
                                   validation_split=0.3)
test_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.3)
Then, I used the flow_from_directory() function to load the images:
train_generator = train_datagen.flow_from_directory(
    data_dir,
    shuffle=False,
    subset='training',
    target_size=(224, 224),
    class_mode='categorical'
)
test_generator = test_datagen.flow_from_directory(
    data_dir,
    shuffle=False,
    subset='validation',
    target_size=(224, 224),
    class_mode='categorical'
)
I then loaded a pretrained model and added a few layers to build my model:
pretrained_model = VGG16(weights="imagenet", include_top=False,
                         input_tensor=input_shape)
pretrained_model.trainable = False
model = tf.keras.Sequential([
    pretrained_model,
    Flatten(name="flatten"),
    Dense(3, activation="softmax")
])
I then trained the model :
INIT_LR = 3e-4
EPOCHS = 15
opt = Adam(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer='Adam', metrics=["accuracy"])
H = model.fit(
    train_generator,
    validation_data=test_generator,
    epochs=EPOCHS,
    verbose=1)
Then came the part to predict a single image:
I chose an image that was part of the training set, and I even overfitted the model to make sure the predictions should be correct, but it gave me wrong results for every image I fed to the model.
I tried the following ways:
image = image.load_img(url,target_size = (224, 224))
img = tf.keras.preprocessing.image.img_to_array(image)
img = np.array([img])
img = img.astype('float32') / 255.
img = tf.keras.applications.vgg16.preprocess_input(img)
This didn't work
image = cv2.imread(url)
image = cv2.normalize(image, None,beta=255, dtype=cv2.CV_32F)
image = cv2.resize(image, (224, 224))
image = np.expand_dims(image, axis=0)
This also didn't work; I tried many other ways to predict a single image, but none worked.
In the end, the only thing that worked was to create an ImageDataGenerator and flow_from_directory for this single image, but I believe that's not how it should be done.
The code img = tf.keras.applications.vgg16.preprocess_input(img) expects the original pixel values to be in the range 0 to 255 (it converts RGB to BGR and zero-centers each channel with the ImageNet means). In the previous line of code,
img = img.astype('float32') / 255.
you already rescaled the pixels, so remove that line of code. Now, to predict a single image you need to expand the dimensions with
img = np.expand_dims(img, axis=0)
In your second code effort be aware the CV2 reads in images as BGR. If your model was trained on RGB images then your predictions will be wrong. Use the code below to convert the image to RGB.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
As a side note, you can replace tf.keras.applications.vgg16.preprocess_input(img) with the function below, which scales the images to between -1 and +1 (appropriate only if the model was trained with the same scaling):
def scalar(img):
    return img / 127.5 - 1
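(Putting this together, a minimal sketch of predicting a single image in a way that mirrors the generators used for training in the question; the file path is illustrative and model is assumed to be the trained model:)
import numpy as np
import tensorflow as tf

# The single-image preprocessing must mirror the training pipeline; the question's
# generators used rescale=1./255, so the same scaling is applied here.
img = tf.keras.preprocessing.image.load_img("example.jpg", target_size=(224, 224))  # loads as RGB
x = tf.keras.preprocessing.image.img_to_array(img)
x = np.expand_dims(x, axis=0)               # add batch dimension
x = x / 255.0                               # same rescale as ImageDataGenerator(rescale=1./255)
pred = model.predict(x)
print(np.argmax(pred[0]), np.max(pred[0]))  # predicted class index and its probability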
This answer could be one starting point:
Resnet50 produces different prediction when image loading and resizing is done with OpenCV
These are possible differences (short gist):
RGB vs BGR (OpenCV loads BGR)
The interpolation method used (INTER_LINEAR vs INTER_NEAREST).
img_to_array() transforms the data type into float32 rather than uint8 which is obtained by default when loading with OpenCV.
tf.keras.applications.vgg16.preprocess_input(img): this preprocessing function can differ from the preprocessing you wrote yourself. Also note that if you did not preprocess the training images in this particular way (with preprocess_input()), poor results on the test set are expected, since the preprocessing at train and test time would differ.
Hope these observations shed some light.
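(A minimal sketch addressing those points when loading with OpenCV; url and model are assumed from the code above:)
import cv2
import numpy as np

image = cv2.imread(url)                            # OpenCV loads images as BGR
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)     # match the RGB order used by PIL/Keras loaders
image = cv2.resize(image, (224, 224), interpolation=cv2.INTER_LINEAR)
image = image.astype('float32')                    # img_to_array() also yields float32
# Apply exactly the same preprocessing here that was used during training,
# e.g. image /= 255. or tf.keras.applications.vgg16.preprocess_input(image)
image = np.expand_dims(image, axis=0)
pred = model.predict(image)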

Low Validation Score on Pretrained Alexnet from Pytorch models for ImageNet 2012 dataset

I am using pre-trained AlexNet network to validate some prior work.
The code is as follows:
import os
import torch
import torchvision
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms

model = torch.hub.load('pytorch/vision:v0.6.0', 'alexnet', pretrained=True)
model.eval()

batchsize = 50000
workers = 1
dataset_path = 'data/imagenet_2012/'

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_data = datasets.ImageFolder(
    root=os.path.join(dataset_path, 'val'),
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ]))
val_loader = torch.utils.data.DataLoader(val_data, batch_size=batchsize, num_workers=workers)

batch = next(iter(val_loader))
images, labels = batch
with torch.no_grad():
    output = model(images)
for i in output:
    out_soft = torch.nn.functional.softmax(i, dim=0)
    print(int(torch.argmax(out_soft)))
When I execute this and compare against ILSVRC2012_validation_ground_truth.txt, I get a top-1 accuracy of only 5%.
What am I doing wrong here?
Thank you.
So, Pytorch/Caffe have their own "ground truth" files, which can be obtained from here:
https://gist.github.com/ksimonyan/fd8800eeb36e276cd6f9#note
I manually tested the images in the validation folder of the Imagenet dataset against the val.txt file in the tar file provided at the link above to verify the order.
Update:
New validation accuracy based on the groundtruth in the zip file in the link:
Top_1 = 56.522%
Top_5 = 79.066%
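(For reference, a minimal sketch of how top-1/top-5 accuracy can be computed over the validation loader, assuming the validation folder is arranged into per-class subfolders so that the ImageFolder labels line up with the model's class indices:)
top1 = top5 = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        output = model(images)
        _, top5_pred = output.topk(5, dim=1)                  # indices of the 5 highest logits
        top1 += (top5_pred[:, 0] == labels).sum().item()
        top5 += (top5_pred == labels.unsqueeze(1)).any(dim=1).sum().item()
        total += labels.size(0)
print(f"Top-1: {100 * top1 / total:.3f}%  Top-5: {100 * top5 / total:.3f}%")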

How to get the prediction percentage of objects by class Tensorflow object detection

I am starting to study TensorFlow. I refer to the TensorFlow object-detection research models and execute them. It works and displays the image with the detected objects in rectangular boxes.
I need the prediction percentage for each class in the images, meaning human - some %, bird - some %, kite - some %, etc. How can I get these prediction percentages?
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]
    # Run inference
    output_dict = model(input_tensor)
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    return output_dict
def show_inference(model, image_path):
    image_np = np.array(Image.open(image_path))
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    print("---------------------------------------------------------------------------------------")
    print("---------------------------------------------------------------------------------------")
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks_reframed', None),
        use_normalized_coordinates=True,
        line_thickness=8)
    display(Image.fromarray(image_np))

for image_path in TEST_IMAGE_PATHS:
    show_inference(detection_model, image_path)
Have you tried looking at output_dict['detection_scores']?
Those scores only give the percentage for the detected class, not for all classes, so what you want doesn't come out of the box. I see two options: either look for the name of the tf.Operation in the TensorFlow graph that produces what you want and output it alongside the other outputs (the hard but fast way), or run your boxes through the classifier backbone again (e.g. if the CNN backbone is VGG16, run the crops through VGG16) (the easy but slow way).
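(As a starting point, a minimal sketch that prints the score per detection as a percentage using the existing output_dict and category_index; the 0.5 threshold is an arbitrary choice here:)
def print_detection_percentages(output_dict, category_index, threshold=0.5):
    # Each detection carries one score: the confidence of its predicted class.
    for cls, score in zip(output_dict['detection_classes'], output_dict['detection_scores']):
        if score >= threshold:
            name = category_index[cls]['name']
            print(f"{name}: {100 * score:.1f}%")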

Keras predict_generator and Image generator

How to use ImageDataGenerator and predict_generator on a single JPEG file in Keras?
I have a single JPEG and I want to predict the probability using a model trained with the model.fit_generator function.
If you just have a single .jpeg, you don't need to use the ImageDataGenerator. In the code below I'm assuming you trained your model with RGB images sized 150px x 150px.
img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.
model.predict(img_tensor)
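(The result of model.predict can then be read directly as a probability; a small follow-up sketch, assuming a single sigmoid output for binary classification:)
prob = model.predict(img_tensor)[0][0]
print(f"Probability of the positive class: {prob:.2%}")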
For more info, check out Francois Chollet's excellent Ipython Notebooks. Specifically, Line (In [2]) of https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.4-visualizing-what-convnets-learn.ipynb
In this section, he looks at the intermediate activation layers for an image that wasn't in his train_generator. He loads in a model he created in another Ipython notebook: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.2-using-convnets-with-small-datasets.ipynb
