Hyperparameter optimization in pytorch (currently with sklearn GridSearchCV) - pytorch

I am following this PyTorch tutorial (link) and wish to add grid search functionality to it via sklearn.model_selection.GridSearchCV (link), in order to optimize the hyperparameters. I struggle to understand what X and y in gs.fit(X, y) should be; per the documentation (link), X and y are supposed to have the following structure, but I have trouble figuring out how to derive them from the code. The PennFudanDataset class returns img and target in a form that does not align with the X, y I need.
Do n_samples and n_features come from the following block of code, or from the tutorial's block defining the model?
fit(X, y=None, *, groups=None, **fit_params)
Run fit with all sets of parameters.
Parameters:
    X : array-like of shape (n_samples, n_features)
        Training vector, where n_samples is the number of samples and n_features is the number of features.
    y : array-like of shape (n_samples, n_output) or (n_samples,), default=None
        Target relative to X for classification or regression; None for unsupervised learning.
Is there something else we could use instead that is easier to implement for this particular tutorial? I've read about Ray Tune (link), Optuna (link), etc., but they seem more complex than necessary. I am currently also looking into scipy.optimize.brute (link), which seems simpler.
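For what it's worth, since GridSearchCV expects tabular (X, y) data that an object-detection dataset doesn't naturally provide, a plain manual grid search over the tutorial's training loop may be the simplest route. Below is a minimal sketch, where train_and_evaluate is a hypothetical helper (not part of the tutorial) that builds the model, trains for a few epochs, and returns a validation metric:

import itertools

param_grid = {
    "lr": [0.005, 0.001],
    "momentum": [0.9, 0.95],
    "weight_decay": [0.0005, 0.0001],
}

best_score, best_params = float("-inf"), None
keys = list(param_grid)
for values in itertools.product(*(param_grid[k] for k in keys)):
    params = dict(zip(keys, values))
    score = train_and_evaluate(**params)  # hypothetical helper, returns e.g. validation mAP
    if score > best_score:
        best_score, best_params = score, params
print(best_params, best_score)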
PennFudanDataset class:
import os
import numpy as np
import torch
from PIL import Image
class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)

Related

Reduce multiclass image classification to binary classification in Pytorch

I am working on the STL-10 image dataset, which consists of 10 different classes. I want to reduce this multiclass image classification problem to binary classification, such as class 1 vs. the rest. I am using PyTorch torchvision to download and use the STL data, but I am unable to set it up as one-vs-rest.
train_data=torchvision.datasets.STL10(root='data',split='train',transform=data_transforms['train'], download=True)
test_data=torchvision.datasets.STL10(root='data',split='test',transform=data_transforms['val'], download=True)
train_dataloader = DataLoader(train_data,batch_size = 64,shuffle=True,num_workers=2)
test_dataloader = DataLoader(test_data,batch_size = 64,shuffle=True,num_workers=2)
For torchvision datasets, there is a built-in way to do this. You define a transformation function or class and pass it as the target_transform argument when creating the dataset.
torchvision.datasets.STL10(root: str, split: str = 'train', folds: Union[int, NoneType] = None, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False)
Here is a working example for reference:
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

class Multi2UniLabelTfm():
    def __init__(self, pos_label=5):
        if isinstance(pos_label, int) or isinstance(pos_label, float):
            pos_label = [pos_label,]
        self.pos_label = pos_label

    def __call__(self, y):
        if y in self.pos_label:
            return 1
        else:
            return 0

if __name__ == '__main__':
    test_tfms = transforms.Compose([
        transforms.ToTensor()
    ])
    data_transforms = {'val': test_tfms}

    # Original labels
    # target_transform = None

    # Label 5 is converted to 1. Rest are 0.
    # target_transform = Multi2UniLabelTfm(pos_label=5)

    # Labels 5, 6, 7 are converted to 1. Rest are 0.
    target_transform = Multi2UniLabelTfm(pos_label=[5, 6, 7])

    test_data = torchvision.datasets.STL10(root='data', split='test', transform=data_transforms['val'], download=True, target_transform=target_transform)
    test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True, num_workers=2)

    for idx, (x, y) in enumerate(test_dataloader):
        print(idx, y)
        if idx == 5:
            break
You need to relabel the images. Initially, the 10 classes carry labels 0 through 9. If you want binary classification, change the label of the images in your chosen category (class 1, or another) to 0, and the labels of all other categories to 1.
One way is to update the label values at runtime, before passing them to the loss function in the training loop. Let's say we want to relabel class 5 as 1 and the rest as 0:
my_class_id = 5
for imgs, labels in train_dataloader:
    labels = torch.where(labels == my_class_id, 1, 0)
    ...
You may also need to do similar relabeling for test_dataloader. Also, I am not sure about the datatype of the labels; if it's float, change accordingly.
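If you prefer not to write a transform class, the same relabeling can also be passed as a plain callable via target_transform when building the dataset (note that lambdas may not pickle with num_workers > 0 on some platforms):

pos = {5, 6, 7}
test_data = torchvision.datasets.STL10(root='data', split='test',
                                       transform=data_transforms['val'],
                                       target_transform=lambda y: int(y in pos),
                                       download=True)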

What am I missing here when using ImageFolder to get the full folder name as labels for MNIST-double dataset images?

I would like to use datasets.ImageFolder to create an image dataset.
My current image directory structure looks like this:
1: In the train images directory, I have subfolders that serve as my labels (00, 01, and so on). Each folder contains images of the double digits corresponding to that label.
Here is the code I used, followed by the output where the labels do not match the images.
Paths:
data_dir = "/home/mhamdan/hamdan/MNIST_muldigits/data/double_mnist"
train_dir = data_dir + '/train' # training_set contains training dataset
val_dir = data_dir + '/val' #contains validation dataset
test_dir = data_dir + '/test' #contains test dataset
Loading the data:
#Load the dataset with Image Folder
trainset = datasets.ImageFolder(train_dir, transform = transformation)
valset = datasets.ImageFolder(val_dir, transform = transformation)
testset = datasets.ImageFolder(test_dir, transform = transformation)
Data loaders
#define data loaders
batch_size = 32
train_loader = DataLoader(trainset, batch_size=batch_size, shuffle=True,num_workers=2)
val_loader = DataLoader(valset, batch_size=batch_size, shuffle=True,num_workers=2)
test_loader = DataLoader(testset, batch_size=batch_size,num_workers=1)
Here is the plotting of random training images
examples = enumerate(train_loader)
batch_idx, (example_data, example_targets) = next(examples)
import matplotlib.pyplot as plt
fig = plt.figure()
for i in range(6):
    plt.subplot(2, 3, i+1)
    plt.tight_layout()
    plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
    plt.title("Ground Truth: {}".format(example_targets[1]))
    plt.xticks([])
    plt.yticks([])
fig
As you can see here, the labels differ from the images (screenshot).
Each subfolder is associated with a unique label; here are the images in the 01 subdirectory (screenshot).
Last update, after using the index:
I think the problem is in printing the labels. For the plotting of random training images, the code should be:
examples = enumerate(train_loader)
batch_idx, (example_data, example_targets) = next(examples)
import matplotlib.pyplot as plt
fig = plt.figure()
for i in range(6):
    plt.subplot(2, 3, i+1)
    plt.tight_layout()
    plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
    plt.title("Ground Truth: {}".format(example_targets[i]))
    plt.xticks([])
    plt.yticks([])
fig
In your code, it was example_targets[1] instead of example_targets[i].
Here is the solution to my question: take the index mapping as a dictionary with labelsdec = trainset.class_to_idx, and extract the keys as labels/classes using this function:
def getList(dict):
    list = []
    for key in dict.keys():
        list.append(key)
    return list
classes = getList(labelsdec)
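For what it's worth, ImageFolder also exposes the sorted class names directly, so the helper above can be shortened to:

classes = trainset.classes  # same as list(trainset.class_to_idx.keys())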
Then plotting 10 images:
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

# obtain one batch of training images
data_iter = iter(train_loader)
images, lbls = next(data_iter)
images = images.numpy()  # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(10, 4))
# display 10 images
for idx in np.arange(10):
    ax = fig.add_subplot(2, 5, idx+1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[lbls[idx]])
Here is how it looks (see image).

Applying a simple transformation to get a binary image using pytorch

I'd like to binarize an image before passing it to the dataloader. I have created a dataset class which works well, but in the __getitem__() method I'd like to threshold the image:
def __getitem__(self, idx):
    # Open image, apply transforms and return with label
    img_path = os.path.join(self.dir, self.filelist[idx])
    image = Image.open(img_path)
    label = self.x_data.iloc[idx]["label"]

    # Applying transformation to the image
    if self.transforms is not None:
        image = self.transforms(image)

    # applying threshold here:
    my_threshold = 240
    image = image.point(lambda p: p < my_threshold and 255)
    image = torch.tensor(image)
    return image, label
And then I tried to invoke the dataset:
data_transformer = transforms.Compose([
    transforms.Resize((10, 10)),
    transforms.Grayscale()
    # transforms.ToTensor()
])
train_set = MyNewDataset(data_path, data_transformer, rows_train)
Since I applied the threshold to a PIL object, I need to convert the result to a tensor afterwards, but for some reason it crashes. Can somebody please assist me?
Why not apply the binarization after the conversion from PIL.Image to torch.Tensor?
class ThresholdTransform(object):
    def __init__(self, thr_255):
        self.thr = thr_255 / 255.  # input threshold for [0..255] gray level, convert to [0..1]

    def __call__(self, x):
        return (x > self.thr).to(x.dtype)  # do not change the data type
Once you have this transformation, you simply add it:
data_transformer = transforms.Compose([
    transforms.Resize((10, 10)),
    transforms.Grayscale(),
    transforms.ToTensor(),
    ThresholdTransform(thr_255=240)
])
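As a quick sanity check (a sketch, assuming the data_transformer composed above), you can verify that the output tensor only contains zeros and ones:

import numpy as np
from PIL import Image

# random grayscale test image with values in 0..255 (purely illustrative)
img = Image.fromarray(np.random.randint(0, 256, (32, 32), dtype=np.uint8))
x = data_transformer(img)  # the Compose defined above
print(x.unique())          # expected: a subset of {0., 1.}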

Keras Image Preprocessing

My training images are downscaled versions of their associated HR images. Thus, the input and output images aren't the same dimension. For now, I'm using a hand-crafted sample of 13 images, but eventually I would like to be able to use my 500-ish HR (high-resolution) image dataset. That dataset, however, does not have images of the same dimension, so I'm guessing I'll have to crop them in order to obtain a uniform dimension.
I currently have this code set up: it takes a bunch of 512x512x3 images and applies a few transformations to augment the data (flips). I thus obtain a basic set of 39 images in their HR form, and then I downscale them by a factor of 4, obtaining my training set, which consists of 39 images of dimension 128x128x3.
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.image as mpimg
import skimage
from skimage import transform
from constants import data_path
from constants import img_width
from constants import img_height
from model import setUpModel
def setUpImages():
    train = []
    finalTest = []

    sample_amnt = 11
    max_amnt = 13

    # Extracting images (512x512)
    for i in range(sample_amnt):
        train.append(mpimg.imread(data_path + str(i) + '.jpg'))
    for i in range(max_amnt - sample_amnt):
        finalTest.append(mpimg.imread(data_path + str(i + sample_amnt) + '.jpg'))

    # # TODO: https://keras.io/preprocessing/image/
    # ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False,
    #                    samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06, rotation_range=0,
    #                    width_shift_range=0.0, height_shift_range=0.0, brightness_range=None, shear_range=0.0,
    #                    zoom_range=0.0, channel_shift_range=0.0, fill_mode='nearest', cval=0.0, horizontal_flip=False,
    #                    vertical_flip=False, rescale=None, preprocessing_function=None, data_format=None,
    #                    validation_split=0.0, dtype=None)

    # Augmenting data
    trainData = dataAugmentation(train)
    testData = dataAugmentation(finalTest)

    setUpData(trainData, testData)

def setUpData(trainData, testData):
    # print(type(trainData))                       # <class 'numpy.ndarray'>
    # print(len(trainData))                        # 64
    # print(type(trainData[0]))                    # <class 'numpy.ndarray'>
    # print(trainData[0].shape)                    # (1400, 1400, 3)
    # print(trainData[len(trainData)//2-1].shape)  # (1400, 1400, 3)
    # print(trainData[len(trainData)//2].shape)    # (350, 350, 3)
    # print(trainData[len(trainData)-1].shape)     # (350, 350, 3)

    # TODO: subtract the mean of all images from all images

    # Separating the training data
    Y_train = trainData[:len(trainData)//2]  # First half is the unaltered data
    X_train = trainData[len(trainData)//2:]  # Second half is the deteriorated data

    # Separating the testing data
    Y_test = testData[:len(testData)//2]  # First half is the unaltered data
    X_test = testData[len(testData)//2:]  # Second half is the deteriorated data

    # Adjusting shapes for Keras input  # TODO: make into a function?
    X_train = np.array([x for x in X_train])
    Y_train = np.array([x for x in Y_train])
    Y_test = np.array([x for x in Y_test])
    X_test = np.array([x for x in X_test])

    # # Sanity check: display four images (2x HR/LR)
    # plt.figure(figsize=(10, 10))
    # for i in range(2):
    #     plt.subplot(2, 2, i + 1)
    #     plt.imshow(Y_train[i], cmap=plt.cm.binary)
    # for i in range(2):
    #     plt.subplot(2, 2, i + 1 + 2)
    #     plt.imshow(X_train[i], cmap=plt.cm.binary)
    # plt.show()

    setUpModel(X_train, Y_train, X_test, Y_test)

# TODO: possibly remove once Keras Preprocessing is integrated?
def dataAugmentation(dataToAugment):
    print("Starting to augment data")
    arrayToFill = []

    # faster computation with values between 0 and 1?
    dataToAugment = np.divide(dataToAugment, 255.)

    # TODO: switch from RGB channels to CbCrY
    # # TODO: try grayscale
    # trainingData = np.array(
    #     [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(350, 350, 1) for x in trainingData])
    # validateData = np.array(
    #     [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(1400, 1400, 1) for x in validateData])

    # adding the normal images (8)
    for i in range(len(dataToAugment)):
        arrayToFill.append(dataToAugment[i])
    # vertical axis flip (-> 16)
    for i in range(len(arrayToFill)):
        arrayToFill.append(np.fliplr(arrayToFill[i]))
    # horizontal axis flip (-> 32)
    for i in range(len(arrayToFill)):
        arrayToFill.append(np.flipud(arrayToFill[i]))
    # downsizing by scale of 4 (-> 64 images of 128x128x3)
    for i in range(len(arrayToFill)):
        arrayToFill.append(skimage.transform.resize(
            arrayToFill[i],
            (img_width/4, img_height/4),
            mode='reflect',
            anti_aliasing=True))

    # # Sanity check: display the images
    # plt.figure(figsize=(10, 10))
    # for i in range(64):
    #     plt.subplot(8, 8, i + 1)
    #     plt.imshow(arrayToFill[i], cmap=plt.cm.binary)
    # plt.show()

    return np.array(arrayToFill)
My question is: in my case, can I use the preprocessing tools that Keras offers? I would ideally like to be able to input my varying-sized, high-quality images, crop them (not downsize them) to 512x512x3, and augment them through flips and whatnot. Subtracting the mean would also be part of what I'd like to achieve. That set would represent my validation set.
Reusing the validation set, I want to downscale all the images by a factor of 4, and that would generate my training set.
Those two sets could then be split appropriately to obtain, ultimately, the famous X_train Y_train X_test Y_test.
I'm just hesitant about throwing out all the work I've done so far to preprocess my mini sample, but I'm thinking if it can all be done with a single built-in function, maybe I should give that a go.
This is my first ML project, hence my limited understanding of Keras, and the documentation isn't always the clearest. Since I'm working with an X and Y that differ in size, maybe this function doesn't apply to my project.
Thank you! :)
Yes, you can use the Keras preprocessing functions. Below are some snippets to help you:
def cropping_function(x):
    ...
    return cropped_image

X_image_gen = ImageDataGenerator(preprocessing_function=cropping_function,
                                 horizontal_flip=True,
                                 vertical_flip=True)
X_train_flow = X_image_gen.flow(X_train, batch_size=16, seed=1)

Y_image_gen = ImageDataGenerator(horizontal_flip=True,
                                 vertical_flip=True)
Y_train_flow = Y_image_gen.flow(y_train, batch_size=16, seed=1)

train_flow = zip(X_train_flow, Y_train_flow)
model.fit_generator(train_flow)
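The body of cropping_function is left open above; here is a minimal sketch, assuming the inputs are HxWxC numpy arrays at least 512x512 in size. Note that Keras documents preprocessing_function as returning an array of the same shape as its input, so this fits best when the arrays passed to flow() are already uniformly sized:

import numpy as np

def cropping_function(x):
    # random 512x512 crop from an HxWxC array (illustrative sketch only)
    top = np.random.randint(0, x.shape[0] - 512 + 1)
    left = np.random.randint(0, x.shape[1] - 512 + 1)
    return x[top:top + 512, left:left + 512]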
Christof Henkel's suggestion is very clean and nice. I would just like to offer another way to do it using imgaug, a convenient way to augment images in lots of different ways. It's useful if you want more implemented augmentations or if you ever need to use an ML library other than Keras.
It unfortunately doesn't have a built-in way to make crops like that, but it allows implementing custom functions. Here is an example function for generating random crops of a set size from an image that's at least as big as the chosen crop size:
from imgaug import augmenters as iaa

def random_crop(images, random_state, parents, hooks):
    crop_h, crop_w = 128, 128
    new_images = []
    for img in images:
        if (img.shape[0] >= crop_h) and (img.shape[1] >= crop_w):
            rand_h = np.random.randint(0, img.shape[0] - crop_h)
            rand_w = np.random.randint(0, img.shape[1] - crop_w)
            new_images.append(img[rand_h:rand_h+crop_h, rand_w:rand_w+crop_w])
        else:
            new_images.append(np.zeros((crop_h, crop_w, 3)))
    return np.array(new_images)

def keypoints_dummy(keypoints_on_images, random_state, parents, hooks):
    return keypoints_on_images

cropper = iaa.Lambda(func_images=random_crop, func_keypoints=keypoints_dummy)
You can then combine this function with any other built-in imgaug function, for example the flip functions you're already using, like this:
seq = iaa.Sequential([cropper, iaa.Fliplr(0.5), iaa.Flipud(0.5)])
This function can then generate lots of different crops from each image. Here is an example image with some possible results (note that it would produce actual (128, 128, 3) images; they are just merged into one image here for visualization) (image):
Your image set could then be generated by:
import glob
import skimage.io

crops_per_image = 10
images = [skimage.io.imread(path) for path in glob.glob('train_data/*.jpg')]
augs = np.array([seq.augment_image(img)/255 for img in images for _ in range(crops_per_image)])
It would also be simple to add new functions to be applied to the images, for example the mean-removal function you mentioned.
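For instance, a per-image mean subtraction could be wired in the same way (a sketch, reusing the keypoints_dummy helper from above and assuming float image arrays):

def subtract_mean(images, random_state, parents, hooks):
    # subtract each image's own mean; swap in a dataset-wide mean if preferred
    return np.array([img - img.mean() for img in images])

mean_sub = iaa.Lambda(func_images=subtract_mean, func_keypoints=keypoints_dummy)
seq = iaa.Sequential([cropper, iaa.Fliplr(0.5), iaa.Flipud(0.5), mean_sub])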
Here's another way of performing random and center cropping before resizing, using the native ImageDataGenerator and flow_from_directory. You can add it as a preprocess_crop.py module to your project.
It first resizes the image preserving the aspect ratio and then performs the crop. The resized image size is based on crop_fraction, which is hardcoded but can be changed. See the crop_fraction = 0.875 line: 0.875 appears to be the most common fraction, e.g. a 224px crop from a 256px image.
Note that the implementation works by monkey-patching the keras_preprocessing.image.utils.load_img function, as I couldn't find any other way to perform the crop before resizing without rewriting many other classes above it.
Due to these limitations, the cropping method is encoded into the interpolation field. Methods are delimited by : where the first part is the interpolation and the second is the crop, e.g. lanczos:random. Supported crop methods are none, center, random. When no crop method is specified, none is assumed.
How to use it
Just drop the preprocess_crop.py into your project to enable cropping. The example below shows how you can use random cropping for the training and center cropping for validation:
import preprocess_crop
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input

#...

# Training with random crop
train_datagen = ImageDataGenerator(
    rotation_range=20,
    channel_shift_range=20,
    horizontal_flip=True,
    preprocessing_function=preprocess_input
)
train_img_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    interpolation='lanczos:random',  # <--------- random crop
    shuffle=True
)

# Validation with center crop
validate_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input
)
validate_img_generator = validate_datagen.flow_from_directory(
    validate_dir,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    interpolation='lanczos:center',  # <--------- center crop
    shuffle=False
)
Here's the preprocess_crop.py file to include in your project:
import random
import keras_preprocessing.image
def load_and_crop_img(path, grayscale=False, color_mode='rgb', target_size=None,
                      interpolation='nearest'):
    """Wraps keras_preprocessing.image.utils.load_img() and adds cropping.
    The cropping method is encoded in the interpolation argument.

    # Arguments
        path: Path to image file.
        color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
            The desired image format.
        target_size: Either `None` (default to original size)
            or tuple of ints `(img_height, img_width)`.
        interpolation: Interpolation and crop methods used to resample and crop the image
            if the target size is different from that of the loaded image.
            Methods are delimited by ":" where the first part is interpolation and the second is crop,
            e.g. "lanczos:random".
            Supported interpolation methods are "nearest", "bilinear", "bicubic", "lanczos",
            "box", "hamming". By default, "nearest" is used.
            Supported crop methods are "none", "center", "random".

    # Returns
        A PIL Image instance.

    # Raises
        ImportError: if PIL is not available.
        ValueError: if interpolation method is not supported.
    """
    # Decode interpolation string. Allowed crop methods: none, center, random
    interpolation, crop = interpolation.split(":") if ":" in interpolation else (interpolation, "none")

    if crop == "none":
        return keras_preprocessing.image.utils.load_img(path,
                                                        grayscale=grayscale,
                                                        color_mode=color_mode,
                                                        target_size=target_size,
                                                        interpolation=interpolation)

    # Load original size image using Keras
    img = keras_preprocessing.image.utils.load_img(path,
                                                   grayscale=grayscale,
                                                   color_mode=color_mode,
                                                   target_size=None,
                                                   interpolation=interpolation)

    # Crop fraction of total image
    crop_fraction = 0.875

    if target_size is not None:
        target_width = target_size[1]
        target_height = target_size[0]

        if img.size != (target_width, target_height):
            if crop not in ["center", "random"]:
                raise ValueError('Invalid crop method {} specified.'.format(crop))
            if interpolation not in keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS:
                raise ValueError(
                    'Invalid interpolation method {} specified. Supported '
                    'methods are {}'.format(interpolation,
                                            ", ".join(keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS.keys())))
            resample = keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS[interpolation]

            width, height = img.size

            # Resize keeping aspect ratio;
            # the result should be no smaller than the target size, including the crop fraction overhead
            target_size_before_crop = (target_width / crop_fraction, target_height / crop_fraction)
            ratio = max(target_size_before_crop[0] / width, target_size_before_crop[1] / height)
            target_size_before_crop_keep_ratio = int(width * ratio), int(height * ratio)
            img = img.resize(target_size_before_crop_keep_ratio, resample=resample)

            width, height = img.size

            if crop == "center":
                left_corner = int(round(width/2)) - int(round(target_width/2))
                top_corner = int(round(height/2)) - int(round(target_height/2))
                return img.crop((left_corner, top_corner, left_corner + target_width, top_corner + target_height))
            elif crop == "random":
                left_shift = random.randint(0, int((width - target_width)))
                down_shift = random.randint(0, int((height - target_height)))
                return img.crop((left_shift, down_shift, target_width + left_shift, target_height + down_shift))

    return img

# Monkey patch
keras_preprocessing.image.iterator.load_img = load_and_crop_img

Speed up predictions for Object Detection

I am struggling to get good FPS for my predictions. I am running my predictions on a Tesla K80, and I'd like to speed them up by at least a factor of 20. Here is my code:
def load_detection_graph(PATH_TO_CKPT):
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')
    return detection_graph

def load_image_into_numpy_array(image):
    '''
    convert image to numpy arrays
    '''
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

def run_inference_for_single_image(image, graph, filename):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for a single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})

            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['filename'] = filename
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict

def predict_image(TEST_IMAGE_PATHS, PATH_TO_CKPT, category_index, save_path):
    detection_graph = load_detection_graph(PATH_TO_CKPT)
    prediction_dict = defaultdict()
    start_time = time.time()
    for image_path in TEST_IMAGE_PATHS:
        toc = time.time()
        filename = image_path
        image = Image.open(image_path)
        # the array based representation of the image will be used later in order to prepare the
        # result image with boxes and labels on it.
        image_np = load_image_into_numpy_array(image)
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        image_np_expanded = np.expand_dims(image_np, axis=0)
        # Actual detection.
        output_dict = run_inference_for_single_image(image_np, detection_graph, filename)
        # Visualization of the results of a detection.
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks'),
            use_normalized_coordinates=True,
            line_thickness=1)
        prediction_dict[filename] = output_dict
        plt.figure(figsize=(8, 6), dpi=100)
        plt.imshow(image_np)
        plt.savefig(save_path + '{}'.format(filename))
        tic = time.time()
        print('{0} saved in {1:.2f}sec'.format(filename, tic - toc))
    end_time = time.time()
    print('{0:.2f}min to predict all images'.format((end_time - start_time) / 60))
    with open('../predictions/predictions.pickle', 'wb') as f:
        pickle.dump(prediction_dict, f)
    return prediction_dict
Right now I am getting about 1.8 sec per detection. That includes saving the image and drawing the bounding boxes. I do not need to save the image or draw bounding boxes; I just need the output_dict. Any advice on how to speed this up?
Session creation is the most costly operation; don't re-create it every time, try to reuse the session object.
Check this: run_inference_for_single_image(image, graph) - Tensorflow, object detection (link).
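Concretely, here is a sketch of that restructuring (TF1-style, reusing the names from the code above; the mask handling is omitted for brevity): build the session and tensor handles once, and only call sess.run inside the loop:

def make_inference_fn(graph):
    # create the session and look up tensor handles a single time
    sess = tf.Session(graph=graph)
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    tensor_dict = {key: graph.get_tensor_by_name(key + ':0')
                   for key in ['num_detections', 'detection_boxes',
                               'detection_scores', 'detection_classes']}
    def infer(image_np):
        # per-image work is now just one sess.run call
        return sess.run(tensor_dict,
                        feed_dict={image_tensor: np.expand_dims(image_np, 0)})
    return infer

# build once, call many times
infer = make_inference_fn(load_detection_graph(PATH_TO_CKPT))
for image_path in TEST_IMAGE_PATHS:
    output_dict = infer(load_image_into_numpy_array(Image.open(image_path)))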
I observed that using skimage.io.imread() or cv2.imread() is pretty fast for loading images. These functions load images directly as numpy arrays, so you can skip image = Image.open(image_path) and image_np = load_image_into_numpy_array(image). Just make sure image_tensor in sess.run gets the correct dimensions.
Also, skimage and OpenCV are faster than matplotlib for saving images.
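A sketch of that swap (note that cv2.imread returns BGR, so convert to RGB before feeding the detector, and convert back before writing; paths follow the code above):

import cv2

image_np = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)  # uint8 HxWx3 numpy array
output_dict = run_inference_for_single_image(image_np, detection_graph, image_path)
cv2.imwrite(save_path + filename, cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR))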
