Tensorflow Endless eval [duplicate] - python-3.x

i am loading the cifar-10 data set , the methods adds the data to tensor array , so to access the data i used .eval() with session , on a normal tf constant it return the value , but on the labels and the train set which are tf array it wont
1- i am using docker tensorflow-jupyter
2- it uses python 3
3- the batch file must be added to data folder
i am using the first batch [data_batch_1.bin]from this file
http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
As notebook:
https://drive.google.com/open?id=0B_AFMME1kY1obkk1YmJHcjV0ODA
The code[As in tensorflow site but modified to read 1 patch] [check the last 7 lines for the data loading] :
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import urllib
import tensorflow as tf
from six.moves import xrange # pylint: disable=redefined-builtin
# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 5000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 1000
IMAGE_SIZE = 32
def _generate_image_and_label_batch(image, label, min_queue_examples,
batch_size, shuffle):
"""Construct a queued batch of images and labels.
Args:
image: 3-D Tensor of [height, width, 3] of type.float32.
label: 1-D Tensor of type.int32
min_queue_examples: int32, minimum number of samples to retain
in the queue that provides of batches of examples.
batch_size: Number of images per batch.
shuffle: boolean indicating whether to use a shuffling queue.
Returns:
images: Images. 4D tensor of [batch_size, height, width, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
# Create a queue that shuffles the examples, and then
# read 'batch_size' images + labels from the example queue.
num_preprocess_threads = 2
if shuffle:
images, label_batch = tf.train.shuffle_batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size,
min_after_dequeue=min_queue_examples)
else:
images, label_batch = tf.train.batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size)
# Display the training images in the visualizer.
tf.image_summary('images', images)
return images, tf.reshape(label_batch, [batch_size])
def read_cifar10(filename_queue):
"""Reads and parses examples from CIFAR10 data files.
Recommendation: if you want N-way read parallelism, call this function
N times. This will give you N independent Readers reading different
files & positions within those files, which will give better mixing of
examples.
Args:
filename_queue: A queue of strings with the filenames to read from.
Returns:
An object representing a single example, with the following fields:
height: number of rows in the result (32)
width: number of columns in the result (32)
depth: number of color channels in the result (3)
key: a scalar string Tensor describing the filename & record number
for this example.
label: an int32 Tensor with the label in the range 0..9.
uint8image: a [height, width, depth] uint8 Tensor with the image data
"""
class CIFAR10Record(object):
pass
result = CIFAR10Record()
# Dimensions of the images in the CIFAR-10 dataset.
# See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
# input format.
label_bytes = 1 # 2 for CIFAR-100
result.height = 32
result.width = 32
result.depth = 3
image_bytes = result.height * result.width * result.depth
# Every record consists of a label followed by the image, with a
# fixed number of bytes for each.
record_bytes = label_bytes + image_bytes
# Read a record, getting filenames from the filename_queue. No
# header or footer in the CIFAR-10 format, so we leave header_bytes
# and footer_bytes at their default of 0.
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
result.key, value = reader.read(filename_queue)
# Convert from a string to a vector of uint8 that is record_bytes long.
record_bytes = tf.decode_raw(value, tf.uint8)
# The first bytes represent the label, which we convert from uint8->int32.
result.label = tf.cast(
tf.slice(record_bytes, [0], [label_bytes]), tf.int32)
# The remaining bytes after the label represent the image, which we reshape
# from [depth * height * width] to [depth, height, width].
depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
[result.depth, result.height, result.width])
# Convert from [depth, height, width] to [height, width, depth].
result.uint8image = tf.transpose(depth_major, [1, 2, 0])
return result
def inputs(eval_data, data_dir, batch_size):
"""Construct input for CIFAR evaluation using the Reader ops.
Args:
eval_data: bool, indicating if one should use the train or eval data set.
data_dir: Path to the CIFAR-10 data directory.
batch_size: Number of images per batch.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
filenames=[];
filenames.append(os.path.join(data_dir, 'data_batch_1.bin') )
num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
print(filenames)
# Create a queue that produces the filenames to read.
filename_queue = tf.train.string_input_producer(filenames)
# Read examples from files in the filename queue.
read_input = read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for evaluation.
# Crop the central [height, width] of the image.
resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
width, height)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_whitening(resized_image)
# Ensure that the random shuffling has good mixing properties.
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
min_fraction_of_examples_in_queue)
# Generate a batch of images and labels by building up a queue of examples.
return _generate_image_and_label_batch(float_image, read_input.label,
min_queue_examples, batch_size,
shuffle=False)
sess = tf.InteractiveSession()
train_data,train_labels = inputs(False,"data",6000)
print (train_data,train_labels)
train_data=train_data.eval()
train_labels=train_labels.eval()
print(train_data)
print(train_labels)
sess.close()

You must call tf.train.start_queue_runners(sess) before you call train_data.eval() or train_labels.eval().
This is a(n unfortunate) consequence of how TensorFlow input pipelines are implemented: the tf.train.string_input_producer(), tf.train.shuffle_batch(), and tf.train.batch() functions internally create queues that buffer records between different stages in the input pipeline. The tf.train.start_queue_runners() call tells TensorFlow to start fetching records into these buffers; without calling it the buffers remain empty and eval() hangs indefinitely.

Related

What would be the equivalent of keras.layers.Masking in pytorch?

I have time-series sequences which I needed to keep the length of sequences fixed to a number by padding zeroes into matrix and using keras.layers.Masking in keras I could neglect those padded zeros for further computations, I am wondering how could it be done in Pytorch?
Either I need to do the padding in pytroch and pytorch can't handle the sequences with varying lengths what is the equivalent to Masking layer of keras in pytorch, or if pytorch handles the sequences with varying lengths, how could it be done?
You can use PackedSequence class as equivalent to keras masking. you can find more features at torch.nn.utils.rnn
Here putting example from packing for variable-length sequence inputs for rnn
import torch
import torch.nn as nn
from torch.autograd import Variable
batch_size = 3
max_length = 3
hidden_size = 2
n_layers =1
# container
batch_in = torch.zeros((batch_size, 1, max_length))
#data
vec_1 = torch.FloatTensor([[1, 2, 3]])
vec_2 = torch.FloatTensor([[1, 2, 0]])
vec_3 = torch.FloatTensor([[1, 0, 0]])
batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3
batch_in = Variable(batch_in)
seq_lengths = [3,2,1] # list of integers holding information about the batch size at each sequence step
# pack it
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
>>> pack
PackedSequence(data=Variable containing:
1 2 3
1 2 0
1 0 0
[torch.FloatTensor of size 3x3]
, batch_sizes=[3])
# initialize
rnn = nn.RNN(max_length, hidden_size, n_layers, batch_first=True)
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))
#forward
out, _ = rnn(pack, h0)
# unpack
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out)
>>> unpacked
Variable containing:
(0 ,.,.) =
-0.7883 -0.7972
0.3367 -0.6102
0.1502 -0.4654
[torch.FloatTensor of size 1x3x2]
more you would find this article useful. [Jum to Title - "How the PackedSequence object works"] - link
You can use a packed sequence to mask a timestep in the sequence dimension:
batch_mask = ... # boolean mask e.g. (seq x batch)
# move `padding` at right place then it will be cut when packing
compact_seq = torch.zeros_like(x)
for i, seq_len in enumerate(batch_mask.sum(0)):
compact_seq[:seq_len, i] = x[batch_mask[:,i],i]
# pack in sequence dimension (the number of agents)
packed_x = pack_padded_sequence(compact_seq, batch_mask.sum(0).cpu().numpy(), enforce_sorted=False)
packed_scores, rnn_hxs = nn.GRU(packed_x, rnn_hxs)
# restore sequence dimension
scores, _ = pad_packed_sequence(packed_scores)
# restore order, moving padding in its place
scores = torch.zeros((*batch_mask.shape,scores.size(-1))).to(scores.device).masked_scatter(batch_mask.unsqueeze(-1), scores)
instead use a mask select/scatter to mask in the batch dimension:
batch_mask = torch.any(x, -1).unsqueeze(-1) # boolean mask (batch,1)
batch_x = torch.masked_select(x, batch_mask).reshape(-1, x.size(-1))
batch_rnn_hxs = torch.masked_select(rnn_hxs, batch_mask).reshape(-1, rnn_hxs.size(-1))
batch_rnn_hxs = nn.GRUCell(batch_x, batch_rnn_hxs)
rnn_hxs = rnn_hxs.masked_scatter(batch_mask, batch_rnn_hxs) # restore batch
Note that using scatter function is safe for gradient backpropagation

Overlapped predictions on segmented image

Context and examples of symptoms
I am using a neural network to do super-resolution (increase the resolution of images). However, since an image can be big, I need to segment it in multiple smaller images and make predictions on each one of those separately before merging the result back together.
Here are examples of what this gives me:
Example 1: you can see a subtle vertical line passing through the shoulder of the skier in the output picture.
Example 2: once you start seeing them, you'll notice that the subtle lines are forming squares throughout the whole image (remnants of the way I segmented the image for individual predictions).
Example 3: you can clearly see the vertical line crossing the lake.
Source of the problem
Basically, my network makes poor predictions along the edges, which I believe is normal since there is less "surrounding" information.
Source code
import numpy as np
import matplotlib.pyplot as plt
import skimage.io
from keras.models import load_model
from constants import verbosity, save_dir, overlap, \
model_name, tests_path, input_width, input_height
from utils import float_im
def predict(args):
model = load_model(save_dir + '/' + args.model)
image = skimage.io.imread(tests_path + args.image)[:, :, :3] # removing possible extra channels (Alpha)
print("Image shape:", image.shape)
predictions = []
images = []
crops = seq_crop(image) # crops into multiple sub-parts the image based on 'input_' constants
for i in range(len(crops)): # amount of vertical crops
for j in range(len(crops[0])): # amount of horizontal crops
current_image = crops[i][j]
images.append(current_image)
print("Moving on to predictions. Amount:", len(images))
for p in range(len(images)):
if p%3 == 0 and verbosity == 2:
print("--prediction #", p)
# Hack because GPU can only handle one image at a time
input_img = (np.expand_dims(images[p], 0)) # Add the image to a batch where it's the only member
predictions.append(model.predict(input_img)[0]) # returns a list of lists, one for each image in the batch
return predictions, image, crops
def show_pred_output(input, pred):
plt.figure(figsize=(20, 20))
plt.suptitle("Results")
plt.subplot(1, 2, 1)
plt.title("Input : " + str(input.shape[1]) + "x" + str(input.shape[0]))
plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.subplot(1, 2, 2)
plt.title("Output : " + str(pred.shape[1]) + "x" + str(pred.shape[0]))
plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.show()
# adapted from https://stackoverflow.com/a/52463034/9768291
def seq_crop(img):
"""
To crop the whole image in a list of sub-images of the same size.
Size comes from "input_" variables in the 'constants' (Evaluation).
Padding with 0 the Bottom and Right image.
:param img: input image
:return: list of sub-images with defined size
"""
width_shape = ceildiv(img.shape[1], input_width)
height_shape = ceildiv(img.shape[0], input_height)
sub_images = [] # will contain all the cropped sub-parts of the image
for j in range(height_shape):
horizontal = []
for i in range(width_shape):
horizontal.append(crop_precise(img, i*input_width, j*input_height, input_width, input_height))
sub_images.append(horizontal)
return sub_images
def crop_precise(img, coord_x, coord_y, width_length, height_length):
"""
To crop a precise portion of an image.
When trying to crop outside of the boundaries, the input to padded with zeros.
:param img: image to crop
:param coord_x: width coordinate (top left point)
:param coord_y: height coordinate (top left point)
:param width_length: width of the cropped portion starting from coord_x
:param height_length: height of the cropped portion starting from coord_y
:return: the cropped part of the image
"""
tmp_img = img[coord_y:coord_y + height_length, coord_x:coord_x + width_length]
return float_im(tmp_img) # From [0,255] to [0.,1.]
# from https://stackoverflow.com/a/17511341/9768291
def ceildiv(a, b):
return -(-a // b)
# adapted from https://stackoverflow.com/a/52733370/9768291
def reconstruct(predictions, crops):
# unflatten predictions
def nest(data, template):
data = iter(data)
return [[next(data) for _ in row] for row in template]
if len(crops) != 0:
predictions = nest(predictions, crops)
H = np.cumsum([x[0].shape[0] for x in predictions])
W = np.cumsum([x.shape[1] for x in predictions[0]])
D = predictions[0][0]
recon = np.empty((H[-1], W[-1], D.shape[2]), D.dtype)
for rd, rs in zip(np.split(recon, H[:-1], 0), predictions):
for d, s in zip(np.split(rd, W[:-1], 1), rs):
d[...] = s
return recon
if __name__ == '__main__':
print(" - ", args)
preds, original, crops = predict(args) # returns the predictions along with the original
enhanced = reconstruct(preds, crops) # reconstructs the enhanced image from predictions
plt.imsave('output/' + args.save, enhanced, cmap=plt.cm.gray)
show_pred_output(original, enhanced)
The question (what I want)
There are many obvious naive approaches to solving this problem, but I'm convinced there must be a very concise way of doing it: how do I add an overlap_amount variable which would allow me to make overlapped predictions, thus discarding the "edge parts" of each sub-image ("segments") and replacing it with the result of the predictions on the segments surrounding it (since they would not contain "edge-predictions")?
I, of course, want to minimize the amount of "useless" predictions (pixels to be discarded). It might also be worth noting that the input segments produce an output segment which is 4 times bigger (i.e. if it was a 20x20 pixels image, you now get a 80x80 pixels image as output).
I solved a similiar problem by moving inference into the CPU. It was much, much slower but at least in my case solved the patch border problems better than overlapping ROI voting- or discarding based approaches I also tested.
Assuming you are using the Tensorflow backend:
from tensorflow.python import device
with device('cpu:0')
prediction = model.predict(...)
Of course assuming that you have enough RAM to fit your model. Comment below if that is not the case and I'll check out if there's something in my code that could be used here.
Solved it through a naive approach. It could be much better, but at least this works.
The process
Basically, it takes the initial image, then adds a padding around it, then crops it in multiple sub-images which are all lined up into an array. The crops are done so that all images overlap their surrounding neighbours as well.
Then, each image is fed into the network and the predictions are collected (4x on the resolution of the image, basically, in this case). When reconstructing the image, each prediction is taken individually and it's edge is cropped out (since it contains errors). The cropping is done so that the gluing of all the predictions ends up with no overlap, and only the middle parts of the predictions coming from the neural network are stuck together.
Finally, the surrounding padding is removed.
Result
No more line! :D
Code
import numpy as np
import matplotlib.pyplot as plt
import skimage.io
from keras.models import load_model
from constants import verbosity, save_dir, overlap, \
model_name, tests_path, input_width, input_height, scale_fact
from utils import float_im
def predict(args):
"""
Super-resolution on the input image using the model.
:param args:
:return:
'predictions' contains an array of every single cropped sub-image once enhanced (the outputs of the model).
'image' is the original image, untouched.
'crops' is the array of every single cropped sub-image that will be used as input to the model.
"""
model = load_model(save_dir + '/' + args.model)
image = skimage.io.imread(tests_path + args.image)[:, :, :3] # removing possible extra channels (Alpha)
print("Image shape:", image.shape)
predictions = []
images = []
# Padding and cropping the image
overlap_pad = (overlap, overlap) # padding tuple
pad_width = (overlap_pad, overlap_pad, (0, 0)) # assumes color channel as last
padded_image = np.pad(image, pad_width, 'constant') # padding the border
crops = seq_crop(padded_image) # crops into multiple sub-parts the image based on 'input_' constants
# Arranging the divided image into a single-dimension array of sub-images
for i in range(len(crops)): # amount of vertical crops
for j in range(len(crops[0])): # amount of horizontal crops
current_image = crops[i][j]
images.append(current_image)
print("Moving on to predictions. Amount:", len(images))
upscaled_overlap = overlap * 2
for p in range(len(images)):
if p % 3 == 0 and verbosity == 2:
print("--prediction #", p)
# Hack due to some GPUs that can only handle one image at a time
input_img = (np.expand_dims(images[p], 0)) # Add the image to a batch where it's the only member
pred = model.predict(input_img)[0] # returns a list of lists, one for each image in the batch
# Cropping the useless parts of the overlapped predictions (to prevent the repeated erroneous edge-prediction)
pred = pred[upscaled_overlap:pred.shape[0]-upscaled_overlap, upscaled_overlap:pred.shape[1]-upscaled_overlap]
predictions.append(pred)
return predictions, image, crops
def show_pred_output(input, pred):
plt.figure(figsize=(20, 20))
plt.suptitle("Results")
plt.subplot(1, 2, 1)
plt.title("Input : " + str(input.shape[1]) + "x" + str(input.shape[0]))
plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.subplot(1, 2, 2)
plt.title("Output : " + str(pred.shape[1]) + "x" + str(pred.shape[0]))
plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.show()
# adapted from https://stackoverflow.com/a/52463034/9768291
def seq_crop(img):
"""
To crop the whole image in a list of sub-images of the same size.
Size comes from "input_" variables in the 'constants' (Evaluation).
Padding with 0 the Bottom and Right image.
:param img: input image
:return: list of sub-images with defined size (as per 'constants')
"""
sub_images = [] # will contain all the cropped sub-parts of the image
j, shifted_height = 0, 0
while shifted_height < (img.shape[0] - input_height):
horizontal = []
shifted_height = j * (input_height - overlap)
i, shifted_width = 0, 0
while shifted_width < (img.shape[1] - input_width):
shifted_width = i * (input_width - overlap)
horizontal.append(crop_precise(img,
shifted_width,
shifted_height,
input_width,
input_height))
i += 1
sub_images.append(horizontal)
j += 1
return sub_images
def crop_precise(img, coord_x, coord_y, width_length, height_length):
"""
To crop a precise portion of an image.
When trying to crop outside of the boundaries, the input to padded with zeros.
:param img: image to crop
:param coord_x: width coordinate (top left point)
:param coord_y: height coordinate (top left point)
:param width_length: width of the cropped portion starting from coord_x (toward right)
:param height_length: height of the cropped portion starting from coord_y (toward bottom)
:return: the cropped part of the image
"""
tmp_img = img[coord_y:coord_y + height_length, coord_x:coord_x + width_length]
return float_im(tmp_img) # From [0,255] to [0.,1.]
# adapted from https://stackoverflow.com/a/52733370/9768291
def reconstruct(predictions, crops):
"""
Used to reconstruct a whole image from an array of mini-predictions.
The image had to be split in sub-images because the GPU's memory
couldn't handle the prediction on a whole image.
:param predictions: an array of upsampled images, from left to right, top to bottom.
:param crops: 2D array of the cropped images
:return: the reconstructed image as a whole
"""
# unflatten predictions
def nest(data, template):
data = iter(data)
return [[next(data) for _ in row] for row in template]
if len(crops) != 0:
predictions = nest(predictions, crops)
# At this point "predictions" is a 3D image of the individual outputs
H = np.cumsum([x[0].shape[0] for x in predictions])
W = np.cumsum([x.shape[1] for x in predictions[0]])
D = predictions[0][0]
recon = np.empty((H[-1], W[-1], D.shape[2]), D.dtype)
for rd, rs in zip(np.split(recon, H[:-1], 0), predictions):
for d, s in zip(np.split(rd, W[:-1], 1), rs):
d[...] = s
# Removing the pad from the reconstruction
tmp_overlap = overlap * (scale_fact - 1) # using "-2" leaves the outer edge-prediction error
return recon[tmp_overlap:recon.shape[0]-tmp_overlap, tmp_overlap:recon.shape[1]-tmp_overlap]
if __name__ == '__main__':
print(" - ", args)
preds, original, crops = predict(args) # returns the predictions along with the original
enhanced = reconstruct(preds, crops) # reconstructs the enhanced image from predictions
# Save and display the result
plt.imsave('output/' + args.save, enhanced, cmap=plt.cm.gray)
show_pred_output(original, enhanced)
Constants and extra bits
verbosity = 2
input_width = 64
input_height = 64
overlap = 16
scale_fact = 4
def float_im(img):
return np.divide(img, 255.)
Alternative
A possibly better alternative which you might want to consider if you run into the same kind of problem as me; it's the same basic idea, but more polished and perfected.

Keras Image Preprocessing

My training images are downscaled versions of their associated HR image. Thus, the input and the output images aren't the same dimension. For now, I'm using a hand-crafted sample of 13 images, but eventually I would like to be able to use my 500-ish HR (high-resolution) images dataset. This dataset, however, does not have images of the same dimension, so I'm guessing I'll have to crop them in order to obtain a uniform dimension.
I currently have this code set up: it takes a bunch of 512x512x3 images and applies a few transformations to augment the data (flips). I thus obtain a basic set of 39 images in their HR form, and then I downscale them by a factor of 4, thus obtaining my trainset which consits of 39 images of dimension 128x128x3.
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.image as mpimg
import skimage
from skimage import transform
from constants import data_path
from constants import img_width
from constants import img_height
from model import setUpModel
def setUpImages():
train = []
finalTest = []
sample_amnt = 11
max_amnt = 13
# Extracting images (512x512)
for i in range(sample_amnt):
train.append(mpimg.imread(data_path + str(i) + '.jpg'))
for i in range(max_amnt-sample_amnt):
finalTest.append(mpimg.imread(data_path + str(i+sample_amnt) + '.jpg'))
# # TODO: https://keras.io/preprocessing/image/
# ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False,
# samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06, rotation_range=0,
# width_shift_range=0.0, height_shift_range=0.0, brightness_range=None, shear_range=0.0,
# zoom_range=0.0, channel_shift_range=0.0, fill_mode='nearest', cval=0.0, horizontal_flip=False,
# vertical_flip=False, rescale=None, preprocessing_function=None, data_format=None,
# validation_split=0.0, dtype=None)
# Augmenting data
trainData = dataAugmentation(train)
testData = dataAugmentation(finalTest)
setUpData(trainData, testData)
def setUpData(trainData, testData):
# print(type(trainData)) # <class 'numpy.ndarray'>
# print(len(trainData)) # 64
# print(type(trainData[0])) # <class 'numpy.ndarray'>
# print(trainData[0].shape) # (1400, 1400, 3)
# print(trainData[len(trainData)//2-1].shape) # (1400, 1400, 3)
# print(trainData[len(trainData)//2].shape) # (350, 350, 3)
# print(trainData[len(trainData)-1].shape) # (350, 350, 3)
# TODO: substract mean of all images to all images
# Separating the training data
Y_train = trainData[:len(trainData)//2] # First half is the unaltered data
X_train = trainData[len(trainData)//2:] # Second half is the deteriorated data
# Separating the testing data
Y_test = testData[:len(testData)//2] # First half is the unaltered data
X_test = testData[len(testData)//2:] # Second half is the deteriorated data
# Adjusting shapes for Keras input # TODO: make into a function ?
X_train = np.array([x for x in X_train])
Y_train = np.array([x for x in Y_train])
Y_test = np.array([x for x in Y_test])
X_test = np.array([x for x in X_test])
# # Sanity check: display four images (2x HR/LR)
# plt.figure(figsize=(10, 10))
# for i in range(2):
# plt.subplot(2, 2, i + 1)
# plt.imshow(Y_train[i], cmap=plt.cm.binary)
# for i in range(2):
# plt.subplot(2, 2, i + 1 + 2)
# plt.imshow(X_train[i], cmap=plt.cm.binary)
# plt.show()
setUpModel(X_train, Y_train, X_test, Y_test)
# TODO: possibly remove once Keras Preprocessing is integrated?
def dataAugmentation(dataToAugment):
print("Starting to augment data")
arrayToFill = []
# faster computation with values between 0 and 1 ?
dataToAugment = np.divide(dataToAugment, 255.)
# TODO: switch from RGB channels to CbCrY
# # TODO: Try GrayScale
# trainingData = np.array(
# [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(350, 350, 1) for x in trainingData])
# validateData = np.array(
# [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(1400, 1400, 1) for x in validateData])
# adding the normal images (8)
for i in range(len(dataToAugment)):
arrayToFill.append(dataToAugment[i])
# vertical axis flip (-> 16)
for i in range(len(arrayToFill)):
arrayToFill.append(np.fliplr(arrayToFill[i]))
# horizontal axis flip (-> 32)
for i in range(len(arrayToFill)):
arrayToFill.append(np.flipud(arrayToFill[i]))
# downsizing by scale of 4 (-> 64 images of 128x128x3)
for i in range(len(arrayToFill)):
arrayToFill.append(skimage.transform.resize(
arrayToFill[i],
(img_width/4, img_height/4),
mode='reflect',
anti_aliasing=True))
# # Sanity check: display the images
# plt.figure(figsize=(10, 10))
# for i in range(64):
# plt.subplot(8, 8, i + 1)
# plt.imshow(arrayToFill[i], cmap=plt.cm.binary)
# plt.show()
return np.array(arrayToFill)
My question is: in my case, can I use the Preprocessing tool that Keras offers? I would ideally like to be able to input my varying sized images of high quality, crop them (not downsize them) to 512x512x3, and data augment them through flips and whatnot. Substracting the mean would also be part of what I'd like to achieve. That set would represent my validation set.
Reusing the validation set, I want to downscale by a factor of 4 all the images, and that would generate my training set.
Those two sets could then be split appropriately to obtain, ultimately, the famous X_train Y_train X_test Y_test.
I'm just hesitant about throwing out all the work I've done so far to preprocess my mini sample, but I'm thinking if it can all be done with a single built-in function, maybe I should give that a go.
This is my first ML project, hence me not understanding very well Keras, and the documentation isn't always the clearest. I'm thinking that the fact that I'm working with a X and Y that are different in size, maybe this function doesn't apply to my project.
Thank you! :)
Yes you can use keras preprocessing function. Below some snippets to help you...
def cropping_function(x):
...
return cropped_image
X_image_gen = ImageDataGenerator(preprocessing_function = cropping_function,
horizontal_flip = True,
vertical_flip=True)
X_train_flow = X_image_gen.flow(X_train, batch_size = 16, seed = 1)
Y_image_gen = ImageDataGenerator(horizontal_flip = True,
vertical_flip=True)
Y_train_flow = Y_image_gen.flow(y_train, batch_size = 16, seed = 1)
train_flow = zip(X_train_flow,Y_train_flow)
model.fit_generator(train_flow)
Christof Henkel's suggestion is very clean and nice. I would just like to offer another way to do it using imgaug, a convenient way to augment images in lots of different ways. It's usefull if you want more implemented augmentations or if you ever need to use some ML library other than Keras.
It unfortunatly doesn't have a way to make crops that way but it allows implementing custom functions. Here is an example function for generating random crops of a set size from an image that's at least as big as the chosen crop size:
from imgaug import augmenters as iaa
def random_crop(images, random_state, parents, hooks):
crop_h, crop_w = 128, 128
new_images = []
for img in images:
if (img.shape[0] >= crop_h) and (img.shape[1] >= crop_w):
rand_h = np.random.randint(0, img.shape[0]-crop_h)
rand_w = np.random.randint(0, img.shape[1]-crop_w)
new_images.append(img[rand_h:rand_h+crop_h, rand_w:rand_w+crop_w])
else:
new_images.append(np.zeros((crop_h, crop_w, 3)))
return np.array(new_images)
def keypoints_dummy(keypoints_on_images, random_state, parents, hooks):
return keypoints_on_images
cropper = iaa.Lambda(func_images=random_crop, func_keypoints=keypoints_dummy)
You can then combine this function with any other builtin imgaug function, for example the flip functions that you're already using like this:
seq = iaa.Sequential([cropper, iaa.Fliplr(0.5), iaa.Flipud(0.5)])
This function could then generate lots of different crops from each image. An example image with some possible results (note that it would result in actual (128, 128, 3) images, they are just merged into one image here for visualization):
Your image set could then be generated by:
crops_per_image = 10
images = [skimage.io.imread(path) for path in glob.glob('train_data/*.jpg')]
augs = np.array([seq.augment_image(img)/255 for img in images for _ in range(crops_per_image)])
It would also be simple to add new functions to be applied to the images, for example the remove mean functions you mentioned.
Here's another way performing random and center crop before resizing using native ImageDataGenerator and flow_from_directory. You can add it as preprocess_crop.py module into your project.
It first resizes image preserving aspect ratio and then performs crop. Resized image size is based on crop_fraction which is hardcoded but can be changed. See crop_fraction = 0.875 line where 0.875 appears to be the most common, e.g. 224px crop from 256px image.
Note that the implementation has been done by monkey patching keras_preprocessing.image.utils.loag_img function as I couldn't find any other way to perform crop before resizing without rewriting many other classes above.
Due to these limitations, the cropping method is enumerated into the interpolation field. Methods are delimited by : where the first part is interpolation and second is crop e.g. lanczos:random. Supported crop methods are none, center, random. When no crop method is specified, none is assumed.
How to use it
Just drop the preprocess_crop.py into your project to enable cropping. The example below shows how you can use random cropping for the training and center cropping for validation:
import preprocess_crop
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input
#...
# Training with random crop
train_datagen = ImageDataGenerator(
rotation_range=20,
channel_shift_range=20,
horizontal_flip=True,
preprocessing_function=preprocess_input
)
train_img_generator = train_datagen.flow_from_directory(
train_dir,
target_size = (IMG_SIZE, IMG_SIZE),
batch_size = BATCH_SIZE,
class_mode = 'categorical',
interpolation = 'lanczos:random', # <--------- random crop
shuffle = True
)
# Validation with center crop
validate_datagen = ImageDataGenerator(
preprocessing_function=preprocess_input
)
validate_img_generator = validate_datagen.flow_from_directory(
validate_dir,
target_size = (IMG_SIZE, IMG_SIZE),
batch_size = BATCH_SIZE,
class_mode = 'categorical',
interpolation = 'lanczos:center', # <--------- center crop
shuffle = False
)
Here's preprocess_crop.py file to include with your project:
import random
import keras_preprocessing.image
def load_and_crop_img(path, grayscale=False, color_mode='rgb', target_size=None,
interpolation='nearest'):
"""Wraps keras_preprocessing.image.utils.loag_img() and adds cropping.
Cropping method enumarated in interpolation
# Arguments
path: Path to image file.
color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
The desired image format.
target_size: Either `None` (default to original size)
or tuple of ints `(img_height, img_width)`.
interpolation: Interpolation and crop methods used to resample and crop the image
if the target size is different from that of the loaded image.
Methods are delimited by ":" where first part is interpolation and second is crop
e.g. "lanczos:random".
Supported interpolation methods are "nearest", "bilinear", "bicubic", "lanczos",
"box", "hamming" By default, "nearest" is used.
Supported crop methods are "none", "center", "random".
# Returns
A PIL Image instance.
# Raises
ImportError: if PIL is not available.
ValueError: if interpolation method is not supported.
"""
# Decode interpolation string. Allowed Crop methods: none, center, random
interpolation, crop = interpolation.split(":") if ":" in interpolation else (interpolation, "none")
if crop == "none":
return keras_preprocessing.image.utils.load_img(path,
grayscale=grayscale,
color_mode=color_mode,
target_size=target_size,
interpolation=interpolation)
# Load original size image using Keras
img = keras_preprocessing.image.utils.load_img(path,
grayscale=grayscale,
color_mode=color_mode,
target_size=None,
interpolation=interpolation)
# Crop fraction of total image
crop_fraction = 0.875
target_width = target_size[1]
target_height = target_size[0]
if target_size is not None:
if img.size != (target_width, target_height):
if crop not in ["center", "random"]:
raise ValueError('Invalid crop method {} specified.', crop)
if interpolation not in keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS:
raise ValueError(
'Invalid interpolation method {} specified. Supported '
'methods are {}'.format(interpolation,
", ".join(keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS.keys())))
resample = keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS[interpolation]
width, height = img.size
# Resize keeping aspect ratio
# result shold be no smaller than the targer size, include crop fraction overhead
target_size_before_crop = (target_width/crop_fraction, target_height/crop_fraction)
ratio = max(target_size_before_crop[0] / width, target_size_before_crop[1] / height)
target_size_before_crop_keep_ratio = int(width * ratio), int(height * ratio)
img = img.resize(target_size_before_crop_keep_ratio, resample=resample)
width, height = img.size
if crop == "center":
left_corner = int(round(width/2)) - int(round(target_width/2))
top_corner = int(round(height/2)) - int(round(target_height/2))
return img.crop((left_corner, top_corner, left_corner + target_width, top_corner + target_height))
elif crop == "random":
left_shift = random.randint(0, int((width - target_width)))
down_shift = random.randint(0, int((height - target_height)))
return img.crop((left_shift, down_shift, target_width + left_shift, target_height + down_shift))
return img
# Monkey patch
keras_preprocessing.image.iterator.load_img = load_and_crop_img

Mask rcnn not working for images with large resolution

I used Mask-Rcnn for training an image set (Note with high resolution Eg:2400*1920 ) with VIAtool following this reference article Mask rcnn usage. Here, I have edited the Ballon.py and the code is as follows:
import os
import sys
import json
import datetime
import numpy as np
import skimage.draw
# Root directory of the project
ROOT_DIR = os.path.abspath("../../")
# Import Mask RCNN
sys.path.append(ROOT_DIR) # To find local version of the library
from mrcnn.config import Config
from mrcnn import model as modellib, utils
# Path to trained weights file
COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if COCO_WEIGHTS_PATH is None:
print('weights not available')
else:
print('weights available')
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
# Configurations
class NeuralCodeConfig(Config):
NAME = "screens"
# We use a GPU with 12GB memory, which can fit two images.
# Adjust down if you use a smaller GPU.
IMAGES_PER_GPU = 1
# Number of classes (including background)
NUM_CLASSES = 1 + 10 # Background + other region classes
# Number of training steps per epoch
STEPS_PER_EPOCH = 30
# Skip detections with < 90% confidence
DETECTION_MIN_CONFIDENCE = 0.9
# Dataset
class NeuralCodeDataset(utils.Dataset):
def load_screen(self, dataset_dir, subset):
"""Load a subset of the screens dataset.
dataset_dir: Root directory of the dataset.
subset: Subset to load: train or val
"""
# Add classes.
self.add_class("screens",1,"logo")
self.add_class("screens",2,"slider")
self.add_class("screens",3,"navigation")
self.add_class("screens",4,"forms")
self.add_class("screens",5,"social_media_icons")
self.add_class("screens",6,"video")
self.add_class("screens",7,"map")
self.add_class("screens",8,"pagination")
self.add_class("screens",9,"pricing_table_block")
self.add_class("screens",10,"gallery")
# Train or validation dataset?
assert subset in ["train", "val"]
dataset_dir = os.path.join(dataset_dir, subset)
# Load annotations
# VGG Image Annotator saves each image in the form:
# { 'filename': '28503151_5b5b7ec140_b.jpg',
# 'regions': {
# '0': {
# 'region_attributes': {},
# 'shape_attributes': {
# 'all_points_x': [...],
# 'all_points_y': [...],
# 'name': 'polygon'}},
# ... more regions ...
# },
# 'size': 100202
# }
# We mostly care about the x and y coordinates of each region
annotations = json.load(open(os.path.join(dataset_dir, "via_region_data.json")))
if annotations is None:
print ("region data json not loaded")
else:
print("region data json loaded")
# print(annotations)
annotations = list(annotations.values()) # don't need the dict keys
# The VIA tool saves images in the JSON even if they don't have any
# annotations. Skip unannotated images.
annotations = [a for a in annotations if a['regions']]
# Add images
for a in annotations:
# Get the x, y coordinaets of points of the polygons that make up
# the outline of each object instance. There are stores in the
# shape_attributes and region_attributes (see json format above)
polygons = [r['shape_attributes'] for r in a['regions']]
screens = [r['region_attributes']for r in a['regions']]
#getting the filename by spliting
class_name = screens[0]['html']
file_name = a['filename'].split("/")
file_name = file_name[len(file_name)-1]
#getting class_ids with file_name
class_ids = class_name+"_"+file_name
# #getting width an height of the images
# height = [h['height'] for h in polygons]
# width = [w['width'] for w in polygons]
# print(height,'height')
# print('polygons',polygons)
# load_mask() needs the image size to convert polygons to masks.
# Unfortunately, VIA doesn't include it in JSON, so we must readpath
# the image. This is only managable since the dataset is tiny.
image_path = os.path.join(dataset_dir,file_name)
image = skimage.io.imread(image_path)
#resizing images
# image = utils.resize_image(image, min_dim=800, max_dim=1000, min_scale=None, mode="square")
# print('image',image)
height,width = image.shape[:2]
# print('height',height)
# print('width',width)
# height = 800
# width = 800
self.add_image(
"screens",
image_id=file_name, # use file name as a unique image id
path=image_path,
width=width, height=height,
polygons=polygons,
class_ids=class_ids)
def load_mask(self, image_id):
"""Generate instance masks for an image.
Returns:
masks: A bool array of shape [height, width, instance count] with
one mask per instance.
class_ids: a 1D array of class IDs of the instance masks.
"""
# If not a screens dataset image, delegate to parent class.
image_info = self.image_info[image_id]
if image_info["source"] != "screens":
return super(self.__class__, self).load_mask(image_id)
# Convert polygons to a bitmap mask of shape
# [height, width, instance_count]
info = self.image_info[image_id]
mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
dtype=np.uint8)
for i, p in enumerate(info["polygons"]):
# Get indexes of pixels inside the polygon and set them to 1
rr, cc = skimage.draw.polygon(p['y'], p['x'])
mask[rr, cc, i] = 1
# Return mask, and array of class IDs of each instance. Since we have
# one class ID only, we return an array of 1s
# return mask.astype(np.bool), np.ones([mask.shape[-1]], dtype=np.int32)
# class_ids = np.array(class_ids,dtype=np.int32)
return mask,class_ids
def image_reference(self, image_id):
"""Return the path of the image."""
info = self.image_info[image_id]
if info["source"] == "screens":
return info["path"]
else:
super(self.__class__, self).image_reference(image_id)
def train(model):
# Train the model.
# Training dataset.
dataset_train = NeuralCodeDataset()
dataset_train.load_screen(args.dataset, "train")
dataset_train.prepare()
# Validation dataset
dataset_val = NeuralCodeDataset()
dataset_val.load_screen(args.dataset, "val")
dataset_val.prepare()
# *** This training schedule is an example. Update to your needs ***
# Since we're using a very small dataset, and starting from
# COCO trained weights, we don't need to train too long. Also,
# no need to train all layers, just the heads should do it.
print("Training network heads")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE,
epochs=30,
layers='heads')
# Training
if __name__ == '__main__':
import argparse
# Parse command line arguments
parser = argparse.ArgumentParser(
description='Train Mask R-CNN to detect screens.')
parser.add_argument("command",
metavar="<command>",
help="'train' or 'splash'")
parser.add_argument('--dataset', required='True',
metavar="../../datasets/screens",
help='Directory of the screens dataset')
parser.add_argument('--weights', required=True,
metavar="/weights.h5",
help="Path to weights .h5 file or 'coco'")
parser.add_argument('--logs', required=False,
default=DEFAULT_LOGS_DIR,
metavar="../../logs/",
help='Logs and checkpoints directory (default=logs/)')
parser.add_argument('--image', required=False,
metavar="path or URL to image",
help='Image to apply the color splash effect on')
parser.add_argument('--video', required=False,
metavar="path or URL to video",
help='Video to apply the color splash effect on')
args = parser.parse_args()
# Validate arguments
if args.command == "train":
assert args.dataset, "Argument --dataset is required for training"
elif args.command == "splash":
assert args.image or args.video,\
"Provide --image or --video to apply color splash"
print("Weights: ", args.weights)
print("Dataset: ", args.dataset)
print("Logs: ", args.logs)
# Configurations
if args.command == "train":
config = NeuralCodeConfig()
else:
class InferenceConfig(NeuralCodeConfig):
# Set batch size to 1 since we'll be running inference on
# one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
GPU_COUNT = 1
IMAGES_PER_GPU = 1
config = InferenceConfig()
config.display()
# Create model
if args.command == "train":
model = modellib.MaskRCNN(mode="training", config=config,
model_dir=args.logs)
else:
model = modellib.MaskRCNN(mode="inference", config=config,
model_dir=args.logs)
# Select weights file to load
if args.weights.lower() == "coco":
weights_path = COCO_WEIGHTS_PATH
# Download weights file
if not os.path.exists(weights_path):
utils.download_trained_weights(weights_path)
elif args.weights.lower() == "last":
# Find last trained weights
weights_path = model.find_last()
elif args.weights.lower() == "imagenet":
# Start from ImageNet trained weights
weights_path = model.get_imagenet_weights()
else:
weights_path = args.weights
# Load weights
print("Loading weights ", weights_path)
if args.weights.lower() == "coco":
# Exclude the last layers because they require a matching
# number of classes
model.load_weights(weights_path, by_name=True, exclude=[
"mrcnn_class_logits", "mrcnn_bbox_fc",
"mrcnn_bbox", "mrcnn_mask"])
else:
model.load_weights(weights_path, by_name=True)
# Train or evaluate
if args.command == "train":
train(model)
# elif args.command == "splash":
# detect_and_color_splash(model, image_path=args.image,
# video_path=args.video)
else:
print("'{}' is not recognized. "
"Use 'train' or 'splash'".format(args.command))
And I am getting the following error when training the data set with pretrained COCO dataset:
UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-08-09 13:52:27.993239: W tensorflow/core/framework/allocator.cc:108] Allocation of 51380224 exceeds 10% of system memory.
2018-08-09 13:52:28.037704: W tensorflow/core/framework/allocator.cc:108] Allocation of 51380224 exceeds 10% of system memory.
/home/scit/anaconda3/lib/python3.6/site-packages/keras/engine/training.py:2022: UserWarning: Using a generator with use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the`keras.utils.Sequence class.
UserWarning('Using a generator with `use_multiprocessing=True`'
ERROR:root:Error processing image {'id': '487.jpg', 'source': 'screens', 'path': '../../datasets/screens/train/487.jpg', 'width': 1920, 'height': 7007, 'polygons': [{'name': 'rect', 'x': 384, 'y': 5, 'width': 116, 'height': 64}, {'name': 'rect', 'x': 989, 'y': 17, 'width': 516, 'height': 42}, {'name': 'rect', 'x': 984, 'y': 5933, 'width': 565, 'height': 273}, {'name': 'rect', 'x': 837, 'y': 6793, 'width': 238, 'height': 50}], 'class_ids': 'logo_487.jpg'}
Traceback (most recent call last):
File "/home/scit/Desktop/My_work/object_detection/mask_rcnn/mrcnn/model.py", line 1717, in data_generator
use_mini_mask=config.USE_MINI_MASK)
File "/home/scit/Desktop/My_work/object_detection/mask_rcnn/mrcnn/model.py", line 1219, in load_image_gt
mask, class_ids = dataset.load_mask(image_id)
File "neural_code.py", line 235, in load_mask
rr, cc = skimage.draw.polygon(p['y'], p['x'])
File "/home/scit/anaconda3/lib/python3.6/site-packages/skimage/draw/draw.py", line 441, in polygon
return _polygon(r, c, shape)
File "skimage/draw/_draw.pyx", line 217, in skimage.draw._draw._polygon (skimage/draw/_draw.c:4402)
OverflowError: Python int too large to convert to C ssize_t
My laptop graphics specs are follows:
Nvidia GeForce 830M (2 GB) with 250 CUDA cores
CPU specs:
Intel Core i5 (4th gen), 8 GB RAM
What may be the case here? Is it the resolution of the images or the incapability of my GPU. Shall I proceed with CPU?
I am sharing my observations with Mask RCNN while training my custom dataset.
My dataset comprises of images of various dimension (i.e. smallest image has approx 1700 x 1600 pixels and the largest image has approx 8500 x 4600 pixels).
I am training on nVIDIA RTX 2080Ti, 32 GB DDR4 RAM and while training I get the below mentioned warnings; but the training process completes.
UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2019-05-23 15:25:23.433774: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Few months back, I tried the Matterport Splash of Color Example on my Laptop which has 12 GB RAM and nVIDIA 920M (2GB GPU); and have encountered similar Memory Errors.
So, we can suspect that size of the GPU Memory is a contributing factor in this error.
Additionally, batch size is another contributing factor; but I see that you have set the IMAGE_PER_GPU=1. If you search for the BATCH_SIZE in the config.py file present in the mrcnn folder, you will find –
self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT
So, in your case the batch_size is 1.
In conclusion, I would suggest to please try the same code on a more powerful GPU.

Data loading with variable batch size?

I am currently working on patch based super-resolution. Most of the papers divide an image into smaller patches and then use the patches as input to the models.I was able to create patches using custom dataloader. The code is given below:
import torch.utils.data as data
from torchvision.transforms import CenterCrop, ToTensor, Compose, ToPILImage, Resize, RandomHorizontalFlip, RandomVerticalFlip
from os import listdir
from os.path import join
from PIL import Image
import random
import os
import numpy as np
import torch
def is_image_file(filename):
return any(filename.endswith(extension) for extension in [".png", ".jpg", ".jpeg", ".bmp"])
class TrainDatasetFromFolder(data.Dataset):
def __init__(self, dataset_dir, patch_size, is_gray, stride):
super(TrainDatasetFromFolder, self).__init__()
self.imageHrfilenames = []
self.imageHrfilenames.extend(join(dataset_dir, x)
for x in sorted(listdir(dataset_dir)) if is_image_file(x))
self.is_gray = is_gray
self.patchSize = patch_size
self.stride = stride
def _load_file(self, index):
filename = self.imageHrfilenames[index]
hr = Image.open(self.imageHrfilenames[index])
downsizes = (1, 0.7, 0.45)
downsize = 2
w_ = int(hr.width * downsizes[downsize])
h_ = int(hr.height * downsizes[downsize])
aug = Compose([Resize([h_, w_], interpolation=Image.BICUBIC),
RandomHorizontalFlip(),
RandomVerticalFlip()])
hr = aug(hr)
rv = random.randint(0, 4)
hr = hr.rotate(90*rv, expand=1)
filename = os.path.splitext(os.path.split(filename)[-1])[0]
return hr, filename
def _patching(self, img):
img = ToTensor()(img)
LR_ = Compose([ToPILImage(), Resize(self.patchSize//2, interpolation=Image.BICUBIC), ToTensor()])
HR_p, LR_p = [], []
for i in range(0, img.shape[1] - self.patchSize, self.stride):
for j in range(0, img.shape[2] - self.patchSize, self.stride):
temp = img[:, i:i + self.patchSize, j:j + self.patchSize]
HR_p += [temp]
LR_p += [LR_(temp)]
return torch.stack(LR_p),torch.stack(HR_p)
def __getitem__(self, index):
HR_, filename = self._load_file(index)
LR_p, HR_p = self._patching(HR_)
return LR_p, HR_p
def __len__(self):
return len(self.imageHrfilenames)
Suppose the batch size is 1, it takes an image and gives an output of size [x,3,patchsize,patchsize]. When batch size is 2, I will have two different outputs of size [x,3,patchsize,patchsize] (for example image 1 may give[50,3,patchsize,patchsize], image 2 may give[75,3,patchsize,patchsize] ). To handle this a custom collate function was required that stacks these two outputs along dimension 0. The collate function is given below:
def my_collate(batch):
data = torch.cat([item[0] for item in batch],dim = 0)
target = torch.cat([item[1] for item in batch],dim = 0)
return [data, target]
This collate function concatenates along x (From the above example, I finally get [125,3,patchsize,pathsize]. For training purposes, I need to train the model using a minibatch size of say 25. Is there any method or any functions which I can use to directly get an output of size [25 , 3, patchsize, pathsize] directly from the dataloader using the necessary number of images as input to the Dataloader?
The following code snippet works for your purpose.
First, we define a ToyDataset which takes in a list of tensors (tensors) of variable length in dimension 0. This is similar to the samples returned by your dataset.
import torch
from torch.utils.data import Dataset
from torch.utils.data.sampler import RandomSampler
class ToyDataset(Dataset):
def __init__(self, tensors):
self.tensors = tensors
def __getitem__(self, index):
return self.tensors[index]
def __len__(self):
return len(tensors)
Secondly, we define a custom data loader. The usual Pytorch dichotomy to create datasets and data loaders is roughly the following: There is an indexed dataset, to which you can pass an index and it returns the associated sample from the dataset. There is a sampler which yields an index, there are different strategies to draw indices which give rise to different samplers. The sampler is used by a batch_sampler to draw multiple indices at once (as many as specified by batch_size). There is a dataloader which combines sampler and dataset to let you iterate over a dataset, importantly the data loader also owns a function (collate_fn) which specifies how the multiple samples retrieved from the dataset using the indices from the batch_sampler should be combined. For your use case, the usual PyTorch dichotomy does not work well, because instead of drawing a fixed number of indices, we need to draw indices until the objects associated with the indices exceed the cumulative size we desire. This means we need immediate inspection of the objects and use this knowledge to decide whether to return a batch or keep drawing indices. This is what the custom data loader below does:
class CustomLoader(object):
def __init__(self, dataset, my_bsz, drop_last=True):
self.ds = dataset
self.my_bsz = my_bsz
self.drop_last = drop_last
self.sampler = RandomSampler(dataset)
def __iter__(self):
batch = torch.Tensor()
for idx in self.sampler:
batch = torch.cat([batch, self.ds[idx]])
while batch.size(0) >= self.my_bsz:
if batch.size(0) == self.my_bsz:
yield batch
batch = torch.Tensor()
else:
return_batch, batch = batch.split([self.my_bsz,batch.size(0)-self.my_bsz])
yield return_batch
if batch.size(0) > 0 and not self.drop_last:
yield batch
Here we iterate over the dataset, after drawing an index and loading the associated object, we concatenate it to the tensors we drew before (batch). We keep doing this until we reach the desired size, such that we can cut out and yield a batch. We retain the rows in batch, which we did not yield. Because it may be the case that a single instance exceeds the desired batch_size, we use a while loop.
You could modify this minimal CustomDataloader to add more features in the style of PyTorch's dataloader. There is also no need to use a RandomSampler to draw in indices, others would work equally well. It would also be possible to avoid repeated concats, in case your data is large by using for example a list and keeping track of the cumulative length of its tensors.
Here is an example, that demonstrates it works:
patch_size = 5
channels = 3
dim0sizes = torch.LongTensor(100).random_(1, 100)
data = torch.randn(size=(dim0sizes.sum(), channels, patch_size, patch_size))
tensors = torch.split(data, list(dim0sizes))
ds = ToyDataset(tensors)
dl = CustomLoader(ds, my_bsz=250, drop_last=False)
for i in dl:
print(i.size(0))
(Related, but not exactly in topic)
For batch size adaptation you can use the code as exemplified in this repo. It is implemented for a different purpose (maximize GPU memory usage), but it is not too hard to translate to your problem.
The code does batch adaptation and batch spoofing.
To improve the previous answer, I found a repo that uses DataManger to achieve different patch sizes and batch sizes. It is basically initiating different dataloaders with different settings and a set_epoch function is used to set the appropriate dataloader for a given epoch.

Resources