Running through a DataLoader in PyTorch using Google Colab

I am trying to use PyTorch to run classification on a dataset of images of cats and dogs. So far my code downloads the data and goes into the train folder, which contains two folders called "cats" and "dogs". I then try to load this data into a DataLoader and iterate through the batches, but the iteration step raises an error I don't understand.
Since this is Google Colab, the code also downloads the data and installs the libraries. Any other advice on my code so far would be appreciated as well.
!pip install torch
!pip install torchvision
from __future__ import print_function, division
import os
import torch
import pandas as pd
import numpy as np
# For showing and formatting images
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# For importing datasets into pytorch
import torchvision.datasets as dataset
# Used for dataloaders
import torch.utils.data as data
# For pretrained resnet34 model
import torchvision.models as models
# For optimisation function
import torch.nn as nn
import torch.optim as optim
# For turning data into tensors
import torchvision.transforms as transforms
!wget http://files.fast.ai/data/dogscats.zip
!unzip dogscats.zip
# assumed location: dogscats.zip extracts into a dogscats/ folder
PATH = "dogscats/"
batch_size = 256
train_raw = dataset.ImageFolder(PATH+"train", transform=transforms.ToTensor())
train_loader = data.DataLoader(train_raw, batch_size=batch_size, shuffle=True)
for batch_idx, (data, target) in enumerate(train_loader):
    print("Data: ", batch_idx)
The error comes up on the last lines and is below:
RuntimeError                              Traceback (most recent call last)
<ipython-input-66-c32dd0c1b880> in <module>()
----> 1 for batch_idx, (data, target) in enumerate(train_loader):
2 print("Data: ", batch_idx)
3
/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in __next__(self)
257 if self.num_workers == 0: # same-process loading
258 indices = next(self.sample_iter) # may raise StopIteration
--> 259 batch = self.collate_fn([self.dataset[i] for i in indices])
260 if self.pin_memory:
261 batch = pin_memory_batch(batch)
/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in default_collate(batch)
133 elif isinstance(batch[0], collections.Sequence):
134 transposed = zip(*batch)
--> 135 return [default_collate(samples) for samples in transposed]
136
137 raise TypeError((error_msg.format(type(batch[0]))))
/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in default_collate(batch)
110 storage = batch[0].storage()._new_shared(numel)
111 out = batch[0].new(storage)
--> 112 return torch.stack(batch, 0, out=out)
113 elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
114 and elem_type.__name__ != 'string_':
/usr/local/lib/python2.7/dist-packages/torch/functional.pyc in stack(sequence, dim, out)
62 inputs = [t.unsqueeze(dim) for t in sequence]
63 if out is None:
---> 64 return torch.cat(inputs, dim)
65 else:
66 return torch.cat(inputs, dim, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 400 and 487 in dimension 2 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
Thanks

I think the main problem is that the images have different sizes. I may have misunderstood ImageFolder, but I think you don't need to provide labels for the images: if the directory structure is the one PyTorch expects (one subfolder per class), ImageFolder will figure out the labels for you.
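As a rough illustration of how ImageFolder infers labels: it treats each subdirectory of the root as one class, sorts the class names, and numbers them in that order (the real dataset exposes the result as its classes and class_to_idx attributes). The sketch below reimplements only that mapping with the standard library; folder_classes is a hypothetical helper, not part of torchvision.

```python
import os


def folder_classes(root):
    """Sketch of ImageFolder's label inference: one class per subfolder."""
    # Only directories count as classes; loose files in root are ignored.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    # Labels are assigned in sorted (alphabetical) order.
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx
```

For a train folder containing cats/ and dogs/, this gives ["cats", "dogs"] and {"cats": 0, "dogs": 1}, so every image under cats/ gets label 0 and every image under dogs/ gets label 1.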
I would also add more steps to your transform, such as one that automatically resizes every image from the folder to the same size, for example:
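from torchvision import transforms
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)
# Resize before ToTensor: in older torchvision versions Resize only accepts PIL images
transform = transforms.Compose(
    [transforms.Resize((224, 224)),
     transforms.ToTensor(),
     normalize])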
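transforms.Normalize applies output[c] = (input[c] - mean[c]) / std[c] to each channel c; the mean and std above are the ImageNet statistics commonly used with pretrained torchvision models. The NumPy sketch below only illustrates that arithmetic on a (channels, height, width) array with values in [0, 1], the layout ToTensor produces; normalize_chw is a hypothetical name, not a torchvision function.

```python
import numpy as np

# ImageNet per-channel statistics, shaped to broadcast over (3, H, W)
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def normalize_chw(img):
    """Per-channel (x - mean) / std for a float array shaped (3, H, W)."""
    return (img - MEAN) / STD
```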
You can also make your DataLoader much faster with other settings, such as choosing a suitable batch_size and setting num_workers so that several CPU worker processes load batches in parallel, for example:
from torch.utils.data import DataLoader
testloader = DataLoader(testset, batch_size=16,
                        shuffle=False, num_workers=4)
I think this will make your pipeline much faster.

I see two problems in your code. First, you import torch.utils.data as data and then overwrite that name with the loop variable data in for batch_idx, (data, target) in enumerate(train_loader). Please keep the imported module and your variable names in separate namespaces. Second, I think this error could be caused by the images and labels returned by the DataLoader having different sizes: the concatenation fails, and I suspect the first dimension, i.e. the number of labels and the number of images in the folder, do not match. Hope this helps.
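A small standard-library illustration of the name clash described above: once a loop variable reuses the name of an imported module, that name no longer refers to the module. Here os.path stands in for torch.utils.data so the sketch runs without PyTorch; in the question, once the loop has produced a batch, data is a tensor, so calling data.DataLoader(...) again in the same notebook would fail.

```python
import os.path as path

# Here `path` is still the os.path module
print(path.exists("."))

for path in ["cat.1.jpg", "dog.2.jpg"]:  # the loop variable rebinds the name
    pass

# `path` is now the string "dog.2.jpg", so path.exists(".") would raise AttributeError
print(type(path).__name__)
```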

I think I was wrong in my comment to Manoj Acharya: the problem was the batch_size passed to the DataLoader. According to the source below, you can't batch images of different sizes together:
https://medium.com/@yvanscher/pytorch-tip-yielding-image-sizes-6a776eb4115b
So, after fixing the data name clash Manoj pointed out, I changed batch_size to 1 and the program stopped failing. I still want to use batches, though, so I added a CenterCrop() transform that crops all images to the same size. Below is my new code:
!pip install torch
!pip install torchvision
from __future__ import print_function, division
import os
import torch
import pandas as pd
import numpy as np
# For showing and formatting images
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# For importing datasets into pytorch
import torchvision.datasets as dataset
# Used for dataloaders
from torch.utils.data import DataLoader
# For pretrained resnet34 model
import torchvision.models as models
# For optimisation function
import torch.nn as nn
import torch.optim as optim
# For turning data into tensors
import torchvision.transforms as transforms
!wget http://files.fast.ai/data/dogscats.zip
!unzip dogscats.zip
# assumed location: dogscats.zip extracts into a dogscats/ folder
PATH = "dogscats/"
batch_size = 256
sz = 224
train_raw = dataset.ImageFolder(PATH+"train", transform=transforms.Compose([transforms.CenterCrop(sz), transforms.ToTensor()]))
train_loader = DataLoader(train_raw, batch_size=batch_size, shuffle=True)
for batch_idx, (data, target) in enumerate(train_loader):
    print("Data: ", batch_idx)
Thanks
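If you want to keep the original image sizes rather than cropping, another option (not taken from the linked post) is to pass a custom collate_fn to DataLoader so each batch comes back as Python lists instead of one stacked tensor. A minimal sketch; list_collate is a hypothetical name, and the model then has to handle variable-sized images itself (one at a time, or with your own padding).

```python
def list_collate(batch):
    """Collate (image, label) pairs into two lists instead of stacking them.

    Intended for DataLoader(..., collate_fn=list_collate) when images have
    different sizes: each batch is then a list of tensors of varying shape.
    """
    images = [image for image, _ in batch]
    labels = [label for _, label in batch]
    return images, labels
```

Usage would look like train_loader = DataLoader(train_raw, batch_size=batch_size, shuffle=True, collate_fn=list_collate). The trade-off is that you lose the single (batch, channels, height, width) tensor that most models expect, which is why resizing or cropping is the usual choice.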


Model overfits after first epoch

I'm trying to fine-tune Hugging Face's bert-base-uncased model for emoji prediction on tweets, and the model seems to start overfitting immediately after the first epoch. I have tried the following:
Increasing the training data (from 1x to 10x, with no effect)
Changing the learning rate (no difference)
Using different models from Hugging Face (again, the same results)
Changing the batch size (32, 72, 128, 256, 512 and 1024)
Creating a model from scratch, but I ran into issues and decided to post here first to see if I was missing anything obvious.
At this point, I'm concerned that individual tweets don't give the model enough information to make a good guess, but in that case wouldn't the predictions be random rather than overfit?
Also, training takes ~4.5 hours on Colab's free GPUs. Is there any way to speed that up? I tried Colab's TPU, but it doesn't seem to be recognized.
This is what the data looks like
And this is my code below:
import pandas as pd
import json
import re
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.model_selection import train_test_split
import torch
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback
from sklearn.metrics import accuracy_score,precision_score, recall_score, f1_score
import numpy as np
# opening up the data and removing all symbols
df = pd.read_json('/content/drive/MyDrive/computed_results.json.bz2')
df['text_no_emoji'] = df['text_no_emoji'].apply(lambda text: re.sub(r'[^\w\s]', '', text))
# loading the tokenizer and the model from huggingface
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5).to('cuda')
# test train split
train, test = train_test_split(df[['text_no_emoji', 'emoji_codes']].sample(frac=1), test_size=0.2)
# defining a dataset class that generates the encodings and labels on the fly to minimize memory usage
class Dataset(torch.utils.data.Dataset):
    def __init__(self, input, labels=None):
        self.input = input
        self.labels = labels
    def __getitem__(self, pos):
        encoded = tokenizer(self.input[pos], truncation=True, max_length=15, padding='max_length')
        label = self.labels[pos]
        ret = {key: torch.tensor(val) for key, val in encoded.items()}
        ret['labels'] = torch.tensor(label)
        return ret
    def __len__(self):
        return len(self.labels)
# training and validation datasets are defined here
train_dataset = Dataset(train['text_no_emoji'].tolist(), train['emoji_codes'].tolist())
# validation texts and labels must both come from the test split
val_dataset = Dataset(test['text_no_emoji'].tolist(), test['emoji_codes'].tolist())
# defining the training arguments
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="epoch",
    logging_steps=10,
    per_device_train_batch_size=1024,
    per_device_eval_batch_size=1024,
    num_train_epochs=5,
    save_steps=3000,
    seed=0,
    load_best_model_at_end=True,
    weight_decay=0.2,
)
# defining the model trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)
# Training the model
trainer.train()
Results: the training generally stops pretty fast because of the early stopper.
The dataset can be found here (39 MB compressed).
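In the code as posted, the sklearn metrics are imported but no compute_metrics function is passed to Trainer, so each evaluation only reports the loss; EarlyStoppingCallback is likewise imported but never passed in callbacks. A minimal sketch of a metrics function, assuming emoji_codes are integer class ids 0 to 4; it uses plain NumPy, and you could add f1_score(labels, preds, average="macro") from the imported sklearn functions the same way.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from the (logits, label_ids) pair that Trainer passes in."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class for each example
    return {"accuracy": float((preds == labels).mean())}
```

It would be wired in as Trainer(..., compute_metrics=compute_metrics, callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]), where the patience of 2 is only an example value. Watching accuracy next to the validation loss helps tell genuine overfitting apart from a validation loss that rises while accuracy stays flat.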

'MINST' downloading in Colab from pytorch's torchvision.datasets

I ran the following code in Colab and got this error:
NameError: name 'MINST' is not defined
What do I need to do?
import torch
import torchvision
from torchvision.datasets import MNIST
dataset = MINST(root='data/', download=True)
len(dataset)
test_dataset = MINST(root='data/', train=False)
len(test_dataset)
dataset[0]
It is exactly what it says: a NameError.
You imported MNIST but then try to access MINST, which is not a defined name.
Your code should be:
import torch
import torchvision
from torchvision.datasets import MNIST
dataset = MNIST(root='data/', download=True)
len(dataset)
test_dataset = MNIST(root='data/', train=False)
len(test_dataset)
dataset[0]

Loading images in PyTorch

I am new to PyTorch and working on a GAN model. I want to load my image dataset. In Keras, it's done like this:
from os import listdir
from numpy import asarray
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
def load_images(path, size=(128,128)):
    data_list = list()
    # enumerate filenames in directory, assume all are images
    for filename in listdir(path):
        # load and resize the image
        pixels = load_img(path + filename, target_size=size)
        # convert to numpy array
        pixels = img_to_array(pixels)
        # store
        data_list.append(pixels)
    return asarray(data_list)
# dataset path
path = 'mypath/'
# load dataset A
dataA = load_images(path + 'A/')
dataAB = load_images(path + 'B/')
I want to know how to do the same in PyTorch.
Any help is appreciated. Thanks
import torchvision, torch
from torchvision import datasets, models, transforms
def load_training(root_path, dir, batch_size, kwargs):
    transform = transforms.Compose(
        [transforms.Resize([256, 256]),
         transforms.RandomCrop(224),
         transforms.RandomHorizontalFlip(),
         transforms.ToTensor()])
    # ImageFolder expects root_path + dir to contain one subfolder per class
    data = datasets.ImageFolder(root=root_path + dir, transform=transform)
    # kwargs is a dict of extra DataLoader options, e.g. {'num_workers': 4}
    train_loader = torch.utils.data.DataLoader(data, batch_size=batch_size, shuffle=True, drop_last=True, **kwargs)
    return train_loader
I hope it'll work ...
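Note that ImageFolder needs class subfolders, while the Keras code reads images directly from flat folders such as A/ and B/. A closer PyTorch analogue of load_images is a small custom Dataset that lists one directory and resizes each file on access. This is a sketch under assumptions: FlatImageFolder is a hypothetical name, every file in the folder is assumed to be an image (as in the Keras code), and pixels are scaled to [0, 1], whereas Keras' img_to_array keeps the 0-255 range.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class FlatImageFolder(Dataset):
    """All images in one directory (no class subfolders), resized to `size`."""

    def __init__(self, path, size=(128, 128)):
        self.path = path
        self.size = size  # PIL's resize takes (width, height)
        self.files = sorted(os.listdir(path))  # assume every file is an image

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        with Image.open(os.path.join(self.path, self.files[idx])) as img:
            rgb = img.convert("RGB").resize(self.size)
        arr = np.asarray(rgb, dtype=np.float32) / 255.0  # (H, W, C) in [0, 1]
        return torch.from_numpy(arr).permute(2, 0, 1)  # (C, H, W), PyTorch's layout
```

Usage would be loader_A = DataLoader(FlatImageFolder(path + 'A/'), batch_size=16, shuffle=True), and likewise for B/. Because every image is resized to the same size, the default collate can stack them into (batch, 3, 128, 128) tensors.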

TypeError: unsupported operand type(s) for -: 'tensorflow.python.framework.ops.EagerTensor' and 'tensorflow.python.framework.ops.EagerTensor'

I am following the tutorial from https://www.pyimagesearch.com/2018/09/10/keras-tutorial-how-to-get-started-with-keras-deep-learning-and-python/
I am using TensorFlow 2.0 on Python 3.7.
When the code reaches a point where input layer is defined, using
model.add(Conv2D(32, (3, 3), padding="same",input_shape=inputShape))
I get the following error:
TypeError: unsupported operand type(s) for -: 'tensorflow.python.framework.ops.EagerTensor' and 'tensorflow.python.framework.ops.EagerTensor'
with the following traceback:
File "<ipython-input-103-82ea2474a164>", line 1, in <module>
model.add(Conv2D(32, (3, 3), padding="same",input_shape=inputShape))
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\sequential.py", line 166, in add
layer(x)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 463, in __call__
self.build(unpack_singleton(input_shapes))
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\convolutional.py", line 141, in build
constraint=self.kernel_constraint)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 279, in add_weight
weight = K.variable(initializer(shape, dtype=dtype),
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\initializers.py", line 227, in __call__
dtype=dtype, seed=self.seed)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 4357, in random_uniform
shape, minval=minval, maxval=maxval, dtype=dtype, seed=seed)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\keras\backend.py", line 5598, in random_uniform
shape, minval=minval, maxval=maxval, dtype=dtype, seed=seed)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\ops\random_ops.py", line 246, in random_uniform
result = math_ops.add(rnd * (maxval - minval), minval, name=name)
TypeError: unsupported operand type(s) for -: 'tensorflow.python.framework.ops.EagerTensor' and 'tensorflow.python.framework.ops.EagerTensor'
I have checked that the input dimensions being passed are of 'int' datatype.
I'm not sure how to fix the error.
Here is the calling code. It reads the images and resizes them to 64x64:
# USAGE
# python train_vgg.py --dataset animals --model output/smallvggnet.model \
#     --label-bin output/smallvggnet_lb.pickle --plot output/smallvggnet_plot.png
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from pyimagesearch.smallvggnet import SmallVGGNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2 #pip install opencv-python
import os
args = {'dataset': 'E:/keras-tutorial/animals', 'model':'E:/keras-tutorial/output', 'label_bin':'E:/keras-tutorial/output' , 'plot':'E:/keras-tutorial/output'}
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []
# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
    # load the image, resize it to 64x64 pixels (the required input
    # spatial dimensions of SmallVGGNet), and store the image in the
    # data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (64, 64))
    data.append(image)
    # extract the class label from the image path and update the
    # labels list
    label = imagePath.split(os.path.sep)[-2]
    labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data,labels, test_size=0.25, random_state=42)
# convert the labels from integers to vectors (for 2-class, binary classification you should use Keras' to_categorical function
# as the scikit-learn's LabelBinarizer will not return a vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
horizontal_flip=True, fill_mode="nearest")
# initialize our VGG-like Convolutional Neural Network
model = SmallVGGNet.build(width=64, height=64, depth=3, classes=len(lb.classes_))
Code of class SmallVGGNet:
# import the necessary packages
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K
class SmallVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # CONV => RELU => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
The last line throws the error
I finally fixed the issue by downgrading CUDA from 10.1 to 10.0.
I am now using the following versions:
NVIDIA GPU Driver 426.26,
CUDA toolkit: 10.0,
cuDNN version 7.6.5 and
TensorFlow 2.0.0
and changed all import statements from keras.layers... to tensorflow.keras.layers... accordingly.
Finally, I checked that the GPU is available using:
import tensorflow as tf
tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )

Why do I keep getting an error saying "maximum recursion depth exceeded while calling a Python object" in Keras from Tensorflow 2.0?

I am trying to train a stacked neural network architecture with CNNs, GRUs and a CTC in tensorflow 2.0's edition of Keras. I keep getting an error saying "RecursionError: maximum recursion depth exceeded while calling a Python object".
I have tried importing sys and setting the Recursion Limit to be very high using sys.setrecursionlimit() but the program just stops running.
import sys
import tensorflow as tf
from generator_tf2 import VideoGenerator
from network_model_GRU_tf2 import Decoder
from helpers import labels_to_text
from spell import Spell
from network_model_GRU_tf2 import Network_Model
from keras.callbacks import EarlyStopping, TensorBoard, CSVLogger, ModelCheckpoint
import numpy as np
import datetime
import os
import matplotlib.pyplot as plt
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
# strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
PREDICT_GREEDY = False #Use of Greedy Search
PREDICT_BEAM_WIDTH = 200 #Set Beam search width
MAX_STRING_LENGTH = 114 #Maximum sentence length
MAX_VIDEO_LENGTH = 114 #Maximum number of video frames
start_epoch = 0
#Directories
PREDICT_DICTIONARY = os.path.join(r"F:\Lip Reading System\vsnet","LessFourSecondsSentences.txt") #Needed for Curriculum learning
INPUT_DIR = os.path.join(r"F:\Lip Reading System\vsnet","models_06082019") #Keras model directory
OUTPUT_DIR = os.path.join(r"F:\Lip Reading System\vsnet","models_08082019") #Keras model directory
#Generator for training
lip_gen_train = VideoGenerator("LessFourSeconds.txt","LessFourSeconds_videoframes_training","LessFourSeconds_subtitles_training", 30, absolute_max_video_len=MAX_VIDEO_LENGTH, absolute_max_string_len=MAX_STRING_LENGTH)
lip_gen_train.build_data_from_frames()
#Generator for testing
lip_gen_test = VideoGenerator("testing_videos.txt","testing_videoframes","testing_subtitles", 10, absolute_max_video_len=MAX_VIDEO_LENGTH, absolute_max_string_len=MAX_STRING_LENGTH)
lip_gen_test.build_data_from_frames()
#Set neural network conditions network
with strategy.scope():
    network_model = Network_Model(img_c=1, img_w=100, img_h=50, frames_n=MAX_VIDEO_LENGTH, absolute_max_string_len=MAX_STRING_LENGTH, output_size=38)
    network_model.summary()
    adam = tf.keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
# load weight if necessary
if start_epoch > 0:
    weight_file = os.path.join(INPUT_DIR, 'weights%02d.h5' % (start_epoch))
    network_model.model.load_weights(weight_file)
# the loss calc occurs elsewhere, so use a dummy lambda func for the loss
network_model.model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=adam, metrics=['accuracy'])
#Spelling and Decoder
spell = Spell(path=PREDICT_DICTIONARY)
decoder = Decoder(greedy=PREDICT_GREEDY, beam_width=PREDICT_BEAM_WIDTH,postprocessors=[labels_to_text, spell.sentence])
#Early stop and function to save weights
early_stop = tf.keras.callbacks.EarlyStopping(monitor='loss', min_delta=0.001, patience=4, mode='min', verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint(os.path.join(OUTPUT_DIR, "weights{epoch:02d}.h5"), monitor='val_loss', save_weights_only=True, mode='min', period=10)
#Generator
train_history = network_model.model.fit_generator(generator=lip_gen_train.get_batch(),
steps_per_epoch=lip_gen_train.video_dataset_steps,
epochs=1000,
validation_data=lip_gen_test.get_batch(),
validation_steps=lip_gen_test.video_dataset_steps)
The script works fine in TensorFlow 1.10.0 with Keras 2.2.4, where it does not produce the error I keep getting below:
Traceback (most recent call last):
  File "F:\Lip Reading System\vsnet\training_tf2.py", line 71, in <module>
    validation_steps=lip_gen_test.video_dataset_steps)
  ......
RecursionError: maximum recursion depth exceeded while calling a Python object
Include model.compile() inside strategy.scope():
#Set neural network conditions network
with strategy.scope():
    network_model = Network_Model(img_c=1, img_w=100, img_h=50, frames_n=MAX_VIDEO_LENGTH, absolute_max_string_len=MAX_STRING_LENGTH, output_size=38)
    network_model.summary()
    adam = tf.keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    # load weight if necessary
    if start_epoch > 0:
        weight_file = os.path.join(INPUT_DIR, 'weights%02d.h5' % (start_epoch))
        network_model.model.load_weights(weight_file)
    # the loss calc occurs elsewhere, so use a dummy lambda func for the loss
    network_model.model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=adam, metrics=['accuracy'])
