I don't understand why this code works:
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.Linear(hidden_sizes[1], output_size))
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
ps = model.forward(images[0,:])
The batch of images has shape (images.shape[0], 1, 784), but our network has input_size = 784. How does the network handle the extra dimension of size 1 in an input image? I tried replacing images.resize_(images.shape[0], 1, 784) with images = images.view(images.shape[0], -1), but I got an error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
For reference, the data loader is created as follows:
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
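PyTorch networks take inputs of shape [batch_size, input_dimensions].
In your case, images[0,:] has shape [1, 784], where the "1" is the batch size, and that is why your code works. After images.view(images.shape[0], -1), images[0,:] has shape [784] with no batch axis, so any operation that addresses dim=1 raises the IndexError.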
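To see why the extra axis matters, here is a minimal shape-bookkeeping sketch in plain Python. It does not call PyTorch: the helpers `index_first` and `linear_out_shape` are hypothetical and only mimic how indexing with `t[0, :]` and a linear layer change a shape. A single 784-to-10 layer stands in for the whole network, and a batch size of 64 is taken from the data loader in the question.

```python
def index_first(shape):
    """Shape of t[0, :] for a tensor of the given shape: the first axis is dropped."""
    return tuple(shape[1:])


def linear_out_shape(in_shape, in_features, out_features):
    """A linear layer only transforms the last axis; leading axes pass through."""
    if in_shape[-1] != in_features:
        raise ValueError(f"last axis is {in_shape[-1]}, expected {in_features}")
    return tuple(in_shape[:-1]) + (out_features,)


# After images.resize_(64, 1, 784): indexing keeps a leading axis of size 1,
# which acts as a batch containing one image.
sample_resized = index_first((64, 1, 784))
out_resized = linear_out_shape(sample_resized, 784, 10)

# After images.view(64, -1): indexing leaves a 1-D tensor with no batch axis.
sample_viewed = index_first((64, 784))
out_viewed = linear_out_shape(sample_viewed, 784, 10)

print(sample_resized, out_resized)  # (1, 784) (1, 10)
print(sample_viewed, out_viewed)    # (784,) (10,)
```

In the second layout the output has a single axis, so there is no dim=1 to address. Slicing with images[0:1] instead of images[0,:] would keep the batch axis.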
I have a dataset of 600x600 grayscale images, grouped in batches of 50 images by a dataloader.
My network has a convolution layer with 16 filters, followed by max pooling with 6x6 kernels, and then a dense layer. After pooling, each image should have out_channels*width*height/maxpool_kernel_W/maxpool_kernel_H = 16*600*600/6/6 = 160000 values, for each of the 50 images in a batch.
However when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch,n_channels,width,height] (so [50,1,600,600] in my case).
Logically the output should be a 50x160000 matrix, but apparently it is formatted as a 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.
# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size= (1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset,[400,100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=50, shuffle=True)
net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.MaxPool2d(kernel_size=6),
    nn.Linear(160000, 1000),
)
optimizer = optim.Adam(net.parameters(), lr=1e-3)
epochs = 10
for i in range(epochs):
    for (x, _) in train_dataloader:
        # make sure the data is in the right shape
        print(x.shape)  # returns torch.Size([50, 1, 600, 600])
        # error happens here, at the first forward pass
        output = net(x)
        criterion = nn.MSELoss()
        loss = criterion(output, x)
If you inspect your model's inference layer by layer, you will notice that nn.MaxPool2d returns a 4D tensor shaped (50, 16, 100, 100). nn.Linear only acts on the last dimension, so it treats this tensor as an (80000, 100) matrix (50*16*100 rows of 100 values), which is where the shapes in the error message come from. There are different ways to reduce spatial dimensionality (flattening, average-pooling, max-pooling). For instance, flattening everything after the batch dimension gives a tensor of shape (50, 16*100*100), i.e. (50, 160_000), as you expected. To get that, add an nn.Flatten layer between the pooling layer and the linear layer:
net = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
                    nn.MaxPool2d(kernel_size=6),
                    nn.Flatten(),
                    nn.Linear(160000, 1000))
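The shapes in this answer can be checked by hand with the usual output-size formulas. The sketch below is plain Python, not PyTorch: `conv2d_out` and `maxpool2d_out` are hypothetical helpers that assume dilation 1, no pooling padding, and a pooling stride equal to the kernel size (the PyTorch default).

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Output length along one spatial axis of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size, kernel, stride=None):
    """Output length along one spatial axis of max pooling (no padding)."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


batch, height, width = 50, 600, 600

# Conv2d(1, 16, kernel_size=5, padding=2) keeps the spatial size.
conv_shape = (batch, 16,
              conv2d_out(height, 5, padding=2),
              conv2d_out(width, 5, padding=2))

# MaxPool2d(kernel_size=6) divides each spatial axis by 6.
pool_shape = conv_shape[:2] + (maxpool2d_out(conv_shape[2], 6),
                               maxpool2d_out(conv_shape[3], 6))

# Without Flatten, a linear layer treats every axis but the last as rows.
rows_without_flatten = pool_shape[0] * pool_shape[1] * pool_shape[2]
cols_without_flatten = pool_shape[3]

# With Flatten, every axis after the batch axis is merged into one.
flat_shape = (pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3])

print(conv_shape)                                  # (50, 16, 600, 600)
print(pool_shape)                                  # (50, 16, 100, 100)
print(rows_without_flatten, cols_without_flatten)  # 80000 100
print(flat_shape)                                  # (50, 160000)
```

The 80000x100 pair matches mat1 in the error message, and the flattened shape matches what nn.Linear(160000, 1000) expects.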
Can an RNN be used for image classification only with grayscale images?
The following program works for grayscale image classification.
If RGB images are used, I get this error:
Expected input batch_size (18) to match target batch_size (6)
at the line loss = criterion(outputs, labels).
My data loading for training, validation and test is as follows.
input_size = 300
inputH = 300
inputW = 300
#Data transform (normalization & data augmentation)
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_resize_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
train_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
valid_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
test_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
#Create dataset
train_ds = ImageFolder('./data/train', train_tfms)
valid_ds = ImageFolder('./data/valid', valid_tfms)
test_ds = ImageFolder('./data/test', test_tfms)
from torch.utils.data.dataloader import DataLoader
batch_size = 6
#Training data loader
train_dl = DataLoader(train_ds, batch_size, shuffle = True, num_workers = 8, pin_memory=True)
#Validation data loader
valid_dl = DataLoader(valid_ds, batch_size, shuffle = True, num_workers = 8, pin_memory=True)
#Test data loader
test_dl = DataLoader(test_ds, 1, shuffle = False, num_workers = 1, pin_memory=True)
My model is as follows.
num_steps = 300
hidden_size = 256 #size of hidden layers
num_classes = 5
num_epochs = 20
learning_rate = 0.001
# Fully connected neural network with one hidden layer
num_layers = 2 # 2 RNN layers are stacked
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # batch_first=True: the batch must be the first dimension
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, dropout=0.2)
        # our input needs to have shape
        # x -> (batch_size, seq, input_size)
        # this fc comes after the RNN, so it takes the hidden size of the last RNN layer
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # according to the PyTorch documentation of RNN,
        # the rnn needs the input and h_0 (the initial hidden state)
        # the following is the initial hidden state:
        # first dimension is the number of layers, second is the batch size
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        # the rnn returns two tensors: the first contains the output features of the last layer for all time steps,
        # the second one is the final hidden state
        out, _ = self.rnn(x, h0)
        # out has shape (batch_size, num_steps, hidden_size)
        # we only need to decode the hidden state of the last time step,
        # which has shape (batch_size, hidden_size)
        out = out[:, -1, :]  # -1 selects the last time step; keep all batch elements and features
        out = self.fc(out)
        return out
stacked_rnn_model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()#cross entropy has softmax at output
#optimizer = torch.optim.Adam(stacked_rnn_model.parameters(), lr=learning_rate) #optimizer used gradient optimization using Adam
optimizer = torch.optim.SGD(stacked_rnn_model.parameters(), lr=learning_rate)
# Train the model
n_total_steps = len(train_dl)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_dl):
        # origin shape: [6, 3, 300, 300]
        # resized: [6, 300, 300]
        images = images.reshape(-1, num_steps, input_size).to(device)
        print('images shape')
        print(images.shape)
        labels = labels.to(device)
        # Forward pass
        outputs = stacked_rnn_model(images)
        print('outputs shape')
        print(outputs.shape)
        loss = criterion(outputs, labels)
        # Backward and optimize
The printed shapes of images and outputs are:
images shape
torch.Size([18, 300, 300])
outputs shape
torch.Size([18, 5])
Where is the mistake?
Tl;dr: You are flattening the first two axes, namely batch and channels.
I am not sure you are taking the right approach, but I will write about that later.
In any case, let's look at the issue you are facing. You have a data loader that produces (6, 3, 300, 300), i.e. batches of 6 three-channel 300x300 images. By the look of it, you want to reshape each batch element (3, 300, 300) into (num_steps=300, -1).
However, images.reshape(-1, num_steps, input_size) changes the first axis instead, which it shouldn't. This has the desired effect with single-channel images, since the channel axis then has size 1 and adds no elements. In your case you have 3 channels, therefore the resulting shape is (6*3*300*300//300//300, 300, 300), which is (18, 300, 300) since num_steps=300 and input_size=300. As a result you are left with 18 batch elements instead of 6.
Instead, reshape with (batch_size, num_steps, -1), leaving the last axis (the per-step feature size, i.e. input_size) to be inferred. This results in a shape of (6, 300, 900).
Here is a corrected and reduced snippet:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

batch_size = 6
channels = 3
inputH, inputW = 300, 300
train_ds = TensorDataset(torch.rand(100, channels, inputH, inputW), torch.rand(100, 5))
train_dl = DataLoader(train_ds, batch_size)

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        # input: (batch_size, seq, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # after taking the last time step: (batch_size, hidden_size)
        self.fc = nn.Linear(hidden_size, num_classes)
        # output: (batch_size, num_classes)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = out[:, -1, :]
        out = self.fc(out)
        return out

num_steps = 300
input_size = inputH*inputW*channels//num_steps
hidden_size = 256
num_classes = 5
num_layers = 2
rnn = RNN(input_size, hidden_size, num_layers, num_classes)
for x, y in train_dl:
    print(x.shape, y.shape)
    # use x.size(0) rather than batch_size so a smaller final batch still reshapes correctly
    images = x.reshape(x.size(0), num_steps, -1)
    outputs = rnn(images)
As I said in the beginning, I am a bit wary about this approach because you are essentially feeding your RNN an RGB 300x300 image in the form of a sequence of 300 flattened vectors. I can't say if that makes sense in terms of training, or if the model will be able to learn from that. I could be wrong!
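The reshape arithmetic discussed in this answer can be reproduced without PyTorch. The following is a minimal sketch in plain Python: `infer_reshape` is a hypothetical helper that resolves a single -1 the way reshape does, by keeping the total number of elements constant.

```python
from math import prod


def infer_reshape(shape, new_shape):
    """Resolve at most one -1 in new_shape so the element count is preserved."""
    if new_shape.count(-1) > 1:
        raise ValueError("only one axis can be inferred")
    total = prod(shape)
    known = prod(d for d in new_shape if d != -1)
    if -1 not in new_shape:
        if known != total:
            raise ValueError(f"cannot reshape {shape} into {new_shape}")
        return tuple(new_shape)
    if total % known != 0:
        raise ValueError(f"cannot reshape {shape} into {new_shape}")
    return tuple(total // known if d == -1 else d for d in new_shape)


rgb_batch = (6, 3, 300, 300)
gray_batch = (6, 1, 300, 300)

# The question's reshape: the channel axis leaks into the batch axis.
wrong = infer_reshape(rgb_batch, (-1, 300, 300))

# The same call happens to work for single-channel images.
gray = infer_reshape(gray_batch, (-1, 300, 300))

# The answer's reshape: the batch axis is fixed, the feature axis is inferred.
right = infer_reshape(rgb_batch, (6, 300, -1))

print(wrong)  # (18, 300, 300)
print(gray)   # (6, 300, 300)
print(right)  # (6, 300, 900)
```

The 18 in the first result is the batch size the loss function complains about, while the labels still have 6 entries.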
I am following some tutorials on setting up my first convolutional neural network for image classification.
The tutorials load all images into memory and pass them into model.fit(). I can't do that because my dataset is too large.
I wrote this generator to "drip feed" preprocessed images to model.fit(), but I am getting an error and, being a newbie, I am having trouble diagnosing it.
The images are also processed only as greyscale.
Here is the generator that I made:
# need to preprocess images in batches because of memory limits
# tdata expects a list of tuples <(string) file_path, (int) class_num>
def image_generator(tdata, batch_size):
    start_from = 0
    while True:
        # Slice array into batch data
        batch = tdata[start_from:start_from+batch_size]
        # Keep track of position
        start_from += batch_size
        # Create batch lists
        batch_x = []
        batch_y = []
        # Read in each input, perform preprocessing and get labels
        for img_path, class_num in batch:
            # Read raw img data as np array
            # Returns as shape (600, 300, 1)
            img_arr = create_np_img_array(img_path)
            # Normalize img data (/255)
            img_arr = normalize_img_array(img_arr)
            # Add to the batch x data list
            batch_x.append(img_arr)
            # Add to the batch y classification list
            batch_y.append(class_num)
        yield (batch_x, batch_y)
Creating an instance of the generator:
img_gen = image_generator(training_data, 30)
Setting up my model like so:
# create the model
model = Sequential()
# input layer has the input_shape param, which is the dimensions of the np array
model.add( Conv2D(256, (3, 3), activation='relu', input_shape = (600, 300, 1)) )
model.add( MaxPooling2D( (2,2)) )
# second hidden layer
model.add( MaxPooling2D((2, 2)) )
model.add( Conv2D(256, (3, 3), activation='relu') )
# third hidden layer
model.add( MaxPooling2D((2, 2)))
model.add( Conv2D(256, (3, 3), activation='relu') )
# fourth hidden layer
model.add( Flatten() )
model.add( Dense(64, activation='relu') )
# output layer
model.add( Dense(2) )
# pass generator
model.fit(img_gen, epochs=5)
Then model.fit() fails from trying to call shape on an int.
~\anaconda3\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py in _get_dynamic_shape(t)
798 def _get_dynamic_shape(t):
--> 799 shape = t.shape
800 # Unknown number of dimensions, `as_list` cannot be called.
801 if shape.rank is None:
AttributeError: 'int' object has no attribute 'shape'
Any suggestions on what I've done wrong??
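Converting the outputs of the generator to NumPy arrays before yielding them seems to have stopped the error:
np_x = np.array(batch_x)
np_y = np.array(batch_y)
It seems Keras did not accept the class labels as a plain Python list of ints: it reads .shape on each element it receives, and an int has no such attribute, hence the AttributeError.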
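For illustration, here is a minimal sketch of a generator that yields NumPy arrays directly. It is not the original code: `load_image` is a hypothetical stand-in for create_np_img_array plus normalize_img_array, the file list is made up, and the wrap-around at the end of the data is an addition so the generator can serve more than one epoch.

```python
import numpy as np


def load_image(path):
    """Hypothetical stand-in for the question's image loading and normalising."""
    return np.zeros((600, 300, 1), dtype=np.float32)


def image_generator(tdata, batch_size):
    """Yield (x, y) NumPy batches forever, wrapping around at the end of tdata."""
    start = 0
    while True:
        batch = tdata[start:start + batch_size]
        start += batch_size
        if start >= len(tdata):
            start = 0  # restart so that later epochs still receive data
        batch_x = np.stack([load_image(path) for path, _ in batch])
        batch_y = np.array([label for _, label in batch], dtype=np.int64)
        yield batch_x, batch_y


# Hypothetical file list: (path, class index) pairs.
samples = [(f"img_{i}.png", i % 2) for i in range(10)]
gen = image_generator(samples, batch_size=4)
x, y = next(gen)
print(x.shape, y.shape)  # (4, 600, 300, 1) (4,)
```

Because this generator never stops, model.fit() would need a steps_per_epoch value to know where one epoch ends, for example the number of samples divided by the batch size, rounded up.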
I have a TFRecords file consisting of 60 examples of six Landsat band values for some pixels plus a label for each pixel, and I want to train a Keras classifier with it. But I get a dimension mismatch when I try to feed the data to the network.
The TFRecords file is generated with the structure below:
# TFRecords file contains below features per each example
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7','landcover']
columns = [tf.FixedLenFeature(shape=[1], dtype=tf.float32) for k in bands]
featuresDict = dict(zip(bands, columns))
And my code for defining generator function and Keras model is as follows:
def tfdata_generator_training(fileName, batchSize=None):
    dataset = tf.data.TFRecordDataset(fileName, compression_type='GZIP')

    def parse_tfrecord(example):
        features = tf.parse_single_example(example, featuresDict)
        # Extract landcover and remove it from dictionary
        labels = features.pop('landcover')
        labels = tf.one_hot(tf.cast(labels, tf.uint8), 3)
        # Return list of dictionary values (to be convertible to numpy array for Keras) and pixel label in one-hot format
        return list(features.values()), labels

    # Map the parsing function over the dataset
    dataset = dataset.map(parse_tfrecord)
    dataset = dataset.batch(batchSize)
    return dataset
training_data = tfdata_generator_training(fileName=<my_file_path>, batchSize=1)
def keras_model():
    from tensorflow.keras.layers import Dense, Input
    inputs = Input(shape=(6,1))
    x = Dense(5, activation='relu')(inputs)
    x = Dense(7, activation='relu')(x)
    outputs = Dense(3, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
model = keras_model()
model.compile('adam', 'categorical_crossentropy', metrics=['acc'])
model.fit(training_data.make_one_shot_iterator(), steps_per_epoch=60, epochs=8)
But I get the error below when running the code:
ValueError: Error when checking target: expected dense_2 to have shape (6, 3) but got array with shape (1, 3)
What is the problem with my code? I also tried to get the dimensions of the input layer and the Tensorflow printout was as follows:
(<tf.Tensor 'IteratorGetNext:0' shape=(?, 6, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(?, 1, 3) dtype=float32>)
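The two shapes in the error can be traced with simple shape bookkeeping. This is a plain-Python sketch, not TensorFlow code: `dense_out_shape` and `model_out_shape` are hypothetical helpers that rely only on the fact that a Dense layer acts on the last axis. The flattened variant at the end is one possible fix under that assumption, not necessarily the only one.

```python
def dense_out_shape(in_shape, units):
    """A Dense layer replaces the last axis with `units`; other axes pass through."""
    return tuple(in_shape[:-1]) + (units,)


def model_out_shape(in_shape, layer_units=(5, 7, 3)):
    """Shape after the three Dense layers of the question's model (batch axis omitted)."""
    shape = tuple(in_shape)
    for units in layer_units:
        shape = dense_out_shape(shape, units)
    return shape


# Per-example shapes from the printout in the question (batch axis omitted).
features_shape = (6, 1)  # six bands, each parsed with shape [1]
labels_shape = (1, 3)    # one-hot encoding of a label parsed with shape [1]

predicted_shape = model_out_shape(features_shape)
print(predicted_shape, labels_shape)  # (6, 3) (1, 3)

# Possible fix: treat the six bands as one feature vector and drop the
# extra label axis, so prediction and target agree.
fixed_predicted_shape = model_out_shape((6,))
fixed_labels_shape = (3,)
print(fixed_predicted_shape, fixed_labels_shape)  # (3,) (3,)
```

In the pipeline above this would correspond to reshaping the features to [6] and the labels to [3] inside parse_tfrecord (for example with tf.reshape) and declaring the input as Input(shape=(6,)).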
I'm trying to do an image segmentation problem where I want to segment 5 objects in an image. I'm using a U-net architecture. My final layer looks like this:
conv_final = Conv2D(OUTPUT_MASK_CHANNELS, (1, 1))(up_conv_224)
conv_final = Activation('sigmoid')(conv_final)
model = Model(inputs, conv_final, name="ZF_UNET_224")
However I get an error saying:
ValueError: Error when checking target: expected conv2d_24 to have shape (224, 224, 5) but got array with shape (224, 224, 3)
This is the generator that I'm using
image_generator = train_datagen.flow_from_directory(
    'data/train',  # this is the target directory
    target_size=(224, 224),  # all images will be resized to 224x224
    color_mode='rgb',
    seed=1)  # since we use binary_crossentropy loss, we need binary labels
# this is a similar generator, for validation data
mask_generator = mask_datagen.flow_from_directory(
    target_size=(224, 224),
    color_mode='rgb',
    seed=1)
train_generator = zip(image_generator, mask_generator)
What can I do to fix this? Any help appreciated!
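You have to convert the masks into one-hot encoded format: the model outputs 5 channels (one per class), while the masks are loaded as 3-channel RGB images, which is exactly the mismatch the error reports.
Map each mask pixel to an integer class index, then one-hot encode it with to_categorical (from keras.utils import to_categorical) so that each target has shape (224, 224, 5).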
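As a rough illustration of the target format, the sketch below one-hot encodes a small integer class mask with NumPy. It assumes the mask has already been converted from colours to class indices 0 to 4, a step that depends on how the masks were drawn and is not shown. `one_hot_mask` is a hypothetical helper, and the tiny 2x3 mask stands in for a 224x224 one.

```python
import numpy as np

NUM_CLASSES = 5


def one_hot_mask(class_mask, num_classes=NUM_CLASSES):
    """Turn an (H, W) integer class mask into an (H, W, num_classes) one-hot array."""
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[class_mask]


# Tiny stand-in for a 224x224 mask of class indices.
class_mask = np.array([[0, 1, 2],
                       [3, 4, 0]])
target = one_hot_mask(class_mask)
print(target.shape)  # (2, 3, 5)
```

Applied to a (224, 224) mask this gives a (224, 224, 5) target, which is the shape the final layer is expected to match. to_categorical with num_classes=5 should produce the same layout.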