PyTorch DataLoader adding extra dimension for TorchVision MNIST

I am fairly new to PyTorch and have been experimenting with the DataLoader class.
When I attempt to load the MNIST dataset, the DataLoader appears to add an additional dimension after the batch dimension. I am not sure what is causing this to occur.
import torch
from torchvision.datasets import MNIST
from torchvision import transforms

if __name__ == '__main__':
    mnist_train = MNIST(root='./data', train=True, download=True,
                        transform=transforms.Compose([transforms.ToTensor()]))
    first_x = mnist_train.data[0]
    print(first_x.shape)  # expect to see [28, 28], actual [28, 28]

    train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=200)
    batch_x, batch_y = next(iter(train_loader))  # get first batch
    print(batch_x.shape)  # expect to see [200, 28, 28], actual [200, 1, 28, 28]
    # Where is the extra dimension of 1 from?
Can anyone shed some light on the issue?

That extra dimension is the number of channels of the input image. ToTensor converts each PIL image to a channels-first tensor, so basically it is
batch_x.shape = [batch size, number of channels, height of the image, width of the image]
Note that mnist_train.data is the raw data and bypasses the transform entirely, which is why first_x is still [28, 28].
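If you ever want the batch without that singleton axis, for a model expecting [N, 28, 28], you can squeeze it out after loading; a minimal sketch:
batch_x, batch_y = next(iter(train_loader))
print(batch_x.shape)             # torch.Size([200, 1, 28, 28])
print(batch_x.squeeze(1).shape)  # torch.Size([200, 28, 28])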

Related

ValueError: Input 0 of layer sequential_6 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [32, 28, 28]

I tried the following code, but I encountered the above error. I saw some similar questions, but I didn't find a proper solution. Please help me!
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

mnist = tf.keras.datasets.mnist  # download the dataset
(xtrain, ytrain), (xtest, ytest) = mnist.load_data()  # split the dataset into train and test
xtrain = tf.keras.utils.normalize(xtrain, axis=1)
xtest = tf.keras.utils.normalize(xtest, axis=1)

model = tf.keras.models.Sequential()  # start building the model
model.add(tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Flatten())  # converting matrix to vector
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))  # 10 output nodes (one per digit) with softmax activation
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # specifying hyperparameters
model.fit(xtrain, ytrain, epochs=5)  # train the model
model.save('Ashwatdhama')  # save the model with a unique name

myModel = tf.keras.models.load_model('Ashwatdhama')  # make an object of the model
prediction = myModel.predict(xtest)  # run the model object
for i in range(10):
    print(np.argmax(prediction[i]))
    plt.imshow(xtest[i])  # make visuals of the mnist dataset
    plt.show()  # output
Your network expects grayscale images (1 channel), so you have to modify your data accordingly. This is possible by simply adding a channel dimension to your images before fitting:
xtrain = xtrain[...,None] # (batch_dim, 28, 28, 1)
xtest = xtest[...,None] # (batch_dim, 28, 28, 1)
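np.expand_dims(xtrain, -1) is an equivalent spelling. Either way, the arrays should now match the first Conv2D's input_shape=(28, 28, 1):
print(xtrain.shape)  # (60000, 28, 28, 1)
print(xtest.shape)   # (10000, 28, 28, 1)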

ValueError: Cannot feed value of shape (64, 80, 60, 3) for Tensor 'input/X:0', which has shape '(?, 80, 60, 1)'

I was trying to recreate a program that Sentdex built in his "Python Plays GTA V" series, but when I come to train the AI it throws this error: ValueError: Cannot feed value of shape (64, 80, 60, 3) for Tensor 'input/X:0', which has shape '(?, 80, 60, 1)'. I tried changing some parameters, but it didn't work. Here is my code:
import time

import numpy as np
from alexnet import alexnet

width = 80
height = 60
lr = 1e-3
epochs = 30
model_name = 'minecraft-ai-{}-{}-{}'.format(lr, 'ghostbot', epochs)

model = alexnet(width, height, lr)
train_data = np.load('training_data.npy', allow_pickle=True)
train = train_data[:-500]
test = train_data[-500:]

X = np.array([i[0] for i in train]).reshape(-1, width, height, 3)
Y = [i[1] for i in train]
test_x = np.array([i[0] for i in test]).reshape(-1, width, height, 3)
test_y = [i[1] for i in test]
print(X.shape)
print(test_x.shape)
time.sleep(3)

model.fit({'input': X}, {'targets': Y}, n_epoch=epochs,
          validation_set=({'input': test_x}, {'targets': test_y}),
          snapshot_step=500, show_metric=True, run_id=model_name)
model.save(model_name)
I checked the source at https://github.com/Sentdex/pygta5/blob/master/2.%20train_model.py#L91. Line #91 has been changed to:
test_x = np.array([i[0] for i in test]).reshape(-1, width, height, 3)
so you need to set the last axis (the number of channels) to 3, so that the channel dimension of the test images matches that of the training ones. Make the same change to debug this. Hope this helps!
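As a quick sanity check (a sketch; it only restates what the reshape above guarantees), you can assert before fitting that both splits carry the three channels the network's input layer expects:
assert X.shape[1:] == (width, height, 3), X.shape
assert test_x.shape[1:] == (width, height, 3), test_x.shape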
This is caused because the number of channels in the training images does not match the input layer of the network.
If you are using grayscale images, go to the alexnet model definition and change line 728 from
network = input_data(shape=[None, width, height, 3], name='input')
to this
network = input_data(shape=[None, width, height, 1], name='input')
https://github.com/Sentdex/pygta5/blob/master/models.py#L728
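If you take the grayscale route, the captured frames also have to lose their color channel before they reach the network. A minimal sketch, assuming the frames are RGB numpy arrays and using OpenCV (cv2 does not appear in the question's snippet, so treat this as an illustration):
import cv2
import numpy as np

def to_gray_batch(frames, width=80, height=60):
    # Collapse each RGB frame to one channel, then add the trailing
    # channel axis that input_data(shape=[None, width, height, 1]) expects.
    gray = [cv2.cvtColor(f, cv2.COLOR_RGB2GRAY) for f in frames]
    return np.array(gray).reshape(-1, width, height, 1)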

How to reshape a whole batch in Keras model?

I'm trying to reshape a whole batch in Keras so that it transforms from (?, 28, 28, 1) to (?/10, 10, 28, 28, 1). Is it possible to implement this in Keras?
I have a dataset where labels are assigned per group of images (let's call it a 'bag') rather than per image. I'd like to load images from a folder using Keras ImageDataGenerator and its flow_from_directory() method, and then reshape the images inside the model itself.
from keras.datasets import mnist
import keras.backend as K
from keras.layers import Lambda

(x_train, y_train), (x_test, y_test) = mnist.load_data()

def test_model(data):
    from keras import Model
    from keras.models import Input

    input_layer = Input(shape=data.shape[1:])
    layer = Lambda(lambda x: K.reshape(x, (-1, 10, *data.shape[1:])),
                   output_shape=(-1, 10, *data.shape[1:]))(input_layer)
    model = Model(inputs=[input_layer], outputs=[layer])
    model.compile(optimizer='adadelta', loss='mean_squared_error')
    return model.predict(data, batch_size=60)

if __name__ == '__main__':
    test_model(x_train)
I expected the model to reshape each batch and return it somehow (to be honest, I don't know how Keras concatenates the results of each batch when predicting).
Instead of results, I get this error:
ValueError: could not broadcast input array from shape (6,10,28,28) into shape (60,10,28,28)
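For what it's worth, the error comes from how predict assembles its output: it allocates one row per input sample and copies each batch's result into the matching slice, so a Lambda that collapses a 60-sample batch into 6 bags returns 6 rows where 60 are expected. A sketch of one possible workaround, grouping the samples with plain numpy before they enter the model rather than inside it (my own suggestion, not from the question; it assumes the bag size divides the sample count):
import numpy as np

def to_bags(x, bag_size=10):
    # Group (N, ...) samples into (N // bag_size, bag_size, ...) bags
    # before feeding the model, sidestepping the batch-dimension mismatch.
    assert len(x) % bag_size == 0, 'bag size must divide the number of samples'
    return x.reshape(-1, bag_size, *x.shape[1:])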

Transferring pretrained pytorch model to onnx

I am trying to convert a PyTorch model to ONNX in order to use it later with TensorRT. I followed this tutorial: https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html, but my kernel dies every time.
This is the code that I implemented.
# Some standard imports
import io
import numpy as np
from torch import nn
import torch.onnx
from deepformer.nets.quicknat import quickNAT

param = {
    'num_channels': 64,
    'num_filters': 64,
    'kernel_h': 5,
    'kernel_w': 5,
    'kernel_c': 1,
    'stride_conv': 1,
    'pool': 2,
    'stride_pool': 2,
    'num_classes': 1,
    'padding': 'reflection'
}

net = quickNAT(param)
checkpoint_path = 'checkpoint_epoch36_loss0.78.t7'
checkpoints = torch.load(checkpoint_path)
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
net.load_state_dict(checkpoints['net'])
net.train(False)

# Input to the model
x = torch.rand(1, 64, 256, 1600, requires_grad=True)

# Export the model
torch_out = torch.onnx._export(net,              # model being run
                               x,                # model input (or a tuple for multiple inputs)
                               "quicknat.onnx",  # where to save the model (can be a file or file-like object)
                               export_params=True)  # store the trained parameter weights inside the model file
What is the output you get? It seems SuperResolution is supported by the export operators in PyTorch, as mentioned in the documentation.
Are you sure the input to your model is:
x = torch.rand(1, 64, 256, 1600, requires_grad=True)
That could be the shape you used for training. For deployment you run the network on one or more images, so the dummy input used to export to ONNX is usually something like:
dummy_input = torch.randn(1, 3, 720, 1280, device='cuda')
with 1 being the batch size, 3 being the channels of the image (RGB), and then the size of the image, in this case 720x1280. Check that input; I guess you don't have a 64-channel image as input, right?
Also, it'd be helpful if you posted the terminal output to see where it fails.
Good luck!
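For reference, a minimal export sketch using the public torch.onnx.export (the underscore-prefixed _export is internal API). The spatial size below is a deliberate guess, smaller than the question's 256x1600: a 1x64x256x1600 float tensor is already on the order of 100 MB before any intermediate activations, so a dying kernel may simply mean the process ran out of memory.
import torch

net.eval()  # same effect as net.train(False)
dummy_input = torch.rand(1, 64, 256, 256)  # hypothetical smaller spatial size, keeping the 64 channels from param
torch.onnx.export(net, dummy_input, 'quicknat.onnx', export_params=True)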

Keras Convolutional Neural Network

Right now I am trying to construct a basic convolutional neural network to do simple classification on the MNIST dataset using Keras. Eventually I want to feed in my own images; I just want to build a simple network first to make sure the structure works. So I downloaded the MNIST data as mnist.pkl.gz, unpacked it, and loaded it into tuples and eventually numpy arrays. Here is my code:
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
from PIL import Image as IM
import theano
from sklearn.cross_validation import train_test_split
import cPickle
import gzip

f = gzip.open('mnist.pkl.gz')
data1, data2, data3 = cPickle.load(f)
f.close()

X = data1[0]
Y = data1[1]
x = X[0:15000, :]
y = Y[0:15000]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=99)

model = Sequential()
model.add(Convolution2D(10, 5, 5, border_mode='valid', input_shape=(1, 28, 28)))
model.add(Activation('tanh'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X_train, y_train, batch_size=10, nb_epoch=10)
score = model.evaluate(X_test, y_test, batch_size=10)
print(score)
I get an error like this:
'Wrong number of dimensions: expected 4, got 2 with shape (10, 784).'
I think this means I need to put the data into a Theano 4D tensor with shape (samples, channels, rows, columns), but I have no idea how to do that. Furthermore, when I move on to the problem I actually want to solve, I will be loading '.png' files; I was then going to put them into numpy matrices to feed in, but it looks like that is not going to work. Can anybody tell me how I can get images into Theano 4D tensors to use in this code? Thanks
You are correct that the code expects a tensor4. The conventional structure is (batch, channel, width, height). In this case the images are monochrome, so channel=1. It looks like you're using a batch size of 10, and the MNIST images are 28 pixels wide and 28 pixels high.
You can simply reshape the data into the required format. If x has shape (10, 784), then x.reshape(10, 1, 28, 28) will have the required format.
The code is expecting a 4-dimensional numpy array, not a Theano tensor (Keras does all the Theano tensor manipulation under the hood).
Your inputs, X_train and X_test, need to be reshaped as follows:
X_train = X_train.reshape(-1, 1, 28, 28)
X_test = X_test.reshape(-1, 1, 28, 28)
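On the follow-up about loading '.png' files: a minimal sketch, assuming grayscale 28x28 images and using PIL, which the question already imports as IM:
import numpy as np
from PIL import Image as IM

def load_png_batch(paths):
    # Read each PNG as grayscale ('L'), scale to [0, 1], and stack into
    # the (samples, channels, rows, columns) layout the network expects.
    imgs = [np.asarray(IM.open(p).convert('L'), dtype='float32') / 255.0
            for p in paths]
    return np.array(imgs).reshape(-1, 1, 28, 28)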
