PyTorch single channel conv slower than manual implementation

I want to convolve a 2d matrix with a constant 1d weight kernel in PyTorch. I would have imagined that the fastest way to do this would be to use PyTorch's conv2d, but this turns out to be substantially slower than a simple manual implementation. This is, in turn, substantially slower than a simple CUDA version that I wrote (not shown). Is there any way to make the PyTorch version faster? I tried preallocating the output array (the _out versions of the implementations, below), but this was even slower. Why does PyTorch not at least automatically use something like my manual implementation for such cases?
The four implementations (conv2d and manual, with and without preallocation) are below. I also performed the test after applying JIT to each implementation, but the outcome was the same (manual was fastest).
The output of the code (running on a CPU with PyTorch 1.11.0):
True True True
conv 3.8901006870000856
conv_out 4.489796123998531
manual 1.6673853559987037
manual_out 1.976811970002018
jit conv 3.674719169997843
jit conv_out 4.3507045450023725
jit manual 0.31136969999352004
jit manual_out 0.4854257180049899
The code:
import torch
import timeit
## Four implementations
def conv(x):
    return torch.nn.functional.conv2d(x, torch.tensor([[[[1.0, 2.0, 3.0]]]]), padding='valid')
def conv_out(x, out):
    out.copy_(torch.nn.functional.conv2d(x, torch.tensor([[[[1.0, 2.0, 3.0]]]]), padding='valid'))
def manual(x):
    return 1.0 * x[..., :-2] + 2.0 * x[..., 1:-1] + 3.0 * x[..., 2:]
def manual_out(x, out):
    out.copy_(1.0 * x[..., :-2] + 2.0 * x[..., 1:-1] + 3.0 * x[..., 2:])
## Test that all implementations produce the same result
x = torch.arange(256 * 256, dtype=torch.float).reshape(1, 1, 256, 256)
y1 = conv(x)
y2 = torch.empty(1, 1, 256, 254)
conv_out(x, y2)
y3 = manual(x)
y4 = torch.empty(1, 1, 256, 254)
manual_out(x, y4)
print(torch.allclose(y1, y2), torch.allclose(y2, y3), torch.allclose(y3, y4))
## Time execution of each implementation
x = torch.arange(256 * 256, dtype=torch.float).reshape(1, 1, 256, 256)
out = torch.empty(1, 1, 256, 254)
n = 10000
print('conv', timeit.timeit(lambda: conv(x), number=n))
print('conv_out', timeit.timeit(lambda: conv_out(x, out), number=n))
print('manual', timeit.timeit(lambda: manual(x), number=n))
print('manual_out', timeit.timeit(lambda: manual_out(x, out), number=n))
## Time execution of each implementation after JIT
conv = torch.jit.script(conv)
conv_out = torch.jit.script(conv_out)
manual = torch.jit.script(manual)
manual_out = torch.jit.script(manual_out)
# Warm-up runs before timing
conv(x)
conv(x)
conv_out(x, out)
conv_out(x, out)
manual(x)
manual(x)
manual_out(x, out)
manual_out(x, out)
print('jit conv', timeit.timeit(lambda: conv(x), number=n))
print('jit conv_out', timeit.timeit(lambda: conv_out(x, out), number=n))
print('jit manual', timeit.timeit(lambda: manual(x), number=n))
print('jit manual_out', timeit.timeit(lambda: manual_out(x, out), number=n))
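One detail worth noting when reproducing this benchmark (not part of the original post): conv and conv_out rebuild the 1x1x1x3 weight tensor on every call, so part of the measured gap is tensor construction rather than the convolution itself. A minimal sketch of a variant that hoists the constant kernel out of the timed function (conv_preweight is a hypothetical name):

import torch
import timeit

# Build the constant kernel once instead of on every call.
weight = torch.tensor([[[[1.0, 2.0, 3.0]]]])

def conv_preweight(x):
    # Same convolution as conv(), minus the per-call tensor construction.
    return torch.nn.functional.conv2d(x, weight, padding='valid')

x = torch.arange(256 * 256, dtype=torch.float).reshape(1, 1, 256, 256)
print('conv_preweight', timeit.timeit(lambda: conv_preweight(x), number=10000))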

Related

requires_grad = False seems not working in my case

I received a "Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient" error with tensor W.
W has size (10, 10) and grad_fn=<DivBackward0>. The error happens at the second line:
def muy(self, x):
    V = torch.tensor(self.W - self.lambda_ * torch.eye(self.ENCODING_DIM), requires_grad=False)
    return -0.5 * V.inverse().mm(self.b + self.lambda_ * x[:, None])
Other variables, with their values at the time of the error:
self.lambda_: 1.0
self.ENCODING_DIM: 10
self.b: torch.Size([10, 1]), requires_grad=True
x: torch.Size([3, 1, 10]), grad_fn=<MulBackward0>
How could I set the result of muy as just an ingredient of the leaf node, so grad through V is required?
I tried this monstrosity, to no avail
def muy(self, x):
    V_inv = np.linalg.inv(self.V.detach().numpy())
    x_numpy = x[:, None].detach().numpy()
    temp = -0.5 * np.matmul(V_inv, self.b.detach().numpy() + self.lambda_ * x_numpy)
    return temp
Why I care about the JIT:
I wanted to use TensorBoard to visualize my model; if I understand the error messages right, visualizing a model uses the tracer.
EDIT
This still gives the same error, whether I use W or W.detach():
with torch.no_grad():
    V = self.W - self.lambda_ * torch.eye(self.ENCODING_DIM)
    return -0.5 * V.inverse().mm(self.b + self.lambda_ * x[:, None])
V = torch.tensor(self.W - self.lambda_ * torch.eye(self.ENCODING_DIM), requires_grad=False)
What you are trying to do here doesn't make much sense. torch.tensor(value) is meant for constructing a new tensor from data; wrapping an existing torch.Tensor in it copies the data and detaches the result from the graph (PyTorch even warns you to use clone().detach() instead).
What you should do is simply this:
V = self.W - self.lambda_ * torch.eye(self.ENCODING_DIM)
If you want to detach self.W for some reason you can do this:
V = self.W.detach() - self.lambda_ * torch.eye(self.ENCODING_DIM)
(this creates a new tensor with requires_grad set to False; it shares the same underlying data as self.W).
You could also use the torch.no_grad() context manager, so the operation is not recorded in the graph. This has the same effect here (but only in this case, not in general), and you don't have to touch self.W at all, so it is the advised approach:
with torch.no_grad():
    V = self.W - self.lambda_ * torch.eye(self.ENCODING_DIM)
Code to reproduce
I can't reproduce this exact issue from your code description; see below:
import torch
lambda_ = 1.0
W = torch.randn(10, 10, requires_grad=True)
ENCODING_DIM = 10
b = torch.randn(10, 1, requires_grad=True)
x = torch.randn(3, 1, 10, requires_grad=True)
with torch.no_grad():
    V = W - lambda_ * torch.eye(ENCODING_DIM)
result = -0.5 * V.inverse().mm(b + lambda_ * x[:, None])
print(result)
This code gives the following (different!) error:
Traceback (most recent call last):
  File "foo.py", line 13, in <module>
    result = -0.5 * V.inverse().mm(b + lambda_ * x[:, None])
RuntimeError: matrices expected, got 2D, 4D tensors at /pytorch/aten/src/TH/generic/THTensorMath.cpp:36
I think the problem is the shapes of the matrices.
In the line return -0.5 * V.inverse().mm(self.b + self.lambda_ * x[:, None]),
V.inverse() has shape [10, 10], b has shape [10, 1], and x[:, None] has shape [3, 1, 1, 10]. These shapes are not suitable for what you want to calculate, especially that of x[:, None].
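To make the shape issue concrete, here is a minimal standalone sketch (not from the original post; the tensors are random stand-ins with the shapes from the question) that just prints the shapes this answer refers to:

import torch

# Hypothetical stand-ins for self.W, self.b and x.
W = torch.randn(10, 10)
b = torch.randn(10, 1)
x = torch.randn(3, 1, 10)

V = W - 1.0 * torch.eye(10)
print(V.inverse().shape)   # torch.Size([10, 10])
print(b.shape)             # torch.Size([10, 1])
print(x[:, None].shape)    # torch.Size([3, 1, 1, 10])

# b + 1.0 * x[:, None] broadcasts to a 4-D tensor, and torch.mm only accepts
# 2-D matrices, hence the "matrices expected, got 2D, 4D tensors" error.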

How does Pytorch build the computation graph

Here is example pytorch code from the website:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
In the forward function, we simply apply a series of transformations to x, but never explicitly define which objects are part of that transformation. Yet when computing the gradient and updating the weights, Pytorch 'magically' knows which weights to update and how the gradient should be calculated.
How does this process work? Is there code analysis going on, or something else that I am missing?
Yes, there is implicit graph construction during the forward pass. Examine the result tensor: it carries something like grad_fn=<CatBackward>, which is a link that lets you unroll the whole computation graph. The graph is built during the actual forward computation, no matter how you defined your network module, object-oriented with nn or in the 'functional' way.
You can exploit this graph for net analysis, as torchviz does here: https://github.com/szagoruyko/pytorchviz/blob/master/torchviz/dot.py
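A minimal sketch (not from the answer) of what following those grad_fn links looks like in practice:

import torch

# Build a tiny graph by running a forward computation on leaf tensors.
x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)
y = (w * x).sum()

print(y.grad_fn)                 # e.g. <SumBackward0 object at ...>
print(y.grad_fn.next_functions)  # links to the MulBackward0 node, etc.

# Following next_functions unrolls the graph back to the AccumulateGrad
# nodes that sit on the leaf tensors x and w.
node = y.grad_fn
while node.next_functions:
    node = node.next_functions[0][0]
print(node)                      # an AccumulateGrad node for a leaf tensor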

CoreML: creating a custom layer for ONNX RandomNormal

I've trained a VAE in PyTorch that I need to convert to CoreML. From this thread, PyTorch VAE fails conversion to onnx, I was able to get the ONNX model to export; however, this just pushed the problem one step further, to the ONNX-CoreML stage.
The original function that contains the torch.randn() call is the reparametrize func:
def reparametrize(self, mu, logvar):
    std = logvar.mul(0.5).exp_()
    if self.have_cuda:
        eps = torch.randn(self.bs, self.nz, device='cuda')
    else:
        eps = torch.randn(self.bs, self.nz)
    return eps.mul(std).add_(mu)
The solution is, of course, to create a custom layer, but I'm having problems creating a layer with no inputs (i.e., it's just a randn() call).
I can get the CoreML conversion to complete with this def:
def convert_randn(node):
    params = NeuralNetwork_pb2.CustomLayerParams()
    params.className = "RandomNormal"
    params.description = "Random normal distribution generator"
    params.parameters["dtype"].intValue = node.attrs.get('dtype', 1)
    params.parameters["bs"].intValue = node.attrs.get("shape")[0]
    params.parameters["nz"].intValue = node.attrs.get("shape")[1]
    return params
I do the conversion with:
coreml_model = convert(onnx_model, add_custom_layers=True,
image_input_names = ['input'],
custom_conversion_functions={"RandomNormal": convert_randn})
I should also note that, at the completion of the mlmodel export, the following is printed:
Custom layers have been added to the CoreML model corresponding to the
following ops in the onnx model:
1/1: op type: RandomNormal, op input names and shapes: [], op output
names and shapes: [('62', 'Shape not available')]
Bringing the .mlmodel into Xcode complains that Layer '62' of type 500 has 0 inputs but expects at least 1. So I'm wondering how to specify a kind of "dummy" input to the layer, since it doesn't actually have an input -- it's just a wrapper around torch.randn() (or, more specifically, the ONNX RandomNormal op). I should clarify that I do need the whole VAE, not just the decoder, as I'm actually using the entire process to "error correct" my inputs (i.e., the encoder estimates my z vector, based on an input, then the decoder generates the closest generalizable prediction of the input).
Any help greatly appreciated.
UPDATE: Okay, I finally got a version to load in Xcode (thanks to @MattijsHollemans and his book!). The originalConversion.mlmodel is the initial output of converting my model from ONNX to CoreML. To this, I had to manually insert the input for the RandomNormal layer. I made it (64, 28, 28) for no great reason — I know my batch size is 64, and my inputs are 28 x 28 (but presumably it could also be (1, 1, 1), since it's a "dummy"):
spec = coremltools.utils.load_spec('originalConversion.mlmodel')
nn = spec.neuralNetwork
layers = {l.name:i for i,l in enumerate(nn.layers)}
layer_idx = layers["62"] # '62' is the name of the layer -- see above
layer = nn.layers[layer_idx]
layer.input.extend(["dummy_input"])
inp = spec.description.input.add()
inp.name = "dummy_input"
inp.type.multiArrayType.SetInParent()
spec.description.input[1].type.multiArrayType.shape.append(64)
spec.description.input[1].type.multiArrayType.shape.append(28)
spec.description.input[1].type.multiArrayType.shape.append(28)
spec.description.input[1].type.multiArrayType.dataType = ft.ArrayFeatureType.DOUBLE
coremltools.utils.save_spec(spec, "modelWithInsertedInput.mlmodel")
This loads in Xcode, but I have yet to test the functioning of the model in my app. Since the additional layer is simple, and the input is literally a bogus, non-functional input (just to keep Xcode happy), I don't imagine it will be a problem, but I'll post again if it doesn't run properly.
UPDATE 2: Unfortunately, the model doesn't load at runtime. It fails with [espresso] [Espresso::handle_ex_plan] exception=Failed in 2nd reshape after missing custom layer info. What I find very strange and confusing is that, inspecting model.espresso.shape, I see that almost every node has a shape like:
"62" : {
"k" : 0,
"w" : 0,
"n" : 0,
"seq" : 0,
"h" : 0
}
I have two question/concerns: 1) Most obviously, why are all the values zero (this is the case with all but the input nodes), and 2) Why does it appear to be a sequential model, when it's just a fairly conventional VAE? Opening model.espresso.shape for a fully-functioning GAN in the same app, I see that the nodes are of the format:
"54" : {
"k" : 256,
"w" : 16,
"n" : 1,
"h" : 16
}
That is, they contain reasonable shape info, and they don't have seq fields.
Very, very confused...
UPDATE 3: I've also just noticed in the compiler report the error: IMPORTANT: new sequence length computation failed, falling back to old path. Your compilation was sucessful, but please file a radar on Core ML | Neural Networks and attach the model that generated this message.
Here's the original PyTorch model:
class VAE(nn.Module):
    def __init__(self, bs, nz):
        super(VAE, self).__init__()
        self.nz = nz
        self.bs = bs
        self.encoder = nn.Sequential(
            # input is (nc) x 28 x 28
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # size = (ndf) x 14 x 14
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # size = (ndf*2) x 7 x 7
            nn.Conv2d(ndf * 2, ndf * 4, 3, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # size = (ndf*4) x 4 x 4
            nn.Conv2d(ndf * 4, 1024, 4, 1, 0, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.decoder = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(1024, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # size = (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 3, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # size = (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # size = (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, nc, 4, 2, 1, bias=False),
            nn.Sigmoid()
        )
        self.fc1 = nn.Linear(1024, 512)
        self.fc21 = nn.Linear(512, nz)
        self.fc22 = nn.Linear(512, nz)
        self.fc3 = nn.Linear(nz, 512)
        self.fc4 = nn.Linear(512, 1024)
        self.lrelu = nn.LeakyReLU()
        self.relu = nn.ReLU()

    def encode(self, x):
        conv = self.encoder(x)
        h1 = self.fc1(conv.view(-1, 1024))
        return self.fc21(h1), self.fc22(h1)

    def decode(self, z):
        h3 = self.relu(self.fc3(z))
        deconv_input = self.fc4(h3)
        deconv_input = deconv_input.view(-1, 1024, 1, 1)
        return self.decoder(deconv_input)

    def reparametrize(self, mu, logvar):
        std = logvar.mul(0.5).exp_()
        eps = torch.randn(self.bs, self.nz, device='cuda')  # needs custom layer!
        return eps.mul(std).add_(mu)

    def forward(self, x):
        # print("x", x.size())
        mu, logvar = self.encode(x)
        z = self.reparametrize(mu, logvar)
        decoded = self.decode(z)
        return decoded, mu, logvar
To add an input to your Core ML model, you can do the following from Python:
import coremltools
spec = coremltools.utils.load_spec("YourModel.mlmodel")
nn = spec.neuralNetworkClassifier # or just spec.neuralNetwork
layers = {l.name:i for i,l in enumerate(nn.layers)}
layer_idx = layers["your_custom_layer"]
layer = nn.layers[layer_idx]
layer.input.extend(["dummy_input"])
inp = spec.description.input.add()
inp.name = "dummy_input"
inp.type.doubleType.SetInParent()
coremltools.utils.save_spec(spec, "NewModel.mlmodel")
Here, "your_custom_layer" is the name of the layer you want to add the dummy input to. In your model it looks like it's called 62. You can look at the layers dictionary to see the names of all the layers in the model.
Notes:
If your model is not a classifier, use nn = spec.neuralNetwork instead of neuralNetworkClassifier.
I made the new dummy input have the type "double". That means your custom layer gets a double value as input.
You need to specify a value for this dummy input when using the model.
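A small sketch (not from the answer) of how to double-check, from Python, that the edited model now expects the extra input; the file and input names follow the snippet above:

import coremltools

# Load the edited spec and list its inputs; "dummy_input" should now appear
# alongside the original input, with type doubleType (or multiArrayType if you
# went the multi-array route from the question's update).
spec = coremltools.utils.load_spec("NewModel.mlmodel")
for inp in spec.description.input:
    print(inp.name, inp.type.WhichOneof("Type"))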

How to increase Tensorflow Prediction accuracy

I am working with TensorFlow to create a model which can classify digits using the SVHN dataset provided by Google. My accuracy is really low (~25%), but I have seen a notebook which reaches 88% accuracy. Reference Notebook
I was wondering if anybody could give me some tips on how I should improve my accuracy in order to make my model better.
Here is my model code.
# imports needed by the code below (omitted in the original post)
import pickle
import numpy as np
import tensorflow as tf

filename='extra.pickle'
with open(filename,'rb') as f:
    other=pickle.load(f)
train_data=other['train_dataset']
test_data=other['test_dataset']
del other
train_dataset=train_data['X']
test_dataset=test_data['X']
train_labels=train_data['y']
test_labels=test_data['y']
print(len(test_dataset))
print(len(test_labels))
print(test_dataset.shape)
#print(train_dataset.length())
classes=10
batch_size=32
num_steps = 200000
graph=tf.Graph()
#Placeholder for the data
with graph.as_default():
data_placeholder = tf.placeholder(tf.float32, shape=(batch_size,32,32,3))
label_placeolder = tf.placeholder(tf.int64, shape=(batch_size, classes))
tf_test_dataset = tf.placeholder(tf.float32, shape=(batch_size,32,32,3))
tf_label_dataset = tf.placeholder(tf.float32, shape=(batch_size, classes))
layer1_weights=tf.Variable(tf.truncated_normal([3,3,3,16]))
layer1_biases=tf.Variable(tf.zeros([16]))
layer2_weights=tf.Variable(tf.truncated_normal([3,3,16,32]))
layer2_biases=tf.Variable(tf.zeros([32]))
layer3_weights=tf.Variable(tf.truncated_normal([2,2,32,64]))
layer3_biases=tf.Variable(tf.zeros([64]))
layer4_weights=tf.Variable(tf.truncated_normal([1024,10]))
layer4_biases=tf.Variable(tf.zeros([10]))
layer5_weights=tf.Variable(tf.truncated_normal([10,classes]))
layer5_biases=tf.Variable(tf.zeros([classes]))
def layer_multiplication(data_input_given,dropping=False):
    #Convolutional Layer 1
    CNN1=tf.nn.relu(tf.nn.conv2d(data_input_given,layer1_weights,strides=[1,1,1,1],padding='SAME')+layer1_biases)
    print('CNN1 Done!!')
    #Pooling Layer
    Pool1=tf.nn.max_pool(CNN1,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
    print('Pool1 DOne')
    #second Convolution layer
    CNN2=tf.nn.relu(tf.nn.conv2d(Pool1,layer2_weights,strides=[1,1,1,1],padding='SAME'))+layer2_biases
    print('CNN2 Done')
    #Second Pooling
    Pool2 = tf.nn.max_pool(CNN2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    print('pool2 Done')
    #Third Convolutional Layer
    print(Pool2.shape)
    CNN3 = tf.nn.relu(tf.nn.conv2d(Pool2, layer3_weights, strides=[1, 1, 1, 1], padding='SAME')) + layer3_biases
    print('CNN3 Done')
    #Third Pooling Layer
    Pool3 = tf.nn.max_pool(CNN3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    print('Pool3 DOne')
    #Fully Connected Layer
    #print(Pool3.shape)
    shape = Pool3.get_shape().as_list()
    # print(shape)
    reshape = tf.reshape(Pool3, [shape[0], shape[1] * shape[2] * shape[3]])
    #print(reshape.shape)
    FullyCon = tf.nn.relu(tf.matmul(reshape, layer4_weights) + layer4_biases)
    #print(FullyCon.shape)
    if dropping==False:
        print('Training')
        dropout = tf.nn.dropout(FullyCon, 0.6)
        z=tf.matmul(dropout,layer5_weights)+layer5_biases
        return z
    else:
        print('Testing')
        z = tf.matmul(FullyCon, layer5_weights) + layer5_biases
        return z
gloabl_step = tf.Variable(0, trainable=False)
decay_rate=tf.train.exponential_decay(1e-6,gloabl_step,4000,0.96,staircase=False,)
train_input=layer_multiplication(data_placeholder,False)
test_prediction = tf.nn.softmax(layer_multiplication(tf_test_dataset,True))
loss=(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label_placeolder,logits=train_input))
+ 0.01 * tf.nn.l2_loss(layer1_weights)
+ 0.01 * tf.nn.l2_loss(layer2_weights)
+ 0.01 * tf.nn.l2_loss(layer3_weights)
+ 0.01 * tf.nn.l2_loss(layer4_weights)
+ 0.01 * tf.nn.l2_loss(layer5_weights)
)
optimizer = tf.train.GradientDescentOptimizer(name='Stochastic', learning_rate=decay_rate).minimize(loss,global_step=gloabl_step)
def accuracy(predictions, labels):
    print(predictions.shape[0])
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])
config=tf.ConfigProto()
config.gpu_options.allocator_type ='BFC'
saver = tf.train.Saver()
test_accuracy=[]
with tf.Session(config=config) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    tf.train.write_graph(session.graph_def, '.', './SVHN.pbtxt')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        batch_test_data = test_dataset[offset:(offset + batch_size), :, :]
        batch_test_labels = test_labels[offset:(offset + batch_size),:]
        #print(batch_data)
        #print(batch_test.shape)
        feed_dict = {data_placeholder:batch_data, label_placeolder:batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_input], feed_dict=feed_dict)
        if (step % 500 == 0):
            #print(session.run(decay_rate))
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            if(batch_test_data.shape!=(32,32,32,3)):
                print('Skip')
            else:
                correct_prediction = tf.equal(tf.argmax(test_prediction, 1), tf.argmax(tf_label_dataset, 1))
                accuracy_for_test = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
                print("Test Accuracy")
                test_accuracy.append(accuracy_for_test.eval(feed_dict={tf_test_dataset:batch_test_data,
                                                                       tf_label_dataset:batch_test_labels}))
                print(accuracy_for_test.eval(feed_dict={tf_test_dataset:batch_test_data,
                                                        tf_label_dataset:batch_test_labels}))
    print(np.mean(test_accuracy))
    saver.save(sess=session, save_path='./SVHN.ckpt')
PS: The code does run on my system. The issue, I believe, is with my architecture.
Use a different weight initialization and your results should be way better. In your reference notebook they use stddev=0.1, but you can also take a look at Glorot or He initialization, which should work even better.
Also, your learning rate is really low, and the decay makes it even lower, so the network will not learn much this way. With a better initialization of the network you can increase the learning rate and learn something useful from your data.
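A minimal sketch (not taken from the question's code) of what those initialization suggestions look like in TF 1.x, using the question's first convolution shape:

import tensorflow as tf

# Option 1: truncated normal with a small standard deviation, as in the reference notebook.
layer1_weights = tf.Variable(tf.truncated_normal([3, 3, 3, 16], stddev=0.1))

# Option 2: He initialization (variance scaling with scale=2.0), via get_variable.
he_init = tf.variance_scaling_initializer(scale=2.0)
layer1_weights_he = tf.get_variable("layer1_weights_he", shape=[3, 3, 3, 16],
                                    initializer=he_init)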
First of all, there is no single parameter that will increase your accuracy. Training in TensorFlow depends on many parameters, such as the initial weights, the learning rate, and the number of training steps; it is largely a matter of trying different combinations. For example, the learning rate should be neither too low nor too high, otherwise you will miss the minimum. It is like our brain: sometimes we learn something very quickly when we have clear data about it, and sometimes we need more data to understand it properly.
So train with different configurations of these parameters and then check your accuracy.
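To make the learning-rate point concrete, here is a sketch with illustrative, untuned values of a more conventional schedule than the 1e-6 base rate used in the question (the loss here is a dummy stand-in for the question's cross-entropy + L2 loss):

import tensorflow as tf

# Dummy stand-in for the question's loss tensor.
w = tf.Variable(tf.truncated_normal([10, 10], stddev=0.1))
loss = tf.reduce_mean(tf.square(w))

global_step = tf.Variable(0, trainable=False)
# Start around 1e-2 and decay, rather than starting the schedule at 1e-6.
learning_rate = tf.train.exponential_decay(0.01, global_step,
                                           decay_steps=4000, decay_rate=0.96,
                                           staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)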

InvalidArgumentError logits and labels must be same size: logits_size=[3215,25] labels_size=[10,25]

I was having quite a few errors (OOM, shape problems, etc.) which I had managed to fix somehow.
But I'm unable to get my head around this error. I have searched quite a bit, and I have also tried TensorFlow's sparse cross-entropy with logits and the tf.squeeze function, but that didn't help me resolve this error either. Here is the link to the code (it's a GitHub gist with the entire stack trace and errors).
Code Link
Here is the link for the data set(It's around 500 Mb)
Dataset Link
Here is the Code (just in Case):
from PIL import Image
import numpy as np
import glob
from numpy import array
import pandas as pd
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
import h5py
import tensorflow as tf
def loading_saving_image_as_grayscale_train(img):
    ##combined_path='M:/PycharmProjects/AI+DL+CP/test_img'+img
    loading=Image.open(img)
    loading=loading.resize((28,28),Image.ANTIALIAS)
    loading=loading.convert('L')
    #loading.show()
    conversion_to_array=np.asarray(loading,dtype=float)
    train_data.append(conversion_to_array)
def loading_saving_image_as_grayscale_test(img):
    #combined_path = 'M:/PycharmProjects/AI+DL+CP/train_img/' + img
    #print(combined_path)
    loading=Image.open(img,'r')
    loading=loading.resize((28,28),Image.ANTIALIAS)
    loading=loading.convert('L')
    conversion_to_array=np.asarray(loading,dtype=float)
    test_data.append(conversion_to_array)
import os
import requests, zipfile, io
import pandas as pd
#url = requests.get('https://he-s3.s3.amazonaws.com/media/hackathon/deep-learning-challenge-1/identify-the-objects/a0409a00-8-dataset_dp.zip')
#data = zipfile.ZipFile(io.BytesIO(url.content))
#data.extractall()
#os.listdir()
dataframe1=pd.read_csv('test.csv')
dataframe1.index=dataframe1.index+1
only_index=dataframe['image_id']
test_data=[]
train_data=[]
train=glob.glob('train_img/*.png')
test=glob.glob('test_img/*.png')
#other=loading_saving_image_as_grayscale('M:/PycharmProjects/AI+DL+CP/test_img/test_1000b.png')
#print(Image.open('M:/PycharmProjects/AI+DL+CP/test_img/test_1000b.png'))
#print(test)
#loading_sample=Image.open('M:/PycharmProjects/AI+DL+CP/test_img/test_1000b.png')
#loading_sample.show()
#print(train)
#print(test)
for data in train:
    #print(data)
    loading_saving_image_as_grayscale_train(data)
for item in test:
    #print(item)
    loading_saving_image_as_grayscale_test(item)
#print(train_data)
#print(test_data)
'''with Image.fromarray(train_data[1]) as img:
width,height=img.size
print(width,height)
'''
def OneHot(label,n_classes):
    label=np.array(label).reshape(-1)
    label=np.eye(n_classes)[label]
    return label
dataframe=pd.read_csv('train.csv')
train_data=np.asarray(train_data)
test_data=np.asarray(test_data)
uni=dataframe['label']
dataframe1=pd.read_csv('test.csv')
dataframe1.index=dataframe1.index+1
only_index=dataframe['image_id']
label=LabelEncoder()
integer_encoding=label.fit_transform(uni)
#del uni
#del dataframe
#print(integer_encoding)
binary=OneHotEncoder(sparse=False)
integer_encoding=integer_encoding.reshape(len(integer_encoding),1)
onehot=binary.fit_transform(integer_encoding)
train_data=np.reshape(train_data,[-1,28,28,1])
test_data=np.reshape(test_data,[-1,28,28,1])
#onehot=np.reshape(onehot,[-1,10])
train_data=np.transpose(train_data,(0,2,1,3))
test_data=np.transpose(test_data,(0,2,1,3))
train_data=train_data.astype(np.float32)
test_data=test_data.astype(np.float32)
print(train_data.shape,test_data.shape,onehot.shape)
graph = tf.Graph()
with graph.as_default():
# placeholders for input data batch_size x 32 x 32 x 3 and labels batch_size x 10
data_placeholder = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
label_placeholder = tf.placeholder(tf.int32, shape=[None, 25])
# defining decaying learning rate
global_step = tf.Variable(0)
decay_rate = tf.train.exponential_decay(1e-4, global_step=global_step, decay_steps=10000, decay_rate=0.97)
layer1_weights = tf.Variable(tf.truncated_normal([3, 3, 1, 64],stddev=0.1))
layer1_biases = tf.Variable(tf.constant(0.1, shape=[64]))
layer2_weights = tf.Variable(tf.truncated_normal([3, 3, 64,32],stddev=0.1))
layer2_biases = tf.Variable(tf.constant(0.1,shape=[32]))
layer3_weights = tf.Variable(tf.truncated_normal([2, 2, 32, 20],stddev=0.1))
layer3_biases = tf.Variable(tf.constant(0.1,shape=[20]))
layer4_weights = tf.Variable(tf.truncated_normal([20,25],stddev=0.1))
layer4_biases = tf.Variable(tf.constant(0.1,shape=[25]))
layer5_weights = tf.Variable(tf.truncated_normal([25, 25], stddev=0.1))
layer5_biases = tf.Variable(tf.constant(0.1, shape=[25]))
def layer_multiplication(data_input_given):
    #Convolutional Layer 1
    #data_input_given=np.reshape(data_input_given,[-1,64,64,1])
    CNN1=tf.nn.relu(tf.nn.conv2d(data_input_given,layer1_weights,strides=[1,1,1,1],padding='SAME')+layer1_biases)
    print('CNN1 Done!!')
    #Pooling Layer
    Pool1=tf.nn.max_pool(CNN1,ksize=[1,4,4,1],strides=[1,4,4,1],padding='SAME')
    print('Pool1 DOne')
    #second Convolution layer
    CNN2=tf.nn.relu(tf.nn.conv2d(Pool1,layer2_weights,strides=[1,1,1,1],padding='SAME'))+layer2_biases
    print('CNN2 Done')
    #Second Pooling
    Pool2 = tf.nn.max_pool(CNN2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
    print('pool2 Done')
    #Third Convolutional Layer
    CNN3 = tf.nn.relu(tf.nn.conv2d(Pool2, layer3_weights, strides=[1, 1, 1, 1], padding='SAME')) + layer3_biases
    print('CNN3 Done')
    #Third Pooling Layer
    Pool3 = tf.nn.max_pool(CNN3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    print('Pool3 DOne')
    #Fully Connected Layer
    Pool4=tf.nn.max_pool(Pool3,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
    FullyCon=tf.reshape(Pool4,[-1,20])
    FullyCon=tf.nn.relu(tf.matmul(FullyCon,layer4_weights)+layer4_biases)
    print('Fullyconnected Done')
    dropout = tf.nn.dropout(FullyCon, 0.4)
    dropout=tf.reshape(dropout,[-1,25])
    dropout=tf.matmul(dropout,layer5_weights)+layer5_biases
    #print(dropout.shape)
    return dropout
train_input = layer_multiplication(train_data)
print(train_input.shape)
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label_placeholder,logits=train_input))
+ 0.01 * tf.nn.l2_loss(layer1_weights)
+ 0.01 * tf.nn.l2_loss(layer2_weights)
+ 0.01 * tf.nn.l2_loss(layer3_weights)
+ 0.01 * tf.nn.l2_loss(layer4_weights)
)
#other=(tf.squeeze(label_placeholder))
#print(tf.shape())
optimizer = tf.train.GradientDescentOptimizer(name='Stochastic', learning_rate=decay_rate).minimize(loss,global_step=global_step)
#print(train_input.shape)
batch_size = 10
num_steps=10000
prediction=[]
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for i in range(num_steps):
        print("in loop")
        offset = (i * batch_size) % (onehot.shape[0] - batch_size)
        batch_data = train_data[offset:(offset + batch_size), :, :]
        batch_labels = onehot[offset:(offset + batch_size), :]
        print("training")
        feed_dict = {data_placeholder: batch_data, label_placeholder: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_input], feed_dict=feed_dict)
    print(sess.run(tf.argmax(label_placeholder, 1), feed_dict={x:test_data}))
    prediction.append(sess.run(tf.argmax(label_placeholder,1),feed_dict={x:test_data}))
    print('Finished')
submit=pd.Dataframe({'image_id':only_index, 'label':prediction})
submit.to_csv('submit.csv',index=False)
I also had a doubt regarding predicting class labels. Can someone tell me whether the method I'm using for storing the predicted class labels will work or not?
The reshape operations do not make sense:
FullyCon=tf.reshape(Pool4,[-1,20])
this will collapse the batch dimension and the feature dimensions into one.
Why would the output of Pool4 have 20 features? The fact that it has 20 kernels does not mean its flattened size is 20. The dimensionality is 20 * the size of the image at this level of the convolutions, which will be much bigger (my guess is it will be 6430).
It should be something along the lines of
output_shape = Pool4.shape[1] * Pool4.shape[2] * Pool4.shape[3]
FullyCon=tf.reshape(Pool4, [-1, output_shape])
and then you will have to change final layer accordingly (to match shapes).
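A standalone sketch (hypothetical shapes, not the question's exact model) of the pattern described above: compute the flattened size from the pooled tensor and size the fully connected weights to match:

import tensorflow as tf

# Stand-in for Pool4: a 4-D activation with 20 channels and a 2x2 spatial map.
Pool4 = tf.placeholder(tf.float32, shape=[None, 2, 2, 20])

shape = Pool4.get_shape().as_list()
flat_size = shape[1] * shape[2] * shape[3]      # 2 * 2 * 20 = 80 here
FullyCon = tf.reshape(Pool4, [-1, flat_size])   # keeps the batch dimension intact

# The next layer's weights must match the flattened size:
layer4_weights = tf.Variable(tf.truncated_normal([flat_size, 25], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(0.1, shape=[25]))
hidden = tf.nn.relu(tf.matmul(FullyCon, layer4_weights) + layer4_biases)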
The error was fixed after reshaping everything properly; also, in the softmax-with-logits part, I had to pass the data_placeholder (rather than the raw array) to build the logits. After doing this, the issue was resolved.
