Using matrices as input to convolutional neural network - python-3.x

I am trying to use a convolutional neural network to identify patterns in binary matrices and classify them into one of two classes. At the moment I have a bunch of 15x15 matrices in CSV format.
In order to get a handle on how convolutional nets work, I have been following sentdex's tutorials on YouTube. In these he uses a conv net to classify the MNIST dataset. The code he uses to specify the input looks like this:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')
My question is: how do I set up a file like 'input_data' from which the conv net can read my matrices and labels? Can I include ALL of my training data in one file, or do I need to split it into train/test files?
I have set up an Excel file in the following format, but I'm not sure whether it will work in the same way MNIST does.
input data example file:

My favorite tutorials are from aymericdamien; below is a link to the convolutional tutorial as a Jupyter notebook (go back up a few directories on GitHub for all of the tutorials).
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/convolutional_network_raw.ipynb
You'll notice that their input is the same as what you have posted:
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
And the first thing they do in the conv_net() function is reshape it into an image:
x = tf.reshape(x, shape=[-1, 28, 28, 1])
The shape arguments are understood as follows:
-1: variable batch size
28: height of the image (mnist is 28x28 grayscale images)
28: width of the image
1: color channels; grayscale images have 1 color channel, while RGB images typically have 3.
Try reshaping the image using numpy and displaying it yourself to check that you got it right:
import numpy as np
import matplotlib.pyplot as plt  # scipy.misc.imshow has been removed from newer SciPy releases

img = np.reshape(flat_image, (28, 28, 1))
plt.imshow(img.squeeze(), cmap='gray')  # drop the channel axis for display
plt.show()
As far as the train and test process goes, TensorFlow doesn't care about your file structure. I would generally separate the files to make sure you don't accidentally pass your test set to your training process, though. You will ultimately need to call sess.run separately on your training and test datasets. I think the tutorial I linked to provides a very good example of this process, so if you have more specific questions I'll leave them for a future post.
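Since your own data lives in CSV files rather than the MNIST binaries, here is a minimal sketch of loading the 15x15 matrices into numpy arrays that you can feed to the placeholders. The file names, CSV layout, and the 80/20 split are assumptions for illustration, not part of the original question:
import glob
import numpy as np

# hypothetical folder containing one 15x15 binary matrix per CSV file
files = sorted(glob.glob('matrices/*.csv'))
data = np.array([np.loadtxt(f, delimiter=',').reshape(-1) for f in files],
                dtype=np.float32)                      # shape (num_samples, 225)

# 'labels.csv' is assumed to hold one 0/1 class label per matrix, in file order
labels = np.loadtxt('labels.csv', delimiter=',').astype(np.int32)

# simple split so the test set never reaches the training loop
split = int(0.8 * len(data))
train_x, test_x = data[:split], data[split:]
train_y, test_y = labels[:split], labels[split:]
With flattened rows of 225 values, the placeholder becomes x = tf.placeholder('float', [None, 225]), and inside the network you would reshape with tf.reshape(x, [-1, 15, 15, 1]), mirroring the 28x28 MNIST case above.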

Related

Output of CNN should be image

I am pretty new to deep learning, so I have one question:
Assume an input grayscale image of shape (128,128,1). The target (output) is also a (128,128,1)-sized image, e.g. for segmentation, depth prediction, etc. Usually, with valid padding, the size of the image shrinks after several convolution layers.
What are decent (maybe not the most involved) ways to keep the size, or to predict a same-sized image? Is it via same padding? Is it via transposed convolution or upsampling? Should I use an FCN at the end and reshape it to the image size? I am using PyTorch. I would be glad for any hints, because I didn't find much on the internet.
Best
TL;DR: You want to look at deconv networks (transposed convolutions), which help regenerate an image using convolution operations. You want to build an encoder-decoder convolutional architecture that compresses an image to a latent representation using convolutions and then decodes an image from this compressed representation. For image segmentation, a popular architecture is U-Net.
NOTE: I can't answer for PyTorch, so I will be sharing the TensorFlow equivalent. Feel free to ignore the code, but since you are looking for the concept, I can help you with what you need to solve this.
You are trying to generate an image as the output of the network.
A series of convolution operations helps to downsample an image. Since you need a 2D matrix (grayscale image) as output, you want to upsample as well. Such a network is called a deconv network.
The first series of layers convolves over the input, 'flattening' it into a vector of channels. The next set of layers uses 2D transposed convolution (deconv) operations to turn the channels back into a 2D matrix (grayscale image).
Here is some sample code that shows how you can take a (128,128,1) image, downsample it with convolutions, and bring it back to a (128,128,1) image using a deconv net. (In PyTorch, the corresponding transposed-convolution layer is torch.nn.ConvTranspose2d.)
from tensorflow.keras import layers, Model, utils

inp = layers.Input((128, 128, 1))          # (128, 128, 1) grayscale input
x = layers.Conv2D(2, (3, 3))(inp)          # convolution (downsampling) part
x = layers.Conv2D(4, (3, 3))(x)
x = layers.Conv2D(6, (3, 3))(x)            # -> (122, 122, 6)
x = layers.Conv2DTranspose(6, (3, 3))(x)   # deconvolution (upsampling) part
x = layers.Conv2DTranspose(4, (3, 3))(x)
out = layers.Conv2DTranspose(1, (3, 3))(x) # -> back to (128, 128, 1)
model = Model(inp, out)
utils.plot_model(model, show_shapes=True, show_layer_names=False)
Also, if you are looking for tried and tested architectures in this domain, check out U-Net (U-Net: Convolutional Networks for Biomedical Image Segmentation). This is an encoder-decoder (conv2d, conv2d-transpose) architecture that uses a concept called skip connections to avoid information loss and generate better image segmentation masks.
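Since the question is about PyTorch, here is a minimal PyTorch sketch of the same encoder-decoder idea; it is not part of the original answer and simply mirrors the Keras model above with valid (no padding) convolutions:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 2, 3),           # (1, 128, 128) -> (2, 126, 126)
    nn.Conv2d(2, 4, 3),           # -> (4, 124, 124)
    nn.Conv2d(4, 6, 3),           # -> (6, 122, 122)
    nn.ConvTranspose2d(6, 4, 3),  # -> (4, 124, 124)
    nn.ConvTranspose2d(4, 2, 3),  # -> (2, 126, 126)
    nn.ConvTranspose2d(2, 1, 3),  # -> (1, 128, 128)
)

x = torch.randn(1, 1, 128, 128)   # a batch with one grayscale image
print(model(x).shape)             # torch.Size([1, 1, 128, 128])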

How to use ImageDataGenerator with multi-label masks for multi-class image segmentation?

In order to do multiclass segmentation, the masks need to be one-hot encoded. For example, if I have 100 images of shape 224x224x3 with 5 different classes, I would have a set of masks with shape (100, 224, 224, 5), i.e. the last dimension (the channel) refers to the class of the pixel. Take a grayscale mask that contains 6 classes, where each pixel has a label 1-6; I can easily convert this to the categorical mask I need using tf.keras.utils.to_categorical.
If I use the ImageDataGenerator provided with Keras, I know I can create a generator for both images and masks and then zip them together for the problem (as the code below shows), but where I'm confused is: how do I convert the masks into this categorical one-hot-encoded structure whilst using the ImageDataGenerator? The ImageDataGenerator only finds files in directories that are saved as images, therefore I can't convert the masks and then save them as numpy arrays (the one-hot-encoded masks) for the generator to pick up, as images can't have more than 4 channels, right? Is there some way of telling the generator to do this conversion? Or does this therefore limit the number of classes I can have in my problem?
One solution is to write my own custom generator with the Sequence class, which I have done, but I'm keen to understand whether this is possible with Keras' built-in ImageDataGenerator. Could writing a Lambda layer on the network be the solution?
mask_categorical = tf.keras.utils.to_categorical(mask)  # converts a 224x224 grayscale mask to its one-hot encoded version

imgDataGen = ImageDataGenerator(rescale=1/255.)
maskDataGen = ImageDataGenerator()
imageGenerator = imgDataGen.flow_from_directory("dataset/image/",
                                                class_mode=None, seed=40)
maskGenerator = maskDataGen.flow_from_directory("dataset/mask/",
                                                class_mode=None, seed=40)
trainGenerator = zip(imageGenerator, maskGenerator)
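For reference, here is a minimal sketch of the custom-generator approach mentioned in the question: wrap the zipped generators and apply to_categorical to each mask batch on the fly. It assumes the masks are stored as single-channel label images with values 0..num_classes-1, and the num_classes value below is hypothetical:
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical number of classes

def train_generator(image_gen, mask_gen, num_classes=NUM_CLASSES):
    # yields (image_batch, one_hot_mask_batch) pairs indefinitely
    for img_batch, mask_batch in zip(image_gen, mask_gen):
        # flow_from_directory yields masks as (batch, H, W, channels);
        # keep one channel of integer labels before one-hot encoding
        labels = mask_batch[..., 0].astype('uint8')
        yield img_batch, tf.keras.utils.to_categorical(labels, num_classes)

# usage: model.fit(train_generator(imageGenerator, maskGenerator), steps_per_epoch=...)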

Multiple predictions of multi-class image classification with Keras

I trained a CNN in Keras on images in a folder (two types of bees). I have a second folder with unlabeled bee images for prediction.
I'm able to predict a single image (as per the code below).
import numpy as np
from keras.preprocessing import image

test_image = image.load_img('data/test/20300.jpg')
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)
prob = classifier.predict_proba(test_image)
Result:
prob
Out[214]: array([[1., 0.]], dtype=float32)
I would like to be able to predict all of the images (around 300).
Is there a way to load and predict all the images in a batch? And will predict() be able to handle it, as it expects an array to predict?
Model.predict_proba() (which is really a synonym of predict()) accepts batch input. From the documentation:
Generates class probability predictions for the input samples.
The input samples are processed batch by batch.
You just need to load several images and glue them together into a single numpy array. By expanding the 0 dimension, your code already uses a batch of 1 in test_image. To complete the picture, there's also a Model.predict_on_batch() method.
To load a batch of test images you can use image.list_pictures or ImageDataGenerator.flow_from_directory() (which is compatible with Model.predict_generator() method, see the examples in the documentation).
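For example, here is a minimal sketch of loading a whole folder of test images into one batch; the folder path and target_size are assumptions that should match whatever your model was trained on:
import os
import numpy as np
from keras.preprocessing import image

test_dir = 'data/test/'  # hypothetical folder with the ~300 unlabeled images
batch = []
for fname in sorted(os.listdir(test_dir)):
    img = image.load_img(os.path.join(test_dir, fname), target_size=(224, 224))
    batch.append(image.img_to_array(img))

batch = np.stack(batch)              # shape: (num_images, 224, 224, 3)
probs = classifier.predict(batch)    # one row of class probabilities per image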

How to label test data using trained model with keras?

I am working on the following keras convolutional neural network tutorial https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d
After training the model I want to test it on sample images, and also label the images. I realize that I have to use the predict method, which generates an array that shows which label gets what score for a particular image. But I am having trouble using this method. If the images are in the folder test_images and there are 20 of them, how do I test these images and get the predictions?
This is how far I've gotten with one image (even though I want it for multiple images):
image = cv2.imread('test1.jpg')
image = cv2.resize(image,(224,224))
features = np.swapaxes(np.swapaxes(image, 1, 2), 0, 1)
predictions = model.predict(features)
This throws the following error:
ValueError: Error when checking : expected conv2d_1_input to have 4 dimensions, but got array with shape (3, 224, 224)
Thank you very much!
Some of the questions I consulted before:
Simple Neural Network in Python not displaying label for the test image
https://github.com/fchollet/keras/issues/315
model.predict works by processing an array of samples, not just one image, so you are missing the batch/samples dimension, which in your case would be just one image. You just have to reshape the array:
features = features.reshape((1, 3, 224, 224))
And then pass it to predict.
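Equivalently, a minimal sketch (not from the original answer) that adds the batch dimension with np.expand_dims, which works regardless of the image size:
import numpy as np

features = np.expand_dims(features, axis=0)  # (3, 224, 224) -> (1, 3, 224, 224)
predictions = model.predict(features)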

Adversarial images in TensorFlow

I am reading an article that explains how to trick neural networks into predicting anything you want. I am using the MNIST dataset.
The article provides a relatively detailed walkthrough, but the person who wrote it is using Caffe.
Anyways, my first step was to create a logistic regression function using TensorFlow that is trained on the MNIST dataset. So, if I restore the logistic regression model, I can use it to predict any image. For example, I feed the number 7 to the following model...
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    # number 7
    x_in = np.expand_dims(mnist.test.images[0], axis=0)
    classification = sess.run(tf.argmax(pred, 1), feed_dict={x: x_in})
    print(classification)
>>> [7]
This prints out the number [7] which is correct.
Now the article explains that in order to break a neural network, we need to calculate its gradient, i.e. the derivative of the network with respect to the input image.
The article states that to calculate the gradient, we first need to pick an intended outcome to move towards, then set the output probability list to 0 everywhere and to 1 for the intended outcome. Backpropagation is an algorithm for calculating the gradient.
Then there's code provided in Caffe as to how to calculate the gradient...
def compute_gradient(image, intended_outcome):
    # Put the image into the network and make the prediction
    predict(image)
    # Get an empty set of probabilities
    probs = np.zeros_like(net.blobs['prob'].data)
    # Set the probability for our intended outcome to 1
    probs[0][intended_outcome] = 1
    # Do backpropagation to calculate the gradient for that outcome
    # and the image we put in
    gradient = net.backward(prob=probs)
    return gradient['data'].copy()
Now, my issue is that I'm having a hard time understanding how this function is able to get the gradient just by feeding the image and the probabilities to the function. Because I do not fully understand this code, I am having a hard time translating this logic to TensorFlow.
I think I am confused as to how the Caffe framework works because I've never seen/used it before. If someone could explain how this logic works step-by-step that would be great.
I already know the basics of Backpropagation so you may assume I already know how it works.
Here is a link to the article itself: https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture
I'm going to show you how to do the basics of generating an adversarial image in TF; to apply that to an already-trained model you might need some adaptations.
The code blocks work well as cells in a Jupyter notebook if you want to try this out interactively. If you don't use a notebook, you'll need to add plt.show() calls for the plots to appear and remove the %matplotlib inline statement. The code is basically the simple MNIST tutorial from the TF documentation; I'll point out the important differences.
The first block is just setup, nothing special...
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# if you're not using jupyter notebooks then comment this out
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
Get the MNIST data (the download is down from time to time, so you might need to fetch it from web.archive.org manually and put it into that directory). We're not using one-hot encoding like in the tutorial, because by now TF has nicer functions to calculate the loss that don't need one-hot encoding anymore.
mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data')
In the next block we are doing something "special". The input image tensor is defined as a variable because later we want to optimize with regard to the input image. Usually you would have a placeholder here. This does limit us a bit, because we need a definite shape, so we only feed in one example at a time. Not something you want to do in production, but for teaching purposes it's fine (and you can get around it with a little more code). Labels are placeholders as normal.
input_images = tf.get_variable("input_image", shape=[1,784], dtype=tf.float32)
input_labels = tf.placeholder(shape=[1], name='input_label', dtype=tf.int32)
Our model is a standard logistic regression model like in the tutorial. We only use the softmax for visualizing the results; the loss function takes plain logits.
W = tf.get_variable("weights", shape=[784, 10], dtype=tf.float32, initializer=tf.random_normal_initializer())
b = tf.get_variable("biases", shape=[1, 10], dtype=tf.float32, initializer=tf.zeros_initializer())
logits = tf.matmul(input_images, W) + b
softmax = tf.nn.softmax(logits)
The loss is standard cross-entropy. What's worth noting in the training step is that there is an explicit list of variables passed in: we have defined the input image as a trainable variable, but we don't want to optimize the image while training the logistic regression, just the weights and biases, so we state that explicitly.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=input_labels,name='xentropy')
mean_loss = tf.reduce_mean(loss)
train_step = tf.train.AdamOptimizer(learning_rate=0.1).minimize(mean_loss, var_list=[W,b])
Start the session ...
sess = tf.Session()
sess.run(tf.global_variables_initializer())
Training is slower than it should be because of the batch size of 1. Like I said, not something you want to do in production, but this is just for teaching the basics...
for step in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(1)
    loss_v, _ = sess.run([mean_loss, train_step], feed_dict={input_images: batch_xs, input_labels: batch_ys})
At this point we should have a model that is good enough to demonstrate how to generate an adversarial image. First, we get an image that has label '2', because these are easy, so even our suboptimal classifier should get them right (if it doesn't, run this cell again ;) this step is random, so I can't guarantee that it'll work).
We're setting our input image variable to that example.
sample_label = -1
while sample_label != 2:
    sample_image, sample_label = mnist.test.next_batch(1)
sample_label
plt.imshow(sample_image.reshape(28, 28),cmap='gray')
# assign image to var
sess.run(tf.assign(input_images, sample_image));
sess.run(softmax) # now using the variable as input, no feed dict
# should show something like
# array([[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
# With the third entry being the highest by far.
Now we are going to "break" the classification. We want to change the image so that it looks more like another number in the eyes of the network, without changing the network itself. To do that, the code looks basically identical to what we had before: we define a "fake" label, the same loss as before (cross-entropy), and we get an optimizer to minimize the fake loss, but this time with a var_list consisting of only the input image, so we won't change the logistic regression weights:
fake_label = tf.placeholder(tf.int32, shape=[1])
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(fake_loss, var_list=[input_images])
The next block is intended to be run interactively multiple times, while you see the image and the scores changing (here moving towards a label of 8):
sess.run(adversarial_step, feed_dict={fake_label:np.array([8])})
plt.imshow(sess.run(input_images).reshape(28,28),cmap='gray')
sess.run(softmax)
The first time you run this block, the scores will probably still heavily point towards 2, but that will change over time, and after a couple of runs you should see something like the following: the image still looks like a 2 with some noise in the background, but the score for "2" is at around 3% while the score for "8" is at over 96%.
Note that we never actually computed the gradient explicitly; we don't need to, because the TF optimizer takes care of computing the gradients and applying the updates to the variables. If you want the gradient itself, you can get it with tf.gradients(fake_loss, input_images).
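For example, a minimal sketch of inspecting that gradient explicitly, building on the graph defined above:
# gradient of the fake loss with respect to the input image variable
image_grad = tf.gradients(fake_loss, [input_images])[0]
grad_value = sess.run(image_grad, feed_dict={fake_label: np.array([8])})
print(grad_value.shape)  # (1, 784): one entry per input pixel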
The same pattern works for more complicated models, but what you'll want to do is train your model as normal (using placeholders with bigger batches, or using a pipeline with TF readers), and when you want to create the adversarial image, recreate the network with the input image variable as the input. As long as all the variable names remain the same (which they should if you use the same functions to build the network), you can restore from your network checkpoint and then apply the steps from this post to get an adversarial image. You might need to play around with learning rates and such.
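A minimal sketch of that restore step, assuming the checkpoint path from the question and that the rebuilt graph uses the same variable names; only the trained weights are restored, leaving the input-image variable untouched:
saver = tf.train.Saver(var_list=[W, b])   # restore only the model weights, not input_images
saver.restore(sess, '/tmp/model.ckpt')    # hypothetical checkpoint path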
