Keras layer for slicing image data into sliding windows - keras

I have a set of images, all of varying widths, but with fixed height set to 100 pixels and 3 channels of depth. My task is to classify if each vertical line in the image is interesting or not. To do that, I look at the line in context of its 10 predecessor and successor lines. Imagine the algorithm sweeping from left to right of the image, detecting vertical lines containing points of interest.
My first attempt at doing this was to manually cut out these sliding windows using numpy before feeding the data into the Keras model. Like this:
# Pad left and right
s = np.repeat(D[:1], 10, axis = 0)
e = np.repeat(D[-1:], 10, axis = 0)
# D now has shape (w + 20, 100, 3)
D = np.concatenate((s, D, e))
# Sliding windows creation trick from SO question
idx = np.arange(21)[None,:] + np.arange(len(D) - 20)[:,None]
windows = D[indexer]
Then all windows and all ground truth 0/1 values for all vertical lines in all images would be concatenated into two very long arrays.
I have verified that this works, in principle. I fed each window to a Keras layer looking like this:
Conv2D(20, (5, 5), input_shape = (21, 100, 3), padding = 'valid', ...)
But the windowing causes the memory usage to increase 21 times so doing it this way becomes impractical. But I think my scenario is a very common in machine learning so there must be some standard method in Keras to do this efficiently? E.g I would like to feed Keras my raw image data (w, 100, 80) and tell it what the sliding window sizes are and let it figure out the rest. I have looked at some sample code but I'm a ml noob so I don't get it.

Unfortunately this isn't an easy problem because it can involve using a variable sized input for your Keras model. While I think it is possible to do this with proper use of placeholders that's certainly no place for a beginner to start. your other option is a data generator. As with many computationally intensive tasks there is often a trade off between compute speed and memory requirements, using a generator is more compute heavy and it will be done entirely on your cpu (no gpu acceleration), but it won't make the memory increase.
The point of a data generator is that it will apply the operation to images one at a time to produce the batch, then train on that batch, then free up the memory - so you only end up keeping one batch worth of data in memory at any time. Unfortunately if you have a time consuming generation then this can seriously affect performance.
The generator will be a python generator (using the 'yield' keyword) and is expected to produce a single batch of data, keras is very good at using arbitrary batch sizes, so you can always make one image yield one batch, especially to start.
Here is the keras page on fit_generator - I warn you, this starts to become a lot of work very quickly, consider buying more memory:
https://keras.io/models/model/#fit_generator
Fine I'll do it for you :P
import numpy as np
import pandas as pd
import keras
from keras.models import Model, model_from_json
from keras.layers import Dense, Concatenate, Multiply,Add, Subtract, Input, Dropout, Lambda, Conv1D, Flatten
from tensorflow.python.client import device_lib
# check for my gpu
print(device_lib.list_local_devices())
# make some fake image data
# 1000 random widths
data_widths = np.floor(np.random.random(1000)*100)
# producing 1000 random images with dimensions w x 100 x 3
# and a vector of which vertical lines are interesting
# I assume your data looks like this
images = []
interesting = []
for w in data_widths:
images.append(np.random.random([int(w),100,3]))
interesting.append(np.random.random(int(w))>0.5)
# this is a generator
def image_generator(images, interesting):
num = 0
while num < len(images):
windows = None
truth = None
D = images[num]
# this should look familiar
# Pad left and right
s = np.repeat(D[:1], 10, axis = 0)
e = np.repeat(D[-1:], 10, axis = 0)
# D now has shape (w + 20, 100, 3)
D = np.concatenate((s, D, e))
# Sliding windows creation trick from SO question
idx = np.arange(21)[None,:] + np.arange(len(D) - 20)[:,None]
windows = D[idx]
truth = np.expand_dims(1*interesting[num],axis=1)
yield (windows, truth)
num+=1
# the generator MUST loop
if num == len(images):
num = 0
# basic model - replace with your own
input_layer = Input(shape = (21,100,3), name = "input_node")
fc = Flatten()(input_layer)
fc = Dense(100, activation='relu',name = "fc1")(fc)
fc = Dense(50, activation='relu',name = "fc2")(fc)
fc = Dense(10, activation='relu',name = "fc3")(fc)
output_layer = Dense(1, activation='sigmoid',name = "output")(fc)
model = Model(input_layer,output_layer)
model.compile(optimizer="adam", loss='binary_crossentropy')
model.summary()
#and training
training_history = model.fit_generator(image_generator(images, interesting),
epochs =5,
initial_epoch = 0,
steps_per_epoch=len(images),
verbose=1
)

Related

Receiving coordinates from inference Pytorch

I'm trying to get the coordinates of the pixels inside of a mask that is generated by Pytorches DefaultPredictor, to later on get the polygon corners and use this in my application.
However, DefaultPredictor produced a tensor of pred_masks, in the following format: [False, False ... False], ... [False, False, .. False]
Where the length of each individual list is length of the image, and the number of total lists is the height of the image.
Now, as I need to get the pixel coordinates that are inside of the mask, the simple solution seemed to be looping through the pred_masks, checking the value and if == "True" creating tuples of these and adding them to a list. However, as we are talking about images with width x height of about 3200 x 1600, this is a relatively slow process (~4 seconds to loop through a single 3200x1600, yet as there are quite some objects for which I need to get the inference in the end - this will end up being incredibly slow).
What would be the smarter way to get the the coordinates (mask) of the detected object using the pytorch (detectron2) model?
Please find my code below for reference:
from __future__ import print_function
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances
import cv2
import time
# get image
start = time.time()
im = cv2.imread("inputImage.jpg")
# Create config
cfg = get_cfg()
cfg.merge_from_file("detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # Set threshold for this model
cfg.MODEL.WEIGHTS = "model_final.pth" # Set path model .pth
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.DEVICE='cpu'
register_coco_instances("dataset_test",{},"testval.json","Images_path")
test_metadata = MetadataCatalog.get("dataset_test")
# Create predictor
predictor = DefaultPredictor(cfg)
# Make prediction
outputs = predictor(im)
#Loop through the pred_masks and check which ones are equal to TRUE, if equal, add the pixel values to the true_cords_list
outputnump = outputs["instances"].pred_masks.numpy()
true_cords_list = []
x_length = range(len(outputnump[0][0]))
#y kordinaat on range number
for y_cord in range(len(outputnump[0])):
#x cord
for x_cord in x_length:
if str(outputnump[0][y_cord][x_cord]) == "True":
inputcoords = (x_cord,y_cord)
true_cords_list.append(inputcoords)
print(str(true_cords_list))
end = time.time()
print(f"Runtime of the program is {end - start}") # 14.29468035697937
//
EDIT:
After changing the for loop partially to compress - I've managed to reduce the runtime of the for loop by ~3x - however, ideally I would like to receive this from the predictor itself if possible.
y_length = len(outputnump[0])
x_length = len(outputnump[0][0])
true_cords_list = []
for y_cord in range(y_length):
x_cords = list(compress(range(x_length), outputnump[0][y_cord]))
if x_cords:
for x_cord in x_cords:
inputcoords = (x_cord,y_cord)
true_cords_list.append(inputcoords)
The problem is easily solvable with sufficient knowledge about NumPy or PyTorch native array handling, which allows 100x speedups compared to Python loops. You can study the NumPy library, and PyTorch tensors are similar to NumPy in behaviour.
How to get indices of values in NumPy:
import numpy as np
arr = np.random.rand(3,4) > 0.5
ind = np.argwhere(arr)[:, ::-1]
print(arr)
print(ind)
In your particular case this will be
ind = np.argwhere(outputnump[0])[:, ::-1]
How to get indices of values in PyTorch:
import torch
arr = torch.rand(3, 4) > 0.5
ind = arr.nonzero()
ind = torch.flip(ind, [1])
print(arr)
print(ind)
[::-1] and .flip are used to inverse the order of coordinates from (y, x) to (x, y).
NumPy and PyTorch even allow checking simple conditions and getting the indices of values that meet these conditions, for further understanding see the according NumPy docs article
When asking, you should provide links for your problem context. This question is actually about Facebook object detector, where they provide a nice demo Colab notebook.

How to implement Pytorch 1D crosscorrelation for long signals in fourier domain?

I have a series of signals length n = 36,000 which I need to perform crosscorrelation on. Currently, my cpu implementation in numpy is a little slow. I've heard Pytorch can greatly speed up tensor operations, and provides a way to perform computations in parallel on the GPU. I'd like to explore this option, but I'm not quite sure how to accomplish this using the framework.
Because of the length of these signals, I'd prefer to perform the crosscorrelation operation in the frequency domain.
Normally using numpy I'd perform the operation like so:
import numpy as np
signal_length=36000
# make the signals
signal_1 = np.random.uniform(-1,1, signal_length)
signal_2 = np.random.uniform(-1,1, signal_length)
# output target length of crosscorrelation
x_cor_sig_length = signal_length*2 - 1
# get optimized array length for fft computation
fast_length = np.fftpack.next_fast_len(x_cor_sig_length)
# move data into the frequency domain. axis=-1 to perform
# along last dimension
fft_1 = np.fft.rfft(src_data, fast_length, axis=-1)
fft_2 = np.fft.rfft(src_data, fast_length, axis=-1)
# take the complex conjugate of one of the spectrums. Which one you choose depends on domain specific conventions
fft_1 = np.conj(fft_1)
fft_multiplied = fft_1 * fft_2
# back to time domain.
prelim_correlation = np.fft.irfft(result, x_corr_sig_length, axis=-1)
# shift the signal to make it look like a proper crosscorrelation,
# and transform the output to be purely real
final_result = np.real(np.fft.fftshift(prelim_correlation),axes=-1)).astype(np.float64)
Looking at the Pytorch documentation, there doesn't seem to be an equivalent for numpy.conj(). I'm also not sure if/how I need to implement a fftshift after the irfft operation.
So how would you go about writing a 1D crosscorrelation in Pytorch using the fourier method?
A few things to be considered.
Python interpreter is very slow, what those vectorization libraries do is to move the workload to a native implementation. In order to make any difference you need to be able to give perform many operations in one python instruction. Evaluating things on GPU follows the same principle, while GPU has more compute power it is slower to copy data to/from GPU.
I adapted your example to process multiple signals simulataneously.
import numpy as np
def numpy_xcorr(BATCH=1, signal_length=36000):
# make the signals
signal_1 = np.random.uniform(-1,1, (BATCH, signal_length))
signal_2 = np.random.uniform(-1,1, (BATCH, signal_length))
# output target length of crosscorrelation
x_cor_sig_length = signal_length*2 - 1
# get optimized array length for fft computation
fast_length = next_fast_len(x_cor_sig_length)
# move data into the frequency domain. axis=-1 to perform
# along last dimension
fft_1 = np.fft.rfft(signal_1, fast_length, axis=-1)
fft_2 = np.fft.rfft(signal_2 + 0.1 * signal_1, fast_length, axis=-1)
# take the complex conjugate of one of the spectrums.
fft_1 = np.conj(fft_1)
fft_multiplied = fft_1 * fft_2
# back to time domain.
prelim_correlation = np.fft.irfft(fft_multiplied, fast_length, axis=-1)
# shift the signal to make it look like a proper crosscorrelation,
# and transform the output to be purely real
final_result = np.fft.fftshift(np.real(prelim_correlation), axes=-1)
return final_result, np.sum(final_result)
Since torch 1.7 we have the torch.fft module that provides an interface similar to numpy.fft, the fftshift is missing but the same result can be obtained with torch.roll. Another point is that numpy uses by default 64-bit precision and torch will use 32-bit precision.
The fast length consists in choosing smooth numbers (those having that are factorized in to small prime numbers, and I suppose you are familiar with this subject).
def next_fast_len(n, factors=[2, 3, 5, 7]):
'''
Returns the minimum integer not smaller than n that can
be written as a product (possibly with repettitions) of
the given factors.
'''
best = float('inf')
stack = [1]
while len(stack):
a = stack.pop()
if a >= n:
if a < best:
best = a;
if best == n:
break; # no reason to keep searching
else:
for p in factors:
b = a * p;
if b < best:
stack.append(b)
return best;
Then the torch implementation goes
import torch;
import torch.fft
def torch_xcorr(BATCH=1, signal_length=36000, device='cpu', factors=[2,3,5], dtype=torch.float):
signal_length=36000
# torch.rand is random in the range (0, 1)
signal_1 = 1 - 2*torch.rand((BATCH, signal_length), device=device, dtype=dtype)
signal_2 = 1 - 2*torch.rand((BATCH, signal_length), device=device, dtype=dtype)
# just make the cross correlation more interesting
signal_2 += 0.1 * signal_1;
# output target length of crosscorrelation
x_cor_sig_length = signal_length*2 - 1
# get optimized array length for fft computation
fast_length = next_fast_len(x_cor_sig_length, [2, 3])
# the last signal_ndim axes (1,2 or 3) will be transformed
fft_1 = torch.fft.rfft(signal_1, fast_length, dim=-1)
fft_2 = torch.fft.rfft(signal_2, fast_length, dim=-1)
# take the complex conjugate of one of the spectrums. Which one you choose depends on domain specific conventions
fft_multiplied = torch.conj(fft_1) * fft_2
# back to time domain.
prelim_correlation = torch.fft.irfft(fft_multiplied, dim=-1)
# shift the signal to make it look like a proper crosscorrelation,
# and transform the output to be purely real
final_result = torch.roll(prelim_correlation, (fast_length//2,))
return final_result, torch.sum(final_result);
And here a code to test the results
import time
funcs = {'numpy-f64': lambda b: numpy_xcorr(b, factors=[2,3,5], dtype=np.float64),
'numpy-f32': lambda b: numpy_xcorr(b, factors=[2,3,5], dtype=np.float32),
'torch-cpu-f64': lambda b: torch_xcorr(b, device='cpu', factors=[2,3,5], dtype=torch.float64),
'torch-cpu': lambda b: torch_xcorr(b, device='cpu', factors=[2,3,5], dtype=torch.float32),
'torch-gpu-f64': lambda b: torch_xcorr(b, device='cuda', factors=[2,3,5], dtype=torch.float64),
'torch-gpu': lambda b: torch_xcorr(b, device='cuda', factors=[2,3,5], dtype=torch.float32),
}
times ={}
for batch in [1, 10, 100]:
times[batch] = {}
for l, f in funcs.items():
t0 = time.time()
t1, t2 = f(batch)
tf = time.time()
del t1
del t2
times[batch][l] = 1000 * (tf - t0) / batch;
I obtained the following results
And what surprised myself is the result when the numbers are not so smooth e.g. using 17-smooth length the torch implementation is so much better that I used logarithmic scale here (with batch size 100 the torch gpu was 10000 times faster than numpy with batch size 1).
Remember that these functions are generating the data at the GPU in general we want to copy the final results to the CPU, if we consider the time spent copying the final result to CPU I observed times up to 10x higher than the cross correlation computation (random data generation + three FFTs).

Creating a session in a graph that uses another graph and its session

Versions : I am using tensorflow (version : v1.1.0-13-g8ddd727 1.1.0) in python3 (Python 3.4.3 (default, Nov 17 2016, 01:08:31) [GCC 4.8.4] on linux), it is installed from source and GPU-based (name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076).
Context : Generative adversarial networks (GANs) learn to synthesise new samples from a high-dimensional distribution by passing samples drawn from a latent space through a generative network. When the high-dimensional distribution describes images of a particular data set, the network should learn to generate visually similar image samples for latent variables that are close to each other in the latent space. For tasks such as image retrieval and image classification, it may be useful to exploit the arrangement of the latent space by projecting images into it, and using this as a representation for discriminative tasks.
Context Problem : I am trying to invert a generator (compute L2 norm between an input image in cifar10 and a image g(z) of the generator, where z is a parameter to be trained with stochastic gradient descent in order to minimize this norm and find an approximation of the preimage of the input image).
Technical Issue : Therefore, I am building a new graph in a new session in tensorflow but I need to use a trained gan that was trained in another session, which I cannot import because the two graphs are not the same. That is to say, when I use sess.run(), the variables are not found and therefore there is a Error Message.
The code is
import tensorflow as tf
from data import cifar10, utilities
from . import dcgan
import logging
logger = logging.getLogger("gan.test")
BATCH_SIZE = 1
random_z = tf.get_variable(name='z_to_invert', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
#random_z = tf.random_normal([BATCH_SIZE, 100], mean=0.0, stddev=1.0, name='random_z')
# Generate images with generator
generator = dcgan.generator(random_z, is_training=True, name='generator')
# Add summaries to visualise output images
generator_visualisation = tf.cast(((generator / 2.0) + 0.5) * 255.0, tf.uint8)
summary_generator = tf.summary.\
image('summary/generator', generator_visualisation,
max_outputs=8)
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
summary_inp = tf.summary.image('input_image', inp)
img_summary = tf.summary.merge([summary_generator, summary_inp])
with tf.name_scope('error'):
error = inp - generator #generator = g(z)
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sv = tf.train.Supervisor(logdir="gan/invert_logs/", save_summaries_secs=None, save_model_secs=None)
batch = 0
with sv.managed_session() as sess:
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
while not sv.should_stop():
if batch > 0 and batch % 100 == 0:
logger.debug('Step {} '.format(batch))
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, batch)
print('step %d: Patiente un peu poto!' % batch)
img = sess.run(img_summary)
logwriter.add_summary(img, batch)
batch += 1
print(batch)
I understood what is the problem, it is actually that I am trying to run a session which is saved in gan/train_logs but the graph does not have those variables I am trying to run.
Therefore, I tried to implement this instead :
graph = tf.Graph()
tf.reset_default_graph()
with tf.Session(graph=graph) as sess:
ckpt = tf.train.get_checkpoint_state('gan/train_logs/')
saver = tf.train.import_meta_graph(ckpt.model_checkpoint_path + '.meta', clear_devices=True)
saver.restore(sess, ckpt.model_checkpoint_path)
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
#inp, _ = next(test_image)
BATCH_SIZE = 1
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
#M_placeholder = tf.placeholder(tf.float32, shape=cifar10.get_shape_input(), name='M_input')
M_placeholder = inp
zmar = tf.summary.image('input_image', inp)
#Create sample noise from random normal distribution
z = tf.get_variable(name='z', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
# Function g(z) zhere z is randomly generated
g_z = dcgan.generator(z, is_training=True, name='generator')
generator_visualisation = tf.cast(((g_z / 2.0) + 0.5) * 255.0, tf.uint8)
sum_generator = tf.summary.image('summary/generator', generator_visualisation)
img_summary = tf.summary.merge([sum_generator, zmar])
with tf.name_scope('error'):
error = M_placeholder - g_z
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sess.run(tf.global_variables_initializer())
for i in range(10000):
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, i)
print('step %d: Patiente un peu poto!' % i)
img = sess.run(img_summary)
logwriter.add_summary(img, i)
print('Done Training')
This script runs, but I have checked on tensorboard, the generator that is used here does not have the trained weights and it only produces noise.
I think I am trying to run a session in a graph that uses another graph and its trained session. I have read thoroughly the Graphs and Session documentation on tensorflow website https://www.tensorflow.org/versions/r1.3/programmers_guide/graphs, I have found an interesting tf.import_graph_def function :
You can rebind tensors in the imported graph to tf.Tensor objects in the default graph by passing the optional input_map argument. For example, input_map enables you to take import a graph fragment defined in a tf.GraphDef, and statically connect tensors in the graph you are building to tf.placeholder tensors in that fragment.
You can return tf.Tensor or tf.Operation objects from the imported graph by passing their names in the return_elements list.
But I don't know how to use this function, no example is given, and also I only found those two links that may help me :
https://github.com/tensorflow/tensorflow/issues/7508
Tensorflow: How to use a trained model in a application?
It would be really nice to have your help on this topic. This should be straightforward for someone who has already used the tf.import_graph_def function... What I really need is to get the trained generator to apply it to a new variable z which is to be trained in another session.
Thanks

Get gradient value necessary to break an image

I've been experimenting with adversarial images and I read up on the fast gradient sign method from the following link https://arxiv.org/pdf/1412.6572.pdf...
The instructions explain that the necessary gradient can be calculated using backpropagation...
I've been successful at generating adversarial images but I have failed at attempting to extract the gradient necessary to create an adversarial image. I will demonstrate what I mean.
Let us assume that I have already trained my algorithm using logistic regression. I restore the model and I extract the number I wish to change into a adversarial image. In this case it is the number 2...
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits)
...
...
# assign the images of number 2 to the variable
sess.run(tf.assign(x, labels_of_2))
# setup softmax
sess.run(pred)
# placeholder for target label
fake_label = tf.placeholder(tf.int32, shape=[1])
# setup the fake loss
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)
# minimize fake loss using gradient descent,
# calculating the derivatives of the weight of the fake image will give the direction of weights necessary to change the prediction
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate).minimize(fake_loss, var_list=[x])
# continue calculating the derivative until the prediction changes for all 10 images
for i in range(FLAGS.training_epochs):
# fake label tells the training algorithm to use the weights calculated for number 6
sess.run(adversarial_step, feed_dict={fake_label:np.array([6])})
sess.run(pred)
This is my approach, and it works perfectly. It takes my image of number 2 and changes it only slightly so that when I run the following...
x_in = np.expand_dims(x[0], axis=0)
classification = sess.run(tf.argmax(pred, 1))
print(classification)
it will predict the number 2 as a number 6.
The issue is, I need to extract the gradient necessary to trick the neural network into thinking number 2 is 6. I need to use this gradient to create the nematode mentioned above.
I am not sure how can I extract the gradient value. I tried looking at tf.gradients but I was unable to figure out how to produce an adversarial image using this function. I implemented the following after the fake_loss variable above...
tf.gradients(fake_loss, x)
for i in range(FLAGS.training_epochs):
# calculate gradient with weight of number 6
gradient_value = sess.run(gradients, feed_dict={fake_label:np.array([6])})
# update the image of number 2
gradient_update = x+0.007*gradient_value[0]
sess.run(tf.assign(x, gradient_update))
sess.run(pred)
Unfortunately the prediction did not change in the way I wanted, and moreover this logic resulted in a rather blurry image.
I would appreciate an explanation as to what I need to do in order calculate and extract the gradient that will trick the neural network, so that if I were to take this gradient and apply it to my image as a nematode, it will result in a different prediction.
Why not let the Tensorflow optimizer add the gradients to your image? You can still evaluate the nematode to get the resulting gradients that were added.
I created a bit of sample code to demonstrate this with a panda image. It uses the VGG16 neural network to transform your own panda image into a "goldfish" image. Every 100 iterations it saves the image as PDF so you can print it losslessly to check if your image is still a goldfish.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipyd
from libs import vgg16 # Download here! https://github.com/pkmital/CADL/tree/master/session-4/libs
pandaimage = plt.imread('panda.jpg')
pandaimage = vgg16.preprocess(pandaimage)
plt.imshow(pandaimage)
img_4d = np.array([pandaimage])
g = tf.get_default_graph()
input_placeholder = tf.Variable(img_4d,trainable=False)
to_add_image = tf.Variable(tf.random_normal([224,224,3], mean=0.0, stddev=0.1, dtype=tf.float32))
combined_images_not_clamped = input_placeholder+to_add_image
filledmax = tf.fill(tf.shape(combined_images_not_clamped), 1.0)
filledmin = tf.fill(tf.shape(combined_images_not_clamped), 0.0)
greater_than_one = tf.greater(combined_images_not_clamped, filledmax)
combined_images_with_max = tf.where(greater_than_one, filledmax, combined_images_not_clamped)
lower_than_zero =tf.less(combined_images_with_max, filledmin)
combined_images = tf.where(lower_than_zero, filledmin, combined_images_with_max)
net = vgg16.get_vgg_model()
tf.import_graph_def(net['graph_def'], name='vgg')
names = [op.name for op in g.get_operations()]
style_layer = 'prob:0'
the_prediction = tf.import_graph_def(
net['graph_def'],
name='vgg',
input_map={'images:0': combined_images},return_elements=[style_layer])
goldfish_expected_np = np.zeros(1000)
goldfish_expected_np[1]=1.0
goldfish_expected_tf = tf.Variable(goldfish_expected_np,dtype=tf.float32,trainable=False)
loss = tf.reduce_sum(tf.square(the_prediction[0]-goldfish_expected_tf))
optimizer = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
def show_many_images(*images):
fig = plt.figure()
for i in range(len(images)):
print(images[i].shape)
subplot_number = 100+10*len(images)+(i+1)
plt.subplot(subplot_number)
plt.imshow(images[i])
plt.show()
for i in range(1000):
_, loss_val = sess.run([optimizer,loss])
if i%100==1:
print("Loss at iteration %d: %f" % (i,loss_val))
_, loss_val,adversarial_image,pred,nematode = sess.run([optimizer,loss,combined_images,the_prediction,to_add_image])
res = np.squeeze(pred)
average = np.mean(res, 0)
res = res / np.sum(average)
plt.imshow(adversarial_image[0])
plt.show()
print([(res[idx], net['labels'][idx]) for idx in res.argsort()[-5:][::-1]])
show_many_images(img_4d[0],nematode,adversarial_image[0])
plt.imsave('adversarial_goldfish.pdf',adversarial_image[0],format='pdf') # save for printing
Let me know if this helps you!

LSTMLayer produces NaN values even before training it

I'm currently trying to construct a LSTM network with Lasagne to predict the next step of noisy sequences. I first trained a stack of 2 LSTM layers for a while, but had to use an abysmally small learning rate (1e-6) because of divergence issues (that ultimately produced NaN values). The results were kind of disappointing, as the network produced smooth, out-of-phase versions of the input.
I then came to the conclusion I should use better parameter initialization than what is given by default. The goal was to start from a network that just mimics identity, since for strongly auto-correlated signal it should be a good first estimation of the next step (x(t) ~ x(t+1)), and to sprinkle a bit of noise on top of it.
import theano, numpy, lasagne
from theano import tensor as T
from lasagne.layers.recurrent import LSTMLayer, InputLayer, Gate
from lasagne.layers import DropoutLayer
from lasagne.nonlinearities import sigmoid, tanh, leaky_rectify
from lasagne.layers import get_output
from lasagne.init import GlorotNormal, Normal, Constant
floatX = 'float32'
# function to create a lstm that ~ propagate the input from start to finish off the bat
# should be a good start for a predictive lstm with high one-step autocorrelation
def create_identity_lstm(input, shape, orig_inp=None, noiselvl=0.01, G=10., mask_input=None):
inp, out = shape
# orig_inp is used to limit the number of units that are actually used to pass the input information from one layer to the other - the rest of the units should produce ~ 0 activation.
if orig_inp is None:
orig_inp = inp
# input gate
inputgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
# forget gate
forgetgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
# cell gate
cell = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=None,
b=Constant(0.),
nonlinearity=leaky_rectify
)
# output gate
outputgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
lstm = LSTMLayer(input, out, ingate=inputgate, forgetgate=forgetgate, cell=cell, outgate=outputgate, nonlinearity=leaky_rectify, mask_input=mask_input)
# change matrices and biases
# ingate - should return ~1 (matrices = 0, big bias)
b_i = lstm.b_ingate.get_value()
b_i[:orig_inp] += G
lstm.b_ingate.set_value(b_i)
# forgetgate - should return 0 (matrices = 0, big negative bias)
b_f = lstm.b_forgetgate.get_value()
b_f[:orig_inp] -= G
b_f[orig_inp:] += G # to help learning future features, I preserve a large bias on "unused" units to help it remember stuff
lstm.b_forgetgate.set_value(b_f)
# cell - should return x(t) (W_xc = identity, rest is 0)
W_xc = lstm.W_in_to_cell.get_value()
for i in xrange(orig_inp):
W_xc[i, i] += 1.
lstm.W_in_to_cell.set_value(W_xc)
# outgate - should return 1 (same as ingate)
b_o = lstm.b_outgate.get_value()
b_o[:orig_inp] += G
lstm.b_outgate.set_value(b_o)
# done
return lstm
I then use this lstm generation code to generate the following network:
# layers
#input + dropout
input = InputLayer((None, None, 7), name='input')
mask = InputLayer((None, None), name='mask')
drop1 = DropoutLayer(input, p=0.33)
#lstm1 + dropout
lstm1 = create_identity_lstm(drop1, (7, 1024), mask_input=mask)
drop2 = DropoutLayer(lstm1, p=0.33)
#lstm2 + dropout
lstm2 = create_identity_lstm(drop2, (1024, 128), orig_inp=7, mask_input=mask)
drop3 = DropoutLayer(lstm2, p=0.33)
#lstm3
lstm3 = create_identity_lstm(drop3, (128, 7), orig_inp=7, mask_input=mask)
# symbolic variables and prediction
x = input.input_var
ma = mask.input_var
ma_reshape = ma.dimshuffle((0,1,'x'))
yhat = get_output(lstm3, deterministic=False)
yhat_det = get_output(lstm3, deterministic=True)
y = T.ftensor3('y')
predict = theano.function([x, ma], yhat_det)
Problem is, even without any training, this network produces garbage values and sometimes even a bunch of NaNs, right from the very first LSTM layer:
X = numpy.random.random((5, 10000, 7)).astype('float32')
Masks = numpy.ones(X.shape[:2], dtype='float32')
hid1 = get_output(lstm1, determistic=True)
get_hid1 = theano.function([x, ma], hid1)
h1 = get_hid1(X, Masks)
print numpy.isnan(h1).sum(axis=1).sum(axis=1)
array([6379520, 6367232, 6377472, 6376448, 6378496])
# even the first output value is garbage!
print h1[:,0,0] - X[:,0,0]
array([-0.03898358, -0.10118812, 0.34877831, -0.02509735, 0.36689138], dtype=float32)
I don't get why, I checked each matrices and their values are fine, like I wanted them to be. I even tried to recreate each gate activations and the resulting hidden activations using the actual numpy arrays and they reproduce the input just fine. What did I do wrong there??

Resources