Creating a session in a graph that uses another graph and its session - python-3.x

Versions : I am using tensorflow (version : v1.1.0-13-g8ddd727 1.1.0) in python3 (Python 3.4.3 (default, Nov 17 2016, 01:08:31) [GCC 4.8.4] on linux), it is installed from source and GPU-based (name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076).
Context : Generative adversarial networks (GANs) learn to synthesise new samples from a high-dimensional distribution by passing samples drawn from a latent space through a generative network. When the high-dimensional distribution describes images of a particular data set, the network should learn to generate visually similar image samples for latent variables that are close to each other in the latent space. For tasks such as image retrieval and image classification, it may be useful to exploit the arrangement of the latent space by projecting images into it, and using this as a representation for discriminative tasks.
Context Problem : I am trying to invert a generator (compute L2 norm between an input image in cifar10 and a image g(z) of the generator, where z is a parameter to be trained with stochastic gradient descent in order to minimize this norm and find an approximation of the preimage of the input image).
Technical Issue : Therefore, I am building a new graph in a new session in tensorflow but I need to use a trained gan that was trained in another session, which I cannot import because the two graphs are not the same. That is to say, when I use sess.run(), the variables are not found and therefore there is a Error Message.
The code is
import tensorflow as tf
from data import cifar10, utilities
from . import dcgan
import logging
logger = logging.getLogger("gan.test")
BATCH_SIZE = 1
random_z = tf.get_variable(name='z_to_invert', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
#random_z = tf.random_normal([BATCH_SIZE, 100], mean=0.0, stddev=1.0, name='random_z')
# Generate images with generator
generator = dcgan.generator(random_z, is_training=True, name='generator')
# Add summaries to visualise output images
generator_visualisation = tf.cast(((generator / 2.0) + 0.5) * 255.0, tf.uint8)
summary_generator = tf.summary.\
image('summary/generator', generator_visualisation,
max_outputs=8)
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
summary_inp = tf.summary.image('input_image', inp)
img_summary = tf.summary.merge([summary_generator, summary_inp])
with tf.name_scope('error'):
error = inp - generator #generator = g(z)
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sv = tf.train.Supervisor(logdir="gan/invert_logs/", save_summaries_secs=None, save_model_secs=None)
batch = 0
with sv.managed_session() as sess:
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
while not sv.should_stop():
if batch > 0 and batch % 100 == 0:
logger.debug('Step {} '.format(batch))
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, batch)
print('step %d: Patiente un peu poto!' % batch)
img = sess.run(img_summary)
logwriter.add_summary(img, batch)
batch += 1
print(batch)
I understood what is the problem, it is actually that I am trying to run a session which is saved in gan/train_logs but the graph does not have those variables I am trying to run.
Therefore, I tried to implement this instead :
graph = tf.Graph()
tf.reset_default_graph()
with tf.Session(graph=graph) as sess:
ckpt = tf.train.get_checkpoint_state('gan/train_logs/')
saver = tf.train.import_meta_graph(ckpt.model_checkpoint_path + '.meta', clear_devices=True)
saver.restore(sess, ckpt.model_checkpoint_path)
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
#inp, _ = next(test_image)
BATCH_SIZE = 1
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
#M_placeholder = tf.placeholder(tf.float32, shape=cifar10.get_shape_input(), name='M_input')
M_placeholder = inp
zmar = tf.summary.image('input_image', inp)
#Create sample noise from random normal distribution
z = tf.get_variable(name='z', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
# Function g(z) zhere z is randomly generated
g_z = dcgan.generator(z, is_training=True, name='generator')
generator_visualisation = tf.cast(((g_z / 2.0) + 0.5) * 255.0, tf.uint8)
sum_generator = tf.summary.image('summary/generator', generator_visualisation)
img_summary = tf.summary.merge([sum_generator, zmar])
with tf.name_scope('error'):
error = M_placeholder - g_z
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sess.run(tf.global_variables_initializer())
for i in range(10000):
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, i)
print('step %d: Patiente un peu poto!' % i)
img = sess.run(img_summary)
logwriter.add_summary(img, i)
print('Done Training')
This script runs, but I have checked on tensorboard, the generator that is used here does not have the trained weights and it only produces noise.
I think I am trying to run a session in a graph that uses another graph and its trained session. I have read thoroughly the Graphs and Session documentation on tensorflow website https://www.tensorflow.org/versions/r1.3/programmers_guide/graphs, I have found an interesting tf.import_graph_def function :
You can rebind tensors in the imported graph to tf.Tensor objects in the default graph by passing the optional input_map argument. For example, input_map enables you to take import a graph fragment defined in a tf.GraphDef, and statically connect tensors in the graph you are building to tf.placeholder tensors in that fragment.
You can return tf.Tensor or tf.Operation objects from the imported graph by passing their names in the return_elements list.
But I don't know how to use this function, no example is given, and also I only found those two links that may help me :
https://github.com/tensorflow/tensorflow/issues/7508
Tensorflow: How to use a trained model in a application?
It would be really nice to have your help on this topic. This should be straightforward for someone who has already used the tf.import_graph_def function... What I really need is to get the trained generator to apply it to a new variable z which is to be trained in another session.
Thanks

Related

Detectron2: No instances in prediction

I'm trying to train Detectron2 on a custom dataset that I annotated with coco-annotator. After training I wanted to predict Instances of my Image, but I dont get any shown.
Training:
from detectron2.engine import DefaultTrainer
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("TrashTron_train",)
cfg.DATASETS.TEST = ("TrashTron_val",)
# cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR
cfg.SOLVER.MAX_ITER = 300 # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512 # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 24 # only has one class (ballon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrect uses num_classes+1 here.
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
Prediction:
test_data = [{'1191.jpg': '/content/datasets/val/1191.jpg',
'image_id': 1308}]
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7 # set a custom testing threshold
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth") # path to the model we just trained
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 24
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
# print(outputs["instances"].pred_densepose)
im = cv2.imread(test_data[0]["1191.jpg"])
v = Visualizer(im[:, :, ::-1],
metadata=MetadataCatalog.get(cfg.DATASETS.TRAIN[0]),
scale=0.5,
instance_mode=ColorMode.IMAGE_BW)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
img = cv2.cvtColor(out.get_image()[:, :, ::-1], cv2.COLOR_RGBA2RGB)
plt.imshow(img)
The corresponding image is shown, but no instances.
Any suggestions? The overall evaluation scores aren't that great, but I picked the best class and there I also dont get any predictions...
I would try to lower the threshold, since you have said that overall training scores were not great.
In this answer in official repo, following code is suggested to change the threshold:
cfg.MODEL.TENSOR_MASK.SCORE_THRESH_TEST = 0.5
at another answer at the same thread, other thresholds are modified as well.
cfg.MODEL.RETINANET.SCORE_THRESH_TEST = args.confidence_threshold
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = args.confidence_threshold
cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = args.confidence_threshold

Model selection & Selecting the number of active components in Bayesian Gaussian Mixture Models

I have generated 2 groups of 1-D data points which are visually clearly separable and I want to use a Bayesian Gaussian Mixture Model (BGMM) to ideally recover 2 clusters.
Since BGMMs maximize a lower bound on the model evidence (ELBO) and given that the ELBO is supposed to combine notions of accuracy and complexity, I would expect more complex models to be penalized.
However, when running Grid Search over the number of clusters, I often get a solution with more than 2 clusters. More specifically, I often get the maximal number of clusters on my grid search. In the example below, I would expect the best model to define 2 clusters. Instead, the best models defines 4 but assigns minimal weights to 2 out of 4 clusters.
I am really surprised, since 2 out of 4 clusters are therefore adding little information and this more complex model still gets selected as the best model.
Why is the BGMM then picking 4 clusters for the best model?
If this is indeed the behavior a BGMM should show, how can I then assess how many active components I actually have in my model? Visually? By defining an arbitrary threshold on the weights?
I have added the code to reproduce my example below.
# Import statements
import itertools
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from joblib import Parallel, delayed
from sklearn.mixture import BayesianGaussianMixture
from sklearn.utils import shuffle
def fitmodel(x, params):
'''
Instantiates and fits Bayesian GMM
Used in the parallel for loop
'''
# Gaussian mixture model
clf = BayesianGaussianMixture(**params)
# Fit
clf = clf.fit(x, y=None)
return clf
def plot_results(X, means, covariances, title):
plt.plot(X, np.random.uniform(low=0, high=1, size=len(X)),'o', alpha=0.1, color='cornflowerblue', label='data points')
for i, (mean, covar) in enumerate(zip(
means, covariances)):
# Get normal PDF
n_sd = 2.5
x = np.linspace(mean - n_sd*covar, mean + n_sd*covar, 300)
x = x.ravel()
y = stats.norm.pdf(x, mean, covar).ravel()
if i == 0:
label = 'Component PDF'
else:
label = None
plt.plot(x, y, color='darkorange', label=label)
plt.yticks(())
plt.title(title)
# Generate data
g1 = np.random.uniform(low=-1.5, high=-1, size=(1,100))
g2 = np.random.uniform(low=1.5, high=1, size=(1,100))
X = np.append(g1, g2)
# Shuffle data
X = shuffle(X)
X = X.reshape(-1, 1)
# Define parameters for grid search
parameters = {
'n_components': [1, 2, 3, 4],
'weight_concentration_prior_type':['dirichlet_distribution']
}
# Create permutations of parameter settings
keys, values = zip(*parameters.items())
param_grid = [dict(zip(keys, v)) for v in itertools.product(*values)]
# Run GridSearch using parallel for loop
list_clf = [None] * len(param_grid)
num_cores = multiprocessing.cpu_count()
list_clf = Parallel(n_jobs=num_cores)(delayed(fitmodel)(X, params) for params in param_grid)
# Print best model (based on lower bound on model evidence)
lower_bounds = [x.lower_bound_ for x in list_clf] # Extract lower bounds on model evidence
idx = int(np.where(lower_bounds == np.max(lower_bounds))[0]) # Find best model
best_estimator = list_clf[idx]
print(f'Parameter setting of best model: {param_grid[idx]}')
print(f'Components weights: {best_estimator.weights_}')
# Plot data points and gaussian components
plt.figure(figsize=(8,6))
ax = plt.subplot(2, 1, 1)
if best_estimator.weight_concentration_prior_type == 'dirichlet_process':
prior_label = 'Dirichlet process'
elif best_estimator.weight_concentration_prior_type == 'dirichlet_distribution':
prior_label = 'Dirichlet distribution'
plot_results(X, best_estimator.means_, best_estimator.covariances_,
f'Best Bayesian GMM | {prior_label} prior')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.legend(fontsize='small')
# Plot histogram of weights
ax = plt.subplot(2, 1, 2)
for k, w in enumerate(best_estimator.weights_):
plt.bar(k, w,
width=0.9,
color='#56B4E9',
zorder=3,
align='center',
edgecolor='black'
)
plt.text(k, w + 0.01, "%.1f%%" % (w * 100.),
horizontalalignment='center')
ax.get_xaxis().set_tick_params(direction='out')
ax.yaxis.grid(True, alpha=0.7)
plt.xticks(range(len(best_estimator.weights_)))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.4)
plt.ylabel('Component weight')
plt.ylim(0, np.max(best_estimator.weights_)+0.25*np.max(best_estimator.weights_))
plt.yticks(())
plt.savefig('bgmm_clustering.png')
plt.show()
plt.close()

Keras layer for slicing image data into sliding windows

I have a set of images, all of varying widths, but with fixed height set to 100 pixels and 3 channels of depth. My task is to classify if each vertical line in the image is interesting or not. To do that, I look at the line in context of its 10 predecessor and successor lines. Imagine the algorithm sweeping from left to right of the image, detecting vertical lines containing points of interest.
My first attempt at doing this was to manually cut out these sliding windows using numpy before feeding the data into the Keras model. Like this:
# Pad left and right
s = np.repeat(D[:1], 10, axis = 0)
e = np.repeat(D[-1:], 10, axis = 0)
# D now has shape (w + 20, 100, 3)
D = np.concatenate((s, D, e))
# Sliding windows creation trick from SO question
idx = np.arange(21)[None,:] + np.arange(len(D) - 20)[:,None]
windows = D[indexer]
Then all windows and all ground truth 0/1 values for all vertical lines in all images would be concatenated into two very long arrays.
I have verified that this works, in principle. I fed each window to a Keras layer looking like this:
Conv2D(20, (5, 5), input_shape = (21, 100, 3), padding = 'valid', ...)
But the windowing causes the memory usage to increase 21 times so doing it this way becomes impractical. But I think my scenario is a very common in machine learning so there must be some standard method in Keras to do this efficiently? E.g I would like to feed Keras my raw image data (w, 100, 80) and tell it what the sliding window sizes are and let it figure out the rest. I have looked at some sample code but I'm a ml noob so I don't get it.
Unfortunately this isn't an easy problem because it can involve using a variable sized input for your Keras model. While I think it is possible to do this with proper use of placeholders that's certainly no place for a beginner to start. your other option is a data generator. As with many computationally intensive tasks there is often a trade off between compute speed and memory requirements, using a generator is more compute heavy and it will be done entirely on your cpu (no gpu acceleration), but it won't make the memory increase.
The point of a data generator is that it will apply the operation to images one at a time to produce the batch, then train on that batch, then free up the memory - so you only end up keeping one batch worth of data in memory at any time. Unfortunately if you have a time consuming generation then this can seriously affect performance.
The generator will be a python generator (using the 'yield' keyword) and is expected to produce a single batch of data, keras is very good at using arbitrary batch sizes, so you can always make one image yield one batch, especially to start.
Here is the keras page on fit_generator - I warn you, this starts to become a lot of work very quickly, consider buying more memory:
https://keras.io/models/model/#fit_generator
Fine I'll do it for you :P
import numpy as np
import pandas as pd
import keras
from keras.models import Model, model_from_json
from keras.layers import Dense, Concatenate, Multiply,Add, Subtract, Input, Dropout, Lambda, Conv1D, Flatten
from tensorflow.python.client import device_lib
# check for my gpu
print(device_lib.list_local_devices())
# make some fake image data
# 1000 random widths
data_widths = np.floor(np.random.random(1000)*100)
# producing 1000 random images with dimensions w x 100 x 3
# and a vector of which vertical lines are interesting
# I assume your data looks like this
images = []
interesting = []
for w in data_widths:
images.append(np.random.random([int(w),100,3]))
interesting.append(np.random.random(int(w))>0.5)
# this is a generator
def image_generator(images, interesting):
num = 0
while num < len(images):
windows = None
truth = None
D = images[num]
# this should look familiar
# Pad left and right
s = np.repeat(D[:1], 10, axis = 0)
e = np.repeat(D[-1:], 10, axis = 0)
# D now has shape (w + 20, 100, 3)
D = np.concatenate((s, D, e))
# Sliding windows creation trick from SO question
idx = np.arange(21)[None,:] + np.arange(len(D) - 20)[:,None]
windows = D[idx]
truth = np.expand_dims(1*interesting[num],axis=1)
yield (windows, truth)
num+=1
# the generator MUST loop
if num == len(images):
num = 0
# basic model - replace with your own
input_layer = Input(shape = (21,100,3), name = "input_node")
fc = Flatten()(input_layer)
fc = Dense(100, activation='relu',name = "fc1")(fc)
fc = Dense(50, activation='relu',name = "fc2")(fc)
fc = Dense(10, activation='relu',name = "fc3")(fc)
output_layer = Dense(1, activation='sigmoid',name = "output")(fc)
model = Model(input_layer,output_layer)
model.compile(optimizer="adam", loss='binary_crossentropy')
model.summary()
#and training
training_history = model.fit_generator(image_generator(images, interesting),
epochs =5,
initial_epoch = 0,
steps_per_epoch=len(images),
verbose=1
)

Get gradient value necessary to break an image

I've been experimenting with adversarial images and I read up on the fast gradient sign method from the following link https://arxiv.org/pdf/1412.6572.pdf...
The instructions explain that the necessary gradient can be calculated using backpropagation...
I've been successful at generating adversarial images but I have failed at attempting to extract the gradient necessary to create an adversarial image. I will demonstrate what I mean.
Let us assume that I have already trained my algorithm using logistic regression. I restore the model and I extract the number I wish to change into a adversarial image. In this case it is the number 2...
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits)
...
...
# assign the images of number 2 to the variable
sess.run(tf.assign(x, labels_of_2))
# setup softmax
sess.run(pred)
# placeholder for target label
fake_label = tf.placeholder(tf.int32, shape=[1])
# setup the fake loss
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)
# minimize fake loss using gradient descent,
# calculating the derivatives of the weight of the fake image will give the direction of weights necessary to change the prediction
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate).minimize(fake_loss, var_list=[x])
# continue calculating the derivative until the prediction changes for all 10 images
for i in range(FLAGS.training_epochs):
# fake label tells the training algorithm to use the weights calculated for number 6
sess.run(adversarial_step, feed_dict={fake_label:np.array([6])})
sess.run(pred)
This is my approach, and it works perfectly. It takes my image of number 2 and changes it only slightly so that when I run the following...
x_in = np.expand_dims(x[0], axis=0)
classification = sess.run(tf.argmax(pred, 1))
print(classification)
it will predict the number 2 as a number 6.
The issue is, I need to extract the gradient necessary to trick the neural network into thinking number 2 is 6. I need to use this gradient to create the nematode mentioned above.
I am not sure how can I extract the gradient value. I tried looking at tf.gradients but I was unable to figure out how to produce an adversarial image using this function. I implemented the following after the fake_loss variable above...
tf.gradients(fake_loss, x)
for i in range(FLAGS.training_epochs):
# calculate gradient with weight of number 6
gradient_value = sess.run(gradients, feed_dict={fake_label:np.array([6])})
# update the image of number 2
gradient_update = x+0.007*gradient_value[0]
sess.run(tf.assign(x, gradient_update))
sess.run(pred)
Unfortunately the prediction did not change in the way I wanted, and moreover this logic resulted in a rather blurry image.
I would appreciate an explanation as to what I need to do in order calculate and extract the gradient that will trick the neural network, so that if I were to take this gradient and apply it to my image as a nematode, it will result in a different prediction.
Why not let the Tensorflow optimizer add the gradients to your image? You can still evaluate the nematode to get the resulting gradients that were added.
I created a bit of sample code to demonstrate this with a panda image. It uses the VGG16 neural network to transform your own panda image into a "goldfish" image. Every 100 iterations it saves the image as PDF so you can print it losslessly to check if your image is still a goldfish.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipyd
from libs import vgg16 # Download here! https://github.com/pkmital/CADL/tree/master/session-4/libs
pandaimage = plt.imread('panda.jpg')
pandaimage = vgg16.preprocess(pandaimage)
plt.imshow(pandaimage)
img_4d = np.array([pandaimage])
g = tf.get_default_graph()
input_placeholder = tf.Variable(img_4d,trainable=False)
to_add_image = tf.Variable(tf.random_normal([224,224,3], mean=0.0, stddev=0.1, dtype=tf.float32))
combined_images_not_clamped = input_placeholder+to_add_image
filledmax = tf.fill(tf.shape(combined_images_not_clamped), 1.0)
filledmin = tf.fill(tf.shape(combined_images_not_clamped), 0.0)
greater_than_one = tf.greater(combined_images_not_clamped, filledmax)
combined_images_with_max = tf.where(greater_than_one, filledmax, combined_images_not_clamped)
lower_than_zero =tf.less(combined_images_with_max, filledmin)
combined_images = tf.where(lower_than_zero, filledmin, combined_images_with_max)
net = vgg16.get_vgg_model()
tf.import_graph_def(net['graph_def'], name='vgg')
names = [op.name for op in g.get_operations()]
style_layer = 'prob:0'
the_prediction = tf.import_graph_def(
net['graph_def'],
name='vgg',
input_map={'images:0': combined_images},return_elements=[style_layer])
goldfish_expected_np = np.zeros(1000)
goldfish_expected_np[1]=1.0
goldfish_expected_tf = tf.Variable(goldfish_expected_np,dtype=tf.float32,trainable=False)
loss = tf.reduce_sum(tf.square(the_prediction[0]-goldfish_expected_tf))
optimizer = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
def show_many_images(*images):
fig = plt.figure()
for i in range(len(images)):
print(images[i].shape)
subplot_number = 100+10*len(images)+(i+1)
plt.subplot(subplot_number)
plt.imshow(images[i])
plt.show()
for i in range(1000):
_, loss_val = sess.run([optimizer,loss])
if i%100==1:
print("Loss at iteration %d: %f" % (i,loss_val))
_, loss_val,adversarial_image,pred,nematode = sess.run([optimizer,loss,combined_images,the_prediction,to_add_image])
res = np.squeeze(pred)
average = np.mean(res, 0)
res = res / np.sum(average)
plt.imshow(adversarial_image[0])
plt.show()
print([(res[idx], net['labels'][idx]) for idx in res.argsort()[-5:][::-1]])
show_many_images(img_4d[0],nematode,adversarial_image[0])
plt.imsave('adversarial_goldfish.pdf',adversarial_image[0],format='pdf') # save for printing
Let me know if this helps you!

LSTMLayer produces NaN values even before training it

I'm currently trying to construct a LSTM network with Lasagne to predict the next step of noisy sequences. I first trained a stack of 2 LSTM layers for a while, but had to use an abysmally small learning rate (1e-6) because of divergence issues (that ultimately produced NaN values). The results were kind of disappointing, as the network produced smooth, out-of-phase versions of the input.
I then came to the conclusion I should use better parameter initialization than what is given by default. The goal was to start from a network that just mimics identity, since for strongly auto-correlated signal it should be a good first estimation of the next step (x(t) ~ x(t+1)), and to sprinkle a bit of noise on top of it.
import theano, numpy, lasagne
from theano import tensor as T
from lasagne.layers.recurrent import LSTMLayer, InputLayer, Gate
from lasagne.layers import DropoutLayer
from lasagne.nonlinearities import sigmoid, tanh, leaky_rectify
from lasagne.layers import get_output
from lasagne.init import GlorotNormal, Normal, Constant
floatX = 'float32'
# function to create a lstm that ~ propagate the input from start to finish off the bat
# should be a good start for a predictive lstm with high one-step autocorrelation
def create_identity_lstm(input, shape, orig_inp=None, noiselvl=0.01, G=10., mask_input=None):
inp, out = shape
# orig_inp is used to limit the number of units that are actually used to pass the input information from one layer to the other - the rest of the units should produce ~ 0 activation.
if orig_inp is None:
orig_inp = inp
# input gate
inputgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
# forget gate
forgetgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
# cell gate
cell = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=None,
b=Constant(0.),
nonlinearity=leaky_rectify
)
# output gate
outputgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
lstm = LSTMLayer(input, out, ingate=inputgate, forgetgate=forgetgate, cell=cell, outgate=outputgate, nonlinearity=leaky_rectify, mask_input=mask_input)
# change matrices and biases
# ingate - should return ~1 (matrices = 0, big bias)
b_i = lstm.b_ingate.get_value()
b_i[:orig_inp] += G
lstm.b_ingate.set_value(b_i)
# forgetgate - should return 0 (matrices = 0, big negative bias)
b_f = lstm.b_forgetgate.get_value()
b_f[:orig_inp] -= G
b_f[orig_inp:] += G # to help learning future features, I preserve a large bias on "unused" units to help it remember stuff
lstm.b_forgetgate.set_value(b_f)
# cell - should return x(t) (W_xc = identity, rest is 0)
W_xc = lstm.W_in_to_cell.get_value()
for i in xrange(orig_inp):
W_xc[i, i] += 1.
lstm.W_in_to_cell.set_value(W_xc)
# outgate - should return 1 (same as ingate)
b_o = lstm.b_outgate.get_value()
b_o[:orig_inp] += G
lstm.b_outgate.set_value(b_o)
# done
return lstm
I then use this lstm generation code to generate the following network:
# layers
#input + dropout
input = InputLayer((None, None, 7), name='input')
mask = InputLayer((None, None), name='mask')
drop1 = DropoutLayer(input, p=0.33)
#lstm1 + dropout
lstm1 = create_identity_lstm(drop1, (7, 1024), mask_input=mask)
drop2 = DropoutLayer(lstm1, p=0.33)
#lstm2 + dropout
lstm2 = create_identity_lstm(drop2, (1024, 128), orig_inp=7, mask_input=mask)
drop3 = DropoutLayer(lstm2, p=0.33)
#lstm3
lstm3 = create_identity_lstm(drop3, (128, 7), orig_inp=7, mask_input=mask)
# symbolic variables and prediction
x = input.input_var
ma = mask.input_var
ma_reshape = ma.dimshuffle((0,1,'x'))
yhat = get_output(lstm3, deterministic=False)
yhat_det = get_output(lstm3, deterministic=True)
y = T.ftensor3('y')
predict = theano.function([x, ma], yhat_det)
Problem is, even without any training, this network produces garbage values and sometimes even a bunch of NaNs, right from the very first LSTM layer:
X = numpy.random.random((5, 10000, 7)).astype('float32')
Masks = numpy.ones(X.shape[:2], dtype='float32')
hid1 = get_output(lstm1, determistic=True)
get_hid1 = theano.function([x, ma], hid1)
h1 = get_hid1(X, Masks)
print numpy.isnan(h1).sum(axis=1).sum(axis=1)
array([6379520, 6367232, 6377472, 6376448, 6378496])
# even the first output value is garbage!
print h1[:,0,0] - X[:,0,0]
array([-0.03898358, -0.10118812, 0.34877831, -0.02509735, 0.36689138], dtype=float32)
I don't get why, I checked each matrices and their values are fine, like I wanted them to be. I even tried to recreate each gate activations and the resulting hidden activations using the actual numpy arrays and they reproduce the input just fine. What did I do wrong there??

Resources