Show model layout / design (with all connections) in Keras

I see major differences when testing a Keras LSTM model after I've trained it compared to when I load that trained model from a .h5 file (the accuracy of the former is always > 0.85, but that of the latter is always < 0.2, i.e. a random guess).
However, I checked the weights and they are identical, and the sparse layout Keras gives me via plot_model is also the same. But since that only provides a rough overview:
Is there a way to show the full layout of a Keras model (especially the node connections)?

If you're using the TensorFlow backend, apart from plot_model you can also use the keras.callbacks.TensorBoard callback to visualize the whole graph in TensorBoard. Example:
callback = keras.callbacks.TensorBoard(log_dir='./graph',
                                       histogram_freq=0,
                                       write_graph=True,
                                       write_images=True)
model.fit(..., callbacks=[callback])
Then run tensorboard --logdir ./graph from the same directory.
This is a quick shortcut, but you can go even further with that.
For example, add TensorFlow code to define (load) the model within a custom tf.Graph instance, like this:
from keras.layers import LSTM
import tensorflow as tf

my_graph = tf.Graph()
with my_graph.as_default():
    # All ops / variables in the LSTM layer are created as part of our graph
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)
... after which you can list all graph nodes with their dependencies, evaluate any variable, display the graph topology and so on, to compare the models.
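For example, here is a minimal sketch (TF 1.x API) of listing every op in the graph above along with its input tensors, i.e. the node connections:
with my_graph.as_default():
    # Each Operation knows which tensors it consumes, so printing op.inputs
    # exposes the wiring between nodes
    for op in my_graph.get_operations():
        print(op.name, '<-', [t.name for t in op.inputs])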
I personally think the simplest way is to set up your own session. It works in all cases with minimal patching:
import tensorflow as tf
from keras import backend as K
sess = tf.Session()
K.set_session(sess)
...
# Now can evaluate / access any node in this session, e.g. `sess.graph`
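For example, a quick way to compare two models is to dump every node registered in the session's graph (again a TF 1.x sketch, purely illustrative):
for node in sess.graph.as_graph_def().node:
    # each NodeDef lists the names of the nodes it takes its inputs from
    print(node.name, 'inputs:', list(node.input))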

Related

How do I put both PCA and a keras classifier into an sklearn pipeline and do a grid search CV?

I am trying to do a grid search CV operation on a Keras NN with PCA beforehand. To this end, I have constructed a pipeline consisting of the PCA step and then the Keras estimator using the sklearn wrapper. However, one of the things I want to search over is n_components of the PCA, which means that the input size of the neural net needs to be variable and dependent on the number of features selected in a previous pipeline step.
Below is my code to create an NN, which I have put into a Keras wrapper:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

def create_model(learning_rate=0.01, activation='relu', Input_Vector_Size=34,
                 neuron_number=10, dropout_prob=0.1):
    # Create an Adam optimizer with the given learning rate
    opt = Adam(lr=learning_rate)
    # Create your binary classification model
    model = Sequential()
    model.add(Dense(neuron_number, input_shape=(Input_Vector_Size,),
                    activation=activation))
    model.add(Dropout(dropout_prob))
    model.add(Dense(1, activation='sigmoid'))  # output layer
    # Compile model with your optimizer, loss, and metrics
    model.compile(optimizer=opt, loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
Which I put into a pipeline using:
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from keras.wrappers.scikit_learn import KerasClassifier

#%% Create a Pipeline with the KerasClassifier and PCA
pca = PCA(n_components=0.9)
NN = KerasClassifier(build_fn=create_model, verbose=0)
# Define the parameters to try out
pipeline = Pipeline([('pca', pca), ('NN', NN)])
params = {'pca__n_components': [0.8, 0.85, 0.9, 0.95],
          'NN__activation': ['relu', 'tanh'],
          'NN__neuron_number': [10, 15, 20],
          'NN__dropout_prob': [0.05, 0.1, 0.2, 0.3],
          'NN__learning_rate': [0.1, 0.01, 0.001]}
However, I'm a bit stuck on what to set as the parameter for Input_Vector_Size, as this will depend on how many features PCA has selected.
So, is it possible to make a pipeline parameter (here, the Input_Vector_Size) that is dependent on a parameter in a previous step of the pipeline (here, the number of features selected by PCA)?
(Note: I realize that one way around this is to just use an autoencoder in my NN and vary the compression, but I was hoping to use PCA specifically.)

Display Keras graph in Tensorboard without using the callback in the fit method

Is it possible to display a Keras graph in Tensorboard without using the tensorboard callback in the fit method?
Is it possible to extract the graph from Keras and display the graph using the tensorflow FileWriter?
tf.summary.FileWriter(logdir='logdir', graph=graph)
I want to do this to check that all the connections in this part of the graph are as expected (this model is part of a larger model that is far from finished).
Thanks.
It turned out to be very simple: extract the TensorFlow graph from the backend and use the file writer.
import tensorflow as tf
# Used to get the graph
from tensorflow.python.keras import backend as K
tb_path = "logs/"
# Simple model to test the tensorboard plotting
model = SimpleModel(50, 20, 10).build_model()
# Get the session's graph
graph = K.get_session().graph
# Display with the tensorflow file writer
writer = tf.summary.FileWriter(logdir=tb_path, graph=graph)
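Then launch TensorBoard with tensorboard --logdir logs/ (just like with the callback approach in the first answer) and the graph shows up under the Graphs tab.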

Load and use saved Keras model.h5

I tried to save a KerasClassifier (wrapper) into final_model.h5:
validator = GridSearchCV(estimator=clf, param_grid=param_grid)
grid_result = validator.fit(train_images, train_labels)
best_estimator = grid_result.best_estimator_
best_estimator.model.save("final_model.h5")
And then I want to reuse the model
from keras.models import load_model
loaded_model = load_model("final_model.h5")
But it seems like loaded_model is now a Sequential object instead. In other words, it is different from a KerasClassifier object like best_estimator.
I want to reuse some methods like score, which are available on a KerasClassifier but not on a Sequential model. What should I do?
Also, I would like to know how to continue the training process left off in final_model.h5. What can I do next?
Yes: in the end you saved the Keras model as HDF5, not the KerasClassifier, which is just an adapter for use with scikit-learn.
But you don't really need the KerasClassifier instance; you want the score function, and in Keras this is called evaluate. Just call model.evaluate(X, Y) and it will return a list containing the loss first and then any metrics your model used (most likely accuracy).
To continue training the model, just load it and call model.fit with the new training set, and that's it.
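A minimal sketch of both steps (X_test, y_test, X_new and y_new are placeholders for your own data):
from keras.models import load_model

model = load_model("final_model.h5")
# evaluate returns [loss, metric1, ...]; with accuracy as the only metric:
loss, accuracy = model.evaluate(X_test, y_test)
# continue training right where the saved model left off
model.fit(X_new, y_new, epochs=10)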

Adversarial images in TensorFlow

I am reading an article that explains how to trick neural networks into predicting whatever you want for a given image. I am using the MNIST dataset.
The article provides a relatively detailed walkthrough, but the person who wrote it is using Caffe.
Anyway, my first step was to create a logistic regression function using TensorFlow, trained on the MNIST dataset. So, if I restore the logistic regression model, I can use it to predict any image. For example, I feed the number 7 to the following model...
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    # number 7
    x_in = np.expand_dims(mnist.test.images[0], axis=0)
    classification = sess.run(tf.argmax(pred, 1), feed_dict={x: x_in})
    print(classification)
>>> [7]
This prints out the number [7] which is correct.
Now the article explains that in order to break a neural network we need to calculate the gradient of the neural network. This is the derivative of the neural network.
The article states that to calculate the gradient, we first need to pick an intended outcome to move towards, and set the output probability list to be 0 everywhere, and 1 for the intended outcome. Backpropagation is an algorithm for calculating the gradient.
Then there's code provided in Caffe as to how to calculate the gradient...
def compute_gradient(image, intended_outcome):
    # Put the image into the network and make the prediction
    predict(image)
    # Get an empty set of probabilities
    probs = np.zeros_like(net.blobs['prob'].data)
    # Set the probability for our intended outcome to 1
    probs[0][intended_outcome] = 1
    # Do backpropagation to calculate the gradient for that outcome
    # and the image we put in
    gradient = net.backward(prob=probs)
    return gradient['data'].copy()
Now, my issue is that I'm having a hard time understanding how this function is able to get the gradient just by feeding the image and the probabilities to it. Because I don't fully understand this code, I'm having a hard time translating this logic to TensorFlow.
I think I am confused as to how the Caffe framework works because I've never seen/used it before. If someone could explain how this logic works step-by-step that would be great.
I already know the basics of Backpropagation so you may assume I already know how it works.
Here is a link to the article itself: https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture
I'm going to show you how to do the basics of generating an adversarial image in TF; to apply this to an already trained model you might need some adaptations.
The code blocks work well as cells in a Jupyter notebook if you want to try this out interactively. If you don't use a notebook, you'll need to add plt.show() calls for the plots to show and remove the %matplotlib inline statement. The code is basically the simple MNIST tutorial from the TF documentation; I'll point out the important differences.
First block is just setup, nothing special ...
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# if you're not using jupyter notebooks then comment this out
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
Get the MNIST data (the download is down from time to time, so you might need to fetch it from web.archive.org manually and put it into that directory). We're not using one-hot encoding like in the tutorial, because by now TF has nicer functions to calculate the loss that no longer need one-hot encoding.
mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data')
In the next block we are doing something "special". The input image tensor is defined as a variable because later we want to optimize with regard to the input image. Usually you would have a placeholder here. It does limit us a bit here because we need a definite shape so we only feed in one example at a time. Not something you want to do in production, but for teaching purposes it's fine (and you can get around it with a little more code). Labels are placeholders like normal.
input_images = tf.get_variable("input_image", shape=[1,784], dtype=tf.float32)
input_labels = tf.placeholder(shape=[1], name='input_label', dtype=tf.int32)
Our model is a standard logistic regression model like in the tutorial. We only use the softmax for visualizing results; the loss function takes plain logits.
W = tf.get_variable("weights", shape=[784, 10], dtype=tf.float32, initializer=tf.random_normal_initializer())
b = tf.get_variable("biases", shape=[1, 10], dtype=tf.float32, initializer=tf.zeros_initializer())
logits = tf.matmul(input_images, W) + b
softmax = tf.nn.softmax(logits)
The loss is standard cross-entropy. What's notable in the training step is that there is an explicit list of variables passed in: we have defined the input image as a trainable variable, but we don't want to optimize the image while training the logistic regression, just the weights and biases, so we state that explicitly.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=input_labels,name='xentropy')
mean_loss = tf.reduce_mean(loss)
train_step = tf.train.AdamOptimizer(learning_rate=0.1).minimize(mean_loss, var_list=[W,b])
Start the session ...
sess = tf.Session()
sess.run(tf.global_variables_initializer())
Training is slower than it should be because of batch size 1. Like I said, not something you want to do in production, but this is just for teaching the basics ...
for step in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(1)
    loss_v, _ = sess.run([mean_loss, train_step],
                         feed_dict={input_images: batch_xs, input_labels: batch_ys})
At this point we should have a model that is good enough to demonstrate how to generate an adversarial image. First, we get an image that has label '2' because these are easy so even our suboptimal classifier should get them right (if it doesn't, run this cell again ;) this step is random so I can't guarantee that it'll work).
We're setting our input image variable to that example.
sample_label = -1
while sample_label != 2:
    sample_image, sample_label = mnist.test.next_batch(1)
sample_label
plt.imshow(sample_image.reshape(28, 28),cmap='gray')
# assign image to var
sess.run(tf.assign(input_images, sample_image));
sess.run(softmax) # now using the variable as input, no feed dict
# should show something like
# array([[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
# With the third entry being the highest by far.
Now we are going to "break" the classification. We want to change the image to make it look more like another number, in the eyes of the network, without changing the network itself. To do that, the code looks basically identical to what we had before. We define a "fake" label, the same loss as before (cross entropy) and we get an optimizer to minimize the fake loss, but this time with a var_list consisting of only the input image - so we won't change the logistic regression weights:
fake_label = tf.placeholder(tf.int32, shape=[1])
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(fake_loss, var_list=[input_images])
The next block is intended to be run interactively multiple times, while you see the image and the scores changing (here moving towards a label of 8):
sess.run(adversarial_step, feed_dict={fake_label:np.array([8])})
plt.imshow(sess.run(input_images).reshape(28,28),cmap='gray')
sess.run(softmax)
The first time you run this block, the scores will probably still point heavily towards 2, but they will change over time, and after a couple of runs the image should still look like a 2 with some noise in the background, while the score for "2" is down to around 3% and the score for "8" is above 96%.
Note that we never actually computed the gradient explicitly; we don't need to, since the TF optimizer takes care of computing gradients and applying the updates to the variables. If you want the raw gradient, you can get it with tf.gradients(fake_loss, input_images).
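For example, a minimal sketch reusing the tensors defined above:
# tf.gradients returns one gradient tensor per entry in the second argument
grad = tf.gradients(fake_loss, [input_images])[0]
grad_value = sess.run(grad, feed_dict={fake_label: np.array([8])})
print(grad_value.shape)  # (1, 784): one partial derivative per input pixel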
The same pattern works for more complicated models: train your model as normal, using placeholders with bigger batches or a pipeline with TF readers, and when you want the adversarial image, recreate the network with the input image variable as its input. As long as all the variable names remain the same (which they should if you use the same functions to build the network), you can restore from your network checkpoint and then apply the steps from this post to get an adversarial image. You might need to play around with learning rates and such.

Using a transformer (estimator) to transform the target labels in sklearn.pipeline

I understand that one can chain several estimators that implement the transform method to transform X (the feature set) in sklearn.pipeline. However, I have a use case where I would also like to transform the target labels (e.g. map the labels to [1...K] instead of [0...K-1]), and I would love to do that as a component in my pipeline. Is it possible to do that at all using sklearn.pipeline?
There is now a nicer way to do this built into scikit-learn: compose.TransformedTargetRegressor.
When constructing these objects, you give them a regressor and a transformer. When you call .fit(), they transform the targets before regressing, and when you call .predict(), they transform their predicted targets back to the original space.
It's important to note that you can pass them a pipeline object, so they should interface nicely with your existing setup. For example, take the following setup where I train a ridge regression to predict 1 target given 2 features:
# Imports
import numpy as np
from sklearn import compose, linear_model, metrics, pipeline, preprocessing
# Generate some training and test features and targets
X_train = np.random.rand(200).reshape(100,2)
y_train = 1.2*X_train[:, 0]+3.4*X_train[:, 1]+5.6
X_test = np.random.rand(20).reshape(10,2)
y_test = 1.2*X_test[:, 0]+3.4*X_test[:, 1]+5.6
# Define my model and scalers
ridge = linear_model.Ridge(alpha=1e-2)
scaler = preprocessing.StandardScaler()
minmax = preprocessing.MinMaxScaler(feature_range=(-1,1))
# Construct a pipeline using these methods
pipe = pipeline.make_pipeline(scaler, ridge)
# Construct a TransformedTargetRegressor using this pipeline
# ** So far the set-up has been standard **
regr = compose.TransformedTargetRegressor(regressor=pipe, transformer=minmax)
# Fit and train the regr like you would a pipeline
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print("MAE: {}".format(metrics.mean_absolute_error(y_test, y_pred)))
This still isn't quite as smooth as I'd like it to be; for example, you can access the regressor contained by a TransformedTargetRegressor using .regressor_, but the coefficients stored there are untransformed, which means there are some extra hoops to jump through if you want to work your way back to the equation that generated the data.
No, pipelines will always pass y through unchanged. Do the transformation outside the pipeline.
(This is a known design flaw in scikit-learn, but it's never been pressing enough to change or extend the API.)
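For example, a minimal sketch of transforming the labels outside the pipeline (pipe, X and y are placeholders for your own pipeline and data; LabelEncoder is just one possible mapping):
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_encoded = le.fit_transform(y) + 1  # shift from [0...K-1] to [1...K], as asked above
pipe.fit(X, y_encoded)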
You could add the label column to the end of the training data, apply your transformation, and then delete that column before training your model. It's not very elegant, but it's enough.
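A rough sketch of that workaround (my_transformer, model, X_train and y_train are placeholders):
import numpy as np

Xy = np.column_stack([X_train, y_train])  # append the labels as the last column
Xy_t = my_transformer.fit_transform(Xy)   # transform features and labels together
X_t, y_t = Xy_t[:, :-1], Xy_t[:, -1]      # split the label column back off
model.fit(X_t, y_t)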
