Plotting Hidden Weights - theano

I've had an interest in neural networks for a while now and have just started following the deep learning tutorials. I have what I hope is a relatively straightforward question that I am hoping someone can answer.
In the multilayer perceptron tutorial, I am interested in seeing the state of the network at different layers (something similar to what is seen in this paper: ). For instance, I am able to write out the weights of the hidden layer using:
W_open = open('mlp_w_pickle.pkl','wb')
cPickle.dump(classifier.hiddenLayer.W.get_value(borrow=True), W_open, -1)
When I plot this using the tile plotting, I get the following pretty plot [edit: pretty plot removed as I don't have enough rep].
If I wanted to plot the weights at the logRegressionLayer, such that
cPickle.dump(classifier.logRegressionLayer.W.get_value(borrow=True), W_open, -1)
what would I actually have to do? The above doesn't seem to work - it returns a 2darray of shape (500,10). I understand that the 500 relates to the number of hidden units. The paragraph on the Miscellaneous page:
Plotting the weights is a bit more tricky. We have n_hidden hidden
units, each of them corresponding to a column of the weight matrix. A
column has the same shape as the visible, where the weight
corresponding to the connection with visible unit j is at position j.
Therefore, if we reshape every such column, using numpy.reshape, we
get a filter image that tells us how this hidden unit is influenced by
the input image.
confuses me a little. I am unsure exactly how I would string it together.
Thanks to all - sorry if the question is confusing!

You could plot them just like the weights in the first layer, but they will not necessarily make much sense.
Consider the weights in the first layer of a neural network. If the inputs have size 784 (e.g. MNIST images) and there are 2000 hidden units in the first layer then the first layer weights are a matrix of size 784x2000 (or maybe the transpose depending on how it's implemented). Those weights can be plotted as either 784 patches of size 2000 or, more usually, 2000 patches of size 784. In this latter case each patch can be plotted as a 28x28 image which directly ties back to the original inputs and thus is interpretable.
For your higher-level regression layer, you could plot 10 tiles, each of size 500 (e.g. patches of size 22x23, with 6 padding values to fill out the rectangle), or 500 patches of size 10. Either might illustrate some patterns that are being found, but it may be difficult to tie those patterns back to the original inputs.
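The reshaping described above can be sketched with plain NumPy. This is only an illustration under assumptions: the random matrices stand in for `classifier.hiddenLayer.W.get_value()` and `classifier.logRegressionLayer.W.get_value()`, the shapes (784, 500) and (500, 10) are taken from the question, and `columns_to_tiles` is a hypothetical helper name, not part of the tutorial code.

```python
import numpy as np

def columns_to_tiles(W, tile_shape):
    """Turn each column of W into a 2-D tile, zero-padding the tail if needed."""
    n_in, n_out = W.shape
    rows, cols = tile_shape
    if rows * cols < n_in:
        raise ValueError("tile_shape is too small to hold one column of W")
    padded = np.zeros((rows * cols, n_out), dtype=W.dtype)
    padded[:n_in] = W
    # After transposing, row k holds every weight feeding output unit k.
    return padded.T.reshape(n_out, rows, cols)

rng = np.random.default_rng(0)
W_hidden = rng.standard_normal((784, 500))  # stand-in for the hidden layer weights
W_logreg = rng.standard_normal((500, 10))   # stand-in for the logistic regression weights

hidden_tiles = columns_to_tiles(W_hidden, (28, 28))  # 500 tiles, each 28x28
logreg_tiles = columns_to_tiles(W_logreg, (22, 23))  # 10 tiles, each 22x23 (6 padded values)
```

Each tile can then be passed to whatever image plotting routine you already use for the first layer. Only the 28x28 tiles line up with input pixels; the 22x23 layout is an arbitrary arrangement of the 500 hidden units.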


How to use data with different length for cnn with conv1 as first layer?

I have a list of arrays of different shapes i.e.
list = [array([1, 2, 3], dtype=int16),
        array([1, 2, 3, 4, 5], dtype=int16),
        array([1, 2], dtype=int16),
        array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)]
I want to use this data as input to a CNN in which the first layer is conv1. How should I transform the data for this to work? Should I pad the arrays with zeros? The data is a signal coming from a heart device.
Every sample has to have the same shape to be fed to a Keras model, so you have to bring all samples to the same shape. To do so, you can use sklearn.impute.SimpleImputer to fill the shorter samples with dummy values. Note that SimpleImputer replaces missing values such as NaN, so the shorter arrays first have to be padded with NaN to a common length.
The SimpleImputer has several fill strategies; please refer to its documentation.
If even the shortest sample (e.g. [1, 2]) is still relatively long, you can truncate the other samples to its length. If the shortest one is very short, consider whether to omit it instead.
Given that the data is heartbeat waveform data, I would try zero padding first, as you suggested.
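A minimal sketch of the zero-padding option, using only NumPy. The helper name `pad_to_same_length` is made up for illustration, and the trailing channel axis assumes the usual Keras Conv1D input layout of (samples, steps, channels).

```python
import numpy as np

signals = [
    np.array([1, 2, 3], dtype=np.int16),
    np.array([1, 2, 3, 4, 5], dtype=np.int16),
    np.array([1, 2], dtype=np.int16),
    np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=np.int16),
]

def pad_to_same_length(arrays, pad_value=0):
    """Right-pad every 1-D array with pad_value up to the longest length."""
    max_len = max(len(a) for a in arrays)
    out = np.full((len(arrays), max_len), pad_value, dtype=np.float32)
    for i, a in enumerate(arrays):
        out[i, :len(a)] = a
    return out

padded = pad_to_same_length(signals)  # shape (4, 9)
batch = padded[..., np.newaxis]       # shape (4, 9, 1): samples, steps, channels
```

Whether zero is a sensible fill value depends on the signal: if zero is also a meaningful reading, the network cannot tell padding from data, so a mask or a different fill value may be worth trying.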

White spot on generated image CycleGAN

I am trying to implement CycleGAN. However, I always get white spots on my generated images, even after 10 or 25 epochs. What could be wrong? Should I continue training and hope the problem goes away, or is there any hint on how to solve it?
White spots result from your model's output values being too large and getting clipped when the image is plotted.
The documentation of matplotlib.pyplot.imshow() says:
(...) an image with RGB values (0-1 float or 0-255 int)
(...) Out-of-range RGB(A) values are clipped.
Without knowing your architecture, I would:
Review the activation functions within the model and the final activation of your generator.
Check your loss function; high output values may be favoured by your loss objective.
Try training for more epochs; the model may learn to avoid clipping by itself.
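To check whether clipping is really the cause, you can measure how many output values fall outside the expected range before plotting. The sketch below assumes the generator is meant to produce values in [-1, 1] (for example via a final tanh, which is a common choice but not confirmed for your model); both helper names are made up for illustration.

```python
import numpy as np

def fraction_out_of_range(img, value_range=(-1.0, 1.0)):
    """Share of values that lie outside the range the generator should produce."""
    lo, hi = value_range
    img = np.asarray(img)
    return float(np.mean((img < lo) | (img > hi)))

def to_displayable(img, value_range=(-1.0, 1.0)):
    """Map value_range linearly to [0, 1] and clip whatever still falls outside."""
    lo, hi = value_range
    scaled = (np.asarray(img, dtype=np.float32) - lo) / (hi - lo)
    return np.clip(scaled, 0.0, 1.0)

# Toy "generated image": one value (1.5) overshoots the assumed range.
fake = np.array([[-1.0, 0.0], [1.0, 1.5]], dtype=np.float32)
overshoot = fraction_out_of_range(fake)
shown = to_displayable(fake)
```

If `overshoot` is clearly above zero for real generator outputs, the white spots are saturated pixels, and the final activation or output scaling is the first place to look.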

Direct Heatmap Regression with Fully Convolutional Nets

I'm trying to develop a fully-convolutional neural net to estimate the 2D locations of keypoints in images that contain renders of known 3D models. I've read plenty of literature on this subject (human pose estimation, model based estimation, graph networks for occluded objects with known structure) but no method I've seen thus far allows for estimating an arbitrary number of keypoints of different classes in an image. Every method I've seen is trained to output k heatmaps for k keypoint classes, with one keypoint per heatmap. In my case, I'd like to regress k heatmaps for k keypoint classes, with an arbitrary number of (non-overlapping) points per heatmap.
In this toy example, the network would output heatmaps around each visible location of an upper vertex for each shape. The cubes have 4 vertices on top, the extruded pentagons have 2, and the pyramids just have 1. Sometimes points are offscreen or occluded, and I don't wish to output heatmaps for occluded points.
The architecture is a 6-6 layer U-Net (as in this paper). The ground truth heatmaps are normal distributions centered around each keypoint. When training the network with a batch size of 5 and L2 loss, the network learns to never make an estimate whatsoever, just outputting blank images. Datatypes are converted properly and normalized from 0 to 1 for input and 0 to 255 for output. I'm not sure how to solve this. Are there any red flags with my general approach? I'll post code if there's no clear problem in general...
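For reference, one way such ground truth targets could be built, with any number of keypoints per class, is sketched below. This is an assumption-laden illustration rather than the code used above: the function name, the 64x64 size, sigma, the example coordinates, the peak value of 1, and merging blobs with a pixelwise maximum are all choices made for the sketch.

```python
import numpy as np

def render_heatmap(points, height, width, sigma=2.0):
    """One heatmap with a Gaussian blob (peak 1.0) at each (x, y) keypoint."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for px, py in points:
        blob = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
        # The maximum keeps nearby blobs from summing to values above 1.
        heatmap = np.maximum(heatmap, blob.astype(np.float32))
    return heatmap

# Hypothetical visible top vertices per class; an empty list gives a blank map.
keypoints_per_class = {
    "cube": [(10, 12), (20, 12), (10, 22), (20, 22)],
    "pentagon": [(40, 30), (46, 30)],
    "pyramid": [],
}
target = np.stack(
    [render_heatmap(pts, 64, 64) for pts in keypoints_per_class.values()]
)  # shape (3, 64, 64): one channel per keypoint class
```

With small sigma most target pixels are close to zero, so an all-blank prediction already scores a low L2 loss; checking the mean of `target` gives a quick sense of how strong that pull is.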

Calculate gradient of neural network

I am reading about adversarial images and breaking neural networks. I am trying to work through the article step by step, but due to my inexperience I am having a hard time understanding the following instructions.
At the moment, I have a logistic regression model for the MNIST data set. If you give an image, it will predict the number that it most likely is...
saver.restore(sess, "/tmp/model.ckpt")
# image of number 7
x_in = np.expand_dims(mnist.test.images[0], axis=0)
classification =, 1), feed_dict={x:x_in})
Now, the article states that in order to break this image, the first thing we need to do is get the gradient of the neural network. In other words, this will tell me the direction needed to make the image look more like a number 2 or 3, even though it is a 7.
The article states that this is relatively simple to do using back propagation. So you may define a function...
compute_gradient(image, intended_label)
...and this basically tells us what kind of shape the neural network is looking for at that point.
This may seem easy to implement to those more experienced but the logic evades me.
From the parameters of the function compute_gradient, I can see that you feed it an image and an array of labels where the value of the intended label is set to 1.
But I do not see how this is supposed to return the shape of the neural network.
Anyways, I want to understand how I should implement this back propagation algorithm to return the gradient of the neural network. If the answer is not very straightforward, I would like some step-by-step instructions as to how I may get my back propagation to work as the article suggests it should.
In other words, I do not need someone to just give me some code that I can copy but I want to understand how I may implement it as well.
Back propagation involves calculating the error in the network's output (the cost function) as a function of the inputs and the parameters of the network, then computing the partial derivative of the cost function with respect to each parameter. It's too complicated to explain in detail here, but this chapter from a free online book explains back propagation in its usual application as the process for training deep neural networks.
Generating images that fool a neural network simply involves extending this process one step further, beyond the input layer, to the image itself. Instead of adjusting the weights in the network slightly to reduce the error, we adjust the pixel values slightly to increase the error, or to reduce the error for the wrong class.
There's an easy (though computationally intensive) way to approximate the gradient with a technique from Calc 101: for a small enough e, df/dx is approximately (f(x + e) - f(x)) / e.
Similarly, to calculate the gradient with respect to an image with this technique, calculate how much the loss/cost changes after adding a small change to a single pixel, save that value as the approximate partial derivative with respect to that pixel, and repeat for each pixel.
Then the gradient with respect to the image is approximately:
[(cost(x1+e, x2, ..., xn) - cost(x1, x2, ..., xn)) / e,
 (cost(x1, x2+e, ..., xn) - cost(x1, x2, ..., xn)) / e,
 ...,
 (cost(x1, x2, ..., xn+e) - cost(x1, x2, ..., xn)) / e]
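The finite-difference recipe above can be written out for a softmax (logistic) regression model in NumPy. Everything here is a stand-in: `W`, `b` and `x` are random rather than the trained MNIST model, `cost` is assumed to be the cross-entropy for the intended label, and the final signed step of size 0.1 is just one simple way to use the gradient, not necessarily what the article does.

```python
import numpy as np

def cost(x, W, b, target):
    """Cross-entropy of a softmax-regression model for one flattened image x."""
    logits = x @ W + b
    logits = logits - logits.max()  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]

def numerical_gradient(x, W, b, target, e=1e-5):
    """Approximate d(cost)/d(pixel) by nudging one pixel at a time."""
    base = cost(x, W, b, target)
    grad = np.zeros_like(x)
    for i in range(x.size):
        bumped = x.copy()
        bumped[i] += e
        grad[i] = (cost(bumped, W, b, target) - base) / e
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((784, 10)) * 0.01  # stand-in for trained weights
b = np.zeros(10)
x = rng.random(784)                        # stand-in for a flattened 28x28 image

# Gradient of the cost for the *intended* (wrong) label, here 2.
grad = numerical_gradient(x, W, b, target=2)
# Step against the gradient so the model's cost for label 2 goes down.
x_adv = np.clip(x - 0.1 * np.sign(grad), 0.0, 1.0)
```

This needs one extra forward pass per pixel (784 here), which is why back propagation, which yields the same gradient in a single backward pass, is preferred in practice.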

Why a CNN learns different feature maps

I understand (and please correct me if my understanding is wrong) that the primary purpose of a CNN is to reduce the number of parameters from what you would need if you were to use a fully connected NN. A CNN achieves this by extracting "features" of images.
A CNN can do this because in a natural image there are small features, such as lines and elementary curves, that may occur in an "invariant" fashion and constitute the image much like elementary building blocks.
My question is this. Say we create 5 feature maps by sliding a 5x5 window over a 100x100 image. Initially, these feature maps are initialized as random weight matrices and must progressively adjust their weights with gradient descent, right? But if we get these feature maps by using windows of exactly the same size, sliding in exactly the same way (sharing the same starting point and the same stride value) over exactly the same image, how can these maps learn different features of the image? Won't they all come out the same, say, a line or a curve?
Is it due to the different initial values of the weight matrices? (I.e. some weight matrices are more receptive to learning a certain particular feature than others?)
Thanks!! I wrote my 4 questions/opinions and indexed them, for the ease of addressing them separately!
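One way to probe the role of the initial values is a small NumPy experiment: compute the gradient that each of 5 filters receives, once when all filters start identical and once when they start from different random values. The toy model (a tanh on each feature map, squared error on the sum of all outputs) and every name below are assumptions made for this sketch, not a real CNN.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_gradients(image, filters, target=0.0):
    """Gradient of (sum of all tanh feature maps - target)**2 w.r.t. each filter."""
    patches = sliding_window_view(image, filters.shape[1:])  # (96, 96, 5, 5)
    pre = np.einsum("ijkl,fkl->fij", patches, filters)       # stride-1 "valid" convolution
    maps = np.tanh(pre)                                      # one feature map per filter
    error = maps.sum() - target
    # Chain rule: d(error**2)/d(filter f) = 2 * error * sum(tanh'(pre_f) * patch)
    return 2.0 * error * np.einsum("fij,ijkl->fkl", 1.0 - maps ** 2, patches)

rng = np.random.default_rng(0)
image = rng.random((100, 100))

same = np.repeat(rng.standard_normal((1, 5, 5)) * 0.1, 5, axis=0)  # 5 identical filters
different = rng.standard_normal((5, 5, 5)) * 0.1                   # 5 distinct filters

g_same = filter_gradients(image, same)
g_diff = filter_gradients(image, different)
identical_stay_identical = bool(np.allclose(g_same[0], g_same[1]))
distinct_stay_distinct = not np.allclose(g_diff[0], g_diff[1])
```

In this toy setup, filters that start identical receive identical gradients and so remain copies of each other after every update, while filters that start from different values receive different gradients and can drift toward different features.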
