I am trying to implement cyclegan. However, it looks like I always get white spots on my generated images even after 10 or 25 epochs. I am wondering what could be wrong? should I continue training and the problem would just go away? or is there any hint on how to solve this problem?
IMAGE
White spots are results from clipping your models output values that are too large when plotting the image.
In the Documentation of matplotlib.imshow() it says:
(...) an image with RGB values (0-1 float or 0-255 int)
(...) Out-of-range RGB(A) values are clipped.
Without knowing your architecture, I would:
review your activation functions within the model and the final activation of your generator.
check your loss function, probably high output values are favoured by your loss objectvive
try training for more epochs, probably the model learns to avoid clipping by itself.
Related
I have a tensor named input with dimensions 64x21x21. It is a minibatch of 64 images, each 21x21 pixels. I'd like to crop each image down to 11x11 pixels. So the output tensor I want would have dimensions 64x11x11.
I'd like to crop each image around a different "center pixel." The center pixels are given by a 2-dimensional long tensor named center with dimensions 64x2. For image i, center[i][0] gives the row index and center[i][1] gives the column index for the pixel that should be at the center in the output. We can assume that the center pixel is always at least 5 pixels away from the border.
Is there an efficient way to do this in pytorch (on the gpu)?
UPDATE: Let me clarify that the center tensor is formed by a deep neural network. It acts as a "hard attention mechanism," to use the reinforcement learning term for it. After I "crop" an image, that subimage becomes the input to another neural network. That's why I want to do the cropping in Pytorch: because the operations before and after the cropping are in Pytorch. I'd like to avoid having to transfer anything from the GPU back to the CPU.
I raised the question over on the pytorch forums, and got an answer there from smth. The grid_sample function should totally solve the problem.
https://discuss.pytorch.org/t/cropping-a-minibatch-of-images-each-image-a-bit-differently/12247
torchvision contains transforms including RandomCrop, but it doesn't seem to fit your use case if you want the images cropped in a specific way. I would recon that PyTorch, a deep learning framework, is not the appropriate tool for cropping images.
Instead, have a look at this tutorial that uses pillow. You should be able to implement your use case with this. Also have a look at pillow-simd which does some operations faster.
We see pretty pictures of error surface with a global minima and convergence of a neural network in many books. How can I visualize something similar in keras i.e containing error surface and how my model is converging to achieve global minimal error? Below is an example image of such illustrations. And this link has animated illustration of different optimizers. I explored tensorboard log callback for this purpose but could not find any such thing. A little guidance will be appreciated.
The pictures and animations are made for didatic purposes, but the error surface is completely unknown (or incredibly complex to be understood or visualized). That's the whole idea behind using gradient descent.
We only know, at a single point, the direction towards which the funcion increases, through getting the current gradient.
You could try to plot the way (line) you're following by getting the weights values at each iteration and the error, but then you'd face another problem: it's a massively multidimensional function. It's not actually a surface. The number of variables is the number of weights you have in the model (often thousands or even millions). This is absolutely impossible to visualize or even conceive as a visual thing.
To plot such a surface, you'd have to manually change all thousands of weights to get the error for each arrangement. Besides the "impossible to visualize" problem, this would be excessively time consuming.
I am reading about adversarial images and breaking neural networks. I am trying to work through the article step-by-step but do to my inexperience I am having a hard time trying to understand the following instructions.
At the moment, I have a logistic regression model for the MNIST data set. If you give an image, it will predict the number that it most likely is...
saver.restore(sess, "/tmp/model.ckpt")
# image of number 7
x_in = np.expand_dims(mnist.test.images[0], axis=0)
classification = sess.run(tf.argmax(pred, 1), feed_dict={x:x_in})
print(classification)
Now, the article states that in order to break this image, the first thing we need to do is get the gradient of the neural network. In other words, this will tell me the direction needed to make the image look more like a number 2 or 3, even though it is a 7.
The article states that this is relatively simple to do using back propagation. So you may define a function...
compute_gradient(image, intended_label)
...and this basically tells us what kind of shape the neural network is looking for at that point.
This may seem easy to implement to those more experienced but the logic evades me.
From the parameters of the function compute_gradient, I can see that you feed it an image and an array of labels where the value of the intended label is set to 1.
But I do not see how this is supposed to return the shape of the neural network.
Anyways, I want to understand how I should implement this back propagation algorithm to return the gradient of the neural network. If the answer is not very straightforward, I would like some step-by-step instructions as to how I may get my back propagation to work as the article suggests it should.
In other words, I do not need someone to just give me some code that I can copy but I want to understand how I may implement it as well.
Back propagation involves calculating the error in the network's output (the cost function) as a function of the inputs and the parameters of the network, then computing the partial derivative of the cost function with respect to each parameter. It's too complicated to explain in detail here, but this chapter from a free online book explains back propagation in its usual application as the process for training deep neural networks.
Generating images that fool a neural network simply involves extending this process one step further, beyond the input layer, to the image itself. Instead of adjusting the weights in the network slightly to reduce the error, we adjust the pixel values slightly to increase the error, or to reduce the error for the wrong class.
There's an easy (though computationally intensive) way to approximate the gradient with a technique from Calc 101: for a small enough e, df/dx is approximately (f(x + e) - f(x)) / e.
Similarly, to calculate the gradient with respect to an image with this technique, calculate how much the loss/cost changes after adding a small change to a single pixel, save that value as the approximate partial derivative with respect to that pixel, and repeat for each pixel.
Then the gradient with respect to the image is approximately:
(
(cost(x1+e, x2, ... xn) - cost(x1, x2, ... xn)) / e,
(cost(x1, x2+e, ... xn) - cost(x1, x2, ... xn)) / e,
.
.
.
(cost(x1, x2, ... xn+e) - cost(x1, x2, ... xn)) / e
)
I've had an interest for neural networks for a while now and have just started following the deep learning tutorials. I have what I hope is a relatively straight forward question that I am hoping someone may answer.
In the multilayer perception tutorial, I am interested in seeing the state of the network at different layers (something similar to what is seen in this paper: http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247 ). For instance, I am able to write out the weights of the hidden layer using:
W_open = open('mlp_w_pickle.pkl','w')
cPickle.dump(classifier.hiddenLayer.W.get_value(borrow=True), W_open, -1)
When I plot this using the utils.py tile plotting, I get the following pretty plot [edit: pretty plot rmoved as I dont have enough rep].
If I wanted to plot the weights at the logRegressionLayer, such that
cPickle.dump(classifier.logRegressionLayer.W.get_value(borrow=True), W_open, -1)
what would I actually have to do? The above doesn't seem to work - it returns a 2darray of shape (500,10). I understand that the 500 relates to the number of hidden units. The paragraph on the Miscellaneous page:
Plotting the weights is a bit more tricky. We have n_hidden hidden
units, each of them corresponding to a column of the weight matrix. A
column has the same shape as the visible, where the weight
corresponding to the connection with visible unit j is at position j.
Therefore, if we reshape every such column, using numpy.reshape, we
get a filter image that tells us how this hidden unit is influenced by
the input image.
confuses me alittle. I am unsure exactly how I would string it together.
Thanks to all - sorry if the question is confusing!
You could plot them just the like the weights in the first layer but they will not necessarily make much sense.
Consider the weights in the first layer of a neural network. If the inputs have size 784 (e.g. MNIST images) and there are 2000 hidden units in the first layer then the first layer weights are a matrix of size 784x2000 (or maybe the transpose depending on how it's implemented). Those weights can be plotted as either 784 patches of size 2000 or, more usually, 2000 patches of size 784. In this latter case each patch can be plotted as a 28x28 image which directly ties back to the original inputs and thus is interpretable.
For you higher level regression layer, you could plot 10 tiles, each of size 500 (e.g. patches of size 22x23 with some padding to make it rectangular), or 500 patches of size 10. Either might illustrate some patterns that are being found but it may be difficult to tie those patterns back to the original inputs.
I have been using libsvm. It produces some good results (95% on positives, 94% on negatives). When I examine the ones that it gets incorrect, however, I am confused about why it got them wrong. How do I determine what it is doing wrong? (More importantly, how do I explain it to my boss?). Some of the testing inputs it gets wrong are very close (visually) to some of the testing inputs it gets right.
About my problem: I am looking at images, 32x32 pixels, 8-bit greyscale. I am evaluating different feature detectors and using them as a dense representation (i.e. at every pixel) of the image. Hence, my feature length is often 1024; some of the feature detectors have multiple outputs, sometimes I do not use every pixel but every 3rd or 5th, etc.. It is a binary classification task, looking for figures in the image; for example, I am trying to find a square, with various letters for negatives. The SVM does well. But sometimes, it will classify a T as a square, and I don't know why. If I'm using probabilities, then sometimes the probability is quite high. What do I do to get an insight into what it is doing and why?