I have a GAN with a simple encoder-decoder as its base. I have tested different combinations of loss functions (adversarial, L1, MSE, and a perceptual loss using a pre-trained VGG16). The use case is filling in missing information (semantic inpainting) in a panorama. The network is based on the DeepFillv2 paper; see here: https://github.com/zhaoyuzhi/deepfillv2
However, my problem is that the more I prioritize the adversarial or perceptual loss, the more visible the artifacts become, as in the following images:
Structures are recognizable, but with these RGB shifts. Has anyone had this problem before? Does anyone have tips on what I could try? So far I have changed the normalization and tried different datasets.
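For reference, the loss weighting I am describing looks roughly like the sketch below. This is a hypothetical illustration in Keras/TensorFlow, not the DeepFillv2 code: the lambda weights, the choice of the block3_conv3 VGG16 feature layer, and the hinge-style adversarial term are my own assumptions.

```python
# Hypothetical sketch of combining the generator losses; the weights and the
# chosen VGG16 feature layer are illustrative, not taken from DeepFillv2.
import tensorflow as tf

vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(vgg.input, vgg.get_layer("block3_conv3").output)
feature_extractor.trainable = False

lambda_l1, lambda_perc, lambda_adv = 1.0, 0.1, 0.01   # relative loss weights

def generator_loss(real, fake, disc_fake_logits):
    # real/fake are image batches already preprocessed for VGG16
    l1_loss = tf.reduce_mean(tf.abs(real - fake))
    perceptual_loss = tf.reduce_mean(
        tf.square(feature_extractor(real) - feature_extractor(fake)))
    adversarial_loss = -tf.reduce_mean(disc_fake_logits)  # hinge-style G loss
    return (lambda_l1 * l1_loss
            + lambda_perc * perceptual_loss
            + lambda_adv * adversarial_loss)
```

Raising lambda_perc or lambda_adv relative to lambda_l1 is what tends to trade pixel-wise fidelity for sharper but more artifact-prone structure, which is the trade-off I am seeing.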
I'm using a Windows 10 machine. Libraries: Keras with TensorFlow 2.0. Embeddings: GloVe (100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I have tried different types of fine-tuning to achieve better results, but with no luck so far.
I believe the main problem is the imbalanced class distribution of my dataset, but after a lot of trial and error I couldn't implement a stratified k-fold split in Keras.
I am also experimenting with dropout layers, batch sizes, number of layers, learning rates, clip values, and validation splits, but I get only a minimal boost or sometimes even worse performance.
For metrics, I use mainly ROC and F1.
I also followed the suggestion of a StackOverflow member who said to delete some of my examples to balance my dataset, but if I do that I will be left with very few examples.
What would you suggest to me?
If someone could provide code for a stratified k-fold split based on my implementation, I would be grateful, because I have checked all the online resources but can't get it to work.
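For what it's worth, here is a minimal sketch of one way to do this with scikit-learn's StratifiedKFold around a Keras model. X, Y, and build_model() are placeholders for your padded sequences, multi-hot label matrix, and model constructor; stratifying on the label combination is just one simple workaround for multi-label data.

```python
# Hypothetical sketch: stratified k-fold cross-validation around a Keras model.
# Assumes X (padded token sequences) and Y (multi-hot label matrix) already
# exist, and that build_model() returns a compiled Keras LSTM like yours.
import numpy as np
from sklearn.model_selection import StratifiedKFold

# StratifiedKFold needs a single label per sample; for multi-label data one
# simple workaround is to stratify on the label *combination* (works when each
# distinct combination occurs at least n_splits times).
strat_key = np.array(["".join(map(str, row)) for row in Y.astype(int)])

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, strat_key)):
    model = build_model()  # your LSTM architecture
    model.fit(
        X[train_idx], Y[train_idx],
        validation_data=(X[val_idx], Y[val_idx]),
        epochs=10, batch_size=32, verbose=2,
    )
    print(f"Finished fold {fold}")
```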
Any tips or suggestions would be really helpful.
Attached screenshots: metrics plots; dataset, embeddings and train-test split setup; the dataset's label distribution; my LSTM implementation.
I am currently working on a fingers-count deep learning problem. When you look at the dataset, the images in the training and validation sets are very basic and almost all the same. The network achieves high training and validation accuracy, but when it comes to prediction on real-life images it performs very badly (presumably because the model has been trained on very basic images).
To overcome this, I converted the training and validation images to HSV (hue, saturation, value) and trained the model on the new HSV images. An example image from the new training set:
I then convert my real-life image to HSV as well and pass it to the model for prediction, but the model still fails to predict correctly. I assumed that since the training images and the prediction image are almost the same after the HSV conversion, the model should predict well. Is there something I am thinking about incorrectly here? Can HSV images actually be used for training a CNN?
It seems you have an overfitting issue: your model only memorizes the simple samples of the training set and, in contrast, cannot generalize to more complex and diverse data.
In deep learning there are various methods to avoid overfitting, and I don't think you necessarily need to transform your input to HSV. First of all, you can apply data augmentation such as random crops or rotations to create varied versions of your data (see the sketch below). If that does not work, you can use a smaller model or apply techniques such as dropout or regularization.
Here is a good tutorial from TensorFlow.
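To make the augmentation suggestion concrete, a minimal sketch with Keras preprocessing layers might look like this. The layer names assume TF 2.6+ (earlier 2.x versions keep them under tf.keras.layers.experimental.preprocessing), and NUM_CLASSES and the conv stack are placeholders, not your actual model.

```python
# Minimal sketch of data augmentation + a small CNN in Keras.
# NUM_CLASSES and the layer sizes are placeholders for illustration only.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 6  # e.g. finger counts 0-5

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # up to ~36 degrees either way
    layers.RandomZoom(0.1),
])

model = tf.keras.Sequential([
    augment,                                # applied only during training
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                    # regularization against memorization
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
```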
I've made a neural network and its architecture is as follows:
It has two branches that are merged. One branch takes matrices as input to a convolutional network, and the other branch is a fully connected layer that takes a vector as input. The two branches are merged and sent to a fully connected layer followed by an output layer. My network runs; however, I get the following graphs:
For accuracy:
For Loss:
I think my loss graph looks alright, but the accuracy curve fluctuates a lot. My overall accuracy is 60%. Do you think these graphs suggest under-fitting, or is this normal? Insights would be appreciated.
It is common to see fluctuations due to mini-batch training. A perfectly smooth loss curve (and a steady increase in accuracy) would only be obtained if the neural network were fed the entire dataset at every update, which is impossible from a computational viewpoint.
Whenever your training loss increases, the validation accuracy decreases correspondingly; the two curves move together, which is a good sign. The last graph, together with my previous observation, rules out overfitting (at least on the development set).
The graphs do not look out of the ordinary (except for the spikes that I have already mentioned; it is normal to have them during batch training).
It may or may not be a case of underfitting. If using more complex networks on both branches (or even on just one of them) gives you a better result, then it is indeed a case of underfitting.
However, this underfitting phenomenon has nothing to do with the spikes that you see on the graphs.
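If the spikes make the trend hard to read, you can also just smooth the recorded metrics before plotting. A small sketch, assuming a Keras-style history object (the exact key name depends on your compiled metrics, and the window size is arbitrary):

```python
# Smooth the noisy accuracy curve with a rolling mean to see the trend.
# "history" is the object returned by model.fit(...).
import pandas as pd
import matplotlib.pyplot as plt

acc = pd.Series(history.history["val_accuracy"])      # key name may differ
plt.plot(acc, alpha=0.4, label="raw val accuracy")
plt.plot(acc.rolling(window=5, min_periods=1).mean(), label="5-epoch rolling mean")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```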
Hope this helps you with your problem :)
I am a newbie to convolutional neural nets, so this may be an ignorant question.
I have followed many examples and tutorials on the MNIST example in TensorFlow. In the CNN examples, all the authors talk about using the 'input filters' to run in the CNN, but no one that I can find mentions WHERE they come from. Can anyone tell me where these come from? Or are they magically obtained from the input images?
Thanks! Chris
This is an image that one professor uses, but he does not explain whether he made the filters himself or whether TensorFlow somehow auto-extracts them.
Disclaimer: I am not an expert, more of an enthusiast.
To cut a long story short: filters are the CNN equivalent of weights, and all a neural network essentially does is learn their optimal values.
It does this by iterating through a training dataset, making predictions, comparing them to the label/value already assigned to each training unit (usually an image, in the case of a CNN), and adjusting the weights to minimize the error function (the difference between the predicted value and the actual value).
Initial values of the filters/weights do not matter that much; although they might affect the speed of convergence to a small degree, I believe they are usually just assigned random values.
It is the job of the neural network to figure out the optimal weights, not of the person implementing it.
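You can see this directly in code. A small sketch using TensorFlow/Keras (as in the MNIST tutorials); the layer sizes here are arbitrary:

```python
# The filters start out as random numbers and are then learned by training.
import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation="relu")
conv.build(input_shape=(None, 28, 28, 1))   # MNIST-sized grayscale input

kernels, biases = conv.get_weights()
print(kernels.shape)   # (3, 3, 1, 8): eight 3x3 filters, randomly initialized
# After model.fit(...), these same arrays hold the learned filter values.
```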
Main Problem
I cannot understand the Plot of the weights of a specific layer.
I used a method from nolearn: plot_conv_weights(layer, figsize=(6, 6))
I'm using Lasagne as my neural-network library.
The plot comes out fine, but I don't know how I should interpret it.
Neural Network Structure
The structure I'm using (a rough Lasagne sketch follows the list):
InputLayer 1x31x31
Conv2DLayer 20x3x3
Conv2DLayer 20x3x3
Conv2DLayer 20x3x3
MaxPool2DLayer 2x2
Conv2DLayer 40x3x3
Conv2DLayer 40x3x3
Conv2DLayer 40x3x3
MaxPool2DLayer 40x2x2
DropoutLayer
DenseLayer 96
DropoutLayer 96
DenseLayer 32
DropoutLayer 32
DenseLayer 1 as sigmoid
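For readers unfamiliar with the notation, the list above corresponds roughly to the following Lasagne code. Padding, nonlinearities, and dropout rates are my guesses, not the exact model.

```python
# Hypothetical reconstruction of the listed architecture in Lasagne,
# just to make the layer list above concrete (filter counts/sizes as listed).
from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                            DropoutLayer, DenseLayer)
from lasagne.nonlinearities import sigmoid

net = InputLayer(shape=(None, 1, 31, 31))
for _ in range(3):
    net = Conv2DLayer(net, num_filters=20, filter_size=(3, 3))
net = MaxPool2DLayer(net, pool_size=(2, 2))
for _ in range(3):
    net = Conv2DLayer(net, num_filters=40, filter_size=(3, 3))
net = MaxPool2DLayer(net, pool_size=(2, 2))
net = DropoutLayer(net)
net = DenseLayer(net, num_units=96)
net = DropoutLayer(net)
net = DenseLayer(net, num_units=32)
net = DropoutLayer(net)
net = DenseLayer(net, num_units=1, nonlinearity=sigmoid)
```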
Here are the weights of the first three layers:
About the Images
To me they look random, and I cannot interpret them!
However, CS231n says the following:
Conv/FC Filters. The second common strategy is to visualize the weights. These are usually most interpretable on the first CONV layer which is looking directly at the raw pixel data, but it is possible to also show the filter weights deeper in the network. The weights are useful to visualize because well-trained networks usually display nice and smooth filters without any noisy patterns. Noisy patterns can be an indicator of a network that hasn’t been trained for long enough, or possibly a very low regularization strength that may have led to overfitting.
http://cs231n.github.io/understanding-cnn/
Then why do mine look random?
The structure is trained and performs well for its task.
References
http://cs231n.github.io/understanding-cnn/
https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/visualize.py
Normally when you visualize the weights you want to check two things:

1. That they are smooth and cover a wide range of values, i.e. not just a bunch of 1's and 0's. That would mean the non-linearity is being saturated.
2. That they have some kind of structure. Normally you tend to see oriented edges, although this is more difficult to see when you have small filters like 3x3.
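If you want to inspect them outside of nolearn's helper, a minimal matplotlib sketch like the following works on the raw kernel array. Here W is assumed to have Lasagne's (num_filters, channels, rows, cols) layout, e.g. obtained via layer.W.get_value().

```python
# Minimal sketch: plot first-layer conv filters with matplotlib.
# W is assumed to be a numpy array of shape (num_filters, channels, rows, cols).
import numpy as np
import matplotlib.pyplot as plt

def plot_filters(W, cols=5):
    n = W.shape[0]
    rows = int(np.ceil(n / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(axes.ravel()):
        ax.axis("off")
        if i < n:
            # average over input channels so each filter is a single 2D image
            ax.imshow(W[i].mean(axis=0), cmap="gray", interpolation="nearest")
    plt.show()
```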
That being said, your weights do not appear to be saturated, but they do indeed seem too random.
During training, did the network converge correctly?
I am also surprised at how big your filters are (30x30). Not sure what you are trying to accomplish with that.