I have some questions about learning GANs, for example in the source code at the link below:
https://github.com/ImagingLab/Colorizing-with-GANs
In the file src/model, I want to know whether the generator and discriminator are trained at the same time, or only the generator, because I have read in some papers that training the two networks (discriminator and generator) simultaneously is not recommended. If they are trained at the same time in this code, which line can I change to disable discriminator training?
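In most GAN implementations the two networks are trained alternately within each iteration rather than literally at the same instant. Below is a hypothetical plain-Keras sketch of that alternating recipe, not the code from the linked repository; names like generator, discriminator and combined are illustrative. It shows where the discriminator updates happen and how they can be skipped or frozen.

```python
# Hypothetical sketch of alternating GAN training in Keras (illustrative names,
# not taken from the Colorizing-with-GANs repository).
import numpy as np
from tensorflow import keras

def build_combined(generator, discriminator, latent_dim):
    # Freeze the discriminator inside the combined model, so that
    # combined.train_on_batch(...) only updates the generator.
    discriminator.trainable = False
    z = keras.Input(shape=(latent_dim,))
    combined = keras.Model(z, discriminator(generator(z)))
    combined.compile(optimizer="adam", loss="binary_crossentropy")
    return combined

def train_step(generator, discriminator, combined, real_images, latent_dim):
    batch_size = real_images.shape[0]
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(noise)

    # Discriminator update: skipping (or commenting out) these two calls is
    # what effectively disables discriminator training.
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

    # Generator update through the frozen discriminator.
    g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
    return d_loss_real, d_loss_fake, g_loss
```

In a specific repository the equivalent of the discriminator's training call may look different (for example a separate optimizer or session run for the discriminator loss), but that is the operation to look for when you want to switch discriminator training off.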
I am training various autoencoder structures in Keras + TensorFlow for my master's thesis on anomaly detection. During my literature research on how to train autoencoders, I found various training schemes, including layer-wise pretraining and joint training. However, I have not been able to find any literature on how Keras's fit() handles the training process. Can anyone point me to any useful links?
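For what it's worth, model.fit() simply performs joint end-to-end training: every trainable weight in the compiled model is updated together by backpropagation, with no pretraining phase. Layer-wise pretraining has to be coded manually. The sketch below contrasts the two under toy assumptions (random placeholder data, arbitrary layer sizes):

```python
# Sketch: joint training via fit() vs. greedy layer-wise pretraining in Keras.
# Data and layer sizes are placeholders for illustration only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.rand(1000, 64).astype("float32")  # placeholder data

# Joint training: one deep autoencoder, all weights updated at once by fit().
joint_ae = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(64,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(64, activation="sigmoid"),
])
joint_ae.compile(optimizer="adam", loss="mse")
joint_ae.fit(x_train, x_train, epochs=5, batch_size=32)

# Greedy layer-wise pretraining: train one shallow autoencoder at a time,
# then feed its encoding to the next stage.
enc1 = layers.Dense(32, activation="relu", input_shape=(64,))
stage1 = keras.Sequential([enc1, layers.Dense(64, activation="sigmoid")])
stage1.compile(optimizer="adam", loss="mse")
stage1.fit(x_train, x_train, epochs=5, batch_size=32)

h1 = enc1(x_train).numpy()  # encoded representation from the first stage
enc2 = layers.Dense(16, activation="relu", input_shape=(32,))
stage2 = keras.Sequential([enc2, layers.Dense(32, activation="sigmoid")])
stage2.compile(optimizer="adam", loss="mse")
stage2.fit(h1, h1, epochs=5, batch_size=32)
```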
I am trying to understand BERT weight calculation. Please suggest some articles that can help me understand the internal workings of BERT. I have already read these articles from Medium:
https://towardsdatascience.com/deconstructing-bert-distilling-6-patterns-from-100-million-parameters-b49113672f77
https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1
I am doing a small project to understand BERT pretraining and fine-tuning on data from different sources. My idea is to calculate the weights on each source separately and then average all the weights to get a global model. This global model could then be fine-tuned on the individual sources.
How can I find these weights, and how can I average them across multiple sources?
Can I visualise them? If so, how?
Also, note that I am using the TensorFlow version of the BERT implementation and planning to fine-tune it for an NER task.
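Whether averaging separately trained BERT weights gives a useful global model is a modeling question in its own right (it is essentially federated averaging, and it only makes mechanical sense when every per-source model starts from the same checkpoint, architecture and vocabulary). Purely as a sketch of the averaging step, assuming each per-source model is a tf.keras model with an identical architecture, it could look like this; the names are illustrative:

```python
# Sketch of element-wise weight averaging across same-architecture models.
import numpy as np

def average_weights(source_models):
    # get_weights() returns the tensors in a fixed order, so same-architecture
    # models can be averaged tensor by tensor.
    weight_lists = [m.get_weights() for m in source_models]
    return [np.mean(tensors, axis=0) for tensors in zip(*weight_lists)]

# global_model must be built with the same architecture before loading:
# global_model.set_weights(average_weights([model_a, model_b, model_c]))
```

Individual weight tensors (or attention patterns, as in the linked articles) can then be inspected or visualised with standard tools such as matplotlib heatmaps or TensorBoard histograms.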
When generating adversarial examples, the logits are typically used as the output of the neural network, and the network is then trained with cross-entropy.
However, I found that the cleverhans tutorial uses log softmax, then converts the PyTorch model to a TensorFlow model, and finally trains the model.
https://github.com/tensorflow/cleverhans/blob/master/cleverhans_tutorials/mnist_tutorial_pytorch.py#L65
I am wondering whether using logits instead of log_softmax makes any difference.
As you said, when we get logits from a neural network, we train it with CrossEntropyLoss. The alternative is to compute the log_softmax and then train the network by minimizing the negative log-likelihood (NLLLoss).
The two approaches are equivalent for classification, because CrossEntropyLoss is exactly log_softmax followed by NLLLoss. If you have a different objective function, though, one of the two formulations may be more convenient in your scenario.
References:
- CrossEntropyLoss
- NLLLoss
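A quick PyTorch check of that equivalence (illustrative tensors only):

```python
# CrossEntropyLoss on raw logits equals NLLLoss on log_softmax outputs.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)              # batch of 4, 10 classes
targets = torch.randint(0, 10, (4,))

ce = torch.nn.CrossEntropyLoss()(logits, targets)
nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True
```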
Correct me if I'm wrong, but I saw an article somewhere saying that training a neural network with an unsupervised method before using it for supervised classification gives better results. I assume this is a kind of transfer learning. But if I want to train the LSTM without labels, I can't simply pass the data to Keras's fit() function, which expects targets. Any idea how to do this?
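One common workaround (a sketch under my own assumptions, not from any particular article) is to pretrain the LSTM as a sequence autoencoder: the input sequence itself is passed to fit() as the target, so no labels are needed, and the pretrained LSTM layer can later be reused in a supervised classifier.

```python
# Unsupervised pretraining of an LSTM as a sequence autoencoder in Keras.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 20, 8
x_unlabeled = np.random.rand(500, timesteps, features).astype("float32")  # placeholder data

inputs = keras.Input(shape=(timesteps, features))
encoded = layers.LSTM(32, name="pretrained_lstm")(inputs)       # layer to reuse later
repeated = layers.RepeatVector(timesteps)(encoded)
decoded = layers.LSTM(features, return_sequences=True)(repeated)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_unlabeled, x_unlabeled, epochs=5, batch_size=32)  # inputs double as targets

# Afterwards, build the supervised model and copy the weights of "pretrained_lstm".
```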
I have been following the http://deeplearning.net/tutorial/ tutorial on how to train an ANN to classify the MNIST digits. I am now at the "Convolutional Neural Networks" chapter. I want to run the trained network on single examples (MNIST images) and get its predictions. Is there a way to do that?
I have looked ahead in the tutorial and searched on Google but can't find anything.
Thanks a lot in advance for any kind of help!
The material in the earlier chapters of the Theano tutorial, before the Convolutional Neural Networks (CNN) chapter, gives a good overview of how Theano works and of the components the CNN sample code uses. It is reasonable to assume that readers reaching this point understand Theano well enough to modify the code and extract the model's predictions. Here are a few hints.
The CNN's output layer, called layer3, is an instance of the LogisticRegression class introduced in an earlier chapter.
The LogisticRegression class has an attribute called y_pred. The comment next to the code that assigns this attribute reads: "symbolic description of how to compute prediction as class whose probability is maximal".
Searching for the places where y_pred is used in the logistic regression sample will lead you to a function called predict(). It does for the logistic regression sample exactly what you want for the CNN example.
If you follow the same approach, compiling a new Theano function with layer3.y_pred as its output, you will get the model's predictions.
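Concretely, a minimal sketch could look like the following. It assumes the names from the tutorial code: x is the symbolic input matrix the network was built from, layer3 is the trained output layer, and test_set_x and batch_size are the tutorial's shared test data and batch size (the input is reshaped inside the graph with a fixed batch_size, so feed exactly that many rows).

```python
# Compile a prediction function from the trained CNN, mirroring predict()
# from the logistic regression chapter.
import theano

predict_model = theano.function(
    inputs=[x],
    outputs=layer3.y_pred,
)

# Feed a batch of flattened MNIST images with batch_size rows.
some_images = test_set_x.get_value()[:batch_size]
print(predict_model(some_images))
```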