This is part of the code for a Deep Convolutional Generative Adversarial Network (DC-GAN):
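# freeze the discriminator's weights while the combined GAN model trains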
discriminator.trainable = False
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(inputs=ganInput, outputs=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
I do not understand what the line ganInput = Input(shape=(100,)) does.
Clearly ganInput is a variable, but what is Input? Is it a function?
If Input is a function, then what will ganInput contain?
Then ganInput is fed into the generator; since it is (I assume) an empty variable, it should not matter. Next, ganOutput catches the output of the discriminator.
Then comes the problem. I read about the Model API, but I do not fully understand what it does.
To summarise, these are my problems: what is the role of ganInput, and what is Input in the second line? And what is Model, and what is it doing?
Using Keras with TensorFlow backend
COMPLETE SOURCE CODE : https://github.com/yashk2810/DCGAN-Keras/blob/master/DCGAN.ipynb
Please ask for any more clarification or details required. If you know the answer to even one of my queries, please answer it; it would be a huge help. Thanks!
What is Input: Notice the wildcard import of keras.layers. In context, Input is keras.layers.Input. Generally, if you see a function or class in Python that wasn't defined or explicitly imported, it got there via a wildcard import, i.e.
from keras.layers import *
That means import everything from keras.layers directly into the workspace.
What is Model: The Model object is essentially the interface for making neural networks with Keras.
You can read about Model and keras.layers.Input in the Model docs or in this Model guide, since I'm not very familiar with Keras.
What's going on in the example is that they define generator and discriminator as Sequentials. But the GAN model is a little more complex than a standard old Sequential. The authors deal with that by marking the data that needs to be fed in at every iteration (in this case, just the random noise for the generator, ganInput) as a keras.layers.Input. Then, like you said, ganOutput catches the output of the discriminator. Since we have two distinct Sequentials that need to be wrapped together, the authors use the Model API.
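To make this concrete, here is a minimal standalone sketch (not the DCGAN itself; the layer sizes here are made up for illustration) of what Input returns and what Model does:
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(100,))  # a symbolic placeholder tensor, not actual data
hidden = Dense(32, activation='relu')(inp)  # layers are called on tensors
out = Dense(1, activation='sigmoid')(hidden)

# Model traces the graph of layers connecting inp to out and wraps it
# into an object you can compile, fit and call
model = Model(inputs=inp, outputs=out)
model.summary()
So ganInput is not an empty variable: it is a symbolic tensor recording the shape of the data that will be fed in later, and Model uses it to recover the full computation graph from input to output.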
Related
I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in PyTorch. I am trying to convert it to a PyTorch Lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can pretty much do the same, except without the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So, my first question is whether this is the correct way to use Lightning. How would Lightning know if it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes as a numpy array. Is that something that would be possible from the lightning module or do things always have to use some sort of a dataloader?
At some point, I want to extend this model implementation to do training as well, so I want to make sure I do it right. While most examples focus on training models, a simple example of just doing prediction at production time on a single image/data point might be useful.
I am using pytorch-lightning 0.7.5 with PyTorch 1.4.0 on GPU with CUDA 10.1.
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just a nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference, simply call the model instance. Here's a toy example you can use:
import torchvision.models as models
from pytorch_lightning.core import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)

model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method, just do something like:
for frame in video:
    img = transform(frame)
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
    output = model(img).data.cpu().numpy()
    # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs.
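For reference, a hedged sketch of what those three hooks could look like (my_dataset and the cross-entropy loss are placeholder assumptions; depending on the Lightning version, training_step may need to return a dict such as {'loss': loss} instead of a bare tensor):
import torch
from torch.utils.data import DataLoader
from pytorch_lightning.core import LightningModule

class MyModel(LightningModule):
    # __init__ and forward as in the example above

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        return DataLoader(my_dataset, batch_size=32, shuffle=True)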
Even though the above answer suffices, if one takes note of the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
one has to put both the model and the image on the right GPU. On a multi-GPU inference machine, this becomes a hassle.
To solve this, a .predict API was also introduced more recently; see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html
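A hedged sketch of that workflow on newer Lightning versions (predict_dataloader is an assumed DataLoader over your inputs; the Trainer handles device placement for you):
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    # __init__ and forward as before

    def predict_step(self, batch, batch_idx):
        return self(batch)

model = MyModel()
trainer = pl.Trainer(accelerator='gpu', devices=1)
predictions = trainer.predict(model, dataloaders=predict_dataloader)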
I am trying to approach a multi-label image classification problem, for which I have image data, but I also have some other features like gender etc. The issue is that I will not get this extra information during testing; in other words, during testing only the image information will be provided.
My question is: how can I use these extra features to help my image model, which is a convolutional neural network, even though I won't have this info during testing?
Any advice will be helpful. Thanks in advance.
This is a really open-ended question. I can give you some general guidelines on how this can work.
The Keras Model API supports multiple inputs as well as merge layers. For example, you can have something like this:
from keras.layers import Input
from keras.layers.merge import Concatenate
from keras.models import Model

image = Input(...)
text = Input(...)
... # apply layers onto image and text
combined = Concatenate()([image, text])
... # apply layers onto combined
model = Model([image, text], [combined])
This way you can have a model that takes multiple inputs and can make use of all of your data sources. Keras has tools to combine your different inputs to produce one output. The part where this becomes open-ended is the architecture.
Right now you should probably pass the image through a CNN and then merge the output with the text. You have to tweak the exact specifications, such as how you handle each input, your merge method, and how you handle the combined output.
A good example of merge being used is here, where a GAN is given latent noise in the form of an image but also a label to determine what kind of image it should generate. Both the discriminator and the generator make use of the multiply merge layer to combine their inputs.
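To make the skeleton above concrete, here is a hedged sketch with made-up shapes (a 64x64 RGB image plus 10 auxiliary features and 5 output labels are all placeholder assumptions you would adapt):
from keras.layers import Input, Conv2D, Flatten, Dense, Concatenate
from keras.models import Model

image_in = Input(shape=(64, 64, 3))
aux_in = Input(shape=(10,))

x = Conv2D(32, (3, 3), activation='relu')(image_in)  # CNN branch for the image
x = Flatten()(x)

combined = Concatenate()([x, aux_in])  # merge image features with the extras
out = Dense(5, activation='sigmoid')(combined)  # sigmoid for multi-label output

model = Model(inputs=[image_in, aux_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit([image_data, aux_data], labels)  # both inputs are needed at fit time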
I usually get to feature importance using
from xgboost import XGBClassifier

regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is <class 'xgboost.sklearn.XGBClassifier'>.
However, I have a pickled XGBoost model which, when unpickled, returns an object of type <class 'xgboost.core.Booster'>. This is the same object as if I had run regr.get_booster().
I have found a few solutions for getting variable importance from a Booster object, but is there a way to get to the classifier object from the Booster object so I can just apply the same feature_importances_ command? This seems like the most straightforward solution; otherwise it seems like I would have to write a function that mimics the output of feature_importances_ so that it fits my feature-importance logging.
So ideally I'd have something like
import pickle

xgb_booster = pickle.load(open("xgboost-model", "rb"))
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()  # hypothetical method; this is what I wish existed
xgb_classifier.feature_importances_
Are there any limitations to what can be done with a Booster object in terms of finding the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now.
Also, for context: the pickled model is the output from AWS SageMaker, so I'm just unpacking it to do some further evaluation.
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker, I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
Crazy things you can do however:
You can create a classifier object and then over-ride the booster within it:
xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Booster = booster
This is nearly useless unless you fit it; otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly. Then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but is not truly a rehydrated classifier object from the SageMaker output alone.
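Roughly, that second approach might look like this hedged sketch (xgboost_params and the file names are assumptions, and _Booster is a private attribute, so this can break across xgboost versions):
import pickle
import xgboost as xgb

# Save a bare classifier configured with the original training parameters
clf = xgb.XGBClassifier(**xgboost_params)
pickle.dump(clf, open("clf-skeleton.pkl", "wb"))

# Later: load the skeleton and graft the SageMaker booster into it
clf = pickle.load(open("clf-skeleton.pkl", "rb"))
clf._Booster = pickle.load(open("xgboost-model", "rb"))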
Recommendation
If you’re not stuck using the SageMaker training solution you can certainly use XGBoost directly to train with. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance so I hope this gets you closer, I had a different use case and was ultimately able to leverage the booster for what I needed.
I was able to get an xgboost.XGBClassifier model virtually identical to an xgboost.Booster model by:
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) setting these same tuning parameters and then training an XGBClassifier model using the same training dataset that was used to train the Booster model.
Note: one mistake I made was that I forgot to explicitly assign the same seed/random_state in both the Booster and the Classifier versions.
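Put together, the workflow might look like this hedged sketch (booster, X_train and y_train are assumed to exist, and the parameter values shown are placeholders you would read off from the printed config):
import json
from xgboost import XGBClassifier

# Inspect the booster's configuration to recover its tuning parameters
config = json.loads(booster.save_config())
print(json.dumps(config, indent=2))  # read off eta, max_depth, etc.

# Retrain a classifier with the same parameters on the same data
clf = XGBClassifier(max_depth=6, learning_rate=0.3, n_estimators=100,
                    random_state=42)
clf.fit(X_train, y_train)
print(clf.feature_importances_)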
There are two sets of very similar code below, with a very simple input, as an illustrative example for my question. I think an explanation of the following observation can somehow answer my question. Thanks!
When I run the following code, the model can be trained quickly and can predict good results.
import tensorflow as tf
import numpy as np
from tensorflow import keras

xs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
ys = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550], dtype=float)

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(xs, ys, epochs=1000)
print(model.predict([7.0]))
However, when I run the following code, which is very similar to the one above, the model trains very slowly and may not be well trained, giving bad predictions (i.e. the loss drops below 1 easily with the code above but stays at around 20000 with the code below):
model = keras.Sequential()
model.add(keras.layers.Dense(2, activation='relu', input_shape=(1,)))
model.add(keras.layers.Dense(1))
model.compile(optimizer=tf.train.AdamOptimizer(1), loss='mean_squared_error')

xs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
ys = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550], dtype=float)

model.fit(xs, ys, epochs=1000)
print(model.predict([7.0]))
One more note: when I train my model with the second set of code, the model is only occasionally well trained (in roughly 8 out of 10 runs it is not, and the loss remains above 10000 after 1000 epochs).
I don't think there is any direct way to choose the best deep architecture other than running multiple experiments, varying the hyper-parameters and changing the architecture, then comparing the performance of each experiment and choosing the best one. There are a few articles listed below which may be helpful for you, and a minimal sweep sketch follows them.
link-1, link-2, link-3
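For instance, a hedged sketch of a manual sweep over learning rate and layer width for the toy problem above (the grids and epoch count are arbitrary illustrative choices):
import numpy as np
from tensorflow import keras

xs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
ys = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550], dtype=float)

results = {}
for lr in [1.0, 0.1, 0.01]:
    for units in [1, 2, 8]:
        model = keras.Sequential([
            keras.layers.Dense(units, activation='relu', input_shape=(1,)),
            keras.layers.Dense(1),
        ])
        model.compile(optimizer=keras.optimizers.Adam(lr),
                      loss='mean_squared_error')
        history = model.fit(xs, ys, epochs=500, verbose=0)
        results[(lr, units)] = history.history['loss'][-1]

# best (learning rate, width) pairs by final loss
print(sorted(results.items(), key=lambda kv: kv[1])[:3])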
I am trying to merge two networks. I can accomplish this by doing the following:
merged = Merge([CNN_Model, RNN_Model], mode='concat')
But I get a warning:
merged = Merge([CNN_Model, RNN_Model], mode='concat')
__main__:1: UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
So I tried this:
merged = Concatenate([CNN_Model, RNN_Model])
model = Sequential()
model.add(merged)
and got this error:
ValueError: The first layer in a Sequential model must get an `input_shape` or `batch_input_shape` argument.
Can anyone show me the syntax to get this to work?
Don't use sequential models for models with branches.
Use the Functional API:
from keras.models import Model
You're right in using the Concatenate layer, but you must pass "tensors" to it. First you create the layer, then you call it with input tensors (that's why there are two pairs of parentheses):
concatOut = Concatenate()([CNN_Model.output,RNN_Model.output])
For creating a model out of that, you need to define the path from inputs to outputs:
model = Model([CNN_Model.input, RNN_Model.input], concatOut)
This answer assumes your existing models have only one input and output each.
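An end-to-end hedged sketch of this pattern (the two Sequentials here are trivial stand-ins for your CNN_Model and RNN_Model, and the Dense head on top of the concatenation is an assumption):
from keras.layers import Dense, Concatenate
from keras.models import Model, Sequential

CNN_Model = Sequential([Dense(16, activation='relu', input_shape=(32,))])
RNN_Model = Sequential([Dense(16, activation='relu', input_shape=(8,))])

concatOut = Concatenate()([CNN_Model.output, RNN_Model.output])
predictions = Dense(1, activation='sigmoid')(concatOut)  # head on top of the merge

model = Model([CNN_Model.input, RNN_Model.input], predictions)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()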