CUDA vs. DataParallel: Why the difference?

I have a simple neural network model, and I apply either cuda() or DataParallel() to the model as follows:
model = torch.nn.DataParallel(model).cuda()
OR,
model = model.cuda()
When I don't use DataParallel and simply move my model to the GPU with cuda(), I need to explicitly convert the batch inputs with cuda() before passing them to the model; otherwise it returns the following error:
torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor)
But with DataParallel, the code works fine. Everything else is the same. Why does this happen? Why, when I use DataParallel, don't I need to move the batch inputs to the GPU explicitly with cuda()?

Because DataParallel accepts CPU inputs: its first step is to scatter (transfer) the inputs to the appropriate GPUs.
Info source: https://discuss.pytorch.org/t/cuda-vs-dataparallel-why-the-difference/4062/3
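A minimal sketch of the practical consequence, assuming a toy nn.Linear model in place of the asker's network: without DataParallel, each batch has to be moved to the device that holds the model's parameters; with DataParallel, a CPU batch can be passed in directly because the wrapper scatters it to the GPUs. The helper name to_model_device is hypothetical, and the DataParallel branch only runs when a GPU is present.

```python
import torch
import torch.nn as nn


def to_model_device(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: assumes all parameters live on a single device.
    device = next(model.parameters()).device
    return batch.to(device)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
x = torch.randn(3, 4)  # a CPU batch, as a DataLoader typically yields

# Plain .cuda()/.to(device) model: the input must be moved explicitly.
out = model(to_model_device(model, x))

if torch.cuda.is_available():
    # DataParallel scatters the CPU batch across the visible GPUs itself,
    # so no explicit .cuda() on x is needed; outputs are gathered on the
    # first device in device_ids.
    dp_model = nn.DataParallel(model)
    dp_out = dp_model(x)
```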

Related

How can I extract all arguments I am passing to a TensorFlow function?

It is difficult to retrain my models on new data because I never remember my initial optimizer, loss function, and hyperparameters. How can I extract all the arguments I am passing to a TensorFlow function? For example, given the code below, how can I extract a list with the arguments learning_rate, beta_1, beta_2, and so on?
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
beta_1=0.9,beta_2=0.999,
epsilon=1e-07, amsgrad=False,
name="Adam")
I just want to extract the names so that I can later access them, for example:
optimizer.learning_rate
I have tried .keys() and .classes(), but nothing works. Of course I can inspect it using dir(optimizer), but the output is not filtered.

I just found a way. The drawback is that it requires compiling the model first. I will post it in case someone else has the same issue.
model.optimizer.get_config()
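As a sketch (assuming TensorFlow 2.x), get_config() can also be called on the optimizer object itself, before any model is compiled; it returns a plain dict of constructor arguments, and from_config rebuilds an equivalent optimizer. The exact set of keys varies between TensorFlow/Keras versions, so only the ones named in the question are read here.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
                                     beta_1=0.9, beta_2=0.999,
                                     epsilon=1e-07, amsgrad=False,
                                     name="Adam")

# Plain dict of hyperparameters; no compiled model is needed.
config = optimizer.get_config()
wanted = ["learning_rate", "beta_1", "beta_2", "epsilon", "amsgrad", "name"]
print({key: config[key] for key in wanted})

# Rebuild an optimizer with the same settings, e.g. when retraining later.
restored = tf.keras.optimizers.Adam.from_config(config)
```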

Build a pytorch model wrap around another pytorch model

Is it possible to wrap a PyTorch model inside another PyTorch module? I could not do it the usual way, as in transfer learning (simply stacking some more layers on top), because to get the intended value for the next 'layer', I need to wait for the last layer of the first module to generate multiple outputs (say 100) and then use all of those outputs to compute the value for the next 'layer' (say, by taking the max of those outputs). I tried to define the integrated model as follows:
class integrated(nn.Module):
    def __init__(self):
        super(integrated, self).__init__()

    def forward(self, x):
        device = torch.device('cpu')  # must be defined before .to(device) below
        model = VAE(
            encoder_layer_sizes=args.encoder_layer_sizes,
            latent_size=args.latent_size,
            decoder_layer_sizes=args.decoder_layer_sizes,
            conditional=args.conditional,
            num_labels=10 if args.conditional else 0).to(device)
        model.load_state_dict(torch.load(r'...'))  # the first model is saved somewhere else beforehand
        model.eval()
        temp = []
        for j in range(100):
            x = model(x)
            temp.append(x)
        y = max(temp)
        return y
The reason I want to do this is that the library I need to use requires the input itself to be a PyTorch module. Otherwise, I could simply leave the last part outside of the module.

Yes, you can definitely use a PyTorch module inside another PyTorch module. The way you are doing it in your example code is a bit unusual, though: external modules (VAE, in your case) are more often initialized in the __init__ function and then saved as attributes of the main module (integrated). Among other things, this avoids having to reload the sub-module every time you call forward.
One other thing that looks a bit funny is your for loop over repeated invocations of model(x). If there is no randomness involved in model's evaluation, then you would only need a single call to model(x), since all 100 calls will give the same value. So assuming there is some randomness, you should consider whether you can get the desired effect by batching together 100 copies of x and using a single call to model with this batched input. This ultimately depends on additional information about why you are calling this function multiple times on the same input, but either way, using a single batched evaluation will be a lot faster than using many unbatched evaluations.
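A minimal sketch of both suggestions, assuming the sampled model is stochastic: the wrapped module is created once in __init__ and registered as an attribute, and forward evaluates 100 copies of each input in a single batched call before taking an element-wise max. NoisyInner is a hypothetical stand-in for the pre-trained VAE (a linear layer plus Gaussian noise), and the element-wise max over samples is only one possible reading of "taking the max of those outputs".

```python
import torch
import torch.nn as nn


class NoisyInner(nn.Module):
    """Hypothetical stand-in for the pre-trained VAE: repeated calls on the
    same input give different outputs because of the added noise."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        out = self.linear(x)
        return out + torch.randn_like(out)


class Integrated(nn.Module):
    def __init__(self, inner: nn.Module, num_samples: int = 100):
        super().__init__()
        self.inner = inner  # registered as a submodule; created/loaded once
        self.num_samples = num_samples

    def forward(self, x):
        # x: (batch, ...). Repeat each example num_samples times.
        batch = x.shape[0]
        repeated = x.unsqueeze(1).expand(batch, self.num_samples, *x.shape[1:])
        flat = repeated.reshape(batch * self.num_samples, *x.shape[1:])
        out = self.inner(flat)  # one batched call instead of 100 separate calls
        out = out.reshape(batch, self.num_samples, *out.shape[1:])
        return out.max(dim=1).values  # element-wise max over the samples


inner = NoisyInner(8)
# Pre-trained weights would be loaded here once, e.g. inner.load_state_dict(...)
model = Integrated(inner).eval()
with torch.no_grad():
    y = model(torch.randn(4, 8))
```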

example of doing simple prediction with pytorch-lightning

I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in pytorch. I am trying to basically convert it to a pytorch lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning docs, I can do pretty much the same, except for the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So, my first question is whether this is the correct way to use Lightning. How would Lightning know whether it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes as a numpy array. Is that something that would be possible from the lightning module or do things always have to use some sort of a dataloader?
At some point, I want to extend this model implementation to do training as well, so I want to make sure I do it right. While most examples focus on training models, a simple example of just doing prediction at production time on a single image/data point would be useful.
I am using 0.7.5 with pytorch 1.4.0 on GPU with cuda 10.1

LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just a nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference, simply call the model instance. Here's a toy example you can use:
import torchvision.models as models
from pytorch_lightning.core import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(pretrained=True, progress=False)

    def forward(self, x):
        return self.resnet(x)

model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method, just do something like:
with torch.no_grad():  # no autograd bookkeeping is needed for inference
    for frame in video:
        img = transform(frame)
        img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
        output = model(img).cpu().numpy()
        # Do something with the output
The main benefit of PyTorch Lightning is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorch Lightning docs.

Even though the above answer suffices, note the following line:
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
One has to put both the model and the image on the right GPU. On a multi-GPU inference machine, this becomes a hassle.
To solve this, .predict was recently added; see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html

DC - Generative Adversarial Network. Issues in understanding code

This is part of the code for a Deep Convolutional Generative Adversarial Network (DC-GAN):
discriminator.trainable = False
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(inputs=ganInput, outputs=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
I do not understand what the line ganInput = Input(shape=(100,)) does.
Clearly ganInput is a variable, but what is Input? Is it a function?
If Input is a function, then what will ganInput contain?
Then ganInput is fed into the generator; since it is (I assume) an empty variable, this should not matter. Next, ganOutput catches the output of the discriminator.
Then comes the problem. I read about the Model API but I do not understand fully what it does.
To summarise, these are my questions: what is the role of ganInput, and what is Input in the second line? And what is Model, and what does it do?
Using Keras with TensorFlow backend
COMPLETE SOURCE CODE : https://github.com/yashk2810/DCGAN-Keras/blob/master/DCGAN.ipynb
Please ask for any more clarification or details required. If you know the answer to even one of my questions, please answer; it would be a huge help. Thanks

What is Input: Notice the wildcard import of keras.layers. In context, Input is keras.layers.Input. Generally, if you see a function or class in Python that wasn't defined or explicitly imported, it got there via a wildcard import, i.e.
from keras.layers import *
This imports every public name from keras.layers directly into the current namespace.
What is Model: The Model object is essentially the interface for building neural networks with Keras.
Since I'm not very familiar with Keras, I'd point you to the Model docs or this Model guide for details on Model and keras.layers.Input.
What's going on in the example is that they define generator and discriminator as Sequential models. But the GAN model is a little more complex than a plain Sequential. The authors deal with that by marking the data that needs to be fed in at every iteration (in this case, just the random noise for the generator, ganInput) as a keras.layers.Input. Then, as you said, ganOutput catches the output of the discriminator. Since two distinct Sequential models need to be wrapped together, the authors use the Model API.
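A minimal sketch of the same wiring in tf.keras, assuming tiny dense generator and discriminator networks in place of the convolutional ones from the linked notebook: Input creates a symbolic placeholder that holds no data, calling the two models on it records the graph D(G(input)), and Model(inputs=..., outputs=...) turns that graph into a trainable model. The sketch uses the inputs/outputs keyword names of current Keras.

```python
import numpy as np
from tensorflow.keras import Model, Sequential, layers

latent_dim = 100

# Hypothetical toy networks standing in for the DCGAN generator/discriminator.
generator = Sequential([layers.Input(shape=(latent_dim,)),
                        layers.Dense(784, activation="tanh")])
discriminator = Sequential([layers.Input(shape=(784,)),
                            layers.Dense(1, activation="sigmoid")])
discriminator.compile(loss="binary_crossentropy", optimizer="adam")

# Freeze D inside the combined model so that training `gan` only updates G.
discriminator.trainable = False

gan_input = layers.Input(shape=(latent_dim,))  # symbolic placeholder, no data yet
x = generator(gan_input)                       # symbolic G(input)
gan_output = discriminator(x)                  # symbolic D(G(input))
gan = Model(inputs=gan_input, outputs=gan_output)
gan.compile(loss="binary_crossentropy", optimizer="adam")

# Real data is only supplied at call time, e.g. a batch of random noise.
noise = np.random.normal(size=(2, latent_dim)).astype("float32")
fake_scores = gan.predict(noise, verbose=0)
```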

when do you use Input shape vs batch_shape in keras?

I can't find API documentation that explains Keras Input.
When should you use the shape argument vs. the batch_shape argument?

From the Keras source code:
Arguments
shape: A shape tuple (integer), not including the batch size.
For instance, `shape=(32,)` indicates that the expected input
will be batches of 32-dimensional vectors.
batch_shape: A shape tuple (integer), including the batch size.
For instance, `batch_shape=(10, 32)` indicates that
the expected input will be batches of 10 32-dimensional vectors.
`batch_shape=(None, 32)` indicates batches of an arbitrary number
of 32-dimensional vectors.
The batch size is how many examples are processed together in one batch, not the total number of examples in your training data.
You can use either. Personally, I have never used "batch_shape". When you use "shape", your batch can be any size, and you don't have to care about it.
shape=(32,) means exactly the same as batch_shape=(None,32)
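A quick check of that equivalence, as a sketch assuming tf.keras (whose Input accepts batch_shape as well as shape): the first two placeholders below end up with the same symbolic shape, and putting an integer instead of None in batch_shape fixes the batch size.

```python
from tensorflow.keras import layers

a = layers.Input(shape=(32,))             # batch dimension left open
b = layers.Input(batch_shape=(None, 32))  # same thing, spelled with batch_shape
c = layers.Input(batch_shape=(10, 32))    # batch size fixed to 10

print(tuple(a.shape), tuple(b.shape), tuple(c.shape))
```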

To expand on Daniel's answer, one case I've found where it's necessary to specify batch_shape instead of shape to an Input layer is when you are using stateful LSTMs in the functional API. It's described well in Phillipe Remy's blog. In short, the stateful mode allows you to keep the hidden state values in an LSTM across batches (they usually get reset every batch if the default stateful=False is set). That means it needs knowledge about the batch size in order to shape everything properly. If you don't do this, it yells at you:
ValueError: If a RNN is stateful, it needs to know its batch size. Specify the batch size of your input tensors:
- If using a Sequential model, specify the batch size by passing a `batch_input_shape` argument to your first layer.
- If using the functional API, specify the batch size by passing a `batch_shape` argument to your Input layer.
The second point is the relevant one here. If using LSTM with stateful=True in the functional API, you need to set batch_shape for your Input layers.
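A minimal sketch of that case, assuming tf.keras and made-up sizes (a batch of 4 sequences, 10 timesteps, 8 features): the stateful LSTM can be built because the Input layer fixes the batch size through batch_shape.

```python
import numpy as np
from tensorflow.keras import Model, layers

batch_size, timesteps, features = 4, 10, 8

# batch_shape fixes the batch dimension, which a stateful RNN requires.
inputs = layers.Input(batch_shape=(batch_size, timesteps, features))
outputs = layers.LSTM(16, stateful=True)(inputs)
model = Model(inputs=inputs, outputs=outputs)

# Data must be fed in batches of exactly batch_size sequences.
x = np.random.rand(batch_size, timesteps, features).astype("float32")
out = model.predict(x, batch_size=batch_size, verbose=0)
```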
