On 18 May 2022, PyTorch announced support for GPU-accelerated PyTorch training on Mac.
I used the following process to set up PyTorch on my MacBook Air M1 (using miniconda):
$ conda create -n torch-nightly python=3.8
$ conda activate torch-nightly
$ pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
I am trying to execute a script from Udacity's Deep Learning Course available here.
The script moves the models to GPU using the following code:
G.cuda()
D.cuda()
However, this will not work on M1 chips, since Apple Silicon has no CUDA support.
If we want to move our models and tensors to the M1 GPU and train entirely on it, what should we be doing?
If relevant: G and D are the Generator and Discriminator of a GAN.
class Discriminator(nn.Module):

    def __init__(self, conv_dim=32):
        super(Discriminator, self).__init__()
        self.conv_dim = conv_dim
        # conv() is a helper from the course notebook: Conv2d + optional batch norm
        self.cv1 = conv(in_channels=3, out_channels=conv_dim, kernel_size=4, stride=2, padding=1, batch_norm=False)           # 32*32*3 -> 16*16*32
        self.cv2 = conv(in_channels=conv_dim, out_channels=conv_dim*2, kernel_size=4, stride=2, padding=1, batch_norm=True)   # 16*16*32 -> 8*8*64
        self.cv3 = conv(in_channels=conv_dim*2, out_channels=conv_dim*4, kernel_size=4, stride=2, padding=1, batch_norm=True) # 8*8*64 -> 4*4*128
        self.fc1 = nn.Linear(in_features=4*4*conv_dim*4, out_features=1, bias=True)

    def forward(self, x):
        out = F.leaky_relu(self.cv1(x), 0.2)
        out = F.leaky_relu(self.cv2(out), 0.2)
        out = F.leaky_relu(self.cv3(out), 0.2)
        out = out.view(-1, 4*4*self.conv_dim*4)  # flatten the 4x4 feature map
        out = self.fc1(out)
        return out
D = Discriminator(conv_dim)
class Generator(nn.Module):

    def __init__(self, z_size, conv_dim=32):
        super(Generator, self).__init__()
        self.conv_dim = conv_dim
        self.z_size = z_size
        # deconv() is a helper from the course notebook: ConvTranspose2d + optional batch norm
        self.fc1 = nn.Linear(in_features=z_size, out_features=4*4*conv_dim*4)
        self.dc1 = deconv(in_channels=conv_dim*4, out_channels=conv_dim*2, kernel_size=4, stride=2, padding=1, batch_norm=True)
        self.dc2 = deconv(in_channels=conv_dim*2, out_channels=conv_dim, kernel_size=4, stride=2, padding=1, batch_norm=True)
        self.dc3 = deconv(in_channels=conv_dim, out_channels=3, kernel_size=4, stride=2, padding=1, batch_norm=False)

    def forward(self, x):
        x = self.fc1(x)
        x = x.view(-1, self.conv_dim*4, 4, 4)  # reshape to a 4x4 feature map
        x = F.relu(self.dc1(x))
        x = F.relu(self.dc2(x))
        x = torch.tanh(self.dc3(x))  # F.tanh is deprecated
        return x
G = Generator(z_size=z_size, conv_dim=conv_dim)
This is what I used:
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    G.to(mps_device)
    D.to(mps_device)
Similarly, for every tensor that I want to move to the M1 GPU, I used:
tensor_ = tensor_.to(mps_device)
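For completeness, here is a device-agnostic pattern (a sketch of my own, not from the original post) that prefers MPS and falls back to CUDA or the CPU:

import torch

# Pick the best available device: MPS on Apple Silicon, else CUDA, else CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

G = G.to(device)
D = D.to(device)
tensor_ = tensor_.to(device)  # the same pattern works for any tensor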
Some operations are not yet implemented for MPS, and we might need to set an environment variable to use the CPU fallback instead. One error that I faced while executing the script was:
NotImplementedError: The operator 'aten::_slow_conv2d_forward' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
To solve it, I set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1:
conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1
conda activate <test-env>
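Alternatively (an assumption on my part, not something from the original post), the variable can be set from inside the script itself, as long as this happens before torch is imported:

import os

# Assumption: the variable must be in the environment before `import torch`.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # imported only after the variable is set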
References:
https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/
https://pytorch.org/docs/master/notes/mps.html
https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#setting-environment-variables
I'd like to add to the answer above: make sure you're using a native arm64 build of Python (3.9.x) on M1 when installing the MPS build. If you're on conda, you can check whether x86 or arm64 Python is running with:
import platform
print(platform.platform())
The two errors I encountered were:
RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: mps
and
AttributeError: module 'torch.backends' has no attribute 'mps'
This is because even though I had installed the required PyTorch version, I was still running x86 Python.
To fix these, do:
conda create -n py39_native python=3.9 -c conda-forge --override-channels
conda activate py39_native
conda config --env --set subdir osx-arm64
That works for me, although PyTorch on MPS is still very new and buggy; I hope it improves soon.
Related
When I finished my semantic segmentation training task (PyTorch 0.4.1, GPU, CUDA 9.0) and succeeded in running inference with the model (PyTorch 0.4.1), switching my PyTorch version to 1.1.0 gave a slightly different result. What's the problem?
I found where the difference exists using just one Conv2d layer. In PyTorch 0.4.1, the output of nn.Conv2d and the output of the formula are always the same, but sometimes they differ in PyTorch 1.1. I'm confused!
import torch
import torch.nn as nn

torch.set_printoptions(precision=64)

input_t = torch.randn((3, 3))
input_t = input_t.unsqueeze(0).unsqueeze(0).float()  # shape (1, 1, 3, 3)

class minimodel(nn.Module):
    def __init__(self):
        super(minimodel, self).__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

demo_model = minimodel()
weight = torch.load("model.ckpt")
demo_model.load_state_dict(weight)
demo_model.eval()
output_t = demo_model(input_t)
# torch.save(demo_model.state_dict(), "model.ckpt")
print("#####output of nn.Conv2d#####")
print(output_t)

Kernel_W = weight['conv.weight']
print("####output of formula######")
print(torch.add(weight['conv.bias'], torch.sum(torch.mul(Kernel_W, input_t))))
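An aside not in the original post: discrepancies of this size are usually floating-point rounding. Different PyTorch versions can select different convolution algorithms whose summation order differs, and floating-point addition is not associative:

# Floating-point addition is not associative, so summation order matters:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6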
I tried to run a deep learning application and got a `module 'tensorflow' has no attribute 'random_uniform'` error. On the CPU the code works fine, but it is really slow; to run it on the GPU I needed to change some definitions. Here is my code below. Any ideas?
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
# PrimaryCap, CapsuleLayer, Length, and Mask are custom layers from the
# CapsNet reference implementation (not shown here).

def CapsNet(input_shape, n_class, routings):
    x = tf.keras.layers.Input(shape=input_shape)

    # Layer 1: Just a conventional Conv2D layer
    conv1 = tf.keras.layers.Convolution2D(filters=256, kernel_size=9, strides=1, padding='valid', activation='relu', name='conv1')(x)

    # Layer 2: Conv2D layer with `squash` activation, then reshape to [None, num_capsule, dim_capsule]
    primarycaps = PrimaryCap(conv1, dim_capsule=8, n_channels=32, kernel_size=9, strides=2, padding='valid')

    # Layer 3: Capsule layer. Routing algorithm works here.
    digitcaps = CapsuleLayer(num_capsule=n_class, dim_capsule=16, routings=routings, name='digitcaps')(primarycaps)

    # Layer 4: An auxiliary layer to replace each capsule with its length, to match the true label's shape.
    # If using tensorflow, this will not be necessary. :)
    out_caps = Length(name='capsnet')(digitcaps)

    # Decoder network.
    y = tf.keras.layers.Input(shape=(n_class,))
    masked_by_y = Mask()([digitcaps, y])  # The true label masks the capsule layer output. For training
    masked = Mask()(digitcaps)            # Mask using the capsule with maximal length. For prediction

    # Shared decoder model in training and prediction
    decoder = tf.keras.models.Sequential(name='decoder')
    decoder.add(tf.keras.layers.Dense(512, activation='relu', input_dim=16*n_class))
    decoder.add(tf.keras.layers.Dense(1024, activation='relu'))
    decoder.add(tf.keras.layers.Dense(np.prod(input_shape), activation='sigmoid'))
    decoder.add(tf.keras.layers.Reshape(target_shape=input_shape, name='out_recon'))

    # Models for training and evaluation (prediction)
    train_model = tf.keras.models.Model([x, y], [out_caps, decoder(masked_by_y)])
    eval_model = tf.keras.models.Model(x, [out_caps, decoder(masked)])

    # Manipulate model
    noise = tf.keras.layers.Input(shape=(n_class, 16))
    noised_digitcaps = tf.keras.layers.Add()([digitcaps, noise])
    masked_noised_y = Mask()([noised_digitcaps, y])
    manipulate_model = tf.keras.models.Model([x, y, noise], decoder(masked_noised_y))

    return train_model, eval_model, manipulate_model

def margin_loss(y_true, y_pred):
    L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + \
        0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))
    return K.mean(K.sum(L, 1))

model, eval_model, manipulate_model = CapsNet(input_shape=train_x_temp.shape[1:], n_class=len(np.unique(np.argmax(train_y, 1))), routings=3)
The problem lies with your TensorFlow installation, to be exact, your Python tensorflow library. Make sure you reinstall the package correctly; with Anaconda you may need to install it with administrator rights.
Or, if you have the newest version, you need to use the renamed op, e.g.:
tf.random.uniform(shape=(3, 3))
For more information, see the documentation: https://www.tensorflow.org/api_docs/python/tf/random/uniform
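A short sketch of both options (assuming TensorFlow 2.x; the exact call sites depend on your code):

import tensorflow as tf

# Option 1: the renamed op in TF 2.x
x = tf.random.uniform(shape=(3, 3), minval=0.0, maxval=1.0)

# Option 2: the v1 compatibility alias, if you cannot change every call site
y = tf.compat.v1.random_uniform(shape=(3, 3))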
When I convert a network with a bilinear upsampling layer trained in PyTorch to ONNX, I get the following error:
RuntimeError: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from test.onnx failed: Type Error: Type 'tensor(int64)' of input parameter (11) of operator (Floor) in node () is invalid.
I'm not sure why this error occurs. I tried building ONNX from source, but the issue still doesn't go away.
Any ideas on what might cause this error, or how to tackle it?
Way to reproduce:
from torch import nn
import torch
import torch.nn.functional as F
import onnxruntime as rt

class Upsample(torch.nn.Module):
    def forward(self, x):
        # l = nn.Conv2d(3, 3, kernel_size=1, stride=1, padding=1, bias=True)
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

m = Upsample()
v = torch.randn(1, 3, 128, 128, dtype=torch.float32, requires_grad=False)
torch.onnx.export(m, v, "test.onnx")
sess = rt.InferenceSession("test.onnx")
This error has been fixed in https://github.com/pytorch/pytorch/pull/21434 (the fix is in functional.py), so you should be able to pick it up by installing PyTorch's nightly build.
However, in the same PR, converting Upsample in bilinear mode has been disabled; the reason is that PyTorch's bilinear mode does not align with ONNX's, and nearest mode is the only mode currently supported.
Upsample (now called Resize) in ONNX is being updated in opset 11 to support a bilinear mode that aligns with PyTorch in https://github.com/onnx/onnx/pull/2057, but this is not yet pushed.
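Until then, one possible workaround (a sketch based on the limitation described above, not an official recipe) is to export with nearest-neighbour upsampling, the one mode that is supported:

import torch
import torch.nn.functional as F
import onnxruntime as rt

class UpsampleNearest(torch.nn.Module):
    def forward(self, x):
        # nearest is the only interpolation mode the exporter currently supports
        return F.interpolate(x, scale_factor=2, mode="nearest")

m = UpsampleNearest()
v = torch.randn(1, 3, 128, 128, dtype=torch.float32)
torch.onnx.export(m, v, "test_nearest.onnx")
sess = rt.InferenceSession("test_nearest.onnx")  # should load, since nearest is supported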
I am a beginner and am trying to implement AlexNet for image classification. The PyTorch implementation of AlexNet is as follows:
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
However, I am trying to adapt the network to an input size of (3, 448, 224) with 8 classes.
I have no idea how to change x.view in the forward method, or how many layers I should drop to get optimal performance. Please help.
As stated in https://github.com/pytorch/vision/releases:
Most of the pretrained models provided in torchvision (the newest version) have already added self.avgpool = nn.AdaptiveAvgPool2d((size, size)) to resolve the incompatibility with different input sizes, so you don't have to worry about it much.
Below is the code, very short.
import torchvision
import torch.nn as nn

num_classes = 8
model = torchvision.models.alexnet(pretrained=True)

# replace the last classifier
model.classifier[6] = nn.Linear(4096, num_classes)

# now you can train it with your dataset of size (3, 448, 224)
Transfer learning
There are two popular ways to do transfer learning. Suppose we trained a model M on a very large dataset D_large, and we now want to transfer the "knowledge" learned by M to a new model, M', trained on another dataset D_other (smaller than D_large):

Use (most) parts of M as the architecture of our new M' and initialize those parts with the weights trained on D_large. We then train M' on D_other and let it adjust the weights of those parts taken from M to find the optimal weights for the new dataset. This is usually referred to as fine-tuning the model M'.

Same as the above method, except that before training M' we freeze all the parameters of those parts. In both cases, the parts taken from M are mostly the first components of M' (the base), but in this second case we treat those parts of M as a fixed feature extractor for the input dataset. The accuracy obtained by the two methods may differ a little, but freezing guarantees the model doesn't overfit the small dataset, which is a good point in terms of accuracy. On the other hand, when we freeze the weights of M, we don't need to store the intermediate values (the hidden outputs of each hidden layer) in the forward pass, nor compute the gradients during the backward pass. This improves training speed and reduces the memory required during training.
The implementation
Along with AlexNet, many models pretrained on ImageNet, such as ResNet and VGG, are already provided by the Facebook team.
To best fit your requirements in terms of model size, VGG11 and the smallest ResNet are nice choices, as they have the fewest parameters in their model families.
I just pick VGG11 as an example:
Obtain a pretrained model from torchvision.
Freeze all the parameters of this model.
Replace the last layer in the model with your new Linear layer to perform your classification. This means that you can reuse almost everything from M in M'.
import torchvision
import torch.nn as nn

# obtain the pretrained model
model = torchvision.models.vgg11(pretrained=True)

# freeze the params
for param in model.parameters():
    param.requires_grad = False

# replace the classifier
num_classes = 8
model.classifier[6] = nn.Linear(in_features=4096, out_features=num_classes)

# start training with your dataset
Warnings
In old torchvision versions, there is no self.avgpool = nn.AdaptiveAvgPool2d((size, size)), which makes it harder to train on an input size different from the [3, 224, 224] used for training on ImageNet. You can work around it with a little effort, as below:
class OurVGG11(nn.Module):
    def __init__(self, num_classes=8):
        super(OurVGG11, self).__init__()
        self.vgg11 = torchvision.models.vgg11(pretrained=True)
        for param in self.vgg11.parameters():
            param.requires_grad = False
        # Add an avgpool here
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # Replace the classifier layer
        self.vgg11.classifier[-1] = nn.Linear(4096, num_classes)

    def forward(self, x):
        x = self.vgg11.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 512 * 7 * 7)
        x = self.vgg11.classifier(x)
        return x

model = OurVGG11()
# now start training `model` on our dataset.
Try out different models in torchvision.models.
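The same recipe works for other torchvision models. A sketch (my own adaptation, not part of the original answer) for ResNet18, whose final layer is named fc rather than classifier:

import torchvision
import torch.nn as nn

resnet = torchvision.models.resnet18(pretrained=True)

# freeze the backbone
for param in resnet.parameters():
    param.requires_grad = False

# ResNet exposes its last layer as `fc` instead of `classifier`
num_classes = 8
resnet.fc = nn.Linear(in_features=resnet.fc.in_features, out_features=num_classes)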
The following is a feed-forward network using the nn.functional module in PyTorch:
import torch.nn as nn
import torch.nn.functional as F

class newNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x

model = newNetwork()
model
The following uses the nn.Sequential module to essentially build the same thing. What is the difference between the two, and when would I use one instead of the other?
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)
There is no difference between the two. The latter is arguably more concise and easier to write, and the reason for the "object" (module) versions of pure (i.e., non-stateful) functions like ReLU and Sigmoid is to allow their use in constructs like nn.Sequential.
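A quick sketch (my own, to illustrate the point) showing that the two styles compute the same thing, and that the module versions are what nn.Sequential needs:

import torch
import torch.nn as nn
import torch.nn.functional as F

lin = nn.Linear(4, 4)
seq = nn.Sequential(lin, nn.ReLU())  # nn.Sequential holds modules, so nn.ReLU is required

x = torch.randn(2, 4)
print(torch.allclose(seq(x), F.relu(lin(x))))  # True: identical results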