Using a specific device for Run and Debug in PyTorch

I want to use a specific GPU. I followed the instructions of some posts, including (1, 2), but they don't work for me. More specifically, when I run the code for debugging with VS Code, even though I select the GPU using dev = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu'), the program runs on device 0 and device 2. How can I fix that?
This is the code I use:
dev = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
model = model.to(dev)
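One common workaround (my own sketch, not part of the original question) is to hide the other GPUs from the process via CUDA_VISIBLE_DEVICES before PyTorch initializes CUDA, so the debugged process physically cannot touch them:

# Sketch (assumption: the target is physical GPU 1; the Linear model is a placeholder).
# CUDA_VISIBLE_DEVICES must be set before the first CUDA call in the process.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch
model = torch.nn.Linear(4, 2)                                          # placeholder model for illustration
dev = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')   # cuda:0 now maps to physical GPU 1
model = model.to(dev)

When debugging with VS Code, the same effect can be achieved by adding "env": {"CUDA_VISIBLE_DEVICES": "1"} to the relevant configuration in launch.json, so the variable is set before the debuggee starts.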

Related

pipe to cuda not working stable diffusion

stable diffusion inside a jupyter notebook with cuda 12
Nvidia Studio driver on the Windows 11 host.
PyTorch runs just fine in other projects, with no CUDA problems.
My JupyterLab sits inside a WSL Ubuntu installation.
My problem: I cannot run pipe.to("cuda") with Stable Diffusion, the image generator, which I would like to run locally for faster generation.
# inside jupyterlab cell:
from huggingface_hub import notebook_login
notebook_login()
Although I enter my key hf_asfasfd... I cannot verify that the login was accepted,
but I guess that's normal? It seems a bit odd.
# inside the next cell:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
--> prints cuda
# in another cell:
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
I can see it downloaded the model, so the login was OK, I guess.
pipe = pipe.to("cuda")
The kernel times out. I have a 3080 with 12 GB; memory usage stays low, with no indication that the pipeline loaded.
The fans of the card also don't spin up, which they should if it were working.
I also raised the JupyterLab timeout to 10 minutes, with no effect.
The timeout doesn't show any error in the console or on the web page.
# cell that doesn't get executed:
prompt = "a photo of a cat riding a horse on mars"
image = pipe(prompt).images[0]
image.show()
The last cell never gets executed.
I have altered the config to restart the kernel after 10 minutes, in case it just needed longer, but this has no effect. When the kernel eventually dies, the Linux prompt shows AsyncIOLoopKernelRestarter: restarting kernel, but there are no other errors on the screen or on the notebook's web page. That message only means the (increased) 10-minute timeout has passed, so the cell is simply hanging.
I installed the latest versions of transformers, diffusers, and scipy.
Any ideas what this could be? This code runs fine on Google Colab.
I know PyTorch CUDA can work on my machine, but it isn't loading the pipeline here.
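No answer is given above; as a diagnostic sketch of my own (not from the thread), it can help to confirm in a fresh kernel that a plain tensor transfer and kernel launch work inside the same WSL environment before blaming the Diffusers pipeline:

# Sketch (assumption: run in a fresh kernel in the same WSL/Jupyter environment).
# If this small transfer also hangs, the problem is the CUDA/WSL setup, not Diffusers.
import torch

print(torch.cuda.is_available())        # should print True
print(torch.cuda.get_device_name(0))    # should print the 3080's name
x = torch.randn(1024, 1024, device='cuda')
y = x @ x                               # force a kernel launch
torch.cuda.synchronize()                # block until the GPU has really finished
print(y.sum().item())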

Why is TPU on Google Colab in PyTorch not being detected?

I am using google colab and PyTorch. I set my hardware accelerator to TPU.
This line of code shows that no cuda device is being detected:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
TPUs are not CUDA devices. Please take a look at this Colab that shows how to run some PyTorch code on TPUs.
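For illustration (my sketch, not part of the original answer), PyTorch reaches Colab TPUs through the torch_xla package rather than through torch.cuda; a minimal example, assuming torch_xla is installed in the Colab TPU runtime:

# Sketch: acquire a TPU device via torch_xla instead of torch.cuda (assumes torch_xla is installed).
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()         # an XLA device backed by the TPU, e.g. xla:0
print(device)
t = torch.ones(2, 2).to(device)  # tensors are moved to the TPU the same way as to a GPU
print(t.device)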

When would I use model.to("cuda:1") as opposed to model.to("cuda:0")?

I have a user with two GPUs; the first is an AMD card, which can't run CUDA, and the second is a CUDA-capable NVIDIA GPU. I am using the code model.half().to("cuda:0"). I'm not sure whether the invocation successfully used the GPU, nor am I able to test it, because I don't have any spare computer with more than one GPU lying around.
In this case, does "cuda:0" mean the first device that can run CUDA, so it would have worked even though their first device is AMD? Or would I need to say "cuda:1" instead? How would I detect which number is the first CUDA-capable device?
The nvidia_smi module (from the nvidia-ml-py3 package) can help track GPU memory while your code is running.
To install it, run pip install nvidia-ml-py3. Take a look at this code snippet:
import nvidia_smi

cuda_idx = 0                   # edit the device index that you want to track
to_cuda = f'cuda:{cuda_idx}'   # 'cuda:0' in this case

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(cuda_idx)

def B2G(num):
    # bytes to gigabytes
    return round(num / (1024 ** 3), 2)

def print_memory(name, handle, pre_used):
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    used = info.used
    print(f'{name}: {B2G(used)}')
    print(f'This step used: {B2G(used - pre_used)}')
    print('------------')
    return used

# start
mem = print_memory('Start', handle, 0)
model = ...  # init your model
model.to(to_cuda)
mem = print_memory('Init model', handle, mem)
The example above uses nvidia_smi to track the memory needed by each part of the model and prints it in GB.
Edited: To check the list of GPUs:
def check_gpu():
    for i in range(torch.cuda.device_count()):
        device_name = f'cuda:{i}'
        print(f'{i} device name: {torch.cuda.get_device_name(torch.device(device_name))}')
I tested it, and as I suspected, model.half().to("cuda:0") will put your model on the first available GPU with CUDA support, i.e. the NVIDIA GPU in your case. The AMD GPU isn't visible as a CUDA device, so you can safely assume cuda:0 always refers to a CUDA-enabled GPU; the AMD GPU won't be seen by your program.
Have a good day.
There are plenty of methods of torch.cuda to query and monitor GPU devices.
For example, you can check the type of each device:
torch.cuda.get_device_name(torch.device('cuda:0'))
# or
torch.cuda.get_device_name(torch.device('cuda:1'))
In my case, the output of get_device_name returns:
'Quadro RTX 6000'
If you want a more programmatic way to explore the properties of your devices, you can use torch.cuda.get_device_properties.
Once you are working with a device (or believe you are), you can use torch.cuda's memory management functions to monitor GPU memory usage.
For instance, you can get a very detailed account of the current state of your device's memory using:
torch.cuda.memory_stats(torch.device('cuda:0'))
# or
torch.cuda.memory_stats(torch.device('cuda:1'))
If you want nvidia-smi-like stats on utilization, you can use torch.cuda.utilization().
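For instance (a small sketch of my own, assuming the pynvml package is installed, which torch.cuda.utilization relies on):

# Sketch: query nvidia-smi-style utilization through torch.cuda (requires pynvml: pip install pynvml).
import torch

if torch.cuda.is_available():
    print(torch.cuda.utilization(torch.device('cuda:0')))  # percent of time kernels were executing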

Choose 2nd GPU on server

I am running code on a server. There are two GPUs there, and the first one is busy. Yet, I can't find a way to switch between them. I am using PyTorch, if that matters. The following line of code should be modified:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
The modification can only be made here.
Thanks.
cuda by default refers to cuda:0; switching to the other GPU can be done with cuda:1.
So your line becomes:
device = 'cuda:1' if torch.cuda.is_available() else 'cpu'
You can read more about CUDA semantics.
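As a related sketch (my addition, not part of the original answer), if the device string cannot easily be changed everywhere, the current device can also be switched process-wide so that a bare 'cuda' resolves to the second GPU:

# Sketch (assumption: at least two CUDA devices are present).
import torch

if torch.cuda.device_count() > 1:
    torch.cuda.set_device(1)            # bare 'cuda' (no index) now resolves to cuda:1
    x = torch.zeros(3, device='cuda')
    print(x.device)                     # prints cuda:1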
Here is how I do it when using fastai and a pre-trained model for inference.
First, when defining the model with fai (import fastai.vision.all as fai), I obtain the model instance and put it on the specified GPU (say, gpu_id=3):
model = fai.nn.Sequential(body, head)
model.cuda(device=gpu_id)
Then, when loading the model weights, I also specify which device to use (otherwise it creates a copy of the model on GPU 0):
model.load_state_dict(torch.load(your_model_state_filepath, map_location=torch.device(gpu_id)))

Very simple torch.tensor().to("cuda") gives CUDA error: device-side assert triggered

All I do is,
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.tensor([123,123]).to(device)
And I get:
RuntimeError: CUDA error: device-side assert triggered
I really cannot see why.
Edit: Weirdly enough, I see that this happens only after I run some other code beforehand. Restarting the kernel solves it. But why can some code cause a problem like this? I cannot share the code because it's not mine, but I still welcome any guesses. Thanks!
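No answer appears above; as a debugging sketch of my own, device-side asserts are reported asynchronously, so the error often surfaces at an unrelated later call such as this .to(device). The usual way to localize the real culprit is to rerun the earlier code with synchronous kernel launches:

# Sketch: force synchronous CUDA launches so the assert is raised at the offending line.
# The variable must be set before CUDA is initialized (e.g. at the very top of the script).
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.tensor([123, 123]).to(device)   # with blocking launches, the traceback points at the real failure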
