torch.to(device) not taking effect - pytorch

token_embeddings.to(device)
print(device)                    # prints: cuda:0
print(token_embeddings.device)   # prints: cpu
I call .to(device), but it does not take effect.

.to() is not an in-place operation on tensors, so you need to assign the result:
token_embeddings = token_embeddings.to(device)
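A minimal sketch of the difference, assuming a CUDA device is available (note that for an nn.Module, .to() moves the parameters in place, so modules do not strictly need the reassignment):
import torch

t = torch.zeros(3)   # created on the CPU
t.to('cuda:0')       # returns a new tensor; t itself is unchanged
print(t.device)      # cpu

t = t.to('cuda:0')   # rebind the name to the moved tensor
print(t.device)      # cuda:0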

Related

When would I use model.to("cuda:1") as opposed to model.to("cuda:0")?

I have a user with two GPUs; the first is an AMD card which can't run CUDA, and the second is a CUDA-capable NVIDIA GPU. I am using the code model.half().to("cuda:0"). I'm not sure whether the invocation actually used the GPU, and I can't test it because I don't have a spare computer with more than one GPU lying around.
In this case, does "cuda:0" mean the first device that can run CUDA, so it would have worked even though their first device is AMD? Or would I need to say "cuda:1" instead? How would I detect which index is the first CUDA-capable device?
The nvidia-ml-py3 package (imported as nvidia_smi) can help track a GPU's memory while your code runs.
To install, run pip install nvidia-ml-py3. Take a look at this code snippet:
import nvidia_smi

cuda_idx = 0  # edit the device index that you want to track
to_cuda = f'cuda:{cuda_idx}'  # 'cuda:0' in this case

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(cuda_idx)

def B2G(num):
    # bytes -> gigabytes, rounded to 2 decimal places
    return round(num / (1024 ** 3), 2)

def print_memory(name, handle, pre_used):
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    used = info.used
    print(f'{name}: {B2G(used)} GB')
    print(f'This step used: {B2G(used - pre_used)} GB')
    print('------------')
    return used

# start
mem = print_memory('Start', handle, 0)
model = ...  # init your model
model.to(to_cuda)
mem = print_memory('Init model', handle, mem)
The example above uses nvidia_smi to track how much memory each part of the model needs and prints it in GB.
Edited: To check the list of GPUs:
import torch

def check_gpu():
    # list every CUDA device visible to PyTorch
    for i in range(torch.cuda.device_count()):
        device_name = f'cuda:{i}'
        print(f'{i} device name: {torch.cuda.get_device_name(torch.device(device_name))}')
I tested it, and as I suspected, model.half().to("cuda:0") puts your model on the first available CUDA-capable GPU, i.e. the NVIDIA GPU in your case. The AMD GPU is not visible as a CUDA device, so feel safe to assume cuda:0 is always a CUDA-enabled GPU; the AMD GPU won't be seen by your program.
Have a good day.
There are plenty of methods in torch.cuda for querying and monitoring GPU devices.
For example, you can check the type of each device:
torch.cuda.get_device_name(torch.device('cuda:0'))
# or
torch.cuda.get_device_name(torch.device('cuda:1'))
In my case, the output of get_device_name returns:
'Quadro RTX 6000'
If you want a more programmatic way to explore the properties of your devices, you can use torch.cuda.get_device_properties.
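For instance, a quick sketch (assuming cuda:0 exists):
props = torch.cuda.get_device_properties(torch.device('cuda:0'))
print(props.name)                # e.g. 'Quadro RTX 6000'
print(props.total_memory)        # total device memory in bytes
print(props.major, props.minor)  # CUDA compute capability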
Once you are working with a device (or believe you are), you can use torch.cuda's memory management functions to monitor GPU memory usage.
For instance, you can get a very detailed account of the current state of your device's memory using:
torch.cuda.memory_stats(torch.device('cuda:0'))
# or
torch.cuda.memory_stats(torch.device('cuda:1'))
If you want nvidia-smi-like stats on utilization, you can use torch.cuda.utilization.
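A short sketch (note that torch.cuda.utilization relies on the pynvml package being installed):
# percentage of time over the past sample period during which
# one or more kernels were executing on the device
print(torch.cuda.utilization(torch.device('cuda:0')))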

Choose 2nd GPU on server

I am running code on a server. There are two GPUs there, and the first one is busy, yet I can't find a way to switch between them. I am using PyTorch, if that is important. The following line of code should be modified:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
The modification should be made only here.
Thanks.
cuda by default chooses cuda:0; switching to the other GPU is done through cuda:1.
So, your line becomes:
device = 'cuda:1' if torch.cuda.is_available() else 'cpu'
You can read more about CUDA semantics.
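If setting an environment variable outside that line is acceptable, a common alternative sketch is to hide the busy GPU before CUDA is initialized, so the unmodified 'cuda' maps to the second physical GPU:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before CUDA is initialized

import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # 'cuda' now refers to physical GPU 1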
Here is the way I do it while using FastAI and a pre-trained model for inference.
First, during model definition with fastai (import fastai.vision.all as fai), I obtain the model instance and put it on the specified GPU (say gpu_id=3):
model = fai.nn.Sequential(body, head)
model.cuda(device=gpu_id)
Then, while loading the model weights, I also specify which device to use (otherwise it creates a copy of the model on GPU 0):
model.load_state_dict(torch.load(your_model_state_filepath, map_location=torch.device(f'cuda:{gpu_id}')))
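A quick way to verify where the model ended up (a sketch, assuming the steps above):
print(next(model.parameters()).device)  # expected: cuda:3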

Why can the device parameter of torch.cuda.device be greater than torch.cuda.device_count()?

Here's the code:
torch.cuda.device_count()
# output: 1
d2 = torch.cuda.device(2)
# d2.idx = 2
Is it for a model running on another device?
torch.cuda.device_count() returns the number of GPUs available, whereas torch.cuda.device(device) is a context manager. It facilitates the proper handling of resources, i.e., automatic setup and release of resources after usage. So the argument you are passing to this function doesn't mean you are switching/setting to that device. In order to set the current device, you should use torch.cuda.set_device(device). Although, as per the official documentation here,
Usage of this function is discouraged in favor of device. In most
cases it’s better to use CUDA_VISIBLE_DEVICES environmental variable.
To know more about torch.cuda.device() and how it works, go through this official pytorch discussion.
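A minimal sketch of why no error is raised (assuming only one visible GPU):
import torch

print(torch.cuda.device_count())  # 1

# constructing the context manager only records the index;
# nothing is validated or switched until it is entered,
# so entering `with d2:` would then fail with an invalid device ordinal
d2 = torch.cuda.device(2)
print(d2.idx)  # 2

torch.cuda.set_device(0)  # the explicit way to set the current device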

What type of FEI should I use?

I have a device that has one 40 MHz wideband input and can output one wideband output and 8 narrowband DDC channels. I am not sure how to set up the FEI interface for this device. Should it be a CHANNELIZER or an RX_DIGITIZER_CHANNELIZER? I am leaning towards the latter due to the wideband output.
Regarding the allocation of such a device: for the RX_DIGITIZER_CHANNELIZER type, the manual is vague about the use of the CHANNELIZER portion in this instance. Should I still allow CHANNELIZERs to be allocated, or just allocate the RX_DIGITIZER_CHANNELIZERs and DDCs? Should changes to the RX_DIGITIZER_CHANNELIZER determine when to drop the DDCs in this instance? If CHANNELIZERs can still be allocated alongside an RX_DIGITIZER_CHANNELIZER, how does this work?
A CHANNELIZER device takes already digitized data and provides DDCs from this digital stream. An RX_DIGITIZER_CHANNELIZER has an analog input and performs the receiver, digitizer, and channelizer functions in a non-separable device. It sounds like you chose correctly in using the RX_DIGITIZER_CHANNELIZER for a device with an analog input. Typically, allocating the RX_DIGITIZER_CHANNELIZER takes care of allocating the channelizer, because all three parts are linked together; therefore the only additional allocations are usually DDCs.

Difference between enable_irq_wake and enable_irq

My driver uses an IRQ which can wake up the device. Is enable_irq_wake enough, or do I need to first call enable_irq and then enable_irq_wake?
I looked into the definitions of these functions but was not able to understand much.
I tried both combinations and neither seems to have any effect: in one case just enable_irq_wake, and in the other enable_irq followed by enable_irq_wake.
Thank you
If you check here, you'll see that enable_irq_wake invokes set_irq_wake_real, which does not enable the irq.
Furthermore, take for example this driver: it calls enable_irq/disable_irq at open/close, while it calls enable_irq_wake/disable_irq_wake at suspend/resume.
