token_embeddings.to(device)
print(device)                   # prints: cuda:0
print(token_embeddings.device)  # prints: cpu
I call .to(device), but it doesn't take effect.
.to() is not an in-place operation. So you need to do:
token_embeddings = token_embeddings.to(device)
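For tensors, .to() returns a new tensor and leaves the original untouched; nn.Module.to() is the exception, since it moves a module's parameters in place. A minimal sketch of the difference:

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

t = torch.randn(3)
t.to(device)      # returns a moved copy; t itself still lives on the CPU
t = t.to(device)  # rebind the name to the moved tensor

model = nn.Linear(3, 3)
model.to(device)  # nn.Module.to() moves parameters in place; no rebinding needed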
Related
I have a user with two GPUs; the first one is an AMD GPU, which can't run CUDA, and the second is a CUDA-capable NVIDIA GPU. I am using the code model.half().to("cuda:0"). I'm not sure whether the invocation successfully used the GPU, and I can't test it because I don't have a spare computer with more than one GPU lying around.
In this case, does "cuda:0" mean the first device that can run CUDA, so that it would have worked even if their first device were AMD? Or would I need to say "cuda:1" instead? How would I detect the index of the first CUDA-capable device?
The nvidia-ml-py3 package (Python bindings for NVML, the library behind nvidia-smi) can help you track a GPU's memory while running your code.
To install it, run pip install nvidia-ml-py3. Take a look at this code snippet:
import nvidia_smi

cuda_idx = 0  # edit: the device index that you want to track
to_cuda = f'cuda:{cuda_idx}'  # 'cuda:0' in this case

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(cuda_idx)

def B2G(num):
    # convert bytes to gigabytes, rounded to two decimals
    return round(num / (1024 ** 3), 2)

def print_memory(name, handle, pre_used):
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    used = info.used
    print(f'{name}: {B2G(used)} GB')
    print(f'This step used: {B2G(used - pre_used)} GB')
    print('------------')
    return used

# start
mem = print_memory('Start', handle, 0)
model = ...  # init your model
model.to(to_cuda)
mem = print_memory('Init model', handle, mem)
The example above uses NVML to track the memory needed by each part of the model and prints it in GB.
Edited: To check the list of GPUs:
import torch

def check_gpu():
    for i in range(torch.cuda.device_count()):
        device_name = f'cuda:{i}'
        print(f'{i} device name: {torch.cuda.get_device_name(torch.device(device_name))}')
I tested it, and as I suspected, model.half().to("cuda:0") will put your model on the first available GPU with CUDA support, i.e., the NVIDIA GPU in your case. The AMD GPU isn't visible as a CUDA device, so feel safe to assume that cuda:0 is always a CUDA-enabled GPU; the AMD GPU won't be seen by your program.
Have a good day.
There are plenty of methods in torch.cuda for querying and monitoring GPU devices.
For example, you can check the name of each device:
torch.cuda.get_device_name(torch.device('cuda:0'))
# or
torch.cuda.get_device_name(torch.device('cuda:1'))
In my case, the output of get_device_name returns:
'Quadro RTX 6000'
If you want a more programmatic way to explore the properties of your devices, you can use torch.cuda.get_device_properties.
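For example, a quick sketch that reads a few of the returned properties:

import torch

props = torch.cuda.get_device_properties(torch.device('cuda:0'))
print(props.name)                                # e.g. 'Quadro RTX 6000'
print(round(props.total_memory / 1024 ** 3, 2))  # total memory in GB
print(props.major, props.minor)                  # CUDA compute capability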
Once you are working with a device (or believe you are), you can use torch.cuda's memory management functions to monitor GPU memory usage.
For instance, you can get a very detailed account of the current state of your device's memory using:
torch.cuda.memory_stats(torch.device('cuda:0'))
# or
torch.cuda.memory_stats(torch.device('cuda:1'))
If you want nvidia-smi-like stats on utilization, you can use torch.cuda.utilization.
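A minimal sketch (note that torch.cuda.utilization relies on the pynvml package being installed):

import torch

# percent of time the GPU was busy over the past sample period, as reported by nvidia-smi
print(torch.cuda.utilization(torch.device('cuda:0')))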
I am running code on a server. There are 2 GPUs there, and the first one is busy. Yet, I can't find a way to switch between them. I am using PyTorch, if that is important. The following line of code should be modified:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
The modification should be made only on this line.
Thanks.
cuda by default chooses cuda:0; switching to the other GPU may be done through cuda:1
So, your line becomes:
device = 'cuda:1' if torch.cuda.is_available() else 'cpu'
You can read more about CUDA semantics.
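If you'd rather not hard-code the index, an alternative (though it changes code outside that single line) is to hide the busy GPU from the process entirely via the CUDA_VISIBLE_DEVICES environment variable, set before CUDA is initialized:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # expose only the second physical GPU

import torch
# within this process, 'cuda' / 'cuda:0' now maps to physical GPU 1
device = 'cuda' if torch.cuda.is_available() else 'cpu'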
Here is the way I'm doing it while using FastAI and a pre-trained model for inference.
First, during model definition with fastai (import fastai.vision.all as fai), I obtain the model instance and put it on the specified GPU (say, with gpu_id=3):
model = fai.nn.Sequential(body, head)
model.cuda(device=gpu_id)
Then, while loading the model weights, I also specify which device to use (otherwise it creates a copy of the model on GPU 0):
model.load_state_dict(torch.load(your_model_state_filepath, map_location=torch.device(gpu_id)))
Here's the code:
torch.cuda.device_count()
# output: 1
d2 = torch.cuda.device(2)
# d2.idx = 2
Is it for the model running on another device?
torch.cuda.device_count() returns the number of GPUs available, whereas torch.cuda.device(device) is a context manager. It facilitates the proper handling of resources, i.e., automatic setup and release of resources after usage. So, the argument you are passing to this function doesn't mean you are switching to that device. In order to set the current device, you should use torch.cuda.set_device(device). Although, as per the official documentation here,
Usage of this function is discouraged in favor of device. In most
cases it’s better to use CUDA_VISIBLE_DEVICES environmental variable.
To know more about torch.cuda.device() and how it works, go through this official pytorch discussion.
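A short sketch of the difference, assuming at least two visible GPUs:

import torch

print(torch.cuda.device_count())  # number of visible GPUs

# torch.cuda.device is a context manager: it selects a device
# only for the duration of the block.
with torch.cuda.device(1):
    a = torch.zeros(3, device='cuda')  # allocated on cuda:1
print(torch.cuda.current_device())  # restored to the previous device

# torch.cuda.set_device switches the current device globally.
torch.cuda.set_device(1)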
I have a device that has one 40 MHz wideband input and can produce one wideband output and 8 narrowband DDC channels. I am not sure how to set up the FEI interface for this device. Should it be a CHANNELIZER or an RX_DIGITIZER_CHANNELIZER? I am leaning towards the latter due to the wideband output.
Regarding the allocation of such a device: for the RX_DIGITIZER_CHANNELIZER types, the manual is vague about the use of the CHANNELIZER portion in this instance. Should I still allow CHANNELIZERs to be allocated, or just allocate the RX_DIGITIZER_CHANNELIZERs and DDCs? Should changes to the RX_DIGITIZER_CHANNELIZER determine when to drop the DDCs in this instance? If CHANNELIZERs can still be allocated with an RX_DIGITIZER_CHANNELIZER, how does this work?
A CHANNELIZER device takes already-digitized data and provides DDCs from that digital stream. An RX_DIGITIZER_CHANNELIZER has an analog input and performs the receiver, digitizer, and channelizer functions in a non-separable device. It sounds like you chose correctly in using the RX_DIGITIZER_CHANNELIZER for a device with an analog input. Typically, allocating the RX_DIGITIZER_CHANNELIZER takes care of allocating the channelizer, because all three parts are linked together. Therefore, the only additional allocations are usually DDCs.
My driver uses an IRQ which can wake up the device. Is enable_irq_wake enough, or do I need to first call enable_irq and then enable_irq_wake?
I looked into the definitions of these functions but wasn't able to understand much.
I tried both combinations and neither seems to have any effect: in one case just enable_irq_wake, and in the other enable_irq followed by enable_irq_wake.
Thank you
If you check here, you'll see that enable_irq_wake invokes set_irq_wake_real, which does not enable the IRQ.
Furthermore, take for example this driver: it calls enable_irq/disable_irq at open/close, while it calls enable_irq_wake/disable_irq_wake at suspend/resume.