Running a model on GPU using OpenVINO

I can run my model on CPU successfully.
When I run it on GPU, I get the following error:
[ ERROR ] Check 'get_element_type().is_dynamic() || get_element_type() == element_type' failed at C:\j\workspace\private-ci\ie\build-windows-vs2019#2\b\repos\openvino\ngraph\core\src\runtime\host_tensor.cpp:174:
Can not change a static element type
How can I solve this?

You might need to do the Additional Installation Steps for Intel® Processor Graphics (GPU), or your model may not be supported on GPU. Refer to the Public Pre-Trained Models and Intel's Pre-Trained Models pages to check each model's device support. Also, please ensure that you meet all the system requirements.
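If the GPU plugin is set up correctly, it should appear when you query the available devices. A minimal sketch, assuming the 2021.x Inference Engine Python API and placeholder IR file names:
from openvino.inference_engine import IECore

ie = IECore()
print(ie.available_devices)  # e.g. ['CPU', 'GPU'] once the GPU plugin is installed

# 'model.xml' / 'model.bin' stand in for your IR files
net = ie.read_network(model='model.xml', weights='model.bin')
exec_net = ie.load_network(network=net, device_name='GPU')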

Related

Torchvision model cannot be loaded from storage when no GPU is available

I trained a torchvision Mask R-CNN model on GPU and saved it to disk using torch.save(model, model_name). On another machine, without a GPU, I try to load it again using torch.load(model_name). The model cannot be deserialized because torch does not know about the device cuda:0.
How can I 'convert' such a model to be used in non-GPU environments?
I assume it is best practice to move a model to CPU before saving it?
torch.load() has an argument map_location where you can specify the device. So you can use
torch.load(..., map_location='cpu')
or specify any other device to directly load it there.
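For example, a minimal sketch (the file name is a placeholder for your saved Mask R-CNN):
import torch

# load a model saved on a CUDA machine onto a CPU-only machine
model = torch.load('maskrcnn.pth', map_location='cpu')
model.eval()

# when saving, it is indeed good practice to move the model to CPU first
# (and to prefer saving the state_dict rather than the whole model)
# torch.save(model.cpu().state_dict(), 'maskrcnn_state.pth')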

Training a model on GPU is very slow

I am using an A100-SXM4-40GB GPU, but training is terribly slow. I tried two models, a simple classification model on CIFAR and a U-Net on Cityscapes. I tried my code on other GPUs and it worked totally fine, but I do not know why training on this high-capacity GPU is so slow.
I would appreciate any help.
Here are some other properties of the GPUs.
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
Nvidia driver version: 460.32.03
cuDNN version: Could not collect
Thank you for your answer. Before trying your answer, I decided to uninstall Anaconda and reinstall it, and this solved the problem.
Call .cuda() on the model during initialization.
As per your comments above, you have GPUs as well as CUDA installed, so there's no point in checking device availability with torch.cuda.is_available().
Additionally, you should wrap your model in nn.DataParallel to allow PyTorch to use every GPU you expose it to. You could also use DistributedDataParallel, but DataParallel is easier to grasp initially.
Example initialization:
model = UNet().cuda()
model = torch.nn.DataParallel(model)
Also, you can make sure you're exposing the code to all GPUs by executing the Python script with the following environment variable:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_unet.py
Last thing to note - nn.DataParallel wraps the model itself, so to save the state_dict you'll need to reach the module inside DataParallel:
torch.save(model.module.state_dict(), 'unet.pth')
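For completeness, loading that state_dict back later does not require DataParallel; a small sketch (reusing the UNet name and file name from the example above):
# load the plain (non-DataParallel) model from the saved state_dict
model = UNet()
model.load_state_dict(torch.load('unet.pth', map_location='cpu'))
model = model.cuda()  # or leave on CPU for inference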

If I Trace a PyTorch Network on Cuda, can I use it on CPU?

I traced my neural network using torch.jit.trace on a CUDA-compatible GPU server. When I reloaded that trace on the same server, I could load and use it fine. Now, when I download it onto my laptop (for quick testing) and try to load the trace, I get:
RuntimeError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. 'aten::empty_strided' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Can I not switch between GPU and CPU on a trace? Or is there something else going on?
I had this exact same issue. In my model I had one line of code that was causing it:
if torch.cuda.is_available():
    weight = weight.cuda()
If you have a look at the official documentation for trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html), you will see that
the returned ScriptModule will always run the same traced graph on any input. This has some important implications when your module is expected to run different sets of operations, depending on the input and/or the module state
So, if the model was traced on a machine with a GPU, this operation will be recorded and you won't even be able to load your model on a CPU-only machine. To solve this, delete everything that makes your model CUDA dependent. In my case it was as easy as deleting the code block above.
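In other words, keep device moves out of the code path that gets traced; a small illustrative sketch (module and file names are made up):
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # registering the tensor as a buffer lets it follow the module's device
        # instead of hard-coding .cuda() inside forward()
        self.register_buffer('weight', torch.randn(10))

    def forward(self, x):
        return x * self.weight

model = MyModule().eval()
traced = torch.jit.trace(model, torch.randn(10))
traced.save('traced_cpu.pt')  # loadable on a CPU-only machine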

How to use multiple cores in Caffe2 on mobile?

I've got a simple model consisting only of convolutions (not even activations in between), and I wanted to benchmark it in Caffe2 on an ARM Android device using multiple cores.
When I run
./speed_benchmark --init_net=model_for_inference-simplified-init-net.pb --net=model_for_inference-simplified-predict-net.pb --iter=1
it runs on a single core.
The speed benchmark was built using:
scripts/build_android.sh -DANDROID_ABI=arm64-v8a -DANDROID_TOOLCHAIN=clang -DBUILD_BINARY=ON
On x86 it was built via
mkdir build
cd build
cmake .. -DBUILD_BINARY=ON
Setting OMP_NUM_THREADS=8 helps on x86, but not on ARM.
Do I need to change the build command for ARM, set some environment variables, pass some binary arguments, or something else?
I didn't know that I needed to set the engine information in the model, as described in https://caffe2.ai/docs/mobile-integration.html.
After updating the prediction net with:
for op in predict_net.op:
    if op.type == 'Conv':
        op.engine = 'NNPACK'
more cores started to be used.
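For reference, a hedged sketch of the full round trip (the input file name comes from the benchmark command above; the output name is made up):
from caffe2.proto import caffe2_pb2

# load the predict net protobuf
predict_net = caffe2_pb2.NetDef()
with open('model_for_inference-simplified-predict-net.pb', 'rb') as f:
    predict_net.ParseFromString(f.read())

# set the NNPACK engine on every Conv op
for op in predict_net.op:
    if op.type == 'Conv':
        op.engine = 'NNPACK'

# write the patched net back out
with open('model_for_inference-simplified-predict-net-nnpack.pb', 'wb') as f:
    f.write(predict_net.SerializeToString())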

Theano installation on Windows 64-bit

I'm new to Python and the Theano library. I want to install Theano on Windows 7 64-bit. I have a display adapter:
Intel(R) HD Graphics 3000, which is not compatible with NVIDIA.
My questions:
1. Is it obligatory to install CUDA so I can use Theano?
2. Even if I have an Ubuntu operating system with the same display adapter, is CUDA still mandatory?
Any help is appreciated.
Thanks
You do not need CUDA to run Theano.
Theano can run on either CPU or GPU. If you want to run on GPU you must (currently) use CUDA, which means you must be using an NVIDIA display adapter. Without CUDA/NVIDIA you must run on CPU.
There is no disadvantage to running on CPU other than speed -- Theano can be much faster on GPU but everything that runs on a GPU will also run on a CPU as long as it has been coded generically (the default and standard method for Theano code).
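To confirm which device Theano will use, a minimal check (using Theano's standard configuration mechanism):
import theano

print(theano.config.device)  # 'cpu' by default; no CUDA required

# device selection happens via configuration, e.g. ~/.theanorc or:
#   THEANO_FLAGS=device=cpu,floatX=float32 python my_script.py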
