YOLO - TensorFlow works on CPU but not on GPU - python-3.x

I've used YOLO detection with a trained model on my GPU (Nvidia GTX 1060 3GB), and everything worked fine.
Now I am trying to train my own model with the --gpu 1.0 option. TensorFlow can see my GPU, as these messages appear at startup:
"name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705"
"totalMemory: 3.00GiB freeMemory: 2.43GiB"
However, later on, when the program loads the data and tries to start training, I get the following error:
"failed to allocate 832.51M (872952320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY"
I've checked whether it tries to use my other GPU (Intel 630), but it doesn't.
When I run the training process without the --gpu option, it works fine, but slowly.
(I've also tried --gpu 0.8, 0.4, etc.)
Any idea how to fix it?

Problem solved. Changing the batch size and image size in the config file didn't seem to help, as they didn't load correctly. I had to go to the defaults.py file and lower them there to make it possible for my GPU to handle the training steps.

It looks like your custom model uses too much memory and your graphics card cannot handle it. You only need to use the --batch option to control how much memory is used.
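For illustration only, here is a minimal sketch of a lower-memory training run through darkflow's Python API (this assumes you are using darkflow, whose TFNet option keys mirror the CLI flags; the model, weights, and data paths are placeholders):
from darkflow.net.build import TFNet
options = {
    "model": "cfg/tiny-yolo-voc.cfg",     # placeholder config
    "load": "bin/tiny-yolo-voc.weights",  # placeholder weights
    "train": True,
    "batch": 4,                           # smaller batch -> less GPU memory per step
    "gpu": 0.8,                           # cap GPU memory usage at roughly 80%
    "annotation": "data/annotations/",    # placeholder annotation dir
    "dataset": "data/images/",            # placeholder image dir
}
tfnet = TFNet(options)
tfnet.train()
Lowering the batch size (and, as the asker found, the image size in the config) reduces the memory needed per training step, which is usually what a 3GB card requires.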

Related

When would I use model.to("cuda:1") as opposed to model.to("cuda:0")?

I have a user with two GPUs; the first is an AMD card which can't run CUDA, and the second is a CUDA-capable NVIDIA GPU. I am using the code model.half().to("cuda:0"). I'm not sure whether the invocation successfully used the GPU, and I can't test it because I don't have a spare computer with more than one GPU lying around.
In this case, does "cuda:0" mean the first device which can run CUDA, so it would've worked even if their first device was AMD? Or would I need to say "cuda:1" instead? How would I detect which number is the first CUDA-capable device?
The nvidia_smi module (from the nvidia-ml-py3 package) can help to track GPU memory while running your code.
To install it, run pip install nvidia-ml-py3. Take a look at this code snippet:
import nvidia_smi

cuda_idx = 0  # edit the device index that you want to track
to_cuda = f'cuda:{cuda_idx}'  # 'cuda:0' in this case

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(cuda_idx)

def B2G(num):
    # convert bytes to gigabytes, rounded to two decimals
    return round(num / (1024 ** 3), 2)

def print_memory(name, handle, pre_used):
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    used = info.used
    print(f'{name}: {B2G(used)}')
    print(f'This step used: {B2G(used - pre_used)}')
    print('------------')
    return used

# start
mem = print_memory('Start', handle, 0)
model = ...  # init your model
model.to(to_cuda)
mem = print_memory('Init model', handle, mem)
The example above uses nvidia-smi to track the memory needed by each part of the model and prints it in GB.
Edited: To check the list of GPUs:
import torch

def check_gpu():
    # list every CUDA device PyTorch can see, with its index and name
    for i in range(torch.cuda.device_count()):
        device_name = f'cuda:{i}'
        print(f'{i} device name: {torch.cuda.get_device_name(torch.device(device_name))}')
I tested it, and as I suspected, model.half().to("cuda:0") will put your model on the first available CUDA-capable GPU, i.e. the NVIDIA GPU in your case. The AMD GPU isn't visible as a CUDA device, so it's safe to assume cuda:0 is always a CUDA-enabled GPU; the AMD GPU won't be seen by your program.
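If you want to double-check where the weights actually ended up, a quick sanity check (a common PyTorch idiom, not part of the test above; the model here is a placeholder standing in for yours) is to inspect the device of any parameter:
import torch
import torch.nn as nn
model = nn.Linear(10, 10).half().to("cuda:0")  # placeholder model
print(next(model.parameters()).device)  # should print: cuda:0
print(torch.cuda.is_available())        # True only if an NVIDIA (CUDA-capable) GPU is visible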
Have a good day.
There are plenty of methods in torch.cuda for querying and monitoring GPU devices.
For example, you can check the type of each device:
torch.cuda.get_device_name(torch.device('cuda:0'))
# or
torch.cuda.get_device_name(torch.device('cuda:1'))
In my case, the output of get_device_name returns:
'Quadro RTX 6000'
If you want a more programmatic way to explore the properties of your devices, you can use torch.cuda.get_device_properties.
Once you are working with a device (or believe you are), you can use torch.cuda's memory management functions to monitor GPU memory usage.
For instance, you can get a very detailed account of the current state of your device's memory using:
torch.cuda.memory_stats(torch.device('cuda:0'))
# or
torch.cuda.memory_stats(torch.device('cuda:1'))
If you want nvidia-smi-like stats on utilization, you can use torch.cuda.utilization().
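As a small combined sketch (assuming a reasonably recent PyTorch build with at least one visible CUDA device; torch.cuda.utilization additionally needs pynvml installed):
import torch
device = torch.device('cuda:0')
print(torch.cuda.get_device_name(device))        # e.g. 'Quadro RTX 6000'
print(torch.cuda.get_device_properties(device))  # total memory, compute capability, ...
print(torch.cuda.memory_allocated(device))       # bytes currently held by tensors
print(torch.cuda.memory_reserved(device))        # bytes reserved by the caching allocator
print(torch.cuda.utilization(device))            # percent utilization, like nvidia-smi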

Raspberry PI 4 very slow predictions with TensorflowLite object detection api

I'm running TensorFlow Lite object detection on a Raspberry Pi 4 Model B with 8GB of RAM, and prediction is very slow, at 1.5 to 2 frames per second. Is there a way to get better performance and improve prediction to at least 5 to 10 fps?
I would recommend using this tool to try different running options: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark
And maybe use --enable_op_profiling to see which ops make it slow. A quick fix might be enabling multi-threading or use_gpu.
If you build TFLite with CMake, please set TFLITE_ENABLE_RUY=ON.
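For the multi-threading suggestion, a minimal sketch with the TFLite Python interpreter (the model path is a placeholder; on a Pi you may be importing Interpreter from tflite_runtime instead of tensorflow, and the GPU delegate is generally not available there):
import numpy as np
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path="detect.tflite", num_threads=4)  # use 4 CPU threads
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])  # dummy frame
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']).shape)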

Keras 2.2.4 runs slowly with backend cntk-gpu version 2.5.1

I have a problem when I use Keras with the cntk-gpu backend, version 2.5.1.
For some reason, I have to use cntk-gpu 2.5.1 as the Keras backend, and I have a piece of code whose core looks as follows (really simple prediction code):
# test_x: test data from dataset imagenet
predict_model = keras.models.load_model(model_path, custom_objects=ModelUtils.custom_objects())
predict_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
res = predict_model.predict(test_x, batch_size=4)
I found that the cntk-gpu 2.7 version only takes 50 seconds, but version 2.5.1 takes 20 minutes. The console shows that CNTK does use the GPU, because it prints my GPU information, like:
Selected GPU[0] GeForce GTX 1080 Ti as the process wide default device
I have tested this on both Ubuntu 18.04 with CUDA 10.0 and Ubuntu 16.04 with CUDA 9.1. Versions 2.4 and below also run into this problem.
I don't know what the reason is or how to solve the problem.
Have you ever encountered a situation where cntk-gpu 2.5 runs very slowly?
I hope to hear from you. Thank you in advance.

Tensorflow could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

Recently, I have been trying to reproduce a deep learning experiment from GitHub. However, every time I run the experiment, I receive the following error message.
2018-08-27 09:32:16.827025: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:332] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
In this situation, I set up the TensorFlow session as follows.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))
If I try to limit the GPU memory as follows, I find that I do not have enough memory to run my model.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
The information about my GPU is shown below. I am not sure where the problem is, and I have run into this issue several times. Thank you for your help!
2018-08-27 09:31:45.966248: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-08-27 09:31:46.199314: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.09GiB
sean, according to the documentation, the error status CUDNN_STATUS_ALLOC_FAILED is due to a problem with the host memory, not the device memory. Check your RAM as well.
In my case, this was due to running 2 TensorFlow processes using the GPU simultaneously (either by you or by other users): https://stackoverflow.com/a/53707323/10993413
Source: https://forums.developer.nvidia.com/t/could-not-create-cudnn-handle-cudnn-status-alloc-failed/108261
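As an additional, hedged sketch (a mitigation often suggested for this error on TF 1.x, not taken from the answers above): letting TensorFlow allocate GPU memory on demand instead of grabbing it all up front sometimes avoids the allocation failure, and it can be combined with the ConfigProto already shown in the question:
import tensorflow as tf
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
config.gpu_options.allow_growth = True  # grow GPU memory as needed instead of pre-allocating
sess = tf.Session(config=config)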

The meaning of "n_jobs == 1" in GridSearchCV with using multiple GPU

I have been training an NN model using the Keras framework with 4 NVIDIA GPUs (data row count: ~160,000, column count: 5). Now I want to optimize its parameters using GridSearchCV.
However, I encountered several different errors whenever I tried to change n_jobs to a value other than one, such as:
CUDA OUT OF MEMORY
Can not get device properties error code : 3
Then I read this web page,
"# if you're not using a GPU, you can set n_jobs to something other than 1"
http://queirozf.com/entries/scikit-learn-pipeline-examples
So is it not possible to use multiple GPUs with GridSearchCV?
[Environment]
Ubuntu 16.04
Python 3.6.0
Keras / Scikit-Learn
Thanks!
According to the scikit-learn FAQ, GPUs are NOT supported. Link
You can use n_jobs to use your CPU cores. If you want to run at maximum speed, you might want to use almost all of them:
import multiprocessing
n_jobs = multiprocessing.cpu_count()-1
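For instance, a minimal sketch of plugging that value into GridSearchCV (the estimator and parameter grid are placeholders, not from the question; n_jobs here only parallelizes across CPU cores):
import multiprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
n_jobs = multiprocessing.cpu_count() - 1  # leave one core free
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 10]}  # placeholder grid
search = GridSearchCV(RandomForestClassifier(), param_grid, n_jobs=n_jobs, cv=3)
# search.fit(X, y)  # X, y: your training data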
