opencv doesn't use all GPU memory - python-3.x

I'm trying to use the cvlib package, which uses the yolov3 model to recognize objects in images, on Windows 10.
Let's take an easy example:
import cvlib as cv
import cv2
import time
from cvlib.object_detection import draw_bbox

img = cv2.imread('image.jpg')  # example input image (path is a placeholder)
inittimer = time.time()
bbox, label, conf = cv.detect_common_objects(img, confidence=0.5, model='yolov3-worker', enable_gpu=True)
print('The process took %.3f s' % (time.time() - inittimer))
output_image = draw_bbox(img, bbox, label, conf)
The detection takes ~60 ms.
cvlib uses OpenCV to run the CNN part.
If I now try to see how much GPU memory TensorFlow uses, using subprocess, it takes only 824 MiB.
While the program runs, if I start nvidia-smi it gives me this result:
As you can see, there is much more memory available. My question is simple: why doesn't cvlib (and therefore TensorFlow) use all of it to improve detection time?
EDIT:
As far as I understand, cvlib uses TensorFlow but it also uses the OpenCV detector. I installed OpenCV with CMake and CUDA 10.2.
I don't understand why, but nvidia-smi reports CUDA Version: 11.0, which is not the version I installed. Maybe that's part of the problem?

You can verify whether OpenCV is using CUDA. This can be done with the following:
import cv2
print(cv2.cuda.getCudaEnabledDeviceCount())
This should give you the number of CUDA-enabled devices on your machine. You should also check the build information with the following:
import cv2
print(cv2.getBuildInformation())
The output in both cases indicates whether your OpenCV build can access the GPU. If it cannot, consider reinstalling (rebuilding) OpenCV with CUDA support.
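If you want a quicker programmatic check, a small sketch along these lines (plain cv2 calls, nothing specific to cvlib) filters the CUDA-related lines out of that build summary:
import cv2

# Print only the CUDA-related lines of the build summary; "NO" next to
# NVIDIA CUDA means this OpenCV build has no GPU support and would need
# to be rebuilt/reinstalled with CUDA enabled.
for line in cv2.getBuildInformation().splitlines():
    if 'CUDA' in line:
        print(line.strip())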

I got it! The problem came from the fact that I was creating a new Net object on every iteration.
Here is the related issue on github where you can follow it: https://github.com/opencv/opencv/issues/16348
With a custom function it now runs at ~60 fps. Be aware that cvlib may not be designed for real-time computation.
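For reference, a minimal sketch of that kind of custom function, assuming local yolov3.cfg and yolov3.weights files (the file paths here are placeholders). The point is to build the network once, select the CUDA backend, and reuse the same object for every frame:
import cv2

# Build the network once, outside the per-frame loop.
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')  # placeholder paths
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

# Reuse the same model for every image/frame instead of recreating it.
img = cv2.imread('image.jpg')  # placeholder image
class_ids, scores, boxes = model.detect(img, confThreshold=0.5, nmsThreshold=0.4)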

Another suggestion: re-run the CMake configuration step and check its output:
workon opencv_cuda
cd opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE
and share the result. The configuration summary should show whether CUDA support is enabled.

Related

Google Colab + Pytorch: RuntimeError: No CUDA GPUs are available

Hello, I am trying to run this PyTorch application, which is a CNN for classifying dog and cat pictures.
I am using Google Colab for the GPU, but for some reason I get RuntimeError: No CUDA GPUs are available. This is weird because I specifically enabled the GPU in the Colab settings and then tested whether it was available with torch.cuda.is_available(), which returned True.
The weirdest thing is that this error doesn't appear until about 1.5 minutes after I run the code. You would think that if it couldn't detect the GPU, it would notify me sooner.
I've had no problems using the Colab GPU when running other Pytorch applications using the exact same notebook. I can only imagine it's a problem with this specific code, but the returned error is so bizarre that I had to ask on StackOverflow to make sure.
Try again; this is usually a transient issue when there are no CUDA GPUs available.
Recently I had a similar problem: in Colab, print(torch.cuda.is_available()) returned True, but inside one specific project it returned False. Both of our projects have a line similar to os.environ["CUDA_VISIBLE_DEVICES"]. After commenting that line out, I found that the GPU could be used.
(My English is poor; I used Google Translate.)
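In other words, the culprit is a line that restricts which devices PyTorch can see. A minimal sketch of the check described above (the exact value set in the original project is not shown):
import os
import torch

# A line like os.environ["CUDA_VISIBLE_DEVICES"] = "" (or a wrong device index)
# hides the GPU from PyTorch even though Colab has one attached.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # inspect what the project set
print(torch.cuda.is_available())               # becomes True again once that line is removed or fixed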

ImageDataGenerator.flow_from_directory() segfaulting with no augmentation

I'm trying to construct an autoencoder for ultrasound images, and am unable to use ImageDataGenerator.flow_from_directory() to provide train/test datasets due to segfault on call to the method. No augmentation is being used, which should only result in the original images being provided by the generator.
The source images are in TIFF format, so I first tried converting them to JPG and PNG, thinking that maybe PIL was faulting on the encoding; no difference. I have tried converting to different color modes (grayscale, RGB, RGBA) with no change in behavior. I have stripped the code down to the bare minimum, taking defaults for nearly all function params, and still get a segfault on the call in both debug and full runs.
# Directory below contains a single subdirectory "input" containing 5635 TIFF images
from keras.preprocessing.image import ImageDataGenerator

print('Create train_gen')
train_gen = ImageDataGenerator().flow_from_directory(
    directory=r'/data/ultrasound-nerve-segmentation/train/',
    class_mode='input'
)
print('Created train_gen')
The expected output is a report of 5635 images found in one class "input" and both debug messages printed, with a usable generator for Model.fit_generator().
Actual output:
Using TensorFlow backend.
Create train_gen
Found 5635 images belonging to 1 classes.
Segmentation fault
Is there something I'm doing above that could be causing the problem? According to every scrap of sample code I can find, it looks like it should be working.
Environment is:
Ubuntu 16.04 LTS
CUDA 10.1
tensorflow-gpu 1.14
Keras 2.2.4
Python 3.7.2
Thanks for any help you can provide!
OK, so I haven't pinned down specifically why it is segfaulting, but it appears to be related to the virtualenv it runs under. I was apparently using a JupyterHub environment, which seems to misbehave even when run from an ssh session (vs. from within JupyterHub consoles). Once I created a whole new standalone virtualenv with only the TF + Keras packages installed, it runs just fine.

Object detection slow and does not use GPU

I need to use Tensorflow Object Detection API to make some classification connected with recognition.
My problem is that using the API for detection with a pretrained COCO model takes too much time and clearly does not use the GPU. I checked my tensorflow-gpu installation with different scripts and it works fine, but when I use this model for detection I only see an increase in CPU usage.
I checked different versions of TensorFlow (1.12, 1.14) and different combinations of CUDA Toolkit (9.0, 10.0) and cuDNN (7.4.2, 7.5.1, 7.6.1), but it is all the same. I also tried it on both Windows 7 and Ubuntu 16.04, with no difference. My project, however, requires a much faster detection time.
System information:
System: Windows 7, Ubuntu 16.04
Tensorflow: 1.12, 1.14
GPU: GTX 970
Run the following Python code; if it detects the GPU then you can use the GPU for training, otherwise there is some problem:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
One more thing: just because your CPU is being utilized does not mean the GPU is not at work. The CPU will always be busy; the GPU should also spike when you are training.
Paste the output of the above code in the comments if you are not sure about it.
Edit: After a chat with the OP in the comments, I saw the code in question; it uses a pretrained model, so no training is happening here. You are running an existing model, not training a new one, so no GPU is being used.
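If you want to confirm where the ops actually run during inference, one option with the TF 1.x versions listed above is device-placement logging. This is a minimal sketch with a toy graph, not the Object Detection API itself:
import tensorflow as tf

# Log the device chosen for every op; the console shows GPU:0 if the
# CUDA build and drivers are being picked up correctly.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True  # don't reserve all GPU memory up front

with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    print(sess.run(a + b))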

Using PyTorch on AWS Lambda

Has anyone had any luck using PyTorch on AWS Lambda for feature extraction from images, or just using the framework at all? I finally got PyTorch, numpy, and pillow zipped in a folder under the uncompressed size limit (which is actually around 262 MB), but I had to build PyTorch from source to do this. The problem I am having now is that Lambda runs a very old version of gcc (4.8.3), which is very buggy and missing whole header files altogether. I believe the PyTorch docs state you should be using at least gcc 7 or later, but I'm hoping someone may have found a way around this? I built the source using gcc 7.5, but when I tried to import torch, Lambda obviously used its installed version of 4.8.3, causing an error on import: Floating point exception (core dumped), which stems from the old version of gcc. Is there a possible solution around this? I've been at this for a day and a half now, so any help would be great. I think the bottom line is I am facing this similar issue. Better yet, does anyone have a PyTorch Lambda layer I could use?
I was able to use the layers below to run PyTorch on AWS Lambda:
arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:1 PyTorch 1.0.1
arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:2 PyTorch 1.1.0
I found these on the fastai production deployment page; thanks to Matt McClean.

"g++ not detected" while data set goes larger, is there any limit to matrix size in GPU?

I got this message while using Keras to train an RNN language model with a big 3D tensor (generated from a text, one-hot encoded, resulting in a shape of (165717, 25, 7631)):
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to
execute optimized C-implementations (for both CPU and GPU) and will default to
Python implementations. Performance will be severely degraded. To remove this
warning, set Theano flags cxx to an empty string.
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc
installation and try again.
But everything goes well when I limit the data set to a small size. So I wonder: does Theano or CUDA limit the matrix size?
Besides, is there a better way to do the one-hot representation? In the large 3D tensor, most elements are 0 because of the one-hot encoding. However, I didn't find a layer which accepts an index representation of words.
conda install mingw libpython
Make sure these are installed. I got this answer from another post, https://stackoverflow.com/a/31109547/3598832, which points to the manual.
Your Theano installation is not complete.
There are two problems mentioned in the question's pasted result:
WARNING (theano.configdefaults): g++ not detected ! Theano will be
unable to execute optimized C-implementations (for both CPU and GPU)
and will default to Python implementations. Performance will be
severely degraded. To remove this warning, set Theano flags cxx to an
empty string.
I suspect you're seeing this one even with small data sizes, but it's only a warning, so things continue running successfully (automatically falling back to the pure Python implementation).
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check
your nvcc installation and try again.
This is the one that happens when the data size increases, because that is when Theano tries to use the GPU.
Both messages indicate an incomplete Theano installation. The first indicates that you've not set up your C++ compiler properly. The second indicates that you've not set up CUDA properly. You need to follow the appropriate sections of the installation documentation to fix these problems. Note that simply doing pip install Theano is not sufficient when you want to use anything other than the pure Python implementations.
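Once the installation is complete, a quick sanity check (a sketch using Theano's configuration attributes) shows whether a C++ compiler was found and which device Theano will use:
import theano

print(theano.config.cxx)     # an empty string means no C++ compiler was detected
print(theano.config.device)  # 'cpu' unless the GPU backend has been configured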
