is tensorflow.train.threading.Thread no longer supported? - multithreading

I am playing with the code from https://github.com/Russell91/TensorBox but it fails to run on the GPU. Here are the results of the run:
TensorBox$ python train.py --hypes hypes/overfeat_rezoom.json --gpu 0 --logdir output
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Quadro M4000
major: 5 minor: 2 memoryClockRate (GHz) 0.7725
pciBusID 0000:02:00.0
Total memory: 7.93GiB
Free memory: 7.63GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M4000, pci bus id: 0000:02:00.0)
Traceback (most recent call last):
  File "train.py", line 537, in <module>
    main()
  File "train.py", line 534, in main
    train(H, test_images=[])
  File "train.py", line 457, in train
    t = tf.train.threading.Thread(target=thread_loop,
AttributeError: 'module' object has no attribute 'threading'
Did TensorFlow change the way it handles threads? Otherwise, what am I doing wrong?

TL;DR: Change it to threading.Thread.
The name tensorflow.train.threading.Thread was never officially part of the API; it was accessible only because we weren't careful about which symbols were visible through the TensorFlow module. As we move towards a stable release, we're clamping down on such undocumented inclusions by using techniques like Python's __all__ to define the precise contents of each module.
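A minimal sketch of the fix in train.py (thread_loop here is a placeholder for TensorBox's own function; pass the same arguments as in the original call):

import threading  # standard-library module, instead of reaching through tf.train

def thread_loop():
    pass  # placeholder for TensorBox's enqueueing loop

t = threading.Thread(target=thread_loop)  # was: tf.train.threading.Thread(...)
t.daemon = True  # don't let the worker thread block interpreter exit
t.start()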

Related

Unable to run python package AIBenchmark on 13 inch m1 MacBook Pro

I got a new M1 MacBook Pro and installed the TensorFlow 2 build provided by Apple. I decided to do some testing on the MacBook, so I installed the Python 3 package "AIBenchmark"; the installation succeeded without any error messages. However, when I imported it, the following error message appeared.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/andrew/tensorflow_macos_venv/lib/python3.8/site-packages/ai_benchmark/__init__.py", line 5, in <module>
    from ai_benchmark.utils import *
  File "/Users/andrew/tensorflow_macos_venv/lib/python3.8/site-packages/ai_benchmark/utils.py", line 10, in <module>
    from PIL import Image
  File "/Users/andrew/tensorflow_macos_venv/lib/python3.8/site-packages/PIL/Image.py", line 94, in <module>
    from . import _imaging as core
ImportError: dlopen(/Users/andrew/tensorflow_macos_venv/lib/python3.8/site-packages/PIL/_imaging.cpython-38-darwin.so, 2): no suitable image found. Did find:
  /Users/andrew/tensorflow_macos_venv/lib/python3.8/site-packages/PIL/_imaging.cpython-38-darwin.so: mach-o, but wrong architecture
  /Users/andrew/tensorflow_macos_venv/lib/python3.8/site-packages/PIL/_imaging.cpython-38-darwin.so: mach-o, but wrong architecture
How do I solve this problem?
I'm guessing that since AI Benchmark has not been updated since Dec. 18, 2019, the library ships an Intel-architecture binary. I don't know the details of the Python 3.8 installation via the Xcode Command Line Tools, but I imagine it's a universal binary (both Intel and Apple arm64 architectures). My guess is that you'll have to run TensorFlow as an Intel binary, so I would try the following in the terminal: precede the command that starts your app with
arch -x86_64
Or, configure Terminal to run under Rosetta 2:
Right-click on Terminal in Finder
Get Info
Open with Rosetta
More suggestions here (no, it's not Python or TensorFlow related, but yes, it's relevant).
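A quick way to confirm which architecture the interpreter is actually running under (a minimal check using only the standard library; not from the original answer):

import platform

# 'arm64' means native Apple Silicon; 'x86_64' means running under Rosetta 2
print(platform.machine())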

While using GPU for PyTorch models, getting the CUDA error: Unknown error?

I am trying to use a pre-trained model with PyTorch. While loading the model onto the GPU, it gives the following error:
Traceback (most recent call last):
  File "model\vgg_model.py", line 45, in <module>
    vgg_model1 = VGGFeatureExtractor(True).double().to(device)
  File "C:\Users\myidi\Anaconda3\envs\openpose\lib\site-packages\torch\nn\modules\module.py", line 386, in to
    return self._apply(convert)
  File "C:\Users\myidi\Anaconda3\envs\openpose\lib\site-packages\torch\nn\modules\module.py", line 193, in _apply
    module._apply(fn)
  File "C:\Users\myidi\Anaconda3\envs\openpose\lib\site-packages\torch\nn\modules\module.py", line 193, in _apply
    module._apply(fn)
  File "C:\Users\myidi\Anaconda3\envs\openpose\lib\site-packages\torch\nn\modules\module.py", line 199, in _apply
    param.data = fn(param.data)
  File "C:\Users\myidi\Anaconda3\envs\openpose\lib\site-packages\torch\nn\modules\module.py", line 384, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "C:\Users\myidi\Anaconda3\envs\openpose\lib\site-packages\torch\cuda\__init__.py", line 163, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: unknown error
I have a Windows 10 laptop, an Nvidia 940M GPU, the latest PyTorch, and CUDA Toolkit 9.0 (also tried 10.0).
I have tried re-installing the GPU drivers, restarting my machine, and re-installing PyTorch, torchvision, and the CUDA Toolkit.
While using the following to see if PyTorch is detecting a GPU:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
I am getting the following output: device(type='cuda').
What could be the possible issues? I have tried the solution mentioned here: https://github.com/pytorch/pytorch/issues/20990 and the issue still persists.
I simply put torch.cuda.current_device() after import torch, but the issue still persists.
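For reference, the attempted workaround from pytorch/pytorch#20990 looks like this (a sketch of what the question describes, not a fix):

import torch

torch.cuda.current_device()  # force early CUDA initialization, per the linked issue

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)  # reports device(type='cuda'), yet moving the model still failed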
Strangely, this was fixed by using CUDA Toolkit 10.1. I don't know why the latest one is not the default on the PyTorch website in the section where they provide the commands to download the libraries.
I used the following command to install the libraries: conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
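After reinstalling, a quick way to confirm which toolkit PyTorch was built against (my own check, not from the original answer):

import torch

print(torch.version.cuda)         # should report 10.1 after the reinstall
print(torch.cuda.is_available())  # True when the driver and toolkit match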

TensorFlow-GPU 1.12.0 + CUDA 9 + cuDNN 7.4.1 on Windows throws DLL load failed. Same bundle works on Ubuntu

As in the title, I have:
CUDA 9.0.176
cuDNN v7.4.1
TF-GPU 1.12
Python 3.6.6
I can confirm that the paths to both bin and lib for CUDA are in PATH, and there is also a path to the cuDNN folder (not sure why that is needed if the same files were copied into the CUDA folder, i.e. the NVIDIA one, not a custom one, as per the tutorials).
>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\OtherCode\Teest1\test1\lib\site-packages\tensorflow\__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "D:\OtherCode\Teest1\test1\lib\site-packages\tensorflow\python\__init__.py", line 52, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "D:\OtherCode\Teest1\test1\lib\site-packages\tensorflow\core\framework\graph_pb2.py", line 6, in <module>
    from google.protobuf import descriptor as _descriptor
  File "D:\OtherCode\Teest1\test1\lib\site-packages\google\protobuf\descriptor.py", line 47, in <module>
    from google.protobuf.pyext import _message
ImportError: DLL load failed: The specified procedure could not be found.
I've tried many other bundles after that (using the TensorFlow table of tested configurations), but none of them work.
I have all the files in System32 that I've found information about, and I have VS 2017 / 2019 installed plus the 2015 compiler.
Nothing works.
Is there anything on Windows that I need to be aware of?
The same bundle works fine on Ubuntu pretty much instantly; on Windows it fails terribly.
I would prefer to use Windows, as there are a number of issues with Ubuntu (most of my hardware is not supported on Ubuntu, and I am using Visual Studio for most of my projects).
I've tried CUDA 8.0, CUDA 9.0, CUDA 9.2, and CUDA 10 (with the matching cuDNN for each version and different TensorFlow builds according to the table); however, it looks like there is something else missing.
Unfortunately, for some reason this combo won't work.
Other setups with CUDA 10 worked.
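One way to narrow down this kind of "DLL load failed" error is to try loading the individual CUDA DLLs directly (a diagnostic sketch, not from the original thread; the file names assume standard CUDA 9.0 and cuDNN 7 installs):

import ctypes

# Each call raises OSError if Windows cannot resolve the DLL via PATH.
for name in ('cudart64_90.dll', 'cublas64_90.dll', 'cudnn64_7.dll'):
    try:
        ctypes.WinDLL(name)
        print(name, 'OK')
    except OSError as exc:
        print(name, 'FAILED:', exc)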

Dlib (GPU supported) is not working properly, not sure?

My System Configuration:
Windows 10,
Nvidia 940mx 2GB GDDR5 GPU, 8GB RAM, i5 8th generation.
Software installed:
CUDA toolkit 9.0
cuDNN 7.1.4
I have successfully installed dlib with GPU support after installing the above requirements, using the commands below:
$ git clone https://github.com/davisking/dlib.git
$ cd dlib
$ python setup.py install --clean
As stated by dlib's creator, Davis King, I executed the following in my Jupyter notebook:
import dlib
dlib.DLIB_USE_CUDA
[Out 17]: True
This verifies that my dlib is using the GPU through CUDA, and that all libraries that depend on dlib, like Adam Geitgey's face_recognition, will also use CUDA acceleration.
So I was running code to train on images, so that I can recognize faces in a video, using the code below:
import face_recognition
img = face_recognition.load_image_file('./training images/John_Cena/Gifts-John-Cena-Fans.jpg')
locations = face_recognition.face_locations(img, model='cnn')
It prints the error as stated below:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Tushar\Anaconda3\lib\site-packages\face_recognition\api.py", line 116, in face_locations
    return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
  File "C:\Users\Tushar\Anaconda3\lib\site-packages\face_recognition\api.py", line 100, in _raw_face_locations
    return cnn_face_detector(img, number_of_times_to_upsample)
RuntimeError: Error while calling cudaMalloc(&data, n) in file C:\Users\Tushar\Desktop\face_recognition\dlib\dlib\cuda\cuda_data_ptr.cpp:28. code: 2, reason: out of memory
After trying again with another image:
img = face_recognition.load_image_file('./training images/John_Cena/Images.jpg')
locations = face_recognition.face_locations(img, model='cnn')
It gave error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Tushar\Anaconda3\lib\site-packages\face_recognition\api.py", line 116, in face_locations
    return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
  File "C:\Users\Tushar\Anaconda3\lib\site-packages\face_recognition\api.py", line 100, in _raw_face_locations
    return cnn_face_detector(img, number_of_times_to_upsample)
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file C:\Users\Tushar\Desktop\face_recognition\dlib\dlib\cuda\cudnn_dlibapi.cpp:1007. code: 3, reason: CUDNN_STATUS_BAD_PARAM
Then I restarted the Jupyter kernel and tried once again with a different image:
face_recognition.face_locations(face_recognition.load_image_file('./training images/John_Cena/images.jpg'), model='cnn')
[Out] : [(21, 136, 61, 97)]
This time it gave the coordinates of the location of the face in the image.
So this keeps happening: for some images it runs fine, and for others it gives one of the two errors above.
With model='hog' it runs fine on the same images that fail with model='cnn'.
So when I try to train the classifier on images in different folders using a for loop:
from face_recognition.face_detection_cli import image_files_in_folder
import os
import os.path
import face_recognition

for class_dir in os.listdir('./training images/'):
    count = 0
    for img_path in image_files_in_folder(os.path.join('./training images/', class_dir)):
        count += 1
        image = face_recognition.load_image_file(img_path)
        face_bounding_boxes = face_recognition.face_locations(image, model='cnn')
        print(face_bounding_boxes, count)
It always stops after processing some images, showing one of the two errors above.
I tried every possible way to install dlib with GPU support, the CUDA 9.0 toolkit, and cuDNN 7.1.4. They are all working fine!
I don't know what the real issue is here. Is the 2 GB memory of the graphics card too small, or is it something else?
I really want to use the GPU's power to make recognition in video faster.
I found that face_encodings quickly raises an IndexError if the face is slightly rotated and not straight. Even though it found the face locations with coordinates, when I supplied the cropped image with those coordinates to face_encodings, it failed with the index error...
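Given that the failures above are CUDA out-of-memory errors on a 2 GB card, one possible mitigation (my own sketch, not from the original post; safe_face_locations and max_side are made-up names) is to downscale large images before running the cnn detector, and fall back to hog if CUDA still fails:

import numpy as np
import PIL.Image
import face_recognition

def safe_face_locations(path, max_side=1024):
    # Shrink the image so its longest side is at most max_side pixels,
    # which bounds the GPU memory the cnn detector needs.
    img = PIL.Image.open(path).convert('RGB')
    scale = max(img.size) / max_side
    if scale > 1:
        img = img.resize((int(img.width / scale), int(img.height / scale)))
    arr = np.array(img)
    try:
        return face_recognition.face_locations(arr, model='cnn')
    except RuntimeError:  # e.g. cudaMalloc ... out of memory
        return face_recognition.face_locations(arr, model='hog')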

Problems with Theano installation using CUDA when using non-root user

I have followed the instructions to install Theano and GPUArray from source (git versions), into the system folders (not as a user). The GPUArray tests run just fine, without errors.
The problem is that Theano only works with the GPU if I run as root. Running the example to test the GPU:
(python35) rll@ip-30-92:~$ THEANO_FLAGS=device=cuda python temp.py
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/__init__.py", line 179, in <module>
    use(config.device)
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/__init__.py", line 166, in use
    init_dev(device, preallocate=preallocate)
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/__init__.py", line 73, in init_dev
    context.cudnn_handle = dnn._make_handle(context)
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/dnn.py", line 83, in _make_handle
    cudnn = _dnn_lib()
  File "/usr/local/lib/python3.5/dist-packages/theano/gpuarray/dnn.py", line 70, in _dnn_lib
    raise RuntimeError('Could not find cudnn library (looked for v5* or v6*)')
RuntimeError: Could not find cudnn library (looked for v5* or v6*)
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 3.201078 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu
If run as root it works, although there is still an error, maybe related to cuDNN not being able to identify the devices:
(python35) rll@ip-30-92:~$ sudo THEANO_FLAGS=device=cuda python3 temp.py
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
b'/tmp/try_flags_bg7m03hd.c:4:19: fatal error: cudnn.h: No such file or directory\ncompilation terminated.\n'
Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float64, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.390976 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the gpu
There are two Titan X cards in this machine, and they work fine with TensorFlow. I am not using a .theanorc file, but I have set both:
(python35) rll@ip-30-92:~$ echo $LD_LIBRARY_PATH
/usr/local/cuda-8.0/lib64
(python35) rll@ip-30-92:~$ echo $CUDA_ROOT
/usr/local/cuda-8.0/
I did everything as per the instructions, and despite some warnings there were no errors.
I don't think it is a permissions error on the compile dir .theano, because the behaviour is the same if I chown the .theano dir.
How can I fix this?
I have finally found the problem. There is a step missing from the instructions to install Theano: you have to check whether LIBRARY_PATH is set and add the CUDA libraries to it (note that this is not the same as LD_LIBRARY_PATH).
If it is not set, just export it and you will be good to go. For a temporary fix:
export LIBRARY_PATH=/usr/local/cuda-8.0/lib64
How to persist it may depend on the system, but in general you can add a line to /etc/environment:
LIBRARY_PATH=/usr/local/cuda-8.0/lib64
This fixed the message when running as root, and fixed CUDA for the regular user.
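A quick way to confirm both variables are visible to the non-root user's Python process (my own check, not from the original answer):

import os

print(os.environ.get('LIBRARY_PATH'))     # used at compile/link time; should include /usr/local/cuda-8.0/lib64
print(os.environ.get('LD_LIBRARY_PATH'))  # used by the runtime loader; set separately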
