Executing keras aborts RStudio session

I am trying to install keras on a laptop with R and RStudio previously installed.
I first installed Anaconda3, following the instructions here: https://docs.anaconda.com/anaconda/install/linux/.
Second, I followed these instructions to install keras and tensorflow in RStudio, https://github.com/FrancisArgnR/Guide-Keras-R (the CPU instructions), but specifying the conda method:
install.packages('devtools')
devtools::install_github("rstudio/keras")
library(keras)
install_keras(method = c("conda"))
However, whenever I try to run a keras function (e.g. data <- dataset_mnist()), the RStudio session aborts. When I use R in the terminal, I get this error:
> library(keras)
> data<-dataset_mnist()
*** caught illegal operation ***
address 0x7fb3e50fe820, cause 'illegal operand'
Traceback:
1: py_module_import(module, convert = convert)
2: import(module)
3: doTryCatch(return(expr), name, parentenv, handler)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
6: tryCatch(import(module), error = clear_error_handler())
7: py_resolve_module_proxy(x)
8: `$.python.builtin.module`(keras, "datasets")
9: keras$datasets
10: dataset_mnist()
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
I found a similar error previously reported here, https://github.com/rstudio/tensorflow/issues/228, although the "Traceback" is not exactly the same. I tried specifying an earlier version of Tensorflow as suggested, but the error persists.
install_keras(method = c("conda"), tensorflow = "1.5")
I tried several tensorflow versions, from 1.9 down to 1.0, but I obtain the same behaviour.
I also tried installing keras from CRAN with install.packages("keras") and repeating the whole procedure, but nothing changed. If I do not specify the conda method, the same thing happens, but in addition I obtain the following errors when running install_keras():
ERROR: spyder 3.3.6 requires pyqt5<5.13; python_version >= "3", which is not installed.
ERROR: spyder 3.3.6 requires pyqtwebengine<5.13; python_version >= "3", which is not installed.
ERROR: astroid 2.3.1 requires typed-ast<1.5,>=1.4.0; implementation_name == "cpython" and python_version < "3.8", which is not installed.
ERROR: astroid 2.3.1 has requirement six==1.12, but you'll have six 1.13.0 which is incompatible.
Some details in case they can be useful:
R version 3.6.1, Platform: x86_64-pc-linux-gnu (64-bit)
RStudio: Version 1.2.5019
OS: Ubuntu 19.10
Processor: Intel® Celeron® CPU N3450 @ 1.10GHz × 4
5.6 GiB RAM

It seems that your problem is caused by your CPU's lack of support for the AVX instruction set.
First, you should know that when you run install_keras(method = "conda"), a new environment (usually named r-tensorflow) is created, and all the Python libraries required to run Tensorflow from R are installed into it. When you run Keras code in R, R calls the Tensorflow library of this environment.
Most pre-built binaries of Tensorflow are compiled for CPUs that support the AVX instruction set, since these instructions allow a drastic speed-up of certain floating-point operations. Intel started supporting these instructions in 2011. Although your CPU seems to have been produced in 2016, it is a Celeron, and as we can see here:
Not all CPUs from the listed families support AVX. Generally, CPUs with the commercial denomination "Core i3/i5/i7" support them, whereas "Pentium" and "Celeron" CPUs don't.
You can check this by running lscpu | grep avx in your Linux terminal. If nothing is shown, your CPU does not support these instructions.
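Equivalently, a minimal Python sketch of the same check (Linux only, since it reads /proc/cpuinfo):

# Linux lists the CPU's supported instruction sets in the "flags"
# field of /proc/cpuinfo; the token "avx" appears there only when
# the CPU supports AVX.
with open("/proc/cpuinfo") as f:
    tokens = f.read().split()
print("AVX supported:", "avx" in tokens)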
Under these circumstances you have two options:
Build Tensorflow from source, so that the code is compiled for your CPU's features. You can find more info here
Find a pre-built binary of Tensorflow compiled for CPUs without AVX support. Something like this

Related

Stable Baselines PPO algorithm crashes due to RuntimeError: Calling torch.geqrf on a CPU tensor requires compiling PyTorch with LAPACK

I tried to run a PPO algorithm from the stable-baselines3 library on a basic gym environment on my local CPU, but I get the following RuntimeError:
RuntimeError: Calling torch.geqrf on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
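Roughly, the script is just the standard quick-start example; a minimal sketch (the environment name is illustrative, and the torch.geqrf call presumably comes from the policy network's default orthogonal weight initialization):

import gym
from stable_baselines3 import PPO

# Constructing PPO builds the policy network; its default orthogonal
# weight initialization performs a QR decomposition, which is where a
# LAPACK-less PyTorch build can fail with the torch.geqrf error.
env = gym.make("CartPole-v1")  # illustrative environment
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)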
I'm using a conda environment on a Windows machine, with the following installations:
pytorch 1.12.1 cpu_py39h5e1f01c_0
lapack 3.9.0 netlib conda-forge
Since I'm quite new to Python, I have no clue how to resolve this issue, and a web search didn't give any proper instructions for this specific problem.
I tried uninstalling PyTorch in my Anaconda prompt, but this would remove a lot of packages, which made me afraid of breaking something. Hence, I'm lost as to what else to do in order to get a PyTorch build with LAPACK support...
Any help would be appreciated, Cheers.

Not compiled with GPU support in detectron2

Detectron2 raised this error while running Faster R-CNN; judging from the error, the RPN part of the network caused it.
The GPU should be working, because the backbone part did not report an error.
How can I solve this problem?
The reason for this error is a mismatch between the machine's CUDA version and the CUDA version PyTorch was built against, e.g. 10.1 versus 10.0. So you should check whether the PyTorch CUDA version is the same as the machine's CUDA version.
"nvcc not found" or "Not compiled with GPU support" or "Detectron2 CUDA Compiler: not available".
CUDA is not found when building detectron2.
You should make sure
import torch
from torch.utils.cpp_extension import CUDA_HOME
print(torch.cuda.is_available(), CUDA_HOME)
prints True and a directory with cuda at the time you build detectron2.
Most models can run inference (but not training) without GPU support. To use CPUs, set MODEL.DEVICE='cpu' in the config.
For an in-depth solution, you may visit this link:
https://detectron2.readthedocs.io/en/latest/tutorials/install.html#common-installation-issues
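As a hedged sketch of the CPU fallback mentioned above (the model zoo config name is just an illustrative choice, not from the question):

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# Load an illustrative detection config and its pretrained weights.
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"  # run inference on the CPU instead of CUDA
predictor = DefaultPredictor(cfg)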

Pytorch list of supported GPU hardware for each release (Ubuntu 18.04)

I have an old GPU and pytorch says it is too old to be supported:
Found GPU0 GeForce GTX 670 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.
First question: what is this 3.0 / 3.5 referring to? Clearly not the cuda, nvidia driver, or pytorch versions.
Secondly, I know I can build Pytorch from source in order to get broader support, however it's unclear what hardware pytorch supports when built from source, or even when installed with pip. I'm at the point where the pytorch version I likely need requires python2 and cuda 9, so I'm spending solid time changing a bunch of software versions without knowing whether any of it will work.
Also, if I were to get a newer GPU, I don't know whether it is currently supported.
Is there any way to get a list of the NVIDIA hardware supported by pytorch? Ideally for each release, but at least for the current one?
I run an Nvidia GTX 670 with Nvidia driver 430.50. I changed from cuda 10 to 9, but seeing that I might have to build pytorch 0.3.1, I think cuda 8 is needed, since when running python setup.py install in v0.3.1 I get:
CMakeFiles/THC.dir/build.make:560: recipe for target 'CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o' failed
make[2]: *** [CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/THC.dir/all' failed
make[1]: *** [CMakeFiles/THC.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
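Regarding the first question above: the 3.0 / 3.5 numbers are the device's CUDA compute capability (cf. https://developer.nvidia.com/cuda-gpus). A hedged sketch of how to query what pytorch reports for a device:

import torch

# The "cuda capability" in the warning is the GPU's CUDA compute
# capability, reported by PyTorch as a (major, minor) pair.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), "compute capability", f"{major}.{minor}")
else:
    print("No CUDA device visible to this PyTorch build.")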

Bazel returns a rule linking error when building Tensorflow from source

I'm trying to get the InfoGAN code from GitHub to run, but when I try to build Tensorflow from the recommended source code, an error keeps appearing. The build runs fine for about 30 minutes and then crashes (even when sudo is used).
The error is as follows:
ERROR: /home/socialab/Desktop/tensorflow-master/tensorflow/python/BUILD:2436:1: Linking of rule '//tensorflow/python:gen_stateless_random_ops_py_wrappers_cc' failed (Exit 1)
bazel-out/host/bin/tensorflow/core/libop_gen_lib.a(op_gen_lib.o): In function `google::protobuf::internal::ArenaStringPtr::CreateInstance(google::protobuf::Arena*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*)':
op_gen_lib.cc:(.text._ZN6google8protobuf8internal14ArenaStringPtr14CreateInstanceEPNS0_5ArenaEPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZN6google8protobuf8internal14ArenaStringPtr14CreateInstanceEPNS0_5ArenaEPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x36): undefined reference to `google::protobuf::internal::ArenaImpl::AllocateAlignedAndAddCleanup(unsigned long, void (*)(void*))'
When configuring Tensorflow I say no to almost everything (except for CUDA), like so:
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.29.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:
/usr/lib/python3/dist-packages
/usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
/usr/local/lib/python3.6/dist-packages
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Found CUDA 10.1 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Found cuDNN 7 in:
/usr/lib/x86_64-linux-gnu
/usr/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 7.5]: 7.5
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
I'm using the recommended Bazel version (0.29.1), CUDA 10.1, cuDNN 7.6.4, and the 430.50 drivers.
System specs:
Ubuntu 18.04
RTX 2080 Ti
i5-8600K
8GB RAM
Any help is much appreciated. Thanks!

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

I have installed cuda-8.0 and cudnn 5.1 on CentOS. Then, when importing tensorflow (python 3.6), it gives the error above.
I have already set the environment variables below in /etc/profile. Has anyone else run into this kind of problem?
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Also, what confuses me is that when I run nvcc -V, it shows
Cuda compilation tools, release 8.0, V8.0.61
However, when I run ./deviceQuery from the folder /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery, it shows for device 0 ("Tesla M40"):
CUDA Driver Version / Runtime Version 9.1 / 8.0
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla M40
Check your version of tensorflow using pip3 list | grep tensorflow. If it is tensorflow-gpu 1.5.0, then the required CUDA version is 9.0 with cuDNN v7.
Look into the following link for more details:
https://github.com/tensorflow/tensorflow/releases
The Tensorflow installation guide needs to be updated.
I had the same problem. Tensorflow 1.5.0 is precompiled against CUDA 9.0 (Sept 2017), which is now outdated.
The newest CUDA version is CUDA 9.1 (Dec 2017), and sudo pip install tensorflow-gpu will not work with CUDA 9.1. There are two solutions to the problem:
1.) Install CUDA 9.0 next to CUDA 9.1 (this worked for me)
2.) Build Tensorflow by yourself from the git source code
Either way, do not forget to add the PATH variables for your CUDA installation to your operating system; otherwise you will receive the error message stated in the question from your Python interpreter.
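A quick, hedged way to check from Python whether the dynamic loader can now find the library the import fails on:

import ctypes

# Try to load the exact library TensorFlow complains about; if this
# fails, LD_LIBRARY_PATH (or the CUDA install) still isn't right.
try:
    ctypes.CDLL("libcublas.so.9.0")
    print("libcublas.so.9.0 found on the loader path")
except OSError as err:
    print("libcublas.so.9.0 not found -- check LD_LIBRARY_PATH:", err)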
