Run TensorFlow in Qubole - apache-spark

I am trying to train an LSTM using a Spark Python notebook in Qubole. When I try to fit the model, I receive the error below.
I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Why does this error occur, and how can I overcome it?

This is not an error, but a warning. The pre-built TensorFlow binaries are not compiled with various CPU instruction-set extensions, because not every machine has them. When TensorFlow starts, it checks which extensions are available on your machine and which ones the binary was compiled with; if your machine has extensions the binary was not compiled with, it lets you know. If you do a lot of CPU computation and care about peak performance, you can build TensorFlow yourself with the extensions present on your machine enabled.
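If the warning is just noise in a notebook, it can also be silenced by raising TensorFlow's C++ log level. A minimal sketch (the variable must be set before the first `import tensorflow`):

```python
import os

# TF_CPP_MIN_LOG_LEVEL controls TensorFlow's C++ logging:
# '0' = all messages, '1' = filter out INFO,
# '2' = filter out INFO and WARNING, '3' = filter out everything.
# It must be set before tensorflow is first imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# import tensorflow as tf  # the CPU-feature warning is now suppressed
```

Note this only hides the message; it does not enable the missing instruction sets.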

Related

Issue Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

I hit this issue when running my Python file on Linux.
I searched for answers on Google and found suggestions like the code below:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
With that the program runs, but without any output information. Actually, '2' suppresses INFO and WARNING messages, while '1' suppresses only INFO, leaving errors and normal output visible.
Because I need to use the GPU, my original code is:
os.environ["CUDA_VISIBLE_DEVICES"]="0"
But if I keep this line, my output is an error message like the one in the title.
What can I do? I need to use the GPU, and the code runs fine in Colab, which indicates the code itself has no problem.
Some people even suggested uninstalling TensorBoard... that must be the wrong way.
Or should I install tensorflow-gpu instead of tensorflow in my virtual environment? When I use the tensorflow-gpu version, the error is a core dump.
If forcing os.environ["CUDA_VISIBLE_DEVICES"]="0" doesn't work, it means your tensorflow-gpu installation did not succeed. You must ensure you have the right combination of TensorFlow + CUDA + cuDNN versions. That is why you get the error: because of improper versions or installation, TF falls back to the CPU.
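The usual culprit is a version mismatch among the three components. As a rough sanity check, the officially tested combinations can be compared against what is installed; the table below is a small illustrative subset of TensorFlow's published build matrix, not an exhaustive list:

```python
# Illustrative subset of TensorFlow's tested build configurations
# (see the "tested build configurations" table in the TF install docs).
TESTED_COMBOS = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1":  {"cuda": "10.1", "cudnn": "7.6"},
    "2.4":  {"cuda": "11.0", "cudnn": "8.0"},
}

def check_combo(tf_version, cuda_version, cudnn_version):
    """Return True if the trio matches a known tested combination."""
    combo = TESTED_COMBOS.get(tf_version)
    if combo is None:
        return False
    return combo["cuda"] == cuda_version and combo["cudnn"] == cudnn_version

print(check_combo("2.4", "11.0", "8.0"))  # matching trio
print(check_combo("2.4", "10.1", "8.0"))  # mismatched CUDA: TF falls back to CPU
```

If your installed trio is not a tested combination, reinstalling matching versions is usually faster than debugging further.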

PyTorch C++ is slow on Ubuntu aarch64

I'm trying to compare Windows x64 and Ubuntu aarch64 of our code, that uses PyTorch C++ 1.5.1.
While most steps of our code run between the same speed and twice as fast on Ubuntu as on Windows, all calls to PyTorch run 3 times slower.
I have tried to find information on whether PyTorch for arm64 is as optimized as it is on x64. Since there is no MKL on aarch64, PyTorch falls back to Eigen and OpenBLAS. Could that be the reason for this slowdown?
Thank you,
Gérald
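One way to narrow this down is to print the compile-time configuration of each PyTorch install and compare the BLAS backends on the two platforms. A diagnostic sketch (guarded, since `torch` may not be importable in every environment):

```python
# Print PyTorch's compile-time configuration to see which BLAS/LAPACK
# backend (MKL, Eigen, OpenBLAS) each platform's build was linked against.
try:
    import torch
    print(torch.__version__)
    print(torch.__config__.show())  # includes BLAS_INFO, LAPACK_INFO, build flags
except ImportError:
    print("torch is not installed in this environment")
```

If the aarch64 build reports OpenBLAS where the x64 build reports MKL, the BLAS backend is a plausible source of the gap.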

TensorFlow installation on Windows 10, Intel Core i3 processor with NVIDIA GeForce 920M GPU

I am trying to install using the instructions here.
I have a compatible NVIDIA GeForce 920M GPU, and both the cuDNN toolkit and the driver are installed on the system.
When I run the Python step to test the TensorFlow installation on the GPU:
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
The output I get is:
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2017-05-28 09:38:01.349304: W c:\tf_jenkins\home\workspace\release-
win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE instructions, but these
are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.349459: W c:\tf_jenkins\home\workspace\release-
win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE2 instructions, but these
are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.349583: W c:\tf_jenkins\home\workspace\release-
win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE3 instructions, but these
are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.349705: W c:\tf_jenkins\home\workspace\release-
win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these
are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.354312: I c:\tf_jenkins\home\workspace\release-
win\device\cpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:257] Device mapping:
My pinpointed questions to you are:
Why is the NVIDIA GPU not getting detected when all libraries and toolkits are installed without errors?
Why is the output saying "The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations", and how do I rectify this?
Please give a step-by-step solution. None other.
Thanks in advance for your answers.
Okay, my pinpointed answers:
Why is the NVIDIA GPU not getting detected when all libraries and toolkits are installed without errors?
Ans: Go to the NVIDIA GeForce application and update the driver; then the libraries will start getting detected and the errors will go away.
Why is the output saying "The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations", and how do I rectify this?
Ans: Once you have done the update above, the TensorFlow library will stop giving the SSE warnings. This will resolve your issue.
Please give a step-by-step solution. None other.
Ans: The steps given above should work; they worked for me.

Theano installation on Windows 64-bit

I'm new to Python and the Theano library. I want to install Theano on Windows 7 64-bit. I have a display adapter:
Intel(R) HD Graphics 3000, which is not compatible with NVIDIA.
My QUESTIONS:
1. Is it obligatory to install CUDA so I can use Theano?
2. Even if I have an Ubuntu operating system with the same display adapter, is CUDA still mandatory?
Any help!
Thanks
You do not need CUDA to run Theano.
Theano can run on either CPU or GPU. If you want to run on the GPU you must (currently) use CUDA, which means you must be using an NVIDIA display adapter. Without CUDA/NVIDIA you must run on the CPU.
There is no disadvantage to running on the CPU other than speed: Theano can be much faster on a GPU, but everything that runs on a GPU will also run on a CPU as long as it has been coded generically (the default and standard approach for Theano code).
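The backend is selected through the THEANO_FLAGS environment variable (or a .theanorc file). A minimal CPU-only sketch, where the flag must be set before the first `import theano`:

```python
import os

# Select the CPU backend explicitly; floatX=float32 keeps the code
# portable to a GPU later, since GPUs work with float32.
# Must be set before theano is first imported.
os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32"

# import theano  # will now initialize on the CPU without requiring CUDA
```

Switching the same code to a GPU later is then only a matter of changing the device flag, not the model code.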

Writing CUDA program for more than one GPU

I have more than one GPU and want to execute my kernels on them. Is there an API or software that can schedule/manage GPU resources dynamically, utilizing the resources of all available GPUs for the program?
A utility that periodically reports the available resources, so my program can launch a matching number of threads on the GPUs.
Secondly, I am using Windows + Visual Studio for my development. I have read that CUDA is supported on Linux. What changes do I need to make in my program?
I have more than one GPU and want to execute my kernels on them. Is there an API or software that can schedule/manage GPU resources dynamically?
For arbitrary kernels that you write, there is no API that I am aware of (certainly no CUDA API) that "automatically" makes use of multiple GPUs. Today's multi-GPU aware programs often use a strategy like this:
detect how many GPUs are available
partition the data set into chunks based on the number of GPUs available
successively transfer the chunks to each GPU, and launch the computation kernel on each GPU, switching GPUs using cudaSetDevice().
A program that follows the above approach, approximately, is the cuda simpleMultiGPU sample code. Once you have worked out the methodology for 2 GPUs, it's not much additional effort to go to 4 or 8 GPUs. This of course assumes your work is already separable and the data/algorithm partitioning work is "done".
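The partitioning step above is plain host-side logic. A language-agnostic sketch in Python (the device switching and kernel launches are left as comments, since they depend on your actual CUDA code; in C++ they would be cudaSetDevice() plus a kernel launch per chunk):

```python
def partition(data, num_gpus):
    """Split `data` into num_gpus contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), num_gpus)
    chunks, start = [], 0
    for i in range(num_gpus):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(data[start:start + size])
        start += size
    return chunks

num_gpus = 3                      # in CUDA C++: from cudaGetDeviceCount()
data = list(range(10))
for device, chunk in enumerate(partition(data, num_gpus)):
    # cudaSetDevice(device); copy `chunk` to that GPU; launch the kernel
    print(device, chunk)
```

Using asynchronous copies and streams per device lets the transfers and kernels on different GPUs overlap, which is what the simpleMultiGPU sample demonstrates.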
I think this is an area of active research in many places, so if you do a google search you may turn up papers like this one or this one. Whether these are of interest to you will probably depend on your exact needs.
There are some new developments with CUDA libraries available with CUDA 6 that can perform certain specific operations (e.g. BLAS, FFT) "automatically" using multiple GPUs. To investigate this further, review the relevant CUBLAS XT documentation and CUFFT XT multi-GPU documentation and sample code. As far as I know, at the current time, these operations are limited to 2 GPUs for automatic work distribution. And these allow for automatic distribution of specific workloads (BLAS, FFT) not arbitrary kernels.
Secondly, I am using Windows + Visual Studio for my development. I have read that CUDA is supported on Linux. What changes do I need to make in my program?
With the exception of the OGL/DX interop APIs, CUDA is mostly orthogonal to the choice of Windows or Linux as a platform. The typical IDEs are different (Windows: Nsight Visual Studio Edition; Linux: Nsight Eclipse Edition), but your code changes will mostly consist of ordinary porting differences between Windows and Linux. If you want to get started with Linux, follow the getting started document.
