PyTorch C++ is slow on Ubuntu aarch64 - pytorch

I'm trying to compare Windows x64 and Ubuntu aarch64 builds of our code, which uses PyTorch C++ 1.5.1.
While most of our code runs anywhere from the same speed to twice as fast on Ubuntu as on Windows, all calls to PyTorch run about 3 times slower.
I've tried to find information on whether PyTorch for arm64 is as optimized as it is on x64. Since there is no MKL on that platform, PyTorch falls back to Eigen and OpenBLAS. Is that the reason for this slowdown?
Thank you,
Gérald
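
For reference, one quick way to confirm which math backends and thread settings a PyTorch build was compiled with is to print its build configuration. The sketch below uses the Python API of the same PyTorch version purely as an inspection tool (this assumes the matching wheel is installed on the target machine; the project itself uses the C++ API):

# Sketch: inspect how this PyTorch build was compiled.
import torch
print(torch.__version__)
print(torch.__config__.show())            # BLAS/LAPACK backends, ISA flags, etc.
print(torch.backends.mkl.is_available())  # False on aarch64 builds (MKL is x86-only)
print(torch.get_num_threads())            # intra-op CPU thread count

If the aarch64 build reports no MKL and only a generic BLAS, that alone can plausibly account for a large gap on GEMM-heavy workloads.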

Related

Why is Python running as "Intel" rather than "Apple"?

I am running a Python program that uses Torch. I'm using the Torch nightly build (which supports MPS/M1) and putting my tensors on mps, so the M1 chip should be used as the GPU. However, Activity Monitor lists the "Kind" of my Python process as "Intel" rather than "Apple".
I am assuming this slows down my program. Any clue why this is happening, and how I can make sure Python runs as "Apple" rather than "Intel"?
I confirmed that the GPU is being used (via Activity Monitor) and that my tensors are on MPS (rather than CPU or CUDA). I expected the process kind to be "Apple", but it is "Intel" instead.
Try launching the Python program through the arch command if you want it to run natively on the M1 CPU rather than under Intel emulation:
arch -arm64 python my_program.py
macOS runs software as "Intel" (under Rosetta) when the binary needs functionality that is not available as a native M1 build. You can also check sys.maxsize to see whether the interpreter is a 64-bit build: if the value is less than 2^63 - 1, the interpreter was built for a 32-bit architecture, which is not ideal for the M1. In that case, try a 64-bit (ideally arm64-native) Python interpreter.
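A quicker check than sys.maxsize is to ask the interpreter which machine architecture it is running as; an x86_64 result on an M1 Mac means the process is being translated by Rosetta (the "Intel" kind in Activity Monitor):

# Check whether this Python build is native arm64 or x86_64 running under Rosetta.
import platform
print(platform.machine())   # 'arm64' for a native Apple Silicon build, 'x86_64' under Rosetta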

Executable gives different FPS values in Yocto and Raspbian (everything looks the same in terms of configuration)

I built my project, which runs on Raspbian OS, in a Yocto Project image. When I run the executable there, I get half the FPS compared to the same executable running on Raspbian OS.
The libraries I use:
OpenCV
TensorFlow Lite, FlatBuffers, libedgetpu
I use libedgetpu1-std and TensorFlow Lite 2.4.0 on Raspbian, and libedgetpu 2.5.0 and TensorFlow Lite 2.5.0 on Yocto.
Thinking that the problem was that the library versions or configurations were not the same, I followed these steps:
I ran the executable built on Raspbian directly in the Yocto runtime (I set the required library versions to the same versions available on Raspbian so it would work at runtime).
But I still got low FPS. Here is how I determine that I get half the FPS:
I use TFLite's interpreter Invoke function. I start a timer when entering the function and stop it when exiting, and calculate FPS from that. I can illustrate it like this:
Timer_Begin();               // start timing
m_tf_interpreter->Invoke();  // one TFLite inference on the interpreter
Timer_End();                 // stop timing; FPS = 1 / elapsed seconds
Somehow I think the interpreter's Invoke function is running slower on the Yocto side. I checked the kernel versions, CPU speeds, /boot/config.txt contents, and USB power consumption on both Raspbian and Yocto, but I couldn't catch anything anywhere.
Note: Using an RPi 4 and a Coral TPU (plugged into USB 2.0).
We spoke with @Paulo Neves. He recommended perf profiling, and I did it. In the perf profile I noticed that the CPU was running slowly, although the frequencies were the same.
When I checked the scaling_governor, I saw that it was in "powersave" mode. The problem was solved when I switched from "powersave" to "performance" mode via the virtual kernel filesystem (sysfs).
In addition, if you want to make the governor change permanent, you need to create a kernel config fragment.
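
To compare the two systems quickly, the current governor of every core can be read from sysfs; the small sketch below (standard Linux cpufreq paths, nothing Yocto-specific) prints the current setting on each machine:

# Sketch: print the cpufreq scaling governor for every core (Linux sysfs paths).
from pathlib import Path
for gov in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*/cpufreq/scaling_governor")):
    cpu = gov.parent.parent.name          # e.g. 'cpu0'
    print(cpu, gov.read_text().strip())   # e.g. 'cpu0 powersave'

Writing "performance" to the same scaling_governor files (as root) changes the governor at runtime; the kernel config fragment mentioned above is what makes the default permanent in a Yocto build.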

Run Tensorflow in Qubole

I am trying to train an LSTM using a Spark Python notebook in Qubole. When I try to fit the model, I receive the error below.
I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Why does this error occur, and how can I overcome it?
This is not an error, but a warning. The pre-built TensorFlow binaries are not compiled with various CPU instruction extensions because not everyone's machine has them. When TensorFlow starts, it checks which extensions are available on your machine and which ones the binary was compiled with; if your machine has extensions the binary was not compiled with, it lets you know. If you do a lot of CPU computation and care about maximum performance, you can build TensorFlow yourself from source with the extensions present on your machine (e.g. by passing --copt=-march=native to the Bazel build).
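If you only want to hide the message in the notebook rather than rebuild, TensorFlow's native log level can be raised through the TF_CPP_MIN_LOG_LEVEL environment variable; it has to be set before TensorFlow is imported. A minimal sketch:

# Silence TensorFlow's C++ startup messages; must be set BEFORE importing tensorflow.
# 0 = all messages, 1 = filter INFO, 2 = also filter WARNING, 3 = errors only.
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
import tensorflow as tf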

TensorFlow installation on Windows 10, Intel Core i3 processor with NVIDIA GeForce 920M GPU

I am trying to install TensorFlow using the instructions here.
I have a compatible NVIDIA GeForce 920M GPU, and both the cuDNN toolkit and the driver are installed on the system.
When I run the following step in Python to test the TensorFlow installation on the GPU:
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
The output I get is:
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2017-05-28 09:38:01.349304: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.349459: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.349583: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.349705: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 09:38:01.354312: I c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:257] Device mapping:
My pinpointed questions to you are:
Why is the NVIDIA GPU not getting detected when all libraries and toolkits are installed without errors?
Why is the output saying "The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations", and how do I rectify this?
Please give a step-by-step solution, nothing else.
Thanks in advance for your answers.
Okay, my pinpointed answers:
Why is the NVIDIA GPU not getting detected when all libraries and toolkits are installed without errors?
Ans: Go to the NVIDIA GeForce application and update the driver; then the libraries will start getting detected and the errors will go away.
Why is the output saying "The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations", and how do I rectify this?
Ans: Once you have done the update above, the TensorFlow library will stop giving the SSE warnings. This will resolve your issue.
Please give a step-by-step solution, nothing else.
Ans: The steps given above should work; they worked for me.
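
As a sanity check after updating the driver, you can ask TensorFlow which devices it actually sees; this sketch uses the TF 1.x API that matches the session code in the question:

# List the devices TensorFlow 1.x can see; a working CUDA setup shows a
# '/device:GPU:0' (or '/gpu:0') entry in addition to the CPU.
from tensorflow.python.client import device_lib
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)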

Theano installation on Windows 64-bit

I'm new to Python and the Theano library. I want to install Theano on Windows 7 64-bit. I have a display adapter:
Intel(R) HD Graphics 3000, which is not compatible with NVIDIA.
My QUESTIONS:
1. Is it obligatory to install CUDA so that I can use Theano?
2. Even if I have an Ubuntu operating system with the same display adapter, is CUDA still mandatory?
Any help is appreciated!
Thanks
You do not need CUDA to run Theano.
Theano can run on either CPU or GPU. If you want to run on GPU you must (currently) use CUDA, which means you must be using an NVIDIA display adapter. Without CUDA/NVIDIA you must run on CPU.
There is no disadvantage to running on CPU other than speed -- Theano can be much faster on GPU, but everything that runs on a GPU will also run on a CPU as long as it has been coded generically (the default and standard approach for Theano code).
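
If you want to confirm which device Theano will actually use, the runtime configuration can be printed directly (a minimal check, assuming Theano is already installed):

# Show the device Theano is configured to use; without CUDA and an NVIDIA GPU
# this stays 'cpu'. It can be overridden with THEANO_FLAGS or ~/.theanorc,
# e.g. THEANO_FLAGS=device=cpu,floatX=float32
import theano
print(theano.config.device)   # 'cpu' by default
print(theano.config.floatX)   # 'float64' by default; 'float32' is usually used for GPU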
