Error OpenCV with CUDA using TBB for multiple GPUs - multithreading

My OpenCV CUDA program runs fine using a single NVidia 580GTX, but when using another, it gives the following error:
OpenCV Error: Gpu API call (invalid device ordinal) in mallocPitch
I know I need TBB to assign a GPU its job, but even though I installed OpenCV with TBB support (followed the willowgarage website), it says TBB support is required (CMake key 'WITH_TBB' must be true). Any help would really be appreciated since I need this to complete my computer science Master's project.
Thanks!

Ok its solved. turns out it was build 7232 that was the problem, since it works with the latest opencv build(7292) with no problems. Thanks all for the support

Related

Executable gives different FPS values ​in Yocto and Raspbian(everything looks the same in terms of configuration)

In Yocto project, built my project which is running on Raspbian OS. When i run executable, i get half FPS compared to executable running on Raspbian OS.
The libraries i use:
OpenCV
Tensorflow-Lite, Flatbuffer, Libedgetpu
I use Libedgetpu1-std, Tensorflow-lite 2.4.0 on Raspbian and Libedgetpu 2.5.0, Tensorflow-lite 2.5.0 on Yocto.
Thinking that the problem is that the versions or configurations of the libraries are not the same, i followed these steps:
I ran the executable which i built in Raspbian directly in the runtime of the Yocto project.(I have set the required library versions to the same library versions available in raspbian for it to work in runtime.)
But i still got low FPS. Here is how i calculate that i get half the FPS:
I am using TFLite's interpreter invoke function. I set a timer when entering and exiting the function, i calculate FPS over it. I can exemplify like this:
Timer_Begin();
m_tf_interpreter->Invoke();
Timer_End();
Somehow i think the Interpreter Invoke function is running slower on the Yocto side. I checked Kernel versions, CPU speeds, /boot/config.txt contents, USB power consumes of Raspbian and Yocto. However, I couldn't catch anything from anywhere.
Note : Using RPI4 and Coral-TPU(Plugged into USB 2.0).
We spoke with #Paulo Neves. He recommend Perf profiling and i did . In the perf profiling, i noticed that the CPU is running slowly. Although the frequencies are the same.
When i check the "scaling_governor", i saw that it was in "powersave" mode. The problem solved when i switched from "powersave" to "performance" mode from virtual kernel.
In addition, if you want to make the governor change permanent, you need to create a kernel config fragment.

Issue Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

I met this issue when I running my python file in linux.
I searched some answers in google like use the code below:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
The system can be running but without ant out information. Actually I found the 2 measn ignore all information while 1 means give all information like error or normal output.
because I need to use GPU so my original code is:
os.environ["CUDA_VISIBLE_DEVICES"]="0"
But if I keep the code my output would be an error information like the title.
How can I do? and of course I need use the GPU and the codes can be running in colab which indicated that my code has no problem.
some guys even said uninstall tensorboard...that must a wrong way.
Or should I download tensorflow gpu not tensorflow in m,y virtual enviroment? when I USE THE tensorflow gpu version, the error is core dumped.
If when forcing the os.environ["CUDA_VISIBLE_DEVICES"]="0" doesn't work, then this means that your tensorflow gpu installation did not succeed. You must ensure you have the right combination of TensorFlow + CUDA + CUDNN. That is why you get the error, because due to improper versions/installation TF falls back on CPU.

FAISS search fails with vague error: "Illegal instruction" or kernel crash

Currently trying to run a basic similarity search via FAISS with reproducible code from that link. However, every time I run the code in the following venues, I have these problems:
Jupyter notebook - kernel crashes
VS Code - receive "Illegal Instruction" message in the terminal with no further documentation
I've got similar code working in Kaggle, so I suppose the problem is with my particular setup.
Based on the print statements, it appears that the error occurs during the call of the .search method. Because of how vague this error is, I've not been able to find much information on the problem. It seems that some people mentioned older processors may have a problem (AVX/AVX2 flags being the culprit?), though admittedly I didn't quite understand the connections.
Problem: Can I get some help understanding this error, and if possible, a potential solution?
Current setup:
WSL2
VSCODE (v. 1.49.0)
Jupyter-client (v. 6.1.7)
Jupyter-core (v. 4.6.3)
FAISS-cpu (v. 1.6.3)
Numpy (v. 1.19.2)
Older machine (AMD FX-8350 with 16GB RAM)
For anyone that runs across this error, the problem (in my case) was that my CPU was old enough that it doesn't support AVX2. To determine this, I used this SO post.
Once I ran the code in Colab or on a newer machine, all was well.

Theano Installation On windows 64

Im new in Python and Theano library. I want to install Theano on windows 7-64. I have a display adapters :
Intel(R) HD Graphics 3000 which is not compatible with NVIDA.
My QUESTIONS:
1-Is obligatory to install CUDA to i can use Theano?
2- Even if i have an Ubuntu opearting system, with the same display adapters, CUDA still mandatory?
Any help!
Thanks
You do not need CUDA to run Theano.
Theano can run on either CPU or GPU. If you want to run on GPU you must (currently) use CUDA which means you must be using a NVIDIA display adapter. Without CUDA/NVIDIA you must run on CPU.
There is no disadvantage to running on CPU other than speed -- Theano can be much faster on GPU but everything that runs on a GPU will also run on a CPU as long as it has been coded generically (the default and standard method for Theano code).

CUDA device seems to be blocked

I'm running a small CUDA application: the QuickSort benchmark algorithm (see here). I have a dual system with a NVIDIA 660GTX (device 0) and 8600GTS (device 1).
Under Windows 8 and Visual Studio, the application compiles and runs flawlessly on device 0. Under Linux (Ubuntu 12.04 LTS), the app compiles with nvcc and gcc but suddenly stops in its tracks, returning a (unspecified launch failure).
I have two issues:
After this error, my GPU cannot perform some other operations, e.g., running the SDK example bandwidhtTest blocks when it performs the first data transfer, but running deviceQuery continues to perform well. How can I reset my GPU? I've already tried the cudaDeviceReset() method but it doesn't help
How can I find what is going wrong under linux? Has someone a clue or seen this before?
Thanks in advance for your help!
Using the nvidia-smi utility you can reset the GPU if it is compatible
To my knowledge and experience, (unspecified launch failure) usually referees to segmentation fault. Have you specified the right GPU to use? Try to use cuda-memcheck to see if there is any memory out of bound scenario.
From my experience XID 31 was always caused by accessing bad pointer (aka Memory access violation).
I'd first pursue this trail. Run your application with cuda memcheck. Like that cuda-memcheck you_app args to your app and see if it finds any wrong memory accesses.
Also try stepping though the code with cuda-gdb or Nsight Eclipse Edition.
I've found that using
cuda-memcheck -b ...
prevents the device from locking up.

Resources