I'm trying to add CUDA functionality to an existing codebase. The desired behavior is that if the user has the CUDA runtime installed on their machine, the code uses it (loaded via dlopen) to check whether a CUDA-enabled GPU is available and, if so, runs the CUDA code on it; otherwise it runs the original non-GPU-accelerated code. However, there are some gaps in my understanding of libraries and CUDA that make this tricky for me.
The code compiles just fine if I specify the location of the required CUDA libraries (cudart and cublas) and dynamically link them. However, when I try not linking these libraries and instead wrapping 'everything' I need using dlopen and dlsym to get handles to the functions, compilation fails once it reaches actual device code (the definitions behind the <<<...>>> kernel launches), because it goes looking for symbols like __cudaRegisterFunction at compile time. I've replaced the angle-bracket launches with a wrapped version of cudaLaunchKernel but still get this issue, possibly because the embedded device code itself requires some special registration calls.
One fundamental thing I'm unsure about is when the symbols in a shared library have to be resolved. For example, if the user does not have cudart.so, is it possible to simply never run any cudart/CUDA code and avoid any runtime issues involving unresolved references to functions from that library? Or do all cudart.so functions need to be found regardless of whether or not they're used? If only the functions that are actually used need to be resolved, wouldn't that obviate the need for wrapping functions via dlopen/dlsym? A related question: can you compile CUDA code without linking to cudart at all? I may be confusing two separate issues here, in that it might be necessary to link against cudart.so when compiling CUDA code even though that does not mean cudart.so is actually used at runtime.
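To make the wrapping concrete, here is a stripped-down sketch of the kind of runtime probe I mean (not my full wrapper; the helper name is mine, and the unversioned libcudart.so filename is an assumption since the installed runtime may only exist under a versioned soname):

    #include <dlfcn.h>

    // Probe for the CUDA runtime at run time and fall back to the CPU path
    // if it (or a usable GPU) is not there. The return value of
    // cudaGetDeviceCount is treated as int; 0 corresponds to cudaSuccess.
    typedef int (*cudaGetDeviceCount_t)(int *);

    static bool cuda_device_available() {
        void *cudart = dlopen("libcudart.so", RTLD_NOW | RTLD_LOCAL);
        if (!cudart) {
            return false;  // no runtime installed -> use the non-GPU code
        }
        cudaGetDeviceCount_t get_count =
            reinterpret_cast<cudaGetDeviceCount_t>(dlsym(cudart, "cudaGetDeviceCount"));
        int count = 0;
        bool ok = get_count && get_count(&count) == 0 && count > 0;
        dlclose(cudart);
        return ok;
    }

The probing itself is straightforward; the trouble starts with the kernel definitions described above.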
It's entirely possible I'm going about this the wrong way, so hopefully the general statement of what I'm trying to do can lead me to a working answer.
This question might be too detailed for this forum, but I could not find a mailing list for duktape. Maybe this question will be useful for others trying to get duktape running on more obscure hardware.
I am trying to get duktape to work on an old ColdFire CPU, using an OLD gcc compiler (2.95.3). The board has limited resources (flash/RAM) but I seem to have enough of both. I must live with the old compiler.
I believe the duk_config.h is calculating the right options regarding endianness, etc. I am using a number of the duktape options to reduce code and data size. I have successfully used the same configuration on 64 and 32 bit Ubuntu and it works fine.
The "properties string" that is formed and set in duk_hthread_create_builtin_objects() is:
"bb u pnRHSBOL p2 a8 generic linux gcc" which seems correct (not sure of the effect of the "generic" tag for architecture).
I am getting a failure when calling duk_create_heap(). I have isolated the problem to what I believe is a JS compile error related to duk_initjs. If I undef DUK_USE_BUILTIN_INITJS, initialization works. The error is a syntax error (not sure where yet). By running "strings" on my executable, I can see that the JavaScript program source string is there. As a side issue, when this error occurs, the longjmp doesn't work (setjmp never called?) so my fatal handler gets called, but I don't care about that for now.
I thought it might be my small C stack (as it appears the JS compiler uses recursion), but making the stack much larger didn't help.
I am starting to dig into the JS compiler, but this must be an issue with the architecture or my environment. Any suggestions appreciated!
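For reference, the heap creation in my test harness looks roughly like this (simplified; the handler name is mine, and the fatal handler signature shown is the Duktape 1.x one, which differs in 2.x):

    #include "duktape.h"
    #include <stdio.h>
    #include <stdlib.h>

    /* Fatal handler as mentioned above (Duktape 1.x signature). */
    static void my_fatal(duk_context *ctx, duk_errcode_t code, const char *msg) {
        (void) ctx;
        fprintf(stderr, "duktape fatal: code=%ld msg=%s\n", (long) code, msg ? msg : "(null)");
        abort();  /* a fatal handler must not return */
    }

    int main(void) {
        /* Default allocators, no heap userdata, custom fatal handler. */
        duk_context *ctx = duk_create_heap(NULL, NULL, NULL, NULL, my_fatal);
        if (!ctx) {
            fprintf(stderr, "duk_create_heap() failed\n");
            return 1;
        }
        duk_destroy_heap(ctx);
        return 0;
    }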
EDIT: I just now noticed a post about a similar issue, where there was a request to repeat the build with "-DDUK_OPT_DEBUG -DDUK_OPT_DPRINT -DDUK_OPT_ASSERTIONS -DDUK_OPT_SELF_TESTS". I will try to use these options (if possible; I am very close to a relocation limit on my executable).
There was a bug in the 1.4.0 release (https://github.com/svaarala/duktape/pull/550) which caused duk_config.h to incorrectly end up with an unpacked value representation even when the architecture supported the packed representation. This might be the issue in your case; try adding an explicit -DDUK_OPT_PACKED_TVAL (which forces Duktape to use the packed representation) to see if it helps.
I'm creating a simple MCJIT-based JIT (implementing the Kaleidoscope tutorial in Rust, to be more precise). I'm using SectionMemoryManager::getSymbolAddress for symbol resolution. It sees symbols from libraries (e.g. the sin function), but fails to resolve functions from my own program (global, visible with nm, marked there with T). Is this the expected behavior, or is it some error in my code?
If this is the expected behavior, how should I properly resolve symbols from the current process? For now I'm adding symbols from the process with LLVMAddSymbol, and resolution starts to work. Is this the right solution?
For those who'll read my code: the problem with the symbols is not related to name mangling, as when I tried to make SectionMemoryManager::getSymbolAddress work I used the #[no_mangle] attribute, so the functions were named properly.
Thanks to Lang Hames, who has answered my question elsewhere. I quote the answer here in case somebody else runs into the same problem:
In answer to your question: SectionMemoryManager::getSymbolAddress eventually (through the RTDyldMemoryManager base class) makes a call to llvm::sys::DynamicLibrary::SearchForAddressOfSymbol, which searches all previously loaded dynamic libraries for the symbol. You can call llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr) as part of your JIT initialisation (before any calls to getSymbolAddress) to import the program's symbols into DynamicLibrary's symbol tables.
If you really want to expose all functions in your program to the JIT'd code, this is a good way to go. If you only want to expose a limited set of runtime functions, you can put them in a shared library and just load that.
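A minimal C++ sketch of that suggestion (the helper name here is made up; from the C API used by Rust bindings, the equivalent entry point should be LLVMLoadLibraryPermanently(NULL)):

    #include "llvm/Support/DynamicLibrary.h"

    // Export the host process's own symbols so that
    // SectionMemoryManager::getSymbolAddress can find them later.
    void initJitSymbolResolution() {
        // nullptr means "the current process"; call this before any
        // getSymbolAddress lookups are made.
        llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }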
The title may seem complicated.
I made a library to be loaded within a Tcl script. Now I need to transfer it to Ubuntu 12.04.
Tclsh gives the following error:
couldn't load file "/apollo/applications/Linux-PORT/i586/lib/libapmntwraptcl.so":
libgeos-3.4.2.so: cannot open shared object file: No such file or directory
    while executing
"load $::env(ACCLIB)/libapmntwraptcl[info sharedlibextension]"
The libgeos library isn't available in version 3.4.2 under Ubuntu 12.04, so I need to know which (sub-)dependency of my library requires the famous libgeos-3.4.2.so, so that I can rebuild it or find an alternative.
Many thanks in advance.
Edit:
Thank you for your USEFUL answers. I already tried ldd with -v and -r; I get 200+ dependencies with ldd -r. The worst part is that in the result list I see libgeos-3.3.8.so => /usr/lib/libgeos-3.3.8.so (0xb3ea9000) (the version I have), but when I execute, tclsh says libgeos-3.4.2.so is missing.
That's why I need something able to tell me the complete dependency tree of my library.
Could anyone give me a hint (not some useless showoff)?
Thank you so much.
You've accidentally (probably through no fault of your own) wandered into “DLL Hell”; the problem is that something that libapmntwraptcl.so depends on, possibly indirectly, does not have its own dependencies satisfied. This sort of thing can be very difficult to solve precisely because the tools that know what went wrong (in particular, the system dynamic linker) produce so little informative output by default.
What's even worse is that you apparently have multiple versions around; that's where DLL Hell reaches its worst incarnation. You need to be a detective to solve this, and it's too hard to do sensibly from a distance, as many of the things you'd poke your fingers at are determined by what the previous steps reported.
You need to identify exactly which versions you're loading, with ldd libapmntwraptcl.so (in your shell, not in Tcl). You also need to double-check what your environment variables are immediately before the offending load command, as several of them can affect the loading process. The easiest way to do that is to put parray env just before the offending load, which will produce a dump of everything in the context where things could be failing; reading the manual page for ld.so will tell you a lot more about each of the possible candidates for trouble (there are many!).
You might also need to go through the list of libraries identified by ldd above and check whether each of those also has all of its dependencies satisfied in the way you expect, and you should bear in mind that a failure to locate something with ldd might not mean that the code actually fails at runtime. (That would be too easy.)
You can also try setting the LD_DEBUG environment variable to all before doing the load. That will produce quite a lot of information on standard out; maybe it will give you enough to figure out what is going wrong?
Finally, on Linux you need to bear in mind that there can be an RPATH set for a particular library (which can affect where it is found) and there's a system library cache which can also affect things.
I'm really sorry the error message isn't better. All I can really say is that it's exactly as much as Tcl is told about what went wrong, and it's hardly anything.
In my open-source project Artha I use libnotify for showing passive desktop notifications to the user.
Instead of linking against libnotify at build time, a lookup is made at runtime for the shared object (.so) file via dlopen; if it is available on the target machine, Artha exposes the notification feature in its GUI. On app start, dlopen is called with libnotify.so.1 as the filename, and if it returns a non-null handle the feature is exposed.
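Roughly, the check looks like this (a simplified sketch, not Artha's actual code; the identifier is mine):

    #include <dlfcn.h>

    /* Probe for libnotify at startup; a non-null handle means the
       notification feature can be exposed in the GUI. */
    static void *try_load_libnotify(void) {
        /* This hard-coded, versioned soname is what goes stale on every bump. */
        return dlopen("libnotify.so.1", RTLD_LAZY);
    }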
A recurring problem with this model is that every time the version number of the library is bumped, Artha's code needs to be updated; currently libnotify.so.4 is the latest to require such a change.
Is there a Linux system call (irrespective of the distro the app is running on) that can tell me whether a particular library's shared object is available at runtime? I know there is the brute-force option of enumerating the versions from 1 to, say, 10, but I find that solution ugly and inelegant.
Also, if this can be addressed via autoconf, that solution is welcome too; i.e. at build time, based on the target machine, the generated configure.h should have the right .so name that can be passed to dlopen.
P.S.: I think good distros follow the convention of creating a libnotify.so link to libnotify.so.x, so that a programmer can just do dlopen("libnotify.so", RTLD_LAZY) and the right version-numbered .so is loaded; unfortunately not all distros follow this, including Ubuntu.
The answer is: you don't.
dlopen() is not designed to deal with things like that, and trying to load whichever soversion you find on the system just because it happens to have the symbols you need is not a good way to do it.
Different sonames have different ABIs, and different ABIs mean that you may be calling the exact same symbol name while it expects a different set (or different sizes) of parameters, which will cause crashes or misbehaviour that are extremely difficult to debug.
You should read up on how shared object versions work and what an ABI is.
The libfoo.so link is there for the link editor (ld) and is usually installed with the -devel packages for that reason; it might also very well not be a link but rather a text file containing a linker script, oftentimes on purpose to avoid exactly what you're trying to do.
I am currently trying to compile and build Qt for Embedded Linux on an Ubuntu box for the ARM architecture. So far, I have run into MANY errors while trying to make, the biggest one being a 2000-line C++ function which caused a compiler error. What are other people's experiences with this, and how did you fix it?
My experience has always been favorable, given:
You must follow every single instruction in the installation instructions for Qt, without exception. Every time I've run into compilation errors, it's been because I tried to just do it quickly, instead of reading the attached documentation for that specific platform.
I'd review the instructions; there's probably some minor thing that needs to be done first, which will most likely eliminate your errors.