Runtime plugins in Rust

We have a commercially sold application that is presently written in Java and Python. We are currently looking at moving to Rust for performance and non-crashy reasons.
In our present Java/Python architecture, we have a feature that manages customisations that particular customers want. This involves placing Java jars/classes and Python files under a folder designated for customisation for specific customers. In the application configuration, the Java classpath and the PYTHONPATH have this folder precede the folders containing the normal, uncustomised application code. Because of this, any code in this special folder overrides the normal, uncustomised behaviour of the application.
We would like to keep this feature in some form when moving to Rust. We certainly want to avoid distributing the source code of the core app (mostly Java now) to our customers and having them compile it, which is what we would need to do if we used Rust's module feature.
Is there a way we can implement this feature when we go to Rust?
Target OSes are a mix of Linux and Windows.

Sounds like you want some kind of plugin architecture, with a dynamic library (also written in Rust) that's loaded at runtime.
Unfortunately, Rust doesn't have a stable ABI yet, meaning that those libraries would have to be compiled with the exact same compiler that built the main application. One workaround is to expose a C ABI from the plugin side and use C FFI to call it, if you can live with the unsafety and hassle that entails. There's also the abi_stable crate, which might be safer/simpler to use.
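As a minimal sketch of the C-ABI route (the file paths, crate layout and function names below are purely illustrative, not something your app already has): the customisation is built as a cdylib exposing an extern "C" function, and the host loads it at runtime with the libloading crate.
// plugin/src/lib.rs — built with crate-type = ["cdylib"]
#[no_mangle]
pub extern "C" fn customize_price(base_cents: u64) -> u64 {
    // Customer-specific override of the default pricing rule.
    base_cents * 95 / 100
}

// Host application, using the libloading crate.
use libloading::{Library, Symbol};

fn main() {
    // Hypothetical path; on Windows this would be a .dll under the same folder.
    let lib = unsafe { Library::new("customisations/libcustomize.so") }
        .expect("failed to load customisation library");
    let customize: Symbol<unsafe extern "C" fn(u64) -> u64> =
        unsafe { lib.get(b"customize_price\0") }.expect("symbol not found");
    println!("customised price: {}", unsafe { customize(10_000) });
}
If the customisation library is missing, the host can simply fall back to its built-in behaviour, which mirrors the "customisation folder overrides the default" model from the question.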
Scripting languages might be another avenue to explore. For example, Rhai is a language specifically developed for use in Rust applications, and interoperates as seamlessly as these things get. Of course, performance of the scripted parts will not be as great as native Rust code.
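For a feel of what that looks like, here is a minimal Rhai sketch; the script text is inlined for brevity, but in the scenario above it would be read from a per-customer file in the customisation folder:
use rhai::Engine;

fn main() {
    let engine = Engine::new();
    // Evaluate a customer-supplied script; the expression is just an example.
    let discounted: i64 = engine
        .eval("let base = 10000; base * 95 / 100")
        .expect("customisation script failed");
    println!("customised price: {discounted}");
}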

I don't think that it is possible without recompiling, or at least without compiling the config.rs file that you intend to create for individual users.
Assuming that the end user does not have Rust installed on their system, a few alternatives might be:
Using .yaml files for loading configs (similar to how GitHub Actions work)
Allowing users to run custom programs (you can use tokio::process to run them in an async manner; see the sketch after this list)
Using rhaiscript (I personally prefer this option)
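For the custom-programs option, a minimal tokio::process sketch might look like the following; the hook path and argument are purely illustrative, and tokio's process and macros features are assumed to be enabled:
use tokio::process::Command;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Run a per-customer hook executable asynchronously and capture its output.
    let output = Command::new("customisations/price_hook")
        .arg("10000")
        .output()
        .await?;
    println!("hook output: {}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}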

Taken from the official language docs for the modules feature
You could set up your commercial project in such a way that the source code is treated as an external crate, and then load it into the main project with the path attribute.
A minimal example, already on the docs:
#[path = "thread_files"]
mod thread {
    // Load the `local_data` module from `thread_files/tls.rs` relative to
    // this source file's directory.
    #[path = "tls.rs"]
    mod local_data;
}

Related

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: Isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but should also work without them? Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!
I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. Read about it here. As should be clear from that description, if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_NOW | RTLD_GLOBAL);, and the runtime loader will search for the library in system-appropriate locations.
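Since the context of this page is Rust, here is roughly what the same "load by soname, degrade gracefully" pattern looks like with the libloading crate (which wraps dlopen on Linux and LoadLibrary on Windows); the Xcursor symbol is used only as an example of an optional lookup:
use libloading::{Library, Symbol};

fn main() {
    // Ask the runtime loader for the versioned soname; no path needed.
    match unsafe { Library::new("libXcursor.so.1") } {
        Ok(lib) => {
            // Look up one symbol as an example of resolving an optional API.
            let supports: Symbol<unsafe extern "C" fn(*mut std::ffi::c_void) -> i32> =
                unsafe { lib.get(b"XcursorSupportsARGB\0") }.expect("missing symbol");
            let _ = supports; // ... call into the extension as needed ...
        }
        Err(_) => {
            // Library not installed: fall back to reduced functionality.
            eprintln!("Xcursor not available, continuing without it");
        }
    }
}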

How to make a fix in one of the shared libraries (.so) in the project on linux?

I want to make a quick fix to one of the project's .so libraries. Is it safe to just recompile the .so and replace the original? Or I have to rebuild and reinstall the whole project? Or it depends?
It depends. The shared library needs to be binary-compatible with your executable.
For example,
if you changed the behaviour of one of the library's internal functions, you probably don't need to recompile.
If you changed the size of a struct (e.g. by adding a member) that's known to the application, you will need to recompile; otherwise the application will still think the struct is smaller than it is, and the library will crash when it tries to read an extra, uninitialized member that the application never wrote.
If you change the type or the position of arguments of any function visible to the application, you do need to recompile, because the library will try to read more arguments off the stack than the application has put on it (this is the case in C; in C++, argument types are part of the function signature, so the app will refuse to run rather than crashing).
The rule of thumb (for production releases) is that, if you are not consciously aware that you are maintaining binary compatibility, or not sure what binary compatibility is, you should recompile.
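To connect this back to the Rust-plugin question at the top of the page: the same struct-size hazard exists whenever a #[repr(C)] struct crosses a library boundary. A hypothetical sketch (the struct and function are invented for illustration):
// In the shared library (cdylib). Both the application and this library
// must be built against the same definition of Config.
#[repr(C)]
pub struct Config {
    pub retries: u32,
    pub timeout_ms: u32,
    // Adding a field here changes the struct's size and field offsets.
    // A rebuilt library loaded by an application compiled against the old
    // layout would read past the memory the caller actually allocated.
    // pub verbose: u32,
}

#[no_mangle]
pub extern "C" fn apply_config(cfg: *const Config) -> u32 {
    // The library trusts that `cfg` points at a struct of the size it was
    // compiled against; that trust is what "binary compatibility" protects.
    unsafe { (*cfg).retries + (*cfg).timeout_ms }
}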
That's certainly the intent of using dynamic libraries: if something in the library needs updating, then you just update the library, and programs that use it don't need to be changed. If the signature of the function you're changing doesn't change, and it accomplishes the same thing, then this will in general be fine.
There are of course always edge cases where a program depends on some undocumented side-effect of a function, and then changing that function's implementation might change the side-effect and break the program; but c'est la vie.
If you have not changed the ABI of the shared library, you can just rebuild and replace the library.
It depends, yes.
However, assuming you have the exact same source and compiler that built everything else, and you only change something in a .cpp file, it is fine.
Other things, e.g. changing an interface (between the shared lib and the rest of the system) in a header file, are not fine.
If you don't change your library binary interface, it's ok to recompile and redeploy only the shared library.
Good references:
How To Write Shared Libraries
The Little Manual of API Design

Importance of compiling single-threaded v. multi-threaded (and lib naming conventions)?

[ EDIT ] ==>
To clarify, in those environments where multiple targets are deployed to the same directory, Planet Earth has decided on a convention to append "d" or "_d" or "_debug" to the "DEBUG" version (of a library or executable). Such a convention can be considered "ubiquitous" and "understood", although (of course) not everybody does this.
SIMILARLY, to resolve ambiguity between "shared" and "static" versions of a library, a common convention is to append something to distinguish between the static-and-shared (like "myfile.lib" for shared-import-lib-on-Windows and "myfile_s.lib" for static-import-lib-on-Windows). While Posix does not have this ambiguity based on file extension, remember that the file extension is not used on the "link line", so it is similarly useful to be able to explicitly specify the "static" or "shared" version of a library.
For the purpose of this question, both "debug/release" and "static/shared" are promoted to "ubiquitous convention to decorate the file name root".
QUESTION: Does any other deployment configuration get "promoted" to this level of "ubiquitous convention" such that it would become explicit in the file target root name?
My current guess is "no". For the answer to be "yes", it would require that more than one configuration of a given target is intended to be "used" (and thus deployed to a common directory, which is the assumed basis for the question).
In the past, we compiled with-and-without "web plug-in" capability, which similarly required that name decoration, but we no longer build those targets (so I won't assert that as an example). Similarly, we sometimes compile with-and-without multi-byte character support, but I hate that, so I won't assert that either.
[ORIGINAL QUESTION]
We're establishing library naming conventions/policy, to be applied across languages and platforms (e.g., we support hybrid products using several languages on different platforms, including C/C++, C#, Java). A particular goal is to ensure we handle targets/resources for mobile development (which is new to us) in addition to our traditional desktop (and embedded) applications.
Of course, one option is to have different paths for targets from different build configurations. For the purpose of this question, the decision is made to have all targets co-locate to a single directory, and to "decorate" the library/resource/executable name to avoid collisions based on build configuration (e.g., "DEBUG" v. "RELEASE", "static lib" v. "shared/DLL", etc.)
Current decision is similar to others on the web, where we append tokens to avoid naming collisions:
MyName.lib (release build, import for shared/dll)
MyName_s.lib (release build, static lib)
MyName_d.lib (debug build, import for shared/DLL)
MyName_ud.lib (Unicode/wide-char, debug, import for shared/DLL)
MyName_usd.lib (Unicode/wide-char, static lib, debug)
(The above are Windows examples, but these policies similarly apply to our POSIX systems.)
These are based on:
d (release or debug)
u (ASCII or Unicode/wide-char)
s (shared/DLL or static-lib)
QUESTION: We do not have legacy applications that must be compiled single-threaded, and my understanding is that (unlike Microsoft) POSIX systems can link single- and multi-threaded targets into a single application without issue. Given today's push towards multi-core and multi-threaded, Is there a need in a large enterprise to establish the following to identify "single-" versus "multi-threaded" compiled targets?
t (single-threaded or multi-threaded) *(??needed??)*
...and did we miss any other target collision, like compile with-and-without STL (on C++)?
As an aside, Microsoft has library naming conventions at:
http://msdn.microsoft.com/en-us/library/aa270400(v=vs.60).aspx and their DLL naming conventions at: http://msdn.microsoft.com/en-us/library/aa270964(v=vs.60).aspx
A similar question on SO a year ago that didn't talk about threading and didn't reference the Microsoft conventions can be found at: What is proper naming convention for MSVC dlls, static libraries and import libraries
You are using an ancient compiler. There is no need to establish such a standard in an enterprise; the vendor has already done this. Microsoft hasn't shipped a single-threaded version of the CRT for the past 13 years. Similarly, Windows has been a Unicode operating system for the past 17 years. It makes zero sense to still write Unicode-agnostic code these days.
But yes, the common convention is to append a "d" for the debug build of a library. And to give a DLL version of a library a completely different name.

What are the pro and cons of statically linking a library?

I want to release an application I developed as a hobby both for Linux and Windows. This application depends on boost (and possibly other libraries). The norm for this kind of application (a chess engine) is to provide only an executable file and possibly some helper files.
I thought it would be a good idea to statically link the libraries so the executable would not have any dependencies. That way the end user can just put the executable in a directory and start using it.
However, while doing some research online I found some negative comments about statically linking libraries, some even arguing that an application with statically linked libraries would be hardly portable, meaning that it would only run on my system or highly similar systems.
So what are the pros and cons of statically linking a library?
I already know that the executable will be bigger. But I can't see why it would make my application less portable.
Pros:
No dependencies.
Cons:
Higher memory usage, as the OS can no longer use a shared copy of the library.
If the library needs to be updated, your application needs to be rebuilt. This is doubly important for libraries that later receive security fixes.
Of course, a bigger issue for portability is the lack of source code distribution.
Let's say the static library "A" you include has a dependency on function "B". If this dependency can't be fulfilled by the target system, then your program won't run.
But if you're using dynamic linking, the user could maybe install another version of library "A" that uses function "C" instead of "B", so it can run successfully.
If you link the libraries statically, unless you add the smarts to also check the user's system for the libraries you've linked, you're locking your application to use those versions of the libraries until you update your executable. Security holes happen, and updates happen. (For a chess engine there may not be too much issue, but who knows.)
With dynamically linked libraries, if a library you have linked against, say X, is not available on the user's system, your program fails ungracefully, leaving the end user wondering what went wrong.
In the case of static libraries, on the other hand, everything is fused into the executable, so a condition like the above can't happen; the executable, however, will be very bulky.
The above problem with dynamically linked libraries can, however, be eliminated by dynamic loading.

Any downsides to using statically linked applications on Linux?

I've seen several discussions here on the subject, but wanted to ask about my particular situation:
If I have some 3rd party libraries which my application is using, and I'd like to link them in statically in order to save myself the hassle with LD_LIBRARY_PATH, etc., is there any downside to it on Linux, other than larger file size?
Also, is it possible to statically link only some libraries, and link others (standard Linux libraries) dynamically?
Thanks.
It is indeed possible to dynamically link against some libraries and statically link against others.
It sounds like what you really want to do is dynamically link against the system libraries, and statically link against the nonstandard ones that a user may not have installed (or that different users may have different installations of).
That's perfectly reasonable.
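In a Rust/Cargo build (the setting of the first question on this page), the same mix can be expressed in a build script; foo and bar below are placeholder library names, and vendor/lib is a hypothetical path:
// build.rs — a minimal sketch of mixed static/dynamic linking
fn main() {
    // Where to find the bundled static library.
    println!("cargo:rustc-link-search=native=vendor/lib");
    // Link libfoo.a statically (a nonstandard dependency we ship ourselves)...
    println!("cargo:rustc-link-lib=static=foo");
    // ...while the system library libbar stays dynamically linked.
    println!("cargo:rustc-link-lib=dylib=bar");
}
With gcc the equivalent idea is to group static and dynamic libraries on the link line (e.g. with -Wl,-Bstatic and -Wl,-Bdynamic).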
It's not generally a good idea to statically link against system libraries, especially libc.
It can often make sense to statically link against libraries that do not come with the OS and that will not be distributed with your application.
There are some bits of libc - those that use nsswitch - that need to load libraries dynamically. This can cause problems if you want to produce a completely static binary.
Statically linking your 3rd party libraries into your application should be completely fine.
The statically linked binary will be larger than if you had used a shared library, but I find that disadvantage is outweighed by avoiding the library path hassles, provided I control the distribution of all the libraries involved. If you are dependent on a particular distro's shared libraries, then you have no choice but to use dynamic linking.
The main disadvantage I see is your application loses any automatic bugfixes that might be applied to a shared library. On the flip-side you don't get new bugs.
Static linking does not just affect the file size of the executable; it also affects the memory footprint and start-up time of the application. Dynamically linked libraries are loaded into memory once, no matter how many programs use them. Statically linked libraries must be loaded once per program that uses them (because they are now part of that program).
To answer your second question, yes, it is possible to have dynamic and static libraries linked to the same application. Just be careful to avoid interlibrary dependencies so you don't have a problem with library order. You should be able to list the libraries in any arbitrary order. Where I work, we prefer to list them alphabetically.
Edit: To link a static library, use the flag -lfoo. To add a directory to the library search path, use -L/path/to/libfoo.
Edit: You don't have to link a dynamic library. Your program can use a function provided by your compiler to open a dynamic library at run time, or you can link it at compile time and the compiler will resolve the symbols but not include them in the binary. See pjc50's comment below.
Statically linking will make your binary bulky, but you won't need to have a shared version of that library in the target runtime environment. This is especially useful when developing embedded apps.

Resources