How do languages without header files export symbols from a closed-source library to clients?

How do languages without header files export symbols from a closed-source library to clients? - programming-languages

Header files are a necessity in C/C++/ObjC because each file needs the definition of all its symbols before being compiled into an object file. One side-effect is that a library distributor that do not want to open its source code can provide a client only with a header file and the .o to be linked. Is this possible in languages that rely on a full view of the source code during compilation, like Java? Are there any other languages with interesting solutions for this use case?
To provide a bit of context: In my (5y+) experience as a software engineer, I've only ever used libraries to which I have direct source code access in Java, Python and Go, and have never developed closed-source libraries. For the first two a closed-source developer could ship bytecode, but I don't see how it's possible to do that with Go, which doesn't rely on header files to forward-declare symbols.

Is this possible in languages that rely on a full view of the source code during compilation, like Java?
Bzzt. Java does not 'rely on a full view of the source code during compilation'. You don't need the source code to use a Java JAR file, right? The export symbol information is in the .class file. Consider the JDK's rt.jar as the most trivial example. If that didn't work, nothing would. And you don't need the source code for that.
In Modula, Ada, etc, it is also in the object code, somehow.
It is really just about only C and C++ that don't have this feature, and therefore must rely on header files.

Related

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: Isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but should also work without it! Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. Read about it here. As should be clear from that description, if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_GLOBAL);, and the runtime loader will search for the library in system-appropriate locations.

Interpreted standard library

It's common for a programming language to come with a standard library implemented at least partly in the language itself.
In the case of an interpreted language, the obvious implementation is to read the library source files when the interpreter starts up, but this runs into the messy but persistent problem of making sure the interpreter knows where to find those files even when both are moved around. It would be cleaner if they could be embedded in the interpreter itself, so there is just a single executable.
I can see a simple way to do this by just translating the library source files to C literal strings, but I'm curious as to whether there are any pitfalls I'm overlooking or refinements to the method.
So my question is, what existing interpreted languages attach library source files in the language itself, to the interpreter?

Bytecode virtual machines often provide an answer to this: store the bytecode in files (*.pyc, *.rbc) and load the bytecoded versions of the libraries using a simpler mechanism.
Smalltalks do this by dumping the standard heap into a separate file called an "image".
As for single-file distribution, append the library file(s) to the end of the executable file, and include special logic for the interpreter to read from its binary and find a structure of those interpretable program data, or alternatively build the interpreter with a static inclusion of the program data.

mixing code compiled with /MT and /MD

I have a large body of code, compiled with /MT (i.e. expecting to statically link against the CRT). I need to combine this with a static third-party library, which has been built with /MD (i.e. expecting to link the CRT dynamically).
Is it theoretically possible to link the two into one executable without recompiling either?
If I link with /nodefaultlib:msvcrt, I end up with a small number of undefined references to things like __imp__wgetenv. I'm tempted to try implementing those functions in my own code, forwarding to wgetenv, etc. Is that worth trying, or will I run straight into the next problem?
Unfortunately I'm Forbidden from taking the easy option of packing the thirdparty code into a separate DLL :-/

No. /MT and /MD are mutually exclusive.
All modules passed to a given invocation of the linker must have been compiled with the same run-time library compiler option (/MD, /MT, /LD).
Source

I found such solution in OpenSSL sources: All obj files of the library are compiled with combination: /MT /Zl. As author described, such combination allows to build static library with ability to compile with applications either dynamic CRT (/MD) or static CRT (/MT).

I faced similar situation where in I had two libraries one was built with MT and another one with MD. I had to build an executable which uses functionalities from both the libraries. The library built as MD was third party thus I couldn't rebuilt it and library built as MT has many dependencies and to built all of them as MD is a big pain. I was getting error from the third party config header file which made it mandatory to built the executable as MD. I was looking for the easy way of packaging third party dll as a separate dll as mentioned in question. However, I couldn't find enough explanation online on this easy way. Hence my two cents below.
The following is the way I circumvent it
I built another .dll which acted as an interface. This interface basically wrapped all api calls that was made to third party dll. The header file for this interface did not include any header file from third party dll rather all those header files were included in the interface.cpp file. Interface as you expect was built as MD.
Now In my main.cpp file I included this interface header file to make all the calls to third party dll through the interface.
Extra care has to be taken in passing arguments to the interface. Basic variables like int,bool etc can be passed as value. However any class or structure needs to be passed as const reference to avoid heap corruption. This is applicable to even string.
Happy to share more details if it is not clear!

Any downsides to using statically linked applications on Linux?

I seen several discussions here on the subject, but wanted to ask about my particular situation:
If I have some 3rd part libraries which my application is using, and I'd like to link them together in order to save myself the hassle in LD_LIBRARY, etc., is there any downside to it on Linux, other then larger file size?
Also, is it possible to statically link only some libraries, and other (standard Linux libraries) to link dynamically?
Thanks.

It is indeed possible to dynamically link against some libraries and statically link against others.
It sounds like what you really want to do is dynamically link against the system libraries, and statically link against the nonstandard ones that a user may not have installed (or that different users may have different installations of).
That's perfectly reasonable.
It's not generally a good idea to statically link against system libraries, especially libc.
It can often make sense to statically link against libraries that do not come with the OS and that will not be distributed with your application.

There are some bits of libc - those that use nsswitch - that need to load libraries dynamically. This can cause problems if you want to produce a completely static binary.
Statically linking your 3rd party libraries into your application should be completely fine.

The statically linked binary will be larger than if you had uses a shared library, but I find that disadvantage outweight the library path hassles, provided I control the distribution of all the libraries involved. If you are dependant on a particular distros shared libraries, then you have no choice but to use dynamic linking.

The main disadvantage I see is your application loses any automatic bugfixes that might be applied to a shared library. On the flip-side you don't get new bugs.

Static linking does not just affect the file size of the library, it also affects the memory footprint and start up time of the application. Dynamically linked libraries are loaded once no matter how many programs use them. Statically linked libraries must be loaded once per program that uses them (because they are now part of that program).

To answer your second question, yes, it is possible to have dynamic and static libraries linked to the same application. Just be careful to avoid interlibrary dependencies so you don't have a problem with library order. You should be able to list the libraries in any arbitrary order. Where I work, we prefer to list them alphabetically.
Edit: To link a static library, use the flag -lfoo. To add a directory to the library search path, use -L/path/to/libfoo.
Edit: You don't have to link a dynamic library. Your program can use a function provided by your compiler to open a dynamic library at run time, or you can link it at compile time and the compiler will resolve the symbols but not include them in the binary. See pjc50's comment below.

Statically linking will make your binary bulky, but you wont need to have a shared version of that library on the target runtime environment. This is especially the case while developing embedded apps.

What types of executables can be decompiled?

I think that java executables (jar files) are trivial to decompile and get the source code.
What about other languages? .net and all?
Which all languages can compile only to a decompile-able code?

In general, languages like Java, C#, and VB.NET are relatively easy to decompile because they are compiled to an intermediary language, not pure machine language. In their IL form, they retain more metadata than C code does when compiled to machine language.
Technically you aren't getting the original source code out, but a variation on the source code that, when compiled, will give you the compiled code back. It isn't identical to the source code, as things like comments, annotations, and compiler directives usually aren't carried forward into the compiled code.

Managed languages can be easily decompiled because executable must contain a lot of metadata to support reflection.
Languages like C++ can be compiled to native code. Program structure can be totally changed during compilation\translation processes.
Compiler can easily replace\merge\delete parts of your code. There is no 1 to 1 relationship between original and compiled (native) code.

.NET is very easy to decompile. The best tool to do that would be the .NET reflector recently acquired by RedGate.

Most languages can be decompiled but some are easier to decompile than others. .Net and Java put more information about the original program in the executables (method names, variable names etc.) so you get more of your original information back.
C++ for example will translate variables and functions etc. to memory adresses (yeah I know this is a gross simplification) so the decompiler won't know what stuff was called. But you can still get some of the structure of the program back though.

VB6 if compiled to pcode is also possible to decompile to almost full source using P32Dasm, Flash (or actionscript) is also possible to decompile to full source using something like Flare

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string