What does Rust's lack of incremental compilation mean, exactly? - rust

This question was asked before Rust officially supported incremental compilation. Rust 1.24.0 and later enable incremental compilation by default for development (debug) builds.
I'm an outsider trying to see if Rust is appropriate for my projects.
I've read that Rust lacks incremental compilation (beta features notwithstanding).
Is this similar to having everything be implemented in the headers in C++ (like in much of Boost)?
If the above is correct, does this limit Rust to rather small projects with small dependencies? (If, say, Qt or KDE were header-only libraries, then programs using them would be extremely painful to develop, since you'd effectively recompile Qt/KDE every time you want to compile your own code.)

In C and C++, a compilation unit is usually a source file and all the header files it transitively includes. An application or library is usually comprised of multiple compilation units that are linked together. An application or library can additionally be linked with other libraries. This means that changing a source file requires recompiling that source file only and then relinking, changing an external library only requires relinking, but changing a header file (whether it's part of the project or external; the compiler can't tell the difference) requires recompiling all source files that use it and then relinking.
In Rust, the crate is the compilation unit. (A crate can be an application or a library.) Rust doesn't use header files; instead, the equivalent information is stored as metadata in the compiled crates (which is faster to parse, and has the same effect as precompiled headers in C/C++). A crate can additionally be linked with other crates. This means that changing any of the source files for a crate requires recompiling the whole crate, and changing a crate requires recompiling all crates that depend on it (currently, this means recompiling from source, even if the API happens to not have changed).
To answer your questions, no, Rust doesn't recompile all dependencies every time you recompile your project; quite the opposite in fact.
Incremental compilation in Rust is about reusing the work done in previous compilations of a crate to speed up compilation times. For example, if you change a module and it doesn't affect the other modules, the compiler would be able to reuse the data that was generated when the other modules were compiled last time. The lack of incremental compilation is usually only a problem with large or complex crates (e.g. those who make heavy use of macros).

Related

What exactly is a "library" in a crate?

I'm a little confused about the concept "library" in rust, which is mentioned from "A crate is a binary or library".
If I'm right, a binary means an executable program (which can be run from shell, for example), but what is a library?
Are they some sort of object files with symbols like .a or .so, which will be linked to my program (like C/C++)
Or they are pure source codes which will be compiled together with my program?
As described by Masklinn, yes, Rust does have prebuilt library formats. However, these are mostly used internally, are finnicky for different compiler versions, and cargo still lacks support for them. In fact, crates.io requires libraries to be "open-source" (as in, you provide the source code, you could still have the source code load from some closed-source dependency), and it distributes the source code to whoever downloads the crate. Then, the source code is effectively compiled with your program (this is where rlibs come in to play, but cargo doesn't expose this to the user). This is also why you're able to inspect the source code for pretty much every crate.
If I'm right, a binary means an executable program (which can be run from shell, for example), but what is a library?
Yes. Specifically, per the Linkage documentation
A runnable executable will be produced. This requires that there is a main function in the crate which will be run when the program begins executing. This will link in all Rust and native dependencies, producing a single distributable binary. This is the default crate type.
Are they some sort of object files with symbols like .a or .so, which will be linked to my program (like C/C++)
Or they are pure source codes which will be compiled together with my program?
Never strictly the latter, but the exact artefact depends, as per the linkage documentation:
A Rust library will be produced. This is an ambiguous concept as to what exactly is produced because a library can manifest itself in several forms. The purpose of this generic lib option is to generate the "compiler recommended" style of library. The output library will always be usable by rustc, but the actual type of library may change from time-to-time.
The documentation then lists the various types of libraries:
rlib, a static library with rust-specific metadata (an augmented .a)
dylib, a dynamic library with rust-specific metadata (an augmented .so)
staticlib, a system static library (an actual .a)
cdylib, a system dynamic library (an actual .so)
I would think "lib" aliases to "rlib" but frankly I have no idea, and as the quote notes that's neither fixed nor documented by design.

What is the exact difference between a Crate and a Package?

I come from a Java background and have recently started with Rust.
The official Rust doc is pretty self-explanatory except the chapter that explains Crates and Packages.
The official doc complicates it with so many ORs and ANDs while explaining the two.
This reddit post explains it a little better, but is not thorough.
What is the exact difference between a Crate and Package in Rust? Where/When do we use them?
Much thanks!
Crates
From the perspective of the Rust compiler, "crate" is the name of the compilation unit. A crate consists of an hierarchy of modules in one or multiple files. This is in contrast to most "traditional" compiled languages like Java, C or C++, where the compilation unit is a single file.
From the perspective of an user, this definition isn't really helpful. Indeed, in most cases, you will need to distinguish between two types of crates:
binary crates can be compiled to executables by the Rust compiler. For example, Cargo, the Rust package manager, is a binary crate translated by the Rust compiler to the executable that you use to manage your project.
library crates are what you'd simply call libraries in other languages. A binary crate can depend on library crates to use functionality supplied by the libraries.
Packages
The concept of packages does not originate in the Rust compiler, but in Cargo, the Rust package manager. At least for simple projects, a package is also what you will check into version control.
A package consists of one or multiple crates, but no more than one library crate.
Creating packages
to create a new package consisting of one binary crate, you can run cargo new
to create a new package consisting of one library crate, you can run cargo new --lib
to create a package consisting of a library as well as one or multiple binaries, you can run either cargo new or cargo new --lib and then modify the package directory structure to add the other crate
When should you use crates, and when should you use packages?
As you can see now, this question doesn't really make sense – you should and must always use both. A package can't exist without at least one crate, and a crate is (at least if you are using Cargo) always part of a package.
Therefore, a better question is this:
When should you put multiple crates into one package?
There are multiple reasons to have more than one crate in a package. For example:
If you have a binary crate, it is idiomatic to have the "business logic" in a library in the same package. This has multiple advantages:
Libraries can be integration tested while binaries can't
If you later decide that the business logic needs to also be used in another binary, it is trivial to add this second binary to the package and also use the library
If you have a library crate that generates some files (a database engine or something like that), you may want to have a helper binary to inspect those files
Note that if you have a very big project, you may instead want to use the workspace feature of Cargo in these cases.

When building a Rust binary with lto=true, is there a way to limit the crates the linker examines?

The program I have ends up using over 110 crates in its build. However, the core performance gains (where 80+%) of the benefit is, resides in only a few 'crates'. Ya, I took the time to drill down through crates that used crates. Consequently, I'd like the linker to use lto options for only those 5-6 crates, rather that looking at all +110. Does anyone know if this is possible? And, if it is, how do I direct the linker to do it? Yes, the difference is just build-time, but its only going to get worse as I add more crates.

Why does `cargo new` create a binary instead of a library?

I am creating a library with Rust. On library creation I type
cargo new name
According to the docs this should create a lib, because --bin is omitted.
However, the file is auto set to a binary.
Is there a setting I have to adjust to disable auto setting all projects to binary?
Cargo features
Cargo’s CLI has one really important change this release: cargo new will now default to generating a binary, rather than a library. We try to keep Cargo’s CLI quite stable, but this change is important, and is unlikely to cause breakage.
For some background, cargo new accepts two flags: --lib, for creating libraries, and --bin, for creating binaries, or executables. If you don’t pass one of these flags, in previous versions of Cargo, it would default to --lib. We made this decision because each binary (often) depends on many libraries, and so the library case is more common. However, this is incorrect; each library is depended upon by many binaries. Furthermore, when getting started, what you often want is a program you can run and play around with. It’s not just new Rustaceans though; even very long-time community members have said that they find this default surprising. As such, we’re changing it.
Source
Since Cargo 1.25 cargo new defaults to creating a binary crate, instead of a library crate.
cargo new accepts two flags: --lib, for creating libraries, and --bin, for creating binaries, or executables.
See the Changelog for 1.25.

How to create a shared object that is statically linked with pthreads and libstdc++ on Linux/gcc?

How to create a shared object that is statically linked with pthreads and libstdc++ on Linux/gcc?
Before I go to answering your question as it was described, I will note that it is not exactly clear what you are trying to achieve in the end, and there is probably a better solution to your problem.
That said - there are two main problems with trying to do what you described:
One is, that you will need to decompose libpthread and libstdc++ to the object files they are made with. This is because ELF binaries (used on Linux) have two levels of "run time" library loading - even when an executable is statically linked, the loader has to load the statically linked libraries within the binary on execution, and map the right memory addresses. This is done before the shared linkage of libraries that are dynamically loaded (shared objects) and mapped to shared memory. Thus, a shared object cannot be statically linked with such libraries, as at the time the object is loaded, all static linked libraries were loaded already. This is one difference between linking with a static library and a plain object file - a static library is not merely glued like any object file into the executable, but still contains separate tables which are referred to on loading. (I believe that this is in contrast to the much simpler static libraries in MS-DOS and classic Windows, .LIB files, but there may be more to those than I remember).
Of course you do not actually have to decompose libpthread and libstdc++, you can just use the object files generated when building them. Collecting them may be a bit difficult though (look for the objects referred to by the final Makefile rule of those libraries). And you would have to use ld directly and not gcc/g++ to link, to avoid linking with the dynamic versions as well.
The second problem is consequential. If you do the above, you will sure have such a shared object / dynamic library as you asked to build. However, it will not be very useful, as once you try to link a regular executable that uses those libpthread/libstdc++ (the latter being any C++ program) with this shared object, it will fail with symbol conflicts - the symbols of the static libpthread/libstdc++ objects you linked your shared object against will clash with the symbols from the standard libpthread/libstdc++ used by that executable, no matter if it is dynamically or statically linked with the standard libraries.
You could of course then try to either hide all symbols in the static objects from libstdc++/libpthread used by your shared library, make them private in some way, or rename them automatically on linkage so that there will be no conflict. However, even if you get that to work, you will find some undesireable results in runtime, since both libstdc++/libpthread keep quite a bit of state in global variables and structures, which you would now have duplicate and each unaware of the other. This will lead to inconsistencies between these global data and the underlying operating system state such as file descriptors and memory bounds (and perhaps some values from the standard C library such as errno for libstdc++, and signal handlers and timers for libpthread.
To avoid over-broad interpretation, I will add a remark: at times there can be sensible grounds for wanting to statically link against even such basic libraries as libstdc++ and even libc, and even though it is becoming a bit more difficult with recent systems and versions of those libraries (due to a bit of coupling with the loader and special linker tricks used), it is definitely possible - I did it a few times, and know of other cases in which it is still done. However, in that case you need to link a whole executable statically. Static linkage with standard libraries combined with dynamic linkage with other objects is not normally feasible.
Edit: One issue which I forgot to mention but is important to take into account is C++ specific. C++ was unfortunately not designed to work well with the classic model of object linkage and loading (used on Unix and other systems). This makes shared libraries in C++ not really portable as they should be, because a lot of things such as type information and templates are not cleanly separated between objects (often being taken, together with a lot of actual library code at compile time from the headers). libstdc++ for that reason is tightly coupled with GCC, and code compiled with one version of g++ will in general only work with the libstdc++ from with this (or a very similar) version of g++. As you will surely notice if you ever try to build a program with GCC 4 with any non-trivial library on your system that was built with GCC 3, this is not just libstdc++. If your reason for wanting to do that is trying to ensure that your shared object is always linked with the specific versions of libstdc++ and libpthread that it was built against, this would not help because a program that uses a different/incompatible libstdc++ would also be built with an incompatible C++ compiler or version of g++, and would thus fail to link with your shared object anyway, aside from the actual libstdc++ conflicts.
If you wonder "why wasn't this done simpler?", a general rumination worth pondering: For C++ to work nicely with dynamic/shared libraries (meaning compatibility across compilers, and the ability to replace a dynamic library with another version with a compatible interface without rebuilding everything that uses it), not just compiler standartization is needed, but at the level of the operating system's loader, the structure and interface of object and library files and the work of the linker would need to be significantly extended beyond the relatively simple Unix classics used on common operating systems (Microsoft Windows, Mach based systems and NeXTStep relatives such as Mac OS, VMS relatives and some mainframe systems also included) for natively built code today. The linker and dynamic loader would need to be aware of such things as templates and typing, having to some extent functionality of a small compiler to actually adapt the library's code to the type given to it - and (personal subjective observation here) it seems that higher-level intermediate intermediate code (together with higher-level languages and just-in-time compilation) is catching ground faster and likely to be standardized sooner than such extensions to the native object formats and linkers.
You mentioned in a separate comment that you are trying to port a C++ library to an embedded device. (I am adding a new answer here instead of editing my original answer here because I think other StackOverflow users interested in this original question may still be interested in that answer in its context)
Obviously, depending on how stripped down your embedded system is (I have not much embedded Linux experience, so I am not sure what is most likely), you may of course be able to just install the shared libstdc++ on it and dynamically link everything as you would do otherwise.
If dynamically linking with libstdc++ would not be good for you or not work on your system (there are so many different levels of embedded systems that one cannot know), and you need to link against a static libstdc++, then as I said, your only real option is static linking the executable using the library with it and libstdc++. You mentioned porting a library to the embedded device, but if this is for the purpose of using it in some code you write or build on the device and you do not mind a static libstdc++, then linking everything statically (aside from perhaps libc) is probably OK.
If the size of libstdc++ is a problem, and you find that your library is actually only using a small part of its interfaces, then I would nonetheless suggest first trying to determine the actual space you would save by linking against only the parts you need. It may be significant or not, I never looked that deep into libstdc++ and I suspect that it has a lot of internal dependencies, so while you surely do not need some of the interfaces, you may or may not still depend on a big part of its internals - I do not know and did not try, but it may surprise you. You can get an idea by just linking a binary using the library against a static build of it and libstdc++ (not forgetting to strip the binary, of course), and comparing the size of the resulting executable that with the total size of a (stripped) executable dynamically linked together with the full (stripped) shared objects of the library and libstdc++.
If you find that the size difference is significant, but do not want to statically link everything, you try to reduce the size of libstdc++ by rebuilding it without some parts you know that you do not need (there are configure-time options for some parts of it, and you can also try to remove some independent objects at the final creation of libstdc++.so. There are some tools to optimize the size of libraries - search the web (I recall one from a company named MontaVista but do not see it on their web site now, there are some others too).
Other than the straightforward above, some ideas and suggestions to think of:
You mentioned that you use uClibc, which I never fiddled with myself (my experience with embedded programming is a lot more primitive, mostly involving assembly programming for the embedded processor and cross-compiling with minimal embedded libraries). I assume you checked this, and I know that uClibc is intended to be a lightweight but rather full standard C library, but do not forget that C++ code is hardly independent on the C library, and g++ and libstdc++ depend on quite some delicate things (I remember problems with libc on some proprietary Unix versions), so I would not just assume that g++ or the GNU libstdc++ actually works with uClibc without trying - I don't recall seeing it mentioned in the uClibc pages.
Also, if this is an embedded system, think of its performance, compute power, overall complexity, and timing/simplicity/solidity requirements. Take into consideration the complexity involved, and think whether using C++ and threads is appropriate in your embedded system, and if nothing else in the system uses those, whether it is worth introducing for that library. It may be, not knowing the library or system I cannot tell (again, embedded systems being such a wide range nowadays).
And in this case also, just a quick link I stumbled upon looking for uClibc -- if you are working on an embedded system, using uClibc, and want to use C++ code on it -- take a look at uClibc++. I do not know how much of the standard C++ stuff you need and it already supports, and it seems to be an ongoing project, so not clear if it is in a state good enough for you already, but assuming that your work is also under development still, it might be a good alternative to GCC's libstdc++ for your embedded work.
I think this guy explains quite well why that wouldn't make sense. C++ code that uses your shared object but a different libstdc++ would link alright, but wouldn't work.

Resources