Does #[derive(PartialEq, Eq)] increase code size? - rust

I submitted a patch to derive-builder because I needed the ability to test the failures it generates. The patch derives Eq and PartialEq so that failures can be tested with assert_eq!().
The question asked was,
My understanding is that generating unnecessary trait implementations can increase code size.
My own understanding, however, is that implementations that are never used do not generate more code. Which of these two is correct?

Here is what I did to test this theory. I generated a simple binary:
#[derive(Debug)]
struct Foo {
    id: i64,
}

fn main() {
    let a = Foo { id: 42 };
    println!("Hello, world! [{} {:?}]", a.id, a);
}
I then did the same generation but with #[derive(Debug, PartialEq, Eq)], and found that both binaries hashed identically. Not content, I also tried creating a library (just struct Foo, no main) and compiling with --release. In this case, I did observe a difference; here is the nuance:
Between the two runs, the rlib (Rust library) file had a different size.
An rlib is an archive. For me it had three files: one ended in cgu.0.rcgu.o, another in cgu.1.rcgu.o, and one was lib.rmeta.
Of the files in the archive, the *.o files were exactly the same (identical hashes).
The lib.rmeta file was larger in the library that also derived Eq and PartialEq.
Now as to the merit of rmeta, the Rust documentation says this,
An rmeta file is custom binary format that contains the metadata for the crate. This file can be used for fast "checks" of a project by skipping all code generation (as is done with cargo check), collecting enough information for documentation (as is done with cargo doc), or for pipelining. This file is created if the --emit=metadata CLI option is used.
rmeta files do not support linking, since they do not contain compiled object files.
So it seems something gets bigger, but that something is ONLY used for tooling purposes.
I also tried the above test with and without pub on the library items. If an unused function had been generated, I would expect at least one .o file to be larger; I was not able to observe this, though.
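For context, here is a minimal sketch (not the derive-builder patch itself) of why the derives were wanted in the first place: with PartialEq (and Eq) derived, a test can compare values directly with assert_eq!().

#[derive(Debug, PartialEq, Eq)]
struct Foo {
    id: i64,
}

#[test]
fn foo_compares() {
    // assert_eq! needs PartialEq (for ==) and Debug (for failure output).
    assert_eq!(Foo { id: 42 }, Foo { id: 42 });
}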

Related

Is there a way to automatically register trait implementors?

I'm trying to load JSON files that refer to structs implementing a trait. When the JSON files are loaded, the struct is grabbed from a hashmap. The problem is that I'll probably end up inserting a lot of structs into that hashmap, all over my code. I would like that to happen automatically. To me this seems doable with procedural macros, something like:
#[my_proc_macro(type=ImplementedType)]
struct MyStruct {}
impl ImplementedType for MyStruct {}
fn load_implementors() {
    let implementors = HashMap::new();
    load_implementors!(implementors, ImplementedType);
}
Is there a way to do this?
No
There is a core issue that makes it difficult to skip manually inserting into a structure. Consider this simplified example, where we simply want to print values that are provided separately in the code-base:
my_register!(alice);
my_register!(bob);

fn main() {
    my_print(); // prints "alice" and "bob"
}
In typical Rust, there is no mechanism to link the my_print() call to the multiple invocations of my_register!. There is no support for declaration merging, run-time or compile-time reflection, or run-before-main execution, features found in other languages that would make this possible (unless, of course, there's something I'm missing).
But Also Yes
There are third-party crates built around link-time or run-time tricks that can make this possible; rough sketches of both follow below:
ctor allows you to define functions that are executed before main(). With it, you can have my_register!() create individual functions for alice and bob that, when executed, add themselves to some global structure which can then be accessed by my_print().
linkme allows you to define a slice that is made from elements defined separately, which are combined at compile time. my_register!() simply needs to use this crate's attributes to add an element to the slice, which my_print() can easily access.
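A rough sketch of the ctor approach (assuming the ctor and once_cell crates; all names here are illustrative):

use once_cell::sync::Lazy;
use std::sync::Mutex;

// Global registry, filled in by the before-main constructors below.
static NAMES: Lazy<Mutex<Vec<&'static str>>> = Lazy::new(|| Mutex::new(Vec::new()));

// ctor arranges for each of these functions to run before main().
#[ctor::ctor]
fn register_alice() {
    NAMES.lock().unwrap().push("alice");
}

#[ctor::ctor]
fn register_bob() {
    NAMES.lock().unwrap().push("bob");
}

fn my_print() {
    for name in NAMES.lock().unwrap().iter() {
        println!("{}", name);
    }
}

fn main() {
    my_print(); // prints "alice" and "bob"
}

And a similar sketch with linkme, where the slice is assembled at link time (element order is not guaranteed):

use linkme::distributed_slice;

// Every element registered anywhere in the crate is linked into this slice.
#[distributed_slice]
static NAMES: [&'static str] = [..];

#[distributed_slice(NAMES)]
static ALICE: &'static str = "alice";

#[distributed_slice(NAMES)]
static BOB: &'static str = "bob";

fn my_print() {
    for name in NAMES.iter() {
        println!("{}", name);
    }
}

fn main() {
    my_print(); // prints "alice" and "bob"
}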
I understand skepticism of these methods, since the declarative approach is often clearer to me, but sometimes they are necessary, or the ergonomic benefits outweigh the "magic".

Deserialize file using serde_json at compile time

At the beginning of my program, I read data from a file:
let file = std::fs::File::open("data/games.json").unwrap();
let data: Games = serde_json::from_reader(file).unwrap();
I would like to know how it would be possible to do this at compile time for the following reasons:
Performance: no need to deserialize at runtime
Portability: the program can be run on any machine without needing to ship the JSON data file alongside it.
It might also be useful to mention that the data is read-only, which means the solution can store it as a static.
This is doable, but leads to some potential issues. First, we need to make a decision: do we want to build the tree of objects at compile time, or embed the file and parse it at startup?
99% of the time, parsing on boot into a static ref is enough, so that is the solution I'm going to give you; I will point you to the "other" version at the end, but it requires a lot more work and is domain-specific.
The macro (because it has to be a macro) that you are looking for to include a file at compile time is in the standard library: std::include_str!. As the name suggests, it takes your file at compile time and generates a &'static str from it for you to use. You are then free to do whatever you like with it (such as parsing it).
From there, it is a simple matter to use lazy_static! to generate a static ref to our JSON value (or whatever it may be that you decide to go for) for every part of the program to use. In your case, for instance, it could look like this:
use lazy_static::lazy_static;
use serde::{Deserialize, Serialize};

const GAME_JSON: &str = include_str!("my/file.json");

#[derive(Serialize, Deserialize, Debug)]
struct Game {
    name: String,
}

lazy_static! {
    static ref GAMES: Vec<Game> = serde_json::from_str(GAME_JSON).unwrap();
}
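For completeness, a tiny usage sketch (assuming the definitions above): the static is parsed lazily, on first access.

fn main() {
    // The first touch of GAMES triggers the JSON parse; subsequent
    // accesses reuse the already-built Vec<Game>.
    println!("{} game(s) loaded", GAMES.len());
}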
You need to be aware of two things when doing this:
This will massively bloat your binary size, as the embedded &str isn't compressed in any way. Consider compressing the data (e.g. with gzip) if that matters.
You'll need to keep in mind the usual concerns around multiple threads accessing the same static ref, but since it isn't mutable, only a subset of those concerns apply.
The other way requires dynamically generating your objects at compile time using a procedural macro. As stated, I wouldn't recommend it unless you have a genuinely expensive startup cost when parsing that JSON; most people will not, and the last time I needed it was when dealing with deeply nested, multi-GB JSON files.
The crates you want to look out for are proc_macro2 and syn for the code generation; the rest is very similar to how you would write a normal method.
When you are deserializing something at runtime, you're essentially building some representation in program memory from another representation on disk. But at compile time, there's no notion of "program memory" yet; where would this data deserialize to?
However, what you're trying to achieve is, in fact, possible. The main idea is the following: to create something in program memory, you must run some code which creates the data. What if you could generate that code automatically, based on the serialized data? That's what the uneval crate does (disclaimer: I'm the author, so you're encouraged to look through the source to see if you can do better).
To use this approach, you'll have to create a build.rs with approximately the following content:
// somehow include the Games struct with its Serialize and Deserialize implementations

fn main() {
    let games: Games = serde_json::from_str(include_str!("data/games.json")).unwrap();
    uneval::to_out_dir(games, "games.rs");
}
And in your initialization code you'll have the following:
let data: Games = include!(concat!(env!("OUT_DIR"), "/games.rs"));
Note, however, that this might be fairly hard to do in an ergonomic way, since the necessary struct definitions now must be shared between build.rs and the crate itself, as I mentioned in the comment. It might be a little easier if you split your crate in two, keeping the struct definitions (and only them) in one crate and the logic which uses them in another. There are other ways, such as include! trickery, or using the fact that the build script is an ordinary Rust binary that can include other modules, but these complicate things even more; a rough sketch of the include! trickery follows.
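As a sketch of that include! trickery (the paths and file names here are illustrative, not from the original answer): the build script can textually include the same module the crate compiles, so the Games type exists on both sides.

// build.rs: textually include the shared definitions, assuming they
// live in src/types.rs and carry their serde derives.
include!("src/types.rs");

fn main() {
    let games: Games = serde_json::from_str(include_str!("data/games.json")).unwrap();
    uneval::to_out_dir(games, "games.rs").unwrap();
}

The crate itself then declares mod types; as usual and materializes the pre-parsed data with the include!(concat!(env!("OUT_DIR"), "/games.rs")) line shown above; serde, serde_json, and uneval must also be listed under [build-dependencies] in Cargo.toml.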

How do I find the function pointers for tests from the LLVM IR code of a Rust program?

We are developing a mutation testing system based on LLVM. The system supports C++ projects that use GoogleTest, and I am trying to add support for Rust. To do so, we need to accomplish the following steps:
Compile the language into LLVM IR. Rust supports this.
Find the tests in the LLVM IR.
Run the tests and the code that is exercised by the tests (the "testees").
The challenge is to find the unit test methods via the LLVM IR API.
Consider the following example. It has 4 tests and one testee function:
pub fn sum(a: i32, b: i32) -> i32 {
    return a + b;
}

pub fn just_print() {
    println!("I am just_print() function. I just say hello!");
}

#[test]
fn rusttest_foo_sum1() {
    assert!(sum(3, 4) == 7);
}

#[test]
fn rusttest_foo_sum2() {
    assert!(sum(4, 5) == 9);
}

#[test]
fn rusttest_foo_sum3() {
    assert!(sum(5, 6) == 11);
}

#[test]
fn rusttest_foo_sum4() {
    assert!(sum(5, 6) == 11);
}
This is the slightly prettified LLVM IR that is produced when compiling this Rust code.
Having explored that LLVM IR for a while, one notices that Rust/Cargo run tests via a main function that invokes the test_main_static function, which is given an array of descriptions. Each description is a pair of a test function name and a test function pointer. See the #ref.e at line 47.
Our challenge is to collect function pointers to these tests by parsing this sophisticated struct layout so that later we can run these functions via LLVM JIT by giving it the function pointers we accumulated.
The obvious brute-force approach we are going to take is to walk this struct layout, parse the structs carefully, and find the correct offsets of the test functions. This approach does not appear to be portable across different versions of Rust, or of the LLVM IR, which might change in the future.
What is the easiest and at the same time reliable way of finding the test function pointers, other than the default of parsing the offsets by hand?
This question has also been cross-posted to the Rust forums.
I made it work using the brute-force approach that I outlined in my question. Using the LLVM C++ API, we:
find a pointer to test_main_static
find the reference to #ref.e in test_main_static
enumerate through #ref.e structure and find the test function pointers
The approach seems to work, but our concern is still that it might not be portable across different versions of Rust/LLVM. One of our next steps will be to implement integrity checks on the LLVM IR produced by rustc --test. Another step will be to try this RustTestFinder on real code bases and see whether we run into any problems.
I would still appreciate any information about LLVM IR produced by rustc --test that could make things more straightforward.

How to suppress the warning for "drop_with_repr_extern" at a fine granularity?

I am currently experimenting with multi-threading code, and its performance is affected by whether two data members share the same cache line or not.
In order to avoid false-sharing, I need to specify the layout of the struct without the Rust compiler interfering, and thus I use repr(C). However, this same struct also implements Drop, and therefore the compiler warns about the "incompatibility" of repr(C) and Drop, which I care naught for.
However, attempting to silence this futile warning has proven beyond me.
Here is a reduced example:
#[repr(C)]
#[derive(Default, Debug)]
struct Simple<T> {
    item: T,
}

impl<T> Drop for Simple<T> {
    fn drop(&mut self) {}
}

fn main() {
    println!("{:?}", Simple::<u32>::default());
}
which emits #[warn(drop_with_repr_extern)].
I have tried specifying #[allow(drop_with_repr_extern)]:
at struct
at impl Drop
at mod
and none of them worked. Only the crate-level suppression worked, which is rather heavy-handed.
Which leads us to: is there a more granular way of suppressing this warning?
Note: remarks on a better way to ensure that two data members are spread over different cache lines are welcome; however they will not constitute answers on their own.
The reason is near the end of rustc_lint/builtin.rs:
The lint does not walk the crate; instead, it uses ctx.tcx.lang_items.drop_trait() to look up all Drop trait implementations within the crate. The allow annotations are only picked up while walking the crate. I've stumbled upon the same problem in this question. So unless someone changes the lint to actually walk the crate and pick up Drop impls as it goes, you need to annotate the whole crate.
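For reference, the crate-level suppression is a single inner attribute at the top of the crate root, and (while not an answer to the lint question) the cache-line note is commonly addressed with explicit padding. A sketch, where the 64-byte cache-line size is an assumption:

// Crate root (lib.rs or main.rs): the only level at which the
// suppression currently takes effect.
#![allow(drop_with_repr_extern)]

// Illustrative padding pattern to keep two hot fields on separate
// cache lines (assuming 64-byte lines).
#[repr(C)]
struct TwoLines {
    a: u64,
    _pad: [u8; 56], // 64 - size_of::<u64>()
    b: u64,
}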

Is there any way to include binary or text files in a Rust library?

I am trying to create a library and I want to include some binary (or text) files in it that will have data which will be parsed at runtime.
My intention is to have control over these files, update them constantly and change the version of the library in each update.
Is this possible via cargo? If so, how can I access these files from my library?
A workaround I thought of is to include some .rs files with structs and/or constants such as &str that would store the data, but I find that kind of ugly.
EDIT:
I have changed the accepted answer to the one that better fits my case; however, take a look at Shepmaster's answer, as it may be more suitable in your case.
Disclaimer: I mentioned it in a comment, but let me re-iterate here, as it gives me more space to elaborate.
As Shepmaster said, it is possible to include text or binary verbatim in a Rust library/executable using the include_bytes! and include_str! macros.
In your case, however, I would avoid it. By deferring the parsing of the content to run-time:
you allow building a flawed artifact.
you incur (more) run-time overhead (parsing time).
you incur (more) space overhead (parsing code).
Rust acknowledges this issue and offers multiple mechanisms for code generation designed to overcome those limitations:
macros: if the logic can be encoded into a macro, then it can be included in a source file directly
plugins: powered-up macros, which can encode arbitrary logic and generate elaborate code (see regex!)
build.rs: an independent "Rust script" that runs ahead of the compilation proper, whose role is to generate .rs files
In your case, the build.rs script sounds like a good fit:
by moving the parsing code there, you deliver a lighter artifact
by parsing ahead of time, you deliver a faster artifact
by parsing ahead of time, you deliver a correct artifact
The result of your parsing can be encoded in different ways, from functions to statics (possibly lazy_static!), as build.rs can generate any valid Rust code.
You can see how to use build.rs in the Cargo Documentation; there you'll find how to integrate it with Cargo and how to create files (and more). A small sketch of the flow follows.
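As a minimal sketch (the file names and the trivially simple "parsing" are illustrative only): build.rs reads the data file at build time, parses it, and writes ordinary Rust source into OUT_DIR.

// build.rs: parse the data at build time and emit it as Rust source.
use std::{env, fs, path::Path};

fn main() {
    // Re-run this script only when the data file changes.
    println!("cargo:rerun-if-changed=data/entries.txt");

    let raw = fs::read_to_string("data/entries.txt").unwrap();

    // "Parsing" here is just splitting lines; real parsing goes here.
    let mut code = String::from("pub static ENTRIES: &[&str] = &[\n");
    for line in raw.lines() {
        code.push_str(&format!("    {:?},\n", line));
    }
    code.push_str("];\n");

    let dest = Path::new(&env::var("OUT_DIR").unwrap()).join("generated.rs");
    fs::write(dest, code).unwrap();
}

The library then pulls the generated file in with include!(concat!(env!("OUT_DIR"), "/generated.rs"));.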
The include_bytes! macro seems close to what you want. It only gives you a reference to a byte array though, so you'd have to do any parsing starting from that:
static HOST_FILE: &'static [u8] = include_bytes!("/etc/hosts");

fn main() {
    let host_str = std::str::from_utf8(HOST_FILE).unwrap();
    println!("Hosts are:\n{}", &host_str[..42]);
}
If you have UTF-8 content, you can use include_str!, as pointed out by Benjamin Lindley:
static HOST_FILE: &'static str = include_str!("/etc/hosts");

fn main() {
    println!("Hosts are:\n{}", &HOST_FILE[..42]);
}
