Deserialize file using serde_json at compile time - rust

At the beginning of my program, I read data from a file:
let file = std::fs::File::open("data/games.json").unwrap();
let data: Games = serde_json::from_reader(file).unwrap();
I would like to know how it would be possible to do this at compile time for the following reasons:
Performance: no need to deserialize at runtime
Portability: the program can be run on any machine without the need to have the json file containing the data with it.
It might also be useful to mention that the data is read-only, which means the solution can store it as static.

This is straightforward, but leads to some potential issues. First, we need to decide something: do we want to build the tree of objects at compile time, or embed the file and parse it at runtime?
99% of the time, parsing on boot into a static ref is enough for people, so I'm going to give you that solution; I will point you to the "other" version at the end, but that requires a lot more work and is domain-specific.
The macro (because it has to be a macro) you are looking for to include a file at compile time is in the standard library: std::include_str!. As the name suggests, it takes your file at compile time and generates a &'static str from it for you to use. You are then free to do whatever you like with it (such as parsing it).
From there, it is then a simple matter to use lazy_static! to generate a static ref to our JSON Value (or whatever it may be that you decide to go for) for every part of the program to use. In your case, for instance, it could look like this:
use lazy_static::lazy_static;
use serde::{Deserialize, Serialize};

// The file's contents are embedded into the binary as a &'static str.
const GAME_JSON: &str = include_str!("my/file.json");

#[derive(Serialize, Deserialize, Debug)]
struct Game {
    name: String,
}

lazy_static! {
    // Parsed once, on first access, then shared by the whole program.
    static ref GAMES: Vec<Game> = serde_json::from_str(GAME_JSON).unwrap();
}
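For completeness, a minimal usage sketch (assuming the definitions above); the JSON is parsed once, the first time the static is touched:
fn main() {
    // The first access triggers the lazy parse; later accesses reuse the result.
    println!("Loaded {} games", GAMES.len());
    println!("First game: {:?}", GAMES.first());
}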
You need to be aware of two things when doing this:
This will massively bloat your binary size, as the &str isn't compressed in any way. Consider compressing the data (gzip, for instance) if that becomes a problem.
You'll need to keep in mind the usual concerns around multiple threads accessing the same static ref, but since the data isn't mutable, only a fraction of those concerns actually applies.
The other way requires dynamically generating your objects at compile time using a procedural macro. As stated, I wouldn't recommend it unless you have a genuinely expensive startup cost when parsing that JSON; most people will not, and the last time I needed this was when dealing with deeply nested, multi-GB JSON files.
The crates you want to look out for are proc_macro2 and syn for the code generation; the rest is very similar to how you would write a normal method.
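To give a flavour of that approach, here is a rough, hypothetical sketch (the macro name, file path, and emitted static are all made up; it assumes a separate crate with crate-type = "proc-macro" and quote and serde_json as dependencies; syn only becomes necessary once the macro has to parse non-trivial input):
use proc_macro::TokenStream;
use quote::quote;

#[proc_macro]
pub fn include_games(_input: TokenStream) -> TokenStream {
    // Parse the JSON while the macro runs, i.e. at compile time of the using crate.
    let names: Vec<String> =
        serde_json::from_str(include_str!("../data/games.json")).expect("invalid games.json");

    // Emit ordinary Rust code that constructs the data directly,
    // so nothing is parsed at runtime.
    let expanded = quote! {
        pub static GAME_NAMES: &[&str] = &[ #( #names ),* ];
    };
    expanded.into()
}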

When you are deserializing something at runtime, you're essentially building some representation in program memory from another representation on disk. But at compile time, there's no notion of "program memory" yet, so where would this data deserialize to?
However, what you're trying to achieve is, in fact, possible. The main idea is the following: to create something in program memory, you must write some code which will create that data. What if you could generate that code automatically, based on the serialized data? That's what the uneval crate does (disclaimer: I'm the author, so you're encouraged to look through the source to see if you can do better).
To use this approach, you'll have to create build.rs with approximately the following content:
// somehow include the Games struct with its Serialize and Deserialize implementations
fn main() {
    let games: Games = serde_json::from_str(include_str!("data/games.json")).unwrap();
    uneval::to_out_dir(games, "games.rs");
}
And in your initialization code you'll have the following:
let data: Games = include!(concat!(env!("OUT_DIR"), "/games.rs"));
Note, however, that this might be fairly hard to do in an ergonomic way, since the necessary struct definitions must now be shared between build.rs and the crate itself, as I mentioned in the comment. It might be a little easier if you split your crate in two, keeping the struct definitions (and only them) in one crate and the logic which uses them in another. There are other ways, such as include! trickery, or using the fact that the build script is an ordinary Rust binary and can include other modules (see the sketch below), but they complicate things even more.
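As a rough illustration of that last option, the build script can pull in the same module file the crate uses. This is only a sketch: it assumes the definitions live in src/games.rs, that they depend only on serde, and that serde, serde_json and uneval are listed under [build-dependencies]:
// build.rs
#[path = "src/games.rs"]
mod games; // reuse the exact same struct definitions as the crate itself

use games::Games;

fn main() {
    let games: Games = serde_json::from_str(include_str!("data/games.json")).unwrap();
    uneval::to_out_dir(games, "games.rs").unwrap();
}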

Related

Is there a way to automatically register trait implementors?

I'm trying to load JSON files that refer to structs implementing a trait. When the JSON files are loaded, the struct is grabbed from a hashmap. The problem is, I'll probably have to put a lot of structs into that hashmap all over my code. I would like to have that done automatically. To me this seems doable with procedural macros, something like:
#[my_proc_macro(type=ImplementedType)]
struct MyStruct {}
impl ImplementedType for MyStruct {}
fn load_implementors() {
    let implementors = HashMap::new();
    load_implementors!(implementors, ImplementedType);
}
Is there a way to do this?
No
There is a core issue that makes it difficult to skip manually inserting into a structure. Consider this simplified example, where we simply want to print values that are provided separately in the code-base:
my_register!(alice);
my_register!(bob);
fn main() {
    my_print(); // prints "alice" and "bob"
}
In typical Rust, there is no mechanism to link the my_print() call to the multiple invocations of my_register. There is no support for declaration merging, run-time or compile-time reflection, or run-before-main execution that you might find in other languages and that would make this possible (unless of course there's something I'm missing).
But Also Yes
There are third party crates built around link-time or run-time tricks that can make this possible:
ctor allows you to define functions that are executed before main(). With it, you can have my_register!() create individual functions for alice and bob that, when executed, will add themselves to some global structure which can then be accessed by my_print().
linkme allows you to define a slice that is made from elements defined separately, which are combined at compile time. The my_register!() macro simply needs to use this crate's attributes to add an element to the slice, which my_print() can easily access (see the sketch below).
I understand skepticism of these methods since the declarative approach is often clearer to me, but sometimes they are necessary or the ergonomic benefits outweigh the "magic".
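As a small illustration of the linkme approach (a sketch; the slice and item names are made up, and linkme must be a dependency):
use linkme::distributed_slice;

// One slice, assembled at link time from elements defined anywhere in the crate.
#[distributed_slice]
pub static REGISTERED: [&'static str] = [..];

// These two definitions could live in completely different modules.
#[distributed_slice(REGISTERED)]
static ALICE: &'static str = "alice";

#[distributed_slice(REGISTERED)]
static BOB: &'static str = "bob";

fn my_print() {
    for name in REGISTERED {
        println!("{}", name);
    }
}

fn main() {
    my_print(); // prints "alice" and "bob" (order not guaranteed)
}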

Is it possible to write a file containing macros-gathered data at compile time?

I have a couple of macros which collect signatures of some of my defined functions and then generate a function which can emit this data as &'static str.
E.g.
// stores 'test_fn1() -> u8' inside lazy_static variable
#[test_macro]
fn test_fn1() -> u8 {}
// stores 'test_fn2(u8) -> u8' inside the same lazy_static variable
#[test_macro]
fn test_fn2(arg1: u8) -> u8 {}
// generates a function 'get_signatures' that emits everything lazy_static variable has
generate_get_signatures!();
I want to output this data to some separate file during the same compilation process.
Is it possible?
It is, currently, technically possible for a macro to write a file. However:
The Rust compiler does not guarantee that macros will be invoked in any particular order, or only once, or even at all (e.g. in the case of partially recompiling a modified crate). Therefore, you cannot rely on generate_get_signatures being able to see test_macro's results.
There is interest in executing proc macros inside a WebAssembly sandbox, for security and deterministic builds. If this were implemented, you would no longer even be able to write files.
It is just not recommended for proc-macros to have side effects.
Proc macros are supposed to be “pure functions” from input source code to output source code, independently at each macro call site. Don't use them to produce other outputs. Use build.rs if you need a complex code generation step.
If your problem is not to actually generate a file but to be able to discover all of the individual #[test_macro] annotated functions, then you may find inventory or linkme useful; they provide ways to access (at run time) information that is gathered from multiple sources in the code without needing to hardcode a centralized list.
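For example, with inventory, the expansion of each #[test_macro] could register its signature, and get_signatures can simply iterate the collected items at run time. A sketch, assuming inventory is a dependency (the Signature type is made up):
pub struct Signature(pub &'static str);

inventory::collect!(Signature);

// In practice these submissions would be emitted by the #[test_macro] expansion.
inventory::submit! { Signature("test_fn1() -> u8") }
inventory::submit! { Signature("test_fn2(u8) -> u8") }

fn get_signatures() -> Vec<&'static str> {
    let mut signatures = Vec::new();
    for sig in inventory::iter::<Signature> {
        signatures.push(sig.0);
    }
    signatures
}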

Is it possible to define structs at runtime or otherwise achieve a similar effect?

I want to create a function (for a library) which will output a struct for any CSV which contains all the columns and their data. This means that the column names (unless explicitly provided by the user) will not be known until runtime.
Is it possible to create a struct definition at runtime or mutate an existing struct? If so, how?
For example, how can I mutate the following struct definition:
struct Point {
    x: String,
    y: String,
}
To the following (in memory only):
struct Point {
    x: String,
    y: String,
    z: String,
}
This behaviour is possible in languages such as Python, but I am not sure if it is possible in compiled languages such as Rust.
No, it is not possible.
Simplified, at compile time, the layout (ordering, offset, padding, etc.) of every struct is computed, allowing the size of the struct to be known. When the code is generated, all of this high-level information is thrown away and the machine code knows to jump X bytes in to access field foo.
None of this machinery to convert source code to machine code is present in a Rust executable. If it was, every Rust executable would probably gain several hundred megabytes (the current Rust toolchain weighs in at 300+MB).
Other languages work around this by having a runtime or interpreter that is shared. You cannot take a Python source file and run it without first installing a shared Python interpreter, for example.
Additionally, Rust is a statically typed language. When you have a value, you know exactly what fields and methods are available. There is no way to do this with dynamically-generated structs — there's no way to tell if a field/method actually exists when you write the code that attempts to use it.
As pointed out in the comments, dynamic data needs a dynamic data structure, such as a HashMap.
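A rough sketch of that idea for the CSV case (the helper and its names are hypothetical): each row becomes a map from column name to value, so columns that are only known at runtime are no problem:
use std::collections::HashMap;

// A "row" whose columns are only known at runtime.
type Record = HashMap<String, String>;

fn parse_line(headers: &[String], line: &str) -> Record {
    headers
        .iter()
        .cloned()
        .zip(line.split(',').map(|field| field.to_string()))
        .collect()
}

fn main() {
    let headers: Vec<String> = "x,y,z".split(',').map(|h| h.to_string()).collect();
    let record = parse_line(&headers, "1,2,3");
    println!("{:?}", record.get("z")); // Some("3")
}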

Is there any way to include binary or text files in a Rust library?

I am trying to create a library and I want to include some binary (or text) files in it that will have data which will be parsed at runtime.
My intention is to have control over these files, update them constantly and change the version of the library in each update.
Is this possible via cargo? If so, how can I access these files from my library?
A workaround I thought of is to include some .rs files with structs and/or constants like &str which will store the data, but I find it kind of ugly.
EDIT:
I have changed the accepted answer to the one that better fits my case; however, take a look at Shepmaster's answer as it may be more suitable in your case.
Disclaimer: I mentioned it in a comment, but let me re-iterate here, as it gives me more space to elaborate.
As Shepmaster said, it is possible to include text or binary verbatim in a Rust library/executable using the include_bytes! and include_str! macros.
In your case, however, I would avoid it. By deferring the parsing of the content to run-time:
you allow building a flawed artifact.
you incur (more) run-time overhead (parsing time).
you incur (more) space overhead (parsing code).
Rust acknowledges this issue, and offers multiple mechanisms for code generation destined to overcome those limitations:
macros: if the logic can be encoded into a macro, then it can be included in a source file directly
plugins: powered up macros, which can encode any arbitrary logic and generate elaborate code (see regex!)
build.rs: an independent "Rust script" running ahead of the compilation proper whose role is to generate .rs files
In your case, the build.rs script sounds like a good fit:
by moving the parsing code there, you deliver a lighter artifact
by parsing ahead of time, you deliver a faster artifact
by parsing ahead of time, you deliver a correct artifact
The result of your parsing can be encoded in different ways, from functions to statics (possibly lazy_static!), as build.rs can generate any valid Rust code.
You can see how to use build.rs in the Cargo Documentation; you'll find there how to integrate it with Cargo and how to create files (and more).
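For a flavour of what that looks like, here is a minimal, hypothetical build.rs that "parses" some data ahead of time and bakes the result into generated Rust source (the file names and the constant are made up):
// build.rs
use std::{env, fs, path::Path};

fn main() {
    // Imagine the parsing of your bundled data happening here...
    let parsed_value = 42u32;

    // ...then write ordinary Rust source into OUT_DIR.
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("generated.rs");
    fs::write(&dest, format!("pub const PARSED_VALUE: u32 = {};\n", parsed_value)).unwrap();

    // Only re-run the build script when the source data changes.
    println!("cargo:rerun-if-changed=data/input.txt");
}
The library then pulls the generated code in with include!(concat!(env!("OUT_DIR"), "/generated.rs")).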
The include_bytes! macro seems close to what you want. It only gives you a reference to a byte array though, so you'd have to do any parsing starting from that:
static HOST_FILE: &'static [u8] = include_bytes!("/etc/hosts");
fn main() {
    let host_str = std::str::from_utf8(HOST_FILE).unwrap();
    println!("Hosts are:\n{}", &host_str[..42]);
}
If you have UTF-8 content, you can use include_str!, as pointed out by Benjamin Lindley:
static HOST_FILE: &'static str = include_str!("/etc/hosts");
fn main() {
    println!("Hosts are:\n{}", &HOST_FILE[..42]);
}

How to implement a custom allocator?

I am looking for a way to implement something like a memory pool in Rust.
I want to allocate a set of related small objects in chunks, and delete the set of objects at once. The objects won't be freed separately. There are several benefits to this approach:
It reduces fragmentation.
It saves memory.
Is there any way to create an allocator like this in Rust?
It sounds like you want the typed arena crate, which is stable and can be used in Rust 1.0.
extern crate typed_arena;
#[derive(Debug)]
struct Foo {
    a: u8,
    b: u8,
}

fn main() {
    let allocator = typed_arena::Arena::new();
    let f = allocator.alloc(Foo { a: 42, b: 101 });
    println!("{:?}", f)
}
This does have limitations - all the objects in a given arena must be the same type. In my usage, I have a very small set of types, so I have just created a set of Arenas, one for each type.
If that isn't suitable, you can look to arena::Arena, which is unstable and slower than a typed arena.
The basic premise of both allocators is simple - you hand an item to the arena, and it moves the bits into its own memory allocation.
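A small sketch of that premise with typed_arena (variable names are illustrative): the arena hands back references that live as long as the arena itself, and everything is freed together when the arena is dropped:
use typed_arena::Arena;

fn main() {
    let arena: Arena<String> = Arena::new();

    // Each alloc moves the value into the arena and returns a reference to it.
    let a = arena.alloc(String::from("first"));
    let b = arena.alloc(String::from("second"));

    println!("{} {}", a, b);
    // Both allocations are freed at once when `arena` goes out of scope.
}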
Another meaning for the word "allocator" is what is used when you box a value. It is planned that Rust will gain support for "placement new" at some point, and the box syntax is reserved for that.
In unstable versions of Rust, you can do something like box Foo(42), and a (hypothetical) enhancement to that would allow you to say something like box my_arena Foo(42), which would use the specified allocator. This capability appears to be a few versions away from existing.
Funny thing is, the allocator you want is already available in the arena crate. It is unstable, so you have to use nightlies to use this crate. You can look at its sources if you want to know how it is implemented.
You may want to look at arena::TypedArena in the standard library (Note: this is not stable and, as a result, is only available in nightly builds).
If this doesn't fit your needs, you can always examine the source code (you can click the [src] link in the top right of the documentation) to see how it's done.
