My question is: is there a way to get the exact module-resolution entry point during the proc-macro stage?
First, some background on what I'm trying to achieve.
I'm in the process of writing a crate that can automatically implement various traits on a struct, such as PartialEq and Into.
This is somewhat special because these traits involve other structs that could, in theory, be located anywhere else in the crate. I then need the actual tokens of that struct so I can do some crude type checking and automatically map fields that have the same names.
For this reason, I'm doing some crude module resolution of my own by searching the file tree and parsing some files in the current crate.
Such an invocation currently looks like this:
#[derive(InterStruct)]
#[into("crate::into_test::IntoStruct")]
pub struct FromStruct {
...
}
This then generates:
impl From<FromStruct> for crate::into_test::IntoStruct {
fn from(from: FromStruct) -> Self {
...
}
}
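For illustration only, here is a hand-written equivalent of what that expansion could look like once fields with matching names are mapped. The field names id, name and extra are hypothetical, and an inline into_test module stands in for the separate file.

```rust
// Hypothetical fields, purely for illustration.
pub struct FromStruct {
    pub id: u32,
    pub name: String,
    pub extra: bool,
}

mod into_test {
    pub struct IntoStruct {
        pub id: u32,
        pub name: String,
    }
}

// Roughly what the derive would emit: fields with matching names are moved
// across; `extra` has no counterpart in IntoStruct and is simply dropped.
impl From<FromStruct> for crate::into_test::IntoStruct {
    fn from(from: FromStruct) -> Self {
        Self {
            id: from.id,
            name: from.name,
        }
    }
}

fn main() {
    let src = FromStruct { id: 7, name: "seven".to_string(), extra: true };
    println!("extra = {}", src.extra);
    let dst: into_test::IntoStruct = src.into();
    println!("{} {}", dst.id, dst.name);
}
```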
This logic already works if the module containing the struct is located in the $CARGO_MANIFEST_DIR/src folder.
However, it gets tricky if I want to run this logic in the integration tests folder.
I couldn't find a way to detect the actual entry point for module resolution during the proc-macro stage. The only thing that's exposed is $CARGO_MANIFEST_DIR, and there seems to be no way to detect whether resolution starts at src/main.rs, src/lib.rs or tests/some_test.rs.
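To make the problem concrete, here is a minimal sketch of how a crude resolver might map a path like crate::into_test::IntoStruct to candidate files. It assumes the Rust 2018 layout rules and ignores #[path] attributes and inline modules; candidate_files is a hypothetical helper, not part of any library. The result depends entirely on which file is the crate root, which is exactly the information the macro can't see.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: given the crate-root file (src/lib.rs, src/main.rs,
/// tests/some_test.rs, ...) and a type path such as
/// "crate::into_test::IntoStruct", return the files that could hold the
/// module containing that type.
fn candidate_files(crate_root: &Path, type_path: &str) -> Vec<PathBuf> {
    let mut segments: Vec<&str> = type_path.split("::").collect();
    // Drop a leading `crate` and the trailing type name.
    if segments.first() == Some(&"crate") {
        segments.remove(0);
    }
    segments.pop();

    // A type declared directly in the crate root lives in the root file itself.
    if segments.is_empty() {
        return vec![crate_root.to_path_buf()];
    }

    // Module files are looked up relative to the crate root's directory.
    let mut base = crate_root.parent().unwrap_or(Path::new("")).to_path_buf();
    for seg in &segments {
        base.push(seg);
    }
    // `mod a;` may live in a.rs or a/mod.rs.
    vec![base.with_extension("rs"), base.join("mod.rs")]
}

fn main() {
    for root in ["src/lib.rs", "tests/some_test.rs"] {
        println!("{root}: {:?}", candidate_files(Path::new(root), "crate::into_test::IntoStruct"));
    }
}
```

The same type path yields src/into_test.rs for a library but tests/into_test.rs for an integration test, so without knowing the root the resolver can only guess.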
This gets even more complicated because I'm also trying to test compile-time errors via compiletest-rs.
compiletest-rs (if I understood correctly) creates a temporary directory, copies the file under test to $TEMPDIR/main.rs and calls rustc directly with the exact paths to your project's dependency directories (e.g. '-L target/debug').
Since Cargo isn't involved, the rustc call inherits the $CARGO_MANIFEST_DIR environment variable from the parent process, so it points to the actual crate root instead of $TEMPDIR.
I would really like to properly test the error cases of my crate, but I can't find a way to get the module resolution entry point in the proc-macro stage.
Related
Suppose I have this file hierarchy in a Rust package:
src/...
src/m1/mod.rs
src/m1/path/m2.rs
What would be the practical difference between having the line:
pub mod path::m2;
in my file m1/mod.rs, versus having the line:
pub use path::m2;
I'm trying to refresh my understanding of Rust after a time away, so this isn't my first learning cycle. (Of course, for other readers it may be.) I'm saying this because I'm not asking for a general explanation of the differences between use and mod; my confusion is specific to the two directives above. It seems like they would both make the module in file src/m1/path/m2.rs available to module m1 and to anything else that imports it (because of the pub prefix on both directives). Is that right? Would these be perfect aliases, or are there differences? Is either idiom preferable to the other?
mod foo; is akin to copying and pasting a module into the current scope. That is, if the current scope can find module foo at its own "top level" (basically, if there's a file foo.rs or a folder foo containing mod.rs in the same directory), then mod foo; basically gets transformed into mod foo { /* contents of foo */ }. Note that the syntax for mod requires that the thing after mod be an identifier, not an arbitrary path (so mod path::m2; would be illegal). I can only assume that modules that could be brought into scope aren't automatically brought into scope in order to limit the amount of work the compiler has to do when resolving names.
Meanwhile, once a container of items (whether that be a module, type, trait, etc.) has been made available in the current scope, shortcuts to its items can be created with use path::to::item. If containers of items were ordinary variables, this would be akin to something like let item = path.to.item, if that were legal.
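A small single-file illustration of the distinction, using inline mod blocks as stand-ins for src/m1/mod.rs and src/m1/path/m2.rs (the item names hello and Thing are made up): mod creates the module, while pub use only adds a second name for it, so both paths refer to the very same items.

```rust
mod m1 {
    // Stand-in for `pub mod path;` plus the files under src/m1/path/.
    pub mod path {
        pub mod m2 {
            pub struct Thing(pub u8);
            pub fn hello() -> &'static str {
                "hello from m2"
            }
        }
    }
    // Only a new name: `m1::m2` is now an alias for `m1::path::m2`.
    pub use self::path::m2;
}

fn main() {
    // Same function reached via two paths.
    println!("{}", m1::m2::hello());
    // Same type too: a value built through one path fits the other.
    let t: m1::path::m2::Thing = m1::m2::Thing(1);
    println!("{}", t.0);
}
```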
I'll edit this answer later to give a fuller explanation. But with help from the comments and the other answer posted here, plus some experimentation, I think I've come to a better understanding of these directives. There may be circumstances where one could use either pub use or pub mod, though they'd be contrived, and in any case where both could work one should prefer the former (see the third point below). The key differences are:
1. pub use is followed by a path (a bare identifier m2 would be the same as self::m2). pub mod is only followed by an identifier.
2. pub mod m2; in main.rs or lib.rs would include the contents of the file ./m2.rs (or ./m2/mod.rs). If the line pub mod m2; is instead in a file path/m1.rs (or path/m1/mod.rs), then the included file would be path/m1/m2.rs (or path/m1/m2/mod.rs).
   (You could include a module from another location using mod m2 { include!("path/m2.rs"); } but this isn't idiomatic. There is also the #[path = "..."] attribute, which changes the location the module is loaded from. But generally things work as described in the previous paragraph.)
3. The use directive doesn't request the compilation of any additional files; a mod directive is needed to do that.
   (In fact, the additional files aren't compiled separately but are merged into the source file where the mod directive occurs. Only the files that are crate roots, plus whatever is merged into them, get compiled.)
   If one file in your crate had a pub mod m2; line, then another file could conceivably choose between also using pub mod or pub use, subject to the constraints imposed by point 1 above.
   But in such circumstances you wouldn't want to use the mod directive, as that would merge the relevant code into your source tree a second time. The compiler won't undo that duplication: you would end up with two distinct modules whose types are not interchangeable.
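The lookup rule from point 2 can be written down as a small function. This is a hypothetical helper that assumes the Rust 2018+ rules and ignores #[path] and include!; it only computes candidate paths and never touches the file system.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the two places `mod name;` written in
/// `declaring_file` may load its source from (Rust 2018+ rules,
/// ignoring `#[path]`).
fn mod_file_candidates(declaring_file: &Path, is_crate_root: bool, name: &str) -> [PathBuf; 2] {
    let dir = declaring_file.parent().unwrap_or(Path::new(""));
    let is_mod_rs = declaring_file.file_name().map_or(false, |f| f == "mod.rs");

    // Crate roots (main.rs, lib.rs, tests/*.rs, ...) and mod.rs files look for
    // children next to themselves; any other file, e.g. path/m1.rs, looks in
    // a directory named after itself, e.g. path/m1/.
    let base = if is_crate_root || is_mod_rs {
        dir.to_path_buf()
    } else {
        dir.join(declaring_file.file_stem().unwrap_or_default())
    };

    [base.join(format!("{name}.rs")), base.join(name).join("mod.rs")]
}

fn main() {
    println!("{:?}", mod_file_candidates(Path::new("src/main.rs"), true, "m2"));
    println!("{:?}", mod_file_candidates(Path::new("src/path/m1.rs"), false, "m2"));
}
```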
There's also this statement from the "Separating Modules into Different Files" chapter in the book:
Note that you only need to load the contents of a file using a mod declaration once somewhere in your module tree. Once the compiler knows the file is part of the project (and knows where in the module tree the code resides because of where you’ve put the mod statement), other files in your project should refer to the code in that file using a path to where it was declared...
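What the book's advice guards against can be sketched in one file. Two mod declarations that load the same file (for instance via #[path]) behave like two inline copies of its contents, so the sketch below simply writes the copy out twice; the module names copy_a, copy_b and reexport are hypothetical. The copies define two unrelated types, while a use re-export names the original.

```rust
use std::any::TypeId;

// Two loads of the same source, written out as inline copies.
mod copy_a {
    pub struct Config;
}
mod copy_b {
    pub struct Config;
}

// A re-export, by contrast, just names an existing item.
mod reexport {
    pub use crate::copy_a::Config;
}

fn main() {
    // Different types: a copy_a::Config can't be used where a copy_b::Config is expected.
    println!("{}", TypeId::of::<copy_a::Config>() == TypeId::of::<copy_b::Config>());
    // Same type: the `use` path refers to copy_a::Config itself.
    println!("{}", TypeId::of::<copy_a::Config>() == TypeId::of::<reexport::Config>());
}
```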
For example, say there is a module in an xyz/ subdirectory. Inside the directory are two files, say mod.rs and network.rs.
Why do mod.rs and network.rs have the same function names but different code within the functions? Is there any reason for this? I thought mod.rs was basically just a definitions file to declare a module and specify which other .rs files within the subdirectory should be treated as their own crates / modules.
Any help?
It sounds like you are referring to a design decision made by a specific crate. You are correct in assuming there is no special consideration given by the compiler to function/type/ident names in separate files/modules.
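To illustrate that point with a single-file sketch: identical function names in different modules never clash, because each function is identified by its full path. The inline modules stand in for xyz/mod.rs and xyz/network.rs, and the function name connect is made up.

```rust
mod xyz {
    // Stand-in for xyz/mod.rs (which would normally contain `pub mod network;`).
    pub fn connect() -> &'static str {
        "xyz::connect"
    }

    // Stand-in for xyz/network.rs.
    pub mod network {
        pub fn connect() -> &'static str {
            "xyz::network::connect"
        }
    }
}

fn main() {
    // Same name, different paths, different bodies.
    println!("{}", xyz::connect());
    println!("{}", xyz::network::connect());
}
```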
That being said, it seems likely that what you are referring to is conditional compilation. Conditional compilation lets the compiler decide whether a given piece of code is compiled or not. You will usually see it used to choose which implementation of a function is used when compiling on different operating systems, since checking at runtime is often too inefficient or simply impossible. Some library authors might also decide to add an implementation that the crate can fall back to instead of throwing a hard error.
Here is a quick example of why xyz might want to have 3 different implementations of foobar.
// xyz/mod.rs
// Only compile each platform module on its own platform, so windows.rs
// is never built on unix and vice versa.
#[cfg(windows)]
mod windows;
#[cfg(unix)]
mod unix;
// If this crate is compiled on windows, re-export the contents of windows.rs
#[cfg(windows)]
pub use self::windows::*;
// If this crate is compiled on unix/linux, re-export the contents of unix.rs
#[cfg(unix)]
pub use self::unix::*;
// If not on either windows or unix, provide a default implementation instead
#[cfg(not(any(windows, unix)))]
pub fn foobar() -> i32 {
    panic!("This function is unsupported on the current os")
}
I have some generated .rs code (from gRPC proto files) that is checked in alongside my normal Rust code under src, in some submodules. The issue is that cargo test runs the doc tests, and some of the generated .rs files have comments with indented blocks (which rustdoc treats as code blocks), so the doc tests try to compile them and fail.
For example, cargo test will try to compile (and perhaps run) these lines shown here.
Is there a way to exclude or ignore those generated .rs for doc test (without manually changing them)?
My solution
I've resolved this by injecting code that separates the offending comment and the structure underneath:
fn main() {
tonic_build::configure()
.type_attribute(
".google.api.HttpRule",
"#[cfg(not(doctest))]\n\
#[allow(dead_code)]\n\
pub struct HttpRuleComment{}\n\
/// HACK: see docs in [`HttpRuleComment`] ignored in doctest pass",
)
.compile(
&["proto/api/google/api/http.proto"],
&["proto/"],
)
.unwrap();
}
Explanation:
tonic_build allows adding attributes to specific protobuf paths, but it does not discern between attributes or any other strings.
#[cfg(not(doctest))] excludes the comment's offending code from doc tests by excluding the structure it's attached to; this trick is taken from a discussion on the Rust users forum.
#[allow(dead_code)] silences warnings about injected structure not being used (as it shouldn't be used).
pub struct HttpRuleComment{} creates a public structure that can be referenced in the docstring for the original struct.
/// ...[HttpRuleComment]... is a docstring referencing injected struct, so that the original documentation can still be accessed.
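The mechanism behind #[cfg(not(doctest))] can be seen in isolation. A minimal sketch, assuming a made-up struct Documented whose doc comment contains an indented block (which rustdoc would otherwise collect as a Rust doc test): the doctest cfg is set while rustdoc collects doc tests, so the item and its comment drop out of that pass, while ordinary builds compile the item as usual.

```rust
/// Indented lines below would be collected as a doc test and fail to compile:
///
///     GET /v1/messages/{message_id}
///
/// Because of the `cfg` below, rustdoc doesn't see this comment when
/// collecting doc tests.
#[cfg(not(doctest))]
#[derive(Debug, Default)]
pub struct Documented {
    pub value: u32,
}

fn main() {
    // In a normal build the struct exists as usual.
    let d = Documented::default();
    println!("{:?}", d);
}
```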
Notes:
I'm not sure if your case is the same, but I found myself on this SO page a lot while searching for an answer and decided to share the results.
In my case, the only offending comment was in HttpRule; you may need to write a macro or do lots of copy-paste if you have multiple offending comments.
I would like to generate multiple binaries that share a lot of common code. If I write everything in src/main.rs, I can simply mark items as pub(crate) and access the code without exporting it. However, if I put the binary in src/bin/foo.rs, I cannot find a way to access this code without marking everything pub. I would rather not mark everything pub, not only because I don't want others to depend on it but also because it renders visibility checking ineffective.
The only workaround I have found is to put the file inside the src directory, then put a simple shim in bin/foo-bar.rs that just calls my_crate::bin_foo_bar::main(). This isn't very tidy and requires a bunch of overhead.
Inside your package you may define a single lib crate and multiple binary crates. If you declare a type inside your library crate as pub(crate), it will not be visible from your binary crates, so you will need to revise those definitions. A package is not a crate; it is a package of crates, and pub(crate) items are only visible inside the crate they belong to.
If I have multiple .rs files in the src directory of a Cargo package, what are the rules for visibility, importing, etc.?
Currently, any extra files (i.e. anything other than the file explicitly identified in Cargo.toml as the source for the executable) are ignored.
What do I need to do to fix this?
There is nothing special about Cargo at all in this way. It’s all the perfectly normal Rust module system. If Cargo will be compiling src/lib.rs, that’s more or less equivalent to having executed rustc --crate-type lib src/lib.rs (there are more command line arguments in practice, but that’s the basics of it).
Other files are then used with mod, use and so forth. Files are not automatically imported or anything like that. This part is not documented very clearly yet; a couple of pages that briefly show how to do this are http://rustbyexample.com/mod/split.html and http://doc.rust-lang.org/reference.html#modules, but any non-trivial code base uses them, so you can pick just about any code base to look at for examples.
It's hard to say what you're getting tripped up on from the info you shared. Here are three seemingly trivial things that I still had to look up in the documentation:
First of all,
mod foo;
looks like a declaration, but without a body it is actually something like an include. So the same keyword is used both for declaring and for including modules, i.e. there is no separate include keyword.
Second, modules themselves can be public or private. If you didn't add a pub keyword both on the function in question AND on the containing module, that may be tripping you up.
pub mod foo { pub fn bar() {} }
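A runnable sketch of the second point, with hypothetical modules outer, foo and hidden: an item is reachable from outside only if every module on its path is visible too.

```rust
mod outer {
    // `pub` on both the module and the function: reachable from the crate root.
    pub mod foo {
        pub fn bar() -> u32 {
            42
        }
    }

    // Private module: `baz` is `pub`, but `outer::hidden::baz()` would not
    // compile from the crate root because `hidden` itself is private.
    mod hidden {
        pub fn baz() -> u32 {
            7
        }
    }

    // Code inside `outer` can still call it and expose the result.
    pub fn call_hidden() -> u32 {
        hidden::baz()
    }
}

fn main() {
    println!("{} {}", outer::foo::bar(), outer::call_hidden());
}
```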
Third, there seems to be an implicit module added at the top of every file. This is confusing; the reference manual talks about a strict separation between file paths and names, and the module paths in your code, but that abstraction seems to be leaky here.
Note: Rust is still pre-1.0 (0.12) at the time of writing, and the module system and file paths are relatively high level, so don't be surprised if what I said is already wrong by the time you read this.
Files are included from your Rust code itself, not from Cargo.toml.
For instance, if a file src/foo.rs, pointed to by the path key in a [lib] or [[bin]] section of your Cargo.toml, contains:
mod bar;
it tells the compiler to build src/bar.rs too and include it as the module bar.