Calling a library function with rust-abi - rust

I have a couple of crates, X and Y. The first one manages low-level stuff and defines an entry point. It should then transfer control to Y, which does the high-level logic. I would like to explicitly declare Y's function in X and call it (so that the linker connects the two while building the Y crate).
I know Rust's ABI is not stable, but I use both of these crates in one workspace with the same compiler.
I tried with the following example, but the linker cannot find real_main (undefined reference ...):
// X's main.rs
#[link(name = "y_lib_name", kind = "static")]
extern "Rust" {
fn real_main();
}
fn main() {
// is it possible to avoid unsafe here, btw?
unsafe { real_main() }
}
// Y's lib.rs
pub fn real_main() { .. }
Though the symbol exists:
$ nm target/debug/deps/liby_lib_name-8ade2b95fe4044c6.rlib | rg real_main
nm: lib.rmeta: file format not recognized
0000000000000000 T _ZN8y_lib_name9real_main17h8664760a5d2676beE
I guess my issue can be resolved via macro in X like the following:
// in X
#[macro_export]
macro_rules! define_ep {
($fn:expr) => { fn main() { $fn(); } };
}
// in Y
fn f() {}
X::define_ep!(f);
but I would prefer the linker-backed way because:
if you forget to define the function, the linker just gives you an error;
(not sure) the linker gives an error if the symbol appears more than once.

How does Rust retrieve the input argc and argv values from a running program?

I know that the entry point of a Rust application is generated dynamically by rustc. I inspected the code in compiler/rustc_codegen_ssa/src/base.rs, part of which is shown below.
fn create_entry_fn<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
cx: &'a Bx::CodegenCx,
rust_main: Bx::Value,
rust_main_def_id: DefId,
use_start_lang_item: bool,
) -> Bx::Function {
// The entry function is either `int main(void)` or `int main(int argc, char **argv)`,
// depending on whether the target needs `argc` and `argv` to be passed in.
let llfty = if cx.sess().target.main_needs_argc_argv {
cx.type_func(&[cx.type_int(), cx.type_ptr_to(cx.type_i8p())], cx.type_int())
} else {
cx.type_func(&[], cx.type_int())
};
In the same file I found something really interesting, shown below. From the comment, we can understand that Rust collects the incoming argc and argv here, and that both parameters are later passed to the lang_start function, if I understand correctly.
/// Obtain the `argc` and `argv` values to pass to the rust start function.
fn get_argc_argv<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
cx: &'a Bx::CodegenCx,
bx: &mut Bx,
) -> (Bx::Value, Bx::Value) {
if cx.sess().target.main_needs_argc_argv {
// Params from native `main()` used as args for rust start function
let param_argc = bx.get_param(0);
let param_argv = bx.get_param(1);
let arg_argc = bx.intcast(param_argc, cx.type_isize(), true);
let arg_argv = param_argv;
(arg_argc, arg_argv)
} else {
// The Rust start function doesn't need `argc` and `argv`, so just pass zeros.
let arg_argc = bx.const_int(cx.type_int(), 0);
let arg_argv = bx.const_null(cx.type_ptr_to(cx.type_i8p()));
(arg_argc, arg_argv)
}
}
But I also found another place that seems to do the same thing, in library/std/src/sys/unix/args.rs. For example, if you run a Rust app on macOS, it seems Rust uses two FFI functions (_NSGetArgc / _NSGetArgv) to retrieve argc and argv:
#[cfg(any(target_os = "macos", target_os = "ios"))]
mod imp {
use super::Args;
use crate::ffi::CStr;
pub unsafe fn init(_argc: isize, _argv: *const *const u8) {}
pub fn cleanup() {}
#[cfg(target_os = "macos")]
pub fn args() -> Args {
use crate::os::unix::prelude::*;
extern "C" {
// These functions are in crt_externs.h.
fn _NSGetArgc() -> *mut libc::c_int;
fn _NSGetArgv() -> *mut *mut *mut libc::c_char;
}
let vec = unsafe {
let (argc, argv) =
(*_NSGetArgc() as isize, *_NSGetArgv() as *const *const libc::c_char);
(0..argc as isize)
.map(|i| {
let bytes = CStr::from_ptr(*argv.offset(i)).to_bytes().to_vec();
OsStringExt::from_vec(bytes)
})
.collect::<Vec<_>>()
};
Args { iter: vec.into_iter() }
}
So, what's the difference between these two places? Which place actually does the real retrieval stuff?
To reply directly to the question "Which place actually does the real retrieval stuff?": it depends on:
The target OS: Linux, macOS, Windows, WebAssembly
The target "environment" (e.g. libc): glibc, musl, wasi, even miri in Rust's case
The arguments are either passed to the program entry point or provided "globally" through functions/syscalls:
In the first case (passed as arguments), the Rust compiler generates code that initializes two static values, ARGC and ARGV (located at std/src/sys/unix/args.rs#L87), which std::env::args() then reads on the developer's behalf.
Note that, depending on the libc used, this phase is done either at _start and/or by some ld+libc-specific routine (it gets messy when taking dynamic linking into account).
In the case of glibc it's done by the GNU non-standard "init_array" extension (which is notably used for "cdylib" crates/.so executables): std/src/sys/unix/args.rs#L108-L128
Also, if you specify the entry point directly using the #[start] attribute, you get direct access to the argc/argv values (compiler/rustc_codegen_ssa/src/base.rs#L447).
In the second case, no initialization code is needed, and the args-getter functions are called by std::env::args() when needed, as you already noticed on macOS.
Some OSes, such as macOS (and apparently Windows), use both methods: they provide argc/argv as arguments to _start and also through getter functions callable from anywhere, and Rust uses the latter.
Linux uses the first case only. It wouldn't be surprising if glibc provided some functions to get these values (by some wibbly wobbly magic methods), but the standard way is the first one.
For further reading, you can look at some links and articles about the "program loader" on Linux (sadly, there's not much on the subject in general, especially for other OSes):
LWN article "How programs get run: ELF binaries": https://lwn.net/Articles/631631/ (especially the "Populating the stack" part)
"The start attribute" section in one article of the "Rust OS dev" series: https://os.phil-opp.com/freestanding-rust-binary/#the-start-attribute
Reply to a (too broad, closed) Stack Overflow question about program loading and running: https://stackoverflow.com/a/32689330/1498917
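The first case described above (values saved at start-up, read back later) can be pictured with a small sketch. This is not the actual std implementation: the statics reuse the names ARGC and ARGV from the answer, but the functions init, args and demo are hypothetical stand-ins, and the argument strings are built by hand instead of coming from the OS.

```rust
use std::ffi::{CStr, CString, OsString};
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicIsize, AtomicPtr, Ordering};

// Simplified stand-ins for the statics std keeps on Unix-like targets
// (assumption: the real ones live in std/src/sys/unix/args.rs).
static ARGC: AtomicIsize = AtomicIsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(ptr::null_mut());

/// Hypothetical `init`: remembers what the entry point received.
///
/// # Safety
/// `argv` must be null or point to `argc` valid, NUL-terminated strings
/// that stay alive for as long as `args()` may be called.
unsafe fn init(argc: isize, argv: *const *const c_char) {
    ARGC.store(argc, Ordering::Relaxed);
    ARGV.store(argv as *mut *const c_char, Ordering::Relaxed);
}

/// Hypothetical `args`: rebuilds owned strings from the saved pointers.
fn args() -> Vec<OsString> {
    let argc = ARGC.load(Ordering::Relaxed);
    let argv = ARGV.load(Ordering::Relaxed);
    if argv.is_null() {
        return Vec::new();
    }
    (0..argc)
        .map(|i| {
            // SAFETY: guaranteed by the contract of `init`.
            let arg = unsafe { CStr::from_ptr(*argv.offset(i)) };
            OsString::from(arg.to_string_lossy().into_owned())
        })
        .collect()
}

/// Simulates a process that was started as `prog --verbose`.
fn demo() -> Vec<OsString> {
    let owned = vec![
        CString::new("prog").unwrap(),
        CString::new("--verbose").unwrap(),
    ];
    let ptrs: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    // SAFETY: `owned` and `ptrs` are alive during the call to `args()`.
    unsafe { init(ptrs.len() as isize, ptrs.as_ptr()) };
    let result = args();
    // Clear the statics so that no dangling pointers are left behind.
    unsafe { init(0, ptr::null()) };
    result
}

fn main() {
    println!("{:?}", demo());
}
```

In the second case (getter functions such as _NSGetArgc / _NSGetArgv), the init step would simply be empty and args would ask the OS for the pointers each time it is called.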

Rust ffi include dynamic library in cross platform fashion

I would like to include a dynamic C library in Rust with FFI.
The library is actually also built with Rust, but exposes a C interface so that it can be used from other languages, too. When I build the library (type: cdylib) with cargo, I get a .dylib on macOS and a .dll as well as a .dll.lib file on Windows. These libraries also get different names, derived from the project name (libmy_lib.dylib on macOS, and my_lib.dll as well as my_lib.dll.lib on Windows).
I would like to reference these files in a cross-platform way, because currently I have to use
#[link(name = "my_lib.dll", kind = "dylib")]
on Windows, whereas on macOS I need to use
#[link(name = "my_lib", kind = "dylib")]
I have already tried to rename the my_lib.dll.lib to my_lib.lib, but I still get a Linker Error, saying
LINK : fatal error LNK1181: cannot open input file 'my_lib.lib'
How can I reference the files so that I can use my code on both Mac and Windows? If that's only possible with cfg_attr tags, I would also accept that. Ideally, I would also like to get rid of the .lib file on Windows if possible.
You could use the libloading crate.
Example:
let lib = unsafe {
#[cfg(unix)]
let path = "mylib.so";
#[cfg(windows)]
let path = "mylib.dll";
libloading::Library::new(path).expect("Failed to load library")
};
let func: libloading::Symbol<unsafe extern fn() -> u32> = unsafe {
lib.get(b"my_func").expect("Failed to load function `my_func`")
};
// Can call func later while `lib` in scope
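Note that the hard-coded "mylib.so" above would not match the .dylib file produced on macOS. As a hedged alternative, the file name can be assembled from the constants in std::env::consts, which encode the platform's prefix and suffix for dynamic libraries. The helper name dylib_file_name is hypothetical, and "my_lib" is the library name from the question.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

/// Builds the platform-specific file name of a dynamic library,
/// e.g. "libmy_lib.so" on Linux, "libmy_lib.dylib" on macOS and
/// "my_lib.dll" on Windows.
fn dylib_file_name(stem: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, stem, DLL_SUFFIX)
}

fn main() {
    // "my_lib" is the library name used in the question.
    println!("{}", dylib_file_name("my_lib"));
}
```

The resulting string could be passed to libloading::Library::new instead of the two cfg-guarded path literals.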
For what it's worth, I found a temporary solution for this now.
I used this pattern:
#[cfg(windows)]
#[link(name = "my_lib.dll", kind = "dylib")]
extern {
// Reference the exported functions
}
#[cfg(unix)]
#[link(name = "my_lib", kind = "dylib")]
extern {
// Reference the exported functions
}
I don't like it that much, because I had to define the very same extern{} block twice, but it works and I could also extend this pattern to for example use #[cfg(target_os = "macos")] if needed...
EDIT: Thanks to @Angelicos Phosphoros, I improved the code a bit by using a macro like so:
/// Import native functions with the Rust FFI
macro_rules! import_native_functions {
() => {
// Reference the exported functions
};
}
#[cfg(windows)]
#[link(name = "my_lib.dll", kind = "dylib")]
extern {
import_native_functions!();
}
#[cfg(unix)]
#[link(name = "my_lib", kind = "dylib")]
extern {
import_native_functions!();
}

Can I create my own conditional compilation attributes?

There are several ways of doing something in my crate, some result in fast execution, some in low binary size, some have other advantages, so I provide the user interfaces to all of them. Unused functions will be optimized away by the compiler. Internal functions in my crate have to use these interfaces as well, and I would like them to respect the user choice at compile time.
There are conditional compilation attributes like target_os, which store a value like linux or windows. How can I create such an attribute, for example prefer_method, so I and the user can use it somewhat like in the following code snippets?
My crate:
#[cfg(not(any(
not(prefer_method),
prefer_method = "fast",
prefer_method = "small"
)))]
compile_error!("invalid `prefer_method` value");
pub fn bla() {
#[cfg(prefer_method = "fast")]
foo_fast();
#[cfg(prefer_method = "small")]
foo_small();
#[cfg(not(prefer_method))]
foo_default();
}
pub fn foo_fast() {
// Fast execution.
}
pub fn foo_small() {
// Small binary file.
}
pub fn foo_default() {
// Medium size, medium fast.
}
The user crate:
#[prefer_method = "small"]
extern crate my_crate;
fn f() {
// Uses the `foo_small` function, the other `foo_*` functions will not end up in the binary.
my_crate::bla();
// But the user can also call any function, which of course will also end up in the binary.
my_crate::foo_default();
}
I know there are --cfg attributes, but AFAIK these only represent boolean flags, not enumeration values, which allow setting multiple flags when only one enumeration value is valid.
Firstly, the --cfg flag supports key-value pairs using the syntax --cfg 'prefer_method="fast"'. This will allow you to write code like:
#[cfg(prefer_method = "fast")]
fn foo_fast() { }
You can also set these cfg options from a build script. For example:
// build.rs
fn main() {
println!("cargo:rustc-cfg=prefer_method=\"method_a\"");
}
// src/main.rs
#[cfg(prefer_method = "method_a")]
fn main() {
println!("It's A");
}
#[cfg(prefer_method = "method_b")]
fn main() {
println!("It's B");
}
#[cfg(not(any(prefer_method = "method_a", prefer_method = "method_b")))]
fn main() {
println!("No preferred method");
}
The above code will result in an executable that prints "It's A".
There's no syntax like the one you suggest for specifying cfg settings. The best way to expose these options to your crate's users is through Cargo features.
For example:
# Library Cargo.toml
# ...
[features]
method_a = []
method_b = []
// build.rs
fn main() {
// prefer method A if both method A and B are selected
if cfg!(feature = "method_a") {
println!("cargo:rustc-cfg=prefer_method=\"method_a\"");
} else if cfg!(feature = "method_b") {
println!("cargo:rustc-cfg=prefer_method=\"method_b\"");
}
}
# User Cargo.toml
# ...
[dependencies.my_crate]
version = "..."
features = ["method_a"]
However, in this case, I'd recommend just using the Cargo features directly in your code (i.e. #[cfg(feature = "fast")]) rather than adding the build script since there's a one-to-one correspondence between the cargo feature and the rustc-cfg being added.
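A minimal sketch of that last recommendation, using the function names from the question. The feature names "fast" and "small" are assumptions; they would have to be declared under [features] in the library's Cargo.toml. The functions return a string here only so that the selected branch can be observed.

```rust
pub fn foo_fast() -> &'static str {
    // Fast execution.
    "fast"
}

pub fn foo_small() -> &'static str {
    // Small binary file.
    "small"
}

pub fn foo_default() -> &'static str {
    // Medium size, medium fast.
    "default"
}

/// Dispatches on the Cargo features enabled for this crate.
/// `cfg!` expands to a constant `true` or `false`, so the untaken
/// branches are type-checked but can be optimized away.
pub fn bla() -> &'static str {
    if cfg!(feature = "fast") {
        foo_fast()
    } else if cfg!(feature = "small") {
        foo_small()
    } else {
        foo_default()
    }
}

fn main() {
    // Compiled without any feature, this takes the default branch.
    println!("{}", bla());
}
```

As in the build script above, "fast" wins if both features are enabled, which matters because Cargo features are additive and several dependents may each request a different one.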

How to call C function in Rust

I'm a C programmer and I'm trying to call a Rust function from my application; the Rust function also needs to call C functions that exist in my application.
I know that if I want to call a C function from Rust, I have to do it like this:
#[link(name = "mylib")]
extern "C" {
pub fn c_function();
}
But c_function doesn't exist in any library; it exists only in my application.
For example:
My C code is
void rust_function(void); /* implemented in the Rust library */
int c_function(void)
{
return 1;
}
int main(void)
{
rust_function();
return 0;
}
My Rust code is (cargo new --lib myrustlib):
#[no_mangle]
pub unsafe extern "C" fn rust_function() {
// If I want to call c_function, which lives in the C world, from here, how can I do this?
// I have tried using extern "C" { pub fn c_function() -> i32; } but it failed
// with the error "undefined reference to `c_function'".
}
You're on the right track. You can auto-generate a C header from the Rust code with cbindgen and, in the other direction, Rust bindings from C headers with bindgen.
Add crate-type = ["lib", "staticlib", "cdylib"] to Cargo.toml to generate .a and .so/.dylib/.dll versions of the Rust library that you can link with the C program.
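If resolving c_function at link time stays troublesome, a different technique (not the one the question attempts) is to have the C application hand the function to Rust as a callback, so that the Rust library never has to name the symbol. The sketch below assumes c_function returns an int. fake_c_function is a hypothetical stand-in for the C side so that the example runs without a C compiler, and the callback parameter is an addition to the question's rust_function signature.

```rust
use std::os::raw::c_int;

/// Rust function that the C application would call, passing a pointer
/// to its own `c_function`. To make it visible to C, it would also need
/// the `#[no_mangle]` attribute (left out here to keep the sketch
/// independent of the edition-specific spelling of that attribute).
pub extern "C" fn rust_function(callback: extern "C" fn() -> c_int) -> c_int {
    // Call back into the "C world" and forward its result.
    callback()
}

/// Stand-in for the C function `int c_function(void) { return 1; }`.
extern "C" fn fake_c_function() -> c_int {
    1
}

fn main() {
    println!("{}", rust_function(fake_c_function));
}
```

On the C side, the matching declaration would be `int rust_function(int (*callback)(void));`, and the call would become `rust_function(c_function);`.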

How do I get the return address of a function?

I am writing a Rust library containing an implementation of the callbacks for LLVM SanitizerCoverage. These callbacks can be used to trace the execution of an instrumented program.
A common way to produce a trace is to print the address of each executed basic block. However, in order to do that, it is necessary to retrieve the address of the call instruction that invoked the callback. The C++ examples provided by LLVM rely on the compiler intrinsic __builtin_return_address(0) in order to obtain this information.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
if (!*guard) return;
void *PC = __builtin_return_address(0);
printf("guard: %p %x PC %p\n", guard, *guard, PC);
}
I am trying to reproduce the same function in Rust but, apparently, there is no equivalent to __builtin_return_address. The only reference I found is from an old version of Rust, but the function described is not available anymore. The function is the following:
pub unsafe extern "rust-intrinsic" fn return_address() -> *const u8
My current hacky solution involves having a C file in my crate that contains the following function:
void* get_return_address() {
return __builtin_return_address(1);
}
If I call it from a Rust function, I am able to obtain the return address of the Rust function itself. This solution, however, requires the compilation of my Rust code with -C force-frame-pointers=yes for it to work, since the C compiler intrinsic relies on the presence of frame pointers.
Concluding, is there a more straightforward way of getting the return address of the current function in Rust?
edit: The removal of the return_address intrinsic is discussed in this GitHub issue.
edit 2: Further testing showed that the backtrace crate is able to correctly extract the return address of the current function, thus avoiding the hack I described before. Credit goes to this tweet.
The problem with this solution is the overhead that is generated creating a full backtrace when only the return address of the current function is needed. In addition, the crate is using C libraries to extract the backtrace; this looks like something that should be done in pure Rust.
edit 3: The compiler intrinsic __builtin_return_address(0) generates a call to the LLVM intrinsic llvm.returnaddress. The corresponding documentation can be found here.
I could not find any official documentation about this, but found out by asking in the rust-lang repository: You can link against LLVM intrinsics, like llvm.returnaddress, with only a few lines of code:
extern {
#[link_name = "llvm.returnaddress"]
fn return_address() -> *const u8;
}
fn foo() {
println!("I was called by {:X}", return_address());
}
The LLVM intrinsic llvm.addressofreturnaddress might also be interesting.
As of 2022 Maurice's answer doesn't work as-is and requires an additional argument.
#![feature(link_llvm_intrinsics)]
extern {
#[link_name = "llvm.returnaddress"]
fn return_address(a: i32) -> *const u8;
}
macro_rules! caller_address {
() => {
unsafe { return_address(0) }
};
}
fn print_caller() {
println!("caller: {:p}", caller_address!());
}
fn main() {
println!("main address: {:p}", main as *const ());
print_caller();
}
Output:
main address: 0x56261a13bb50
caller: 0x56261a13bbaf