For each of the following thread-local storage implementations, how can an external thread-local variable be accessed in Rust programs using the standard ffi mechanisms as exposed by the compiler or standard library?
C11
gcc's tls extension
pthreads
Windows TLS API
Rust has a nightly feature that allows linking to external thread-local variables. Stabilization of the feature is tracked here.
C11 / GCC TLS extension
C11 defines the _Thread_local keyword to give an object thread storage duration; a thread_local macro is also provided as an alias.
GCC likewise implements a thread-local storage extension that uses __thread as the keyword.
Linking to an external variable declared with either C11 _Thread_local or gcc's __thread is possible on nightly (tested with rustc 1.17.0-nightly (0e7727795 2017-02-19) and gcc 5.4):
#![feature(thread_local)]
extern crate libc;

use libc::c_int;

#[link(name = "test", kind = "static")]
extern {
    #[thread_local]
    static mut test_global: c_int;
}

fn main() {
    let mut threads = vec![];
    for _ in 0..5 {
        let thread = std::thread::spawn(|| {
            unsafe {
                test_global += 1;
                println!("{}", test_global);
                test_global += 1;
            }
        });
        threads.push(thread);
    }
    for thread in threads {
        thread.join().unwrap();
    }
}
This gives access to a variable declared as either of the following:
_Thread_local extern int test_global;
extern __thread int test_global;
The output of the above Rust code will be:
1
1
1
1
1
which is expected when the variable is defined as thread-local.
Related
I know that the Rust application's initialization entry point is dynamically generated by rustc. I inspected the code in compiler/rustc_codegen_ssa/src/base.rs, part of which is shown below.
fn create_entry_fn<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    rust_main: Bx::Value,
    rust_main_def_id: DefId,
    use_start_lang_item: bool,
) -> Bx::Function {
    // The entry function is either `int main(void)` or `int main(int argc, char **argv)`,
    // depending on whether the target needs `argc` and `argv` to be passed in.
    let llfty = if cx.sess().target.main_needs_argc_argv {
        cx.type_func(&[cx.type_int(), cx.type_ptr_to(cx.type_i8p())], cx.type_int())
    } else {
        cx.type_func(&[], cx.type_int())
    };
What I found in the same file was also interesting, shown below. From the comment, we can see that Rust collects argc and argv at this point, and, if I understand correctly, both parameters are later passed into the lang_start function.
/// Obtain the `argc` and `argv` values to pass to the rust start function.
fn get_argc_argv<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    bx: &mut Bx,
) -> (Bx::Value, Bx::Value) {
    if cx.sess().target.main_needs_argc_argv {
        // Params from native `main()` used as args for rust start function
        let param_argc = bx.get_param(0);
        let param_argv = bx.get_param(1);
        let arg_argc = bx.intcast(param_argc, cx.type_isize(), true);
        let arg_argv = param_argv;
        (arg_argc, arg_argv)
    } else {
        // The Rust start function doesn't need `argc` and `argv`, so just pass zeros.
        let arg_argc = bx.const_int(cx.type_int(), 0);
        let arg_argv = bx.const_null(cx.type_ptr_to(cx.type_i8p()));
        (arg_argc, arg_argv)
    }
}
But I also found another place that seems to do the same thing, at library/std/src/sys/unix/args.rs. For example, if you run a Rust app on macOS, Rust appears to use two FFI functions (_NSGetArgc / _NSGetArgv) to retrieve argc and argv:
#[cfg(any(target_os = "macos", target_os = "ios"))]
mod imp {
    use super::Args;
    use crate::ffi::CStr;

    pub unsafe fn init(_argc: isize, _argv: *const *const u8) {}

    pub fn cleanup() {}

    #[cfg(target_os = "macos")]
    pub fn args() -> Args {
        use crate::os::unix::prelude::*;
        extern "C" {
            // These functions are in crt_externs.h.
            fn _NSGetArgc() -> *mut libc::c_int;
            fn _NSGetArgv() -> *mut *mut *mut libc::c_char;
        }
        let vec = unsafe {
            let (argc, argv) =
                (*_NSGetArgc() as isize, *_NSGetArgv() as *const *const libc::c_char);
            (0..argc as isize)
                .map(|i| {
                    let bytes = CStr::from_ptr(*argv.offset(i)).to_bytes().to_vec();
                    OsStringExt::from_vec(bytes)
                })
                .collect::<Vec<_>>()
        };
        Args { iter: vec.into_iter() }
    }
    // ...
}
So, what's the difference between these two places? Which place actually does the real retrieval stuff?
To reply directly to the question "Which place actually does the real retrieval stuff?": well, it depends on:
The target OS: Linux, macOS, Windows, WebAssembly
The target "environment" (e.g. the libc): glibc, musl, wasi, even miri in Rust's case
They are basically either passed as arguments to the program entry point or provided "globally" via functions/syscalls:
In the first case (passed as arguments), the Rust compiler generates code that initializes two static values, ARGC and ARGV (located at std/src/sys/unix/args.rs#L87), which are then used by std::env::args().
Note that, depending on the libc used, this phase is done either at _start and/or by some ld+libc-specific routine (it gets messy once dynamic linking is taken into account).
In the case of glibc, it's done by the GNU non-standard "init_array" extension (which is notably used for "cdylib" crates/.so executables): std/src/sys/unix/args.rs#L108-L128
Also, if you specify the entry point directly using the #[start] attribute, you get direct access to the argc/argv values (compiler/rustc_codegen_ssa/src/base.rs#L447).
In the second case, no initialization code is needed and the args-getter functions are called by std::env::args() when needed, as you already noticed on macOS.
macOS (and apparently Windows) uses both methods, providing argc/argv both as arguments to _start and via getter functions callable from anywhere; Rust uses the latter.
Linux only uses the first method. It wouldn't be surprising if glibc provided some functions to get these values (by some wibbly-wobbly magic), but the standard way is the first one.
For further reading, you can look at some links and articles about the "program loader" on Linux (sadly, there's not much on the subject in general, especially for other OSes):
LWN article "How programs get run: ELF binaries": https://lwn.net/Articles/631631/ (especially the "Populating the stack" part)
"The start attribute" section in one article of the "Rust OS dev" series: https://os.phil-opp.com/freestanding-rust-binary/#the-start-attribute
Reply to a (too broad, closed) Stack Overflow question about program loading and running: https://stackoverflow.com/a/32689330/1498917
I would like to include a dynamic C library in Rust with FFI.
The library itself is also built with Rust, but exposes a C interface so that it can be used from other languages too. When I build the library (crate type: cdylib) with cargo, I get a .dylib on macOS, and a .dll plus a .dll.lib file on Windows. The libraries also get different names, derived from the project name (libmy_lib.dylib on macOS; my_lib.dll and my_lib.dll.lib on Windows).
I would like to reference these files in a cross-platform way. Because currently I have to use
#[link(name = "my_lib.dll", kind = "dylib")]
on Windows, whereas on macOS I need to use
#[link(name = "my_lib", kind = "dylib")]
I have already tried renaming my_lib.dll.lib to my_lib.lib, but I still get a linker error saying
LINK : fatal error LNK1181: cannot open input file 'my_lib.lib'
How can I reference the files so that I can use my code on Mac and Windows? If that's only possible with cfg_attr attributes, I would accept that too. Ideally I would also like to get rid of the .lib file on Windows, if possible.
You could use the libloading crate.
Example:
let lib = unsafe {
    #[cfg(unix)]
    let path = "mylib.so";
    #[cfg(windows)]
    let path = "mylib.dll";
    libloading::Library::new(path).expect("Failed to load library")
};
let func: libloading::Symbol<unsafe extern fn() -> u32> = unsafe {
    lib.get(b"my_func").expect("Failed to load function `my_func`")
};
// `func` can be called later, as long as `lib` is in scope
For what it's worth, I found a temporary solution for this now.
I used this pattern:
#[cfg(windows)]
#[link(name = "my_lib.dll", kind = "dylib")]
extern {
    // Reference the exported functions
}

#[cfg(unix)]
#[link(name = "my_lib", kind = "dylib")]
extern {
    // Reference the exported functions
}
I don't like it that much, because I had to define the very same extern {} block twice, but it works, and I could also extend the pattern to use, for example, #[cfg(target_os = "macos")] if needed.
EDIT: Thanks to @Angelicos Phosphoros, I improved the code a bit by using a macro:
/// Import native functions with the Rust FFI
macro_rules! import_native_functions {
    () => {
        // Reference the exported functions
    };
}

#[cfg(windows)]
#[link(name = "my_lib.dll", kind = "dylib")]
extern {
    import_native_functions!();
}

#[cfg(unix)]
#[link(name = "my_lib", kind = "dylib")]
extern {
    import_native_functions!();
}
I'm a C programmer trying to call a Rust function from my application, and the Rust function in turn needs to call C functions that exist in my application.
I know that to call a C function from a library in Rust, I have to do this:
#[link(name = "mylib")]
extern "C" {
pub fn c_function();
}
But c_function doesn't live in any library; it only exists in my application itself.
For example:
My C code is
/* Declaration of the Rust function exported by myrustlib */
void rust_function(void);

int c_function(void)
{
    return 1;
}

int main(void)
{
    rust_function();
    return 0;
}
My Rust code is (cargo new --lib myrustlib):
pub unsafe extern "C" fn rust_function() {
    // If I want to call c_function, which is in the C world, how can I do that?
    // I have tried extern "C" { pub fn c_function(); } but it failed
    // with the error "undefined reference to `c_function'".
}
You're on the right track. You can auto-generate a C header from the Rust program with cbindgen, and, in the other direction, Rust bindings with bindgen.
Add crate-type = ["lib", "staticlib", "cdylib"] to Cargo.toml to generate .a and .so/.dylib/.dll versions of the Rust library that you can link with the C program.
I am writing a Rust library containing an implementation of the callbacks for LLVM SanitizerCoverage. These callbacks can be used to trace the execution of an instrumented program.
A common way to produce a trace is to print the address of each executed basic block. However, in order to do that, it is necessary to retrieve the address of the call instruction that invoked the callback. The C++ examples provided by LLVM rely on the compiler intrinsic __builtin_return_address(0) in order to obtain this information.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;
    void *PC = __builtin_return_address(0);
    printf("guard: %p %x PC %p\n", guard, *guard, PC);
}
I am trying to reproduce the same function in Rust but, apparently, there is no equivalent to __builtin_return_address. The only reference I found is from an old version of Rust, but the function described is no longer available. It was:
pub unsafe extern "rust-intrinsic" fn return_address() -> *const u8
My current hacky solution involves having a C file in my crate that contains the following function:
void* get_return_address() {
    return __builtin_return_address(1);
}
If I call it from a Rust function, I am able to obtain the return address of the Rust function itself. This solution, however, requires the compilation of my Rust code with -C force-frame-pointers=yes for it to work, since the C compiler intrinsic relies on the presence of frame pointers.
In conclusion: is there a more straightforward way of getting the return address of the current function in Rust?
edit: The removal of the return_address intrinsic is discussed in this GitHub issue.
edit 2: Further testing showed that the backtrace crate is able to correctly extract the return address of the current function, thus avoiding the hack I described before. Credit goes to this tweet.
The problem with this solution is the overhead of creating a full backtrace when only the return address of the current function is needed. In addition, the crate uses C libraries to extract the backtrace; this seems like something that should be done in pure Rust.
edit 3: The compiler intrinsic __builtin_return_address(0) generates a call to the LLVM intrinsic llvm.returnaddress. The corresponding documentation can be found here.
I could not find any official documentation about this, but found out by asking in the rust-lang repository: You can link against LLVM intrinsics, like llvm.returnaddress, with only a few lines of code:
extern {
    #[link_name = "llvm.returnaddress"]
    fn return_address() -> *const u8;
}

fn foo() {
    println!("I was called by {:X}", return_address());
}
The LLVM intrinsic llvm.addressofreturnaddress might also be interesting.
As of 2022, Maurice's answer doesn't work as-is and requires an additional argument:
#![feature(link_llvm_intrinsics)]

extern {
    #[link_name = "llvm.returnaddress"]
    fn return_address(a: i32) -> *const u8;
}

macro_rules! caller_address {
    () => {
        unsafe { return_address(0) }
    };
}

fn print_caller() {
    println!("caller: {:p}", caller_address!());
}

fn main() {
    println!("main address: {:p}", main as *const ());
    print_caller();
}
Output:
main address: 0x56261a13bb50
caller: 0x56261a13bbaf
Playground link.
Are there any general rules, design documentation or something similar that explains how the Rust standard library deals with threads that were not spawned by std::thread?
I have a cdylib crate and want to use it from another language in a threaded manner:
use std::mem;
use std::sync::{Arc, Mutex};
use std::thread;

type jlong = usize;
type SharedData = Arc<Mutex<u32>>;

struct Foo {
    data: SharedData,
}

#[no_mangle]
pub fn Java_com_example_Foo_init(shared_data: &SharedData) -> jlong {
    let this = Box::into_raw(Box::new(Foo { data: shared_data.clone() }));
    this as jlong
}

#[cfg(target_pointer_width = "32")]
unsafe fn jlong_to_pointer<T>(val: jlong) -> *mut T {
    mem::transmute::<u32, *mut T>(val as u32)
}

#[cfg(target_pointer_width = "64")]
unsafe fn jlong_to_pointer<T>(val: jlong) -> *mut T {
    mem::transmute::<jlong, *mut T>(val)
}

#[no_mangle]
pub fn Java_com_example_Foo_f(this: jlong) {
    let this = unsafe { jlong_to_pointer::<Foo>(this).as_mut().unwrap() };
    let data = this.data.clone();
    let mut data = data.lock().unwrap();
    *data = *data + 5;
}
specifically in
let shared_data = Arc::new(Mutex::new(5));
let foo = Java_com_example_Foo_init(&shared_data);
is it safe to modify shared_data from a thread spawned by thread::spawn if Java_com_example_Foo_f will be called from an unknown JVM thread?
Possible reason why it can be bad.
Yes. The issue you linked relates to librustrt, which was removed before Rust 1.0. RFC 230, which removed librustrt, specifically notes:
When embedding Rust code into other contexts -- whether calling from C code or embedding in high-level languages -- there is a fair amount of setup needed to provide the "runtime" infrastructure that libstd relies on. If libstd was instead bound to the native threading and I/O system, the embedding setup would be much simpler.
Additionally, see PR #19654 which implemented that RFC:
When using Rust in an embedded context, it should now be possible to call a Rust function directly as a C function with absolutely no setup, though in that case panics will cause the process to abort. In this regard, the C/Rust interface will look much like the C/C++ interface.
For current documentation: the Rustonomicon's FFI chapter gives examples of Rust code called from C that use libstd (including Mutex, I believe, though that's an implementation detail of println!) without any caveats about runtime setup.