What does DlsymWeak::initialize do in Rust?

I have a program that calls libc::memchr many times. When profiling my code, I see that the function using the most time is DlsymWeak::initialize. initialize seems to be called by memchr, which is called by my wrapper:
pub fn memchr_libc_ptr(ptr: *const std::os::raw::c_void, len: usize, needle: u8) -> Option<usize> {
    let res = unsafe { // Profiler calls out this function call as slow
        libc::memchr(
            ptr,
            needle as i32,
            len)
    };
    if res.is_null() {
        return None;
    }
    let res = res as *const u8;
    let ptr = ptr as *const u8;
    Some((unsafe { res.offset_from(ptr) }) as usize)
}
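Independently of the profiling question, the raw-pointer wrapper can be given a safe, slice-based signature so the pointer and length can never disagree. This is a minimal sketch, assuming a Unix-like target where std already links the C library; it declares memchr by hand instead of using the libc crate (whose libc::memchr has the same C signature), and memchr_libc is a hypothetical name.

```rust
use std::os::raw::{c_int, c_void};

extern "C" {
    // C standard library memchr, same signature as libc::memchr.
    fn memchr(s: *const c_void, c: c_int, n: usize) -> *mut c_void;
}

// Safe wrapper: pointer and length always come from one valid slice.
pub fn memchr_libc(haystack: &[u8], needle: u8) -> Option<usize> {
    if haystack.is_empty() {
        // Avoid handing C the dangling (but non-null) pointer of an empty slice.
        return None;
    }
    let start = haystack.as_ptr();
    let res = unsafe { memchr(start as *const c_void, c_int::from(needle), haystack.len()) };
    if res.is_null() {
        None
    } else {
        // memchr returns a pointer into the haystack, so the offset is in 0..len.
        Some(unsafe { (res as *const u8).offset_from(start) } as usize)
    }
}
```

Callers then write memchr_libc(&buf, b'\n') instead of passing a pointer and length separately.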
Furthermore, the source code of initialize says that this method should be cold ("should only happen during first-time initialization"), but it is clearly being called far more often than that.
What is DlsymWeak::initialize?
Can I avoid all these calls to it?
This is running on macOS 12.3.1 (x86-64) with rustup 1.25.1 (bb60b1e89 2022-07-12) and the following profile:
[profile.release]
lto = true
codegen-units = 1
debug = true
panic = "abort"
overflow-checks = false
incremental = false
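One way to keep any FFI machinery off the hot path entirely is to not call into the C library for the search at all. The sketch below uses only the standard library; whether it matches the speed of the platform memchr depends on input sizes and is not guaranteed. If speed matters, the memchr crate on crates.io provides an optimized memchr(needle, haystack) written in Rust. The function names here are hypothetical.

```rust
use std::ffi::c_void;

// Pure-Rust byte search: no FFI call, so nothing on this path goes
// through the C library or any dynamic symbol lookup.
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

// Same raw-pointer signature as memchr_libc_ptr in the question.
// Safety: `ptr` must point to `len` readable bytes (it is ignored when len == 0).
pub unsafe fn memchr_rust_ptr(ptr: *const c_void, len: usize, needle: u8) -> Option<usize> {
    if len == 0 {
        return None;
    }
    find_byte(std::slice::from_raw_parts(ptr as *const u8, len), needle)
}
```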

Related

What is the equivalent working Rust code example for this WAT (WebAssembly Text) example?

The WAT below, adapted from a couple of Wasmtime examples, runs fine when embedded in my application, but what I thought was the equivalent Rust code fails with:
Running `target\debug\hello_world.exe`
Error: expected 5 imports, found 1
error: process didn't exit successfully: `target\debug\hello_world.exe` (exit code: 1)
Here's the working WAT:
(module
(import "host" "greet" (func $greet (param i32 i32)))
(func (export "run")
i32.const 4 ;; ptr
i32.const 22 ;; len
call $greet)
(memory (export "memory") 1)
(data (i32.const 4) "Calling back from WAT!")
)
And the failing Rust:
use std::ffi::CString;
#[link(wasm_import_module = "host")]
extern "C" {
fn greet(ptr: i32, len: i32);
}
static GREETING: &str = "Calling back from Rust!";
#[no_mangle]
pub extern "C" fn run() {
let greeting = CString::new(GREETING).expect("contains null byte");
let ptr = greeting.as_ptr();
std::mem::forget(ptr);
unsafe {
greet(ptr as i32, GREETING.len() as i32);
}
}
My minimal example app that embeds the WASM modules:
use std::str;
use anyhow::Result;
use wasmtime::{Caller, Engine, Extern, Func, Instance, Module, Store};
struct MyState {
name: String,
count: usize,
}
fn main() -> Result<()> {
let engine = Engine::default();
let code = include_bytes!("../hello.wat");
let module = Module::new(&engine, code)?;
let mut store = Store::new(
&engine,
MyState {
name: "Hello, junglie85!".to_string(),
count: 0,
},
);
let greet_func = Func::wrap(
&mut store,
|mut caller: Caller<'_, MyState>, ptr: i32, len: i32| {
let mem = match caller.get_export("memory") {
Some(Extern::Memory(mem)) => mem,
_ => anyhow::bail!("failed to find host memory"),
};
let data = mem
.data(&caller)
.get(ptr as u32 as usize..)
.and_then(|arr| arr.get(..len as u32 as usize));
let string = match data {
Some(data) => match str::from_utf8(data) {
Ok(s) => s,
Err(_) => anyhow::bail!("invalid utf-8"),
},
None => anyhow::bail!("pointer/length out of bounds"),
};
println!("> {} {}", caller.data().name, string);
caller.data_mut().count += 1;
Ok(())
},
);
let imports = [greet_func.into()];
let instance = Instance::new(&mut store, &module, &imports)?;
let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
println!("# Global count = {}", store.data().count);
run.call(&mut store, ())?;
println!("# Global count = {}", store.data().count);
Ok(())
}
What is the Rust equivalent of the WAT example?

Get C FILE pointer from bytes::Bytes in Rust

I would like to read a GRIB file downloaded from a server using the ecCodes library in Rust. However, my current solution results in a segmentation fault. The extracted example, replicating the problem, is below.
I download the file using the reqwest crate and get the response as Bytes¹ using bytes(). To read the file with ecCodes, I need to create a codes_handle using codes_grib_handle_new_from_file()², which takes as its argument a *FILE, usually obtained from fopen(). However, I would like to skip file IO, so I figured I could use libc::fmemopen() to get a *FILE from Bytes. But when I pass the *mut FILE from fmemopen() to codes_grib_handle_new_from_file(), a segmentation fault occurs.
I suspect the issue is in how I get the *mut c_void required by fmemopen() from Bytes. I figured I could do it like this:
// get a *mut c_void pointer from Bytes
// file has type &Bytes
let mut buf = BytesMut::from(file.as_ref());
let ptr = buf.as_mut_ptr();
let ptr = ptr as *mut c_void;
Because *mut is required, I create a BytesMut from which I can then get a mut pointer. I think these conversions are problematic, because in the debugger ptr contains a different memory address than the ptr field of file.
Using a *FILE obtained from libc::fopen() for the same file does not result in a segfault, so the problem is somewhere around fmemopen().
The ecCodes library is correctly built (passes all tests and works in C) and linked (the calls in callstack are correct).
The full extracted example:
#![allow(unused)]
#![allow(non_camel_case_types)]
use bytes::{Bytes, BytesMut};
use libc::{c_char, c_void, fmemopen, size_t, FILE};
use reqwest;
use tokio;
// generated by bindgen
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct codes_handle {
_unused: [u8; 0],
}
// generated by bindgen
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct codes_context {
_unused: [u8; 0],
}
#[tokio::main]
async fn main() {
// download the grib file from server
// then get response as bytes
let url = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20210612/00/atmos/gfs.t00z.pgrb2.1p00.f000";
let file = reqwest::get(url).await.unwrap().bytes().await.unwrap();
// get *FILE from Bytes with fmemopen
// file must outlive the pointer so it is borrowed here
let file_handle = open_with_fmemopen(&file);
let grib_handle = open_with_codes(file_handle);
}
pub fn open_with_fmemopen(file: &Bytes) -> *mut FILE {
// size of buffer and mode to be read with
let size = file.len() as size_t;
let mode = "r".as_ptr() as *const c_char;
// get a *mut c_void pointer from Bytes
let mut buf = BytesMut::from(file.as_ref());
let ptr = buf.as_mut_ptr();
let ptr = ptr as *mut c_void;
// get *FILE with fmemopen
let obj;
unsafe {
obj = fmemopen(ptr, size, mode);
}
obj
}
pub fn open_with_codes(file_handle: *mut FILE) -> *mut codes_handle {
// default context for ecCodes
let context: *mut codes_context = std::ptr::null_mut();
// variable to hold error code
let mut error: i32 = 0;
// get codes_handle from *FILE
let grib_handle;
unsafe {
// segmentation fault occurs here
grib_handle = codes_grib_handle_new_from_file(context, file_handle, &mut error as *mut i32);
}
grib_handle
}
// binding to ecCodes C library
#[link(name = "eccodes")]
extern "C" {
pub fn codes_grib_handle_new_from_file(
c: *mut codes_context,
f: *mut FILE,
error: *mut i32,
) -> *mut codes_handle;
}
Because the example might take considerable effort to set up, I also attach the GDB call stack of the segfault:
__memmove_avx_unaligned_erms 0x00007f738b415fa6
fmemopen_read 0x00007f738b31c9b4
_IO_new_file_underflow 0x00007f738b31fd51
__GI___underflow 0x00007f738b32142e
__GI___underflow 0x00007f738b32142e
__GI__IO_default_xsgetn 0x00007f738b32142e
__GI__IO_fread 0x00007f738b312493
stdio_read 0x00007f738bb8db37
_read_any 0x00007f738bb8cf1b
read_any 0x00007f738bb8cfa3
_wmo_read_any_from_file_malloc 0x00007f738bb8e6f7
wmo_read_grib_from_file_malloc 0x00007f738bb8e7d7
grib_handle_new_from_file_no_multi 0x00007f738bb872a2
grib_new_from_file 0x00007f738bb8678f
grib_handle_new_from_file 0x00007f738bb85998
codes_grib_handle_new_from_file 0x00007f738bb8532b
example::open_with_codes main.rs:68
example::main::{{closure}} main.rs:34
core::future::from_generator::{{impl}}::poll<generator-0> mod.rs:80
tokio::park::thread::{{impl}}::block_on::{{closure}}<core::future::from_generator::GenFuture<generator-0>> thread.rs:263
tokio::coop::with_budget::{{closure}}<core::task::poll::Poll<()>,closure-0> coop.rs:106
std::thread::local::LocalKey<core::cell::Cell<tokio::coop::Budget>>::try_with<core::cell::Cell<tokio::coop::Budget>,closure-0,core::task::poll::Poll<()>> local.rs:272
std::thread::local::LocalKey<core::cell::Cell<tokio::coop::Budget>>::with<core::cell::Cell<tokio::coop::Budget>,closure-0,core::task::poll::Poll<()>> local.rs:248
tokio::coop::with_budget<core::task::poll::Poll<()>,closure-0> coop.rs:99
tokio::coop::budget<core::task::poll::Poll<()>,closure-0> coop.rs:76
tokio::park::thread::CachedParkThread::block_on<core::future::from_generator::GenFuture<generator-0>> thread.rs:263
tokio::runtime::enter::Enter::block_on<core::future::from_generator::GenFuture<generator-0>> enter.rs:151
tokio::runtime::thread_pool::ThreadPool::block_on<core::future::from_generator::GenFuture<generator-0>> mod.rs:71
tokio::runtime::Runtime::block_on<core::future::from_generator::GenFuture<generator-0>> mod.rs:452
example::main main.rs:34
core::ops::function::FnOnce::call_once<fn(),()> function.rs:227
std::sys_common::backtrace::__rust_begin_short_backtrace<fn(),()> backtrace.rs:125
std::rt::lang_start::{{closure}}<()> rt.rs:66
core::ops::function::impls::{{impl}}::call_once<(),Fn<()>> function.rs:259
std::panicking::try::do_call<&Fn<()>,i32> panicking.rs:379
std::panicking::try<i32,&Fn<()>> panicking.rs:343
std::panic::catch_unwind<&Fn<()>,i32> panic.rs:431
std::rt::lang_start_internal rt.rs:51
std::rt::lang_start<()> rt.rs:65
main 0x0000560f1d93c76c
__libc_start_main 0x00007f738b2bb565
_start 0x0000560f1d935f0e
¹ From the bytes crate, not std::io
² The grib_handle returned by the function is just an alias of codes_handle
1- Try changing
let mode = "r".as_ptr() as *const c_char;
to
let mode = "r\0".as_ptr() as *const c_char;
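Rust's &str is not null-terminated, but you are passing it to C, which expects strings to be null-terminated.
2- Try the following implementation of open_with_fmemopen. In your version, buf is a fresh copy of the bytes that is dropped when open_with_fmemopen returns, and fmemopen does not copy the buffer it is given, so the FILE* ends up reading freed memory. This version points fmemopen at the bytes owned by file instead: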
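pub fn open_with_fmemopen(file: &Bytes) -> *mut FILE {
    // fmemopen reads directly from the bytes owned by `file` (no copy),
    // so `file` must outlive the returned FILE*
    unsafe { fmemopen(file.as_ptr() as *mut c_void, file.len(), "r\0".as_ptr() as *const c_char) }
}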
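To let the compiler enforce "file must outlive the FILE*", the stream can be wrapped in a type that borrows the bytes. This is a minimal, std-only sketch under these assumptions: a POSIX C library that provides fmemopen (e.g. glibc), hand-written declarations of fmemopen, fread and fclose in place of the libc crate's, an opaque CFile type standing in for libc::FILE, and a hypothetical MemFile type. The pointer from as_ptr() is what would be passed to codes_grib_handle_new_from_file.

```rust
use std::ffi::{c_char, c_int, c_void, CStr};
use std::marker::PhantomData;

// Opaque C FILE type (stand-in for libc::FILE).
#[repr(C)]
pub struct CFile {
    _private: [u8; 0],
}

extern "C" {
    fn fmemopen(buf: *mut c_void, size: usize, mode: *const c_char) -> *mut CFile;
    fn fread(ptr: *mut c_void, size: usize, nmemb: usize, stream: *mut CFile) -> usize;
    fn fclose(stream: *mut CFile) -> c_int;
}

// Read-only in-memory FILE* that borrows its bytes for 'a, so the borrow
// checker refuses to drop the bytes while the stream is still open.
pub struct MemFile<'a> {
    file: *mut CFile,
    _bytes: PhantomData<&'a [u8]>,
}

impl<'a> MemFile<'a> {
    pub fn open(bytes: &'a [u8]) -> Option<MemFile<'a>> {
        if bytes.is_empty() {
            return None; // POSIX allows fmemopen to fail for a zero-sized buffer
        }
        // C needs a NUL-terminated mode string, unlike a plain "r" &str.
        let mode = CStr::from_bytes_with_nul(b"r\0").unwrap();
        // Mode "r" never writes, so passing the shared bytes as *mut is OK here.
        let file = unsafe { fmemopen(bytes.as_ptr() as *mut c_void, bytes.len(), mode.as_ptr()) };
        if file.is_null() {
            None
        } else {
            Some(MemFile { file, _bytes: PhantomData })
        }
    }

    // Pointer for a C API expecting FILE*; valid only while `self` lives.
    pub fn as_ptr(&self) -> *mut CFile {
        self.file
    }

    // Read the remaining bytes through stdio, as a C consumer would.
    pub fn read_to_end(&mut self) -> Vec<u8> {
        let mut out = Vec::new();
        let mut chunk = [0u8; 64];
        loop {
            let n = unsafe { fread(chunk.as_mut_ptr() as *mut c_void, 1, chunk.len(), self.file) };
            if n == 0 {
                break;
            }
            out.extend_from_slice(&chunk[..n]);
        }
        out
    }
}

impl<'a> Drop for MemFile<'a> {
    fn drop(&mut self) {
        unsafe {
            fclose(self.file);
        }
    }
}
```

With this shape, code like the question's open_with_fmemopen, which let its copy of the bytes go out of scope, would fail to compile instead of segfaulting later.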

If an ffi function modifies a pointer, should the owning struct be referenced mutable?

I am currently experimenting with the FFI functionality of Rust and implemented a simple HTTP request using libcurl as an exercise. Consider the following self-contained example:
use std::ffi::c_void;
use std::os::raw::{c_int, c_long};
#[repr(C)]
struct CURL {
_private: [u8; 0],
}
// Global CURL codes
const CURL_GLOBAL_DEFAULT: c_long = 3;
const CURLOPT_WRITEDATA: CURLoption = 10001;
const CURLOPT_URL: CURLoption = 10002;
const CURLOPT_WRITEFUNCTION: CURLoption = 20011;
// Curl types: CURLcode and CURLoption are C enums, passed as int
type CURLcode = c_int;
type CURLoption = c_int;
// Curl function bindings
#[link(name = "curl")]
extern "C" {
fn curl_easy_init() -> *mut CURL;
// curl_easy_setopt is variadic in C, so it is declared with `...`
fn curl_easy_setopt(handle: *mut CURL, option: CURLoption, ...) -> CURLcode;
fn curl_easy_perform(handle: *mut CURL) -> CURLcode;
fn curl_global_init(flags: c_long) -> CURLcode;
}
// Curl callback for data retrieving
extern "C" fn callback_writefunction(
data: *mut u8,
size: usize,
nmemb: usize,
user_data: *mut c_void,
) -> usize {
let slice = unsafe { std::slice::from_raw_parts(data, size * nmemb) };
let mut vec = unsafe { Box::from_raw(user_data as *mut Vec<u8>) };
vec.extend_from_slice(slice);
Box::into_raw(vec);
nmemb * size
}
type Result<T> = std::result::Result<T, CURLcode>;
// Our own curl handle
pub struct Curl {
handle: *mut CURL,
data_ptr: *mut Vec<u8>,
}
impl Curl {
pub fn new() -> std::result::Result<Curl, CURLcode> {
let ret = unsafe { curl_global_init(CURL_GLOBAL_DEFAULT) };
if ret != 0 {
return Err(ret);
}
let handle = unsafe { curl_easy_init() };
if handle.is_null() {
return Err(2); // CURLE_FAILED_INIT according to libcurl-errors(3)
}
// Set data callback
let ret = unsafe {
curl_easy_setopt(
handle,
CURLOPT_WRITEFUNCTION,
callback_writefunction as *mut c_void,
)
};
if ret != 0 {
return Err(2);
}
// Set data pointer
let data_buf = Box::new(Vec::new());
let data_ptr = Box::into_raw(data_buf);
let ret = unsafe {
curl_easy_setopt(handle, CURLOPT_WRITEDATA, data_ptr as *mut std::ffi::c_void)
};
match ret {
0 => Ok(Curl { handle, data_ptr }),
_ => Err(2),
}
}
pub fn set_url(&self, url: &str) -> Result<()> {
let url_cstr = std::ffi::CString::new(url.as_bytes()).unwrap();
let ret = unsafe {
curl_easy_setopt(
self.handle,
CURLOPT_URL,
url_cstr.as_ptr() as *mut std::ffi::c_void,
)
};
match ret {
0 => Ok(()),
x => Err(x),
}
}
pub fn perform(&self) -> Result<String> {
let ret = unsafe { curl_easy_perform(self.handle) };
if ret == 0 {
let b = unsafe { Box::from_raw(self.data_ptr) };
let data = (*b).clone();
Box::into_raw(b);
Ok(String::from_utf8(data).unwrap())
} else {
Err(ret)
}
}
}
fn main() -> Result<()> {
let my_curl = Curl::new().unwrap();
my_curl.set_url("https://www.example.com")?;
my_curl.perform().and_then(|data| Ok(println!("{}", data)))
// No cleanup code in this example for the sake of brevity.
}
While this works, I found it surprising that my_curl does not need to be declared mut, since none of the methods use &mut self, even though they pass a *mut pointer to the FFI functions.
Should I change the declaration of perform to use &mut self instead of &self (for safety), since the internal buffer gets modified? Rust does not enforce this, but of course Rust does not know that the buffer gets modified by libcurl.
This small example runs fine, but I am unsure whether I would face issues in larger programs, where the compiler might optimize on the assumption that the Curl struct is not mutated, even though the instance (or at least the data its pointer points to) is being modified.
Contrary to popular belief, the borrow checker places absolutely no restriction on passing *const/*mut pointers. It doesn't need to, because dereferencing raw pointers is inherently unsafe and can only be done in unsafe blocks, where the programmer verifies all necessary invariants manually. In your case, as you already suspected, you need to tell the compiler that it is a mutable reference (&mut self).
The interested reader should definitely give the FFI section of the Rustonomicon a read, to find out about some interesting ways to shoot yourself in the foot with it.
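A minimal sketch of how the buffer handling could look with &mut self on the methods that let C write into it. Assumptions: the callback signature is the question's; Sink and perform_with are hypothetical names, and perform_with calls the callback directly to stand in for curl_easy_perform, so no libcurl is needed. The callback reborrows the Vec through the raw pointer instead of doing the Box::from_raw/Box::into_raw round trip.

```rust
use std::ffi::c_void;

// Same signature as the question's callback_writefunction.
pub extern "C" fn write_callback(
    data: *mut u8,
    size: usize,
    nmemb: usize,
    user_data: *mut c_void,
) -> usize {
    let len = size * nmemb;
    if len == 0 {
        return 0;
    }
    // SAFETY: the caller passes `len` readable bytes at `data`.
    let chunk = unsafe { std::slice::from_raw_parts(data, len) };
    // SAFETY: `user_data` is the *mut Vec<u8> registered as CURLOPT_WRITEDATA.
    // Reborrowing means ownership never changes hands inside the callback.
    let buf = unsafe { &mut *(user_data as *mut Vec<u8>) };
    buf.extend_from_slice(chunk);
    len
}

// Hypothetical owner of the receive buffer; raw pointers make it !Send/!Sync.
pub struct Sink {
    buf: *mut Vec<u8>, // from Box::into_raw, freed in Drop
}

impl Sink {
    pub fn new() -> Sink {
        Sink { buf: Box::into_raw(Box::new(Vec::new())) }
    }

    // The pointer to hand to CURLOPT_WRITEDATA.
    pub fn user_data(&self) -> *mut c_void {
        self.buf as *mut c_void
    }

    // Stand-in for perform: &mut self, because C writes into the buffer while it runs.
    pub fn perform_with(&mut self, chunk: &[u8]) -> usize {
        write_callback(chunk.as_ptr() as *mut u8, 1, chunk.len(), self.user_data())
    }

    // &mut self: taking the data empties the buffer the callback appends to.
    pub fn take(&mut self) -> Vec<u8> {
        unsafe { std::mem::take(&mut *self.buf) }
    }
}

impl Drop for Sink {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.buf)) };
    }
}
```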

How do I convert *mut *mut c_void to &str without Box::from_raw?

I've been playing around with writing Redis Modules in Rust. This is my first attempt at using Rust FFI and bindings. How do I call this method and end up with a data value in Rust without destroying the Redis pointer?
extern "C" {
pub static mut RedisModule_GetTimerInfo: ::std::option::Option<
unsafe extern "C" fn(
ctx: *mut RedisModuleCtx,
id: RedisModuleTimerID,
remaining: *mut u64,
data: *mut *mut ::std::os::raw::c_void,
) -> ::std::os::raw::c_int,
>;
}
See the RedisModule_GetTimerInfo API Docs for more details.
I ended up getting this to work, but it throws an error if I call it with the same id twice:
let mut ms = 0 as u64;
let val = "";
let ptr = Box::into_raw(Box::new(&mut val)) as *mut *mut c_void;
let ok = unsafe { RedisModule_GetTimerInfo.unwrap()(ctx, id, &mut ms, ptr) };
let mut data: Option<String> = None;
if ok == 0 {
let val = unsafe { Box::from_raw(*ptr as *mut &str) };
// trim nul bytes
data = Some(val.trim_matches(char::from(0)).to_string());
}
This didn't work because of how Box::from_raw owns the raw pointer and the pointer is destroyed when the box is dropped.
I tried countless ways to make this work without using Box::into_raw and Box::from_raw, and every time they either ended up crashing Redis or produced a pointer that I didn't know how to convert to &str.
Update: I originally had an example of using RedisModule_StopTimer which was a mistake. Corrected to use the method I was asking about.
I'm one of the maintainers of the redismodule-rs crate, which provides a high-level Rust API for writing Redis modules.
Prompted by your question, I looked into adding these timer APIs to the crate in a safe manner, and will push the code to the repo once I'm done with it.
The following code shows how to retrieve the data safely:
// Create local variables to hold the returned values
let mut remaining: u64 = 0;
let mut data: *mut c_void = std::ptr::null_mut();
// Call the API and retrieve the values into the local variables
let status = unsafe {
    RedisModule_GetTimerInfo.unwrap()(ctx, timer_id, &mut remaining, &mut data)
};
if status == REDISMODULE_OK {
    // Cast the *mut c_void supplied by the Redis API to
    // a raw pointer of our custom type:
    let data = data as *mut T; // T is the type of the data, e.g. String
    // Dereference the raw pointer (we know this is safe,
    // since Redis should return our original pointer which
    // we know to be good), and turn it into a safe reference:
    let data = unsafe { &*data };
    println!("Remaining: {}, data: {:?}", remaining, data);
}
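A self-contained sketch of that pattern, runnable without Redis. Assumptions: get_timer_info is a hypothetical stand-in for RedisModule_GetTimerInfo that simply writes back the pointer registered with the timer, OK plays the role of REDISMODULE_OK, and the payload was originally created with Box::into_raw. Because reading only reborrows, the same timer can be queried any number of times; the Box is reclaimed exactly once, when the data is no longer needed.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

const OK: c_int = 0; // plays the role of REDISMODULE_OK in this sketch

// Hypothetical stand-in for RedisModule_GetTimerInfo: writes the opaque
// pointer registered with the timer into the `data` out-parameter.
unsafe fn get_timer_info(stored: *mut c_void, data: *mut *mut c_void) -> c_int {
    *data = stored;
    OK
}

// Borrow the timer payload as &T without taking ownership of it.
// Safety: `stored` must come from Box::into_raw(Box::new(value_of_type_T))
// and must not have been freed yet.
pub unsafe fn peek_timer_data<'a, T>(stored: *mut c_void) -> Option<&'a T> {
    // The out-parameter is a single *mut c_void on our stack; no Box needed.
    let mut data: *mut c_void = std::ptr::null_mut();
    if get_timer_info(stored, &mut data) != OK || data.is_null() {
        return None;
    }
    // Reborrow: nothing is dropped when this reference goes out of scope.
    Some(&*(data as *const T))
}
```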
Using one of the links @Shepmaster added, I was finally able to figure this out. I swear I tried some variation of this but didn't think to try double boxing...
Here's what I did:
let val = Box::new(Box::new("") as Box<&str>);
let ptr = Box::into_raw(val);
let ok = unsafe { RedisModule_GetTimerInfo.unwrap()(ctx, id, &mut ms, ptr as *mut *mut c_void) };
let mut data: Option<String> = None;
if ok == 0 {
let val = unsafe {**ptr as &str};
data = Some(val.trim_matches(char::from(0)).to_string());
}
Thanks all for your help!

Rust + LLVM ORC JIT cannot find symbol address

I have been trying to use the ORC JIT compiler from the LLVM C bindings in Rust, but I keep running into the problem that LLVMOrcGetSymbolAddress cannot find the symbol of my function run in the module I provide. The code below combines the most important parts of my code, which unfortunately doesn't work. All goes well until the last part of foo, where an error is returned because the function cannot be found. The function is definitely in the module, as LLVMGetNamedFunction is able to find it, but somehow the ORC engine cannot see it. Can anyone see what I am doing wrong? I am using LLVM 6.0 and the llvm-sys Rust bindings. Everything works fine if I use MCJIT, but I need ORC for lazy compilation.
fn foo(module: LLVMModuleRef) -> Result<I64Func, LlvmError> {
let def_triple = LLVMGetDefaultTargetTriple();
let mut target_ref = ptr::null_mut();
let mut error_str = ptr::null_mut();
// Get target from default triple
if LLVMGetTargetFromTriple(def_triple, &mut target_ref, &mut error_str) != 0 {
let msg = format!("Creating target from triple failed: {}", CStr::from_ptr(error_str).to_str().unwrap());
LLVMDisposeMessage(def_triple);
LLVMDisposeMessage(error_str);
return Err(LlvmError(msg));
}
// Check if JIT is available
if LLVMTargetHasJIT(target_ref) == 0 {
let msg = format!("Cannot do JIT on this platform");
LLVMDisposeMessage(def_triple);
return Err(LlvmError(msg));
}
// Create target machine
let tm_ref = LLVMCreateTargetMachine(target_ref,
def_triple,
CString::new("").unwrap().as_ptr(),
CString::new("").unwrap().as_ptr(),
llvm_opt_level(optimization_level)?,
LLVMRelocMode::LLVMRelocDefault,
LLVMCodeModel::LLVMCodeModelJITDefault);
LLVMDisposeMessage(def_triple);
let engine = LLVMOrcCreateInstance(tm_ref);
// Add eagerly compiled IR
let mut handle = LLVMOrcModuleHandle::default();
let shared_module = LLVMOrcMakeSharedModule(module);
let ctx = engine as *mut libc::c_void;
map_orc_err(engine, LLVMOrcAddEagerlyCompiledIR(engine,
&mut handle,
shared_module,
symbol_resolver_callback,
ctx))?;
// Find function named 'run'
let c_name = CString::new("run").unwrap().as_ptr();
let mut func_addr = LLVMOrcTargetAddress::default();
map_orc_err(engine, LLVMOrcGetSymbolAddress(engine, &mut func_addr, c_name))?;
if func_addr == 0 {
// This error is always thrown
return Err(LlvmError(format!("No function named {} in module", name)));
}
let function: I64Func = mem::transmute(func_addr);
Ok(function)
}
extern "C" fn symbol_resolver_callback(symbol: *const libc::c_char, ctx: *mut libc::c_void) -> LLVMOrcTargetAddress {
let mut address = LLVMOrcTargetAddress::default();
let engine: LLVMOrcJITStackRef = ctx as LLVMOrcJITStackRef;
unsafe { LLVMOrcGetSymbolAddress(engine, &mut address, symbol) };
address
}
unsafe fn map_orc_err(engine: LLVMOrcJITStackRef, error_code: LLVMOrcErrorCode) -> Result<(), LlvmError> {
match error_code {
LLVMOrcErrorCode::LLVMOrcErrSuccess => Ok(()),
LLVMOrcErrorCode::LLVMOrcErrGeneric => {
let c_str: &CStr = CStr::from_ptr(LLVMOrcGetErrorMsg(engine));
let str_slice: &str = c_str.to_str().unwrap();
let str_buf: String = str_slice.to_owned();
Err(LlvmError(str_buf))
}
}
}
EDIT: I tried downgrading to LLVM 4.0 just to see what effect that might have. I still cannot resolve the function, but now I'm getting an assertion error:
Assertion failed: (!Name.empty() && "getNameWithPrefix requires non-empty name"), function getNameWithPrefixImpl, file /tmp/llvm-4.0-20180412-49671-1pw0nxu/llvm-4.0.1.src/lib/IR/Mangler.cpp, line 37.
EDIT 2: Below is some basic IR for which the engine fails to find the function address:
define i64 @bar(i64 %arg) {
  %1 = add i64 %arg, 1
  ret i64 %1
}
define i64 @run(i64 %arg) {
  %1 = add i64 %arg, 1
  %2 = call i64 @bar(i64 %1)
  ret i64 %2
}
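One thing worth ruling out in code like this is the lifetime of CString temporaries. When CString::new(...).unwrap().as_ptr() is bound with let, the CString is dropped at the end of that statement and the pointer dangles; a dangling name pointer is one way a symbol lookup could see an empty or garbage name. Whether this is the cause here is not established. The sketch below only shows the safe binding pattern, using the C library's strlen (assumed available, as std links the C library on common targets) as the receiving C function; c_strlen and round_trip are hypothetical names.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

extern "C" {
    // C standard library strlen.
    fn strlen(s: *const c_char) -> usize;
}

// Dangling version, for contrast (do not do this):
//     let p = CString::new(name).unwrap().as_ptr(); // CString dropped at `;`
pub fn c_strlen(name: &str) -> usize {
    // Keep the owning CString in a binding so it lives to the end of the scope.
    let owned = CString::new(name).expect("interior NUL byte");
    let ptr: *const c_char = owned.as_ptr();
    // `owned` is still alive here, so `ptr` is valid for the whole call.
    unsafe { strlen(ptr) }
}

// Round-trip through the raw pointer to show it still points at the bytes.
pub fn round_trip(name: &str) -> String {
    let owned = CString::new(name).expect("interior NUL byte");
    let ptr = owned.as_ptr();
    let back: &CStr = unsafe { CStr::from_ptr(ptr) };
    back.to_str().expect("valid UTF-8").to_owned()
}
```

Passing CString::new("").unwrap().as_ptr() directly as a call argument, as in the LLVMCreateTargetMachine call, is fine by contrast: the temporary lives until the end of the enclosing statement, which includes the call.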
