I'm writing a foreign function interface (ffi) to expose the API of a pre-existing C++ library to some new Rust code I am writing. I am using the Rust cxx module for this.
I am running into some problems related to Pin (a topic which I have to admit I don't have a complete grasp of).
My C++ module has an API that exposes pointers to some contained objects from the main object that owns these contained objects. Here is a contrived example:
// test.hpp
#include <string>
#include <memory>
class Row {
std::string row_data;
public:
void set_value(const std::string& new_value) {
this->row_data = new_value;
}
};
class Table {
Row row;
public:
Table() : row() {};
Row* get_row() {
return &this->row;
}
};
inline std::unique_ptr<Table> make_table() {
return std::make_unique<Table>();
}
The idea is that you create a Table object, which then allows you to obtain a pointer to it's Row so you can manipulate it.
My attempt to create a Rust FFI looks like this:
// main.rs
use std::pin::Pin;
use cxx::let_cxx_string;
#[cxx::bridge]
mod ffi {
unsafe extern "C++" {
include!("src/test.hpp");
type Table;
pub fn make_table() -> UniquePtr<Table>;
fn get_row(self: Pin<&mut Table>) -> *mut Row;
type Row;
pub fn set_value(self: Pin<&mut Row>, value: &CxxString);
}
}
impl ffi::Table {
pub fn get_row_ref<'a>(self: Pin<&'a mut ffi::Table>) -> Pin<&'a mut ffi::Row> {
unsafe { Pin::new_unchecked(&mut *self.get_row()) }
}
}
fn main() {
let mut table = ffi::make_table();
let row = table.pin_mut().get_row_ref();
let_cxx_string!(hello="hello world");
row.set_value(&hello);
let_cxx_string!(hello2="bye world");
row.set_value(&hello2);
}
Note that:
The cxx module requires that non-const C++ methods take Pin<&mut T> as their receiver
The C++ get_row method returns a pointer, which I want to convert into a reference to the Row which has the same lifetime as the owning Table object - that's what the get_row_ref wrapper is for.
I have two problems:
Is it sound for me to call Pin::new_unchecked here? The documentation implies it is not:
calling Pin::new_unchecked on an &'a mut T is unsafe because while you are
able to pin it for the given lifetime 'a, you have no control over whether
it is kept pinned once 'a ends
If that is not safe, how do I proceed?
This program fails to compile with the following error:
error[E0382]: use of moved value: `row`
--> src/main.rs:41:2
|
34 | let row = table.pin_mut().get_row_ref();
| --- move occurs because `row` has type `Pin<&mut Row>`, which does not implement the `Copy` trait
...
38 | row.set_value(&hello);
| --- value moved here
...
41 | row.set_value(&hello2);
| ^^^ value used here after move
The first call to set_value consumes the pinned reference, and after that it can't
be used again. &mut T is not Copy, so Pin<&mut Row> is not Copy either.
How do I set up the API so that the reference to Row can be used for multiple
successive method calls (within the constraints established by cxx)?
For those wanting to try it out:
# Cargo.toml
[dependencies]
cxx = "1.0.52"
[build-dependencies]
cxx-build = "1.0"
// build.rs
fn main() {
cxx_build::bridge("src/main.rs")
.flag("-std=c++17")
.include(".")
.compile("test");
}
Is it sound for me to call Pin::new_unchecked here?
Yes, it's sound. In this context, we know the Row is pinned because:
The Table is pinned;
The Row is stored inline in the Table;
C++'s move semantics essentially means every C++ object is "pinned" anyway.
This program fails to compile with the following error:
When you call a method on a normal mutable reference (&mut T), the compiler implicitly performs a reborrow in order to avoid moving the mutable reference, because &mut T is not Copy. Unfortunately, this compiler "magic" doesn't extend to Pin<&mut T> (which is not Copy either), so instead we must reborrow explicitly.
The easiest way to reborrow is to use Pin::as_mut(). This use case is even called out in the documentation:
This method is useful when doing multiple calls to functions that consume the pinned type.
fn main() {
let mut table = ffi::make_table();
let mut row = table.pin_mut().get_row_ref();
let_cxx_string!(hello="hello world");
row.as_mut().set_value(&hello);
let_cxx_string!(hello2="bye world");
row.as_mut().set_value(&hello2);
}
The use of as_mut() on the last use of row is not strictly necessary, but applying it consistently is probably clearer. When compiled with optimizations, this function is probably a noop anyway (for Pin<&mut T>).
How do I set up the API so that the reference to Row can be used for multiple successive method calls (within the constraints established by cxx)?
If you want to hide the as_mut()'s, you could add a method that accepts a &mut Pin<&mut ffi::Row> and does the as_mut() call. (Note that as_mut() is defined on &mut Pin<P>, so the compiler will insert a reborrow of the outer &mut.) Yes, this means there are now two levels of indirection.
impl ffi::Row {
pub fn set_value2(self: &mut Pin<&mut ffi::Row>, value: &cxx::CxxString) {
self.as_mut().set_value(value)
}
}
fn main() {
let mut table = ffi::make_table();
let mut row = table.pin_mut().get_row_ref();
let_cxx_string!(hello="hello world");
row.set_value2(&hello);
let_cxx_string!(hello2="bye world");
row.set_value2(&hello2);
}
Related
Running into an ownership issue when attempting to reference multiple values from a HashMap in a struct as parameters in a function call. Here is a PoC of the issue.
use std::collections::HashMap;
struct Resource {
map: HashMap<String, String>,
}
impl Resource {
pub fn new() -> Self {
Resource {
map: HashMap::new(),
}
}
pub fn load(&mut self, key: String) -> &mut String {
self.map.get_mut(&key).unwrap()
}
}
fn main() {
// Initialize struct containing a HashMap.
let mut res = Resource {
map: HashMap::new(),
};
res.map.insert("Item1".to_string(), "Value1".to_string());
res.map.insert("Item2".to_string(), "Value2".to_string());
// This compiles and runs.
let mut value1 = res.load("Item1".to_string());
single_parameter(value1);
let mut value2 = res.load("Item2".to_string());
single_parameter(value2);
// This has ownership issues.
// multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
println!("{}", *value1);
println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter function approach), but it would be more convenient to pass the
variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.
This is currently not possible, without resorting to unsafe code or interior mutability at least. There is no way for the compiler to know if two calls to load will yield mutable references to different data as it cannot always infer the value of the key. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine as they would refer to different values in the map, but there is no way for the compiler to know this at compile time.
The easiest way to do this, as already mentioned, is to use a structure that allows interior mutability, like RefCell, which typically enforces the memory safety rules at run-time before returning a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with mut pointers in unsafe code:
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
// TODO: Assert that keys are distinct, so that we don't return
// multiple references to the same value
keys.map(|key| self.load(key) as *mut _)
.map(|ptr| unsafe { &mut *ptr })
}
Rust Playground
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.
Why can I have multiple mutable references to a static type in the same scope?
My code:
static mut CURSOR: Option<B> = None;
struct B {
pub field: u16,
}
impl B {
pub fn new(value: u16) -> B {
B { field: value }
}
}
struct A;
impl A {
pub fn get_b(&mut self) -> &'static mut B {
unsafe {
match CURSOR {
Some(ref mut cursor) => cursor,
None => {
CURSOR= Some(B::new(10));
self.get_b()
}
}
}
}
}
fn main() {
// first creation of A, get a mutable reference to b and change its field.
let mut a = A {};
let mut b = a.get_b();
b.field = 15;
println!("{}", b.field);
// second creation of A, a the mutable reference to b and change its field.
let mut a_1 = A {};
let mut b_1 = a_1.get_b();
b_1.field = 16;
println!("{}", b_1.field);
// Third creation of A, get a mutable reference to b and change its field.
let mut a_2 = A {};
let b_2 = a_2.get_b();
b_2.field = 17;
println!("{}", b_1.field);
// now I can change them all
b.field = 1;
b_1.field = 2;
b_2.field = 3;
}
I am aware of the borrowing rules
one or more references (&T) to a resource,
exactly one mutable reference (&mut T).
In the above code, I have a struct A with the get_b() method for returning a mutable reference to B. With this reference, I can mutate the fields of struct B.
The strange thing is that more than one mutable reference can be created in the same scope (b, b_1, b_2) and I can use all of them to modify B.
Why can I have multiple mutable references with the 'static lifetime shown in main()?
My attempt at explaining this is behavior is that because I am returning a mutable reference with a 'static lifetime. Every time I call get_b() it is returning the same mutable reference. And at the end, it is just one identical reference. Is this thought right? Why am I able to use all of the mutable references got from get_b() individually?
There is only one reason for this: you have lied to the compiler. You are misusing unsafe code and have violated Rust's core tenet about mutable aliasing. You state that you are aware of the borrowing rules, but then you go out of your way to break them!
unsafe code gives you a small set of extra abilities, but in exchange you are now responsible for avoiding every possible kind of undefined behavior. Multiple mutable aliases are undefined behavior.
The fact that there's a static involved is completely orthogonal to the problem. You can create multiple mutable references to anything (or nothing) with whatever lifetime you care about:
fn foo() -> (&'static i32, &'static i32, &'static i32) {
let somewhere = 0x42 as *mut i32;
unsafe { (&*somewhere, &*somewhere, &*somewhere) }
}
In your original code, you state that calling get_b is safe for anyone to do any number of times. This is not true. The entire function should be marked unsafe, along with copious documentation about what is and is not allowed to prevent triggering unsafety. Any unsafe block should then have corresponding comments explaining why that specific usage doesn't break the rules needed. All of this makes creating and using unsafe code more tedious than safe code, but compared to C where every line of code is conceptually unsafe, it's still a lot better.
You should only use unsafe code when you know better than the compiler. For most people in most cases, there is very little reason to create unsafe code.
A concrete reminder from the Firefox developers:
I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
playground
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).
Looking through the blurz bluetooth library for Rust.
There is a variable declared with a value equal to a reference of a temp value(?).
This value is then passed into another function by reference.
How is ownership handled for a variable set to the value of a reference in a single statement and what does it mean to then use that reference as a reference?
Example:
let bt_session = &Session::create_session(None)?;
let adapter: Adapter = Adapter::init(bt_session)?;
adapter.set_powered(true)?;
let session = DiscoverySession::create_session(
&bt_session,
adapter.get_id()
)?;
See variable bt_session.
Source code example link.
A commented example to illustrate some concepts:
use rand;
#[derive(Debug)]
struct Struct {
id: i32
}
impl Drop for Struct {
fn drop(&mut self) {
// print when struct is dropped
println!("dropped {:?}", self);
}
}
fn rand_struct() -> Struct {
Struct { id: rand::random() }
}
/*
this does not compile:
fn rand_struct_ref<'a>() -> &'a Struct {
&rand_struct()
// struct dropped here so we can't return it to caller's scope
}
*/
fn takes_struct_ref(s: &Struct) {}
fn main() {
// brings struct into scope and immediately creates ref to it
// this is okay because the struct is not dropped until end of this scope
let s = &rand_struct();
println!("holding ref to {:?}", s);
// all these work because of deref coercion
takes_struct_ref(s);
takes_struct_ref(&s);
takes_struct_ref(&&s);
// struct dropped here
}
playground
To answer your first question: You can't return a reference from a function if the underlying value is dropped at the end of the function but it's okay to immediately make a reference to a returned value since that value lives for the rest of the caller's scope. You can see this in action if you run the above example, as holding ref to Struct { id: <id> } will get printed before dropped Struct { id: <id> }.
To answer your second question: the reason you can pass an &Struct or an &&Struct or an &&&Struct and so on to function which only takes an &Struct is because of a Rust syntax sugar feature called deref coercion which automatically derefs variables when they are used as function arguments or as part of method calls. In your shared code examples it looks like a ref of a ref is being passed to the function but actually it's just passing a single ref after auto deref coercion takes place.
See also
Why is it legal to borrow a temporary?
What are Rust's exact auto-dereferencing rules?
I want my method of struct to perform in a synchronized way. I wanted to do this by using Mutex (Playground):
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
map: BTreeMap<String, String>,
mutex: Mutex<()>,
}
impl A {
pub fn new() -> A {
A {
map: BTreeMap::new(),
mutex: Mutex::new(()),
}
}
}
impl A {
fn synchronized_call(&mut self) {
let mutex_guard_res = self.mutex.try_lock();
if mutex_guard_res.is_err() {
return
}
let mut _mutex_guard = mutex_guard_res.unwrap(); // safe because of check above
let mut lambda = |text: String| {
let _ = self.map.insert("hello".to_owned(),
"d".to_owned());
};
lambda("dd".to_owned());
}
}
Error message:
error[E0500]: closure requires unique access to `self` but `self.mutex` is already borrowed
--> <anon>:23:26
|
18 | let mutex_guard_res = self.mutex.try_lock();
| ---------- borrow occurs here
...
23 | let mut lambda = |text: String| {
| ^^^^^^^^^^^^^^ closure construction occurs here
24 | if let Some(m) = self.map.get(&text) {
| ---- borrow occurs due to use of `self` in closure
...
31 | }
| - borrow ends here
As I understand when we borrow anything from the struct we are unable to use other struct's fields till our borrow is finished. But how can I do method synchronization then?
The closure needs a mutable reference to the self.map in order to insert something into it. But closure capturing works with whole bindings only. This means, that if you say self.map, the closure attempts to capture self, not self.map. And self can't be mutably borrowed/captured, because parts of self are already immutably borrowed.
We can solve this closure-capturing problem by introducing a new binding for the map alone such that the closure is able to capture it (Playground):
let mm = &mut self.map;
let mut lambda = |text: String| {
let _ = mm.insert("hello".to_owned(), text);
};
lambda("dd".to_owned());
However, there is something you overlooked: since synchronized_call() accepts &mut self, you don't need the mutex! Why? Mutable references are also called exclusive references, because the compiler can assure at compile time that there is only one such mutable reference at any given time.
Therefore you statically know, that there is at most one instance of synchronized_call() running on one specific object at any given time, if the function is not recursive (calls itself).
If you have mutable access to a mutex, you know that the mutex is unlocked. See the Mutex::get_mut() method for more explanation. Isn't that amazing?
Rust mutexes do not work the way you are trying to use them. In Rust, a mutex protects specific data relying on the borrow-checking mechanism used elsewhere in the language. As a consequence, declaring a field Mutex<()> doesn't make sense, because it is protecting read-write access to the () unit object that has no values to mutate.
As Lukas explained, your call_synchronized as declared doesn't need to do synchronization because its signature already requests an exclusive (mutable) reference to self, which prevents it from being invoked from multiple threads on the same object. In other words, you need to change the signature of call_synchronized because the current one does not match the functionality it is intended to provide.
call_synchronized needs to accept a shared reference to self, which will signal to Rust that it can be called from multiple threads in the first place. Inside call_synchronized a call to Mutex::lock will simultaneously lock the mutex and provide a mutable reference to the underlying data, carefully scoped so that the lock is held for the duration of the reference:
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
synced_map: Mutex<BTreeMap<String, String>>,
}
impl A {
pub fn new() -> A {
A {
synced_map: Mutex::new(BTreeMap::new()),
}
}
}
impl A {
fn synchronized_call(&self) {
let mut map = self.synced_map.lock().unwrap();
// omitting the lambda for brevity, but it would also work
// (as long as it refers to map rather than self.map)
map.insert("hello".to_owned(), "d".to_owned());
}
}