Is this `unsafe` code Undefined Behavior in Rust?

I wanted to get a reference to a struct from references to its members.
So I wrote the following code.
#[repr(C)]
struct Data {
    x: i32,
    y: u32,
}
fn compose<'a>(x: &'a i32, y: &'a u32) -> Option<&'a Data> {
    use memoffset::offset_of;
    let x_address = x as *const i32 as usize;
    let y_address = y as *const u32 as usize;
    if x_address + offset_of!(Data, y) != y_address {
        return None;
    }
    return Some(unsafe { &*(x_address as *const Data) });
}
It seems to work correctly but I don't know if it is safe or not.
Is this undefined behavior or is there a good crate for solving this problem?

Yes, this is undefined behavior. Miri does not report it (although it does report similar code) because it doesn't track ptr-to-int-to-ptr round trips, but accessing memory past the bounds of a reference is still undefined behavior. Even if that were fine, compose() would still be unsound, because two references can happen to have adjacent addresses yet come from different allocations.
It may stop being undefined behavior if UCG#134/UCG#256 are decided in favor of allowing reads past the bounds of a reference. Even in that case I'm not sure, because you still have an active reference to y (although that is unlikely to matter). In the meantime, it is best to avoid this.
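For comparison, here is a minimal sketch of the opposite direction, which needs no unsafe at all: start from a &Data and hand out member references. The helper name split is hypothetical and not from the question; the point is only that keeping the &Data around avoids having to reconstruct it.

```rust
#[repr(C)]
struct Data {
    x: i32,
    y: u32,
}

/// Hypothetical helper: borrows both fields from one `&Data`.
/// The returned references share the lifetime of `data`, so the
/// borrow checker keeps the whole struct alive while they are used.
fn split(data: &Data) -> (&i32, &u32) {
    (&data.x, &data.y)
}
```

Code that needs the whole struct later can pass the original &Data alongside, or instead of, the member references.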

Related

How to write a Rust function whose argument accepts both *const and *mut pointers

I'm trying to find out how to write a function that accepts both *const and *mut pointers. Rust doesn't seem to have a trait for pointers. Is there a solution in the latest version of Rust?
Here is an example: the function needs a pointer but does not care about its mutability.
use std::ffi::c_void;
fn main() {
    let const_ptr = 0usize as *const c_void; // null const pointer
    function(const_ptr);
    let mut_ptr = 0usize as *mut c_void; // null mut pointer
    function(mut_ptr);
}
fn function<T>(arg: *ANY T) -> *ANY T {
    arg
}
Rust does not support any kind of direct overloading in fn or method signatures. But, as always, there is an idiomatic way to go: use traits.
trait Pointereable {}
fn accept_diff_pointers<T: Pointereable>(ptr: T) {
    // do stuff
}
Then, you have to implement that new trait for the desired types:
impl Pointereable for X {}
impl Pointereable for Y {}
...
and so on and so forth, where in this example, X and Y are the exact pointer types that you want to work with.
One last note:
Casting const references to mut ones is Undefined Behaviour. Quoting the Nomicon:
Transmuting an & to &mut is Undefined Behavior. While certain usages may appear safe, note that the Rust optimizer is free to assume that a shared reference won't change through its lifetime and thus such transmutation will run afoul of those assumptions.
Read more here: https://doc.rust-lang.org/nomicon/transmutes.html
I am explaining this because, with the trait-bound approach to accepting different pointers, you will have to do some work to match the correct one.
Also note that the Nomicon is talking about references. Transmuting them as described above is undefined behaviour.
Casting raw pointers in this context is always fine, but dereferencing them is a lot more complex. It may or may not lead to undefined behaviour, so handle it carefully.
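As a rough sketch of the trait approach applied to the question's function, the trait can be implemented for exactly the two raw pointer kinds. The trait name RawPointer, its associated type, and the helpers as_const and is_null are assumptions made up for this illustration; nothing like this trait exists in the standard library.

```rust
/// Hypothetical trait, implemented only for `*const T` and `*mut T`.
trait RawPointer: Copy {
    type Pointee;
    /// View the pointer as `*const`; casting raw pointers is safe.
    fn as_const(self) -> *const Self::Pointee;
}

impl<T> RawPointer for *const T {
    type Pointee = T;
    fn as_const(self) -> *const T {
        self
    }
}

impl<T> RawPointer for *mut T {
    type Pointee = T;
    fn as_const(self) -> *const T {
        self as *const T
    }
}

/// Accepts either pointer kind and returns it unchanged,
/// preserving its original mutability in the type.
fn function<P: RawPointer>(arg: P) -> P {
    arg
}

/// Works on either pointer kind without dereferencing it.
fn is_null<P: RawPointer>(arg: P) -> bool {
    arg.as_const().is_null()
}
```

Because the generic parameter is the pointer type itself, a *mut T passed in comes back as a *mut T, and no const-to-mut cast is ever needed.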

FFI: Convert nullable pointer to option

I'm using rust-bindgen to access a C library from Rust. Some functions return nullable pointers to structs, which bindgen represents as
extern "C" {
pub fn get_some_data() -> *const SomeStruct;
}
Now, for a higher-level wrapper, I would like to convert this to an Option<&'a SomeStruct> with an appropriate lifetime. Due to the nullable pointer optimization, this is actually represented identically to *const SomeStruct. However, I couldn't find any concise syntax to cast between the two. Transmuting
let data: Option<&'a SomeStruct> = unsafe { mem::transmute( get_some_data() ) };
and reborrowing
let data_ptr = get_some_data();
let data = if data_ptr.is_null() { None } else { Some(unsafe { &*data_ptr }) };
could be used. The docs for mem::transmute state that
transmute is incredibly unsafe. There are a vast number of ways to cause undefined behavior with this function. transmute should be the absolute last resort.
and recommend re-borrowing instead for
Turning a *mut T into an &mut T
However, for the nullable pointer, this is quite clumsy as shown in the second example.
Q: Is there a more concise syntax for this cast? Alternatively, is there a way to tell bindgen to generate
extern "C" {
pub fn get_some_data() -> Option<&SomeStruct>;
}
directly?
Use <*const T>::as_ref¹:
let data = unsafe { get_some_data().as_ref() };
Since a raw pointer may not point to a valid object of sufficient lifetime for any 'a, as_ref is unsafe to call.
There is a corresponding as_mut for *mut T → Option<&mut T>.
¹ This is a different as_ref from, for example, AsRef::as_ref and Option::as_ref, both of which are common in safe code.
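A minimal sketch of as_ref in a wrapper follows. Here get_some_data is a plain Rust stand-in for the bindgen-generated function (with an extra bool parameter so both outcomes can be shown), and the value field and the wrapper name some_data are assumptions for illustration.

```rust
struct SomeStruct {
    value: i32,
}

static DATA: SomeStruct = SomeStruct { value: 7 };

/// Stand-in for the FFI function: returns null when no data is available.
fn get_some_data(available: bool) -> *const SomeStruct {
    if available {
        &DATA as *const SomeStruct
    } else {
        std::ptr::null()
    }
}

/// Hypothetical higher-level wrapper around the nullable pointer.
fn some_data(available: bool) -> Option<&'static SomeStruct> {
    // SAFETY: the pointer is either null or points at the static DATA,
    // which is valid and never mutated for the whole program run.
    unsafe { get_some_data(available).as_ref() }
}
```

With a real C library the lifetime would normally be tied to whatever object owns the data rather than 'static.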

Can a type know when a mutable borrow to itself has ended?

I have a struct and I want to call one of the struct's methods every time a mutable borrow to it has ended. To do so, I would need to know when the mutable borrow to it has been dropped. How can this be done?
Disclaimer: The answer that follows describes a possible solution, but it's not a very good one, as described by this comment from Sebastien Redl:
[T]his is a bad way of trying to maintain invariants. Mostly because dropping the reference can be suppressed with mem::forget. This is fine for RefCell, where if you don't drop the ref, you will simply eventually panic because you didn't release the dynamic borrow, but it is bad if violating the "fraction is in shortest form" invariant leads to weird results or subtle performance issues down the line, and it is catastrophic if you need to maintain the "thread doesn't outlive variables in the current scope" invariant.
Nevertheless, it's possible to use a temporary struct as a "staging area" that updates the referent when it's dropped, and thus maintain the invariant correctly; however, that version basically amounts to making a proper wrapper type and a kind of weird way to use it. The best way to solve this problem is through an opaque wrapper struct that doesn't expose its internals except through methods that definitely maintain the invariant.
Without further ado, the original answer:
Not exactly... but pretty close. We can use RefCell<T> as a model for how this can be done. It's a bit of an abstract question, but I'll use a concrete example to demonstrate. (This won't be a complete example, but something to show the general principles.)
Let's say you want to make a Fraction struct that is always in simplest form (fully reduced, e.g. 3/5 instead of 6/10). You write a struct RawFraction that will contain the bare data. RawFraction instances are not always in simplest form, but they have a method fn reduce(&mut self) that reduces them.
Now you need a smart pointer type that you will always use to mutate the RawFraction, which calls .reduce() on the pointed-to struct when it's dropped. Let's call it RefMut, because that's the naming scheme RefCell uses. You implement Deref<Target = RawFraction>, DerefMut, and Drop on it, something like this:
use std::ops::{Deref, DerefMut};

pub struct RefMut<'a>(&'a mut RawFraction);
impl<'a> Deref for RefMut<'a> {
    type Target = RawFraction;
    fn deref(&self) -> &RawFraction {
        self.0
    }
}
impl<'a> DerefMut for RefMut<'a> {
    fn deref_mut(&mut self) -> &mut RawFraction {
        self.0
    }
}
impl<'a> Drop for RefMut<'a> {
    // Restore the invariant when the borrow ends.
    fn drop(&mut self) {
        self.0.reduce();
    }
}
Now, whenever you have a RefMut to a RawFraction and drop it, you know the RawFraction will be in simplest form afterwards. All you need to do at this point is ensure that RefMut is the only way to get &mut access to the RawFraction part of a Fraction.
pub struct Fraction(RawFraction);
impl Fraction {
    pub fn new(numerator: i32, denominator: i32) -> Self {
        // create a RawFraction, reduce it and wrap it up
    }
    pub fn borrow_mut(&mut self) -> RefMut {
        RefMut(&mut self.0)
    }
}
Pay attention to the pub markings (and lack thereof): I'm using those to ensure the soundness of the exposed interface. All three types should be placed in a module by themselves. It would be incorrect to mark the RawFraction field pub inside Fraction, since then it would be possible (for code outside the module) to create an unreduced Fraction without using new or get a &mut RawFraction without going through RefMut.
Supposing all this code is placed in a module named frac, you can use it something like this (assuming Fraction implements Display):
let mut f = frac::Fraction::new(3, 10);
println!("{}", f); // prints 3/10
f.borrow_mut().numerator += 3;
println!("{}", f); // prints 3/5
The types encode the invariant: Wherever you have Fraction, you can know that it's fully reduced. When you have a RawFraction, &RawFraction, etc., you can't be sure. If you want, you may also make RawFraction's fields non-pub, so that you can't get an unreduced fraction at all except by calling borrow_mut on a Fraction.
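Putting the pieces together, here is one way the frac module could look as a complete sketch. The field layout of RawFraction, the gcd helper, the body of new, and the Display impl are assumptions filled in for illustration; the answer itself leaves them open.

```rust
mod frac {
    use std::fmt;
    use std::ops::{Deref, DerefMut};

    pub struct RawFraction {
        pub numerator: i32,
        pub denominator: i32,
    }

    // Euclid's algorithm; returns a non-negative result.
    fn gcd(a: i32, b: i32) -> i32 {
        if b == 0 { a.abs() } else { gcd(b, a % b) }
    }

    impl RawFraction {
        pub fn reduce(&mut self) {
            let g = gcd(self.numerator, self.denominator);
            if g > 1 {
                self.numerator /= g;
                self.denominator /= g;
            }
        }
    }

    // The field is private, so only this module can create a RefMut.
    pub struct RefMut<'a>(&'a mut RawFraction);

    impl<'a> Deref for RefMut<'a> {
        type Target = RawFraction;
        fn deref(&self) -> &RawFraction {
            &*self.0
        }
    }

    impl<'a> DerefMut for RefMut<'a> {
        fn deref_mut(&mut self) -> &mut RawFraction {
            &mut *self.0
        }
    }

    impl<'a> Drop for RefMut<'a> {
        // Runs when the borrow ends and restores the invariant.
        fn drop(&mut self) {
            self.0.reduce();
        }
    }

    pub struct Fraction(RawFraction);

    impl Fraction {
        pub fn new(numerator: i32, denominator: i32) -> Self {
            let mut raw = RawFraction { numerator, denominator };
            raw.reduce();
            Fraction(raw)
        }

        pub fn borrow_mut(&mut self) -> RefMut<'_> {
            RefMut(&mut self.0)
        }
    }

    impl fmt::Display for Fraction {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{}/{}", self.0.numerator, self.0.denominator)
        }
    }
}
```

In f.borrow_mut().numerator += 3; the RefMut is a temporary, so it is dropped (and the fraction reduced) at the end of that statement.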
Basically the same thing is done in RefCell. There you want to reduce the runtime borrow-count when a borrow ends. Here you want to perform an arbitrary action.
So let's re-use the concept of writing a function that returns a wrapped reference:
struct Data {
    content: i32,
}
impl Data {
    fn borrow_mut(&mut self) -> DataRef {
        println!("borrowing");
        DataRef { data: self }
    }
    fn check_after_borrow(&self) {
        if self.content > 50 {
            println!("Hey, content should be <= {:?}!", 50);
        }
    }
}
struct DataRef<'a> {
    data: &'a mut Data
}
impl<'a> Drop for DataRef<'a> {
    fn drop(&mut self) {
        println!("borrow ends");
        self.data.check_after_borrow()
    }
}
fn main() {
    let mut d = Data { content: 42 };
    println!("content is {}", d.content);
    {
        let b = d.borrow_mut();
        //let c = &d; // Compiler won't let you have another borrow at the same time
        b.data.content = 123;
        println!("content set to {}", b.data.content);
    } // borrow ends here
    println!("content is now {}", d.content);
}
This results in the following output:
content is 42
borrowing
content set to 123
borrow ends
Hey, content should be <= 50!
content is now 123
Be aware that you can still obtain an unchecked mutable borrow with e.g. let c = &mut d;. This will be silently dropped without calling check_after_borrow.
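The caveat from the disclaimer, that mem::forget suppresses the drop, applies to this kind of wrapper too. A small sketch, using a hypothetical Guard type that records whether its destructor ran:

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard that flips a flag when it is dropped.
struct Guard<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns whether the guard's destructor ran.
fn drop_runs(forget: bool) -> bool {
    let flag = Cell::new(false);
    let guard = Guard { dropped: &flag };
    if forget {
        // Safe code: the value is leaked and its destructor never runs.
        mem::forget(guard);
    } else {
        drop(guard);
    }
    flag.get()
}
```

This is why a Drop-based hook is fine for best-effort checks but should not be what soundness depends on.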

Is it possible to return a reference created inside function scope? [duplicate]

I have a fairly simple program:
fn f<'a>() -> &'a i32 {
    &1
}
fn main() {
    println!("{}", f());
}
It doesn't compile (some of the output elided):
$ rustc test.rs
test.rs:2:6: 2:7 error: borrowed value does not live long enough
test.rs:2 &1
I understand why it fails.
I don't know how to return a reference created inside the function scope. Is there a way to do that?
Why can't the lifetime be elided for a single return?
EDIT: I changed the title since it suggested that returning a boxed type would help, which it does not (see answers).
As of Rust 1.21, a new feature named rvalue static promotion means that the code in the question now compiles.
In this instance, because 1 is a constant, the compiler promotes it to a static, meaning that the returned reference has the 'static lifetime. The function, desugared, looks something like:
fn f<'a>() -> &'a i32 {
    static ONE: i32 = 1;
    &ONE
}
This works for any compile-time constant, including structs:
struct Foo<'a> {
    x: i32,
    y: i32,
    p: Option<&'a Foo<'a>>
}
fn default_foo<'a>() -> &'a Foo<'a> {
    &Foo { x: 12, y: 90, p: None }
}
But this will not compile:
fn bad_foo<'a>(x: i32) -> &'a Foo<'a> {
    /* Doesn't compile as x isn't constant! */
    &Foo { x, y: 90, p: None }
}
Since Rust uses RAII style resource management, as soon as the program leaves a scope, all values within that scope which did not move will get destroyed. The value has to live somewhere for a reference to be valid. Therefore either return the value as such (if you are worried about having an additional copy when you do this, then don't worry since that copy will get optimized away) or box it and return the box. Unless you are returning a statically allocated string as &str as follows, you simply cannot return a "new" (for the caller) reference:
fn f<'a>() -> &'a str {
    "yo"
}
Boxing the reference will not help. Box<T> is virtually identical to an unboxed T in most respects, including ownership and lifetime issues. The fundamental issue is that local variables will stop existing as soon as the function returns. Thus, a reference to a local variable will point to deallocated memory by the time the calling function gets its hand on that reference. Putting wrapping paper around the reference doesn't fix that problem.
I assume this is a simplified example distilled from a real program you're having trouble with. I can't give targeted advice for that for lack of information, but generally it is a very good idea to return things by value (i.e., just -> i32 in this case) instead of a reference.
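The options discussed in these answers can be put side by side in a short sketch. The Foo here is a simplified version of the struct above (without the p field), and make_foo and make_boxed_foo are hypothetical names introduced for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Foo {
    x: i32,
    y: i32,
}

/// Compile-time constant: promoted to a static, so a reference can be returned.
fn default_foo() -> &'static Foo {
    &Foo { x: 12, y: 90 }
}

/// Runtime value: return it by value and let the caller own it.
fn make_foo(x: i32) -> Foo {
    Foo { x, y: 90 }
}

/// Runtime value on the heap: the Box owns it, and the caller owns the Box.
fn make_boxed_foo(x: i32) -> Box<Foo> {
    Box::new(Foo { x, y: 90 })
}
```

In the last two cases the caller receives ownership, so the value lives as long as the caller keeps it and no reference to a dead local is involved.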

Caught between a lifetime and an FFI place

I am caught between two different issues/bugs and can't come up with a decent solution. Any help would be greatly appreciated.
Context: FFI, calling a lot of C functions, and wrapping C types in Rust structs.
The first problem is ICE: this path should not cause illegal move.
This is forcing me to do all my struct-wrapping using & references as in:
pub struct CassResult<'a> {
result:&'a cql_ffi::CassResult
}
Instead of the simpler, and preferable:
pub struct CassResult {
result:cql_ffi::CassResult
}
Otherwise code like:
pub fn first_row(&self) -> Result<CassRow,CassError> {unsafe{
Ok(CassRow{row:*cql_ffi::cass_result_first_row(self.result)})
}}
Will result in:
error: internal compiler error: this path should not cause illegal move
Ok(CassRow{row:*cql_ffi::cass_result_first_row(self.result)})
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, I go ahead and wrap everything using lifetime managed references, and all is not-horrible until I try to implement an iterator. At which point I see no way around this problem.
method next has an incompatible type for trait: expected concrete lifetime, found bound lifetime parameter
So, given those two conflicting issues, I am totally stuck and can't find any way to implement a proper Rust iterator around an FFI iterator-like construct.
Edit: With Shep's suggestion, I get:
pub struct CassResult {
pub result:cql_ffi::CassResult
}
and
pub fn get_result(&mut future:future) -> Option<CassResult> {unsafe{
let result:&cql_ffi::CassResult = &*cql_ffi::cass_future_get_result(&mut future.future);
Some(CassResult{result:*result})
}}
but then get:
error: cannot move out of borrowed content
Some(CassResult{result:*result}
Is there any way to make that pattern work? It's repeated all over this FFI wrapping code.
Only a partial answer: use the "streaming iterator" trait and macro.
I have had a similar problem making Rust bindings around the C mysql API. The result is code like this, instead of native for syntax:
let query = format!("SELECT id_y, value FROM table_x WHERE id = {}", id_x);
let res = try!(db::run_query(&query));
streaming_for!( row, res.into_iter(), {
    let id_y: usize = try!(row.convert::<usize>(0));
    let value: f64 = try!(row.convert::<f64>(1));
});
Here res holds the result and frees memory on drop. The lifetime of row is tied to res:
/// Res has an attached lifetime to guard an internal pointer.
struct Res<'a>{ p: *mut c_void }
/// Wrapper created by into_iter()
struct ResMoveIter<'a>{ res: Res<'a> }
impl<'a> /*StreamingIterator<'a, Row<'a>> for*/ ResMoveIter<'a>{
    /// Get the next row, or None if no more rows
    pub fn next(&'a mut self) -> Option<Row<'a>>{
        ...
    }
}
#[unsafe_destructor]
impl<'a> Drop for Res<'a>{
    fn drop(&mut self){
        ...
    }
}
To answer my own question: the only decent answer was a way around the original ICE, but as thepowersgang comments, the correct way to do this now is to use std::ptr::read. Using that approach there is no ICE, and hopefully progress.
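A minimal sketch of the std::ptr::read approach, assuming the C struct can be copied bitwise. RawResult stands in for the bindgen-generated cql_ffi::CassResult, and its rows field and the wrap helper are hypothetical.

```rust
use std::ptr;

/// Hypothetical stand-in for the C struct behind the FFI pointer.
#[repr(C)]
struct RawResult {
    rows: u32,
}

pub struct CassResult {
    result: RawResult,
}

/// Copies the pointee out of the raw pointer instead of moving out of `*ptr`.
fn wrap(raw: *const RawResult) -> Option<CassResult> {
    if raw.is_null() {
        return None;
    }
    // SAFETY: `raw` is non-null, and the caller must guarantee that it is
    // aligned and points at a valid, initialized RawResult.
    let result = unsafe { ptr::read(raw) };
    Some(CassResult { result })
}
```

Note that ptr::read makes a bitwise copy and leaves the source untouched, so if the C library still owns and later frees the original, the two copies must not both release the same underlying resources.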
