Should functions that depend upon specific values be made unsafe?

I have a function that takes a usize equivalent to a pointer, and aligns it up to the next alignment point.
It doesn't require any unsafe as it's side effect free, but the alignment must be a power of two with this implementation. This means that if you use the function with bad parameters, you might get undefined behaviour later down the line. I can't check for this inside the function itself with assert! as it's supposed to be very fast.
/// Align the given address `addr` upwards to alignment `align`.
///
/// Unsafe as `align` must be a power of two.
unsafe fn align_next_unsafe(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}
Currently, I've made this unsafe for the above reasons, but I'm not sure if that's best practice. Should I only define a function as unsafe if it has side effects? Or is this a valid time to require an unsafe block?

I'll preface this by saying this is a fairly opinion-heavy answer, and represents a point of view, rather than "the truth".
Consider this code taken from the Vec docs:
let x = vec![1, 2, 4];
let x_ptr = x.as_ptr();
unsafe {
    for i in 0..x.len() {
        assert_eq!(*x_ptr.add(i), 1 << i);
    }
}
The function you're describing seems to have a similar safety profile to Vec::as_ptr. Vec::as_ptr is not unsafe, and does nothing particularly bad on its own; having an invalid *const T isn't bad until you dereference it. That's why dereferencing the raw pointer requires unsafe.
Similarly, I'd argue that align_next doesn't do anything particularly bad unless that value is then passed into some unsafe context. As with any question of unsafe, it's a tradeoff between safety/risk and ergonomics.
In Vec::as_ptr's case, the risk is relatively low; the stdlib has lots of eyes on it, and is well "battle-tested". Moreover, it is a single function with a single implementation.
If your align_next was a function on a trait, I'd be much more tempted to make it unsafe, since someone in the future could implement it badly, and you might have other code whose safety relies on a correct implementation of align_next.
However, in your case, I'd say the pattern is similar to Vec::as_ptr, and you should make sure that any functions that consume this value are marked unsafe if they can cause UB.
I'd also second Martin Gallagher's point about creating a Result returning variant and benchmarking (you could also try an Option<usize>-returning API to make use of null-pointer optimizations).
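For reference, a checked variant along the lines suggested above could look like the following sketch (align_next_checked is a hypothetical name, not from the question). is_power_of_two is cheap, so it is worth benchmarking before assuming the check is too slow:
fn align_next_checked(addr: usize, align: usize) -> Option<usize> {
    if align.is_power_of_two() {
        // Same bit trick as the original: round `addr` up to a multiple of `align`.
        Some((addr + align - 1) & !(align - 1))
    } else {
        None
    }
}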

Related

Why does the Rust standard library have so much unsafe code?

I was looking at the standard library's String implementation, and there is a lot of unsafe code, like this:
#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
pub fn remove(&mut self, idx: usize) -> char {
    let ch = match self[idx..].chars().next() {
        Some(ch) => ch,
        None => panic!("cannot remove a char from the end of a string"),
    };
    let next = idx + ch.len_utf8();
    let len = self.len();
    unsafe {
        ptr::copy(
            self.vec.as_ptr().add(next),
            self.vec.as_mut_ptr().add(idx),
            len - next,
        );
        self.vec.set_len(len - (next - idx));
    }
    ch
}
Why is there so much unsafe code in the standard library?
And how is the language still safe?
There is a misconception here that using unsafe is automatically unsound and will cause memory errors. It does not. In fact, you are not allowed to cause memory errors even in unsafe blocks; if you do, the code exhibits undefined behavior and the whole program is ill-defined. The point of unsafe is to allow things that the compiler cannot verify are actually safe. The responsibility falls to the developer to ensure the code does not invoke undefined behavior, by understanding the safety requirements of the unsafe syntax, functions, and other items they use.
The design philosophy for writing and using unsafe functions is if some set of parameters or circumstances may cause a function to exhibit undefined behavior, then it must be marked unsafe and should be documented what the safe parameters and circumstances are. The caller must then abide by this documentation within an unsafe block. The flip side of this design philosophy is that if a function is not marked unsafe, then no possible parameters or circumstances may cause undefined behavior.
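As a hypothetical illustration of that convention (this is not code from the standard library), an unsafe function states its contract under a # Safety heading and the caller discharges that contract inside an unsafe block:
/// Reads the value behind `ptr`.
///
/// # Safety
///
/// `ptr` must be non-null, properly aligned, and point to an initialized `u32`.
unsafe fn read_u32(ptr: *const u32) -> u32 {
    // SAFETY: upheld by the caller, per the contract above.
    unsafe { *ptr }
}

fn caller() -> u32 {
    let x = 7_u32;
    // SAFETY: `&x` is non-null, aligned, and points to an initialized u32.
    unsafe { read_u32(&x) }
}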
In this situation, shifting bytes around in memory is not always safe, so you must use unsafe to call ptr::copy. However, the method .remove() is not marked unsafe, so whatever happens in the unsafe block must be safe, provided the developers of the Rust standard library have done their job, and I'm sure they have. You can see that any possible input is bounds-checked and that what is being copied lies within the already allocated block. The only way this could cause undefined behavior is if there was already undefined behavior or a broken invariant before calling this function.
You cannot build the Rust standard library without using unsafe. The underlying manual memory management that computers are based on is inherently fraught with memory foot-guns, however you can build off of these "unsafe" operations with guarantees that make them safe.
Some of the unsafety is required, but other instances exist purely for performance. Safe abstractions may require many checks to ensure they are safe, especially if any kind of dynamism is involved, but if your existing invariants are encoded correctly, then using unsafe can avoid those checks while still being safe. This function probably could have been written entirely safely by relying on other self.vec methods (which use unsafe internally at some point), but that might incur additional bounds checks that are entirely unnecessary here.
The standard library is expected to operate with as little overhead as possible, while staying safe (unless the function is marked unsafe of course).
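To make the performance trade-off concrete, here is a rough sketch of what a fully safe version of that byte shift could look like, using copy_within and truncate on a plain Vec<u8> (an illustration only, not how the standard library actually implements remove):
fn remove_bytes(buf: &mut Vec<u8>, idx: usize, next: usize) {
    let len = buf.len();
    // Safe counterpart of the ptr::copy above; copy_within re-validates the
    // ranges at runtime even though the caller has already checked them.
    buf.copy_within(next..len, idx);
    // Safe counterpart of set_len; performs its own length comparison.
    buf.truncate(len - (next - idx));
}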

Why doesn't a value allocated on the stack result in a double free?

Please tell me why reading a value that is allocated on the stack doesn't result in a double free. Thanks.
#[test]
fn read_value_that_allocated_in_stack_is_no_problem() {
    let origin = Value(1);
    let copied = unsafe { std::ptr::read(&origin) };
    assert_eq!(copied, Value(1));
    assert_eq!(copied, origin);
}
/// test failed as expected: double free detected
#[test]
fn read_value_that_allocated_in_heap_will_result_in_double_free_problem() {
    let origin = Box::new(Value(1));
    let copied = unsafe { std::ptr::read(&origin) };
    assert_eq!(copied, Box::new(Value(1)));
    assert_eq!(copied, origin);
}
#[derive(Debug, PartialEq)]
struct Value<T>(T);
The unsafe function you are using just creates a bitwise copy of the referenced value. Doing this with a Box is not okay, but for something like your Value struct containing an integer it is fine: dropping an integer has no side effects, while dropping a Box calls into the global allocator and changes its state.
If you do not understand any term I used in this explanation, try searching for it or ask in the comments.
Those tests hide the fact that you use different types in them. It isn't really about stack or heap.
In the first one you use the Value<i32> type, which is your custom type, presumably without a custom Drop implementation. If so, Rust will drop each member, in this case the i32 member, which does nothing. And so nothing happens when both objects go out of scope. Even if you implemented Drop, it would have to have some serious side effects (like a call to free) for this to fail.
In the second one you actually use Box type, which does implement Drop. Internally it calls free (in addition to dropping the underlying object). And so free is called twice on drop, trying to free the same pointer (because of the unsafe copy).
This is not a double free, because we do not free the memory twice; we do not free memory at all.
However, whether this is valid or UB is another question. Miri does not flag it as UB, but Miri does not aim to catch every instance of UB. If Value<i32> were Copy, this would be fine. As it is not, it depends on whether std::ptr::read() invalidates the original data, i.e. is it always invalid to use data that has been std::ptr::read(), or only when doing so violates Stacked Borrows semantics, as in the case of copying the Box itself, where both destructors try to access the Box afterwards?
The answer is that it's not decided yet. As per UCG issue #307, "Are Copy implementations semantically relevant (besides specialization)?":
#steffahn
Overall, this still being an open question means that while miri doesn't complain, one should avoid code like this because it's not yet certain that it won't be UB, right?
#RalfJung
Yes.
In conclusion, you should avoid code like that.
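As noted above, if Value<i32> were Copy there would be no question at all; the safe way to obtain a second value is to derive the copy/clone traits and skip ptr::read entirely (an illustrative sketch; the question's type does not derive these):
#[derive(Debug, PartialEq, Clone, Copy)]
struct Value<T>(T);

#[test]
fn copying_is_fine() {
    let origin = Value(1);
    let copied = origin; // a plain Copy, no unsafe needed
    assert_eq!(copied, origin);
    assert_eq!(copied, Value(1));
}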

Reuse raw pointer to rehydrate Rust Box

I have a function that converts a Box::into_raw result into a u64. I later 're-Box' it from the u64.
// Somewhere at inception
let bbox = Box::new(MyStruct::from_u32(1u32).unwrap());
let rwptr = Box::into_raw(bbox);
let bignum_ptr = rwptr as u64;
// Later in life
let rehydrate: Box<MyStruct> = unsafe {
    Box::from_raw(bignum_ptr as *mut MyStruct)
};
What I would like to do is 're-Box' that bignum_ptr again, and again, as needed. Is this possible?
A Box owns the data it points to and will deallocate/drop it when it goes out of scope, so if you need to use the same pointer in more than one place, Box is not the correct type. To support multiple "revivals" of the pointer, you can use a reference instead:
// safety contract: bignum_ptr comes from a valid pointer, there are no
// mutable references
let rehydrate: &MyStruct = unsafe { &*(bignum_ptr as *const MyStruct) };
When the time comes to free the initial box and its data (and you know that no outstanding references exist), only then re-create the box using Box::from_raw:
// safety contract: bignum_ptr comes from a valid pointer, no references
// of any kind remain
drop(unsafe { Box::from_raw(bignum_ptr as *mut MyStruct) });
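Putting the two snippets together, the whole lifecycle described above might look like this sketch (MyStruct and from_u32 come from the question; the SAFETY comments restate the contracts):
let bignum_ptr = Box::into_raw(Box::new(MyStruct::from_u32(1u32).unwrap())) as u64;

// Revive a shared reference as many times as needed.
// SAFETY: the pointer came from Box::into_raw and no mutable references exist.
let revived: &MyStruct = unsafe { &*(bignum_ptr as *const MyStruct) };
let _ = revived;

// At the very end, reconstruct the Box exactly once so the allocation is freed.
// SAFETY: no references of any kind remain past this point.
drop(unsafe { Box::from_raw(bignum_ptr as *mut MyStruct) });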
What I would like to do is 're-Box' that bignum_ptr again, and again, as needed. Is this possible?
If you mean creating many boxes from the same pointer without converting each one back into a raw pointer first, no.
If you mean putting it in and out repeatedly, round-tripping every time via an integer, probably yes; however, I would be careful with code like that. Most likely it will work, but be aware that the memory model for Rust is not formalized, and the rules around pointer provenance may change in the future. Even the C and C++ standards (from which Rust's memory model derives) have open questions around this, including round-tripping via an integer type.
Furthermore, your code assumes a pointer fits in a u64, which is likely true for most architectures, but maybe not all in the future.
At the very least, I suggest you use mem::transmute rather than a cast.
In short: don't do it. There is likely a better design for what you are trying to achieve.

Rust: can I get a fixed-size slice by borrowing the whole fixed-size array in a smaller scope, in a simple way?

I saw the workarounds and they were kind of long. Am I missing a feature of Rust or a simple solution (important: not a workaround)? I feel like I should be able to do this with maybe a simple macro, but the arrayref crate implementations aren't what I am looking for. Is this a feature that needs to be added to Rust, or is creating a fixed-size slice from a fixed-size array in a smaller scope somehow bad?
Basically what I want to do is this:
fn f(arr: [u8; 4]) {
    arr[0];
}

fn basic() {
    let mut arr: [u8; 12] = [0; 12];
    // can't I borrow the whole array but pass a fixed-size slice of it?
    f(&mut arr[8..12]); // but this is known at compile time?
    f(&mut arr[8..12] as &[u8; 4]); // why can't I do these things?
}
What I want can be achieved by the code below (from other SO threads):
use arrayref::mut_array_refs;

fn foo() {
    let mut buf: [u8; 12] = [0; 12];
    let (_, fixed_slice) = mut_array_refs![&mut buf, 8, 4];
    write_u32_into(fixed_slice, 0);
}

fn write_u32_into(fixed_slice: &mut [u8; 4], num: u32) {
    // won't have to check fixed_slice.len() == 4 and won't panic
}
But I looked into the crate, and even though this never panics, there are many unsafe blocks and many lines of code. It is a workaround for Rust itself. In the first place, I wanted something like this to get rid of the overhead of checking the size and of the possible runtime panic.
Also, "this is a little overhead, it doesn't matter" isn't a valid answer, because technically I should be able to guarantee this at compile time; even if the overhead is small, that doesn't mean Rust doesn't need this type of feature or that I should not be looking for an ideal way.
Note: Can this be solved with lifetimes?
Edit: if we were able to have a different syntax for fixed slices, such as arr[12;;16], then borrowing this way would borrow the whole arr. I think many functions (for example write_u32) could then be implemented in a more "rusty" way.
Use a let binding with a slice pattern; subslice patterns were stabilized in Rust 1.42.
let v = [1, 2, 3]; // inferred [i32; 3]
let [_, ref subarray @ ..] = v; // subarray is &[i32; 2]
let a = v[0]; // 1
let b = subarray[1]; // 3
Here is a section from the Rust reference about slice patterns.
Why it doesn't work
What you want is not available as a feature in stable or nightly Rust because several things related to const are not stabilized yet, namely const generics and const traits. The reason traits are involved is that arr[8..12] is a call to the core::ops::Index::<Range<usize>> trait, which returns a reference to a slice, in your case [u8]. This type is unsized and not equal to [u8; 4]; even though the compiler could in principle figure out the length, Rust is inherently safe and can be overprotective at times to ensure safety.
What can you do then?
You have a few routes you can take to solve this issue. I'll stay in a no_std environment for all of this, as that seems to be where you're working, and will avoid extra crates.
Change the function signature
The current function signature takes the four u8s as an owned value. If you are only asking for 4 values, you can instead take those values as individual parameters to the function. This option breaks down when you need larger arrays, but at that point it would be better to take the array as a reference or to use the method below.
The most common way, and the best way in my opinion, is to take the array in as a reference to a slice (&[u8] or &mut [u8]). This is not the same as taking a pointer to the value as in C: slices in Rust also carry their own length, so you can safely iterate through them without worrying about buffer overruns or whether you have read all the data. This does require changing the algorithms below to account for variable-sized input, but most of the time there is a just-as-good option to use.
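For instance, a sketch of the question's f and basic with the signature changed to take a slice (illustrative only):
// Take a borrowed slice instead of an owned [u8; 4].
fn f(bytes: &[u8]) {
    // The slice carries its own length, so iteration stays in bounds.
    for b in bytes {
        let _ = b;
    }
}

fn basic() {
    let arr: [u8; 12] = [0; 12];
    f(&arr[8..12]); // the range is bounds-checked once, here
}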
The safe way
Slices can be converted to arrays using TryInto, but this comes at the cost of a runtime size check, which you seem to want to avoid. It is an option, though, and the performance impact is likely minimal.
Example:
use core::convert::TryInto; // in the prelude since edition 2021; explicit otherwise

fn f(arr: [u8; 4]) {
    arr[0];
}

fn basic() {
    let arr: [u8; 12] = [0; 12];
    f(arr[8..12].try_into().unwrap());
}
The unsafe way
If you're willing to leave the land of safety, there are quite a few things you can do to force the compiler to treat the data the way you want, but they can be abused. It's usually better to use Rust idioms rather than force other methods in, but this is a valid option.
fn basic(){
    let arr: [u8; 12] = [0; 12];
    f(unsafe { *(arr[8..12].as_ptr() as *const [u8; 4]) });
}
TL;DR
I recommend changing your types to use slices rather than arrays, but if that's not feasible I'd suggest avoiding the unsafety; the performance won't be as bad as you think.

When should I use direct access into a Rust Vec instead of the get method?

Rust supports two methods for accessing the elements of a vector:
let mut v = vec![1, 2, 3];
let first_element = &v[0];
let second_element = v.get(1);
The get() method returns an Option type, which seems like a useful safety feature. The C-like syntax &v[0] seems shorter to type, but gives up the safety benefits, since invalid reads cause a run-time error rather than producing an indication that the read was out of bounds.
It's not clear to me when I would want to use the direct access approach, because it seems like the only advantage is that it's quicker to type (I save 3 characters). Is there some other advantage (perhaps a speedup?) that I'm not seeing? I guess I would save the conditional of a match expression, but that doesn't seem like it offers much benefit compared to the costs.
Neither of them is quicker because they both do bounds checks. In fact, your question is quite generic because there are other pairs of methods where one of them panics while the other returns an option, such as String::reserve vs String::try_reserve.
If you are sure that you are in bounds, use the brackets version; it is effectively a shortcut for get().unwrap().
If you are unsure of this, use the get() method and do your check.
If you critically need maximum speed and you cannot use an iterator and you have determined through benchmarks that the indexing is the bottleneck and you are sure to be in bounds, you can use the get_unchecked() method. Be careful about this because it is unsafe: it is always better to not have any unsafe block in your code.
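For completeness, a hypothetical sketch of what that last resort can look like; in real code an iterator (as discussed next) would usually be the better choice:
fn sum(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        // SAFETY: `i` is always less than `v.len()` by construction of the loop.
        total += unsafe { *v.get_unchecked(i) };
    }
    total
}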
Just a little bit of advice: if you are concerned about your program's performance, avoid those methods and prefer iterators as much as you can. For example, the second loop below is faster than the first, because the first performs one million bounds checks:
let v: Vec<_> = (0..1_000_000).collect();

// indexed access: one bounds check on every iteration
for idx in 0..1_000_000 {
    // do something with v[idx]
}

// iterator: no bounds checks
for num in &v {
    // do something with num
}
