Why do these unaligned pointer dereferences work? - rust

In Rust, casting a *const u8 to a *const T is fine, but dereferencing the cast pointer is unsafe because the memory pointed to may not satisfy T's requirements of size, alignment and valid byte pattern. I'm trying to come up with an example that violates the alignment requirement but satisfies the other two.
So I generate a random slice of 7 u8 and try to interpret different length-4 sub-slices as an f32 value. Any byte pattern is a valid f32 and 4 u8 are indeed size_of::<f32>() bytes. So the only thing that varies is the alignment of the sub-slice pointer, which is shifted from the base slice:
slice:      [ 0 | 1 | 2 | 3 | 4 | 5 | 6 ]
sub-slices: [ 0   1   2   3 ]
                [ 1   2   3   4 ]
                    [ 2   3   4   5 ]
                        [ 3   4   5   6 ]
This is the code that I run:
use std::mem::transmute;
use std::ptr::read;
use std::convert::TryInto;
//use rand::Rng;

fn to_f32(v: &[u8]) -> f32 {
    let ptr = v.as_ptr() as *const f32;
    unsafe {
        // [1] dereference
        *ptr
        // [2] alternatively
        //ptr.read()
    }
}

fn main() {
    println!("align_of::<f32>() = {}", std::mem::align_of::<f32>());
    //let mut rng = rand::thread_rng();

    // with a pointer on the stack
    let v: [u8; 7] = [ 0x4A, 0x3A, 0x2a, 0x10, 0x0F, 0xD2, 0x37];
    // with a pointer on the heap
    //let v = Box::new(rng.gen::<[u8;7]>());

    for i in 0..4 {
        let ptr = &v[i..(i+4)];
        let f = to_f32(ptr);
        // max alignment of ptr
        let alignment = 1 << (ptr.as_ptr() as usize).trailing_zeros();
        // other ways to convert, as a control check
        let repr = ptr.try_into().expect("");
        let f2 = unsafe { transmute::<[u8; 4], f32>(repr) };
        let f3 = f32::from_le_bytes(repr);
        println!("{:x?} [{alignment}]: {repr:02x?} : {f} =? {f2} = {f3}", ptr.as_ptr());
        assert_eq!(f, f2);
        assert_eq!(f, f3);
    }
}
The code outputs:
align_of::<f32>() = 4
0x7fffa431a5d1 [1]: [4a, 3a, 2a, 10] : 0.000000000000000000000000000033571493 =? 0.000000000000000000000000000033571493 = 0.000000000000000000000000000033571493
0x7fffa431a5d2 [2]: [3a, 2a, 10, 0f] : 0.000000000000000000000000000007107881 =? 0.000000000000000000000000000007107881 = 0.000000000000000000000000000007107881
0x7fffa431a5d3 [1]: [2a, 10, 0f, d2] : -153612880000 =? -153612880000 = -153612880000
0x7fffa431a5d4 [4]: [10, 0f, d2, 37] : 0.000025040965 =? 0.000025040965 = 0.000025040965
The question is: why do the assertions in this code never fail, even though it [1] unsafely dereferences an unaligned pointer or [2] calls ptr::read(), which explicitly requires valid alignment?

Dereferencing an unaligned pointer is Undefined Behavior. With Undefined Behavior anything can happen, and that includes producing the expected result. This does not mean the code is correct. Specifically, x86 allows unaligned loads, so this is likely the reason it does not fail on your machine.
Miri indeed reports an error in your code:
error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
--> src/main.rs:10:9
|
10 | *ptr
| ^^^^ accessing memory with alignment 1, but alignment 4 is required
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `to_f32` at src/main.rs:10:9: 10:13
note: inside `main`
--> src/main.rs:28:17
|
28 | let f = to_f32(ptr);
| ^^^^^^^^^^^
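If the goal is to read an f32 from bytes at an arbitrary offset without Undefined Behavior, the standard library has ptr::read_unaligned, which drops the alignment requirement. The sketch below is illustrative only; the function names to_f32_unaligned and to_f32_safe are made up for this example, and the second function shows a variant that avoids unsafe entirely by copying the bytes first.

```rust
use std::ptr;

/// Reads an `f32` from the first four bytes of `v` with no alignment assumption.
/// (Hypothetical helper name, not part of the original question.)
fn to_f32_unaligned(v: &[u8]) -> f32 {
    assert!(v.len() >= std::mem::size_of::<f32>());
    let ptr = v.as_ptr() as *const f32;
    // SAFETY: at least 4 bytes are readable (checked above), every bit
    // pattern is a valid f32, and `read_unaligned` does not require `ptr`
    // to be aligned for f32.
    unsafe { ptr::read_unaligned(ptr) }
}

/// Same result without any `unsafe`: copy the bytes, then reinterpret them.
/// Returns `None` if fewer than 4 bytes are available.
fn to_f32_safe(v: &[u8]) -> Option<f32> {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(v.get(..4)?);
    Some(f32::from_ne_bytes(bytes))
}
```

Both functions should agree for every offset into the 7-byte array from the question, and Miri should accept either of them.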

Related

What counts as moving a value in Rust?

I am trying to implement a function for converting a BigInt to a vector of u8 in Rust.
fn BigInt_to_Vector(x: BigInt) -> Vec<u8> {
    let mut v : Vec<u8> = vec![];
    let mut n = x.clone();
    let byte : BigInt = BigInt::from(256);
    while(x > BigInt::from(0)) {
        v.push((n%byte).to_u8().unwrap());
        n = n/byte;
    }
    v
}
I am encountering the following error:
let byte : BigInt = BigInt::from(256);
| ---- move occurs because `byte` has type `BigInt`, which does not implement the `Copy` trait
v.push((n%byte).to_u8().unwrap());
| ---- value moved here
n = n/byte;
| ^^^^ value used here after move
I know this error can be avoided by constructing BigInt::from(256) every time instead of using the variable byte (for example, n%byte becomes n%BigInt::from(256)).
But I am unable to understand the reason behind the error, even after searching the internet. And if I want to keep using the byte variable, what should I do?
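The arithmetic operators take their operands by value, and BigInt is not Copy, so n % byte moves both n and byte into the operation and neither can be used afterwards. BigInt also implements these operators for references, so the solution here is to borrow n and byte where you don't want them moved: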
fn BigInt_to_Vector(x: BigInt) -> Vec<u8> {
    let mut v: Vec<u8> = vec![];
    // `x` is owned by this function, so it can be moved into `n` directly
    let mut n = x;
    let byte = BigInt::from(256);
    let zero = BigInt::from(0);
    // Test `n`, the value being reduced, so that the loop terminates
    while n > zero {
        // Borrow `n` and `byte` for the rem op so they aren't moved
        v.push((&n % &byte).to_u8().unwrap());
        // Borrow `byte` for the div op so it isn't moved
        // It's okay if `n` is moved here
        n = n / &byte;
    }
    v
}
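The same move-versus-borrow behaviour can be reproduced without any external crate. The following is a minimal sketch that uses a made-up non-Copy type Num as a stand-in for BigInt (the type, its operator impls, and the function name to_le_bytes are assumptions for illustration, not the real BigInt API). Because the operators are only implemented for the borrowed forms used in the loop, writing n % byte with owned operands would not compile here.

```rust
use std::ops::{Div, Rem};

// Hypothetical stand-in for BigInt: deliberately NOT `Copy`.
#[derive(Debug, PartialEq, PartialOrd)]
struct Num(u64);

// `&Num % &Num`: both operands are borrowed, nothing is moved.
impl Rem<&Num> for &Num {
    type Output = Num;
    fn rem(self, rhs: &Num) -> Num {
        Num(self.0 % rhs.0)
    }
}

// `Num / &Num`: the left operand is consumed, the right one is borrowed.
impl Div<&Num> for Num {
    type Output = Num;
    fn div(self, rhs: &Num) -> Num {
        Num(self.0 / rhs.0)
    }
}

/// Splits `x` into base-256 digits, least significant byte first.
fn to_le_bytes(x: Num) -> Vec<u8> {
    let mut v = Vec::new();
    let mut n = x;
    let byte = Num(256);
    let zero = Num(0);
    while n > zero {
        // The remainder is always below 256, so the cast is lossless.
        v.push((&n % &byte).0 as u8);
        // `n` is moved into the division and immediately reassigned.
        n = n / &byte;
    }
    v
}
```

For example, to_le_bytes(Num(258)) yields the bytes 2 and 1, since 258 = 1 * 256 + 2.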

Why doesn't my Rust program compile when I use 64-bit unsigned integers instead of 32-bit?

When I try to make my fibonacci evaluator calculate u64 integers the compiler gets upset and refuses to compile. I thought this was built in, but it says "missing crate or module".
error: expected one of `#` or `|`, found `:`
--> src/main.rs:15:12
|
15 | for fib: u64 in r {
| ^
| |
| expected one of `#` or `|`
| help: maybe write a path separator here: `::`
error[E0433]: failed to resolve: use of undeclared crate or module `fib`
--> src/main.rs:15:9
|
15 | for fib: u64 in r {
| ^^^ use of undeclared crate or module `fib`
This code compiles fine without any issues when I use plain old u32 integers:
use std::io;
use std::ops::Range;
fn main() {
println!("Please enter a fibonacci number to evaluate to:");
let mut n: String = String::new();
io::stdin().read_line(&mut n).expect("Not a number");
let mut _n: u32 = n.trim().parse().expect("Please type a number!");
let mut r: Range<u32> = Range { start: 0, end: _n };
let mut fib: u32 = 0;
for fib in r {
fibonacci(fib);
println!("The fibonacci number is {}", fibonacci(fib));
}
}
fn fibonacci(n: u32) -> u32 {
match n {
0 => 1,
1 => 1,
_ => fibonacci(n - 1) + fibonacci(n - 2),
}
}
Why is this happening?
The type of the iteration variable is determined by the iterator. The pattern in a for loop cannot carry a type annotation, and even if it could, annotating it as u64 would not change what a Range<u32> yields; the iteration variable would just be of the wrong type.
Instead, consider changing the type of r:
let mut r: Range<u64> = Range{start:0,end: _n};
Other notes:
The declaration let mut fib:u32 = 0; is useless; this variable is never used. (for fib creates a new variable named fib scoped to the loop.)
You will have to update fn fibonacci(n: u32) -> u32 to accept and return u64.
You probably should also update _n to be u64.
Consider running your code through rustfmt to fix the wildly inconsistent indentation.
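Putting those notes together, a u64 version might look like the sketch below. The helper fib_table is a made-up name for this example; it replaces the interactive loop so that the snippet needs no input. Note that the element type comes from the range 0..end, not from an annotation on the loop variable.

```rust
/// Same recurrence as in the question, widened to u64.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

/// Hypothetical helper: evaluates `fibonacci` for every index in `0..end`.
/// `end` is a u64, so the range is a `Range<u64>` and yields u64 values.
fn fib_table(end: u64) -> Vec<u64> {
    let mut out = Vec::new();
    for i in 0..end {
        out.push(fibonacci(i));
    }
    out
}
```

The naive recursion is exponential in n, so this is only practical for small inputs.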

Why doesn't converting a raw pointer to a u8 into a raw pointer to an array of 8 booleans print the right result?

I'm experimenting with raw pointers in Rust. I have the following code:
fn main() {
    let mut t: u8 = 0;
    let addr = &mut t as *mut u8 as usize;
    let x = addr as *mut [bool; 8];
    let y = addr as *mut u8;
    unsafe {
        *y = 0b10101010;
        println!("{:?} {:b}", *x, *y);
    }
}
It produces the following output: [true, true, true, true, true, true, true, false] 10101010
while I expect it to print [true, false, true, false, true, false, true, false] 10101010.
What is going on? Isn't a bool array stored bit by bit?
The behavior of that program is undefined (so the output is meaningless). From Miri:
error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 8, but is outside bounds of alloc1381 which has size 1
--> src/main.rs:11:31
|
11 | println!("{:?} {:b}", *x, *y);
| ^^ memory access failed: pointer must be in-bounds at offset 8, but is outside bounds of alloc1381 which has size 1
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
Boolean arrays are stored byte-by-byte, not bit-by-bit. A [bool; 8] therefore occupies 8 bytes, so reading it through a pointer to a single u8 reads 7 bytes past the end of t, which is what Miri reports. Use the bitvec or bitfield crate if you want bit-by-bit storage. There is no way for a pointer to point to an individual bit: pointers always point to bytes (essentially no ISA supports pointers to bits). A bool is 1 byte long, and cannot safely have any value other than 0_u8 or 1_u8.
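If the goal is simply to see the individual bits of a u8 as booleans, they can be extracted with shifts and masks, with no pointer casts involved. This is a minimal sketch; the function name bits_of and the most-significant-bit-first ordering are choices made for this example.

```rust
/// Expands a byte into eight bools, most significant bit first.
fn bits_of(byte: u8) -> [bool; 8] {
    let mut out = [false; 8];
    for (i, bit) in out.iter_mut().enumerate() {
        // Shift the wanted bit down to position 0 and test it.
        *bit = (byte >> (7 - i)) & 1 == 1;
    }
    out
}

/// Size in bytes of an array of eight bools (one byte per bool).
fn bool_array_size() -> usize {
    std::mem::size_of::<[bool; 8]>()
}
```

With this ordering, bits_of(0b10101010) gives the alternating pattern the question expected, and bool_array_size() confirms that the array takes 8 bytes rather than 1.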

Assign reference to array of different sizes

I have a function which selects a different array based on whether a boolean is set to true or false, similar to the following:
const V1: [u8; 2] = [1,2];
const V2: [u8; 4] = [1,2,3,4];
fn test(b: bool) {
    let v = if b { &V1 } else { &V2 };
}
fn main() {
    test(false);
}
However I get the following error:
error[E0308]: `if` and `else` have incompatible types
--> src/main.rs:5:33
|
5 | let v = if b { &V1 } else { &V2 };
| --- ^^^ expected an array with a fixed size of 2 elements, found one with 4 elements
| |
| expected because of this
|
= note: expected type `&[u8; 2]`
found reference `&[u8; 4]`
I tried storing the constants as vectors, but to_vec cannot be used for constants.
An alternative would be to copy the array into a vector inside test, but I'd rather not have to make copies every time.
Is there a way to do this without copying the array every time the function is called?
The answer is to use slices, but unfortunately Rust's type inference isn't clever enough to work that out on its own. If you annotate the type of v explicitly as &[u8], then everything should compile.
const V1: [u8; 2] = [1,2];
const V2: [u8; 4] = [1,2,3,4];
fn test(b: bool) {
    let v: &[u8] = if b { &V1 } else { &V2 };
}
fn main() {
    test(false);
}
Rust must know the type of every variable at compile time. Here the common type is &[u8], a reference to a slice of u8 elements. A slice reference is stored as a pointer to the first element together with the number of elements. V1 is an array of 2 u8 elements and V2 is an array of 4 u8 elements; because the length is part of an array's type, [u8; 2] and [u8; 4] are not the same type.
So, if you'd like to return a borrowed value of one of the two arrays (V1 or V2), the compiler needs two pieces of information: the type and the lifetime. The types of V1 and V2 are explicitly declared, and both values live in static memory (the data is part of the program binary), so all we have to do is say that we are returning a reference to a slice of u8s that will be around for as long as the program is running (static lifetime). Even though V1 and V2 are not the same type, references to them can both be coerced to &[u8]: the return value just refers to a run of u8 elements, and the number of elements is carried in the reference itself at run time rather than in the type. Check out the working example below.
const V1: [u8; 2] = [1,2];
const V2: [u8; 4] = [1,2,3,4];
fn test(b: bool) -> &'static [u8] {
    let v: &[u8] = if b { &V1 } else { &V2 };
    v
}
fn main() {
    println!("{:?}", test(false));
}
As a final note, when attempting to solve these problems, don't be afraid to make mistakes; the compiler is actually quite friendly and very helpful when trying to figure out what to do next as shown in the error message below.
|
6 | fn test(b: bool) -> &[u8] {
| ^ help: consider giving it an explicit bounded or 'static lifetime: `&'static`
|
= help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
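To see that the element count travels with the slice reference rather than with the type, one can compare the sizes of the two kinds of reference and ask the returned slice for its length. This is a small sketch built on the constants from the question; the function names pick and ref_sizes are made up for this example.

```rust
use std::mem::size_of;

const V1: [u8; 2] = [1, 2];
const V2: [u8; 4] = [1, 2, 3, 4];

/// Returns one of the two arrays as a slice; the annotation on `v`
/// triggers the coercion from `&[u8; N]` to `&[u8]` in both branches.
fn pick(b: bool) -> &'static [u8] {
    let v: &'static [u8] = if b { &V1 } else { &V2 };
    v
}

/// Sizes of a reference to a fixed-size array and of a slice reference.
/// The slice reference is "fat": a pointer plus a length.
fn ref_sizes() -> (usize, usize) {
    (size_of::<&[u8; 2]>(), size_of::<&[u8]>())
}
```

Here pick(true).len() is 2 and pick(false).len() is 4, and the slice reference is twice the size of the array reference because it stores the length next to the pointer.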

Where is Rust storing all these bytes?

In trying to understand how stack memory works, I wrote the following code to display addresses of where data gets stored:
fn main() {
    let a = "0123456789abcdef0";
    let b = "123456789abcdef01";
    let c = "23456789abcdef012";
    println!("{:p} {}", &a, a.len());
    println!("{:p} {}", &b, b.len());
    println!("{:p} {}", &c, c.len());
}
The output is:
0x7fff288a5448 17
0x7fff288a5438 17
0x7fff288a5428 17
It implies that all 17 bytes are stored in a space of 16 bytes, which can't be right. My one guess is that there's some optimization happening, but I get the same results even when I build with --opt-level 0.
The equivalent C seems to do the right thing:
#include <stdio.h>
#include <string.h>
int main() {
    char a[] = "0123456789abcdef";
    char b[] = "123456789abcdef0";
    char c[] = "23456789abcdef01";
    printf("%p %zu\n", &a, strlen(a) + 1);
    printf("%p %zu\n", &b, strlen(b) + 1);
    printf("%p %zu\n", &c, strlen(c) + 1);
    return 0;
}
Output:
0x7fff5837b440 17
0x7fff5837b420 17
0x7fff5837b400 17
String literals "..." are stored in static memory, and the variables a, b, c are just (fat) pointers to them. They have type &str, which has the following layout:
struct StrSlice {
    data: *const u8,
    length: usize,
}
where the data field points at the sequence of bytes that form the text, and the length field says how many bytes there are.
On a 64-bit platform this is 16 bytes (and on a 32-bit platform, 8 bytes). The real equivalent in C (ignoring null termination vs. stored length) would be storing into a const char* instead of a char[]. Changing the C code to do this prints:
0x7fff21254508 17
0x7fff21254500 17
0x7fff212544f8 17
i.e. the pointers are 8 bytes apart.
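The distinction between the &str variable on the stack and the text it points to can also be checked from Rust itself. The sketch below uses a made-up helper named inspect; it reports the address of the variable (what {:p} on &a printed in the question), the address of the text bytes, and the length stored in the fat pointer.

```rust
use std::mem::size_of;

/// Returns (address of the `&str` variable, address of the text, length).
/// Hypothetical helper for illustration only.
fn inspect(s: &&str) -> (usize, usize, usize) {
    let variable_addr = s as *const &str as usize; // where the fat pointer lives
    let text_addr = s.as_ptr() as usize;           // where the bytes live
    (variable_addr, text_addr, s.len())
}

/// Size of the fat pointer itself: a data pointer plus a length.
fn str_ref_size() -> usize {
    size_of::<&str>()
}
```

On a 64-bit target str_ref_size() is 16, which matches the 16-byte spacing between the addresses in the question, while the 17 text bytes live at a different address.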
You can check these low-level details using --emit=asm or --emit=llvm-ir, or clicking the corresponding button on the playpen (possibly adjusting the optimisation level too). E.g.
fn main() {
    let a = "0123456789abcdef0";
}
compiled with --emit=llvm-ir and no optimisations gives (with my trimming and annotations):
%str_slice = type { i8*, i64 }
;; global constant with the string's text
@str1042 = internal constant [17 x i8] c"0123456789abcdef0"
; Function Attrs: uwtable
define internal void @_ZN4main20h55efe3c71b4bb8f4eaaE() unnamed_addr #0 {
entry-block:
  ;; create stack space for the `a` variable
  %a = alloca %str_slice
  ;; get a pointer to the first element of the `a` struct (`data`)...
  %0 = getelementptr inbounds %str_slice* %a, i32 0, i32 0
  ;; ... and store the pointer to the string data in it
  store i8* getelementptr inbounds ([17 x i8]* @str1042, i32 0, i32 0), i8** %0
  ;; get a pointer to the second element of the `a` struct (`length`)...
  %1 = getelementptr inbounds %str_slice* %a, i32 0, i32 1
  ;; ... and store the length of the string (17) in it.
  store i64 17, i64* %1
  ret void
}

Resources