rustc generates incorrect assembly when inlining

I'm trying to inline some functions, but the assembly code rustc generates is incorrect.
main.rs:
#[derive(Copy, Clone, PartialOrd, PartialEq, Eq)]
pub struct MyType1(usize);
impl MyType1 {
#[inline(always)]
pub fn my_func (&self) -> usize { *self / 4096 }
}
impl core::ops::Div<usize> for MyType1 {
type Output = usize;
fn div (self, other: usize) -> usize { self.0 / other }
}
pub struct MyType2 {
pub data1: MyType1,
pub data2: usize,
}
static STATIC_VAR: MyType2 = MyType2 {
data1: MyType1(0),
data2: 0,
};
pub fn main () {
let my_static_var = unsafe { &mut *(&STATIC_VAR as *const MyType2 as *mut MyType2) };
my_static_var.data1 = MyType1(0x1a000);
my_static_var.data2 = my_static_var.data1.my_func ();
}
Assembly code of main function:
; var int64_t var_10h # rsp+0x8
│ ; var int64_t var_8h # rsp+0x10
│ 0x00004300 4883ec18 sub rsp, 0x18
│ 0x00004304 31c0 xor eax, eax
│ 0x00004306 89c7 mov edi, eax
│ 0x00004308 48c744241000. mov qword [var_8h], 0x1a000 ; [0x1a000:8]=0xd5e9fffffa7b84
│ 0x00004311 488b4c2410 mov rcx, qword [var_8h]
│ 0x00004316 48890d131d02. mov qword [obj.main::STATIC_VAR::h1456afe986ab6f8a], rcx ; [0x26030:8]=0
│ 0x0000431d be00100000 mov esi, 0x1000
│ 0x00004322 e889ffffff call sym <main::MyType1 as core::ops::arith::Div<usize>>::div::he4115301add5ef17 ; sym._main::MyType1_as_core::ops::arith::Div_usize__::div::he4115301add5ef17
│ 0x00004327 4889442408 mov qword [var_10h], rax
│ 0x0000432c 488b442408 mov rax, qword [var_10h]
│ 0x00004331 488905001d02. mov qword [0x00026038], rax ; [0x26038:8]=0
│ 0x00004338 4883c418 add rsp, 0x18
└ 0x0000433c c3 ret
As you can see, main calls the Div::div implementation for MyType1 with two parameters, 0 and 0x1000, which is not correct. It should compute *self / 4096, that is, 0x1a000 / 0x1000.
build command: rustc main.rs
rustc --version: rustc 1.43.1 (8d69840ab 2020-05-04)

As @Frxstrem and @Stargateur point out, creating a mutable reference from an immutable one is undefined behavior, and anything can go wrong. So I need to make STATIC_VAR mutable:
static mut STATIC_VAR: MyType2 = MyType2 {
data1: MyType1(0),
data2: 0,
};
and change the line:
let my_static_var = unsafe { &mut *(&STATIC_VAR as *const MyType2 as *mut MyType2) };
to:
let my_static_var = unsafe { &mut STATIC_VAR };
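A variant of the same fix, as a minimal sketch: newer compilers lint against taking references to a static mut (the static_mut_refs lint), so this version writes through a raw pointer obtained with std::ptr::addr_of_mut! instead. The helper name update_static is hypothetical and not part of the original post, and the code is only sound as long as no other thread touches STATIC_VAR.

```rust
use std::ptr::addr_of_mut;

#[derive(Copy, Clone, PartialEq, Eq)]
pub struct MyType1(usize);

impl MyType1 {
    #[inline(always)]
    pub fn my_func(&self) -> usize {
        *self / 4096
    }
}

impl std::ops::Div<usize> for MyType1 {
    type Output = usize;
    fn div(self, other: usize) -> usize {
        self.0 / other
    }
}

pub struct MyType2 {
    pub data1: MyType1,
    pub data2: usize,
}

static mut STATIC_VAR: MyType2 = MyType2 {
    data1: MyType1(0),
    data2: 0,
};

// Hypothetical helper (not in the original post): performs the two writes
// from `main` and returns the stored quotient.
pub fn update_static() -> usize {
    // SAFETY: assumes no other thread accesses STATIC_VAR concurrently.
    unsafe {
        let p = addr_of_mut!(STATIC_VAR);
        (*p).data1 = MyType1(0x1a000);
        // Copy the field out so no reference to the static is created.
        let d1 = (*p).data1;
        (*p).data2 = d1.my_func();
        (*p).data2
    }
}
```

With the static declared mutable, the division sees the value that was just written: 0x1a000 / 4096 is 26.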

Related

Optimize Rust function for certain likely parameter values

Is it possible to ask the compiler to optimize the code if I know that a certain parameter will likely take one of a few select values?
For example:
// x will be within 1..10
fn foo(x: u32, y: u32) -> u32 {
// some logic
}
The above function should be compiled into
fn foo(x: u32, y: u32) -> u32 {
match x {
1 => foo1(y), // foo1 is generated by the compiler from foo, optimized for when x == 1
2 => foo2(y), // foo2 is generated by the compiler from foo, optimized for when x == 2
...
10 => foo10(y),
_ => foo_default(x, y) // unoptimized foo logic
}
}
I would like the compiler to generate the above rewrite based on some hint.

You can put the logic in a #[inline(always)] foo_impl(), then call it with the values you expect:
// x will be within 1..10
#[inline(always)]
fn foo_impl(x: u32, y: u32) -> u32 {
// some logic
}
fn foo(x: u32, y: u32) -> u32 {
match x {
1 => foo_impl(1, y),
2 => foo_impl(2, y),
// ...
10 => foo_impl(10, y),
_ => foo_impl(x, y),
}
}
Because of #[inline(always)], the compiler will inline every foo_impl() call and can then use the constant argument to optimize each copy. Nothing is guaranteed, but it should be pretty reliable (I haven't tested it, though).
Make sure to benchmark: this can actually be a regression due to code bloat.
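To make the pattern concrete, here is a small sketch with an actual body filled in. The choice of logic (raising y to the power x) and the set of special-cased values are assumptions for illustration only; the original question leaves "some logic" unspecified.

```rust
// Assumed stand-in for "some logic": y raised to the power x.
#[inline(always)]
fn foo_impl(x: u32, y: u32) -> u32 {
    y.wrapping_pow(x)
}

pub fn foo(x: u32, y: u32) -> u32 {
    match x {
        // Each arm passes a literal, so the inlined copy of foo_impl can be
        // specialized for that constant (for example, y * y for x == 2).
        1 => foo_impl(1, y),
        2 => foo_impl(2, y),
        3 => foo_impl(3, y),
        // Generic fallback for every other value of x.
        _ => foo_impl(x, y),
    }
}
```

The dispatch does not change results, only how the compiler may specialize each arm, so foo(2, 5) and foo_impl(2, 5) agree by construction.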

Let's use this toy example:
fn foo(x: u32, y: u32) -> u32 {
x * y
}
movl %edi, %eax
imull %esi, %eax
retq
But in your application, you know that x is very likely to be 2 every time. We can communicate that to the compiler with std::intrinsics::likely:
#![feature(core_intrinsics)]
fn foo(x: u32, y: u32) -> u32 {
if std::intrinsics::likely(x == 2) {
foo_impl(x, y)
} else {
foo_impl(x, y)
}
}
fn foo_impl(x: u32, y: u32) -> u32 {
x * y
}
leal (%rsi,%rsi), %eax
imull %edi, %esi
cmpl $2, %edi
cmovnel %esi, %eax
retq
DISCLAIMER: I'm not experienced enough to know if this is a good optimization or not, just that the hint changed the output.
Unfortunately while I think this is the clearest syntax, std::intrinsics are not stabilized. Fortunately though, we can get the same behavior using the #[cold] attribute, which is available on stable, that can convey your desire to the compiler:
fn foo(x: u32, y: u32) -> u32 {
if x == 2 {
foo_impl(x, y)
} else {
foo_impl_unlikely(x, y)
}
}
fn foo_impl(x: u32, y: u32) -> u32 {
x * y
}
#[cold]
fn foo_impl_unlikely(x: u32, y: u32) -> u32 {
foo_impl(x, y)
}
leal (%rsi,%rsi), %eax
imull %edi, %esi
cmpl $2, %edi
cmovnel %esi, %eax
retq
I'm skeptical whether applying this to your use-case will actually yield the transformation you propose. I'd think there'd have to be a significant impact on const-propagation and even a willingness from the compiler to optimise x < 10 into a branch of ten constants, but using the hints above will let it decide what is best.
But sometimes, you know what is best more than the compiler and can force the const-propagation by applying the transformation manually: as you've done in your original example or a different way in @ChayimFriedman's answer.
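A commonly used stable stand-in for likely can be built on the same #[cold] attribute: calling an empty cold function on the unexpected path marks that path as unlikely. This is a sketch, not the standard library's implementation; the names cold_path and likely are defined locally here, and whether the optimizer acts on the hint is not guaranteed.

```rust
// Empty function whose only purpose is to carry the #[cold] hint.
#[cold]
fn cold_path() {}

// Local helper: returns `b` unchanged, but marks the `false` case as cold.
#[inline(always)]
fn likely(b: bool) -> bool {
    if !b {
        cold_path();
    }
    b
}

fn foo_impl(x: u32, y: u32) -> u32 {
    x * y
}

pub fn foo(x: u32, y: u32) -> u32 {
    if likely(x == 2) {
        // Expected case: the constant 2 is passed explicitly.
        foo_impl(2, y)
    } else {
        foo_impl(x, y)
    }
}
```

As with the other hints, compare the generated assembly and benchmark before relying on it.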

Mutating elements inside iterator

I would like to iterate over some elements inside a vector contained as a member in a struct called Test. The idea is to mutate Test independently in each iteration and signal success if some external logic on each mutated Test is successful. For simplicity, the mutation is just changing the vector element to 123u8. The problem I have is not being able to change the elements inside a loop. I have two solutions which I thought would give the same answer:
#[derive(Debug)]
struct Test {
vec: Vec<u8>
}
impl Test {
fn working_solution(&mut self, number: u8) -> bool {
self.vec[0] = number;
self.vec[1] = number;
self.vec[2] = number;
true
}
fn non_working_solution(&mut self, number: u8) -> bool {
self.vec.iter().all(|mut x| {
x = &number; // mutation
true // external logic
})
}
}
fn main() {
let vec = vec![0u8,1u8,2u8];
let mut test = Test { vec };
println!("Original: {:?}", test);
test.working_solution(123u8);
println!("Altered: {:?}", test);
let vec = vec![0u8,1u8,2u8];
let mut test = Test { vec };
println!("Original: {:?}", test);
test.non_working_solution(123u8);
println!("Altered: {:?}", test);
}
(playground)
Output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [0, 1, 2] }
Expected output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
How do I change a member of self when using an iterator?

As you can see in the documentation, iter takes &self, so whatever you do, you cannot modify self through it (you can create a modified copy, but that is not what you want to do here). In your closure, x = &number only rebinds the local variable x; it never writes to the vector.
Instead, you can use the method iter_mut, which is more or less the same but takes &mut self, i.e., you can modify the elements.
Another side remark: you don't want to use all, which checks whether a property holds for all elements (hence the bool it returns). Instead, you want for_each, which applies a function to every element.
fn non_working_solution(&mut self, number: u8) {
self.vec.iter_mut().for_each(|x| {
*x = number; // mutation
})
}
(Playground)
As Stargateur mentioned in the comments, you can also use a for loop:
fn non_working_solution(&mut self, number: u8) {
for x in self.vec.iter_mut() {
*x = number
}
}
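If the success flag from the question is still needed, the for loop can carry it. The following is a sketch under assumptions: the method name set_all is hypothetical, and the comparison inside the loop is only a placeholder for the "external logic" that the question does not specify.

```rust
#[derive(Debug)]
struct Test {
    vec: Vec<u8>,
}

impl Test {
    // Hypothetical method: mutates every element and reports whether the
    // (placeholder) check succeeded for all of them.
    fn set_all(&mut self, number: u8) -> bool {
        let mut ok = true;
        for x in self.vec.iter_mut() {
            *x = number; // mutation through the &mut u8 yielded by iter_mut
            ok &= *x == number; // placeholder for the external logic
        }
        ok
    }
}
```

Unlike all, this loop does not short-circuit, so every element is mutated even if an earlier check fails.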

Since Rust 1.50, there is a dedicated method for filling a slice with a value, [_]::fill:
self.vec.fill(number)
In this case, fill seems to generate less code than a for loop or for_each:
(Compiler Explorer)
pub fn f(slice: &mut [u8], number: u8) {
slice.fill(number);
}
pub fn g(slice: &mut [u8], number: u8) {
for x in slice {
*x = number;
}
}
pub fn h(slice: &mut [u8], number: u8) {
slice
.iter_mut()
.for_each(|x| *x = number);
}
example::f:
mov rax, rsi
mov esi, edx
mov rdx, rax
jmp qword ptr [rip + memset@GOTPCREL]
example::g:
test rsi, rsi
je .LBB1_2
push rax
mov rax, rsi
movzx esi, dl
mov rdx, rax
call qword ptr [rip + memset@GOTPCREL]
add rsp, 8
.LBB1_2:
ret
example::h:
test rsi, rsi
je .LBB2_1
mov rax, rsi
movzx esi, dl
mov rdx, rax
jmp qword ptr [rip + memset@GOTPCREL]
.LBB2_1:
ret

Get C FILE pointer from bytes::Bytes in Rust

I would like to read a GRIB file downloaded from a server using the ecCodes library in Rust. However, my current solution results in a segmentation fault. The extracted example, replicating the problem, is below.
I download the file using the reqwest crate and get the response as Bytes [1] using bytes(). To read the file with ecCodes I need to create a codes_handle using codes_grib_handle_new_from_file() [2], which requires a *FILE argument, usually obtained from fopen(). However, I would like to skip IO operations. So I figured I could use libc::fmemopen() to get a *FILE from Bytes. But when I pass the *mut FILE from fmemopen() to codes_grib_handle_new_from_file(), a segmentation fault occurs.
I suspect the issue is in how I get the *mut c_void required by fmemopen() from Bytes. I figured I could do it like this:
// get a *mut c_void pointer from Bytes
//file has &Bytes type
let mut buf = BytesMut::from(file.as_ref());
let ptr = buf.as_mut_ptr();
let ptr = ptr as *mut c_void;
Because *mut is required, I create a BytesMut from which I can then get a mut pointer. I think those conversions are problematic, because in the debugger info ptr contains a different memory address than the ptr field of file.
Using a *FILE obtained from libc::fopen() for the same file does not result in a segfault. So the problem is somewhere around fmemopen().
The ecCodes library is correctly built (passes all tests and works in C) and linked (the calls in callstack are correct).
The full extracted example:
#![allow(unused)]
#![allow(non_camel_case_types)]
use bytes::{Bytes, BytesMut};
use libc::{c_char, c_void, fmemopen, size_t, FILE};
use reqwest;
use tokio;
// generated by bindgen
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct codes_handle {
_unused: [u8; 0],
}
// generated by bindgen
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct codes_context {
_unused: [u8; 0],
}
#[tokio::main]
async fn main() {
// download the grib file from server
// then get response as bytes
let url = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20210612/00/atmos/gfs.t00z.pgrb2.1p00.f000";
let file = reqwest::get(url).await.unwrap().bytes().await.unwrap();
// get *FILE from Bytes with fmemopen
// file must outlive the pointer so it is borrowed here
let file_handle = open_with_fmemopen(&file);
let grib_handle = open_with_codes(file_handle);
}
pub fn open_with_fmemopen(file: &Bytes) -> *mut FILE {
// size of buffer and mode to be read with
let size = file.len() as size_t;
let mode = "r".as_ptr() as *const c_char;
// get a *mut c_void pointer from Bytes
let mut buf = BytesMut::from(file.as_ref());
let ptr = buf.as_mut_ptr();
let ptr = ptr as *mut c_void;
// get *FILE with fmemopen
let obj;
unsafe {
obj = fmemopen(ptr, size, mode);
}
obj
}
pub fn open_with_codes(file_handle: *mut FILE) -> *mut codes_handle {
// default context for ecCodes
let context: *mut codes_context = std::ptr::null_mut();
// variable to hold error code
let mut error: i32 = 0;
// get codes_handle from *FILE
let grib_handle;
unsafe {
// segmentation fault occurs here
grib_handle = codes_grib_handle_new_from_file(context, file_handle, &mut error as *mut i32);
}
grib_handle
}
// binding to ecCodes C library
#[link(name = "eccodes")]
extern "C" {
pub fn codes_grib_handle_new_from_file(
c: *mut codes_context,
f: *mut FILE,
error: *mut i32,
) -> *mut codes_handle;
}
And because the example might require considerable effort to set up I also attach the call stack from GDB of the seg fault:
__memmove_avx_unaligned_erms 0x00007f738b415fa6
fmemopen_read 0x00007f738b31c9b4
_IO_new_file_underflow 0x00007f738b31fd51
__GI___underflow 0x00007f738b32142e
__GI___underflow 0x00007f738b32142e
__GI__IO_default_xsgetn 0x00007f738b32142e
__GI__IO_fread 0x00007f738b312493
stdio_read 0x00007f738bb8db37
_read_any 0x00007f738bb8cf1b
read_any 0x00007f738bb8cfa3
_wmo_read_any_from_file_malloc 0x00007f738bb8e6f7
wmo_read_grib_from_file_malloc 0x00007f738bb8e7d7
grib_handle_new_from_file_no_multi 0x00007f738bb872a2
grib_new_from_file 0x00007f738bb8678f
grib_handle_new_from_file 0x00007f738bb85998
codes_grib_handle_new_from_file 0x00007f738bb8532b
example::open_with_codes main.rs:68
example::main::{{closure}} main.rs:34
core::future::from_generator::{{impl}}::poll<generator-0> mod.rs:80
tokio::park::thread::{{impl}}::block_on::{{closure}}<core::future::from_generator::GenFuture<generator-0>> thread.rs:263
tokio::coop::with_budget::{{closure}}<core::task::poll::Poll<()>,closure-0> coop.rs:106
std::thread::local::LocalKey<core::cell::Cell<tokio::coop::Budget>>::try_with<core::cell::Cell<tokio::coop::Budget>,closure-0,core::task::poll::Poll<()>> local.rs:272
std::thread::local::LocalKey<core::cell::Cell<tokio::coop::Budget>>::with<core::cell::Cell<tokio::coop::Budget>,closure-0,core::task::poll::Poll<()>> local.rs:248
tokio::coop::with_budget<core::task::poll::Poll<()>,closure-0> coop.rs:99
tokio::coop::budget<core::task::poll::Poll<()>,closure-0> coop.rs:76
tokio::park::thread::CachedParkThread::block_on<core::future::from_generator::GenFuture<generator-0>> thread.rs:263
tokio::runtime::enter::Enter::block_on<core::future::from_generator::GenFuture<generator-0>> enter.rs:151
tokio::runtime::thread_pool::ThreadPool::block_on<core::future::from_generator::GenFuture<generator-0>> mod.rs:71
tokio::runtime::Runtime::block_on<core::future::from_generator::GenFuture<generator-0>> mod.rs:452
example::main main.rs:34
core::ops::function::FnOnce::call_once<fn(),()> function.rs:227
std::sys_common::backtrace::__rust_begin_short_backtrace<fn(),()> backtrace.rs:125
std::rt::lang_start::{{closure}}<()> rt.rs:66
core::ops::function::impls::{{impl}}::call_once<(),Fn<()>> function.rs:259
std::panicking::try::do_call<&Fn<()>,i32> panicking.rs:379
std::panicking::try<i32,&Fn<()>> panicking.rs:343
std::panic::catch_unwind<&Fn<()>,i32> panic.rs:431
std::rt::lang_start_internal rt.rs:51
std::rt::lang_start<()> rt.rs:65
main 0x0000560f1d93c76c
__libc_start_main 0x00007f738b2bb565
_start 0x0000560f1d935f0e
[1] From the bytes crate, not std::io
[2] The grib_handle returned by the function is just an alias of codes_handle

1- Try changing
let mode = "r".as_ptr() as *const c_char;
to
let mode = "r\0".as_ptr() as *const c_char;
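Rust's &str is not null-terminated, but C functions such as fmemopen() expect null-terminated strings.
2- Try the following implementation for open_with_fmemopen. It reads directly from the Bytes buffer instead of from a temporary BytesMut copy, which is dropped when the function returns and leaves fmemopen() with a dangling pointer: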
pub fn open_with_fmemopen(file: &Bytes) -> *mut FILE {
unsafe {
let obj = fmemopen(file.as_ref() as *const _ as _, file.len(), "r\0".as_ptr() as _);
obj
}
}
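Regarding point 1, an alternative to writing the trailing "\0" by hand is std::ffi::CString, which appends the terminator for you. This is a minimal sketch; the helper name c_mode is hypothetical and not part of the original answer.

```rust
use std::ffi::CString;

// Hypothetical helper: builds a NUL-terminated copy of a mode string such as
// "r" that can be handed to C via `.as_ptr()`.
pub fn c_mode(mode: &str) -> CString {
    CString::new(mode).expect("mode must not contain an interior NUL byte")
}
```

A call would then look like fmemopen(ptr, size, mode.as_ptr()) with let mode = c_mode("r"); bound beforehand. The CString has to stay alive for the duration of the call, so it should be stored in a variable rather than created inline as a temporary.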

How to execute raw instructions from a memory buffer in Rust?

I'm attempting to make a buffer of memory executable, then execute it in Rust. I've gotten all the way until I need to cast the raw executable bytes as code/instructions. You can see a working example in C below.
Extra details:
Rust 1.34
Linux
CC 8.2.1
unsigned char code[] = {
0x55, // push %rbp
0x48, 0x89, 0xe5, // mov %rsp,%rbp
0xb8, 0x37, 0x00, 0x00, 0x00, // mov $0x37,%eax
0xc9, // leaveq
0xc3 // retq
};
void reflect(const unsigned char *code) {
void *buf;
/* copy code to executable buffer */
buf = mmap(0, sizeof(code), PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_ANON,-1,0);
memcpy(buf, code, sizeof(code));
((void (*) (void))buf)();
}
extern crate mmap;
use mmap::{MapOption, MemoryMap};
unsafe fn reflect(instructions: &[u8]) {
let map = MemoryMap::new(
instructions.len(),
&[
MapOption::MapAddr(0 as *mut u8),
MapOption::MapOffset(0),
MapOption::MapFd(-1),
MapOption::MapReadable,
MapOption::MapWritable,
MapOption::MapExecutable,
MapOption::MapNonStandardFlags(libc::MAP_ANON),
MapOption::MapNonStandardFlags(libc::MAP_PRIVATE),
],
)
.unwrap();
std::ptr::copy(instructions.as_ptr(), map.data(), instructions.len());
// How to cast into extern "C" fn() ?
}

Use mem::transmute to cast a raw pointer to a function pointer type.
use std::mem;
let func: unsafe extern "C" fn() = mem::transmute(map.data());
func();
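The cast itself can be tried without an executable mapping. The sketch below round-trips an ordinary function through a raw pointer and back with mem::transmute; the function answer and the u32 return type are assumptions for illustration (the byte sequence in the question leaves 0x37 in eax, which corresponds to returning a 32-bit integer).

```rust
use std::mem;

// Stand-in for the machine code in the question: returns 0x37.
extern "C" fn answer() -> u32 {
    0x37
}

pub fn call_through_raw_pointer() -> u32 {
    // Erase the type, as if this were the address returned by `map.data()`.
    let raw: *const () = answer as extern "C" fn() -> u32 as *const ();
    // SAFETY: `raw` really is the address of an `extern "C" fn() -> u32`,
    // and raw pointers and function pointers have the same size here.
    let func: extern "C" fn() -> u32 = unsafe { mem::transmute(raw) };
    func()
}
```

With real generated code, the declared function pointer type must match the calling convention and return value of the bytes you copied; the compiler cannot check that for you.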

What is the equivalent of a safe memset for slices?

In many cases, I need to clear areas of buffers or set a slice to certain value. What is the native recommended way of doing this?
This is invalid Rust, but I would like to do something similar to this:
let mut some_buffer = vec![0u8; 100];
some_buffer[10..20].set(0xFF)
I could use a for loop but I have the feeling I am missing something given that I am new to Rust.
In C++, I would do something like:
std::array<int,6> foobar;
foobar.fill(5);
In Python, it would be similar:
tmp = np.zeros(10)
tmp[3:6]=2

You aren't the only one. A feature request / RFC exists for the same thing:
Safe memset for slices #2067
However, you are putting the cart before the horse. Do you really care that it calls memset? I would guess not, just that it's efficient. A big draw of Rust is that the compiler can "throw away" many abstractions at build time. For example, why call a function when some CPU instructions will do the same thing?
pub fn thing(buffer: &mut [u8]) {
for i in &mut buffer[10..20] { *i = 42 }
}
playground::thing:
pushq %rax
cmpq $19, %rsi
jbe .LBB0_1
movabsq $3038287259199220266, %rax
movq %rax, 10(%rdi)
movw $10794, 18(%rdi)
popq %rax
retq
.LBB0_1:
movl $20, %edi
callq core::slice::slice_index_len_fail@PLT
ud2
pub fn thing(buffer: &mut [u8]) {
for i in &mut buffer[10..200] { *i = 99 }
}
.LCPI0_0:
.zero 16,99
playground::thing:
pushq %rax
cmpq $199, %rsi
jbe .LBB0_1
movaps .LCPI0_0(%rip), %xmm0
movups %xmm0, 184(%rdi)
movups %xmm0, 170(%rdi)
movups %xmm0, 154(%rdi)
movups %xmm0, 138(%rdi)
movups %xmm0, 122(%rdi)
movups %xmm0, 106(%rdi)
movups %xmm0, 90(%rdi)
movups %xmm0, 74(%rdi)
movups %xmm0, 58(%rdi)
movups %xmm0, 42(%rdi)
movups %xmm0, 26(%rdi)
movups %xmm0, 10(%rdi)
popq %rax
retq
.LBB0_1:
movl $200, %edi
callq core::slice::slice_index_len_fail@PLT
ud2
As kazemakase points out, when the set region becomes "big enough", the optimizer switches to using memset instead of inlining the instructions:
pub fn thing(buffer: &mut [u8]) {
for i in &mut buffer[11..499] { *i = 240 }
}
playground::thing:
pushq %rax
cmpq $498, %rsi
jbe .LBB0_1
addq $11, %rdi
movl $240, %esi
movl $488, %edx
callq memset@PLT
popq %rax
retq
.LBB0_1:
movl $499, %edi
callq core::slice::slice_index_len_fail@PLT
ud2
You can wrap this function in an extension trait if you'd like:
trait FillExt<T> {
fn fill(&mut self, v: T);
}
impl FillExt<u8> for [u8] {
fn fill(&mut self, v: u8) {
for i in self {
*i = v
}
}
}
pub fn thing(buffer: &mut [u8], val: u8) {
buffer[10..20].fill(val)
}
See also:
Creating a vector of zeros for a specific size
Efficiently insert or replace multiple elements in the middle or at the beginning of a Vec?

As of Rust 1.50.0, released on 2021-02-11, slice::fill is now stable, meaning your example now works if you change the function name:
let mut buffer = vec![0u8; 20];
buffer[5..10].fill(0xFF);
println!("{:?}", buffer);
Will print [0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
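Indexing with a range panics when the range does not fit inside the buffer. If that matters, a small wrapper can check first. This is a sketch; the function name fill_range and its bool return value are choices made for this example, not part of the standard library.

```rust
use std::ops::Range;

// Hypothetical helper: fills `buffer[range]` with `value` and returns true,
// or leaves the buffer untouched and returns false if the range is invalid.
pub fn fill_range(buffer: &mut [u8], range: Range<usize>, value: u8) -> bool {
    match buffer.get_mut(range) {
        Some(region) => {
            region.fill(value);
            true
        }
        None => false,
    }
}
```

For example, fill_range(&mut buffer, 5..10, 0xFF) on the 20-element buffer above produces the same contents as the indexed call, while an out-of-bounds range such as 15..30 returns false instead of panicking.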
