What is the equivalent of a safe memset for slices? - rust

In many cases, I need to clear areas of buffers or set a slice to certain value. What is the native recommended way of doing this?
This is invalid Rust, but I would like to do something similar to this:
let mut some_buffer = vec![0u8; 100];
some_buffer[10..20].set(0xFF);
I could use a for loop but I have the feeling I am missing something given that I am new to Rust.
In C++, I would do something like:
std::array<int,6> foobar;
foobar.fill(5);
In Python, it would be similar:
tmp = np.zeros(10)
tmp[3:6]=2

You aren't the only one. A feature request / RFC exists for the same thing:
Safe memset for slices #2067
However, you are putting the cart before the horse. Do you really care that it calls memset? I would guess not, just that it's efficient. A big draw of Rust is that the compiler can "throw away" many abstractions at build time. For example, why call a function when some CPU instructions will do the same thing?
pub fn thing(buffer: &mut [u8]) {
    for i in &mut buffer[10..20] { *i = 42 }
}
playground::thing:
pushq %rax
cmpq $19, %rsi
jbe .LBB0_1
movabsq $3038287259199220266, %rax
movq %rax, 10(%rdi)
movw $10794, 18(%rdi)
popq %rax
retq
.LBB0_1:
movl $20, %edi
callq core::slice::slice_index_len_fail@PLT
ud2
pub fn thing(buffer: &mut [u8]) {
    for i in &mut buffer[10..200] { *i = 99 }
}
.LCPI0_0:
.zero 16,99
playground::thing:
pushq %rax
cmpq $199, %rsi
jbe .LBB0_1
movaps .LCPI0_0(%rip), %xmm0
movups %xmm0, 184(%rdi)
movups %xmm0, 170(%rdi)
movups %xmm0, 154(%rdi)
movups %xmm0, 138(%rdi)
movups %xmm0, 122(%rdi)
movups %xmm0, 106(%rdi)
movups %xmm0, 90(%rdi)
movups %xmm0, 74(%rdi)
movups %xmm0, 58(%rdi)
movups %xmm0, 42(%rdi)
movups %xmm0, 26(%rdi)
movups %xmm0, 10(%rdi)
popq %rax
retq
.LBB0_1:
movl $200, %edi
callq core::slice::slice_index_len_fail@PLT
ud2
As kazemakase points out, when the set region becomes "big enough", the optimizer switches to using memset instead of inlining the instructions:
pub fn thing(buffer: &mut [u8]) {
    for i in &mut buffer[11..499] { *i = 240 }
}
playground::thing:
pushq %rax
cmpq $498, %rsi
jbe .LBB0_1
addq $11, %rdi
movl $240, %esi
movl $488, %edx
callq memset@PLT
popq %rax
retq
.LBB0_1:
movl $499, %edi
callq core::slice::slice_index_len_fail@PLT
ud2
You can wrap this function in an extension trait if you'd like:
trait FillExt<T> {
    fn fill(&mut self, v: T);
}
impl FillExt<u8> for [u8] {
    fn fill(&mut self, v: u8) {
        for i in self {
            *i = v
        }
    }
}
pub fn thing(buffer: &mut [u8], val: u8) {
    buffer[10..20].fill(val)
}
See also:
Creating a vector of zeros for a specific size
Efficiently insert or replace multiple elements in the middle or at the beginning of a Vec?

As of Rust 1.50.0, released on 2021-02-11, slice::fill is now stable, meaning your example now works if you change the function name:
let mut buffer = vec![0u8; 20];
buffer[5..10].fill(0xFF);
println!("{:?}", buffer);
Will print [0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
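fill works for any Clone element type, and there is also slice::fill_with, which takes a closure and was stabilized shortly after fill (around Rust 1.51, if I remember correctly). A small sketch of both:
fn main() {
    // Fill a sub-range of a byte buffer, as in the question.
    let mut buffer = vec![0u8; 20];
    buffer[5..10].fill(0xFF);

    // fill also works for non-Copy (but Clone) element types.
    let mut names = vec![String::new(); 3];
    names.fill("hello".to_string());

    // fill_with computes each value from a closure, useful when the value
    // is expensive to clone or should differ per element.
    let mut counters = vec![0u32; 4];
    let mut n = 0;
    counters.fill_with(|| { n += 1; n });

    println!("{:?} {:?} {:?}", buffer, names, counters);
}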

Related

Optimize Rust function for certain likely parameter values

Is it possible to ask the compiler to optimize the code if I know that the domain of a certain parameter will likely be among a few select values?
eg.
// x will be within 1..10
fn foo(x: u32, y: u32) -> u32 {
    // some logic
}
The above function should be compiled into
fn foo(x: u32, y: u32) -> u32 {
    match x {
        1 => foo1(y),   // foo1 is generated by the compiler from foo, optimized for when x == 1
        2 => foo2(y),   // foo2 is generated by the compiler from foo, optimized for when x == 2
        // ...
        10 => foo10(y),
        _ => foo_default(x, y), // unoptimized foo logic
    }
}
I would like the compiler to generate the above rewrite based on some hint.
You can put the logic in an #[inline(always)] function foo_impl(), then call it with the values you expect:
// x will be within 1..10
#[inline(always)]
fn foo_impl(x: u32, y: u32) -> u32 {
    // some logic
}
fn foo(x: u32, y: u32) -> u32 {
    match x {
        1 => foo_impl(1, y),
        2 => foo_impl(2, y),
        // ...
        10 => foo_impl(10, y),
        _ => foo_impl(x, y),
    }
}
Because of the #[inline(always)], the compiler will inline every foo_impl() call and can then use the constant arguments to optimize each call site. Nothing is guaranteed, but it should be pretty reliable (I haven't tested, though).
Make sure to benchmark: this can actually be a regression due to code bloat.
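To see why the inlined constants matter, here is a hypothetical foo_impl (not from the question) whose logic branches on x; once foo_impl(2, y) is inlined, the compiler can fold the match down to the single x == 2 arm:
#[inline(always)]
fn foo_impl(x: u32, y: u32) -> u32 {
    match x {
        1 => y + 1,
        2 => y * 2,
        _ => y.wrapping_mul(x),
    }
}

pub fn foo(x: u32, y: u32) -> u32 {
    match x {
        // After inlining, this call should reduce to just `y * 2`.
        2 => foo_impl(2, y),
        _ => foo_impl(x, y),
    }
}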
Let's use this toy example:
fn foo(x: u32, y: u32) -> u32 {
    x * y
}
movl %edi, %eax
imull %esi, %eax
retq
But in your application, you know that x is very likely to be 2 every time. We can communicate that to the compiler with std::intrinsics::likely:
#![feature(core_intrinsics)]
fn foo(x: u32, y: u32) -> u32 {
    if std::intrinsics::likely(x == 2) {
        foo_impl(x, y)
    } else {
        foo_impl(x, y)
    }
}
fn foo_impl(x: u32, y: u32) -> u32 {
    x * y
}
leal (%rsi,%rsi), %eax
imull %edi, %esi
cmpl $2, %edi
cmovnel %esi, %eax
retq
DISCLAIMER: I'm not experienced enough to know if this is a good optimization or not, just that the hint changed the output.
Unfortunately, while I think this is the clearest syntax, std::intrinsics are not stabilized. Fortunately, we can get the same behavior on stable using the #[cold] attribute, which conveys the same intent to the compiler:
fn foo(x: u32, y: u32) -> u32 {
    if x == 2 {
        foo_impl(x, y)
    } else {
        foo_impl_unlikely(x, y)
    }
}
fn foo_impl(x: u32, y: u32) -> u32 {
    x * y
}
#[cold]
fn foo_impl_unlikely(x: u32, y: u32) -> u32 {
    foo_impl(x, y)
}
leal (%rsi,%rsi), %eax
imull %edi, %esi
cmpl $2, %edi
cmovnel %esi, %eax
retq
I'm skeptical whether applying this to your use-case will actually yield the transformation you propose. I'd think there'd have to be a significant impact on const-propagation and even a willingness from the compiler to optimise x < 10 into a branch of ten constants, but using the hints above will let it decide what is best.
But sometimes you know better than the compiler and can force the const-propagation by applying the transformation manually, as you've done in your original example or in a different way in @ChayimFriedman's answer.
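For completeness, one way to force that specialization by hand on stable Rust is const generics, so each hot value gets its own monomorphized copy of the logic (a sketch with a toy body, not taken from either answer):
// X is a compile-time constant inside each specialized copy.
fn foo_impl<const X: u32>(y: u32) -> u32 {
    X * y // some logic
}

// Fallback that takes x at runtime.
fn foo_dynamic(x: u32, y: u32) -> u32 {
    x * y
}

pub fn foo(x: u32, y: u32) -> u32 {
    match x {
        1 => foo_impl::<1>(y),
        2 => foo_impl::<2>(y),
        // ...
        10 => foo_impl::<10>(y),
        _ => foo_dynamic(x, y),
    }
}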

Mutating elements inside iterator

I would like to iterate over some elements inside a vector contained as a member in a struct called Test. The idea is to mutate Test independently in each iteration and signify success if some external logic on each mutated Test is successful. For simplicity, the mutation is just changing the vector element to 123u8. The problem I have is not being able to change the elements inside a loop. I have two solutions which I thought would give the same answer:
#[derive(Debug)]
struct Test {
    vec: Vec<u8>,
}
impl Test {
    fn working_solution(&mut self, number: u8) -> bool {
        self.vec[0] = number;
        self.vec[1] = number;
        self.vec[2] = number;
        true
    }
    fn non_working_solution(&mut self, number: u8) -> bool {
        self.vec.iter().all(|mut x| {
            x = &number; // mutation
            true // external logic
        })
    }
}
fn main() {
    let vec = vec![0u8, 1u8, 2u8];
    let mut test = Test { vec };
    println!("Original: {:?}", test);
    test.working_solution(123u8);
    println!("Altered: {:?}", test);
    let vec = vec![0u8, 1u8, 2u8];
    let mut test = Test { vec };
    println!("Original: {:?}", test);
    test.non_working_solution(123u8);
    println!("Altered: {:?}", test);
}
(playground)
Output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [0, 1, 2] }
Expected output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
How do I change a member of self when using an iterator?
As you can see in the documentation, iter takes &self, that is, whatever you do, you cannot modify self (you can create a modified copy, but that is not the point of what you want to do here).
Instead, you can use the method iter_mut, which is more or less the same, but takes a &mut self, i.e., you can modify it.
Another side remark: you don't want to use all, which checks whether a property holds for all elements (hence the bool it returns); instead, you want for_each, which applies a function to all elements.
fn non_working_solution(&mut self, number: u8) {
    self.vec.iter_mut().for_each(|x| {
        *x = number; // mutation
    })
}
(Playground)
As Stargateur mentioned in the comments, you can also use a for loop:
fn non_working_solution(&mut self, number: u8) {
    for x in self.vec.iter_mut() {
        *x = number
    }
}
Since Rust 1.50, there is a dedicated method for filling a slice with a value — [_]::fill:
self.vec.fill(number)
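Applied to the struct from the question, the whole method body collapses to a single call (the method name here is just illustrative):
fn fill_solution(&mut self, number: u8) {
    self.vec.fill(number)
}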
In this case, fill seems to generate less code than a for loop or for_each:
(Compiler Explorer)
pub fn f(slice: &mut [u8], number: u8) {
    slice.fill(number);
}
pub fn g(slice: &mut [u8], number: u8) {
    for x in slice {
        *x = number;
    }
}
pub fn h(slice: &mut [u8], number: u8) {
    slice
        .iter_mut()
        .for_each(|x| *x = number);
}
example::f:
mov rax, rsi
mov esi, edx
mov rdx, rax
jmp qword ptr [rip + memset@GOTPCREL]
example::g:
test rsi, rsi
je .LBB1_2
push rax
mov rax, rsi
movzx esi, dl
mov rdx, rax
call qword ptr [rip + memset@GOTPCREL]
add rsp, 8
.LBB1_2:
ret
example::h:
test rsi, rsi
je .LBB2_1
mov rax, rsi
movzx esi, dl
mov rdx, rax
jmp qword ptr [rip + memset@GOTPCREL]
.LBB2_1:
ret

rustc generate incorrect assembly when inline

I'm trying to inline some functions, but the assembly code rustc generates is incorrect.
main.rs:
#[derive(Copy, Clone, PartialOrd, PartialEq, Eq)]
pub struct MyType1(usize);
impl MyType1 {
    #[inline(always)]
    pub fn my_func(&self) -> usize { *self / 4096 }
}
impl core::ops::Div<usize> for MyType1 {
    type Output = usize;
    fn div(self, other: usize) -> usize { self.0 / other }
}
pub struct MyType2 {
    pub data1: MyType1,
    pub data2: usize,
}
static STATIC_VAR: MyType2 = MyType2 {
    data1: MyType1(0),
    data2: 0,
};
pub fn main() {
    let my_static_var = unsafe { &mut *(&STATIC_VAR as *const MyType2 as *mut MyType2) };
    my_static_var.data1 = MyType1(0x1a000);
    my_static_var.data2 = my_static_var.data1.my_func();
}
Assembly code of main function:
; var int64_t var_10h @ rsp+0x8
│ ; var int64_t var_8h @ rsp+0x10
│ 0x00004300 4883ec18 sub rsp, 0x18
│ 0x00004304 31c0 xor eax, eax
│ 0x00004306 89c7 mov edi, eax
│ 0x00004308 48c744241000. mov qword [var_8h], 0x1a000 ; [0x1a000:8]=0xd5e9fffffa7b84
│ 0x00004311 488b4c2410 mov rcx, qword [var_8h]
│ 0x00004316 48890d131d02. mov qword [obj.main::STATIC_VAR::h1456afe986ab6f8a], rcx ; [0x26030:8]=0
│ 0x0000431d be00100000 mov esi, 0x1000
│ 0x00004322 e889ffffff call sym <main::MyType1 as core::ops::arith::Div<usize>>::div::he4115301add5ef17 ; sym._main::MyType1_as_core::ops::arith::Div_usize__::div::he4115301add5ef17
│ 0x00004327 4889442408 mov qword [var_10h], rax
│ 0x0000432c 488b442408 mov rax, qword [var_10h]
│ 0x00004331 488905001d02. mov qword [0x00026038], rax ; [0x26038:8]=0
│ 0x00004338 4883c418 add rsp, 0x18
└ 0x0000433c c3 ret
As you can see, main calls the MyType1 Div function with two parameters, 0 and 0x1000, which is not correct. It should be *self / 4096.
build command: rustc main.rs
rustc --version: rustc 1.43.1 (8d69840ab 2020-05-04)
As @Frxstrem and @Stargateur point out, making a mutable reference from an immutable one is undefined behavior and anything can go wrong. So I need to make STATIC_VAR mutable:
static mut STATIC_VAR: MyType2 = MyType2 {
    data1: MyType1(0),
    data2: 0,
};
and change the line:
let my_static_var = unsafe { &mut *(&STATIC_VAR as *const MyType2 as *mut MyType2) };
to:
let my_static_var = unsafe { &mut STATIC_VAR };
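If the goal is simply a global that can be mutated without unsafe, a stable alternative (a sketch, not part of the original answer) is interior mutability via atomics, which sidesteps both static mut and the undefined behavior:
use std::sync::atomic::{AtomicUsize, Ordering};

// Two independent atomics standing in for the fields of MyType2.
static DATA1: AtomicUsize = AtomicUsize::new(0);
static DATA2: AtomicUsize = AtomicUsize::new(0);

pub fn main() {
    DATA1.store(0x1a000, Ordering::Relaxed);
    DATA2.store(DATA1.load(Ordering::Relaxed) / 4096, Ordering::Relaxed);
}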

How to execute raw instructions from a memory buffer in Rust?

I'm attempting to make a buffer of memory executable, then execute it in Rust. I've gotten all the way until I need to cast the raw executable bytes as code/instructions. You can see a working example in C below.
Extra details:
Rust 1.34
Linux
CC 8.2.1
unsigned char code[] = {
    0x55,                         // push %rbp
    0x48, 0x89, 0xe5,             // mov %rsp,%rbp
    0xb8, 0x37, 0x00, 0x00, 0x00, // mov $0x37,%eax
    0xc9,                         // leaveq
    0xc3                          // retq
};
void reflect(const unsigned char *code) {
    void *buf;
    /* copy code to executable buffer */
    buf = mmap(0, sizeof(code), PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0);
    memcpy(buf, code, sizeof(code));
    ((void (*) (void))buf)();
}
extern crate libc;
extern crate mmap;

use mmap::{MapOption, MemoryMap};

unsafe fn reflect(instructions: &[u8]) {
    let map = MemoryMap::new(
        instructions.len(),
        &[
            MapOption::MapAddr(0 as *mut u8),
            MapOption::MapOffset(0),
            MapOption::MapFd(-1),
            MapOption::MapReadable,
            MapOption::MapWritable,
            MapOption::MapExecutable,
            MapOption::MapNonStandardFlags(libc::MAP_ANON),
            MapOption::MapNonStandardFlags(libc::MAP_PRIVATE),
        ],
    )
    .unwrap();
    std::ptr::copy(instructions.as_ptr(), map.data(), instructions.len());
    // How to cast into extern "C" fn() ?
}
Use mem::transmute to cast a raw pointer to a function pointer type.
use std::mem;
let func: unsafe extern "C" fn() = mem::transmute(map.data());
func();
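Putting the pieces together, a standalone version that uses the libc crate directly instead of the mmap crate might look like this (a sketch assuming libc is added as a dependency; error handling kept minimal):
use std::mem;
use std::ptr;

unsafe fn reflect(instructions: &[u8]) -> u64 {
    let size = instructions.len();

    // Map a readable, writable, executable anonymous region.
    let buf = libc::mmap(
        ptr::null_mut(),
        size,
        libc::PROT_READ | libc::PROT_WRITE | libc::PROT_EXEC,
        libc::MAP_PRIVATE | libc::MAP_ANON,
        -1,
        0,
    );
    assert_ne!(buf, libc::MAP_FAILED, "mmap failed");

    // Copy the machine code into the buffer.
    ptr::copy_nonoverlapping(instructions.as_ptr(), buf as *mut u8, size);

    // Cast the buffer to a C function pointer and call it.
    let func: unsafe extern "C" fn() -> u64 = mem::transmute(buf);
    let result = func();

    libc::munmap(buf, size);
    result
}

fn main() {
    // push %rbp; mov %rsp,%rbp; mov $0x37,%eax; leaveq; retq
    let code = [
        0x55, 0x48, 0x89, 0xe5, 0xb8, 0x37, 0x00, 0x00, 0x00, 0xc9, 0xc3,
    ];
    println!("returned 0x{:x}", unsafe { reflect(&code) }); // expect 0x37 on x86-64
}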

Does Rust optimize passing temporary structures by value?

Let's say I have a vector of structures in Rust. Structures are quite big. When I want to insert a new one, I write the code like this:
my_vec.push(MyStruct {field1: value1, field2: value2, ... });
The push definition is
fn push(&mut self, value: T)
which means the value is passed by value. I wonder if Rust creates a temporary object first and then copies it when calling push, or does it optimize the code so that no temporary objects are created and copied?
Let's see. This program:
struct LotsOfBytes {
    bytes: [u8; 1024]
}
#[inline(never)]
fn consume(mut lob: LotsOfBytes) {
}
fn main() {
    let lob = LotsOfBytes { bytes: [0; 1024] };
    consume(lob);
}
Compiles to the following LLVM IR code:
%LotsOfBytes = type { [1024 x i8] }

; Function Attrs: noinline nounwind uwtable
define internal fastcc void @_ZN7consume20hf098deecafa4b74bkaaE(%LotsOfBytes* noalias nocapture dereferenceable(1024)) unnamed_addr #0 {
entry-block:
  %1 = getelementptr inbounds %LotsOfBytes* %0, i64 0, i32 0, i64 0
  tail call void @llvm.lifetime.end(i64 1024, i8* %1)
  ret void
}

; Function Attrs: nounwind uwtable
define internal void @_ZN4main20hf3cbebd3154c5390qaaE() unnamed_addr #2 {
entry-block:
  %lob = alloca %LotsOfBytes, align 8
  %lob1 = getelementptr inbounds %LotsOfBytes* %lob, i64 0, i32 0, i64 0
  %arg = alloca %LotsOfBytes, align 8
  %0 = getelementptr inbounds %LotsOfBytes* %lob, i64 0, i32 0, i64 0
  call void @llvm.lifetime.start(i64 1024, i8* %0)
  call void @llvm.memset.p0i8.i64(i8* %lob1, i8 0, i64 1024, i32 8, i1 false)
  %1 = getelementptr inbounds %LotsOfBytes* %arg, i64 0, i32 0, i64 0
  call void @llvm.lifetime.start(i64 1024, i8* %1)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 1024, i32 8, i1 false)
  call fastcc void @_ZN7consume20hf098deecafa4b74bkaaE(%LotsOfBytes* noalias nocapture dereferenceable(1024) %arg)
  call void @llvm.lifetime.end(i64 1024, i8* %1)
  call void @llvm.lifetime.end(i64 1024, i8* %0)
  ret void
}
This line is interesting in particular:
call fastcc void @_ZN7consume20hf098deecafa4b74bkaaE(%LotsOfBytes* noalias nocapture dereferenceable(1024) %arg)
If I understand correctly, this means that consume is called with a pointer to LotsOfBytes, so yes, rustc optimizes passing big structures by value.
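One rough way to observe this from the outside (a sketch, not part of the original answer; the result depends on optimization level) is to compare the struct's address in main with its address inside the callee:
struct LotsOfBytes {
    bytes: [u8; 1024],
}

#[inline(never)]
fn consume(lob: LotsOfBytes) -> usize {
    // Address of the parameter as the callee sees it.
    &lob as *const LotsOfBytes as usize
}

fn main() {
    let lob = LotsOfBytes { bytes: [0; 1024] };
    let before = &lob as *const LotsOfBytes as usize;
    let after = consume(lob);
    // Equal addresses suggest the value was passed by pointer without a copy;
    // different addresses mean it was first copied into a new stack slot.
    println!("before: {:#x}, after: {:#x}", before, after);
}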
