Mutating elements inside an iterator - rust

I would like to iterate over some elements inside a vector contained as a member in a struct called Test. The idea is to mutate Test independently in each iteration and signify success if some external logic on each mutated Test is successful. For simplicity, the mutation is just changing the vector element to 123u8. The problem I have is not being able to change the elements inside a loop. I have two solutions which I thought would give the same answer:
#[derive(Debug)]
struct Test {
    vec: Vec<u8>,
}

impl Test {
    fn working_solution(&mut self, number: u8) -> bool {
        self.vec[0] = number;
        self.vec[1] = number;
        self.vec[2] = number;
        true
    }

    fn non_working_solution(&mut self, number: u8) -> bool {
        self.vec.iter().all(|mut x| {
            x = &number; // mutation
            true // external logic
        })
    }
}
fn main() {
    let vec = vec![0u8, 1u8, 2u8];
    let mut test = Test { vec };
    println!("Original: {:?}", test);
    test.working_solution(123u8);
    println!("Altered: {:?}", test);

    let vec = vec![0u8, 1u8, 2u8];
    let mut test = Test { vec };
    println!("Original: {:?}", test);
    test.non_working_solution(123u8);
    println!("Altered: {:?}", test);
}
(playground)
Output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [0, 1, 2] }
Expected output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
How do I change a member of self when using an iterator?

As you can see in the documentation, iter takes &self; that is, whatever you do, you cannot modify self (you can create a modified copy, but that is not the point of what you want to do here).
Instead, you can use the method iter_mut, which is more or less the same, but takes &mut self, i.e., it lets you modify the elements.
Another side remark: you don't want to use all, which is used to check whether a property holds for all elements (hence the bool it returns); instead, you want for_each, which applies a function to every element.
fn non_working_solution(&mut self, number: u8) {
    self.vec.iter_mut().for_each(|x| {
        *x = number; // mutation
    })
}
(Playground)
As Stargateur mentioned in the comments, you can also use a for loop:
fn non_working_solution(&mut self, number: u8) {
    for x in self.vec.iter_mut() {
        *x = number
    }
}

Since Rust 1.50, there is a dedicated method, slice::fill, for filling a slice with a value:
self.vec.fill(number)
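Applied to the struct from the question, the whole method collapses to a one-liner (fill_solution is a made-up name for this sketch):

```rust
struct Test {
    vec: Vec<u8>,
}

impl Test {
    // Same mutation as the loop versions above, via slice::fill (Rust 1.50+).
    fn fill_solution(&mut self, number: u8) {
        self.vec.fill(number);
    }
}

fn main() {
    let mut test = Test { vec: vec![0, 1, 2] };
    test.fill_solution(123);
    assert_eq!(test.vec, vec![123, 123, 123]);
    println!("{:?}", test.vec);
}
```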
In this case, fill seems to generate less code than a for loop or for_each:
(Compiler Explorer)
pub fn f(slice: &mut [u8], number: u8) {
    slice.fill(number);
}

pub fn g(slice: &mut [u8], number: u8) {
    for x in slice {
        *x = number;
    }
}

pub fn h(slice: &mut [u8], number: u8) {
    slice
        .iter_mut()
        .for_each(|x| *x = number);
}
example::f:
mov rax, rsi
mov esi, edx
mov rdx, rax
jmp qword ptr [rip + memset@GOTPCREL]
example::g:
test rsi, rsi
je .LBB1_2
push rax
mov rax, rsi
movzx esi, dl
mov rdx, rax
call qword ptr [rip + memset@GOTPCREL]
add rsp, 8
.LBB1_2:
ret
example::h:
test rsi, rsi
je .LBB2_1
mov rax, rsi
movzx esi, dl
mov rdx, rax
jmp qword ptr [rip + memset@GOTPCREL]
.LBB2_1:
ret


Optimize Rust function for certain likely parameter values

Is it possible to ask the compiler to optimize the code if I know that the domain of a certain parameter will likely be among a few select values?
e.g.:

// x will be within 1..10
fn foo(x: u32, y: u32) -> u32 {
    // some logic
}
The above function should be compiled into
fn foo(x: u32, y: u32) -> u32 {
    match x {
        1 => foo1(y), // foo1 is generated by the compiler from foo, optimized for when x == 1
        2 => foo2(y), // foo2 is generated by the compiler from foo, optimized for when x == 2
        // ...
        10 => foo10(y),
        _ => foo_default(x, y), // unoptimized foo logic
    }
}
I would like the compiler to generate the above rewrite based on some hint.
You can put the logic in an #[inline(always)] function foo_impl(), then call it with the values you expect:
// x will be within 1..10
#[inline(always)]
fn foo_impl(x: u32, y: u32) -> u32 {
    // some logic
}

fn foo(x: u32, y: u32) -> u32 {
    match x {
        1 => foo_impl(1, y),
        2 => foo_impl(2, y),
        // ...
        10 => foo_impl(10, y),
        _ => foo_impl(x, y),
    }
}
Because of the #[inline(always)] the compiler will inline all foo_impl() calls then use the constants to optimize the call. Nothing is guaranteed, but it should be pretty reliable (haven't tested though).
Make sure to benchmark: this can actually be a regression due to code bloat.
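One minimal way to sanity-check the dispatch idea is a quick timing loop (the names and the workload below are made up for illustration; a real harness such as Criterion gives far more reliable numbers):

```rust
use std::time::Instant;

#[inline(always)]
fn foo_impl(x: u32, y: u32) -> u32 {
    // Stand-in for "some logic"; invented for this sketch.
    x.wrapping_mul(y).rotate_left(3)
}

fn foo_dispatched(x: u32, y: u32) -> u32 {
    match x {
        1 => foo_impl(1, y),
        2 => foo_impl(2, y),
        _ => foo_impl(x, y),
    }
}

fn main() {
    let start = Instant::now();
    let mut acc = 0u32;
    for i in 0..1_000_000u32 {
        acc = acc.wrapping_add(foo_dispatched(2, i));
    }
    // Print acc so the optimizer cannot discard the loop entirely.
    println!("acc = {}, elapsed = {:?}", acc, start.elapsed());
}
```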
Let's use this toy example:
fn foo(x: u32, y: u32) -> u32 {
    x * y
}
movl %edi, %eax
imull %esi, %eax
retq
But in your application, you know that x is very likely to be 2 every time. We can communicate that to the compiler with std::intrinsics::likely:
#![feature(core_intrinsics)]

fn foo(x: u32, y: u32) -> u32 {
    if std::intrinsics::likely(x == 2) {
        foo_impl(x, y)
    } else {
        foo_impl(x, y)
    }
}

fn foo_impl(x: u32, y: u32) -> u32 {
    x * y
}
leal (%rsi,%rsi), %eax
imull %edi, %esi
cmpl $2, %edi
cmovnel %esi, %eax
retq
DISCLAIMER: I'm not experienced enough to know if this is a good optimization or not, just that the hint changed the output.
Unfortunately, while I think this is the clearest syntax, std::intrinsics is not stabilized. Fortunately, we can get the same behavior on stable using the #[cold] attribute, which conveys your desire to the compiler:
fn foo(x: u32, y: u32) -> u32 {
    if x == 2 {
        foo_impl(x, y)
    } else {
        foo_impl_unlikely(x, y)
    }
}

fn foo_impl(x: u32, y: u32) -> u32 {
    x * y
}

#[cold]
fn foo_impl_unlikely(x: u32, y: u32) -> u32 {
    foo_impl(x, y)
}
leal (%rsi,%rsi), %eax
imull %edi, %esi
cmpl $2, %edi
cmovnel %esi, %eax
retq
I'm skeptical whether applying this to your use-case will actually yield the transformation you propose. I'd think there'd have to be a significant impact on const-propagation and even a willingness from the compiler to optimise x < 10 into a branch of ten constants, but using the hints above will let it decide what is best.
But sometimes you know what is best better than the compiler does, and you can force the const-propagation by applying the transformation manually, as you've done in your original example, or in a different way as in @ChayimFriedman's answer.

If the same string literal appears in code twice, does it appear in the executable once?

Let's say that I created some Rust source code that combines lots of duplicate string literals. Are they de-duplicated during the compilation process?
Yes, Tim McNamara gave a good and concise way of confirming this, but if you want to explore further to see how this works, you can also try getting the assembly code output from Rust (you can try this on Compiler Explorer):
pub fn test() -> (&'static str, &'static str) {
    let a = "Hello";
    let b = "Hello";
    (a, b)
}
Use rustc to get assembly output (--crate-type=lib ensures that unused functions are not cleaned up as "dead code"):
rustc src/main.rs -o output.s --emit asm --crate-type=lib
And in the assembly output, you should see something like this (the output can differ based on a number of factors):
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 11, 0
.globl __ZN4main4test17h1a94a89cb89e6ba1E
.p2align 2
__ZN4main4test17h1a94a89cb89e6ba1E:
.cfi_startproc
mov x9, x8
adrp x10, l___unnamed_1@PAGE
add x10, x10, l___unnamed_1@PAGEOFF
mov x8, x10
str x8, [x9]
mov w8, #5
str x8, [x9, #8]
str x10, [x9, #16]
str x8, [x9, #24]
ret
.cfi_endproc
.section __TEXT,__const
l___unnamed_1:
.ascii "Hello"
.subsections_via_symbols
There is a single label l___unnamed_1 which contains the string literal Hello and it is used twice.
Yes! If you create the following program, which prints the memory address of two variables, you'll see that they print the same value. That is, both a and b refer to the same underlying data.
fn main() {
    let a = "Hello";
    let b = "Hello";
    println!("{:p} {:p}", a, b);
}
To try this out yourself, you can run the program within the Rust playground. Here's one example output:
0x55b17e61905b 0x55b17e61905b
It's possible to take this idea even further. Let's experiment by scattering the same literal in different functions and modules.
static GREETING: &'static str = "Hello";

#[inline(never)]
fn f1() {
    let f1_greeting = "Hello";
    println!("{:p}", f1_greeting);
}

#[inline(never)]
fn f2() {
    let f2_greeting = "Hello";
    println!("{:p}", f2_greeting);
}

mod submodule {
    pub fn f3() {
        let f3_greeting = "Hello";
        println!("{:p}", f3_greeting);
    }
}

fn main() {
    let a = "Hello";
    let b = "Hello";
    println!("{:p}", GREETING);
    println!("{:p}", a);
    println!("{:p}", b);
    f1();
    f2();
    submodule::f3();
}
You'll see that the outcome is the same: only one copy of the literal is loaded into memory.
0x55b17e61905b
0x55b17e61905b
0x55b17e61905b
0x55b17e61905b
0x55b17e61905b
0x55b17e61905b
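To make the comparison explicit rather than eyeballing printed addresses, you can assert pointer equality directly. A sketch; keep in mind that literal deduplication is an implementation detail of rustc, not a language guarantee:

```rust
fn main() {
    let a = "Hello";
    let b = "Hello";
    let c = "World";

    // Identical literals are deduplicated, so the data pointers match.
    assert!(std::ptr::eq(a.as_ptr(), b.as_ptr()));
    // A different literal lives at a different address.
    assert!(!std::ptr::eq(a.as_ptr(), c.as_ptr()));

    println!("literals deduplicated");
}
```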

Procedural macro for retrieving data from a nested struct by index

I am trying to write a Rust derive macro for retrieving data from a nested struct by index. The struct only contains primitive types (u8, i8, u16, i16, u32, i32, u64, i64) or other structs thereof. I have an enum which encapsulates the leaf field data in a common type, which I call Item. I want the macro to create a .get() implementation which returns an Item based on a u16 index.
Here is the desired behavior.
#[derive(Debug, PartialEq, PartialOrd, Copy, Clone)]
pub enum Item {
    U8(u8),
    I8(i8),
    U16(u16),
    I16(i16),
    U32(u32),
    I32(i32),
    U64(u64),
    I64(i64),
}

struct NestedData {
    a: u16,
    b: i32,
}

#[derive(GetItem)]
struct Data {
    a: i32,
    b: u64,
    c: NestedData,
}

let data = Data {
    a: 42,
    b: 1000,
    c: NestedData { a: 500, b: -2 },
};

assert_eq!(data.get(0).unwrap(), Item::I32(42));
assert_eq!(data.get(1).unwrap(), Item::U64(1000));
assert_eq!(data.get(2).unwrap(), Item::U16(500));
assert_eq!(data.get(3).unwrap(), Item::I32(-2));
For this particular example, I want the macro to expand to the following...
impl Data {
    pub fn get(&self, index: u16) -> Result<Item, Error> {
        match index {
            0 => Ok(Item::I32(self.a)),
            1 => Ok(Item::U64(self.b)),
            2 => Ok(Item::U16(self.c.a)),
            3 => Ok(Item::I32(self.c.b)),
            _ => Err(Error::BadIndex),
        }
    }
}
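One thing to note: the snippets assume an Error type that the question never shows. A minimal definition consistent with the Error::BadIndex arm might be:

```rust
// Hypothetical Error type; only the BadIndex variant is implied
// by the desired expansion above.
#[derive(Debug, PartialEq)]
pub enum Error {
    BadIndex,
}

fn main() {
    // An out-of-range index would produce this value.
    assert_eq!(Err::<(), Error>(Error::BadIndex), Err(Error::BadIndex));
}
```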
I have a working macro for a single layer struct, but I am not sure about how to modify it to support nested structs. Here is where I am at...
use proc_macro2::TokenStream;
use quote::quote;
use syn::{Data, DataStruct, DeriveInput, Fields, Type, TypePath};

pub fn impl_get_item(input: DeriveInput) -> syn::Result<TokenStream> {
    let model_name = input.ident;

    let fields = match input.data {
        Data::Struct(DataStruct {
            fields: Fields::Named(fields),
            ..
        }) => fields.named,
        _ => panic!("The GetItem derive can only be applied to structs"),
    };

    let mut matches = TokenStream::new();
    let mut item_index: u16 = 0;

    for field in fields {
        let item_name = field.ident;
        let item_type = field.ty;
        let ts = match item_type {
            Type::Path(TypePath { path, .. }) if path.is_ident("u8") => {
                quote! {#item_index => Ok(Item::U8(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("i8") => {
                quote! {#item_index => Ok(Item::I8(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("u16") => {
                quote! {#item_index => Ok(Item::U16(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("i16") => {
                quote! {#item_index => Ok(Item::I16(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("u32") => {
                quote! {#item_index => Ok(Item::U32(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("i32") => {
                quote! {#item_index => Ok(Item::I32(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("u64") => {
                quote! {#item_index => Ok(Item::U64(self.#item_name)),}
            }
            Type::Path(TypePath { path, .. }) if path.is_ident("i64") => {
                quote! {#item_index => Ok(Item::I64(self.#item_name)),}
            }
            // Note: Debug for syn types requires syn's "extra-traits" feature.
            _ => panic!("{:?} uses unsupported type {:?}", item_name, item_type),
        };
        matches.extend(ts);
        item_index += 1;
    }

    let output = quote! {
        #[automatically_derived]
        impl #model_name {
            pub fn get(&self, index: u16) -> Result<Item, Error> {
                match index {
                    #matches
                    _ => Err(Error::BadIndex),
                }
            }
        }
    };

    Ok(output)
}
I'm not going to give a complete answer as my proc-macro skills are non-existent, but I don't think the macro part is tricky once you've got the structure right.
The way I'd approach this is to define a trait that all the types will use. I'm going to call this Indexible which is probably bad. The point of the trait is to provide the get function and a count of all fields contained within this object.
// Result here is shorthand for std::result::Result<T, ()>.
type Result<T> = std::result::Result<T, ()>;

trait Indexible {
    fn nfields(&self) -> usize;
    fn get(&self, idx: usize) -> Result<Item>;
}
I'm using fn nfields(&self) -> usize rather than fn nfields() -> usize as taking &self means I can use this on vectors and slices and probably some other types (It also makes the following code slightly neater).
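For instance, here is a sketch of what taking &self buys you: a blanket impl for Vec. The Item and Indexible definitions are trimmed down to one variant to keep the example self-contained:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Item {
    U8(u8),
}

type Result<T> = std::result::Result<T, ()>;

trait Indexible {
    fn nfields(&self) -> usize;
    fn get(&self, idx: usize) -> Result<Item>;
}

impl Indexible for u8 {
    fn nfields(&self) -> usize { 1 }
    fn get(&self, _idx: usize) -> Result<Item> { Ok(Item::U8(*self)) }
}

// Because nfields takes &self, a Vec can report the total field count
// of its elements and route an index to the right element.
impl<T: Indexible> Indexible for Vec<T> {
    fn nfields(&self) -> usize {
        self.iter().map(|e| e.nfields()).sum()
    }
    fn get(&self, mut idx: usize) -> Result<Item> {
        for e in self {
            if idx < e.nfields() {
                return e.get(idx);
            }
            idx -= e.nfields();
        }
        Err(())
    }
}

fn main() {
    let v: Vec<u8> = vec![10, 20, 30];
    assert_eq!(v.nfields(), 3);
    // Fully qualified call: slices have an inherent get() that would
    // otherwise shadow the trait method.
    assert_eq!(Indexible::get(&v, 2), Ok(Item::U8(30)));
    println!("ok");
}
```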
Next you need to implement this trait for your base types:
impl Indexible for u8 {
    fn nfields(&self) -> usize { 1 }
    fn get(&self, idx: usize) -> Result<Item> { Ok(Item::U8(*self)) }
}
...
Generating all these is probably a good use for a macro (though not the proc macro you're talking about).
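A plain macro_rules! can stamp out the per-primitive impls. A sketch, assuming each primitive maps to an Item variant of the same name; the definitions are trimmed to three types to stay self-contained:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Item {
    U8(u8),
    I8(i8),
    U16(u16),
}

type Result<T> = std::result::Result<T, ()>;

trait Indexible {
    fn nfields(&self) -> usize;
    fn get(&self, idx: usize) -> Result<Item>;
}

// One (type => variant) pair per macro argument expands to a full impl.
macro_rules! impl_indexible {
    ($($ty:ty => $variant:ident),* $(,)?) => {
        $(
            impl Indexible for $ty {
                fn nfields(&self) -> usize { 1 }
                fn get(&self, _idx: usize) -> Result<Item> {
                    Ok(Item::$variant(*self))
                }
            }
        )*
    };
}

impl_indexible!(u8 => U8, i8 => I8, u16 => U16);

fn main() {
    assert_eq!(5u8.get(0), Ok(Item::U8(5)));
    assert_eq!(500u16.get(0), Ok(Item::U16(500)));
    println!("ok");
}
```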
Next, you need to generate these for your desired types: My implementations look like this:
impl Indexible for NestedData {
    fn nfields(&self) -> usize {
        self.a.nfields() +
        self.b.nfields()
    }

    fn get(&self, idx: usize) -> Result<Item> {
        let idx = idx;
        // member a
        if idx < self.a.nfields() {
            return self.a.get(idx);
        }
        let idx = idx - self.a.nfields();
        // member b
        if idx < self.b.nfields() {
            return self.b.get(idx);
        }
        Err(())
    }
}

impl Indexible for Data {
    fn nfields(&self) -> usize {
        self.a.nfields() +
        self.b.nfields() +
        self.c.nfields()
    }

    fn get(&self, idx: usize) -> Result<Item> {
        let idx = idx;
        if idx < self.a.nfields() {
            return self.a.get(idx);
        }
        let idx = idx - self.a.nfields();
        if idx < self.b.nfields() {
            return self.b.get(idx);
        }
        let idx = idx - self.b.nfields();
        if idx < self.c.nfields() {
            return self.c.get(idx);
        }
        Err(())
    }
}
You can see a complete running version in the playground.
These look like they can be easily generated by a macro.
If you want slightly better error messages on types that won't work, you should explicitly treat each member as an Indexible like this: (self.a as Indexible).get(..).
It might seem that this is not going to be particularly efficient, but the compiler is able to determine that most of these pieces are constant and inline them. For example, using Rust 1.51 with -C opt-level=3, the following function
pub fn sum(data: &Data) -> usize {
    let mut sum = 0;
    for i in 0..data.nfields() {
        sum += match data.get(i) {
            Err(_) => panic!(),
            Ok(Item::U8(v)) => v as usize,
            Ok(Item::U16(v)) => v as usize,
            Ok(Item::I32(v)) => v as usize,
            Ok(Item::U64(v)) => v as usize,
            _ => panic!(),
        };
    }
    sum
}
compiles to just this
example::sum:
movsxd rax, dword ptr [rdi + 8]
movsxd rcx, dword ptr [rdi + 12]
movzx edx, word ptr [rdi + 16]
add rax, qword ptr [rdi]
add rax, rdx
add rax, rcx
ret
You can see this in the compiler explorer

rustc generates incorrect assembly when inlining

I'm trying to inline some functions, but the assembly code rustc generates is incorrect.
main.rs:
#[derive(Copy, Clone, PartialOrd, PartialEq, Eq)]
pub struct MyType1(usize);

impl MyType1 {
    #[inline(always)]
    pub fn my_func(&self) -> usize { *self / 4096 }
}

impl core::ops::Div<usize> for MyType1 {
    type Output = usize;
    fn div(self, other: usize) -> usize { self.0 / other }
}

pub struct MyType2 {
    pub data1: MyType1,
    pub data2: usize,
}

static STATIC_VAR: MyType2 = MyType2 {
    data1: MyType1(0),
    data2: 0,
};

pub fn main() {
    let my_static_var = unsafe { &mut *(&STATIC_VAR as *const MyType2 as *mut MyType2) };
    my_static_var.data1 = MyType1(0x1a000);
    my_static_var.data2 = my_static_var.data1.my_func();
}
Assembly code of main function:
; var int64_t var_10h # rsp+0x8
│ ; var int64_t var_8h # rsp+0x10
│ 0x00004300 4883ec18 sub rsp, 0x18
│ 0x00004304 31c0 xor eax, eax
│ 0x00004306 89c7 mov edi, eax
│ 0x00004308 48c744241000. mov qword [var_8h], 0x1a000 ; [0x1a000:8]=0xd5e9fffffa7b84
│ 0x00004311 488b4c2410 mov rcx, qword [var_8h]
│ 0x00004316 48890d131d02. mov qword [obj.main::STATIC_VAR::h1456afe986ab6f8a], rcx ; [0x26030:8]=0
│ 0x0000431d be00100000 mov esi, 0x1000
│ 0x00004322 e889ffffff call sym <main::MyType1 as core::ops::arith::Div<usize>>::div::he4115301add5ef17 ; sym._main::MyType1_as_core::ops::arith::Div_usize__::div::he4115301add5ef17
│ 0x00004327 4889442408 mov qword [var_10h], rax
│ 0x0000432c 488b442408 mov rax, qword [var_10h]
│ 0x00004331 488905001d02. mov qword [0x00026038], rax ; [0x26038:8]=0
│ 0x00004338 4883c418 add rsp, 0x18
└ 0x0000433c c3 ret
As you can see, main calls MyType1's Div implementation with the two parameters 0 and 0x1000, which is not correct: it should compute *self / 4096 with the freshly written value 0x1a000.
build command: rustc main.rs
rustc --version: rustc 1.43.1 (8d69840ab 2020-05-04)
As @Frxstrem and @Stargateur point out, making a mutable reference from an immutable one is undefined behavior, and anything can go wrong. So I need to make STATIC_VAR mutable:
static mut STATIC_VAR: MyType2 = MyType2 {
    data1: MyType1(0),
    data2: 0,
};
and change the line:
let my_static_var = unsafe { &mut *(&STATIC_VAR as *const MyType2 as *mut MyType2) };
to:
let my_static_var = unsafe { &mut STATIC_VAR };

Is there a way to init a non-trivial static std::collections::HashMap without making it static mut?

In this code, A does not need to be static mut, but the compiler forces B to be static mut:
use std::collections::HashMap;
use std::iter::FromIterator;

static A: [u32; 21] = [
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
];
static mut B: Option<HashMap<u32, String>> = None;

fn init_tables() {
    let hm = HashMap::<u32, String>::from_iter(A.iter().map(|&i| (i, (i + 10u32).to_string())));
    unsafe {
        B = Some(hm);
    }
}

fn main() {
    init_tables();
    println!("{:?} len: {}", A, A.len());
    unsafe {
        println!("{:?}", B);
    }
}
This is the only way I have found to get close to what I actually want: a global, immutable HashMap to be used by several functions, without littering all my code with unsafe blocks.
I know that a global variable is a bad idea for multi-threaded applications, but mine is single threaded, so why should I pay the price for an eventuality which will never arise?
Since I use rustc directly and not cargo, I don't want the "help" of extern crates like lazy_static. I tried to decipher what the macro in that package does, but without success.
I also tried to write this with thread_local() and a RefCell but I had trouble using A to initialize B with that version.
In more general terms, the question could be "How to get stuff into the initvars section of a program in Rust?"
If you can show me how to initialize B directly (without a function like init_tables()), your answer is probably right.
If a function like init_tables() is inevitable, is there a trick like an accessor function to reduce the unsafe litter in my program?
How to get stuff into the initvars section of a program in Rust?
Turns out rustc puts static data in .rodata section and static mut data in .data section of the generated binary:
#[no_mangle]
static DATA: std::ops::Range<u32> = 0..20;

fn main() { DATA.len(); }
$ rustc static.rs
$ objdump -t -j .rodata static
static: file format elf64-x86-64
SYMBOL TABLE:
0000000000025000 l d .rodata 0000000000000000 .rodata
0000000000025490 l O .rodata 0000000000000039 str.0
0000000000026a70 l O .rodata 0000000000000400 elf_crc32.crc32_table
0000000000026870 l O .rodata 0000000000000200 elf_zlib_default_dist_table
0000000000026590 l O .rodata 00000000000002e0 elf_zlib_default_table
0000000000025060 g O .rodata 0000000000000008 DATA
0000000000027f2c g O .rodata 0000000000000100 _ZN4core3str15UTF8_CHAR_WIDTH17h6f9f810be98aa5f2E
So changing from static mut to static at the source code level significantly changes the generated binary. The .rodata section is read-only, and trying to write to it will segfault the program.
If init_tables() is of the judgement day category (inevitable)
It is probably inevitable. Since the default .rodata linkage won't work, one has to control it directly:
use std::collections::HashMap;
use std::iter::FromIterator;

static A: std::ops::Range<u32> = 0..20;

#[link_section = ".bss"]
static B: Option<HashMap<u32, String>> = None;

fn init_tables() {
    let data = HashMap::from_iter(A.clone().map(|i| (i, (i + 10).to_string())));
    unsafe {
        let b: *mut Option<HashMap<u32, String>> = &B as *const _ as *mut _;
        (&mut *b).replace(data);
    }
}

fn main() {
    init_tables();
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", B, B.as_ref().unwrap().get(&5));
}
I don't want the "help" of extern crates like lazy_static
Actually lazy_static isn't that complicated. It has some clever use of the Deref trait. Here is a much simplified standalone version and it is more ergonomically friendly than the first example:
use std::collections::HashMap;
use std::iter::FromIterator;
use std::ops::Deref;
use std::sync::Once;

static A: std::ops::Range<u32> = 0..20;
static B: BImpl = BImpl;

struct BImpl;

impl Deref for BImpl {
    type Target = HashMap<u32, String>;

    #[inline(always)]
    fn deref(&self) -> &Self::Target {
        static LAZY: (Option<HashMap<u32, String>>, Once) = (None, Once::new());
        LAZY.1.call_once(|| unsafe {
            let x: *mut Option<Self::Target> = &LAZY.0 as *const _ as *mut _;
            (&mut *x).replace(init_tables());
        });
        LAZY.0.as_ref().unwrap()
    }
}

fn init_tables() -> HashMap<u32, String> {
    HashMap::from_iter(A.clone().map(|i| (i, (i + 10).to_string())))
}

fn main() {
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", *B, B.get(&5));
}
