Let's say I have a vector of structures in Rust. Structures are quite big. When I want to insert a new one, I write the code like this:
my_vec.push(MyStruct {field1: value1, field2: value2, ... });
The push definition is
fn push(&mut self, value: T)
which means the value is passed by value. I wonder: does Rust create a temporary object first and then copy it into push, or does it optimize the code so that no temporary object is created and copied?
Let's see. This program:
struct LotsOfBytes {
    bytes: [u8; 1024]
}

#[inline(never)]
fn consume(mut lob: LotsOfBytes) {
}

fn main() {
    let lob = LotsOfBytes { bytes: [0; 1024] };
    consume(lob);
}
Compiles to the following LLVM IR code:
%LotsOfBytes = type { [1024 x i8] }

; Function Attrs: noinline nounwind uwtable
define internal fastcc void @_ZN7consume20hf098deecafa4b74bkaaE(%LotsOfBytes* noalias nocapture dereferenceable(1024)) unnamed_addr #0 {
entry-block:
  %1 = getelementptr inbounds %LotsOfBytes* %0, i64 0, i32 0, i64 0
  tail call void @llvm.lifetime.end(i64 1024, i8* %1)
  ret void
}
; Function Attrs: nounwind uwtable
define internal void @_ZN4main20hf3cbebd3154c5390qaaE() unnamed_addr #2 {
entry-block:
  %lob = alloca %LotsOfBytes, align 8
  %lob1 = getelementptr inbounds %LotsOfBytes* %lob, i64 0, i32 0, i64 0
  %arg = alloca %LotsOfBytes, align 8
  %0 = getelementptr inbounds %LotsOfBytes* %lob, i64 0, i32 0, i64 0
  call void @llvm.lifetime.start(i64 1024, i8* %0)
  call void @llvm.memset.p0i8.i64(i8* %lob1, i8 0, i64 1024, i32 8, i1 false)
  %1 = getelementptr inbounds %LotsOfBytes* %arg, i64 0, i32 0, i64 0
  call void @llvm.lifetime.start(i64 1024, i8* %1)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 1024, i32 8, i1 false)
  call fastcc void @_ZN7consume20hf098deecafa4b74bkaaE(%LotsOfBytes* noalias nocapture dereferenceable(1024) %arg)
  call void @llvm.lifetime.end(i64 1024, i8* %1)
  call void @llvm.lifetime.end(i64 1024, i8* %0)
  ret void
}
This line is interesting in particular:
call fastcc void @_ZN7consume20hf098deecafa4b74bkaaE(%LotsOfBytes* noalias nocapture dereferenceable(1024) %arg)
If I understand correctly, this means that consume is called with a pointer to LotsOfBytes rather than receiving a 1024-byte copy on its own frame, so yes, rustc passes big structures by pointer under the hood. (Note that main still performs a memcpy into the temporary %arg at this stage; further optimization passes can often elide that copy as well.)
In GCC, we can use asm volatile("":::"memory"); as a compiler barrier, but I can't find an option similar to the "memory" clobber in the documentation of Rust's inline asm.
Is there any way to do that?
In Rust, memory clobbering is the default; you use options(nomem) to opt out of it.
For example:
pub unsafe fn no_nomem() {
    std::arch::asm!("");
}

pub unsafe fn nomem() {
    std::arch::asm!("", options(nomem));
}
LLVM IR:
define void @_ZN7example8no_nomem17h95b023e6c43118daE() unnamed_addr #0 !dbg !5 {
  call void asm sideeffect alignstack inteldialect "", "~{dirflag},~{fpsr},~{flags},~{memory}"(), !dbg !10, !srcloc !11
  br label %bb1, !dbg !10

bb1:                                              ; preds = %start
  ret void, !dbg !12
}

define void @_ZN7example5nomem17hc75cf2d808290004E() unnamed_addr #0 !dbg !13 {
  call void asm sideeffect alignstack inteldialect "", "~{dirflag},~{fpsr},~{flags}"() #1, !dbg !14, !srcloc !15
  br label %bb1, !dbg !14

bb1:                                              ; preds = %start
  ret void, !dbg !16
}
The function without nomem emits a ~{memory} barrier.
I would like to iterate over some elements inside a vector contained as a member in a struct called Test. The idea is to mutate Test independently in each iteration and signify success if some external logic on each mutated Test is successful. For simplicity, the mutation is just changing each vector element to 123u8. The problem I have is not being able to change the elements inside a loop. I have two solutions which I thought would give the same answer:
#[derive(Debug)]
struct Test {
    vec: Vec<u8>,
}

impl Test {
    fn working_solution(&mut self, number: u8) -> bool {
        self.vec[0] = number;
        self.vec[1] = number;
        self.vec[2] = number;
        true
    }

    fn non_working_solution(&mut self, number: u8) -> bool {
        self.vec.iter().all(|mut x| {
            x = &number; // mutation
            true // external logic
        })
    }
}
fn main() {
    let vec = vec![0u8, 1u8, 2u8];
    let mut test = Test { vec };
    println!("Original: {:?}", test);
    test.working_solution(123u8);
    println!("Altered: {:?}", test);

    let vec = vec![0u8, 1u8, 2u8];
    let mut test = Test { vec };
    println!("Original: {:?}", test);
    test.non_working_solution(123u8);
    println!("Altered: {:?}", test);
}
(playground)
Output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [0, 1, 2] }
Expected output:
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
Original: Test { vec: [0, 1, 2] }
Altered: Test { vec: [123, 123, 123] }
How do I change a member of self when using an iterator?
As you can see in the documentation, iter takes &self, that is, whatever you do, you cannot modify self through it (you can create a modified copy, but that is not the point of what you want to do here).
Instead, you can use the method iter_mut, which is more or less the same but takes &mut self, i.e., it lets you modify the elements.
Another side remark: you don't want all, which is used to check whether a property is true for all elements (hence the bool returned); instead, you want for_each, which applies a function to every element.
fn non_working_solution(&mut self, number: u8) {
    self.vec.iter_mut().for_each(|x| {
        *x = number; // mutation
    })
}
(Playground)
As Stargateur mentioned in the comments, you can also use a for loop:
fn non_working_solution(&mut self, number: u8) {
    for x in self.vec.iter_mut() {
        *x = number
    }
}
Since Rust 1.50, there is a dedicated method for filling a slice with a value — [_]::fill:
self.vec.fill(number)
In this case, fill seems to generate less code than a for loop or for_each:
(Compiler Explorer)
pub fn f(slice: &mut [u8], number: u8) {
    slice.fill(number);
}

pub fn g(slice: &mut [u8], number: u8) {
    for x in slice {
        *x = number;
    }
}

pub fn h(slice: &mut [u8], number: u8) {
    slice.iter_mut().for_each(|x| *x = number);
}
example::f:
        mov     rax, rsi
        mov     esi, edx
        mov     rdx, rax
        jmp     qword ptr [rip + memset@GOTPCREL]

example::g:
        test    rsi, rsi
        je      .LBB1_2
        push    rax
        mov     rax, rsi
        movzx   esi, dl
        mov     rdx, rax
        call    qword ptr [rip + memset@GOTPCREL]
        add     rsp, 8
.LBB1_2:
        ret

example::h:
        test    rsi, rsi
        je      .LBB2_1
        mov     rax, rsi
        movzx   esi, dl
        mov     rdx, rax
        jmp     qword ptr [rip + memset@GOTPCREL]
.LBB2_1:
        ret
I'm trying to declare a global variable whose type is a struct with a function pointer and a char pointer element, { i64 ()*, i8* }, and then set the fields to null during main, but I'm getting an assertion error using a debug version of LLVM.
/media/work/contrib/llvm-project/llvm/lib/IR/ConstantsContext.h:745: void llvm::ConstantUniqueMap<ConstantClass>::remove(ConstantClass*) [with ConstantClass = llvm::ConstantExpr]: Assertion `I != Map.end() && "Constant not found in constant table!"' failed.
I believe this problem is causing another issue during optimization when compiling something a bit more complicated. The error itself occurs when disposing of the module at the end. A distilled runnable example in rust is:
use std::ffi::CString;
extern crate llvm_sys;
pub use self::llvm_sys::prelude::{ LLVMValueRef };
use self::llvm_sys::*;
use self::llvm_sys::prelude::*;
use self::llvm_sys::core::*;
use self::llvm_sys::target::*;
use self::llvm_sys::target_machine::*;
use self::llvm_sys::transforms::pass_manager_builder::*;
fn main() {
    unsafe {
        let context = LLVMContextCreate();
        let module = LLVMModuleCreateWithName(cstr("module"));

        build(module, context);
        println!("{}", emit_module(module));

        LLVMDisposeModule(module);
        LLVMContextDispose(context);
    }
}

pub fn cstr(string: &str) -> *mut i8 {
    CString::new(string).unwrap().into_raw()
}

pub unsafe fn build(module: LLVMModuleRef, context: LLVMContextRef) {
    let builder = LLVMCreateBuilderInContext(context);

    let mut argtypes = vec!();
    let func_type = LLVMFunctionType(LLVMInt64TypeInContext(context), argtypes.as_mut_ptr(), argtypes.len() as u32, false as i32);
    let fptr_type = LLVMPointerType(func_type, 0);
    let context_type = LLVMPointerType(LLVMInt8TypeInContext(context), 0);
    let mut structfields = vec!(fptr_type, context_type);
    let struct_type = LLVMStructType(structfields.as_mut_ptr(), structfields.len() as u32, false as i32);

    let initializer = LLVMConstNull(struct_type);
    let global = LLVMAddGlobal(module, struct_type, cstr("function"));
    LLVMSetInitializer(global, initializer);
    LLVMSetLinkage(global, LLVMLinkage::LLVMExternalLinkage);

    let mut argtypes = vec!();
    let main_type = LLVMFunctionType(LLVMInt64TypeInContext(context), argtypes.as_mut_ptr(), argtypes.len() as u32, false as i32);
    let function = LLVMAddFunction(module, cstr("main"), main_type);
    let bb = LLVMAppendBasicBlockInContext(context, function, cstr("entry"));
    LLVMPositionBuilderAtEnd(builder, bb);

    let mut indices = vec!(LLVMConstInt(LLVMInt32TypeInContext(context), 0, 0), LLVMConstInt(LLVMInt32TypeInContext(context), 0, 0));
    let field = LLVMBuildGEP(builder, global, indices.as_mut_ptr(), indices.len() as u32, cstr(""));
    LLVMBuildStore(builder, LLVMConstNull(fptr_type), field);

    LLVMBuildRet(builder, LLVMConstInt(LLVMInt64TypeInContext(context), 0, 0));
    LLVMDisposeBuilder(builder);
}

pub fn emit_module(module: LLVMModuleRef) -> String {
    unsafe { CString::from_raw(LLVMPrintModuleToString(module)).into_string().unwrap() }
}
The full output is:
; ModuleID = 'module'
source_filename = "module"

@function = global { i64 ()*, i8* } zeroinitializer

define i64 @main() {
entry:
  store i64 ()* null, i64 ()** getelementptr inbounds ({ i64 ()*, i8* }, { i64 ()*, i8* }* @function, i32 0, i32 0), align 8
  ret i64 0
}

/media/work/contrib/llvm-project/llvm/lib/IR/ConstantsContext.h:745: void llvm::ConstantUniqueMap<ConstantClass>::remove(ConstantClass*) [with ConstantClass = llvm::ConstantExpr]: Assertion `I != Map.end() && "Constant not found in constant table!"' failed.
Aborted (core dumped)
Any help or suggestions would be most appreciated. Thanks
I finally figured out what the issue is. In the example above, the function LLVMStructType() has an alternate version, LLVMStructTypeInContext(), which, if used instead, fixes the assertion error. There is also another function, LLVMModuleCreateWithNameInContext, which should probably be used as well, although the example above works without changing it.
It's also possible to replace all the InContext versions with their non-context counterparts to fix the problem in the example.
In my actual program, removing the InContext versions didn't work for some reason, but the two non-context functions mentioned above were being used, as well as a couple of uses of the non-context LLVMAppendBasicBlock. Replacing these with their InContext versions fixed all the assertion errors, including the one that started this:
void llvm::Value::doRAUW(llvm::Value*, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
The type pointers weren't the same, so it must have been because they were defined in different contexts.
In this code, A does not need to be static mut, but the compiler forces B to be static mut:
use std::collections::HashMap;
use std::iter::FromIterator;
static A: [u32; 21] = [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
];
static mut B: Option<HashMap<u32, String>> = None;
fn init_tables() {
    let hm = HashMap::<u32, String>::from_iter(A.iter().map(|&i| (i, (i + 10u32).to_string())));
    unsafe {
        B = Some(hm);
    }
}

fn main() {
    init_tables();
    println!("{:?} len: {}", A, A.len());
    unsafe {
        println!("{:?}", B);
    }
}
This is the only way I have found to get close to what I actually want: a global, immutable HashMap to be used by several functions, without littering all my code with unsafe blocks.
I know that a global variable is a bad idea for multi-threaded applications, but mine is single threaded, so why should I pay the price for an eventuality which will never arise?
Since I use rustc directly and not cargo, I don't want the "help" of extern crates like lazy_static. I tried to decipher what the macro in that package does, but to no avail.
I also tried to write this with thread_local() and a RefCell but I had trouble using A to initialize B with that version.
In more general terms, the question could be "How to get stuff into the initvars section of a program in Rust?"
If you can show me how to initialize B directly (without a function like init_tables()), your answer is probably right.
If a function like init_tables() is inevitable, is there a trick like an accessor function to reduce the unsafe litter in my program?
How to get stuff into the initvars section of a program in Rust?
Turns out rustc puts static data in the .rodata section and static mut data in the .data section of the generated binary:
#[no_mangle]
static DATA: std::ops::Range<u32> = 0..20;
fn main() { DATA.len(); }
$ rustc static.rs
$ objdump -t -j .rodata static
static: file format elf64-x86-64
SYMBOL TABLE:
0000000000025000 l d .rodata 0000000000000000 .rodata
0000000000025490 l O .rodata 0000000000000039 str.0
0000000000026a70 l O .rodata 0000000000000400 elf_crc32.crc32_table
0000000000026870 l O .rodata 0000000000000200 elf_zlib_default_dist_table
0000000000026590 l O .rodata 00000000000002e0 elf_zlib_default_table
0000000000025060 g O .rodata 0000000000000008 DATA
0000000000027f2c g O .rodata 0000000000000100 _ZN4core3str15UTF8_CHAR_WIDTH17h6f9f810be98aa5f2E
So changing from static mut to static at the source code level significantly changes the generated binary. The .rodata section is read-only, and trying to write to it will segfault the program.
If init_tables() is of the judgement day category (inevitable)
It is probably inevitable. Since the default .rodata linkage won't work, one has to control it directly:
use std::collections::HashMap;
use std::iter::FromIterator;
static A: std::ops::Range<u32> = 0..20;
#[link_section = ".bss"]
static B: Option<HashMap<u32, String>> = None;
fn init_tables() {
    let data = HashMap::from_iter(A.clone().map(|i| (i, (i + 10).to_string())));
    unsafe {
        let b: *mut Option<HashMap<u32, String>> = &B as *const _ as *mut _;
        (&mut *b).replace(data);
    }
}

fn main() {
    init_tables();
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", B, B.as_ref().unwrap().get(&5));
}
I don't want the "help" of extern crates like lazy_static
Actually, lazy_static isn't that complicated. It makes some clever use of the Deref trait. Here is a much-simplified standalone version, and it is more ergonomic than the first example:
use std::collections::HashMap;
use std::iter::FromIterator;
use std::ops::Deref;
use std::sync::Once;
static A: std::ops::Range<u32> = 0..20;
static B: BImpl = BImpl;
struct BImpl;
impl Deref for BImpl {
    type Target = HashMap<u32, String>;

    #[inline(always)]
    fn deref(&self) -> &Self::Target {
        static LAZY: (Option<HashMap<u32, String>>, Once) = (None, Once::new());
        LAZY.1.call_once(|| unsafe {
            let x: *mut Option<Self::Target> = &LAZY.0 as *const _ as *mut _;
            (&mut *x).replace(init_tables());
        });
        LAZY.0.as_ref().unwrap()
    }
}

fn init_tables() -> HashMap<u32, String> {
    HashMap::from_iter(A.clone().map(|i| (i, (i + 10).to_string())))
}

fn main() {
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", *B, B.get(&5));
}
When writing relatively realtime code, heap allocations in the main execution loop are generally avoided. So in my experience you allocate all the memory your program needs in an initialization step, and then pass the memory around as needed. A toy example in C might look something like the following:
#include <stdlib.h>
#define LEN 100

void not_realtime() {
    int *v = malloc(LEN * sizeof *v);
    for (int i = 0; i < LEN; i++) {
        v[i] = 1;
    }
    free(v);
}

void realtime(int *v, int len) {
    for (int i = 0; i < len; i++) {
        v[i] = 1;
    }
}

int main(int argc, char **argv) {
    not_realtime();
    int *v = malloc(LEN * sizeof *v);
    realtime(v, LEN);
    free(v);
}
And I believe roughly the equivalent in Rust:
fn possibly_realtime() {
    let mut v = vec![0; 100];
    for i in 0..v.len() {
        v[i] = 1;
    }
}

fn realtime(v: &mut Vec<i32>) {
    for i in 0..v.len() {
        v[i] = 1;
    }
}

fn main() {
    possibly_realtime();
    let mut v: Vec<i32> = vec![0; 100];
    realtime(&mut v);
}
What I'm wondering is: is Rust able to optimize possibly_realtime such that the local heap allocation of v only occurs once and is reused on subsequent calls to possibly_realtime? I'm guessing not but maybe there's some magic that makes it possible.
To investigate this, it is useful to add #[inline(never)] to your function, then view the LLVM IR on the playground.
Rust 1.54
This is not optimized. Here's an excerpt:
; playground::possibly_realtime
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc void @_ZN10playground17possibly_realtime17h2ab726cd567363f3E() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = tail call i8* @__rust_alloc_zeroed(i64 400, i64 4) #9, !noalias !8
  %1 = icmp eq i8* %0, null
  br i1 %1, label %bb20.i.i.i.i, label %vector.body
Every time that possibly_realtime is called, memory is allocated via __rust_alloc_zeroed.
Slightly before Rust 1.0
This is not optimized. Here's an excerpt:
; Function Attrs: noinline uwtable
define internal fastcc void @_ZN17possibly_realtime20h1a3a159dd4b50685eaaE() unnamed_addr #0 {
entry-block:
  %0 = tail call i8* @je_mallocx(i64 400, i32 0), !noalias !0
  %1 = icmp eq i8* %0, null
  br i1 %1, label %then-block-255-.i.i, label %normal-return2.i
Every time that possibly_realtime is called, memory is allocated via je_mallocx.
Editorial
Reusing a buffer is a great way to leak sensitive information, and I'd encourage you to avoid it as much as possible. I'm sure you are already familiar with these problems, but I want to make sure that future searchers see the caveat.
I also doubt that this "optimization" will be added to Rust, especially not without explicit opt-in by the programmer. There needs to be somewhere that the pointer to the allocated memory could be stored, but there really isn't anywhere. That means it would need to be a global or thread-local variable! Rust can run in environments without threads, but a global variable would still preclude recursive calls to this method. All in all, I think that passing the buffer into the method is much more explicit about what will happen.
I also assume that your example uses a Vec with a fixed size for demo purposes, but if you truly know the size at compile time, a fixed-size array could be a better choice.
As of 2021, Rust is capable of optimizing out heap allocation and inlining vtable method calls (playground):
fn old_adder(a: f64) -> Box<dyn Fn(f64) -> f64> {
    Box::new(move |x| a + x)
}

#[inline(never)]
fn test() {
    let adder = old_adder(1.);
    assert_eq!(adder(1.), 2.);
}

fn main() {
    test();
}