Will data be copied byte by byte when moving a variable? - rust

I'm new to Rust, and I am wondering what exactly happens when a variable is moved.
struct Point {
    x: i32,
    y: i32,
}
fn main() {
    let p = Point { x: 1, y: 1 };
    let q = p;
}
At let q = p;, will the data (8 bytes in size) be copied from one memory address to another? Since p is moved here and cannot be used anymore, I think it would be fine to make q's underlying memory address equal to p's. In other words, I think it is OK if nothing is copied in the machine code.
So my question is: will data be copied byte by byte when moving a variable? If it will, why?

[W]ill data be copied byte by byte when moving a variable?
In general, yes. To move a value, Rust simply performs a bitwise copy. If the value is not Copy, the source won't be used anymore after the move. If the value is Copy, both the source and the destination can be used.
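A minimal sketch of that distinction, using two hypothetical types (CopyPoint and MovePoint are not from the question). The assignment is the same kind of bitwise copy in both functions; only the compiler's bookkeeping about the source differs.

```rust
// Hypothetical types for illustration; only the `Copy` derive differs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CopyPoint {
    x: i32,
    y: i32,
}

#[derive(Debug, PartialEq)]
struct MovePoint {
    x: i32,
    y: i32,
}

fn copy_then_use_both() -> (CopyPoint, CopyPoint) {
    let p = CopyPoint { x: 1, y: 1 };
    let q = p; // bitwise copy; `p` stays usable because the type is `Copy`
    (p, q)
}

fn move_then_use_destination() -> MovePoint {
    let p = MovePoint { x: 1, y: 1 };
    let q = p; // semantically the same bitwise copy, but `p` is now moved-from
    // let r = p; // would not compile: use of moved value `p`
    q
}
```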
However, there are many cases where the compiler backend can eliminate the copy by proving that the code behaves identically without it. This optimization happens completely in LLVM. In your example, the LLVM IR still contains the instructions to move the data, but the generated code does not contain the move even in debug mode.
If it will, why?
There are many reasons why the compiler can be unable to use the same memory for source and destination. In your example, with two variables in the same stack frame, it's easy to see that the move is not needed, but the code is a bit pointless anyway (though sometimes people do move values inside a function to make a variable immutable).
Here are just a few illustrations of why the compiler may be unable to reuse the source memory for the destination:
The source value may be on the stack while the destination is on the heap, or vice versa. The statement let b = Box::new(3); will move the value 3 from the stack to the heap; let i = *b; will move it from the heap back to the stack. It's still possible that the compiler can eliminate these moves, e.g. by writing the constant 3 to the heap immediately, without writing it to the stack first.
Source and destination may be in different stack frames when values are moved across functions, e.g. when passing a value into a function or returning a value from a function.
Source and destination values may be stored in struct fields, so they need to have the right offset inside the struct.
These are just a few examples. The takeaway is that in general, a move may result in a bitwise copy. Keep in mind that a bitwise copy is very cheap, though, and that the optimizer usually does a good job, so you should only worry about this if you actually have a proven performance bottleneck.
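The three situations listed above can be sketched as follows. The Segment type and the function names are hypothetical, chosen only for illustration; whether any of these moves actually compiles to a copy depends on the optimizer.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical wrapper, used only to show a move into a struct field.
#[derive(Debug, PartialEq)]
struct Segment {
    start: Point,
    end: Point,
}

// Stack -> heap -> stack.
fn through_heap(p: Point) -> Point {
    let boxed = Box::new(p); // conceptually copies `p` into a heap allocation
    *boxed // moves the value back out of the box
}

// Caller's frame -> callee's frame -> caller's frame.
fn through_call(p: Point) -> Point {
    p
}

// Local variable -> field at some offset inside another value.
fn into_field(p: Point) -> Segment {
    Segment {
        start: Point { x: 0, y: 0 },
        end: p,
    }
}
```

In each case the value that arrives is equal to the value that was sent; the only open question is how many bytes the generated code really shuffles around.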

Related

Does std::ptr::write transfer the "uninitialized-ness" of the bytes it writes?

I'm working on a library that helps transact types that fit in a pointer-size int over FFI boundaries. Suppose I have a struct like this:
use std::mem::{size_of, align_of};
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}
assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
This struct has 1 data byte and 7 padding bytes. I want to pack an instance of this struct into a usize and then unpack it on the other side of an FFI boundary. Because this library is generic, I'm using MaybeUninit and ptr::write:
use std::ptr;
use std::mem::MaybeUninit;
let data = PaddingDemo { data: 12, force_pad: [] };
// In order to ensure all the bytes are initialized,
// zero-initialize the buffer
let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
let ptr = packed.as_mut_ptr() as *mut PaddingDemo;
let packed_int = unsafe {
    std::ptr::write(ptr, data);
    packed.assume_init()
};
// Attempt to trigger UB in Miri by reading the
// possibly uninitialized bytes
let copied = unsafe { ptr::read(&packed_int) };
Does that assume_init call trigger undefined behavior? In other words, when ptr::write copies the struct into the buffer, does it copy the uninitialized-ness of the padding bytes, overwriting the zero bytes that were initialized before?
Currently, when this or similar code is run in Miri, it doesn't detect any undefined behavior. However, per the discussion about this issue on GitHub, ptr::write is supposedly allowed to copy those padding bytes, and furthermore to copy their uninitialized-ness. Is that true? The docs for ptr::write don't talk about this at all, nor does the nomicon section on uninitialized memory.
Does that assume_init call trigger undefined behavior?
Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.
But the details are a bit more complicated.
There are two ways to copy data around in memory in Rust.
Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.
Untyped / byte-wise copy
In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.
This is how memcpy/memmove in C behave (in my interpretation of the standard, which is unfortunately not very clear here). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but it's not actually precisely specified right now and some people might want to say it is typed as well. This was discussed a bit in this issue.
Typed copy
The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.
The key difference from a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information. Untyped copies preserve the underlying representation; typed copies only preserve the represented value.
So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).
In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily, in particular they may become 0xUU.
Uninitialized usize
Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.
Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.
If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.
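A sketch of that suggestion, under a few assumptions: the pack and unpack helpers are hypothetical names, #[repr(C)] is added so that data is guaranteed to sit at offset 0, and the carrier stays a MaybeUninit<usize> for its whole life, so assume_init is never called. As far as I understand the current documentation, MaybeUninit<usize> may legitimately hold uninitialized bytes, which sidesteps the question of whether a partially uninitialized usize is allowed.

```rust
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

#[repr(C)] // pins `data` at offset 0
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

// Hypothetical helper: store the struct in a pointer-sized carrier.
fn pack(value: PaddingDemo) -> MaybeUninit<usize> {
    assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
    assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
    let mut packed = MaybeUninit::<usize>::uninit();
    // SAFETY: the destination is valid for writes and has the size and
    // alignment of `PaddingDemo` (checked above).
    unsafe { ptr::write(packed.as_mut_ptr() as *mut PaddingDemo, value) };
    packed // deliberately no `assume_init`: padding may be uninitialized
}

/// Hypothetical helper: read the struct back out of the carrier.
///
/// # Safety
/// `packed` must have been produced by `pack`.
unsafe fn unpack(packed: MaybeUninit<usize>) -> PaddingDemo {
    // SAFETY: per the contract above, the carrier holds a valid `PaddingDemo`.
    unsafe { ptr::read(packed.as_ptr() as *const PaddingDemo) }
}
```

The carrier is only ever reinterpreted as PaddingDemo again, never inspected as a number, so the state of the padding bytes never matters.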

Is it safe to clone a type-erased Arc via raw pointer?

I'm in a situation where I'm working with data wrapped in an Arc, and I sometimes end up using into_raw to get the raw pointer to the underlying data. My use case also calls for type-erasure, so the raw pointer often gets cast to a *const c_void, then cast back to the appropriate concrete type when re-constructing the Arc.
I've run into a situation where it would be useful to be able to clone the Arc without needing to know the concrete type of the underlying data. As I understand it, it should be safe to reconstruct the Arc with a dummy type solely for the purpose of calling clone, so long as I never actually dereference the data. So, for example, this should be safe:
pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    let original = Arc::from_raw(handle);
    let copy = original.clone();
    mem::forget(original);
    Arc::into_raw(copy)
}
Is there anything that I'm missing that would make this actually unsafe? Also, I assume the answer would apply to Rc as well, but if there are any differences please let me know!
This is almost always unsafe.
An Arc<T> is just a pointer to a heap-allocated struct which roughly looks like
struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T, // You get a raw pointer to this element
}
into_raw() gives you a pointer to the data element. The implementation of Arc::from_raw() takes such a pointer, assumes that it points to the data element of an ArcInner<T>, and walks back in memory expecting to find an ArcInner<T> there. This assumption depends on the memory layout of T, specifically its alignment and therefore its exact placement in ArcInner.
If you call into_raw() on an Arc<U> and then call from_raw() as if it were an Arc<V>, where U and V differ in alignment, the offset calculation for where U/V sits in ArcInner will be wrong and the call to .clone() will corrupt the data structure. Dereferencing T is therefore not required to trigger memory unsafety.
In practice, this might not be a problem: since data is the third field after two usize fields, most T will probably be aligned the same way. However, if the stdlib implementation changes, or you end up compiling for a platform where this assumption is wrong, reconstructing an Arc<V> with from_raw from a pointer created by an Arc<U> whose memory layout differs will be unsound and may crash.
Update:
Having thought about it some more I downgrade my vote from "might be safe, but cringy" to "most likely unsafe" because I can always do
#[repr(align(32))]
struct Foo;
let foo = Arc::new(Foo);
In this example Foo is aligned to 32 bytes, making ArcInner<Foo> 32 bytes in size (8+8+16+0, assuming an 8-byte usize), while an ArcInner<()> is just 16 bytes (8+8+0+0). Since there is no way to tell what the alignment of T is after the type has been erased, there is no way to reconstruct a valid Arc.
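The offset difference can be observed with a stand-in struct. ArcInnerSketch below is a hypothetical mirror of the struct shown earlier, not the real (private) ArcInner, whose layout is an implementation detail and may differ; data_offset is likewise just an illustrative helper.

```rust
use std::sync::atomic::AtomicUsize;

// A stand-in mirroring the struct shown above; the real `ArcInner` is a
// private implementation detail of the standard library.
#[repr(C)]
struct ArcInnerSketch<T> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T,
}

#[repr(align(32))]
struct Foo;

/// Byte offset of `data` from the start of `ArcInnerSketch<T>`.
fn data_offset<T>(value: T) -> usize {
    let inner = ArcInnerSketch {
        strong: AtomicUsize::new(1),
        weak: AtomicUsize::new(1),
        data: value,
    };
    let base = &inner as *const ArcInnerSketch<T> as usize;
    let field = &inner.data as *const T as usize;
    field - base
}
```

With this sketch, data_offset(Foo) is 32 while data_offset(()) is two usize widths, so code that walks back from a data pointer using the wrong type's alignment would land on the wrong address.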
There is an escape hatch that might be safe in practice: By wrapping T into another Box, the layout of ArcInner<T> is always the same. In order to force this upon any user, you can do something like
struct ArcBox<T>(Arc<Box<T>>);
and implement Deref on that. Using ArcBox instead of Arc forces the memory layout of ArcInner to always be the same, because T is behind another pointer. This, however, means that all access to T requires a double dereference, which might badly affect performance.
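A minimal sketch of such a wrapper, assuming a sized T. The new and strong_count helpers are additions for illustration. The sketch only shows the wrapper and its double dereference; it does not by itself make casting between different T sound, which remains the "might be safe in practice" part.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical wrapper following the `ArcBox` idea above.
struct ArcBox<T>(Arc<Box<T>>);

impl<T> ArcBox<T> {
    fn new(value: T) -> Self {
        ArcBox(Arc::new(Box::new(value)))
    }

    fn strong_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

impl<T> Clone for ArcBox<T> {
    fn clone(&self) -> Self {
        // Bumps the reference count; `T` itself is not cloned.
        ArcBox(Arc::clone(&self.0))
    }
}

impl<T> Deref for ArcBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // Two hops: Arc -> Box -> T.
        &**self.0
    }
}
```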

Does Iterator::collect allocate the same amount of memory as String::with_capacity?

In C++ when joining a bunch of strings (where each element's size is known roughly), it's common to pre-allocate memory to avoid multiple re-allocations and moves:
std::vector<std::string> words;
constexpr size_t APPROX_SIZE = 20;
std::string phrase;
phrase.reserve((words.size() + 5) * APPROX_SIZE); // <-- avoid multiple allocations
for (const auto &w : words)
    phrase.append(w);
Similarly, I did this in Rust (this chunk needs the unicode-segmentation crate)
fn reverse(input: &str) -> String {
    let mut result = String::with_capacity(input.len());
    for gc in input.graphemes(true /*extended*/).rev() {
        result.push_str(gc);
    }
    result
}
I was told that the idiomatic way of doing it is a single expression
fn reverse(input: &str) -> String {
    input
        .graphemes(true /*extended*/)
        .rev()
        .collect::<Vec<&str>>()
        .concat()
}
While I really like it and want to use it, from a memory allocation point of view, would the former allocate fewer chunks than the latter?
I disassembled this with cargo rustc --release -- --emit asm -C "llvm-args=-x86-asm-syntax=intel" but it doesn't have source code interspersed, so I'm at a loss.
Your original code is fine and I do not recommend changing it.
The original version allocates once: inside String::with_capacity.
The second version allocates at least twice: first, it creates a Vec<&str> and grows it by pushing &strs onto it. Then, it counts the total size of all the &strs and creates a new String with the correct size. (The code for this is in the join_generic_copy method in str.rs.) This is bad for several reasons:
It allocates unnecessarily, obviously.
Grapheme clusters can be arbitrarily large, so the intermediate Vec can't be usefully sized in advance -- it just starts at size 1 and grows from there.
For typical strings, it allocates way more space than would actually be needed just to store the end result, because &str is usually 16 bytes in size while a UTF-8 grapheme cluster is typically much less than that.
It wastes time iterating over the intermediate Vec to get the final size where you could just take it from the original &str.
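To put a rough number on the third point, here is a small sketch. It assumes a stand-in for grapheme segmentation (one &str per char, via the hypothetical pieces helper) so that it needs no external crate; the real code would use graphemes(true).

```rust
use std::mem::size_of;

// Stand-in for grapheme segmentation: one `&str` per `char`.
// (An assumption for illustration; the real code uses unicode-segmentation.)
fn pieces(input: &str) -> Vec<&str> {
    input
        .char_indices()
        .map(|(i, c)| &input[i..i + c.len_utf8()])
        .collect()
}

/// Bytes occupied by the elements of the intermediate `Vec<&str>`,
/// ignoring any spare capacity the Vec may have.
fn intermediate_bytes(input: &str) -> usize {
    pieces(input).len() * size_of::<&str>()
}
```

Each &str is a pointer plus a length, i.e. two usize values, so for short pieces the intermediate Vec is several times larger than the text it describes.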
On top of all this, I wouldn't even consider this version idiomatic, because it collects into a temporary Vec in order to iterate over it, instead of just collecting the original iterator, as you had in an earlier version of your answer. This version fixes problem #3 and makes #4 irrelevant but doesn't satisfactorily address #2:
input.graphemes(true).rev().collect()
collect uses FromIterator for String, which will try to use the lower bound of the size_hint from the Iterator implementation for Graphemes. However, as I mentioned earlier, extended grapheme clusters can be arbitrarily long, so the lower bound can't be any greater than 1. Worse, &strs may be empty, so FromIterator<&str> for String doesn't know anything about the size of the result in bytes. This code just creates an empty String and calls push_str on it repeatedly.
Which, to be clear, is not bad! String has a growth strategy that guarantees amortized O(1) insertion, so if you have mostly tiny strings that won't need to be reallocated often, or you don't believe the cost of allocation is a bottleneck, using collect::<String>() here may be justified if you find it more readable and easier to reason about.
Let's go back to your original code.
let mut result = String::with_capacity(input.len());
for gc in input.graphemes(true).rev() {
    result.push_str(gc);
}
This is idiomatic. collect is also idiomatic, but all collect does is basically the above, with a less accurate initial capacity. Since collect doesn't do what you want, it's not unidiomatic to write the code yourself.
There is a slightly more concise, iterator-y version that still makes only one allocation. Use the extend method, which is part of Extend<&str> for String:
fn reverse(input: &str) -> String {
    let mut result = String::with_capacity(input.len());
    result.extend(input.graphemes(true).rev());
    result
}
I have a vague feeling that extend is nicer, but both of these are perfectly idiomatic ways of writing the same code. You should not rewrite it to use collect, unless you feel that expresses the intent better and you don't care about the extra allocation.
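The single-allocation claim can be checked with a dependency-free variant. This sketch substitutes chars() for graphemes(true), so it reverses code points rather than grapheme clusters; that substitution and the reverse_chars name are assumptions made only to keep the example self-contained.

```rust
// `chars()` stands in for `graphemes(true)` so the sketch needs no external
// crate; it reverses code points, not grapheme clusters.
fn reverse_chars(input: &str) -> String {
    let mut result = String::with_capacity(input.len());
    let buffer_before = result.as_ptr();
    result.extend(input.chars().rev());
    // The reversed text has the same byte length as the input, so the
    // pre-allocated buffer is sufficient and is never reallocated.
    debug_assert_eq!(buffer_before, result.as_ptr());
    result
}
```

The debug assertion compares the buffer address before and after extend; if the String had grown, the buffer could have moved.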

Modifying the data of an array vs seq and passing the address of an array vs seq to asyncnet proc `send`

I've been working on a server that expects data to be received through a buffer. I have an object which is defined like this and some procedures that modify the buffer in it:
type
  Packet* = ref object
    buf*: seq[int8]
    #buf*: array[0..4096, int8]
    pos*: int

proc newPacket*(size: int): Packet =
  result = Packet(buf: newSeq[int8](size))
  #result = Packet()

proc sendPacket*(s: AsyncSocket, p: Packet) =
  asyncCheck s.send(addr(p.buf), p.pos)
Now the reason I have two lines commented is because that was the code I originally used, but creating an object that initialises an array with 4096 elements every time probably wasn't very good for performance. However, it works and the seq[int8] version does not.
The strange thing, though, is that my current code works perfectly fine if I use the old static buffer buf*: array[0..4096, int8]. In sendPacket, I have checked the data contained in the buffer to make sure the array and seq[int8] versions are equal, and they are (or at least appear to be). In other words, if I do var p = newPacket(17) and write exactly 17 bytes to p.buf, the values of the elements appear to be the same in both versions.
So despite the data appearing to be the same in both versions, I get a different result when calling send when passing the address of the buffer.
In case it matters, the data would be read like this:
result = p.buf[p.pos]
inc(p.pos)
And written to like this:
p.buf[p.pos] = cast[int8](value)
inc(p.pos)
Just a few things I've looked into, which were probably unrelated to my problem anyway: I looked at GC_ref and GC_unref which had no effect on my problem and also looked at maybe trying to use alloc0 where buf is defined as pointer but I couldn't seem to access the data of that pointer and that probably isn't what I should be doing in the first place. Also if I do var data = p.buf and pass the addr of data instead, I get a different result, but still not the intended one.
So I guess what I want to get to the bottom of is:
Why does send work perfectly fine when I use array[0..4096, int8] but not seq[int8] which is initialised with newSeq, even when they appear to contain the same data?
Does my current layout for receiving and writing data even make sense in a language like Nim (or any language for that matter)? Is there a better way?
In order not to initialize the array you can use the noinit pragma like this:
buf* {.noinit.}: array[0..4096, int8]
You are probably taking the pointer to the seq, not the pointer to the data inside the seq, so try using addr(p.buf[0]).
A pos field is useless if you are using the seq version since you have p.buf.len already, but you probably know that already and just left it in for the array. If you want to use the seq and expect large packets, make sure to use newSeqOfCap to only allocate the memory once.
Also, your array is 1 byte too big: it goes from 0 to 4096 inclusive! Instead you can use array[0..4095, int8] or just array[4096, int8].
Personally I would prefer to use a uint8 type inside of buf, so that you can just put in values from 0 to 255 instead of -128 to 127.
Using a seq inside of a ref object means you have two layers of indirection when accessing buf, as well as two objects that the GC will have to clean up. You could just make Packet an alias for seq[uint8] (without ref): type Packet* = seq[uint8]. Or you can use the array version if you want to store some more data inside the Packet later on.

How can I convert a Vec<T> into a C-friendly *mut T?

I have a Rust library that returns a u8 array to a C caller via FFI. The library also handles dropping the array after the client is done with it. The library has no state, so the client needs to own the array until it is passed back to the library for freeing.
Using Box::from_raw and Box::into_raw would be nice, but I couldn't manage to work out how to convert the array into the return type.
A Vec<T> is described by 3 values:
A pointer to its first element, that can be obtained with .as_mut_ptr()
A length, that can be obtained with .len()
A capacity, that can be obtained with .capacity()
In terms of a C array, the capacity is the size of the allocated memory, while the length is the number of elements actually contained in the array. Both are counted in units of T. You would normally need to provide all 3 values to your C code.
If you want length and capacity to be equal, you can call .shrink_to_fit() on the vector to bring its capacity as close to its length as the allocator allows.
If you hand ownership of the Vec<T> over to your C code, don't forget to call std::mem::forget(v) on it once you have retrieved the 3 values described above, so that its destructor does not run at the end of the function.
Afterwards, you can create back a Vec from these 3 values using from_raw_parts(..) like this:
let v = unsafe { Vec::<T>::from_raw_parts(ptr, length, capacity) };
and when its destructor runs, the memory will be correctly freed. Be careful: the 3 values need to be correct for the deallocation to be correct. The length is not very important for a Vec<u8>, but in general the destructor of Vec runs the destructor of every element it contains, according to its length.
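Put together, the round trip might look like the sketch below. RawBuffer, vec_into_raw and vec_from_raw are hypothetical names for illustration; exported extern "C" wrappers would call them. Note one substitution: the sketch uses ManuallyDrop instead of the std::mem::forget call described above. It has the same effect of suppressing the destructor, and it is the pattern the standard library documentation for from_raw_parts uses.

```rust
use std::mem::ManuallyDrop;

// Hypothetical C-compatible triple describing a `Vec<u8>`; the name and
// field order are assumptions for this sketch.
#[repr(C)]
pub struct RawBuffer {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Give up ownership of the allocation so that it can be handed to C.
pub fn vec_into_raw(v: Vec<u8>) -> RawBuffer {
    // `ManuallyDrop` plays the role of `std::mem::forget`: the destructor
    // will not run, so the allocation stays alive for the C side.
    let mut v = ManuallyDrop::new(v);
    RawBuffer {
        ptr: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(),
    }
}

/// Rebuild the `Vec` so that dropping it frees the memory.
///
/// # Safety
/// `raw` must come from `vec_into_raw`, unchanged, and must be passed
/// back exactly once.
pub unsafe fn vec_from_raw(raw: RawBuffer) -> Vec<u8> {
    // SAFETY: guaranteed by the caller per the contract above.
    unsafe { Vec::from_raw_parts(raw.ptr, raw.len, raw.cap) }
}
```

The C side only needs ptr and len to read the data, but it must pass all three values back untouched when it is done with the buffer.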
