I have an 8k buffer of bytes. The data in the first part of the buffer is highly structured, with a set of variables of size u32 and u16. The data in the later part of the buffer is for small blobs of bytes. After the structured data, there is an array of offsets that point to the start of each small blob. Something like this:
// pseudocode: real Rust allows only the last field of a struct to be unsized
struct MyBuffer {
    myvalue: u32,
    myothervalue: u16,
    offsets: [u16], // of unknown length
    bytes: [u8],    // fills the rest of the buffer
}
I'm looking for an efficient way to take an 8k blob of bytes fetched from disk, and then overlay it or cast it to the MyBuffer struct. In this way I can get/set the structured values easily (let myvar = my_buffer.myvalue), and I can also access the small blobs as slices (let myslice = my_buffer[offsets[2]..offsets[3]]).
The benefit of this approach is you get efficient, zero-copy access to the data.
The fact that the number of offsets and the number of blobs of bytes is unknown makes this tricky.
In C, it's easy; you just cast a pointer to the 8k buffer to the appropriate struct and it just works. You have two different data structures pointing at the same memory.
How can I do the same thing in Rust?
I have discovered that there is an entire Rust ecosystem dedicated to solving this problem. Rust itself handles it poorly.
There are many Rust serialization frameworks, including those that are "zero-copy". The best list is here: https://github.com/djkoloski/rust_serialization_benchmark
The zero-copy frameworks include abomonation, capnp, flatbuffers, rkyv, and alkahest.
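Before reaching for one of those frameworks, it may help to see what the hand-rolled alternative looks like. This is a minimal sketch, not the frameworks' approach. It assumes a hypothetical little-endian layout with an extra u16 count field, because the question leaves the number of offsets unspecified. `MyBufferView` and its method names are illustrative only. Instead of casting, the view borrows the byte slice and decodes each field on access. Only the few bytes of each integer are read, and blobs come back as subslices of the original buffer.

```rust
use std::convert::TryInto;

/// A zero-copy, read-only view over a page-sized buffer.
/// Hypothetical layout (little-endian), chosen only for illustration:
///   bytes 0..4  myvalue: u32
///   bytes 4..6  myothervalue: u16
///   bytes 6..8  count: u16 (number of offsets; an assumed extra field)
///   bytes 8..   `count` u16 offsets into the buffer, then the blob bytes
pub struct MyBufferView<'a> {
    buf: &'a [u8],
}

impl<'a> MyBufferView<'a> {
    const HEADER_LEN: usize = 8;

    /// Checks that the header and the offset table fit in `buf`.
    pub fn new(buf: &'a [u8]) -> Option<Self> {
        if buf.len() < Self::HEADER_LEN {
            return None;
        }
        let view = MyBufferView { buf };
        if Self::HEADER_LEN + 2 * view.count() > buf.len() {
            return None;
        }
        Some(view)
    }

    fn read_u16(&self, at: usize) -> u16 {
        u16::from_le_bytes(self.buf[at..at + 2].try_into().unwrap())
    }

    pub fn myvalue(&self) -> u32 {
        u32::from_le_bytes(self.buf[0..4].try_into().unwrap())
    }

    pub fn myothervalue(&self) -> u16 {
        self.read_u16(4)
    }

    pub fn count(&self) -> usize {
        self.read_u16(6) as usize
    }

    fn offset(&self, i: usize) -> usize {
        self.read_u16(Self::HEADER_LEN + 2 * i) as usize
    }

    /// Blob `i` spans offsets[i]..offsets[i + 1]; the last blob runs to the
    /// end of the buffer. Returns None for a bad index or inconsistent offsets.
    pub fn blob(&self, i: usize) -> Option<&'a [u8]> {
        if i >= self.count() {
            return None;
        }
        let start = self.offset(i);
        let end = if i + 1 < self.count() { self.offset(i + 1) } else { self.buf.len() };
        // Copy the &'a [u8] out so the returned slice borrows the buffer, not `self`.
        let buf: &'a [u8] = self.buf;
        buf.get(start..end)
    }
}
```

Setters can follow the same pattern over a `&mut [u8]`, writing `to_le_bytes()` output with `copy_from_slice`.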
Several collections in Rust, such as Vec and String, have shrink_to or shrink_to_fit methods. But apparently there's nothing like shrink_from_to.
Why would I want that?
Assume I have a string or vector of XY gigabytes in memory and know the exact start and end positions of the part I am interested in (which takes up only Z GB, somewhere in the middle). I could call truncate and then shrink_to_fit, effectively freeing the memory after end.
However, I still have gigabytes of memory occupied by [0..start], which are of no relevance to my further processing.
Question
Is there any way to free this memory too without reallocating and copying the relevant parts there?
Note that shrink_to_fit may itself reallocate: depending on the allocator, it can copy the data into a smaller buffer and free the old one. Your best bet is probably just to convert the slice you care about into an owned Vec and then drop the original Vec.
fn main() {
    let v1 = (0..1000).collect::<Vec<_>>(); // needs to be dropped
    println!("{}", v1.capacity()); // 1000
    let v2 = v1[100..150].to_owned(); // don't drop this!
    println!("{}", v2.capacity()); // 50
    drop(v1);
}
Is there any way to free this memory too without reallocating and copying the relevant parts there?
Move the segment you want to keep to the start of the collection (e.g. replace_range, drain, copy_within, rotate, ...), then truncate, then shrink.
APIs like realloc and mremap work in terms of "memory blocks" (i.e., allocations returned by malloc/mmap); they don't work in terms of arbitrary pointers. So a hypothetical shrink_from_to would just be doing that under the covers, since you can't really resize allocations from both ends.
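A minimal sketch of the move-truncate-shrink recipe for a `Vec<T>`, using `truncate` and `drain` from the list above. `keep_range` is a hypothetical helper, not a std API. It truncates first so the tail is never shifted. It then drains the head, which moves the kept elements to the front of the same allocation, and finally asks the allocator to shrink. That last step is still up to the allocator and may itself reallocate and copy.

```rust
/// Keeps only v[start..end], reusing v's existing allocation for the move.
/// `keep_range` is a hypothetical helper, not part of std.
fn keep_range<T>(v: &mut Vec<T>, start: usize, end: usize) {
    assert!(start <= end && end <= v.len(), "range out of bounds");
    // Drop the tail first so it is never shifted.
    v.truncate(end);
    // Remove the head; the kept elements are moved to the front in place.
    v.drain(..start);
    // Ask the allocator to release the now-unused capacity.
    // This may shrink in place or may reallocate and copy.
    v.shrink_to_fit();
}
```

The same sequence works on a `String` via `String::truncate` and `String::drain`, as long as both positions fall on char boundaries. For `Copy` element types, `copy_within(start..end, 0)` followed by `truncate(end - start)` is an equivalent way to do the move.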
I'm working on a library that helps transact types that fit in a pointer-size int over FFI boundaries. Suppose I have a struct like this:
use std::mem::{size_of, align_of};
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0]
}
assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
This struct has 1 data byte and 7 padding bytes (on a 64-bit target). I want to pack an instance of this struct into a usize and then unpack it on the other side of an FFI boundary. Because this library is generic, I'm using MaybeUninit and ptr::write:
use std::ptr;
use std::mem::MaybeUninit;
let data = PaddingDemo { data: 12, force_pad: [] };
// In order to ensure all the bytes are initialized,
// zero-initialize the buffer
let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
let ptr = packed.as_mut_ptr() as *mut PaddingDemo;
let packed_int = unsafe {
    std::ptr::write(ptr, data);
    packed.assume_init()
};
// Attempt to trigger UB in Miri by reading the
// possibly uninitialized bytes
let copied = unsafe { ptr::read(&packed_int) };
Does that assume_init call trigger undefined behavior? In other words, when ptr::write copies the struct into the buffer, does it also copy the uninitialized-ness of the padding bytes, overwriting their zero-initialized state?
Currently, when this or similar code is run in Miri, it doesn't detect any undefined behavior. However, per the discussion about this issue on GitHub, ptr::write is supposedly allowed to copy those padding bytes, and furthermore to copy their uninitialized-ness. Is that true? The docs for ptr::write don't talk about this at all, nor does the Nomicon section on uninitialized memory.
Does that assume_init call trigger undefined behavior?
Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.
But the details are a bit more complicated.
There are two ways to copy data around in memory in Rust.
Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.
Untyped / byte-wise copy
In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.
This is how memcpy/memmove in C behave (in my interpretation of the standard, which is unfortunately not very clear here). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but it's not actually precisely specified right now and some people might want to say it is typed as well. This was discussed a bit in this issue.
Typed copy
The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.
The key difference from a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information: untyped copies preserve the underlying representation, while typed copies only preserve the represented value.
So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).
In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily, in particular they may become 0xUU.
Uninitialized usize
Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.
Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.
If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.
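A minimal sketch of that suggestion: keep the pointer-sized slot typed as `MaybeUninit<usize>` for its whole journey and never call `assume_init` on it. The padding bytes that a typed copy of `PaddingDemo` may leave uninitialized are then never claimed to be part of an integer. `pack` and `unpack` are hypothetical helper names. The size and alignment checks are runtime asserts for brevity. The sketch stays within Rust and does not address what the foreign side of the FFI boundary does with those bytes.

```rust
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

pub struct PaddingDemo {
    pub data: u8,
    pub force_pad: [usize; 0],
}

/// Writes `value` into a pointer-sized slot without ever calling
/// `assume_init`: the slot may legitimately contain uninitialized bytes.
pub fn pack<T>(value: T) -> MaybeUninit<usize> {
    assert!(size_of::<T>() <= size_of::<usize>(), "T does not fit");
    assert!(align_of::<T>() <= align_of::<usize>(), "T is over-aligned");
    let mut slot = MaybeUninit::<usize>::uninit();
    // SAFETY: size and alignment were checked above.
    unsafe { ptr::write(slot.as_mut_ptr() as *mut T, value) };
    slot
}

/// Reads the value back out of the slot.
///
/// # Safety
/// `slot` must have been produced by `pack::<T>` with the same `T`.
pub unsafe fn unpack<T>(slot: MaybeUninit<usize>) -> T {
    // Copying a MaybeUninit<usize> preserves every byte, initialized or not,
    // so T's data bytes are still intact here.
    unsafe { ptr::read(slot.as_ptr() as *const T) }
}
```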
In C++, when joining a bunch of strings (where each element's size is roughly known), it's common to pre-allocate memory to avoid multiple reallocations and moves:
std::vector<std::string> words;
constexpr size_t APPROX_SIZE = 20;
std::string phrase;
phrase.reserve((words.size() + 5) * APPROX_SIZE); // <-- avoid multiple allocations
for (const auto &w : words)
    phrase.append(w);
Similarly, I did this in Rust (this chunk needs the unicode-segmentation crate)
use unicode_segmentation::UnicodeSegmentation;

fn reverse(input: &str) -> String {
    let mut result = String::with_capacity(input.len());
    for gc in input.graphemes(true /*extended*/).rev() {
        result.push_str(gc);
    }
    result
}
I was told that the idiomatic way of doing it is a single expression
fn reverse(input: &str) -> String {
    input
        .graphemes(true /*extended*/)
        .rev()
        .collect::<Vec<&str>>()
        .concat()
}
While I really like it and want to use it, from a memory allocation point of view, would the former make fewer allocations than the latter?
I disassembled this with cargo rustc --release -- --emit asm -C "llvm-args=-x86-asm-syntax=intel" but it doesn't have source code interspersed, so I'm at a loss.
Your original code is fine and I do not recommend changing it.
The original version allocates once: inside String::with_capacity.
The second version allocates at least twice: first, it creates a Vec<&str> and grows it by pushing &strs onto it. Then, it counts the total size of all the &strs and creates a new String with the correct size. (The code for this is in the join_generic_copy method in str.rs.) This is bad for several reasons:
1. It allocates unnecessarily, obviously.
2. Grapheme clusters can be arbitrarily large, so the intermediate Vec can't be usefully sized in advance -- it just starts small and grows from there.
3. For typical strings, it allocates way more space than would actually be needed just to store the end result, because &str is 16 bytes in size on 64-bit targets while a UTF-8 grapheme cluster is typically much smaller than that.
4. It wastes time iterating over the intermediate Vec to get the final size, when you could just take it from the original &str.
On top of all this, I wouldn't even consider this version idiomatic, because it collects into a temporary Vec in order to iterate over it, instead of just collecting the original iterator, as you had in an earlier version of your answer. This version fixes problem #3 and makes #4 irrelevant but doesn't satisfactorily address #2:
input.graphemes(true).rev().collect()
collect uses FromIterator for String, which will try to use the lower bound of the size_hint from the Iterator implementation for Graphemes. However, as I mentioned earlier, extended grapheme clusters can be arbitrarily long, so the lower bound can't be any greater than 1. Worse, &strs may be empty, so FromIterator<&str> for String doesn't know anything about the size of the result in bytes. This code just creates an empty String and calls push_str on it repeatedly.
Which, to be clear, is not bad! String has a growth strategy that guarantees amortized O(1) insertion, so if you have mostly tiny strings that won't need to be reallocated often, or you don't believe the cost of allocation is a bottleneck, using collect::<String>() here may be justified if you find it more readable and easier to reason about.
Let's go back to your original code.
let mut result = String::with_capacity(input.len());
for gc in input.graphemes(true).rev() {
    result.push_str(gc);
}
This is idiomatic. collect is also idiomatic, but all collect does is basically the above, with a less accurate initial capacity. Since collect doesn't do what you want, it's not unidiomatic to write the code yourself.
There is a slightly more concise, iterator-y version that still makes only one allocation. Use the extend method, which is part of Extend<&str> for String:
fn reverse(input: &str) -> String {
    let mut result = String::with_capacity(input.len());
    result.extend(input.graphemes(true).rev());
    result
}
I have a vague feeling that extend is nicer, but both of these are perfectly idiomatic ways of writing the same code. You should not rewrite it to use collect, unless you feel that expresses the intent better and you don't care about the extra allocation.
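If you want to check allocation counts empirically rather than reading assembly, one option is a counting global allocator. This is a minimal, std-only sketch. Because unicode-segmentation is not part of std, per-char `&str` slices (the hypothetical `pieces` helper) stand in for grapheme clusters. The two hypothetical functions mirror the `with_capacity` + `extend` version and the `collect::<Vec<&str>>().concat()` version. The counter is thread-local so allocations on other threads don't skew it, and reallocations are counted as allocation events.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread counter; const-initialized so reading it never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // Growing or shrinking a block also counts as an allocation event.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the alloc/realloc calls it made on this thread.
fn count_allocs<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCS.with(|c| c.get());
    let r = f();
    let after = ALLOCS.with(|c| c.get());
    (r, after - before)
}

/// Stand-in for graphemes(true): one &str slice per char (std only).
fn pieces(input: &str) -> impl DoubleEndedIterator<Item = &str> {
    input.char_indices().map(move |(i, c)| &input[i..i + c.len_utf8()])
}

fn reverse_with_capacity(input: &str) -> String {
    let mut result = String::with_capacity(input.len());
    result.extend(pieces(input).rev());
    result
}

fn reverse_collect_concat(input: &str) -> String {
    pieces(input).rev().collect::<Vec<&str>>().concat()
}
```

Since `#[global_allocator]` affects the whole program, this belongs in a test or benchmark binary rather than in the library itself.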
I have a Rust library that returns a u8 array to a C caller via FFI. The library also handles dropping the array after the client is done with it. The library has no state, so the client needs to own the array until it is passed back to the library for freeing.
Using Box::from_raw and Box::into_raw would be nice, but I couldn't work out how to convert the array into the return type.
A Vec<T> is described by 3 values:
A pointer to its first element, that can be obtained with .as_mut_ptr()
A length, that can be obtained with .len()
A capacity, that can be obtained with .capacity()
In terms of a C array, the capacity is the size of the allocated memory, while the length is the number of elements actually contained in the array. Both are counted in units of T. You would normally need to provide all three values to your C code.
If you want them to be equal, you can call .shrink_to_fit() on the vector, which reduces its capacity as close to its length as the allocator allows.
If you hand ownership of the Vec<T> to your C code, don't forget to call std::mem::forget(v) on it once you have retrieved the three values described above, so that its destructor doesn't run at the end of the function.
Later, you can rebuild the Vec from these three values using Vec::from_raw_parts(..), like this:
let v = unsafe { Vec::<T>::from_raw_parts(ptr, length, capacity) };
When its destructor runs, the memory will be freed correctly. Be careful: the three values must be exactly the ones you obtained originally, or the deallocation will be incorrect. The length matters less for a Vec<u8>, since u8 has no destructor, but in general Vec's destructor runs the destructor of every element up to its length.
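A minimal sketch of that round trip. The `#[repr(C)]` struct `ByteBuf` carrying the three values and the functions `bytebuf_new` and `bytebuf_free` are hypothetical names. To actually export them to C you would also add `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition). It is omitted here so the sketch compiles on any edition.

```rust
use std::mem;

/// Hypothetical FFI-facing struct: the three values that describe a Vec<u8>.
#[repr(C)]
pub struct ByteBuf {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Hands ownership of a Vec<u8> to the caller (e.g. C code).
pub extern "C" fn bytebuf_new() -> ByteBuf {
    let mut v: Vec<u8> = b"hello from Rust".to_vec();
    let buf = ByteBuf { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() };
    // Don't run Vec's destructor: the caller now owns the allocation.
    mem::forget(v);
    buf
}

/// Takes ownership back and frees the allocation.
///
/// # Safety
/// `buf` must be exactly what `bytebuf_new` returned and must not be used afterwards.
pub unsafe extern "C" fn bytebuf_free(buf: ByteBuf) {
    if buf.ptr.is_null() {
        return;
    }
    // Rebuild the Vec from the same ptr/len/cap; dropping it frees the memory.
    drop(unsafe { Vec::from_raw_parts(buf.ptr, buf.len, buf.cap) });
}
```

If you would rather pass only a pointer and a length, `Vec::into_boxed_slice` followed by `Box::into_raw` is an alternative. To free, rebuild the slice pointer with `std::ptr::slice_from_raw_parts_mut` and pass it to `Box::from_raw`.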
I am working on a Linux kernel module that requires me to check data right before it is written to a local disk. The data to be written is fetched from a remote disk. Therefore, I know that the data from the fetch is stored in the page cache. I also know that Linux has a data structure that manages block I/O requests in-flight called the bio struct.
The bio struct contains a list of structures called bio_vecs.
struct bio_vec {
	/* pointer to the physical page on which this buffer resides */
	struct page *bv_page;
	/* the length in bytes of this buffer */
	unsigned int bv_len;
	/* the byte offset within the page where the buffer resides */
	unsigned int bv_offset;
};
It has a list of these because the block representation in memory may not be physically contiguous. What I want to do is grab each piece of the buffer using the list of bio_vecs and put them together as one so that I could take an MD5 hash of the block. How do I use the pointer to the page, the length of the buffer and its offset to get the raw data in the buffer? Are there already functions for this or do I have to write my own?
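You can use the bio_data(struct bio *bio) function to access the data. Note that bio_data only points at the bio's current segment (the data of its first bio_vec), so for a bio with several segments you still have to walk each bio_vec to hash the whole block.
Because bio_data returns void *, you cannot hand it directly to a string format such as %s; a simple cast to char * takes care of that.
The following code does the job for a single 4 KiB segment: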
char *ptr;
int i;

ptr = (char *)bio_data(bio);
if (ptr) {
	/* assumes the current segment is a 4 KiB chunk */
	for (i = 0; i < 4096; i++)
		printk(KERN_CONT "%c", ptr[i]);
}