Take the example below:
static COUNT: AtomicUsize = AtomicUsize::new(0);
#[proc_macro_attribute]
pub fn count_usages(_attr: TokenStream, item: TokenStream) -> TokenStream {
let c = COUNT.fetch_add(1, Ordering::AcqRel);
println!("Do stuff with c: {}", c);
item
}
Although the order attributes are processed may differ, will the final count be the same each time for cases such as:
Incremental building
Registry crates and local crates sharing the same proc_macro library and version
Compiler internal parallelism
A practical use case (mine in particular) is generating a compile-time pseudo-static variable memory layout that will be reused in multiple memory managers within a statically linked executable.
While some things might work with the current implementation, procedural macros should not use global variables and expect them to be preserved between calls.
There is currently no official mechanism for storing state between invocations of a procedural macro.
The issue “Crate local state for procedural macros?” mentions some of these points:
Proc macros may not be run on every compilation, for instance if incremental compilation is on and they are in a module that is clean
There is no guarantee of ordering -- if do_it! needs data from all config! invocations, that's a problem.
Properly supporting this feature means adding a new API
Related
Often times in embedded setting we need to declare static structs (drivers etc) so that
their memory is known and assigned at compile time.
Is there any way to achieve something similar in rust?
For example, I want to have a uart driver struct
struct DriverUart{
...
}
and an associated impl block.
Now, I want to avoid having a function named new(), and instead, I want to somewhere allocate this memory a-priori (or to have a new function that I can call statically outside
any code block).
In C I would simply put an instantiation of this struct in some header file and it will be statically allocated and globally available.
I haven't found any thing similar in rust.
If it is not possible then why? and what is the best why we can achieve something similar?
Thanks!
Now, I want to avoid having a function named new(), and instead, I want to somewhere allocate this memory a-priori (or to have a new function that I can call statically outside any code block). In C I would simply put an instantiation of this struct in some header file and it will be statically allocated and globally available. I haven't found any thing similar in rust. If it is not possible then why? and what is the best why we can achieve something similar?
https://doc.rust-lang.org/std/keyword.static.html
You can do the same in Rust, without the header, as long as all the elements are const:
struct DriverUart {
whatever: u32
}
static thing: DriverUart = DriverUart { whatever: 5 };
If you need to evaluate non-const expressions, then that obviously will not work and you'll need to use something like lazy_static or once_cell to instantiate simili-statics.
And of course, what with Rust being a safe languages and statics being shared state, mutable statics are wildly unsafe if not mitigated via thread-safe interior-mutability containers (e.g. an atomic, or a Mutex though those are currently non-const, and it's unclear if they can ever be otherwise), a static is considered to always be shared between threads.
Take a look at the following simple example:
use std::rc::Rc;
struct MyStruct {
a: i8,
}
fn main() {
let mut my_struct = MyStruct { a: 0 };
my_struct.a = 5;
let my_struct_rc = Rc::new(my_struct);
println!("my_struct_rc.a = {}", my_struct_rc.a);
}
The official documentation of Rc says:
The type Rc<T> provides shared ownership of a value of type T,
allocated in the heap.
Theoretically it is clear. But, firstly my_struct is not immediately wrapped into Rc, and secondly MyStruct is a very simple type. I can see 2 scenarios here.
When my_struct is moved into the Rc the memory content is literally copied from the stack to the heap.
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
If number 1 is true, then there might be a hidden performance bottleneck as when reading the code one does not explicitly see memory being copied (I am assuming MyStruct being much more complex).
If number 2 is true, I wonder whether the compiler is always able to resolve such things. The provided example is very simple, but I can imagine that my_struct is much more complex and is mutated several times by different functions before being moved to the Rc.
Tl;dr It could be either scenario, but for the most part, you should just write code in the most obvious way and let the compiler worry about it.
According to the semantics of the abstract machine, that is, the theoretical model of computation that defines Rust's behavior, there is always a copy. In fact, there are at least two: my_struct is first created in the stack frame of main, but then has to be moved into the stack frame of Rc::new. Then Rc::new has to create an allocation and move my_struct a second time, from its own stack frame into the newly allocated memory*. Each of these moves is conceptually a copy.
However, this analysis isn't particularly useful for predicting the performance of code in practice, for three reasons:
Copies are actually pretty darn cheap. Moving my_struct from one place to another may actually be much cheaper, in the long run, than referencing it with a pointer. Copying a chunk of bytes is easy to optimize on modern processors; following a pointer to some arbitrary location is not. (Bear in mind also that the complexity of the structure is irrelevant because all moves are bytewise copies; for instance, moving any Vec is just copying three usizes regardless of the contents.)
If you haven't measured the performance and shown that excessive copying is a problem, you must not assume that it is without evidence: you may accidentally pessimize instead of optimizing your code. Measure first.
The semantics of the abstract machine is not the semantics of your real machine. The whole point of an optimizing compiler is to figure out the best way to transform one to the other. Under reasonable assumptions, it's very unlikely that the code here would result in 2 copies with optimizations turned on. But how the compiler eliminates one or both copies may be dependent on the rest of the code: not just on the snippet that contains them but on how the data is initialized and so forth. Real machine performance is complicated and generally requires analysis of more than just a few lines at a time. Again, this is the whole point of an optimizing compiler: it can do a much more comprehensive analysis, much faster than you or I can.
Even if the compiler leaves a copy "on the table", you shouldn't assume without evidence that removing the copy would make things better simply because it is a copy. Measure first.
It probably doesn't matter anyway, in this case. Requesting a new allocation from the heap is likely† more expensive than copying a bunch of bytes from one place to another, so fiddling around with 1 fast copy vs. no copies while ignoring a (plausible) big bottleneck is probably a waste of time. Don't try to optimize things before you've profiled your application or library to see where the most performance is being lost. Measure first.
See also
Questions about overflowing the stack by accidentally putting large data on it (to which the solution is usually to use Vec instead of an array):
How to allocate arrays on the heap in Rust 1.0?
Thread '<main>' has overflowed its stack when allocating a large array using Box
* Rc, although part of the standard library, is written in plain Rust code, which is how I analyze it here. Rc could theoretically be subject to guaranteed optimizations that aren't available to ordinary code, but that doesn't happen to be relevant to this case.
† Depending at least on the allocator and on whether new memory must be acquired from the OS or if a recently freed allocation can be re-used.
You can just test what happens:
Try to use my_struct after creating an Rc out of it. The value has been moved, so you can't use it.
use std::rc::Rc;
struct MyStruct {
a: i8,
}
fn main() {
let mut my_struct = MyStruct { a: 0 };
my_struct.a = 5;
let my_struct_rc = Rc::new(my_struct);
println!("my_struct_rc.a = {}", my_struct_rc.a);
// Add this line. Compilation error "borrow of moved value"
println!("my_struct.a = {}", my_struct.a);
}
Make your struct implement the Copy trait, and it will be automatically copied into the Rc::new function. Now the code above works, because the my_struct variable is not moved anywhere, just copied.
#[derive(Clone, Copy)]
struct MyStruct {
a: i8,
}
The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.
Take a look at Rc::new source code (removed the comment which is irrelevant).
struct RcBox<T: ?Sized> {
strong: Cell<usize>,
weak: Cell<usize>,
value: T,
}
// ...
pub fn new(value: T) -> Rc<T> {
Self::from_inner(Box::into_raw_non_null(box RcBox {
strong: Cell::new(1),
weak: Cell::new(1),
value,
}))
}
It takes the value you pass to it, and creates a Box, so it's always put on the heap. This is plain Rust and I don't think it performs too many sophisticated optimizations, but that may change.
Note that "move" in Rust may also copy data implicitly, and this may depend on the current compiler's behavior. In that case, if you are concerned about performance you can try to make the struct as small as possible, and store some information on the heap. For example when a Vec<T> is moved, as far as I know it only copies the capacity, length and pointer to the heap, but the actual array which is on the heap is not copied element by element, so only a few bytes are copied when moving a vector (assuming the data is copied, because that's also subject to compiler optimizations in case copying is not actually needed).
I just wanted to déclare a pointer to a struct in a crate shared by several component of my project but using the same process. I m meaning the aim is to get it initilized only once.
type Box = [u64; 64];
pub static mut mmaped: &mut Box;
which on compile time generate this error.
free static item without body
where mmaped is later assigned a value in the following way only one time from the top crate and it s value used from the multiple crates it depends on.
mmaped = unsafe { std::mem::transmute(addr) };
So how to provide a definition mmaped without having to mmap it more than one time?
This question isn t a duplicate of this one as it doesn t talk about exporting the singleton outside the crate and I m getting a compiler error specifically for doing it.
I'm trying to understand what exactly the Rust aliasing/memory model allows. In particular I'm interested in when accessing memory outside the range you have a reference to (which might be aliased by other code on the same or different threads) becomes undefined behaviour.
The following examples all access memory outside what is ordinarily allowed, but in ways that would be safe if the compiler produced the obvious assembly code. In addition, I see little conflict potential with compiler optimization, but they might still violate strict aliasing rules of Rust or LLVM thus constituting undefined behavior.
The operations are all properly aligned and thus cannot cross a cache-line or page boundary.
Read the aligned 32-bit word surrounding the data we want to access and discard the parts outside of what we're allowed to read.
Variants of this could be useful in SIMD code.
pub fn read(x: &u8) -> u8 {
let pb = x as *const u8;
let pw = ((pb as usize) & !3) as *const u32;
let w = unsafe { *pw }.to_le();
(w >> ((pb as usize) & 3) * 8) as u8
}
Same as 1, but reads the 32-bit word using an atomic_load intrinsic.
pub fn read_vol(x: &u8) -> u8 {
let pb = x as *const u8;
let pw = ((pb as usize) & !3) as *const AtomicU32;
let w = unsafe { (&*pw).load(Ordering::Relaxed) }.to_le();
(w >> ((pb as usize) & 3) * 8) as u8
}
Replace the aligned 32-bit word containing the value we care about using CAS. It overwrites the parts outside what we're allowed to access with what's already in there, so it only affects the parts we're allowed to access.
This could be useful to emulate small atomic types using bigger ones. I used AtomicU32 for simplicity, in practice AtomicUsize is the interesting one.
pub fn write(x: &mut u8, value:u8) {
let pb = x as *const u8;
let atom_w = unsafe { &*(((pb as usize) & !3) as *const AtomicU32) };
let mut old = atom_w.load(Ordering::Relaxed);
loop {
let shift = ((pb as usize) & 3) * 8;
let new = u32::from_le((old.to_le() & 0xFF_u32 <<shift)|((value as u32) << shift));
match atom_w.compare_exchange_weak(old, new, Ordering::SeqCst, Ordering::Relaxed) {
Ok(_) => break,
Err(x) => old = x,
}
}
}
This is a very interesting question.
There are actually several issues with these functions, making them unsound (i.e., not safe to expose) for various formal reasons.
At the same time, I am unable to actually construct a problematic interaction between these functions and compiler optimizations.
Out-of-bounds accesses
I'd say all of these functions are unsound because they can access unallocated memory. Each of them I can call with a &*Box::new(0u8) or &mut *Box::new(0u8), resulting in out-of-bounds accesses, i.e. accesses beyond what was allocated using malloc (or whatever allocator). Neither C nor LLVM permit such accesses. (I'm using the heap because I find it easier to think about allocations there, but the same applies to the stack where every stack variable is really its own independent allocation.)
Granted, the LLVM language reference doesn't actually define when a load has undefined behavior due to the access not being inside the object. However, we can get a hint in the documentation of getlementptr inbounds, which says
The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end.
I am fairly certain that being in bounds is a necessary but not sufficient requirement for actually using an address with load/store.
Note that this is independent of what happens on the assembly level; LLVM will do optimizations based on a much higher-level memory model that argues in terms of allocated blocks (or "objects" as C calls them) and staying within the bounds of these blocks.
C (and Rust) are not assembly, and it is not possible to use assembly-based reasoning on them.
Most of the time it is possible to derive contradictions from assembly-based reasoning (see e.g. this bug in LLVM for a very subtle example: casting a pointer to an integer and back is not a NOP).
This time, however, the only examples I can come up with are fairly far-fetched: For example, with memory-mapped IO, even reads from a location could "mean" something to the underlying hardware, and there could be such a read-sensitive location sitting right next to the one that's passed into read.
But really I don't know much about this kind of embedded/driver development, so this may be entirely unrealistic.
(EDIT: I should add that I am not an LLVM expert. Probably the llvm-dev mailing list is a better place to determine if they are willing to commit to permitting such out-of-bounds accesses.)
Data races
There is another reason at least some of these functions are not sound: Concurrency. You clearly already saw this coming, judging from the use of concurrent accesses.
Both read and read_vol are definitely unsound under the concurrency semantics of C11. Imagine x is the first element of a [u8], and another thread is writing to the second element at the same time as we execute read/read_vol. Our read of the whole 32bit word overlaps with the other thread's write. This is a classical "data race": Two threads accessing the same location at the same time, one access being a write, and one access not being atomic. Under C11, any data race is UB so we are out. LLVM is slightly more permissive so both read and read_val are probably allowed, but right now Rust declares that it uses the C11 model.
Also note that "vol" is a bad name (assuming you meant this as short-hand for "volatile") -- in C, atomicity has nothing to do with volatile! It is literally impossible to write correct concurrent code when using volatile and not atomics. Unfortunately, Java's volatile is about atomicity, but that's a very different volatile than the one in C.
And finally, write also introduces a data race between an atomic read-modify-update and a non-atomic write in the other thread, so it is UB in C11 as well. And this time it is also UB in LLVM: Another thread could be reading from one of the extra locations that write affects, so calling write would introduce a data race between our writing and the other thread's reading. LLVM specifies that in this case, the read returns undef. So, calling write can make safe accesses to the same location in other threads return undef, and subsequently trigger UB.
Do we have any examples of issues caused by these functions?
The frustrating part is, while I found multiple reasons to rule out your functions following the spec(s), there seems to be no good reason that these functions are ruled out! The read and read_vol concurrency issues are fixed by LLVM's model (which however has other problems, compared to C11), but write is illegal in LLVM just because read-write data races make the read return undef -- and in this case we know we are writing the same value that was already stored in these other bytes! Couldn't LLVM just say that in this special case (writing the value that's already there), the read must return that value? Probably yes, but this stuff is subtle enough that I would also not be surprised if that invalidates some obscure optimization.
Moreover, at least on non-embedded platforms the out-of-bounds accesses done by read are unlikely to cause actual trouble. I guess one could imagine a semantics which returns undef when reading an out-of-bounds byte that is guaranteed to sit on the same page as an in-bounds byte. But that would still leave write illegal, and that is a really tough one: write can only be allowed if the memory on these other locations is left absolutely unchanged. There could be arbitrary data sitting there from other allocations, parts of the stack frame, whatever. So somehow the formal model would have to let you read those other bytes, not allow you to gain anything by inspecting them, but also verify that you are not changing the bytes before writing them back with a CAS. I'm not aware of any model that would let you do that. But I thank you for bringing these nasty cases to my attention, it's always good to know that there is still plenty of stuff left to research in terms of memory models :)
Rust's aliasing rules
Finally, what you were probably wondering about is whether these functions violate any of the additional aliasing rules that Rust adds. The trouble is, we don't know -- these rules are still under development. However, all the proposals I have seen so far would indeed rule out your functions: When you hold an &mut u8 (say, one that points right next to the one that's passed to read/read_vol/write), the aliasing rules provide a guarantee that no access whatsoever will happen to that byte by anyone but you. So, your functions reading from memory that others could hold a &mut u8 to already makes them violate the aliasing rules.
However, the motivation for these rules is to conform with the C11 concurrency model and LLVM's rules for memory access. If LLVM declares something UB, we have to make it UB in Rust as well unless we are willing to change our codegen in a way that avoids the UB (and typically sacrifices performance). Moreover, given that Rust adopted the C11 concurrency model, the same holds true for that. So for these cases, the aliasing rules really don't have any choice but make these accesses illegal. We could revisit this once we have a more permissive memory model, but right now our hands are bound.
I'm looking for a way to take a large object and break it into smaller mutable child objects, which can be processed in parallel.
Something like:
struct PixelBuffer { data:Vec<u32>, width:u32, height:u32 }
struct PixelBlock { data:Vec<u32> }
impl PixelBuffer {
fn decompose(&'a mut self) -> Vec<Guard<'a, PixelBlock>>> {
...
}
}
Where the resulting PixelBlock's can be processed in parallel, and the parent PixelBuffer will remain locked until all Guard<PixelBlock> are dropped.
This is effectively mutable pointer aliasing; the large data block in PixelBuffer will be directly modified via each PixelBlock.
However, each PixelBlock is non-overlapping segment from the internal data in PixelBuffer.
You can certainly do this in unsafe code (internal buffer is a raw pointer; generate a new external pointer for each PixelBlock); but is it possible to achieve the same result using safe code?
(NB. I'm open to using a data block allocated from libc::malloc if that'll help?)
This works fine and is a natural consequence of how, e.g., iterators work: the next method hands out a sequence of values that are not lifetime-connected to the reference they come from, i.e. fn next(&mut self) -> Option<Self::Item>. This automatically means that any iterator that yields &mut pointers (like, slice.iter_mut()) is yielding pointers to non-overlapping memory, because anything else would be incorrect.
One way to use this in parallel is something like my simple_parallel library, e.g. Pool::for_.
(You'll need to give more details about the internals of PixelBuffer to be more specific about how to do it in this case.)
There is no way to completely avoid unsafe Rust, because the compiler cannot currently evaluate the safety of sub-slices. However, the standard library contains code that provides a safe wrapper that you can use.
Read up on std::slice::Chunks and std::slice::ChunksMut.
Sample code: https://play.rust-lang.org/?gist=ceec5be3e1530c0a6d3b&version=stable
However, your next problem is sending the slices to separate threads, because the best way to do that would be thread::scoped, which is currently deprecated due to some safety problems that were discovered this year...
Also, keep in mind that Vec<_> owns its contents, whereas slices are just a view. Generally, you want to write most functions in terms of slices, and keep only one "Vec" to hold the data.