Are objects accessed indirectly in D?

From what I've read, all objects in D are fully location independent. How is this requirement achieved?
One thing that comes to mind is that all references are not pointers to the objects themselves, but to some proxy, so when you move an object (in memory) you just update that proxy, not all the references used in the program.
But this is just my guess. How is it actually done in D?

edit: bottom line up front: no proxy object; objects are referenced directly through regular pointers. /edit
Structs aren't allowed to keep a pointer to themselves, so if they get copied, they should continue to just work. This isn't strictly enforced by the language, though:
struct S {
    S* lol;
    void beBad() {
        lol = &this; // the compiler will allow this...
    }
}

S pain() {
    S s;
    s.beBad();
    return s;
}

void main() {
    S s;
    s = pain();
    assert(s.lol !is &s); // but it will also move the object without notice!
}
(EDIT: actually, I guess you could use a postblit to update internal pointers, so it isn't quite without notice. If you're careful enough, you could make it work; then again, if you're careful enough, you can shoot between your toes without hitting your foot, too. EDIT2: Actually no, the compiler/runtime is still allowed to move the struct without even calling the postblit. One example of where this happens is when it copies a stack frame to the heap to make a closure: the struct data is moved to a new address without being informed. So yeah. /edit)
And actually, that assert isn't guaranteed to pass: the compiler might choose to run pain directly on the local object declared in main, in which case the pointer would still work. (I wasn't able to force that optimization here for a demo. Generally, when you return a struct from a function, it is actually done via a hidden pointer the caller passes; the caller says "put the return value right here", thus avoiding a copy/move in some cases.)
But anyway, the point is that the compiler is free to copy or not copy a struct at its leisure, so if you keep the address of this around inside it, that address may become invalid without notice. Keeping such a pointer is not a compile error, but it is undefined behavior.
The situation is different with classes. Classes are allowed to keep references to this internally, since a class is (in theory, as realized by the garbage collector implementation) an independent object with an infinite lifetime. While it may be moved (such as by a moving GC, not implemented in D today), if it is moved, all references to it, internal and external, are also required to be updated.
So classes can't have the memory pulled out from under them the way structs can (unless you, the programmer, take matters into your own hands and bypass the GC...).
The location-independence thing, I'm pretty sure, refers only to structs and only to the rule that they can't have pointers to themselves. There's no magic done with references or pointers; they indeed work with memory addresses, with no proxy objects.

Related

Static objects in Rust

Oftentimes in embedded settings we need to declare static structs (drivers, etc.) so that their memory is known and assigned at compile time.
Is there any way to achieve something similar in Rust?
For example, I want to have a UART driver struct
struct DriverUart {
    ...
}
and an associated impl block.
Now, I want to avoid having a function named new(), and instead I want to allocate this memory a priori somewhere (or to have a new function that I can call statically outside any code block).
In C, I would simply put an instantiation of this struct in some header file and it would be statically allocated and globally available.
I haven't found anything similar in Rust.
If it is not possible, then why? And what is the best way to achieve something similar?
Thanks!
Now, I want to avoid having a function named new(), and instead I want to allocate this memory a priori somewhere (or to have a new function that I can call statically outside any code block). In C, I would simply put an instantiation of this struct in some header file and it would be statically allocated and globally available. I haven't found anything similar in Rust. If it is not possible, then why? And what is the best way to achieve something similar?
https://doc.rust-lang.org/std/keyword.static.html
You can do the same in Rust, without the header, as long as all the elements are const:
struct DriverUart {
    whatever: u32,
}

static THING: DriverUart = DriverUart { whatever: 5 };
If you need to evaluate non-const expressions, then that obviously will not work, and you'll need to use something like lazy_static or once_cell to instantiate static-like values.
And of course, Rust being a safe language and statics being shared state, mutable statics are wildly unsafe unless mitigated via thread-safe interior-mutability containers (e.g. an atomic or a Mutex); a static is always considered to be shared between threads.
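If you need non-const initialization, a minimal sketch using the standard library might look like this (once_cell's API was later adopted into std as OnceLock; the DriverUart field and the compute_baud helper are invented for the example):
use std::sync::{Mutex, OnceLock};

struct DriverUart {
    baud_rate: u32,
}

// One-time, non-const initialization of a global; the Mutex then
// provides thread-safe interior mutability.
static UART: OnceLock<Mutex<DriverUart>> = OnceLock::new();

fn uart() -> &'static Mutex<DriverUart> {
    UART.get_or_init(|| Mutex::new(DriverUart { baud_rate: compute_baud() }))
}

fn compute_baud() -> u32 {
    115_200 // stand-in for a computation that isn't const
}

fn main() {
    uart().lock().unwrap().baud_rate = 9_600;
    assert_eq!(uart().lock().unwrap().baud_rate, 9_600);
}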

Does reading or writing a whole 32-bit word, even though we only have a reference to a part of it, result in undefined behaviour?

I'm trying to understand what exactly the Rust aliasing/memory model allows. In particular I'm interested in when accessing memory outside the range you have a reference to (which might be aliased by other code on the same or different threads) becomes undefined behaviour.
The following examples all access memory outside what is ordinarily allowed, but in ways that would be safe if the compiler produced the obvious assembly code. In addition, I see little conflict potential with compiler optimization, but they might still violate strict aliasing rules of Rust or LLVM thus constituting undefined behavior.
The operations are all properly aligned and thus cannot cross a cache-line or page boundary.
1. Read the aligned 32-bit word surrounding the data we want to access and discard the parts outside of what we're allowed to read.
Variants of this could be useful in SIMD code.
pub fn read(x: &u8) -> u8 {
    let pb = x as *const u8;
    let pw = ((pb as usize) & !3) as *const u32;
    let w = unsafe { *pw }.to_le();
    (w >> (((pb as usize) & 3) * 8)) as u8
}
2. Same as 1, but reads the 32-bit word using an atomic_load intrinsic.
use std::sync::atomic::{AtomicU32, Ordering};

pub fn read_vol(x: &u8) -> u8 {
    let pb = x as *const u8;
    let pw = ((pb as usize) & !3) as *const AtomicU32;
    let w = unsafe { (&*pw).load(Ordering::Relaxed) }.to_le();
    (w >> (((pb as usize) & 3) * 8)) as u8
}
3. Replace the aligned 32-bit word containing the value we care about, using CAS. It overwrites the parts outside what we're allowed to access with what's already there, so it only affects the parts we're allowed to access.
This could be useful to emulate small atomic types using bigger ones. I used AtomicU32 for simplicity; in practice, AtomicUsize is the interesting one.
pub fn write(x: &mut u8, value: u8) {
    let pb = x as *const u8;
    let atom_w = unsafe { &*(((pb as usize) & !3) as *const AtomicU32) };
    let mut old = atom_w.load(Ordering::Relaxed);
    loop {
        let shift = ((pb as usize) & 3) * 8;
        // Keep the three bytes we must not modify, replace only our byte.
        let new = u32::from_le((old.to_le() & !(0xFF_u32 << shift)) | ((value as u32) << shift));
        match atom_w.compare_exchange_weak(old, new, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => break,
            Err(x) => old = x,
        }
    }
}
This is a very interesting question.
There are actually several issues with these functions, making them unsound (i.e., not safe to expose) for various formal reasons.
At the same time, I am unable to actually construct a problematic interaction between these functions and compiler optimizations.
Out-of-bounds accesses
I'd say all of these functions are unsound because they can access unallocated memory. Each of them can be called with a &*Box::new(0u8) or &mut *Box::new(0u8), resulting in out-of-bounds accesses, i.e., accesses beyond what was allocated using malloc (or whatever allocator). Neither C nor LLVM permits such accesses. (I'm using the heap because I find it easier to think about allocations there, but the same applies to the stack, where every stack variable is really its own independent allocation.)
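To make this concrete, here is a sketch of such a call; it compiles, but running it performs exactly the out-of-bounds access described, so it is for illustration only (read is the function from the question):
fn main() {
    // The Box allocates exactly one byte. `read` then loads the whole
    // surrounding aligned 32-bit word, i.e. up to three bytes that were
    // never part of this allocation: out of bounds in LLVM's memory
    // model, even if the hardware never faults.
    let b = Box::new(0u8);
    let _ = read(&*b);
}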
Granted, the LLVM language reference doesn't actually define when a load has undefined behavior due to the access not being inside the object. However, we can get a hint in the documentation of getelementptr inbounds, which says:
The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end.
I am fairly certain that being in bounds is a necessary but not sufficient requirement for actually using an address with load/store.
Note that this is independent of what happens on the assembly level; LLVM will do optimizations based on a much higher-level memory model that argues in terms of allocated blocks (or "objects" as C calls them) and staying within the bounds of these blocks.
C (and Rust) are not assembly, and it is not possible to use assembly-based reasoning on them.
Most of the time it is possible to derive contradictions from assembly-based reasoning (see e.g. this bug in LLVM for a very subtle example: casting a pointer to an integer and back is not a NOP).
This time, however, the only examples I can come up with are fairly far-fetched: For example, with memory-mapped IO, even reads from a location could "mean" something to the underlying hardware, and there could be such a read-sensitive location sitting right next to the one that's passed into read.
But really I don't know much about this kind of embedded/driver development, so this may be entirely unrealistic.
(EDIT: I should add that I am not an LLVM expert. Probably the llvm-dev mailing list is a better place to determine if they are willing to commit to permitting such out-of-bounds accesses.)
Data races
There is another reason at least some of these functions are not sound: Concurrency. You clearly already saw this coming, judging from the use of concurrent accesses.
Both read and read_vol are definitely unsound under the concurrency semantics of C11. Imagine x is the first element of a [u8], and another thread is writing to the second element at the same time as we execute read/read_vol. Our read of the whole 32-bit word overlaps with the other thread's write. This is a classic "data race": two threads accessing the same location at the same time, one access being a write, and one access not being atomic. Under C11, any data race is UB, so we are out. LLVM is slightly more permissive, so both read and read_vol are probably allowed, but right now Rust declares that it uses the C11 model.
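For concreteness, a minimal sketch of that scenario (the Aligned wrapper is invented here just to guarantee 4-byte alignment; read is the function from the question). The caller's side is entirely safe Rust, which is exactly what makes read unsound; actually running this is the data race, i.e. UB:
use std::thread;

#[repr(align(4))]
struct Aligned([u8; 4]);

fn main() {
    let mut buf = Aligned([0u8; 4]);
    let (lo, hi) = buf.0.split_at_mut(1);
    thread::scope(|s| {
        // One thread performs an ordinary, non-atomic write to byte 1...
        s.spawn(|| hi[0] = 42);
        // ...while `read` loads the whole aligned 32-bit word, which
        // overlaps that write: a data race under C11.
        let _ = read(&lo[0]);
    });
}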
Also note that "vol" is a bad name (assuming you meant this as shorthand for "volatile"): in C, atomicity has nothing to do with volatile! It is literally impossible to write correct concurrent code when using volatile and not atomics. Unfortunately, Java's volatile is about atomicity, but that's a very different volatile than the one in C.
And finally, write also introduces a data race between an atomic read-modify-write and a non-atomic write in the other thread, so it is UB in C11 as well. And this time it is also UB in LLVM: another thread could be reading from one of the extra locations that write affects, so calling write would introduce a data race between our writing and the other thread's reading. LLVM specifies that in this case the read returns undef. So, calling write can make safe accesses to the same location in other threads return undef, and subsequently trigger UB.
Do we have any examples of issues caused by these functions?
The frustrating part is, while I found multiple reasons to rule out your functions following the spec(s), there seems to be no good reason that these functions are ruled out! The read and read_vol concurrency issues are fixed by LLVM's model (which however has other problems, compared to C11), but write is illegal in LLVM just because read-write data races make the read return undef -- and in this case we know we are writing the same value that was already stored in these other bytes! Couldn't LLVM just say that in this special case (writing the value that's already there), the read must return that value? Probably yes, but this stuff is subtle enough that I would also not be surprised if that invalidates some obscure optimization.
Moreover, at least on non-embedded platforms the out-of-bounds accesses done by read are unlikely to cause actual trouble. I guess one could imagine a semantics which returns undef when reading an out-of-bounds byte that is guaranteed to sit on the same page as an in-bounds byte. But that would still leave write illegal, and that is a really tough one: write can only be allowed if the memory on these other locations is left absolutely unchanged. There could be arbitrary data sitting there from other allocations, parts of the stack frame, whatever. So somehow the formal model would have to let you read those other bytes, not allow you to gain anything by inspecting them, but also verify that you are not changing the bytes before writing them back with a CAS. I'm not aware of any model that would let you do that. But I thank you for bringing these nasty cases to my attention, it's always good to know that there is still plenty of stuff left to research in terms of memory models :)
Rust's aliasing rules
Finally, what you were probably wondering about is whether these functions violate any of the additional aliasing rules that Rust adds. The trouble is, we don't know -- these rules are still under development. However, all the proposals I have seen so far would indeed rule out your functions: When you hold an &mut u8 (say, one that points right next to the one that's passed to read/read_vol/write), the aliasing rules provide a guarantee that no access whatsoever will happen to that byte by anyone but you. So, your functions reading from memory that others could hold a &mut u8 to already makes them violate the aliasing rules.
However, the motivation for these rules is to conform with the C11 concurrency model and LLVM's rules for memory access. If LLVM declares something UB, we have to make it UB in Rust as well unless we are willing to change our codegen in a way that avoids the UB (and typically sacrifices performance). Moreover, given that Rust adopted the C11 concurrency model, the same holds true for that. So for these cases, the aliasing rules really don't have any choice but make these accesses illegal. We could revisit this once we have a more permissive memory model, but right now our hands are bound.

Correctly storing a Rust Rc<T> in C-managed memory

I'm wrapping a Rust object to be used from Lua. I need the object to be destroyed when neither Rust code nor Lua still has a reference to it, so the obvious (to me) solution is to use Rc<T>, stored in Lua-managed memory.
The Lua API (I'm using rust-lua53 for now) lets you allocate a chunk of memory and attach methods and a finalizer to it, so I want to store an Rc<T> into that chunk of memory.
My current attempt looks like the following. First, creating an object:
/* Allocate a block of uninitialized memory to use */
let p = state.new_userdata(mem::size_of::<Rc<T>>() as size_t) as *mut Rc<T>;
/* Make a ref-counted pointer to a Rust object */
let rc = Rc::<T>::new(...);
/* Store the Rc */
unsafe { ptr::write(p, rc) };
And in the finaliser:
let p: *mut Rc<T> = ...; /* Get a pointer to the item to finalize */
unsafe { ptr::drop_in_place(p) }; /* Release the object */
Now this seems to work (as briefly tested by adding a println!() to the drop method). But is it correct and safe (as long as I make sure it's not accessed after finalization)? I don't feel confident enough in unsafe Rust to be sure that it's ok to ptr::write an Rc<T>.
I'm also wondering about, rather than storing an Rc<T> directly, storing an Option<Rc<T>>; then instead of drop_in_place() I would ptr::swap() it with None. This would make it easy to handle any use after finalization.
Now this seems to work (as briefly tested by adding a println!() to the drop method). But is it correct and safe (as long as I make sure it's not accessed after finalisation)? I don't feel confident enough in unsafe Rust to be sure that it's ok to ptr::write an Rc<T>.
Yes, you may ptr::write any Rust type to any memory location. This "leaks" the Rc<T> object (ownership passes out of the compiler's view without running the destructor), but writes a bit-for-bit equivalent to the target location.
When using it, you need to guarantee that no one modified it outside of Rust code and that you are still in the same thread as the one where it was created. If you want to be able to move across threads, you need to use Arc.
Rust's thread safety cannot protect you here, because you are using raw pointers.
I'm also wondering about, rather than storing an Rc<T> directly, storing an Option<Rc<T>>; then instead of drop_in_place() I would ptr::swap() it with None. This would make it easy to handle any use after finalisation.
The counterpart to ptr::write is ptr::read. So if you can guarantee that no one ever tries to ptr::read or drop_in_place() the object afterwards, you can just call ptr::read (which returns the object) and use it as you would use any other Rc<T> object. You don't need to care about dropping or anything, because now it's back under Rust's control.
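As a sketch (the helper names take_back and share are invented for the example, assuming p still points at the Rc written earlier and the finalizer hasn't run):
use std::ptr;
use std::rc::Rc;

// Move the Rc back out of the Lua-managed memory. Afterwards the memory
// at `p` must be treated as uninitialized; the finalizer must not
// drop_in_place it again.
unsafe fn take_back<T>(p: *mut Rc<T>) -> Rc<T> {
    ptr::read(p)
}

// Alternatively, leave the stored Rc in place and hand out another
// handle by bumping the reference count.
unsafe fn share<T>(p: *const Rc<T>) -> Rc<T> {
    (*p).clone()
}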
You should also be using new_userdata_typed instead of new_userdata, since that takes the memory handling off your hands. There are other convenience wrapper functions with the suffix _typed for most userdata needs.
Your code will work; note, though, that drop_in_place(p) will just decrease the Rc's reference count, and the contained T will be dropped only if it was the last reference, which is the correct action.

Assign string to zmq::message_t without copying

I need to do some high-performance C++ work, which is why I need to avoid copying data whenever possible.
Therefore I want to directly assign a string buffer to a zmq::message_t object without copying it. But there seems to be some deallocation of the string that prevents successful sending.
Here is the piece of code:
for (pair<int, string> msg : l) {
    comm_out.send_int(msg.first);
    comm_out.send_int(t_id);
    int size = msg.second.size();
    zmq::message_t m((void *) std::move(msg.second).data(), size, NULL, NULL);
    comm_out.send_frame_msg(m, false); // some zmq-wrapper class
}
How can I prevent the string from being deallocated before the message is sent out? And when exactly is the string deallocated?
Regards
I think that zmq::message_t m((void *) std::move(msg.second).data()... is probably undefined behaviour, but it is certainly the cause of your problem. In this instance, std::move isn't doing what I suspect you think it does.
std::move is just a cast to an rvalue reference; it doesn't move anything by itself. The .data() call therefore returns a pointer into msg.second's buffer, and msg (a copy made on each loop iteration) is destroyed at the end of that iteration. 0MQ sends asynchronously and assumes the pointer stays valid, so the buffer can be gone before the send actually completes.
Zero-copy is a complicated matter in 0MQ (see the 0MQ Guide for more details), but you have to ensure that the data that hasn't been copied remains valid until 0MQ tells you explicitly that it's finished with it.
Using C++ strings in this situation is hard, and requires a lot of thought. Your question about how to "avoid that the string is deallocated..." goes right to the heart of the issue. The only answer to that is "with great care".
In short, are you sure you need zero-copy at all?

Is this a safe version of double-checked locking?

Slightly modified version of canonical broken double-checked locking from Wikipedia:
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            synchronized (this) {
                if (helper == null) {
                    // Create new Helper instance and store reference on
                    // stack so other threads can't see it.
                    Helper myHelper = new Helper();
                    // Atomically publish this instance.
                    atomicSet(helper, myHelper);
                }
            }
        }
        return helper;
    }
}
Does simply making the publishing of the newly created Helper instance atomic make this double checked locking idiom safe, assuming that the underlying atomic ops library works properly? I realize that in Java, one could just use volatile, but even though the example is in pseudo-Java, this is supposed to be a language-agnostic question.
See also:
Double checked locking Article
It entirely depends on the exact memory model of your platform/language.
My rule of thumb: just don't do it. Lock-free (or reduced lock, in this case) programming is hard and shouldn't be attempted unless you're a threading ninja. You should only even contemplate it when you've got profiling proof that you really need it, and in that case you get the absolute best and most recent book on threading for that particular platform and see if it can help you.
I don't think you can answer the question in a language-agnostic fashion without getting away from code completely. It all depends on how synchronized and atomicSet work in your pseudocode.
The answer is language-dependent: it comes down to the guarantees provided by atomicSet().
If the construction of myHelper can be reordered to after the atomicSet(), then it doesn't matter how the variable is assigned to the shared state.
i.e.
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper(); // ALLOCATE MEMORY HERE BUT DON'T INITIALISE
// Atomically publish this instance.
atomicSet(helper, myHelper); // ATOMICALLY POINT UNINITIALISED MEMORY from helper
// other thread gets run at this time and tries to use helper object
// AT THE PROGRAM'S LEISURE, INITIALISE the Helper object.
If this is allowed by the language then the double checking will not work.
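To make the required guarantee concrete, here is a sketch in Rust, whose atomics follow the C11 model (the Helper field and its value are placeholders): a release store paired with an acquire load is the contract a sound atomicSet/first-check pair has to provide.
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Mutex;

struct Helper {
    value: u32, // stand-in field
}

static HELPER: AtomicPtr<Helper> = AtomicPtr::new(ptr::null_mut());
static INIT_LOCK: Mutex<()> = Mutex::new(());

fn get_helper() -> &'static Helper {
    // First check: the Acquire load pairs with the Release store below,
    // so a non-null pointer implies a fully constructed Helper.
    let p = HELPER.load(Ordering::Acquire);
    if !p.is_null() {
        return unsafe { &*p };
    }
    let _guard = INIT_LOCK.lock().unwrap();
    // Second check, under the lock; the lock itself orders this load
    // with respect to any previous initialization.
    let p = HELPER.load(Ordering::Relaxed);
    if p.is_null() {
        let new = Box::into_raw(Box::new(Helper { value: 42 }));
        // The "atomicSet": Release forbids reordering the construction
        // of the Helper after its publication.
        HELPER.store(new, Ordering::Release);
        unsafe { &*new }
    } else {
        unsafe { &*p }
    }
}
If the language's atomic assignment gives anything weaker than this release/acquire pairing, another thread can observe the pointer before the object's fields, and the idiom is broken.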
Using volatile would not prevent multiple instantiations; however, using synchronized will prevent multiple instances from being created. However, with your code it is possible that helper is returned before it has been set up (thread 'A' instantiates it, but before it is set up, thread 'B' comes along, sees that helper is non-null, and so returns it straight away). To fix that problem, remove the first if (helper == null).
Most likely it is broken, because the problem of a partially constructed object is not addressed.
To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including its subclass) or its members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to: A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Any time after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as better understanding of caching, garbage collection, etc. evolve, as well as adapting to changes in the underlying real processor architecture that the VM runs on.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.
